CN112243518A - Method and device for acquiring depth map and computer storage medium - Google Patents

Method and device for acquiring depth map and computer storage medium

Info

Publication number
CN112243518A
Authority
CN
China
Prior art keywords
image
target
pixel points
main image
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201980031872.5A
Other languages
Chinese (zh)
Inventor
杨志华
马东东
梁家斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SZ DJI Technology Co Ltd
SZ DJI Innovations Technology Co Ltd
Original Assignee
SZ DJI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SZ DJI Technology Co Ltd filed Critical SZ DJI Technology Co Ltd
Publication of CN112243518A publication Critical patent/CN112243518A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T 7/00 Image analysis
            • G06T 7/50 Depth or shape recovery
              • G06T 7/55 Depth or shape recovery from multiple images
                • G06T 7/593 Depth or shape recovery from multiple images from stereo images
            • G06T 7/90 Determination of colour characteristics
          • G06T 5/00 Image enhancement or restoration
            • G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
            • G06T 5/80 Geometric correction
          • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
          • G06T 2207/00 Indexing scheme for image analysis or image enhancement
            • G06T 2207/10 Image acquisition modality
              • G06T 2207/10028 Range image; Depth image; 3D point clouds
            • G06T 2207/20 Special algorithmic details
              • G06T 2207/20024 Filtering details
              • G06T 2207/20212 Image combination
                • G06T 2207/20221 Image fusion; Image merging
          • G06T 2210/00 Indexing scheme for image generation or computer graphics
            • G06T 2210/61 Scene description

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

A method, an apparatus and a computer storage medium for obtaining a depth map are provided. The method of obtaining a depth map comprises: acquiring a main image and a target sub-image (S110); obtaining, through epipolar rectification, a first corrected image and a second corrected image respectively corresponding to the main image and the target sub-image (S120); determining corresponding image blocks in the first corrected image and the second corrected image, respectively (S130); determining feature information of the pixel points from the mapped pixel points of the corresponding image blocks (S140); and determining a depth map of the main image based on the feature information of the pixel points (S150). The feature information of the images is thus obtained by means of epipolar rectification, from which the depth map of the main image can be obtained. In addition, the method has low computational complexity and reduces the hardware requirements.

Description

Method and device for acquiring depth map and computer storage medium
Technical Field
Embodiments of the present invention relate to the field of image processing, and in particular, to a method and an apparatus for obtaining a depth map, and a computer storage medium.
Background
With the development of lens and charge-coupled device (CCD) technology, acquired images are becoming larger and contain richer information, and simultaneous localization and mapping (SLAM) and three-dimensional reconstruction based on such images are active research topics; a prerequisite of both is determining the depth map of the images.
However, current SLAM maps consist of sparse or semi-dense points and provide only limited information, so a dense, reliable and reusable map product cannot be output quickly. Offline three-dimensional reconstruction requires dense matching among a large number of multi-view images, a step that consumes a great deal of time and computing resources, so three-dimensional reconstruction is inefficient and places high demands on hardware.
Therefore, a method for obtaining a depth map quickly and accurately is needed.
Disclosure of Invention
Embodiments of the present invention provide a method and an apparatus for acquiring a depth map, and a computer storage medium, which can acquire the depth map of a main image with high processing efficiency and low hardware requirements.
In a first aspect, a method for acquiring a depth map is provided, including:
acquiring a main image and a target auxiliary image;
epipolar rectification is performed on the main image and the target sub-image to acquire a first corrected image corresponding to the main image and a second corrected image corresponding to the target sub-image;
determining an image block in the first corrected image corresponding to a target pixel point in the main image and an image block in the second corrected image corresponding to a pixel point in the target sub-image;
determining feature information of the target pixel points in the main image according to the image blocks in the first corrected image corresponding to the target pixel points in the main image, and determining feature information of the pixel points in the target secondary image according to the image blocks in the second corrected image corresponding to the pixel points in the target secondary image;
and determining a depth map corresponding to the main image according to the characteristic information of the target pixel points in the main image and the characteristic information of the pixel points in the target auxiliary image.
In a second aspect, an apparatus for acquiring a depth map is provided, including: a memory and a processor, wherein,
the memory for storing program code;
the processor, configured to invoke the program code and, when the program code is executed, to:
acquiring a main image and a target auxiliary image;
epipolar rectification is performed on the main image and the target sub-image to acquire a first corrected image corresponding to the main image and a second corrected image corresponding to the target sub-image;
determining an image block in the first corrected image corresponding to a target pixel point in the main image and an image block in the second corrected image corresponding to a pixel point in the target sub-image;
determining feature information of the target pixel points in the main image according to the image blocks in the first corrected image corresponding to the target pixel points in the main image, and determining feature information of the pixel points in the target secondary image according to the image blocks in the second corrected image corresponding to the pixel points in the target secondary image;
and determining a depth map corresponding to the main image according to the characteristic information of the target pixel points in the main image and the characteristic information of the pixel points in the target auxiliary image.
In a third aspect, there is provided a movable platform comprising: a photographing device configured to output the main image and the target secondary image, and the apparatus for acquiring a depth map according to the second aspect.
In a fourth aspect, a computer storage medium is provided, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of acquiring a depth map according to the first aspect or any implementation thereof.
Therefore, in the embodiments of the present invention, the feature information of the images is obtained through epipolar rectification, and the depth map of the main image can then be obtained. In addition, the method has low computational complexity and reduces the hardware requirements. The use of epipolar rectification is fast and shortens the computation time. When the depth map is obtained, cost calculation can be performed and, optionally, dynamic programming can be used for optimization, which gives the method higher stability, especially for weak-texture images.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive labor.
Fig. 1 is a schematic flow chart of a method of acquiring a depth map according to an embodiment of the present invention.
Fig. 2 is another schematic flow chart of a method of acquiring a depth map of an embodiment of the present invention.
Fig. 3 is a schematic diagram of sampling and projecting.
Fig. 4 is a schematic diagram of 8-direction dynamic programming optimization.
Fig. 5 is a schematic block diagram of an apparatus for acquiring a depth map according to an embodiment of the present invention.
Fig. 6 is another schematic block diagram of an apparatus for acquiring a depth map according to an embodiment of the present invention.
Fig. 7 is still another schematic block diagram of an apparatus for acquiring a depth map according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The method is applied to rectified images, or to a plurality of unrectified images with known poses (Pose); the depth of any pixel of a main image is computed such that the matching cost between the main image and a reference image is minimized, and this kind of image matching is called dense matching. Traditional dense matching mainly comprises four methods: PMVS, PatchMatch MVS, SGM and SGBM.
The PMVS algorithm is mainly divided into three steps: initial feature matching, patch generation and patch filtering. First, corner points, namely DoG (Difference of Gaussians) and Harris corners, are extracted, matching pairs are obtained by feature matching along the epipolar direction using the known poses, and the matching pairs are triangulated to obtain densified sparse points. Then each sparse point is expanded into a surface patch: the images in which the sparse point is visible are screened, a normal vector and a depth are given randomly and then optimized, the patch is projected onto the visible images, and the pairwise matching costs are computed. Finally, incorrect patches are deleted according to the number of visible images whose matching cost meets a threshold, and patch generation and patch filtering are iterated several times. The PMVS algorithm involves a large number of iterative steps, needs to load many images at a time, consumes much memory, is difficult to parallelize, and requires rich image texture, so the depth of weak-texture regions usually cannot be computed.
The PatchMatch MVS algorithm is a depth map computation method. The idea is to select a main image and several auxiliary images, choose a scanning direction, and propagate the depth and normal vector of the scanned region to unscanned pixels while adding a certain random perturbation and several random depths and normal vectors; the correspondence between image blocks (patches) of the main image and the auxiliary images is found by homography, the matching cost of each patch is computed, and the depth and normal vector with the minimum cost are kept. The depth map is obtained by scanning in several directions. PatchMatch MVS requires serial scanning and rich image texture, is difficult to parallelize, and weak-texture regions are easily missed.
The SGM algorithm is a disparity map computation method. Given two epipolar-rectified images, an initial disparity map is randomized and used to build mutual-information cost maps between the two images; the mutual-information matching cost is computed for every disparity to form a matching cost cube, which is filtered by multipath dynamic programming to obtain a new disparity map. Using a pyramid scheme, the low-resolution disparity map of a high pyramid level is iterated step by step down to the original resolution, and the disparity map at the original resolution is obtained. The SGM algorithm relies on a strong assumption that a correct disparity map can be computed by convergence over multiple iterations; in practice, convergence is not guaranteed in some situations with large foreground/background differences, so the algorithm is not robust enough.
The SGBM algorithm is a depth map estimation method combining plane sweeping with multipath dynamic programming. A main image and several auxiliary images are selected, the approximate depth range of the main image is determined, and a large number of fronto-parallel planes are hypothesized by densely sampling the inverse depth space; these planes are projected onto the auxiliary images, and the matching cost between the patch of the plane containing each pixel and the corresponding patch in the auxiliary image is computed to form a matching cost cube. The costs are then filtered by multipath dynamic programming, and finally the depth corresponding to each minimum cost is taken as the final depth map. The projection matching of each pixel in the SGBM algorithm is fully parallel, but the cost computation for the patch of each point is heavy, so the algorithm is not efficient.
An embodiment of the present invention provides a method for obtaining a depth map, and fig. 1 is a schematic flowchart of a method for obtaining a depth map according to an embodiment of the present invention. The method shown in fig. 1 comprises:
s110, acquiring the main image and the target auxiliary image.
Specifically, the depth map of the main image can be obtained by the method shown in fig. 1. The main image and the target sub-image may be selected according to the required resolution of the depth map; optionally, as an example, the resolution of the main image and the target sub-image (collectively referred to as the original images) may be at least twice the required resolution of the depth map.
Illustratively, S110 may include acquiring a main image and at least one sub image corresponding to the main image. Specifically, the main image may be obtained first, and then at least one sub-image corresponding to the main image may be selected.
Illustratively, a plurality of frames of candidate sub-images may be acquired, and the target sub-image meeting a preset requirement may be selected from the plurality of frames of candidate sub-images.
The multi-frame candidate secondary image may be an image of a frame number adjacent to the main image, or the multi-frame candidate secondary image may be an image of the same scene as the main image captured at another time, or the multi-frame candidate secondary image may be another image, which is not limited in the present invention.
Alternatively, at least one of the sub-images may be selected based on the position and orientation information of the main image.
Optionally, the at least one secondary image may be selected based on depth range information of the primary image and/or sparse point information of the primary image.
Alternatively, as an embodiment, the multi-frame candidate secondary images may include several frames that are close to the main image, where the distance between two images may be computed from the feature vectors of the two images, for example as a dot product between the two feature vectors. The preset requirement may be that the area of the region circumscribed by the sparse points is greater than a preset area. That is to say, when the target secondary image is selected, some virtual three-dimensional points may be selected randomly and projected onto the multi-frame candidate secondary images, and the target secondary image is selected according to the order of the area of the circumscribed region of the sparse points. For example, the one sub-image with the largest circumscribed area of the sparse points may be selected as the target sub-image; the top N sub-images ranked by the circumscribed area of the sparse points may be selected as the target sub-images; or all sub-images whose circumscribed area of the sparse points is greater than a preset area may be selected as the target sub-images.
Alternatively, as an embodiment, the preset requirement may be that the included angle between the optical axis of a candidate sub-image and the optical axis of the main image lies within a preset angle range. That is, when selecting the target sub-image, the included angle between the optical axis of the main image and the optical axis of each frame of candidate sub-image may be determined, and the candidate sub-images whose included angle lies within the preset angle range are determined as target sub-images. In other words, if the angle between the optical axis of a certain frame of candidate sub-image and the optical axis of the main image is within the preset angle range, that candidate sub-image is taken as the target sub-image (or one of the target sub-images). For example, sparse points (three-dimensional points) of the main image may be projected onto the multi-frame candidate sub-images, the depth range of each frame of candidate sub-image is then computed from the sparse points, and at least one target sub-image with a strong structure is selected. The depth range may be a single depth range for the whole image, or the image may be divided into several sub image blocks, each with an independent depth range. A strong structure may mean that the intersection angle is close to 30 degrees and that the area occupied by the sparse points is large, where the intersection angle is formed by the sparse point and the optical center of the image.
By selecting the target sub-image in the above manner, a target sub-image closer to the main image can be acquired, so that the obtained depth map of the main image is more accurate. It should be understood that other manners of selecting the target sub-image may be adopted in the embodiments of the present invention, and the present invention is not limited thereto.
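As a minimal illustration of the optical-axis-angle criterion described above, the following sketch keeps the candidate sub-images whose optical axis deviates from that of the main image by less than a preset angle. It assumes each image carries a 3×3 world-to-camera rotation matrix; the function and parameter names are illustrative and not part of the claimed method.

```python
import numpy as np

def select_by_optical_axis(main_R, candidate_Rs, max_angle_deg=30.0):
    """Keep candidate sub-images whose optical axis forms an angle with the
    main image's optical axis within the preset range (illustrative sketch)."""
    # the camera optical axis expressed in world coordinates is R.T @ [0, 0, 1]
    main_axis = main_R.T @ np.array([0.0, 0.0, 1.0])
    selected = []
    for idx, R in enumerate(candidate_Rs):
        axis = R.T @ np.array([0.0, 0.0, 1.0])
        cos_angle = np.clip(np.dot(main_axis, axis), -1.0, 1.0)
        if np.degrees(np.arccos(cos_angle)) < max_angle_deg:
            selected.append(idx)
    return selected
```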
Additionally, it is understood that in some embodiments, the primary image and the target secondary image in embodiments of the present invention may be captured by a movable platform (e.g., a drone).
S120, performing epipolar rectification on the main image and the target sub-image to acquire a first corrected image corresponding to the main image and a second corrected image corresponding to the target sub-image.
It is understood that if the target sub-image includes a plurality of frames, the first corrected image of the main image and the second corrected image of the target sub-image of the frame can be obtained as in S120 for each frame of the target sub-image. That is, assuming that there are N target sub-images, N first corrected images of the main image corresponding one-to-one to the N target sub-images may be obtained in S120, and a second corrected image of each frame of the target sub-image may be obtained.
For convenience of explanation, S120 is described in detail taking as an example an image pair composed of the main image and one frame of the target sub-image, and it is understood that the other frame of the target sub-image may be similarly performed.
Specifically, S120 may include: for the two images, the main image and the target sub-image, the main image may be mapped to the first corrected image by a homography and the target sub-image may be mapped to the second corrected image by a homography, so that the same rows of the first corrected image and the second corrected image are in one-to-one correspondence and same-name image points lie in the same row.
Epipolar rectification solves, for two given images, the two corresponding homography transformation matrices such that, after the homography transformations, the same rows of the two transformed images are in one-to-one correspondence and same-name image points necessarily lie in the same row. An epipolar line is the intersection of a plane passing through the baseline with each of the two images; epipolar lines appear in pairs in the two images, and since object points lying in that plane must be distributed along the same epipolar line in each image, epipolar lines are usually used to simplify the search space in matching. The baseline is the line connecting the optical centers of the two images.
When epipolar rectification is performed on the main image and the target sub-image, a first homography transformation matrix corresponding to the main image and a second homography transformation matrix corresponding to the target sub-image are determined. Then, the main image can be mapped by using a first homography transformation matrix to obtain a first correction image; and mapping the target secondary image by using the second homography transformation matrix to obtain a second correction image.
Epipolar rectification mainly requires solving two matrices, the first homography transformation matrix and the second homography transformation matrix, also referred to as the H1 matrix and the H2 matrix respectively (collectively, H matrices). In essence, two rotation matrices R_o and R_r are found such that the relative rotation between the two images, the main image and the target sub-image, becomes 0 and the long side of the image is parallel to the baseline in space. Specifically, the optical center positions of the images are kept fixed, the direction of the baseline is taken as the x direction, the normal of the plane containing the optical axis of the left image and the baseline is taken as the y direction, and the direction perpendicular to both the x and y directions that satisfies the right-handed coordinate system is taken as the z direction; the poses of the two images are then rotated into this defined coordinate system. Then, using the rotation parameters R_o and R_r before and after correction and the camera matrix K, according to
H = K R_r R_o^(-1) K^(-1)
a homography transformation matrix H from the original image to the epipolar-rectified image can be calculated; since the mapping is invertible, a homography transformation matrix h from the epipolar-rectified image to the original image can also be calculated. That is, the main image may be mapped to the first corrected image by the first homography transformation matrix H1, and the target sub-image may be mapped to the second corrected image by the second homography transformation matrix H2. Conversely, the first corrected image may be mapped back to the main image by the homography transformation matrix h1, and the second corrected image may be mapped back to the target secondary image by the homography transformation matrix h2.
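A minimal sketch of the construction just described is given below. It assumes both images share the same intrinsic matrix K, that R1 and R2 are the world-to-camera rotations of the main image and the target sub-image, and that C1, C2 are their optical centers in world coordinates; the shared K and all names are illustrative assumptions, not a definitive implementation.

```python
import numpy as np

def rectification_homographies(K, R1, C1, R2, C2):
    """Compute the homographies H1, H2 mapping the main image and the target
    sub-image onto an epipolar-rectified pair, and their inverses h1, h2."""
    # x axis: baseline direction between the two optical centers
    x = (C2 - C1) / np.linalg.norm(C2 - C1)
    # y axis: normal of the plane spanned by the left optical axis and the baseline
    z_main = R1.T @ np.array([0.0, 0.0, 1.0])   # main image optical axis in world frame
    y = np.cross(z_main, x)
    y /= np.linalg.norm(y)
    # z axis: perpendicular to x and y, completing a right-handed system
    z = np.cross(x, y)
    R_rect = np.stack([x, y, z])                # rows are the rectified camera axes

    # H = K * R_r * R_o^(-1) * K^(-1): original image -> rectified image
    K_inv = np.linalg.inv(K)
    H1 = K @ R_rect @ R1.T @ K_inv
    H2 = K @ R_rect @ R2.T @ K_inv
    # the reverse mappings (rectified image -> original image) are the inverses
    h1, h2 = np.linalg.inv(H1), np.linalg.inv(H2)
    return H1, H2, h1, h2
```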
It should be noted that although epipolar rectification is illustrated above using a right-handed Cartesian coordinate system, this should not be construed as a limitation of the present invention; other ways of performing epipolar rectification may also be used and are not listed one by one here.
S130, determining an image block corresponding to a target pixel point in the main image in the first corrected image and an image block corresponding to a pixel point in the target secondary image in the second corrected image.
Optionally, as an embodiment, for the main image and the first corrected image, in S130, an image block corresponding to a target pixel point in the main image in the first corrected image may be determined.
For example, a pixel point matched with a target pixel point in a main image may be determined in a first corrected image, and an image block corresponding to the target pixel point in the main image in the first corrected image may be determined according to the matched pixel point.
The target pixel points in the main image may include a plurality of target pixel points, which may be all pixel points in the main image, or may be partial pixel points in the main image. S130 may obtain an image block corresponding to each target pixel point.
Optionally, if the target pixel points are a subset of the pixel points in the main image, the main image may be sampled according to a preset sampling rule to obtain the target pixel points.
Specifically, all pixel points or some pixel points in the main image may be sampled, so as to obtain target pixel points. The sampling mode in the embodiment of the present invention is not limited, and for example, uniform sampling, random sampling, and the like may be performed.
For convenience of description, it is assumed that a target pixel point in a main image is a first pixel, and an image block in a corresponding first corrected image is a first image block.
Specifically, based on the first homography transformation matrix H1, which pixel in the first corrected image the first pixel is mapped to is determined, that is, the first position in the first corrected image to which the first pixel is mapped is obtained, and then the first image block where the first position is located can be obtained. Wherein an image block made up of the first position together with a plurality of pixels around it may be determined as the first image block. Alternatively, an image block made up of the pixel at the first position together with a plurality of pixels surrounding it may be determined as the first image block. As an example, an image block centered on the first position may be determined as the first image block.
For example, assume that the two-dimensional coordinates of the first pixel in the main image are [x1, y1]. The coordinates of the first position in the first corrected image to which it is mapped may then be: [x11 y11]^T = H1 [x1 y1]^T, where the coordinates of the first position are [x11, y11] and T denotes transposition.
For example, an image block made up of k1 × k2 pixels at the first position may be determined as the first image block, where k1 and k2 are positive integers greater than or equal to 2 and may be equal or unequal. As an example, k1 = k2; specifically, for example, an image block made up of 3 × 3 pixels centered on the pixel at the first position is determined as the first image block. It should be understood that the shape or size of the first image block is not limited in the embodiments of the present invention.
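A minimal sketch of this step, assuming H is the 3×3 first homography transformation matrix applied to homogeneous pixel coordinates and k is odd; the names are illustrative.

```python
import numpy as np

def block_in_corrected_image(pixel_xy, H, k=3):
    """Map a target pixel of the original image into the corrected image via H
    and return the coordinates of the k x k image block at the mapped position."""
    x, y = pixel_xy
    p = H @ np.array([x, y, 1.0])            # homogeneous mapping
    u, v = p[0] / p[2], p[1] / p[2]          # the first position [x11, y11]
    cx, cy = int(round(u)), int(round(v))
    half = k // 2
    # the k*k pixel coordinates making up the image block,
    # ordered from top to bottom and from left to right
    return [(cx + dx, cy + dy)
            for dy in range(-half, half + 1)
            for dx in range(-half, half + 1)]
```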
Optionally, as an embodiment, for the target sub-image and the second correction image, the image blocks corresponding to the pixel points in the target sub-image in the second correction image may be determined in S130.
For example, pixel points matching with pixel points in the target secondary image may be determined in the second correction image, and an image block corresponding to the pixel points in the target secondary image in the second correction image may be determined according to the matching pixel points.
The pixel points in the target secondary image may include a plurality of pixel points, which may be all pixel points in the target secondary image, or may be partial pixel points in the target secondary image. S130 may obtain an image block corresponding to each pixel point.
For convenience of description, it is assumed that the pixel point in the target sub-image is the second pixel, and the image block in the corresponding second corrected image is the second image block.
Specifically, based on the second homography transformation matrix H2, it may be determined to which pixel in the second correction image the second pixel is mapped, that is, a second position in the second correction image to which the second pixel is mapped is obtained, and then the second image block where the second position is located may be obtained. Wherein the second position together with an image block of a plurality of pixels surrounding it may be determined as the second image block. Alternatively, an image block made up of the second-position pixel together with a plurality of pixels surrounding it may be determined as the second image block. As an example, an image block centered on the second position may be determined as the second image block.
For example, assume that the two-dimensional coordinates of the second pixel in the target sub-image are [x2, y2]. The coordinates of the second position in the second corrected image to which it is mapped may then be: [x22 y22]^T = H2 [x2 y2]^T, where the coordinates of the second position are [x22, y22] and T denotes transposition.
For example, an image block made up of k3 × k4 pixels at the second position may be determined as the second image block, where k3 and k4 are positive integers greater than or equal to 2 and may be equal or unequal. As an example, k3 = k4; specifically, for example, an image block made up of 3 × 3 pixels centered on the pixel at the second position is determined as the second image block. It should be understood that the shape or size of the second image block is not limited in the embodiments of the present invention.
It can be seen that in S130, the method of determining the image blocks in the respective corrected images for the main image and for the target sub-image is similar. And wherein the first image block and the second image block may comprise the same number of pixels, e.g. the first image block comprises 9 pixels and the second image block comprises 9 pixels.
Similarly, for a plurality of target pixel points in the main image, an image block corresponding to each target pixel point in the first corrected image can be obtained. And aiming at a plurality of pixel points in the target auxiliary image, obtaining the image block corresponding to each pixel point in the second correction image.
Similarly, if the target secondary image includes a plurality of target secondary images, the process of S130 may be performed for the main image and any one of the target secondary images, and details thereof are not repeated here.
S140, determining the characteristic information of the target pixel point in the main image according to the image block, corresponding to the target pixel point in the main image, in the first corrected image, and determining the characteristic information of the pixel point in the target secondary image according to the image block, corresponding to the pixel point in the target secondary image, in the second corrected image.
Optionally, as an embodiment, the feature information of the target pixel in the main image may be determined according to an image block of the target pixel in the main image corresponding to the first corrected image.
Exemplarily, the mapping pixel points of the pixel points in the image block corresponding to the target pixel point in the main image may be determined, and the color information of the mapping pixel points in the main image is compared with the color information of the target pixel point in the main image to obtain the feature information of the target pixel point in the main image.
In the embodiment of the present invention, the feature information may be a descriptor (census), or may be BRIEF, DAISY, SURF, SIFT, or the like. For convenience of description, the following description takes the descriptor as an example.
In combination with S130, it is still assumed that the target pixel point in the main image is the first pixel, and the image block in the corresponding first corrected image is the first image block.
Specifically, it may be determined, based on the homography transformation matrix h1 (i.e. the inverse of the first homography transformation matrix H1), which pixels in the main image the respective pixels of the first image block are mapped to, and the feature information of the first pixel is then calculated from the color information of these pixels mapped into the main image.
That is, each pixel in the first image block may be mapped to the main image using the inverse matrix h1 of the first homography transformation matrix H1, resulting in a corresponding plurality of mapped pixels in the main image. For example, assuming that the first image block includes m pixel points, the m pixel points are mapped to the main image to obtain m mapped pixels. It will be appreciated that the m mapped pixels include the first pixel and that the m mapped pixels are not necessarily contiguous and may be spaced apart, i.e. the m mapped pixels may not form an image block in the main image. However, the m mapped pixels may be ordered according to the positions of the m pixel points in the first image block. For example, if the m pixel points in the first image block are ordered from top to bottom and from left to right, the order of the m mapped pixels after mapping can be obtained accordingly.
Subsequently, feature information (e.g., descriptors) of the first pixel may be calculated from the color information of the plurality of mapped pixels. Alternatively, the color information may be a grey value, i.e. the descriptor of the first pixel may be derived from the grey values of a plurality of mapped pixels.
Assuming that the first image block includes m pixel points, the plurality of mapping pixels are m mapping pixels. The gray value of each of the m mapping pixels can be obtained; comparing the gray values of the other m-1 mapping pixels except the first pixel in the m mapping pixels with the gray value of the first pixel; and obtains a descriptor of the first pixel according to the comparison result.
Alternatively, the descriptor may be in the form of a binary string. As an implementation manner, if the gray value of a certain mapping pixel in m-1 mapping pixels is smaller than the gray value of the first pixel, the mapping pixel is marked as 1; if the grey value of another mapped pixel of the m-1 mapped pixels is larger than or equal to the grey value of the first pixel, the other mapped pixel is marked as 0. Subsequently, the marker values of m-1 mapped pixels are sequentially concatenated into a binary string as a descriptor of the first pixel. In this example, the descriptor for the first pixel is a string of m-1 binary characters.
For example, assume that the first image block includes 9 (i.e., m equals 9) pixels, and the gray values of the 9 pixels mapped to the 9 mapped pixels in the main image are sequentially: 123. 127, 129, 126, 128, 129, 127, 131 and 130. Where 128 represents the gray value of the first pixel. Then the descriptor of the first pixel is calculated to be 11010100.
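The descriptor rule above can be sketched as follows; the snippet reproduces the worked example, and the function name and list layout are illustrative.

```python
def census_descriptor(gray_values, center_index):
    """Census descriptor of a pixel from the gray values of the m mapped pixels:
    each mapped pixel except the centre one is marked 1 if its gray value is
    smaller than the centre's gray value and 0 otherwise, and the marks are
    concatenated in order into a binary string."""
    center = gray_values[center_index]
    bits = ['1' if g < center else '0'
            for i, g in enumerate(gray_values) if i != center_index]
    return ''.join(bits)

# worked example from the text: 9 mapped pixels, the first pixel's gray value is 128
grays = [123, 127, 129, 126, 128, 129, 127, 131, 130]
assert census_descriptor(grays, center_index=4) == '11010100'
```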
It should be noted that the descriptor may also be calculated by other methods according to the embodiments of the present invention, and the descriptor may also be expressed in other forms, which are not listed here.
Optionally, as an embodiment, the feature information of the pixel point in the target secondary image may be determined according to the image block of the pixel point in the target secondary image in the second correction image.
For example, the mapping pixel point of the pixel point in the image block corresponding to the pixel point in the target secondary image may be determined, and the color information of the mapping pixel point in the target secondary image may be compared with the color information of the pixel point in the target secondary image to obtain the feature information of the pixel point in the target secondary image.
In the embodiment of the present invention, the feature information may be a descriptor (census), or may be BRIEF, DAISY, SURF, SIFT, or the like. For convenience of description, the following description takes the descriptor as an example.
In conjunction with S130, it is still assumed that the pixel point in the target sub-image is the second pixel, and the image block in the corresponding second corrected image is the second image block.
Specifically, it may be determined, based on the homography transformation matrix h2 (i.e. the inverse of the second homography transformation matrix H2), which pixels in the target sub-image the respective pixels of the second image block are mapped to, and the feature information of the second pixel is then calculated from the color information of these pixels mapped into the target sub-image.
That is, the pixels in the second image block may be mapped to the target sub-image using the inverse matrix h2 of the second homography transformation matrix H2, resulting in a corresponding plurality of mapped pixels in the target sub-image. For example, assuming that the second image block includes m pixel points, the m pixel points are mapped to the target secondary image to obtain m mapped pixels. It will be appreciated that the m mapped pixels include the second pixel, and that the m mapped pixels are not necessarily contiguous and may be spaced apart, i.e. the m mapped pixels may not form an image block in the target secondary image. However, the m mapped pixels may be ordered according to the positions of the m pixel points in the second image block. For example, if the m pixel points in the second image block are ordered from top to bottom and from left to right, the order of the m mapped pixels after mapping can be obtained accordingly.
Subsequently, feature information (e.g., descriptors) of the second pixel may be calculated from the color information of the plurality of mapped pixels. Alternatively, the color information may be a grey value, i.e. the descriptor of the second pixel may be derived from the grey values of a plurality of mapped pixels.
Assuming that the second image block includes m pixel points, the plurality of mapping pixels are m mapping pixels. The gray value of each of the m mapping pixels can be obtained; comparing the gray values of the other m-1 mapping pixels except the second pixel in the m mapping pixels with the gray value of the second pixel; and obtains a descriptor of the second pixel according to the result of the comparison.
Alternatively, the descriptor may be in the form of a binary string. As an implementation manner, if the gray value of a certain mapped pixel among the m-1 mapped pixels is smaller than the gray value of the second pixel, that mapped pixel is marked as 1; if the gray value of another mapped pixel among the m-1 mapped pixels is greater than or equal to the gray value of the second pixel, that mapped pixel is marked as 0. Subsequently, the marker values of the m-1 mapped pixels are sequentially concatenated into a binary string as the descriptor of the second pixel. In this example, the descriptor of the second pixel is a string of m-1 binary characters.
It should be noted that the descriptor may also be calculated by other methods according to the embodiments of the present invention, and the descriptor may also be expressed in other forms, which are not listed here.
It can be seen that, in S140, the method of calculating the feature information of the pixel in the target sub-image is similar to the method of calculating the feature information of the target pixel point in the main image. The obtained feature information of the pixel in the target sub-image and the feature information of the target pixel in the main image may have the same dimension, for example, both are character strings formed by m-1 binary characters.
Similarly, for a plurality of target pixel points in the main image, the feature information of each target pixel point can be obtained. And aiming at a plurality of pixel points in the target secondary image, the characteristic information of each pixel point can be obtained.
Similarly, if the target secondary image includes a plurality of target secondary images, the process of S140 may be performed for the main image and any one of the target secondary images, and details thereof are not repeated here.
S150, determining a depth map corresponding to the main image according to the characteristic information of the target pixel points in the main image and the characteristic information of the pixel points in the target auxiliary image.
Optionally, the matching cost of the target pixel point in the main image may be obtained according to the hamming distance between the characteristic information of the target pixel point in the main image and the characteristic information of the pixel point in the target secondary image. And determining a depth map corresponding to the main image based on the matching cost of a plurality of target pixel points in the main image.
Illustratively, as shown in fig. 2, S150 may include:
s1501, acquiring a plurality of candidate depth indication parameters;
s1502, projecting an object point corresponding to each target pixel point in the main image to the target secondary image according to the candidate depth indication parameters, so as to determine a plurality of projection pixel points matched with the target pixel points in the main image among the pixel points in the target secondary image;
s1503, determining a plurality of matching costs of the target pixel points in the main image according to the characteristic information of the target pixel points in the main image and the characteristic information of a plurality of projection pixel points matched with the target pixel points in the main image;
s1504, determining the depth corresponding to the target pixel point in the main image from the candidate depth indication parameters according to the matching costs of the target pixel point in the main image.
The depth indication parameter may be depth or inverse depth, or may be other parameters related to depth.
Alternatively, a range of the depth indication parameter corresponding to the main image may be acquired, so that in S1501, sampling may be performed within the range of the depth indication parameter to acquire the plurality of candidate depth indication parameters.
For example, assuming that the depth indication parameter is depth, the range of the depth indication parameter (i.e., the depth range) is [d_min, d_max]. If the depth indication parameter is instead the inverse depth, then, given the known depth range [d_min, d_max] of the main image, the range of the depth indication parameter (i.e., the inverse depth range) is [d_max^(-1), d_min^(-1)].
For example, in S1502, a target pixel point of a main image may be sampled multiple times in an inverse depth space (or depth space), and an object point obtained by multiple sampling may be projected onto a target sub-image, so as to obtain a plurality of corresponding projected pixel points. Wherein inverse depth sampling can better match the image.
Wherein the sampling may be uniform sampling, such as equally spaced sampling or approximately equally spaced sampling in the image plane. Alternatively, the sampling may be random sampling, or sampling in other manners, which is not limited in this respect.
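A minimal sketch of generating the candidate depth indication parameters by uniform sampling in the inverse depth space, given the depth range of the main image; the sampling count and scheme are illustrative.

```python
import numpy as np

def candidate_inverse_depths(d_min, d_max, n_samples):
    """Uniformly sample n_samples candidate inverse depths over
    [d_max^-1, d_min^-1]; the corresponding depths are 1 / samples."""
    return np.linspace(1.0 / d_max, 1.0 / d_min, n_samples)

# e.g. candidate_inverse_depths(2.0, 50.0, 64) yields 64 candidate inverse depths
```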
The projection may be a central projection, and the object depth of the sampled object point after projection is not uniform, as shown in fig. 3.
Alternatively, as an embodiment, if the number of the target sub-images is plural, that is, if the target sub-image is a target sub-image of plural frames, in S1502, the method may include: and projecting object points corresponding to the target pixel points in the main image to each frame of target secondary image according to the candidate depth indication parameters so as to determine a plurality of projection pixel points matched with the target pixel points in the main image in the pixel points in each frame of target secondary image. That is to say, the target pixel points of the main image may be sampled for multiple times, and the object space points obtained by the multiple sampling may be projected onto each frame of target sub-image, so that a plurality of projected pixel points are correspondingly obtained on each frame of target image.
Exemplarily, in S1503, for each projection pixel point: the Hamming distance between the characteristic information of the target pixel point and the characteristic information of the projection pixel point can be calculated, and the result is used as the matching cost between the characteristic information of the target pixel point and the characteristic information of the projection pixel point. Based on a plurality of projection pixel points, a plurality of corresponding matching costs can be obtained. This number is consistent with the number of times the target pixel is sampled.
Alternatively, the feature information may be a descriptor, which may be represented as the shape of a binary string, and then the hamming distance may be equal to the number of bits in the two descriptors that are not the same.
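A minimal sketch of the matching cost between two descriptors represented as binary strings of equal length; as an implementation note, descriptors could equally be packed into integers and compared with a popcount of their XOR.

```python
def matching_cost(descriptor_a, descriptor_b):
    """Hamming distance: the number of positions at which the two
    equal-length binary-string descriptors differ."""
    assert len(descriptor_a) == len(descriptor_b)
    return sum(a != b for a, b in zip(descriptor_a, descriptor_b))

# e.g. matching_cost('11010100', '11000101') == 2
```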
Alternatively, as an embodiment, if the number of the target sub-images is plural, that is, if the target sub-image is a multi-frame target sub-image, in S1503, it may include: determining a plurality of matching cost sets of target pixel points in the main image according to the characteristic information of the target pixel points in the main image and the characteristic information of a plurality of projection pixel points matched with the target pixel points in each frame of target secondary image, wherein each matching cost set corresponds to one candidate depth indication parameter; and fusing the matching cost in each of a plurality of matching cost sets to obtain the plurality of matching costs.
Specifically, the target pixel points of the main image may be sampled, and the object space points obtained by sampling are projected onto each frame of target sub-image, so that the projection pixel points are correspondingly obtained on each frame of target image (as shown in fig. 3). And taking the matching cost obtained based on the characteristic information of the target pixel points and the characteristic information of the projection pixel points as a matching cost set. By sampling the target pixel point for multiple times, multiple matching cost sets can be obtained similarly. The method for calculating the matching cost may be as described above, and is not described herein again.
For example, assuming that P-frame target sub-images are included, N candidate depth indication parameters (i.e., N sampling times) are provided. Then N matching cost sets can be obtained, and each matching cost set includes P matching costs. And the N matching cost sets correspond to the N candidate depth indication parameters one by one, and P matching costs in each matching cost set correspond to the P frame target auxiliary images one by one.
Specifically, the fusion of the matching costs in one matching cost set may be implemented by taking the minimum, by weighted summation, and the like. For example, a matching cost set includes P matching costs, and the minimum value of the P matching costs is used as the fused matching cost. In this way, by fusing each of the N matching cost sets, N fused matching costs can be obtained, and the N fused matching costs are in one-to-one correspondence with the N candidate depth indication parameters.
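A minimal sketch of the fusion step for one target pixel point, assuming the N matching cost sets are stacked into an N × P array (N candidate depth indication parameters, P frames of target sub-images) and fusion takes the minimum over the P sub-images.

```python
import numpy as np

def fuse_matching_costs(cost_sets):
    """Fuse each matching cost set (one row of the N x P array) into a single
    matching cost by taking the minimum over the P target sub-images, yielding
    N fused matching costs in one-to-one correspondence with the N candidate
    depth indication parameters."""
    return np.asarray(cost_sets, dtype=float).min(axis=1)
```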
Similarly, a plurality of matching costs corresponding to each target pixel point in a plurality of target pixel points can be obtained. For example, for M target pixel points, each target pixel point corresponds to N matching costs.
Exemplarily, S1504 may include: determining the minimum matching cost from a plurality of matching costs of target pixel points in the main image; and determining the candidate depth indication parameter corresponding to the minimum matching cost as the depth corresponding to the target pixel point in the main image.
For example, for the target pixel point a, a plurality of matching costs are obtained, where the smallest matching cost is PPDJ1, and the smallest matching cost PPDJ1 is obtained when the candidate depth indication parameter DEP1 is sampled, then the candidate depth indication parameter DEP1 may be selected, and the depth of the target pixel point a may be determined according to the candidate depth indication parameter DEP 1. If the candidate depth indication parameter is depth, the depth of the target pixel point A is DEP 1; if the candidate depth indication parameter is an inverse depth, the depth of the target pixel point A is DEP1-1
Similarly, the depth of each target pixel point can be obtained, so that the depth map of the main image is obtained.
Alternatively, as an embodiment, if the number of the target sub-images is multiple, that is, if the target sub-image is a multi-frame target sub-image, in S1504, the method may include: filtering a plurality of matching cost combinations of target pixel points in the main image; and determining the depth corresponding to the target pixel point in the main image from the candidate depth indication parameters according to the multiple matching cost combinations of the target pixel point in the main image obtained by filtering.
Specifically, for one target pixel point, a plurality of matching costs can be obtained. For a plurality of target pixel points (for example, M), M plurality of matching costs may be obtained, which may be referred to as a plurality of matching cost combinations.
Optionally, the filtering process may be performed on a plurality of matching cost combinations according to a preset direction.
The filtering process may be performed using a multipath dynamic programming method. For example, if the preset direction is from left to right, then, starting from the left side of the image, each matching cost of the pixel currently being filtered is optimized according to the matching costs of the previous pixel; during this optimization the matching costs of the previous pixel remain unchanged (i.e., they are not themselves modified). Scanning the pixels once from left to right in this way optimizes the matching costs of every pixel and yields the optimized matching costs. For example, but not limited to, an 8-direction dynamic programming optimization (as shown in fig. 4) may be adopted to obtain the optimized matching costs. For the matching cost combinations of a plurality of target pixel points, a plurality of optimized matching cost combinations, also called filtered matching cost combinations, are thus obtained.
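A minimal sketch of one left-to-right pass of this filtering over a single image row, written in the style of semi-global cost aggregation; the penalty values p1, p2 and the exact recursion are assumptions for illustration, not taken from the text.

```python
import numpy as np

def aggregate_left_to_right(costs, p1=10.0, p2=120.0):
    """costs: array of shape (W, N) -- W pixels in the row, N candidate depth
    indication parameters per pixel. Each pixel's costs are optimized from the
    (unchanged) costs of the previous pixel in the scanning direction."""
    W, N = costs.shape
    aggregated = costs.astype(float).copy()
    for x in range(1, W):
        prev = aggregated[x - 1]
        prev_min = prev.min()
        for d in range(N):
            best = min(
                prev[d],                                    # same candidate as before
                prev[d - 1] + p1 if d > 0 else np.inf,      # neighbouring candidate
                prev[d + 1] + p1 if d < N - 1 else np.inf,
                prev_min + p2,                              # jump to any other candidate
            )
            aggregated[x, d] = costs[x, d] + best - prev_min
    return aggregated
```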
It can be understood that the dynamic programming algorithm can generally obtain an approximately globally optimal solution, so the obtained depth map is more accurate and complete. In addition, the dynamic programming algorithm is robust to small weak-texture regions and even texture-free regions, whose depth can be obtained by extending the depth from their edges.
The filtered multiple matching cost combinations comprise multiple matching costs of each target pixel point in multiple target pixel points. Then, for a plurality of matching costs after filtering processing of one target pixel point, the smallest matching cost may be selected, and the candidate depth indication parameter corresponding to the smallest matching cost is determined as the depth corresponding to the target pixel point. For the process, reference may be made to a specific process for determining the depth in the above embodiment, which is not described herein again.
Based on a plurality of matching costs of other target pixel points in a plurality of matching cost combinations, the depth of each target pixel point can be obtained similarly. Thereby enabling a depth map of the main image to be obtained.
Therefore, in the embodiments of the present invention, the feature information of the images is obtained through epipolar rectification, and the depth map of the main image can then be obtained. In addition, the method has low computational complexity and reduces the hardware requirements. The use of epipolar rectification is fast and shortens the computation time. When the depth map is obtained, cost calculation can be performed and, optionally, dynamic programming can be used for optimization, which gives the method higher stability, especially for weak-texture images.
Fig. 5 is a schematic block diagram of an apparatus 50 for acquiring a depth map according to an embodiment of the present invention. As shown in fig. 5, the apparatus 50 may include:
an image acquisition module 510 for acquiring a primary image and a target secondary image;
an epipolar rectification module 520 for epipolar-rectifying the main image and the target sub-image to obtain a first corrected image corresponding to the main image and a second corrected image corresponding to the target sub-image;
a mapping module 530 for determining an image block in the first corrected image corresponding to a target pixel in the main image and an image block in the second corrected image corresponding to a pixel in the target secondary image;
a feature information calculation module 540, configured to determine feature information of a target pixel in the main image according to an image block in the first corrected image corresponding to the target pixel in the main image, and determine feature information of a pixel in the target secondary image according to an image block in the second corrected image corresponding to the pixel in the target secondary image;
and a depth map determining module 550, configured to determine a depth map corresponding to the main image according to the feature information of the target pixel in the main image and the feature information of the pixel in the target secondary image.
Optionally, in some embodiments, the mapping module 530 may be specifically configured to: determining pixel points matched with target pixel points in the main image in the first corrected image, and determining image blocks corresponding to the target pixel points in the main image in the first corrected image according to the matched pixel points; determining pixel points matched with the pixel points in the target secondary image in the second corrected image, and determining image blocks corresponding to the pixel points in the target secondary image in the second corrected image according to the matched pixel points.
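Purely as an illustration, if the epipolar rectification of a two-view pair is represented by a 3x3 homography that maps pixels of the original image into the rectified image (an assumption of this sketch, as are the variable names and the window size), the matched pixel point and its surrounding image block could be located as follows:

```python
import numpy as np

def block_in_rectified(H_rect, x, y, half=3):
    """Map pixel (x, y) of the original image into the rectified image via the
    (assumed) rectification homography H_rect and return the pixel grid of the
    surrounding image block."""
    p = H_rect @ np.array([x, y, 1.0])
    u, v = p[0] / p[2], p[1] / p[2]                      # matched pixel point in the rectified image
    ui, vi = int(round(u)), int(round(v))
    us = np.arange(ui - half, ui + half + 1)
    vs = np.arange(vi - half, vi + half + 1)
    return np.meshgrid(us, vs)                           # image block coordinates around the match
```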
Optionally, in some embodiments, the feature information calculating module 540 may be specifically configured to: determining mapping pixel points of pixel points in an image block corresponding to a target pixel point in the main image, and comparing color information of the mapping pixel points in the main image with color information of the target pixel points in the main image to obtain characteristic information of the target pixel points in the main image; determining mapping pixel points of pixel points in image blocks corresponding to the pixel points in the target secondary image, and comparing color information of the mapping pixel points in the target secondary image with color information of the pixel points in the target secondary image to obtain characteristic information of the pixel points in the target secondary image.
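One common way to realize such a comparison is a census-transform style descriptor. The sketch below is an illustration under assumptions (grayscale input, a 7x7 image block, packing the comparison results into a bit string) and is not the exact feature definition of the embodiment:

```python
import numpy as np

def census_like_feature(gray, y, x, half=3):
    """Compare every pixel of the image block around (y, x) with the centre
    pixel and pack the comparison results into a bit string (returned as an
    integer); gray is a 2D array of intensity values."""
    h, w = gray.shape
    centre = gray[y, x]
    bits = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            if dy == 0 and dx == 0:
                continue
            yy = min(max(y + dy, 0), h - 1)              # clamp at the image border
            xx = min(max(x + dx, 0), w - 1)
            bits = (bits << 1) | int(gray[yy, xx] < centre)
    return bits
```

The matching cost between two such descriptors can then be taken as their Hamming distance, i.e. the number of differing comparison bits.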
Optionally, in some embodiments, as shown in fig. 6, the depth map determining module 550 may include:
an acquiring unit 5501 configured to acquire a plurality of candidate depth indication parameters;
a projecting unit 5502, configured to project, according to the candidate depth indication parameters, an object point corresponding to each target pixel point in the main image to the target secondary image, so as to determine, among pixel points in the target secondary image, a plurality of projected pixel points that are matched with target pixel points in the main image;
a matching cost calculation unit 5503, configured to determine multiple matching costs of target pixels in the master image according to the feature information of the target pixels in the master image and the feature information of multiple projection pixels matched with the target pixels in the master image;
a depth determining unit 5504, configured to determine, according to a plurality of matching costs of target pixel points in the main image, a depth corresponding to a target pixel point in the main image from the plurality of candidate depth indicating parameters.
Optionally, in some embodiments, the apparatus further comprises a depth range acquisition module for acquiring the range of the depth indication parameter corresponding to the main image. The obtaining unit 5501 may be specifically configured to sample within the range of the depth indication parameter to obtain the plurality of candidate depth indication parameters.
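As a sketch only, sampling within the range of the depth indication parameter could look as follows; the number of samples and the uniform spacing (in depth or in inverse depth) are assumptions of the sketch:

```python
import numpy as np

def sample_candidates(d_min, d_max, num=64, inverse=True):
    """Sample `num` candidate depth indication parameters within [d_min, d_max].

    With inverse=True the sampling is uniform in inverse depth, which places
    more candidates at close range; otherwise it is uniform in depth."""
    if inverse:
        inv = np.linspace(1.0 / d_max, 1.0 / d_min, num)  # uniform in 1/depth
        return 1.0 / inv                                  # returned as depths
    return np.linspace(d_min, d_max, num)                 # uniform in depth
```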
Optionally, in some embodiments, the depth determining unit 5504 may be specifically configured to: determining the minimum matching cost from a plurality of matching costs of target pixel points in the main image; and determining the candidate depth indication parameter corresponding to the minimum matching cost as the depth corresponding to the target pixel point in the main image.
Optionally, in some embodiments, the depth indication parameter comprises depth or inverse depth.
Optionally, in some embodiments, the target secondary image comprises a plurality of frames of target secondary images, where the projection unit 5502 may be specifically configured to: projecting object points corresponding to the target pixel points in the main image to each frame of target secondary image according to the candidate depth indication parameters, so as to determine, among the pixel points in each frame of target secondary image, a plurality of projection pixel points matched with the target pixel points in the main image. The matching cost calculation unit 5503 may be specifically configured to: determining a plurality of matching cost sets of each target pixel point in the main image according to the characteristic information of each target pixel point in the main image and the characteristic information of the plurality of projection pixel points matched with each target pixel point in the main image in each frame of target secondary image, wherein each matching cost set corresponds to one candidate depth indication parameter; and fusing the matching costs in each of the plurality of matching cost sets to obtain the plurality of matching costs.
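The per-frame projection and the fusion of the matching costs can be sketched as follows. The camera model (intrinsics K, a world-to-camera rotation R and translation t per secondary frame, with the main camera taken as the reference frame), the inverse-depth parameterisation, the Hamming-distance cost and the averaging fusion are all assumptions of this sketch rather than details fixed by the embodiment:

```python
import numpy as np

def fused_matching_costs(feat_main, feats_sec, K_main, K_secs, R_secs, t_secs,
                         pixel, candidate_inv_depths, hamming):
    """For one target pixel point, project the corresponding object point into
    every secondary frame for each candidate inverse depth, compute one
    matching cost per frame from the (assumed bit-string) features, and fuse
    the per-frame costs by averaging."""
    u, v = pixel
    ray = np.linalg.inv(K_main) @ np.array([u, v, 1.0])   # viewing ray of the target pixel point
    costs = np.zeros(len(candidate_inv_depths))
    for i, inv_d in enumerate(candidate_inv_depths):
        X = ray / inv_d                                    # object point at depth 1/inv_d
        per_frame = []
        for feat_s, K_s, R, t in zip(feats_sec, K_secs, R_secs, t_secs):
            p = K_s @ (R @ X + t)                          # projection into the secondary frame
            if p[2] <= 0:
                continue                                   # object point behind the camera
            us, vs = int(round(p[0] / p[2])), int(round(p[1] / p[2]))
            if 0 <= vs < feat_s.shape[0] and 0 <= us < feat_s.shape[1]:
                per_frame.append(hamming(feat_main[v, u], feat_s[vs, us]))
        costs[i] = np.mean(per_frame) if per_frame else np.inf
    return costs
```

Averaging the per-frame costs is only one possible fusion; frames in which the projected pixel point falls outside the image are simply skipped here.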
Optionally, in some embodiments, the apparatus may further include a sampling module, configured to sample the main image according to a preset sampling rule to obtain a target pixel point of the main image.
Optionally, in some embodiments, as illustrated in fig. 6, the image acquisition module 510 may include:
a candidate image obtaining unit 5101 for obtaining a plurality of frames of candidate sub-images;
an image selecting unit 5102 is configured to select the target sub-image that meets a preset requirement from the plurality of frames of candidate sub-images.
Optionally, in some embodiments, the image selection unit 5102 may be specifically configured to: determining an included angle between the optical axis of the main image and the optical axis of each frame of candidate auxiliary image; and determining the candidate sub-image corresponding to the included angle within the preset angle range as the target sub-image.
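For illustration, assuming each image is associated with a world-to-camera rotation matrix whose third row gives the optical-axis direction in world coordinates, the angle test could be sketched as:

```python
import numpy as np

def within_preset_angle(R_main, R_candidate, max_angle_deg=45.0):
    """Return True if the angle between the optical axes of the main image and
    a candidate secondary image lies within the preset range; the 45-degree
    threshold is an arbitrary example value."""
    axis_main = R_main[2]                                 # optical axis of the main camera
    axis_cand = R_candidate[2]                            # optical axis of the candidate camera
    cos_angle = np.clip(np.dot(axis_main, axis_cand), -1.0, 1.0)
    return np.degrees(np.arccos(cos_angle)) <= max_angle_deg
```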
Optionally, in some embodiments, the depth determining unit 5504 may include:
the filtering unit is used for filtering a plurality of matching cost combinations of target pixel points in the main image;
and the determining unit is used for determining the depth corresponding to the target pixel point in the main image from the candidate depth indication parameters according to the multiple matching cost combinations of the target pixel point in the main image obtained by the filtering processing.
Optionally, in some embodiments, the filtering unit may be specifically configured to: filtering a plurality of matching cost combinations of the target pixel points in the main image according to a preset direction.
As shown in fig. 7, the embodiment of the present invention further provides an apparatus 70 for obtaining a depth map. The apparatus 70 includes a processor 710 and a memory 720. The memory 720 stores computer instructions that, when executed by the processor 710, cause the processor 710 to: acquire a main image and a target secondary image; perform epipolar rectification on the main image and the target secondary image to acquire a first corrected image corresponding to the main image and a second corrected image corresponding to the target secondary image; determine an image block in the first corrected image corresponding to a target pixel point in the main image and an image block in the second corrected image corresponding to a pixel point in the target secondary image; determine feature information of target pixel points in the main image according to the image blocks in the first corrected image corresponding to the target pixel points in the main image, and determine feature information of pixel points in the target secondary image according to the image blocks in the second corrected image corresponding to the pixel points in the target secondary image; and determine a depth map corresponding to the main image according to the feature information of the target pixel points in the main image and the feature information of the pixel points in the target secondary image.
Optionally, in some embodiments, the processor 710 may be specifically configured to: determining pixel points matched with target pixel points in the main image in the first correction image, and determining image blocks corresponding to the target pixel points in the main image in the first correction image according to the matched pixel points; determining pixel points matched with the pixel points in the target auxiliary image in the second correction image, and determining image blocks corresponding to the pixel points in the target auxiliary image in the second correction image according to the matched pixel points.
Optionally, in some embodiments, the processor 710 may be specifically configured to: determining mapping pixel points of pixel points in an image block corresponding to a target pixel point in the main image, and comparing color information of the mapping pixel points in the main image with color information of the target pixel points in the main image to obtain characteristic information of the target pixel points in the main image; determining mapping pixel points of pixel points in image blocks corresponding to the pixel points in the target secondary image, and comparing color information of the mapping pixel points in the target secondary image with color information of the pixel points in the target secondary image to obtain characteristic information of the pixel points in the target secondary image.
Optionally, in some embodiments, the processor 710 may be specifically configured to: acquiring a plurality of candidate depth indication parameters; projecting object points corresponding to each target pixel point in the main image to the target auxiliary image according to the candidate depth indication parameters so as to determine a plurality of projection pixel points matched with the target pixel points in the main image in the pixel points in the target auxiliary image; determining a plurality of matching costs of the target pixel points in the main image according to the characteristic information of the target pixel points in the main image and the characteristic information of a plurality of projection pixel points matched with the target pixel points in the main image; and determining the depth corresponding to the target pixel point in the main image from the candidate depth indication parameters according to the matching costs of the target pixel point in the main image.
Optionally, in some embodiments, the processor 710 may be further configured to acquire the range of the depth indication parameter corresponding to the main image. The processor 710 may thus be configured to sample within the range of the depth indication parameter to obtain the plurality of candidate depth indication parameters.
Optionally, in some embodiments, the processor 710 may be specifically configured to: determining the minimum matching cost from a plurality of matching costs of target pixel points in the main image; and determining the candidate depth indication parameter corresponding to the minimum matching cost as the depth corresponding to the target pixel point in the main image.
Optionally, in some embodiments, the depth indication parameter comprises depth or inverse depth.
Optionally, in some embodiments, the target secondary image includes multiple frames of target secondary images, where the processor 710 may be specifically configured to: projecting object points corresponding to the target pixel points in the main image to each frame of target secondary image according to the candidate depth indication parameters, so as to determine, among the pixel points in each frame of target secondary image, a plurality of projection pixel points matched with the target pixel points in the main image. The processor 710 may be further specifically configured to: determining a plurality of matching cost sets of each target pixel point in the main image according to the characteristic information of each target pixel point in the main image and the characteristic information of a plurality of projection pixel points matched with each target pixel point in the main image in each frame of target secondary image, wherein each matching cost set corresponds to one candidate depth indication parameter; and fusing the matching costs in each of the plurality of matching cost sets to obtain the plurality of matching costs.
Optionally, in some embodiments, the processor 710 may be further configured to sample the main image according to a preset sampling rule to obtain a target pixel point of the main image.
Optionally, in some embodiments, the processor 710 may be specifically configured to: acquiring a plurality of candidate auxiliary images; and selecting the target auxiliary image meeting preset requirements from the plurality of frames of candidate auxiliary images.
Optionally, in some embodiments, the processor 710 may be specifically configured to: determining an included angle between the optical axis of the main image and the optical axis of each frame of candidate auxiliary image; and determining the candidate sub-image corresponding to the included angle within the preset angle range as the target sub-image.
Optionally, in some embodiments, the processor 710 may be specifically configured to: filtering a plurality of matching cost combinations of target pixel points in the main image; and determining the depth corresponding to the target pixel point in the main image from the candidate depth indication parameters according to the multiple matching cost combinations of the target pixel point in the main image obtained by the filtering processing.
Optionally, in some embodiments, the processor 710 may be specifically configured to: filtering a plurality of matching cost combinations of the target pixel points in the main image according to a preset direction.
The apparatus 50 shown in fig. 5 or fig. 6 and the apparatus 70 shown in fig. 7 can implement the steps of the method for obtaining a depth map described above, and are not described herein again to avoid repetition.
An embodiment of the present invention further provides a movable platform, including: a camera for outputting the main image and the target secondary image, and an apparatus for obtaining a depth map as described in any of fig. 5 to 7.
Embodiments of the present invention also provide a computer storage medium having a computer program stored thereon, which, when executed by a computer, causes the computer to execute the method for acquiring a depth map provided in the above method embodiments.
Embodiments of the present invention also provide a computer program product containing instructions, which when executed by a computer, cause the computer to execute the method for acquiring a depth map provided in the above method embodiments.
In summary, in the embodiments of the invention the feature information of the images is obtained after epipolar rectification, from which the depth map of the main image can be obtained. The method has low computational complexity and reduces the requirement on hardware: epipolar rectification is fast and shortens the calculation time, and the matching cost calculation, optionally followed by dynamic programming optimization, gives the method higher stability, particularly for weakly textured images.
In the above embodiments, all or part of the implementation may be realized by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable device. The computer instructions may be stored on a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via a wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) connection. The computer readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a Digital Video Disc (DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processor, or each unit may exist alone physically, or two or more units are integrated into one unit.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (28)

1. A method of obtaining a depth map, comprising:
acquiring a main image and a target auxiliary image;
epipolar rectification is performed on the main image and the target sub-image to acquire a first corrected image corresponding to the main image and a second corrected image corresponding to the target sub-image;
determining an image block in the first corrected image corresponding to a target pixel point in the main image and an image block in the second corrected image corresponding to a pixel point in the target sub-image;
determining feature information of target pixel points in the main image according to image blocks, corresponding to the target pixel points in the main image, in the first corrected image, and determining feature information of pixel points in the target secondary image according to image blocks, corresponding to the pixel points in the target secondary image, in the second corrected image;
and determining a depth map corresponding to the main image according to the characteristic information of the target pixel points in the main image and the characteristic information of the pixel points in the target auxiliary image.
2. The method according to claim 1, wherein the determining an image block in the first corrected image corresponding to a target pixel point in the main image and an image block in the second corrected image corresponding to a pixel point in the target sub-image comprises:
determining pixel points matched with target pixel points in the main image in the first corrected image, and determining image blocks corresponding to the target pixel points in the main image in the first corrected image according to the matched pixel points;
determining pixel points matched with the pixel points in the target sub-image in the second corrected image, and determining image blocks corresponding to the pixel points in the target sub-image in the second corrected image according to the matched pixel points.
3. The method according to claim 1, wherein determining the feature information of the target pixel in the main image according to the image block in the first corrected image corresponding to the target pixel in the main image, and determining the feature information of the pixel in the target sub-image according to the image block in the second corrected image corresponding to the pixel in the target sub-image comprises:
determining mapping pixel points of pixel points in an image block corresponding to a target pixel point in the main image, and comparing color information of the mapping pixel points in the main image with color information of the target pixel points in the main image to obtain characteristic information of the target pixel points in the main image;
determining mapping pixel points of pixel points in image blocks corresponding to the pixel points in the target secondary image, and comparing color information of the mapping pixel points in the target secondary image with color information of the pixel points in the target secondary image to obtain characteristic information of the pixel points in the target secondary image.
4. The method according to any one of claims 1 to 3, wherein the determining the depth map corresponding to the main image according to the feature information of the target pixel point in the main image and the feature information of the pixel point in the target secondary image comprises:
acquiring a plurality of candidate depth indication parameters;
projecting object points corresponding to each target pixel point in the main image to the target auxiliary image according to the candidate depth indication parameters so as to determine a plurality of projection pixel points matched with the target pixel points in the main image in the pixel points in the target auxiliary image;
determining a plurality of matching costs of the target pixel points in the main image according to the characteristic information of the target pixel points in the main image and the characteristic information of a plurality of projection pixel points matched with the target pixel points in the main image;
and determining the depth corresponding to the target pixel point in the main image from the candidate depth indication parameters according to the matching costs of the target pixel point in the main image.
5. The method of claim 4, further comprising:
acquiring the range of the depth indication parameter corresponding to the main image;
wherein the obtaining a plurality of candidate depth indication parameters comprises:
sampling over the range of depth indication parameters to obtain the plurality of candidate depth indication parameters.
6. The method according to claim 4 or 5, wherein said determining the depth corresponding to the target pixel point in the main image from the plurality of candidate depth indication parameters according to a plurality of matching costs of the target pixel point in the main image comprises:
determining the minimum matching cost from a plurality of matching costs of target pixel points in the main image;
and determining the candidate depth indication parameter corresponding to the minimum matching cost as the depth corresponding to the target pixel point in the main image.
7. The method of any of claims 4-6, wherein the depth indication parameter comprises depth or inverse depth.
8. The method according to any of claims 4-7, wherein the target sub-image comprises a plurality of frames of target sub-images, wherein,
the projecting, according to the candidate depth indication parameters, an object point corresponding to each target pixel point in the main image to the target secondary image to determine, among pixel points in the target secondary image, a plurality of projected pixel points matched with the target pixel points in the main image, includes:
projecting object points corresponding to target pixel points in the main image to each frame of target auxiliary image according to the candidate depth indication parameters so as to determine a plurality of projection pixel points matched with the target pixel points in the main image in pixel points in each frame of target auxiliary image;
determining a plurality of matching costs of the target pixel points in the main image according to the feature information of the target pixel points in the main image and the feature information of the plurality of projection pixel points matched with the target pixel points in the main image, including:
determining a plurality of matching cost sets of each target pixel point in the main image according to the characteristic information of each target pixel point in the main image and the characteristic information of a plurality of projection pixel points matched with each target pixel point in the main image in each frame of target secondary image, wherein each matching cost set corresponds to one candidate depth indication parameter;
and fusing the matching cost in each of a plurality of matching cost sets to obtain the plurality of matching costs.
9. The method of claim 8, further comprising:
sampling the main image according to a preset sampling rule to obtain target pixel points of the main image.
10. The method of claim 8, wherein said obtaining the target secondary image comprises:
acquiring a plurality of candidate auxiliary images;
and selecting the target auxiliary image meeting preset requirements from the plurality of frames of candidate auxiliary images.
11. The method according to claim 10, wherein said selecting the target sub-image satisfying a preset requirement from the plurality of frames of candidate sub-images comprises:
determining an included angle between the optical axis of the main image and the optical axis of each frame of candidate auxiliary image;
and determining the candidate sub-image corresponding to the included angle within the preset angle range as the target sub-image.
12. The method according to any one of claims 8 to 11, wherein determining the depth corresponding to the target pixel point in the main image from the plurality of candidate depth indication parameters according to a plurality of matching costs of the target pixel point in the main image comprises:
filtering a plurality of matching cost combinations of target pixel points in the main image;
and determining the depth corresponding to the target pixel point in the main image from the candidate depth indication parameters according to the multiple matching cost combinations of the target pixel point in the main image obtained by the filtering processing.
13. The method according to claim 12, wherein the filtering the plurality of matching cost combinations of the target pixels in the main image comprises:
filtering a plurality of matching cost combinations of the target pixel points in the main image according to a preset direction.
14. An apparatus for obtaining a depth map, comprising: a memory and a processor, wherein,
the memory to store computer instructions;
the processor for invoking the computer instructions, when executed, for performing the steps of:
acquiring a main image and a target auxiliary image;
epipolar rectification is performed on the main image and the target sub-image to acquire a first corrected image corresponding to the main image and a second corrected image corresponding to the target sub-image;
determining an image block in the first corrected image corresponding to a target pixel point in the main image and an image block in the second corrected image corresponding to a pixel point in the target sub-image;
determining feature information of target pixel points in the main image according to image blocks, corresponding to the target pixel points in the main image, in the first corrected image, and determining feature information of pixel points in the target secondary image according to image blocks, corresponding to the pixel points in the target secondary image, in the second corrected image;
and determining a depth map corresponding to the main image according to the characteristic information of the target pixel points in the main image and the characteristic information of the pixel points in the target auxiliary image.
15. The apparatus of claim 14, wherein the processor is specifically configured to:
determining pixel points matched with target pixel points in the main image in the first corrected image, and determining image blocks corresponding to the target pixel points in the main image in the first corrected image according to the matched pixel points;
determining pixel points matched with the pixel points in the target auxiliary image in the second corrected image, and determining image blocks corresponding to the pixel points in the target auxiliary image in the second corrected image according to the matched pixel points.
16. The apparatus of claim 14, wherein the processor is specifically configured to:
determining mapping pixel points of pixel points in an image block corresponding to a target pixel point in the main image, and comparing color information of the mapping pixel points in the main image with color information of the target pixel points in the main image to obtain characteristic information of the target pixel points in the main image;
determining mapping pixel points of pixel points in image blocks corresponding to the pixel points in the target secondary image, and comparing color information of the mapping pixel points in the target secondary image with color information of the pixel points in the target secondary image to obtain characteristic information of the pixel points in the target secondary image.
17. The apparatus according to any one of claims 14 to 16, wherein the processor is specifically configured to:
acquiring a plurality of candidate depth indication parameters;
projecting object points corresponding to each target pixel point in the main image to the target auxiliary image according to the candidate depth indication parameters so as to determine a plurality of projection pixel points matched with the target pixel points in the main image in the pixel points in the target auxiliary image;
determining a plurality of matching costs of the target pixel points in the main image according to the characteristic information of the target pixel points in the main image and the characteristic information of a plurality of projection pixel points matched with the target pixel points in the main image;
and determining the depth corresponding to the target pixel point in the main image from the candidate depth indication parameters according to the matching costs of the target pixel point in the main image.
18. The apparatus according to claim 17, wherein the processor is further configured to obtain a range of a depth indication parameter corresponding to the main image;
the processor is specifically configured to sample within the range of the depth indication parameter to obtain the plurality of candidate depth indication parameters.
19. The apparatus according to claim 17 or 18, wherein the processor is specifically configured to:
determining the minimum matching cost from a plurality of matching costs of target pixel points in the main image;
and determining the candidate depth indication parameter corresponding to the minimum matching cost as the depth corresponding to the target pixel point in the main image.
20. The apparatus of any of claims 17-19, wherein the depth indication parameter comprises depth or inverse depth.
21. The apparatus according to any of claims 17-20, wherein the target sub-image comprises a plurality of frames of target sub-images, wherein,
the processor is specifically configured to:
projecting object points corresponding to target pixel points in the main image to each frame of target auxiliary image according to the candidate depth indication parameters so as to determine a plurality of projection pixel points matched with the target pixel points in the main image in pixel points in each frame of target auxiliary image;
the processor is specifically configured to:
determining a plurality of matching cost sets of each target pixel point in the main image according to the characteristic information of each target pixel point in the main image and the characteristic information of a plurality of projection pixel points matched with each target pixel point in the main image in each frame of target secondary image, wherein each matching cost set corresponds to one candidate depth indication parameter;
and fusing the matching cost in each of a plurality of matching cost sets to obtain the plurality of matching costs.
22. The apparatus according to claim 21, wherein the processor is further configured to sample the main image according to a preset sampling rule to obtain a target pixel point of the main image.
23. The apparatus of claim 21, wherein the processor is specifically configured to:
acquiring a plurality of candidate auxiliary images;
and selecting the target auxiliary image meeting preset requirements from the plurality of frames of candidate auxiliary images.
24. The apparatus of claim 23, wherein the processor is specifically configured to:
determining an included angle between the optical axis of the main image and the optical axis of each frame of candidate auxiliary image;
and determining the candidate sub-image corresponding to the included angle within the preset angle range as the target sub-image.
25. The apparatus of claim 21, wherein the processor is specifically configured to:
filtering a plurality of matching cost combinations of target pixel points in the main image;
and determining the depth corresponding to the target pixel point in the main image from the candidate depth indication parameters according to the multiple matching cost combinations of the target pixel point in the main image obtained by the filtering processing.
26. The apparatus of claim 25, wherein the processor is specifically configured to:
filtering a plurality of matching cost combinations of the target pixel points in the main image according to a preset direction.
27. A movable platform, comprising: camera means for outputting the main image and the target secondary image, and means for acquiring a depth map as claimed in any one of claims 14 to 26.
28. A computer storage medium on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 13.
CN201980031872.5A 2019-08-29 2019-08-29 Method and device for acquiring depth map and computer storage medium Pending CN112243518A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/103343 WO2021035627A1 (en) 2019-08-29 2019-08-29 Depth map acquisition method and device, and computer storage medium

Publications (1)

Publication Number Publication Date
CN112243518A true CN112243518A (en) 2021-01-19

Family

ID=74168428

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980031872.5A Pending CN112243518A (en) 2019-08-29 2019-08-29 Method and device for acquiring depth map and computer storage medium

Country Status (2)

Country Link
CN (1) CN112243518A (en)
WO (1) WO2021035627A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113793420B (en) * 2021-09-17 2024-05-24 联想(北京)有限公司 Depth information processing method and device, electronic equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201528775A (en) * 2014-01-02 2015-07-16 Ind Tech Res Inst Depth map aligning method and system
CN104333758B (en) * 2014-10-11 2019-07-19 华为技术有限公司 The method and relevant apparatus of prediction technique and the detection pixel point of depth map
CN107862742B (en) * 2017-12-21 2020-08-14 华中科技大学 Dense three-dimensional reconstruction method based on multi-hypothesis joint view selection
CN109903379A (en) * 2019-03-05 2019-06-18 电子科技大学 A kind of three-dimensional rebuilding method based on spots cloud optimization sampling

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111366917A (en) * 2020-03-13 2020-07-03 北京百度网讯科技有限公司 Method, device and equipment for detecting travelable area and computer readable storage medium
CN111366917B (en) * 2020-03-13 2022-07-15 北京百度网讯科技有限公司 Method, device and equipment for detecting travelable area and computer readable storage medium
CN112834457A (en) * 2021-01-23 2021-05-25 中北大学 Metal microcrack three-dimensional characterization system and method based on reflective laser thermal imaging
WO2023024393A1 (en) * 2021-08-23 2023-03-02 深圳市慧鲤科技有限公司 Depth estimation method and apparatus, computer device, and storage medium

Also Published As

Publication number Publication date
WO2021035627A1 (en) 2021-03-04

Similar Documents

Publication Publication Date Title
US10818029B2 (en) Multi-directional structured image array capture on a 2D graph
CN111598993B (en) Three-dimensional data reconstruction method and device based on multi-view imaging technology
CN110853075B (en) Visual tracking positioning method based on dense point cloud and synthetic view
US9177381B2 (en) Depth estimate determination, systems and methods
CN108921781B (en) Depth-based optical field splicing method
US20180300590A1 (en) Panoramic camera systems
CN112243518A (en) Method and device for acquiring depth map and computer storage medium
CN110135455A (en) Image matching method, device and computer readable storage medium
CN110945565A (en) Dense visual SLAM using probabilistic bin maps
CN108010123B (en) Three-dimensional point cloud obtaining method capable of retaining topology information
CN103345736A (en) Virtual viewpoint rendering method
CN113160420A (en) Three-dimensional point cloud reconstruction method and device, electronic equipment and storage medium
CN111981982A (en) Multi-directional cooperative target optical measurement method based on weighted SFM algorithm
CN114782628A (en) Indoor real-time three-dimensional reconstruction method based on depth camera
JP2022515517A (en) Image depth estimation methods and devices, electronic devices, and storage media
CN114332125A (en) Point cloud reconstruction method and device, electronic equipment and storage medium
Jang et al. Egocentric scene reconstruction from an omnidirectional video
Saxena et al. 3-d reconstruction from sparse views using monocular vision
Ramirez et al. Booster: a benchmark for depth from images of specular and transparent surfaces
CN113724365B (en) Three-dimensional reconstruction method and device
CN112002007B (en) Model acquisition method and device based on air-ground image, equipment and storage medium
Fu et al. Image Stitching Techniques Applied to Plane or 3D Models: A Review
CN117726747A (en) Three-dimensional reconstruction method, device, storage medium and equipment for complementing weak texture scene
CN117274514A (en) Remote sensing image generation method and device based on ground-air visual angle geometric transformation
CN116843754A (en) Visual positioning method and system based on multi-feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination