CN111815693B - Depth image generation method and device - Google Patents

Depth image generation method and device

Info

Publication number
CN111815693B
CN111815693B CN202010923720.0A
Authority
CN
China
Prior art keywords
scene
pixel point
pixel
value
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010923720.0A
Other languages
Chinese (zh)
Other versions
CN111815693A (en)
Inventor
周凯
唐士斌
欧阳鹏
Current Assignee
Beijing Qingwei Intelligent Technology Co ltd
Original Assignee
Beijing Qingwei Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Qingwei Intelligent Technology Co ltd filed Critical Beijing Qingwei Intelligent Technology Co ltd
Priority to CN202010923720.0A
Publication of CN111815693A
Application granted
Publication of CN111815693B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Measurement Of Optical Distance (AREA)

Abstract

The invention discloses a depth image generation method and device. The gray gradient of each scene pixel point in a scene image block is calculated, and the scene pixel points exceeding a gradient threshold are taken as effective scene pixel points. The initial coordinates and gray values of the homonymous sub-pixel points corresponding to the effective scene pixel points in the reference image are acquired, and the parallax of the effective scene pixel points is expressed as a linear polynomial function of the pixel point coordinates. A normalized cross-correlation function value between the gray values of the effective scene pixel points and the gray values of the homonymous sub-pixel points is acquired, and the target polynomial coefficients corresponding to the maximum of that function value are obtained. The target parallax of the scene pixel point at the center of the image block is then acquired, and the depth value corresponding to the scene pixel point is calculated from the parallax to generate the depth image corresponding to the scene image, improving the accuracy of the depth estimation result of each pixel point in the scene image.

Description

Depth image generation method and device
Technical Field
The invention relates to the technical field of computer vision, in particular to a depth image generation method and device.
Background
The monocular speckle structured light system is a depth perception device and is applied to the fields of intelligent security, mobile robots, augmented reality and the like. The system utilizes a speckle projection device to project a fine speckle pattern onto the surface of a scene, and simultaneously, a camera acquires an image of the scene and matches the image with a pre-stored reference image, so that a depth map of the scene is obtained. In the process, image matching is an important step for generating a depth map, and the accuracy of the depth result is directly influenced by the quality of a matching result.
The existing matching methods are mainly divided into two categories, namely a grayscale-based method and a binary-based method, and the two categories of methods generally adopt an image block matching idea to calculate the matching cost. Image block matching is based on the assumption that the disparity values of all pixels in the same image block are substantially equal. For a relatively complex scene image, the parallax of each pixel in a local area may have a large variation range, resulting in low accuracy of the depth estimation result of each pixel in the scene image.
Disclosure of Invention
The invention aims to provide a depth image generation method that improves the accuracy of the depth estimation result of each pixel point in the scene image, even in regions with large local depth variation.
In order to achieve the purpose, the adopted technical scheme comprises the following steps:
s110: the method comprises the steps of obtaining a scene image and a pre-stored reference image, and calculating a sub-pixel parallax value of each scene pixel point in the scene image according to the coordinates and the gray scale of the scene pixel point in the scene image and the coordinates and the gray scale of the corresponding reference pixel point in the reference image.
S120: and (3) constructing a scene image block E with the size of M multiplied by N scene pixel points by taking a single scene pixel point in the scene image as a center. And setting the sub-pixel parallax value of each scene pixel point in the scene image block to meet the formula (1).
di=A0Xi+B0Yi+C0Formula (1)
Wherein d isiIs the sub-pixel disparity value, x, of the ith scene pixel pointiIs the x component, y, of the coordinates of the ith scene pixel pointiIs the y component of the ith scene pixel point coordinate, A0、B0、C0Is a polynomial coefficient.
And generating a transfinite equation set containing M multiplied by N equations according to the formula (1) and the sub-pixel parallax value and the coordinates of each scene pixel point in the scene image block E, and acquiring a polynomial coefficient corresponding to the scene image block according to the transfinite equation set.
S130: and calculating the gray gradient of each scene pixel point in the scene image block, and taking a plurality of scene pixel points exceeding a gradient threshold value as effective scene pixel points.
S140: and acquiring coordinates of the same-name sub-pixel points in the reference image block corresponding to the effective scene pixel points according to the coordinates and the polynomial coefficients of the effective scene pixel points, and acquiring gray values of the same-name sub-pixel points according to the abscissa of the same-name sub-pixel points and the gray values of two adjacent pixel points at the left and right of the same-name sub-pixel points.
S150: and acquiring a normalized cross-correlation function value between the gray value of the effective scene pixel point and the gray value of the same-name sub-pixel point according to the gray value of the effective scene pixel point and the gray value of the same-name sub-pixel point.
S160: and acquiring a corresponding target polynomial coefficient when the normalized cross-correlation function value is maximum according to the relation between the normalized cross-correlation function value and the polynomial coefficient.
S170: and acquiring the target parallax of the scene pixel points according to the formula (1), the target polynomial coefficients and the coordinates of the single scene pixel points.
S180: and acquiring the depth value corresponding to the scene pixel point according to the target parallax of the scene pixel point, the depth of the reference image, the length of the system baseline and the focal length of the camera.
S190: repeating S120 to S180 for all scene pixel points in the scene image, and obtaining the corresponding depth value of each scene pixel point to generate the depth image corresponding to the scene image.
The present invention also provides a depth image generating apparatus, including:
the sub-pixel parallax value obtaining module is configured to obtain a scene image and a pre-stored reference image, and calculate a sub-pixel parallax value of each scene pixel point in the scene image according to the coordinates and the gray scale of the scene pixel point in the scene image and the coordinates and the gray scale of the corresponding reference pixel point in the reference image.
A polynomial coefficient acquisition module configured to construct a scene image block E of size M × N scene pixel points with a single scene pixel point in the scene image as the center, and to set the sub-pixel parallax value of each scene pixel point in the scene image block E to satisfy formula (1):

d_i = A0·x_i + B0·y_i + C0   formula (1)

where d_i is the sub-pixel disparity value of the i-th scene pixel point, x_i and y_i are the x and y components of its coordinates, and A0, B0, C0 are the polynomial coefficients.

The module generates an overdetermined system of M × N equations according to formula (1) and the sub-pixel parallax value and coordinates of each scene pixel point in the scene image block E, and acquires the polynomial coefficients corresponding to the scene image block from this system.
And the effective scene pixel point acquisition module is configured to calculate the gray gradient of each scene pixel point in the scene image block, and take a plurality of scene pixel points exceeding a gradient threshold value as effective scene pixel points.
And the gray value acquisition module is configured to acquire coordinates of the same-name sub-pixel points in the reference image block corresponding to the effective scene pixel points according to the coordinates and the polynomial coefficients of the effective scene pixel points, and acquire gray values of the same-name sub-pixel points according to the abscissa of the same-name sub-pixel points and gray values of two adjacent whole pixel points on the left and right of the same-name sub-pixel points.
And the function value acquisition module is configured to acquire a normalized cross-correlation function value between the gray value of the effective scene pixel point and the gray value of the same-name sub-pixel point according to the gray value of the effective scene pixel point and the gray value of the same-name sub-pixel point.
And a target polynomial coefficient acquisition module configured to acquire a corresponding target polynomial coefficient when the normalized cross-correlation function value is maximum, according to a relationship between the normalized cross-correlation function value and the polynomial coefficient.
And the target parallax obtaining module is configured to obtain the target parallax of the scene pixel according to the formula (1), the target polynomial coefficient and the coordinate of the single scene pixel.
And the depth value acquisition module is configured to acquire a depth value corresponding to the scene pixel point according to the target parallax of the scene pixel point, the depth of the reference image, the system baseline length and the camera focal length.
And the depth image generation module is configured to acquire a depth value corresponding to each scene pixel point to generate a depth image corresponding to the scene image.
Compared with the prior art, the technical effect of the invention is as follows: an overdetermined system of M × N equations is generated according to formula (1) and the sub-pixel parallax value and coordinates of each scene pixel point in the scene image block E, and the polynomial coefficients corresponding to the scene image block are acquired from this overdetermined system.
The key of the method is that the polynomial coefficients are optimized through steps S130 to S160 to obtain the target polynomial coefficients, and the target parallax of a scene pixel point is obtained according to the optimized target polynomial coefficients, formula (1) and the coordinates of the single scene pixel point, so that the target parallax is closer to the actual parallax value. The depth image corresponding to the scene image is generated through S180 and S190, so that the accuracy of the depth estimation result of each pixel point in the scene image can be improved even for regions with large local depth variation.
Drawings
Fig. 1 is a schematic flow chart of a depth image generation method according to the present invention.
Fig. 2 is a schematic diagram of mapping of an effective scene pixel point corresponding to a homonymous point in the present invention.
Fig. 3 is a schematic diagram of obtaining a sub-pixel parallax value of a scene pixel point according to the present invention.
Fig. 4 is a schematic flow chart of obtaining a sub-pixel parallax value of a scene pixel point in the present invention.
Fig. 5 is a schematic structural diagram of the depth image generating apparatus according to the present invention.
Detailed Description
The following describes embodiments of the present invention with reference to the drawings.
As shown in fig. 1, an embodiment of the present invention is a depth image generating method, including the following steps:
s110: the method comprises the steps of obtaining a scene image and a pre-stored reference image, and calculating a sub-pixel parallax value of each scene pixel point in the scene image according to the coordinates and the gray scale of the scene pixel point in the scene image and the coordinates and the gray scale of the corresponding reference pixel point in the reference image.
Fig. 2 shows a scene image 1 and a pre-stored reference image 2. The scene image 1 is an image of the scene actually captured by the camera. The reference image 2 is an image captured in advance by the camera while the speckle projection device projects the fine speckle pattern onto a reference surface of known depth.
S120: and (3) constructing a scene image block with the size of M multiplied by N scene pixel points by taking a single scene pixel point in the scene image as a center. Setting the sub-pixel parallax value of each scene pixel point in the scene image block to satisfy the formula (1) di=A0Xi+B0Yi+C0
And generating a transfinite equation set containing M multiplied by N equations according to the formula (1) and the sub-pixel parallax value and the coordinates of each scene pixel point in the scene image block E, and acquiring a polynomial coefficient corresponding to the scene image block according to the transfinite equation set.
How the polynomial coefficients corresponding to the scene image block E are obtained is specifically described below with reference to fig. 2.

A scene image block E of size M × N scene pixel points is constructed with a single scene pixel point p in scene image 1 as the center, and the sub-pixel parallax value of each scene pixel point in E is set to satisfy formula (1): d_i = A0·x_i + B0·y_i + C0,

where d_i is the sub-pixel disparity value of the i-th scene pixel point, x_i and y_i are the x and y components of its coordinates, and A0, B0, C0 are the polynomial coefficients.

Specifically, the x and y components of the coordinates of each scene pixel point in the scene image block E, together with the sub-pixel parallax value calculated in S110, are substituted into d_i = A·x_i + B·y_i + C, generating an overdetermined system of M × N equations:

d_1 = A·x_1 + B·y_1 + C
d_2 = A·x_2 + B·y_2 + C
...
d_{M×N} = A·x_{M×N} + B·y_{M×N} + C

Here A, B and C are the unknowns. Solving this overdetermined system by least squares yields the initial values A0, B0 and C0 of A, B and C for the scene image block E; these are the polynomial coefficients.
The present invention optimizes the polynomial coefficients through the following steps S130 to S160 to obtain the target polynomial coefficients; steps S130 to S160 are the key steps of the invention. The target parallax of the scene pixel point is acquired according to the optimized target polynomial coefficients, formula (1) and the coordinates of the single scene pixel point. The target parallax thus acquired is closer to the actual parallax value.
The following describes the steps for optimizing the polynomial coefficients.
S130: and calculating the gray gradient of each scene pixel point in the scene image block, and taking a plurality of scene pixel points exceeding a gradient threshold value as effective scene pixel points.
That is, for the scene image block E, the gray gradient of each scene pixel point may be calculated by the Sobel operator: for example, if the scene image block E contains i scene pixel points, the gray gradients of the scene pixel points P_1 to P_i are calculated.

A gradient threshold is set, and the scene pixel points exceeding the gradient threshold are taken as effective scene pixel points. In other words, the pixel points among P_1 to P_i whose gray gradient is below the gradient threshold are eliminated, which improves the quality of the pixel points used in the following steps.

The Sobel operator is an important processing method in the field of computer vision, mainly used to obtain the first-order gradient of a digital image. It obtains the gray gradient of each scene pixel point from weighted differences of the gray values in the pixel's neighborhood (above, below, left and right).
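The gradient screening of S130 can be sketched as below, assuming the standard 3×3 Sobel kernels; the patent does not specify the exact kernels or the threshold value, so both are illustrative choices here.

```python
import numpy as np

def valid_pixel_mask(block, grad_threshold):
    """Mark scene pixels whose Sobel gray-gradient magnitude exceeds the threshold."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    H, W = block.shape
    gx = np.zeros((H, W))
    gy = np.zeros((H, W))
    # Plain correlation over the interior; border pixels keep gradient 0.
    for r in range(1, H - 1):
        for c in range(1, W - 1):
            win = block[r - 1:r + 2, c - 1:c + 2]
            gx[r, c] = np.sum(win * kx)
            gy[r, c] = np.sum(win * ky)
    grad = np.hypot(gx, gy)
    return grad > grad_threshold

# A vertical step edge: only pixels near the edge become "effective".
block = np.zeros((5, 5)); block[:, 3:] = 100.0
mask = valid_pixel_mask(block, grad_threshold=50.0)
```

Flat (textureless) pixels fall below the threshold and are excluded, which is exactly the screening the step describes.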
S140: and acquiring coordinates of the same-name sub-pixel points in the reference image block corresponding to the effective scene pixel points according to the coordinates and the polynomial coefficients of the effective scene pixel points, and acquiring the gray value of the same-name sub-pixel points according to the abscissa of the same-name sub-pixel points and the gray values of two adjacent whole pixel points at the left and right of the same-name sub-pixel points.
In fig. 2, according to the coordinates and polynomial coefficients of the effective scene pixel point P_i, the homonymous sub-pixel point q_i in the corresponding reference image block is obtained; according to the abscissa of q_i and the gray values of the two adjacent whole pixel points on its left and right, the gray value of q_i is obtained.

The figure shows the mapping of the homonymous point of the effective scene pixel point P_i of scene image 1 into the reference image. The set of homonymous sub-pixel points corresponding to the scene pixel points in the scene image block forms the mapping area F of the scene image block E; that is, the scene image block E in scene image 1 is mapped to the mapping area F in reference image 2.

A homonymous point is the image point of the same physical point in different images. For example, the same point on the ground forms a scene pixel point p in the scene image and the corresponding sub-pixel point q in the reference image.
Specifically, in S140, obtaining coordinates of a corresponding sub-pixel point of the same name in the reference image block of the effective scene pixel point according to the coordinates and the polynomial coefficient of the effective scene pixel point includes:
The coordinates of the effective scene pixel point p_i are set as p_i(x_i, y_i); the coordinates of its homonymous sub-pixel point in the corresponding reference image block are q_i(x_i - d_i, y_i), where d_i is obtained by the calculation of formula (1).
Specifically, in S140, obtaining the gray value of the corresponding sub-pixel point according to the abscissa of the corresponding sub-pixel point and the gray values of two adjacent pixel points of the corresponding sub-pixel point includes:
The gray value g'_i of the homonymous sub-pixel point q_i is obtained by formula (3), a linear interpolation between the two adjacent whole pixels:

g'_i = g'_iL·(1 - Δx) + g'_iR·Δx,  with Δx = x_q - round(x_q - 0.5) and x_q = x_i - d_i   formula (3)

where round() represents the rounding function (round(x_q - 0.5) is the abscissa of the whole pixel to the left of q_i), and g'_iL and g'_iR are respectively the gray values of the pixel points immediately to the left and right of q_i.
S150: and acquiring a normalized cross-correlation function value between the gray value of the effective scene pixel point and the gray value of the same-name sub-pixel point according to the gray value of the effective scene pixel point and the gray value of the same-name sub-pixel point.
Specifically, the normalized cross-correlation function value is z, set as z = 1 - c, where c is calculated by formula (4):

c = Σ_{i=1..n} (g_i - gV)·(g'_i - g'V) / sqrt( Σ_{i=1..n} (g_i - gV)² · Σ_{i=1..n} (g'_i - g'V)² )   formula (4)

where g_i is the gray value of the i-th effective scene pixel point p_i, g'_i is the gray value of its homonymous point q_i, gV is the mean gray value of the n effective scene pixel points p_i (i = 1 ... n), and g'V is the mean gray value of the n homonymous points q_i (i = 1 ... n).
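A sketch of the zero-mean normalized cross-correlation c under the variable definitions given in the text; the function and variable names are illustrative.

```python
import numpy as np

def ncc_cost(g, g_prime):
    """Return z = 1 - c, where c is the normalized cross-correlation between
    the effective scene pixel grays g and the homonymous sub-pixel grays g_prime."""
    g = np.asarray(g, dtype=float)
    gp = np.asarray(g_prime, dtype=float)
    dg = g - g.mean()        # g_i - gV (mean gray of effective scene pixels)
    dgp = gp - gp.mean()     # g'_i - g'V (mean gray of homonymous points)
    c = np.sum(dg * dgp) / np.sqrt(np.sum(dg ** 2) * np.sum(dgp ** 2))
    return 1.0 - c

# Identical patterns correlate perfectly (c = 1, so z = 0); an affine gray
# change (gain and offset) leaves c unchanged.
z_same = ncc_cost([1, 2, 3, 4], [1, 2, 3, 4])
z_affine = ncc_cost([1, 2, 3, 4], [12, 14, 16, 18])
```

The invariance to gain and offset is the reason a normalized correlation is preferred over a raw gray difference when scene and reference images differ in brightness.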
S160: and acquiring a corresponding target polynomial coefficient when the normalized cross-correlation function value is maximum according to the relation between the normalized cross-correlation function value and the polynomial coefficient.
Specifically, starting from A0, B0 and C0, the partial derivative matrix and the Hessian matrix of c with respect to A, B and C are calculated by an incremental method.

The partial derivative matrix is

∇c = ( ∂c/∂A, ∂c/∂B, ∂c/∂C )^T

and the Hessian matrix is

H = [ ∂²c/∂A²    ∂²c/∂A∂B   ∂²c/∂A∂C
      ∂²c/∂B∂A   ∂²c/∂B²    ∂²c/∂B∂C
      ∂²c/∂C∂A   ∂²c/∂C∂B   ∂²c/∂C²  ]

The increments ΔA, ΔB and ΔC of A, B and C are calculated step by step by Newton iteration, (ΔA, ΔB, ΔC)^T = -H⁻¹·∇c, until the increments are smaller than a set increment threshold, yielding the solution of the target polynomial coefficients A, B and C.
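The Newton update of S160 can be sketched generically. The patent computes analytic partial derivatives of c; the illustration below uses finite differences and an arbitrary smooth stand-in cost, purely to show the iteration structure.

```python
import numpy as np

def newton_refine(cost, theta0, eps=1e-5, inc_threshold=1e-8, max_iter=50):
    """Iterate theta <- theta - H^-1 * grad until the increment is below a threshold.

    cost   : scalar function of theta = (A, B, C)
    theta0 : starting point (A0, B0, C0) from the least-squares fit
    """
    theta = np.asarray(theta0, dtype=float)
    n = theta.size
    for _ in range(max_iter):
        grad = np.zeros(n)
        H = np.zeros((n, n))
        for i in range(n):
            e_i = np.zeros(n); e_i[i] = eps
            # Central-difference gradient component.
            grad[i] = (cost(theta + e_i) - cost(theta - e_i)) / (2 * eps)
            for j in range(n):
                e_j = np.zeros(n); e_j[j] = eps
                # Central-difference Hessian entry.
                H[i, j] = (cost(theta + e_i + e_j) - cost(theta + e_i - e_j)
                           - cost(theta - e_i + e_j) + cost(theta - e_i - e_j)) / (4 * eps ** 2)
        delta = np.linalg.solve(H, grad)
        theta = theta - delta
        if np.max(np.abs(delta)) < inc_threshold:
            break
    return theta

# Quadratic bowl with minimum at (1, -2, 0.5): Newton converges immediately.
cost = lambda t: (t[0] - 1) ** 2 + 2 * (t[1] + 2) ** 2 + (t[2] - 0.5) ** 2
theta = newton_refine(cost, (0.0, 0.0, 0.0))
```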
S170: and acquiring the target parallax of the scene pixel points according to the formula (1), the target polynomial coefficients and the coordinates of the single scene pixel points.
Specifically, the target polynomial coefficients A, B and C are substituted into d = A·x + B·y + C to obtain the target parallax d of the single scene pixel point p at the center of the scene image block E, where x and y are respectively the x and y components of the coordinates of p.
S180: and acquiring the depth value corresponding to the scene pixel point according to the target parallax of the scene pixel point, the depth of the reference image, the length of the system baseline and the focal length of the camera.
Specifically, the depth Z corresponding to the scene pixel point p is calculated by formula (5):

Z = Z0·b·f / (b·f + Z0·d)   formula (5)

where d is the target parallax, Z0 is the depth corresponding to the reference image, b is the system baseline length, and f is the camera focal length.
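As a sketch of S180: a common reference-plane form of the disparity-to-depth relation is Z = Z0·b·f / (b·f + Z0·d); the code below assumes that form and sign convention, and all numeric values are hypothetical.

```python
def depth_from_disparity(d, z0, b, f):
    """Convert target disparity d (pixels) to depth Z against a reference
    plane at depth z0, with baseline b and focal length f (units assumed:
    z0 and b in mm, f in pixels)."""
    return z0 * b * f / (b * f + z0 * d)

# Zero disparity means the point lies on the reference plane itself.
z_plane = depth_from_disparity(0.0, z0=1000.0, b=50.0, f=600.0)
# Under this sign convention, positive disparity means a nearer point.
z_near = depth_from_disparity(5.0, z0=1000.0, b=50.0, f=600.0)
```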
S190: repeating S120 to S180 for all scene pixel points in the scene image, and obtaining the corresponding depth value of each scene pixel point to generate the depth image corresponding to the scene image.
In the invention, the polynomial coefficients are obtained through S110 and S120, but the parallax of the scene pixel points is not calculated directly from these coefficients. Instead, the coefficients are optimized through S130 to S160 to obtain the target polynomial coefficients, and the target parallax of each scene pixel point is calculated from formula (1), the target polynomial coefficients and the coordinates of the single scene pixel point. In this way, for inclined planes and other regions of the measured scene with large local depth variation, a more accurate depth estimation result of the scene pixel points can be obtained, while the calculation speed remains high.
As shown in fig. 3 and 4, based on the above embodiments of the present invention, S110 in the present invention specifically includes the following steps.
S111: based on an initial scene pixel point in a scene image, an initial scene image block with a set size is obtained by taking the initial scene pixel point as a center.
Specifically, for any initial scene pixel point p (x, y) in the scene image 1, an M × N initial scene image block of a set size is selected with the initial scene pixel point p as a center.
S112: and acquiring a reference pixel point with the same position as the scene pixel point in the reference image. And acquiring an initial reference image block with a set size in the reference image by taking the reference pixel point as the center.
In fig. 3, the reference pixel point p_1' at the same position as the scene pixel point p is obtained in reference image 2, and an initial reference image block B_1' of the set size is obtained in reference image 2 with p_1' as the center.

That is, a reference pixel point p'(x, y) with the same position as p(x, y) is acquired in reference image 2. A search region G = [x - Δ1, x + Δ2] along row y is selected near the reference pixel point p'(x, y). For each pixel point p'_i in the region G, an image block B'_i of size M × N is selected with that pixel point as the center. The difference value (i.e. the cost) between each B'_i (i = 1, ..., r, where r is the number of pixel points in the search region G) and the scene image block E is calculated; the cost value is preferably calculated as the sum of absolute gray differences over the block:

c_i = Σ_(u,v) | E(u, v) - B'_i(u, v) |

where the sum runs over the M × N pixel positions (u, v) of the blocks.
s113: and setting a search area based on the positions of reference pixel points of the reference image, and selecting a plurality of reference image blocks by taking each pixel point in the search area as a center.
S114: calculating the difference value between the initial scene image block and each reference image block, taking the reference image block with the minimum difference value as a matching reference image block, and taking a reference pixel point in the center of the matching image block as a matching pixel point.
For example, among the reference image blocks B1', B2' and B3' in fig. 3, the reference image block with the smallest difference value from the initial scene image block E, here B1', is taken as the matching reference image block.
S115: and taking the horizontal coordinate difference value of the matching pixel point and the scene pixel point as the initial parallax value of the scene pixel point.
S116: and acquiring a difference value corresponding to two reference pixel points adjacent to the left and right of the matching pixel point.
The corresponding sub-pixel disparity value is obtained by formula (2):

d = d_I + (c_1 - c_2) / (2·(c_1 + c_2 - 2·c_a))   formula (2)

where d is the sub-pixel parallax corresponding to the initial pixel point, d_I is the initial parallax value corresponding to the initial pixel point, c_a is the difference value of the matched pixel point corresponding to the initial pixel point in the reference image, c_1 is the difference value of the left neighbor of the matched pixel point, and c_2 is the difference value of the right neighbor of the matched pixel point.
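The sub-pixel refinement of S116 can be sketched with the standard three-point parabolic interpolation over the variables d_I, c_a, c_1 and c_2 defined in the text; this is a reconstruction under that assumption, not a verbatim copy of formula (2).

```python
def subpixel_disparity(d_int, c_left, c_center, c_right):
    """Refine an integer disparity d_int using the cost of the matched pixel
    (c_center) and of its left/right neighbors (c_left, c_right)."""
    denom = 2.0 * (c_left + c_right - 2.0 * c_center)
    if denom == 0.0:          # flat cost curve: keep the integer disparity
        return float(d_int)
    return d_int + (c_left - c_right) / denom

# Symmetric neighbor costs: the minimum sits exactly at the integer disparity.
d0 = subpixel_disparity(7, c_left=4.0, c_center=1.0, c_right=4.0)
# Lower cost on the right: the refined disparity shifts toward the right.
d1 = subpixel_disparity(7, c_left=5.0, c_center=1.0, c_right=3.0)
```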
In the invention, the sub-pixel parallax value of each scene pixel point in the scene image is obtained by a block matching method, so that the accuracy of the sub-pixel parallax value of each scene pixel point is improved.
Referring to fig. 5, another embodiment of the present invention further provides a depth image generating apparatus, including:
the sub-pixel disparity value obtaining module 202 is configured to obtain a scene image and a pre-stored reference image, and calculate a sub-pixel disparity value of each scene pixel point in the scene image according to the coordinates and the gray scale of the scene pixel point in the scene image and the coordinates and the gray scale of the corresponding reference pixel point in the reference image.
A polynomial coefficient obtaining module 204 configured to construct a scene image block E with a size of M × N scene pixels centered on a single scene pixel in the scene image. And setting the sub-pixel parallax value of each scene pixel point in the scene image block E to meet the formula (1).
d_i = A0·x_i + B0·y_i + C0   formula (1)

where d_i is the sub-pixel disparity value of the i-th scene pixel point, x_i and y_i are the x and y components of its coordinates, and A0, B0, C0 are the polynomial coefficients.

The module generates an overdetermined system of M × N equations according to formula (1) and the sub-pixel parallax value and coordinates of each scene pixel point in the scene image block E, and acquires the polynomial coefficients corresponding to the scene image block from this system.
An effective scene pixel point obtaining module 206, configured to calculate a gray gradient of each scene pixel point in the scene image block, and take a plurality of scene pixel points exceeding a gradient threshold as effective scene pixel points.
The gray value obtaining module 208 is configured to obtain coordinates of the same-name sub-pixel points in the reference image block corresponding to the effective scene pixel points according to the coordinates and the polynomial coefficients of the effective scene pixel points, and obtain gray values of the same-name sub-pixel points according to the abscissa of the same-name sub-pixel points and gray values of two whole pixels adjacent to the same-name sub-pixel points from left to right.
The function value obtaining module 210 is configured to obtain a normalized cross-correlation function value between the gray value of the effective scene pixel point and the gray value of the same-name sub-pixel point according to the gray value of the effective scene pixel point and the gray value of the same-name sub-pixel point.
And a target polynomial coefficient obtaining module 212 configured to obtain a corresponding target polynomial coefficient when the normalized cross-correlation function value is maximum according to the relationship between the normalized cross-correlation function value and the polynomial coefficient.
A target disparity obtaining module 214 configured to obtain a target disparity of a scene pixel according to formula (1), a target polynomial coefficient, and a coordinate of a single scene pixel.
A depth value obtaining module 216 configured to obtain a depth value corresponding to a scene pixel point according to the target parallax of the scene pixel point, the depth of the reference image, the system baseline length, and the camera focal length.
A depth image generation module 218 configured to obtain a depth value corresponding to each scene pixel point to generate a depth image corresponding to the scene image.
The depth image generation apparatus of this embodiment and the depth image generation method above share the same inventive concept; refer to the specific description of the depth image generation method, which is not repeated here.

Claims (9)

1. A depth image generation method, comprising:
s110: acquiring a scene image and a pre-stored reference image, and calculating the sub-pixel parallax value of each scene pixel point in the scene image according to the coordinates and gray level of the scene pixel point in the scene image and the coordinates and gray level of the corresponding reference pixel point in the reference image;
s120: constructing a scene image block of M × N scene pixel points centered on a single scene pixel point in the scene image; assuming that the sub-pixel parallax value of each scene pixel point in the scene image block satisfies formula (1);
d_i = A_0·x_i + B_0·y_i + C_0    formula (1)
wherein d_i is the sub-pixel parallax value of the i-th scene pixel point, x_i is the x component of the coordinates of the i-th scene pixel point, y_i is the y component of the coordinates of the i-th scene pixel point, and A_0, B_0, C_0 are polynomial coefficients;
generating an overdetermined equation system containing M × N equations according to formula (1) and the sub-pixel parallax value and coordinates of each scene pixel point in the scene image block, and acquiring the polynomial coefficients corresponding to the scene image block from the overdetermined equation system;
s130: calculating the gray gradient of each scene pixel point in the scene image block, and taking the scene pixel points whose gray gradient exceeds a gradient threshold as effective scene pixel points;
s140: acquiring the coordinates of the same-name sub-pixel points in the reference image block corresponding to the effective scene pixel points according to the coordinates and the polynomial coefficients of the effective scene pixel points, and acquiring the gray value of each same-name sub-pixel point according to its abscissa and the gray values of the two whole pixel points adjacent to it on the left and right;
s150: acquiring a normalized cross-correlation function value between the gray value of the effective scene pixel point and the gray value of the same-name sub-pixel point according to the gray value of the effective scene pixel point and the gray value of the same-name sub-pixel point;
s160: acquiring a corresponding target polynomial coefficient when the normalized cross-correlation function value is maximum according to the relation between the normalized cross-correlation function value and the polynomial coefficient;
s170: acquiring the target parallax of the scene pixel point according to formula (1), the target polynomial coefficients, and the coordinates of the single scene pixel point;
s180: acquiring a depth value corresponding to the scene pixel point according to the target parallax of the scene pixel point, the depth of the reference image, the length of a system base line and the focal length of a camera;
s190: repeating S120 to S180 for all scene pixel points in the scene image, and obtaining the corresponding depth value of each scene pixel point to generate the depth image corresponding to the scene image.
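For illustration only, the least-squares solution of the overdetermined system in s120 can be sketched as follows; NumPy and the helper name `fit_disparity_plane` are assumptions, not part of the claimed method:

```python
import numpy as np

def fit_disparity_plane(xs, ys, ds):
    """Least-squares fit of the disparity plane d_i = A0*x_i + B0*y_i + C0
    over a scene image block (an overdetermined system of M*N equations,
    one per scene pixel point)."""
    G = np.column_stack([xs, ys, np.ones_like(xs)])  # design matrix of the system
    coeffs, *_ = np.linalg.lstsq(G, ds, rcond=None)
    return coeffs  # (A0, B0, C0)

# Example: disparities sampled exactly from the plane d = 0.5x - 0.2y + 3
xs = np.array([0.0, 1.0, 2.0, 0.0, 1.0, 2.0])
ys = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
ds = 0.5 * xs - 0.2 * ys + 3.0
A0, B0, C0 = fit_disparity_plane(xs, ys, ds)
```

With consistent data the recovered coefficients match the generating plane exactly; with noisy per-pixel disparities the same call returns the least-squares plane.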
2. The depth image generation method according to claim 1, wherein the S110 includes:
s111: based on initial scene pixel points in the scene image, acquiring an initial scene image block with a set size by taking the initial scene pixel points as a center;
s112: acquiring the reference pixel point in the reference image at the same position as the initial scene pixel point; acquiring an initial reference image block of the set size centered on the reference pixel point in the reference image;
s113: setting a search area based on the positions of reference pixel points of the reference image, and selecting a plurality of reference image blocks by taking each pixel point in the search area as a center;
s114: calculating the difference value between the initial scene image block and each reference image block, taking the reference image block with the minimum difference value as the matching reference image block, and taking the reference pixel point at the center of the matching reference image block as the matching pixel point;
s115: taking the horizontal coordinate difference value of the matching pixel point and the scene pixel point as the initial parallax value of the scene pixel point;
s116: acquiring the difference values corresponding to the two reference pixel points adjacent to the left and right of the matching pixel point;
calculating the sub-pixel parallax value corresponding to the initial scene pixel point through formula (2);
d = d_I + (c_1 - c_2) / (2(c_1 + c_2 - 2c_a))    formula (2)
wherein d is the sub-pixel parallax corresponding to the initial scene pixel point; d_I is the initial parallax value corresponding to the initial scene pixel point; c_a is the difference value of the matching pixel point of the initial scene pixel point in the reference image; c_1 is the difference value of the point adjacent to the left of the matching pixel point; and c_2 is the difference value of the point adjacent to the right of the matching pixel point.
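A minimal sketch of this refinement step, assuming formula (2) is the standard three-point parabola fit over the matching difference values (the function name is illustrative):

```python
def subpixel_disparity(d_I, c_a, c1, c2):
    """Refine the integer disparity d_I by fitting a parabola through the
    difference values c1 (left neighbour), c_a (best match) and c2
    (right neighbour) and taking the abscissa of its vertex."""
    denom = c1 + c2 - 2.0 * c_a
    if denom == 0.0:  # flat cost curve: no sub-pixel refinement possible
        return float(d_I)
    return d_I + (c1 - c2) / (2.0 * denom)
```

For a symmetric cost curve (c1 == c2) the correction is zero and the integer disparity is returned unchanged.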
3. The depth image generation method according to claim 1, wherein the S130 includes:
and calculating the gray gradient of each scene pixel point in the scene image block through a Sobel operator.
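A direct, unoptimized sketch of the Sobel gradient computation of s130; in practice a library convolution would be used, and the zero-border handling here is an assumption:

```python
import numpy as np

def sobel_gradient_magnitude(img):
    """Gray gradient magnitude of each pixel via the 3x3 Sobel operator;
    border pixels are simply left at zero."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T  # vertical Sobel kernel
    h, w = img.shape
    grad = np.zeros((h, w))
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            win = img[r - 1:r + 2, c - 1:c + 2]
            gx = float(np.sum(kx * win))   # horizontal gray gradient
            gy = float(np.sum(ky * win))   # vertical gray gradient
            grad[r, c] = np.hypot(gx, gy)  # gradient magnitude
    return grad

# Effective scene pixel points are those whose gradient exceeds a threshold:
img = np.zeros((5, 5)); img[:, 3:] = 1.0   # vertical step edge
effective_mask = sobel_gradient_magnitude(img) > 1.0
```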
4. The method of claim 1, wherein the acquiring, in S140, the coordinates of the same-name sub-pixel points in the reference image block corresponding to the effective scene pixel points according to the coordinates and the polynomial coefficients of the effective scene pixel points comprises:
setting the coordinates of an effective scene pixel point as p_i(x_i, y_i); the same-name sub-pixel point of the effective scene pixel point p_i in the reference image block then has coordinates q_i(x_i - d_i, y_i), wherein d_i is obtained by calculation of formula (1);
and wherein the acquiring, in S140, the gray value of the same-name sub-pixel point according to its abscissa and the gray values of the two whole pixel points adjacent to it on the left and right comprises:
obtaining the gray value g'_i of the same-name sub-pixel point q_i by formula (3);
g'_i = (round(x_i - d_i) + 1 - (x_i - d_i))·g'_iL + ((x_i - d_i) - round(x_i - d_i))·g'_iR    formula (3)
wherein round() represents the rounding function, and g'_iL and g'_iR are respectively the gray values of the whole pixel points to the left and right of the same-name sub-pixel point q_i.
5. The depth image generation method according to claim 1, wherein the S150 includes:
the normalized cross-correlation function value is z, with z = 1 - c, wherein c is obtained by calculation of formula (4);
c = 1 - ( Σ_{i=1..n} (g_i - g_V)(g'_i - g'_V) ) / √( Σ_{i=1..n} (g_i - g_V)² · Σ_{i=1..n} (g'_i - g'_V)² )    formula (4)
wherein g_i is the gray value of the i-th effective scene pixel point p_i, g'_i is the gray value of its same-name point q_i, g_V is the mean gray value of the n effective scene pixel points p_i (i = 1…n), and g'_V is the mean gray value of the n same-name points q_i (i = 1…n).
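Assuming formula (4) denotes one minus the zero-mean normalized cross-correlation of the two gray-value sequences (the original formula image is not reproduced in this text, so the exact form is an assumption), the computation can be sketched as:

```python
import numpy as np

def ncc_cost(g, g_prime):
    """c = 1 - ZNCC between the gray values g of the effective scene pixel
    points and the gray values g_prime of their same-name sub-pixel points;
    the function value z = 1 - c is then the correlation itself."""
    g = np.asarray(g, dtype=float)
    g_prime = np.asarray(g_prime, dtype=float)
    dg = g - g.mean()               # g_i - g_V
    dgp = g_prime - g_prime.mean()  # g'_i - g'_V
    zncc = np.sum(dg * dgp) / np.sqrt(np.sum(dg**2) * np.sum(dgp**2))
    return 1.0 - float(zncc)

z = 1.0 - ncc_cost([1, 2, 3, 4], [2, 4, 6, 8])  # perfectly correlated sequences
```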
6. The depth image generation method according to claim 5, wherein the S160 includes:
taking A_0, B_0 and C_0 as the starting point, calculating the partial derivative matrix and the Hessian matrix of c with respect to A, B and C by an incremental method;
the partial derivative matrix is
J = [∂c/∂A, ∂c/∂B, ∂c/∂C]^T
the Hessian matrix is
H = [[∂²c/∂A², ∂²c/∂A∂B, ∂²c/∂A∂C], [∂²c/∂B∂A, ∂²c/∂B², ∂²c/∂B∂C], [∂²c/∂C∂A, ∂²c/∂C∂B, ∂²c/∂C²]]
gradually calculating the increments ΔA, ΔB and ΔC of A, B and C by Newton iteration until the increments are smaller than a set increment threshold, thereby obtaining the solution of the target polynomial coefficients A, B and C; the target polynomial is: d = Ax + By + C.
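The Newton update itself can be illustrated generically; the central-difference gradient and Hessian below stand in for the analytic partial-derivative and Hessian matrices of c, and all names are illustrative:

```python
import numpy as np

def newton_step(cost, theta, eps=1e-5):
    """One Newton iteration for theta = (A, B, C): solve H * delta = -grad,
    with grad and H estimated by central differences of the cost c."""
    n = len(theta)
    grad = np.zeros(n)
    H = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n); e[i] = eps
        grad[i] = (cost(theta + e) - cost(theta - e)) / (2 * eps)
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = eps
            ej = np.zeros(n); ej[j] = eps
            H[i, j] = (cost(theta + ei + ej) - cost(theta + ei - ej)
                       - cost(theta - ei + ej) + cost(theta - ei - ej)) / (4 * eps * eps)
    delta = np.linalg.solve(H, -grad)   # increments (dA, dB, dC)
    return theta + delta, delta

# Iterating until max(|delta|) falls below an increment threshold yields A, B, C.
```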
7. The depth image generation method according to claim 6, wherein the S170 includes:
substituting the solved target polynomial coefficients A, B and C into d = Ax + By + C to obtain the target parallax d of the single scene pixel point p at the center of the image block; wherein x and y are respectively the x component and the y component of the image coordinates of the scene pixel point p.
8. The depth image generation method according to claim 7, wherein the depth Z corresponding to the scene pixel point is calculated by formula (5);
Z = (b·f·Z_0) / (b·f + Z_0·d)    formula (5)
wherein Z_0 is the depth corresponding to the reference image, b is the system baseline length, and f is the camera focal length.
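With the reference-plane relation 1/Z = 1/Z_0 + d/(b·f) assumed for formula (5) (the sign convention of d depends on the system and is an assumption here), the depth computation is simply:

```python
def depth_from_disparity(d, Z0, b, f):
    """Depth Z of a scene pixel from its target parallax d, the reference
    image depth Z0 and the baseline length b (same length unit), and the
    camera focal length f (in pixels, like d)."""
    return (b * f * Z0) / (b * f + Z0 * d)
```

Zero parallax maps to the reference depth Z0; positive parallax, under this sign convention, maps to depths nearer than the reference plane.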
9. A depth image generation apparatus, characterized by comprising:
the sub-pixel parallax value acquisition module is configured to acquire a scene image and a pre-stored reference image, and calculate the sub-pixel parallax value of each scene pixel point in the scene image according to the coordinates and the gray scale of the scene pixel point in the scene image and the coordinates and the gray scale of the corresponding reference pixel point in the reference image;
a polynomial coefficient obtaining module configured to construct a scene image block E of size M × N scene pixels centered on a single scene pixel in the scene image; setting the sub-pixel parallax value of each scene pixel point in the scene image block E to meet a formula (1);
d_i = A_0·x_i + B_0·y_i + C_0    formula (1)
wherein d_i is the sub-pixel parallax value of the i-th scene pixel point, x_i is the x component of the coordinates of the i-th scene pixel point, y_i is the y component of the coordinates of the i-th scene pixel point, and A_0, B_0, C_0 are polynomial coefficients;
and to generate an overdetermined equation system containing M × N equations according to formula (1) and the sub-pixel parallax value and coordinates of each scene pixel point in the scene image block E, and acquire the polynomial coefficients corresponding to the scene image block from the overdetermined equation system;
an effective scene pixel point obtaining module configured to calculate a gray gradient of each scene pixel point in the scene image block, and take a plurality of scene pixel points exceeding a gradient threshold as effective scene pixel points;
the gray value acquisition module is configured to acquire the coordinates of the same-name sub-pixel points in the reference image block corresponding to the effective scene pixel points according to the coordinates and the polynomial coefficients of the effective scene pixel points, and to acquire the gray value of each same-name sub-pixel point according to its abscissa and the gray values of the two whole pixel points adjacent to it on the left and right;
a function value obtaining module configured to obtain a normalized cross-correlation function value between the gray value of the effective scene pixel point and the gray value of the same-name sub-pixel point according to the gray value of the effective scene pixel point and the gray value of the same-name sub-pixel point;
a target polynomial coefficient obtaining module configured to obtain a target polynomial coefficient corresponding to the maximum normalized cross-correlation function value according to a relationship between the normalized cross-correlation function value and the polynomial coefficient;
a target parallax obtaining module configured to obtain a target parallax of the scene pixel according to the formula (1), a target polynomial coefficient and a coordinate of the single scene pixel;
a depth value obtaining module configured to obtain a depth value corresponding to the scene pixel point according to a target parallax of the scene pixel point, a depth of a reference image, a system baseline length, and a camera focal length;
a depth image generation module configured to obtain a depth value corresponding to each scene pixel point to generate a depth image corresponding to the scene image.
CN202010923720.0A 2020-09-04 2020-09-04 Depth image generation method and device Active CN111815693B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010923720.0A CN111815693B (en) 2020-09-04 2020-09-04 Depth image generation method and device


Publications (2)

Publication Number Publication Date
CN111815693A CN111815693A (en) 2020-10-23
CN111815693B true CN111815693B (en) 2021-01-12

Family

ID=72859928

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010923720.0A Active CN111815693B (en) 2020-09-04 2020-09-04 Depth image generation method and device

Country Status (1)

Country Link
CN (1) CN111815693B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114627134B (en) * 2022-05-18 2022-08-09 深圳元象信息科技有限公司 Scene image generation method

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
JP2012247597A (en) * 2011-05-27 2012-12-13 Seiko Epson Corp Image processing method, image processing device, electro-optic device, and electronic equipment
CN103810708B (en) * 2014-02-13 2016-11-02 西安交通大学 A kind of laser speckle image depth perception method and device
CN104331890B (en) * 2014-10-30 2017-06-16 北京大学深圳研究生院 A kind of global disparity method of estimation and system
CN108109130A (en) * 2016-11-24 2018-06-01 广州映博智能科技有限公司 Self-adapting window stereo vision matching method based on laser facula
CN106651794B (en) * 2016-12-01 2019-12-03 北京航空航天大学 A kind of projection speckle bearing calibration based on virtual camera
CN107945222A (en) * 2017-12-15 2018-04-20 东南大学 A kind of new Stereo matching cost calculates and parallax post-processing approach



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant