CN111295667B - Method for stereo matching of images and driving assistance device

Method for stereo matching of images and driving assistance device

Info

Publication number
CN111295667B
Authority
CN
China
Prior art keywords
cost
plane
determining
views
pixel point
Prior art date
Legal status
Active
Application number
CN201980005230.8A
Other languages
Chinese (zh)
Other versions
CN111295667A (en)
Inventor
周啸林
Current Assignee
Shenzhen Zhuoyu Technology Co., Ltd.
Original Assignee
SZ DJI Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by SZ DJI Technology Co., Ltd.
Publication of CN111295667A
Application granted
Publication of CN111295667B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures


Abstract

A method for stereo matching of images and a driving assistance device are provided. The method includes: acquiring a plurality of views, where the views are two-dimensional views of the same scene; determining, based on a preset spatial plane, the surface cost of each corresponding pixel in the multiple views under the preset spatial plane; and determining a disparity map of the multiple views in the scene from the surface cost. Because the disparity map is not computed directly from the pixel-level similarity cost, the method avoids the poor noise resistance of disparity maps calculated from pixel similarity alone, and improves both the accuracy and the noise resistance of stereo matching.

Description

Method for stereo matching of images and driving assistance device
Copyright declaration
The disclosure of this patent document contains material which is subject to copyright protection. The copyright belongs to the copyright owner. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent files or records.
Technical Field
The present application relates to the field of image processing, and in particular to a method for stereo matching of images and a driving assistance device.
Background
In recent years, with the rapid development of the internet of vehicles (vehicle-to-everything, V2X), the concept of the intelligent automobile has been proposed. An intelligent automobile can realize functions such as intelligent driving. It should be appreciated that, in order to implement functions such as intelligent driving, the intelligent automobile needs to perceive its surrounding environment. For example, an environment detection sensor mounted on the intelligent automobile provides surrounding environment information, which specifically includes dense depth information of the three-dimensional (3D) scene, and the like. Dense depth information of the 3D scene can be used for 3D reconstruction, 3D drivable-region detection, obstacle detection and following, 3D lane line detection, and the like.
Currently, the mainstream 3D dense depth detection approaches applied in the field of intelligent-automobile environment perception include three-dimensional laser radar (3D Lidar) positioning technology and stereoscopic vision technology. Stereoscopic vision has the advantages of low cost and dense point clouds, and has therefore become a research hotspot in this field. Capturing two images of the same scene from different angles with two cameras in order to obtain the parallax between them is generally called binocular stereo vision, and it mainly consists of three steps: first, the camera system is calibrated to obtain the intrinsic and extrinsic parameters of the cameras; then the matching relationship between pixels in the two images, i.e., the disparity, is found; finally, the three-dimensional information of the scene is recovered from this matching relationship and the intrinsic and extrinsic camera parameters. The core part is the process of finding the matching relationship between pixels, which is called stereo matching.
Existing stereo matching methods are usually aimed only at general binocular stereo image processing; in some specific scenes they cannot achieve good results, and the matching results are noisy. For example, in stereo matching of traffic scenes, good continuity cannot be obtained for road-surface regions, which may make subsequent calculations inaccurate. It is therefore necessary to provide a stereo matching method that achieves better results for such specific scenes.
Disclosure of Invention
The present application provides a method for stereo matching of images and a driving assistance device, which can improve the accuracy and noise resistance of stereo matching of images.
In a first aspect, a method for stereo matching of images is provided, including:
acquiring a plurality of views, wherein the views are two-dimensional views of the same scene;
determining, based on a preset spatial plane, a surface cost of each corresponding pixel point in the multiple views under the preset spatial plane;
and determining a disparity map of the multiple views in the scene according to the surface cost.
The method for stereo matching of images does not determine the disparity map of the multiple views from the pixel-level similarity cost alone; instead, it determines the surface cost of each pixel under a preset spatial plane and then determines the disparity map of the multiple views from that surface cost. The concept of the surface cost thus introduced can improve both the accuracy and the noise resistance of stereo matching.
In a second aspect, a driving assistance apparatus is provided, including:
at least one memory for storing computer-executable instructions;
at least one processor configured, individually or collectively, to access the at least one memory and execute the computer-executable instructions to perform the following operations:
acquiring a plurality of views, wherein the views are two-dimensional views of the same scene;
determining, based on a preset spatial plane, a surface cost of each corresponding pixel point in the multiple views under the preset spatial plane;
and determining a disparity map of the multiple views in the scene according to the surface cost.
In a third aspect, a computer-readable storage medium is provided, having instructions stored thereon that, when run on a computer, cause the computer to perform the method for stereo matching of images of the first aspect.
In a fourth aspect, a vehicle is provided, including the driving assistance apparatus of the second aspect.
Drawings
FIG. 1 is a diagram of a system architecture to which embodiments of the present application are applicable.
Fig. 2 is a schematic flow chart of a stereo matching method 100 of one embodiment of the application.
Fig. 3 (a) - (d) are schematic diagrams of scanning pixel points according to an embodiment of the present application.
Fig. 4 is a schematic flow chart of calculating a face cost according to an embodiment of the present application.
Fig. 5 is a schematic block diagram of a driving assistance apparatus 50 provided in an embodiment of the application.
Fig. 6 is a schematic block diagram of a vehicle provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
First, a scenario to which the stereo matching method provided by the embodiments of the present application can be applied is described with reference to fig. 1. Fig. 1 is a diagram of a system architecture to which embodiments of the present application are applicable. The system includes view acquisition devices (camera #1 and camera #2 as shown in fig. 1) and a view processing chip (whose internal structure is shown in fig. 1).
Specifically, fig. 1 shows a binocular stereo vision (Binocular Stereo Vision) three-dimensional measurement setup. Binocular stereo vision is an important form of machine vision: based on the parallax principle, imaging devices acquire two images of the object to be measured from different positions, and the three-dimensional geometric information of the object is obtained by calculating the positional deviation between corresponding points in the two images.
From fig. 1, the principle of head-up (parallel-axis) binocular stereo imaging can be derived. For example, the distance between the projection center (OL) of camera #1 and the projection center (OR) of camera #2 is called the baseline distance L. Assume that camera #1 and camera #2 observe the same feature point P of an object at the same time, and that the three-dimensional coordinates of the feature point are P(x, y, z). Camera #1 and camera #2 each acquire an image of the feature point P: the image of P acquired by camera #1 is P_left with coordinates (x_left, y_left), and the image of P acquired by camera #2 is P_right with coordinates (x_right, y_right). Further, assuming that P_left and P_right lie in the same image plane, their ordinates are equal, i.e., y_left = y_right = y. From the similar-triangle geometry the following relationships hold:
x_left = f × x / z, x_right = f × (x - L) / z, y_left = y_right = f × y / z,
where f is the focal length of camera #1 and camera #2; the left and right eyes of a binocular stereo system normally use the same camera model, so the focal lengths of camera #1 and camera #2 are equal.
Binocular stereo vision refers to a visual processing method that obtains three-dimensional information of a scene by fusing two images of the same scene captured by two cameras. Binocular stereo vision matches the pixels that correspond to the same physical point in space across the different images, and the difference between such corresponding pixels is called the disparity. The disparity between P_left and P_right in fig. 1 is Disparity = x_left - x_right. From this, the three-dimensional coordinates (x1, y1, z1) of the feature point P in the coordinate system formed by camera #1 and camera #2 can be calculated as:
z1 = f × L / Disparity, x1 = x_left × z1 / f, y1 = y × z1 / f.
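For illustration only (this sketch is not part of the patent text; the symbols f, L, x_left, x_right and y are those defined above), the triangulation can be written as:

```python
import numpy as np

def triangulate(x_left, x_right, y, f, L):
    """Recover the 3D coordinates of a point from a rectified stereo pair.

    x_left, x_right: horizontal image coordinates of the matched pixels
    (same image row y); f: focal length in pixels; L: baseline distance.
    """
    disparity = x_left - x_right      # Disparity = x_left - x_right
    z = f * L / disparity             # depth from the similar-triangle relation
    x = x_left * z / f                # back-project the left-image pixel
    return np.array([x, y * z / f, z])

# Example: f = 700 px, baseline L = 0.12 m, matched columns 350 and 320
print(triangulate(350.0, 320.0, 40.0, 700.0, 0.12))  # z = 700*0.12/30 = 2.8 m
```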
Binocular stereo vision has been described in detail above with reference to fig. 1. Specifically, after the binocular stereo images (P_left and P_right in fig. 1) have been acquired by the binocular stereo vision method, the process of finding the degree of correlation between them is called stereo matching. The stereo matching method provided by the embodiments of the present application can be applied to determining the degree of correlation of the binocular stereo images shown in fig. 1.
It should be understood that fig. 1 merely illustrates one scenario to which the stereo matching method provided by the embodiments of the present application can be applied, and does not limit the protection scope of the present application in any way. The stereo matching method provided by the embodiments of the present application can be applied to any process of determining the degree of correlation of multiple stereo images, for example image matching using more cameras.
In order to facilitate understanding of the stereo matching method provided in the embodiments of the present application, first, several basic concepts related to the embodiments of the present application are briefly described:
1. Stereo matching.
Stereo matching can be regarded as the process of finding the degree of correlation between two sets of data; it recovers the three-dimensional information of a scene from multiple two-dimensional images obtained of that same scene.
2. Hamming distance.
The Hamming distance is a concept used in error-control coding for data transmission. It is the number of bit positions at which two words of the same length differ; L(x, y) denotes the Hamming distance between two words x and y. It can be computed by taking the bitwise exclusive OR (XOR) of the two strings and counting the number of 1 bits in the result; that count is the Hamming distance.
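As a minimal illustration (not part of the patent text), the XOR-and-count computation can be written as:

```python
def hamming_distance(x: int, y: int) -> int:
    """Number of bit positions at which two equal-length words differ."""
    return bin(x ^ y).count("1")  # XOR the words, then count the 1 bits

# 0b10110 and 0b11100 differ in two bit positions, so the distance is 2
assert hamming_distance(0b10110, 0b11100) == 2
```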
3. Robustness.
Robustness is the key to a system's survival in abnormal and dangerous situations. For example, whether computer software avoids crashing or hanging in the presence of input errors, disk failures, network overload, or deliberate attack is a measure of its robustness. "Robustness" also refers to the property of a control system to maintain certain other characteristics under perturbations of certain (structural or size) parameters. The robustness referred to in the present application is noise resistance.
4. Color distance.
Color distance refers to color difference, a quantity studied in colorimetry. The color difference can be computed simply as the Euclidean distance in a color space, or with more complex formulas that better match uniform human perception.
The scenarios to which the stereo matching method provided by the embodiments of the present application can be applied, and the related basic concepts, have been briefly described above. The stereo matching method provided by the present application is described in detail below with reference to fig. 2 to fig. 4.
An embodiment of the present application provides a stereo matching method 100. Fig. 2 is a schematic flow chart of the stereo matching method 100 according to an embodiment of the application. As shown in fig. 2, the method 100 includes:
S110, acquiring a plurality of views.
The acquired views are two-dimensional views of the same scene. By analogy with the binocular views described above with reference to fig. 1, the multiple views may also be referred to as multi-view images. Specifically, the plurality of views acquired in the present application are corrected (rectified) views; how the views acquired by the view acquisition devices are corrected is not limited in the present application. As an example, a multi-view correction method may proceed as follows:
Multiple red-green-blue (RGB) color images of the same scene are captured simultaneously with a multi-camera rig. The cameras of the rig are calibrated individually to obtain the intrinsic and extrinsic parameters of each camera, and each view is then corrected according to these parameters, removing the fisheye effect and the influence of position errors of the cameras, to obtain the corrected views.
It should be understood that the correction method described above is only one example given to aid understanding of the present application and does not limit it to any particular method. The embodiments of the present application do not restrict how the multiple views are corrected; any multi-view correction method may be used, and the details are not repeated here.
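For reference only, one common way to perform such correction for a binocular rig is stereo rectification with OpenCV. The sketch below is an illustration under the assumption that the intrinsic matrices (K1, K2), distortion coefficients (d1, d2) and the rotation R and translation T between the two cameras have already been obtained by calibration; it is not the correction method prescribed by the present application.

```python
import cv2

def rectify_pair(img_l, img_r, K1, d1, K2, d2, R, T):
    """Undistort and rectify a left/right image pair so that corresponding
    pixels lie on the same image row (horizontal epipolar lines)."""
    size = (img_l.shape[1], img_l.shape[0])
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, d1, K2, d2, size, R, T)
    map_lx, map_ly = cv2.initUndistortRectifyMap(K1, d1, R1, P1, size, cv2.CV_32FC1)
    map_rx, map_ry = cv2.initUndistortRectifyMap(K2, d2, R2, P2, size, cv2.CV_32FC1)
    rect_l = cv2.remap(img_l, map_lx, map_ly, cv2.INTER_LINEAR)
    rect_r = cv2.remap(img_r, map_rx, map_ry, cv2.INTER_LINEAR)
    return rect_l, rect_r, Q  # Q can reproject disparity to 3D later
```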
Likewise, the present application does not limit which multi-view acquisition device is used; it may be any device capable of acquiring multiple views, for example the multi-camera rig mentioned above, which may be a multi-channel video capture device composed of several cameras of the same specification.
As a possible implementation, the multiple views may be two views, i.e. the binocular views described above.
Specifically, the method for stereo matching of images provided by the embodiments of the present application does not determine the disparity map of the multiple views from the similarity cost of individual pixels; instead, it determines the surface cost of each pixel under a preset spatial plane and then determines the disparity map of the multiple views from that surface cost. In the present application, the surface cost of a pixel under the preset spatial plane may also be called the global surface cost.
Further, the global surface cost in the embodiments of the present application is computed from the similarity cost of the pixels; that is, the method flow shown in fig. 2 further includes S120, in which the similarity cost of each pixel in the multiple views is determined.
The present application does not limit which algorithm is used to compute the similarity cost of each pixel in the multiple views; any algorithm for computing the pixel similarity cost used in existing stereo matching methods may be adopted.
For completeness, the embodiments of the present application provide one scheme for computing the similarity cost of each pixel in the multiple views. It should be understood that this scheme is only an example and does not limit the protection scope of the present application; the method for determining the pixel similarity cost in the stereo matching method provided by the present application is not restricted to it.
The scheme for computing the pixel similarity cost is as follows: after the corrected views are acquired, the feature points in each view are determined. A feature point in a view consists of a preset key point and a feature descriptor.
Then, the similarity cost between the corresponding feature points in the multiple views is determined.
For ease of understanding, the concept of feature points is briefly described as follows:
The concept of feature points was proposed in existing image matching algorithms to match the same object efficiently and accurately in two images taken from different viewing angles, which is also the first step of many computer vision applications. Although an image is stored in the computer as a matrix of gray values, the same object in two images cannot be found reliably using the gray values alone, because gray values are affected by illumination and change as the viewing angle changes. It is therefore desirable to find features that remain unchanged when the camera is moved or rotated (i.e., when the viewing angle changes), and to use these invariant features to find the same object in images taken from different viewing angles.
To enable better image matching, it is necessary to select representative regions in the image (local feature maps) and determine feature points in them, or to determine the feature points directly; examples are corner points, edges, and pixels in certain characteristic regions of the image. Corner points are the easiest to identify, i.e., they have the highest distinctiveness, so many computer vision pipelines extract corner points as feature points for image matching. However, corner points alone do not meet the requirements well: for example, a corner detected from far away may not be detected up close, or a corner may change when the camera rotates. For this reason, researchers in computer vision have designed many more stable feature points that do not change with camera movement, camera rotation, or changes in illumination.
Specifically, a feature point in a view consists of two parts: a key point (Keypoint) and a descriptor (Descriptor). The key point gives the position of the feature point in the image, and some key points also carry orientation and scale information; the descriptor is usually a vector that describes, in a hand-designed manner, the information of the pixels around the key point. Descriptors are designed so that features with similar appearance have similar descriptors. Therefore, during matching, two feature points can be considered the same feature as long as their descriptors are close in the vector space.
The descriptor of a feature is typically a carefully designed vector that describes the key point and the information of the pixels around it. Descriptors generally have the following properties:
Invariance: the feature does not change when the image is rotated, enlarged, or reduced.
Robustness: the feature is insensitive to noise, illumination, or other small deformations.
Distinguishability: each feature descriptor is unique and exclusive, with as little similarity to the others as possible. A feature descriptor is usually a vector, and the distance between two feature descriptors reflects the similarity of the two feature points, i.e., whether they are the same feature. Depending on the descriptor, different distance measures can be chosen: for floating-point descriptors the Euclidean distance can be used; for binary descriptors the Hamming distance can be used.
Specifically, the algorithm for determining the corresponding feature points in the multiple views may be the census algorithm, and the similarity cost of a pixel may be the Hamming distance between the corresponding feature points. It should be understood that the census algorithm and the use of the Hamming distance between corresponding feature points as the pixel similarity cost are only examples of how the similarity cost of each pixel in the multiple views can be obtained; the matching algorithm of the present application is not limited to obtaining the corresponding feature points with the census algorithm and using the Hamming distance between them as the pixel similarity cost. The present application does not restrict the procedure for computing the similarity cost of each pixel in the multiple views; any method for computing the pixel similarity cost in existing stereo matching algorithms may be adopted, and further examples are omitted here.
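As an illustrative sketch only (the patent does not prescribe this particular implementation; the window size and disparity range are arbitrary choices here), a census transform with a Hamming-distance pixel cost could look like the following:

```python
import numpy as np

def census_transform(gray, win=2):
    """5x5 census transform: each pixel becomes a 24-bit code recording
    whether each neighbour is darker than the centre pixel."""
    out = np.zeros(gray.shape, dtype=np.uint32)
    for dy in range(-win, win + 1):
        for dx in range(-win, win + 1):
            if dy == 0 and dx == 0:
                continue
            shifted = np.roll(np.roll(gray, dy, axis=0), dx, axis=1)
            out = (out << 1) | (shifted < gray).astype(np.uint32)
    return out

def similarity_cost(census_l, census_r, max_disp):
    """Hamming distance between the census code of a left pixel and the
    census code of the right pixel shifted by each candidate disparity."""
    h, w = census_l.shape
    cost = np.zeros((h, w, max_disp), dtype=np.uint8)
    for d in range(max_disp):
        diff = census_l ^ np.roll(census_r, d, axis=1)
        bits = np.unpackbits(diff.view(np.uint8).reshape(h, w, 4), axis=-1)
        cost[:, :, d] = bits.sum(axis=-1)  # popcount = Hamming distance
    return cost
```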
After the similarity cost of each pixel in the multiple views has been computed, the surface cost of each pixel can be determined from the similarity cost based on a preset spatial plane; that is, S130 is executed to determine the surface cost of each pixel under the preset spatial plane.
For any one of the corresponding pixels in the multiple views, a preset spatial plane is determined and its plane parameters are calculated. Specifically, based on the preset spatial plane, the surface cost of each pixel under the preset spatial plane is determined from the pixel similarity costs computed in S120. In some embodiments, there are at least two preset spatial planes, including a parallel plane and a vertical plane. The parallel plane is a plane parallel to the road surface on which the vehicle travels; the vertical plane is a plane perpendicular to that road surface whose normal direction is aligned with, or opposite to, the direction in which the road extends. In common cases the parallel plane may simply be a horizontal plane and the vertical plane a vertical plane. In other embodiments, the preset spatial planes may further include a second vertical plane in addition to the parallel plane and the vertical plane; the second vertical plane is perpendicular to the road surface on which the vehicle travels and its normal direction is perpendicular to the direction in which the road extends. A preset spatial plane contains a plurality of pixels, and the disparity values of these pixels differ.
The specific procedure for computing the surface cost of a pixel under the preset spatial plane, based on the plane parameters of the preset spatial plane and the similarity cost of the pixel, is as follows:
Assume that the plane equation of the preset spatial plane is d = a × x + b × y + c, where (x, y) are the coordinates of a pixel and d is its disparity; it follows that, for adjacent pixels on the plane, the disparity changes by a per pixel in the horizontal direction and by b per pixel in the vertical direction. Since a single pixel has poor robustness and is easily affected by noise, the embodiments of the present application define the surface cost of each pixel under the preset spatial plane. Specifically, the surface cost of a pixel under the preset spatial plane is a weighted average, computed using the plane parameters (a and b above), of the similarity costs of the other pixels on the preset spatial plane besides the pixel itself.
Further, the weight w(p, q) assigned to each of these other pixels is related to its color distance and coordinate distance from the pixel, where p is the pixel in question, q is any other pixel on the preset spatial plane besides p, and σ_r and σ_s are parameters controlling the influence of the color distance and the coordinate distance between pixel q and pixel p.
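The weight formula itself is not reproduced in the text available here. A plausible form consistent with the description above, given only as an assumption rather than the patent's exact expression, is the bilateral-style weight

w(p, q) = exp(-(||I(p) - I(q)|| / σ_r + ||p - q|| / σ_s)),

where I(p) denotes the color of pixel p, ||I(p) - I(q)|| is the color distance, and ||p - q|| is the coordinate distance between the two pixels.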
How the surface cost of each pixel under the preset spatial plane is determined from the similarity cost in the embodiments of the present application is described in detail below with reference to fig. 3 and fig. 4. First, how pixels are scanned in an embodiment of the present application is described with reference to fig. 3, which is a schematic diagram of scanning pixels provided in an embodiment of the present application.
To increase the computation speed, in the embodiments of the present application the surface cost of a pixel under the preset spatial plane can be computed separately along image rows and columns. The specific flow includes the following steps:
Step one: as shown in fig. 3(a), for a pixel p, scan from left to right within the preset spatial plane. Denote the surface cost of pixel p under the preset spatial plane as Plane_cost(p, a, b), where a and b are the plane parameters of the preset spatial plane, and denote the similarity cost of pixel p as Pixel_cost(p); the similarity cost has already been computed in S120 and is not repeated here. Specifically, scanning from left to right for pixel p proceeds as follows:
Denote the pixel to the left of pixel p in the preset spatial plane as p_left. Since the plane equation of the preset spatial plane is known, the relationship between the disparity disp(p_left) of the left pixel p_left and the disparity disp(p) of pixel p can be derived as:
disp(p) = disp(p_left) + a; without loss of generality, a > 0 is assumed in the present application.
Specifically, the matching cost accumulated over the pixels to the left of pixel p in the preset spatial plane can be computed as the weight of the left pixel multiplied by the left pixel's cost, plus the similarity cost of pixel p:
Plane_cost_x_left_right(p, a) = w(p, p_left) × Plane_cost_x_left_right(p_left, a) + Pixel_cost(p, a),
and Plane_cost_x_left_right(p, a), which accumulates the costs of all pixels to the left of p in the preset spatial plane, can be solved by dynamic programming. Plane_cost_x_left_right(p, a) may be referred to as the first matching cost.
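As an illustrative sketch (an assumed implementation of this recurrence, not code from the patent), the left-to-right scan over a single image row can be written as a simple dynamic-programming pass; pixel_cost holds Pixel_cost(p, a) for each pixel of the row and weight(x) returns w(p, p_left):

```python
import numpy as np

def scan_left_to_right(pixel_cost, weight):
    """Plane_cost_x_left_right for one image row.

    pixel_cost: 1D array of Pixel_cost(p, a) for the pixels of the row at the
    disparities implied by the preset spatial plane.
    weight: function weight(x) -> w(p, p_left) for the pixel in column x.
    """
    plane_cost = np.empty(len(pixel_cost), dtype=np.float64)
    plane_cost[0] = pixel_cost[0]                    # no left neighbour at x = 0
    for x in range(1, len(pixel_cost)):
        # Plane_cost(p) = w(p, p_left) * Plane_cost(p_left) + Pixel_cost(p)
        plane_cost[x] = weight(x) * plane_cost[x - 1] + pixel_cost[x]
    return plane_cost
```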
Step two: as shown in fig. 3(b), for the pixel p, scan from right to left within the preset spatial plane. The surface cost of pixel p under the preset spatial plane is again denoted Plane_cost(p, a, b), with a and b the plane parameters of the preset spatial plane, and the similarity cost of pixel p is denoted Pixel_cost(p), already computed in S120. Specifically, scanning from right to left for pixel p proceeds as follows:
Denote the pixel to the right of pixel p in the preset spatial plane as p_right. Since the plane equation of the preset spatial plane is known, the relationship between the disparity disp(p_right) of the right pixel p_right and the disparity disp(p) of pixel p can be derived as:
disp(p) = disp(p_right) - a;
Specifically, the matching cost accumulated over the pixels to the right of pixel p in the preset spatial plane can be computed as the weight of the right pixel multiplied by the right pixel's cost, plus the similarity cost of pixel p:
Plane_cost_x_right_left(p, a) = w(p, p_right) × Plane_cost_x_right_left(p_right, a) + Pixel_cost(p, a),
and the right-to-left scan result Plane_cost_x_right_left(p, a) can be obtained by dynamic programming. Plane_cost_x_right_left(p, a) may be referred to as the second matching cost.
Step three: as shown in fig. 3(c), for the pixel p, scan from top to bottom within the preset spatial plane. The surface cost of pixel p under the preset spatial plane is again denoted Plane_cost(p, a, b), with a and b the plane parameters of the preset spatial plane, and the similarity cost of pixel p is denoted Pixel_cost(p), already computed in S120. Specifically, scanning from top to bottom for pixel p proceeds as follows:
Denote the pixel above pixel p in the preset spatial plane as p_up. Since the plane equation of the preset spatial plane is known, the relationship between the disparity disp(p_up) of the upper pixel p_up and the disparity disp(p) of pixel p can be derived as:
disp(p) = disp(p_up) + b; without loss of generality, b > 0 is assumed in the present application.
Specifically, the matching cost accumulated over the pixels above pixel p in the preset spatial plane can be computed as the weight of the upper pixel multiplied by the upper pixel's cost, plus the similarity cost of pixel p:
Plane_cost_y_up_down(p, b) = w(p, p_up) × Plane_cost_y_up_down(p_up, b) + Pixel_cost(p, b),
and the top-to-bottom scan result Plane_cost_y_up_down(p, b) can be obtained by dynamic programming. Plane_cost_y_up_down(p, b) may be referred to as the third matching cost.
Step four: as shown in fig. 3(d), for the pixel p, scan from bottom to top within the preset spatial plane. The surface cost of pixel p under the preset spatial plane is again denoted Plane_cost(p, a, b), with a and b the plane parameters of the preset spatial plane, and the similarity cost of pixel p is denoted Pixel_cost(p), already computed in S120. Specifically, scanning from bottom to top for pixel p proceeds as follows:
Denote the pixel below pixel p in the preset spatial plane as p_down. Since the plane equation of the preset spatial plane is known, the relationship between the disparity disp(p_down) of the lower pixel p_down and the disparity disp(p) of pixel p can be derived as:
disp(p) = disp(p_down) - b;
Specifically, the matching cost accumulated over the pixels below pixel p in the preset spatial plane can be computed as the weight of the lower pixel multiplied by the lower pixel's cost, plus the similarity cost of pixel p:
Plane_cost_y_down_up(p, b) = w(p, p_down) × Plane_cost_y_down_up(p_down, b) + Pixel_cost(p, b),
and the bottom-to-top scan result Plane_cost_y_down_up(p, b) can be obtained by dynamic programming. Plane_cost_y_down_up(p, b) may be referred to as the fourth matching cost.
The surface cost of pixel p under a preset spatial plane with plane equation d = a × x + b × y + c is obtained by the flow shown in fig. 4. Fig. 4 is a schematic flow chart of calculating the surface cost according to an embodiment of the present application and includes S210-S230.
S210, the left-to-right and right-to-left scan results are obtained by calculation.
Specifically, the left-to-right scan of the pixels in the preset spatial plane is as described in step one with reference to fig. 3(a) and is not repeated here; similarly, the right-to-left scan of the pixels in the preset spatial plane is as described in step two with reference to fig. 3(b) and is not repeated here.
S220, a first average value of the left-to-right and right-to-left scan results is calculated.
Specifically, the average of the first matching cost Plane_cost_x_left_right(p, a) and the second matching cost Plane_cost_x_right_left(p, a) obtained in S210 is calculated to give Plane_cost_x.
S211, the top-to-bottom and bottom-to-top scan results are obtained by calculation.
Specifically, the top-to-bottom scan of the pixels in the preset spatial plane is as described in step three with reference to fig. 3(c) and is not repeated here; similarly, the bottom-to-top scan of the pixels in the preset spatial plane is as described in step four with reference to fig. 3(d) and is not repeated here.
S221, a second average value of the top-to-bottom and bottom-to-top scan results is calculated.
Specifically, the average of the third matching cost Plane_cost_y_up_down(p, b) and the fourth matching cost Plane_cost_y_down_up(p, b) obtained in S211 is calculated to give Plane_cost_y.
S230, the surface cost of the pixel under the preset spatial plane is calculated.
The first average value Plane_cost_x calculated in S220 and the second average value Plane_cost_y calculated in S221 are averaged to obtain the surface cost of the pixel under the preset spatial plane.
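Continuing the illustrative sketch above (again an assumed implementation rather than code from the patent), the four directional scan results are combined into the surface cost by these two averaging steps:

```python
def surface_cost(scan_lr, scan_rl, scan_ud, scan_du):
    """Combine the four directional scan results into the surface cost.

    scan_lr, scan_rl: left-to-right and right-to-left results (S210)
    scan_ud, scan_du: top-to-bottom and bottom-to-top results (S211)
    """
    plane_cost_x = 0.5 * (scan_lr + scan_rl)    # S220: first average value
    plane_cost_y = 0.5 * (scan_ud + scan_du)    # S221: second average value
    return 0.5 * (plane_cost_x + plane_cost_y)  # S230: surface cost
```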
With the method shown in fig. 4, the weighted-average matching cost along the pixel's row (abscissa) direction and the weighted-average matching cost along its column (ordinate) direction can be computed in parallel, which makes the processing easy to accelerate.
In the manner described above, the stereo matching method provided by the embodiments of the present application can compute the cost of each pixel under the preset spatial plane through the flows shown in fig. 3 and fig. 4. Because the computation of this cost for each pixel is based on the plane parameters of the preset spatial plane, the cost of each pixel under the preset spatial plane can be vividly and simply called the "surface" (plane) cost of the pixel.
When the stereo matching method provided by the embodiments of the present application is applied to a traffic scene, the traffic scene conforms to the Manhattan-world assumption and consists essentially of horizontal and vertical planes. In traffic-scene applications, the preset spatial planes may therefore include a fronto-parallel plane equation and a ground-plane equation.
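For example, under the plane model d = a × x + b × y + c introduced above, a fronto-parallel plane corresponds to a = b = 0, i.e., a constant disparity d = c, while the ground plane seen by a forward-looking, roll-free camera corresponds approximately to a = 0 with b > 0, so that the disparity grows linearly toward the bottom of the image, d = b × y + c. The concrete parameter values depend on the camera geometry and are not specified here.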
After the surface cost of each pixel under the preset spatial plane has been determined from the similarity cost based on the preset spatial plane, S140 is executed to determine the disparity map of the multiple views.
Specifically, the disparity map of the multiple views in the scene is determined from the surface cost of each pixel under the preset spatial plane calculated in S130. Further, in order to support large disparity variations, the stereo matching method provided by the embodiments of the present application introduces a total variation (Total Variation) smoothing term into the defined energy equation when computing the disparity map. The energy equation is expressed as the sum of a data term and a smoothing term and is called the energy of the disparity map to be solved, where the data term is a normalized norm of the surface cost of all pixels and the smoothing term is the total variation.
In the energy equation, D denotes the disparity map to be finally solved, Planes denotes the normalized norm of the surface costs, calculated in S130, of all pixels under the preset spatial planes, and the total variation of D serves as the pairwise (smoothing) cost.
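The equation itself is not reproduced in the text available here. A plausible form consistent with this description, given only as an assumption (λ is an assumed weighting factor), is:

E(D) = Planes(D) + λ × Σ_p |∇D(p)|,

where the sum over the disparity gradients ∇D(p) is the total variation of the disparity map D.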
The energy equation thus combines the surface cost under the preset spatial planes with the total variation, and the disparity map is obtained by finally solving (minimizing) this energy equation.
It should be understood that the method for solving the energy equation is not limited in the present application; any algorithm used for solving energy equations in existing stereo matching methods may be adopted, for example belief propagation, graph cut, TV-L1, Viterbi, or gradient-descent-based methods. Given the resolution and real-time requirements of traffic scenes, the Viterbi algorithm may be used.
Finally, a plane fit (Plane Fit) is performed on the disparity map using the plane parameters of the preset spatial plane to obtain sub-pixel precision. It should be understood that any plane fitting method in the prior art may be used, and the details are not repeated here.
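As a minimal illustrative sketch of such a refinement step, assuming a least-squares fit of the plane model to the disparities in a local window (the patent does not mandate this particular procedure):

```python
import numpy as np

def fit_plane(xs, ys, disps):
    """Least-squares fit of d = a*x + b*y + c to the disparities in a window.

    Evaluating the fitted plane at non-integer (x, y) positions yields
    sub-pixel disparity values."""
    A = np.column_stack([xs, ys, np.ones_like(xs)])
    (a, b, c), *_ = np.linalg.lstsq(A, disps, rcond=None)
    return a, b, c

# Example: refine the disparity at a non-integer position inside a 3x2 window
xs = np.array([0.0, 1.0, 2.0, 0.0, 1.0, 2.0])
ys = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
d = np.array([10.0, 10.5, 11.0, 10.2, 10.7, 11.2])
a, b, c = fit_plane(xs, ys, d)
print(a * 1.0 + b * 0.5 + c)  # sub-pixel disparity at (x, y) = (1.0, 0.5)
```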
The method for stereo matching of images provided by the embodiments of the present application has been described in detail above with reference to fig. 2 to fig. 4. It should be understood that, in the method embodiments, the sequence numbers of the above processes do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic and does not constitute any limitation on the implementation of the embodiments of the present application. The driving assistance device provided by the embodiments of the present application is described in detail below with reference to fig. 5, and the vehicle provided by the embodiments of the present application is described with reference to fig. 6.
Fig. 5 is a schematic block diagram of a driving assistance apparatus 50 provided in an embodiment of the application. As shown in fig. 5, the driving assistance apparatus 50 includes:
at least one memory 501 for storing computer-executable instructions;
At least one processor 502, individually or collectively, for: accessing the at least one memory and executing the computer-executable instructions to perform the operations of:
acquiring a plurality of views, wherein the views are two-dimensional views of the same scene;
determining a similarity cost of each corresponding pixel point in the multiple views;
Based on a preset space plane, determining the surface cost of each pixel point under the preset space plane according to the similarity cost;
and determining a disparity map of the multiple views in the scene according to the surface cost.
In some embodiments, the processor 502 is specifically configured to:
determining a plurality of local feature maps respectively corresponding to the views, wherein the Hamming distance between each pair of corresponding local feature maps among the plurality of local feature maps is used as the similarity cost of the pixels in those two local feature maps.
In some embodiments, the processor 502 is further configured to:
obtaining a local feature map for each of the plurality of views based on the census algorithm.
In some embodiments, the processor 502 is further configured to:
Determining a first matching cost, a second matching cost, a third matching cost and a fourth matching cost according to the preset space plane parameters and the similarity cost of the pixel points, wherein the average value of the first matching cost, the second matching cost, the third matching cost and the fourth matching cost is the surface cost of the pixel points under the preset space plane;
The first matching cost is a matching cost of a pixel point on the left of the pixel point in the preset space plane, the second matching cost is a matching cost of a pixel point on the right of the pixel point in the preset space plane, the third matching cost is a matching cost of a pixel point on the upper side of the pixel point in the preset space plane, and the fourth matching cost is a matching cost of a pixel point on the lower side of the pixel point in the preset space plane.
In some embodiments, the predetermined spatial plane is at least two planes, including a parallel plane and a perpendicular plane.
In some embodiments, the processor 502 is specifically configured to:
determining a first average value of the first matching cost and the second matching cost;
determining a second average of the third matching cost and the fourth matching cost;
and the average value of the first average value and the second average value is the surface cost of the pixel point under the preset space plane.
In some embodiments, the processor 502 is specifically configured to:
determining the weight of the pixel point on the left side of the pixel point;
the first matching cost is the sum of the similarity cost of the pixel point on the left side multiplied by the weight of the pixel point on the left side and the similarity cost of the pixel point.
In some embodiments, the processor 502 is specifically configured to:
determining the weight of the pixel point on the right side of the pixel point;
the second matching cost is the sum of the similarity cost of the pixel point on the right multiplied by the weight of the pixel point on the right and the similarity cost of the pixel point.
In some embodiments, the processor 502 is specifically configured to:
Determining the weight of the pixel above the pixel;
The third matching cost is the sum of the similarity cost of the pixel point above the pixel point multiplied by the weight of the pixel point above the pixel point and the similarity cost of the pixel point.
In some embodiments, the processor 502 is specifically configured to:
determining the weight of the pixel below the pixel;
The fourth matching cost is the sum of the similarity cost of the pixel points below the pixel points multiplied by the weight of the pixel points below the pixel points and the similarity cost of the pixel points.
In some embodiments, the processor 502 is specifically configured to:
And determining the weight of the pixel points except the pixel points in the plane, wherein the weight of the pixel points is related to the color distance and the coordinate distance between the pixel points.
In some embodiments, the processor 502 is specifically configured to:
determining an energy equation, wherein the energy equation is expressed as the sum of a data item and a smooth item and is called the energy of a parallax map to be solved;
the data term is a normalized norm of the surface cost of all the pixels, and the smoothing term is the total variation;
and solving the energy equation to obtain the parallax map.
In some embodiments, the driving assistance device 50 may further include a visual sensor (not shown) for acquiring the aforementioned view.
It should be appreciated that the driving assistance device 50 may be an integrated device or apparatus, for example a device containing at least one memory 501 and at least one processor 502 that acquires data from the vehicle's existing sensors via an on-board communication bus, wireless communication, or the like, and performs the foregoing processing on that data; or it may additionally include at least one vision sensor, so that the driving assistance device 50 can acquire the sensor data itself and perform the foregoing processing. In these cases the driving assistance device 50, as a stand-alone device or apparatus, can be installed or removed easily. The driving assistance device 50 may also be a distributed device or apparatus, for example with the at least one memory 501 and the at least one processor 502 arranged or mounted at different locations on the vehicle; in this case the driving assistance device 50 may be installed in the vehicle as factory-fitted equipment. It may likewise include at least one vision sensor, which may be mounted at the same location as, or a different location from, the memory 501 or the processor 502. The driving assistance device 50 can then be arranged on the vehicle more flexibly as a set of separate devices or apparatuses.
It should be appreciated that the driving assistance device 50 may also be implemented by a corresponding software module, which is not described here again.
In other embodiments, the driving assistance device 50 and the vision sensor described above are two separate modules; that is, the vision sensor is not necessarily integrated into the driving assistance device 50. Fig. 6 is a schematic diagram of a vehicle according to an embodiment of the present application; it shows a driving assistance device 50 and a vision sensor 60.
It should be appreciated that the vehicle shown in fig. 6 includes the driving assistance apparatus 50 described above and a vision sensor for acquiring the above-described multiple views.
It should be further understood that the configuration shown in fig. 6, in which the vehicle includes both the driving assistance device 50 and the vision sensor 60, is only an example. In practice, the vehicle may include only the driving assistance device 50, with the vision sensor 60 being an external device of the vehicle, or the vision sensor 60 may be integrated into the driving assistance device 50; these variants are not illustrated one by one in the present application.
It should be appreciated that the processor referred to in the embodiments of the present application may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
It should also be understood that the memory referred to in the embodiments of the present application may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (Random Access Memory, RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
It should be noted that when the processor is a general-purpose processor, DSP, ASIC, FPGA or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, the memory (storage module) is integrated into the processor.
It should be noted that the memory described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
Embodiments of the present application also provide a computer-readable storage medium having instructions stored thereon, which when executed on a computer, cause the computer to perform the method of the method embodiments described above.
Embodiments of the present application also provide a computing device including the above computer-readable storage medium.
The embodiment of the application also provides a vehicle comprising the driving assistance device 50.
In some embodiments, the vehicle further comprises a vision sensor for acquiring a view.
The embodiment of the application can be applied to traffic scenes, in particular to the field of intelligent automobile environment sensing.
It should be understood that the partitioning of circuits, sub-circuits, and sub-units of embodiments of the present application is illustrative only. Those of ordinary skill in the art will recognize that the various example circuits, sub-circuits, and sub-units described in the embodiments disclosed herein can be split or combined.
In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wire (e.g., coaxial cable, optical fiber, digital subscriber line (Digital Subscriber Line, DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, hard disk, or magnetic tape), an optical medium (e.g., a digital video disc (Digital Video Disc, DVD)), a semiconductor medium (e.g., a solid state disk (Solid State Disk, SSD)), or the like.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
It should be understood that in embodiments of the present application, "B corresponding to a" means that B is associated with a, from which B may be determined. It should also be understood that determining B from a does not mean determining B from a alone, but may also determine B from a and/or other information.
It should be understood that the term "and/or" is merely an association relationship describing the associated object, and means that three relationships may exist, for example, a and/or B may mean: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The foregoing is merely a specific implementation of the present application, and the protection scope of the present application is not limited thereto. Any variation or substitution readily conceivable to a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (22)

1. A method of stereo matching of images, comprising:
acquiring a plurality of views, wherein the views are two-dimensional views of the same scene;
determining a similarity cost for each pixel point in the multiple views;
determining, based on a preset spatial plane and according to the similarity cost, the surface cost of each corresponding pixel point in the multiple views under the preset spatial plane, wherein the surface cost of each pixel point is a global surface cost;
determining a disparity map of the multiple views under the scene according to the surface cost;
wherein the determining, according to the similarity cost, the surface cost of each corresponding pixel point in the multiple views under the preset spatial plane comprises:
determining, according to plane parameters of the preset spatial plane and the similarity cost of a pixel point, the matching cost of the pixel point on the left side of the pixel point, the matching cost of the pixel point on the right side, the matching cost of the pixel point on the upper side, and the matching cost of the pixel point on the lower side;
and taking the average value of the matching cost of the pixel point on the left side, the matching cost of the pixel point on the right side, the matching cost of the pixel point on the upper side, and the matching cost of the pixel point on the lower side as the surface cost of the pixel point under the preset spatial plane.
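The following Python sketch (not part of the claims; the function name surface_cost, the cost-volume layout, and the disparity-plane parameterisation d = a*x + b*y + c are all assumptions made for illustration) shows one way the four-neighbour averaging of claim 1 could be realised:

```python
import numpy as np

def surface_cost(sim_cost, plane, x, y):
    # sim_cost: assumed (H, W, D) similarity-cost volume, one cost per
    # pixel and candidate disparity; plane: assumed parameters (a, b, c)
    # of a disparity plane d = a*x + b*y + c.
    a, b, c = plane
    h, w, d_max = sim_cost.shape
    neighbours = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    costs = []
    for nx, ny in neighbours:
        if 0 <= nx < w and 0 <= ny < h:
            # disparity induced by the plane hypothesis at the neighbour
            d = int(round(a * nx + b * ny + c))
            d = min(max(d, 0), d_max - 1)
            costs.append(sim_cost[ny, nx, d])
    # surface cost = mean of the left/right/upper/lower matching costs
    return float(np.mean(costs)) if costs else float("inf")
```

Evaluating this for several preset planes (e.g., a parallel plane and a vertical plane, as in claim 7) and keeping the lowest cost would be one way to use it, though the claims do not prescribe a particular selection rule.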
2. The method of claim 1, wherein the plurality of views are two views obtained by a binocular view acquisition device.
3. The method of claim 1, wherein prior to said determining a similarity cost for each corresponding pixel point in the plurality of views, the method further comprises:
determining feature points in each of the plurality of views.
4. The method according to claim 3, wherein the determining a similarity cost for each corresponding pixel point in the plurality of views comprises:
determining a similarity cost between corresponding feature points in the multiple views.
5. The method of claim 4, wherein the similarity cost between the corresponding feature points comprises:
the Hamming distance between the corresponding feature points.
6. The method according to any one of claims 3-5, further comprising:
determining the feature points by a census algorithm.
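A census transform followed by a Hamming distance is one common way to realise the feature points and similarity cost of claims 5 and 6. The sketch below is illustrative only; the names census_transform and hamming_cost, the 5x5 window, and the wrap-around border handling are assumptions:

```python
import numpy as np

def census_transform(img, win=5):
    # Each pixel becomes a bit string encoding whether each neighbour in a
    # win x win window is darker than the centre pixel (borders wrap here,
    # which is a simplification).
    r = win // 2
    h, w = img.shape
    codes = np.zeros((h, w), dtype=np.uint64)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dx == 0 and dy == 0:
                continue
            shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
            codes = (codes << np.uint64(1)) | (shifted < img).astype(np.uint64)
    return codes

def hamming_cost(code_left, code_right):
    # Similarity cost of two corresponding census codes: their Hamming distance.
    return bin(int(code_left) ^ int(code_right)).count("1")
```

The Hamming distance between the census codes of corresponding pixel points in the two views would then serve as the similarity cost fed into the surface-cost computation above.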
7. The method of claim 1, wherein the preset spatial plane comprises at least two planes, the at least two planes including a parallel plane and a perpendicular plane.
8. The method according to claim 1, wherein the method further comprises:
determining a first average value of the matching cost of the pixel point on the left side and the matching cost of the pixel point on the right side;
determining a second average value of the matching cost of the pixel point on the upper side and the matching cost of the pixel point on the lower side;
and taking the average value of the first average value and the second average value as the surface cost of the pixel point under the preset spatial plane.
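Claim 8's two-stage averaging can be written directly; a trivial sketch with illustrative names:

```python
def surface_cost_two_stage(c_left, c_right, c_up, c_down):
    # first average: horizontal neighbours; second average: vertical neighbours
    first = (c_left + c_right) / 2.0
    second = (c_up + c_down) / 2.0
    # surface cost = average of the two averages
    return (first + second) / 2.0
```

When all four neighbour costs are available, this coincides with the plain four-way mean of claim 1.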
9. The method of claim 1, wherein the determining a disparity map for the plurality of views in the scene from the cost of the surface comprises:
determining an energy equation, wherein the energy equation is expressed as the sum of a data term and a smoothing term and is referred to as the energy of the disparity map to be solved, the data term being a normalized norm of the second matching costs of all pixel points, and the smoothing term being a total variation;
and solving the energy equation to obtain the disparity map.
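One plausible reading of the energy of claim 9 is sketched below; the per-pixel normalisation, the total-variation weight lam, and the cost-volume layout are assumptions, since the claim does not fix them:

```python
import numpy as np

def energy(disparity, surface_cost_volume, lam=1.0):
    # data term: surface costs sampled at the candidate disparities,
    # normalised by the number of pixels (one possible normalisation)
    h, w, d_max = surface_cost_volume.shape
    ys, xs = np.mgrid[0:h, 0:w]
    d = np.clip(disparity.astype(int), 0, d_max - 1)
    data_term = surface_cost_volume[ys, xs, d].sum() / float(h * w)
    # smoothing term: total variation of the disparity map
    tv = (np.abs(np.diff(disparity, axis=0)).sum()
          + np.abs(np.diff(disparity, axis=1)).sum())
    return data_term + lam * tv
```

Minimising this energy over candidate disparity maps would yield the disparity map; the claims do not specify the solver.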
10. An auxiliary driving device, comprising:
At least one memory for storing computer-executable instructions;
At least one processor configured to, individually or collectively, access the at least one memory and execute the computer-executable instructions to perform the following operations:
acquiring a plurality of views, wherein the views are two-dimensional views of the same scene;
determining a similarity cost for each pixel point in the multiple views;
determining, based on a preset spatial plane and according to the similarity cost, the surface cost of each corresponding pixel point in the multiple views under the preset spatial plane, wherein the surface cost of each pixel point is a global surface cost;
determining a disparity map of the multiple views under the scene according to the surface cost;
wherein the determining, according to the similarity cost, the surface cost of each corresponding pixel point in the multiple views under the preset spatial plane comprises:
determining, according to plane parameters of the preset spatial plane and the similarity cost of a pixel point, the matching cost of the pixel point on the left side of the pixel point, the matching cost of the pixel point on the right side, the matching cost of the pixel point on the upper side, and the matching cost of the pixel point on the lower side;
and taking the average value of the matching cost of the pixel point on the left side, the matching cost of the pixel point on the right side, the matching cost of the pixel point on the upper side, and the matching cost of the pixel point on the lower side as the surface cost of the pixel point under the preset spatial plane.
11. The driving assistance device according to claim 10, wherein the plurality of views are two views obtained by a binocular view acquisition device.
12. The driving assistance device according to claim 10, wherein the processor is specifically configured to:
determine feature points in each of the plurality of views.
13. The driving assistance device according to claim 12, wherein the processor determining a similarity cost for each corresponding pixel point in the plurality of views comprises:
determining a similarity cost between corresponding feature points in the multiple views.
14. The driving assistance device according to claim 13, wherein the similarity cost between the corresponding feature points comprises:
the Hamming distance between the corresponding feature points.
15. The driving assistance device according to any one of claims 12 to 14, wherein the processor is specifically configured to:
determine the feature points by a census algorithm.
16. The driving assistance device according to claim 10, wherein the preset spatial plane comprises at least two planes, the at least two planes including a parallel plane and a vertical plane.
17. The driving assistance device according to claim 10, wherein the processor is specifically configured to:
determining a first average value of the matching cost of the pixel point on the left side and the matching cost of the pixel point on the right side;
determining a second average value of the matching cost of the pixel point on the upper side and the matching cost of the pixel point on the lower side;
and taking the average value of the first average value and the second average value as the surface cost of the pixel point under the preset spatial plane.
18. The driving assistance device according to claim 10, wherein the processor is specifically configured to:
determining an energy equation, wherein the energy equation is expressed as the sum of a data term and a smoothing term and is referred to as the energy of the disparity map to be solved;
the data term being a normalized norm of the second matching costs of all pixel points, and the smoothing term being a total variation;
and solving the energy equation to obtain the disparity map.
19. The driving assistance device according to claim 10, further comprising a vision sensor for acquiring the views.
20. A vehicle comprising the driving assistance device according to any one of claims 10 to 19.
21. The vehicle of claim 20, further comprising at least two vision sensors for acquiring the views.
22. A computer readable storage medium having instructions stored thereon which, when run on a computer, cause the computer to perform the method of stereo matching of images of any of claims 1 to 9.
CN201980005230.8A 2019-04-24 2019-04-24 Method for stereo matching of images and auxiliary driving device Active CN111295667B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/084141 WO2020215257A1 (en) 2019-04-24 2019-04-24 Image stereo matching method and assisted driving apparatus

Publications (2)

Publication Number Publication Date
CN111295667A CN111295667A (en) 2020-06-16
CN111295667B true CN111295667B (en) 2024-05-14

Family

ID=71030447

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980005230.8A Active CN111295667B (en) 2019-04-24 2019-04-24 Method for stereo matching of images and auxiliary driving device

Country Status (2)

Country Link
CN (1) CN111295667B (en)
WO (1) WO2020215257A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113052862B (en) * 2021-04-12 2024-06-18 北京机械设备研究所 Multi-level optimization-based stereo matching method, device and equipment in outdoor scene
CN113345001A (en) * 2021-05-19 2021-09-03 智车优行科技(北京)有限公司 Disparity map determination method and device, computer-readable storage medium and electronic device
CN117546204A (en) * 2021-08-31 2024-02-09 华为技术有限公司 Stereo matching method, processing circuit, storage medium and program product
CN115994955B (en) * 2023-03-23 2023-07-04 深圳佑驾创新科技有限公司 Camera external parameter calibration method and device and vehicle
CN116228889B (en) * 2023-04-27 2023-08-15 合肥工业大学 Mobile calibration device, camera array system calibration device and method

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109544622A (en) * 2018-11-06 2019-03-29 深圳市爱培科技术股份有限公司 A kind of binocular vision solid matching method and system based on MSER

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8472669B2 (en) * 2010-03-10 2013-06-25 Texas Instruments Incorporated Object localization using tracked object trajectories
CN103366354B (en) * 2012-03-27 2016-09-07 富士通株式会社 Method and system for stereo matching
EP2887312A1 (en) * 2013-12-18 2015-06-24 Nokia Corporation Method, apparatus and computer program product for depth estimation of stereo images

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109544622A (en) * 2018-11-06 2019-03-29 深圳市爱培科技术股份有限公司 A kind of binocular vision solid matching method and system based on MSER

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Liu Jianguo et al., "Research on an efficient binocular matching algorithm based on slanted aggregation windows," Journal of Yunnan University (Natural Sciences Edition), 2016, Vol. 38, No. 4, pp. 550-553. *

Also Published As

Publication number Publication date
CN111295667A (en) 2020-06-16
WO2020215257A1 (en) 2020-10-29

Similar Documents

Publication Publication Date Title
CN111295667B (en) Method for stereo matching of images and auxiliary driving device
WO2021004548A1 (en) Vehicle speed intelligent measurement method based on binocular stereo vision system
US11900619B2 (en) Intelligent vehicle trajectory measurement method based on binocular stereo vision system
CN108520536B (en) Disparity map generation method and device and terminal
CN106447727B (en) Method of estimating parameter of 3D display device and 3D display device using the same
JP7018566B2 (en) Image pickup device, image processing method and program
WO2018120040A1 (en) Obstacle detection method and device
CN112419494A (en) Obstacle detection and marking method and device for automatic driving and storage medium
CN110231832B (en) Obstacle avoidance method and obstacle avoidance device for unmanned aerial vehicle
WO2023028880A1 (en) External parameter calibration method for vehicle-mounted camera and related apparatus
CN105551020A (en) Method and device for detecting dimensions of target object
CN104949657A (en) Object detection device, object detection method, and computer readable storage medium comprising objection detection program
KR102003387B1 (en) Method for detecting and locating traffic participants using bird's-eye view image, computer-readerble recording medium storing traffic participants detecting and locating program
CN116823966A (en) Internal reference calibration method and device for camera, computer equipment and storage medium
TWI658431B (en) Image processing method, image processing device and computer readable storage medium
CN111986248B (en) Multi-vision sensing method and device and automatic driving automobile
CN116188349A (en) Image processing method, device, electronic equipment and storage medium
Neves et al. A Master‐Slave Calibration Algorithm with Fish‐Eye Correction
CN116917936A (en) External parameter calibration method and device for binocular camera
Fang et al. Automatic roadblock identification algorithm for unmanned vehicles based on binocular vision
Zhang et al. Passive 3D reconstruction based on binocular vision
KR101632069B1 (en) Method and apparatus for generating depth map using refracitve medium on binocular base
CN115063772B (en) Method for detecting vehicles after formation of vehicles, terminal equipment and storage medium
CN116007637B (en) Positioning device, method, in-vehicle apparatus, vehicle, and computer program product
CN116597074A (en) Method, system, device and medium for multi-sensor information fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240522

Address after: Building 3, Xunmei Science and Technology Plaza, No. 8 Keyuan Road, Science and Technology Park Community, Yuehai Street, Nanshan District, Shenzhen City, Guangdong Province, 518057, 1634

Patentee after: Shenzhen Zhuoyu Technology Co.,Ltd.

Country or region after: China

Address before: 518057 Shenzhen Nanshan High-tech Zone, Shenzhen, Guangdong Province, 6/F, Shenzhen Industry, Education and Research Building, Hong Kong University of Science and Technology, No. 9 Yuexingdao, South District, Nanshan District, Shenzhen City, Guangdong Province

Patentee before: SZ DJI TECHNOLOGY Co.,Ltd.

Country or region before: China