CN111738061A - Binocular vision stereo matching method based on regional feature extraction and storage medium - Google Patents

Info

Publication number
CN111738061A
Authority
CN
China
Prior art keywords
pixel point
image
texture
pixel
value
Legal status
Pending
Application number
CN202010380080.3A
Other languages
Chinese (zh)
Inventor
赵勇
丘文峰
陈天健
刘钢
Current Assignee
Guiguzi Artificial Intelligence Technology Shenzhen Co ltd
Original Assignee
Guiguzi Artificial Intelligence Technology Shenzhen Co ltd
Application filed by Guiguzi Artificial Intelligence Technology Shenzhen Co ltd
Priority to CN202010380080.3A
Publication of CN111738061A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/64 Three-dimensional objects
    • G06V20/653 Three-dimensional objects by matching three-dimensional models, e.g. conformal mapping of Riemann surfaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour

Abstract

A binocular vision stereo matching method based on regional feature extraction, and a storage medium, are provided. The binocular vision stereo matching method comprises the following steps: acquiring a first image and a second image under two viewpoints; setting a first pixel point in the first image, expanding from the first pixel point to the surrounding pixel points with similar texture to form a first texture region, and obtaining a third image from the first texture region and the unexpanded region outside it; setting a second pixel point in the second image and forming a second texture region through the same similar-texture expansion to obtain a fourth image; and performing cost aggregation on any pixel point in the third image or the fourth image according to a preset cost function and a plurality of preset parallax values, so as to obtain the optimal parallax value of that pixel point from the candidate parallax values. The method overcomes the tendency of low-texture content to produce matching errors between two images and greatly improves the performance of binocular vision stereo matching algorithms.

Description

Binocular vision stereo matching method based on regional feature extraction and storage medium
Technical Field
The invention relates to the technical field of binocular stereo vision, in particular to a binocular vision stereo matching method based on regional feature extraction and a storage medium.
Background
It is known that light from a scene is collected by the human eye, a sophisticated imaging system, and transmitted through the neural pathways to a brain containing hundreds of millions of neurons, where it is processed in parallel to yield real-time, high-definition, accurate depth perception. This greatly improves human adaptability to the environment and enables many complex actions, such as walking, playing sports, driving vehicles, and performing scientific experiments.
Computer vision is the discipline of using a computer to simulate the human visual system, in order to recover a 3D scene from two acquired planar images. Currently, the level of computer stereo vision falls far short of human binocular vision, so its research remains a very active field.
Binocular Stereo Vision is an important branch of computer vision. Based on the parallax principle, it acquires two images of a measured object from different positions with imaging equipment and obtains three-dimensional geometric information of the object by calculating the positional deviation between corresponding points of the images. By processing the real world through a simulated visual system, research on stereo vision matching can greatly enhance the perception of a computer or robot with respect to its environment, allowing robots to adapt better, behave more intelligently, and serve people better. After many years of technical development, binocular stereo vision has been applied in fields such as robot vision, aerial surveying and mapping, reverse engineering, military applications, medical imaging, and industrial inspection.
Currently, binocular stereo vision integrates the images obtained by two image capturing devices and observes the differences between them, so that a computer obtains accurate depth information, establishes correspondences between features, and relates the mapping points of the same spatial physical point in different images; this difference is generally called parallax (disparity). However, the most important and most difficult problem in binocular stereo vision is stereo matching, i.e. finding matching corresponding points in images from different viewpoints.
However, images in binocular stereo vision often contain a large amount of low-texture content, such as sky, mountains, and grassland, where each pixel differs little from the others. Traditional methods can only extract regional features from local color and texture information, and cannot solve this problem whether they adopt color-gradient processing, local-region cost aggregation, or more complex matching methods that suppress illumination changes to some degree. This hinders the practical deployment of binocular vision stereo matching.
Disclosure of Invention
The invention mainly solves the technical problem of how to improve the accuracy of pixel-point stereo matching when using existing binocular vision technology. To solve this technical problem, the application discloses a binocular vision stereo matching method based on regional feature extraction and a storage medium.
According to a first aspect, an embodiment provides a binocular vision stereo matching method based on regional feature extraction, including: acquiring a first image and a second image under two viewpoints; setting a first pixel point in the first image, expanding similar textures of other pixel points distributed around the first pixel point to form a first texture area, and obtaining a third image by using the first texture area and an unexpanded area outside the first texture area; setting a second pixel point in the second image, starting to expand similar textures of other pixel points distributed around from the second pixel point to form a second texture area, and obtaining a fourth image by using the second texture area and an unexpanded area outside the second texture area; and performing cost aggregation on any pixel point in the third image or the fourth image according to a preset cost function and a plurality of preset parallax values so as to obtain the optimal parallax value of the pixel point from each parallax value.
Setting a first pixel point in the first image, starting to expand similar textures of other pixel points distributed around from the first pixel point to form a first texture area, and obtaining a third image by using the first texture area and an unexpanded area outside the first texture area, wherein the third image comprises: determining low-texture content in the first image, setting a first pixel point in the low-texture content, and taking the first pixel point as a central point to establish a first limited area; in the range of the first limited area, sequentially judging similar textures of longitudinally distributed and transversely distributed continuous pixels from the first pixel to obtain each pixel with the similar texture with the first pixel; forming a first texture area by using each pixel point with similar texture with the first pixel point, and marking each pixel point in the first texture area as a first value; taking a region outside the first texture region and inside the first defined region as an unexpanded region, and marking each pixel point in the unexpanded region as a second value; and generating a third image according to each pixel point marked as a first value and each pixel point marked as a second value in the first image.
In the range of the first limited region, sequentially judging similar textures of longitudinally distributed and transversely distributed continuous pixels from the first pixel to obtain each pixel having a similar texture to the first pixel, includes: sequentially expanding continuous pixels distributed along the longitudinal direction from the first pixel, and when the gray difference between a pixel and the last expanded pixel is judged to be smaller than a preset threshold, confirming that the pixel has a similar texture to the first pixel, until the next pixel in the longitudinal direction is determined to be a non-similar-texture pixel; and sequentially expanding continuous pixels distributed along the transverse direction from each similar-texture pixel obtained in the longitudinal direction, and when the gray difference between a pixel and the last expanded pixel is judged to be smaller than the threshold, confirming that the pixel has a similar texture to the first pixel, until the next pixel in the transverse direction is determined to be a non-similar-texture pixel.
The setting of the second pixel point in the second image, starting from the second pixel point to expand similar textures to surrounding pixel points to form a second texture region, and obtaining a fourth image by using the second texture region and an unexpanded region outside the second texture region, includes: correspondingly setting a second pixel point in the second image according to the position of the first pixel point in the first image, and taking the second pixel point as a central point to establish a second limited region; in the range of the second limited region, sequentially judging similar textures of longitudinally distributed and transversely distributed continuous pixel points from the second pixel point to obtain each pixel point having a similar texture to the second pixel point; forming a second texture region from the pixel points having a similar texture to the second pixel point, and marking each pixel point in the second texture region as a first value; taking the region outside the second texture region and inside the second limited region as an unexpanded region, and marking each pixel point in the unexpanded region as a second value; and generating a fourth image according to the pixel points marked as the first value and the pixel points marked as the second value in the second image.
In the third image and the fourth image, each pixel point marked as a first value has the same gray value as the original pixel point, and each pixel point marked as a second value has a full black gray value.
After acquiring the first image and the second image under two viewpoints and before setting a first pixel point in the first image, the method further includes: taking each edge pixel point of the first image as a symmetry axis, and performing mirror image extension processing on the first image along the symmetry axis so as to extend the first image to the periphery to a preset height and a preset width; taking each edge pixel point of the second image as a symmetry axis, and performing mirror image extension processing on the second image along the symmetry axis so as to extend the second image to the periphery to a preset height and a preset width; the preset height is greater than or equal to a half height value of the first limited area, and the preset width is greater than or equal to a half width value of the first limited area; the second defined area is the same size as the first defined area.
The performing cost aggregation on any one pixel point in the third image or the fourth image according to a preset cost function and a plurality of preset disparity values to obtain an optimal disparity value of the pixel point from each disparity value includes: acquiring any one pixel point from the third image or the fourth image and setting the pixel point as a third pixel point; performing cost aggregation on the third pixel points according to a preset cost function and a plurality of preset parallax values to obtain a cost aggregation function corresponding to each parallax value; and obtaining the optimal parallax value of the third pixel point from each parallax value by using a cost aggregation function corresponding to each parallax value.
The cost function corresponds to color, gradient, rank, or NCC; the disparity value is represented by d and takes values in {0, 1, …, dmax}, where dmax represents the maximum allowed disparity value.
The obtaining of the optimal disparity value of the third pixel point from each disparity value by using the cost aggregation function corresponding to each disparity value includes: calculating the cost aggregation function corresponding to each disparity value, denoted C(y, x, d); obtaining the substituted disparity value at which the function value is minimal; and taking that disparity value as the optimal disparity value, where y and x are respectively the ordinate and abscissa of the third pixel point.
According to a second aspect, an embodiment provides a computer-readable storage medium comprising a program executable by a processor to implement the binocular visual stereo matching method described in the first aspect above.
The beneficial effect of this application is:
According to the binocular vision stereo matching method and the storage medium of the above embodiments, the binocular vision stereo matching method comprises: acquiring a first image and a second image under two viewpoints; setting a first pixel point in the first image, expanding from the first pixel point to the surrounding pixel points with similar texture to form a first texture region, and obtaining a third image from the first texture region and the unexpanded region outside it; setting a second pixel point in the second image, expanding in the same way to form a second texture region, and obtaining a fourth image from the second texture region and the unexpanded region outside it; and performing cost aggregation on any pixel point in the third image or the fourth image according to a preset cost function and a plurality of preset parallax values, so as to obtain the optimal parallax value of that pixel point from the candidate parallax values.
In the first aspect, because similar-texture expansion is performed on the first and second images under two viewpoints to obtain a third and a fourth image containing specific texture regions, the correlation coefficient between the third and fourth images is improved, the tendency of low-texture content to produce matching errors between two images is overcome, and the performance of conventional binocular vision stereo matching algorithms is greatly improved.
In the second aspect, because similar-texture expansion starts from a set pixel point and spreads to the surrounding pixel points, the expansion yields a specific texture region containing structural information, from which the third and fourth images required for binocular vision stereo matching are easily formed; the structural, color, and texture information contained in the images then jointly participates in the stereo matching process, producing a more accurate cost aggregation result and improving matching accuracy.
In the third aspect, because limited regions are established in the first and second images, and the pixel points with similar texture to the set pixel point are obtained through similarity judgment within those regions, the pixel expansion range is normalized in both images, yielding specific texture regions with higher similarity and improving both the inter-image correlation coefficient and the matching cost of subsequent stereo matching.
In the fourth aspect, because the first and second images are mirror-extended along their symmetry axes before similar-texture expansion, the expandable range in both images is increased, so that similar-texture expansion of surrounding pixels succeeds even when it starts from an edge pixel point of the original image.
In the fifth aspect, because cost aggregation is performed on the third pixel point, taken from the third or fourth image, according to a preset cost function and a plurality of preset parallax values, a more robust cost aggregation function is obtained, which favors obtaining a more accurate optimal parallax value from the candidate parallax values.
In the sixth aspect, the technical scheme of the application effectively alleviates mismatching during stereo matching and helps matching corresponding points to be found accurately in images from different viewpoints, thereby improving the precision of stereo matching.
Drawings
Fig. 1 is a flowchart of the binocular vision stereo matching method based on regional feature extraction in the present application;
Fig. 2 is a flowchart of expanding a first texture region in a first image to obtain a third image;
Fig. 3 is a flowchart of expanding a second texture region in a second image to obtain a fourth image;
Fig. 4 is a flowchart of obtaining optimal disparity values through cost aggregation;
Fig. 5 is a flowchart of image mirror extension processing performed on the first image and the second image, respectively;
Fig. 6 is a schematic structural diagram of a stereo matching device in the present application;
Fig. 7 is a schematic image of a first image;
Fig. 8 is a schematic image of a first image after mirror extension processing;
Fig. 9 is a schematic view of a first defined area in a first image;
Fig. 10 is a schematic image of the first texture region and the unexpanded region in the third image.
Detailed Description
The present invention will be described in further detail with reference to the following detailed description and accompanying drawings. Wherein like elements in different embodiments are numbered with like associated elements. In the following description, numerous details are set forth in order to provide a better understanding of the present application. However, those skilled in the art will readily recognize that some of the features may be omitted or replaced with other elements, materials, methods in different instances. In some instances, certain operations related to the present application have not been shown or described in detail in order to avoid obscuring the core of the present application from excessive description, and it is not necessary for those skilled in the art to describe these operations in detail, so that they may be fully understood from the description in the specification and the general knowledge in the art.
Furthermore, the features, operations, or characteristics described in the specification may be combined in any suitable manner to form various embodiments. Also, the various steps or actions in the method descriptions may be transposed or reordered, as will be apparent to one of ordinary skill in the art. Thus, the various sequences in the specification and drawings are for the purpose of describing certain embodiments only and are not intended to imply a required sequence unless otherwise indicated where such sequence must be followed.
The numbering of the components as such, e.g., "first", "second", etc., is used herein only to distinguish the objects as described, and does not have any sequential or technical meaning. The term "connected" and "coupled" when used in this application, unless otherwise indicated, includes both direct and indirect connections (couplings).
In binocular vision stereo matching, a key problem is to find matching points in the left and right images to obtain the horizontal position difference of corresponding pixels in the two images, also called the parallax, from which the depth of the pixel point can be further calculated. Pixel points that are not at the same depth can have the same color, texture, gradient, and so on, which often causes mismatching during stereo matching, leading to large errors in parallax calculation and greatly limiting the application of binocular vision to depth measurement. To mitigate this problem, existing stereo matching methods for binocular images generally estimate a pixel point from the pixel points in its surrounding region, for example by finding a matching block as in the motion search of video coding, or by directly computing SAD (sum of absolute differences) matching. However, in these methods the weighting of a pixel's matching cost can still only be computed from features that have no direct relation to parallax, such as color, texture, and gradient, and therefore these methods lack robustness.
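As a concrete illustration of the block-matching costs mentioned above, the following is a minimal sketch of an SAD matching cost. This is illustrative code, not from the patent; the grayscale numpy images, the window half-size, and all names are assumptions.

```python
# Minimal SAD block-matching sketch: compare a window centred at (y, x)
# in the left image with the window shifted by disparity d in the right
# image. Assumes grayscale numpy arrays and windows fully inside both
# images; names and window size are illustrative.
import numpy as np

def sad_cost(left: np.ndarray, right: np.ndarray,
             y: int, x: int, d: int, half: int = 3) -> float:
    lw = left[y - half:y + half + 1, x - half:x + half + 1].astype(np.float64)
    rw = right[y - half:y + half + 1, x - d - half:x - d + half + 1].astype(np.float64)
    return float(np.abs(lw - rw).sum())  # lower cost = better match
```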
In order to improve the robustness of the matching cost and the accuracy of pixel matching, this application performs similar-texture expansion on the first image and the second image under two viewpoints to obtain a third image and a fourth image containing specific texture regions. The structure of these texture regions raises the correlation coefficient between the third and fourth images, and the structural, color, and texture information contained in the images jointly participates in the stereo matching process. This effectively alleviates low-texture mismatching during stereo matching and helps matching corresponding points to be found accurately in images from different viewpoints, thereby improving the accuracy of stereo matching.
The technical solution of the present application will be described in detail with reference to the following examples.
The first embodiment
Referring to fig. 1, the present application further discloses a binocular vision stereo matching method based on regional feature extraction, which mainly includes steps S100-S400, which are respectively described below.
Step S100, a first image and a second image under two viewpoints are acquired.
In an embodiment, the target object is captured by a binocular camera. Since the binocular camera forms two image capturing viewpoints, one frame of image is obtained from each viewpoint, yielding two images, left and right, i.e., a first image and a second image. The first image and the second image have substantially the same content, with a parallax difference between them.
In addition, the first image and the second image may also come from a data set, such as the KITTI data set, currently the world's largest computer vision algorithm evaluation data set for autonomous driving scenes, used to evaluate the in-vehicle performance of computer vision technologies such as stereo imaging (stereo), optical flow (optical flow), visual odometry (visual odometry), 3D object detection (object detection), and 3D tracking (tracking). The KITTI data set comprises real image data acquired in urban, rural, highway, and other scenes; each image contains up to 15 vehicles and 30 pedestrians, with various degrees of occlusion and truncation. Its image content is therefore rich, making it very suitable for training a binocular vision stereo matching network.
Step S200, setting a first pixel point in the first image, expanding similar textures of other pixel points distributed around the first pixel point to form a first texture area, and obtaining a third image by using the first texture area and an unexpanded area outside the first texture area.
It should be noted that the first pixel point set in the first image may be a pixel point in a specific texture region (e.g., low-texture content such as sky, grassland, mountain, river, street, etc.), and the first pixel point may be randomly selected by a user in advance on the first image, and preferably, a pixel point with a representative color is selected in the region with low-texture content.
It can be understood that after the first pixel point is selected in a certain low-texture content, other pixel points in the area can be found in a similar texture expansion mode, so that other pixel points meet the requirement of having similar texture with the first pixel point, and the first texture area is formed by the first pixel point and the found other pixel points.
Step S300, setting a second pixel point in the second image, expanding similar textures of other pixel points distributed around the second pixel point to form a second texture area, and obtaining a fourth image by using the second texture area and an unexpanded area outside the second texture area.
Similarly, the second pixel point set in the second image may also be a pixel point in a region with a specific texture (e.g., low-texture content such as sky, grassland, mountain, river, street, etc.). Because the image contents between the first image and the second image are basically the same and only the difference exists in the aspect of parallax, the setting position of the second pixel point in the second image should be at the setting position corresponding to the first pixel point in the first image, so that the expansion of similar textures of all pixel points in the same type of low-texture content in the second image is facilitated.
It can be understood that after the second pixel point is selected in the same low texture content, other pixel points in the region can be found in a similar texture expansion mode, so that other pixel points meet the requirement of having similar texture with the second pixel point, and the second texture region is formed by the second pixel point and the found other pixel points.
And step S400, performing cost aggregation on any pixel point in the third image or the fourth image according to a preset cost function and a plurality of preset parallax values so as to obtain the optimal parallax value of the pixel point from each parallax value.
It should be noted that after cost aggregation is performed on any one pixel point in the third image or the fourth image according to a preset cost function and a plurality of preset disparity values, a cost aggregation function corresponding to each disparity value can be obtained, for example the set {C(y, x, d) | d = 0, 1, …, dmax}, where (y, x) represents the coordinate position of the pixel point and d represents a disparity value. These cost aggregation functions can then be processed with an existing matching cost weighting algorithm, for example the winner-take-all (WTA) algorithm, which simply selects, within a certain range, the point with the optimal aggregated matching cost (the minimum for SAD and SSD, or the maximum for NCC) as the corresponding matching point.
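As a hedged sketch of the WTA step, assuming the aggregated costs have already been collected into a volume indexed as C[y, x, d] (an illustrative layout, not the patent's implementation):

```python
# Winner-take-all disparity selection over an aggregated cost volume of
# shape (H, W, d_max + 1). For SAD/SSD-style costs the best disparity
# minimises the cost; for NCC one would take the arg-max instead.
import numpy as np

def wta_disparity(cost_volume: np.ndarray) -> np.ndarray:
    return np.argmin(cost_volume, axis=2)  # (H, W) map of optimal d
```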
In the present embodiment, the above step S200 mainly relates to the process of forming the first texture region in the first image and obtaining the third image, and by referring to fig. 2, the step specifically includes steps S210 to S250, which are respectively described as follows.
Step S210, determining low texture content in the first image, setting a first pixel point therein, and using the first pixel point as a center point to establish a first defined region.
For example, in fig. 7, for a first image composed of streets, cars, grass, mountains, sky, and buildings, if the image content that the user desires to match is sky, a first pixel point P may be set in the low-texture content represented by the sky in the first image, and a first defined region may be established with the first pixel point P as a central point, where the first defined region may be as shown by reference P in fig. 8.
It should be noted that the first defined area may be configured in any shape, such as rectangular, circular, triangular, etc. In this embodiment, the first defined area is preferably arranged as a rectangle, so that a rectangular strip is formed around the first pixel, and the rectangular strip should cover enough low-texture content, and in some cases, image content other than low-texture content may be included in the first defined area.
For example, in fig. 8, the first defined area P is a rectangular strip with a height of 500 pixels and a width of 200 pixels, and the rectangular strip includes not only low-texture content represented by sky but also high-texture content represented by buildings, numbers and traffic signs.
Step S220, in the range of the first limited region, sequentially determining similar textures of longitudinally distributed and transversely distributed continuous pixels from the first pixel, to obtain each pixel having a similar texture with the first pixel.
In one embodiment, within the first limited region, the method sequentially expands the continuous pixels distributed along the longitudinal direction from the first pixel; when the gray difference between a pixel and the last expanded pixel is judged to be smaller than a preset threshold, that pixel is determined to have a similar texture to the first pixel, until the next pixel in the longitudinal direction is determined to be a non-similar-texture pixel. Then, starting from each similar-texture pixel point obtained in the longitudinal direction, the continuous pixels distributed along the transverse direction are expanded in turn; when the gray difference between a pixel point and the last expanded pixel point is judged to be smaller than the threshold, that pixel point is confirmed to have a similar texture to the first pixel point, until the next pixel point in the transverse direction is confirmed to be a non-similar-texture pixel point.
For example, fig. 9 is an enlarged view of the first limited region P in fig. 8. For convenience of description, an x-y coordinate system is established at the upper left corner of the first limited region P; any pixel point in this coordinate system can be represented by (y, x), and the coordinate of the origin O is (0, 0). In addition, the first pixel p in the first limited region P is located at the center of the region; if the height of the first limited region P is set to 2 × L + 1 and the width to 2 × R + 1, the coordinate of the first pixel p is (y0, x0) = (L+1, R+1).
The first pixel point p can be used as the starting point of expansion to sequentially expand the continuous pixel points in the longitudinal distribution and the transverse distribution.
A first expansion step: the low-texture content is expanded upward along the longitudinal distribution, and whether the gray difference between a pixel point and the last expanded pixel point is smaller than a preset threshold can be judged by the following formula (1).
max(|r(j,x0)-r(j+1,x0)|, |g(j,x0)-g(j+1,x0)|, |b(j,x0)-b(j+1,x0)|) < ε    (1)
Wherein max () is a maximum function, R () is a gray value of an R channel in an RGB color channel, G () is a gray value of a G channel in the RGB color channel, B () is a gray value of a B channel in the RGB color channel, and | | is an absolute value operation; the value of j is a preset threshold value, and j is an ordinate value and satisfies the conditions of L, L-1, L-2, … and 1. At this time, the coordinate (j, x0) represents a pixel point distributed upward along the longitudinal direction, and the coordinate (j +1, x0) represents a last expandable pixel point distributed along the longitudinal direction. For a certain pixel (j, x0), if formula (1) is satisfied, it can be determined that the pixel (j, x0) is a pixel having similar texture to the first pixel (y0, x 0); if formula (1) is not satisfied, it can be determined that the pixel (j, x0) is a pixel having a non-similar texture to the first pixel (y0, x 0); when it is confirmed that a certain pixel (j, x0) is a pixel having a non-similar texture, the expansion of the continuous pixels distributed upward in the longitudinal direction is terminated.
A second expansion step: the low-texture content is expanded downward along the longitudinal distribution, and whether the gray difference between a pixel point and the last expanded pixel point is smaller than the preset threshold can be judged by the following formula (2).
max(|r(j,x0)-r(j-1,x0)|, |g(j,x0)-g(j-1,x0)|, |b(j,x0)-b(j-1,x0)|) < ε    (2)
Wherein ε is the preset threshold, and j is an ordinate value satisfying j = L+2, L+3, …, 2L+1. Here the coordinate (j, x0) represents a pixel point distributed downward along the longitudinal direction, and the coordinate (j-1, x0) represents the last expanded pixel point in the longitudinal direction. For a certain pixel (j, x0), if formula (2) is satisfied, it can be determined that the pixel (j, x0) has a similar texture to the first pixel (y0, x0); if formula (2) is not satisfied, it can be determined that the pixel (j, x0) has a non-similar texture to the first pixel (y0, x0). Once a pixel (j, x0) is confirmed to have a non-similar texture, the downward longitudinal expansion of continuous pixels is terminated.
A third expansion step: the low-texture content is expanded leftward along the transverse distribution, and whether the gray difference between a pixel point and the last expanded pixel point is smaller than the preset threshold can be judged by the following formula (3).
max(|r(j′,i)-r(j′,i+1)|, |g(j′,i)-g(j′,i+1)|, |b(j′,i)-b(j′,i+1)|) < ε    (3)
Wherein ε is the preset threshold, i is an abscissa value satisfying i = R, R-1, …, 1, and j′ denotes each similar-texture pixel row obtained in the longitudinal direction; for example, j′ may index a pixel point (j, x0) already confirmed in the above process to have a similar texture to the first pixel point (y0, x0). Here the coordinate (j′, i) represents a pixel point distributed leftward along the transverse direction, and the coordinate (j′, i+1) represents the last expanded pixel point in the transverse direction. For a certain pixel point (j′, i), if formula (3) is satisfied, it can be determined that the pixel point (j′, i) has a similar texture to the first pixel point (y0, x0); if formula (3) is not satisfied, it can be determined that the pixel point (j′, i) has a non-similar texture to the first pixel point (y0, x0). Once a pixel point (j′, i) is confirmed to have a non-similar texture, the leftward transverse expansion of continuous pixel points is terminated.
A fourth expansion step: the low-texture content is expanded rightward along the transverse distribution, and whether the gray difference between a pixel point and the last expanded pixel point is smaller than the preset threshold can be judged by the following formula (4).
max(|r(j′,i)-r(j′,i-1)|, |g(j′,i)-g(j′,i-1)|, |b(j′,i)-b(j′,i-1)|) < ε    (4)
Wherein ε is the preset threshold, i is an abscissa value satisfying i = R+2, R+3, …, 2R+1, and j′ denotes each similar-texture pixel row obtained in the longitudinal direction; for example, j′ may index a pixel point (j, x0) already confirmed in the above process to have a similar texture to the first pixel point (y0, x0). Here the coordinate (j′, i) represents a pixel point distributed rightward along the transverse direction, and the coordinate (j′, i-1) represents the last expanded pixel point in the transverse direction. For a certain pixel point (j′, i), if formula (4) is satisfied, it can be determined that the pixel point (j′, i) has a similar texture to the first pixel point (y0, x0); if formula (4) is not satisfied, it can be determined that the pixel point (j′, i) has a non-similar texture to the first pixel point (y0, x0). Once a pixel point (j′, i) is confirmed to have a non-similar texture, the rightward transverse expansion of continuous pixel points is terminated.
Further, a fifth expansion step may also be included. During the transversely distributed low-texture expansion, if a pixel point (j′, i) distributed in the transverse direction has a similar texture to the first pixel point (y0, x0), upward or downward expansion along the longitudinal distribution may be continued from that pixel point (j′, i); this longitudinal expansion follows the first and second expansion steps and is not repeated here. A simplified code sketch combining all five steps is given below.
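This is a sketch under simplifying assumptions (RGB numpy image, seed at the centre of the defined region, boundary handling reduced to clipping); it is not the patent's code, and all names are illustrative.

```python
# Steps 1-2: grow the seed column upward and downward; steps 3-4: grow
# left and right from every accepted row pixel; step 5: re-grow
# vertically from newly accepted lateral pixels. Expansion stops at the
# first non-similar pixel or at the border of the defined region.
import numpy as np

def grow_texture_region(img, seed, eps, half_h, half_w):
    y0, x0 = seed
    h, w = img.shape[:2]
    mask = np.zeros((h, w), np.uint8)
    mask[y0, x0] = 1

    def similar(p, q):
        d = np.abs(img[p].astype(np.int32) - img[q].astype(np.int32))
        return int(d.max()) < eps

    def grow_column(x, y_start):
        rows = [y_start]
        for step in (-1, 1):                       # upward, then downward
            j = y_start
            while True:
                nj = j + step
                if abs(nj - y0) > half_h or not (0 <= nj < h):
                    break                          # left the defined region
                if not similar((j, x), (nj, x)):
                    break                          # first non-similar pixel stops the run
                mask[nj, x] = 1
                rows.append(nj)
                j = nj
        return rows

    for j in grow_column(x0, y0):                  # steps 1-2 on the seed column
        for step in (-1, 1):                       # step 3 (leftward), step 4 (rightward)
            i = x0
            while True:
                ni = i + step
                if abs(ni - x0) > half_w or not (0 <= ni < w):
                    break
                if not similar((j, i), (j, ni)):
                    break
                mask[j, ni] = 1
                grow_column(ni, j)                 # step 5: vertical re-growth
                i = ni
    return mask
```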
In step S230, a first texture region is formed by using each pixel point having a similar texture to the first pixel point, and each pixel point in the first texture region is marked as a first value (for example, 1).
In step S220, each pixel point having similar texture to the first pixel point in the first defined region can be found, and the first texture region is formed by using the pixel points. For example, fig. 10 shows the result of pixel expansion from fig. 9, and the region D1 in fig. 10 contains pixels with similar textures, so the region D1 is the first texture region.
For example, when the above formula (1), (2), (3) or (4) satisfies the threshold limit condition, the pixel point (j, x0) or the pixel point (j', i) participating in the formula calculation is a pixel point with similar texture. At this time, the pixel points can be marked, so that
mask(j, x0) = 1; mask(j′, i) = 1;
Wherein, mask is a mask function commonly used in image processing.
It should be noted that, since a pixel point marked as the first value has a similar texture to the first pixel point, the first pixel (y0, x0) obtains the same marking result, i.e., mask(y0, x0) = 1.
In step S240, a region outside the first texture region and inside the first limited region is regarded as an unexpanded region, and each pixel point in the unexpanded region is marked as a second value (for example, 0).
For example, when the above formula (1), (2), (3) or (4) does not satisfy the threshold limit condition, the pixel point (j, x0) or the pixel point (j', i) participating in the formula calculation is a pixel point that does not have similar texture. At this time, the pixel points can be marked, so that
mask(j,x0)=0;mask(jˊ,i)=0。
It should be noted that, for example, the region D2 in fig. 10 is an unexpanded region outside the first texture region D1, and each pixel point in the unexpanded region may be marked as a second value (for example, 0).
Step S250, a third image is generated according to each pixel point marked as the first value and each pixel point marked as the second value in the first image.
For example, in fig. 10, each pixel in the first texture region D1 is marked as the first value, and each pixel in the unexpanded region D2 is marked as the second value, so that a third image can be generated by using the first texture region D1 and the unexpanded region D2.
It should be noted that, in the third image, each pixel point marked as the first value in the first texture region D1 has the same gray value as its original pixel point, that is, the color of these pixel points remains unchanged. Each pixel point marked as the second value, however, has a full-black gray value, that is, its gray values on the color channels are all 0: a pixel point (j, x0) marked as the second value satisfies r(j, x0) = g(j, x0) = b(j, x0) = 0, and a pixel point (j′, i) marked as the second value satisfies r(j′, i) = g(j′, i) = b(j′, i) = 0.
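A minimal sketch of this marking convention, assuming the 0/1 mask produced by the expansion above (illustrative names):

```python
# Pixels marked with the first value (mask == 1) keep their original
# colour; pixels marked with the second value (mask == 0) are set to
# full black, i.e. r = g = b = 0.
import numpy as np

def apply_region_mask(img: np.ndarray, mask: np.ndarray) -> np.ndarray:
    out = np.zeros_like(img)   # second-value pixels: all channels 0
    keep = mask.astype(bool)
    out[keep] = img[keep]      # first-value pixels keep their colour
    return out
```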
In the present embodiment, the above step S300 mainly relates to the process of forming the second texture region in the second image and obtaining the fourth image, and by referring to fig. 3, the step specifically includes steps S310 to S350, which are respectively described as follows.
Step S310, correspondingly setting a second pixel point in the second image according to the position of the first pixel point in the first image, and using the second pixel point as a center point to establish a second limited region.
It should be noted that, because the image contents of the first image and the second image are substantially the same and differ only in parallax, the position of the second pixel point in the second image should correspond to the position of the first pixel point in the first image, which facilitates the similar-texture expansion of the pixel points of the same low-texture content in the second image. If any pixel in the first image is represented by coordinates (y, x), the corresponding pixel in the second image is represented by coordinates (y, x-d), where d represents the disparity.
For the process of establishing the second defined region with the second pixel point as the central point, reference may be made to step S210 described above. The shape and size of the second defined region may be the same as those of the first defined region; for example, the height of the second defined region is set to 2 × L + 1, the width to 2 × R + 1, and the coordinate of the second pixel point is (y′0, x′0) = (L+1, R+1).
Step S320, in the range of the second limited region, sequentially determining similar textures of longitudinally distributed and transversely distributed continuous pixels from the second pixel, to obtain each pixel having a similar texture with the second pixel.
For the process of obtaining each pixel point having similar texture to the second pixel point within the second limited range, reference may be specifically made to step S220 described above, which is not repeated herein.
Step S330, a second texture region is formed by using each pixel point having a similar texture to the second pixel point, and each pixel point in the second texture region is marked as a first value (for example, 1).
For the process of forming the second texture region and marking each pixel point therein as the first value, reference may be specifically made to step S230 described above, which is not described herein again.
Step S340, using a region outside the second texture region and inside the second limited region as an unexpanded region, and marking each pixel point in the unexpanded region as a second value (for example, 0).
For the process of obtaining the unexpanded region and marking each pixel point in the unexpanded region as the second value, reference may be specifically made to step S240 described above, which is not described herein again.
Step S350, a fourth image is generated according to each pixel point marked as the first value and each pixel point marked as the second value in the second image.
Regarding the process of generating the fourth image, reference may be made to step S250 described above. It should be noted that in the fourth image, each pixel point marked as the first value has the same gray value as its original pixel point, that is, the color of these pixel points remains unchanged; each pixel point marked as the second value has a full-black gray value, that is, its gray values on the color channels are all 0.
In this embodiment, the step S400 mainly involves a process of performing cost aggregation on any pixel point in the third image or the fourth image to obtain the optimal disparity value, and as shown in fig. 4, the step specifically includes steps S410 to S430, which are respectively described as follows.
Step S410, obtaining any one pixel point from the third image or the fourth image and setting the pixel point as a third pixel point.
It should be noted that the third image and the fourth image each have a texture region and an unexpanded region. Because the user is mainly concerned with the stereo matching of the low-texture content in the image, it is preferable to select a pixel point from the first texture region of the third image or the second texture region of the fourth image, and set the selected pixel point as the third pixel point.
Step S420, performing cost aggregation on the third pixel point according to a preset cost function and a plurality of preset disparity values, to obtain a cost aggregation function corresponding to each disparity value.
In this embodiment, the cost function is a cost function corresponding to color, gradient, rank, or NCC; the disparity value can be represented by d and takes values in {0, 1, …, dmax}, where dmax represents the maximum allowed disparity value.
In an embodiment, for each cost function, a function value of each disparity value at the third pixel point under the cost function is calculated, and the function values of the disparity values at the pixel points are aggregated to obtain a cost aggregation function corresponding to the cost function.
It should be noted that the cost functions in the present application include, but are not limited to, cost functions corresponding to color, gradient, rank, NCC, or mutual information. For the cost function on color, see "IEEE Transactions on Pattern Analysis and Machine Intelligence, 1994, Vol. 16(9), pp. 920-932"; for the cost function on gradient, see "Yan Xin, An image matching algorithm based on gradient operators [J]. Electronic Bulletin, 1999(10): 30-33"; for the cost function on rank, see "A constraint to improve the reliability of stereo matching using the rank transform: Acoustics, Speech, and Signal Processing, 1999 IEEE International Conference [C]"; and for the cost function on NCC, see the blog article "Image processing based on NCC template matching identification" (https://blog.csdn.net/jia20003/article/details/48852549), which treats NCC as an algorithm for statistically computing the correlation between two sets of sample data. Since all of the enumerated cost functions belong to the prior art, they are not described one by one here. Furthermore, it should be understood by those skilled in the art that, as technology develops, other types of cost functions may appear in the future, and such future cost functions may still be applied to the technical solution disclosed in this embodiment without limiting it.
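As one concrete instance of the cost-function families listed above, the following is a hedged sketch of a window-level NCC score; unlike SAD/SSD, NCC measures similarity, so matching maximises it. Names are illustrative, not taken from the cited literature.

```python
# Normalised cross-correlation between two equally sized windows:
# a score in [-1, 1], where higher means more correlated.
import numpy as np

def ncc(a: np.ndarray, b: np.ndarray) -> float:
    a = a.astype(np.float64).ravel() - a.mean()
    b = b.astype(np.float64).ravel() - b.mean()
    denom = np.sqrt((a @ a) * (b @ b))
    return float(a @ b / denom) if denom > 0 else 0.0
```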
Step S430, obtaining the optimal disparity value of the third pixel point from each disparity value by using the cost aggregation function corresponding to each disparity value.
In a specific embodiment, the cost aggregation function corresponding to each disparity value is calculated and denoted C(y, x, d); the substituted disparity value at which the function value is minimal is obtained, and that disparity value is taken as the optimal disparity value, where y and x are respectively the ordinate and abscissa of the third pixel point. If the optimal disparity value is denoted by d*, the minimization formula d* = arg min_d C(y, x, d) can be used to calculate the optimal disparity value d* of the third pixel point (y, x).
The minimum matching cost and the corresponding parallax value of the pixel point can be obtained by adopting the existing method, and the minimum matching cost and the corresponding parallax value of the pixel point can also be obtained by adopting a method appearing in the future, which is not limited here.
For example, for a region-based algorithm, the disparity can be obtained easily once the aggregation of matching costs is completed: within a certain range, simply select the point with the optimal aggregated matching cost (the minimum for SAD and SSD, or the maximum for NCC) as the corresponding matching point, i.e., winner-take-all (WTA). For a global algorithm, the original matching cost is processed directly: generally an energy evaluation function is given first, the minimum of the energy is then found by various optimization algorithms, and at the same time the minimum matching cost of each pixel point and its corresponding disparity value can be calculated.
For another example, for most stereo matching algorithms, the computed disparity is some discrete specific integer values, which can meet the accuracy requirements of general applications. However, in some situations with higher precision requirements, such as accurate three-dimensional reconstruction, some measures need to be taken to refine the parallax after the initial parallax is acquired, such as curve fitting of matching cost, image filtering, image segmentation, and the like, so that some matching cost with higher precision and corresponding parallax value can be acquired.
In a specific embodiment, the minimum matching cost and corresponding disparity value of any pixel point can be obtained by an extremum checking method. One approach is to obtain the cost aggregation functions of the disparity values by the existing cost aggregation processing, calculate the matching cost corresponding to each disparity value under its cost aggregation function, and sort the matching costs in ascending order to obtain the minimum matching cost and its corresponding disparity value. Another approach is, after obtaining the matching cost corresponding to each disparity value through the cost aggregation function, to perform a left-right consistency check between the left and right images on the disparity value corresponding to each matching cost, and to select the minimum matching cost and its corresponding disparity value from the results that pass the check. If the minimum matching cost of a pixel is denoted by c, the confidence of the pixel can be expressed as exp(-c). In addition, the disparity values obtained by conventional methods at the minimum matching cost may contain some errors, which can make the disparity values of individual pixel points inaccurate.
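A hedged sketch of the left-right consistency check and the exp(-c) confidence mentioned here, assuming per-view disparity maps have already been computed; the tolerance and all names are assumptions:

```python
# A pixel passes the left-right check if the disparity found from the
# left view agrees (within tol) with the disparity stored at the
# corresponding pixel (y, x - d) of the right view.
import numpy as np

def lr_consistency(disp_left: np.ndarray, disp_right: np.ndarray,
                   tol: int = 1) -> np.ndarray:
    h, w = disp_left.shape
    ok = np.zeros((h, w), bool)
    for y in range(h):
        for x in range(w):
            d = int(disp_left[y, x])
            if 0 <= x - d < w and abs(d - disp_right[y, x - d]) <= tol:
                ok[y, x] = True
    return ok

def confidence(min_cost: np.ndarray) -> np.ndarray:
    return np.exp(-min_cost)   # confidence exp(-c) of each pixel
```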
For the process of obtaining the cost aggregation function corresponding to each disparity value and obtaining the optimal disparity value of the third pixel point, specific reference may be made to the previously applied patent documents.
For example, in patent document CN2019101607989, a binocular vision stereo matching method and system based on matching cost weighting are disclosed, and when the above steps S410 to S430 are implemented, the third image and the fourth image related to the present application may be respectively used as the first image and the second image in the patent document, so that the optimal disparity value of the third pixel point is calculated by using the binocular vision stereo matching method disclosed in the patent document.
For example, in patent document CN2019101614342, a binocular vision stereo matching method based on weighted voting and a system thereof are disclosed, and when the steps S410 to S430 are implemented, the third image and the fourth image related to the present application can be respectively used as images under two viewpoints in the patent document, so that the optimal disparity value of the third pixel point is calculated by using the binocular vision stereo matching method disclosed in the patent document.
It can be understood by those skilled in the art that applying the binocular vision stereo matching method disclosed in this embodiment achieves the following technical advantages:
(1) Performing similar-texture expansion on the first and second images under two viewpoints yields a third and a fourth image containing specific texture regions, which improves the correlation coefficient between the third and fourth images, overcomes the tendency of low-texture content to produce matching errors between two images, and greatly improves the performance of conventional binocular vision stereo matching algorithms.
(2) Because similar-texture expansion proceeds from a set pixel point to the surrounding pixel points, the expansion yields a specific texture region containing structural information, from which the third and fourth images required for binocular vision stereo matching are easily formed; the structural, color, and texture information contained in the images then jointly participates in the stereo matching process, producing a more accurate cost aggregation result and improving matching accuracy.
(3) Limited regions are established in the first and second images, and the pixel points with similar texture to the set pixel point are obtained through similarity judgment within those regions; this normalizes the pixel expansion range in both images, yields specific texture regions with higher similarity, and improves both the inter-image correlation coefficient and the matching cost of subsequent stereo matching.
(4) Because cost aggregation is performed on the third pixel point, taken from the third or fourth image, according to a preset cost function and a plurality of preset disparity values, a more robust cost aggregation function is obtained, which favors obtaining a more accurate optimal disparity value from the candidate disparity values.
Example II,
The technical solution provided in this embodiment improves on the binocular vision stereo matching method of the first embodiment: after the first image and the second image under the two viewpoints are acquired, and before the first pixel point is set in the first image, the method further performs image mirror extension processing on the first image and the second image respectively.
Referring specifically to fig. 5, the image mirror extension process is performed on the first image and the second image between step S100 and step S200, and is represented by steps S510-S520.
Step S510 may be entered directly after step S100 is finished: each edge pixel point of the first image is taken as a symmetry axis, and mirror extension processing along the symmetry axes is performed on the first image, so that the first image is extended to the periphery by a preset height and a preset width.
For example, performing the mirror extension process on the first image shown in fig. 7 yields the image shown in fig. 8. In fig. 8, the dotted lines a, b, c and d are the symmetry axes set at the edge pixel points of the first image: mirror extension about dotted line a extends the image upward by the preset height; mirror extension about dotted line b extends it downward by the preset height; mirror extension about dotted line c extends it leftward by the preset width; and mirror extension about dotted line d extends it rightward by the preset width.
It should be noted that the preset height is greater than or equal to half the height of the first limited region (the value L in step S220 above; the region P is shown in fig. 8), and the preset width is greater than or equal to half the width of the first limited region (the value R in step S220 above). Through the mirror extension of the first image, even if the first pixel point is set at an edge of the original first image, the first limited region always lies within the extended first image, avoiding the situation in which the first limited region cannot be established.
Step S520, taking each edge pixel point of the second image as a symmetry axis, and performing mirror image extension processing on the second image along the symmetry axes, so as to extend the second image to the periphery by the preset height and the preset width.
For the process of performing mirror extension along the symmetry axes on the second image, reference may be made to step S510 above. It should be noted that, to keep the mirror extension of the first image and the second image consistent, the second limited region may be required to have the same size as the first limited region. The second image is then extended by a height greater than or equal to half the height of the second limited region (the value L in step S220) and by a width greater than or equal to half the width of the second limited region (the value R in step S220).
Referring to fig. 5, step S200 may be entered directly after step S520 is completed.
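Purely as an illustration, and not part of the original disclosure, the mirror extension of steps S510-S520 corresponds to reflect-padding an image about its edge pixels. A minimal Python sketch follows, assuming grayscale images stored as numpy arrays; the function name mirror_extend and its parameters pad_h and pad_w are chosen here for illustration only.

```python
import numpy as np

def mirror_extend(img: np.ndarray, pad_h: int, pad_w: int) -> np.ndarray:
    """Extend an image by mirroring about its edge rows and columns.

    mode='reflect' treats the edge pixel itself as the symmetry axis
    (the edge row/column is not duplicated), matching steps S510-S520.
    pad_h should be >= half the height of the limited region (value L)
    and pad_w >= half its width (value R), so that a limited region
    centered on any original pixel always lies inside the extended image.
    """
    return np.pad(img, ((pad_h, pad_h), (pad_w, pad_w)), mode='reflect')

# Example: a 3x3 image extended by one pixel on every side becomes 5x5.
img = np.arange(9, dtype=np.uint8).reshape(3, 3)
extended = mirror_extend(img, pad_h=1, pad_w=1)
```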
Those skilled in the art can understand that the binocular vision stereo matching method disclosed in this embodiment achieves the following technical advantages:
(1) Before the similar texture expansion of the first image and the second image, mirror extension along the symmetry axes is performed on the images, which enlarges the expandable range in both images, so that similar texture expansion of surrounding pixels is easily achieved even from the edge pixel points of the original images.
(2) The technical solution of the present application effectively reduces mismatching during stereo matching, helps to accurately find the matching corresponding points in the images of different viewpoints, and improves the precision of stereo matching.
Those skilled in the art will also appreciate that the texture regions in the third and fourth images carry the boundary shape information of flat, low-texture content, which facilitates computing the similarity (i.e., the NCC) of the two texture regions. According to the expansion result, pixels inside a texture region keep the color and texture of the original expanded area, while the unexpanded area is full black. A first texture region represented by a pixel (y, x) can be expanded in the left image of the binocular pair, and a second texture region represented by the corresponding pixel (y, x-d) can be expanded in the right image. If the pixel points in the left and right images do not match, the expanded texture regions differ in shape, so computing the correlation coefficient with NCC yields a larger matching error (a smaller correlation coefficient NCC); conversely, for a correct match the expanded texture regions have the same shape, yielding a smaller matching error (a larger correlation coefficient NCC). This overcomes the tendency of low-texture content matching to produce errors and greatly improves the performance of binocular vision stereo matching algorithms based on traditional methods.
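For concreteness, one standard definition of the correlation coefficient NCC referred to above is sketched below. The disclosure itself only names NCC without fixing a formula, so the exact form and the helper name ncc are assumptions of this sketch.

```python
import numpy as np

def ncc(region_a: np.ndarray, region_b: np.ndarray) -> float:
    """Normalized cross-correlation of two equally sized texture regions.

    Returns a value in [-1, 1]. Identically shaped expanded regions give
    a value close to 1 (small matching error); differently shaped regions,
    with their full-black unexpanded areas in different places, pull the
    value down (large matching error).
    """
    a = region_a.astype(np.float64).ravel()
    b = region_b.astype(np.float64).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    if denom == 0.0:  # at least one region is constant; correlation undefined
        return 0.0
    return float((a * b).sum() / denom)
```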
Example III,
On the basis of the binocular vision stereo matching method of the first embodiment or the second embodiment, this embodiment further provides an image stereo matching method. Referring to fig. 3, the image stereo matching method includes steps S210-S220, which are described below.
In step S210, images of at least two viewpoints are acquired. In one embodiment, the stereo matching object may be imaged by a plurality of cameras, such that images from a plurality of viewpoints may be obtained.
Step S220, performing similar texture expansion on each image by using the binocular vision stereo matching method disclosed in the first embodiment or the second embodiment, and performing stereo matching on each pixel point in the expanded images to obtain the optimal disparity value of each pixel point.
Those skilled in the art can understand that the binocular vision stereo matching method of the first or second embodiment obtains the optimal disparity value of one pixel point in an image, from which the matching corresponding point in the other image can be found. The optimal disparity values of all pixel points in the image can therefore be computed in turn by the same method, realizing one-to-one stereo matching of pixel points between two or more images and thereby achieving stereo matching of the images.
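The pixel-by-pixel computation described in the preceding paragraph can be sketched as follows. The helper aggregated_cost stands in for the cost aggregation function C(y, x, d) of the earlier embodiments; here it is approximated by a simple window-based sum of absolute differences purely to keep the sketch self-contained, and it is not the cost function of the disclosure.

```python
import numpy as np

WIN = 3  # half window size for the stand-in cost below (illustrative)

def aggregated_cost(left, right, y, x, d):
    """Stand-in for C(y, x, d): sum of absolute differences over a window.

    The actual disclosure aggregates a color/gradient/rank/NCC cost over
    expanded texture regions around (y, x) in one image and (y, x - d)
    in the other; SAD is used here only so the sketch runs on its own.
    """
    h, w = left.shape
    y0, y1 = max(y - WIN, 0), min(y + WIN + 1, h)
    x0, x1 = max(x - WIN, d), min(x + WIN + 1, w)
    if x0 >= x1:
        return np.inf  # disparity pushes the window outside the right image
    a = left[y0:y1, x0:x1].astype(np.float64)
    b = right[y0:y1, x0 - d:x1 - d].astype(np.float64)
    return float(np.abs(a - b).sum())

def disparity_map(left, right, d_max):
    """One optimal disparity per pixel, as described in this embodiment."""
    h, w = left.shape
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(h):
        for x in range(w):
            costs = [aggregated_cost(left, right, y, x, d)
                     for d in range(d_max + 1)]
            # Per claim 9, the optimal disparity minimizes C(y, x, d).
            disp[y, x] = int(np.argmin(costs))
    return disp
```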
Example IV,
Referring to fig. 6, on the basis of the binocular vision stereo matching method disclosed in the first embodiment or the second embodiment, or of the image stereo matching method of the third embodiment, the present application further discloses a stereo matching device 6. The stereo matching device 6 may include a memory 61 and a processor 62 in signal connection, which are described below.
The memory 61 is used to store programs.
The processor 62 is configured to execute the program stored in the memory 61 to implement the binocular vision stereo matching method disclosed in the first embodiment or the second embodiment, or to implement the image stereo matching method disclosed in the third embodiment.
Those skilled in the art will appreciate that all or part of the functions of the methods in the above embodiments may be implemented by hardware or by a computer program. When all or part of the functions are implemented by a computer program, the program may be stored in a computer-readable storage medium, such as a read-only memory, a random access memory, a magnetic disk, an optical disk, or a hard disk, and the program is executed by a computer to realize the above functions. For example, the program may be stored in a memory of the device, and when the program in the memory is executed by the processor, all or part of the functions described above are implemented. The program may also be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a removable hard disk, and then downloaded or copied into a memory of the local device, or installed as a version update in the system of the local device; when the program in the memory is executed by a processor, all or part of the functions in the above embodiments are implemented.
The present invention has been described with reference to specific examples, which are provided only to aid understanding of the invention and are not intended to limit it. A person skilled in the art to which the invention pertains may make several simple deductions, modifications or substitutions according to the idea of the invention.

Claims (10)

1. A binocular vision stereo matching method based on regional feature extraction is characterized by comprising the following steps:
acquiring a first image and a second image under two viewpoints;
setting a first pixel point in the first image, expanding similar textures of other pixel points distributed around the first pixel point to form a first texture area, and obtaining a third image by using the first texture area and an unexpanded area outside the first texture area;
setting a second pixel point in the second image, starting to expand similar textures of other pixel points distributed around from the second pixel point to form a second texture area, and obtaining a fourth image by using the second texture area and an unexpanded area outside the second texture area;
and performing cost aggregation on any pixel point in the third image or the fourth image according to a preset cost function and a plurality of preset parallax values so as to obtain the optimal parallax value of the pixel point from each parallax value.
2. The binocular vision stereo matching method of claim 1, wherein setting a first pixel point in the first image, expanding similar textures of other pixel points distributed around the first pixel point to form a first texture region, and obtaining a third image by using the first texture region and an unexpanded region outside the first texture region comprises:
determining low-texture content in the first image, setting a first pixel point in the low-texture content, and establishing a first limited region with the first pixel point as its center point;
in the range of the first limited region, sequentially judging similar textures of longitudinally distributed and transversely distributed continuous pixel points starting from the first pixel point, to obtain each pixel point having a similar texture to the first pixel point;
forming a first texture region from each pixel point having a similar texture to the first pixel point, and marking each pixel point in the first texture region as a first value;
taking the region outside the first texture region and inside the first limited region as an unexpanded region, and marking each pixel point in the unexpanded region as a second value;
and generating a third image according to each pixel point marked as a first value and each pixel point marked as a second value in the first image.
3. The binocular vision stereo matching method of claim 2, wherein sequentially judging similar textures of longitudinally distributed and transversely distributed continuous pixel points starting from the first pixel point, within the range of the first limited region, to obtain each pixel point having a similar texture to the first pixel point, comprises:
sequentially expanding continuous pixel points distributed along the longitudinal direction from the first pixel point, and, when the gray difference value between a pixel point and the last expanded pixel point is judged to be smaller than a preset threshold value, determining that the pixel point has a similar texture to the first pixel point, until the next pixel point in the longitudinal direction is determined to be a pixel point of non-similar texture;
and sequentially expanding continuous pixel points distributed along the transverse direction from each pixel point of similar texture obtained in the longitudinal direction, and, when the gray difference value between a pixel point and the last expanded pixel point is judged to be smaller than the threshold value, determining that the pixel point has a similar texture to the first pixel point, until the next pixel point in the transverse direction is determined to be a pixel point of non-similar texture.
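As an illustrative sketch only, and not part of the claims, the vertical-then-horizontal expansion recited in claim 3 might be implemented as follows; the function name expand_similar_texture and the parameters thresh, L and R are hypothetical.

```python
import numpy as np

def expand_similar_texture(img, y0, x0, thresh, L, R):
    """Grow a texture region from seed (y0, x0) inside a (2L+1) x (2R+1) limited region.

    First walk up and down from the seed, admitting each next pixel whose
    gray difference from the last admitted pixel is below thresh; then walk
    left and right from every admitted vertical pixel in the same way.
    Returns a boolean mask of the texture region.
    """
    h, w = img.shape
    gray = img.astype(np.int32)
    mask = np.zeros((h, w), dtype=bool)
    mask[y0, x0] = True

    # Longitudinal expansion from the seed pixel point.
    column = [y0]
    for step in (1, -1):
        y = y0
        while 0 <= y + step < h and abs(y + step - y0) <= L:
            if abs(gray[y + step, x0] - gray[y, x0]) >= thresh:
                break  # next pixel is non-similar: stop in this direction
            y += step
            mask[y, x0] = True
            column.append(y)

    # Transverse expansion from every longitudinally admitted pixel point.
    for y in column:
        for step in (1, -1):
            x = x0
            while 0 <= x + step < w and abs(x + step - x0) <= R:
                if abs(gray[y, x + step] - gray[y, x]) >= thresh:
                    break
                x += step
                mask[y, x] = True
    return mask
```

Per claims 2 and 5, a third image can then be formed by keeping the original gray values inside the mask and setting the unexpanded area to full black, e.g. third = np.where(mask, img, 0).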
4. The binocular vision stereo matching method of claim 2, wherein setting a second pixel point in the second image, expanding similar textures from the second pixel point to the other pixel points distributed around it to form a second texture region, and obtaining a fourth image by using the second texture region and an unexpanded region outside the second texture region comprises:
correspondingly setting a second pixel point in the second image according to the position of the first pixel point in the first image, and establishing a second limited region with the second pixel point as its center point;
in the range of the second limited region, sequentially judging similar textures of longitudinally distributed and transversely distributed continuous pixel points starting from the second pixel point, to obtain each pixel point having a similar texture to the second pixel point;
forming a second texture region from each pixel point having a similar texture to the second pixel point, and marking each pixel point in the second texture region as the first value;
taking the region outside the second texture region and inside the second limited region as an unexpanded region, and marking each pixel point in the unexpanded region as a second value;
and generating a fourth image according to the pixel points marked as the first value and the pixel points marked as the second value in the second image.
5. The binocular vision stereo matching method of claim 4, wherein in the third image and the fourth image, each pixel point marked as the first value has the same gray value as its original pixel point, and each pixel point marked as the second value has a gray value of full black.
6. The binocular vision stereo matching method of claim 4, further comprising, after acquiring the first image and the second image at two viewpoints and before setting the first pixel point in the first image:
taking each edge pixel point of the first image as a symmetry axis, and performing mirror image extension processing on the first image along the symmetry axes, so as to extend the first image to the periphery by a preset height and a preset width;
taking each edge pixel point of the second image as a symmetry axis, and performing mirror image extension processing on the second image along the symmetry axes, so as to extend the second image to the periphery by the preset height and the preset width;
wherein the preset height is greater than or equal to half the height of the first limited region, and the preset width is greater than or equal to half the width of the first limited region; and the second limited region has the same size as the first limited region.
7. The binocular vision stereo matching method of any one of claims 1 to 6, wherein the cost aggregation of any one pixel point in the third image or the fourth image according to a preset cost function and a plurality of preset disparity values to obtain an optimal disparity value of the pixel point from each of the disparity values comprises:
acquiring any one pixel point from the third image or the fourth image and setting the pixel point as a third pixel point;
performing cost aggregation on the third pixel points according to a preset cost function and a plurality of preset parallax values to obtain a cost aggregation function corresponding to each parallax value;
and obtaining the optimal parallax value of the third pixel point from each parallax value by using a cost aggregation function corresponding to each parallax value.
8. The binocular vision stereo matching method of claim 7, wherein the cost function is a cost function corresponding to color, gradient, rank or NCC; the disparity value is represented by d and has a value range of {0, 1, ..., d_max}, where d_max represents the maximum allowed value of said disparity value.
9. The binocular vision stereo matching method of claim 8, wherein the obtaining the optimal disparity value of the third pixel point from each disparity value by using the cost aggregation function corresponding to each disparity value comprises:
calculating the cost aggregation function corresponding to each disparity value, denoted C(y, x, d), and taking the disparity value that yields the minimum function value as the optimal disparity value, wherein y and x are respectively the vertical and horizontal coordinate values of the third pixel point.
10. A computer-readable storage medium characterized by comprising a program executable by a processor to implement the binocular vision stereo matching method of any one of claims 1 to 9.
CN202010380080.3A 2020-05-08 2020-05-08 Binocular vision stereo matching method based on regional feature extraction and storage medium Pending CN111738061A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010380080.3A CN111738061A (en) 2020-05-08 2020-05-08 Binocular vision stereo matching method based on regional feature extraction and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010380080.3A CN111738061A (en) 2020-05-08 2020-05-08 Binocular vision stereo matching method based on regional feature extraction and storage medium

Publications (1)

Publication Number Publication Date
CN111738061A true CN111738061A (en) 2020-10-02

Family

ID=72647101

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010380080.3A Pending CN111738061A (en) 2020-05-08 2020-05-08 Binocular vision stereo matching method based on regional feature extraction and storage medium

Country Status (1)

Country Link
CN (1) CN111738061A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113409364A (en) * 2021-06-01 2021-09-17 诡谷子人工智能科技(深圳)有限公司 Stereo matching algorithm, system and computer medium based on pixel similarity
CN113409364B (en) * 2021-06-01 2024-03-29 诡谷子人工智能科技(深圳)有限公司 Three-dimensional matching algorithm, system and computer medium based on pixel similarity
CN113627429A (en) * 2021-08-12 2021-11-09 深圳市爱培科技术股份有限公司 Low-texture region identification method and device of image, storage medium and equipment

Similar Documents

Publication Publication Date Title
US11954813B2 (en) Three-dimensional scene constructing method, apparatus and system, and storage medium
Huang et al. Indoor depth completion with boundary consistency and self-attention
CN110135455B (en) Image matching method, device and computer readable storage medium
CN111598993B (en) Three-dimensional data reconstruction method and device based on multi-view imaging technology
JP5178875B2 (en) Image processing method for corresponding point search
CN107578376B (en) Image splicing method based on feature point clustering four-way division and local transformation matrix
CN110567441B (en) Particle filter-based positioning method, positioning device, mapping and positioning method
CN113256699B (en) Image processing method, image processing device, computer equipment and storage medium
CN110443228B (en) Pedestrian matching method and device, electronic equipment and storage medium
CN115035235A (en) Three-dimensional reconstruction method and device
CN111738061A (en) Binocular vision stereo matching method based on regional feature extraction and storage medium
CN114332125A (en) Point cloud reconstruction method and device, electronic equipment and storage medium
CN114419568A (en) Multi-view pedestrian detection method based on feature fusion
CN109961092B (en) Binocular vision stereo matching method and system based on parallax anchor point
CN111696147B (en) Depth estimation method based on improved YOLOv3 model
CN109978928B (en) Binocular vision stereo matching method and system based on weighted voting
CN116958393A (en) Incremental image rendering method and device
CN116051736A (en) Three-dimensional reconstruction method, device, edge equipment and storage medium
CN113808185B (en) Image depth recovery method, electronic device and storage medium
CN116092035A (en) Lane line detection method, lane line detection device, computer equipment and storage medium
CN109544611B (en) Binocular vision stereo matching method and system based on bit characteristics
CN115272450A (en) Target positioning method based on panoramic segmentation
JP2018010359A (en) Information processor, information processing method, and program
Zhuo et al. Stereo matching approach using zooming images
CN103679722B (en) A kind of Point matching method of normalized crosscorrelation based on irregular multiwindow

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination