CN116580072A - Cost aggregation calculation method and device for stereo matching - Google Patents

Cost aggregation calculation method and device for stereo matching

Info

Publication number: CN116580072A
Application number: CN202310365491.9A
Authority: CN (China)
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 刘钢, 赵昀, 钱刃, 李建华, 杨文帮, 丘文峰, 赵勇
Applicant/Assignee: Suzhou Xuzhigu Intelligent Technology Co., Ltd.

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 — Image analysis
    • G06T 7/30 — Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T 7/33 — Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/04 — Architecture, e.g. interconnection topology
    • G06N 3/045 — Combinations of networks
    • G06T 2207/00 — Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 — Special algorithmic details
    • G06T 2207/20084 — Artificial neural networks [ANN]


Abstract

The application discloses a cost aggregation calculation method and device for stereo matching, in which the second and third cost volume structures are obtained with a fully connected neural network. Replacing three-dimensional convolution with a fully connected network preserves the differences between individual pixels of the left and right images, addressing the current weakness of neural networks in matching thin-structure and repetitive-texture regions.

Description

Cost aggregation calculation method and device for stereo matching
Technical Field
The application relates to the technical field of machine vision stereo matching, in particular to a cost aggregation calculation method and device for stereo matching.
Background
Stereo matching is also known as disparity estimation or binocular depth estimation. The input is a pair of images (left image I_l and right image I_r); the output is a disparity map d holding the disparity value of each pixel in a reference image (typically the left image). Referring to fig. 1, a schematic view of disparity map acquisition: the disparity of a point in a three-dimensional scene is the pixel-level difference between its positions in the left and right images. Once the disparity map d is acquired, a depth map can be computed from the depth formula:
z=(b×f)/d;
where f is the focal length of the camera lens, b is the baseline distance between the centers of the two cameras, d is the predicted disparity of the corresponding pixel in the left and right images, and z is the resulting depth value. How to predict disparity accurately and quickly under limited computing resources from a given pair of rectified stereo images is the core problem of stereo matching.
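As a hedged illustration of the depth formula above (the focal length and baseline values below are invented for the example, not taken from the application), the conversion can be sketched as:

```python
import numpy as np

def disparity_to_depth(d, f, b, eps=1e-6):
    """Depth from disparity via z = (b * f) / d.

    d: disparity map in pixels; f: focal length in pixels;
    b: baseline (the returned depth shares its unit). eps guards d == 0."""
    d = np.asarray(d, dtype=np.float64)
    return (b * f) / np.maximum(d, eps)

# Illustrative values: 800 px focal length, 0.1 m baseline.
# A 10 px disparity then maps to (0.1 * 800) / 10 = 8 m depth.
disp = np.array([[10.0, 20.0],
                 [40.0, 80.0]])
depth = disparity_to_depth(disp, f=800.0, b=0.1)
```

Note the inverse relationship: doubling the disparity halves the depth, which is why small disparity errors on distant objects translate into large depth errors.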
Disclosure of Invention
The application mainly addresses the technical problem of the neural network's poor ability to match the left and right images in fine-texture and repetitive-texture regions.
According to a first aspect, in one embodiment, a cost aggregation method for stereo matching is provided, including an aggregation calculation, where the aggregation calculation includes:
acquiring a second cost volume structure; the obtaining process of the second cost volume structure comprises the following steps:
the cost volume structure obtained in the stereo matching process is used as a first cost volume structure, and the first cost volume structure is divided into 4n cost blocks with the same size according to a preset first block dividing mode; each cost block comprises at least 2m cost points, and each cost point corresponds to one image pixel point; wherein n and m are integers greater than 0;
flattening 2m cost points of each cost block into a one-dimensional first block sequence according to a preset first flattening sequence;
inputting each first block sequence into a preset first full-connection aggregation neural network model according to a preset first input sequence, and obtaining a first feature vector sequence output by the first full-connection aggregation neural network model; each first block sequence corresponds to one first eigenvector sequence;
recovering the first characteristic vector sequence into cost blocks according to the first flattening sequence, and reconstructing a cost volume structure according to the first partitioning mode to obtain a second cost volume structure;
acquiring a third cost volume structure; the obtaining process of the third cost volume structure comprises the following steps:
dividing the first cost volume structure into 4p cost blocks with the same size according to a preset second partitioning mode, wherein each cost block comprises at least 2q cost points; wherein p and q are integers greater than 0; the first block division mode and the second block division mode are different;
flattening the 2q cost points of each second cost block into a one-dimensional second block sequence according to a preset second flattening sequence;
inputting each second block sequence into a preset second full-connection aggregation neural network model according to a preset second input sequence, and obtaining a second feature vector sequence output by the second full-connection aggregation neural network model; each of the second block sequences corresponds to one of the second feature vector sequences;
recovering the second feature vector sequence into cost blocks according to the second flattening sequence, and reconstructing a cost volume structure according to the second block division mode to obtain a third cost volume structure;
acquiring an aggregation calculation result; the acquisition process of the aggregation calculation result comprises the following steps:
and fusing the first cost volume structure, the second cost volume structure and the third cost volume structure, and outputting the cost volume structure obtained after fusion as a result of cost aggregation calculation.
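The steps above can be sketched end to end in NumPy. This is a minimal, hedged illustration only: the block size, the ReLU activation, the random (untrained) weights, the cyclic-shift offset, and the averaging fusion are all assumptions for the example; the application specifies only that the models are fully connected, share weights across blocks, and omit bias terms.

```python
import numpy as np

def partition(cost, block, offset=(0, 0, 0)):
    """Split a (D, H, W) cost volume into non-overlapping blocks of shape
    `block`, starting from `offset` (np.roll keeps the volume size fixed),
    and flatten each block into a one-dimensional sequence."""
    D, H, W = cost.shape
    dc, hc, wc = block
    shifted = np.roll(cost, shift=[-o for o in offset], axis=(0, 1, 2))
    blocks = shifted.reshape(D // dc, dc, H // hc, hc, W // wc, wc)
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5).reshape(-1, dc * hc * wc)
    return blocks  # one flattened block sequence per row

def restore(blocks, shape, block, offset=(0, 0, 0)):
    """Inverse of `partition`: reassemble flattened block sequences
    into a cost volume of the given shape."""
    D, H, W = shape
    dc, hc, wc = block
    vol = blocks.reshape(D // dc, H // hc, W // wc, dc, hc, wc)
    vol = vol.transpose(0, 3, 1, 4, 2, 5).reshape(D, H, W)
    return np.roll(vol, shift=list(offset), axis=(0, 1, 2))

def aggregate(cost, block=(2, 2, 2), seed=0):
    """One aggregation calculation: two partitions (a normal one and one
    shifted by half a block), each run through a bias-free two-layer MLP
    whose weights are shared across blocks, then fused with the input
    volume by simple averaging (the fusion operator is assumed here)."""
    rng = np.random.default_rng(seed)
    n = int(np.prod(block))
    W1, W2 = rng.normal(size=(n, n)), rng.normal(size=(n, n))
    mlp = lambda x: np.maximum(x @ W1, 0) @ W2  # no bias terms, per the text
    off = tuple(b // 2 for b in block)
    second = restore(mlp(partition(cost, block)), cost.shape, block)
    third = restore(mlp(partition(cost, block, off)), cost.shape, block, off)
    return (cost + second + third) / 3.0

cost = np.random.default_rng(1).random((4, 8, 8))  # (D, H, W) cost volume
out = aggregate(cost)
```

Partition and restore are exact inverses, so the sketch preserves per-pixel identity: unlike a shared convolution kernel, each position inside a block has its own column in W1/W2.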
In an embodiment, the first fully connected aggregation neural network model and the second fully connected aggregation neural network model each include at least two fully connected layers.
In an embodiment, the first fully connected aggregation neural network model and the second fully connected aggregation neural network model are obtained by training with the calibrated SceneFlow data, the KITTI2015 data and/or the KITTI016 data as training sets.
In an embodiment, the model parameters of the first fully connected aggregation neural network model and the second fully connected aggregation neural network model are the same.
In an embodiment, the first fully connected aggregation neural network model and the second fully connected aggregation neural network model do not set bias terms for their fully connected layers.
In an embodiment, the first partitioning mode and the second partitioning mode differ as follows:
the second partitioning mode offsets each cost block of the first partitioning mode by a preset offset; the offset is not greater than the cost block size.
In one embodiment, the cost aggregation method further includes:
completing at least two aggregation calculations;
the input of the first aggregation calculation is the cost volume structure obtained in the stereo matching process; the input of each subsequent aggregation calculation is the output of the previous one, and the cost volume structure output by the last aggregation calculation is output as the result of cost aggregation.
In an embodiment, the cost aggregation method further includes:
and obtaining a parallax image according to the cost volume structure obtained by the last aggregation calculation.
According to a second aspect, an embodiment provides a computer readable storage medium having stored thereon a program executable by a processor to implement a cost aggregation method as described above.
According to a third aspect, in one embodiment, a cost aggregation device for stereo matching is provided, and the cost aggregation device is used for performing cost aggregation calculation by applying the cost aggregation method as described above, where the cost aggregation device includes a fully-connected aggregation module;
the full-connection aggregation module comprises a first generation volume structure acquisition unit, a second cost volume structure acquisition unit, a third cost volume structure acquisition unit and an aggregation calculation result acquisition unit;
the first generation volume structure acquisition unit is used for acquiring a cost volume structure in the stereo matching process and outputting the cost volume structure as a first generation volume structure to the second cost volume structure acquisition unit, the third cost volume structure acquisition unit and the aggregation calculation result acquisition unit;
the second cost volume structure acquisition unit is used for acquiring a second cost volume structure; the obtaining process of the second cost volume structure comprises the following steps:
dividing the first cost volume structure into 4n cost blocks with the same size according to a preset first block dividing mode; each cost block comprises at least 2m cost points, and each cost point corresponds to one image pixel point; wherein n and m are integers greater than 0;
flattening 2m cost points of each cost block into a one-dimensional first block sequence according to a preset first flattening sequence;
inputting each first block sequence into a preset first full-connection aggregation neural network model according to a preset first input sequence, and obtaining a first feature vector sequence output by the first full-connection aggregation neural network model; each first block sequence corresponds to one first eigenvector sequence;
recovering the first characteristic vector sequence into cost blocks according to the first flattening sequence, and reconstructing a cost volume structure according to the first partitioning mode to obtain a second cost volume structure;
the third cost volume structure acquisition unit is used for acquiring a third cost volume structure; the obtaining process of the third cost volume structure comprises the following steps:
dividing the first cost volume structure into 4p cost blocks with the same size according to a preset second partitioning mode, wherein each cost block comprises at least 2q cost points; wherein p and q are integers greater than 0; the first block division mode and the second block division mode are different;
flattening the 2q cost points of each second cost block into a one-dimensional second block sequence according to a preset second flattening sequence;
inputting each second block sequence into a preset second full-connection aggregation neural network model according to a preset second input sequence, and obtaining a second feature vector sequence output by the second full-connection aggregation neural network model; each of the second block sequences corresponds to one of the second feature vector sequences;
recovering the second feature vector sequence into cost blocks according to the second flattening sequence, and reconstructing a cost volume structure according to the second block division mode to obtain a third cost volume structure;
the aggregation calculation result acquisition unit is used for acquiring an aggregation calculation result; the acquisition process of the aggregation calculation result comprises the following steps:
and fusing the first cost volume structure, the second cost volume structure and the third cost volume structure, and outputting the cost volume structure obtained after fusion as a result of cost aggregation calculation.
According to the cost aggregation method of the embodiment, a fully connected neural network replaces the three-dimensional convolution calculation, so that the differences between individual pixels of the left and right images are not lost, addressing the current weakness of neural networks in matching thin-structure and repetitive-texture regions.
Drawings
Fig. 1 is a parallax map acquisition schematic diagram;
FIG. 2 is a schematic workflow diagram of a stereo matching system in one embodiment;
FIG. 3 is a flow chart of a stereo matching algorithm;
FIG. 4 is a schematic view of a disparity map acquisition in one embodiment;
FIG. 5 is a flow diagram of aggregate computation in one embodiment;
FIG. 6 is a schematic diagram of an architecture of a fully connected aggregated neural network model in one embodiment;
FIG. 7 is a schematic diagram of a framework of a cost aggregation apparatus in one embodiment;
fig. 8 is a graph of cost blocks obtained in two block manners in one embodiment.
Detailed Description
The application will be described in further detail below with reference to the drawings by means of specific embodiments, wherein like elements in different embodiments share like reference numbers. In the following embodiments, numerous specific details are set forth in order to provide a better understanding of the present application. However, one skilled in the art will readily recognize that some of the features may be omitted, or replaced by other elements, materials, or methods in different situations. In some instances, operations related to the application are not shown or described in the specification, in order to avoid obscuring its core; a detailed description of such operations may moreover be unnecessary for persons skilled in the art, who can fully understand them from the description and their general knowledge.
Furthermore, the described features, operations, or characteristics of the description may be combined in any suitable manner in various embodiments. Also, various steps or acts in the method descriptions may be interchanged or modified in a manner apparent to those of ordinary skill in the art. Thus, the various orders in the description and drawings are for clarity of description of only certain embodiments, and are not meant to be required orders unless otherwise indicated.
The numbering of the components itself, e.g. "first", "second", etc., is used herein merely to distinguish between the described objects and does not have any sequential or technical meaning. The term "coupled" as used herein includes both direct and indirect coupling (coupling), unless otherwise indicated.
Stereo matching is an important branch of computer vision and a key technology in many application scenarios such as autonomous driving and robot navigation, where it is a useful complement to lidar. Please refer to fig. 2, a schematic workflow of a stereo matching system in an embodiment: left and right images are acquired by two image acquisition devices at the same moment, calibrated according to the relative positions and acquisition parameters of the two devices, rectified along epipolar lines according to the calibration values, and finally fed to a stereo matching algorithm to obtain a disparity map. The application optimizes the stereo matching algorithm; please refer to fig. 3, a flow diagram of the stereo matching algorithm. Traditional stereo matching algorithms adopt hand-crafted matching costs and regularization strategies; stereo matching methods based on deep learning currently achieve excellent performance, but remain deficient in ill-posed regions such as fine-texture and repetitive-texture regions. The reason can be traced to the strong inductive bias of three-dimensional convolution (i.e., locality and weight sharing): convolution-based networks can regularize cost volumes with few parameters in terms of computation and memory consumption, but the way all output neurons share the same convolution parameters loses the differences between pixels in thin-structure and repetitive-texture regions, which limits the neural network's ability to match left-right image pairs in these ill-posed regions.
In an embodiment of the application, a fully connected neural network (MLP) is applied to stereo matching, replacing three-dimensional convolution for cost aggregation. The cost volume structure is first divided into blocks of fixed size (dc, hc, wc); each block is then flattened into a sequence in a fixed order, and the flattened block sequences are embedded into the channel dimension so that an MLP composed of multiple fully connected layers can fuse inter-channel and intra-channel features, i.e., heterogeneous features are input into the MLP. Finally, the fused feature vectors are reconstructed into a cost volume. Replacing three-dimensional convolution with a fully connected network preserves the differences between individual pixels of the left and right images, addressing the current weakness in matching thin-structure and repetitive-texture regions.
Embodiment one:
referring to fig. 4, which is a schematic diagram of obtaining a disparity map in an embodiment, firstly, feature extraction is performed on left and right images through a feature extraction network with shared weights respectively to obtain a cost volume structure; then, carrying out cost aggregation calculation on the cost volume structure to obtain a primary parallax image; and performing parallax optimization on the primary parallax image according to the left image, and finally obtaining an optimized parallax image after optimization. The cost volume structure is a three-dimensional tensor and represents the cost value of each pixel point in the parallax map, wherein the first dimension represents the number of rows, the second dimension represents the number of columns, and the third dimension represents different parallax values. In the prior art, a convolution operation is performed on a cost volume structure, for example, a two-dimensional tensor convolution kernel is subjected to sliding window operation on the cost volume structure, the cost value in a window is calculated, and the cost value is assigned to a pixel point at a corresponding position in a parallax map, so that cost aggregation calculation is completed. The result of the cost aggregation calculation is to obtain a primary disparity map. In addition, the resolution and computational efficiency of the final disparity map can be controlled by varying the size of the convolution kernel and the step size of the convolution operation.
It is important to note that the fundamental purpose of cost aggregation is to make cost values accurately reflect the correlation between pixels. Matching cost computation considers only local information, deriving each cost value from the pixels within a window of a certain size around the two candidate pixels; this is easily affected by image noise, and in weak-texture or repetitive-texture regions the cost value may well fail to reflect the true correlation — concretely, the cost of the true corresponding point is then not the minimum. Cost aggregation establishes connections between adjacent pixels and optimizes the cost matrix under criteria such as that adjacent pixels should have continuous disparity values. This optimization is often global: the new cost of each pixel at a given disparity is recomputed from the costs of adjacent pixels at the same or nearby disparities. In effect, cost aggregation resembles a disparity propagation step: regions with a high signal-to-noise ratio match well and their initial costs reflect the correlation accurately, so the optimal disparity is found there first and propagated into regions with a low signal-to-noise ratio and poor matching, until the cost values of the whole image accurately reflect the true correlations. Common cost aggregation methods include the scanline method, dynamic programming, and the path aggregation of the SGM algorithm.
In an embodiment of the present application, the cost aggregation method includes an aggregation calculation, please refer to fig. 5, which is a schematic flow chart of the aggregation calculation in an embodiment, where the aggregation calculation includes:
step 101, obtaining a second cost volume structure.
Referring to fig. 6, an architecture diagram of a fully connected aggregated neural network model in an embodiment, the obtaining process of the second cost volume structure includes:
firstly, taking a cost volume structure obtained in a stereo matching process as a first cost volume structure, dividing the first cost volume structure into 4n cost blocks with the same size according to a preset first block dividing mode, wherein each cost block comprises at least 2m cost points, and each cost point corresponds to one image pixel point. Wherein n and m are integers greater than 0.
Then, 2m cost points of each cost block are flattened into a one-dimensional first block sequence according to a preset first flattening sequence.
And inputting each first block sequence into a preset first full-connection aggregation neural network model according to a preset first input sequence, and obtaining a first feature vector sequence output by the first full-connection aggregation neural network model, wherein each first block sequence corresponds to one first feature vector sequence.
And finally, recovering the first characteristic vector sequence into cost blocks according to the first flattening sequence, and reconstructing the cost volume structure according to the first partitioning mode to obtain a second cost volume structure.
Step 102, obtaining a third cost volume structure.
The obtaining process of the third cost volume structure comprises the following steps:
first, the first price volume structure is divided into 4p cost blocks with the same size according to a preset second block dividing mode, and each cost block comprises at least 2q cost points. Wherein p and q are integers greater than 0, and the first block division mode and the second block division mode are different.
Then, 2p cost points of each second cost block are flattened into a one-dimensional second block sequence according to a second flattening sequence.
And inputting each second block sequence into a preset second full-connection aggregation neural network model according to the second input sequence, and obtaining a second feature vector sequence output by the second full-connection aggregation neural network model. Each second block sequence corresponds to a second feature vector sequence.
And finally, the second feature vector sequence is restored into cost blocks according to the second flattening sequence, and the cost volume structure is reconstructed according to the second partitioning mode to obtain the third cost volume structure.
And step 103, acquiring an aggregation calculation result.
The acquisition process of the aggregation calculation result comprises: fusing the first, second, and third cost volume structures, and outputting the fused cost volume structure as the result of the cost aggregation calculation.
In one embodiment, the aggregate computation further comprises:
step 104, at least two aggregation calculations are completed.
Aggregation calculation is performed on the first cost volume structure several times in sequence: the input of the first aggregation calculation is the cost volume structure acquired in the stereo matching process, the input of each subsequent aggregation calculation is the output of the previous one, and the cost volume structure output by the last aggregation calculation is output as the result of cost aggregation.
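The chaining described above can be sketched as follows. Only the cascade structure is the point here: `aggregate_once` below is a deliberately simple stand-in (a three-axis neighbour average), not the application's fully connected aggregation pass.

```python
import numpy as np

def aggregate_once(cost):
    """Stand-in for one aggregation calculation. The application's pass
    uses fully connected aggregation models; a neighbour average along
    each axis is substituted purely to illustrate the chaining."""
    out = cost.copy()
    for ax in range(cost.ndim):
        out = (np.roll(out, 1, ax) + out + np.roll(out, -1, ax)) / 3.0
    return out

def cascaded_aggregation(cost, passes=3):
    """Each pass consumes the previous pass's output; the final output
    is returned as the cost aggregation result."""
    for _ in range(passes):
        cost = aggregate_once(cost)
    return cost

result = cascaded_aggregation(np.ones((4, 8, 8)), passes=2)
```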
Step 105, obtaining a disparity map.
And obtaining a parallax image according to the cost volume structure obtained by the last aggregation calculation.
Referring to fig. 7, a schematic diagram of a cost aggregation apparatus in an embodiment: in an embodiment of the present application, a cost aggregation apparatus is further disclosed, configured to perform cost aggregation calculation by applying the cost aggregation method as described above. The cost aggregation apparatus 100 includes a full-connection aggregation module 1, which comprises a first cost volume structure acquisition unit 10, a second cost volume structure acquisition unit 20, a third cost volume structure acquisition unit 30, and an aggregation calculation result acquisition unit 40.
The first cost volume structure acquisition unit 10 is configured to acquire a cost volume structure in the stereo matching process and output it as the first cost volume structure to the second cost volume structure acquisition unit 20, the third cost volume structure acquisition unit 30, and the aggregation calculation result acquisition unit 40.
The second cost volume structure obtaining unit 20 is configured to obtain a second cost volume structure, where the obtaining process of the second cost volume structure includes:
firstly, dividing a first price volume structure into 4n cost blocks with the same size according to a preset first block dividing mode, wherein each cost block comprises at least 2m cost points, and each cost point corresponds to one image pixel point. Wherein n and m are integers greater than 0.
Then, 2m cost points of each cost block are flattened into a one-dimensional first block sequence according to a preset first flattening sequence.
And inputting each first block sequence into a preset first full-connection aggregation neural network model according to a preset first input sequence, and obtaining a first feature vector sequence output by the first full-connection aggregation neural network model. Each first block sequence corresponds to a first feature vector sequence.
And finally, recovering the first characteristic vector sequence into cost blocks according to the first flattening sequence, and reconstructing the cost volume structure according to the first partitioning mode to obtain a second cost volume structure.
The third cost volume structure obtaining unit 30 is configured to obtain a third cost volume structure, where the obtaining process of the third cost volume structure includes:
first, the first price volume structure is divided into 4p cost blocks with the same size according to a preset second block dividing mode, and each cost block comprises at least 2q cost points. Wherein p and q are integers greater than 0, and the first block division mode and the second block division mode are different.
Then, 2p cost points of each second cost block are flattened into a one-dimensional second block sequence according to the second flattening sequence.
And inputting each second block sequence into a preset second full-connection aggregation neural network model according to a second input sequence, and obtaining a second feature vector sequence output by the second full-connection aggregation neural network model, wherein each second block sequence corresponds to one second feature vector sequence.
And finally, the second feature vector sequence is restored into cost blocks according to the second flattening sequence, and the cost volume structure is reconstructed according to the second partitioning mode to obtain the third cost volume structure.
The aggregation calculation result acquiring unit 40 is configured to acquire an aggregation calculation result, whose acquisition process includes: fusing the first, second, and third cost volume structures, and outputting the fused cost volume structure as the result of the cost aggregation calculation.
As shown in fig. 4, in an embodiment of the present application, three cascaded full-connection aggregation modules perform cost aggregation calculation on the cost volume structure. Each full-connection aggregation module includes two fully connected aggregation neural network models (a first and a second), and the cost blocks obtained with the two different partitioning modes (the first and the second partitioning mode) are respectively input into the two models. The first-stage and second-stage disparity maps serve the training process of the full-connection aggregation modules' neural network models.
In one embodiment, the first fully connected aggregation neural network model and the second fully connected aggregation neural network model each include at least two fully connected layers. In an embodiment, the two models are obtained by training with the calibrated SceneFlow data, the KITTI2015 data and/or the KITTI016 data as training sets. In an embodiment, the model parameters of the two models are the same. In an embodiment, neither model sets bias terms for its fully connected layers.
In an embodiment of the present application, the second partitioning way shifts each cost block of the first partitioning way by a preset offset, where the offset is not greater than the cost block size.
Referring to FIG. 8, which shows the cost blocks obtained in two different partitioning ways in one embodiment, the two partitioning ways are used to establish connections between blocks. The cost volume is partitioned in two ways: the first partitions blocks of size (dc, hc, wc) starting from one vertex of the cost volume, abbreviated as the normal path; the other starts from the point offset by (dc/2, hc/2, wc/2), abbreviated as the offset path. This partitioning approach enables the proposed module to flexibly aggregate costs at different resolutions. The weights of the first fully connected aggregation neural network model and the second fully connected aggregation neural network model are shared between cost blocks, which reduces the number of parameters of the MLP-based structure. Unlike convolution, the weights of the output tokens within a block are independent of each other. These strategies give the fully connected aggregation module local connections and global weight sharing, with weights independent within local spatial regions, so its inductive bias is weaker than that of convolution. Therefore, the pixels of repeated-texture areas and fine-texture areas can be processed independently, their difference information is preserved, and accurate matching is achieved.
The flow of the cost aggregation method disclosed by the application is described below through a specific embodiment.
A cost volume C of shape (F, D, H, W) is set, where F represents the feature (channel) dimension obtained from the image pair processed by the network, D represents the disparity dimension, H represents the image height, and W represents the image width.
The cost volume C is first divided into non-overlapping blocks of fixed size (dc, hc, wc). Blocks of the same color indicate that they are located at the same position in different channels.
The blocks are then flattened pixel by pixel, each being reshaped into a one-dimensional embedding of length N, where N = dc×hc×wc.
These one-dimensional embeddings are then merged into the channel dimension, instead of creating a new separate dimension.
Therefore, the cost volume
C ∈ R^(F×D×H×W)
is reshaped into the embedding
E ∈ R^(L×(N·F))
where L is the number of blocks, given by:
L = (D/dc) × (H/hc) × (W/wc).
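This partition-and-flatten reshape can be sketched in numpy as follows (a minimal illustration under the stated shapes; the function name and the choice of numpy are assumptions, not part of the disclosure):

```python
import numpy as np

def cost_volume_to_embeddings(C, dc, hc, wc):
    """Partition a cost volume C of shape (F, D, H, W) into non-overlapping
    (dc, hc, wc) blocks and flatten each block into a 1-D embedding.
    Returns an array of shape (L, N*F), with L = (D/dc)*(H/hc)*(W/wc)
    and N = dc*hc*wc."""
    F, D, H, W = C.shape
    assert D % dc == 0 and H % hc == 0 and W % wc == 0
    # split each spatial axis into a (block index, within-block index) pair
    x = C.reshape(F, D // dc, dc, H // hc, hc, W // wc, wc)
    # bring the block indices forward and gather each block's pixels together,
    # merging the channel dimension F into the embedding instead of keeping it separate
    x = x.transpose(1, 3, 5, 2, 4, 6, 0)        # (Db, Hb, Wb, dc, hc, wc, F)
    L = (D // dc) * (H // hc) * (W // wc)
    return x.reshape(L, dc * hc * wc * F)       # (L, N*F)
```

For example, a (2, 4, 4, 4) cost volume partitioned into 2×2×2 blocks yields L = 8 embeddings of length N·F = 16.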
thus, an MLP layer comprising multiple fully connected layers (typically two layers) is used to fuse heterogeneous features. The matching cost of the network may be adjusted pixel by pixel. Finally, the spatial relationship between pixels is restored synchronously by reconstructing the cost volume by reconstructing the output embedding of the MLP layer.
In one embodiment, the designed module can be formulated as follows:

C → E, Ê = GELU(LN(E)·W1)·W2, Ê → C′

wherein → represents a reshaping operation, the parameters W1 ∈ R^(c×cf) and W2 ∈ R^(cf×c) are fully connected layers, the parameter f denotes the expansion factor, and LN and GELU refer to layer normalization and the Gaussian error linear unit, respectively. In one embodiment, the bias terms of the model's fully connected layers are omitted to simplify the model.
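A per-token MLP of this form can be sketched in numpy (a minimal illustration assuming the tanh approximation of GELU and bias-free layers; all function names are illustrative):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # normalize each embedding (row) over its channel dimension; no affine parameters
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def gelu(x):
    # tanh approximation of the Gaussian error linear unit
    return 0.5 * x * (1.0 + np.tanh(0.7978845608 * (x + 0.044715 * x ** 3)))

def mlp_module(E, W1, W2):
    """Apply two bias-free fully connected layers to each embedding of
    E (shape (L, c)); W1 has shape (c, c*f), W2 has shape (c*f, c)."""
    return gelu(layer_norm(E) @ W1) @ W2
```

The same W1 and W2 are shared across all L embeddings, which is what keeps the parameter count independent of the cost-volume size.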
Because the cost volume is divided in a non-overlapping way, the blocks are independent and information gaps exist between them. This lack of information interaction limits the modeling capability of the modules. To bridge the gap and establish information interaction between blocks, we partition the cost volume in two ways simultaneously: the first partitions blocks of size (dc, hc, wc) starting from one vertex, called the normal path in this embodiment; the other starts from the normal path's start point offset by (dc/2, hc/2, wc/2), called the offset path in this embodiment. To reduce the number of parameters, the normal path and the offset path share the same weights. Finally, the cost volumes calculated by the two paths are added directly to the skip-connection path. Each pixel in the cost volume therefore has a local receptive field formed by two partially overlapping blocks.
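The two-path scheme with shared weights and a skip connection can be sketched end to end in numpy (an illustrative simplification: layer normalization is omitted for brevity, and np.roll wraps at the volume borders, whereas the disclosed offset path only shifts the partition start point; all names and shapes are assumptions):

```python
import numpy as np

def block_mlp(vol, dc, hc, wc, W1, W2):
    """Apply one shared, bias-free, two-layer MLP to every non-overlapping
    (dc, hc, wc) block of a cost volume `vol` of shape (F, D, H, W)."""
    F, D, H, W = vol.shape
    x = vol.reshape(F, D // dc, dc, H // hc, hc, W // wc, wc)
    x = x.transpose(1, 3, 5, 2, 4, 6, 0).reshape(-1, dc * hc * wc * F)  # (L, N*F)
    gelu = lambda t: 0.5 * t * (1.0 + np.tanh(0.7978845608 * (t + 0.044715 * t ** 3)))
    y = gelu(x @ W1) @ W2                      # two fully connected layers, no bias
    y = y.reshape(D // dc, H // hc, W // wc, dc, hc, wc, F)
    return y.transpose(6, 0, 3, 1, 4, 2, 5).reshape(F, D, H, W)

def two_path_aggregate(C, dc, hc, wc, W1, W2):
    # normal path: partition starting from the volume vertex
    normal = block_mlp(C, dc, hc, wc, W1, W2)
    # offset path: shift by half a block, reuse the SAME weights, shift back
    shifted = np.roll(C, (-(dc // 2), -(hc // 2), -(wc // 2)), axis=(1, 2, 3))
    offset = np.roll(block_mlp(shifted, dc, hc, wc, W1, W2),
                     (dc // 2, hc // 2, wc // 2), axis=(1, 2, 3))
    return C + normal + offset                 # skip connection fuses the three volumes
```

Each pixel thus receives a local receptive field formed by the two partially overlapping blocks that contain it, while both paths share one set of weights.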
According to the cost aggregation calculation method disclosed by the embodiment of the application, first, second and third cost volume structures are acquired from the cost volume structure obtained in the stereo matching process; the three are then fused, and the fused cost volume structure is output as the result of cost aggregation calculation. The second and third cost volume structures are obtained using a fully connected neural network technique. Replacing three-dimensional convolution with fully connected neural networks avoids losing the differences between pixels of the left and right images, and thus addresses the current technical deficiency of neural networks in matching thin-structure areas and repeated-texture areas.
Those skilled in the art will appreciate that all or part of the functions of the various methods in the above embodiments may be implemented by hardware or by a computer program. When all or part of the functions are implemented by means of a computer program, the program may be stored in a computer-readable storage medium, which may include read-only memory, random access memory, magnetic disk, optical disk, hard disk, etc.; the program is executed by a computer to realize the above functions. For example, the program is stored in the memory of a device, and all or part of the above functions are realized when the program in the memory is executed by a processor. In addition, the program may also be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a removable hard disk, and downloaded or copied into the memory of a local device (or used to update the version of the local device's system), so that the above functions are realized when the program in the memory is executed by a processor.
The foregoing description of the application has been presented for purposes of illustration and description, and is not intended to be limiting. Several simple deductions, modifications or substitutions may also be made by a person skilled in the art to which the application pertains, based on the idea of the application.

Claims (10)

1. A cost aggregation method for stereo matching, comprising an aggregation calculation, the aggregation calculation comprising:
acquiring a second cost volume structure; the obtaining process of the second cost volume structure comprises the following steps:
the cost volume structure obtained in the stereo matching process is used as a first cost volume structure, and the first cost volume structure is divided into 4n cost blocks with the same size according to a preset first block dividing mode; each cost block comprises at least 2m cost points, and each cost point corresponds to one image pixel point; wherein n and m are integers greater than 0;
flattening 2m cost points of each cost block into a one-dimensional first block sequence according to a preset first flattening sequence;
inputting each first block sequence into a preset first full-connection aggregation neural network model according to a preset first input sequence, and obtaining a first feature vector sequence output by the first full-connection aggregation neural network model; each first block sequence corresponds to one first feature vector sequence;
recovering the first feature vector sequence into cost blocks according to the first flattening sequence, and reconstructing a cost volume structure according to the first partitioning mode to obtain a second cost volume structure;
acquiring a third cost volume structure; the obtaining process of the third cost volume structure comprises the following steps:
dividing the first cost volume structure into 4p cost blocks with the same size according to a preset second partitioning mode, wherein each cost block comprises at least 2q cost points; wherein p and q are integers greater than 0; the first block division mode and the second block division mode are different;
flattening 2q cost points of each second cost block into a one-dimensional second block sequence according to a preset second flattening sequence;
inputting each second block sequence into a preset second full-connection aggregation neural network model according to a preset second input sequence, and obtaining a second feature vector sequence output by the second full-connection aggregation neural network model; each of the second block sequences corresponds to one of the second feature vector sequences;
recovering the second feature vector sequence into cost blocks according to the second flattening sequence, and reconstructing a cost volume structure according to the second block division mode to obtain a third cost volume structure;
acquiring an aggregation calculation result; the acquisition process of the aggregation calculation result comprises the following steps:
and fusing the first cost volume structure, the second cost volume structure and the third cost volume structure, and outputting the cost volume structure obtained after fusion as a result of cost aggregation calculation.
2. The cost aggregation method of claim 1, wherein the first fully connected aggregated neural network model and the second fully connected aggregated neural network model each comprise at least two fully connected layers.
3. The cost aggregation method of claim 2, wherein the first fully connected aggregation neural network model and the second fully connected aggregation neural network model are obtained by training with the calibrated Sceneflow data, the KITTI2015 data and/or the KITTI016 data as a training set.
4. The cost aggregation method of claim 2, wherein the model parameters of the first fully connected aggregation neural network model and the second fully connected aggregation neural network model are the same.
5. The cost aggregation method of claim 4, wherein the first fully connected aggregation neural network model and the second fully connected aggregation neural network model do not set bias terms for fully connected layers.
6. The cost aggregation method of claim 1, wherein the first partitioning scheme and the second partitioning scheme are different, comprising:
the second block division mode is to offset each cost block in the first block division mode according to a preset offset; the offset is not greater than the cost block.
7. The cost aggregation method of claim 1, further comprising:
completing at least two aggregation calculations;
the input of the first aggregation calculation is a cost volume structure obtained in the stereo matching process; the input of the next aggregation calculation is the output of the last aggregation calculation, and the cost volume structure output by the last aggregation calculation is output as the result of cost aggregation.
8. The cost aggregation method of claim 7, further comprising:
and obtaining a parallax image according to the cost volume structure obtained by the last aggregation calculation.
9. A computer readable storage medium having stored thereon a program executable by a processor to implement the cost aggregation method of any one of claims 1-8.
10. A cost aggregation device for stereo matching, which is characterized by being used for performing cost aggregation calculation by applying the cost aggregation method according to any one of claims 1-8, wherein the cost aggregation device comprises a fully-connected aggregation module;
the full-connection aggregation module comprises a first cost volume structure acquisition unit, a second cost volume structure acquisition unit, a third cost volume structure acquisition unit and an aggregation calculation result acquisition unit;
the first cost volume structure acquisition unit is used for acquiring a cost volume structure in the stereo matching process and outputting the cost volume structure as a first cost volume structure to the second cost volume structure acquisition unit, the third cost volume structure acquisition unit and the aggregation calculation result acquisition unit;
the second cost volume structure acquisition unit is used for acquiring a second cost volume structure; the obtaining process of the second cost volume structure comprises the following steps:
dividing the first cost volume structure into 4n cost blocks with the same size according to a preset first block dividing mode; each cost block comprises at least 2m cost points, and each cost point corresponds to one image pixel point; wherein n and m are integers greater than 0;
flattening 2m cost points of each cost block into a one-dimensional first block sequence according to a preset first flattening sequence;
inputting each first block sequence into a preset first full-connection aggregation neural network model according to a preset first input sequence, and obtaining a first feature vector sequence output by the first full-connection aggregation neural network model; each first block sequence corresponds to one first feature vector sequence;
recovering the first feature vector sequence into cost blocks according to the first flattening sequence, and reconstructing a cost volume structure according to the first partitioning mode to obtain a second cost volume structure;
the third cost volume structure acquisition unit is used for acquiring a third cost volume structure; the obtaining process of the third cost volume structure comprises the following steps:
dividing the first cost volume structure into 4p cost blocks with the same size according to a preset second partitioning mode, wherein each cost block comprises at least 2q cost points; wherein p and q are integers greater than 0; the first block division mode and the second block division mode are different;
flattening 2q cost points of each second cost block into a one-dimensional second block sequence according to a preset second flattening sequence;
inputting each second block sequence into a preset second full-connection aggregation neural network model according to a preset second input sequence, and obtaining a second feature vector sequence output by the second full-connection aggregation neural network model; each of the second block sequences corresponds to one of the second feature vector sequences;
recovering the second feature vector sequence into cost blocks according to the second flattening sequence, and reconstructing a cost volume structure according to the second block division mode to obtain a third cost volume structure;
the aggregation calculation result acquisition unit is used for acquiring an aggregation calculation result; the acquisition process of the aggregation calculation result comprises the following steps:
and fusing the first cost volume structure, the second cost volume structure and the third cost volume structure, and outputting the cost volume structure obtained after fusion as a result of cost aggregation calculation.
CN202310365491.9A 2023-04-07 2023-04-07 Cost aggregation calculation method and device for stereo matching Pending CN116580072A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310365491.9A CN116580072A (en) 2023-04-07 2023-04-07 Cost aggregation calculation method and device for stereo matching


Publications (1)

Publication Number Publication Date
CN116580072A (en) 2023-08-11

Family

ID=87543637



Legal Events

Date Code Title Description
PB01 Publication