CN111368882B

CN111368882B - Stereo matching method based on simplified independent component analysis and local similarity

Info

Publication number: CN111368882B
Application number: CN202010103827.0A
Authority: CN
Inventors: 陈苏婷; 张婧霖; 邓仲; 张闯
Original assignee: Nanjing University of Information Science and Technology
Current assignee: Nanjing University of Information Science and Technology
Priority date: 2020-02-20
Filing date: 2020-02-20
Publication date: 2023-04-18
Anticipated expiration: 2040-02-20
Also published as: CN111368882A

Abstract

The invention discloses a stereo matching method based on simplified independent component analysis and local similarity, which is used for the technical field of image processing and improves a DispNet network, wherein the method firstly proposes simplified Independent Component Analysis (ICA) cost aggregation, introduces a matching cost volume pyramid, simplifies the preprocessing process of an ICA algorithm and defines a simplified ICA loss function; secondly, introducing a regional loss function, and defining a local similarity loss function by combining a single-pixel point loss function so as to perfect the spatial structure of the disparity map; and finally, combining the simplified ICA loss function with the local similarity loss function, training the network to predict the disparity map, and making up the edge information of the disparity map. The method and the device improve the prediction accuracy of the edge and the detail part of the parallax image and reduce the dependence degree on a single pixel point in the prediction process while ensuring the prediction speed of the parallax image.

Description

A stereo matching method based on simplified independent component analysis and local similarity

技术领域Technical Field

本发明属于图像处理技术领域，尤其涉及一种基于简化独立成分分析和局部相似性的立体匹配方法。The invention belongs to the technical field of image processing, and in particular relates to a stereo matching method based on simplified independent component analysis and local similarity.

背景技术Background Art

立体匹配是立体视觉研究中的关键部分，在车辆的自动驾驶，3D模型重建，物体的检测与识别等方面有广泛的应用。立体匹配的目的是求出立体图像对中左右图像像素点之间的对应关系，以获得视差图。然而，立体匹配面临着很大的挑战，遇到遮挡、弱纹理、深度不连续等复杂场景，不易获取稠密且精细的视差图。因此，如何准确的从立体图对中获取密集视差，具有重大的研究意义。Stereo matching is a key part of stereo vision research and has a wide range of applications in autonomous driving, 3D model reconstruction, object detection and recognition, etc. The purpose of stereo matching is to find the correspondence between the pixels of the left and right images in a stereo image pair to obtain a disparity map. However, stereo matching faces great challenges. When encountering complex scenes such as occlusion, weak texture, and depth discontinuity, it is not easy to obtain dense and detailed disparity maps. Therefore, how to accurately obtain dense disparity from stereo image pairs is of great research significance.

传统的立体匹配方法匹配效果的好坏依赖于匹配代价的准确性，计算十分缓慢，十分依赖匹配窗口的合理性，对弱纹理区的处理效果不好，算法实现时收敛速度较慢。在传统的立体匹配算法中，采用手动的方法提取图像特征以及代价卷的设计，图像信息表达不全面，影响了后续步骤的实施，视差图精度受到影响。The quality of the traditional stereo matching method depends on the accuracy of the matching cost. The calculation is very slow and depends heavily on the rationality of the matching window. The processing effect on weak texture areas is not good, and the convergence speed of the algorithm is slow. In the traditional stereo matching algorithm, manual methods are used to extract image features and design cost volumes. The image information is not fully expressed, which affects the implementation of subsequent steps and the accuracy of the disparity map.

发明内容Summary of the invention

发明目的：针对在实际场景中现存的立体匹配网络在视差图的边缘，细节信息以及弱纹理区域的预测准确率较低问题，本发明提出一种基于简化独立成分分析(SICA)和局部相似性的立体匹配方法。该方法提高了视差图边缘以及细节部分的预测准确率，减少了在预测过程中对单像素点的依赖程度。Purpose of the invention: In order to solve the problem that the existing stereo matching network has low prediction accuracy in the edge, detail information and weak texture area of the disparity map in actual scenes, the present invention proposes a stereo matching method based on simplified independent component analysis (SICA) and local similarity. This method improves the prediction accuracy of the edge and detail parts of the disparity map and reduces the reliance on single pixels in the prediction process.

技术方案：为实现本发明的目的，本发明所采用的技术方案是：一种基于简化独立成分分析和局部相似性的立体匹配方法，包括以下步骤：Technical solution: To achieve the purpose of the present invention, the technical solution adopted by the present invention is: a stereo matching method based on simplified independent component analysis and local similarity, comprising the following steps:

步骤一，将双目相机拍摄的立体图像对输入DispNetC网络的卷积层，提取每个像素的特征，通过计算特征相关性构建初始匹配代价卷，完成初始匹配代价计算；Step 1: Input the stereo image pair taken by the binocular camera into the convolutional layer of the DispNetC network, extract the features of each pixel, construct the initial matching cost volume by calculating the feature correlation, and complete the initial matching cost calculation;

步骤二，将初始匹配代价卷输入DispNetC网络的编码-解码结构，进行简化独立成分分析匹配代价聚合，定义简化独立成分分析损失函数L_SICA，更新像素点的权值；Step 2: Input the initial matching cost volume into the encoding-decoding structure of the DispNetC network, perform simplified independent component analysis matching cost aggregation, define the simplified independent component analysis loss function L _SICA , and update the pixel weights;

步骤三，聚合后的匹配代价卷输入解码结构的最后一层反卷积层，反卷积的结果即为视差图，构建局部相似性损失函数L_l，并结合简化独立成分分析损失函数L_SICA，得到总损失函数L；Step 3: The aggregated matching cost volume is input into the last deconvolution layer of the decoding structure. The result of the deconvolution is the disparity map. The local similarity loss function L _l is constructed and combined with the simplified independent component analysis loss function L _SICA to obtain the total loss function L;

步骤四，利用真实视差图和预测视差图以及定义的总损失函数L进行网络训练，更新网络参数，通过训练完成的网络预测得到全尺寸视差图。Step 4: Use the real disparity map, the predicted disparity map and the defined total loss function L to perform network training, update the network parameters, and obtain the full-size disparity map through the trained network prediction.

进一步地，所述步骤一，实现从特征表达到像素点相似性衡量的转换，初始匹配代价计算方法如下：Furthermore, in step 1, the conversion from feature expression to pixel similarity measurement is realized, and the initial matching cost calculation method is as follows:

通过DispNetC网络的卷积层提取立体图像对的特征，得到两张图像各自的特征图；将特征输入DispNetC网络的相关层，获取其在特征空间内对应位置的关系，获得初始匹配代价；通过DispNetC网络的相关层比较两个特征图各个块的关系，即计算各块之间的相关性，公式如下：The features of the stereo image pair are extracted through the convolutional layer of the DispNetC network to obtain the feature maps of the two images respectively; the features are input into the correlation layer of the DispNetC network to obtain the relationship between the corresponding positions in the feature space and obtain the initial matching cost; the relationship between the blocks of the two feature maps is compared through the correlation layer of the DispNetC network, that is, the correlation between the blocks is calculated, and the formula is as follows:

其中c(x₁,x₂)表示特征图的块的相关性，f₁和f₂分别表示两个特征图，x₁表示特征图f₁中以x₁为中心的一块，x₂表示特征图f₂中以x₂为中心的一块，k为图像块大小，d为图像位移范围，即视差搜索范围；Where c(x ₁ ,x ₂ ) represents the correlation of the feature map block, f ₁ and f ₂ represent two feature maps respectively, x ₁ represents a block centered on x ₁ in the feature map f ₁ , x ₂ represents a block centered on x ₂ in the feature map f ₂ , k is the image block size, and d is the image displacement range, that is, the disparity search range;

在求取匹配代价的过程中，将左图设为参考图像，在范围d内进行移动，计算相关性大小，得到初始匹配代价卷。In the process of obtaining the matching cost, the left image is set as the reference image, moved within the range d, the correlation size is calculated, and the initial matching cost volume is obtained.

进一步地，所述步骤二，将初始匹配代价卷输入DispNetC网络的编码-解码结构，将匹配代价卷堆叠成空间金字塔并结合简化独立成分分析损失函数，利用通道向量之间的相关性，完成像素点在所有视差搜索范围内与其邻域像素的重要性衡量，完成像素点的权值更新，具体如下：Furthermore, in step 2, the initial matching cost volume is input into the encoding-decoding structure of the DispNetC network, the matching cost volume is stacked into a spatial pyramid and combined with a simplified independent component analysis loss function, and the correlation between channel vectors is used to complete the importance measurement of the pixel point and its neighboring pixels in all disparity search ranges, and the weight update of the pixel point is completed, as follows:

(1)基于简化独立成分分析的代价聚合在解码阶段完成，匹配代价卷经过解码结构的若干反卷积层，每个反卷积层得到一个反卷积结果，即每层输出一个匹配代价卷，堆叠不同层的匹配代价卷f_s构成空间金字塔；对每层匹配代价卷进行上采样，上采样的匹配代价卷的大小和最后一层输出的匹配代价卷f_s'的大小相同；(1) Cost aggregation based on simplified independent component analysis is completed in the decoding stage. The matching cost volume passes through several deconvolution layers of the decoding structure. Each deconvolution layer obtains a deconvolution result, that is, each layer outputs a matching cost volume, and the matching cost volumes _fs of different layers are stacked to form a spatial pyramid; each layer of the matching cost volume is upsampled, and the size of the upsampled matching cost volume is the same as the size of the matching cost volume _fs ' output by the last layer;

(2)保持f_s'的通道数不变，将f_s'拉平成

其中X_j由W_iH_i个通道向量

组成，W_i、H_i分别表示匹配代价卷的长、宽，d_j表示上采样后匹配代价卷的层数，i表示像素点的位置，j表示第j个视差搜索范围；(2) Keep the number of channels of f _s ' unchanged and flatten f _s ' into

where _Xj consists of _WiHi channel _vectors

Composition, _Wi and _Hi represent the length and width of the matching cost volume respectively, _dj represents the number of layers of the matching cost volume after upsampling, i represents the position of the pixel point, and j represents the jth disparity search range;

(3)由拉平的X_j中得到权重矩阵Y_j，Y_j由通道向量

的各个点的权重之和求得；(3) The weight matrix Y _j is obtained from the flattened X _j . Y _j is composed of the channel vector

The sum of the weights of each point is obtained;

其中W_a和b_a分别表示网络权重和偏置项；Where _Wa and _ba represent the network weight and bias term respectively;

(4)对权重矩阵Y_j中对应位置i上的权重进行softmax规范化，得到归一化后的权重矩阵A_i，公式如下：(4) Perform softmax normalization on the weight at the corresponding position i in the weight matrix _Yj to obtain the normalized weight matrix _Ai , as shown in the following formula:

aⁱ＝softmax(Γ(y¹,...,yⁱ))a ⁱ =softmax(Γ(y ¹ ,...,y ⁱ ))

其中aⁱ表示归一化后像素点的权重，i表示像素点的位置，W_iH_i表示矩阵A_i元素个数，yⁱ为权重矩阵Y_j中的元素，表示未归一化之前位置i的像素点的权重，Γ是采用element-wise sum的融合函数，T表示矩阵转置；Where ^ai represents the weight of the pixel after normalization, i represents the position of the pixel, _WiHi _represents the number of elements in the matrix _Ai , ^yi is the element in the weight matrix _Yj , which represents the weight of the pixel at position i before normalization, Γ is the fusion function using element-wise sum, and T represents matrix transpose;

(5)将权重矩阵A_i与X_j相乘，得到聚合后的向量M_i，M_i＝A_iX_j；将完成代价聚合后的向量

转换为代价卷

d_i表示代价聚和后的代价卷层数。(5) Multiply the weight matrix _Ai and _Xj to obtain the aggregated vector _Mi , _Mi = _AiXj _;

Convert to price roll

d _i represents the number of cost volume layers after cost aggregation.

进一步地，由于传统的独立成分分析(ICA)算法，需要进行预处理，特征提取等一系列操作，本发明中仅在构建匹配代价卷金字塔时，根据ICA损失函数定义新的简化独立成分分析(SICA)损失函数，将SICA损失函数参数对应于ICA损失函数中的参数；Furthermore, since the traditional independent component analysis (ICA) algorithm requires a series of operations such as preprocessing and feature extraction, in the present invention, only when constructing the matching cost volume pyramid, a new simplified independent component analysis (SICA) loss function is defined according to the ICA loss function, and the SICA loss function parameters correspond to the parameters in the ICA loss function;

权重矩阵A_i由通道向量

本身加权获得，考虑其他像素点的影响，结合独立成分分析损失函数，定义简化独立成分分析损失函数如下：The weight matrix _Ai is composed of the channel vector

It is weighted itself, considering the influence of other pixels, combined with the independent component analysis loss function, and the simplified independent component analysis loss function is defined as follows:

其中L_SICA表示简化独立成分分析损失函数，I表示单位矩阵，x表示平方和函数。Where L _SICA represents the simplified independent component analysis loss function, I represents the identity matrix, and x represents the square sum function.

进一步地，所述步骤三，在单像素点损失函数的基础上结合区域损失函数，构建局部相似性损失函数，结合简化独立成分分析损失函数，得到总损失函数；Furthermore, in step three, based on the single pixel loss function, the regional loss function is combined to construct a local similarity loss function, and the total loss function is obtained by combining the simplified independent component analysis loss function;

在立体匹配中，通过计算预测视差图和真实视差图之间的差异，将该差异作为训练损失，其中单像素点的损失函数L_s表示为：In stereo matching, the difference between the predicted disparity map and the real disparity map is calculated and used as the training loss, where the loss function _Ls of a single pixel is expressed as:

其中N是像素数量，d_n和

分别是第n个像素的预测视差以及真实的视差。where N is the number of pixels, d _n and

are the predicted disparity and the actual disparity of the n-th pixel, respectively.

进一步地，采用KL散度衡量两个相邻像素之间的相似性，当像素n和其邻域像素t的真实视差相同，在训练网络时，使像素n和t的预测视差的差别越小，同时损失函数值越小越满足预期；当像素n和其邻域像素t的真实视差不同，在训练网络时，使像素n和t的预测视差的差别越大，同时损失函数越小越满足预期；根据两个相邻像素之间的相似性定义区域损失函数L_r为：Furthermore, KL divergence is used to measure the similarity between two adjacent pixels. When the real disparity of pixel n and its neighboring pixel t is the same, when training the network, the difference between the predicted disparity of pixels n and t is made smaller, and the smaller the loss function value is, the more it meets expectations; when the real disparity of pixel n and its neighboring pixel t is different, when training the network, the difference between the predicted disparity of pixels n and t is made larger, and the smaller the loss function is, the more it meets expectations; according to the similarity between two adjacent pixels, the regional loss function _Lr is defined as:

其中D_kl()表示Kullback-Leibler散度，d_n和d_t分别是中心像素点n和其领域像素点t的预测视差值，

和

分别是中心像素点n和其领域像素点t的真实视差值，m为边界参数。Where D _kl () represents the Kullback-Leibler divergence, d _n and d _t are the predicted disparity values of the center pixel n and its neighborhood pixel t, respectively.

and

are the true disparity values of the center pixel n and its surrounding pixel t, and m is the boundary parameter.

进一步地，在单像素点损失函数的基础上结合区域损失函数，将局部相似性损失函数定义L_l为：Furthermore, based on the single pixel loss function and combined with the regional loss function, the local similarity loss function is _defined as:

其中N是像素数量，区域损失函数L_r中R(d_n)代表区域内预测的视差值，

代表区域内真实的视差值，n代表区域的中心像素，R(*)代表p*q的邻域，R代表p*q邻域的面积。Where N is the number of pixels, and R(d _n ) in the regional loss function L _r represents the predicted disparity value within the region.

represents the true disparity value in the area, n represents the central pixel of the area, R(*) represents the neighborhood of p*q, and R represents the area of the neighborhood of p*q.

进一步地，结合简化独立成分分析损失函数L_SICA以及局部相似性损失函数L_l，将总损失函数L定义为：Furthermore, combining the simplified independent component analysis loss function L _SICA and the local similarity loss function L _l , the total loss function L is defined as:

其中ω和λ为权重参数，用来控制简化独立成分分析损失函数L_SICA和局部相似性损失函数L_l的重要性比例，R(*)代表p*q的邻域，R代表p*q邻域的面积。Where ω and λ are weight parameters used to control the importance ratio of the simplified independent component analysis loss function _LSICA and the local similarity loss function _Ll , R(*) represents the neighborhood of p*q, and R represents the area of the neighborhood of p*q.

进一步地，所述步骤四，利用BPTT算法实现网络参数更新，所述参数包括权重、偏置。Furthermore, in the step four, the BPTT algorithm is used to update the network parameters, and the parameters include weights and biases.

本发明对DispNetC进行改进。DispNetC网络结构用于立体匹配，求视差图，该网络包括三部分：特征提取，特征相关性计算，编解码结构。立体图像对输入DispNetC网络经过特征提取，特征相关性计算，编解码结构，就可以得到视差图。The present invention improves DispNetC. The DispNetC network structure is used for stereo matching and disparity map. The network includes three parts: feature extraction, feature correlation calculation, and encoding and decoding structure. The stereo image pair is input into the DispNetC network, and after feature extraction, feature correlation calculation, and encoding and decoding structure, the disparity map can be obtained.

本发明在DispNetC的编解码结构上引入ICA代价聚合以及对应的ICA损失函数，同时在DispNetC原本的单像素点损失函数的基础上加入了一个区域损失函数。首先提出了简化独立成分分析代价聚合，在DispNetC编码-解码结构的解码部分引入匹配代价卷金字塔，同时定义了简化独立成分分析损失函数，简化了独立成分分析算法的预处理过程；其次，引入区域损失函数，结合单像素点损失函数，定义局部相似性损失函数，以完善视差图的空间结构；最后，简化独立成分分析损失函数和局部相似性损失函数相结合，进行视差图预测，弥补视差图的边缘信息。The present invention introduces ICA cost aggregation and the corresponding ICA loss function on the encoding and decoding structure of DispNetC, and adds a regional loss function based on the original single-pixel loss function of DispNetC. First, a simplified independent component analysis cost aggregation is proposed, and a matching cost volume pyramid is introduced in the decoding part of the DispNetC encoding-decoding structure. At the same time, a simplified independent component analysis loss function is defined, which simplifies the preprocessing process of the independent component analysis algorithm; secondly, a regional loss function is introduced, combined with a single-pixel loss function, and a local similarity loss function is defined to improve the spatial structure of the disparity map; finally, the simplified independent component analysis loss function is combined with the local similarity loss function to perform disparity map prediction and compensate for the edge information of the disparity map.

有益效果：与现有技术相比，本发明的技术方案具有以下有益的技术效果：Beneficial effects: Compared with the prior art, the technical solution of the present invention has the following beneficial technical effects:

本发明构建了基于简化独立成分分析和局部相似性的立体匹配方法，提出了集匹配代价卷金字塔和简化独立成分分析损失函数为一体的简化独立成分分析匹配代价聚合以及局部相似性损失函数。这个立体匹配方法所提出的匹配代价聚合模型完善了视差图的场景结构以及细节部分。而局部相似性损失函数弥补单像素点损失函数的不足，从依靠独立像素点扩展到依靠邻域像素点信息，学习像素之间的内在关系，在保证视差图预测速度的同时，提高了视差图边缘以及细节部分的预测准确率，减少了在预测过程中对单像素点的依赖程度。The present invention constructs a stereo matching method based on simplified independent component analysis and local similarity, and proposes a simplified independent component analysis matching cost aggregation and a local similarity loss function that integrates a matching cost volume pyramid and a simplified independent component analysis loss function. The matching cost aggregation model proposed by this stereo matching method improves the scene structure and details of the disparity map. The local similarity loss function makes up for the shortcomings of the single pixel loss function, expanding from relying on independent pixels to relying on neighborhood pixel information, learning the intrinsic relationship between pixels, and improving the prediction accuracy of the edges and details of the disparity map while ensuring the prediction speed of the disparity map, reducing the degree of dependence on single pixels in the prediction process.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

图1是本发明方法的实现流程图；Fig. 1 is a flow chart of the implementation of the method of the present invention;

图2是简化独立成分分析匹配代价聚合示意图；FIG2 is a simplified schematic diagram of independent component analysis matching cost aggregation;

图3是构建局部相似性损失函数示意图。FIG3 is a schematic diagram of constructing a local similarity loss function.

具体实施方式DETAILED DESCRIPTION

下面结合附图和实施例对本发明的技术方案作进一步的说明。The technical solution of the present invention is further described below in conjunction with the accompanying drawings and embodiments.

本发明所提出的基于简化独立成分分析和局部相似性的立体匹配方法，实现流程如图1所示，具体实现步骤如下：The stereo matching method based on simplified independent component analysis and local similarity proposed in the present invention has an implementation process as shown in FIG1 , and the specific implementation steps are as follows:

步骤一，将双目相机拍摄的立体图像对输入DispNetC网络的卷积层，提取每个像素的特征，通过计算特征相关性构建初始匹配代价卷，完成初始匹配代价计算，实现从特征表达到像素点相似性衡量的转换；具体如下：Step 1: Input the stereo image pair taken by the binocular camera into the convolution layer of the DispNetC network, extract the features of each pixel, construct the initial matching cost volume by calculating the feature correlation, complete the initial matching cost calculation, and realize the conversion from feature expression to pixel similarity measurement; the details are as follows:

为了比较输入图像对中两个像素点的相似性，需要得到每个像素点的强有力的表达，通过DispNetC网络的卷积层提取立体图像对中左图I_l和右图I_r的特征，得到左图像特征F_l和右图像特征F_r，其中I和F分别表示原图及特征图，l,r分别表示左和右，为匹配代价的构建做准备；In order to compare the similarity of two pixels in the input image pair, it is necessary to obtain a strong expression of each pixel. The features of the left image I _l and the right image I _r in the stereo image pair are extracted through the convolutional layer of the DispNetC network to obtain the left image feature F _l and the right image feature F _r , where I and F represent the original image and feature map respectively, l and r represent left and right respectively, in preparation for the construction of the matching cost;

将特征F_l和F_r输入DispNetC网络的相关层，获取F_l和F_r在特征空间内对应位置的关系，获得初始匹配代价F_c，完成从特征表达到像素点相似性衡量的转换；Input the features F _l and F _r into the relevant layer of the DispNetC network, obtain the relationship between the corresponding positions of F _l and F _r in the feature space, obtain the initial matching cost F _c , and complete the conversion from feature expression to pixel similarity measurement;

DispNetC网络的相关层用于比较两个特征图各个块的关系，即计算各块之间的相关性，公式如下：The correlation layer of the DispNetC network is used to compare the relationship between the blocks of the two feature maps, that is, to calculate the correlation between the blocks. The formula is as follows:

步骤二，将初始匹配代价卷输入DispNetC网络的编码-解码结构，将匹配代价卷堆叠成空间金字塔，进行简化独立成分分析匹配代价聚合，定义简化独立成分分析损失函数L_SICA，利用通道向量之间的相关性，完成像素点在所有视差搜索范围内与其邻域像素的重要性衡量，完成像素点的权值更新；图2所示为简化独立成分分析匹配代价聚合的执行流程示意图，具体包括：Step 2: Input the initial matching cost volume into the encoding-decoding structure of the DispNetC network, stack the matching cost volume into a spatial pyramid, perform simplified independent component analysis matching cost aggregation, define the simplified independent component analysis loss function L _SICA , and use the correlation between channel vectors to measure the importance of pixels and their neighboring pixels in all disparity search ranges, and complete the weight update of pixels; Figure 2 shows a schematic diagram of the execution process of simplified independent component analysis matching cost aggregation, which specifically includes:

(2)保持f_s'的通道数不变，将f_s'拉平成

其中X_j由W_iH_i个通道向量

where _Xj consists of _WiHi channel _vectors

(3)由拉平的X_j中得到权重矩阵Y_j，Y_j由通道向量

The sum of the weights of each point is obtained;

aⁱ＝softmax(Γ(y¹,...,yⁱ))a ⁱ =softmax(Γ(y ¹ ,...,y ⁱ ))

转换为代价卷

Convert to price roll

d _i represents the number of cost volume layers after cost aggregation.

由于传统的独立成分分析(ICA)算法，需要进行预处理，特征提取等一系列操作，本发明中仅在构建匹配代价卷金字塔时，根据ICA损失函数定义新的简化独立成分分析(SICA)损失函数，将SICA损失函数参数对应于ICA损失函数中的参数；Since the traditional independent component analysis (ICA) algorithm requires a series of operations such as preprocessing and feature extraction, in the present invention, only when constructing the matching cost volume pyramid, a new simplified independent component analysis (SICA) loss function is defined according to the ICA loss function, and the SICA loss function parameters correspond to the parameters in the ICA loss function;

以上获取通道本身的权重可以看成一个简化的独立成分分析过程：X_j可以看成独立成分分析处理过程中的待降维信号；由

获得的权重，在计算权值时

可以看作在独立成分分析处理过程中的中心化步骤，其中W_a和b_a分别表示权重和偏置项，这个权重和偏置在网络训练的过程中更新；权重矩阵A_i与独立成分分析中变换矩阵W相对应；在匹配代价卷f_j'给重要部分赋予权值类似于在独立成分分析中提取主要成分，所述重要部分是指有图像中具有特征的位置，比如图像的边缘，对于预测视差较为重要，这些有特征的位置被赋予的权重越高，视差准确性越高；在独立成分分析中提取主要成分是指独立成分分析适用于主成分分析，提取最有代表性的特征。The above acquisition of the channel weights can be regarded as a simplified independent component analysis process: _Xj can be regarded as the signal to be reduced in the independent component analysis process;

The weight obtained, when calculating the weight

It can be regarded as a centralization step in the independent component analysis process, where _Wa and _ba represent weights and bias terms respectively, and the weights and biases are updated during the network training process; the weight matrix _Ai corresponds to the transformation matrix W in the independent component analysis; assigning weights to important parts in the matching cost volume _fj ' is similar to extracting the main components in the independent component analysis, and the important parts refer to the positions with features in the image, such as the edges of the image, which are more important for predicting disparity. The higher the weights assigned to these characteristic positions, the higher the disparity accuracy; extracting the main components in the independent component analysis means that the independent component analysis is applicable to the principal component analysis to extract the most representative features.

当前的权重矩阵A_i由通道向量

本身加权获得，并没有考虑到其他像素点的影响，因此需要结合独立成分分析重建损失函数，定义简化独立成分分析损失函数如下：The current weight matrix _Ai is composed of the channel vector

The weighted result itself does not take into account the influence of other pixels. Therefore, it is necessary to combine the independent component analysis to reconstruct the loss function. The simplified independent component analysis loss function is defined as follows:

步骤三，聚合后的匹配代价卷输入解码结构的最后一层反卷积层，反卷积的结果即为视差图，构建局部相似性损失函数L_l，并结合简化独立成分分析损失函数L_SICA，得到总损失函数L；具体包括：Step 3: The aggregated matching cost volume is input into the last deconvolution layer of the decoding structure. The result of the deconvolution is the disparity map. The local similarity loss function L _l is constructed and combined with the simplified independent component analysis loss function L _SICA to obtain the total loss function L. Specifically, it includes:

其中N是像素数量，d_n和

分别是第n个像素的预测视差以及真实的视差；where N is the number of pixels, d _n and

are the predicted disparity and the actual disparity of the nth pixel respectively;

采用KL散度衡量两个相邻像素之间的相似性，当像素n和其邻域像素t的真实视差相同，在训练网络时，使像素n和t的预测视差的差别越小，同时损失函数值越小越满足预期；当像素n和其邻域像素t的真实视差不同，在训练网络时，使像素n和t的预测视差的差别越大，同时损失函数越小越满足预期；根据两个相邻像素之间的相似性定义区域损失函数L_r为：KL divergence is used to measure the similarity between two adjacent pixels. When the real disparity of pixel n and its neighboring pixel t is the same, when training the network, the difference between the predicted disparity of pixels n and t is made smaller, and the smaller the loss function value is, the more it meets expectations; when the real disparity of pixel n and its neighboring pixel t is different, when training the network, the difference between the predicted disparity of pixels n and t is made larger, and the smaller the loss function is, the more it meets expectations; according to the similarity between two adjacent pixels, the regional loss function _Lr is defined as:

和

分别是中心像素点n和其领域像素点t的真实视差值，m为边界参数；Where D _kl () represents the Kullback-Leibler divergence, d _n and d _t are the predicted disparity values of the center pixel n and its neighborhood pixel t, respectively.

and

are the true disparity values of the center pixel n and its surrounding pixel t, and m is the boundary parameter;

在单像素点损失函数的基础上结合区域损失函数，构建局部相似性损失函数，将局部相似性损失函数定义L_l为：Based on the single pixel loss function and the regional loss function, a local similarity loss function is constructed and the local similarity loss function is _defined as:

代表区域内真实的视差值，n代表区域的中心像素，本实施例中，R(*)代表3*3的邻域，R代表3*3邻域的面积，局部相似性损失函数示意图如图3所示；Where N is the number of pixels, and R(d _n ) in the regional loss function L _r represents the predicted disparity value within the region.

represents the real disparity value in the area, n represents the central pixel of the area, in this embodiment, R(*) represents a 3*3 neighborhood, R represents the area of the 3*3 neighborhood, and the schematic diagram of the local similarity loss function is shown in FIG3 ;

综上，结合简化独立成分分析损失函数L_SICA以及局部相似性损失函数L_l，将总损失函数L定义为：In summary, combined with the simplified independent component analysis loss function L _SICA and the local similarity loss function L _l , the total loss function L is defined as:

其中ω和λ为权重参数，用来控制简化独立成分分析损失函数L_SICA和局部相似性损失函数L_l的重要性比例，本实施例中，R(*)代表3*3的邻域，R代表3*3邻域的面积。Wherein ω and λ are weight parameters, which are used to control the importance ratio of the simplified independent component analysis loss function L _SICA and the local similarity loss function L _l . In this embodiment, R(*) represents a 3*3 neighborhood, and R represents the area of the 3*3 neighborhood.

步骤四，利用真实视差图和预测视差图以及定义的总损失函数L进行网络训练，利用BPTT算法实现网络参数更新，所述参数包括权重、偏置，通过训练完成的网络预测得到全尺寸视差图。Step 4: Use the real disparity map and the predicted disparity map and the defined total loss function L to perform network training, and use the BPTT algorithm to update network parameters, wherein the parameters include weights and biases, and obtain a full-size disparity map through network prediction after training.

以上所述是本发明的优选实施方式，应当指出，对于本技术领域的普通技术人员来说，在不脱离本发明技术原理的前提下，还可以做出若干改进和变形，这些改进和变形也应视为本发明的保护范围。The above is a preferred embodiment of the present invention. It should be pointed out that for ordinary technicians in this technical field, several improvements and modifications can be made without departing from the technical principles of the present invention. These improvements and modifications should also be regarded as the scope of protection of the present invention.

Claims

1. An image stereo matching method based on simplified independent component analysis and local similarity, characterized in that the method comprises the following steps:

Step 1: Input the stereo image pair taken by the binocular camera into the convolutional layer of the DispNetC network, extract the features of each pixel, construct the initial matching cost volume by calculating the feature correlation, and complete the initial matching cost calculation;

Step 2: Input the initial matching cost volume into the encoding-decoding structure of the DispNetC network, perform simplified independent component analysis matching cost aggregation, define the simplified independent component analysis loss function L _SICA , and update the pixel weights;

Step 3: The aggregated matching cost volume is input into the last deconvolution layer of the decoding structure. The result of the deconvolution is the disparity map. The local similarity loss function L _l is constructed and combined with the simplified independent component analysis loss function L _SICA to obtain the total loss function L; the details are as follows:

Based on the single pixel loss function and the regional loss function, a local similarity loss function is constructed, and combined with the simplified independent component analysis loss function, the total loss function is obtained;

In stereo matching, the difference between the predicted disparity map and the real disparity map is calculated and used as the training loss, where the loss function _Ls of a single pixel is expressed as:

where N is the number of pixels, d _n and

KL divergence is used to measure the similarity between two adjacent pixels. When the real disparity of pixel n and its neighboring pixel t is the same, when training the network, the difference between the predicted disparity of pixels n and t is made smaller, and the smaller the loss function value is, the more it meets expectations; when the real disparity of pixel n and its neighboring pixel t is different, when training the network, the difference between the predicted disparity of pixels n and t is made larger, and the smaller the loss function is, the more it meets expectations; according to the similarity between two adjacent pixels, the regional loss function _Lr is defined as:

Where D _kl () represents the Kullback-Leibler divergence, d _n and d _t are the predicted disparity values of the center pixel n and its neighborhood pixel t, respectively.

and

Based on the single pixel loss function and the regional loss function, the local similarity loss function is _defined as:

Where N is the number of pixels, and R(d _n ) in the regional loss function L _r represents the predicted disparity value within the region.

represents the true disparity value in the area, n represents the central pixel of the area, R(*) represents the neighborhood of p*q, and R represents the area of the neighborhood of p*q;

Combining the simplified independent component analysis loss function L _SICA and the local similarity loss function L _l , the total loss function L is defined as:

Where ω and λ are weight parameters used to control the importance ratio of the simplified independent component analysis loss function L _SICA and the local similarity loss function L _l , R(*) represents the neighborhood of p*q, and R represents the area of the neighborhood of p*q;

Step 4: Use the real disparity map, the predicted disparity map and the defined total loss function L to perform network training, update the network parameters, and obtain the full-size disparity map through the trained network prediction.

2. The image stereo matching method based on simplified independent component analysis and local similarity according to claim 1, characterized in that: in step 1, the initial matching cost calculation method is as follows:

The features of the stereo image pair are extracted through the convolutional layer of the DispNetC network to obtain the feature maps of the two images respectively; the features are input into the correlation layer of the DispNetC network to obtain the relationship between the corresponding positions in the feature space and obtain the initial matching cost; the relationship between the blocks of the two feature maps is compared through the correlation layer of the DispNetC network, that is, the correlation between the blocks is calculated, and the formula is as follows:

Where c(x ₁ ,x ₂ ) represents the correlation of the feature map block, f ₁ and f ₂ represent two feature maps respectively, x ₁ represents a block centered on x ₁ in the feature map f ₁ , x ₂ represents a block centered on x ₂ in the feature map f ₂ , k is the image block size, and d is the image displacement range, that is, the disparity search range;

In the process of obtaining the matching cost, the left image in the stereo image pair is set as the reference image, moved within the range d, and the correlation size is calculated to obtain the initial matching cost volume.

3. According to claim 1, a method for image stereo matching based on simplified independent component analysis and local similarity is characterized in that: in step 2, the initial matching cost volume is input into the encoding-decoding structure of the DispNetC network, the matching cost volume is stacked into a spatial pyramid and combined with the simplified independent component analysis loss function, and the correlation between channel vectors is used to complete the importance measurement of the pixel point and its neighboring pixels in all disparity search ranges, and the weight update of the pixel point is completed, which is specifically as follows:

(1) Cost aggregation based on simplified independent component analysis is completed in the decoding stage. The matching cost volume passes through several deconvolution layers of the decoding structure. Each deconvolution layer obtains a deconvolution result, that is, each layer outputs a matching cost volume, and the matching cost volumes _fs of different layers are stacked to form a spatial pyramid; each layer of the matching cost volume is upsampled, and the size of the upsampled matching cost volume is the same as the size of the matching cost volume _fs ' output by the last layer;

(2) Keep the number of channels of f _s ' unchanged and flatten f _s ' into

where _Xj consists of _WiHi channel _vectors

(3) The weight matrix Y _j is obtained from the flattened X _j . Y _j is composed of the channel vector

The sum of the weights of each point is obtained;

Where _Wa and _ba represent the network weight and bias term respectively;

(4) Perform softmax normalization on the weight at the corresponding position i in the weight matrix _Yj to obtain the normalized weight matrix _Ai , as shown in the following formula:

a ⁱ =softmax(Γ(y ¹ ,...,y ⁱ ))

Where ^ai represents the weight of the pixel after normalization, i represents the position of the pixel, _WiHi _represents the number of elements in the matrix _Ai , ^yi is the element in the weight matrix _Yj , which represents the weight of the pixel at position i before normalization, Γ is the fusion function using element-wisesum, and T represents matrix transpose;

(5) Multiply the weight matrix _Ai and _Xj to obtain the aggregated vector _Mi , _Mi = _AiXj _;

Convert to price roll

d _i represents the number of cost volume layers after cost aggregation.

4. The image stereo matching method based on simplified independent component analysis and local similarity according to claim 3 is characterized in that: the weight matrix _Ai is composed of channel vectors

Where L _SICA represents the simplified independent component analysis loss function, I represents the identity matrix, and x represents the square sum function.

5. The image stereo matching method based on simplified independent component analysis and local similarity according to any one of claims 1 to 4, characterized in that: in the step 4, the BPTT algorithm is used to update the network parameters, and the parameters include weights and biases.