CN111368882B - Stereo matching method based on simplified independent component analysis and local similarity - Google Patents
Stereo matching method based on simplified independent component analysis and local similarity
- Publication number
- CN111368882B
- Authority
- CN
- China
- Prior art keywords
- loss function
- component analysis
- pixel
- independent component
- matching cost
- Prior art date
- 2020-02-20
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2134—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on separation criteria, e.g. independent component analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Description
Technical Field
The invention belongs to the technical field of image processing, and in particular relates to a stereo matching method based on simplified independent component analysis and local similarity.
Background Art
Stereo matching is a key part of stereo vision research, with wide applications in autonomous driving, 3D model reconstruction, and object detection and recognition. Its purpose is to find the correspondence between the pixels of the left and right images of a stereo pair in order to obtain a disparity map. Stereo matching nevertheless faces great challenges: in complex scenes with occlusion, weak texture, or depth discontinuities, dense and detailed disparity maps are hard to obtain. Accurately recovering dense disparity from stereo pairs is therefore of significant research interest.
The matching quality of traditional stereo matching methods depends on the accuracy of the matching cost. Such methods are computationally slow, rely heavily on a well-chosen matching window, handle weak-texture regions poorly, and converge slowly. Because traditional algorithms extract image features and design the cost volume by hand, image information is expressed incompletely, which hampers the subsequent steps and degrades the accuracy of the disparity map.
Summary of the Invention
Purpose of the invention: to address the low prediction accuracy of existing stereo matching networks at disparity-map edges, fine details, and weak-texture regions in real scenes, the present invention proposes a stereo matching method based on simplified independent component analysis (SICA) and local similarity. The method improves prediction accuracy at the edges and details of the disparity map and reduces the reliance on individual pixels during prediction.
Technical solution: to achieve the purpose of the present invention, the following technical solution is adopted: a stereo matching method based on simplified independent component analysis and local similarity, comprising the following steps:
Step 1: input the stereo image pair captured by a binocular camera into the convolutional layers of the DispNetC network, extract features for every pixel, build the initial matching cost volume by computing feature correlations, and thereby complete the initial matching cost calculation;
Step 2: input the initial matching cost volume into the encoder-decoder structure of the DispNetC network, perform simplified independent component analysis matching cost aggregation, define the simplified independent component analysis loss function L_SICA, and update the pixel weights;
Step 3: input the aggregated matching cost volume into the last deconvolution layer of the decoder; the deconvolution result is the disparity map. Construct the local similarity loss function L_l and combine it with the simplified independent component analysis loss function L_SICA to obtain the total loss function L;
Step 4: train the network using the real disparity map, the predicted disparity map, and the defined total loss function L, update the network parameters, and obtain the full-size disparity map by prediction with the trained network.
Further, step 1 realizes the conversion from feature expression to pixel similarity measurement; the initial matching cost is computed as follows:
The features of the stereo pair are extracted by the convolutional layers of the DispNetC network, yielding a feature map for each image. The features are fed into the correlation layer of the DispNetC network, which relates corresponding positions in feature space and yields the initial matching cost. The correlation layer compares patches of the two feature maps, i.e., computes the correlation between patches:

c(x1, x2) = Σ_{o ∈ [-k,k]×[-k,k]} ⟨f1(x1 + o), f2(x2 + o)⟩

where c(x1, x2) is the correlation between feature-map patches, f1 and f2 are the two feature maps, x1 is the center of a patch in f1, x2 is the center of a patch in f2, k determines the patch size, and d is the image displacement range, i.e., the disparity search range;
In computing the matching cost, the left image is taken as the reference image and shifted within range d; the correlations are computed, giving the initial matching cost volume.
Further, in step 2, the initial matching cost volume is input into the encoder-decoder structure of the DispNetC network, the matching cost volumes are stacked into a spatial pyramid and combined with the simplified independent component analysis loss function, and the correlations between channel vectors are used to weigh the importance of each pixel against its neighboring pixels over the whole disparity search range, updating the pixel weights as follows:
(1) Cost aggregation based on simplified independent component analysis is performed in the decoding stage. The matching cost volume passes through several deconvolution layers of the decoder; each deconvolution layer produces one deconvolution result, i.e., each layer outputs a matching cost volume, and the matching cost volumes f_s of the different layers are stacked to form a spatial pyramid. Each layer's matching cost volume is upsampled so that its size equals that of the matching cost volume f_s' output by the last layer;
(2) Keeping the number of channels of f_s' unchanged, f_s' is flattened into a matrix X_j consisting of W_iH_i channel vectors x_i^(j), where W_i and H_i are the length and width of the matching cost volume, d_j is the number of layers of the upsampled matching cost volume (the length of each channel vector), i indexes the pixel position, and j indexes the j-th disparity search range;
(3) The weight matrix Y_j is obtained from the flattened X_j: each entry y_i is obtained by summing the weighted entries of the channel vector x_i^(j), i.e., y_i = W_a x_i^(j) + b_a, where W_a and b_a denote the network weight and bias term, respectively;
(4) The weights at the corresponding positions i in the weight matrix Y_j are normalized with softmax, giving the normalized weight matrix A_i = [a_1, ..., a_{W_iH_i}]^T:

a_i = softmax(Γ(y_1, ..., y_i))

where a_i is the normalized weight of the pixel at position i, W_iH_i is the number of elements of A_i, y_i is the entry of Y_j giving the weight of the pixel at position i before normalization, Γ is a fusion function using an element-wise sum, and T denotes matrix transposition;
(5) Multiplying the weight matrix A_i by X_j gives the aggregated vectors M_i = A_i X_j; after cost aggregation the vectors M_i are reshaped back into a cost volume with d_i layers, where d_i is the number of cost-volume layers after aggregation.
Further, since the traditional independent component analysis (ICA) algorithm requires a series of operations such as preprocessing and feature extraction, the present invention defines a new simplified independent component analysis (SICA) loss function from the ICA loss function only when building the matching cost volume pyramid, mapping the SICA loss parameters onto the corresponding ICA loss parameters;
The weight matrix A_i is obtained by weighting the channel vectors x_i^(j) themselves; to account for the influence of the other pixels, the ICA loss function is incorporated, and the simplified independent component analysis loss function is defined as

L_SICA = ‖A_i A_i^T − I‖²

where L_SICA denotes the simplified independent component analysis loss, I is the identity matrix, and ‖·‖² is the sum-of-squares function.
Further, in step 3, a local similarity loss function is constructed by combining a regional loss function with the single-pixel loss function, and the total loss function is obtained by further combining the simplified independent component analysis loss function;
In stereo matching, the difference between the predicted disparity map and the true disparity map is computed and used as the training loss; the single-pixel loss function L_s is

L_s = (1/N) Σ_{n=1}^{N} |d_n − d̂_n|

where N is the number of pixels, and d_n and d̂_n are the predicted disparity and the true disparity of the n-th pixel, respectively.
Further, the KL divergence is used to measure the similarity between two adjacent pixels. When pixel n and its neighboring pixel t have the same true disparity, training should make the difference between the predicted disparities of n and t small, and a smaller loss value is preferred; when their true disparities differ, training should make the difference between the predicted disparities of n and t large, and again a smaller loss value is preferred. Based on the similarity between two adjacent pixels, the regional loss function L_r is defined as

L_r = D_kl(d_n ‖ d_t), if d̂_n = d̂_t
L_r = max(0, m − D_kl(d_n ‖ d_t)), if d̂_n ≠ d̂_t

where D_kl(·) denotes the Kullback-Leibler divergence, d_n and d_t are the predicted disparity values of the center pixel n and its neighboring pixel t, d̂_n and d̂_t are their true disparity values, and m is a boundary (margin) parameter.
Further, combining the regional loss function with the single-pixel loss function, the local similarity loss function L_l is defined as

L_l = (1/N) Σ_{n=1}^{N} [ |d_n − d̂_n| + (1/R) L_r(R(d_n), R(d̂_n)) ]

where N is the number of pixels; in the regional loss L_r, R(d_n) denotes the predicted disparity values within the region and R(d̂_n) the true disparity values within the region, n is the center pixel of the region, R(*) denotes a p*q neighborhood, and R denotes the area of the p*q neighborhood.
Further, combining the simplified independent component analysis loss function L_SICA with the local similarity loss function L_l, the total loss function L is defined as

L = ω·L_SICA + λ·L_l

where ω and λ are weight parameters that control the relative importance of L_SICA and L_l, R(*) denotes the p*q neighborhood, and R the area of the p*q neighborhood.
Further, in step 4, the BPTT algorithm is used to update the network parameters, the parameters including the weights and biases.
The present invention improves DispNetC. The DispNetC network structure is used for stereo matching to obtain the disparity map and comprises three parts: feature extraction, feature correlation calculation, and an encoder-decoder structure. A stereo image pair fed into the DispNetC network passes through feature extraction, feature correlation calculation, and the encoder-decoder structure to produce the disparity map.
On the encoder-decoder structure of DispNetC, the present invention introduces ICA cost aggregation and a corresponding ICA loss function, and adds a regional loss function to DispNetC's original single-pixel loss. First, simplified independent component analysis cost aggregation is proposed: a matching cost volume pyramid is introduced into the decoding part of the DispNetC encoder-decoder structure, and a simplified independent component analysis loss function is defined, simplifying the preprocessing of the ICA algorithm. Second, a regional loss function is introduced and combined with the single-pixel loss to define a local similarity loss function that refines the spatial structure of the disparity map. Finally, the simplified independent component analysis loss and the local similarity loss are combined for disparity prediction, recovering edge information in the disparity map.
Beneficial effects: compared with the prior art, the technical solution of the present invention has the following beneficial technical effects:
The present invention builds a stereo matching method based on simplified independent component analysis and local similarity, proposing a simplified independent component analysis matching cost aggregation that unifies the matching cost volume pyramid with the simplified independent component analysis loss function, together with a local similarity loss function. The proposed cost aggregation model refines the scene structure and details of the disparity map. The local similarity loss function remedies the shortcomings of the single-pixel loss: it extends from relying on isolated pixels to exploiting neighborhood pixel information and learns the intrinsic relationships between pixels. While preserving prediction speed, it improves prediction accuracy at disparity-map edges and details and reduces the dependence on single pixels during prediction.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a flow chart of the implementation of the method of the present invention;
Figure 2 is a schematic diagram of the simplified independent component analysis matching cost aggregation;
Figure 3 is a schematic diagram of the construction of the local similarity loss function.
DETAILED DESCRIPTION
The technical solution of the present invention is further described below in conjunction with the accompanying drawings and embodiments.
The implementation flow of the proposed stereo matching method based on simplified independent component analysis and local similarity is shown in Figure 1; the specific steps are as follows:
Step 1: input the stereo image pair captured by a binocular camera into the convolutional layers of the DispNetC network, extract features for every pixel, build the initial matching cost volume by computing feature correlations, complete the initial matching cost calculation, and realize the conversion from feature expression to pixel similarity measurement. Specifically:
To compare the similarity of two pixels of the input image pair, a strong representation of every pixel is required. The features of the left image I_l and the right image I_r of the stereo pair are extracted by the convolutional layers of the DispNetC network, yielding the left-image features F_l and the right-image features F_r, where I and F denote the original image and the feature map, and l and r denote left and right, in preparation for constructing the matching cost;
The features F_l and F_r are input into the correlation layer of the DispNetC network to obtain the relationship between corresponding positions of F_l and F_r in feature space, yielding the initial matching cost F_c and completing the conversion from feature expression to pixel similarity measurement;
The correlation layer of the DispNetC network compares patches of the two feature maps, i.e., computes the correlation between patches:

c(x1, x2) = Σ_{o ∈ [-k,k]×[-k,k]} ⟨f1(x1 + o), f2(x2 + o)⟩

where c(x1, x2) is the correlation between feature-map patches, f1 and f2 are the two feature maps, x1 is the center of a patch in f1, x2 is the center of a patch in f2, k determines the patch size, and d is the image displacement range, i.e., the disparity search range;
In computing the matching cost, the left image is taken as the reference image and shifted within range d; the correlations are computed, giving the initial matching cost volume.
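The following sketch illustrates how such a correlation-based cost volume can be computed, assuming the 1-D horizontal displacement of rectified stereo pairs as in DispNetC; the tensor layout, the max_disp parameter, and the use of PyTorch are illustrative assumptions rather than the patented implementation:

```python
import torch

def correlation_1d(feat_l, feat_r, max_disp):
    """Initial matching cost volume for a rectified stereo pair.

    feat_l, feat_r: (B, C, H, W) feature maps F_l, F_r from the shared conv layers.
    Returns a (B, max_disp, H, W) volume, one channel per candidate disparity.
    """
    b, c, h, w = feat_l.shape
    cost = feat_l.new_zeros(b, max_disp, h, w)
    for d in range(max_disp):
        if d == 0:
            cost[:, d] = (feat_l * feat_r).mean(dim=1)
        else:
            # Shift the right feature map by d pixels; the left image is the
            # reference, so only the overlapping columns receive a score.
            cost[:, d, :, d:] = (feat_l[:, :, :, d:] *
                                 feat_r[:, :, :, :-d]).mean(dim=1)
    return cost
```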
Step 2: input the initial matching cost volume into the encoder-decoder structure of the DispNetC network, stack the matching cost volumes into a spatial pyramid, perform simplified independent component analysis matching cost aggregation, define the simplified independent component analysis loss function L_SICA, and use the correlations between channel vectors to weigh the importance of each pixel against its neighboring pixels over the whole disparity search range, updating the pixel weights. Figure 2 shows the execution flow of the simplified independent component analysis matching cost aggregation, which specifically comprises:
(1) Cost aggregation based on simplified independent component analysis is performed in the decoding stage. The matching cost volume passes through several deconvolution layers of the decoder; each deconvolution layer produces one deconvolution result, i.e., each layer outputs a matching cost volume, and the matching cost volumes f_s of the different layers are stacked to form a spatial pyramid. Each layer's matching cost volume is upsampled so that its size equals that of the matching cost volume f_s' output by the last layer;
(2) Keeping the number of channels of f_s' unchanged, f_s' is flattened into a matrix X_j consisting of W_iH_i channel vectors x_i^(j), where W_i and H_i are the length and width of the matching cost volume, d_j is the number of layers of the upsampled matching cost volume (the length of each channel vector), i indexes the pixel position, and j indexes the j-th disparity search range;
(3) The weight matrix Y_j is obtained from the flattened X_j: each entry y_i is obtained by summing the weighted entries of the channel vector x_i^(j), i.e., y_i = W_a x_i^(j) + b_a, where W_a and b_a denote the network weight and bias term, respectively;
(4) The weights at the corresponding positions i in the weight matrix Y_j are normalized with softmax, giving the normalized weight matrix A_i = [a_1, ..., a_{W_iH_i}]^T:

a_i = softmax(Γ(y_1, ..., y_i))

where a_i is the normalized weight of the pixel at position i, W_iH_i is the number of elements of A_i, y_i is the entry of Y_j giving the weight of the pixel at position i before normalization, Γ is a fusion function using an element-wise sum, and T denotes matrix transposition;
(5) Multiplying the weight matrix A_i by X_j gives the aggregated vectors M_i = A_i X_j; after cost aggregation the vectors M_i are reshaped back into a cost volume with d_i layers, where d_i is the number of cost-volume layers after aggregation.
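Items (1)-(5) can be read as a flatten → score → softmax → reweight pipeline. The sketch below gives it a concrete form under stated assumptions: the pyramid levels are fused by an element-wise sum, the scoring realizing W_a and b_a is a single linear layer, and the reweighting is applied per pixel position; none of these choices is fixed by the text above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SICAAggregation(nn.Module):
    """Simplified-ICA cost aggregation over a cost-volume pyramid (a sketch)."""

    def __init__(self, num_disp):
        super().__init__()
        # W_a and b_a: learned scoring of each d_j-dimensional channel vector
        self.score = nn.Linear(num_disp, 1)

    def forward(self, cost_volumes):
        """cost_volumes: list of (B, D, H_s, W_s) volumes from the deconv
        layers; the last entry plays the role of f_s'."""
        b, d, h, w = cost_volumes[-1].shape
        # (1) upsample every pyramid level to the size of f_s' and fuse
        pyramid = [F.interpolate(v, size=(h, w), mode='bilinear',
                                 align_corners=False) for v in cost_volumes]
        fused = torch.stack(pyramid, dim=0).sum(dim=0)
        # (2) flatten into X_j: one d-dimensional channel vector per pixel
        x = fused.view(b, d, h * w).permute(0, 2, 1)   # (B, H*W, D)
        # (3) y_i = W_a x_i + b_a for every channel vector
        y = self.score(x)                              # (B, H*W, 1)
        # (4) softmax over pixel positions -> normalized weights A_i
        a = torch.softmax(y, dim=1)
        # (5) reweight the channel vectors (M_i = A_i X_j) and reshape back;
        # the weights are also returned for use in the SICA loss
        m = a * x
        return m.permute(0, 2, 1).view(b, d, h, w), a
```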
Since the traditional independent component analysis (ICA) algorithm requires a series of operations such as preprocessing and feature extraction, the present invention defines a new simplified independent component analysis (SICA) loss function from the ICA loss function only when building the matching cost volume pyramid, mapping the SICA loss parameters onto the corresponding ICA loss parameters;
Obtaining the channel weights above can be viewed as a simplified independent component analysis process: X_j corresponds to the signal awaiting dimensionality reduction in ICA; computing the weights y_i = W_a x_i^(j) + b_a corresponds to the centering step of ICA, where the weight W_a and bias b_a are updated during network training; and the weight matrix A_i corresponds to the transformation matrix W in ICA. Assigning weights to the important parts of the matching cost volume f_j' is analogous to extracting the principal components in ICA: the important parts are the positions with distinctive image features, such as edges, which matter most for disparity prediction, and the higher the weights assigned to these positions, the higher the disparity accuracy. "Extracting the principal components" here means that ICA, like principal component analysis, extracts the most representative features.
The current weight matrix A_i is obtained by weighting the channel vectors x_i^(j) themselves and does not account for the influence of the other pixels; the ICA reconstruction loss is therefore incorporated, and the simplified independent component analysis loss function is defined as

L_SICA = ‖A_i A_i^T − I‖²

where L_SICA denotes the simplified independent component analysis loss, I is the identity matrix, and ‖·‖² is the sum-of-squares function.
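Under this reading of L_SICA (an ICA-style constraint driving A_i A_i^T toward the identity), the loss can be written as below; since the original formula image is not recoverable here, this exact form is an assumption:

```python
import torch

def sica_loss(a):
    """L_SICA = ||A A^T - I||^2 for the normalized weight matrix (assumed form).

    a: (B, N, K) weights, one K-dimensional weight row per pixel position.
    """
    gram = torch.bmm(a, a.transpose(1, 2))                    # A A^T, (B, N, N)
    eye = torch.eye(gram.size(1), device=a.device).expand_as(gram)
    return ((gram - eye) ** 2).sum(dim=(1, 2)).mean()         # sum of squares
```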
Step 3: the aggregated matching cost volume is input into the last deconvolution layer of the decoder, and the deconvolution result is the disparity map; the local similarity loss function L_l is constructed and combined with the simplified independent component analysis loss function L_SICA to obtain the total loss function L. Specifically:
In stereo matching, the difference between the predicted disparity map and the true disparity map is computed and used as the training loss; the single-pixel loss function L_s is

L_s = (1/N) Σ_{n=1}^{N} |d_n − d̂_n|

where N is the number of pixels, and d_n and d̂_n are the predicted disparity and the true disparity of the n-th pixel, respectively;
The KL divergence is used to measure the similarity between two adjacent pixels. When pixel n and its neighboring pixel t have the same true disparity, training should make the difference between the predicted disparities of n and t small, and a smaller loss value is preferred; when their true disparities differ, training should make the difference between the predicted disparities of n and t large, and again a smaller loss value is preferred. Based on the similarity between two adjacent pixels, the regional loss function L_r is defined as

L_r = D_kl(d_n ‖ d_t), if d̂_n = d̂_t
L_r = max(0, m − D_kl(d_n ‖ d_t)), if d̂_n ≠ d̂_t

where D_kl(·) denotes the Kullback-Leibler divergence, d_n and d_t are the predicted disparity values of the center pixel n and its neighboring pixel t, d̂_n and d̂_t are their true disparity values, and m is a boundary (margin) parameter;
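A minimal sketch of this contrastive regional loss L_r follows, treating each predicted disparity as a softmax distribution over candidate disparities so that the KL divergence is well defined; the margin handling and the same/different test on the ground truth are assumptions consistent with the description above:

```python
import torch
import torch.nn.functional as F

def region_loss(p_n, p_t, gt_n, gt_t, m=1.0):
    """L_r for a center pixel n and one neighbour t (a hedged sketch).

    p_n, p_t: (D,) softmax distributions over candidate disparities.
    gt_n, gt_t: scalar true disparities of n and t.
    """
    kl = F.kl_div(p_t.log(), p_n, reduction='sum')  # D_kl(p_n || p_t)
    if gt_n == gt_t:
        return kl                                   # same truth: pull together
    return torch.clamp(m - kl, min=0.0)             # different: push apart up to margin m
```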
On the basis of the single-pixel loss function, the regional loss function is combined to construct the local similarity loss function, defined as

L_l = (1/N) Σ_{n=1}^{N} [ |d_n − d̂_n| + (1/R) L_r(R(d_n), R(d̂_n)) ]

where N is the number of pixels; in the regional loss L_r, R(d_n) denotes the predicted disparity values within the region and R(d̂_n) the true disparity values within the region, and n is the center pixel of the region. In this embodiment, R(*) denotes a 3*3 neighborhood and R the area of the 3*3 neighborhood; the local similarity loss function is illustrated in Figure 3;
In summary, combining the simplified independent component analysis loss function L_SICA with the local similarity loss function L_l, the total loss function L is defined as

L = ω·L_SICA + λ·L_l

where ω and λ are weight parameters that control the relative importance of L_SICA and L_l; in this embodiment, R(*) denotes a 3*3 neighborhood and R the area of the 3*3 neighborhood.
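Putting the pieces together, the sketch below computes L_l over the 3*3 neighbourhood of every pixel and the total loss L = ω·L_SICA + λ·L_l, reusing sica_loss from the sketch above. The per-neighbour term uses an absolute-difference stand-in for the KL term so that it applies directly to scalar disparity maps, which is a simplifying assumption:

```python
import torch

def local_similarity_loss(pred, gt, m=1.0):
    """L_l = single-pixel L1 term + region term averaged over 3*3 neighbours."""
    l_s = (pred - gt).abs().mean()                  # single-pixel term L_s
    offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
               if (dy, dx) != (0, 0)]
    l_r = pred.new_zeros(())
    for dy, dx in offsets:
        pred_t = torch.roll(pred, shifts=(dy, dx), dims=(-2, -1))
        gt_t = torch.roll(gt, shifts=(dy, dx), dims=(-2, -1))
        same = (gt == gt_t).float()
        diff = (pred - pred_t).abs()                # stand-in for the KL term
        l_r = l_r + (same * diff + (1 - same) * (m - diff).clamp(min=0)).mean()
    return l_s + l_r / len(offsets)

def total_loss(weights, pred, gt, omega=0.1, lam=1.0):
    """L = omega * L_SICA + lambda * L_l; omega and lam are illustrative values."""
    return omega * sica_loss(weights) + lam * local_similarity_loss(pred, gt)
```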
Step 4: the network is trained using the real disparity map, the predicted disparity map, and the defined total loss function L; the network parameters, including the weights and biases, are updated with the BPTT algorithm, and the full-size disparity map is obtained by prediction with the trained network.
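A sketch of the resulting training step; the optimizer, learning rate, and the hypothetical DispNetSICA wrapper (returning both the full-size disparity map and the weight matrices A_i) are illustrative assumptions, not components named by the patent:

```python
import torch

model = DispNetSICA()                               # hypothetical network wrapper
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(left, right, gt_disp):
    pred_disp, weights = model(left, right)         # prediction + A_i matrices
    loss = total_loss(weights, pred_disp, gt_disp)
    optimizer.zero_grad()
    loss.backward()                                 # backpropagate through all layers
    optimizer.step()
    return loss.item()
```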
The above is a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make further improvements and modifications without departing from the technical principles of the present invention, and such improvements and modifications shall also fall within the scope of protection of the present invention.
Claims (5)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010103827.0A CN111368882B (en) | 2020-02-20 | 2020-02-20 | Stereo matching method based on simplified independent component analysis and local similarity |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010103827.0A CN111368882B (en) | 2020-02-20 | 2020-02-20 | Stereo matching method based on simplified independent component analysis and local similarity |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111368882A CN111368882A (en) | 2020-07-03 |
CN111368882B true CN111368882B (en) | 2023-04-18 |
Family
ID=71206367
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010103827.0A Active CN111368882B (en) | 2020-02-20 | 2020-02-20 | Stereo matching method based on simplified independent component analysis and local similarity |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111368882B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112149547B (en) * | 2020-09-17 | 2023-06-02 | 南京信息工程大学 | Water Body Recognition Method Based on Image Pyramid Guidance and Pixel Pair Matching |
CN113470099B (en) * | 2021-07-09 | 2022-03-25 | 北京的卢深视科技有限公司 | Depth imaging method, electronic device and storage medium |
CN114049510B (en) * | 2021-10-26 | 2025-01-07 | 北京中科慧眼科技有限公司 | Binocular camera stereo matching method, system and intelligent terminal based on loss function |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109584290A (en) * | 2018-12-03 | 2019-04-05 | Beihang University | Stereo image matching method based on convolutional neural networks |
CN110533712A (en) * | 2019-08-26 | 2019-12-03 | Beijing University of Technology | Binocular stereo matching method based on convolutional neural networks |
Non-Patent Citations (2)
Title |
---|
Research on stereo matching based on convolutional neural networks; Wang Run; China Master's Theses Full-text Database, Information Science and Technology; full text *
Research on stereo matching algorithms based on convolutional neural networks; Yan Dengtao; China Master's Theses Full-text Database, Information Science and Technology; full text *
Also Published As
Publication number | Publication date |
---|---|
CN111368882A (en) | 2020-07-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |