CN111368882B - Stereo matching method based on simplified independent component analysis and local similarity
- Publication number: CN111368882B
- Application number: CN202010103827.0A
- Authority: CN (China)
- Legal status: Active
Classifications
- G06F18/22: Pattern recognition; Analysing; Matching criteria, e.g. proximity measures
- G06F18/2134: Pattern recognition; Feature extraction based on separation criteria, e.g. independent component analysis
- G06N3/045: Neural networks; Architecture; Combinations of networks
- G06N3/08: Neural networks; Learning methods
- Y02T10/40: Climate change mitigation technologies related to transportation; Engine management systems
Abstract
The invention discloses a stereo matching method based on simplified independent component analysis and local similarity, for use in the technical field of image processing, which improves the DispNetC network. The method first proposes simplified independent component analysis (ICA) cost aggregation: a matching cost volume pyramid is introduced, the preprocessing of the ICA algorithm is simplified, and a simplified ICA loss function is defined. Second, a region loss function is introduced and combined with the single-pixel loss function to define a local similarity loss function, which refines the spatial structure of the disparity map. Finally, the simplified ICA loss function is combined with the local similarity loss function to train the network to predict the disparity map and recover the edge information of the disparity map. While maintaining the prediction speed of the disparity map, the method improves the prediction accuracy of the edges and fine details of the disparity map and reduces the dependence on individual pixels during prediction.
Description
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a stereo matching method based on simplified independent component analysis and local similarity.
Background
Stereo matching is a key component of stereo vision research and is widely applied in autonomous driving, 3D model reconstruction, and object detection and recognition. The purpose of stereo matching is to solve for the correspondence between the pixels of the left and right images of a stereo pair and obtain a disparity map. Stereo matching nevertheless faces great challenges: it is not easy to acquire a dense and fine disparity map in complex scenes with occlusion, weak texture, or depth discontinuities. How to accurately obtain dense disparity from a stereo pair is therefore of great research significance.
In traditional stereo matching methods, the matching quality depends on the accuracy of the matching cost, the computation is slow, the result is highly sensitive to the choice of matching window, weak-texture regions are handled poorly, and convergence is slow. In traditional stereo matching algorithms, the image features and the cost volume are designed by hand, so the image information is expressed incompletely, which affects the subsequent steps and the accuracy of the disparity map.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the problem that existing stereo matching networks have low prediction accuracy at disparity map edges, detail information and weak-texture regions in actual scenes, the invention provides a stereo matching method based on simplified independent component analysis (SICA) and local similarity. The method improves the prediction accuracy of disparity map edges and details and reduces the dependence on individual pixels during prediction.
The technical scheme is as follows: in order to realize the purpose of the invention, the technical scheme adopted by the invention is as follows: a stereo matching method based on simplified independent component analysis and local similarity comprises the following steps:
inputting a stereo image pair shot by a binocular camera into the convolutional layers of the DispNetC network, extracting the features of each pixel, constructing an initial matching cost volume by computing feature correlations, and completing the initial matching cost calculation;
inputting the initial matching cost volume into the encoding-decoding structure of the DispNetC network, performing simplified independent component analysis matching cost aggregation, defining a simplified independent component analysis loss function L_SICA, and updating the weights of the pixel points;
inputting the aggregated matching cost volume into the last deconvolution layer of the decoding structure, where the deconvolution result is the disparity map, and constructing a local similarity loss function L_l, which is combined with the simplified independent component analysis loss function L_SICA to obtain a total loss function L;
and fourthly, performing network training by using the real disparity map, the predicted disparity map and the defined total loss function L, updating network parameters, and predicting through the trained network to obtain the full-size disparity map.
Further, in the first step, the conversion from the feature expression to the pixel point similarity measurement is realized, and the initial matching cost calculation method is as follows:
extracting the features of the stereo image pair through the convolutional layers of the DispNetC network to obtain the feature maps of the two images; inputting the features into the correlation layer of the DispNetC network, and obtaining the relationship of the features at corresponding positions in feature space to obtain the initial matching cost; the correlation layer of the DispNetC network compares the relationship between blocks of the two feature maps, i.e. computes the correlation between blocks, with the formula:
c(x_1, x_2) = Σ_{o ∈ [-k, k] × [-k, k]} ⟨f_1(x_1 + o), f_2(x_2 + o)⟩
where c(x_1, x_2) is the correlation of blocks of the feature maps, f_1 and f_2 denote the two feature maps, x_1 denotes the block of feature map f_1 centered at x_1, x_2 denotes the block of feature map f_2 centered at x_2, k is the size of the image block, and d is the image displacement range, i.e. the disparity search range;
and in the process of solving the matching cost, setting the left image as a reference image, moving within the range d, and calculating the correlation size to obtain an initial matching cost volume.
Further, in the second step, the initial matching cost volume is input into the encoding-decoding structure of the DispNetC network, the matching cost volumes are stacked into a spatial pyramid and combined with the simplified independent component analysis loss function, the correlation between the channel vectors is used to measure the importance of each pixel point and its adjacent pixels over all disparity search ranges, and the weight update of the pixel points is completed, specifically as follows:
(1) Cost aggregation based on simplified independent component analysis is completed in the decoding stage: the matching cost volume passes through a plurality of deconvolution layers of the decoding structure, each deconvolution layer yields a deconvolution result, i.e. each layer outputs a matching cost volume, and the matching cost volumes f_s of the different layers are stacked to form a spatial pyramid; each layer's matching cost volume is upsampled so that its size is the same as that of the matching cost volume f_s' output by the last layer;
(2) Keeping the number of channels of f_s' unchanged, f_s' is flattened into X_j, where X_j consists of W_i·H_i channel vectors, W_i and H_i respectively denote the length and width of the matching cost volume, d_j denotes the number of layers of the upsampled matching cost volume, i denotes the position of a pixel point, and j denotes the j-th disparity search range;
(3) The weight matrix Y_j is obtained from the flattened X_j; Y_j is obtained by a weighted summation of the channel vectors themselves, where W_a and b_a respectively denote the network weight and bias terms;
(4) Softmax normalization is applied to the weights of the weight matrix Y_j at the corresponding positions i to obtain the normalized weight matrix A_i, with the formula:
a_i = softmax(Γ(y_1, ..., y_i))
where a_i denotes the weight of the pixel point after normalization, i denotes the position of the pixel point, W_i·H_i denotes the number of elements of the matrix A_i, y_i is an element of the weight matrix Y_j and denotes the weight of the pixel point at position i before normalization, Γ is a fusion function using element-wise summation, and T denotes matrix transposition;
(5) The weight matrix A_i is multiplied by X_j to obtain the aggregated vector M_i, M_i = A_i X_j; the aggregated cost vectors are converted back into a cost volume, where d_i denotes the number of cost volume layers after cost aggregation.
Furthermore, because the traditional independent component analysis (ICA) algorithm requires a series of operations such as preprocessing and feature extraction, a new simplified independent component analysis (SICA) loss function is defined from the ICA loss function and applied only when the matching cost volume pyramid is constructed, with the SICA loss function parameters corresponding to the parameters of the ICA loss function;
weight matrix A i By channel vectorThe self weighting is obtained, the influence of other pixel points is considered, and the simplified independent component analysis loss function is defined as follows by combining the independent component analysis loss function: />
Wherein L is SICA Representing simplified independent component analysisThe loss function, I denotes the identity matrix, x denotes the sum of squares function.
Further, in the third step, the local similarity loss function is constructed by combining the region loss function with the single-pixel loss function, and the total loss function is obtained by further combining the simplified independent component analysis loss function;
in stereo matching, the difference between the predicted disparity map and the real disparity map is calculated and used as training loss, wherein the loss function L of a single pixel point s Expressed as:
where N is the number of pixels, d n Andthe predicted disparity and the true disparity of the nth pixel are respectively.
Further, the KL divergence is used to measure the similarity between two adjacent pixels. When the true disparity of pixel n is the same as that of a pixel t in its neighborhood, the smaller the difference between their predicted disparities during training and the smaller the loss function value, the better this matches expectation; when the true disparities of pixel n and the adjacent pixel t are different, the larger the difference between their predicted disparities during training and the smaller the loss function, the better this matches expectation. Based on the similarity between two adjacent pixels, the region loss function L_r is defined as:
where D_kl(·) denotes the Kullback-Leibler divergence, d_n and d_t are the predicted disparity values of the center pixel point n and the neighborhood pixel point t respectively, d̂_n and d̂_t are the true disparity values of the center pixel point n and the neighborhood pixel point t respectively, and m is a boundary (margin) parameter.
Further, the region loss function is combined with the single-pixel loss function, and the local similarity loss function L_l is defined as:
where N is the number of pixels; in the region loss function L_r, R(d_n) denotes the predicted disparity values within the region, R(d̂_n) denotes the true disparity values within the region, n denotes the center pixel of the region, R(·) denotes the p×q neighborhood, and R denotes the area of the p×q neighborhood.
Further, the simplified independent component analysis loss function L_SICA and the local similarity loss function L_l are combined, and the total loss function L is defined as:
where ω and λ are weight parameters controlling the simplified independent component analysis loss function L_SICA and the local similarity loss function L_l respectively, R(·) denotes the p×q neighborhood, and R denotes the area of the p×q neighborhood.
Further, in the fourth step, updating network parameters by using a BPTT algorithm, wherein the parameters include weight and bias.
The invention improves DispNetC. The DispNetC network structure is used for stereo matching to solve for the disparity map, and the network comprises three parts: feature extraction, feature correlation calculation, and the encoding-decoding structure. The disparity map is obtained by passing the stereo image pair through the feature extraction, feature correlation calculation and encoding-decoding structure of the DispNetC network.
The invention introduces ICA cost aggregation and a corresponding ICA loss function into the encoding-decoding structure of DispNetC, and adds a region loss function on top of the original single-pixel loss function of DispNetC. First, simplified independent component analysis cost aggregation is proposed: a matching cost volume pyramid is introduced into the decoding part of the DispNetC encoding-decoding structure, a simplified independent component analysis loss function is defined, and the preprocessing of the independent component analysis algorithm is simplified. Second, a region loss function is introduced and combined with the single-pixel loss function to define a local similarity loss function, which refines the spatial structure of the disparity map. Finally, the simplified independent component analysis loss function is combined with the local similarity loss function to predict the disparity map and recover the edge information of the disparity map.
Beneficial effects: compared with the prior art, the technical scheme of the invention has the following beneficial technical effects:
the invention constructs a stereo matching method based on simplified independent component analysis and local similarity, and provides a simplified independent component analysis matching cost aggregation and a local similarity loss function which integrates a matching cost volume pyramid and a simplified independent component analysis loss function. The matching cost aggregation model provided by the stereo matching method perfects the scene structure and the detail part of the disparity map. The local similarity loss function makes up the defect of the single-pixel loss function, the internal relation between pixels is learned by depending on independent pixels and depending on neighborhood pixel information, the prediction accuracy of the edge and detail parts of the disparity map is improved while the prediction speed of the disparity map is ensured, and the degree of dependence on the single pixel in the prediction process is reduced.
Drawings
FIG. 1 is a flow chart of an implementation of the method of the present invention;
FIG. 2 is a simplified independent component analysis matching cost aggregation diagram;
FIG. 3 is a schematic diagram of constructing a local similarity loss function.
Detailed Description
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
The stereo matching method based on simplified independent component analysis and local similarity provided by the invention has the implementation flow as shown in figure 1, and comprises the following specific implementation steps:
Step one, the stereo images shot by a binocular camera are input into the convolutional layers of the DispNetC network, the features of each pixel are extracted, an initial matching cost volume is constructed by computing feature correlations, the initial matching cost calculation is completed, and the conversion from feature expression to pixel-point similarity measurement is realized; the specific steps are as follows:
In order to compare the similarity of two pixel points in the input image pair, an expressive representation of each pixel point is needed. The left image I_l and the right image I_r of the stereo pair are passed through the convolutional layers of the DispNetC network to obtain the left feature map F_l and the right feature map F_r, where I and F denote the original image and the feature map respectively, and the subscripts l and r denote left and right, in preparation for constructing the matching cost;
the features F_l and F_r are input into the correlation layer of the DispNetC network; from the relationship of F_l and F_r at corresponding positions in feature space, the initial matching cost F_c is obtained, completing the conversion from feature expression to pixel-point similarity measurement;
the correlation layer of the DispNetC network compares the relationship between blocks of the two feature maps, i.e. computes the correlation between blocks, with the formula:
c(x_1, x_2) = Σ_{o ∈ [-k, k] × [-k, k]} ⟨f_1(x_1 + o), f_2(x_2 + o)⟩
where c(x_1, x_2) is the correlation of blocks of the feature maps, f_1 and f_2 denote the two feature maps, x_1 denotes the block of feature map f_1 centered at x_1, x_2 denotes the block of feature map f_2 centered at x_2, k is the size of the image block, and d is the image displacement range, i.e. the disparity search range;
and in the process of solving the matching cost, setting the left image as a reference image, moving within the range d, and calculating the correlation size to obtain an initial matching cost volume.
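As an illustrative sketch of the correlation layer described above (not the patented implementation), the following PyTorch-style Python code computes a horizontal correlation between the left and right feature maps over the disparity search range. The function name corr_1d, the parameter max_disp, the channel-wise mean, and the use of a single-pixel patch (the block sum over k is omitted for brevity) are assumptions introduced here for illustration only.

```python
import torch

def corr_1d(f_left: torch.Tensor, f_right: torch.Tensor, max_disp: int) -> torch.Tensor:
    """Initial matching cost volume from two feature maps of shape (B, C, H, W).

    Entry d of the output (B, max_disp, H, W) holds the correlation of the
    left feature at pixel x with the right feature at pixel x - d, i.e. the
    left image is taken as the reference image and the comparison is moved
    horizontally over the disparity search range.
    """
    b, c, h, w = f_left.shape
    cost = f_left.new_zeros(b, max_disp, h, w)
    for d in range(max_disp):
        if d == 0:
            cost[:, d] = (f_left * f_right).mean(dim=1)
        else:
            cost[:, d, :, d:] = (f_left[:, :, :, d:] * f_right[:, :, :, :-d]).mean(dim=1)
    return cost

# Example (shapes assumed): cost = corr_1d(F_l, F_r, max_disp=40)
```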
Step two, the initial matching cost volume is input into the encoding-decoding structure of the DispNetC network, the matching cost volumes are stacked into a spatial pyramid, simplified independent component analysis matching cost aggregation is performed, and the simplified independent component analysis loss function L_SICA is defined; the correlation between the channel vectors is used to measure the importance of each pixel point and its adjacent pixels over all disparity search ranges, and the weight update of the pixel points is completed. FIG. 2 is a schematic diagram of the execution flow of simplified independent component analysis matching cost aggregation, which is specifically as follows:
(1) Cost aggregation based on simplified independent component analysis is completed in the decoding stage: the matching cost volume passes through a plurality of deconvolution layers of the decoding structure, each deconvolution layer yields a deconvolution result, i.e. each layer outputs a matching cost volume, and the matching cost volumes f_s of the different layers are stacked to form a spatial pyramid; each layer's matching cost volume is upsampled so that its size is the same as that of the matching cost volume f_s' output by the last layer;
(2) Keeping the number of channels of f_s' unchanged, f_s' is flattened into X_j, where X_j consists of W_i·H_i channel vectors, W_i and H_i respectively denote the length and width of the matching cost volume, d_j denotes the number of layers of the upsampled matching cost volume, i denotes the position of a pixel point, and j denotes the j-th disparity search range;
(3) The weight matrix Y_j is obtained from the flattened X_j; Y_j is obtained by a weighted summation of the channel vectors themselves, where W_a and b_a respectively denote the network weight and bias terms;
(4) Softmax normalization is applied to the weights of the weight matrix Y_j at the corresponding positions i to obtain the normalized weight matrix A_i, with the formula:
a_i = softmax(Γ(y_1, ..., y_i))
where a_i denotes the weight of the pixel point after normalization, i denotes the position of the pixel point, W_i·H_i denotes the number of elements of the matrix A_i, y_i is an element of the weight matrix Y_j and denotes the weight of the pixel point at position i before normalization, Γ is a fusion function using element-wise summation, and T denotes matrix transposition;
(5) The weight matrix A_i is multiplied by X_j to obtain the aggregated vector M_i, M_i = A_i X_j; the aggregated cost vectors are converted back into a cost volume, where d_i denotes the number of cost volume layers after cost aggregation.
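The five aggregation steps above can be summarised, under stated assumptions, by the following PyTorch-style sketch. The module name SICAAggregation, the use of a single linear layer (weight W_a and bias b_a) to score each channel vector, the bilinear upsampling mode, and the reading of M_i = A_i X_j as a per-position re-weighting of the channel vectors are all assumptions; the patent text does not fix these details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SICAAggregation(nn.Module):
    """Illustrative sketch of simplified ICA cost aggregation (steps (1)-(5))."""

    def __init__(self, d_j: int):
        super().__init__()
        # W_a and b_a: learned weight and bias that score each channel vector;
        # d_j is the total number of channels after stacking the pyramid levels.
        self.score = nn.Linear(d_j, 1)

    def forward(self, cost_pyramid):
        # (1) Upsample every pyramid level to the spatial size of the last level f_s'.
        f_last = cost_pyramid[-1]
        b, _, h, w = f_last.shape
        levels = [F.interpolate(f, size=(h, w), mode="bilinear", align_corners=False)
                  for f in cost_pyramid]
        f_stack = torch.cat(levels, dim=1)             # stacked cost volume (B, d_j, H, W)
        # (2) Flatten into X_j: W_i*H_i channel vectors of length d_j.
        x = f_stack.flatten(2).transpose(1, 2)         # (B, H*W, d_j)
        # (3) Weight matrix Y_j from the channel vectors themselves.
        y = self.score(x)                              # (B, H*W, 1)
        # (4) Softmax normalization over the positions i -> normalized weights A_i.
        a = torch.softmax(y, dim=1)
        # (5) M_i = A_i X_j, applied here as a per-position re-weighting,
        #     then converted back into a cost volume.
        m = a * x
        return m.transpose(1, 2).reshape(b, -1, h, w)  # (B, d_j, H, W)

# Example (shapes assumed): agg = SICAAggregation(d_j=3 * 40); out = agg([c1, c2, c3])
```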
Because the traditional independent component analysis (ICA) algorithm requires a series of operations such as preprocessing and feature extraction, a new simplified independent component analysis (SICA) loss function is defined from the ICA loss function and applied only when the matching cost volume pyramid is constructed, with the SICA loss function parameters corresponding to the parameters of the ICA loss function;
the above self-weighting of the channel vectors can be regarded as a simplified independent component analysis process: X_j can be regarded as the signal to be recovered in independent component analysis; computing the weights from the channel vectors can be viewed as the centering step of independent component analysis, where W_a and b_a denote the weight and bias terms respectively, which are updated during network training; the weight matrix A_i corresponds to the transformation matrix W in independent component analysis; assigning weights to the important parts of the matching cost volume f_j is similar to extracting the principal components in independent component analysis. The important parts are the positions with salient features in the image, such as image edges, which matter for predicting disparity: the higher the weight assigned to these positions, the higher the disparity accuracy. Extracting principal components here means applying independent component analysis in the manner of principal component analysis, extracting the most representative features.
The current weight matrix A_i is obtained by weighting the channel vectors themselves and does not consider the influence of other pixel points; therefore, the independent component analysis reconstruction loss function needs to be combined, and the simplified independent component analysis loss function is defined as:
where L_SICA denotes the simplified independent component analysis loss function, I denotes the identity matrix, and * denotes the sum-of-squares function.
Step three, the aggregated matching cost volume is input into the last deconvolution layer of the decoding structure, and the deconvolution result is the disparity map; the local similarity loss function L_l is constructed and combined with the simplified independent component analysis loss function L_SICA to obtain the total loss function L; the specific steps are as follows:
in stereo matching, the difference between the predicted disparity map and the real disparity map is calculated and used as training loss, wherein the loss function L of a single pixel point s Expressed as:
where N is the number of pixels, d n Andthe predicted disparity and the real disparity of the nth pixel are respectively;
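The per-pixel formula itself appears as an image in the published patent; the sketch below uses the mean absolute difference (L1) between predicted and true disparities, which matches the textual description of L_s, but whose exact form (for example L1 versus smooth L1) is an assumption.

```python
import torch

def single_pixel_loss(d_pred: torch.Tensor, d_gt: torch.Tensor) -> torch.Tensor:
    """L_s: average per-pixel difference between predicted and true disparity."""
    return (d_pred - d_gt).abs().mean()
```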
the KL divergence is adopted to measure the similarity between two adjacent pixels, when the true parallaxes of a pixel n and a pixel t in the neighborhood of the pixel n are the same, the difference of the predicted parallaxes of the pixel n and the pixel t is smaller when a network is trained, and meanwhile, the smaller the loss function value is, the more the expectation is met; when the real parallaxes of the pixel n and the adjacent pixel t are different, the difference of the predicted parallaxes of the pixel n and the pixel t is larger, and the smaller the loss function is, the more the expectation is met; defining a regional loss function L based on similarity between two adjacent pixels r Comprises the following steps:
wherein D kl () Denotes the Kullback-Leibler divergence, d n And d t Respectively, the predicted parallax values of the central pixel point n and the field pixel point t,and &>The real parallax values of the central pixel point n and the field pixel point t are respectively, and m is a boundary parameter;
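A sketch of the region loss L_r between a center pixel n and a neighborhood pixel t is given below. Representing each pixel's predicted disparity as a probability distribution over disparity candidates (so that the KL divergence is well defined), deciding "same true disparity" with a tolerance, and using the margin m in a contrastive max(0, m - D_kl) form are all assumptions; the patent text specifies only the intended behaviour.

```python
import torch

def kl_div(p: torch.Tensor, q: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Kullback-Leibler divergence D_kl(p || q) over the last dimension."""
    return (p * ((p + eps) / (q + eps)).log()).sum(dim=-1)

def region_loss(p_n, p_t, gt_n, gt_t, m: float = 1.0, tol: float = 0.5) -> torch.Tensor:
    """L_r for pixel pairs; p_n, p_t: (..., D) disparity distributions,
    gt_n, gt_t: true disparities, m: boundary (margin) parameter."""
    same = ((gt_n - gt_t).abs() < tol).float()   # same true disparity, up to tol
    d = kl_div(p_n, p_t)
    # Same true disparity -> pull the predictions together (small KL is rewarded);
    # different true disparity -> push them apart, up to the margin m.
    return same * d + (1.0 - same) * torch.clamp(m - d, min=0.0)
```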
The region loss function is combined with the single-pixel loss function to construct the local similarity loss function, and the local similarity loss function L_l is defined as:
where N is the number of pixels; in the region loss function L_r, R(d_n) denotes the predicted disparity values within the region, R(d̂_n) denotes the true disparity values within the region, and n denotes the center pixel of the region; in this embodiment, R(·) denotes the 3×3 neighborhood and R denotes the area of the 3×3 neighborhood; the construction of the local similarity loss function is shown schematically in FIG. 3;
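The combination below is a sketch of L_l: the single-pixel term plus the region term averaged over the 3×3 neighborhood R, reusing region_loss from the previous sketch. Treating the combination as a plain sum with unit weights, and the (B, H, W, D) layout of the per-pixel disparity distributions, are assumptions.

```python
import torch

def local_similarity_loss(d_pred, d_gt, p_pred, m: float = 1.0) -> torch.Tensor:
    """d_pred, d_gt: (B, H, W) disparity maps; p_pred: (B, H, W, D) distributions."""
    b, h, w = d_pred.shape
    pixel_term = (d_pred - d_gt).abs().mean()                    # single-pixel loss L_s
    offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
    region_term = d_pred.new_zeros(())
    for dy, dx in offsets:                                       # 3x3 neighborhood R
        p_n = p_pred[:, 1:h - 1, 1:w - 1]
        p_t = p_pred[:, 1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        g_n = d_gt[:, 1:h - 1, 1:w - 1]
        g_t = d_gt[:, 1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        region_term = region_term + region_loss(p_n, p_t, g_n, g_t, m).mean()
    return pixel_term + region_term / len(offsets)
```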
In summary, the simplified independent component analysis loss function L_SICA and the local similarity loss function L_l are combined, and the total loss function L is defined as:
where ω and λ are weight parameters controlling the simplified independent component analysis loss function L_SICA and the local similarity loss function L_l respectively; in this embodiment, R(·) denotes the 3×3 neighborhood and R denotes the area of the 3×3 neighborhood.
Step four, network training is performed using the real disparity map, the predicted disparity map and the defined total loss function L; the network parameters, including the weights and biases, are updated with the BPTT algorithm, and the full-size disparity map is obtained by prediction with the trained network.
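The training step in step four can be sketched as follows. The additive combination L = ω·L_SICA + λ·L_l, the assumed network outputs (disparity map, per-pixel disparity distributions, and the normalized weight matrix), the optimizer choice, and the reuse of sica_loss and local_similarity_loss from the sketches above are all assumptions; the patent specifies only that ω and λ balance the two terms and that the weights and biases are updated by backpropagation (referred to as the BPTT algorithm in the text).

```python
import torch

def train_step(network, optimizer, left, right, d_gt, omega: float = 0.1, lam: float = 1.0):
    """One illustrative optimization step with the combined loss."""
    optimizer.zero_grad()
    # Assumed outputs of the improved DispNetC: predicted disparity map,
    # per-pixel disparity distributions, and the normalized weight matrix A.
    d_pred, p_pred, a_matrix = network(left, right)
    loss = omega * sica_loss(a_matrix) + lam * local_similarity_loss(d_pred, d_gt, p_pred)
    loss.backward()       # backpropagate the total loss L through the network
    optimizer.step()      # update the network weights and biases
    return loss.item()

# Example (assumed setup): optimizer = torch.optim.Adam(network.parameters(), lr=1e-4)
```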
The foregoing is a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.
Claims (5)
1. An image stereo matching method based on simplified independent component analysis and local similarity is characterized in that: the method comprises the following steps:
inputting a stereo image shot by a binocular camera into a convolutional layer of a DispNet network, extracting the characteristics of each pixel, constructing an initial matching cost volume by calculating the characteristic correlation, and completing the calculation of initial matching cost;
inputting the initial matching cost volume into the encoding-decoding structure of the DispNetC network, performing simplified independent component analysis matching cost aggregation, defining a simplified independent component analysis loss function L_SICA, and updating the weights of the pixel points;
inputting the aggregated matching cost volume into the last deconvolution layer of the decoding structure, where the deconvolution result is the disparity map, and constructing a local similarity loss function L_l, which is combined with the simplified independent component analysis loss function L_SICA to obtain a total loss function L; the specific steps are as follows:
combining the region loss function with the single-pixel loss function to construct the local similarity loss function, and combining the simplified independent component analysis loss function to obtain the total loss function;
in stereo matching, the difference between the predicted disparity map and the real disparity map is calculated and used as the training loss, where the loss function L_s of a single pixel point is expressed as:
where N is the number of pixels, and d_n and d̂_n are the predicted disparity and the true disparity of the n-th pixel, respectively;
the KL divergence is used to measure the similarity between two adjacent pixels: when the true disparity of pixel n is the same as that of the neighborhood pixel t, the smaller the difference between their predicted disparities during training and the smaller the loss function value, the better this matches expectation; when the true disparities of pixel n and the adjacent pixel t are different, the larger the difference between their predicted disparities and the smaller the loss function, the better this matches expectation; based on the similarity between two adjacent pixels, the region loss function L_r is defined as:
where D_kl(·) denotes the Kullback-Leibler divergence, d_n and d_t are the predicted disparity values of the center pixel point n and the neighborhood pixel point t respectively, d̂_n and d̂_t are the true disparity values of the center pixel point n and the neighborhood pixel point t respectively, and m is a boundary (margin) parameter;
on the basis of the single-pixel loss function, the region loss function is combined, and the local similarity loss function L_l is defined as:
where N is the number of pixels; in the region loss function L_r, R(d_n) denotes the predicted disparity values within the region, R(d̂_n) denotes the true disparity values within the region, n denotes the center pixel of the region, R(·) denotes the p×q neighborhood, and R denotes the area of the p×q neighborhood;
combining the simplified independent component analysis loss function L_SICA and the local similarity loss function L_l, the total loss function L is defined as:
where ω and λ are weight parameters controlling the simplified independent component analysis loss function L_SICA and the local similarity loss function L_l respectively, R(·) denotes the p×q neighborhood, and R denotes the area of the p×q neighborhood;
and fourthly, performing network training by using the real disparity map, the predicted disparity map and the defined total loss function L, updating network parameters, and predicting through the trained network to obtain the full-size disparity map.
2. The method for stereo image matching based on simplified independent component analysis and local similarity as claimed in claim 1, wherein: in the first step, the initial matching cost calculation method is as follows:
extracting the features of the stereo image pair through the convolutional layers of the DispNetC network to obtain the feature maps of the two images; inputting the features into the correlation layer of the DispNetC network, and obtaining the relationship of the features at corresponding positions in feature space to obtain the initial matching cost; comparing the relationship between blocks of the two feature maps through the correlation layer of the DispNetC network, i.e. computing the correlation between blocks, with the formula:
where c(x_1, x_2) is the correlation of blocks of the feature maps, f_1 and f_2 denote the two feature maps, x_1 denotes the block of feature map f_1 centered at x_1, x_2 denotes the block of feature map f_2 centered at x_2, k is the size of the image block, and d is the image displacement range, i.e. the disparity search range;
in the process of solving the matching cost, the left image in the stereo image pair is set as a reference image, the reference image is moved within the range d, and the correlation size is calculated to obtain an initial matching cost volume.
3. The image stereo matching method based on simplified independent component analysis and local similarity according to claim 1, wherein: the initial matching cost volume is input into the encoding-decoding structure of the DispNetC network, the matching cost volumes are stacked into a spatial pyramid and combined with the simplified independent component analysis loss function, the correlation between the channel vectors is used to measure the importance of each pixel point and its adjacent pixels over all disparity search ranges, and the weight update of the pixel points is completed, specifically as follows:
(1) Cost aggregation based on simplified independent component analysis is completed in the decoding stage: the matching cost volume passes through a plurality of deconvolution layers of the decoding structure, each deconvolution layer yields a deconvolution result, i.e. each layer outputs a matching cost volume, and the matching cost volumes f_s of the different layers are stacked to form a spatial pyramid; each layer's matching cost volume is upsampled so that its size is the same as that of the matching cost volume f_s' output by the last layer;
(2) Keeping the number of channels of f_s' unchanged, f_s' is flattened into X_j, where X_j consists of W_i·H_i channel vectors, W_i and H_i respectively denote the length and width of the matching cost volume, d_j denotes the number of layers of the upsampled matching cost volume, i denotes the position of a pixel point, and j denotes the j-th disparity search range;
(3) The weight matrix Y_j is obtained from the flattened X_j; Y_j is obtained by a weighted summation of the channel vectors themselves, where W_a and b_a respectively denote the network weight and bias terms;
(4) Softmax normalization is applied to the weights of the weight matrix Y_j at the corresponding positions i to obtain the normalized weight matrix A_i, with the formula:
a_i = softmax(Γ(y_1, ..., y_i))
where a_i denotes the weight of the pixel point after normalization, i denotes the position of the pixel point, W_i·H_i denotes the number of elements of the matrix A_i, y_i is an element of the weight matrix Y_j and denotes the weight of the pixel point at position i before normalization, Γ is a fusion function using element-wise summation, and T denotes matrix transposition;
4. The image stereo matching method based on simplified independent component analysis and local similarity according to claim 3, wherein: the weight matrix A_i is obtained by weighting the channel vectors themselves; to take the influence of other pixel points into account, the simplified independent component analysis loss function is defined, in combination with the independent component analysis loss function, as:
where L_SICA denotes the simplified independent component analysis loss function, I denotes the identity matrix, and * denotes the sum-of-squares function.
5. The image stereo matching method based on simplified independent component analysis and local similarity according to any one of claims 1 to 4, wherein: in the fourth step, the network parameters, including the weights and biases, are updated using the BPTT algorithm.
Priority Applications (1)
- CN202010103827.0A: Stereo matching method based on simplified independent component analysis and local similarity (priority date 2020-02-20, filing date 2020-02-20)

Publications (2)
- CN111368882A: published 2020-07-03
- CN111368882B: granted 2023-04-18

Family
- ID=71206367
- CN202010103827.0A, filed 2020-02-20, granted as CN111368882B, status Active

Cited By (3)
- CN112149547B: Remote sensing image water body identification method based on image pyramid guidance and pixel pair matching (filed 2020-09-17, published 2023-06-02)
- CN113470099B: Depth imaging method, electronic device and storage medium (filed 2021-07-09, published 2022-03-25)
- CN114049510A: Binocular camera stereo matching algorithm and system based on loss function and intelligent terminal (filed 2021-10-26, published 2022-02-15)

Patent Citations (2)
- CN109584290A: A stereo image matching method based on convolutional neural networks (filed 2018-12-03, published 2019-04-05)
- CN110533712A: A binocular stereo matching method based on convolutional neural networks (filed 2019-08-26, published 2019-12-03)

Non-Patent Citations (2)
- Wang Run, "Research on Stereo Matching Based on Convolutional Neural Networks", China Master's Theses Full-text Database, Information Science and Technology Series (full text).
- Yan Dengtao, "Research on Stereo Matching Algorithms Based on Convolutional Neural Networks", China Master's Theses Full-text Database, Information Science and Technology Series (full text).
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination
- GR01: Patent grant