CN111368882B - Stereo matching method based on simplified independent component analysis and local similarity - Google Patents

Stereo matching method based on simplified independent component analysis and local similarity

Info

Publication number
CN111368882B
Authority
CN
China
Prior art keywords
loss function
component analysis
independent component
pixel
simplified
Prior art date
Legal status
Active
Application number
CN202010103827.0A
Other languages
Chinese (zh)
Other versions
CN111368882A (en
Inventor
陈苏婷
张婧霖
邓仲
张闯
Current Assignee
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202010103827.0A priority Critical patent/CN111368882B/en
Publication of CN111368882A publication Critical patent/CN111368882A/en
Application granted granted Critical
Publication of CN111368882B publication Critical patent/CN111368882B/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213: Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2134: Feature extraction based on separation criteria, e.g. independent component analysis
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems


Abstract

The invention discloses a stereo matching method based on simplified independent component analysis and local similarity, applied in the technical field of image processing, which improves the DispNetC network. The method first proposes simplified independent component analysis (ICA) cost aggregation: a matching cost volume pyramid is introduced, the preprocessing of the ICA algorithm is simplified, and a simplified ICA loss function is defined. Second, a region loss function is introduced and combined with the single-pixel loss function to define a local similarity loss function, which refines the spatial structure of the disparity map. Finally, the simplified ICA loss function is combined with the local similarity loss function to train the network to predict the disparity map and recover its edge information. While maintaining prediction speed, the method improves prediction accuracy at the edges and fine details of the disparity map and reduces the dependence on individual pixels during prediction.

Description

Stereo matching method based on simplified independent component analysis and local similarity
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a stereo matching method based on simplified independent component analysis and local similarity.
Background
Stereo matching is a key component of stereo vision research, with wide application in autonomous driving, 3D model reconstruction, and object detection and recognition. Its purpose is to solve for the correspondence between the pixels of the left and right images of a stereo pair to obtain a disparity map. Stereo matching nevertheless faces great challenges: it is difficult to acquire a dense, fine disparity map in complex scenes with occlusion, weak texture, or depth discontinuities. Accurately recovering dense disparity from a stereo pair is therefore of great research significance.
In traditional stereo matching methods, the matching effect depends on the accuracy of the matching cost; computation is very slow, performance depends heavily on a well-chosen matching window, weak-texture regions are handled poorly, and convergence is slow. Moreover, image features and the cost volume are designed by hand, so image information is expressed incompletely, which hampers the subsequent steps and degrades the accuracy of the disparity map.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the low prediction accuracy of existing stereo matching networks at disparity-map edges, fine details, and weak-texture regions in real scenes, the invention provides a stereo matching method based on simplified independent component analysis (SICA) and local similarity. The method improves prediction accuracy at the edges and details of the disparity map and reduces the dependence on individual pixels during prediction.
The technical scheme is as follows: to realize the purpose of the invention, the adopted technical scheme is a stereo matching method based on simplified independent component analysis and local similarity, comprising the following steps:
Step 1: input a stereo image pair captured by a binocular camera into the convolutional layers of the DispNetC network, extract features for each pixel, construct an initial matching cost volume by computing feature correlation, and complete the initial matching cost computation;
Step 2: input the initial matching cost volume into the encoding-decoding structure of the DispNetC network, perform simplified independent component analysis matching cost aggregation, define a simplified independent component analysis loss function L_SICA, and update the pixel weights;
Step 3: input the aggregated matching cost volume into the last deconvolution layer of the decoding structure, whose deconvolution result is the disparity map; construct a local similarity loss function L_l and combine it with the simplified independent component analysis loss function L_SICA to obtain the total loss function L;
Step 4: perform network training with the real disparity map, the predicted disparity map, and the defined total loss function L, update the network parameters, and predict the full-size disparity map with the trained network.
Further, in step 1, the conversion from feature expression to pixel similarity measurement is realized, and the initial matching cost is computed as follows:
The features of the stereo image pair are extracted by the convolutional layers of the DispNetC network, giving one feature map per image; the features are input into the correlation layer of the DispNetC network, which relates the features at corresponding positions in feature space to obtain the initial matching cost. The correlation layer of the DispNetC network compares blocks of the two feature maps, i.e., computes the correlation between blocks:

    c(x_1, x_2) = Σ_{o ∈ [−k, k] × [−k, k]} ⟨ f_1(x_1 + o), f_2(x_2 + o) ⟩

where c(x_1, x_2) is the correlation of two feature-map blocks, f_1 and f_2 are the two feature maps, x_1 denotes the block of f_1 centered at x_1, x_2 denotes the block of f_2 centered at x_2, k is the size of the image block, and d is the image displacement range, i.e., the disparity search range.
During the matching cost computation, the left image is set as the reference image; shifting within the range d and computing the correlation at each shift yields the initial matching cost volume.
Further, in step 2, the initial matching cost volume is input into the encoding-decoding structure of the DispNetC network, the matching cost volumes are stacked into a spatial pyramid and combined with the simplified independent component analysis loss function, and the correlation between channel vectors is used to measure the importance of each pixel and its neighboring pixels over all disparity search ranges and to update the pixel weights, specifically as follows:
(1) Cost aggregation based on simplified independent component analysis is completed in the decoding stage. The matching cost volume passes through several deconvolution layers of the decoding structure; each deconvolution layer produces a deconvolution result, i.e., each layer outputs a matching cost volume, and the matching cost volumes f_s of the different layers are stacked to form a spatial pyramid. Each layer's matching cost volume is upsampled to the same size as the matching cost volume f_s' output by the last layer;
(2) Keeping the number of channels of f_s' unchanged, f_s' is flattened into X_j ∈ R^((W_i H_i) × d_j), where X_j consists of the W_i H_i channel vectors x_i^j, W_i and H_i are the length and width of the matching cost volume, d_j is the number of layers of the upsampled matching cost volume, i denotes the pixel position, and j denotes the j-th disparity search range;
(3) The weight matrix Y_j is obtained from the flattened X_j; Y_j is obtained by the channel vectors x_i^j weighting themselves:

    Y_j = W_a X_j + b_a

where W_a and b_a denote the network weight and bias terms;
(4) The weights of the weight matrix Y_j at each position i are softmax-normalized to obtain the normalized weight matrix A_i:

    a_i = softmax(Γ(y_1, ..., y_i))

where a_i is the normalized weight of the pixel, i is the pixel position, W_i H_i is the number of elements of the matrix A_i, y_i, an element of the weight matrix Y_j, is the weight of the pixel at position i before normalization, Γ is a fusion function using an element-wise sum, and T denotes matrix transposition;
(5) The weight matrix A_i is multiplied with X_j to obtain the aggregated vector M_i, M_i = A_i X_j; the aggregated cost vectors M_i are then reshaped into the aggregated cost volume of size W_i × H_i × d_i, where d_i is the number of cost volume layers after cost aggregation.
Furthermore, because the traditional independent component analysis (ICA) algorithm requires a series of operations such as preprocessing and feature extraction, a new simplified independent component analysis (SICA) loss function is defined from the ICA loss function only when the matching cost volume pyramid is constructed, with the parameters of the SICA loss function corresponding to those of the ICA loss function.
The weight matrix A_i is obtained by the channel vectors x_i^j weighting themselves, without considering the influence of other pixels; combining the independent component analysis loss function, the simplified independent component analysis loss function is defined as:

    L_SICA = || A_i A_i^T − I ||^2

where L_SICA denotes the simplified independent component analysis loss function, I denotes the identity matrix, and || · ||^2 denotes the sum of squares.
Further, in step 3, the local similarity loss function is constructed by combining the region loss function with the single-pixel loss function, and the total loss function is obtained by further combining the simplified independent component analysis loss function.
In stereo matching, the difference between the predicted disparity map and the real disparity map is computed and used as the training loss, where the loss function L_s of a single pixel is expressed as:

    L_s = (1/N) Σ_{n=1}^{N} | d_n − d̂_n |

where N is the number of pixels, and d_n and d̂_n are the predicted disparity and the true disparity of the n-th pixel, respectively.
Further, the KL divergence is adopted to measure the similarity between two adjacent pixels. When pixel n and a pixel t in its neighborhood have the same true disparity, the smaller the difference between their predicted disparities during training and the smaller the loss value, the better this matches expectation; when the true disparities of pixel n and neighboring pixel t differ, the larger the difference between their predicted disparities and the smaller the loss, the better this matches expectation. Based on the similarity between two adjacent pixels, the region loss function L_r is defined as:

    L_r = Σ_t [ 1(d̂_n = d̂_t) · D_kl(d_n || d_t) + 1(d̂_n ≠ d̂_t) · max(0, m − D_kl(d_n || d_t)) ]

where D_kl(·) denotes the Kullback-Leibler divergence, d_n and d_t are the predicted disparity values of the center pixel n and the neighborhood pixel t, d̂_n and d̂_t are their true disparity values, 1(·) is the indicator function, and m is a margin parameter.
Further, combining the region loss function with the single-pixel loss function, the local similarity loss function L_l is defined as:

    L_l = (1/N) Σ_{n=1}^{N} [ | d_n − d̂_n | + (1/r) · L_r( R(d_n), R(d̂_n) ) ]

where N is the number of pixels; in the region loss function L_r, R(d_n) denotes the predicted disparity values within the region and R(d̂_n) the true disparity values within the region, n is the center pixel of the region, R(·) denotes the p × q neighborhood, and r is the area of the p × q neighborhood.
Further, combining the simplified independent component analysis loss function L_SICA and the local similarity loss function L_l, the total loss function L is defined as:

    L = ω · L_SICA + λ · L_l

where ω and λ are weight parameters controlling the simplified independent component analysis loss function L_SICA and the local similarity loss function L_l, R(·) denotes the p × q neighborhood, and r is the area of the p × q neighborhood.
Further, in step 4, the network parameters, including the weights and biases, are updated with the BPTT algorithm.
The invention improves DispNetC. The DispNetC network structure is used for stereo matching to solve for the disparity map; the network comprises three parts: feature extraction, feature correlation computation, and an encoding-decoding structure. The disparity map is obtained by passing the input stereo images through these three parts of the DispNetC network.
The invention introduces ICA cost aggregation and a corresponding ICA loss function into the encoding-decoding structure of DispNetC, and adds a region loss function to DispNetC's original single-pixel loss function. First, simplified independent component analysis cost aggregation is proposed: a matching cost volume pyramid is introduced into the decoding part of the DispNetC encoding-decoding structure, a simplified independent component analysis loss function is defined, and the preprocessing of the independent component analysis algorithm is simplified. Second, a region loss function is introduced and combined with the single-pixel loss function to define a local similarity loss function that refines the spatial structure of the disparity map. Finally, the simplified independent component analysis loss function is combined with the local similarity loss function to predict the disparity map and recover its edge information.
Beneficial effects: compared with the prior art, the technical scheme of the invention has the following beneficial technical effects:
The invention constructs a stereo matching method based on simplified independent component analysis and local similarity, proposing a simplified independent component analysis matching cost aggregation that integrates a matching cost volume pyramid with a simplified independent component analysis loss function, together with a local similarity loss function. The proposed matching cost aggregation model refines the scene structure and detail regions of the disparity map. The local similarity loss function remedies the shortcoming of the single-pixel loss function: rather than relying on pixels in isolation, it learns the internal relations between pixels from neighborhood pixel information. This improves prediction accuracy at the edges and details of the disparity map while maintaining prediction speed, and reduces the dependence on individual pixels during prediction.
Drawings
FIG. 1 is a flow chart of an implementation of the method of the present invention;
FIG. 2 is a simplified independent component analysis matching cost aggregation diagram;
FIG. 3 is a schematic diagram of constructing a local similarity loss function.
Detailed Description
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
The implementation flow of the stereo matching method based on simplified independent component analysis and local similarity provided by the invention is shown in FIG. 1; the specific implementation steps are as follows:
Step 1: the stereo images captured by a binocular camera are input into the convolutional layers of the DispNetC network, features are extracted for each pixel, an initial matching cost volume is constructed by computing feature correlation, the initial matching cost computation is completed, and the conversion from feature expression to pixel similarity measurement is realized. The specific steps are as follows:
To compare the similarity of two pixels in the input image pair, a powerful representation of each pixel is needed. The convolutional layers of the DispNetC network extract the features of the left image I_l and the right image I_r of the stereo pair, giving the left feature map F_l and the right feature map F_r, where I and F denote the original image and the feature map, and the subscripts l and r denote left and right, in preparation for constructing the matching cost.
The features F_l and F_r are input into the correlation layer of the DispNetC network, and the relation between F_l and F_r at corresponding positions in feature space gives the initial matching cost F_c, completing the conversion from feature expression to pixel similarity measurement.
The correlation layer of the DispNetC network compares blocks of the two feature maps, i.e., computes the correlation between blocks:

    c(x_1, x_2) = Σ_{o ∈ [−k, k] × [−k, k]} ⟨ f_1(x_1 + o), f_2(x_2 + o) ⟩

where c(x_1, x_2) is the correlation of two feature-map blocks, f_1 and f_2 are the two feature maps, x_1 denotes the block of f_1 centered at x_1, x_2 denotes the block of f_2 centered at x_2, k is the size of the image block, and d is the image displacement range, i.e., the disparity search range.
During the matching cost computation, the left image is set as the reference image; shifting within the range d and computing the correlation at each shift yields the initial matching cost volume.
Step 2: the initial matching cost volume is input into the encoding-decoding structure of the DispNetC network, the matching cost volumes are stacked into a spatial pyramid, simplified independent component analysis matching cost aggregation is performed, and the simplified independent component analysis loss function L_SICA is defined; using the correlation between channel vectors, the importance of each pixel and its neighboring pixels over all disparity search ranges is measured and the pixel weights are updated. FIG. 2 illustrates the execution flow of simplified independent component analysis matching cost aggregation, which specifically comprises:
(1) Cost aggregation based on simplified independent component analysis is completed in the decoding stage. The matching cost volume passes through several deconvolution layers of the decoding structure; each deconvolution layer produces a deconvolution result, i.e., each layer outputs a matching cost volume, and the matching cost volumes f_s of the different layers are stacked to form a spatial pyramid. Each layer's matching cost volume is upsampled to the same size as the matching cost volume f_s' output by the last layer;
(2) Keeping the number of channels of f_s' unchanged, f_s' is flattened into X_j ∈ R^((W_i H_i) × d_j), where X_j consists of the W_i H_i channel vectors x_i^j, W_i and H_i are the length and width of the matching cost volume, d_j is the number of layers of the upsampled matching cost volume, i denotes the pixel position, and j denotes the j-th disparity search range;
(3) The weight matrix Y_j is obtained from the flattened X_j; Y_j is obtained by the channel vectors x_i^j weighting themselves:

    Y_j = W_a X_j + b_a

where W_a and b_a denote the network weight and bias terms;
(4) The weights of the weight matrix Y_j at each position i are softmax-normalized to obtain the normalized weight matrix A_i:

    a_i = softmax(Γ(y_1, ..., y_i))

where a_i is the normalized weight of the pixel, i is the pixel position, W_i H_i is the number of elements of the matrix A_i, y_i, an element of the weight matrix Y_j, is the weight of the pixel at position i before normalization, Γ is a fusion function using an element-wise sum, and T denotes matrix transposition;
(5) The weight matrix A_i is multiplied with X_j to obtain the aggregated vector M_i, M_i = A_i X_j; the aggregated cost vectors M_i are then reshaped into the aggregated cost volume of size W_i × H_i × d_i, where d_i is the number of cost volume layers after cost aggregation.
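Steps (2) through (5) above can be sketched in NumPy as follows. This is one plausible reading rather than the trained network: the linear form Y_j = W_a X_j + b_a, the element-wise-sum fusion Γ, and a softmax taken over pixel positions follow the description, but the exact tensor shapes are assumptions, and W_a and b_a are supplied as plain arrays instead of learned parameters.

```python
import numpy as np

def sica_aggregate(cost_volume, W_a, b_a):
    """Sketch of simplified-ICA cost aggregation over one pyramid level.

    cost_volume: upsampled matching cost volume f_s' of shape (H, W, D).
    W_a, b_a:    assumed network weight (D, D) and bias (D,) terms.
    """
    H, W, D = cost_volume.shape
    X = cost_volume.reshape(H * W, D)   # step (2): flatten into W_i*H_i channel vectors
    Y = X @ W_a + b_a                   # step (3): channel vectors weight themselves
    g = Y.sum(axis=1)                   # element-wise-sum fusion (the Gamma function)
    e = np.exp(g - g.max())
    a = e / e.sum()                     # step (4): softmax-normalised pixel weights
    M = a[:, None] * X                  # step (5): weighted, aggregated vectors
    return M.reshape(H, W, D)
```

The softmax here distributes one weight per pixel position, so positions whose channel vectors produce large fused scores (e.g. edges) dominate the aggregated volume, matching the attention-like behaviour described above.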
Because the traditional independent component analysis (ICA) algorithm requires a series of operations such as preprocessing and feature extraction, a new simplified independent component analysis (SICA) loss function is defined from the ICA loss function only when the matching cost volume pyramid is constructed, with the parameters of the SICA loss function corresponding to those of the ICA loss function.
The above self-weighting of the channels can be regarded as a simplified independent component analysis process: X_j can be regarded as the signal to be recovered in independent component analysis; computing the weights Y_j = W_a X_j + b_a from the channel vectors x_i^j can be regarded as the centering step of independent component analysis, where W_a and b_a are the weight and bias terms, updated during network training; the weight matrix A_i corresponds to the transformation matrix W in independent component analysis; and assigning weights to the important parts of the matching cost volume f_j is analogous to extracting the principal components in independent component analysis. The important parts are the characteristic positions in the image, such as edges, which matter for disparity prediction; the higher the weight assigned to these positions, the higher the disparity accuracy. Extracting principal components here means applying independent component analysis in the manner of principal component analysis, extracting the most representative features.
The current weight matrix A_i is obtained by the channel vectors x_i^j weighting themselves and does not consider the influence of other pixels, so the independent component analysis reconstruction loss must be combined; the simplified independent component analysis loss function is defined as:

    L_SICA = || A_i A_i^T − I ||^2

where L_SICA denotes the simplified independent component analysis loss function, I denotes the identity matrix, and || · ||^2 denotes the sum of squares.
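Assuming the sum-of-squares reading || A A^T − I ||^2 of the loss sketched above (the original gives the formula only as an image, naming I and the squared norm), L_SICA can be computed as follows; the function name and the square-matrix assumption on A are illustrative only.

```python
import numpy as np

def sica_loss(A):
    """Assumed form of L_SICA: sum-of-squares deviation of A A^T from I.

    A is treated as a square weight matrix; the loss is zero exactly
    when A is orthogonal, i.e. when A A^T equals the identity.
    """
    n = A.shape[0]
    R = A @ A.T - np.eye(n)
    return float(np.sum(R * R))
```

With this form the penalty pushes the learned weighting toward an orthogonal transform, which is the usual decorrelation constraint in ICA.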
Step 3: the aggregated matching cost volume is input into the last deconvolution layer of the decoding structure, whose deconvolution result is the disparity map; the local similarity loss function L_l is constructed and combined with the simplified independent component analysis loss function L_SICA to obtain the total loss function L. Specifically:
In stereo matching, the difference between the predicted disparity map and the real disparity map is computed and used as the training loss, where the loss function L_s of a single pixel is expressed as:

    L_s = (1/N) Σ_{n=1}^{N} | d_n − d̂_n |

where N is the number of pixels, and d_n and d̂_n are the predicted disparity and the true disparity of the n-th pixel, respectively;
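The single-pixel loss above is a mean absolute error over the N pixels; a minimal sketch, assuming a plain (unsmoothed) L1 form:

```python
import numpy as np

def single_pixel_loss(d_pred, d_true):
    """L_s: mean absolute difference between predicted and true disparities."""
    d_pred = np.asarray(d_pred, dtype=float)
    d_true = np.asarray(d_true, dtype=float)
    return float(np.abs(d_pred - d_true).mean())
```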
the KL divergence is adopted to measure the similarity between two adjacent pixels, when the true parallaxes of a pixel n and a pixel t in the neighborhood of the pixel n are the same, the difference of the predicted parallaxes of the pixel n and the pixel t is smaller when a network is trained, and meanwhile, the smaller the loss function value is, the more the expectation is met; when the real parallaxes of the pixel n and the adjacent pixel t are different, the difference of the predicted parallaxes of the pixel n and the pixel t is larger, and the smaller the loss function is, the more the expectation is met; defining a regional loss function L based on similarity between two adjacent pixels r Comprises the following steps:
Figure BDA0002387805400000073
wherein D kl () Denotes the Kullback-Leibler divergence, d n And d t Respectively, the predicted parallax values of the central pixel point n and the field pixel point t,
Figure BDA0002387805400000074
and &>
Figure BDA0002387805400000075
The real parallax values of the central pixel point n and the field pixel point t are respectively, and m is a boundary parameter;
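One plausible reading of the contrastive behaviour described for L_r can be sketched as follows. Treating each pixel's prediction as a discrete disparity distribution (so the KL divergence is well defined), the margin handling via max(0, m − D_kl), and the function names are all assumptions of the sketch.

```python
import numpy as np

def kl_div(p, q, eps=1e-8):
    """KL divergence between two discrete distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def region_loss(pred_dists, true_labels, m=1.0):
    """Contrastive neighbourhood loss (one possible reading of L_r).

    pred_dists:  predicted disparity distributions; index 0 is the
                 centre pixel n, the rest are its neighbours t.
    true_labels: ground-truth disparity per pixel, same indexing.
    Neighbours sharing the centre's ground truth are pulled together
    (small KL); differing ones are pushed at least margin m apart.
    """
    loss = 0.0
    for t in range(1, len(pred_dists)):
        d = kl_div(pred_dists[0], pred_dists[t])
        if true_labels[t] == true_labels[0]:
            loss += d
        else:
            loss += max(0.0, m - d)
    return loss
```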
combining the regional loss function on the basis of the single-pixel point loss function to construct a local similarity loss function, and defining L for the local similarity loss function l Comprises the following steps:
Figure BDA0002387805400000076
where N is the number of pixels, the regional loss function L r Wherein R (d) n ) Represents the predicted disparity value within the region,
Figure BDA0002387805400000077
representing the actual disparity value in the region, n represents the central pixel of the region, in this embodiment, R (×) represents the 3 × 3 neighborhood, R represents the area of the 3 × 3 neighborhood, and the local similarity loss function is schematically shown in fig. 3;
in summary, the loss function L is combined with the simplified independent component analysis SICA And a local similarity loss function L l The total loss function L is defined as:
Figure BDA0002387805400000078
wherein ω and λ are weighting parameters for controlling the simplified independent component analysis loss function L SICA And local similarity loss function L l In this embodiment, R (×) represents the 3 × 3 neighborhood, and R represents the area of the 3 × 3 neighborhood.
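Under the assumptions above, the composition of the losses can be sketched in plain Python; the per-pixel averaging in L_l, the division of the region term by the neighbourhood area r, and the default values of ω and λ are illustrative, not taken from the patent.

```python
def local_similarity_loss(per_pixel_abs_err, per_pixel_region_loss, r=9):
    """Assumed L_l: per-pixel L1 term plus region term, averaged over N pixels.

    per_pixel_abs_err:     |d_n - d_n_true| for each of the N pixels.
    per_pixel_region_loss: L_r evaluated on each pixel's neighbourhood
                           (r = 9 for the 3x3 neighbourhood of the embodiment).
    """
    N = len(per_pixel_abs_err)
    return sum(e + rl / r for e, rl in zip(per_pixel_abs_err, per_pixel_region_loss)) / N

def total_loss(L_sica, L_l, omega=0.1, lam=1.0):
    """Assumed combination L = omega * L_SICA + lambda * L_l."""
    return omega * L_sica + lam * L_l
```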
Step 4: network training is performed with the real disparity map, the predicted disparity map, and the defined total loss function L; the network parameters, including the weights and biases, are updated with the BPTT algorithm, and the full-size disparity map is obtained by prediction with the trained network.
The foregoing is a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (5)

1. An image stereo matching method based on simplified independent component analysis and local similarity is characterized in that: the method comprises the following steps:
inputting a stereo image pair captured by a binocular camera into the convolutional layers of a DispNet network, extracting the features of each pixel, and constructing an initial matching cost volume by computing feature correlations, thereby completing the initial matching cost calculation;
inputting the initial matching cost volume into the encoding-decoding structure of the DispNet network, performing simplified independent component analysis matching cost aggregation, defining a simplified independent component analysis loss function L_SICA, and updating the pixel weights;
inputting the aggregated matching cost volume into the last deconvolution layer of the decoding structure, the deconvolution result being a disparity map; constructing a local similarity loss function L_l and combining it with the simplified independent component analysis loss function L_SICA to obtain a total loss function L; the specific steps are as follows:
combining the regional loss function with the single-pixel loss function to construct the local similarity loss function, and combining the simplified independent component analysis loss function to obtain the total loss function;
in stereo matching, the difference between the predicted disparity map and the real disparity map is computed and used as the training loss, where the loss function L_s of a single pixel is expressed as:

$$L_s = \frac{1}{N}\sum_{n=1}^{N}\left|d_n-\hat{d}_n\right|$$

where N is the number of pixels, and $d_n$ and $\hat{d}_n$ are respectively the predicted disparity and true disparity of the n-th pixel;
KL divergence is adopted to measure the similarity between two adjacent pixels: when the true disparities of pixel n and its neighborhood pixel t are the same, the smaller the difference between their predicted disparities during training, and hence the smaller the loss value, the better; when the true disparities of pixel n and neighborhood pixel t differ, the larger the difference between their predicted disparities, the smaller the loss value, the better. Based on the similarity between two adjacent pixels, the regional loss function L_r is defined as:

$$L_r = \begin{cases} D_{kl}\left(d_n \,\|\, d_t\right), & \hat{d}_n = \hat{d}_t \\ \max\left(0,\; m - D_{kl}\left(d_n \,\|\, d_t\right)\right), & \hat{d}_n \neq \hat{d}_t \end{cases}$$

where $D_{kl}(\cdot)$ denotes the Kullback-Leibler divergence, $d_n$ and $d_t$ are respectively the predicted disparity values of the central pixel n and the neighborhood pixel t, $\hat{d}_n$ and $\hat{d}_t$ are respectively their real disparity values, and m is a margin parameter;
combining the regional loss function with the single-pixel loss function, the local similarity loss function L_l is defined as:

$$L_l = \frac{1}{N}\sum_{n=1}^{N}\left(\left|d_n-\hat{d}_n\right|+\frac{1}{r}\sum_{t\in R(d_n)}L_r\right)$$

where N is the number of pixels and L_r is the regional loss function; $R(d_n)$ denotes the predicted disparity values within the region and $R(\hat{d}_n)$ the true disparity values within the region; n is the central pixel of the region, $R(\cdot)$ denotes the p × q neighborhood, and r denotes the area of the p × q neighborhood;
combining the simplified independent component analysis loss function L_SICA and the local similarity loss function L_l, the total loss function L is defined as:

$$L = \omega L_{SICA} + \lambda L_l$$

where ω and λ are weighting parameters controlling the relative contributions of L_SICA and L_l; $R(\cdot)$ denotes the p × q neighborhood and r the area of the p × q neighborhood;
and in the fourth step, performing network training using the real disparity map, the predicted disparity map, and the defined total loss function L, updating the network parameters, and obtaining the full-size disparity map by prediction with the trained network.
2. The image stereo matching method based on simplified independent component analysis and local similarity according to claim 1, wherein in the first step the initial matching cost is calculated as follows:
extracting the features of the stereo image pair through the convolutional layers of the DispNet network to obtain the feature maps of the two images; inputting the features into the correlation layer of the DispNetC network, and obtaining the relationships of the features at corresponding positions in feature space to obtain the initial matching cost; the correlation layer of the DispNetC network compares blocks of the two feature maps, i.e., computes the correlation between blocks, according to the formula:

$$c(x_1, x_2) = \sum_{o \in [-k,k]\times[-k,k]} \left\langle f_1(x_1 + o),\; f_2(x_2 + o) \right\rangle$$

where $c(x_1, x_2)$ is the correlation of the feature-map blocks, $f_1$ and $f_2$ are the two feature maps, $x_1$ denotes the block of $f_1$ centered at $x_1$, $x_2$ denotes the block of $f_2$ centered at $x_2$, k is the size of the image block, and d is the image displacement range, i.e., the disparity search range;
in computing the matching cost, the left image of the stereo pair is set as the reference image, the matching image is shifted relative to it within the range d, and the correlation is computed to obtain the initial matching cost volume.
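The correlation computation in this claim can be sketched in NumPy as follows, assuming for brevity a 1 × 1 patch (k = 0); DispNetC's actual correlation layer uses larger patches in the same way, so this is an illustration of the operation rather than the network's implementation:

```python
import numpy as np

def correlation_cost_volume(f1, f2, max_disp):
    """Initial matching cost volume from two feature maps.

    f1, f2: (C, H, W) left/right feature maps.
    Returns (max_disp + 1, H, W): correlation for each tested disparity.
    Patch size is 1x1 (k = 0) for brevity.
    """
    C, H, W = f1.shape
    cost = np.zeros((max_disp + 1, H, W), dtype=f1.dtype)
    for d in range(max_disp + 1):
        # shift the matching map by d pixels; out-of-range columns stay zero
        cost[d, :, d:] = np.sum(f1[:, :, d:] * f2[:, :, :W - d], axis=0)
    return cost
```

For identical feature maps, the d = 0 slice reduces to the per-pixel sum of squared features, the maximum-correlation disparity.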
3. The image stereo matching method based on simplified independent component analysis and local similarity according to claim 1, wherein the initial matching cost volume is input into the encoding-decoding structure of the DispNetC network, the matching cost volumes are stacked into a spatial pyramid, and, combined with the simplified independent component analysis loss function, the importance of each pixel and its adjacent pixels over the whole disparity search range is measured using the correlation among channel vectors, completing the weight update of the pixels; the specific steps are as follows:
(1) Cost aggregation based on simplified independent component analysis is completed in the decoding stage: the matching cost volume passes through several deconvolution layers of the decoding structure, each deconvolution layer producing a deconvolution result, i.e., each layer outputs a matching cost volume; the matching cost volumes $f_s$ of the different layers are stacked to form a spatial pyramid; each layer's matching cost volume is upsampled so that its size equals that of the matching cost volume $f_s'$ output by the last layer;
(2) Keeping the number of channels of $f_s'$ constant, $f_s'$ is flattened into $X_j \in \mathbb{R}^{d_j \times W_i H_i}$, where $X_j$ is composed of $W_i H_i$ channel vectors $x_i^j \in \mathbb{R}^{d_j}$; $W_i$ and $H_i$ denote respectively the length and width of the matching cost volume, $d_j$ denotes the number of layers of the upsampled matching cost volume, i denotes the pixel position, and j denotes the j-th disparity search range;
(3) The weight matrix $Y_j$ is obtained from the flattened $X_j$, each element of $Y_j$ being a weighted sum over a channel vector $x_i^j$:

$$Y_j = W_a X_j + b_a$$

where $W_a$ and $b_a$ denote respectively the network weight and bias terms;
(4) The weights of the weight matrix $Y_j$ at the corresponding positions i are softmax-normalized to obtain the normalized weight matrix $A_i$, according to the formulas:

$$A_i = \left(a_1, a_2, \ldots, a_{W_i H_i}\right)^T$$
$$a_i = \mathrm{softmax}\left(\Gamma\left(y_1, \ldots, y_i\right)\right)$$

where $a_i$ is the normalized weight of the pixel, i is the pixel position, $W_i H_i$ is the number of elements of matrix $A_i$, $y_i$ is an element of the weight matrix $Y_j$ representing the weight of the pixel at position i before normalization, Γ is a fusion function adopting the element-wise sum, and T denotes matrix transposition;
(5) The weight matrix $A_i$ is multiplied with $X_j$ to obtain the aggregated vector $M_i$, i.e., $M_i = A_i X_j$; the aggregated cost vector $M_i$ is converted into a cost volume $F_s'' \in \mathbb{R}^{W_i \times H_i \times d_i}$, where $d_i$ denotes the number of cost-volume layers after cost aggregation.
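Steps (1)–(5) can be sketched as follows in NumPy. The linear weighting Y = W_a·X + b_a, the per-position softmax, and the element-wise reweighting in step (5) are one reading of the description (the original formula images are missing, and the exact multiplication in step (5) is ambiguous), so this is a sketch under stated assumptions, not the patented implementation:

```python
import numpy as np

def softmax(v):
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(v - v.max())
    return e / e.sum()

def sica_aggregate(cost_volume, W_a, b_a):
    """Simplified-ICA style cost aggregation, steps (2)-(5).

    cost_volume: (D, H, W) upsampled matching cost volume (D disparity layers).
    W_a: (D, D) learned weight matrix; b_a: (D,) learned bias (names from the claim).
    Returns an aggregated cost volume of the same shape.
    """
    D, H, W = cost_volume.shape
    X = cost_volume.reshape(D, H * W)        # step (2): flatten into H*W channel vectors
    Y = W_a @ X + b_a[:, None]               # step (3): weight matrix Y_j (linear form assumed)
    A = np.apply_along_axis(softmax, 0, Y)   # step (4): softmax-normalise per position
    M = A * X                                # step (5): reweight cost vectors (element-wise assumed)
    return M.reshape(D, H, W)
```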
4. The image stereo matching method based on simplified independent component analysis and local similarity according to claim 3, wherein the weight matrix $A_i$ is obtained by self-weighting the channel vectors $x_i^j$, taking into account the influence of the other pixels; combined with the independent component analysis loss function, the simplified independent component analysis loss function is defined as:

$$L_{SICA} = \left\| A_i A_i^T - I \right\|^2$$

where $L_{SICA}$ denotes the simplified independent component analysis loss function, I denotes the identity matrix, and $\|\cdot\|^2$ denotes the sum-of-squares function.
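A decorrelation penalty of the form ‖A Aᵀ − I‖² is the standard ICA-style reading of this claim; since the original formula image did not survive extraction, the following NumPy sketch is an assumption rather than the patent's exact definition:

```python
import numpy as np

def sica_loss(A):
    """Simplified ICA loss: penalise correlation between weight rows.

    A: (D, N) normalised weight matrix. The loss is the squared Frobenius
    distance between A @ A.T and the identity, pushing the rows towards
    mutual decorrelation.
    """
    D = A.shape[0]
    G = A @ A.T
    return float(np.sum((G - np.eye(D)) ** 2))
```

The loss is zero exactly when the rows of A are orthonormal, which is the decorrelation condition the regularizer enforces.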
5. The image stereo matching method based on simplified independent component analysis and local similarity according to any one of claims 1 to 4, wherein in the fourth step the network parameters, including the weights and biases, are updated with the BPTT algorithm.
Publications (2)

Publication Number  Publication Date
CN111368882A        2020-07-03
CN111368882B        2023-04-18
