CN111709977A - Binocular deep learning method based on adaptive unimodal stereo matching cost filtering - Google Patents

Binocular deep learning method based on adaptive unimodal stereo matching cost filtering

Info

Publication number
CN111709977A
CN111709977A (application CN202010185728.1A)
Authority
CN
China
Prior art keywords
matching cost
network
stereo
unimodal
matching
Prior art date
Legal status
Pending
Application number
CN202010185728.1A
Other languages
Chinese (zh)
Inventor
百晓 (Bai Xiao)
张友敏 (Zhang Youmin)
于洋 (Yu Yang)
安冬 (An Dong)
石翔 (Shi Xiang)
Current Assignee
Qingdao Research Institute Of Beijing University Of Aeronautics And Astronautics
Goertek Robotics Co Ltd
Original Assignee
Qingdao Research Institute Of Beijing University Of Aeronautics And Astronautics
Goertek Robotics Co Ltd
Priority date
Filing date
Publication date
Application filed by Qingdao Research Institute Of Beijing University Of Aeronautics And Astronautics, Goertek Robotics Co Ltd filed Critical Qingdao Research Institute Of Beijing University Of Aeronautics And Astronautics
Priority to CN202010185728.1A
Publication of CN111709977A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/10012 Stereo images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20016 Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a binocular deep learning method based on adaptive unimodal stereo matching cost filtering, in which unimodal distribution supervision centered on the true disparity is applied directly to the matching cost predicted by the network, realizing adaptive matching cost filtering. The method comprises the following steps: 1) constructing a data set comprising left and right images, which serve as a stereo image pair; 2) taking PSMNet as the stereo matching model base network, inputting the stereo image pair into it, and having the base network output three matching cost volumes (Cost Volumes) aggregated by a stacked-hourglass 3D convolutional neural network; 3) for each matching cost volume (Cost Volume), estimating a confidence map with a confidence estimation network (Confidence Estimation Network) and adjusting the ground-truth cost volume (Ground Truth Cost Volume) to generate a pixel-level unimodal distribution (Unimodal Distribution) as the network training label. The invention overcomes the defects of the prior art with a reasonable and novel structural design.

Description

Binocular deep learning method based on adaptive unimodal stereo matching cost filtering
Technical Field
The invention relates to a binocular deep learning method based on adaptive unimodal stereo matching cost filtering, and belongs to the technical field of binocular stereo matching and visual image processing.
Background
Binocular stereo vision obtains rich three-dimensional data, especially depth information, by mimicking the principles of human vision. After many years of development, binocular stereo vision plays a great role in fields such as industrial measurement, three-dimensional reconstruction, and autonomous driving. Based on the principle of parallax, binocular stereo vision acquires two images of the object to be measured from different positions with imaging equipment, and obtains three-dimensional geometric information of the object by computing the positional deviation between corresponding image points. The binocular stereo matching process generally comprises four steps: matching cost computation, matching cost aggregation, disparity map computation, and disparity map optimization, among which the matching cost computation is the core of the whole algorithm. Traditional stereo methods generally compute the matching cost with manually designed image features and cost functions; due to the limitations of manual design, the resulting stereo matching has weak anti-interference capability and limited applicable scenes. In recent years, many convolutional-neural-network-based stereo matching methods have modeled image feature extraction and cost function learning as network layers. For example, DispNetC proposes using a correlation layer to approximate the cost function and then constrains the network to learn image feature extraction through a disparity regression loss; since too much information is lost when the correlation layer computes the matching cost, the accuracy of the binocular matching result is low. GCNet further frees the network to learn image features and the cost function: the left and right image features are concatenated along the channel dimension, and a series of 3D convolutional layers learn the matching cost computation. However, this end-to-end network design is supervised only by the disparity regression loss (regression via the soft argmin function), providing no clear constraint on the matching cost computation process, so the image feature extraction and cost computation functions cannot be learned effectively.
Disclosure of Invention
The invention provides a binocular deep learning method (AcfNet) based on adaptive unimodal stereo matching cost filtering, which improves existing convolutional-neural-network-based stereo matching methods by directly supervising the learning of the matching cost computation process.
In order to solve the above technical problems, the invention adopts the following technical scheme: a binocular deep learning method based on adaptive unimodal stereo matching cost filtering, which directly applies unimodal distribution supervision centered on the true disparity to the matching cost predicted by the network to realize adaptive matching cost filtering, comprising the following steps:
1) constructing a data set comprising left and right images, which serve as a stereo image pair;
2) taking PSMNet as the stereo matching model base network and inputting the stereo image pair into it, the PSMNet base network outputting three matching cost volumes (Cost Volumes) aggregated by a stacked-hourglass 3D convolutional neural network;
3) for each matching cost volume (Cost Volume), estimating a confidence map with a confidence estimation network (Confidence Estimation Network) and using it to adjust the ground-truth cost volume (Ground Truth Cost Volume), generating a pixel-level unimodal distribution (Unimodal Distribution) as the network training label;
4) proposing a stereo focal loss (Stereo Focal Loss) to constrain the estimated matching cost volume against the true matching cost volume;
5) generating a sub-pixel disparity map from the estimated matching cost volume through the soft argmin function, and using a regression L1 loss to supervise the estimated disparity map against the true disparity map; a sketch of how these steps compose is given after this list.
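As a minimal, non-authoritative illustration of how steps 2), 3) and 5) compose, the following PyTorch sketch assumes a `backbone` module standing in for PSMNet that returns three aggregated cost volumes, and a `confidence_net` module for step 3); all names are illustrative, not taken from the patent:

```python
import torch
import torch.nn.functional as F

def acfnet_forward(backbone, confidence_net, left, right, max_disp=192):
    """Sketch of the forward pass: backbone(left, right) is assumed to
    return three aggregated cost volumes of shape (B, D, H, W)."""
    outputs = []
    for cost in backbone(left, right):
        # Step 3: per-pixel confidence map in [0, 1], shape (B, 1, H, W).
        confidence = confidence_net(cost)

        # Step 5: soft argmin -- turn costs into a probability distribution
        # over disparities and regress a sub-pixel disparity map.
        prob = F.softmax(-cost, dim=1)                       # (B, D, H, W)
        disp = torch.arange(max_disp, dtype=prob.dtype,
                            device=prob.device).view(1, -1, 1, 1)
        disparity = torch.sum(prob * disp, dim=1)            # (B, H, W)

        outputs.append((prob, confidence, disparity))
    return outputs
```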
In the technical scheme of the application, under ideal conditions the matching cost distribution of each pixel is a unimodal distribution centered on the true disparity. In order to explicitly constrain the network to learn this cost distribution, and thereby learn more robust image features and cost computation functions, we propose to generate, from the true disparity map, a unimodal cost distribution centered on the true disparity for each pixel, and to apply direct supervision on the matching cost volume (Cost Volume) predicted by the network. In order to reveal the matching uncertainty of each pixel, a confidence estimation network is designed to estimate the confidence of each pixel, which is used to adjust the corresponding true unimodal distribution. The confidence estimation network is the core of adaptive matching cost filtering: it adaptively adjusts the smoothness of the unimodal distribution, i.e. the distribution variance, according to the learning difficulty of the network.
Preferably, in the binocular deep learning method based on adaptive unimodal stereo matching cost filtering, true unimodal matching cost distribution generation is performed in step 2): for each pixel p in the reference image, a series of candidate matching pixels is searched along the corresponding epipolar line of the target image; the matching cost volume reflects the similarity of the candidate matching pairs, the matching cost between true matching pairs should be minimal, and the matching cost of other candidate disparity values should increase with their distance from the true disparity; the matching cost distribution of each pixel should therefore be centered on the true disparity.
Preferably, in the binocular deep learning method based on adaptive unimodal stereo matching cost filtering, in step 3) a confidence estimation network is constructed to estimate the confidence of the matching cost predicted by the backbone network; the confidence estimation network adaptively adjusts the smoothness of the unimodal distribution, i.e. adaptively adjusts its variance, according to the learning difficulty of the network.
Preferably, in the binocular deep learning method based on adaptive unimodal stereo matching cost filtering, in step 4) the stereo focal loss is computed: the matching cost volume constructs D matching costs {C_0, C_1, ..., C_{D-1}} for each pixel, i.e. a matching cost distribution; for a pixel p, the similarity between the estimated matching cost distribution P̂_p(d) and the true matching cost distribution P_p(d) is measured with the cross-entropy loss and used as a network supervision term.
Preferably, in the binocular deep learning method based on adaptive unimodal stereo matching cost filtering, in the true unimodal matching cost distribution generation of step 2), the disparity search set in the target image is assumed to be {0, 1, ..., D-1}, where the true disparity value is d_gt, and the true unimodal distribution is defined as:

$$P(d) = \operatorname{softmax}\left(-c_d^{gt}\right) = \frac{\exp\left(-c_d^{gt}\right)}{\sum_{d'=0}^{D-1}\exp\left(-c_{d'}^{gt}\right)}, \qquad c_d^{gt} = \frac{\left|d - d_{gt}\right|}{\sigma}$$

where σ > 0 is the variance, controlling the sharpness of the peak around the true disparity.
Optimally, in the self-confidence evaluation network in the step 3), the self-confidence evaluation network firstly consists of a convolution layer of 3 × 3, a normalization layer and a ReLU layer, and then outputs a value belonging to [0,1 ] in another convolution layer of 1 × 1 and the sigmoid function]The network directly outputs the confidence degree chart f ∈ [0,1 ] for the input aggregated matching cost body]H×WWherein H, W are image height, width, respectively.
Preferably, in the binocular deep learning method based on adaptive unimodal stereo matching cost filtering, in the confidence estimation network of step 3), for a pixel p the variance of the true matching cost distribution can be dynamically adjusted by the estimated confidence value f_p: σ_p = s(1 - f_p) + ε, where s ≥ 0 is a constant reflecting the sensitivity of the variance σ to changes in the confidence value f_p, and ε > 0 defines the lower bound of σ; correspondingly, σ_p ∈ [ε, s + ε].
For a pixel p, a large predicted confidence f_p means the network can find its unique matching point with great confidence; conversely, a small predicted confidence value means matching ambiguity exists. Thus the variance of the true matching cost distribution can be dynamically adjusted as σ_p = s(1 - f_p) + ε, where ε > 0 also effectively prevents the mathematical problem of division by zero.
Preferably, in the binocular deep learning method based on adaptive unimodal stereo matching cost filtering, in the stereo focal loss computation of step 4), a weighting factor focusing on the positive disparity loss is introduced to improve the cross-entropy loss, finally giving the stereo focal loss the mathematical form:

$$\mathcal{L}_{SF} = \frac{1}{|\mathcal{P}|}\sum_{p\in\mathcal{P}}\sum_{d=0}^{D-1}\left(1 - P_p(d)\right)^{-\alpha}\cdot\left(-P_p(d)\cdot\log \hat{P}_p(d)\right)$$

where α ≥ 0 is the focusing parameter; when α = 0 the loss function degenerates directly to the cross-entropy loss, and when α > 0 the stereo focal loss assigns more weight to positive disparity samples according to P_p(d).
Preferably, in the binocular deep learning method based on adaptive unimodal stereo matching cost filtering, the PSMNet stereo matching model base network includes a Spatial Pyramid Pooling Module for extracting image features, which extracts image features containing multi-scale context information through 4 parallel fixed-size average pooling modules; the PSMNet base network also includes a 3D CNN framework with an hourglass-shaped encoder-decoder structure, which performs repeated top-down and bottom-up processing and supervises the matching cost volumes output by the base network at three stages.
The advantages of the application are as follows. In conventional disparity-regression-based stereo matching methods such as GCNet and PSMNet, an estimated disparity value d̂_p is regressed from the estimated matching cost distribution P̂_p(d) through the soft argmin function:

$$\hat{d}_p = \sum_{d=0}^{D-1} d \cdot \hat{P}_p(d)$$

In the network training stage, for a pixel p with true disparity value d_p, the smooth L1 loss is generally used as the constraint:

$$\mathcal{L}_{regression} = \frac{1}{|\mathcal{P}|}\sum_{p\in\mathcal{P}} \operatorname{smooth}_{L_1}\left(d_p - \hat{d}_p\right)$$

to supervise network learning.
Since the whole process is differentiable, the network can be trained with direct supervision from the true disparity map. However, as the mathematical form of the soft argmin function shows, the matching cost volume serves only as the weighting of a disparity interpolation process; as long as the true disparity value is recovered, the matching cost volume can participate in the interpolation in any state, and no requirement is imposed on its mathematical distribution. This contradicts the fact that the matching cost distribution of each pixel should be unimodal; the direct reason is the lack of direct supervision constraints on the matching cost distribution, which motivates our adaptive unimodal matching cost filtering scheme. The scheme proposed in this application constrains the network to learn and estimate a unimodal matching cost distribution whose cost is minimal, i.e. whose similarity is highest, at the true disparity value. In contrast, in the conventional PSMNet supervised only by the disparity regression loss, no explicit unimodal constraint is applied to the matching cost during network learning, so the estimated matching cost distribution not only exhibits multiple peaks, but the disparity values corresponding to the two lowest-cost (highest-similarity) peaks lie far from the true disparity, showing that the network has not learned a robust feature similarity criterion. By contrast, the proposed scheme AcfNet finds the best-matching pixels in the left and right images and estimates the maximum similarity probability.
The confidence estimation network is the core of adaptive matching cost filtering: it adaptively adjusts the smoothness of the unimodal distribution, i.e. the distribution variance, according to the learning difficulty of the network. To evaluate its performance quantitatively, this application adopts the sparsification plots technique, which reveals the agreement between the predicted confidence evaluation result and the true error magnitude. Sparsification plots of AcfNet are drawn on the Scene Flow test set by progressively removing the pixels with relatively low confidence and evaluating the EPE error of the remaining pixels; the Oracle curve corresponds to the EPE error of the remaining pixels after progressively removing the pixels with relatively large errors; a Random curve gives the EPE error after removing pixels at random. According to these curves, the confidence evaluation curve of the proposed method is very close to the Oracle curve: removing only 6.9% of the pixels halves the error, far better than random removal. This fully demonstrates the superior performance of our confidence estimation in detecting and interpreting outliers.
Drawings
FIG. 1 is a schematic diagram of the overall network architecture framework of the present application;
FIG. 2 is a schematic diagram of the PSMNet network structure adopted in the present application;
FIG. 3 is a graph of ablation test results for various parameters of the present application;
FIG. 4 is a distribution histogram of the variance σ on the Scene Flow test set in the present application;
FIG. 5 is a schematic diagram of the effectiveness of variance adjustment in the present application;
FIG. 6 is an illustration of matching cost distribution samples along the disparity dimension in the present application;
FIG. 7 is a diagram of qualitative evaluation results on the Scene Flow test set according to the present disclosure;
FIG. 8 is a graph of visualization results of the technical solution of the present application on KITTI2012;
FIG. 9 is a graph of visualization results of the technical solution of the present application on KITTI2015;
FIG. 10 is a table of the adaptive unimodal matching cost filtering effectiveness analysis of the present application;
FIG. 11 is a matching cost filtering comparison analysis table of the present application;
FIG. 12 is a table comparing the performance of the technical solution of the present application with prior-art methods on the Scene Flow, KITTI2012 and KITTI2015 data sets.
Detailed Description
The technical features of the present invention will be further explained with reference to the accompanying drawings and specific embodiments.
As shown in the figures, the invention is a binocular deep learning method based on adaptive unimodal stereo matching cost filtering. Ideally, the matching cost distribution of each pixel is a unimodal distribution centered on the true disparity. In order to explicitly constrain the network to learn this cost distribution, and thereby learn more robust image features and cost computation functions, we propose to generate a unimodal cost distribution centered on the true disparity for each pixel from the true disparity map, and to apply direct supervision to the matching cost volume (Cost Volume) predicted by the network. In order to reveal the matching uncertainty of each pixel, a confidence estimation network is designed to estimate the confidence of each pixel, which is used to adjust the corresponding true unimodal distribution. FIG. 1 illustrates the overall network architecture framework of the present application. Since PSMNet is the current state-of-the-art stereo matching model, we adopt it as our base network. For the input left and right image pair, PSMNet outputs 3 matching cost volumes (Cost Volumes) aggregated by a stacked-hourglass 3D convolutional neural network; for each matching cost volume (Cost Volume), a confidence map is estimated by a confidence estimation network (Confidence Estimation Network) and used to adjust the corresponding ground-truth cost volume (Ground Truth Cost Volume), generating a pixel-level unimodal distribution (Unimodal Distribution) as the network training label; and the proposed stereo focal loss (Stereo Focal Loss) constrains the estimated matching cost volume against the true one. Finally, a sub-pixel disparity map is generated from the estimated matching cost volume by the soft argmin function, and a regression L1 loss supervises the estimated disparity map against the true disparity map.
Stereo matching algorithm based on 3D CNN
In the rectified image pair, for each pixel p(x, y) in the left image, the objective of binocular stereo matching is to find the corresponding point p'(x + d, y) in the right image, with d ∈ R+. For ease of computation and memory access, the disparity is generally discretized into a series of candidate disparity values {0, 1, ..., D-1}, so that a matching cost volume (Cost Volume) of size H × W × D can be constructed, where H, W, and D are the image height, width, and maximum disparity value, respectively.
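The following sketch illustrates how such an H × W × D cost volume can be assembled by the GCNet-style concatenation mentioned below, pairing left features with right features shifted by each candidate disparity (a simplification: PSMNet actually builds the volume at reduced feature resolution):

```python
import torch

def build_concat_cost_volume(left_feat, right_feat, max_disp):
    """GCNet-style cost volume: for each candidate disparity d, concatenate
    left features with right features shifted by d pixels.
    left_feat, right_feat: (B, C, H, W); returns (B, 2C, D, H, W)."""
    b, c, h, w = left_feat.shape
    cost = left_feat.new_zeros(b, 2 * c, max_disp, h, w)
    for d in range(max_disp):
        if d == 0:
            cost[:, :c, d] = left_feat
            cost[:, c:, d] = right_feat
        else:
            # only columns x >= d have a valid counterpart at x - d
            cost[:, :c, d, :, d:] = left_feat[:, :, :, d:]
            cost[:, c:, d, :, d:] = right_feat[:, :, :, :-d]
    return cost
```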
For the adopted PSMNet model, the network structure is shown in FIG. 2. The PSMNet model mainly comprises 4 parts: feature extraction, matching cost computation, matching cost aggregation based on a 3D Convolutional Neural Network (CNN), and disparity regression.
It is very difficult to determine correspondence from pixel intensities alone, and semantic information at the object level is very beneficial for matching, especially for disparity estimation in ill-conditioned areas. In order to learn the relative relationships between objects, PSMNet proposes a Spatial Pyramid Pooling Module for image feature extraction; through 4 parallel fixed-size average pooling modules, the final image feature representation contains multi-scale context information. The matching cost volume is formed in the GCNet concatenation manner, retaining the most original image information for left-right feature matching. In order to aggregate feature information in both the disparity and spatial dimensions, PSMNet proposes a 3D CNN framework with an hourglass-shaped encoder-decoder structure, which comprises repeated top-down and bottom-up processing and supervises the matching cost volumes output at three stages of the network. Overall, PSMNet achieves very superior stereo matching performance, which is why this network framework is selected as the base network of the present application.
Generally speaking, a matching cost volume (Cost Volume) constructs D matching costs {C_0, C_1, ..., C_{D-1}} for each pixel, i.e. a matching cost distribution. To estimate sub-pixel disparity values from this distribution, GCNet proposes regression with the soft argmin function:

$$\hat{d} = \sum_{d=0}^{D-1} d \cdot \hat{P}(d), \qquad \hat{P}(d) = \operatorname{softmax}(-C_d) \tag{3.1}$$

where the disparity value with the smallest matching cost contributes the most to the final interpolation result. In the network training stage, for a pixel p with true disparity value d_p, the smooth L1 loss is generally used as the constraint:

$$\mathcal{L}_{regression} = \frac{1}{|\mathcal{P}|}\sum_{p\in\mathcal{P}} \operatorname{smooth}_{L_1}\left(d_p - \hat{d}_p\right) \tag{3.2}$$

where smooth_L1(x) = 0.5x² if |x| < 1 and |x| - 0.5 otherwise.
Since the whole process is differentiable, the network can be trained with direct supervision from the true disparity map. However, as the soft argmin regression process of equation (3.1) shows, the matching cost volume serves only as the weighting of the disparity interpolation process; as long as the true disparity value is recovered, the matching cost volume can participate in the interpolation in any state, and no requirement is imposed on its mathematical distribution. This contradicts the fact that the matching cost distribution of each pixel should be unimodal; the direct reason is the lack of direct supervision constraints on the matching cost distribution, which motivates us to propose an adaptive unimodal matching cost filtering scheme.
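For concreteness, equations (3.1) and (3.2) can be rendered in PyTorch as follows (the valid-pixel mask is an assumption commonly used with sparse ground truth, not stated here):

```python
import torch
import torch.nn.functional as F

def soft_argmin(cost_volume):
    """Equation (3.1): sub-pixel disparity regression by soft argmin.
    cost_volume: (B, D, H, W); a smaller cost means a better match."""
    prob = F.softmax(-cost_volume, dim=1)
    disp = torch.arange(cost_volume.size(1), dtype=prob.dtype,
                        device=prob.device).view(1, -1, 1, 1)
    return torch.sum(prob * disp, dim=1)                     # (B, H, W)

def disparity_regression_loss(pred_disp, gt_disp, max_disp=192):
    """Equation (3.2): smooth L1 loss over pixels with valid ground truth."""
    mask = (gt_disp > 0) & (gt_disp < max_disp)
    return F.smooth_l1_loss(pred_disp[mask], gt_disp[mask])
```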
An adaptive unimodal matching cost filtering module:
As shown in FIG. 1, the proposed AcfNet network structure completes the learning of a unimodal matching cost distribution by embedding only one adaptive unimodal matching cost filtering module on top of PSMNet. For the 3 aggregated matching cost volumes output by PSMNet, the adaptive filtering effect is realized by three parts: unimodal matching cost distribution generation, a confidence estimation network, and the stereo focal loss.
Unimodal matching cost distribution generation
The matching cost volume reflects the similarity of the candidate matching pairs: the matching cost between true matching pairs should be minimal, and the matching cost of other candidate disparity values should increase with their distance from the true disparity. This property requires that the matching cost distribution of each pixel be centered on the true disparity. Given the true disparity d_gt, the unimodal distribution is defined as:

$$P(d) = \operatorname{softmax}\left(-c_d^{gt}\right) = \frac{\exp\left(-c_d^{gt}\right)}{\sum_{d'=0}^{D-1}\exp\left(-c_{d'}^{gt}\right)}, \qquad c_d^{gt} = \frac{\left|d - d_{gt}\right|}{\sigma} \tag{3.4}$$

where σ > 0 is the variance, which controls the sharpness of the peak around the true disparity.
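A sketch of generating this ground-truth unimodal distribution, equation (3.4), for a batch of true disparity maps; the tensor shapes are assumptions:

```python
import torch
import torch.nn.functional as F

def unimodal_distribution(gt_disp, sigma, max_disp=192):
    """Equation (3.4): P(d) = softmax(-|d - d_gt| / sigma) along the
    disparity dimension. gt_disp: (B, H, W) true disparities; sigma is a
    scalar or a (B, 1, H, W) per-pixel variance map."""
    d = torch.arange(max_disp, dtype=gt_disp.dtype,
                     device=gt_disp.device).view(1, -1, 1, 1)   # (1, D, 1, 1)
    cost_gt = torch.abs(d - gt_disp.unsqueeze(1)) / sigma       # c_d^gt
    return F.softmax(-cost_gt, dim=1)                           # (B, D, H, W)
```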
Generally, the context information of each pixel differs, so it is unreasonable to keep a uniform true matching cost distribution P(d) for every pixel. For example, a pixel located at the corner of a table may prefer a very sharp single peak, while a weakly-textured region may prefer a relatively flat distribution. In order to establish a reasonable matching cost volume, a confidence estimation network is designed to adaptively adjust the unimodal distribution variance σ_p of each pixel.
Confidence estimation network
In conventional confidence evaluation methods, much research work focuses on analyzing the aggregated matching cost distribution curve in order to effectively detect outliers in the predicted disparity map and improve its prediction accuracy. In the prior art, confidence-guided matching cost filtering methods have been proposed; these generally use the confidence evaluation directly as prior information or as additional feature information to optimize the matching cost and the disparity map. The present method instead uses the confidence score predicted by the network directly to adjust the smoothness of the true matching cost distribution, so that each pixel can adaptively adjust the smoothness of its unimodal distribution according to context information; for the input aggregated matching cost volume, the network directly outputs a confidence map f ∈ [0,1]^{H×W}. For a pixel p, a large predicted confidence f_p means that the network can find its unique matching point with great confidence; conversely, a small predicted confidence value means that matching ambiguity exists. Thus, the variance of the true matching cost distribution can be dynamically adjusted by the estimated confidence value:

$$\sigma_p = s\left(1 - f_p\right) + \epsilon \tag{3.5}$$

where s ≥ 0 is a constant reflecting the sensitivity of the variance σ to changes in the confidence value f_p, and ε > 0 defines the lower bound of σ and effectively prevents the mathematical problem of division by zero; correspondingly, σ_p ∈ [ε, s + ε]. In our experiments, two types of pixels tend to have large variance values σ: weakly-textured pixels and occluded pixels. A weakly-textured region may contain multiple matching candidates, while for occluded pixels the correct matching point cannot be found. Since the σ_p of each pixel can be dynamically adjusted, the true matching cost distribution can be modified accordingly via equation (3.5).
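Equation (3.5) itself is a one-liner; the default values below follow the ablations reported later (lower bound ε = 1.0, sensitivity s = 1):

```python
def adaptive_sigma(confidence, s=1.0, eps=1.0):
    """Equation (3.5): sigma_p = s * (1 - f_p) + eps, so sigma_p lies
    in [eps, s + eps]; confidence is the map f in [0, 1]."""
    return s * (1.0 - confidence) + eps
```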
Stereo focal loss
For a pixel p, given the estimated matching cost distribution P̂_p(d) and the true matching cost distribution P_p(d), computing the distribution error with the cross-entropy loss is the most straightforward way. However, each pixel suffers a severe disparity sample imbalance problem: each pixel has only one true disparity value (positive sample) against hundreds of unmatched disparity values (negative samples). Therefore, inspired by Focal Loss, which addresses sample imbalance in one-stage object detection, the present application proposes a stereo focal loss (Stereo Focal Loss) that focuses on the prediction of positive samples in case network training is dominated by negative samples:

$$\mathcal{L}_{SF} = \frac{1}{|\mathcal{P}|}\sum_{p\in\mathcal{P}}\sum_{d=0}^{D-1}\left(1 - P_p(d)\right)^{-\alpha}\cdot\left(-P_p(d)\cdot\log \hat{P}_p(d)\right)$$

where α ≥ 0 is the focusing parameter; when α = 0 the loss function degenerates directly to the cross-entropy loss, and when α > 0 the stereo focal loss assigns more weight to positive disparity samples according to P_p(d).
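A hedged PyTorch sketch of this stereo focal loss; valid-pixel masking is omitted for brevity, and the clamping constants are numerical-stability assumptions:

```python
import torch

def stereo_focal_loss(est_prob, gt_prob, alpha=5.0, eps=1e-8):
    """Cross-entropy between the true unimodal distribution gt_prob and the
    estimated distribution est_prob, re-weighted by (1 - P_p(d))^(-alpha)
    so positive disparity samples dominate. Both inputs: (B, D, H, W)."""
    weight = (1.0 - gt_prob).clamp(min=eps).pow(-alpha)
    ce = -gt_prob * torch.log(est_prob.clamp(min=eps))
    return (weight * ce).sum(dim=1).mean()   # sum over d, mean over pixels
```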
The overall loss function:
in summary, our final loss function contains a total of three parts:
Figure BDA0002414099650000073
wherein λregression,λconfidenceTwo trade-off parameters.
Figure BDA0002414099650000074
The learning of the matching cost body is supervised,
Figure BDA0002414099650000075
supervising the return of the parallax
Figure BDA0002414099650000076
Then acting as a regularizer encourages more pixels to have a large confidence value,
Figure BDA0002414099650000077
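Combining the pieces, the overall objective might be assembled as below, reusing the stereo_focal_loss and disparity_regression_loss sketches above; λ_confidence = 8.0 follows the ablation reported later, while λ_regression = 1.0 is an assumption:

```python
import torch

def total_loss(est_prob, gt_prob, pred_disp, gt_disp, confidence,
               lambda_regression=1.0, lambda_confidence=8.0, eps=1e-8):
    """L = L_SF + lambda_regression * L_reg + lambda_confidence * L_conf."""
    l_sf = stereo_focal_loss(est_prob, gt_prob)
    l_reg = disparity_regression_loss(pred_disp, gt_disp)
    # Regularizer encouraging more pixels to have a large confidence value.
    l_conf = (-torch.log(confidence.clamp(min=eps))).mean()
    return l_sf + lambda_regression * l_reg + lambda_confidence * l_conf
```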
the above is demonstrated experimentally as follows, the demonstration process being:
database and evaluation index and implementation details
(1) Database
For qualitative and quantitative evaluation of the methods proposed in this application, evaluations are performed on three public datasets (Scene Flow, KITTI2012, KITTI2015). Scene Flow is a synthetic dataset comprising 35454 training picture pairs and 4370 testing picture pairs; it provides dense true disparity annotations, making it very suitable for training and testing network models. KITTI2012 and KITTI2015 are two real street-view datasets whose disparity annotations are obtained by radar scanning and are therefore sparse. The former comprises 194 training picture pairs and 195 test picture pairs; the latter contains 200 training picture pairs and 200 test picture pairs. Both KITTI datasets are too small for training neural networks, so, following GC-Net, ablation experiments are designed mainly on Scene Flow to analyze the network design.
(2) Evaluation index
In the experiments, we used two standard evaluation indexes: 3-pixel-error (3PE), the percentage of pixels whose predicted disparity differs from the true disparity by more than 3 pixels; and end-point-error (EPE), the average difference between the predicted disparity and the true disparity. EPE emphasizes sub-pixel errors, while 3PE emphasizes the percentage of outliers. Moreover, in order to further evaluate the performance of the proposed method on occluded regions, the Scene Flow test set is divided into occluded (OCC) and non-occluded (NOC) regions according to a left-right consistency check. Denoting by p the coordinate of a pixel in the left true disparity map D_L, the criterion is as follows:
$$\text{NOC:}\quad \left|d - D_R(p - d)\right| \le 1 \quad \text{for } d = D_L(p) \tag{3.10}$$
$$\text{OCC:}\quad \text{otherwise} \tag{3.11}$$
where D_R is the right true disparity map and p - d is the corresponding position p in the right image shifted left by d pixels. According to our statistics, occluded pixels account for 16% of the entire test set.
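A sketch of this left-right consistency check, equations (3.10)-(3.11), using nearest-pixel lookup instead of interpolation (a simplification):

```python
import numpy as np

def occlusion_mask(disp_left, disp_right):
    """Pixel p is non-occluded (NOC) if |d - D_R(p - d)| <= 1 for
    d = D_L(p). disp_left, disp_right: (H, W) float disparity maps.
    Returns a boolean mask that is True where the pixel is occluded."""
    h, w = disp_left.shape
    ys, xs = np.mgrid[0:h, 0:w]
    x_right = np.rint(xs - disp_left).astype(int)   # position p - d
    valid = (x_right >= 0) & (x_right < w)
    d_right = np.zeros_like(disp_left)
    d_right[valid] = disp_right[ys[valid], x_right[valid]]
    noc = valid & (np.abs(disp_left - d_right) <= 1.0)
    return ~noc
```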
(3) Implementation details
The method of this application is implemented in PyTorch, and all models are trained end-to-end with the standard RMSProp setting. Color normalization is adopted for data preprocessing on all images in the dataset. During training, image blocks of height H = 256 and width W = 512 are randomly cropped, and the maximum disparity value D is set to 192. For network training, network parameters are randomly initialized and trained on Scene Flow for 10 epochs at a constant learning rate of 0.001, and the trained model is tested directly. For the KITTI datasets, fine-tuning is performed for 600 epochs using the model pre-trained on Scene Flow; the initial fine-tuning learning rate is set to 0.001 and is decayed at the 100th and 300th epochs.
When submitting to the KITTI public leaderboard, training on Scene Flow is extended to 20 epochs to obtain a better pre-trained model. The training batch size is 3, with 3 NVIDIA GTX 1080Ti GPUs in total, so that one part of each batch is placed on each graphics card.
Experimental results and discussion
(1) Analyzing ablation experiment results:
all experiments were performed on a Scene Flow dataset, since there was sufficient data volume for network end-to-end training and no worry about overfitting issues. In all experiments, the Stereo Focal local performs positive-negative parallax sample equalization with α being 5.0. Considering that most parallax prediction errors are sub-pixel errors, namely the errors are smaller than 1 pixel point, and the 3-pixel error evaluated by the 3PE cannot accurately reveal the network performance, the method only utilizes the EPE error to research the performance difference of the network under different parameter settings.
Unimodal distribution variance σ analysis:
the size of the variance sigma reflects the sharpness of unimodal distribution, and plays a crucial role in the binocular deep learning method based on the adaptive unimodal stereo matching cost filtering mentioned in the application. In the method of the present application, the variance is mainly limited by the sum S, i.e., σp∈[,s+]。
First, a case where the variance σ is fixed, that is, the variances of all the pixels are the same value (s ═ 0, σ ═ is studied. Through grid search, the network prediction result is best when σ is 1.2. This also suggests that for most pixel points, they prefer to establish a unimodal distribution with σ ═ 1.2. Therefore, the lower limit of σ is set to 1.0 to explore adaptive variance learning.
Next, a variance sensitivity adjustment parameter S is investigated, which controls the upper limit of the variance σ. Fig. 3(a) shows the results obtained by adjusting the parameter S, where S is 1, the best effect is obtained, and the performance is quite stable when S is changed from 0.5 to 3.0. And, after the network converges, the variance distribution histogram of all the pixels in the Scene Flow test set when s is 1, i.e. σ ∈ [1.0,2.0] is shown in fig. 4. It can be seen that most of the pixels are biased to small variance, and some of the pixels require larger variance to smooth the single peak distribution.
Loss balancing weights:
λ_confidence adjusts the balance between the confidence network loss and the other losses, and also implicitly controls the learning of the variance. As FIG. 3(b) shows, a λ_confidence that is too large or too small makes the matching confidence of each pixel too high or too low and leads to worse results; the best performance is obtained at λ_confidence = 8.0.
λ_regression balances the disparity regression loss widely used in existing networks; an excessive λ_regression suppresses the other two losses presented here. FIG. 3(c) shows the performance variation curve: properly balancing the regression loss against the other two losses greatly improves the network matching performance.
(2) Analysis of variance
The variance estimation is an important design for realizing adaptive matching cost filtering in this application: it adaptively adjusts the smoothness of the unimodal distribution according to the difficulty of network learning. In order to quantitatively evaluate its performance, the sparsification plots technique is adopted, which reveals the agreement between our predicted confidence evaluation result and the true error magnitude. As shown in FIG. 5, sparsification plots of AcfNet were drawn on the Scene Flow test set. The graph shows the EPE error of the remaining pixels after progressively removing the pixels with relatively low confidence; the Oracle curve corresponds to the EPE error of the remaining pixels after progressively removing the pixels with relatively large errors; a Random curve gives the EPE error after randomly removing pixels. The results show that the confidence evaluation curve of the proposed method is very close to the Oracle curve: removing only 6.9% of the pixels halves the error, far better than random removal. This fully demonstrates the superior performance of the confidence evaluation of the present application in detecting and interpreting outliers. Furthermore, several visualization examples are given with reference to FIG. 7. The difficult-to-learn regions are mainly the occlusion regions (1a, 1c, 2a), the full-mode regions (1b, 3a) and fine objects (3a). In these difficult areas, the network of the present application gives very low confidence, which also proves that the network can flatten the distributions of these areas to reduce their influence, thereby effectively preventing overfitting in these areas.
(3) Network module validity analysis
The effectiveness of the technical scheme is verified by adding its components one by one to the PSMNet-based network; the experimental results are shown in FIG. 10. Relative to the base network PSMNet, the effectiveness of applying the unimodal distribution constraint to the matching cost volume is verified first, with distribution learning constrained by the cross-entropy loss (CE): the unimodal constraint brings a significant performance improvement on every index, demonstrating the superiority of unimodal matching cost filtering. Then, the positive-negative disparity sample problem of the cross-entropy loss is addressed with the stereo focal loss (SF), which brings further improvement on every index. Finally, the confidence estimation network (CENet) is added, which again greatly improves the accuracy on every index. It is worth mentioning that on the All-EPE evaluation index, the technical scheme of the application reduces the error of PSMNet from 1.101 to 0.867, an improvement of nearly 20%, fully reflecting the superiority and high performance of adaptive unimodal matching cost filtering. Meanwhile, several visualization results are given in FIG. 7, which show, from left to right: the left image, the right image, the true disparity map, the predicted disparity map, the error map, and the confidence map; in the error map, warm tones mean larger errors, and in the confidence map, darker colors indicate lower confidence. The prediction results of the technical scheme are basically consistent with the true disparity map, and remain good even in regions with complex structure.
(4) Adaptive unimodal matching cost filtering effectiveness analysis
AcfNet adds the unimodal matching cost filtering constraint directly on top of PSMNet. FIG. 10 shows the performance comparison between the two versions of AcfNet and PSMNet: AcfNet(uniform), with a uniform variance for every pixel, shows a large performance improvement over PSMNet, and the adaptive version AcfNet(adaptive) further improves the matching accuracy. This fully demonstrates the effectiveness of unimodal supervision and the superiority of adaptive variance adjustment. Compared with AcfNet(uniform), AcfNet(adaptive) improves more markedly in OCC (occlusion) areas, which is consistent with the conclusion from the variance analysis that the confidence estimation network can effectively detect these areas and prevent the network from overfitting in them.
(5) Matching cost filtering contrast analysis
Although there are many classical matching-cost-based filtering methods, they have not been comparable to existing deep-learning-based methods. One existing matching cost enhancement strategy is to generate a Gaussian distribution centered on the true disparity and then use it as a weight on the unaggregated matching cost, so as to enhance a unimodal matching cost distribution centered on the true disparity. This existing method differs from the present method in two main points: 1) the unimodal distribution of the existing method influences the matching cost distribution as a weight, whereas the present method uses it directly as a network supervision term, which can directly guide the network to filter the matching cost into a single peak; 2) the existing method needs the true disparity information in both the training and testing phases, whereas the present method needs it only during training. As shown in FIG. 11, all the methods in the table are trained from random initialization on the Scene Flow dataset and then directly tested for generalization on KITTI2012 and KITTI2015; all methods are PSMNet-based networks and use all available disparity information. As a comparison of filtering performance, the present method far surpasses the results of the existing matching cost enhancement strategy. Moreover, from the perspective of generalization, the technical scheme of the application improves over PSMNet by 11.64% on KITTI2012 and 10.74% on KITTI2015. This means that the explicit unimodal constraint enables the network to learn a better similarity measure and feature extraction mode, thereby showing superior generalization performance on different data sets.
(6) State-of-the-art method contrastive analysis matched with binocular stereo
To further evaluate the performance of the technical scheme of the application, FIG. 12 provides a comparison on the three data sets Scene Flow, KITTI2012 and KITTI2015 against methods embodying the current state of the art, including: classification-based methods (MC-CNN, PDS, HD3-Stereo), methods enhancing the matching cost computation (GwcNet), methods stacking optimization sub-networks (iResNet-i2), methods with very powerful cost aggregation networks (PSMNet, GA-Net), and methods adding extra information (EdgeStereo, SegStereo). Although all of them try to improve the network to obtain more robust stereo matching results, the method of the application still outperforms the prior art in terms of performance. FIG. 8 and FIG. 9 visualize several examples of the present application on KITTI2012 and KITTI2015 and mark where the comparison with PDS and PSMNet is significantly better. Two visual samples are provided for each data set; in each sample, the first row is the disparity map prediction result and the second row is the error map visualization, where in the KITTI2012 visualization white represents inaccurate prediction, and in KITTI2015 warm tones represent inaccurate prediction. It can be seen that the method of the application performs better on small objects and at picture and sky edges.
It is to be understood that the above description is not intended to limit the present invention, and the present invention is not limited to the above examples, and those skilled in the art should understand that they can make various changes, modifications, additions and substitutions within the spirit and scope of the present invention.

Claims (9)

1. A binocular deep learning method based on adaptive unimodal stereo matching cost filtering, characterized in that unimodal distribution supervision centered on the true disparity is directly applied to the matching cost predicted by the network to realize adaptive matching cost filtering, the method comprising the following steps:
1) constructing a data set comprising left and right images, which serve as a stereo image pair;
2) taking PSMNet as the stereo matching model base network and inputting the stereo image pair into it, the PSMNet base network outputting three matching cost volumes (Cost Volumes) aggregated by a stacked-hourglass 3D convolutional neural network;
3) for each matching cost volume (Cost Volume), estimating a confidence map with a confidence estimation network (Confidence Estimation Network) and using it to adjust the ground-truth cost volume (Ground Truth Cost Volume), generating a pixel-level unimodal distribution (Unimodal Distribution) as the network training label;
4) proposing a stereo focal loss (Stereo Focal Loss) to constrain the estimated matching cost volume against the true matching cost volume;
5) generating a sub-pixel disparity map from the estimated matching cost volume through the soft argmin function, and using a regression L1 loss to supervise the estimated disparity map against the true disparity map.
2. The binocular deep learning method based on adaptive unimodal stereo matching cost filtering according to claim 1, characterized in that: true unimodal matching cost distribution generation is performed in step 2), i.e. for each pixel p in the reference image, a series of candidate matching pixels is searched along the corresponding epipolar line of the target image; the matching cost volume reflects the similarity of the candidate matching pairs, the matching cost between true matching pairs should be minimal, and the matching cost of other candidate disparity values should increase with their distance from the true disparity; the matching cost distribution of each pixel should be centered on the true disparity.
3. The binocular deep learning method based on adaptive unimodal stereo matching cost filtering according to claim 1, characterized in that: in step 3), a confidence estimation network is constructed to estimate the confidence of the matching cost predicted by the backbone network, and the confidence estimation network adaptively adjusts the smoothness of the unimodal distribution, i.e. adaptively adjusts its variance, according to the learning difficulty of the network.
4. The binocular deep learning method based on adaptive unimodal stereo matching cost filtering according to claim 1, characterized in that: in step 4), the stereo focal loss is computed; the matching cost volume constructs D matching costs {C_0, C_1, ..., C_{D-1}} for each pixel, i.e. a matching cost distribution; for a pixel p, the similarity between the estimated matching cost distribution P̂_p(d) and the true matching cost distribution P_p(d) is measured with the cross-entropy loss and used as a network supervision term.
5. The binocular deep learning method based on adaptive unimodal stereo matching cost filtering according to claim 2, characterized in that: in the true unimodal matching cost distribution generation of step 2), the disparity search set in the target image is assumed to be {0, 1, ..., D-1}, where the true disparity value is d_gt, and the true unimodal distribution is defined as:

$$P(d) = \operatorname{softmax}\left(-c_d^{gt}\right) = \frac{\exp\left(-c_d^{gt}\right)}{\sum_{d'=0}^{D-1}\exp\left(-c_{d'}^{gt}\right)}, \qquad c_d^{gt} = \frac{\left|d - d_{gt}\right|}{\sigma}$$

where σ > 0 is the variance, controlling the sharpness of the peak around the true disparity.
6. The binocular deep learning method based on adaptive unimodal stereo matching cost filtering according to claim 3, characterized in that: in the confidence estimation network of step 3), the network consists of a 3×3 convolutional layer, a normalization layer and a ReLU layer, followed by another 1×1 convolutional layer and a sigmoid function that outputs values in [0,1]; for the input aggregated matching cost volume, the network directly outputs a confidence map f ∈ [0,1]^{H×W}, where H and W are the image height and width, respectively.
7. The binocular deep learning method based on adaptive unimodal stereo matching cost filtering according to claim 6, characterized in that: in the confidence estimation network of step 3), for a pixel p, the variance of the true matching cost distribution can be dynamically adjusted by the estimated confidence value f_p: σ_p = s(1 - f_p) + ε, where s ≥ 0 is a constant reflecting the sensitivity of the variance σ to changes in the confidence value f_p, and ε > 0 defines the lower bound of σ; correspondingly, σ_p ∈ [ε, s + ε].
8. The binocular deep learning method based on adaptive unimodal stereo matching cost filtering according to claim 1, characterized in that: in the stereo focal loss computation of step 4), a weighting factor focusing on the positive disparity loss is introduced to improve the cross-entropy loss, finally giving the stereo focal loss the mathematical form:

$$\mathcal{L}_{SF} = \frac{1}{|\mathcal{P}|}\sum_{p\in\mathcal{P}}\sum_{d=0}^{D-1}\left(1 - P_p(d)\right)^{-\alpha}\cdot\left(-P_p(d)\cdot\log \hat{P}_p(d)\right)$$

where α ≥ 0 is the focusing parameter; when α = 0 the loss function degenerates directly to the cross-entropy loss, and when α > 0 the stereo focal loss assigns more weight to positive disparity samples according to P_p(d).
9. The binocular deep learning method based on adaptive unimodal stereo matching cost filtering according to claim 1, characterized in that: the PSMNet stereo matching model base network includes a Spatial Pyramid Pooling Module for extracting image features, which extracts image features containing multi-scale context information through 4 parallel fixed-size average pooling modules; the PSMNet base network also includes a 3D CNN framework with an hourglass-shaped encoder-decoder structure, which performs repeated top-down and bottom-up processing and supervises the matching cost volumes output by the base network at three stages.
CN202010185728.1A 2020-03-17 2020-03-17 Binocular deep learning method based on adaptive unimodal stereo matching cost filtering Pending CN111709977A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010185728.1A CN111709977A (en) 2020-03-17 2020-03-17 Binocular depth learning method based on adaptive unimodal stereo matching cost filtering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010185728.1A CN111709977A (en) 2020-03-17 2020-03-17 Binocular depth learning method based on adaptive unimodal stereo matching cost filtering

Publications (1)

Publication Number Publication Date
CN111709977A true CN111709977A (en) 2020-09-25

Family

ID=72536506

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010185728.1A Pending CN111709977A (en) 2020-03-17 2020-03-17 Binocular depth learning method based on adaptive unimodal stereo matching cost filtering

Country Status (1)

Country Link
CN (1) CN111709977A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113362462A (en) * 2021-02-01 2021-09-07 中国计量大学 Binocular stereo vision parallax filtering method and device based on self-supervision learning
CN114782507A (en) * 2022-06-20 2022-07-22 中国科学技术大学 Asymmetric binocular stereo matching method and system based on unsupervised learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150248769A1 (en) * 2014-03-03 2015-09-03 Nokia Corporation Method, apparatus and computer program product for disparity map estimation of stereo images
CN109584290A (en) * 2018-12-03 2019-04-05 北京航空航天大学 A kind of three-dimensional image matching method based on convolutional neural networks
CN109887019A (en) * 2019-02-19 2019-06-14 北京市商汤科技开发有限公司 A kind of binocular ranging method and device, equipment and storage medium
CN110533712A (en) * 2019-08-26 2019-12-03 北京工业大学 A kind of binocular solid matching process based on convolutional neural networks

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150248769A1 (en) * 2014-03-03 2015-09-03 Nokia Corporation Method, apparatus and computer program product for disparity map estimation of stereo images
CN109584290A (en) * 2018-12-03 2019-04-05 北京航空航天大学 A kind of three-dimensional image matching method based on convolutional neural networks
CN109887019A (en) * 2019-02-19 2019-06-14 北京市商汤科技开发有限公司 A kind of binocular ranging method and device, equipment and storage medium
CN110533712A (en) * 2019-08-26 2019-12-03 北京工业大学 A kind of binocular solid matching process based on convolutional neural networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIA-REN CHANG: "Pyramid Stereo Matching Network", 《ARXIV》 *
YOUMIN ZHANG: "Adaptive Unimodal Cost Volume Filtering for Deep Stereo Matching", 《ARXIV》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113362462A (en) * 2021-02-01 2021-09-07 中国计量大学 Binocular stereo vision parallax filtering method and device based on self-supervision learning
CN113362462B (en) * 2021-02-01 2024-04-05 中国计量大学 Binocular stereoscopic vision parallax filtering method and device based on self-supervision learning
CN114782507A (en) * 2022-06-20 2022-07-22 中国科学技术大学 Asymmetric binocular stereo matching method and system based on unsupervised learning
CN114782507B (en) * 2022-06-20 2022-09-30 中国科学技术大学 Asymmetric binocular stereo matching method and system based on unsupervised learning

Similar Documents

Publication Publication Date Title
CN109815893B (en) Color face image illumination domain normalization method based on cyclic generation countermeasure network
CN104867135B (en) A kind of High Precision Stereo matching process guided based on guide image
CN112884682B (en) Stereo image color correction method and system based on matching and fusion
CN106462771A (en) 3D image significance detection method
CN110414349A (en) Introduce the twin convolutional neural networks face recognition algorithms of sensor model
CN103996202A (en) Stereo matching method based on hybrid matching cost and adaptive window
CN103996201A (en) Stereo matching method based on improved gradient and adaptive window
CN109831664B (en) Rapid compressed stereo video quality evaluation method based on deep learning
CN112784782B (en) Three-dimensional object identification method based on multi-view double-attention network
CN111402311A (en) Knowledge distillation-based lightweight stereo parallax estimation method
CN107146248A (en) A kind of solid matching method based on double-current convolutional neural networks
Messai et al. Adaboost neural network and cyclopean view for no-reference stereoscopic image quality assessment
CN113610046B (en) Behavior recognition method based on depth video linkage characteristics
CN106355195A (en) The system and method used to measure image resolution value
CN108388901B (en) Collaborative significant target detection method based on space-semantic channel
CN114092697A (en) Building facade semantic segmentation method with attention fused with global and local depth features
CN111709977A (en) Binocular depth learning method based on adaptive unimodal stereo matching cost filtering
CN111310821A (en) Multi-view feature fusion method, system, computer device and storage medium
CN115496720A (en) Gastrointestinal cancer pathological image segmentation method based on ViT mechanism model and related equipment
CN111553296B (en) Two-value neural network stereo vision matching method based on FPGA
CN111882516B (en) Image quality evaluation method based on visual saliency and deep neural network
CN110659680B (en) Image patch matching method based on multi-scale convolution
CN110135435B (en) Saliency detection method and device based on breadth learning system
CN107341449A (en) A kind of GMS Calculation of precipitation method based on cloud mass changing features
CN113011359B (en) Method for simultaneously detecting plane structure and generating plane description based on image and application

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200925