WO2020020160A1 - Image disparity estimation - Google Patents

Image disparity estimation

Info

Publication number
WO2020020160A1
Authority
WO
WIPO (PCT)
Prior art keywords
perspective
information
parallax
image
semantic
Prior art date
Application number
PCT/CN2019/097307
Other languages
English (en)
French (fr)
Inventor
石建萍
Original Assignee
北京市商汤科技开发有限公司
Priority date
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司
Priority to SG11202100556YA
Priority to JP2021502923A (JP7108125B2)
Publication of WO2020020160A1
Priority to US17/152,897 (US20210142095A1)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/593Depth or shape recovery from multiple images from stereo images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/24Aligning, centring, orientation detection or correction of the image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Definitions

  • the present application relates to the field of computer vision technology, and in particular, to an image parallax estimation method and device, and a storage medium.
  • Parallax estimation is a fundamental research problem in computer vision and has wide applications in many fields, such as depth prediction and scene understanding. Most methods treat the parallax estimation task as a matching problem: from this perspective, they use stable and reliable features to represent image blocks, find similar image blocks across the stereo images as matches, and then calculate the parallax values.
  • This application provides a technical solution for image parallax estimation.
  • an embodiment of the present application provides an image parallax estimation method, which includes: acquiring a first perspective image and a second perspective image of a target scene; performing feature extraction processing on the first perspective image to obtain first perspective feature information; performing semantic segmentation processing on the first perspective image to obtain first perspective semantic segmentation information; and obtaining parallax prediction information of the first perspective image and the second perspective image based on the first perspective feature information, the first perspective semantic segmentation information, and association information of the first perspective image and the second perspective image.
  • the method further includes: performing feature extraction processing on the second perspective image to obtain second perspective feature information; and performing association processing based on the first perspective feature information and the second perspective feature information to obtain the association information.
  • obtaining the parallax prediction information of the first perspective image and the second perspective image includes: blending the first perspective feature information, the first perspective semantic segmentation information, and the association information to obtain hybrid feature information; and obtaining the parallax prediction information based on the hybrid feature information.
  • the image parallax estimation method is implemented by a parallax estimation neural network, and the method further includes: training the parallax estimation neural network based on the parallax prediction information.
  • training the parallax estimation neural network based on the parallax prediction information includes: performing semantic segmentation processing on the second perspective image to obtain second perspective semantic segmentation information; obtaining first perspective reconstructed semantic information based on the second perspective semantic segmentation information and the parallax prediction information; and adjusting the network parameters of the parallax estimation neural network based on the first perspective reconstructed semantic information.
  • adjusting the network parameters of the parallax estimation neural network based on the first perspective reconstructed semantic information includes: determining a semantic loss value based on the first perspective reconstructed semantic information; and adjusting the network parameters of the parallax estimation neural network based on the semantic loss value.
  • adjusting the network parameters of the parallax estimation neural network based on the first perspective reconstructed semantic information further includes: adjusting the network parameters based on the first perspective reconstructed semantic information and the first semantic label of the first perspective image.
  • training the parallax estimation neural network based on the parallax prediction information includes: obtaining a first perspective reconstructed image based on the parallax prediction information and the second perspective image; determining a photometric loss value according to the photometric difference between the first perspective reconstructed image and the first perspective image; determining a smoothing loss value based on the parallax prediction information; and adjusting the network parameters of the parallax estimation neural network according to the photometric loss value and the smoothing loss value.
  • in some embodiments, the first perspective image and the second perspective image correspond to labeled parallax information, and the method further includes: training the parallax estimation neural network based on the parallax prediction information and the labeled parallax information.
  • training the parallax estimation neural network based on the parallax prediction information and the labeled parallax information includes: determining a parallax regression loss value based on the parallax prediction information and the labeled parallax information; and adjusting the network parameters of the parallax estimation neural network according to the parallax regression loss value.
  • an embodiment of the present application provides an image parallax estimation device, the device including: an image acquisition module for acquiring a first perspective image and a second perspective image of a target scene; and a parallax estimation neural network for obtaining parallax prediction information of the first perspective image and the second perspective image, the parallax estimation neural network including: a primary feature extraction module for performing feature extraction processing on the first perspective image to obtain first perspective feature information; a semantic feature extraction module for performing semantic segmentation processing on the first perspective image to obtain first perspective semantic segmentation information; and a parallax regression module configured to obtain the parallax prediction information of the first perspective image and the second perspective image based on the first perspective feature information, the first perspective semantic segmentation information, and the association information of the first perspective image and the second perspective image.
  • the primary feature extraction module is further configured to perform feature extraction processing on the second perspective image to obtain second perspective feature information;
  • the parallax regression module further includes: an associated feature extraction module configured to perform association processing based on the first perspective feature information and the second perspective feature information to obtain the association information.
  • the parallax regression module is further configured to: perform hybrid processing on the first perspective feature information, the first perspective semantic segmentation information, and the association information to obtain hybrid feature information; and obtain the parallax prediction information based on the hybrid feature information.
  • the apparatus further includes: a first network training module, configured to train the parallax estimation neural network based on the parallax prediction information.
  • the first network training module is further configured to: perform semantic segmentation processing on the second perspective image to obtain second perspective semantic segmentation information; obtain first perspective reconstructed semantic information based on the second perspective semantic segmentation information and the parallax prediction information; and adjust the network parameters of the parallax estimation neural network based on the first perspective reconstructed semantic information.
  • the first network training module is further configured to: determine a semantic loss value based on the first perspective reconstructed semantic information; and adjust the network parameters of the parallax estimation neural network based on the semantic loss value.
  • the first network training module is further configured to: adjust the network parameters of the parallax estimation neural network based on the first perspective reconstructed semantic information and the first semantic label of the first perspective image; or adjust the network parameters of the parallax estimation neural network based on the first perspective reconstructed semantic information and the first perspective semantic segmentation information.
  • the first network training module is further configured to: obtain a first perspective reconstructed image based on the parallax prediction information and the second perspective image; determine a photometric loss value according to the photometric difference between the first perspective reconstructed image and the first perspective image; determine a smoothing loss value based on the parallax prediction information; and adjust the network parameters of the parallax estimation neural network according to the photometric loss value and the smoothing loss value.
  • the device further includes: a second network training module for training the parallax estimation neural network based on the parallax prediction information and labeled parallax information, where the first perspective image and the second perspective image correspond to the labeled parallax information.
  • the second network training module is further configured to: determine a parallax regression loss value based on the parallax prediction information and the labeled parallax information; and adjust the network parameters of the parallax estimation neural network based on the parallax regression loss value.
  • an embodiment of the present application provides an image parallax estimation device. The device includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the steps of the image parallax estimation method described in the embodiments of the present application are implemented.
  • an embodiment of the present application provides a storage medium that stores a computer program; when the program is executed by a processor, the processor is caused to execute the steps of the image parallax estimation method according to the embodiments of the present application.
  • the technical solution provided in this application acquires a first perspective image and a second perspective image of a target scene; performs feature extraction processing on the first perspective image to obtain first perspective feature information; performs semantic segmentation processing on the first perspective image to obtain first perspective semantic segmentation information; and, based on the first perspective feature information, the first perspective semantic segmentation information, and the association information between the first perspective image and the second perspective image, obtains the parallax prediction information of the first perspective image and the second perspective image, which can improve the accuracy of the parallax prediction.
  • FIG. 1 is a schematic flowchart of an image disparity estimation method according to an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a parallax estimation system according to an embodiment of the present application.
  • FIGS. 3A-3D are comparison diagrams, provided by an embodiment of the present application, of the effects of an existing prediction method and the prediction method of the present application on the KITTI Stereo data set.
  • FIG. 4A and FIG. 4B are supervised qualitative results on the KITTI Stereo test sets: FIG. 4A shows the qualitative results on the KITTI 2012 test data, and FIG. 4B shows the qualitative results on the KITTI 2015 test data.
  • FIGS. 5A-5C are unsupervised qualitative results on a CityScapes validation set provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of an image disparity estimation device according to an embodiment of the present application.
  • Parallax estimation is a fundamental problem in computer vision. It has a wide range of applications, including depth prediction, scene understanding, and autonomous driving.
  • the main process of parallax estimation is to find matching pixels from the left and right images of a stereo image pair, and the distance between the matching pixels is the parallax.
  • Most parallax estimation methods mainly rely on designing reliable features to represent image blocks, and then select matching image blocks on the left and right images to calculate parallax. Most of these methods use supervised learning to train neural networks to predict parallax, and a few methods try to use unsupervised methods for training.
  • this application proposes a technical solution for image parallax estimation using semantic information.
  • An embodiment of the present application provides an image parallax estimation method. As shown in FIG. 1, the method mainly includes the following steps.
  • Step 101 Obtain a first perspective image and a second perspective image of a target scene.
  • the first perspective image and the second perspective image are images of the same spatiotemporal scene collected simultaneously by the two cameras of a binocular vision system.
  • the first perspective image may be an image acquired by a first camera in the binocular vision system
  • the second perspective image may be an image acquired by a second camera in the binocular vision system.
  • the first perspective image and the second perspective image represent images acquired at different perspectives for the same scene.
  • the first perspective image and the second perspective image may be a left perspective image and a right perspective image, respectively.
  • the first perspective image may be a left perspective image and, correspondingly, the second perspective image may be a right perspective image; or the first perspective image may be a right perspective image and, correspondingly, the second perspective image may be a left perspective image.
  • the embodiment of the present application does not limit the specific implementation of the first perspective image and the second perspective image.
  • the scene includes a driving assistance scene, a robot tracking scene, a robot positioning scene, and the like. This application does not limit this.
  • Step 102 Perform feature extraction processing on the first perspective image to obtain first perspective characteristic information.
  • Step 102 may be implemented using a convolutional neural network.
  • the first perspective image may be input to a parallax estimation neural network for processing.
  • the parallax estimation neural network is named SegStereo network in the following.
  • the first perspective image may be used as an input of a first sub-network in the parallax estimation neural network for performing feature extraction processing. Specifically, a first perspective image is input to the first sub-network, and the first perspective characteristic information is obtained after a multi-layer convolution operation or further processing based on the convolution processing.
  • the first perspective feature information may be a first perspective primary feature map; alternatively, the first perspective feature information (and likewise the second perspective feature information) may be a three-dimensional tensor including at least one matrix. The embodiment of the present disclosure does not limit the specific implementation of the first perspective feature information.
  • a feature extraction network or a convolutional sub-network of a parallax estimation neural network is used to extract feature information or a primary feature map of a first perspective image.
  • Step 103 Perform semantic segmentation processing on the first perspective image to obtain first perspective semantic segmentation information.
  • the SegStereo network includes at least two sub-networks, which are denoted as a first sub-network and a second sub-network, respectively; the first sub-network may be a feature extraction network, and the second sub-network may be a semantic segmentation network.
  • the feature extraction network can obtain a perspective primary feature map, and the semantic segmentation network can obtain a semantic feature map.
  • the first sub-network can be implemented using at least a part of PSPNet-50 (Pyramid Scene Parsing Network), and at least a part of the second sub-network can also be implemented using PSPNet-50; that is, the first sub-network and the second sub-network can share part of the structure of PSPNet-50.
  • the embodiment of the present application does not limit the specific implementation of the SegStereo network.
  • the first perspective image can be input to a semantic segmentation network for semantic segmentation processing to obtain the first perspective semantic segmentation information.
  • the feature information of the first perspective may also be input into the semantic segmentation network for semantic segmentation processing to obtain the first perspective semantic segmentation information.
  • performing semantic segmentation processing on the first perspective image to obtain first perspective semantic segmentation information includes: obtaining first perspective semantic segmentation information based on the first perspective feature information.
  • the first perspective semantic segmentation information may be a three-dimensional tensor or a first perspective semantic feature map, and the embodiment of the present disclosure does not limit the specific implementation of the first perspective semantic segmentation information.
  • the first-view primary feature map can be used as an input for a second sub-network in the parallax estimation neural network for performing semantic information extraction processing.
  • the first perspective feature information or the first perspective primary feature map is input to the second sub-network, and the first perspective semantic segmentation information is obtained after multi-layer convolution operation or further processing based on the convolution processing.
  • Step 104 Obtain parallax prediction information of the first perspective image and the second perspective image based on the first perspective feature information, the first perspective semantic segmentation information, and the association information of the first perspective image and the second perspective image.
  • Correlation processing may be performed on the first perspective image and the second perspective image to obtain association information of the first perspective image and the second perspective image.
  • Correlation processing may also be performed based on the first perspective feature information and the second perspective feature information to obtain the association information of the first perspective image and the second perspective image, wherein the second perspective feature information is obtained by performing feature extraction processing on the second perspective image.
  • the second perspective feature information may be a second perspective primary feature map; or, the second perspective feature information may be a three-dimensional tensor and includes at least one matrix. The embodiment of the present disclosure does not limit the specific implementation of the second perspective feature information.
  • the second perspective image can be used as an input to the first sub-network in the parallax estimation neural network for feature extraction processing. Specifically, a second perspective image is input to the first sub-network, and the second perspective feature information is obtained after a multi-layer convolution operation. Then, association calculation is performed based on the first perspective feature information and the second perspective feature information to obtain association information of the first perspective image and the second perspective image.
  • Performing association calculation based on the first perspective feature information and the second perspective feature information includes: performing association calculation on image blocks that may match in the first perspective feature information and the second perspective feature information to obtain the association information. That is, correlation calculation is performed on the first perspective feature information and the second perspective feature information to obtain the association information, which is mainly used for the extraction of matching features.
  • the association information may be an association feature map.
  • the first-view primary feature map and the second-view primary feature map can be used as inputs of a correlation operation module for a correlation operation in a parallax estimation neural network.
  • a first-view primary feature map and a second-view primary feature map are input to the correlation calculation module 240 shown in FIG. 2, and the association information of the first-view image and the second-view image is obtained after the correlation operation.
  • Obtaining the parallax prediction information includes: blending the first perspective feature information, the first perspective semantic segmentation information, and the association information to obtain hybrid feature information; and obtaining the parallax prediction information based on the hybrid feature information.
  • the hybrid processing here may be a connection processing, such as fusion or superimposition by channel, etc., which are not limited in the embodiments of the present disclosure.
  • one or more of the first perspective feature information, the first perspective semantic segmentation information, and the association information may be subjected to conversion processing, so that after the conversion processing the first perspective feature information, the first perspective semantic segmentation information, and the association information have the same dimensions.
  • the method may further include: performing conversion processing on the first perspective feature information to obtain first perspective conversion feature information.
  • hybrid processing may be performed on the first perspective conversion feature information, the first perspective semantic segmentation information, and the association information to obtain hybrid feature information.
  • the first perspective feature information is subjected to spatial conversion processing to obtain the first perspective conversion feature information, wherein the dimension of the first perspective conversion feature information is preset.
  • the first perspective conversion feature information may be a first perspective conversion feature map, and the specific implementation of the first perspective conversion feature information is not limited in the embodiment of the present disclosure.
  • For example, the first perspective feature information can be processed by using a convolution module to obtain the first perspective conversion feature information.
  • the hybrid feature information may be a hybrid feature map, and the specific implementation of the hybrid feature information is not limited in the embodiments of the present disclosure.
  • the parallax prediction information may be a parallax prediction map, and the specific implementation of the parallax prediction information is not limited in the embodiments of the present disclosure.
  • the SegStereo network includes a third sub-network in addition to the first sub-network and the second sub-network.
  • the third sub-network is used for determining disparity prediction information of the first-view image and the second-view image, and the third sub-network may be a parallax regression network.
  • the first perspective conversion feature information, the association information, and the first perspective semantic segmentation information are input to the parallax regression network, and the parallax regression network combines this information into hybrid feature information and regresses the parallax prediction information based on the hybrid feature information.
  • the residual network and the deconvolution module 250 in the parallax regression network shown in FIG. 2 are used to obtain the parallax prediction information.
  • the first-view transformation feature map, the associated feature map, and the first-view semantic feature map can be combined to obtain a hybrid feature map, thereby achieving the embedding of semantic features.
  • After the hybrid feature map is obtained, the residual network and the deconvolution structure of the parallax regression network are used to finally output a parallax prediction map.
  • the SegStereo network mainly uses a residual structure, which can extract more recognizable image features. While extracting the associated features of the first-view image and the second-view image, it also embeds high-level semantic features, thereby improving the accuracy of prediction.
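  • To make this concrete, the following is a minimal PyTorch sketch of such a regression head, blending first perspective conversion features, correlation features, and semantic features before regressing disparity; the module name, channel sizes, and layer depth are illustrative assumptions rather than the patent's exact architecture.

```python
import torch
import torch.nn as nn

class DisparityRegressionHead(nn.Module):
    """Hedged sketch: fuse conversion, correlation, and semantic features,
    then regress a one-channel disparity map (channel sizes assumed)."""

    def __init__(self, feat_ch=256, corr_ch=33, sem_ch=128):
        super().__init__()
        # 1x1 convolution playing the role of the "conversion" processing
        # applied to the first perspective features before hybrid processing.
        self.transform = nn.Conv2d(feat_ch, sem_ch, kernel_size=1)
        in_ch = sem_ch + corr_ch + sem_ch  # conversion + correlation + semantic
        self.head = nn.Sequential(
            nn.Conv2d(in_ch, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True),
            # Deconvolution back toward full resolution; the text describes a
            # deeper residual stack followed by deconvolution.
            nn.ConvTranspose2d(256, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 3, padding=1),  # one-channel disparity prediction
        )

    def forward(self, feat_l, corr, sem_l):
        # Hybrid feature map: concatenate the three feature sources by channel.
        hybrid = torch.cat([self.transform(feat_l), corr, sem_l], dim=1)
        return self.head(hybrid)
```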
  • the above method can be an application process of a parallax estimation neural network, that is, a method of performing parallax estimation on a pair of images to be processed using a trained parallax estimation neural network.
  • the above method may be a training process of a parallax estimation neural network, that is, the above method may also be applied to training a parallax estimation neural network.
  • the first perspective image and the second perspective image are sample images.
  • a pre-defined neural network may be trained in an unsupervised manner to obtain a parallax estimation neural network including the first sub network, the second sub network, and the third sub network.
  • the disparity estimation neural network is trained in a supervised manner to obtain a disparity estimation neural network including the first sub network, the second sub network, and the third sub network.
  • the method further includes training the parallax estimation neural network based on the parallax prediction information.
  • Training the parallax estimation neural network based on the parallax prediction information includes: performing semantic segmentation processing on the second perspective image to obtain second perspective semantic segmentation information; obtaining the first perspective reconstructed semantic information based on the second perspective semantic segmentation information and the parallax prediction information; and adjusting the network parameters of the parallax estimation neural network based on the first perspective reconstructed semantic information.
  • the first perspective reconstructed semantic information may be a reconstructed first semantic feature map.
  • Semantic segmentation processing can be performed on the second perspective image to obtain the second perspective semantic segmentation information.
  • the feature information of the second perspective may also be input into the semantic segmentation network for processing, and the second perspective semantic segmentation information may be obtained.
  • performing semantic segmentation processing on the second perspective image to obtain second perspective semantic segmentation information includes: obtaining second perspective semantic segmentation information based on the second perspective feature information.
  • the second perspective semantic segmentation information may be a three-dimensional tensor or a second perspective semantic feature map, and the embodiment of the present disclosure does not limit the specific implementation of the second perspective semantic segmentation information.
  • the second-view primary feature map can be used as an input for a second sub-network in the parallax estimation neural network for performing semantic information extraction processing.
  • the second perspective feature information or the second perspective primary feature map is input to the second sub-network, and the second perspective semantic segmentation information is obtained after multi-layer convolution operation or further processing based on the convolution processing.
  • a semantic segmentation network or a convolutional sub-network of a disparity estimation neural network is used to extract a first-view semantic feature map and a second-view semantic feature map.
  • the first perspective feature information and the second perspective feature information may be connected to a semantic segmentation network, and the semantic segmentation network outputs the first perspective semantic segmentation information and the second perspective semantic segmentation information.
  • adjusting the network parameters of the parallax estimation neural network based on the first perspective reconstructed semantic information includes: determining a semantic loss value based on the first perspective reconstructed semantic information; and adjusting the network parameters of the parallax estimation neural network in combination with the semantic loss value.
  • Adjusting the network parameters of the parallax estimation neural network based on the first perspective reconstructed semantic information includes: adjusting the network parameters based on the first perspective reconstructed semantic information and a first semantic label of the first perspective image; or adjusting the network parameters based on the first perspective reconstructed semantic information and the first perspective semantic segmentation information.
  • adjusting the network parameters of the parallax estimation neural network based on the first perspective reconstructed semantic information includes: determining the semantic loss value based on the difference between the first perspective reconstructed semantic information and the first perspective semantic segmentation information; and adjusting the network parameters of the parallax estimation neural network in combination with the semantic loss value.
  • the reconstruction operation is performed based on the predicted parallax prediction information and the second perspective semantic segmentation information to obtain the first perspective reconstructed semantic information; the first perspective reconstructed semantic information may also be compared with the true first semantic label to obtain a semantic loss value, and the network parameters of the parallax estimation neural network are adjusted in combination with the semantic loss value.
  • the real first semantic label is manually labeled.
  • the unsupervised learning method here is unsupervised learning for parallax, not unsupervised learning for semantic segmentation information.
  • the semantic loss may also be a cross-entropy loss; the specific implementation of the semantic loss is not limited in the embodiments of the present disclosure.
  • a function for calculating semantic loss is defined. This function can introduce rich semantic consistency information, so that the trained network can overcome common local ambiguity problems.
  • Training the parallax estimation neural network based on the parallax prediction information includes: obtaining a first perspective reconstructed image based on the parallax prediction information and the second perspective image; determining a photometric loss value according to the photometric difference between the first perspective reconstructed image and the first perspective image; determining a smoothing loss value based on the parallax prediction information; and adjusting the network parameters of the parallax estimation neural network according to the photometric loss value and the smoothing loss value.
  • the smoothing loss value can be determined by imposing constraints on the non-smooth areas in the parallax prediction information.
  • By reconstructing the image to measure the photometric difference, the network can be trained in an unsupervised manner, which greatly reduces the dependence on true-value images.
  • Training the parallax estimation neural network based on the parallax prediction information further includes: performing a reconstruction operation based on the parallax prediction information and the second perspective image to obtain a first perspective reconstructed image; determining the photometric loss according to the photometric difference between the first perspective reconstructed image and the first perspective image; determining the smoothing loss by imposing constraints on the non-smooth areas in the parallax prediction information; determining the semantic loss based on the difference between the first perspective reconstructed semantic information and the true first semantic label; determining the overall loss according to the photometric loss, the smoothing loss, and the semantic loss; and training the parallax estimation neural network based on minimizing the overall loss.
  • the training set used during training need not provide a true-value parallax image.
  • the total loss is equal to the weighted sum of the individual losses.
  • the network can be trained according to the photometric difference between the reconstructed image and the original image; when extracting the associated features of the first-view image and the second-view image, the semantic feature map is embedded and the semantic loss is defined; combining low-level texture information with high-level semantic information adds a semantic consistency constraint, which improves the parallax prediction of the trained neural network in large target areas and overcomes the local ambiguity problem to a certain extent.
  • the method for training a parallax estimation neural network further includes: training the parallax estimation neural network in a supervised manner based on the parallax prediction information.
  • In some embodiments, the first perspective image and the second perspective image correspond to labeled parallax information, and the parallax estimation neural network is trained based on the parallax prediction information and the labeled parallax information.
  • training the parallax estimation neural network based on the parallax prediction information and the labeled parallax information includes: determining a parallax regression loss value based on the parallax prediction information and the labeled parallax information; determining a smoothing loss value based on the parallax prediction information; and adjusting the network parameters of the parallax estimation neural network according to the parallax regression loss value and the smoothing loss value.
  • training the parallax estimation neural network based on the parallax prediction information and the labeled parallax information includes: determining a parallax regression loss based on the parallax prediction information and the labeled parallax information; imposing constraints on the non-smooth regions in the parallax prediction information to determine the smoothing loss; determining the semantic loss based on the difference between the first perspective reconstructed semantic information and the true first semantic label; determining the total loss under supervised training according to the parallax regression loss, the semantic loss, and the smoothing loss; and training the parallax estimation neural network based on minimizing the total loss; wherein the training set used during training needs to provide the labeled parallax information.
  • Alternatively, training the parallax estimation neural network based on the parallax prediction information and the labeled parallax information includes: determining a parallax regression loss based on the parallax prediction information and the labeled parallax information; imposing constraints on the non-smooth regions in the parallax prediction information to determine the smoothing loss; determining the semantic loss based on the difference between the first perspective reconstructed semantic information and the first perspective semantic segmentation information; determining the overall loss under supervised training according to the parallax regression loss, the semantic loss, and the smoothing loss; and training the parallax estimation neural network based on minimizing the overall loss; wherein the training set used during training needs to provide the labeled parallax information.
  • the disparity estimation neural network can be obtained by supervised training.
  • the difference between the predicted value and the true value is calculated as the supervised parallax regression loss.
  • the semantic loss and the smoothing loss used in unsupervised training are still applicable.
  • the first sub-network, the second sub-network, and the third sub-network are all sub-networks obtained by training a parallax estimation neural network.
  • the input and output contents of the different sub-networks are different, but they all target the same target scene.
  • the method for training the parallax estimation neural network includes: training the parallax estimation neural network using the training sample set while performing parallax prediction map training and semantic feature map training simultaneously, to obtain the optimized parameters of the first sub-network, the second sub-network, and the third sub-network.
  • Alternatively, the method for training the disparity estimation neural network includes: first training the disparity estimation neural network using a training sample set to perform semantic feature map training; and then using the training sample set to perform disparity prediction map training on the network whose semantic branch has been trained, to obtain the optimized parameters of the second sub-network and the first sub-network.
  • the parallax estimation method based on semantic information proposed in the embodiment of the present application uses an end-to-end parallax prediction neural network to input the left and right perspective images of a stereo image pair to directly obtain a parallax prediction map, which can meet the real-time requirement.
  • the network can be trained in an unsupervised manner, which greatly reduces the dependence on the true-value image.
  • the semantic feature map is embedded and the semantic loss is defined.
  • the combination of low-level texture information and high-level semantic information adds a semantic consistency constraint and improves the network's parallax prediction in large target areas such as roads and large vehicles, overcoming the local ambiguity problem to a certain extent.
  • FIG. 2 shows a schematic diagram of a parallax estimation system architecture.
  • the parallax estimation system architecture is referred to as a SegStereo parallax estimation system architecture.
  • the SegStereo parallax estimation system architecture is suitable for unsupervised learning and supervised learning.
  • the pre-calibrated stereo image pair may include a first perspective image (also referred to as a left perspective image) I_l and a second perspective image (also referred to as a right perspective image) I_r.
  • a shallow neural network 210 can be used to extract the primary image feature maps: the first perspective image I_l is input to the shallow neural network 210 to obtain the first perspective primary feature map F_l, and the second perspective image I_r is input to the shallow neural network 210 to obtain the second perspective primary feature map F_r.
  • the first perspective primary feature map may represent the aforementioned first perspective feature information
  • the second perspective primary feature map may represent the aforementioned second perspective feature information.
  • the shallow neural network 210 may be a convolution block with a convolution kernel of 3 × 3 × 256.
  • the convolution block includes a convolution layer, a batch normalization layer, and a rectified linear unit (ReLU) layer.
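  • As an illustration, such a shallow extractor could be sketched in PyTorch as follows; reducing it to a single convolution block is a simplification of the description above.

```python
import torch.nn as nn

def conv_block(in_ch, out_ch=256, stride=1):
    # 3 x 3 convolution with 256 output channels, followed by batch
    # normalization and ReLU, as described above.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

shallow_net = conv_block(3)  # applied to both I_l and I_r with shared weights
```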
  • the shallow neural network 210 may be a first sub-network.
  • a trained semantic segmentation network 220 is used to extract the semantic feature map.
  • the semantic segmentation network 220 can be implemented by a part of the PSPNet-50 network.
  • the first-view primary feature map F_l is input to the semantic segmentation network 220 to obtain the first-view semantic feature map.
  • the second-view primary feature map F_r is input to the semantic segmentation network 220 to obtain the second-view semantic feature map.
  • another convolution block 230 may be used to calculate the first perspective transformation feature map.
  • the sizes of the primary feature maps, the semantic feature maps, and the conversion feature map are reduced relative to the original image, for example, to 1/8 of its size.
  • the sizes of the first-view primary feature map, the second-view primary feature map, the first semantic feature map, the second semantic feature map, and the first-view transformation feature map are the same.
  • the first-view image and the second-view image have the same size.
  • the correlation module 240 can be used to calculate the matching cost between the first-view primary feature map F_l and the second-view primary feature map F_r to obtain the correlation feature map F_c.
  • the association module 240 may apply the correlation method used in the optical flow prediction network FlowNet to calculate the correlation between the two feature maps.
  • the maximum parallax parameter setting may be d.
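  • A hedged sketch of such a correlation operation, restricted to horizontal shifts up to the maximum parallax d (`max_disp` below), is shown here; the FlowNet-style layer may differ in normalization and window size.

```python
import torch
import torch.nn.functional as F

def correlation_1d(feat_l, feat_r, max_disp=32):
    # feat_l, feat_r: (B, C, H, W) primary feature maps F_l and F_r.
    B, C, H, W = feat_l.shape
    # Pad the right-view features on the left so shifted slices stay in range.
    padded_r = F.pad(feat_r, (max_disp, 0))
    volume = []
    for d in range(max_disp + 1):
        # shifted[x] == feat_r[x - d]: the candidate match at disparity d.
        shifted = padded_r[:, :, :, max_disp - d : max_disp - d + W]
        # Channel-averaged inner product = matching cost at disparity d.
        volume.append((feat_l * shifted).mean(dim=1, keepdim=True))
    return torch.cat(volume, dim=1)  # correlation feature map F_c
```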
  • a hybrid feature map (or hybrid feature information representation) F_h can then be obtained.
  • the mixed feature map F_h is sent to the subsequent residual network and deconvolution module 250 to obtain a disparity map D having the same size as the original size of the first perspective image I_l.
  • semantic cues can be integrated in two ways. In the first aspect, semantic cues are embedded into the disparity prediction map during the feature learning process. The second aspect is to guide the training process of the neural network by introducing semantic cues in the calculation of the loss term.
  • the first aspect concerns how to embed semantic cues into the disparity prediction map during feature learning.
  • the input stereo image pair includes the first perspective image and the second perspective image.
  • the shallow neural network 210 can obtain the first perspective primary feature map and the second perspective primary feature map, respectively.
  • the semantic segmentation network 220 is used to extract the semantic features of the first perspective primary feature map and the second perspective primary feature map, respectively, to obtain the first perspective semantic feature map and the second perspective semantic feature map.
  • the shallow neural network 210 may use a part of the PSPNet-50 network, and output the intermediate features (i.e., the conv3_1 features) of the network as the first-view primary feature map F_l and the second-view primary feature map F_r.
  • A convolution operation can be performed on the first perspective semantic feature map; for example, a convolution block with a 1 × 1 × 128 convolution kernel can be applied to obtain a transformed first semantic feature map (not shown in FIG. 2). The transformed first semantic feature map is then connected with the first perspective conversion feature map and the associated feature map F_c to obtain a mixed feature map (or mixed feature information representation) F_h, and the obtained mixed feature map F_h is fed to the rest of the disparity regression network, that is, the subsequent residual network and deconvolution module 250.
  • the loss term introduces semantic cues and can also help guide parallax learning.
  • Semantic cues can be characterized by the semantic cross-entropy loss L_seg.
  • the reconstruction operation can be performed by using the reconstruction module 260 in FIG. 2, acting on the second perspective semantic feature map and the disparity prediction map, to obtain a reconstructed first semantic feature map; the ground-truth semantic label of the first perspective can then be used to measure the semantic cross-entropy loss L_seg.
  • The second perspective semantic feature map is 1/8 of the size of the original image, that is, one-eighth of the size of the second perspective image, while the parallax prediction map D and the second perspective image have the same size, that is, full size.
  • Therefore, the second perspective semantic feature map is first up-sampled to full size, and feature reconstruction is then applied to the up-sampled full-size second perspective semantic feature map and the disparity prediction map D to obtain a full-size reconstructed first perspective semantic feature map.
  • the reconstructed first perspective semantic feature map is then down-sampled to 1/8 of full size, thereby obtaining the reconstructed first semantic feature map.
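  • A minimal sketch of this reconstruction step follows, assuming a bilinear warp built on `torch.nn.functional.grid_sample`; the helper names and interpolation settings are assumptions.

```python
import torch
import torch.nn.functional as F

def warp_right_to_left(feat_r, disp):
    # feat_r: (B, C, H, W); disp: (B, 1, H, W) predicted left-view disparity D.
    B, _, H, W = feat_r.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, device=feat_r.device, dtype=feat_r.dtype),
        torch.arange(W, device=feat_r.device, dtype=feat_r.dtype),
        indexing="ij",
    )
    xs = xs.unsqueeze(0).expand(B, -1, -1)  # (B, H, W)
    ys = ys.unsqueeze(0).expand(B, -1, -1)  # (B, H, W)
    # The left-view pixel at x corresponds to the right-view pixel at x - D(x).
    x_src = xs - disp.squeeze(1)
    grid = torch.stack(
        (2.0 * x_src / (W - 1) - 1.0,  # x normalized to [-1, 1]
         2.0 * ys / (H - 1) - 1.0),    # y normalized to [-1, 1]
        dim=-1,
    )
    return F.grid_sample(feat_r, grid, mode="bilinear", align_corners=True)

def reconstruct_left_semantics(sem_r_eighth, disp_full):
    # Upsample the 1/8-scale right-view semantic features to full size,
    # warp them with the full-size disparity map, then downsample back.
    sem_r_full = F.interpolate(sem_r_eighth, scale_factor=8, mode="bilinear",
                               align_corners=True)
    sem_l_full = warp_right_to_left(sem_r_full, disp_full)
    return F.interpolate(sem_l_full, scale_factor=0.125, mode="bilinear",
                         align_corners=True)
```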
  • a convolutional classifier with a convolution kernel size of 1 × 1 × C is used to regularize disparity learning, where C is the number of semantic classes.
  • the softmax loss function is used to represent the semantic cross-entropy loss L_seg.
  • the loss term includes other parameters in addition to the semantic cross-entropy loss.
  • the above semantic information can be combined into the unsupervised and supervised model training.
  • the calculation method of the total loss under these two methods is introduced as follows.
  • the input stereo image pair includes two images, one of which can be reconstructed from the other image using the parallax prediction map.
  • the reconstructed image should theoretically be close to the original input image.
  • Photometric consistency can be utilized to help learn parallax in an unsupervised way. Given a parallax prediction map D, an image reconstruction operation is performed on the second perspective image I_r, for example by the reconstruction module 260 shown in FIG. 2, to obtain a reconstructed image of the first perspective, denoted $\tilde{I}_l$.
  • the L1 norm is then used to regularize the photometric consistency, and the resulting photometric loss L_p is shown in formula (1):

$$L_p = \frac{1}{N}\sum_{i,j}\left\lVert \tilde{I}_l(i,j) - I_l(i,j)\right\rVert_1 \tag{1}$$

where N is the number of pixels, i and j are the indices of the pixels, and $\lVert\cdot\rVert_1$ is the L1 norm.
  • ⁇ s ( ⁇ ) is a space smoothing penalty function implemented by generalized Charbonnier function.
  • the semantic cross-entropy loss L_seg is shown in formula (3). The softmax loss of a single pixel i is defined as $-\log\left(e^{f_{y_i}} / \sum_{y_j} e^{f_{y_j}}\right)$, where $f_{y_i}$ is the activation value of the real label $y_i$, $y_j$ is the category number, $f_{y_j}$ is the activation value of category $y_j$, and i is the pixel index. For the entire image, the softmax loss is calculated over the set of labeled pixel positions $N_v$:

$$L_{seg} = \frac{1}{|N_v|}\sum_{i \in N_v} -\log\frac{e^{f_{y_i}}}{\sum_{y_j} e^{f_{y_j}}} \tag{3}$$
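  • In a framework such as PyTorch, this per-pixel softmax loss over the labeled set N_v reduces to a masked cross-entropy; the ignore index below is an assumption.

```python
import torch.nn.functional as F

def semantic_loss(logits, labels, ignore_index=255):
    # Formula (3): per-pixel softmax cross-entropy averaged over the labeled
    # pixel set N_v; pixels marked with `ignore_index` are excluded.
    # logits: (B, C, H, W) from the 1 x 1 x C classifier; labels: (B, H, W).
    return F.cross_entropy(logits, labels, ignore_index=ignore_index)
```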
  • the overall loss L_unsup in an unsupervised manner includes the photometric loss L_p, the smoothing loss L_s, and the semantic cross-entropy loss L_seg, each introduced with a loss weight, as shown in formula (4):

$$L_{unsup} = \lambda_p L_p + \lambda_s L_s + \lambda_{seg} L_{seg} \tag{4}$$
  • the disparity prediction neural network is trained based on minimizing the overall loss L unsup to obtain a preset disparity prediction neural network.
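  • One unsupervised training step combining the three weighted terms of formula (4) might look like the following sketch; the weight values and the network's output signature are assumptions.

```python
import torch

# Illustrative loss weights; the patent does not fix their values.
LAMBDA_P, LAMBDA_S, LAMBDA_SEG = 1.0, 0.1, 0.1

def unsupervised_step(net, optimizer, img_l, img_r, sem_labels_l):
    # `net` is assumed to return the disparity map and the reconstructed
    # first-view semantic logits, as in the SegStereo description above.
    disp, recon_sem_logits = net(img_l, img_r)
    loss = (LAMBDA_P * photometric_loss(img_l, img_r, disp)
            + LAMBDA_S * smoothness_loss(disp)
            + LAMBDA_SEG * semantic_loss(recon_sem_logits, sem_labels_l))
    optimizer.zero_grad()
    loss.backward()      # back-propagate the combined loss L_unsup
    optimizer.step()
    return loss.item()
```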
  • the specific training method may use a method commonly used by those skilled in the art, and details are not described herein again.
  • In a supervised manner, the parallax regression loss L_r can be expressed as the following formula (5), where D is the parallax prediction map and $\hat{D}$ is the labeled (true-value) parallax:

$$L_r = \frac{1}{N}\sum_{i,j}\left\lVert D(i,j) - \hat{D}(i,j)\right\rVert_1 \tag{5}$$
  • the overall loss L_sup in a supervised manner includes the parallax regression loss L_r, the smoothing loss L_s, and the semantic cross-entropy loss L_seg.
  • the parallax regression loss L_r is introduced with weight λ_r, the smoothing loss L_s with weight λ_s, and the semantic cross-entropy loss L_seg with weight λ_seg; the total loss L_sup is shown in formula (6):

$$L_{sup} = \lambda_r L_r + \lambda_s L_s + \lambda_{seg} L_{seg} \tag{6}$$
  • the disparity prediction neural network is trained based on the minimization of the overall loss L sup to obtain a preset disparity prediction neural network.
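  • The supervised counterpart swaps the photometric term for the regression term of formula (5); the following sketch assumes a validity mask marking pixels with labeled parallax.

```python
import torch

def disparity_regression_loss(disp_pred, disp_gt, valid_mask):
    # Formula (5) as an L1 regression restricted to pixels that carry
    # labeled parallax information (`valid_mask`).
    return (disp_pred - disp_gt).abs()[valid_mask].mean()

def supervised_total_loss(disp_pred, disp_gt, valid_mask, sem_logits, sem_labels,
                          lam_r=1.0, lam_s=0.1, lam_seg=0.1):
    # Formula (6): weighted sum of regression, smoothing, and semantic terms.
    return (lam_r * disparity_regression_loss(disp_pred, disp_gt, valid_mask)
            + lam_s * smoothness_loss(disp_pred)
            + lam_seg * semantic_loss(sem_logits, sem_labels))
```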
  • specific training methods may use methods common to those skilled in the art, and details are not described herein again.
  • the parallax prediction neural network provided in this application embeds high-level semantic features while extracting the associated information of the left and right perspective images, which helps to improve the prediction accuracy of the parallax map.
  • a function for calculating the semantic cross-entropy loss is defined. This function can introduce rich semantic consistency information, which can effectively overcome common local ambiguity problems.
  • in an unsupervised learning method, since the network can be trained to output correct parallax values based on the photometric difference between the reconstructed image and the original image, there is no need to provide a large number of true-value parallax images, which can effectively reduce training complexity and computation cost.
  • the proposed SegStereo framework incorporates semantic segmentation information into disparity estimation, where semantic consistency can be used as an active guide for disparity estimation; the semantic feature embedding strategy and the softmax-based semantic loss function can help train the network in an unsupervised or supervised manner;
  • the proposed parallax estimation method obtains state-of-the-art results on the KITTI Stereo 2012 and 2015 benchmarks; predictions on the CityScapes dataset also show the effectiveness of the method.
  • the KITTI Stereo data set is a computer vision algorithm evaluation data set in an autonomous driving scenario. In addition to providing data in a raw data format, it also provides a benchmark for each task.
  • the CityScapes dataset is a dataset for semantic understanding of street scenes in urban roads.
  • FIGS. 3A-3D show a comparison of the effects of an existing prediction method and the prediction method of the present application on the KITTI Stereo dataset: FIGS. 3A and 3B show the input stereo image pair, FIG. 3C shows the error chart obtained after processing FIGS. 3A and 3B according to the existing prediction method, and FIG. 3D shows the error chart obtained after processing FIGS. 3A and 3B according to the prediction method of the present application.
  • the error map is obtained by subtracting the reconstructed image from the input original image.
  • the dark area in the lower right in FIG. 3C indicates a wrong prediction area.
  • FIG. 3D it can be seen from FIG. 3D that the error area in the lower right is greatly reduced. Therefore, under the guidance of semantic cues, the parallax estimation of SegStereo network is more accurate, especially in the locally blurred area.
  • FIGs 4A and 4B show several qualitative examples of the KITTI test set.
  • the SegStereo network can also obtain better disparity estimation results when processing challenging complex scenarios.
  • Fig. 4A shows the qualitative results of the KITTI 2012 test data. As shown in Fig. 4A, from left to right: the first perspective image, the parallax prediction map, and the error map.
  • Figure 4B shows the qualitative results of the KITTI 2015 test data. As shown in Figure 4B, from left to right: the first-view image, the parallax prediction map, and the error map.
  • the method proposed in this application can handle complex scenarios.
  • the SegStereo network can also be adapted to other datasets; for example, a SegStereo network obtained by unsupervised training can be tested on the CityScapes validation set.
  • Figs. 5A-5C show the prediction results of the unsupervised-trained network on the CityScapes validation set.
  • FIG. 5A is a first-perspective image.
  • FIG. 5B is a parallax prediction map obtained after processing FIG. 5A using the SGM algorithm.
  • FIG. 5C is a parallax prediction map obtained after processing FIG. 5A using the SegStereo network. Clearly, compared with the SGM algorithm, the SegStereo network produces better results in terms of global scene structure and object details.
  • the SegStereo disparity estimation architecture introduces semantic cues into the disparity estimation network.
  • PSPNet can be used as a segmentation branch to extract the semantic features of a stereo image pair
  • a residual network (ResNet) and a correlation module are used as the disparity part to regress the parallax prediction map.
  • the correlation module is used to encode the matching cues of the stereo image pair.
  • the segmentation features are embedded, as semantic features, into the disparity branch after the correlation module.
  • the semantic consistency of the stereo image pair is reconstructed through semantic loss regularization, which further enhances the robustness of the disparity estimation.
  • Both the semantic segmentation network and the parallax regression network are fully convolutional, so the network can be trained end-to-end.
  • Incorporating semantic cues into the SegStereo network can be used for unsupervised and supervised training.
  • during unsupervised training, both the photometric consistency loss and the semantic cross-entropy loss are computed and back-propagated; semantic feature embedding and the semantic cross-entropy loss both introduce favorable semantic-consistency constraints.
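  • a single unsupervised training step then combines the losses and back-propagates them together. The sketch below assumes PyTorch, reuses the `photometric_loss` helper sketched earlier and a `smoothness_loss` like the one sketched alongside formula (2) later in this document; the weight values and the `ignore_index=255` labeling convention are assumptions:

```python
import torch

def train_step(model, optimizer, left, right, left_sem_label,
               lam_p=1.0, lam_s=0.1, lam_seg=0.5):
    """One unsupervised step, assuming `model(left, right)` returns the
    predicted disparity and first-perspective semantic logits."""
    optimizer.zero_grad()
    disp, sem_logits = model(left, right)
    l_p = photometric_loss(left, right, disp)          # sketched earlier
    l_s = smoothness_loss(disp)                        # gradient penalty on disp
    l_seg = torch.nn.functional.cross_entropy(
        sem_logits, left_sem_label, ignore_index=255)  # labelled pixels only
    loss = lam_p * l_p + lam_s * l_s + lam_seg * l_seg
    loss.backward()                                    # both losses propagate back
    optimizer.step()
    return loss.item()
```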
  • for the supervised training scheme, a supervised parallax regression loss can be used instead of the unsupervised photometric consistency loss to train the network, which obtains advanced results on the KITTI Stereo benchmarks, such as the KITTI Stereo 2012 and 2015 benchmarks.
  • the prediction on the CityScapes dataset also shows the effectiveness of the method.
  • the above-mentioned parallax estimation method for stereo images combined with semantic information first obtains a first perspective image and a second perspective image of a target scene and uses a feature extraction network to extract the primary feature maps of the two images; a convolution block is added on the first-perspective primary feature map to obtain the first-perspective transformation feature map; based on the first-perspective and second-perspective primary feature maps, the correlation module computes their associated feature map; a semantic segmentation network then obtains the first-perspective semantic feature map; the first-perspective transformation feature map, the associated feature map, and the first-perspective semantic feature map are combined to obtain a mixed feature map; and finally the parallax prediction map is regressed using the residual network and the deconvolution module, as sketched below.
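  • to make the wiring of that pipeline concrete, here is a deliberately small PyTorch skeleton. Every module is a stand-in (the real network uses a PSPNet-50 trunk, a FlowNet-style correlation, and a residual/deconvolution decoder), the channel counts and maximum disparity are assumptions, and the `torch.roll`-based correlation wraps at the border, which a real implementation would avoid:

```python
import torch
import torch.nn as nn

class SegStereoSketch(nn.Module):
    """Pipeline wiring only; not the actual SegStereo implementation."""
    def __init__(self, feat_ch=256, sem_ch=128, max_disp=24, out_res=8):
        super().__init__()
        self.shallow = nn.Sequential(                 # primary feature extractor
            nn.Conv2d(3, feat_ch, 3, stride=out_res, padding=1),
            nn.BatchNorm2d(feat_ch), nn.ReLU(inplace=True))
        self.transform = nn.Conv2d(feat_ch, sem_ch, 3, padding=1)  # conv block 230
        self.segment = nn.Conv2d(feat_ch, sem_ch, 3, padding=1)    # segmentation stand-in
        self.max_disp = max_disp
        mix_ch = sem_ch + sem_ch + max_disp + 1
        self.regress = nn.Sequential(                 # residual/deconv head stand-in
            nn.Conv2d(mix_ch, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 1, kernel_size=2 * out_res,
                               stride=out_res, padding=out_res // 2))

    def correlation(self, fl, fr):
        # crude shifted inner product; wraps at the border (simplification)
        cost = [(fl * torch.roll(fr, shifts=d, dims=3)).mean(1, keepdim=True)
                for d in range(self.max_disp + 1)]
        return torch.cat(cost, dim=1)                 # (B, d+1, h, w)

    def forward(self, left, right):
        fl, fr = self.shallow(left), self.shallow(right)
        mixed = torch.cat([self.transform(fl), self.segment(fl),
                           self.correlation(fl, fr)], dim=1)
        return self.regress(mixed)                    # full-resolution disparity
```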
  • in this way, a parallax estimation neural network composed of a feature extraction network, a semantic segmentation network, and a parallax regression network can take the first perspective image and the second perspective image as input and quickly output a parallax prediction map, thereby achieving end-to-end parallax prediction and meeting real-time needs.
  • when computing the matching features of the first perspective image and the second perspective image, the semantic feature map is embedded, that is, a semantic consistency constraint is added, which overcomes the local ambiguity problem to a certain extent and improves the accuracy of parallax prediction.
  • it should be understood that FIG. 1 to FIG. 2 are merely exemplary embodiments of the present application; those skilled in the art can make various obvious changes and/or substitutions based on the examples of FIG. 1 to FIG. 2, and the resulting technical solutions still belong to the disclosure scope of the embodiments of the present application.
  • an embodiment of the present disclosure provides an image disparity estimation device. As shown in FIG. 6, the device includes the following modules.
  • the image acquisition module 10 is configured to acquire a first perspective image and a second perspective image of a target scene.
  • the parallax estimation neural network 20 is configured to obtain parallax prediction information according to the first perspective image and the second perspective image.
  • the parallax estimation neural network 20 includes the following modules.
  • a primary feature extraction module 21 is configured to perform feature extraction processing on the first perspective image to obtain first perspective feature information.
  • the semantic feature extraction module 22 is configured to perform a semantic segmentation process on the first perspective image to obtain the first perspective semantic segmentation information.
  • a parallax regression module 23 is configured to obtain the parallax prediction information of the first perspective image and the second perspective image based on the first perspective feature information, the first perspective semantic segmentation information, and the associated information of the first perspective image and the second perspective image.
  • the primary feature extraction module 21 is further configured to perform feature extraction processing on the second perspective image to obtain second perspective feature information.
  • the parallax regression module 23 further includes an association module configured to perform association processing based on the first perspective feature information and the second perspective feature information to obtain the associated information.
  • the parallax regression module 23 is further configured to perform mixing processing on the first perspective feature information, the first perspective semantic segmentation information, and the associated information to obtain mixed feature information, and to obtain the parallax prediction information based on the mixed feature information.
  • the device further includes: a first network training module 24 for training a parallax estimation neural network 20 based on the parallax prediction information.
  • the first network training module 24 is further configured to: perform semantic segmentation processing on the second perspective image to obtain second perspective semantic segmentation information; obtain first-perspective reconstructed semantic information based on the second perspective semantic segmentation information and the parallax prediction information; and adjust the network parameters of the parallax estimation neural network 20 based on the first-perspective reconstructed semantic information.
  • the first network training module 24 is further configured to: determine a semantic loss value based on the first-perspective reconstructed semantic information; and adjust the network parameters of the parallax estimation neural network 20 based on the semantic loss value.
  • the first network training module 24 is further configured to: adjust the network parameters of the parallax estimation neural network 20 based on the first-perspective reconstructed semantic information and the first semantic label of the first perspective image; or adjust the network parameters of the parallax estimation neural network 20 based on the first-perspective reconstructed semantic information and the first perspective semantic segmentation information.
  • the first network training module 24 is further configured to: obtain a first-perspective reconstructed image based on the parallax prediction information and the second perspective image; determine a luminosity loss value according to the luminosity difference between the first-perspective reconstructed image and the first perspective image; determine a smoothness loss value based on the parallax prediction information; and adjust the network parameters of the parallax estimation neural network 20 according to the luminosity loss value and the smoothness loss value.
  • the device further includes: a second network training module 25 for training the parallax estimation neural network 20 based on the parallax prediction information and labeled parallax information; the first perspective image and the second perspective image correspond to the labeled parallax information.
  • the second network training module 25 is further configured to: determine a parallax regression loss value based on the parallax prediction information and the labeled parallax information; and adjust the network parameters of the parallax estimation neural network according to the parallax regression loss value.
  • the above-mentioned image acquisition module 10 acquires information in different ways and therefore has different structures: when receiving images from a client, it is a communication interface; when acquiring images automatically, it corresponds to an image collector.
  • the specific structures of the image acquisition module 10 and the parallax estimation neural network 20 described above may each correspond to a processor.
  • the specific structure of the processor may be a central processing unit (CPU), a microcontroller unit (MCU), a digital signal processor (DSP), a programmable logic controller (PLC), or another electronic component or collection of electronic components with processing functions.
  • the processor includes executable code, and the executable code is stored in a storage medium; the processor may be connected to the storage medium through a communication interface such as a bus and, when executing the corresponding function of a specific unit, reads and runs the executable code from the storage medium.
  • the portion of the storage medium used for storing the executable code is preferably a non-volatile storage medium.
  • the image acquisition module 10 and the parallax estimation neural network 20 may be integrated in the same processor or may correspond to different processors; when they are integrated in the same processor, the processor processes the functions corresponding to the image acquisition module 10 and the parallax estimation neural network 20 in a time-division manner.
  • the image parallax estimation device can use a parallax estimation neural network composed of a primary feature extraction module, a semantic feature extraction module, and a parallax regression module to take the first and second perspective images as input and quickly output a parallax prediction map, thereby achieving end-to-end parallax prediction and meeting real-time needs.
  • when computing the features of the first and second perspective images, the semantic feature map is embedded, that is, a semantic consistency constraint is added, which overcomes the local ambiguity problem to a certain extent and can improve the accuracy of parallax prediction and the precision of the final parallax prediction.
  • An embodiment of the present application further describes an image parallax estimation device, the device includes: a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor implements the foregoing when the program is executed.
  • An image parallax estimation method provided by any one of the technical solutions.
  • the processor executes the program, it implements: performing feature extraction processing on the second perspective image to obtain second perspective feature information; and associating the second perspective feature information with the second perspective feature information based on the first perspective feature information And processing to obtain the related information.
  • the processor executes the program, it implements: performing hybrid processing on the first perspective feature information, the first perspective semantic segmentation information, and the association information to obtain hybrid feature information;
  • the mixed feature information is described to obtain parallax prediction information.
  • the processor when the processor executes the program, the processor implements: training the parallax estimation neural network based on the parallax prediction information.
  • the processor when the processor executes the program, it implements: performing semantic segmentation processing on the second perspective image to obtain second perspective semantic segmentation information; and based on the second perspective semantic segmentation information and the parallax Predict the information to obtain the reconstructed semantic information from the first perspective; reconstruct the semantic information based on the first perspective, and adjust the network parameters of the parallax estimation neural network.
  • the processor executes the program, it realizes: reconstructing semantic information based on the first perspective to determine a semantic loss value; and adjusting network parameters of the parallax estimation neural network based on the semantic loss value.
  • the processor executes the program, it is implemented: reconstructing semantic information based on the first perspective and a first semantic label of the first perspective image, and adjusting network parameters of the parallax estimation neural network; Or reconstruct the semantic information and the first perspective semantic segmentation information based on the first perspective and adjust the network parameters of the parallax estimation neural network.
  • a reconstructed image of a first perspective is obtained based on the parallax prediction information and the second perspective image; and the reconstructed image according to the first perspective and the first
  • a luminosity difference between two perspective images is used to determine a luminosity loss value
  • a smoothing loss value is determined based on the parallax prediction information
  • the parallax estimation neural network network is adjusted according to the luminosity loss value and the smoothing loss value.
  • the processor executes the program, it implements: training a parallax estimation neural network for implementing the method based on the parallax prediction information and labeled parallax information; the first perspective image and the The second perspective image corresponds to the labeled parallax information.
  • the processor executes the program, it is implemented: determining a parallax regression loss value based on the parallax prediction information and labeled parallax information; and adjusting the parallax estimation neural network based on the parallax regression loss value.
  • Network parameters determining a parallax regression loss value based on the parallax prediction information and labeled parallax information.
  • the image parallax estimation device provided in the embodiment of the present application can improve the accuracy of the parallax prediction and the accuracy of the final parallax prediction.
  • the embodiment of the present application also describes a computer storage medium, wherein the computer storage medium stores computer-executable instructions, and the computer-executable instructions are used to execute the image parallax estimation methods described in the foregoing embodiments. That is, after the computer-executable instructions are executed by the processor, the image parallax estimation method provided by any one of the foregoing technical solutions can be implemented.
  • the parallax estimation neural network is applied to an unmanned driving platform: facing road traffic scenes, it outputs the parallax map in front of the vehicle body in real time, from which the distance to each target and position ahead can further be estimated. Under more complicated conditions, such as large targets and occlusions, the parallax estimation neural network can still effectively give reliable parallax predictions.
  • on an autonomous driving platform equipped with a binocular stereo camera, facing road traffic scenes, the parallax estimation neural network can give accurate parallax prediction results and, especially at locally ambiguous locations (strong light, mirror surfaces, large targets), can still give reliable parallax values. In this way, the smart car can obtain clearer surrounding-environment and road-condition information and perform unmanned driving based on them, thereby improving driving safety.
  • the disclosed device and method may be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of the units is only a logical function division; in actual implementation there may be other division manners, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the coupling, direct coupling, or communication connections between the displayed or discussed components may be through some interfaces, and the indirect coupling or communication connections of devices or units may be electrical, mechanical, or in other forms.
  • the units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; they may be located in one place or distributed across multiple network units, and some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may stand alone as a unit, or two or more units may be integrated into one unit; the above integrated unit can be implemented in the form of hardware, or in the form of hardware plus software functional units.
  • those of ordinary skill in the art can understand that all or part of the steps of the above method embodiments may be completed by program-instruction-related hardware; the foregoing program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments; the foregoing storage medium includes various media that can store program code, such as a mobile storage device, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
  • alternatively, if the above integrated unit of the present application is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the methods described in the embodiments of the present application. The foregoing storage medium includes various media that can store program code, such as a mobile storage device, a ROM, a RAM, a magnetic disk, or an optical disc.


Abstract

The present application discloses an image disparity estimation method and apparatus, and a storage medium. The method includes: acquiring a first perspective image and a second perspective image of a target scene; performing feature extraction processing on the first perspective image to obtain first perspective feature information; performing semantic segmentation processing on the first perspective image to obtain first perspective semantic segmentation information; and obtaining parallax prediction information of the first perspective image and the second perspective image based on the first perspective feature information, the first perspective semantic segmentation information, and the associated information of the first perspective image and the second perspective image.

Description

Image Disparity Estimation
Technical Field
The present application relates to the field of computer vision technology, and in particular to an image disparity estimation method and apparatus, and a storage medium.
Background
Disparity estimation is a fundamental research problem in computer vision with deep applications in many fields, such as depth prediction and scene understanding. Most methods treat disparity estimation as a matching problem: starting from this perspective, they use stable and reliable features to represent image patches, search the stereo images for approximately matching patches, and then compute the disparity value.
Summary
The present application provides a technical solution for image disparity estimation.
In a first aspect, an embodiment of the present application provides an image disparity estimation method, the method including: acquiring a first perspective image and a second perspective image of a target scene; performing feature extraction processing on the first perspective image to obtain first perspective feature information; performing semantic segmentation processing on the first perspective image to obtain first perspective semantic segmentation information; and obtaining parallax prediction information of the first perspective image and the second perspective image based on the first perspective feature information, the first perspective semantic segmentation information, and the associated information of the first perspective image and the second perspective image.
In the above solution, optionally, the method further includes: performing feature extraction processing on the second perspective image to obtain second perspective feature information; and performing association processing based on the first perspective feature information and the second perspective feature information to obtain the associated information.
In the above solution, optionally, obtaining the parallax prediction information of the first and second perspective images based on the first perspective feature information, the first perspective semantic segmentation information, and the associated information includes: performing mixing processing on the first perspective feature information, the first perspective semantic segmentation information, and the associated information to obtain mixed feature information; and obtaining the parallax prediction information based on the mixed feature information.
In the above solution, optionally, the image disparity estimation method is implemented by a parallax estimation neural network, and the method further includes: training the parallax estimation neural network based on the parallax prediction information.
In the above solution, optionally, training the parallax estimation neural network based on the parallax prediction information includes: performing semantic segmentation processing on the second perspective image to obtain second perspective semantic segmentation information; obtaining first-perspective reconstructed semantic information based on the second perspective semantic segmentation information and the parallax prediction information; and adjusting the network parameters of the parallax estimation neural network based on the first-perspective reconstructed semantic information.
In the above solution, optionally, adjusting the network parameters based on the first-perspective reconstructed semantic information includes: determining a semantic loss value based on the first-perspective reconstructed semantic information; and adjusting the network parameters of the parallax estimation neural network based on the semantic loss value.
In the above solution, optionally, adjusting the network parameters based on the first-perspective reconstructed semantic information further includes: adjusting the network parameters of the parallax estimation neural network based on the first-perspective reconstructed semantic information and the first semantic label of the first perspective image; or adjusting the network parameters of the parallax estimation neural network based on the first-perspective reconstructed semantic information and the first perspective semantic segmentation information.
In the above solution, optionally, training the parallax estimation neural network based on the parallax prediction information includes: obtaining a first-perspective reconstructed image based on the parallax prediction information and the second perspective image; determining a luminosity loss value according to the luminosity difference between the first-perspective reconstructed image and the first perspective image; determining a smoothness loss value based on the parallax prediction information; and adjusting the network parameters of the parallax estimation neural network according to the luminosity loss value and the smoothness loss value.
In the above solution, optionally, the first perspective image and the second perspective image correspond to labeled parallax information, and the method further includes: training a parallax estimation neural network for implementing the method based on the parallax prediction information and the labeled parallax information.
In the above solution, optionally, training the parallax estimation neural network based on the parallax prediction information and the labeled parallax information includes: determining a parallax regression loss value based on the parallax prediction information and the labeled parallax information; and adjusting the network parameters of the parallax estimation neural network according to the parallax regression loss value.
In a second aspect, an embodiment of the present application provides an image disparity estimation device, the device including: an image acquisition module configured to acquire a first perspective image and a second perspective image of a target scene; and a parallax estimation neural network configured to obtain parallax prediction information from the first perspective image and the second perspective image, including: a primary feature extraction module configured to perform feature extraction processing on the first perspective image to obtain first perspective feature information; a semantic feature extraction module configured to perform semantic segmentation processing on the first perspective image to obtain first perspective semantic segmentation information; and a parallax regression module configured to obtain the parallax prediction information of the first perspective image and the second perspective image based on the first perspective feature information, the first perspective semantic segmentation information, and the associated information of the first perspective image and the second perspective image.
In the above solution, optionally, the primary feature extraction module is further configured to perform feature extraction processing on the second perspective image to obtain second perspective feature information, and the parallax regression module further includes: an associated feature extraction module configured to perform association processing based on the first perspective feature information and the second perspective feature information to obtain the associated information.
In the above solution, optionally, the parallax regression module is further configured to: perform mixing processing on the first perspective feature information, the first perspective semantic segmentation information, and the associated information to obtain mixed feature information; and obtain the parallax prediction information based on the mixed feature information.
In the above solution, optionally, the device further includes: a first network training module configured to train the parallax estimation neural network based on the parallax prediction information.
In the above solution, optionally, the first network training module is further configured to: perform semantic segmentation processing on the second perspective image to obtain second perspective semantic segmentation information; obtain first-perspective reconstructed semantic information based on the second perspective semantic segmentation information and the parallax prediction information; and adjust the network parameters of the parallax estimation neural network based on the first-perspective reconstructed semantic information.
In the above solution, optionally, the first network training module is further configured to: determine a semantic loss value based on the first-perspective reconstructed semantic information; and adjust the network parameters of the parallax estimation neural network based on the semantic loss value.
In the above solution, optionally, the first network training module is further configured to: adjust the network parameters of the parallax estimation neural network based on the first-perspective reconstructed semantic information and the first semantic label of the first perspective image; or adjust the network parameters of the parallax estimation neural network based on the first-perspective reconstructed semantic information and the first perspective semantic segmentation information.
In the above solution, optionally, the first network training module is further configured to: obtain a first-perspective reconstructed image based on the parallax prediction information and the second perspective image; determine a luminosity loss value according to the luminosity difference between the first-perspective reconstructed image and the first perspective image; determine a smoothness loss value based on the parallax prediction information; and adjust the network parameters of the parallax estimation neural network according to the luminosity loss value and the smoothness loss value.
In the above solution, optionally, the device further includes: a second network training module configured to train the parallax estimation neural network based on the parallax prediction information and labeled parallax information, the first perspective image and the second perspective image corresponding to the labeled parallax information.
In the above solution, optionally, the second network training module is further configured to: determine a parallax regression loss value based on the parallax prediction information and the labeled parallax information; and adjust the network parameters of the parallax estimation neural network according to the parallax regression loss value.
In a third aspect, an embodiment of the present application provides an image disparity estimation device, the device including: a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the steps of the image disparity estimation method described in the embodiments of the present application.
In a fourth aspect, an embodiment of the present application provides a storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the image disparity estimation method described in the embodiments of the present application.
According to the technical solution provided by the present application, a first perspective image and a second perspective image of a target scene are acquired; feature extraction processing is performed on the first perspective image to obtain first perspective feature information; semantic segmentation processing is performed on the first perspective image to obtain first perspective semantic segmentation information; and the parallax prediction information of the first perspective image and the second perspective image is obtained based on the first perspective feature information, the first perspective semantic segmentation information, and the associated information of the two images, which can improve the accuracy of parallax prediction.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of an image disparity estimation method provided by an embodiment of the present application.
FIG. 2 is a schematic diagram of the disparity estimation system architecture provided by an embodiment of the present application.
FIGS. 3A-3D compare the effect of an existing prediction method and the prediction method of the present application on the KITTI Stereo dataset.
FIGS. 4A and 4B are supervised qualitative results on the KITTI Stereo test sets provided by an embodiment of the present application, where FIG. 4A shows qualitative results on the KITTI 2012 test data and FIG. 4B shows qualitative results on the KITTI 2015 test data.
FIGS. 5A-5C are unsupervised qualitative results on the CityScapes validation set provided by an embodiment of the present application.
FIG. 6 is a schematic diagram of the composition of an image disparity estimation device provided by an embodiment of the present application.
Detailed Description
To better explain the present application, some examples of disparity estimation methods are introduced first.
Disparity estimation is a basic problem in computer vision with wide applications, including depth prediction, scene understanding, and autonomous driving. The main process of disparity estimation is to find matching pixels in the left and right images of a stereo image pair; the distance between matched pixels is the disparity. Most disparity estimation methods mainly rely on designing reliable features to represent image patches, then select matching patches across the left and right images, and compute the disparity. Most of these methods train neural networks to predict disparity in a supervised manner, while a small number attempt unsupervised training.
Recently, with the development of deep neural networks, the performance of disparity estimation has improved greatly. Benefiting from the good robustness of deep neural networks in extracting image features, matching image patches can be searched for and located more accurately and reliably.
However, even given a specific local search range, and although deep learning methods themselves have large receptive fields, it is still difficult to overcome the local ambiguity problem, which mainly comes from texture-less regions of the image. For example, disparity predictions at road centers, vehicle centers, strong-light regions, and shadow regions are often incorrect, mainly because these regions lack sufficient texture information, and a photometric consistency loss is not enough to help the neural network find the correct matching position. This problem is encountered whether the neural network is trained in a supervised or an unsupervised manner.
On this basis, the present application proposes a technical solution for image disparity estimation that utilizes semantic information.
The technical solution of the present application is described in further detail below with reference to the accompanying drawings and specific embodiments.
An embodiment of the present application provides an image disparity estimation method. As shown in FIG. 1, the method mainly includes the following steps.
Step 101: acquire a first perspective image and a second perspective image of a target scene.
Here, the first perspective image and the second perspective image are images of the same spatio-temporal scene captured at the same moment by the two cameras of a binocular vision system. For example, the first perspective image may be captured by the first camera of the binocular vision system, and the second perspective image by the second camera.
The first and second perspective images are images of the same scene captured from different perspectives, and may respectively be a left-view image and a right-view image, or a right-view image and a left-view image; the embodiments of the present application do not limit their specific implementation.
Here, the scene includes assisted-driving scenes, robot-tracking scenes, robot-localization scenes, and so on, which the present application does not limit.
Step 102: perform feature extraction processing on the first perspective image to obtain first perspective feature information.
Step 102 may be implemented with a convolutional neural network. For example, the first perspective image may be input into a disparity estimation neural network for processing; for convenience of description, this disparity estimation neural network is hereinafter named the SegStereo network.
The first perspective image may serve as input of a first sub-network of the disparity estimation neural network used for feature extraction processing. Specifically, the first perspective image is input into the first sub-network, and the first perspective feature information is obtained after multi-layer convolution operations or further processing on top of the convolution.
Here, the first perspective feature information is a first-perspective primary feature map; alternatively, the first and second perspective feature information may be three-dimensional tensors containing at least one matrix. The embodiments of the present disclosure do not limit the specific implementation of the first perspective feature information.
The feature extraction network or convolutional sub-network of the disparity estimation neural network is used to extract the feature information, or primary feature map, of the first perspective image.
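As a concrete and hedged illustration of the shared first sub-network, the PyTorch sketch below uses a single 3×3×256 convolution block with batch normalization and ReLU, applied to both perspectives with shared weights; the stride of 8 is our assumption, chosen so the primary feature maps come out at 1/8 resolution as stated later in this description:

```python
import torch
import torch.nn as nn

# Stand-in for the shallow feature sub-network; the real network uses part of
# a PSPNet-50 trunk, so treat this as illustrative only.
shallow_net = nn.Sequential(
    nn.Conv2d(3, 256, kernel_size=3, stride=8, padding=1),
    nn.BatchNorm2d(256),
    nn.ReLU(inplace=True),
)

left = torch.randn(1, 3, 256, 512)    # dummy first-perspective image
right = torch.randn(1, 3, 256, 512)   # dummy second-perspective image
f_l = shallow_net(left)               # first-perspective primary feature map
f_r = shallow_net(right)              # the SAME sub-network is reused,
                                      # i.e. the two perspectives share weights
```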
Step 103: perform semantic segmentation processing on the first perspective image to obtain first perspective semantic segmentation information.
The SegStereo network includes at least two sub-networks, denoted the first sub-network and the second sub-network; the first sub-network may be a feature extraction network and the second sub-network a semantic segmentation network. The feature extraction network produces the primary perspective feature maps, and the semantic segmentation network produces semantic feature maps. Illustratively, the first sub-network may be implemented with at least a part of PSPNet-50 (Pyramid Scene Parsing Network), and at least a part of the second sub-network may also be implemented with PSPNet-50; that is, the first and second sub-networks may share part of the PSPNet-50 structure. However, the embodiments of the present application do not limit the specific implementation of the SegStereo network.
The first perspective image may be input into the semantic segmentation network for semantic segmentation processing to obtain the first perspective semantic segmentation information.
Alternatively, the first perspective feature information may be input into the semantic segmentation network for semantic segmentation processing. Correspondingly, performing semantic segmentation processing on the first perspective image to obtain the first perspective semantic segmentation information then includes: obtaining the first perspective semantic segmentation information based on the first perspective feature information.
The first perspective semantic segmentation information may be a three-dimensional tensor or a first-perspective semantic feature map; the embodiments of the present disclosure do not limit its specific implementation.
The first-perspective primary feature map may serve as the input of the second sub-network used for semantic information extraction. Specifically, the first perspective feature information or first-perspective primary feature map is input into the second sub-network, and the first perspective semantic segmentation information is obtained after multi-layer convolution operations or further processing on top of the convolution.
Step 104: obtain the parallax prediction information of the first perspective image and the second perspective image based on the first perspective feature information, the first perspective semantic segmentation information, and the associated information of the first perspective image and the second perspective image.
Association processing may be performed on the first perspective image and the second perspective image to obtain their associated information.
Alternatively, association processing may be performed based on the first perspective feature information and the second perspective feature information to obtain the associated information of the two images, where the second perspective feature information is obtained by performing feature extraction processing on the second perspective image. The second perspective feature information may be a second-perspective primary feature map, or a three-dimensional tensor containing at least one matrix; the embodiments of the present disclosure do not limit its specific implementation.
The second perspective image may serve as input of the first sub-network used for feature extraction. Specifically, the second perspective image is input into the first sub-network, and the second perspective feature information is obtained after multi-layer convolution operations. Then, association computation is performed based on the first and second perspective feature information to obtain the associated information of the two images.
The association computation based on the first and second perspective feature information includes: performing correlation computation over possibly matching image patches in the first perspective feature information and the second perspective feature information to obtain the associated information. That is, a correlation computation over the two feature maps yields the associated information, which is mainly used for extracting matching features; the associated information may be an associated (correlation) feature map.
The first- and second-perspective primary feature maps may serve as input of the correlation module used for the association operation in the disparity estimation neural network. For example, the two primary feature maps are input into the correlation module 240 shown in FIG. 2, and the associated information of the first and second perspective images is obtained after the correlation operation.
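A minimal sketch of such a correlation computation follows (PyTorch assumed). It follows the FlowNet-style 1-D correlation along the scanline and produces the h × w × (d + 1) cost volume mentioned later in this description; positions shifted past the image border are simply left at zero, which is one possible convention rather than a prescribed one:

```python
import torch

def correlation_1d(f_left, f_right, max_disp=24):
    """Channel d holds the mean inner product of f_left(x, y) with
    f_right(x - d, y). Inputs: (B, C, h, w); output: (B, max_disp + 1, h, w)."""
    b, c, h, w = f_left.shape
    volume = f_left.new_zeros(b, max_disp + 1, h, w)
    for d in range(max_disp + 1):
        if d == 0:
            volume[:, 0] = (f_left * f_right).mean(dim=1)
        else:
            volume[:, d, :, d:] = (f_left[..., d:] * f_right[..., :-d]).mean(dim=1)
    return volume
```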
Obtaining the parallax prediction information of the two images based on the first perspective feature information, the first perspective semantic segmentation information, and the associated information includes: performing mixing processing on the three to obtain mixed feature information, and obtaining the parallax prediction information based on the mixed feature information.
The mixing processing here may be a concatenation operation, for example fusion or channel-wise stacking, which the embodiments of the present disclosure do not limit.
Before the mixing processing, conversion processing may be applied to one or more of the first perspective feature information, the first perspective semantic segmentation information, and the associated information, so that after conversion the three have the same dimensions.
The method may further include: performing conversion processing on the first perspective feature information to obtain first-perspective converted feature information; the mixing processing is then applied to the first-perspective converted feature information, the first perspective semantic segmentation information, and the associated information to obtain the mixed feature information. For example, spatial conversion processing is applied to the first perspective feature information to obtain first-perspective converted feature information of preset dimensions.
Optionally, the first-perspective converted feature information may be a first-perspective converted feature map; its specific implementation is not limited. For example, the first perspective feature information output by the first sub-network passes through one more convolution layer to give the first-perspective converted feature information; a convolution block may be used for this conversion.
Optionally, the mixed feature information may be a mixed feature map, and the parallax prediction information may be a parallax prediction map; the embodiments of the present disclosure do not limit their specific implementation.
Besides the first and second sub-networks, the SegStereo network includes a third sub-network used to determine the parallax prediction information of the two images; the third sub-network may be a parallax regression network.
Specifically, the first-perspective converted feature information, the associated information, and the first perspective semantic segmentation information are input into the parallax regression network, which merges them into mixed feature information and regresses the parallax prediction information from the mixed feature information.
Based on the mixed feature information, the parallax prediction information is predicted using the residual network and deconvolution module 250 of the parallax regression network shown in FIG. 2.
That is, the first-perspective converted feature map, the associated feature map, and the first-perspective semantic feature map are merged into a mixed feature map, realizing the embedding of semantic features. After the mixed feature map is obtained, the residual network and deconvolution structure of the parallax regression network are used to finally output the parallax prediction map.
The SegStereo network mainly adopts a residual structure, which extracts more discriminative image features, and it embeds high-level semantic features while extracting the associated features of the first and second perspective images, thereby improving prediction accuracy.
The above method may be the application process of the disparity estimation neural network, that is, a method of performing disparity estimation on an image pair to be processed using a trained network. In some examples, the above method may also be the training process of the disparity estimation neural network, in which case the first and second perspective images are sample images.
In the embodiments of the present disclosure, a predefined neural network may be trained in an unsupervised manner to obtain a disparity estimation neural network containing the first, second, and third sub-networks; alternatively, the disparity estimation neural network may be trained in a supervised manner to obtain the same.
The method further includes: training the disparity estimation neural network based on the parallax prediction information.
Training the disparity estimation neural network based on the parallax prediction information includes: performing semantic segmentation processing on the second perspective image to obtain second perspective semantic segmentation information; obtaining first-perspective reconstructed semantic information based on the second perspective semantic segmentation information and the parallax prediction information; and adjusting the network parameters of the disparity estimation neural network based on the first-perspective reconstructed semantic information. The first-perspective reconstructed semantic information may be a reconstructed first semantic feature map.
The second perspective semantic segmentation information may be obtained by performing semantic segmentation processing on the second perspective image directly, or by inputting the second perspective feature information into the semantic segmentation network; correspondingly, performing semantic segmentation processing on the second perspective image then includes obtaining the second perspective semantic segmentation information based on the second perspective feature information. Optionally, the second perspective semantic segmentation information may be a three-dimensional tensor or a second-perspective semantic feature map; its specific implementation is not limited.
The second-perspective primary feature map may serve as the input of the second sub-network used for semantic information extraction; the second perspective semantic segmentation information is obtained after multi-layer convolution operations or further processing on top of the convolution. The semantic segmentation network or convolutional sub-network of the disparity estimation neural network thus extracts the first- and second-perspective semantic feature maps: the first and second perspective feature information can be fed into the semantic segmentation network, which outputs the first and second perspective semantic segmentation information.
Optionally, adjusting the network parameters based on the first-perspective reconstructed semantic information includes: determining a semantic loss value based on the first-perspective reconstructed semantic information, and adjusting the network parameters of the disparity estimation neural network in combination with the semantic loss value. It includes: adjusting the network parameters based on the first-perspective reconstructed semantic information and the first semantic label of the first perspective image; or adjusting the network parameters based on the first-perspective reconstructed semantic information and the first perspective semantic segmentation information, for example, determining the semantic loss value based on the difference between the two.
Optionally, a reconstruction operation is performed based on the predicted parallax prediction information and the semantic segmentation information of the second perspective to obtain the first-perspective reconstructed semantic information; the first-perspective reconstructed semantic information may also be compared with the real first semantic label to obtain the semantic loss value, and the network parameters are adjusted in combination with it. The real first semantic label is annotated manually; the unsupervised learning here is unsupervised with respect to disparity, not with respect to the semantic segmentation information.
The semantic loss may also be a cross-entropy loss; the embodiments of the present disclosure do not limit its specific implementation.
When training the disparity estimation neural network, a function for computing the semantic loss is defined; this function introduces rich semantic-consistency information, so that the trained network can overcome the common local ambiguity problem.
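The semantic-consistency supervision just described can be sketched as follows, reusing the disparity-warping helper from the photometric sketch earlier in this document. The `classifier` is assumed to be a 1×1 convolution with C output classes, and for readability the sketch ignores the up-/down-sampling between the 1/8-resolution semantic features and the full-resolution disparity map that the architecture section below describes:

```python
import torch
import torch.nn.functional as F

def semantic_consistency_loss(sem_right, disp, left_labels, classifier,
                              ignore_index=255):
    """Warp second-perspective semantic features with the predicted disparity
    to 'reconstruct' first-perspective semantics, classify them, and score
    against the first-perspective ground-truth labels (assumed conventions)."""
    sem_left_recon = warp_right_to_left(sem_right, disp)  # helper sketched above
    logits = classifier(sem_left_recon)                   # (B, C, H, W)
    return F.cross_entropy(logits, left_labels, ignore_index=ignore_index)
```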
Training the disparity estimation neural network based on the parallax prediction information includes: obtaining a first-perspective reconstructed image based on the parallax prediction information and the second perspective image; determining a luminosity loss value according to the luminosity difference between the first-perspective reconstructed image and the first perspective image; determining a smoothness loss value based on the parallax prediction information; and adjusting the network parameters according to the luminosity loss value and the smoothness loss value.
The smoothness loss can be determined by imposing constraints on non-smooth regions of the parallax prediction information.
A reconstruction operation is performed based on the predicted parallax prediction information and the real second perspective image to obtain the first-perspective reconstructed image; comparing the luminosity difference between the first-perspective reconstructed image and the real first perspective image yields the luminosity loss.
By measuring the luminosity difference through image reconstruction, the network can be trained in an unsupervised manner, which greatly reduces the dependence on ground-truth images.
Training based on the parallax prediction information further includes: performing the reconstruction operation to obtain the first-perspective reconstructed image; determining the luminosity loss from the luminosity difference between the reconstructed and first perspective images; determining the smoothness loss by constraining non-smooth regions of the prediction; determining the semantic loss based on the difference between the first-perspective reconstructed semantic information and the real first semantic label; determining the overall loss from the luminosity, smoothness, and semantic losses; and training the disparity estimation neural network by minimizing the overall loss, where the training set need not provide ground-truth disparity images.
Here, the overall loss equals the weighted sum of the individual losses.
In this way, no ground-truth disparity images need be provided: the network can be trained from the luminosity difference between the reconstructed and original images. When extracting the associated features of the first and second perspective images, the semantic feature map is embedded and a semantic loss is defined; combining low-level texture information with high-level semantic information adds a semantic consistency constraint, raises the trained network's level of disparity prediction in large target regions, and overcomes the local ambiguity problem to a certain extent.
Optionally, the method for training the disparity estimation neural network further includes: training the disparity estimation neural network in a supervised manner based on the parallax prediction information. Specifically, the first and second perspective images correspond to labeled parallax information, and the network is trained based on the parallax prediction information and the labeled parallax information.
Optionally, this training includes: determining a parallax regression loss value based on the parallax prediction information and the labeled parallax information; determining a smoothness loss value based on the parallax prediction information; and adjusting the network parameters according to the parallax regression loss value and the smoothness loss value.
Optionally, it includes: determining the parallax regression loss based on the parallax prediction information and the labeled parallax information; determining the smoothness loss by constraining non-smooth regions of the prediction; determining the semantic loss based on the difference between the first-perspective reconstructed semantic information and the real first semantic label, or between the first-perspective reconstructed semantic information and the first perspective semantic segmentation information; determining the overall loss of supervised training from the parallax regression loss, the semantic loss, and the smoothness loss; and training the disparity estimation neural network by minimizing this overall loss, where the training set must provide labeled parallax information.
In this way, the disparity estimation neural network can be obtained through supervised training: at positions with ground-truth signals, the difference between the predicted value and the true value is computed as the supervised parallax regression loss, while the semantic loss and smoothness loss of unsupervised training still apply.
The first, second, and third sub-networks are all sub-networks obtained by training the disparity estimation neural network. The inputs and outputs of the different sub-networks differ, but they all address the same target scene.
The method for training the disparity estimation neural network includes: using a training sample set to train the network on parallax prediction maps and semantic feature maps simultaneously to obtain the optimized parameters of the first, second, and third sub-networks; or first training the network on semantic feature maps with the training sample set, and then training the semantically pre-trained network on parallax prediction maps with the same set to obtain the optimized parameters of the second and first sub-networks. That is, when training the disparity estimation neural network, semantic feature map prediction training and parallax prediction map training may be conducted in stages.
The semantic-information-based image disparity estimation method proposed in the embodiments of the present application uses an end-to-end disparity prediction neural network: inputting the left- and right-view images of a stereo pair directly yields the parallax prediction map, meeting real-time requirements. Meanwhile, measuring the luminosity difference between the reconstructed image and the original image allows the network to be trained in an unsupervised manner, largely reducing the dependence on ground-truth images. In addition, when extracting the associated features of the left- and right-view images, a semantic feature map is embedded and a semantic loss is defined; combining low-level texture information with high-level semantic information adds a semantic consistency constraint, raises the network's level of disparity prediction in large target regions such as large road surfaces and large vehicles, and overcomes the local ambiguity problem to a certain extent.
FIG. 2 shows a schematic diagram of a disparity estimation system architecture, denoted the SegStereo disparity estimation architecture, which is suitable for both unsupervised and supervised learning.
First, the basic network structure of the disparity estimation neural network is given; then, the strategy for introducing semantic cues into the network is described in detail; finally, the computation of the loss terms used when training the network in the unsupervised and supervised manners is presented.
The basic structure of the disparity estimation neural network is described first.
As shown in the overall architecture diagram of FIG. 2, the pre-calibrated stereo image pair may include a first perspective image (or left-view image) $I_l$ and a second perspective image (or right-view image) $I_r$. A shallow neural network 210 may be used to extract the primary image feature maps: inputting $I_l$ into the shallow network 210 gives the first-perspective primary feature map $F_l$, and inputting $I_r$ gives the second-perspective primary feature map $F_r$. The first- and second-perspective primary feature maps represent the aforementioned first and second perspective feature information. The shallow network 210 may be a convolution block with a 3×3×256 kernel, comprising a convolution layer followed by batch normalization and rectified linear unit (ReLU) layers; the shallow network 210 may be the first sub-network.
On top of the primary feature maps, a trained semantic segmentation network 220, which may be implemented with part of the PSPNet-50 network, extracts the semantic feature maps: inputting $F_l$ into the segmentation network 220 gives the first-perspective semantic feature map $F_l^s$, and inputting $F_r$ gives the second-perspective semantic feature map $F_r^s$.
To preserve the details of the first perspective image, another convolution block 230 may be applied to the first-perspective primary feature map $F_l$ to compute the first-perspective transformed feature map $F_l^t$.
Here, relative to the size of the original images, the primary, semantic, and transformed feature maps are reduced, for example to 1/8 of the original size; the first- and second-perspective primary feature maps, the two semantic feature maps, and the first-perspective transformed feature map all share the same size, and the first and second perspective images share the same size.
A correlation module 240 may be used to compute the matching cost between $F_l$ and $F_r$, giving the correlation feature map $F_c$. The correlation module 240 may apply the correlation method used in the optical-flow prediction network FlowNet to compute the correlation of the two feature maps. In the correlation operation $F_l \odot F_r$, the maximum disparity parameter may be set to d, yielding a correlation feature map $F_c$ of size h × w × (d + 1), where h is the height and w the width of $F_l$.
The first-perspective transformed feature map $F_l^t$ and the first-perspective semantic feature map $F_l^s$ are concatenated with the correlation feature map $F_c$ to obtain the mixed feature map (or mixed feature information representation) $F_h$. Feeding $F_h$ into the subsequent residual network and deconvolution module 250 gives a disparity map D of the same size as the original first perspective image $I_l$.
The role of the semantic features provided by the present application in the disparity estimation neural network, and the modules that apply semantic features in the disparity network, are described in detail below.
As noted above, the difficulty of disparity estimation lies in the local ambiguity problem, which mainly comes from relatively blurred, texture-less regions of the image. These regions are internally continuous and have clear semantic meaning in segmentation, so semantic cues can be used to help predict and correct the final disparity map. These semantic cues can be integrated in two ways. First, the semantic cues are embedded into the disparity prediction during feature learning. Second, the semantic cues are introduced into the computation of the loss terms to guide the training process of the neural network.
The first aspect, embedding semantic cues into the disparity prediction during feature learning, is introduced first.
As described above with reference to FIG. 2, the input stereo pair consists of the first and second perspective images; the shallow network 210 gives the two primary feature maps, and the semantic segmentation network 220 then extracts their semantic features to give the first- and second-perspective semantic feature maps. The trained shallow network 210 and segmentation network 220 (which may, for example, be implemented with the PSPNet-50 framework) extract features from the input stereo pair, and the output of the segmentation network's final feature mapping (the conv5_4 features) is taken as the first- and second-perspective semantic feature maps $F_l^s$ and $F_r^s$. The shallow network 210 may use a part of the PSPNet-50 network, taking the output of its intermediate (conv3_1) features as the primary feature maps $F_l$ and $F_r$.
To embed the semantic features, a convolution operation may be applied to the first-perspective semantic feature map $F_l^s$, for example with one convolution block with a 1×1×128 kernel, giving the transformed first semantic feature map (not shown in FIG. 2). This transformed map is then concatenated with the first-perspective transformed feature map $F_l^t$ and the correlation feature map $F_c$ to obtain the mixed feature map (or mixed feature information representation) $F_h$, and the resulting $F_h$ is fed into the rest of the parallax regression network, for example the subsequent residual network and deconvolution module 250.
The second aspect, introducing semantic cues into the computation of the loss terms to train the neural network, is introduced next.
When training the disparity estimation neural network, introducing semantic cues into the loss terms can also help guide disparity learning. The semantic cues can be characterized by the semantic cross-entropy loss $L_{seg}$. The reconstruction module 260 in FIG. 2 can perform a reconstruction operation, acting on the second-perspective semantic feature map and the parallax prediction map, to obtain the reconstructed first semantic feature map; the ground-truth semantic label of the first-perspective semantic feature map is then used to measure $L_{seg}$.
The second-perspective semantic feature map $F_r^s$ is 1/8 of the size of the original image, that is, of the second perspective image, while the parallax prediction map D is the same size as the second perspective image, that is, full-size. For feature reconstruction, the second-perspective semantic feature map is first upsampled to full size; feature reconstruction is then applied to the upsampled full-size semantic feature map and the parallax prediction map D, giving a full-size reconstructed first-perspective semantic feature map; and this map is downsampled back to 1/8 of full size, giving the reconstructed first semantic feature map $\tilde{F}_l^s$.
A convolutional classifier with a 1×1×C kernel, where C is the number of semantic classes, is then used to regularize disparity learning, and the semantic cross-entropy loss $L_{seg}$ is finally expressed in the form of a softmax loss function.
For the training of the disparity estimation neural network in this example, the loss terms include other components in addition to the semantic cross-entropy loss. The above semantic information can be combined into model training in both the unsupervised and supervised manners; the computation of the overall loss in the two manners is introduced below.
Unsupervised manner
The input stereo pair consists of two images, one of which can be reconstructed from the other using the parallax prediction map; in theory the reconstructed image should be close to the original input. Photometric consistency is exploited to help learn disparity in the unsupervised manner. Given the parallax prediction map D, an image reconstruction operation, for example performed by the reconstruction module 260 shown in FIG. 2, is applied to the second perspective image $I_r$ to obtain the first-perspective reconstructed image $\tilde{I}_l$. The L1 norm is then used to regularize photometric consistency, giving the photometric loss $L_p$ of formula (1):

$L_p = \frac{1}{N} \sum_{i,j} \left\| \tilde{I}_l(i,j) - I_l(i,j) \right\|_1$    (1)

where N is the number of pixels, i and j are pixel indices, and $\|\cdot\|_1$ is the L1 norm.
Photometric consistency enables disparity learning in an unsupervised manner. However, if $L_p$ contains no regularization term estimating local disparity smoothness, the local disparity may be discontinuous. To compensate for this, the L1 norm can be used to penalize (constrain) the smoothness of the gradient maps $\nabla D$ of the parallax prediction map, giving the smoothness loss $L_s$ of formula (2):

$L_s = \frac{1}{N} \sum_{i,j} \left[ \rho_s\!\left(\nabla_x D(i,j)\right) + \rho_s\!\left(\nabla_y D(i,j)\right) \right]$    (2)

where $\rho_s(\cdot)$ is a spatial smoothness penalty function implemented with the generalized Charbonnier function.
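A sketch of this smoothness term follows (PyTorch assumed); the generalized Charbonnier parameters α and ε below are illustrative assumptions, as the application does not fix their values:

```python
import torch

def smoothness_loss(disp, alpha=0.21, eps=1e-3):
    """Formula (2): penalise disparity gradients with a generalised
    Charbonnier penalty rho(x) = (x^2 + eps^2)^alpha (alpha, eps assumed)."""
    def rho(x):
        return (x * x + eps * eps) ** alpha
    dx = disp[..., :, 1:] - disp[..., :, :-1]   # horizontal gradient
    dy = disp[..., 1:, :] - disp[..., :-1, :]   # vertical gradient
    return rho(dx).mean() + rho(dy).mean()
```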
To utilize the semantic cues, considering semantic feature embedding and the semantic loss, at every pixel position there is a predicted value for each possible semantic class. A semantic class may be road surface, vehicle, building, and so on; the semantic classes are marked with ground-truth labels, and a ground-truth label may be a class number. The predicted value is largest at the ground-truth label. The semantic cross-entropy loss $L_{seg}$ is given by formula (3):

$L_{seg} = \frac{1}{|\mathcal{N}_v|} \sum_{i \in \mathcal{N}_v} -\log\!\left( \frac{e^{f_{y_i}}}{\sum_j e^{f_{y_j}}} \right)$    (3)

where $f_{y_i}$ is the activation at the ground-truth label, $y_j$ is a class number, $f_{y_j}$ is the activation of class $y_j$, and i is the pixel index; the softmax loss of a single pixel is defined in this way, and for the whole image the softmax loss is computed over the labeled pixel positions, the set of labeled pixels being $\mathcal{N}_v$.
The overall loss $L_{unsup}$ in the unsupervised manner includes the photometric loss $L_p$, the smoothness loss $L_s$, and the semantic cross-entropy loss $L_{seg}$. To balance the learning of the different loss branches, a loss weight $\lambda_p$ is introduced for $L_p$, $\lambda_s$ for $L_s$, and $\lambda_{seg}$ for $L_{seg}$. The overall loss $L_{unsup}$ is therefore given by formula (4):

$L_{unsup} = \lambda_p L_p + \lambda_s L_s + \lambda_{seg} L_{seg}$    (4)

The disparity prediction neural network is then trained by minimizing the overall loss $L_{unsup}$ to obtain the preset disparity prediction neural network. The specific training method may use methods familiar to those skilled in the art and is not repeated here.
Supervised manner
The semantic cues proposed in the present application to aid disparity prediction also work well in the supervised manner.
In the supervised manner, for each stereo-pair sample, the ground-truth disparity image $\hat{D}$ of the pair is provided in addition to the first and second perspective images. Therefore, the L1 norm can be directly adopted to regularize the prediction regression. The parallax regression loss $L_r$ can be expressed as formula (5):

$L_r = \frac{1}{N} \sum_{i,j} \left\| D(i,j) - \hat{D}(i,j) \right\|_1$    (5)

The overall loss $L_{sup}$ in the supervised manner includes the parallax regression loss $L_r$, the smoothness loss $L_s$, and the semantic cross-entropy loss $L_{seg}$. To balance the learning of the different losses, a loss weight $\lambda_r$ is introduced for $L_r$, $\lambda_s$ for $L_s$, and $\lambda_{seg}$ for $L_{seg}$. The overall loss $L_{sup}$ is therefore given by formula (6):

$L_{sup} = \lambda_r L_r + \lambda_s L_s + \lambda_{seg} L_{seg}$    (6)

The disparity prediction neural network is then trained by minimizing the overall loss $L_{sup}$ to obtain the preset disparity prediction neural network. Likewise, the specific training method may use methods familiar to those skilled in the art and is not repeated here.
The disparity prediction neural network provided by the present application embeds high-level semantic features while extracting the associated information of the left- and right-view images, which helps improve the prediction accuracy of the disparity map. Moreover, when training the network, a function for computing the semantic cross-entropy loss is defined; this function introduces rich semantic-consistency information and can thus effectively overcome the common local ambiguity problem. In addition, when the unsupervised learning manner is adopted, since the network can be trained to output correct disparity values from the photometric difference between the reconstructed and original images, no large number of ground-truth disparity images needs to be provided, effectively reducing training complexity and computational cost.
It should be noted that the main contributions of this technical solution include at least the following parts.
The proposed SegStereo framework merges semantic segmentation information into disparity estimation, where semantic consistency can serve as an active guide for disparity estimation; the semantic feature embedding strategy and the softmax semantic loss function can help train the network in an unsupervised or supervised manner; the proposed disparity estimation method can achieve state-of-the-art results on the KITTI Stereo 2012 and 2015 benchmarks; and predictions on the CityScapes dataset also show the effectiveness of the method. The KITTI Stereo dataset is a computer-vision algorithm evaluation dataset for autonomous driving scenes; besides providing data in raw format, it provides a benchmark for each task. The CityScapes dataset is a dataset for the semantic understanding of urban road street scenes.
FIGS. 3A-3D compare the effect of an existing prediction method and the prediction method of the present application on the KITTI Stereo dataset: FIGS. 3A and 3B show the input stereo image pair, FIG. 3C shows the error map obtained by processing FIGS. 3A and 3B with the existing prediction method, and FIG. 3D the error map obtained with the prediction method of the present application. The error map is obtained by subtracting the reconstructed image from the input original image. The dark region at the lower right of FIG. 3C marks a wrongly predicted region; compared with FIG. 3C, it can be seen from FIG. 3D that the erroneous region at the lower right is greatly reduced. Under the guidance of semantic cues, the disparity estimation of the SegStereo network is therefore more accurate, especially in locally blurred regions.
FIGS. 4A and 4B show several qualitative examples on the KITTI test sets; with the method provided by the present application, the SegStereo network also obtains good disparity estimation results when processing challenging complex scenes. FIG. 4A shows qualitative results on the KITTI 2012 test data and FIG. 4B on the KITTI 2015 test data; in each, from left to right: first perspective image, parallax prediction map, error map. FIGS. 4A and 4B show supervised qualitative results on the KITTI Stereo test sets; by incorporating semantic information, the method proposed in the present application can handle complex scenes.
The SegStereo network can also adapt to other datasets; for example, a SegStereo network obtained by unsupervised training can be tested on the CityScapes validation set. FIGS. 5A-5C show the prediction results of the unsupervised-trained network on the CityScapes validation set: FIG. 5A is the first perspective image, FIG. 5B is the parallax prediction map obtained by processing FIG. 5A with the SGM algorithm, and FIG. 5C is the parallax prediction map obtained by processing FIG. 5A with the SegStereo network. Clearly, compared with the SGM algorithm, the SegStereo network produces better results in terms of global scene structure and object details.
In summary, the SegStereo disparity estimation architecture provided by the present application introduces semantic cues into the disparity estimation network. Specifically, PSPNet may be used as the segmentation branch to extract the semantic features of the stereo image pair, and a residual network (ResNet) and a correlation module serve as the disparity part to regress the parallax prediction map. The correlation module is used to encode the matching cues of the stereo pair, and the segmentation features are embedded, as semantic features, into the disparity branch after the correlation module. In addition, the semantic consistency of the stereo pair is reconstructed through semantic loss regularization, which further enhances the robustness of disparity estimation. Both the semantic segmentation network and the parallax regression network are fully convolutional, so the network can be trained end-to-end.
Incorporating semantic cues into the SegStereo network supports both unsupervised and supervised training. During unsupervised training, the photometric consistency loss and the semantic cross-entropy loss are both computed and back-propagated; semantic feature embedding and the semantic cross-entropy loss both introduce favorable constraints of semantic consistency. In addition, for the supervised training scheme, the supervised parallax regression loss can be adopted instead of the unsupervised photometric consistency loss to train the network, which obtains advanced results on the KITTI Stereo benchmarks, such as the KITTI Stereo 2012 and 2015 benchmarks; predictions on the CityScapes dataset also show the effectiveness of the method.
The above disparity estimation method for stereo images combined with semantic information first acquires the first and second perspective images of the target scene and uses one feature extraction network to extract the primary feature maps of the two images; for the first-perspective primary feature map, one convolution block is added to obtain the first-perspective transformed feature map; on the basis of the first- and second-perspective primary feature maps, the correlation module computes their correlation feature map; a semantic segmentation network is then used to obtain the first-perspective semantic feature map; the first-perspective transformed feature map, the correlation feature map, and the first-perspective semantic feature map are merged to obtain the mixed feature map; and finally the parallax prediction map is regressed with the residual network and deconvolution module. In this way, a parallax estimation neural network composed of the feature extraction network, the semantic segmentation network, and the parallax regression network can take the first and second perspective images as input and quickly output a parallax prediction map, achieving end-to-end disparity prediction and meeting real-time needs. Here, when computing the matching features of the first and second perspective images, the semantic feature map is embedded, that is, a semantic consistency constraint is added, which overcomes the local ambiguity problem to a certain extent and can improve the accuracy of disparity prediction.
It should be understood that the various specific implementations in the examples of FIG. 1 to FIG. 2 may be combined in any way according to their logic and need not all be satisfied simultaneously; that is, any one or more steps and/or flows of the method embodiment shown in FIG. 1 may take the example shown in FIG. 2 as an optional specific implementation, but are not limited thereto.
It should also be understood that the examples shown in FIG. 1 to FIG. 2 are merely intended to illustrate the embodiments of the present application; those skilled in the art may make various obvious changes and/or substitutions based on the examples of FIG. 1 to FIG. 2, and the resulting technical solutions still belong to the disclosure scope of the embodiments of the present application.
Corresponding to the above image disparity estimation method, an embodiment of the present disclosure provides an image disparity estimation device. As shown in FIG. 6, the device includes the following modules.
The image acquisition module 10 is configured to acquire a first perspective image and a second perspective image of a target scene.
The parallax estimation neural network 20 is configured to obtain parallax prediction information from the first perspective image and the second perspective image, and includes the following modules.
The primary feature extraction module 21 is configured to perform feature extraction processing on the first perspective image to obtain first perspective feature information.
The semantic feature extraction module 22 is configured to perform semantic segmentation processing on the first perspective image to obtain first perspective semantic segmentation information.
The parallax regression module 23 is configured to obtain the parallax prediction information of the first and second perspective images based on the first perspective feature information, the first perspective semantic segmentation information, and the associated information of the two images.
In the above solution, optionally, the primary feature extraction module 21 is further configured to perform feature extraction processing on the second perspective image to obtain second perspective feature information, and the parallax regression module 23 further includes an association module configured to perform association processing based on the first and second perspective feature information to obtain the associated information.
As an implementation, optionally, the parallax regression module 23 is further configured to: perform mixing processing on the first perspective feature information, the first perspective semantic segmentation information, and the associated information to obtain mixed feature information; and obtain the parallax prediction information based on the mixed feature information.
In the above solution, optionally, the device further includes: a first network training module 24 configured to train the parallax estimation neural network 20 based on the parallax prediction information.
As an implementation, optionally, the first network training module 24 is further configured to: perform semantic segmentation processing on the second perspective image to obtain second perspective semantic segmentation information; obtain first-perspective reconstructed semantic information based on the second perspective semantic segmentation information and the parallax prediction information; and adjust the network parameters of the parallax estimation neural network 20 based on the first-perspective reconstructed semantic information.
As an implementation, optionally, the first network training module 24 is further configured to: determine a semantic loss value based on the first-perspective reconstructed semantic information; and adjust the network parameters of the parallax estimation neural network 20 based on the semantic loss value.
As an implementation, optionally, the first network training module 24 is further configured to: adjust the network parameters of the parallax estimation neural network 20 based on the first-perspective reconstructed semantic information and the first semantic label of the first perspective image; or adjust the network parameters of the parallax estimation neural network 20 based on the first-perspective reconstructed semantic information and the first perspective semantic segmentation information.
As an implementation, optionally, the first network training module 24 is further configured to: obtain a first-perspective reconstructed image based on the parallax prediction information and the second perspective image; determine a luminosity loss value according to the luminosity difference between the first-perspective reconstructed image and the first perspective image; determine a smoothness loss value based on the parallax prediction information; and adjust the network parameters of the parallax estimation neural network 20 according to the luminosity loss value and the smoothness loss value.
In the above solution, optionally, the device further includes: a second network training module 25 configured to train the parallax estimation neural network 20 based on the parallax prediction information and labeled parallax information, the first and second perspective images corresponding to the labeled parallax information.
As an implementation, optionally, the second network training module 25 is further configured to: determine a parallax regression loss value based on the parallax prediction information and the labeled parallax information; and adjust the network parameters of the parallax estimation neural network according to the parallax regression loss value.
Those skilled in the art should understand that the implemented functions of the processing modules in the image disparity estimation device shown in FIG. 6 may be understood with reference to the foregoing description of the image disparity estimation method, and that the functions of the processing units may be realized either by programs running on a processor or by specific logic circuits.
In practical applications, the image acquisition module 10 has different structures depending on how it acquires information: when receiving images from a client, it is a communication interface; when acquiring images automatically, it corresponds to an image collector. The specific structures of the image acquisition module 10 and the parallax estimation neural network 20 may each correspond to a processor, whose specific structure may be a central processing unit (CPU), a microcontroller unit (MCU), a digital signal processor (DSP), a programmable logic controller (PLC), or another electronic component or collection of electronic components with processing functions. The processor includes executable code stored in a storage medium; the processor may be connected to the storage medium through a communication interface such as a bus and, when executing the corresponding function of a specific unit, reads and runs the executable code from the storage medium. The portion of the storage medium used for storing the executable code is preferably a non-volatile storage medium.
The image acquisition module 10 and the parallax estimation neural network 20 may be integrated in the same processor, or may respectively correspond to different processors; when they are integrated in the same processor, the processor processes the functions corresponding to the image acquisition module 10 and the parallax estimation neural network 20 in a time-division manner.
The image disparity estimation device provided by the embodiment of the present application can use a parallax estimation neural network composed of the primary feature extraction module, the semantic feature extraction module, and the parallax regression module to take the first and second perspective images as input and quickly output a parallax prediction map, achieving end-to-end disparity prediction and meeting real-time needs; here, when computing the features of the first and second perspective images, the semantic feature map is embedded, that is, a semantic consistency constraint is added, which overcomes the local ambiguity problem to a certain extent and can improve the accuracy of disparity prediction and the precision of the final disparity prediction.
An embodiment of the present application further describes an image disparity estimation device, the device including: a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the image disparity estimation method provided by any one of the foregoing technical solutions.
As implementations, when the processor executes the program, it implements: performing feature extraction processing on the second perspective image to obtain second perspective feature information, and performing association processing based on the first and second perspective feature information to obtain the associated information; performing mixing processing on the first perspective feature information, the first perspective semantic segmentation information, and the associated information to obtain mixed feature information, and obtaining the parallax prediction information based on it; training the parallax estimation neural network based on the parallax prediction information; performing semantic segmentation processing on the second perspective image to obtain second perspective semantic segmentation information, obtaining first-perspective reconstructed semantic information based on it and the parallax prediction information, and adjusting the network parameters of the parallax estimation neural network based on the first-perspective reconstructed semantic information; determining a semantic loss value based on the first-perspective reconstructed semantic information and adjusting the network parameters based on the semantic loss value; adjusting the network parameters based on the first-perspective reconstructed semantic information and the first semantic label of the first perspective image, or based on the first-perspective reconstructed semantic information and the first perspective semantic segmentation information; obtaining a first-perspective reconstructed image based on the parallax prediction information and the second perspective image, determining a luminosity loss value from the luminosity difference between the first-perspective reconstructed image and the first perspective image, determining a smoothness loss value based on the parallax prediction information, and adjusting the network parameters according to the luminosity and smoothness loss values; training a parallax estimation neural network for implementing the method based on the parallax prediction information and labeled parallax information, the first and second perspective images corresponding to the labeled parallax information; and determining a parallax regression loss value based on the parallax prediction information and the labeled parallax information and adjusting the network parameters of the parallax estimation neural network according to it.
The image disparity estimation device provided by the embodiment of the present application can improve the accuracy of disparity prediction and the precision of the final disparity prediction.
An embodiment of the present application further describes a computer storage medium storing computer-executable instructions for performing the image disparity estimation methods described in the foregoing embodiments; that is, after the computer-executable instructions are executed by a processor, the image disparity estimation method provided by any one of the foregoing technical solutions can be implemented.
Those skilled in the art should understand that the functions of the programs in the computer storage medium of this embodiment may be understood with reference to the descriptions of the image disparity estimation method in the foregoing embodiments.
Based on the image disparity estimation methods and devices described in the above embodiments, a specific application scenario in the field of unmanned driving is given below.
The parallax estimation neural network is applied to an unmanned-driving platform: facing road traffic scenes, it outputs the parallax map in front of the vehicle body in real time, from which the distances of the targets and positions ahead can further be estimated. Under more complicated conditions, such as large targets and occlusions, the parallax estimation neural network can still effectively give reliable parallax predictions. On an autonomous driving platform equipped with a binocular stereo camera, facing road traffic scenes, the parallax estimation neural network can give accurate parallax prediction results and, especially at locally ambiguous positions (strong light, mirror surfaces, large targets), can still give reliable parallax values. In this way, a smart car can obtain clearer surrounding-environment and road-condition information and perform unmanned driving based on them, thereby improving driving safety.
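For context on the distance estimation mentioned above: once a reliable disparity is available, the classic stereo relation depth = focal length × baseline / disparity (standard stereo geometry, not specific to this application) converts it to metric distance. A tiny sketch, where the focal length and baseline are assumed KITTI-like values rather than parameters of this application:

```python
def disparity_to_depth(disp_px, focal_px, baseline_m):
    """Classic pinhole stereo geometry: depth = f * B / d, consistent units."""
    return focal_px * baseline_m / max(disp_px, 1e-6)

# e.g. a KITTI-like rig: ~721 px focal length, 0.54 m baseline (assumed values)
dist = disparity_to_depth(36.0, 721.0, 0.54)   # ≈ 10.8 m to the object
```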
In the several embodiments provided in the present application, it should be understood that the disclosed devices and methods may be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division of the units is only a logical function division, and in actual implementation there may be other division manners, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be ignored or not executed. In addition, the coupling, direct coupling, or communication connections between the displayed or discussed components may be through some interfaces, and the indirect coupling or communication connections of devices or units may be electrical, mechanical, or in other forms.
The units described above as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network units, and some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may all be integrated in one processing unit, each unit may separately stand alone as a unit, or two or more units may be integrated in one unit; the above integrated unit may be implemented in the form of hardware or in the form of hardware plus software functional units.
Those of ordinary skill in the art can understand that all or part of the steps of the above method embodiments may be completed by program-instruction-related hardware; the foregoing program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments; the foregoing storage medium includes various media that can store program code, such as a mobile storage device, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Alternatively, if the above integrated unit of the present application is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present application. The foregoing storage medium includes various media that can store program code, such as a mobile storage device, a ROM, a RAM, a magnetic disk, or an optical disc.

Claims (22)

  1. An image disparity estimation method, characterized in that the method comprises:
    acquiring a first perspective image and a second perspective image of a target scene;
    performing feature extraction processing on the first perspective image to obtain first perspective feature information;
    performing semantic segmentation processing on the first perspective image to obtain first perspective semantic segmentation information;
    obtaining parallax prediction information of the first perspective image and the second perspective image based on the first perspective feature information, the first perspective semantic segmentation information, and associated information of the first perspective image and the second perspective image.
  2. The method according to claim 1, characterized in that the method further comprises:
    performing feature extraction processing on the second perspective image to obtain second perspective feature information;
    performing association processing based on the first perspective feature information and the second perspective feature information to obtain the associated information.
  3. The method according to claim 1 or 2, characterized in that obtaining the parallax prediction information of the first perspective image and the second perspective image based on the first perspective feature information, the first perspective semantic segmentation information, and the associated information of the first perspective image and the second perspective image comprises:
    performing mixing processing on the first perspective feature information, the first perspective semantic segmentation information, and the associated information to obtain mixed feature information;
    obtaining the parallax prediction information based on the mixed feature information.
  4. The method according to any one of claims 1 to 3, characterized in that the image disparity estimation method is implemented by a parallax estimation neural network, and the method further comprises:
    training the parallax estimation neural network based on the parallax prediction information.
  5. The method according to claim 4, characterized in that training the parallax estimation neural network based on the parallax prediction information comprises:
    performing semantic segmentation processing on the second perspective image to obtain second perspective semantic segmentation information;
    obtaining first-perspective reconstructed semantic information based on the second perspective semantic segmentation information and the parallax prediction information;
    adjusting network parameters of the parallax estimation neural network based on the first-perspective reconstructed semantic information.
  6. The method according to claim 5, characterized in that adjusting the network parameters of the parallax estimation neural network based on the first-perspective reconstructed semantic information comprises:
    determining a semantic loss value based on the first-perspective reconstructed semantic information;
    adjusting the network parameters of the parallax estimation neural network based on the semantic loss value.
  7. The method according to claim 5 or 6, characterized in that adjusting the network parameters of the parallax estimation neural network based on the first-perspective reconstructed semantic information comprises:
    adjusting the network parameters of the parallax estimation neural network based on the first-perspective reconstructed semantic information and a first semantic label of the first perspective image; or
    adjusting the network parameters of the parallax estimation neural network based on the first-perspective reconstructed semantic information and the first perspective semantic segmentation information.
  8. The method according to any one of claims 4 to 7, characterized in that training the parallax estimation neural network based on the parallax prediction information comprises:
    obtaining a first-perspective reconstructed image based on the parallax prediction information and the second perspective image;
    determining a luminosity loss value according to a luminosity difference between the first-perspective reconstructed image and the first perspective image;
    determining a smoothness loss value based on the parallax prediction information;
    adjusting the network parameters of the parallax estimation neural network according to the luminosity loss value and the smoothness loss value.
  9. The method according to any one of claims 1 to 8, characterized in that the first perspective image and the second perspective image correspond to labeled parallax information, and the method further comprises:
    training a parallax estimation neural network for implementing the method based on the parallax prediction information and the labeled parallax information.
  10. The method according to claim 9, characterized in that training the parallax estimation neural network based on the parallax prediction information and the labeled parallax information comprises:
    determining a parallax regression loss value based on the parallax prediction information and the labeled parallax information;
    adjusting the network parameters of the parallax estimation neural network according to the parallax regression loss value.
  11. An image disparity estimation device, characterized in that the device comprises:
    an image acquisition module, configured to acquire a first perspective image and a second perspective image of a target scene;
    a parallax estimation neural network, configured to obtain parallax prediction information from the first perspective image and the second perspective image, comprising:
    a primary feature extraction module, configured to perform feature extraction processing on the first perspective image to obtain first perspective feature information;
    a semantic feature extraction module, configured to perform semantic segmentation processing on the first perspective image to obtain first perspective semantic segmentation information;
    a parallax regression module, configured to obtain the parallax prediction information of the first perspective image and the second perspective image based on the first perspective feature information, the first perspective semantic segmentation information, and associated information of the first perspective image and the second perspective image.
  12. The device according to claim 11, characterized in that
    the primary feature extraction module is further configured to perform feature extraction processing on the second perspective image to obtain second perspective feature information;
    the parallax regression module further comprises:
    an associated feature extraction module, configured to perform association processing based on the first perspective feature information and the second perspective feature information to obtain the associated information.
  13. The device according to claim 11 or 12, characterized in that the parallax regression module is further configured to:
    perform mixing processing on the first perspective feature information, the first perspective semantic segmentation information, and the associated information to obtain mixed feature information;
    obtain the parallax prediction information based on the mixed feature information.
  14. The device according to any one of claims 11 to 13, characterized in that the device further comprises:
    a first network training module, configured to train the parallax estimation neural network based on the parallax prediction information.
  15. The device according to claim 14, characterized in that the first network training module is further configured to:
    perform semantic segmentation processing on the second perspective image to obtain second perspective semantic segmentation information;
    obtain first-perspective reconstructed semantic information based on the second perspective semantic segmentation information and the parallax prediction information;
    adjust network parameters of the parallax estimation neural network based on the first-perspective reconstructed semantic information.
  16. The device according to claim 15, characterized in that the first network training module is further configured to:
    determine a semantic loss value based on the first-perspective reconstructed semantic information;
    adjust the network parameters of the parallax estimation neural network based on the semantic loss value.
  17. The device according to claim 15 or 16, characterized in that the first network training module is further configured to:
    adjust the network parameters of the parallax estimation neural network based on the first-perspective reconstructed semantic information and a first semantic label of the first perspective image; or
    adjust the network parameters of the parallax estimation neural network based on the first-perspective reconstructed semantic information and the first perspective semantic segmentation information.
  18. The device according to any one of claims 14 to 17, characterized in that the first network training module is further configured to:
    obtain a first-perspective reconstructed image based on the parallax prediction information and the second perspective image;
    determine a luminosity loss value according to a luminosity difference between the first-perspective reconstructed image and the first perspective image;
    determine a smoothness loss value based on the parallax prediction information;
    adjust the network parameters of the parallax estimation neural network according to the luminosity loss value and the smoothness loss value.
  19. The device according to any one of claims 11 to 18, characterized in that the device further comprises:
    a second network training module, configured to train the parallax estimation neural network based on the parallax prediction information and labeled parallax information, the first perspective image and the second perspective image corresponding to the labeled parallax information.
  20. The device according to claim 19, characterized in that the second network training module is further configured to:
    determine a parallax regression loss value based on the parallax prediction information and the labeled parallax information;
    adjust the network parameters of the parallax estimation neural network according to the parallax regression loss value.
  21. An image disparity estimation device, characterized in that the device comprises: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the image disparity estimation method according to any one of claims 1 to 10.
  22. A storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, causes the processor to perform the image disparity estimation method according to any one of claims 1 to 10.
PCT/CN2019/097307 2018-07-25 2019-07-23 Image disparity estimation WO2020020160A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
SG11202100556YA SG11202100556YA (en) 2018-07-25 2019-07-23 Image disparity estimation
JP2021502923A JP7108125B2 (ja) 2018-07-25 2019-07-23 画像視差推定
US17/152,897 US20210142095A1 (en) 2018-07-25 2021-01-20 Image disparity estimation

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810824486.9A CN109191515B (zh) 2018-07-25 2018-07-25 一种图像视差估计方法及装置、存储介质
CN201810824486.9 2018-07-25

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/152,897 Continuation US20210142095A1 (en) 2018-07-25 2021-01-20 Image disparity estimation

Publications (1)

Publication Number Publication Date
WO2020020160A1 true WO2020020160A1 (zh) 2020-01-30

Family

ID=64936941

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/097307 WO2020020160A1 (zh) 2018-07-25 2019-07-23 图像视差估计

Country Status (5)

Country Link
US (1) US20210142095A1 (zh)
JP (1) JP7108125B2 (zh)
CN (1) CN109191515B (zh)
SG (1) SG11202100556YA (zh)
WO (1) WO2020020160A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111768434A (zh) * 2020-06-29 2020-10-13 Oppo广东移动通信有限公司 Disparity map acquisition method and apparatus, electronic device, and storage medium

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109191515B (zh) 2018-07-25 2021-06-01 北京市商汤科技开发有限公司 Image disparity estimation method and device, and storage medium
US11820289B2 (en) * 2018-07-31 2023-11-21 Sony Semiconductor Solutions Corporation Solid-state imaging device and electronic device
WO2020027233A1 (ja) 2018-07-31 2020-02-06 ソニーセミコンダクタソリューションズ株式会社 Imaging device and vehicle control system
WO2020121678A1 (ja) * 2018-12-14 2020-06-18 富士フイルム株式会社 Mini-batch learning device, operating program and operating method therefor, and image processing device
CN110060230B (zh) * 2019-01-18 2021-11-26 商汤集团有限公司 Three-dimensional scene analysis method, apparatus, medium, and device
CN110163246B (zh) 2019-04-08 2021-03-30 杭州电子科技大学 Unsupervised depth estimation method for monocular light-field images based on a convolutional neural network
CN110148179A (zh) * 2019-04-19 2019-08-20 北京地平线机器人技术研发有限公司 Method, apparatus, and medium for training a neural network model for estimating an image disparity map
CN110060264B (zh) * 2019-04-30 2021-03-23 北京市商汤科技开发有限公司 Neural network training method, video frame processing method, apparatus, and system
CN110378201A (zh) * 2019-06-05 2019-10-25 浙江零跑科技有限公司 Method for measuring articulation angles of multiple train carriages based on side surround-view fisheye camera input
CN110310317A (zh) * 2019-06-28 2019-10-08 西北工业大学 Monocular-vision scene depth estimation method based on deep learning
CN110728707B (zh) * 2019-10-18 2022-02-25 陕西师范大学 Multi-view depth prediction method based on an asymmetric deep convolutional neural network
US10984290B1 (en) * 2019-11-15 2021-04-20 Zoox, Inc. Multi-task learning for real-time semantic and/or depth aware instance segmentation and/or three-dimensional object bounding
CN111192238B (zh) * 2019-12-17 2022-09-20 南京理工大学 Non-destructive three-dimensional blood vessel measurement method based on a self-supervised deep network
CN112634341B (zh) * 2020-12-24 2021-09-07 湖北工业大学 Method for constructing a depth estimation model with multi-vision-task collaboration
CN112767468B (zh) * 2021-02-05 2023-11-03 中国科学院深圳先进技术研究院 Self-supervised three-dimensional reconstruction method and system based on collaborative segmentation and data augmentation
JP2023041286A (ja) * 2021-09-13 2023-03-24 日立Astemo株式会社 Image processing device and image processing method
CN113807251A (zh) * 2021-09-17 2021-12-17 哈尔滨理工大学 Appearance-based gaze estimation method
CN113808187A (zh) * 2021-09-18 2021-12-17 京东鲲鹏（江苏）科技有限公司 Disparity map generation method and apparatus, electronic device, and computer-readable medium
US20230140170A1 (en) * 2021-10-28 2023-05-04 Samsung Electronics Co., Ltd. System and method for depth and scene reconstruction for augmented reality or extended reality devices
CN114528976B (zh) * 2022-01-24 2023-01-03 北京智源人工智能研究院 Equivariant network training method and apparatus, electronic device, and storage medium
CN114782911B (zh) * 2022-06-20 2022-09-16 小米汽车科技有限公司 Image processing method, apparatus, device, medium, chip, and vehicle
CN117789971B (zh) * 2024-02-13 2024-05-24 长春职业技术学院 Intelligent mental-health assessment system and method based on text sentiment analysis

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080013836A1 (en) * 2006-06-19 2008-01-17 Akira Nakamura Information Processing Device, Information Processing Method, and Program
CN101344965A (zh) * 2008-09-04 2009-01-14 上海交通大学 Tracking system based on binocular camera shooting
CN101996399A (zh) * 2009-08-18 2011-03-30 三星电子株式会社 Device and method for estimating disparity between a left image and a right image
CN102663765A (zh) * 2012-04-28 2012-09-12 Tcl集团股份有限公司 Three-dimensional image stereo matching method and system based on semantic segmentation
CN102799646A (zh) * 2012-06-27 2012-11-28 浙江万里学院 Semantic object segmentation method oriented to multi-view video
CN108229591A (zh) * 2018-03-15 2018-06-29 北京市商汤科技开发有限公司 Neural network adaptive training method and apparatus, device, program, and storage medium
CN109191515A (zh) * 2018-07-25 2019-01-11 北京市商汤科技开发有限公司 Image disparity estimation method and device, and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10055013B2 (en) * 2013-09-17 2018-08-21 Amazon Technologies, Inc. Dynamic object tracking for user interfaces
CN105631479B (zh) * 2015-12-30 2019-05-17 中国科学院自动化研究所 Deep convolutional network image annotation method and apparatus based on imbalanced learning
JP2018010359A (ja) 2016-07-11 2018-01-18 キヤノン株式会社 Information processing apparatus, information processing method, and program
CN108280451B (zh) * 2018-01-19 2020-12-29 北京市商汤科技开发有限公司 Semantic segmentation and network training method and apparatus, device, and medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080013836A1 (en) * 2006-06-19 2008-01-17 Akira Nakamura Information Processing Device, Information Processing Method, and Program
CN101344965A (zh) * 2008-09-04 2009-01-14 上海交通大学 Tracking system based on binocular camera shooting
CN101996399A (zh) * 2009-08-18 2011-03-30 三星电子株式会社 Device and method for estimating disparity between a left image and a right image
CN102663765A (zh) * 2012-04-28 2012-09-12 Tcl集团股份有限公司 Three-dimensional image stereo matching method and system based on semantic segmentation
CN102799646A (zh) * 2012-06-27 2012-11-28 浙江万里学院 Semantic object segmentation method oriented to multi-view video
CN108229591A (zh) * 2018-03-15 2018-06-29 北京市商汤科技开发有限公司 Neural network adaptive training method and apparatus, device, program, and storage medium
CN109191515A (zh) * 2018-07-25 2019-01-11 北京市商汤科技开发有限公司 Image disparity estimation method and device, and storage medium

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111768434A (zh) * 2020-06-29 2020-10-13 Oppo广东移动通信有限公司 Disparity map acquisition method and apparatus, electronic device, and storage medium

Also Published As

Publication number Publication date
CN109191515B (zh) 2021-06-01
SG11202100556YA (en) 2021-03-30
US20210142095A1 (en) 2021-05-13
JP2021531582A (ja) 2021-11-18
JP7108125B2 (ja) 2022-07-27
CN109191515A (zh) 2019-01-11

Similar Documents

Publication Publication Date Title
WO2020020160A1 (zh) Image disparity estimation
Sakaridis et al. Semantic foggy scene understanding with synthetic data
CN112634341B (zh) Method for constructing a depth estimation model with multi-vision-task collaboration
Huang et al. Indoor depth completion with boundary consistency and self-attention
Li et al. Simultaneous video defogging and stereo reconstruction
US11830211B2 (en) Disparity map acquisition method and apparatus, device, control system and storage medium
Madhuanand et al. Self-supervised monocular depth estimation from oblique UAV videos
CN111209770A (zh) Lane line recognition method and apparatus
AU2021103300A4 (en) Unsupervised Monocular Depth Estimation Method Based On Multi- Scale Unification
CN115861601B (zh) Multi-sensor fusion perception method and apparatus
CN112288788A (zh) Monocular image depth estimation method
Alcantarilla et al. Large-scale dense 3D reconstruction from stereo imagery
CN114372523A (zh) Binocular matching uncertainty estimation method based on evidential deep learning
CN116452752A (zh) Intestinal wall reconstruction method combining monocular dense SLAM with a residual network
Han et al. Self-supervised monocular Depth estimation with multi-scale structure similarity loss
Mathew et al. Monocular depth estimation with SPN loss
CN116524324A (zh) BEV model training method, apparatus, system, vehicle, and readable storage medium
CN116630528A (zh) Static scene reconstruction method based on a neural network
CN108921852B (zh) Dual-branch outdoor unstructured terrain segmentation network based on disparity and plane fitting
CN113724311B (zh) Depth map acquisition method, device, and storage medium
CN113554102A (zh) Aerial image DSM matching method using dynamic programming for cost computation
Guo et al. Unsupervised Cross-Spectrum Depth Estimation by Visible-Light and Thermal Cameras
CN114332187B (zh) Monocular target ranging method and apparatus
Billy et al. DA-NET: Monocular Depth Estimation using Disparity maps Awareness NETwork
CN114241441B (zh) Dynamic obstacle detection method based on feature points

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19840759

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021502923

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19840759

Country of ref document: EP

Kind code of ref document: A1