WO2020168716A1 - Binocular matching method and apparatus, device, and storage medium - Google Patents

Binocular matching method and apparatus, device, and storage medium

Info

Publication number
WO2020168716A1
WO2020168716A1 · PCT/CN2019/108314 · CN2019108314W
Authority
WO
WIPO (PCT)
Prior art keywords
feature
features
image
disparity
correlation
Prior art date
Application number
PCT/CN2019/108314
Other languages
English (en)
French (fr)
Inventor
郭晓阳
杨凯
杨武魁
李鸿升
王晓刚
Original Assignee
北京市商汤科技开发有限公司
Priority date
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司
Priority to JP2020565808A (JP7153091B2)
Priority to SG11202011008XA
Priority to KR1020207031264A
Publication of WO2020168716A1
Priority to US17/082,640 (US20210042954A1)

Classifications

    • G06T7/593 Depth or shape recovery from multiple images from stereo images
    • G06T7/33 Determination of transform parameters for the alignment of images (image registration) using feature-based methods
    • G06F18/253 Fusion techniques of extracted features
    • G06N3/08 Learning methods (neural networks)
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T7/11 Region-based segmentation
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G06V10/764 Image or video recognition or understanding using classification, e.g. of video objects
    • G06V10/806 Fusion of extracted features (combining data at the sensor, preprocessing, feature extraction or classification level)
    • G06V10/82 Image or video recognition or understanding using neural networks
    • H04N13/239 Image signal generators using stereoscopic image cameras using two 2D image sensors having a relative position equal to or related to the interocular distance
    • G06T2207/10012 Stereo images
    • G06T2207/20016 Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • G06T2207/20076 Probabilistic image processing
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06V10/16 Image acquisition using multiple overlapping images; Image stitching
    • H04N2013/0081 Depth or disparity estimation from stereoscopic image signals

Definitions

  • the embodiments of the present application relate to the field of computer vision, and relate to, but are not limited to, a binocular matching method and device, equipment, and a storage medium.
  • Binocular matching is a technique for recovering depth from a pair of pictures taken at different angles.
  • each pair of pictures is obtained by a pair of cameras placed left and right or up and down.
  • the pictures taken from different cameras are corrected so that the corresponding pixels are on the same horizontal line when the camera is placed left and right, or the corresponding pixels are on the same vertical line when the camera is placed up and down.
  • the problem then becomes estimating the pixel distance between corresponding matched pixels (also called the disparity).
  • given the disparity, the focal length of the camera, and the baseline distance between the centers of the two cameras, the depth can be calculated.
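  • For reference, the relation implied here, and spelled out with its formula later in this document, is $D = \frac{F \cdot L}{d}$, where D is the depth, F is the focal length, L is the baseline distance between the two camera centers, and d is the disparity.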
  • binocular matching can be roughly divided into two methods, algorithms based on traditional matching costs, and algorithms based on deep learning.
  • the embodiments of the present application provide a binocular matching method and device, equipment and storage medium.
  • an embodiment of the present application provides a binocular matching method, the method includes: acquiring an image to be processed, wherein the image is a 2D (2-dimensional) image including a left image and a right image; constructing a 3D (3-dimensional) matching cost feature of the image by using the extracted features of the left image and the features of the right image, wherein the 3D matching cost feature includes a grouped cross-correlation feature, or includes a feature obtained by splicing the grouped cross-correlation feature and a connection feature; and using the 3D matching cost feature to determine the depth of the image.
  • an embodiment of the present application provides a method for training a binocular matching network.
  • the method includes: using a binocular matching network to determine a 3D matching cost feature of an acquired sample image, wherein the sample image includes a left image and a right image with depth mark information, and the left image and the right image have the same size; the 3D matching cost feature includes a grouped cross-correlation feature, or includes a feature obtained by splicing the grouped cross-correlation feature and a connection feature; using the binocular matching network to determine the predicted disparity of the sample image according to the 3D matching cost feature; comparing the depth mark information with the predicted disparity to obtain a loss function for binocular matching; and using the loss function to train the binocular matching network.
  • an embodiment of the present application provides a binocular matching device, the device includes: an acquisition unit configured to acquire an image to be processed, wherein the image is a 2D image including a left image and a right image; a construction unit configured to construct a 3D matching cost feature of the image by using the extracted features of the left image and the features of the right image, wherein the 3D matching cost feature includes a grouped cross-correlation feature, or includes a feature obtained by splicing the grouped cross-correlation feature and a connection feature; and a determining unit configured to use the 3D matching cost feature to determine the depth of the image.
  • an embodiment of the present application provides a training device for a binocular matching network
  • the device includes: a feature extraction unit configured to use the binocular matching network to determine the 3D matching cost feature of an acquired sample image, wherein the sample image includes a left image and a right image with depth mark information, and the left image and the right image have the same size; the 3D matching cost feature includes a grouped cross-correlation feature, or includes a feature obtained by splicing the grouped cross-correlation feature and a connection feature; a disparity prediction unit configured to use the binocular matching network to determine the predicted disparity of the sample image according to the 3D matching cost feature; a comparison unit configured to compare the depth mark information with the predicted disparity to obtain a loss function for binocular matching; and a training unit configured to train the binocular matching network by using the loss function.
  • an embodiment of the present application provides a computer device, including a memory and a processor, the memory storing a computer program that can run on the processor, wherein the processor implements the steps in the above binocular matching method, or the steps in the above training method of the binocular matching network, when the program is executed.
  • an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored.
  • when the computer program is executed by a processor, the steps in the above binocular matching method, or the steps in the above training method of the binocular matching network, are implemented.
  • the embodiments of the present application provide a binocular matching method and device, equipment and storage medium.
  • the image to be processed is acquired, where the image is a 2D image including a left image and a right image; the extracted features of the left image and the features of the right image are used to construct the 3D matching cost feature of the image, where the 3D matching cost feature includes a grouped cross-correlation feature, or includes a feature obtained by splicing the grouped cross-correlation feature and a connection feature; the 3D matching cost feature is then used to determine the depth of the image. In this way, the accuracy of binocular matching can be improved and the computational requirements of the network can be reduced.
  • FIG. 1A is a first schematic diagram of the implementation process of the binocular matching method according to an embodiment of the application;
  • FIG. 1B is a schematic diagram of depth estimation for the image to be processed according to an embodiment of the application;
  • FIG. 2A is a second schematic diagram of the implementation process of the binocular matching method according to an embodiment of the application;
  • FIG. 2B is a third schematic diagram of the implementation process of the binocular matching method according to an embodiment of the application;
  • FIG. 3A is a schematic diagram of the implementation process of the training method of the binocular matching network according to an embodiment of the application;
  • FIG. 3B is a schematic diagram of the grouped cross-correlation feature according to an embodiment of the application;
  • FIG. 3C is a schematic diagram of the connection feature according to an embodiment of the application;
  • FIG. 4A is a fourth schematic diagram of the implementation process of the binocular matching method according to an embodiment of the application;
  • FIG. 4B is a schematic diagram of a binocular matching network model according to an embodiment of the application;
  • FIG. 4C is a comparison diagram of experimental results of the binocular matching method according to an embodiment of the application and prior-art binocular matching methods;
  • FIG. 5 is a schematic diagram of the composition structure of a binocular matching device according to an embodiment of the application;
  • FIG. 6 is a schematic diagram of the composition structure of a training device for a binocular matching network according to an embodiment of the application;
  • FIG. 7 is a schematic diagram of a hardware entity of a computer device according to an embodiment of the application.
  • the suffixes "module", "component", or "unit" used to indicate elements are only for ease of describing the present application and have no specific meaning in themselves; therefore, "module", "component", and "unit" can be used interchangeably.
  • the embodiments of the present application use the grouped cross-correlation matching cost feature to improve the accuracy of binocular matching and reduce the computational requirements of the network.
  • the technical solution of the present application will be further elaborated below in conjunction with the drawings and embodiments.
  • Fig. 1A is a schematic diagram 1 of the implementation process of the binocular matching method according to an embodiment of the application. As shown in Fig. 1A, the method includes:
  • Step S101 Obtain an image to be processed, where the image is a 2D image including a left image and a right image;
  • the computer device may be a terminal, and the image to be processed may include pictures of any scene.
  • the image to be processed is generally a binocular picture including the left picture and the right picture, and is a pair of pictures taken at different angles. Usually, each pair of pictures is obtained by a pair of cameras placed left and right or up and down.
  • the terminal can be various types of equipment with information processing capabilities during implementation.
  • the mobile terminal can include a mobile phone, a PDA (Personal Digital Assistant, personal digital assistant), a navigator, and a digital phone.
  • the computer device can also be a device with information processing capabilities, such as a mobile terminal (for example, a mobile phone, a tablet computer, or a notebook computer) or a fixed terminal (for example, a personal computer or a server cluster).
  • Step S102 Construct a 3D matching cost feature of the image by using the extracted features of the left image and the feature of the right image, wherein the 3D matching cost feature includes grouped cross-correlation features, or includes grouped cross-correlation The feature after the splicing of the feature and the connection feature;
  • the 3D matching cost feature can include the grouped cross-correlation feature, or it can include the feature obtained by splicing the grouped cross-correlation feature and the connection feature; no matter which of the two forms is used to construct the 3D matching cost feature, very precise disparity prediction results can be obtained.
  • Step S103 using the 3D matching cost feature to determine the depth of the image
  • the 3D matching cost feature can be used to determine the probability of each possible disparity for each pixel in the left image; that is, the 3D matching cost feature describes the degree of matching between the feature of a pixel in the left image and the feature of the corresponding pixel in the right image. In other words, for the feature of a point in the left feature map, all of its possible positions in the right feature map need to be found, and the feature at each possible position in the right feature map is then combined with the feature of the point in the left image for classification, to obtain the probability that each possible position in the right feature map is the corresponding point in the right image.
  • determining the depth of the image refers to determining which point in the right image corresponds to each point in the left image, and determining the horizontal pixel distance between them (when the cameras are placed left and right).
  • the steps S102 to S103 can be implemented by a binocular matching network obtained by training, where the binocular matching network includes, but is not limited to: CNN (Convolutional Neural Networks, Convolutional Neural Network), DNN (Deep Neural Network, Deep Neural Network) and RNN (Recurrent Neural Network, Recurrent Neural Network), etc.
  • the binocular matching network may include one of the CNN, DNN, and RNN networks, or may include at least two of the CNN, DNN, and RNN networks.
  • Figure 1B is a schematic diagram of the depth estimation of the image to be processed according to an embodiment of the application.
  • picture 11 is the left image of the image to be processed, picture 12 is the right image of the image to be processed, and picture 13 is the disparity map determined from picture 11 and picture 12, that is, the disparity map corresponding to picture 11; the depth map corresponding to picture 11 can then be obtained from this disparity map.
  • the image to be processed is acquired, where the image is a 2D image including a left image and a right image; the extracted features of the left image and the features of the right image are used to construct the 3D matching cost feature of the image, wherein the 3D matching cost feature includes the grouped cross-correlation feature, or includes the feature obtained by splicing the grouped cross-correlation feature and the connection feature; the 3D matching cost feature is used to determine the depth of the image. In this way, the accuracy of binocular matching can be improved and the computational requirements of the network can be reduced.
  • FIG. 2A is a schematic diagram of the implementation process of the binocular matching method according to the embodiment of this application. As shown in FIG. 2A, the method includes:
  • Step S201 Obtain an image to be processed, where the image is a 2D image including a left image and a right image;
  • Step S202 using the extracted feature of the left image and the feature of the right image to determine the grouping cross-correlation feature
  • step S202 using the extracted features of the left image and the features of the right image to determine the grouping cross-correlation features, can be implemented through the following steps:
  • Step S2021 grouping the extracted features of the left image and the features of the right image respectively, and determining the cross-correlation results of the features of the grouped left image and the features of the grouped right image under different parallaxes;
  • Step S2022 splicing the cross-correlation results to obtain grouped cross-correlation features.
  • in some embodiments, step S2021, in which the extracted features of the left image and the features of the right image are respectively grouped and the cross-correlation results of the features of the grouped left image and the features of the grouped right image under different disparities are determined, can be implemented through the following steps:
  • Step S2021a group the extracted features of the left image to form a first preset number of first feature groups
  • Step S2021b Group the extracted features of the right image to form a second preset number of second feature groups, where the first preset number is the same as the second preset number;
  • Step S2021c Determine the cross-correlation results of the g-th first feature group and the g-th second feature group under different parallaxes; where g is a natural number greater than or equal to 1 and less than or equal to the first preset number; the different parallaxes include : Zero disparity, maximum disparity, and any disparity between zero disparity and maximum disparity, where the maximum disparity is the maximum disparity in the use scene corresponding to the image to be processed.
  • the features of the left image can be divided into multiple feature groups, and the features of the right image can also be divided into multiple feature groups; the cross-correlation results of each feature group of the left image and the corresponding feature group of the right image under different disparities are then determined.
  • the grouped cross-correlation refers to grouping the features of the left image (and, in the same way, the features of the right image) after the features of the left and right images are obtained, and then performing a cross-correlation calculation between the corresponding groups (that is, calculating their correlation).
  • the determining the cross-correlation results of the g-th first feature group and the g-th second feature group under different parallaxes includes: using the formula $C_{gwc}(d,x,y,g)=\frac{N_g}{N_c}\langle f_l^g(x,y),\, f_r^g(x+d,y)\rangle$ to determine the cross-correlation results of the g-th first feature group and the g-th second feature group under different disparities d; wherein N_c represents the number of channels of the features of the left image or the features of the right image, N_g represents the first preset number or the second preset number, f_l^g represents the features in the g-th first feature group, f_r^g represents the features in the g-th second feature group, (x, y) represents the pixel coordinates of the pixel with abscissa x and ordinate y, (x+d, y) represents the pixel coordinates of the pixel with abscissa x+d and ordinate y, and ⟨·,·⟩ denotes the inner product taken over the channels of the group.
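  • As a minimal sketch (an illustration, not the patent's reference implementation), the per-disparity grouped cross-correlation described by the formula above could be computed as follows in PyTorch; the tensor layout [B, C, H, W] and the function name are assumptions:

```python
import torch

def groupwise_correlation(f_left, f_right, num_groups):
    """Grouped cross-correlation between left and right feature maps for a
    single disparity (the right features are assumed to be already shifted).

    f_left, f_right: tensors of shape [B, C, H, W], C divisible by num_groups.
    Returns a tensor of shape [B, num_groups, H, W].
    """
    b, c, h, w = f_left.shape
    channels_per_group = c // num_groups
    fl = f_left.view(b, num_groups, channels_per_group, h, w)
    fr = f_right.view(b, num_groups, channels_per_group, h, w)
    # Mean of the element-wise product over each group's channels,
    # i.e. the (N_g / N_c)-scaled inner product from the formula above.
    return (fl * fr).mean(dim=2)
```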
  • Step S203 Determine the grouped cross-correlation feature as a 3D matching cost feature
  • the probability of each possible parallax is determined, and the probability is weighted and averaged to obtain the parallax of the image.
  • the D max represents the maximum disparity in the usage scene corresponding to the image to be processed. It is also possible to determine the parallax with the highest probability among the possible parallaxes as the parallax of the image.
  • Step S204 Use the 3D matching cost feature to determine the depth of the image.
  • the image to be processed is acquired, where the image is a 2D image including the left image and the right image; the extracted features of the left image and the features of the right image are used to determine the grouped cross-correlation feature; the grouped cross-correlation feature is determined as the 3D matching cost feature; and the 3D matching cost feature is used to determine the depth of the image. In this way, the accuracy of binocular matching can be improved and the computational requirements of the network can be reduced.
  • FIG. 2B is the third schematic diagram of the implementation process of the binocular matching method according to the embodiment of the present application. As shown in FIG. 2B, the method includes:
  • Step S211 Obtain an image to be processed, where the image is a 2D image including a left image and a right image;
  • Step S212 using the extracted features of the left image and the features of the right image to determine the grouping cross-correlation feature and the connection feature;
  • in some embodiments, the implementation of step S212, in which the extracted features of the left image and the features of the right image are used to determine the grouped cross-correlation feature, is the same as the implementation of step S202, and will not be repeated here.
  • Step S213 Determine the feature after the grouped cross-correlation feature and the connection feature are spliced as a 3D matching cost feature
  • connection feature is obtained by splicing the feature of the left image and the feature of the right image in feature dimensions.
  • the grouped cross-correlation feature and the connection feature can be spliced in the feature dimension to obtain the 3D matching cost feature.
  • the 3D matching cost feature is equivalent to obtaining a feature for each possible parallax.
  • for example, if the maximum disparity is D_max, then for each possible disparity 0, 1, ..., D_max-1, a corresponding 2D feature is obtained, and these are then combined into a 3D feature.
  • in some embodiments, the splicing result for each possible disparity d can be obtained using the formula $C_{concat}(d,x,y)=\mathrm{Concat}\bigl(f_l(x,y),\, f_r(x+d,y)\bigr)$, giving D_max spliced images; wherein f_l represents the features of the left image, f_r represents the features of the right image, (x, y) represents the pixel coordinates of the pixel with abscissa x and ordinate y, (x+d, y) represents the pixel coordinates of the pixel with abscissa x+d and ordinate y, and Concat represents splicing the two features; then, the D_max spliced images are spliced to obtain the connection feature.
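  • A hedged sketch of building the connection feature for all candidate disparities, following the Concat formula above (the tensor layout and function name are assumptions; columns with no valid counterpart in the right image are simply left as zeros):

```python
import torch

def build_concat_volume(f_left, f_right, max_disp):
    """Connection ("concat") cost volume: position (d, y, x) stores
    Concat(f_l(x, y), f_r(x + d, y)), matching the convention in the text.

    f_left, f_right: [B, C, H, W]; returns [B, 2C, max_disp, H, W].
    """
    b, c, h, w = f_left.shape
    volume = f_left.new_zeros(b, 2 * c, max_disp, h, w)
    for d in range(max_disp):
        if d == 0:
            volume[:, :c, d] = f_left
            volume[:, c:, d] = f_right
        else:
            volume[:, :c, d, :, :w - d] = f_left[:, :, :, :w - d]
            volume[:, c:, d, :, :w - d] = f_right[:, :, :, d:]
    return volume
```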
  • Step S214 Use the 3D matching cost feature to determine the depth of the image.
  • the image to be processed is acquired, where the image is a 2D image including the left image and the right image; the extracted features of the left image and the features of the right image are used to determine the grouping correlation Features and connection features; the feature after the grouping cross-correlation feature and the connection feature are spliced together to determine the 3D matching cost feature; the 3D matching cost feature is used to determine the depth of the image, so that the double Target matching accuracy and reduce network computing requirements.
  • an embodiment of the present application further provides a binocular matching method, which includes:
  • Step S221 Obtain an image to be processed, where the image is a 2D image including a left image and a right image;
  • Step S222 extracting the 2D features of the left image and the 2D features of the right image by using the fully convolutional neural network sharing parameters
  • the fully convolutional neural network is a component of the binocular matching network.
  • a fully convolutional neural network can be used to extract the 2D features of the image to be processed.
  • Step S223 Construct a 3D matching cost feature of the image using the extracted features of the left image and the feature of the right image, wherein the 3D matching cost feature includes grouped cross-correlation features, or includes grouped cross-correlation The feature after the splicing of the feature and the connection feature;
  • Step S224 Use a 3D neural network to determine the probability of different disparity corresponding to each pixel in the 3D matching cost feature
  • the step S224 may be implemented by a classified neural network, which is also a component of the binocular matching network, and is used to determine the probability of different disparity corresponding to each pixel.
  • Step S225 Determine a weighted average of the probabilities of different disparity corresponding to each pixel
  • the formula $\tilde{d}=\sum_{d=0}^{D_{max}-1} d\cdot p_d$ is used to determine the weighted average of the probabilities of the different disparities d corresponding to each pixel; wherein the disparity d is a natural number greater than or equal to 0 and less than D_max, D_max is the maximum disparity in the usage scenario corresponding to the image to be processed, and p_d represents the probability corresponding to the disparity d.
  • Step S226 Determine the weighted average value as the disparity of the pixel
  • Step S227 Determine the depth of the pixel point according to the disparity of the pixel point.
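  • A hedged sketch of steps S224 to S226 (the weighted average over disparity probabilities); the function name and tensor layout are assumptions:

```python
import torch
import torch.nn.functional as F

def disparity_regression(cost, max_disp):
    """Turn the aggregated cost volume into per-pixel disparity probabilities
    and take their weighted average, i.e. sum over d of d * p_d.

    cost: [B, max_disp, H, W] matching scores for every candidate disparity.
    Returns the predicted disparity map of shape [B, H, W].
    """
    prob = F.softmax(cost, dim=1)  # p_d for each pixel
    disp_values = torch.arange(max_disp, dtype=prob.dtype,
                               device=prob.device).view(1, max_disp, 1, 1)
    return (prob * disp_values).sum(dim=1)
```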
  • the method further includes: using the formula $D=\frac{F\cdot L}{\tilde{d}}$ to determine the depth information D corresponding to the obtained disparity $\tilde{d}$ of a pixel; wherein F represents the lens focal length of the camera that captured the sample, and L represents the baseline distance between the lenses of the cameras that captured the sample.
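  • For illustration with assumed values (not taken from the document): if F = 1000 pixels, L = 0.5 m, and the predicted disparity of a pixel is 25 pixels, the formula gives $D = \frac{1000 \times 0.5}{25} = 20$ metres.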
  • FIG. 3A is a schematic diagram of the implementation process of the method for training a binocular matching network in an embodiment of this application. As shown in FIG. 3A, the method includes:
  • Step S301 using a binocular matching network to determine the 3D matching cost characteristics of the acquired sample image, wherein the sample image includes a left image and a right image with depth label information, and the left image and the right image have the same size;
  • the 3D matching cost features include grouped cross-correlation features, or, include the spliced features of grouped cross-correlation features and connection features;
  • Step S302 using the binocular matching network to determine the predicted disparity of the sample image according to the 3D matching cost feature
  • Step S303 comparing the depth mark information with the predicted disparity to obtain a loss function for binocular matching
  • the parameters in the binocular matching network can be updated through the obtained loss function, and the binocular matching network after updating the parameters can predict a better effect.
  • Step S304 Use the loss function to train the binocular matching network.
  • an embodiment of the present application further provides a training method of a binocular matching network, and the method includes:
  • Step S311 using the fully convolutional neural network in the binocular matching network to determine the 2D stitching feature of the left image and the 2D stitching feature of the right image respectively;
  • the step S311 using the fully convolutional neural network in the binocular matching network to determine the 2D splicing feature of the left image and the 2D splicing feature of the right image, can be implemented by the following steps:
  • Step S3111 using the fully convolutional neural network in the binocular matching network to extract the 2D features of the left image and the 2D features of the right image respectively;
  • the fully convolutional neural network is a fully convolutional neural network sharing parameters; correspondingly, the full convolutional neural network in the binocular matching network is used to extract the 2D features of the left image and the right image respectively
  • the 2D features includes: using a fully convolutional neural network with shared parameters in the binocular matching network to separately extract the 2D features of the left image and the 2D features of the right image, wherein the size of the 2D features is one quarter of the size of the left image or the right image.
  • for example, the size of the 2D features is one quarter of the size of the sample image, that is, 300*100 pixels.
  • the size of the 2D feature may also be other sizes, which is not limited in the embodiment of the present application.
  • the fully convolutional neural network is a component of the binocular matching network.
  • a fully convolutional neural network can be used to extract the 2D features of the sample image.
  • Step S3112 determine the identifier of the convolutional layer used for 2D feature splicing
  • the determining the identifier of the convolutional layer used for 2D feature splicing includes: when the interval ratio of the i-th convolutional layer changes, determining the i-th convolutional layer as the one used for 2D feature splicing Convolutional layer, where i is a natural number greater than or equal to 1.
  • Step S3113 according to the identifier, splicing the 2D features of the different convolutional layers in the left image in the feature dimension to obtain the first 2D splicing feature;
  • Step S3114 According to the identifier, splicing the 2D features of the different convolutional layers in the right image in the feature dimension to obtain a second 2D splicing feature.
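  • A hedged sketch of steps S3113 and S3114: the 2D features of the selected convolutional layers (assumed to share the same spatial size) are spliced along the feature dimension. The per-layer channel counts are assumptions chosen to add up to the 320-channel feature map mentioned later in the document, and the 300*100 spatial size echoes the example given above:

```python
import torch

def splice_2d_features(feature_maps):
    """Concatenate the outputs of the convolutional layers selected for
    splicing along the feature/channel dimension.

    feature_maps: list of tensors, each of shape [B, C_i, H, W].
    Returns a tensor of shape [B, sum(C_i), H, W].
    """
    return torch.cat(feature_maps, dim=1)

# Example (illustrative channel counts): three stages of 64, 128 and 128
# channels give a 320-channel spliced feature map.
f2 = torch.randn(1, 64, 100, 300)
f3 = torch.randn(1, 128, 100, 300)
f4 = torch.randn(1, 128, 100, 300)
spliced = splice_2d_features([f2, f3, f4])  # shape [1, 320, 100, 300]
```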
  • Step S312 using the 2D splicing feature of the left image and the 2D splicing feature of the right image to construct a 3D matching cost feature;
  • Step S313 using the binocular matching network to determine the predicted disparity of the sample image according to the 3D matching cost feature;
  • Step S314 comparing the depth mark information with the predicted disparity to obtain a loss function for binocular matching;
  • Step S315 Use the loss function to train the binocular matching network.
  • an embodiment of the present application further provides a training method of a binocular matching network, and the method includes:
  • Step S321 Use the full convolutional neural network in the binocular matching network to determine the 2D stitching feature of the left image and the 2D stitching feature of the right image respectively;
  • Step S322 using the acquired first 2D splicing feature and the acquired second 2D splicing feature to determine the grouping cross-correlation feature;
  • the step S322 using the acquired first 2D splicing feature and the acquired second 2D splicing feature, to determine the grouping cross-correlation feature can be implemented through the following steps:
  • Step S3221 Divide the acquired first 2D stitching features into N g groups to obtain N g first feature groups;
  • Step S3222 Divide the acquired second 2D splicing features into N g groups to obtain N g second feature groups, where N g is a natural number greater than or equal to 1;
  • Step S3223 Determine the cross-correlation results of N g first feature groups and N g second feature groups for the disparity d, and obtain N g *D max cross-correlation maps; wherein, the disparity d is greater than or equal to 0 A natural number smaller than D max , where D max is the maximum disparity in the usage scene corresponding to the sample image;
  • the determining the cross-correlation results of the N_g first feature groups and the N_g second feature groups for the disparity d to obtain N_g*D_max cross-correlation maps includes: determining the cross-correlation results of the g-th first feature group and the g-th second feature group for the disparity d to obtain D_max cross-correlation maps, where g is a natural number greater than or equal to 1 and less than or equal to N_g; and thereby determining the cross-correlation results of the N_g first feature groups and the N_g second feature groups for the disparity d to obtain N_g*D_max cross-correlation maps.
  • the determining the cross-correlation results of the g-th first feature group and the g-th second feature group for the disparity d to obtain D_max cross-correlation maps includes: using the formula $C_{gwc}(d,x,y,g)=\frac{N_g}{N_c}\langle f_l^g(x,y),\, f_r^g(x+d,y)\rangle$ to determine the cross-correlation results of the g-th first feature group and the g-th second feature group for the disparity d, and obtain D_max cross-correlation maps; wherein N_c represents the number of channels of the first 2D splicing feature or the second 2D splicing feature, f_l^g represents the features in the g-th first feature group, f_r^g represents the features in the g-th second feature group, (x, y) represents the pixel coordinates of the pixel with abscissa x and ordinate y, and (x+d, y) represents the pixel coordinates of the pixel with abscissa x+d and ordinate y.
  • Step S3224 splicing the N g *D max cross-correlation graphs in feature dimensions to obtain grouped cross-correlation features.
  • Step S323 Determine the grouped cross-correlation feature as a 3D matching cost feature
  • FIG. 3B is a schematic diagram of grouped cross-correlation features according to an embodiment of this application.
  • the first 2D splicing feature in the left image is grouped to obtain multiple feature groups 31 grouped in the left image.
  • the second 2D splicing features of the right image are grouped to obtain multiple feature groups 32 of the right image grouped.
  • the shape of the first 2D splicing feature or the second 2D splicing feature is [C, H, W], where C is the number of channels of the splicing feature, H is the height of the splicing feature, and W is the width of the splicing feature .
  • the number of channels of each feature group corresponding to the left or right image is C/N g
  • the N g is the number of groups.
  • the grouped feature groups 31 of the left image and the grouped feature groups 32 of the right image are cross-correlated to obtain the cross-correlation graphs 33; the shape of a single cross-correlation graph 33 is [N_g, H, W], and the N_g*D_max cross-correlation graphs 33 are spliced in the feature dimension to obtain the grouped cross-correlation feature. The grouped cross-correlation feature is then used as the 3D matching cost feature, and the shape of the 3D matching cost feature is [N_g, D_max, H, W]; that is, the shape of the grouped cross-correlation feature is [N_g, D_max, H, W].
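  • Putting steps S3221 to S3224 together, a hedged sketch of assembling the full grouped cross-correlation volume of shape [N_g, D_max, H, W] (with a batch dimension added) is shown below; it reuses the groupwise_correlation helper sketched earlier after the formula of step S2021, and follows the (x, y) / (x+d, y) convention of that formula:

```python
import torch

# Note: relies on groupwise_correlation() from the earlier sketch.

def build_gwc_volume(f_left, f_right, max_disp, num_groups):
    """Grouped cross-correlation cost volume of shape [B, N_g, D_max, H, W].

    f_left, f_right: [B, C, H, W] spliced 2D features of the left/right image.
    Columns with no counterpart in the right image stay zero.
    """
    b, c, h, w = f_left.shape
    volume = f_left.new_zeros(b, num_groups, max_disp, h, w)
    for d in range(max_disp):
        if d == 0:
            volume[:, :, d] = groupwise_correlation(f_left, f_right, num_groups)
        else:
            # Left pixel (x, y) is compared with right pixel (x + d, y).
            volume[:, :, d, :, :w - d] = groupwise_correlation(
                f_left[:, :, :, :w - d], f_right[:, :, :, d:], num_groups)
    return volume
```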
  • Step S324 Use the binocular matching network to determine the predicted disparity of the sample image according to the 3D matching cost feature
  • Step S325 comparing the depth mark information with the predicted disparity to obtain a loss function for binocular matching
  • Step S326 Use the loss function to train the binocular matching network.
  • an embodiment of the present application further provides a training method of a binocular matching network, and the method includes:
  • Step S331 Use the full convolutional neural network in the binocular matching network to determine the 2D stitching feature of the left image and the 2D stitching feature of the right image respectively;
  • Step S332 using the acquired first 2D splicing feature and the acquired second 2D splicing feature to determine the grouping cross-correlation feature;
  • in some embodiments, the implementation of step S332, in which the obtained first 2D splicing feature and the obtained second 2D splicing feature are used to determine the grouped cross-correlation feature, is the same as the implementation of step S322, and will not be repeated here.
  • Step S333 Use the acquired first 2D splicing feature and the acquired second 2D splicing feature to determine the connection feature;
  • the step S333 using the acquired first 2D splicing feature and the acquired second 2D splicing feature to determine the connection feature, can be implemented through the following steps:
  • Step S3331 Determine the splicing result of the acquired first 2D splicing feature and the second 2D splicing feature for the parallax d, and obtain D max spliced images; wherein the parallax d is a natural number greater than or equal to 0 and less than D max , so The D max is the maximum disparity in the usage scene corresponding to the sample image;
  • step S3332 the D max mosaic images are spliced to obtain connection features.
  • in some embodiments, the formula $C_{concat}(d,x,y)=\mathrm{Concat}\bigl(f_l(x,y),\, f_r(x+d,y)\bigr)$ is used to determine the splicing result for the disparity d, and D_max spliced images are obtained; wherein f_l represents the features in the first 2D splicing feature, f_r represents the features in the second 2D splicing feature, (x, y) represents the pixel coordinates of the pixel with an abscissa of x and an ordinate of y, (x+d, y) represents the pixel coordinates of the pixel with an abscissa of x+d and an ordinate of y, and Concat means concatenating the two features.
  • Fig. 3C is a schematic diagram of the connection features of the embodiment of the application.
  • the first 2D splicing feature 35 corresponding to the left image and the second 2D splicing feature 36 corresponding to the right image are connected at the different disparities 0, 1, ..., D_max-1 to obtain D_max spliced images 37, and the D_max spliced images 37 are spliced to obtain the connection feature.
  • the shape of the 2D mosaic feature is [C, H, W]
  • the shape of the single mosaic image 37 is [2C, H, W]
  • the shape of the connection feature is [2C, D max , H, W]
  • the C is the number of channels of the 2D stitching feature
  • the D max is the maximum disparity in the use scene corresponding to the left or right image
  • the H is the height of the left or right image
  • the W is the width of the left image or the right image.
  • Step S334 splicing the grouped cross-correlation feature and the connection feature in feature dimensions to obtain a 3D matching cost feature
  • for example, if the shape of the grouped cross-correlation feature is [N_g, D_max, H, W] and the shape of the connection feature is [2C, D_max, H, W], then the shape of the 3D matching cost feature is [N_g+2C, D_max, H, W].
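  • A hedged one-step sketch of step S334 under the shapes just given (a batch dimension is added; the concrete sizes are illustrative only):

```python
import torch

# Illustrative sizes: N_g = 40 groups (as in the experiments described later)
# and 2C = 24 channels for the connection feature (the text later mentions
# compressing the 2D features to 12 channels before the connection).
B, Ng, D_max, H, W = 1, 40, 48, 64, 128
C = 12
gwc_volume = torch.randn(B, Ng, D_max, H, W)         # grouped cross-correlation feature
concat_volume = torch.randn(B, 2 * C, D_max, H, W)   # connection feature

# Step S334: splice the two volumes along the feature (channel) dimension.
cost_volume = torch.cat((gwc_volume, concat_volume), dim=1)
print(cost_volume.shape)  # [B, Ng + 2C, D_max, H, W] = [1, 64, 48, 64, 128]
```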
  • Step S335 Perform matching cost aggregation on the 3D matching cost feature using the binocular matching network
  • the use of the binocular matching network to perform matching cost aggregation on the 3D matching cost feature includes: using a 3D neural network in the binocular matching network to determine each pixel in the 3D matching cost feature The corresponding probability of different disparity d; wherein, the disparity d is a natural number greater than or equal to 0 and less than D max , and the D max is the maximum disparity in the usage scene corresponding to the sample image.
  • the step S335 can be implemented by a classified neural network, which is also a component of the binocular matching network, used to determine the probability of different disparity d corresponding to each pixel .
  • Step S336 Perform disparity regression on the aggregated result to obtain the predicted disparity of the sample image
  • the performing disparity regression on the aggregated result to obtain the predicted disparity of the sample image includes: determining the weighted average of the probability of different disparity d corresponding to each pixel as the predicted disparity of the pixel , To obtain the predicted disparity of the sample image; wherein, the disparity d is a natural number greater than or equal to 0 and less than D max , and the D max is the maximum disparity in the usage scene corresponding to the sample image.
  • the formula $\tilde{d}=\sum_{d=0}^{D_{max}-1} d\cdot p_d$ is used to determine the weighted average of the probabilities of the different disparities d corresponding to each pixel; wherein the disparity d is a natural number greater than or equal to 0 and less than D_max, D_max is the maximum disparity in the usage scene corresponding to the sample image, and p_d represents the probability corresponding to the disparity d.
  • Step S337 comparing the depth mark information with the predicted disparity to obtain a loss function for binocular matching
  • Step S338 Use the loss function to train the binocular matching network.
  • FIG. 4A is a schematic diagram 4 of the implementation process of the binocular matching method according to the embodiment of this application. As shown in FIG. 4A, the method includes:
  • Step S401 Extract 2D stitching features
  • Step S402 using the 2D splicing feature to construct a 3D matching cost feature
  • Step S403 Use the aggregation network to process the 3D matching cost feature
  • Step S404 Perform parallax regression on the processed result.
  • Fig. 4B is a schematic diagram of a binocular matching network model according to an embodiment of the application.
  • the binocular matching network model can be roughly divided into four parts: a 2D splicing feature extraction module 41, a 3D matching cost feature construction module 42, an aggregation network module 43, and a disparity regression module 44.
  • the picture 46 and the picture 47 are respectively the left picture and the right picture in the sample data.
  • the 2D splicing feature extraction module 41 is configured to use a fully convolutional neural network with shared parameters (including weight sharing) on the left image and the right image to extract 2D features that are 1/4 the size of the original images, and the feature maps of different layers are connected into a large feature map.
  • the 3D matching cost feature construction module 42 is configured to obtain the connection feature and the grouping cross-correlation feature, and use the connection feature and the grouping cross-correlation feature to construct a feature map for all possible disparity d to form a 3D matching cost feature; wherein,
  • the all possible disparity d includes all disparity from zero disparity to the maximum disparity, and the maximum disparity refers to the maximum disparity in the use scene corresponding to the left image or the right image.
  • the aggregation network module 43 is configured to use a 3D neural network to estimate the probability of all possible disparity d.
  • the disparity regression module 44 is configured to obtain the final disparity map 45 using the probabilities of all disparity.
  • a 3D matching cost feature based on a grouped cross-correlation operation is proposed to replace the old 3D matching cost feature.
  • N_g, D_max, H, and W are, respectively, the number of feature groups, the maximum disparity of the feature map, the feature height, and the feature width.
  • the grouped cross-correlation feature and the connection feature are combined as the 3D matching cost feature to achieve better results.
  • This application proposes a new binocular matching network, which is based on the grouped cross-correlation matching cost feature and an improved 3D stacked hourglass network, and which can improve matching accuracy while limiting the computational cost of the 3D aggregation network.
  • the grouped cross-correlation matching cost feature is constructed directly from high-dimensional features, which yields better characterization features.
  • the network structure based on grouped cross-correlation proposed in this application consists of four parts: 2D feature extraction, construction of the 3D matching cost feature, 3D aggregation, and disparity regression.
  • the first step is 2D feature extraction, in which a network similar to a pyramid stereo matching network is used, and then the final features of the extracted second, third, and fourth convolutional layers are connected to form a 320-channel 2D feature map.
  • the connection feature is the same as that in the pyramid stereo matching network, except that it has fewer channels than the pyramid stereo matching network.
  • the extracted 2D features are first compressed into 12 channels by convolution, and then the parallax connection of the left and right features is performed for each possible parallax. After the connection feature and the group-based cross-correlation feature are spliced together, they are used as the input of the 3D aggregation network.
  • the 3D aggregation network is used to aggregate features obtained from neighboring disparity and pixel prediction matching costs. It is formed by a pre-hourglass module and three stacked 3D hourglass networks to standardize convolution features.
  • the pre-hourglass module and three stacked 3D hourglass networks are connected to the output module.
  • two 3D convolutions are used to output the 3D convolution features of one channel, and then the 3D convolution features are up-sampled and converted into probabilities along the disparity dimension through the softmax function.
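  • As a hedged sketch of the output module described here (the hidden width, class name, and upsampling call are assumptions; only the overall structure, two 3D convolutions down to one channel, upsampling, then softmax along the disparity dimension, follows the text):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OutputModule(nn.Module):
    """Two 3D convolutions -> 1 channel, upsample, softmax over disparity."""
    def __init__(self, in_channels, hidden_channels=32):
        super().__init__()
        self.conv1 = nn.Conv3d(in_channels, hidden_channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv3d(hidden_channels, 1, kernel_size=3, padding=1)

    def forward(self, cost, full_disp, full_h, full_w):
        x = F.relu(self.conv1(cost))
        x = self.conv2(x)                         # [B, 1, D/4, H/4, W/4]
        x = F.interpolate(x, size=(full_disp, full_h, full_w),
                          mode='trilinear', align_corners=False)
        x = x.squeeze(1)                          # [B, D, H, W]
        return F.softmax(x, dim=1)                # probabilities over disparity
```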
  • the 2D features of the left image and the right image are represented by f_l and f_r, and the number of channels is represented by N_c.
  • the size of the 2D features is 1/4 of the original image.
  • in the prior art, the left and right features are connected at different disparity levels to form different matching costs, but the matching metric then needs to be learned using a 3D aggregation network, and the features need to be compressed to a small number of channels before the connection in order to save memory.
  • the representation of this compression feature may lose information.
  • the embodiment of the present application therefore proposes to establish a matching cost feature that is based on grouped cross-correlation and uses a traditional matching metric.
  • the basic idea based on grouping cross-correlation is to divide 2D features into multiple groups and calculate the cross-correlation between the corresponding groups on the left and right.
  • the embodiments of this application use the formula $C_{gwc}(d,x,y,g)=\frac{N_g}{N_c}\bigl(f_l^g(x,y)\odot f_r^g(x+d,y)\bigr)$ to calculate the grouped cross-correlation, where N_c represents the number of channels of the 2D features, N_g represents the number of groups, f_l^g represents the features in the feature group corresponding to the grouped left image, f_r^g represents the features in the feature group corresponding to the grouped right image, (x, y) represents the pixel coordinates of the pixel with abscissa x and ordinate y, (x+d, y) represents the pixel coordinates of the pixel with abscissa x+d and ordinate y, and ⊙ represents the product of the two features.
  • correlation refers to calculating the correlation of all feature groups g and all parallaxes d.
  • This application improves the aggregation network in the pyramid stereo matching network.
  • an additional auxiliary output module is added; in this way, the additional auxiliary loss enables the network to learn better aggregated features in the lower layers, which is beneficial to the final prediction.
  • in addition, the residual connections between the different output modules are removed, thus saving computational cost.
  • the loss function $L=\sum_{j}\lambda_j\cdot \mathrm{Smooth}_{L1}\bigl(\tilde{d}_j-d^{*}\bigr)$ is used to train the network based on grouped cross-correlation, where j indicates that there are three intermediate results and one final result in the grouped cross-correlation network used in the embodiment, $\lambda_j$ indicates the different weights attached to the different results, $\tilde{d}_j$ represents the disparity obtained using the network based on grouped cross-correlation, $d^{*}$ represents the true disparity, and $\mathrm{Smooth}_{L1}$ is an existing loss function calculation method.
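  • A hedged sketch of this training loss; the structure (a weighted sum of smooth L1 terms over the three intermediate outputs and the final output) follows the text, while the weight values are assumptions:

```python
import torch
import torch.nn.functional as F

def gwc_loss(pred_disps, gt_disp, weights=(0.5, 0.5, 0.7, 1.0)):
    """Weighted sum of smooth-L1 losses over the three intermediate
    disparity outputs and the final output.

    pred_disps: list of 4 tensors of shape [B, H, W]; gt_disp: [B, H, W].
    """
    assert len(pred_disps) == len(weights)
    total = 0.0
    for w, pred in zip(weights, pred_disps):
        total = total + w * F.smooth_l1_loss(pred, gt_disp)
    return total
```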
  • in the evaluation, the prediction error of the i-th pixel can be determined using the formula $E_i=\lvert \tilde{d}_i-d_i^{*}\rvert$, wherein $\tilde{d}_i$ represents the predicted disparity of the i-th pixel of the left image or the right image of the image to be processed, determined using the binocular matching method of the embodiments of this application, and $d_i^{*}$ represents the true disparity of the i-th pixel.
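  • A hedged sketch of how the outlier percentages mentioned in the comparison below (errors greater than 1 pixel or 2 pixels) could be computed; the function name and the optional validity mask are assumptions:

```python
import torch

def outlier_percentage(pred_disp, gt_disp, threshold, valid_mask=None):
    """Percentage of pixels whose absolute disparity error exceeds `threshold`
    (e.g. 1 or 2 pixels).

    pred_disp, gt_disp: [H, W] or [B, H, W] tensors.
    valid_mask: optional boolean tensor marking pixels with ground truth.
    """
    error = (pred_disp - gt_disp).abs()
    if valid_mask is not None:
        error = error[valid_mask]
    return 100.0 * (error > threshold).float().mean().item()
```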
  • Fig. 4C is a comparison diagram of the experimental results of the binocular matching method according to the embodiment of the application and the prior art binocular matching method.
  • the prior art includes PSMNet (namely, the pyramid stereo matching network) and Cat64 (namely, the method using connection features).
  • the two prior-art methods and the second method in the embodiment of the present application all use the connection feature, but only the embodiments of the present application use the grouped cross-correlation feature. Furthermore, only the methods in the embodiments of the present application involve feature grouping, that is, the obtained 2D splicing features are divided into 40 groups of 8 channels each. Finally, testing the prior-art methods and the methods of the embodiments of this application on the image to be processed yields the percentages of stereo disparity outliers, namely the percentages of outliers with errors greater than 1 pixel and greater than 2 pixels. It can be seen from the figure that the experimental results obtained by the two methods proposed in this application are better than those of the prior art; that is, after processing the image to be processed with the methods of the embodiments of this application, the percentages of stereo disparity outliers obtained are all smaller than the percentages of stereo disparity outliers obtained after processing the image to be processed with the prior art.
  • the embodiments of the present application provide a binocular matching device, which includes each unit included and each module included in each unit, and which can be implemented by a processor in a computer device; of course, it can also be implemented by a specific logic circuit. In the implementation process, the processor can be a CPU (Central Processing Unit), an MPU (Microprocessor Unit), a DSP (Digital Signal Processor), or an FPGA (Field Programmable Gate Array), etc.
  • FIG. 5 is a schematic diagram of the composition structure of a binocular matching device according to an embodiment of the application. As shown in FIG. 5, the device 500 includes:
  • the obtaining unit 501 is configured to obtain an image to be processed, where the image is a 2D image including a left image and a right image;
  • the constructing unit 502 is configured to construct a 3D matching cost feature of the image by using the extracted features of the left image and the features of the right image, wherein the 3D matching cost feature includes a grouped cross-correlation feature, or includes a feature obtained by splicing the grouped cross-correlation feature and a connection feature;
  • the determining unit 503 is configured to use the 3D matching cost feature to determine the depth of the image.
  • the construction unit 502 includes:
  • the first construction subunit is configured to use the extracted features of the left image and the features of the right image to determine grouping cross-correlation features;
  • the second construction subunit is configured to determine the grouping cross-correlation feature as a 3D matching cost feature.
  • the construction unit 502 includes:
  • the first construction subunit is configured to use the extracted features of the left image and the features of the right image to determine grouping cross-correlation features and connection features;
  • the second construction subunit is configured to determine the feature after the grouped cross-correlation feature and the connection feature are spliced as a 3D matching cost feature;
  • connection feature is obtained by splicing the feature of the left image and the feature of the right image in feature dimensions.
  • the first building subunit includes:
  • the first building module is configured to group the extracted features of the left image and the features of the right image respectively, and determine the cross-correlation between the features of the grouped left image and the features of the grouped right image under different parallaxes result;
  • the second construction module is configured to splice the cross-correlation results to obtain grouped cross-correlation features.
  • the first building module includes:
  • the first construction sub-module is configured to group the extracted features of the left image to form a first preset number of first feature groups
  • a second construction sub-module configured to group the extracted features of the right image to form a second feature group of a second preset number, where the first preset number is the same as the second preset number;
  • the third construction submodule is configured to determine the cross-correlation results of the g-th first feature group and the g-th second feature group under different disparities, where g is a natural number greater than or equal to 1 and less than or equal to the first preset number;
  • the different disparities include: zero disparity, the maximum disparity, and any disparity between zero disparity and the maximum disparity, where the maximum disparity is the maximum disparity in the usage scene corresponding to the image to be processed.
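A minimal Python/PyTorch sketch of the grouping-and-correlation step described by the construction submodules above, assuming the correlation of each group is the channel-mean of the element-wise product, i.e. (1/(N_c/N_g)) times the inner product of f_l^g(x, y) and f_r^g(x + d, y), with a zero-padded shift of the right feature; the function name `group_correlation` is a hypothetical label, not one from the disclosure.

```python
import torch

def group_correlation(feat_l, feat_r, num_groups, disparity):
    """Sketch of the per-group cross-correlation at one candidate disparity.

    feat_l, feat_r: [B, C, H, W] feature maps with C divisible by num_groups.
    Returns a [B, num_groups, H, W] correlation map: for every group g the mean
    over the group's channels of f_l^g(x, y) * f_r^g(x + d, y).
    """
    b, c, h, w = feat_l.shape
    ch_per_group = c // num_groups
    # shift the right feature so that position x in the left map lines up with
    # position x + d in the right map (zero padding at the border)
    shifted_r = torch.zeros_like(feat_r)
    if disparity > 0:
        shifted_r[:, :, :, :-disparity] = feat_r[:, :, :, disparity:]
    else:
        shifted_r = feat_r
    fl = feat_l.reshape(b, num_groups, ch_per_group, h, w)
    fr = shifted_r.reshape(b, num_groups, ch_per_group, h, w)
    return (fl * fr).mean(dim=2)   # average of the element-wise product per group
```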
  • the device further includes:
  • the extraction unit is configured to separately extract the 2D features of the left image and the 2D features of the right image by using a fully convolutional neural network sharing parameters.
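The shared-parameter extraction described by the unit above can be sketched as follows; sharing parameters amounts to applying the very same module to both views. The channel widths and strides are illustrative assumptions, not the disclosed architecture.

```python
import torch.nn as nn

class SharedFeatureExtractor(nn.Module):
    """Sketch of a shared-parameter fully convolutional 2D feature extractor.
    Channel widths and strides are illustrative, not the disclosed architecture."""

    def __init__(self, out_channels=320):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),   # 1/2 resolution
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),  # 1/4 resolution
            nn.Conv2d(64, out_channels, 3, stride=1, padding=1),
        )

    def forward(self, left, right):
        # parameter sharing: the same module (same weights) processes both views
        return self.body(left), self.body(right)
```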
  • the determining unit 503 includes:
  • the first determining subunit is configured to use a 3D neural network to determine the probability of different disparity corresponding to each pixel in the 3D matching cost feature;
  • the second determining subunit is configured to determine a weighted average of the probabilities of different disparity corresponding to each pixel
  • a third determining subunit configured to determine the weighted average value as the disparity of the pixel
  • the fourth determining subunit is configured to determine the depth of the pixel point according to the disparity of the pixel point.
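The four determining subunits above amount to a softmax over candidate disparities, a weighted average (soft argmax), and the standard depth-from-disparity relation depth = focal length x baseline / disparity. A hedged sketch, with the focal length and baseline passed in as assumed camera parameters:

```python
import torch
import torch.nn.functional as F

def disparity_and_depth(cost_scores, focal_px, baseline):
    """Sketch of the determining unit 503.

    cost_scores: [B, D_max, H, W] scores for each candidate disparity, e.g. the
    output of a 3D neural network applied to the 3D matching cost feature.
    """
    prob = F.softmax(cost_scores, dim=1)                     # probability of each disparity
    disp_values = torch.arange(cost_scores.size(1), device=cost_scores.device,
                               dtype=prob.dtype).view(1, -1, 1, 1)
    disparity = (prob * disp_values).sum(dim=1)              # weighted average of disparities
    depth = focal_px * baseline / disparity.clamp(min=1e-6)  # depth from the pixel's disparity
    return disparity, depth
```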
  • an embodiment of the present application provides a training device for a binocular matching network.
  • the device includes the units it comprises and the modules comprised in each unit, which can be implemented by a processor in a computer device; of course, it can also be implemented by a specific logic circuit; in the implementation process, the processor can be a CPU, an MPU, a DSP, or an FPGA.
  • FIG. 6 is a schematic diagram of the composition structure of a training device for a binocular matching network according to an embodiment of the application. As shown in FIG. 6, the device 600 includes:
  • the feature extraction unit 601 is configured to use a binocular matching network to determine the 3D matching cost feature of an acquired sample image, wherein the sample image includes a left image and a right image with depth label information, and the left image and the right image have the same size;
  • the 3D matching cost feature includes a grouped cross-correlation feature, or includes a feature obtained by splicing a grouped cross-correlation feature and a connection feature;
  • the disparity prediction unit 602 is configured to use the binocular matching network to determine the predicted disparity of the sample image according to the 3D matching cost feature;
  • the comparing unit 603 is configured to compare the depth mark information with the predicted disparity to obtain a loss function of binocular matching
  • the training unit 604 is configured to train the binocular matching network by using the loss function.
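Units 601 to 604 describe one training iteration: predict a disparity map, compare it with the labels, and update the network with the resulting loss. A hedged sketch follows, assuming the depth label information has already been converted to a ground-truth disparity map and a smooth L1 comparison as given later in the description; the optimizer, the valid-pixel mask and `max_disp` are illustrative details.

```python
import torch.nn.functional as F

def training_step(network, optimizer, left, right, gt_disparity, max_disp=192):
    """Sketch of one iteration of units 601-604: predict, compare, update.
    The optimizer, the valid-pixel mask and max_disp are illustrative details."""
    pred_disparity = network(left, right)                  # [B, H, W] predicted disparity
    mask = (gt_disparity > 0) & (gt_disparity < max_disp)  # ignore pixels without valid labels
    loss = F.smooth_l1_loss(pred_disparity[mask], gt_disparity[mask])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```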
  • the feature extraction unit 601 includes:
  • the first feature extraction subunit is configured to use a fully convolutional neural network in a binocular matching network to determine the 2D splicing feature of the left image and the 2D splicing feature of the right image respectively;
  • the second feature extraction subunit is configured to construct a 3D matching cost feature using the 2D stitching feature of the left image and the 2D stitching feature of the right image.
  • the first feature extraction subunit includes:
  • the first feature extraction module is configured to extract the 2D features of the left image and the 2D features of the right image by using the fully convolutional neural network in the binocular matching network;
  • the second feature extraction module is configured to determine the identifier of the convolutional layer used for 2D feature splicing
  • the third feature extraction module is configured to splice the 2D features of different convolutional layers in the left image in the feature dimension according to the identifier, to obtain a first 2D splicing feature;
  • the fourth feature extraction module is configured to splice the 2D features of different convolutional layers in the right image in the feature dimension according to the identifier, to obtain a second 2D splicing feature.
  • the second feature extraction module is configured to determine the i-th convolutional layer as a convolutional layer used for 2D feature splicing when the interval rate of the i-th convolutional layer changes, where i is a natural number greater than or equal to 1.
  • the fully convolutional neural network is a fully convolutional neural network with shared parameters; correspondingly, the first feature extraction module is configured to use the shared-parameter fully convolutional neural network in the binocular matching network to extract the 2D features of the left image and the 2D features of the right image respectively, wherein the size of the 2D features is a quarter of the size of the left image or the right image.
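A hedged sketch of the modules above: extract 2D features at 1/4 resolution and splice the outputs of several convolutional stages along the feature dimension. The 64 + 128 + 128 = 320-channel example follows the description; the specific layers, and the use of a dilation change to mark a stage for splicing, are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class ConcatFeatureExtractor(nn.Module):
    """Sketch: extract 2D features at 1/4 resolution and splice selected stages
    along the feature dimension. Layer choices are illustrative assumptions."""

    def __init__(self):
        super().__init__()
        self.stem = nn.Sequential(                                   # down to 1/4 resolution
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True))
        self.stage3 = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True))
        self.stage4 = nn.Sequential(nn.Conv2d(128, 128, 3, padding=2, dilation=2),
                                    nn.ReLU(inplace=True))           # interval (dilation) rate changes here

    def forward(self, image):
        x = self.stem(image)
        f2 = self.stage2(x)
        f3 = self.stage3(f2)
        f4 = self.stage4(f3)
        # splice the marked stages along the feature (channel) dimension: 64 + 128 + 128 = 320
        return torch.cat([f2, f3, f4], dim=1)
```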
  • the second feature extraction subunit includes:
  • the first feature determination module is configured to use the acquired first 2D splicing feature and the acquired second 2D splicing feature to determine the grouping cross-correlation feature;
  • the second feature determining module is configured to determine the grouping cross-correlation feature as a 3D matching cost feature.
  • the second feature extraction subunit includes:
  • the first feature determination module is configured to use the acquired first 2D splicing feature and the acquired second 2D splicing feature to determine the grouping cross-correlation feature;
  • the first feature determination module is further configured to use the acquired first 2D splicing feature and the acquired second 2D splicing feature to determine the connection feature;
  • the second feature determination module is configured to splice the grouped cross-correlation feature and the connection feature in feature dimensions to obtain a 3D matching cost feature.
  • the first feature determination module includes:
  • the first feature determination submodule is configured to divide the acquired first 2D splicing features into N_g groups to obtain N_g first feature groups;
  • the second feature determination submodule is configured to divide the acquired second 2D splicing features into N_g groups to obtain N_g second feature groups, where N_g is a natural number greater than or equal to 1;
  • the third feature determination submodule is configured to determine the cross-correlation results of the N_g first feature groups and the N_g second feature groups for the disparity d, to obtain N_g*D_max cross-correlation maps; wherein the disparity d is a natural number greater than or equal to 0 and less than D_max, and D_max is the maximum disparity in the usage scene corresponding to the sample image;
  • the fourth feature determination submodule is configured to splice the N_g*D_max cross-correlation maps in the feature dimension to obtain the grouped cross-correlation feature.
  • the third feature determination submodule is configured to determine the cross-correlation results of the g-th first feature group and the g-th second feature group for the disparity d, to obtain D_max cross-correlation maps, where g is a natural number greater than or equal to 1 and less than or equal to N_g; and to determine the cross-correlation results of the N_g first feature groups and the N_g second feature groups for the disparity d, to obtain N_g*D_max cross-correlation maps.
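The assembly of the full grouped cross-correlation feature described by the submodules above can be sketched by evaluating a per-disparity group correlation at every candidate disparity and stacking the results into a [N_g, D_max, H, W] volume. This hedged sketch reuses the `group_correlation` helper sketched earlier; the function name is again a hypothetical label.

```python
def build_gwc_volume(feat_l, feat_r, max_disp, num_groups):
    """Sketch: stack per-disparity group correlations into the grouped
    cross-correlation feature of shape [B, N_g, D_max, H, W]."""
    b, c, h, w = feat_l.shape
    volume = feat_l.new_zeros(b, num_groups, max_disp, h, w)
    for d in range(max_disp):                       # d = 0, 1, ..., D_max - 1
        volume[:, :, d, :, :] = group_correlation(feat_l, feat_r, num_groups, d)
    return volume
```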
  • the first feature determination module further includes:
  • the fifth feature determination submodule is configured to determine the splicing results of the acquired first 2D splicing feature and the acquired second 2D splicing feature for the disparity d, to obtain D_max spliced maps; wherein the disparity d is a natural number greater than or equal to 0 and less than D_max, and D_max is the maximum disparity in the usage scene corresponding to the sample image;
  • the sixth feature determination submodule is configured to splice the D_max spliced maps to obtain the connection feature.
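The connection feature described by the two submodules above can be sketched as follows; this is a hedged illustration, with zero padding at the border and the (x, x + d) alignment taken from the description's indexing rather than from any disclosed code.

```python
def build_concat_volume(feat_l, feat_r, max_disp):
    """Sketch of the connection feature: for each candidate disparity d, splice
    the left feature with the right feature shifted by d, giving a volume of
    shape [B, 2*C, D_max, H, W]."""
    b, c, h, w = feat_l.shape
    volume = feat_l.new_zeros(b, 2 * c, max_disp, h, w)
    for d in range(max_disp):
        if d > 0:
            # align f_l(x, y) with f_r(x + d, y), matching the description's indexing
            volume[:, :c, d, :, :-d] = feat_l[:, :, :, :-d]
            volume[:, c:, d, :, :-d] = feat_r[:, :, :, d:]
        else:
            volume[:, :c, d] = feat_l
            volume[:, c:, d] = feat_r
    return volume
```

When both features are used, the two volumes can then be spliced along the feature dimension, for example `torch.cat([gwc_volume, concat_volume], dim=1)`, which yields the [N_g + 2C, D_max, H, W] shape mentioned in the description.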
  • the disparity prediction unit 602 includes:
  • the first disparity prediction subunit is configured to use the binocular matching network to perform matching cost aggregation on the 3D matching cost feature;
  • the second disparity prediction subunit is configured to perform disparity regression on the aggregated result to obtain the predicted disparity of the sample image.
  • the first disparity prediction subunit is configured to use a 3D neural network in the binocular matching network to determine the probabilities of the different disparities d corresponding to each pixel in the 3D matching cost feature; wherein the disparity d is a natural number greater than or equal to 0 and less than D_max, and D_max is the maximum disparity in the usage scene corresponding to the sample image.
  • the second disparity prediction subunit is configured to determine the weighted average of the probabilities of the different disparities d corresponding to each pixel as the predicted disparity of the pixel, so as to obtain the predicted disparity of the sample image;
  • wherein the disparity d is a natural number greater than or equal to 0 and less than D_max, and D_max is the maximum disparity in the usage scene corresponding to the sample image.
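A hedged sketch of matching cost aggregation plus disparity regression as described by the prediction subunits above: a few plain 3D convolutions stand in for the stacked 3D hourglass aggregation of the disclosure, reduce the cost feature to one per-disparity score channel, and a softmax-weighted average along the disparity dimension gives the predicted disparity. Channel counts, the missing upsampling back to full resolution, and the omitted auxiliary outputs are simplifications.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CostAggregationAndRegression(nn.Module):
    """Minimal stand-in for matching cost aggregation and disparity regression;
    channel counts and depth of the 3D network are illustrative."""

    def __init__(self, in_channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_channels, 32, 3, padding=1), nn.BatchNorm3d(32), nn.ReLU(inplace=True),
            nn.Conv3d(32, 32, 3, padding=1), nn.BatchNorm3d(32), nn.ReLU(inplace=True),
            nn.Conv3d(32, 1, 3, padding=1),
        )

    def forward(self, cost_volume):                      # cost_volume: [B, C, D, H, W]
        scores = self.net(cost_volume).squeeze(1)        # [B, D, H, W] per-disparity scores
        prob = F.softmax(scores, dim=1)                  # probability along the disparity axis
        disp_values = torch.arange(scores.size(1), device=scores.device,
                                   dtype=prob.dtype).view(1, -1, 1, 1)
        return (prob * disp_values).sum(dim=1)           # predicted disparity, [B, H, W]
```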
  • if the above-mentioned binocular matching method or the training method of the binocular matching network is implemented in the form of a software function module and sold or used as an independent product, it can also be stored in a computer-readable storage medium.
  • the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, etc.) to execute all or part of the methods described in the embodiments of the present application.
  • the aforementioned storage media include: a USB flash drive, a removable hard disk, a ROM (Read Only Memory), a magnetic disk, an optical disk, or other media that can store program codes. In this way, the embodiments of the present application are not limited to any specific combination of hardware and software.
  • an embodiment of the present application provides a computer device, including a memory and a processor, where the memory stores a computer program that can be run on the processor, and when the processor executes the program, the steps of the binocular matching method provided in the foregoing embodiments, or the steps of the training method of the binocular matching network provided in the foregoing embodiments, are implemented.
  • an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the binocular matching method provided in the foregoing embodiments, or the steps of the training method of the binocular matching network provided in the foregoing embodiments, are implemented.
  • FIG. 7 is a schematic diagram of a hardware entity of a computer device according to an embodiment of the application.
  • the hardware entity of the computer device 700 includes: a processor 701, a communication interface 702, and a memory 703.
  • the processor 701 generally controls the overall operation of the computer device 700.
  • the communication interface 702 can enable the computer device to communicate with other terminals or servers through a network.
  • the memory 703 is configured to store instructions and applications executable by the processor 701, and can also cache data to be processed or already processed by the processor 701 and by each module in the computer device 700 (for example, image data, audio data, voice communication data, and video communication data); it can be implemented by FLASH (flash memory) or RAM (Random Access Memory).
  • the disclosed device and method may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the coupling, direct coupling, or communication connection between the components shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
  • the units described above as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
  • the functional units in the embodiments of the present application may all be integrated into one processing unit, or each unit may be used individually as one unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
  • the foregoing program can be stored in a computer-readable storage medium; when executed, the program performs the steps of the foregoing method embodiments; and the foregoing storage media include: removable storage devices, ROM (Read Only Memory), magnetic disks, optical disks, or other media that can store program codes.
  • if the above-mentioned integrated unit of the present application is implemented in the form of a software function module and sold or used as an independent product, it can also be stored in a computer-readable storage medium.
  • the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, etc.) to execute all or part of the methods described in the embodiments of the present application.
  • the aforementioned storage media include: removable storage devices, ROMs, magnetic disks or optical disks and other media that can store program codes.


Abstract

A binocular matching method, a binocular matching apparatus, a computer device, and a storage medium. The method includes: acquiring an image to be processed, where the image is a 2D image including a left image and a right image (S101); constructing a 3D matching cost feature of the image by using extracted features of the left image and features of the right image, where the 3D matching cost feature includes a grouped cross-correlation feature, or includes a feature obtained by splicing a grouped cross-correlation feature and a connection feature (S102); and determining the depth of the image by using the 3D matching cost feature (S103).

Description

双目匹配方法及装置、设备和存储介质
相关申请的交叉引用
本申请要求在2019年02月19提交中国专利局、申请号为201910127860.4、申请名称为“一种双目匹配方法及装置、设备和存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请实施例涉及计算机视觉领域,涉及但不限于一种双目匹配方法及装置、设备和存储介质。
背景技术
双目匹配是一种从一对不同角度拍摄的图片中恢复深度的技术,通常每对图片通过一对左右或者上下放置的相机获得。为了简化问题,会对从不同摄像机拍摄的图片进行校正,使得当左右放置相机时对应像素位于同一水平线,或者上下放置相机时对应像素位于同一竖直线。此时问题变成了估计对应匹配像素的距离(又称为视差)。通过视差,相机的焦距与两个相机中心的距离,即可计算深度。目前双目匹配可以大致分为两种方法,基于传统匹配代价的算法,以及基于深度学习的算法。
发明内容
本申请实施例提供一种双目匹配方法及装置、设备和存储介质。
本申请实施例的技术方案是这样实现的:
第一方面,本申请实施例提供一种双目匹配方法,所述方法包括:获取待处理的图像,其中,所述图像为包括左图和右图的2D(2 Dimensions,二维)图像;利用提取的所述左图的特征和所述右图的特征,构建所述图像的3D(3 Dimensions,三维)匹配代价特征,其中,所述3D匹配代价特征包括分组互相关特征,或,包括分组互相关特征与连接特征拼接后的特征;利用所述3D匹配代价特征,确定所述图像的深度。
第二方面,本申请实施例提供一种双目匹配网络的训练方法,所述方法包括:利用双目匹配网络确定获取的样本图像的3D匹配代价特征,其中,所述样本图像包括有深度标记信息的左图和右图,所述左图和右图的尺寸相同;所述3D匹配代价特征包括分组互相关特征,或,包括分组互相关特征与连接特征拼接后的特征;根据所述3D匹配代价特征,利用所述双目匹配网络确定样本图像的预测视差;将所述深度标记信息与所述预测视差进行比较,得到双目匹配的损失函数;利用所述损失函数对所述双目匹配网络进行训练。
第三方面,本申请实施例提供一种双目匹配装置,所述装置包括:获取单元,配置为获取待处理的图像,其中,所述图像为包括左图和右图的2D图像;构建单元,配置为利用提取的所述左图的特征和所述右图的特征,构建所述图像的3D匹配代价特征,其中,所述3D匹配代价特征包括分组互相关特征,或,包括分组互相关特征与连接特征拼接后的特征;确定单元,配置为利用所述3D匹配代价特征,确定所述图像的深度。
第四方面,本申请实施例提供一种双目匹配网络的训练装置,所述装置包括:特征提取单元,配置为利用双目匹配网络确定获取的样本图像的3D匹配代价特征,其中,所述样本图像包括有深度标记信息的左图和右图,所述左图和右图的尺寸相同;所述3D匹配代价特征包括分组互相关特征,或,包括分组互相关特征与连接特征拼接后的特征; 视差预测单元,配置为利用所述双目匹配网络根据所述3D匹配代价特征,确定样本图像的预测视差;比较单元,配置为将所述深度标记信息与所述预测视差进行比较,得到双目匹配的损失函数;训练单元,配置为利用所述损失函数对所述双目匹配网络进行训练。
第五方面,本申请实施例提供一种计算机设备,包括存储器和处理器,所述存储器存储有可在处理器上运行的计算机程序,所述处理器执行所述程序时实现如上所述双目匹配方法中的步骤,或,实现如上所述双目匹配网络的训练方法中的步骤。
第六方面,本申请实施例提供一种计算机可读存储介质,其上存储有计算机程序,该计算机程序被处理器执行时实现如上所述双目匹配方法中的步骤,或,实现如上所述双目匹配网络的训练方法中的步骤。
本申请实施例提供一种双目匹配方法及装置、设备和存储介质。通过获取待处理的图像,其中,所述图像为包括左图和右图的2D图像;利用提取的所述左图的特征和所述右图的特征,构建所述图像的3D匹配代价特征,其中,所述3D匹配代价特征包括分组互相关特征,或,包括分组互相关特征与连接特征拼接后的特征;利用所述3D匹配代价特征,确定所述图像的深度,如此,能够提高双目匹配的准确度并降低网络的计算需求。
附图说明
图1A为本申请实施例双目匹配方法的实现流程示意图一;
图1B为本申请实施例待处理的图像深度估计示意图;
图2A为本申请实施例双目匹配方法的实现流程示意图二;
图2B为本申请实施例双目匹配方法的实现流程示意图三;
图3A为本申请实施例双目匹配网络的训练方法的实现流程示意图;
图3B为本申请实施例分组互相关特征示意图;
图3C为本申请实施例连接特征示意图;
图4A为本申请实施例双目匹配方法的实现流程示意图四;
图4B为本申请实施例双目匹配网络模型示意图;
图4C为本申请实施例双目匹配方法和现有技术双目匹配方法的实验结果对比图;
图5为本申请实施例双目匹配装置的组成结构示意图;
图6为本申请实施例双目匹配网络的训练装置的组成结构示意图;
图7为本申请实施例计算机设备的一种硬件实体示意图。
具体实施方式
为使本申请实施例的目的、技术方案和优点更加清楚,下面将结合本申请实施例中的附图,对申请的具体技术方案做进一步详细描述。以下实施例仅用于说明本申请,不用于限制本申请的范围。
在后续的描述中,使用用于表示元件的诸如“模块”、“部件”或“单元”的后缀仅为了有利于本申请的说明,其本身没有特定的意义。因此,“模块”、“部件”或“单元”可以混合地使用。
本申请实施例使用分组互相关匹配代价特征提高双目匹配的准确度并降低网络的计算需求。下面结合附图和实施例对本申请的技术方案进一步详细阐述。
本申请实施例提供一种双目匹配方法,该方法应用于计算机设备,该方法所实现的功能可以通过服务器中的处理器调用程序代码来实现,当然程序代码可以保存在计算机存储介质中,可见,该服务器至少包括处理器和存储介质。图1A为本申请实施例双目匹配方法的实现流程示意图一,如图1A所示,所述方法包括:
步骤S101、获取待处理的图像,其中,所述图像为包括左图和右图的2D图像;
这里,所述计算机设备可以是终端,所述待处理图像,可以包含任意场景的图片。并且,所述待处理的图像,一般是包括左图和右图的双目图片,是一对不同角度拍摄的图片,通常每对图片通过一对左右或者上下放置的相机获得。
一般来说,所述终端在实施的过程中可以为各种类型的具有信息处理能力的设备,例如所述移动终端可以包括手机、PDA(Personal Digital Assistant,个人数字助理)、导航仪、数字电话、视频电话、智能手表、智能手环、可穿戴设备、平板电脑等。服务器在实现的过程中可以是移动终端如手机、平板电脑、笔记本电脑,固定终端如个人计算机和服务器集群等具有信息处理能力的计算机设备。
步骤S102、利用提取的所述左图的特征和所述右图的特征,构建所述图像的3D匹配代价特征,其中,所述3D匹配代价特征包括分组互相关特征,或,包括分组互相关特征与连接特征拼接后的特征;
这里,当所述3D匹配代价特征可以包括分组互相关特征,也可以包括分组互相关特征与连接特征拼接后的特征,并且,无论使用上述哪两种特征构成3D匹配代价特征,都能得到非常精准的视差预测结果。
步骤S103、利用所述3D匹配代价特征,确定所述图像的深度;
这里,可以通过所述3D匹配代价特征,确定每个左图中像素可能的视差的概率,也就是说,通过所述3D匹配代价特征,确定左图上像素点的特征和右图对应像素点的特征的匹配程度。即通过左特征图上一个点的特征去需找它在右特征图上所有可能的位置,然后分别将右特征图上每个可能的位置的特征和左图所述点的特征结合,进行分类,得到右特征图上每个可能的位置是所述点在右图上的对应点的概率。
这里,确定图像的深度,指的是确定左图的点在右图对应的点,并且确定他们之间的横向像素距离(当相机为左右放置时)。当然,也可以是确定右图的点在左图的对应点,本申请并不做限制。
本申请实例中,所述步骤S102至步骤S103,可以通过训练得到的双目匹配网络实现,其中,所述双目匹配网络包括但不限于:CNN(Convolutional Neural Networks,卷积神经网络)、DNN(Deep Neural Network,深度神经网络)和RNN(Recurrent Neural Network,循环神经网络)等。当然,所述双目匹配网络可以包含所述CNN、DNN和RNN等网络中的一种网络,也可以包含所述CNN、DNN和RNN等网络中的至少两种网络。
图1B为本申请实施例待处理的图像深度估计示意图,如图1B所示,图片11为待处理的图像中的左图,图片12为待处理的图像中的右图,图片13为图片11根据所述图片12确定出的视差图,即图片11对应的视差图,根据所述视差图,即可获取图片11对应的深度图。
本申请实施例中,通过获取待处理的图像,其中,所述图像为包括左图和右图的2D图像;利用提取的所述左图的特征和所述右图的特征,构建所述图像的3D匹配代价特征,其中,所述3D匹配代价特征包括分组互相关特征,或,包括分组互相关特征与连接特征拼接后的特征;利用所述3D匹配代价特征,确定所述图像的深度,如此,能够提高双目匹配的准确度并降低网络的计算需求。
基于上述的方法实施例,本申请实施例再提供一种双目匹配方法,图2A为本申请实施例双目匹配方法的实现流程示意图二,如图2A所示,所述方法包括:
步骤S201、获取待处理的图像,其中,所述图像为包括左图和右图的2D图像;
步骤S202、利用提取的所述左图的特征和所述右图的特征,确定分组互相关特征;
本申请实施例中,所述步骤S202、利用提取的所述左图的特征和所述右图的特征, 确定分组互相关特征,可以通过以下步骤实现:
步骤S2021、将提取的所述左图的特征和所述右图的特征分别进行分组,确定分组后的左图的特征和分组后的右图的特征在不同视差下的互相关结果;
步骤S2022、将所述互相关结果进行拼接,得到分组互相关特征。
其中,所述步骤S2021、将提取的所述左图的特征和所述右图的特征分别进行分组,确定分组后的左图的特征和分组后的右图的特征在不同视差下的互相关结果,可以通过以下步骤实现:
步骤S2021a、将提取的所述左图的特征进行分组,形成第一预设数量的第一特征组;
步骤S2021b、将提取的所述右图的特征进行分组,形成第二预设数量的第二特征组,所述第一预设数量与所述第二预设数量相同;
步骤S2021c、确定第g组第一特征组与第g组第二特征组在不同视差下的互相关结果;其中,g为大于等于1小于等于第一预设数量的自然数;所述不同视差包括:零视差、最大视差和零视差与最大视差之间的任一视差,所述最大视差为待处理的图像对应的使用场景下的最大视差。
这里,可以将左图的特征分成多个特征组,将右图的特征也分成多个特征组,确定左图的多个特征组中的某一特征组和右图对应的特征组在不同视差下的互相关结果。所述分组互相关,指的是分别得到左右图的特征后,对左图的特征进行分组(同右组),然后对应的组进行互相关计算(计算他们的相关性)。
在一些实施例中，所述确定第g组第一特征组与第g组第二特征组在不同视差下的互相关结果，包括：利用公式
$$C_{gwc}(d,x,y,g)=\frac{1}{N_c/N_g}\left\langle f_l^{g}(x,y),\ f_r^{g}(x+d,y)\right\rangle$$
确定第g组第一特征组与第g组第二特征组在不同视差d下的互相关结果；其中，所述 $N_c$ 表示所述左图的特征或所述右图的特征的通道数，所述 $N_g$ 表示第一预设数量或第二预设数量，所述 $f_l^{g}$ 表示所述第一特征组中的特征，所述 $f_r^{g}$ 表示所述第二特征组中的特征，所述 $(x,y)$ 表示横坐标为x纵坐标为y的像素点的像素坐标，所述 $(x+d,y)$ 表示横坐标为x+d纵坐标为y的像素点的像素坐标。
步骤S203、将所述分组互相关特征,确定为3D匹配代价特征;
这里,对于某个像素点,通过提取出所述像素点在0至D max视差下的3D匹配特征,确定每个可能视差的概率,将所述概率进行加权平均,就可以得到图像的视差,其中,所述D max表示待处理的图像对应的使用场景下的最大视差。也可以将可能视差中概率最大的视差,确定为图像的视差。
步骤S204、利用所述3D匹配代价特征,确定所述图像的深度。
本申请实施例中,通过获取待处理的图像,其中,所述图像为包括左图和右图的2D图像;利用提取的所述左图的特征和所述右图的特征,确定分组互相关特征;将所述分组互相关特征,确定为3D匹配代价特征;利用所述3D匹配代价特征,确定所述图像的深度,如此,能够提高双目匹配的准确度并降低网络的计算需求。
基于上述的方法实施例,本申请实施例再提供一种双目匹配方法,图2B为本申请实施例双目匹配方法的实现流程示意图三,如图2B所示,所述方法包括:
步骤S211、获取待处理的图像,其中,所述图像为包括左图和右图的2D图像;
步骤S212、利用提取的所述左图的特征和所述右图的特征,确定分组互相关特征 和连接特征;
本申请实施例中,所述步骤S212、利用提取的所述左图的特征和所述右图的特征,确定分组互相关特征的实现方法,与所述步骤S202的实现方法相同,在此不做赘述。
步骤S213、将所述分组互相关特征与所述连接特征进行拼接后的特征,确定为3D匹配代价特征;
其中,所述连接特征为将所述左图的特征与所述右图的特征在特征维度上进行拼接得到的。
这里，可以将分组互相关特征和连接特征在特征维度上进行拼接，得到3D匹配代价特征。3D匹配代价特征相当于对每种可能的视差都得到一个特征。比如最大视差是 $D_{max}$，那么对可能的视差 $0,1,\dots,D_{max}-1$ 都得到相应的2D特征，再拼起来就是3D特征。
在一些实施例中，可以利用公式 $C_d(x,y)=\mathrm{Concat}\big(f_l(x,y),\ f_r(x+d,y)\big)$，确定左图的特征和右图的特征对每种可能的视差d的拼接结果，得到 $D_{max}$ 个拼接图；其中，所述 $f_l$ 表示所述左图的特征，所述 $f_r$ 表示所述右图的特征，所述 $(x,y)$ 表示横坐标为x纵坐标为y的像素点的像素坐标，所述 $(x+d,y)$ 表示横坐标为x+d纵坐标为y的像素点的像素坐标，所述 $\mathrm{Concat}$ 表示对两个特征进行拼接；然后，将所述 $D_{max}$ 个拼接图进行拼接，得到连接特征。
步骤S214、利用所述3D匹配代价特征,确定所述图像的深度。
本申请实施例中,通过获取待处理的图像,其中,所述图像为包括左图和右图的2D图像;利用提取的所述左图的特征和所述右图的特征,确定分组互相关特征和连接特征;将所述分组互相关特征与所述连接特征进行拼接后的特征,确定为3D匹配代价特征;利用所述3D匹配代价特征,确定所述图像的深度,如此,能够提高双目匹配的准确度并降低网络的计算需求。
基于上述的方法实施例,本申请实施例再提供一种双目匹配方法,所述方法包括:
步骤S221、获取待处理的图像,其中,所述图像为包括左图和右图的2D图像;
步骤S222、利用共享参数的全卷积神经网络分别提取所述左图的2D特征和所述右图的2D特征;
本申请实施例中,所述全卷积神经网络是双目匹配网络中的一个组成部分。在所述双目匹配网络中,可以利用一个全卷积神经网络提取待处理图像的2D特征。
步骤S223、利用提取的所述左图的特征和所述右图的特征,构建所述图像的3D匹配代价特征,其中,所述3D匹配代价特征包括分组互相关特征,或,包括分组互相关特征与连接特征拼接后的特征;
步骤S224、使用3D神经网络确定所述3D匹配代价特征中每一像素点对应的不同视差的概率;
本申请实施例中,所述步骤S224可以由一个分类的神经网络实现,所述分类的神经网络也是双目匹配网络中的一个组成部分,用于确定每一像素点对应的不同视差的概率。
步骤S225、确定所述每一像素点对应的不同视差的概率的加权平均值;
在一些实施例中，可以利用公式
$$\hat{d}=\sum_{d=0}^{D_{max}-1} d\cdot p_d$$
确定获取的每一像素点对应的不同视差d的概率的加权平均值；其中，所述视差d为大于等于0小于 $D_{max}$ 的自然数，所述 $D_{max}$ 为待处理的图像对应的使用场景下的最大视差，所述 $p_d$ 表示所述视差d对应的概率。
步骤S226、将所述加权平均值确定为所述像素点的视差;
步骤S227、根据所述像素点的视差,确定所述像素点的深度。
在一些实施例中，所述方法还包括：利用公式 $D=\dfrac{F\times L}{\hat{d}}$ 确定获取的像素点的视差 $\hat{d}$ 对应的深度信息D；其中，所述F表示拍摄样本的摄像机的镜头焦距，所述L表示拍摄样本的摄像机的镜头基线距离。
基于上述的方法实施例,本申请实施例提供一种双目匹配网络的训练方法,图3A为本申请实施例双目匹配网络的训练方法的实现流程示意图,如图3A所示,所述方法包括:
步骤S301、利用双目匹配网络确定获取的样本图像的3D匹配代价特征,其中,所述样本图像包括有深度标记信息的左图和右图,所述左图和右图的尺寸相同;所述3D匹配代价特征包括分组互相关特征,或,包括分组互相关特征与连接特征拼接后的特征;
步骤S302、根据所述3D匹配代价特征,利用所述双目匹配网络确定样本图像的预测视差;
步骤S303、将所述深度标记信息与所述预测视差进行比较,得到双目匹配的损失函数;
这里,可以通过得到的损失函数对所述双目匹配网络中的参数进行更新,更新参数后的双目匹配网络能够预测出更好的效果。
步骤S304、利用所述损失函数对所述双目匹配网络进行训练。
基于上述的方法实施例,本申请实施例再提供一种双目匹配网络的训练方法,所述方法包括:
步骤S311、利用双目匹配网络中的全卷积神经网络分别确定所述左图的2D拼接特征和所述右图的2D拼接特征;
本申请实施例中,所述步骤S311、利用双目匹配网络中的全卷积神经网络分别确定所述左图的2D拼接特征和所述右图的2D拼接特征,可以通过以下步骤实现:
步骤S3111、利用双目匹配网络中的全卷积神经网络分别提取所述左图的2D特征和所述右图的2D特征;
这里,所述全卷积神经网络为共享参数的全卷积神经网络;对应地,所述利用双目匹配网络中的全卷积神经网络分别提取所述左图的2D特征和所述右图的2D特征,包括:利用双目匹配网络中的共享参数的全卷积神经网络分别提取所述左图的2D特征和所述右图的2D特征,其中,所述2D特征的尺寸是所述左图或右图的尺寸的四分之一。
举例来说,样本的尺寸为1200*400像素,则所述2D特征的尺寸在所述样本的尺寸的四分之一,即300*100像素。当然,所述2D特征的尺寸也可以是其他的尺寸,本申请实施例对此不做限制。
本申请实施例中,所述全卷积神经网络是双目匹配网络中的一个组成部分。在所述双目匹配网络中,可以利用一个全卷积神经网络提取样本图像的2D特征。
步骤S3112、确定用于进行2D特征拼接的卷积层的标识;
这里,所述确定用于进行2D特征拼接的卷积层的标识,包括:当第i卷积层的间隔率发生变化时,将所述第i卷积层确定为用于进行2D特征拼接的卷积层,其中,i为 大于等于1的自然数。
步骤S3113、根据所述标识,将所述左图中不同卷积层的2D特征在特征维度上进行拼接,得到第一2D拼接特征;
举例来说,有多层级的特征分别是64维度、128维度和128维度(这里的维度指的是通道数目),则连接起来就是一个320维的特征图。
步骤S3114、根据所述标识,将所述右图中不同卷积层的2D特征在特征维度上进行拼接,得到第二2D拼接特征。
步骤S312、利用所述左图的2D拼接特征和所述右图的2D拼接特征,构建3D匹配代价特征;
步骤S313、利用所述双目匹配网络根据所述3D匹配代价特征,确定样本图像的预测视差;
步骤S314、将所述深度标记信息与所述预测视差进行比较,得到双目匹配的损失函数;
步骤S315、利用所述损失函数对所述双目匹配网络进行训练。
基于上述的方法实施例,本申请实施例再提供一种双目匹配网络的训练方法,所述方法包括:
步骤S321、利用双目匹配网络中的全卷积神经网络分别确定所述左图的2D拼接特征和所述右图的2D拼接特征;
步骤S322、利用获取的第一2D拼接特征和获取的第二2D拼接特征,确定分组互相关特征;
本申请实施例中,所述步骤S322、利用获取的第一2D拼接特征和获取的第二2D拼接特征,确定分组互相关特征,可以通过以下步骤实现:
步骤S3221、将获取的第一2D拼接特征分成N g组,得到N g个第一特征组;
步骤S3222、将获取的第二2D拼接特征分成N g组,得到N g个第二特征组,N g为大于等于1的自然数;
步骤S3223、确定N g个第一特征组和N g个第二特征组对于所述视差d的互相关结果,得到N g*D max个互相关图;其中,所述视差d为大于等于0小于D max的自然数,所述D max为样本图像对应的使用场景下的最大视差;
本申请实施例中,所述确定N g个第一特征组和N g个第二特征组对于所述视差d的互相关结果,得到N g*D max个互相关图,包括:确定第g组第一特征组和第g组第二特征组对于所述视差d的互相关结果,得到D max个互相关图,其中,g为大于等于1小于等于N g的自然数;确定N g个第一特征组和N g个第二特征组对于所述视差d的互相关结果,得到N g*D max个互相关图。
这里，所述确定第g组第一特征组和第g组第二特征组对于所述视差d的互相关结果，得到 $D_{max}$ 个互相关图，包括：利用公式
$$C_{gwc}(d,x,y,g)=\frac{1}{N_c/N_g}\left\langle f_l^{g}(x,y),\ f_r^{g}(x+d,y)\right\rangle$$
确定第g组第一特征组和第g组第二特征组对于所述视差d的互相关结果，得到 $D_{max}$ 个互相关图；其中，所述 $N_c$ 表示所述第一2D拼接特征或所述第二2D拼接特征的通道数，所述 $f_l^{g}$ 表示所述第一特征组中的特征，所述 $f_r^{g}$ 表示所述第二特征组中的特征，所述 $(x,y)$ 表示横坐标为x纵坐标为y的像素点的像素坐标，所述 $(x+d,y)$ 表示横坐标为x+d纵坐标为y的像素点的像素坐标。
步骤S3224、将所述N g*D max个互相关图在特征维度上进行拼接,得到分组互相关特征。
这里,所述使用场景有很多,例如,驾驶场景、室内机器人场景和手机双摄场景等等。
步骤S323、将所述分组互相关特征,确定为3D匹配代价特征;
图3B为本申请实施例分组互相关特征示意图,如图3B所示,对左图的第一2D拼接特征进行分组,得到多个左图分组后的特征组31。对右图的第二2D拼接特征进行分组,得到多个右图分组后的特征组32。所述第一2D拼接特征或所述第二2D拼接特征的形状均为[C,H,W],其中,C为拼接特征的通道数,H为拼接特征的高,W为拼接特征的宽。则左图或右图对应的每个特征组的通道数为C/N g,所述N g为分组的个数。将左图和右图对应的特征组进行互相关计算,计算每个对应的特征组在视差0,1,……,D max-1下的互相关性,可以得到N g*D max个互相关图33,所述单个互相关图33的形状为[N g,H,W],将所述N g*D max个互相关图33在特征维度上进行拼接,可以得到分组互相关特征,然后将所述分组互相关特征作为3D匹配代价特征,所述3D匹配代价特征的形状为[N g,D max,H,W],即所述分组互相关特征的形状为[N g,D max,H,W]。
步骤S324、根据所述3D匹配代价特征,利用所述双目匹配网络确定样本图像的预测视差;
步骤S325、将所述深度标记信息与所述预测视差进行比较,得到双目匹配的损失函数;
步骤S326、利用所述损失函数对所述双目匹配网络进行训练。
基于上述的方法实施例,本申请实施例再提供一种双目匹配网络的训练方法,所述方法包括:
步骤S331、利用双目匹配网络中的全卷积神经网络分别确定所述左图的2D拼接特征和所述右图的2D拼接特征;
步骤S332、利用获取的第一2D拼接特征和获取的第二2D拼接特征,确定分组互相关特征;
本申请实施例中,所述步骤S332、利用获取的第一2D拼接特征和获取的第二2D拼接特征,确定分组互相关特征的实现方法,与所述步骤S322的实现方法相同,在此不做赘述。
步骤S333、利用获取的第一2D拼接特征和获取的第二2D拼接特征,确定连接特征;
本申请实施例中,所述步骤S333、利用获取的第一2D拼接特征和获取的第二2D拼接特征,确定连接特征,可以通过以下步骤实现:
步骤S3331、确定获取的第一2D拼接特征和第二2D拼接特征对于所述视差d的拼接结果,得到D max个拼接图;其中,所述视差d为大于等于0小于D max的自然数,所述D max为样本图像对应的使用场景下的最大视差;
步骤S3332、将所述D max个拼接图进行拼接,得到连接特征。
在一些实施例中,可以利用公式C d(x,y)=Concat(f l(x,y),f r(x+d,y)),确定获取的第一2D拼接特征和第二2D拼接特征对于所述视差d的拼接结果,得到D max个拼接图;其中,所述f l表示所述第一2D拼接特征中的特征,所述f r表示所述第二2D拼接特征中的特征,所述(x,y)表示横坐标为x纵坐标为y的像素点的像素坐标,所述(x+d,y)表示横坐标为x+d纵坐标为y的像素点的像素坐标,所述Concat表示对两个特征进行拼接。
图3C为本申请实施例连接特征示意图,如图3C所示,将左图对应的第一2D拼接特征35和右图对应的第二2D拼接特征36在不同的视差0,1,……,D max-1下进行连接,得到D max个拼接图37,将所述D max个拼接图37进行拼接,得到连接特征。其中,所述2D拼接特征的形状为[C,H,W],所述单个拼接图37的形状为[2C,H,W],所述连接特征的形状为[2C,D max,H,W],所述C为2D拼接特征的通道数,所述D max为左图或右图对应的使用场景下的最大视差,所述H为左图或右图的高,所述W为左图或右图的宽。
步骤S334、将所述分组互相关特征和所述连接特征在特征维度上进行拼接,得到3D匹配代价特征;
举例来说,所述分组互相关特征的形状为[N g,D max,H,W],所述连接特征的形状为[2C,D max,H,W],则所述3D匹配代价特征的形状为[N g+2C,D max,H,W]。
步骤S335、利用所述双目匹配网络对所述3D匹配代价特征,进行匹配代价聚合;
这里,所述利用所述双目匹配网络对所述3D匹配代价特征,进行匹配代价聚合,包括:使用所述双目匹配网络中的3D神经网络确定所述3D匹配代价特征中每一像素点对应的不同视差d的概率;其中,所述视差d为大于等于0小于D max的自然数,所述D max为样本图像对应的使用场景下的最大视差。
本申请实施例中,所述步骤S335可以由一个分类的神经网络实现,所述分类的神经网络也是双目匹配网络中的一个组成部分,用于确定每一像素点对应的不同视差d的概率。
步骤S336、对聚合后的结果进行视差回归,得到样本图像的预测视差;
这里,所述对聚合后的结果进行视差回归,得到样本图像的预测视差,包括:将所述每一像素点对应的不同视差d的概率的加权平均值,确定为所述像素点的预测视差,以得到样本图像的预测视差;其中,所述视差d为大于等于0小于D max的自然数,所述D max为样本图像对应的使用场景下的最大视差。
在一些实施例中，可以利用公式
$$\hat{d}=\sum_{d=0}^{D_{max}-1} d\cdot p_d$$
确定获取的每一像素点对应的不同视差d的概率的加权平均值；其中，所述视差d为大于等于0小于 $D_{max}$ 的自然数，所述 $D_{max}$ 为样本图像对应的使用场景下的最大视差，所述 $p_d$ 表示所述视差d对应的概率。
步骤S337、将所述深度标记信息与所述预测视差进行比较,得到双目匹配的损失 函数;
步骤S338、利用所述损失函数对所述双目匹配网络进行训练。
基于上述的方法实施例,本申请实施例再提供一种双目匹配方法,图4A为本申请实施例双目匹配方法的实现流程示意图四,如图4A所示,所述方法包括:
步骤S401、提取2D拼接特征;
步骤S402、利用所述2D拼接特征,构建3D匹配代价特征;
步骤S403、利用聚合网络对所述3D匹配代价特征进行处理;
步骤S404、对处理后的结果,进行视差回归。
图4B为本申请实施例双目匹配网络模型示意图,如图4B所示,所述双目匹配网络模型大致可以分为四个部分,2D拼接特征提取模块41,3D匹配代价特征构建模块42,聚合网络模块43和视差回归模块44。所述图片46和图片47分别为样本数据中的左图和右图。所述2D拼接特征提取模块41,配置为对左右图片使用共享参数(包括权值共享)的全卷积神经网络提取1/4相比原图大小的2D特征,不同层的特征图被连接成一个大的特征图。所述3D匹配代价特征构建模块42,配置为获取连接特征和分组互相关特征,并利用所述连接特征和分组互相关特征对所有可能的视差d构建特征图,形成3D匹配代价特征;其中,所述所有可能的视差d包括零视差到最大视差中的所有视差,最大视差指的是左图或右图对应的使用场景下的最大视差。所述聚合网络模块43,配置为使用3D神经网络来估计对所有可能的视差d的概率。所述视差回归模块44,配置为使用所有视差的概率得到最终的视差图45。
本申请实施例中,提出了基于分组互相关操作的3D匹配代价特征来替代旧的3D匹配代价特征。首先将得到的2D拼接特征分组分成N g组,选取左右图对应的第g组特征组(比如g=1时选取第1组左图特征和第1组右图特征),计算它们对于视差d的互相关结果。对于每个特征组g(0<=g<N g),每个可能的视差d(0<=d<D max),可以得到一种N g*D max个互相关图。将这些结果连接合并即可得到形状为[N g,D max,H,W]的分组互相关特征。其中N g,D max,H和W分别为特征组数量,对于特征图的最大视差,特征高和特征宽。
然后,将所述分组互相关特征和连接特征结合,作为3D匹配代价特征,以达到更好的效果。
本申请提出了一种新的双目匹配网络,此匹配网络基于分组互相关匹配代价特征以及改进的3D堆叠沙漏网络,能够在限制3D聚合网络计算代价的同时提高匹配精度。其中,分组互相关匹配代价特征使用高维度特征直接构建,能够得到更好的表征特征。
本申请提出的基于分组互相关的网络结构由四个部分组成,2D特征提取,构建3D匹配代价特征,3D聚合和视差回归。
第一步是2D特征提取,其中采用了类似金字塔立体匹配网络的网络,然后将提取的第2、3、4卷积层的最终特征进行连接,形成一个320通道的2D特征图。
3D匹配代价特征由两部分组成:连接特征和基于分组的互相关特征。所述连接特征与金字塔立体匹配网络中的相同,只是相比金字塔立体匹配网络来说有更少的通道数。提取出的2D特征首先通过卷积压缩成12个通道,然后对每种可能的视差进行左右特征的视差连接。将所述连接特征和基于分组互相关特征拼接后,作为3D聚合网络的输入。
3D聚合网络用于聚合从相邻视差和像素预测匹配代价得到的特征。它是由一个预沙漏模块和三个堆叠的3D沙漏网络形成的,以规范卷积特征。
预沙漏模块和三个堆叠的3D沙漏网络连接到输出模块。对于每一个输出模块,采用两个3D卷积输出一个通道的3D卷积特征,然后对该3D卷积特征进行上采样并通过softmax函数沿着视差维度转换为概率。
左图的2D特征和右图的2D特征用f l和f r表示,用N c表示通道,2D特征的大小为原始图像的1/4。现有技术中,左右特征在不同的差异层被连接以形成不同的匹配代价,但是匹配度量需要使用3D聚合网络进行学习,并且,在连接之前为了节省内存特征需要被压缩至很小的通道。但是,这种压缩特征的表示可能会丢失信息。为了解决了上述问题,本申请实施例提出了基于分组互相关,利用了传统的匹配度量,建立匹配代价特征。
基于分组互相关的基本思想是将2D特征分成多个组，计算左图和右图对应组的互相关性。本申请实施例中使用公式
$$C_{gwc}(d,x,y,g)=\frac{1}{N_c/N_g}\sum\Big(f_l^{g}(x,y)\odot f_r^{g}(x+d,y)\Big)$$
计算分组互相关性，其中，所述 $N_c$ 表示2D特征的通道数，所述 $N_g$ 表示分组的个数，所述 $f_l^{g}$ 表示分组后的左图对应的特征组中的特征，所述 $f_r^{g}$ 表示分组后的右图对应的特征组中的特征，所述 $(x,y)$ 表示横坐标为x纵坐标为y的像素点的像素坐标，所述 $(x+d,y)$ 表示横坐标为x+d纵坐标为y的像素点的像素坐标，这里 $\odot$ 表示两个特征的乘积。其中，相关性指的是计算所有特征组g和所有视差d的相关性。
为了进一步提高性能,分组互相关匹配代价可以与原始连接特征进行结合。实验结果表明,分组相关特征和连接特征是相互补充的。
本申请对金字塔立体匹配网络中的聚合网络进行了改进。首先,添加一个额外的辅助输出模块,这样,额外的辅助损失使网络学习较低层的更好聚合特征,有利于最终预测。其次,不同输出之间的剩余连接模块被移除,因此,节省了计算成本。
本申请实施例中，使用损失函数
$$L=\sum_{j=0}^{3}\lambda_j\cdot \mathrm{Smooth}_{L1}\big(\tilde{d}_j-d^{*}\big)$$
来训练基于分组互相关的网络，其中，j表示实施例中使用的基于分组互相关的网络中有三个临时结果和一个最终结果，$\lambda_j$ 表示对于不同的结果所附加的不同权值，$\tilde{d}_j$ 表示使用所述基于分组互相关的网络得到的视差，所述 $d^{*}$ 表示真实视差，所述 $\mathrm{Smooth}_{L1}$ 是一种现有的损失函数计算方法。
这里，第i个像素的预测误差可以用公式 $e_i=\big|d_i-d_i^{*}\big|$ 确定，其中，$d_i$ 表示使用本申请实施例提供的双目匹配方法确定的待处理图像左图或右图上第i个像素点的预测视差，$d_i^{*}$ 表示所述第i个像素点的真实视差。
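下面以 Python（PyTorch 风格）给出上述多输出加权 Smooth L1 损失的一个简要示意实现，仅为在若干假设下的示意，并非本申请披露的具体实现；其中权值取值、输出个数以及有效像素掩码均为示意性假设。

```python
import torch.nn.functional as F

def gwcnet_loss(pred_disps, gt_disp, weights=(0.5, 0.5, 0.7, 1.0), max_disp=192):
    """示意性实现：对三个中间输出和一个最终输出的 Smooth L1 损失加权求和。
    weights、max_disp 以及有效像素掩码均为示意性假设。"""
    mask = (gt_disp > 0) & (gt_disp < max_disp)       # 只在视差有效且有标注的像素上计算损失
    loss = 0.0
    for w, pred in zip(weights, pred_disps):          # pred_disps: 四个输出的预测视差列表
        loss = loss + w * F.smooth_l1_loss(pred[mask], gt_disp[mask])
    return loss
```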
图4C为本申请实施例双目匹配方法和现有技术双目匹配方法的实验结果对比图,如图4C所示,现有技术包括PSMNet(即金字塔立体匹配网络)和Cat64(即使用连接特征的方法)。而本申请实施例的双目匹配方法包括两种,第一种是Gwc40(GwcNet-g)(即基于分组互相关特征的方法),第二种是Gwc40-Cat24(GwcNet-gc)(即基于分组互相关特征与连接特征拼接后的特征的方法)。其中,两种现有技术和本申请实施例的第二种方法,均使用了连接特征,但是,只有本申请实施例使用了分组互相关特征。进而,只有本申请实施例中的方法涉及到了特征分组,即,将得到的2D拼接特征分成了40组,每组8个通道数。最后,使用待处理图像对现有技术和本申请实施例中的方法进 行测试,可以得到立体视差异常值的百分比,分别为大于1个像素的异常值的百分比,大于2个像素的异常值的百分比,和大于3个像素的异常值的百分比,从图中可以看出,本申请提出的两种方法得到的实验结果均优于现有技术,即使用本申请实施例的方法对待处理图像进行处理后,得到的立体视差异常值的百分比,均小于现有技术对待处理图像进行处理后得到的立体视差异常值的百分比。
基于前述的实施例,本申请实施例提供一种双目匹配装置,该装置包括所包括的各单元、以及各单元所包括的各模块,可以通过计算机设备中的处理器来实现;当然也可通过具体的逻辑电路实现;在实施的过程中,处理器可以为CPU(Central Processing Unit,中央处理器)、MPU(Microprocessor Unit,微处理器)、DSP(Digital Signal Processing,数字信号处理器)或FPGA(Field Programmable Gate Array,现场可编程门阵列)等。
图5为本申请实施例双目匹配装置的组成结构示意图,如图5所示,所述装置500包括:
获取单元501,配置为获取待处理的图像,其中,所述图像为包括左图和右图的2D图像;
构建单元502,配置为利用提取的所述左图的特征和所述右图的特征,构建所述图像的3D匹配代价特征,其中,所述3D匹配代价特征包括分组互相关特征,或,包括分组互相关特征与连接特征拼接后的特征;
确定单元503,配置为利用所述3D匹配代价特征,确定所述图像的深度。
在一些实施例中,所述构建单元502,包括:
第一构建子单元,配置为利用提取的所述左图的特征和所述右图的特征,确定分组互相关特征;
第二构建子单元,配置为将所述分组互相关特征,确定为3D匹配代价特征。
在一些实施例中,所述构建单元502,包括:
第一构建子单元,配置为利用提取的所述左图的特征和所述右图的特征,确定分组互相关特征和连接特征;
第二构建子单元,配置为将所述分组互相关特征与所述连接特征进行拼接后的特征,确定为3D匹配代价特征;
其中,所述连接特征为将所述左图的特征与所述右图的特征在特征维度上进行拼接得到的。
在一些实施例中,所述第一构建子单元,包括:
第一构建模块,配置为将提取的所述左图的特征和所述右图的特征分别进行分组,确定分组后的左图的特征和分组后的右图的特征在不同视差下的互相关结果;
第二构建模块,配置为将所述互相关结果进行拼接,得到分组互相关特征。
在一些实施例中,所述第一构建模块,包括:
第一构建子模块,配置为将提取的所述左图的特征进行分组,形成第一预设数量的第一特征组;
第二构建子模块,配置为将提取的所述右图的特征进行分组,形成第二预设数量的第二特征组,所述第一预设数量与所述第二预设数量相同;
第三构建子模块,配置为确定第g组第一特征组与第g组第二特征组在不同视差下的互相关结果;其中,g为大于等于1小于等于第一预设数量的自然数;所述不同视差包括:零视差、最大视差和零视差与最大视差之间的任一视差,所述最大视差为待处理的图像对应的使用场景下的最大视差。
在一些实施例中,所述装置还包括:
提取单元,配置为利用共享参数的全卷积神经网络分别提取所述左图的2D特征和 所述右图的2D特征。
在一些实施例中,所述确定单元503,包括:
第一确定子单元,配置为使用3D神经网络确定所述3D匹配代价特征中每一像素点对应的不同视差的概率;
第二确定子单元,配置为确定所述每一像素点对应的不同视差的概率的加权平均值;
第三确定子单元,配置为将所述加权平均值确定为所述像素点的视差;
第四确定子单元,配置为根据所述像素点的视差,确定所述像素点的深度。
基于前述的实施例,本申请实施例提供一种双目匹配网络的训练装置,该装置包括所包括的各单元、以及各单元所包括的各模块,可以通过计算机设备中的处理器来实现;当然也可通过具体的逻辑电路实现;在实施的过程中,处理器可以为CPU、MPU、DSP或FPGA等。
图6为本申请实施例双目匹配网络的训练装置的组成结构示意图,如图6所示,所述装置600包括:
特征提取单元601,配置为利用双目匹配网络确定获取的样本图像的3D匹配代价特征,其中,所述样本图像包括有深度标记信息的左图和右图,所述左图和右图的尺寸相同;所述3D匹配代价特征包括分组互相关特征,或,包括分组互相关特征与连接特征拼接后的特征;
视差预测单元602,配置为利用所述双目匹配网络根据所述3D匹配代价特征,确定样本图像的预测视差;
比较单元603,配置为将所述深度标记信息与所述预测视差进行比较,得到双目匹配的损失函数;
训练单元604,配置为利用所述损失函数对所述双目匹配网络进行训练。
在一些实施例中,所述特征提取单元601,包括:
第一特征提取子单元,配置为利用双目匹配网络中的全卷积神经网络分别确定所述左图的2D拼接特征和所述右图的2D拼接特征;
第二特征提取子单元,配置为利用所述左图的2D拼接特征和所述右图的2D拼接特征,构建3D匹配代价特征。
在一些实施例中,所述第一特征提取子单元,包括:
第一特征提取模块,配置为利用双目匹配网络中的全卷积神经网络分别提取所述左图的2D特征和所述右图的2D特征;
第二特征提取模块,配置为确定用于进行2D特征拼接的卷积层的标识;
第三特征提取模块,配置为根据所述标识,将所述左图中不同卷积层的2D特征在特征维度上进行拼接,得到第一2D拼接特征;
第四特征提取模块,配置为根据所述标识,将所述右图中不同卷积层的2D特征在特征维度上进行拼接,得到第二2D拼接特征。
在一些实施例中,所述第二特征提取模块,配置为当第i卷积层的间隔率发生变化时,将所述第i卷积层确定为用于进行2D特征拼接的卷积层,其中,i为大于等于1的自然数。
在一些实施例中,所述全卷积神经网络为共享参数的全卷积神经网络;对应地,所述第一特征提取模块,配置为利用双目匹配网络中的共享参数的全卷积神经网络分别提取所述左图的2D特征和所述右图的2D特征,其中,所述2D特征的尺寸是所述左图或右图的尺寸的四分之一。
在一些实施例中,所述第二特征提取子单元,包括:
第一特征确定模块,配置为利用获取的第一2D拼接特征和获取的第二2D拼接特 征,确定分组互相关特征;
第二特征确定模块,配置为将所述分组互相关特征,确定为3D匹配代价特征。
在一些实施例中,所述第二特征提取子单元,包括:
第一特征确定模块,配置为利用获取的第一2D拼接特征和获取的第二2D拼接特征,确定分组互相关特征;
所述第一特征确定模块,还配置为利用获取的第一2D拼接特征和获取的第二2D拼接特征,确定连接特征;
第二特征确定模块,配置为将所述分组互相关特征和所述连接特征在特征维度上进行拼接,得到3D匹配代价特征。
在一些实施例中,所述第一特征确定模块,包括:
第一特征确定子模块,配置为将获取的第一2D拼接特征分成N g组,得到N g个第一特征组;
第二特征确定子模块,配置为将获取的第二2D拼接特征分成N g组,得到N g个第二特征组,N g为大于等于1的自然数;
第三特征确定子模块,配置为确定N g个第一特征组和N g个第二特征组对于所述视差d的互相关结果,得到N g*D max个互相关图;其中,所述视差d为大于等于0小于D max的自然数,所述D max为样本图像对应的使用场景下的最大视差;
第四特征确定子模块,配置为将所述N g*D max个互相关图在特征维度上进行拼接,得到分组互相关特征。
在一些实施例中,所述第三特征确定子模块,配置为确定第g组第一特征组和第g组第二特征组对于所述视差d的互相关结果,得到D max个互相关图,其中,g为大于等于1小于等于N g的自然数;确定N g个第一特征组和N g个第二特征组对于所述视差d的互相关结果,得到N g*D max个互相关图。
在一些实施例中,所述第一特征确定模块,还包括:
第五特征确定子模块,配置为确定获取的第一2D拼接特征和第二2D拼接特征对于所述视差d的拼接结果,得到D max个拼接图;其中,所述视差d为大于等于0小于D max的自然数,所述D max为样本图像对应的使用场景下的最大视差;
第六特征确定子模块,配置为将所述D max个拼接图进行拼接,得到连接特征。
在一些实施例中,所述视差预测单元602,包括:
第一视差预测子单元,配置为利用所述双目匹配网络对所述3D匹配代价特征,进行匹配代价聚合;
第二视差预测子单元,配置为对聚合后的结果进行视差回归,得到样本图像的预测视差。
在一些实施例中,所述第一视差预测子单元,配置为使用所述双目匹配网络中的3D神经网络确定所述3D匹配代价特征中每一像素点对应的不同视差d的概率;其中,所述视差d为大于等于0小于D max的自然数,所述D max为样本图像对应的使用场景下的最大视差。
在一些实施例中,所述第二视差预测子单元,配置为将所述每一像素点对应的不同 视差d的概率的加权平均值,确定为所述像素点的预测视差,以得到样本图像的预测视差;
其中,所述视差d为大于等于0小于D max的自然数,所述D max为样本图像对应的使用场景下的最大视差。
以上装置实施例的描述,与上述方法实施例的描述是类似的,具有同方法实施例相似的有益效果。对于本申请装置实施例中未披露的技术细节,请参照本申请方法实施例的描述而理解。
需要说明的是,本申请实施例中,如果以软件功能模块的形式实现上述的双目匹配方法或双目匹配网络的训练方法,并作为独立的产品销售或使用时,也可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请实施例的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机、服务器等)执行本申请各个实施例所述方法的全部或部分。而前述的存储介质包括:U盘、移动硬盘、ROM(Read Only Memory,只读存储器)、磁碟或者光盘等各种可以存储程序代码的介质。这样,本申请实施例不限制于任何特定的硬件和软件结合。
对应地,本申请实施例提供一种计算机设备,包括存储器和处理器,所述存储器存储有可在处理器上运行的计算机程序,所述处理器执行所述程序时实现上述实施例中提供的双目匹配方法中的步骤,或,实现上述实施例中提供的双目匹配网络的训练方法中的步骤。
对应地,本申请实施例提供一种计算机可读存储介质,其上存储有计算机程序,该计算机程序被处理器执行时实现上述实施例中提供的双目匹配方法中的步骤,或,实现上述实施例中提供的双目匹配网络的训练方法中的步骤。
这里需要指出的是:以上存储介质和设备实施例的描述,与上述方法实施例的描述是类似的,具有同方法实施例相似的有益效果。对于本申请存储介质和设备实施例中未披露的技术细节,请参照本申请方法实施例的描述而理解。
需要说明的是,图7为本申请实施例计算机设备的一种硬件实体示意图,如图7所示,该计算机设备700的硬件实体包括:处理器701、通信接口702和存储器703,其中
处理器701通常控制计算机设备700的总体操作。
通信接口702可以使计算机设备通过网络与其他终端或服务器通信。
存储器703配置为存储由处理器701可执行的指令和应用,还可以缓存待处理器701以及计算机设备700中各模块待处理或已经处理的数据(例如,图像数据、音频数据、语音通信数据和视频通信数据),可以通过FLASH(闪存)或RAM(Random Access Memory,随机访问存储器)实现。
应理解,说明书通篇中提到的“一个实施例”或“一实施例”意味着与实施例有关的特定特征、结构或特性包括在本申请的至少一个实施例中。因此,在整个说明书各处出现的“在一个实施例中”或“在一实施例中”未必一定指相同的实施例。此外,这些特定的特征、结构或特性可以任意适合的方式结合在一个或多个实施例中。应理解,在本申请的各种实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。
需要说明的是,在本文中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者装置不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者 装置所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括该要素的过程、方法、物品或者装置中还存在另外的相同要素。
在本申请所提供的几个实施例中,应该理解到,所揭露的设备和方法,可以通过其它的方式实现。以上所描述的设备实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,如:多个单元或组件可以结合,或可以集成到另一个系统,或一些特征可以忽略,或不执行。另外,所显示或讨论的各组成部分相互之间的耦合、或直接耦合、或通信连接可以是通过一些接口,设备或单元的间接耦合或通信连接,可以是电性的、机械的或其它形式的。
上述作为分离部件说明的单元可以是、或也可以不是物理上分开的,作为单元显示的部件可以是、或也可以不是物理单元;既可以位于一个地方,也可以分布到多个网络单元上;可以根据实际的需要选择其中的部分或全部单元来实现本实施例方案的目的。
另外,在本申请各实施例中的各功能单元可以全部集成在一个处理单元中,也可以是各单元分别单独作为一个单元,也可以两个或两个以上单元集成在一个单元中;上述集成的单元既可以采用硬件的形式实现,也可以采用硬件加软件功能单元的形式实现。
本领域普通技术人员可以理解:实现上述方法实施例的全部或部分步骤可以通过程序指令相关的硬件来完成,前述的程序可以存储于计算机可读取存储介质中,该程序在执行时,执行包括上述方法实施例的步骤;而前述的存储介质包括:移动存储设备、ROM(Read Only Memory,只读存储器)、磁碟或者光盘等各种可以存储程序代码的介质。
或者,本申请上述集成的单元如果以软件功能模块的形式实现并作为独立的产品销售或使用时,也可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请实施例的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机、服务器等)执行本申请各个实施例所述方法的全部或部分。而前述的存储介质包括:移动存储设备、ROM、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (42)

  1. 一种双目匹配方法,其中,所述方法包括:
    获取待处理的图像,其中,所述图像为包括左图和右图的2D图像;
    利用提取的所述左图的特征和所述右图的特征,构建所述图像的3D匹配代价特征,其中,所述3D匹配代价特征包括分组互相关特征,或,包括分组互相关特征与连接特征拼接后的特征;
    利用所述3D匹配代价特征,确定所述图像的深度。
  2. 根据权利要求1所述的方法,其特征在于,所述利用提取的所述左图的特征和所述右图的特征,构建所述图像的3D匹配代价特征,包括:
    利用提取的所述左图的特征和所述右图的特征,确定分组互相关特征;
    将所述分组互相关特征,确定为3D匹配代价特征。
  3. 根据权利要求1所述的方法,其特征在于,所述利用提取的所述左图的特征和所述右图的特征,构建所述图像的3D匹配代价特征,包括:
    利用提取的所述左图的特征和所述右图的特征,确定分组互相关特征和连接特征;
    将所述分组互相关特征与所述连接特征进行拼接后的特征,确定为3D匹配代价特征;
    其中,所述连接特征为将所述左图的特征与所述右图的特征在特征维度上进行拼接得到的。
  4. 根据权利要求2或3所述的方法,其特征在于,所述利用提取的所述左图的特征和所述右图的特征,确定分组互相关特征,包括:
    将提取的所述左图的特征和所述右图的特征分别进行分组,确定分组后的左图的特征和分组后的右图的特征在不同视差下的互相关结果;
    将所述互相关结果进行拼接,得到分组互相关特征。
  5. 根据权利要求4所述的方法,其特征在于,所述将提取的所述左图的特征和所述右图的特征分别进行分组,确定分组后的左图的特征和分组后的右图的特征在不同视差下的互相关结果,包括:
    将提取的所述左图的特征进行分组,形成第一预设数量的第一特征组;
    将提取的所述右图的特征进行分组,形成第二预设数量的第二特征组,所述第一预设数量与所述第二预设数量相同;
    确定第g组第一特征组与第g组第二特征组在不同视差下的互相关结果;其中,g为大于等于1小于等于第一预设数量的自然数;所述不同视差包括:零视差、最大视差和零视差与最大视差之间的任一视差,所述最大视差为待处理的图像对应的使用场景下的最大视差。
  6. 根据权利要求1至5任一项所述的方法,其特征在于,所述利用提取的所述左图的特征和所述右图的特征之前,所述方法还包括:
    利用共享参数的全卷积神经网络分别提取所述左图的2D特征和所述右图的2D特征。
  7. 根据权利要求6所述的方法,其特征在于,所述利用所述3D匹配代价特征,确定所述图像的深度,包括:
    使用3D神经网络确定所述3D匹配代价特征中每一像素点对应的不同视差的概率;
    确定所述每一像素点对应的不同视差的概率的加权平均值;
    将所述加权平均值确定为所述像素点的视差;
    根据所述像素点的视差,确定所述像素点的深度。
  8. 一种双目匹配网络的训练方法,其特征在于,所述方法包括:
    利用双目匹配网络确定获取的样本图像的3D匹配代价特征,其中,所述样本图像包括有深度标记信息的左图和右图,所述左图和右图的尺寸相同;所述3D匹配代价特征包括分组互相关特征,或,包括分组互相关特征与连接特征拼接后的特征;
    利用所述双目匹配网络根据所述3D匹配代价特征,确定样本图像的预测视差;
    将所述深度标记信息与所述预测视差进行比较,得到双目匹配的损失函数;
    利用所述损失函数对所述双目匹配网络进行训练。
  9. 根据权利要求8所述的方法,其特征在于,所述利用双目匹配网络确定获取的样本图像的3D匹配代价特征,包括:
    利用双目匹配网络中的全卷积神经网络分别确定所述左图的2D拼接特征和所述右图的2D拼接特征;
    利用所述左图的2D拼接特征和所述右图的2D拼接特征,构建3D匹配代价特征。
  10. 根据权利要求9所述的方法,其特征在于,所述利用双目匹配网络中的全卷积神经网络分别确定所述左图的2D拼接特征和所述右图的2D拼接特征,包括:
    利用双目匹配网络中的全卷积神经网络分别提取所述左图的2D特征和所述右图的2D特征;
    确定用于进行2D特征拼接的卷积层的标识;
    根据所述标识,将所述左图中不同卷积层的2D特征在特征维度上进行拼接,得到第一2D拼接特征;
    根据所述标识,将所述右图中不同卷积层的2D特征在特征维度上进行拼接,得到第二2D拼接特征。
  11. 根据权利要求10所述的方法,其特征在于,所述确定用于进行2D特征拼接的卷积层的标识,包括:当第i卷积层的间隔率发生变化时,将所述第i卷积层确定为用于进行2D特征拼接的卷积层,其中,i为大于等于1的自然数。
  12. 根据权利要求10或11所述的方法,其特征在于,所述全卷积神经网络为共享参数的全卷积神经网络;
    所述利用双目匹配网络中的全卷积神经网络分别提取所述左图的2D特征和所述右图的2D特征,包括:利用双目匹配网络中的共享参数的全卷积神经网络分别提取所述左图的2D特征和所述右图的2D特征,其中,所述2D特征的尺寸是所述左图或右图的尺寸的四分之一。
  13. 根据权利要求9至12任一项所述的方法,其特征在于,所述利用所述左图的2D拼接特征和所述右图的2D拼接特征,构建3D匹配代价特征,包括:
    利用获取的第一2D拼接特征和获取的第二2D拼接特征,确定分组互相关特征;
    将所述分组互相关特征,确定为3D匹配代价特征。
  14. 根据权利要求9至12任一项所述的方法,其特征在于,所述利用所述左图的2D拼接特征和所述右图的2D拼接特征,构建3D匹配代价特征,包括:
    利用获取的第一2D拼接特征和获取的第二2D拼接特征,确定分组互相关特征;
    利用获取的第一2D拼接特征和获取的第二2D拼接特征,确定连接特征;
    将所述分组互相关特征和所述连接特征在特征维度上进行拼接,得到3D匹配代价特征。
  15. 根据权利要求13或14所述的方法,其特征在于,所述利用获取的第一2D拼接特征和获取的第二2D拼接特征,确定分组互相关特征,包括:
    将获取的第一2D拼接特征分成N g组,得到N g个第一特征组;
    将获取的第二2D拼接特征分成N g组,得到N g个第二特征组,N g为大于等于1的自然数;
    确定N g个第一特征组和N g个第二特征组对于所述视差d的互相关结果,得到N g*D max个互相关图;其中,所述视差d为大于等于0小于D max的自然数,所述D max为样本图像对应的使用场景下的最大视差;
    将所述N g*D max个互相关图在特征维度上进行拼接,得到分组互相关特征。
  16. 根据权利要求15所述的方法,其特征在于,所述确定N g个第一特征组和N g个第二特征组对于所述视差d的互相关结果,得到N g*D max个互相关图,包括:
    确定第g组第一特征组和第g组第二特征组对于所述视差d的互相关结果,得到D max个互相关图,其中,g为大于等于1小于等于N g的自然数;
    确定N g个第一特征组和N g个第二特征组对于所述视差d的互相关结果,得到N g*D max个互相关图。
  17. 根据权利要求14所述的方法,其特征在于,所述利用获取的第一2D拼接特征和获取的第二2D拼接特征,确定连接特征,包括:
    确定获取的第一2D拼接特征和第二2D拼接特征对于所述视差d的拼接结果,得到D max个拼接图;其中,所述视差d为大于等于0小于D max的自然数,所述D max为样本图像对应的使用场景下的最大视差;
    将所述D max个拼接图进行拼接,得到连接特征。
  18. 根据权利要求8所述的方法,其特征在于,所述根据所述3D匹配代价特征,利用所述双目匹配网络确定样本图像的预测视差,包括:
    利用所述双目匹配网络对所述3D匹配代价特征,进行匹配代价聚合;
    对聚合后的结果进行视差回归,得到样本图像的预测视差。
  19. 根据权利要求18所述的方法,其特征在于,所述利用所述双目匹配网络对所述3D匹配代价特征,进行匹配代价聚合,包括:
    使用所述双目匹配网络中的3D神经网络确定所述3D匹配代价特征中每一像素点对应的不同视差d的概率;其中,所述视差d为大于等于0小于D max的自然数,所述D max为样本图像对应的使用场景下的最大视差。
  20. 根据权利要求18所述的方法,其特征在于,所述对聚合后的结果进行视差回归,得到样本图像的预测视差,包括:
    将所述每一像素点对应的不同视差d的概率的加权平均值,确定为所述像素点的预测视差,以得到样本图像的预测视差;
    其中,所述视差d为大于等于0小于D max的自然数,所述D max为样本图像对应的使用场景下的最大视差。
  21. 一种双目匹配装置,其中,所述装置包括:
    获取单元,配置为获取待处理的图像,其中,所述图像为包括左图和右图的2D图像;
    构建单元,配置为利用提取的所述左图的特征和所述右图的特征,构建所述图像的3D匹配代价特征,其中,所述3D匹配代价特征是包括分组互相关特征,或,包括分组互相关特征与连接特征拼接后的特征;
    确定单元,配置为利用所述3D匹配代价特征,确定所述图像的深度。
  22. 根据权利要求21所述的装置,其中,所述构建单元,包括:
    第一构建子单元,配置为利用提取的所述左图的特征和所述右图的特征,确定分组互相关特征;
    第二构建子单元,配置为将所述分组互相关特征,确定为3D匹配代价特征。
  23. 根据权利要求21所述的装置,其中,所述构建单元,包括:
    第一构建子单元,配置为利用提取的所述左图的特征和所述右图的特征,确定分组互相关特征和连接特征;
    第二构建子单元,配置为将所述分组互相关特征与所述连接特征进行拼接后的特征,确定为3D匹配代价特征;
    其中,所述连接特征为将所述左图的特征与所述右图的特征在特征维度上进行拼接得到的。
  24. 根据权利要求22或23所述的装置,其中,所述第一构建子单元,包括:
    第一构建模块,配置为将提取的所述左图的特征和所述右图的特征分别进行分组,确定分组后的左图的特征和分组后的右图的特征在不同视差下的互相关结果;
    第二构建模块,配置为将所述互相关结果进行拼接,得到分组互相关特征。
  25. 根据权利要求24所述的装置,其中,所述第一构建模块,包括:
    第一构建子模块,配置为将提取的所述左图的特征进行分组,形成第一预设数量的第一特征组;
    第二构建子模块,配置为将提取的所述右图的特征进行分组,形成第二预设数量的第二特征组,所述第一预设数量与所述第二预设数量相同;
    第三构建子模块,配置为确定第g组第一特征组与第g组第二特征组在不同视差下的互相关结果;其中,g为大于等于1小于等于第一预设数量的自然数;所述不同视差包括:零视差、最大视差和零视差与最大视差之间的任一视差,所述最大视差为待处理的图像对应的使用场景下的最大视差。
  26. 根据权利要求21至25任一项所述的装置,其中,所述装置还包括:
    提取单元,配置为利用共享参数的全卷积神经网络分别提取所述左图的2D特征和所述右图的2D特征。
  27. 根据权利要求26所述的装置,其中,所述确定单元,包括:
    第一确定子单元,配置为使用3D神经网络确定所述3D匹配代价特征中每一像素点对应的不同视差的概率;
    第二确定子单元,配置为确定所述每一像素点对应的不同视差的概率的加权平均值;
    第三确定子单元,配置为将所述加权平均值确定为所述像素点的视差;
    第四确定子单元,配置为根据所述像素点的视差,确定所述像素点的深度。
  28. 一种双目匹配网络的训练装置,其中,所述装置包括:
    特征提取单元,配置为利用双目匹配网络确定获取的样本图像的3D匹配代价特征,其中,所述样本图像包括有深度标记信息的左图和右图,所述左图和右图的尺寸相同;所述3D匹配代价特征包括分组互相关特征,或,包括分组互相关特征与连接特征拼接后的特征;
    视差预测单元,配置为利用所述双目匹配网络根据所述3D匹配代价特征,确定样本图像的预测视差;
    比较单元,配置为将所述深度标记信息与所述预测视差进行比较,得到双目匹配的损失函数;
    训练单元,配置为利用所述损失函数对所述双目匹配网络进行训练。
  29. 根据权利要求28所述的装置,其中,所述特征提取单元,包括:
    第一特征提取子单元,配置为利用双目匹配网络中的全卷积神经网络分别确定所述左图的2D拼接特征和所述右图的2D拼接特征;
    第二特征提取子单元,配置为利用所述左图的2D拼接特征和所述右图的2D拼接特征,构建3D匹配代价特征。
  30. 根据权利要求29所述的装置,其中,所述第一特征提取子单元,包括:
    第一特征提取模块,配置为利用双目匹配网络中的全卷积神经网络分别提取所述左图的2D特征和所述右图的2D特征;
    第二特征提取模块,配置为确定用于进行2D特征拼接的卷积层的标识;
    第三特征提取模块,配置为根据所述标识,将所述左图中不同卷积层的2D特征在特征维度上进行拼接,得到第一2D拼接特征;
    第四特征提取模块,配置为根据所述标识,将所述右图中不同卷积层的2D特征在特征维度上进行拼接,得到第二2D拼接特征。
  31. 根据权利要求30所述的装置,其中,所述第二特征提取模块,配置为当第i卷积层的间隔率发生变化时,将所述第i卷积层确定为用于进行2D特征拼接的卷积层,其中,i为大于等于1的自然数。
  32. 根据权利要求30或31所述的装置,其中,所述全卷积神经网络为共享参数的全卷积神经网络;所述第一特征提取模块,配置为利用双目匹配网络中的共享参数的全卷积神经网络分别提取所述左图的2D特征和所述右图的2D特征,其中,所述2D特征的尺寸是所述左图或右图的尺寸的四分之一。
  33. 根据权利要求29至32任一项所述的装置,其中,所述第二特征提取子单元,包括:
    第一特征确定模块,配置为利用获取的第一2D拼接特征和获取的第二2D拼接特征,确定分组互相关特征;
    第二特征确定模块,配置为将所述分组互相关特征,确定为3D匹配代价特征。
  34. 根据权利要求29至32任一项所述的装置,其中,所述第二特征提取子单元,包括:
    第一特征确定模块,配置为利用获取的第一2D拼接特征和获取的第二2D拼接特征,确定分组互相关特征;
    所述第一特征确定模块,还配置为利用获取的第一2D拼接特征和获取的第二2D拼接特征,确定连接特征;
    第二特征确定模块,配置为将所述分组互相关特征和所述连接特征在特征维度上进行拼接,得到3D匹配代价特征。
  35. 根据权利要求33或34所述的装置,其中,所述第一特征确定模块,包括:
    第一特征确定子模块,配置为将获取的第一2D拼接特征分成N g组,得到N g个第一特征组;
    第二特征确定子模块,配置为将获取的第二2D拼接特征分成N g组,得到N g个第二特征组,N g为大于等于1的自然数;
    第三特征确定子模块,配置为确定N g个第一特征组和N g个第二特征组对于所述视差d的互相关结果,得到N g*D max个互相关图;其中,所述视差d为大于等于0小于D max的自然数,所述D max为样本图像对应的使用场景下的最大视差;
    第四特征确定子模块,配置为将所述N g*D max个互相关图在特征维度上进行拼接, 得到分组互相关特征。
  36. 根据权利要求35所述的装置,其中,所述第三特征确定子模块,配置为确定第g组第一特征组和第g组第二特征组对于所述视差d的互相关结果,得到D max个互相关图,其中,g为大于等于1小于等于N g的自然数;确定N g个第一特征组和N g个第二特征组对于所述视差d的互相关结果,得到N g*D max个互相关图。
  37. 根据权利要求34所述的装置,其中,所述第一特征确定模块,还包括:
    第五特征确定子模块,配置为确定获取的第一2D拼接特征和第二2D拼接特征对于所述视差d的拼接结果,得到D max个拼接图;其中,所述视差d为大于等于0小于D max的自然数,所述D max为样本图像对应的使用场景下的最大视差;
    第六特征确定子模块,配置为将所述D max个拼接图进行拼接,得到连接特征。
  38. 根据权利要求28所述的装置,其中,所述视差预测单元,包括:
    第一视差预测子单元,配置为利用所述双目匹配网络对所述3D匹配代价特征,进行匹配代价聚合;
    第二视差预测子单元,配置为对聚合后的结果进行视差回归,得到样本图像的预测视差。
  39. 根据权利要求38所述的装置,其中,所述第一视差预测子单元,配置为使用所述双目匹配网络中的3D神经网络确定所述3D匹配代价特征中每一像素点对应的不同视差d的概率;其中,所述视差d为大于等于0小于D max的自然数,所述D max为样本图像对应的使用场景下的最大视差。
  40. 根据权利要求38所述的装置,其中,所述第二视差预测子单元,配置为将所述每一像素点对应的不同视差d的概率的加权平均值,确定为所述像素点的预测视差,以得到样本图像的预测视差;
    其中,所述视差d为大于等于0小于D max的自然数,所述D max为样本图像对应的使用场景下的最大视差。
  41. 一种计算机设备,包括存储器和处理器,所述存储器存储有可在处理器上运行的计算机程序,其中,所述处理器执行所述程序时实现权利要求1至7任一项所述双目匹配方法中的步骤,或,实现权利要求8至20任一项所述双目匹配网络的训练方法中的步骤。
  42. 一种计算机可读存储介质,其上存储有计算机程序,其中,该计算机程序被处理器执行时实现权利要求1至7任一项所述双目匹配方法中的步骤,或,实现权利要求8至20任一项所述双目匹配网络的训练方法中的步骤。
PCT/CN2019/108314 2019-02-19 2019-09-26 双目匹配方法及装置、设备和存储介质 WO2020168716A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2020565808A JP7153091B2 (ja) 2019-02-19 2019-09-26 両眼マッチング方法及び装置、機器並びに記憶媒体
SG11202011008XA SG11202011008XA (en) 2019-02-19 2019-09-26 Binocular matching method and apparatus, and device and storage medium
KR1020207031264A KR20200136996A (ko) 2019-02-19 2019-09-26 양안 매칭 방법 및 장치, 기기 및 저장 매체
US17/082,640 US20210042954A1 (en) 2019-02-19 2020-10-28 Binocular matching method and apparatus, device and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910127860.4 2019-02-19
CN201910127860.4A CN109887019B (zh) 2019-02-19 2019-02-19 一种双目匹配方法及装置、设备和存储介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/082,640 Continuation US20210042954A1 (en) 2019-02-19 2020-10-28 Binocular matching method and apparatus, device and storage medium

Publications (1)

Publication Number Publication Date
WO2020168716A1 true WO2020168716A1 (zh) 2020-08-27

Family

ID=66928674

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/108314 WO2020168716A1 (zh) 2019-02-19 2019-09-26 双目匹配方法及装置、设备和存储介质

Country Status (6)

Country Link
US (1) US20210042954A1 (zh)
JP (1) JP7153091B2 (zh)
KR (1) KR20200136996A (zh)
CN (1) CN109887019B (zh)
SG (1) SG11202011008XA (zh)
WO (1) WO2020168716A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111260538A (zh) * 2018-12-03 2020-06-09 北京初速度科技有限公司 基于长基线双目鱼眼相机的定位及车载终端
CN112819777A (zh) * 2021-01-28 2021-05-18 重庆西山科技股份有限公司 一种双目内窥镜辅助显示方法、系统、装置和存储介质

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111383256B (zh) * 2018-12-29 2024-05-17 北京市商汤科技开发有限公司 图像处理方法、电子设备及计算机可读存储介质
CN109887019B (zh) * 2019-02-19 2022-05-24 北京市商汤科技开发有限公司 一种双目匹配方法及装置、设备和存储介质
CN110689060B (zh) * 2019-09-16 2022-01-28 西安电子科技大学 一种基于聚合特征差异学习网络的异源图像匹配方法
US11763433B2 (en) * 2019-11-14 2023-09-19 Samsung Electronics Co., Ltd. Depth image generation method and device
CN111260711B (zh) * 2020-01-10 2021-08-10 大连理工大学 一种弱监督可信代价传播的视差估计方法
CN111709977A (zh) * 2020-03-17 2020-09-25 北京航空航天大学青岛研究院 一种基于自适应单峰立体匹配成本滤波的双目深度学习方法
KR20220127642A (ko) * 2021-03-11 2022-09-20 삼성전자주식회사 전자 장치 및 그 제어 방법
CN113393366A (zh) * 2021-06-30 2021-09-14 北京百度网讯科技有限公司 双目匹配方法、装置、设备以及存储介质
CN113283848B (zh) * 2021-07-21 2021-09-28 湖北浩蓝智造科技有限公司 一种货物入库检测方法、仓储入库系统及存储介质
CN114627535B (zh) * 2022-03-15 2024-05-10 平安科技(深圳)有限公司 基于双目摄像头的坐标匹配方法、装置、设备及介质
CN114419349B (zh) * 2022-03-30 2022-07-15 中国科学技术大学 一种图像匹配方法和装置
CN115063467B (zh) * 2022-08-08 2022-11-15 煤炭科学研究总院有限公司 煤矿井下高分辨率图像视差估计方法及装置
CN115908992B (zh) * 2022-10-22 2023-12-05 北京百度网讯科技有限公司 双目立体匹配的方法、装置、设备以及存储介质
CN116229123B (zh) * 2023-02-21 2024-04-30 深圳市爱培科技术股份有限公司 基于多通道分组互相关代价卷的双目立体匹配方法及装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030072483A1 (en) * 2001-08-10 2003-04-17 Stmicroelectronics, Inc. Method and apparatus for recovering depth using multi-plane stereo and spatial propagation
CN101908230A (zh) * 2010-07-23 2010-12-08 东南大学 一种基于区域深度边缘检测和双目立体匹配的三维重建方法
US20150206307A1 (en) * 2014-01-20 2015-07-23 Nokia Corporation Visual Perception Matching Cost On Binocular Stereo Images
CN107767413A (zh) * 2017-09-20 2018-03-06 华南理工大学 一种基于卷积神经网络的图像深度估计方法
CN109887019A (zh) * 2019-02-19 2019-06-14 北京市商汤科技开发有限公司 一种双目匹配方法及装置、设备和存储介质

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104680510B (zh) * 2013-12-18 2017-06-16 北京大学深圳研究生院 Radar视差图优化方法、立体匹配视差图优化方法及系统
KR102016551B1 (ko) * 2014-01-24 2019-09-02 한화디펜스 주식회사 위치 추정 장치 및 방법
TWI549477B (zh) * 2014-04-17 2016-09-11 聚晶半導體股份有限公司 產生深度資訊的方法與裝置
US10582179B2 (en) * 2016-02-01 2020-03-03 Samsung Electronics Co., Ltd. Method and apparatus for processing binocular disparity image
CN105956597A (zh) * 2016-05-04 2016-09-21 浙江大学 一种基于卷积神经网络的双目立体匹配方法
CN106447661A (zh) * 2016-09-28 2017-02-22 深圳市优象计算技术有限公司 一种深度图快速生成方法
CN106679567A (zh) * 2017-02-14 2017-05-17 成都国铁电气设备有限公司 基于双目立体视觉的接触网及支柱几何参数检测测量系统
CN107316326B (zh) * 2017-06-29 2020-10-30 海信集团有限公司 应用于双目立体视觉的基于边的视差图计算方法和装置
CN108230235B (zh) * 2017-07-28 2021-07-02 北京市商汤科技开发有限公司 一种视差图生成系统、方法及存储介质
CN107506711B (zh) * 2017-08-15 2020-06-30 江苏科技大学 基于卷积神经网络的双目视觉障碍物检测系统及方法
CN108257165B (zh) * 2018-01-03 2020-03-24 上海兴芯微电子科技有限公司 图像立体匹配方法、双目视觉设备
CN108381549B (zh) * 2018-01-26 2021-12-14 广东三三智能科技有限公司 一种双目视觉引导机器人快速抓取方法、装置及存储介质
CN108961327B (zh) * 2018-05-22 2021-03-30 深圳市商汤科技有限公司 一种单目深度估计方法及其装置、设备和存储介质
CN109191512B (zh) * 2018-07-27 2020-10-30 深圳市商汤科技有限公司 双目图像的深度估计方法及装置、设备、程序及介质

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030072483A1 (en) * 2001-08-10 2003-04-17 Stmicroelectronics, Inc. Method and apparatus for recovering depth using multi-plane stereo and spatial propagation
CN101908230A (zh) * 2010-07-23 2010-12-08 东南大学 一种基于区域深度边缘检测和双目立体匹配的三维重建方法
US20150206307A1 (en) * 2014-01-20 2015-07-23 Nokia Corporation Visual Perception Matching Cost On Binocular Stereo Images
CN107767413A (zh) * 2017-09-20 2018-03-06 华南理工大学 一种基于卷积神经网络的图像深度估计方法
CN109887019A (zh) * 2019-02-19 2019-06-14 北京市商汤科技开发有限公司 一种双目匹配方法及装置、设备和存储介质

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111260538A (zh) * 2018-12-03 2020-06-09 北京初速度科技有限公司 基于长基线双目鱼眼相机的定位及车载终端
CN111260538B (zh) * 2018-12-03 2023-10-03 北京魔门塔科技有限公司 基于长基线双目鱼眼相机的定位及车载终端
CN112819777A (zh) * 2021-01-28 2021-05-18 重庆西山科技股份有限公司 一种双目内窥镜辅助显示方法、系统、装置和存储介质
CN112819777B (zh) * 2021-01-28 2022-12-27 重庆西山科技股份有限公司 一种双目内窥镜辅助显示方法、系统、装置和存储介质

Also Published As

Publication number Publication date
US20210042954A1 (en) 2021-02-11
JP7153091B2 (ja) 2022-10-13
SG11202011008XA (en) 2020-12-30
CN109887019B (zh) 2022-05-24
JP2021526683A (ja) 2021-10-07
CN109887019A (zh) 2019-06-14
KR20200136996A (ko) 2020-12-08

Similar Documents

Publication Publication Date Title
WO2020168716A1 (zh) 双目匹配方法及装置、设备和存储介质
US11983850B2 (en) Image processing method and apparatus, device, and storage medium
WO2020156143A1 (zh) 三维人体姿态信息检测方法及装置、电子设备、存储介质
WO2022237081A1 (zh) 妆容迁移方法、装置、设备和计算机可读存储介质
US11698529B2 (en) Systems and methods for distributing a neural network across multiple computing devices
WO2022151661A1 (zh) 一种三维重建方法、装置、设备及存储介质
CN112423191B (zh) 一种视频通话设备和音频增益方法
WO2022165722A1 (zh) 单目深度估计方法、装置及设备
JP2019121349A (ja) 視差マップを生成するための方法、画像処理デバイス、およびシステム
CN113537254A (zh) 图像特征提取方法、装置、电子设备及可读存储介质
CN114742703A (zh) 双目立体全景图像的生成方法、装置、设备和存储介质
CN114677350A (zh) 连接点提取方法、装置、计算机设备及存储介质
WO2022126921A1 (zh) 全景图片的检测方法、装置、终端及存储介质
KR20180000696A (ko) 적어도 하나의 라이트필드 카메라를 사용하여 입체 이미지 쌍을 생성하는 방법 및 장치
CN111091117B (zh) 用于二维全景图像的目标检测方法、装置、设备、介质
CN111814811A (zh) 图像信息提取方法、训练方法及装置、介质和电子设备
CN109961092A (zh) 一种基于视差锚点的双目视觉立体匹配方法及系统
CN111161138A (zh) 用于二维全景图像的目标检测方法、装置、设备、介质
CN114663599A (zh) 一种基于多视图的人体表面重建方法及系统
WO2021208630A1 (zh) 标定方法、标定装置及应用其的电子设备
CN111382753B (zh) 光场语义分割方法、系统、电子终端及存储介质
CN114078113A (zh) 用于基于代价-体注意力的视差估计的系统和方法
Zhang et al. Spatio-temporal attention graph for monocular 3d human pose estimation
CN114494612A (zh) 构建点云地图的方法、装置和设备
CN113452981B (zh) 图像处理方法、装置、电子设备及存储介质

Legal Events

Date Code Title Description
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19915869; Country of ref document: EP; Kind code of ref document: A1
ENP | Entry into the national phase | Ref document number: 20207031264; Country of ref document: KR; Kind code of ref document: A
ENP | Entry into the national phase | Ref document number: 2020565808; Country of ref document: JP; Kind code of ref document: A
NENP | Non-entry into the national phase | Ref country code: DE
122 | Ep: pct application non-entry in european phase | Ref document number: 19915869; Country of ref document: EP; Kind code of ref document: A1