US11158077B2 - Disparity estimation - Google Patents
- Publication number
- US11158077B2 (US17/127,540)
- Authority
- US
- United States
- Prior art keywords
- disparity
- image
- stage
- processing
- size
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
- G06T7/593—Depth or shape recovery from multiple images from stereo images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
-
- G06K9/46—
-
- G06K9/6232—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/13—Edge detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/30—Determination of transform parameters for the alignment of images, i.e. image registration
- G06T7/33—Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
- G06T7/337—Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving reference images or patches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/60—Analysis of geometric attributes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20016—Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20228—Disparity calculation for image-based rendering
Definitions
- the present disclosure relates to the field of computer vision technologies, and more particularly, to a disparity estimation method, an electronic device, and a computer-readable storage medium.
- computer vision technologies can be used to obtain the disparity between each pair of matching pixels in two images of the same scene captured from different angles of view, so as to obtain a disparity map, and depth information of the scene can be obtained based on the disparity map.
- Depth information may be used in various fields such as three-dimensional reconstruction, automated driving, and obstacle detection.
- methods for obtaining disparity by using computer vision technologies may include local area matching methods, global optimization methods, semi-global methods, and methods based on neural networks such as convolutional neural networks.
- a disparity estimation method includes: performing feature extraction on each image in an image pair; and performing cascaded multi-stage disparity processing according to extracted image features to obtain multiple disparity maps with increasing sizes, e.g., successively increasing sizes.
- An input of a first stage disparity processing in the multi-stage disparity processing includes multiple image features each having a size corresponding to the first stage disparity processing, and an input of disparity processing of each stage other than the first stage disparity processing in the multi-stage disparity processing includes: one or more image features each having a size corresponding to disparity processing of the stage and a disparity map generated by disparity processing of an immediate previous stage.
- an electronic device includes a processor and a memory that stores a program.
- the program including instructions that, when executed by the processor, cause the processor to perform the method according to the present disclosure.
- a computer-readable storage medium that stores a program
- the program including instructions that, when executed by a processor of an electronic device, cause the electronic device to perform the method according to the present disclosure.
- FIG. 1 is a block diagram illustrating a disparity estimation system according to exemplary embodiments of the present disclosure
- FIG. 2 is a schematic diagram illustrating basic structure features of an image according to exemplary embodiments of the present disclosure
- FIG. 3 is a schematic diagram illustrating semantic features of an image according to exemplary embodiments of the present disclosure
- FIG. 4 is a schematic diagram illustrating edge features of an image according to some exemplary embodiments of the present disclosure
- FIG. 5 is a block diagram illustrating a possible overall structure of a disparity estimation system according to exemplary embodiments of the present disclosure
- FIG. 6 is a block diagram illustrating another possible overall structure of a disparity estimation system according to exemplary embodiments of the present disclosure
- FIGS. 7A and 7B are respectively schematic diagrams illustrating a reference image and a corresponding disparity map with ground truth on which the network is based according to exemplary embodiments of the present disclosure
- FIG. 8 is a schematic diagram illustrating multiple disparity maps with successively increasing sizes from right to left according to exemplary embodiments of the present disclosure, in which the multiple disparity maps are obtained by performing cascaded multi-stage disparity processing on the reference image shown in FIG. 7A by using a trained disparity estimation system;
- FIG. 9 is a flowchart illustrating a disparity estimation method according to exemplary embodiments of the present disclosure.
- FIG. 10 is a flowchart illustrating multi-stage disparity processing according to exemplary embodiments of the present disclosure.
- FIG. 11 is a block diagram illustrating an exemplary computing device applicable to exemplary embodiments of the present disclosure.
- “first”, “second”, etc. used to describe various elements are not intended to limit the positional, temporal or importance relationship of these elements, but rather only to distinguish one component from the other.
- first element and the second element may point to the same instance of the elements, and in some cases, based on contextual descriptions, they may also refer to different instances.
- the local area matching method mainly includes operations such as matching cost computation, cost aggregation, disparity computation, and disparity refinement. It has a high speed and low energy consumption, but its effectiveness depends on algorithm parameters (such as the size of the matching window), so it has difficulty meeting the requirements of complex scenes.
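The first three operations named above can be illustrated with a minimal sketch. It assumes a rectified grayscale pair, a sum-of-absolute-differences (SAD) matching cost, box-window aggregation, and winner-takes-all disparity selection. These choices and the names `box_sum`, `block_match`, `max_disp`, and `radius` are illustrative assumptions, not details given in the disclosure. Disparity refinement is omitted.

```python
import numpy as np

def box_sum(cost, radius):
    """Cost aggregation: sum a 2-D cost slice over a (2r+1) x (2r+1) window."""
    k = 2 * radius + 1
    padded = np.pad(cost, radius, mode="edge")
    # Summed-area table with a leading zero row and column.
    sat = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    return sat[k:, k:] - sat[:-k, k:] - sat[k:, :-k] + sat[:-k, :-k]

def block_match(left, right, max_disp, radius=2):
    """Return a left-referenced disparity map from SAD block matching."""
    left = left.astype(np.float64)
    right = right.astype(np.float64)
    h, w = left.shape
    costs = np.empty((max_disp + 1, h, w))
    for d in range(max_disp + 1):
        # Left pixel x is compared with right pixel x - d.
        shifted = np.empty_like(right)
        shifted[:, d:] = right[:, : w - d]
        if d > 0:
            shifted[:, :d] = right[:, :1]  # replicate the border column
        # Matching cost computation followed by cost aggregation.
        costs[d] = box_sum(np.abs(left - shifted), radius)
    # Disparity computation: winner-takes-all over the candidates.
    return costs.argmin(axis=0)

# Synthetic pair: the left view is the right view shifted by 3 pixels.
rng = np.random.default_rng(0)
right_img = rng.random((20, 40))
left_img = np.roll(right_img, 3, axis=1)
disp = block_match(left_img, right_img, max_disp=6, radius=2)
```

The window radius is the kind of parameter the text refers to: a larger window is more robust in flat regions but blurs disparity edges.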
- the global optimization method has better matching accuracy. It makes an assumption for the smoothing term and transforms disparity computation into an energy optimization problem. Further, most global optimization methods have no cost aggregation step. Instead, by considering the matching cost and the smoothing term, an energy function is proposed for the global point, and the disparity is obtained by minimizing the energy function. However, compared with the local area matching method, the global optimization method requires more computation and has higher energy consumption.
- the semi-global method can balance matching accuracy and computation speed to a certain extent. Unlike the global algorithm that optimizes the global point, the semi-global method divides the energy function of each point into paths in multiple directions, solves the value of each path, and then adds the values of all paths to obtain the energy of the point. The value of each path can be solved by dynamic programming. However, compared with the local area matching method, the semi-global method also requires more computation and has higher energy consumption.
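The path-wise dynamic programming described above can be sketched as follows. The recurrence is the commonly used semi-global matching formulation with a small penalty `p1` for one-level disparity changes and a larger penalty `p2` for bigger jumps. The disclosure does not specify the penalties, the number of paths, or the function names, so these are assumptions made for illustration (four axis-aligned paths are used here).

```python
import numpy as np

def aggregate_left_to_right(cost, p1, p2):
    """Dynamic programming along one path; cost has shape (H, W, D)."""
    h, w, n_disp = cost.shape
    agg = np.empty((h, w, n_disp), dtype=np.float64)
    agg[:, 0] = cost[:, 0]
    for x in range(1, w):
        prev = agg[:, x - 1]                        # (H, D)
        prev_min = prev.min(axis=1, keepdims=True)  # (H, 1)
        from_lower = np.full_like(prev, np.inf)
        from_lower[:, 1:] = prev[:, :-1] + p1       # came from disparity d - 1
        from_upper = np.full_like(prev, np.inf)
        from_upper[:, :-1] = prev[:, 1:] + p1       # came from disparity d + 1
        best = np.minimum(np.minimum(prev, from_lower),
                          np.minimum(from_upper, prev_min + p2))
        # Subtracting prev_min keeps the path values bounded.
        agg[:, x] = cost[:, x] + best - prev_min
    return agg

def semi_global_disparity(cost, p1=1.0, p2=4.0):
    """Add the values of four paths and take the per-pixel minimum."""
    cost = np.asarray(cost, dtype=np.float64)
    total = aggregate_left_to_right(cost, p1, p2)
    total = total + aggregate_left_to_right(cost[:, ::-1], p1, p2)[:, ::-1]
    ct = cost.transpose(1, 0, 2)  # swap rows and columns for vertical paths
    total = total + aggregate_left_to_right(ct, p1, p2).transpose(1, 0, 2)
    total = total + aggregate_left_to_right(ct[:, ::-1], p1, p2)[:, ::-1].transpose(1, 0, 2)
    return total.argmin(axis=2)

# Toy cost volume: disparity 2 is correct everywhere, with one noisy pixel.
volume = np.ones((5, 5, 4))
volume[:, :, 2] = 0.0
volume[2, 2] = [0.0, 1.0, 0.5, 1.0]
raw = volume.argmin(axis=2)
smoothed = semi_global_disparity(volume)
```

In the toy volume, per-pixel selection picks the wrong disparity at the noisy pixel, while the summed path values favor the disparity supported by its neighbors.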
- the method based on a neural network such as a convolutional neural network (CNN) can obtain a larger receptive field by constructing a disparity network, and can have better disparity prediction capability in untextured regions of the image.
- However, its computation amount is related to the parameters of the neural network and the image size. The more complex the network parameters and the larger the image size, the greater the memory consumption and the lower the running speed.
- the present disclosure provides a disparity estimation system that can perform cascaded multi-stage disparity processing based on extracted image features of each image in an image pair, to obtain multiple disparity maps with increasing sizes, e.g., successively increasing sizes.
- the input of a first stage disparity processing in the multi-stage disparity processing may include multiple image features each having a size corresponding to the first stage disparity processing.
- the input of disparity processing of each stage other than the first stage disparity processing in the multi-stage disparity processing may include: one or more image features each having a size corresponding to the disparity processing of the stage and a disparity map generated by disparity processing of an immediate previous stage.
- with the disparity estimation system, cascaded multi-stage disparity processing is performed on the extracted image features, in which the input of the disparity processing of each stage includes the image feature having the size corresponding to that stage. Multiple disparity maps of different sizes can thus be obtained at one time for use by multiple target devices with different performance or different accuracy requirements, such that the accuracy and speed requirements of different target devices can be met, and the flexibility and applicability of the disparity estimation system can also be improved.
- Exemplary embodiments of the disparity estimation system of the present disclosure will be further described below with reference to the accompanying drawings.
- FIG. 1 is a block diagram illustrating a disparity estimation system according to exemplary embodiments of the present disclosure.
- the disparity estimation system 100 may include, for example, a feature extraction network 200 and a disparity generation network 300.
- the feature extraction network 200 is configured to perform feature extraction on each image in an image pair and output extracted image features to the disparity generation network 300.
- the disparity generation network 300 is configured to perform cascaded multi-stage disparity processing according to the extracted image features to obtain multiple disparity maps with increasing sizes, e.g., successively increasing sizes.
- the input of a first stage disparity processing in the multi-stage disparity processing includes multiple image features each having a size corresponding to the first stage disparity processing.
- the input of disparity processing of each stage other than the first stage disparity processing of the multi-stage disparity processing includes: one or more image features each having a size corresponding to the disparity processing of the stage and a disparity map generated by disparity processing of an immediate previous stage.
- cascaded multi-stage disparity processing may be performed based on the extracted image features of each image in the image pair to obtain multiple disparity maps with increasing sizes, e.g., successively increasing sizes.
- the input of the disparity processing of each stage may include an image feature having the size corresponding to the disparity processing of the stage.
- the image pair may be an image pair for the same scene captured by a multiocular camera.
- the images in the image pair have the same size but different angles of view.
- the image pair may also be an image pair that meets the requirements and is acquired in other manners (e.g., from third-party devices).
- each image in the image pair may be a grayscale image or a color image.
- the multiocular camera refers to a camera configured with two, three or more lenses and capable of performing static or dynamic image photographing. Through its multiple lenses, it can cover scenes of different angles of view or ranges, so as to enhance its capability for detecting objects in the scene.
- a binocular camera configured with two lenses (e.g., a left lens and a right lens)
- the binocular camera can capture, through the configured two lenses, two images (e.g., a left-view image and a right-view image) of the scene with the same size and different photographing angles.
- the image pair formed by the two images may be used to determine displacement (e.g., horizontal displacement), i.e., disparity, of objects in the scene between corresponding pixels in the two images, so as to determine depth information such as distance of the object.
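As a worked illustration of how disparity yields distance, the sketch below applies the standard pinhole relation for a rectified stereo pair, Z = f · B / d, where f is the focal length in pixels, B is the baseline between the two lenses, and d is the disparity in pixels. The disclosure does not state this formula or any camera parameters. The function name and the values of `focal_length_px` and `baseline_m` are illustrative assumptions.

```python
import numpy as np

def disparity_to_depth(disparity, focal_length_px, baseline_m, eps=1e-6):
    """Convert a disparity map in pixels to depth in meters via Z = f * B / d."""
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.inf)  # zero disparity maps to infinity
    valid = disparity > eps
    depth[valid] = focal_length_px * baseline_m / disparity[valid]
    return depth

# Assumed camera: 800-pixel focal length, 0.1 m baseline.
disparity_map = np.array([[8.0, 16.0], [0.0, 32.0]])
depth_map = disparity_to_depth(disparity_map, focal_length_px=800.0, baseline_m=0.1)
```

Larger disparities correspond to closer objects, and a disparity of zero corresponds to a point at infinity.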
- the disparity estimation system 100 and the multiocular camera may be independent of each other.
- the disparity estimation system 100 can perform, by the feature extraction network 200 , feature extraction on each image in the image pair for the same scene captured by the multiocular camera, and perform, by the disparity generation network 300 , cascaded multi-stage disparity processing on the extracted image features, so as to obtain multiple disparity maps with increasing sizes, e.g., successively increasing sizes.
- the multiocular camera may also be part of the disparity estimation system 100.
- the disparity estimation system 100 may include the multiocular camera, in addition to the feature extraction network 200 and the disparity generation network 300.
- the image feature of each image in the image pair extracted by the feature extraction network 200 of the disparity estimation system 100 may include one or more of: a basic structure feature, a semantic feature, an edge feature, a texture feature, a color feature, an object shape feature, or an image-self-based feature.
- FIG. 2 illustrates, in three images (a), (b) and (c) (e.g., grayscale images or color images), a schematic diagram of basic structure features of an image that may be extracted according to exemplary embodiments of the present disclosure.
- the basic structure feature may refer to the feature for reflecting various fine structures of the image.
- FIG. 3 illustrates, in four images (a), (b), (c) and (d) (e.g., grayscale images or color images), a schematic diagram of semantic features of an image that may be extracted according to exemplary embodiments of the present disclosure.
- the semantic feature may refer to the feature that can distinguish different objects or different types of objects in the image.
- accuracy of disparity determination of an ambiguous region (e.g., a large flat region) of the image can be improved based on the semantic feature.
- FIG. 4 illustrates, in two images (a) and (b) (e.g., grayscale images or color images), a schematic diagram of edge features of an image that may be extracted according to exemplary embodiments of the present disclosure.
- the edge feature may refer to the feature that can reflect boundary information of the object or the region in the image.
- the texture feature may refer to the feature that can reflect the texture of the image
- the color feature may refer to the feature that can reflect the color of the image
- the object shape feature may refer to the feature that can reflect the shape of the object in the image.
- the image-self-based feature may refer to the image itself, or the image obtained by upsampling or downsampling the image itself with a certain coefficient or ratio.
- the coefficient or the ratio for the upsampling or the downsampling may be, for example, 2, 3, or other values greater than 1.
- each image feature other than the image-self-based feature can be obtained by performing feature extraction on the image with a corresponding feature extraction sub-network, so as to improve the efficiency of image feature extraction and thus improve the efficiency of disparity estimation.
- feature extraction may be performed on the image from at least three different dimensions of the basic structure feature, the semantic feature, and the edge feature.
- the feature extraction network 200 may include multiple feature extraction sub-networks respectively configured to extract different features of the image.
- the multiple feature extraction sub-networks may include at least a basic structure feature sub-network 201 configured to extract basic structure features of the image, a semantic feature sub-network 202 configured to extract semantic features of the image, and an edge feature sub-network 203 configured to extract edge features of the image.
- the basic structure feature sub-network 201 may adopt any network that can be configured to extract basic structure features of the image, such as VGG (very deep convolutional networks for large-scale image recognition) or ResNet (Residual Network).
- the semantic feature sub-network 202 may adopt any network that can be configured to extract semantic features of the image, such as DeepLabV3+ (an encoder-decoder with atrous separable convolution for semantic image segmentation).
- the edge feature sub-network 203 may adopt any network that can be configured to extract edge features of the image, such as a HED (holistically-nested edge detection) network.
- the HED network may adopt the VGG as the backbone network, and when the edge feature sub-network 203 adopts the HED network, the basic structure feature sub-network 201 and the edge feature sub-network 203 may adopt the same VGG network, to simplify the structure of the feature extraction network.
- the feature extraction network 200 or the feature extraction sub-networks included in the feature extraction network 200 may be an extraction network pre-trained based on a training sample set, such that the efficiency of image feature extraction, and thus the efficiency of disparity estimation, can be improved.
- the feature extraction network 200 or the feature extraction sub-networks included in the feature extraction network 200 may also be obtained by real-time training based on a training sample set, or obtained by refining the pre-trained extraction network in real time or periodically based on an updated training sample set, so as to improve the accuracy of features extracted by the feature extraction network.
- the feature extraction network 200 or the feature extraction sub-networks included in the feature extraction network 200 may be trained with supervised training or unsupervised training, which may be flexibly selected according to actual requirements.
- Supervised training usually uses existing training samples (such as labeled data) to learn mapping from input to output, and then applies the mapping relationship to unknown data for classification or regression.
- Supervised training algorithms may include, for example, a logistic regression algorithm, a support vector machine (SVM) algorithm, a decision tree algorithm, etc.
- the difference between unsupervised training and supervised training lies in that unsupervised training does not require labeled training samples. It directly models unlabeled data to discover patterns in the data.
- Typical algorithms of unsupervised training may include a clustering algorithm, a random forest algorithm, etc.
- the input of the first stage disparity processing in the multi-stage disparity processing may include multiple image features each having a size corresponding to the first stage disparity processing
- the input of disparity processing of each stage other than the first stage disparity processing of the multi-stage disparity processing may include one or more image features each having a size corresponding to the disparity processing of the stage.
- the multiple disparity maps obtained by the disparity generation network 300 are N disparity maps with successively increasing sizes
- the image features extracted by the feature extraction network 200 may include image features of N sizes, where N may be a positive integer not less than 2.
- N may be 4 (as shown in FIG. 5 or FIG. 6). In other examples, N may also be 2, 3, 5 or another value according to actual requirements. In addition, a larger N is not always better, and N can be selected as a proper value that balances the accuracy requirement of the target device against the speed of the disparity estimation system.
- the size of each image may refer to the size of a single channel of each image, which can be represented by the height and width of the image.
- the size of the image may be expressed as H × W, in which H is the height of the image, W is the width of the image, and the two may be in units of pixels.
- the size of the image may also be represented by one or more of parameters that can reflect the number of pixels, the data amount, the storage amount, the definition of the image, and the like.
- the size of each image in the image pair (i.e., the size of the original image that has not been downsampled and/or upsampled) may be determined according to parameters such as the size and number of pixels of sensors of the multiocular camera configured to capture the image pair.
- the size corresponding to the disparity processing of each stage may be consistent with the size of the disparity map required to be obtained by the disparity processing of each stage.
- the size of the image feature may refer to the size of a single channel of the image formed by the image feature itself, or the size of an extracted image on which extraction of an image feature of a required size is based.
- the extracted image may be any image in the image pair, or the image obtained by upsampling or downsampling the image with a certain coefficient or ratio.
- the size of the image in the image pair is H × W (which may be referred to as full size)
- the image feature of the full size extracted for the image may be the image feature obtained by performing feature extraction on the image itself, and the image feature of H/2 × W/2 size (which may be referred to as 1/2 size) extracted for the image may be the image feature obtained by performing 2 times downsampling on the image to obtain an image of 1/2 size and then performing feature extraction on the image of the 1/2 size.
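The "downsample first, then extract" order described above can be sketched as follows. Average pooling stands in for the downsampling operation and a gradient-magnitude map stands in for a feature extraction sub-network. Both are illustrative assumptions, as are the names `downsample`, `toy_edge_feature`, and `features_at_sizes`. The disclosure does not prescribe either operation.

```python
import numpy as np

def downsample(image, factor):
    """Average-pool a single-channel image by an integer factor (assumed method)."""
    h, w = image.shape
    h2, w2 = h // factor, w // factor
    cropped = image[: h2 * factor, : w2 * factor]
    return cropped.reshape(h2, factor, w2, factor).mean(axis=(1, 3))

def toy_edge_feature(image):
    """Hypothetical stand-in for a feature extraction sub-network."""
    grad_rows, grad_cols = np.gradient(image.astype(np.float64))
    return np.hypot(grad_rows, grad_cols)

def features_at_sizes(image, factors=(1, 2, 4, 8)):
    """Map each size (H/f, W/f) to the feature extracted at that size."""
    features = {}
    for factor in factors:
        scaled = image if factor == 1 else downsample(image, factor)
        features[scaled.shape] = toy_edge_feature(scaled)
    return features

full_size_image = np.random.default_rng(0).random((64, 96))
pyramid = features_at_sizes(full_size_image)
```

Each feature in `pyramid` has the same height and width as the image it was extracted from, which matches the second reading of "size of the image feature" given above.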
- the input of disparity processing of each stage other than the first stage disparity processing of the multi-stage disparity processing may further include the disparity map generated by disparity processing of an immediate previous stage.
- the disparity map generated by the first stage disparity processing may be refined stage by stage to obtain the disparity map of the corresponding size.
- the image feature of the minimum size in the image features of the N sizes extracted by the feature extraction network 200 may include, for example, at least one type of image feature of a first image and at least one type of image feature of a second image in the image pair, and the image feature of each non-minimum size in the image features of the N sizes may include, for example, at least one type of image feature of the first image and/or at least one type of image feature of the second image in the image pair.
- the image feature of the minimum size in the image features of the N sizes extracted by the feature extraction network 200 may include the basic structure feature, the semantic feature and the edge feature of the first image (e.g., a left-view image) in the image pair, and the basic structure feature of the second image (e.g., a right-view image) in the image pair.
- the image feature of each non-minimum size in the image features of the N sizes extracted by the feature extraction network 200 may include the edge feature of the first image in the image pair or the image-self-based feature of the first image.
- each type of image feature of each image extracted by the feature extraction network 200 may have one or more sizes, and the number of sizes may be less than or equal to N.
- N may be 4, the extracted edge feature and the image-self-based feature of the first image may have two sizes respectively, the extracted basic structure feature and semantic feature of the first image may have one size, and the extracted basic structure feature of the second image may have one size.
- FIG. 5 or FIG. 6 is merely an example.
- each type of image feature of each image extracted by the feature extraction network 200 may have more sizes.
- N is 4, and the edge feature of the first image extracted by the feature extraction network 200 may have three or four sizes, which is not limited.
- the feature extraction network 200 may store (e.g., cache) the image features in a storage device or a storage medium for subsequent reading and use.
- the feature extraction network 200 may further perform epipolar rectification on the images in the image pair, such that the images in the image pair have disparity in one direction (e.g., a horizontal direction or a vertical direction).
- the disparity search range of the image can be limited to one direction, thereby improving the efficiency of subsequent feature extraction and disparity generation.
- the epipolar rectification operation on the images in the image pair may be performed by the multiocular camera or other third-party devices.
- the multiocular camera may perform epipolar rectification on the images in the image pair, and send the rectified image pair to the disparity estimation system.
- the multiocular camera may also send the image pair to a third-party device, which performs epipolar rectification on the images in the image pair and sends the rectified image pair to the disparity estimation system.
- the size of the disparity map having a maximum size in the multiple disparity maps obtained by the disparity generation network 300 may be consistent with the size of each image in the image pair (i.e., the original size of each image). Therefore, through the cascaded multi-stage disparity processing, at least a disparity map having a corresponding size consistent with the size of each image in the image pair and having a relatively high accuracy, and a disparity map with another accuracy can be obtained, such that requirements of high-performance target devices for accuracy of the disparity map generated by the disparity estimation system can be met, while improving flexibility and the applicability of the disparity estimation system.
- the size of each disparity map in the multiple disparity maps may be less than the size of each image in the image pair.
- the height and width of the latter disparity map may be respectively twice the height and width of the previous disparity map.
- the size of the last disparity map in the 4 disparity maps is H × W (which may be consistent with the size of each image in the image pair), and then the size of other three disparity maps arranged before the last disparity map may be successively:
- H/2 × W/2 (which may be referred to as 1/2 size if H × W size is referred to as full size),
- the value 2 is used as a scaling step for the height and width of the adjacent disparity maps (or the coefficient or ratio of upsampling or downsampling of the adjacent disparity maps).
- the height and width of the latter disparity map may also be respectively 3 times, 4 times, or other times (e.g., a positive integer greater than 1, which may be selected according to the actual accuracy required) the height and width of the previous disparity map.
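
The relationship between the scaling step and the sizes of the successive disparity maps can be sketched as follows. This is a minimal illustration that assumes H and W are divisible by the largest scale factor; the function name `disparity_map_sizes` and the 480 × 640 input are hypothetical.

```python
def disparity_map_sizes(height, width, num_maps, step=2):
    """Sizes of num_maps disparity maps in ascending order.

    The last map is full size; each earlier map is smaller than the next one
    by `step` in both height and width (integer division is an assumption).
    """
    sizes = []
    for level in range(num_maps - 1, -1, -1):
        factor = step ** level
        sizes.append((height // factor, width // factor))
    return sizes


print(disparity_map_sizes(480, 640, num_maps=4))
# [(60, 80), (120, 160), (240, 320), (480, 640)]
```

With `step=2` and four maps this yields the 1/8, 1/4, 1/2, and full sizes used in the examples of this description.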
- the image features extracted by the feature extraction network 200 may include image features of N sizes, and N is a positive integer not less than 2.
- the disparity generation network may be configured to: generate, in the first stage disparity processing of the multi-stage disparity processing, an initial disparity map having a minimum size according to at least a part of an image feature of the minimum size in the image features of the N sizes; and perform, in disparity processing of each subsequent stage of the multi-stage disparity processing, disparity refinement on a disparity map generated by disparity processing of an immediate previous stage according to at least a part of an image feature having a corresponding size in the image features of the N sizes, to generate a refined disparity map having the corresponding size.
- the multiple disparity maps may include at least each refined disparity map.
- the multi-stage disparity processing may include disparity processing of N+1 stages.
- the disparity generation network may be configured to: successively perform, in disparity processing of N stages other than the first stage disparity processing, disparity refinement on the disparity map generated by disparity processing of an immediate previous stage, based on at least a part of an image feature having a corresponding size in the image features of the N sizes in ascending order of sizes, to obtain N refined disparity maps with successively increasing sizes; and take the N refined disparity maps as the multiple disparity maps.
- the sizes of the N refined disparity maps correspond to the N sizes, respectively.
- the multi-stage disparity processing may include disparity processing of 4+1 stages.
- the extracted image features may include image features of 4 sizes.
- the 4 sizes are 1/8 size
- the disparity generation network may be configured to: generate, in the first stage disparity processing of the multi-stage disparity processing, an initial disparity map having the minimum size (i.e., 1/8 size) according to at least a part of the image feature of the minimum size (e.g., part or all of the basic structure feature of 1/8 size, the semantic feature of 1/8 size and the edge feature of 1/8 size of the first image, and basic structure feature of 1/8 size of the second image) in the image features of the 4 sizes; and successively perform, in disparity processing of 4 stages other than the first stage disparity processing, disparity refinement on the disparity map generated by disparity processing of an immediate previous stage based on at least a part of the image feature having a corresponding size in the image features of the 4 sizes in ascending order of sizes (e.g., successively based on part or all
- the multiple disparity maps obtained by the disparity estimation system 100 do not include the initial disparity map generated by the first stage disparity processing, but include the refined disparity maps obtained by successively refining the initial disparity map. Thereby, the accuracy of the multiple disparity maps obtained by the disparity estimation system can be improved.
- the multi-stage disparity processing may include disparity processing of N stages.
- the disparity generation network may be configured to: successively perform, in disparity processing of N-1 stages other than the first stage disparity processing, disparity refinement on a disparity map generated by disparity processing of an immediate previous stage based on at least a part of an image feature having a corresponding size in image features of N-1 non-minimum sizes of the image features of the N sizes in ascending order of sizes, to obtain N-1 refined disparity maps with successively increasing sizes; and take the initial disparity map and the N-1 refined disparity maps as the multiple disparity maps.
- the sizes of the initial disparity map and the N-1 refined disparity maps correspond to the N sizes, respectively.
- the multi-stage disparity processing may include disparity processing of 4 stages.
- the extracted image features may include image features of 4 sizes.
- the 4 sizes are 1/8 size
- the disparity generation network may be configured to: generate, in the first stage disparity processing of the multi-stage disparity processing, an initial disparity map having a minimum size (i.e., 1/8 size) according to at least a part of the image feature of the minimum size (e.g., part or all of the basic structure feature of 1/8 size, the semantic feature of 1/8 size and the edge feature of 1/8 size of the first image, and basic structure feature of 1/8 size of the second image) in the image features of the 4 sizes; and successively perform, in disparity processing of 3 stages other than the first stage disparity processing, disparity refinement on the disparity map generated by disparity processing of an immediate previous stage based on at least a part of the image feature having a corresponding size in image features of other 3 non-minimum sizes in ascending order of sizes (e.g., successively based
- the multiple disparity maps obtained by the disparity estimation system 100 include the initial disparity map generated by the first stage disparity processing of the multi-stage disparity processing, such that the processing efficiency of the disparity estimation system can be improved.
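
The two variants described above (N+1 stages that output only refined maps, and N stages that also output the initial map) can be compared with a small scheduling sketch. The function name `stage_schedule`, the flag `refine_at_min_size`, and the string size labels are hypothetical and serve only to make the stage counts concrete.

```python
def stage_schedule(feature_sizes, refine_at_min_size):
    """Return (stages, outputs) for a cascade over ascending feature sizes.

    refine_at_min_size=True  -> N+1 stages; outputs are the N refined maps.
    refine_at_min_size=False -> N stages; outputs are the initial map plus
                                the N-1 refined maps.
    """
    stages = [("initial", feature_sizes[0])]
    refine_sizes = feature_sizes if refine_at_min_size else feature_sizes[1:]
    stages += [("refine", size) for size in refine_sizes]
    outputs = stages[1:] if refine_at_min_size else stages
    return stages, outputs


sizes = ["1/8", "1/4", "1/2", "full"]
for flag in (True, False):
    stages, outputs = stage_schedule(sizes, refine_at_min_size=flag)
    print(len(stages), "stages ->", outputs)
```

In both cases the output maps cover the same N sizes; the variants differ in whether the smallest output has passed through a refinement stage, which is the accuracy versus efficiency trade-off noted above.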
- the disparity generation network 300 may be configured to: perform, in disparity processing of each stage other than the first stage disparity processing of the multi-stage disparity processing, residual computation on the disparity map generated by disparity processing of an immediate previous stage based on at least part of an image feature having a corresponding size, to obtain a residual map having the corresponding size; and combine the residual map having the corresponding size with the disparity map generated by disparity processing of the immediate previous stage to obtain a refined disparity map having the corresponding size.
- a first residual map of 1/8 size may be computed based on part or all of the extracted edge feature of 1/8 size of the first image and the initial disparity map (of 1/8 size) generated by disparity processing of the immediate previous stage, and the first residual map is combined (e.g., added) with the initial disparity map, to obtain a first refined disparity map of 1/8 size as output of disparity processing of the stage.
- a second residual map of 1/4 size may be computed based on part or all of the extracted edge feature of 1/4 size of the first image and the first refined disparity map (of 1/8 size) generated by disparity processing of the immediate previous stage, and the second residual map is combined with the first refined disparity map (e.g., the second residual map is added to an upsampled version of 1/4 size of the first refined disparity map), to obtain a second refined disparity map of 1/4 size as output of disparity processing of the stage, and so on. More specific examples will be discussed below.
- the disparity generation network 300 may further be configured to: upsample, in disparity processing of each stage other than the first stage disparity processing of the multi-stage disparity processing and before disparity refinement is performed on a disparity map generated by disparity processing of an immediate previous stage, the disparity map generated by disparity processing of the immediate previous stage to the size corresponding to the current stage disparity processing, in response to a size of the disparity map generated by disparity processing of the immediate previous stage being less than a size corresponding to a current stage disparity processing.
- the algorithm adopted for upsampling may include, for example, a nearest-neighbor interpolation algorithm, a bilinear interpolation algorithm, a deconvolution algorithm, etc. In this way, the disparity map adopted in disparity refinement of each stage may be the disparity map having the size corresponding to the disparity processing of the stage.
- disparity refinement can be performed on the initial disparity map generated to obtain the first refined disparity map of 1/8 size.
- the first refined disparity map of 1/8 size generated by disparity processing of the immediate previous stage may be upsampled to 1/4 size corresponding to the current stage disparity processing, and then based on part or all of the extracted edge feature of 1/4 size of the first image, disparity refinement may be performed on the upsampled first refined disparity map of 1/4 size to obtain the second refined disparity map of 1/4 size.
- the residual map of 1/4 size may be computed, and the residual map of the 1/4 size can be added to the upsampled first refined disparity map of the 1/4 size to obtain the second refined disparity map of 1/4 size, and so on.
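
The upsample-then-add-residual step of one refinement stage can be sketched as follows. This is an illustration under stated assumptions: nearest-neighbor interpolation is used (one of the options named above), `residual_fn` is a hypothetical stand-in for the refinement sub-network, and the disparity values are left unscaled during upsampling because the description does not state whether they are rescaled.

```python
def upsample_nearest(disp, factor):
    """Nearest-neighbor upsampling of a 2D list by an integer factor."""
    out = []
    for row in disp:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def refine(prev_disp, target_height, residual_fn):
    """One refinement stage: upsample if needed, then add a residual map."""
    factor = target_height // len(prev_disp)
    if factor > 1:  # previous map is smaller than the current stage size
        prev_disp = upsample_nearest(prev_disp, factor)
    residual = residual_fn(prev_disp)  # same size as prev_disp
    return [[d + r for d, r in zip(d_row, r_row)]
            for d_row, r_row in zip(prev_disp, residual)]


def constant_residual(disp):
    """Toy residual: +0.5 everywhere (a real network would predict this)."""
    return [[0.5 for _ in row] for row in disp]


print(refine([[1.0, 2.0]], target_height=2, residual_fn=constant_residual))
# [[1.5, 1.5, 2.5, 2.5], [1.5, 1.5, 2.5, 2.5]]
```

When the previous map already has the stage's size (as in the first refinement stage at 1/8 size), the factor is 1 and only the residual addition takes place.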
- image features based on which different refined disparity maps are generated may be image features of the same type or image features of different types.
- the image features based on which different refined disparity maps are generated may be image features of the same image or different images in the image pair. For example, as shown in FIG.
- image features based on which the first two different refined disparity maps are generated may be image features of the same type (e.g., edge features) of the same image (e.g., the first image) in the image pair, and image features based on which the two intermediate different refined disparity maps are generated may be image features of different types (e.g., the edge feature of the first image, and image-self-based feature of the first image) of the same image (e.g., the first image), and so on.
- the image feature based on which each refined disparity map is generated may include, for example, the edge feature of at least one image in the image pair and/or the image-self-based feature of at least one image in the image pair.
- a refined disparity map of the corresponding size can be generated based on the edge feature of the first image.
- for the last two stages, which correspond to larger sizes, disparity refinement may be performed on the disparity map generated by disparity processing of the immediate previous stage by using the image-self-based feature of the first image instead of the edge feature, so as to reduce the amount of computation required for feature extraction of large-size images and improve the processing efficiency of the disparity estimation system.
- FIG. 5 is merely an example.
- the image features based on which the corresponding refined disparity map is generated may also be a combination of the two, or a combination of the other extracted one or more image features, and the like.
- the image-self-based feature of the at least one image in the image pair may include, for example, the at least one image itself, or the image obtained by downsampling the at least one image itself according to the size of the refined disparity map to be generated.
- the downsampling process may include the following operations. For example, for an image with a size of H × W, in the case that the downsampling coefficient or ratio is K, a point may be selected every K points in each row and each column of the original image to form an image.
- the downsampling coefficient or ratio may be 2, 3 or other values greater than 1. Certainly, this is merely an example, and downsampling may also be implemented in other manners, for example, averaging of K points.
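
The two downsampling manners mentioned above can be sketched as follows. The point-selection version follows the description directly; the averaging version is one possible reading of "averaging of K points" (here, the mean over each K × K block), which is an assumption. Function names are hypothetical.

```python
def downsample_select(image, k):
    """Keep one point every k points in each row and each column."""
    return [row[::k] for row in image[::k]]


def downsample_average(image, k):
    """Average each non-overlapping k x k block (assumed interpretation)."""
    out = []
    for top in range(0, len(image) - k + 1, k):
        out_row = []
        for left in range(0, len(image[0]) - k + 1, k):
            block = [image[top + i][left + j]
                     for i in range(k) for j in range(k)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
print(downsample_select(image, 2))   # [[1, 3], [9, 11]]
print(downsample_average(image, 2))  # [[3.5, 5.5], [11.5, 13.5]]
```

Point selection is cheaper, while averaging reduces aliasing at the cost of extra arithmetic.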
- disparity refinement may be performed on the disparity map generated by disparity processing of the immediate previous stage by using the first image of 1/2 size and the first image itself instead of the edge feature of the corresponding size, where the first image of 1/2 size is obtained by downsampling the first image itself based on a downsampling coefficient of 2. This reduces the amount of computation required for feature extraction of large-size images and improves the processing efficiency of the disparity estimation system.
- the disparity generation network 300 may include an initial disparity generation sub-network 301 and at least one disparity refinement sub-network 302 .
- the initial disparity generation sub-network 301 and each of the at least one disparity refinement sub-network 302 are successively cascaded.
- the initial disparity generation sub-network 301 is configured to perform the first stage disparity processing
- the at least one disparity refinement sub-network 302 is configured to perform disparity processing of the stages other than the first stage disparity processing.
- FIG. 5 is used as an example below for illustrative description of the operating process of the disparity estimation system 100 including multiple feature extraction sub-networks (such as the basic structure feature sub-network 201 , the semantic feature sub-network 202 , and the edge feature sub-network 203 ), the initial disparity generation sub-network 301 , and multiple (e.g., four) disparity refinement sub-networks 302 .
- the disparity estimation system 100 may extract image features of the size required for subsequent multi-stage disparity processing on the first image I 1 and the second image I 2 , based on the multiple feature extraction sub-networks in the feature extraction network 200 .
- the basic structure
- the first image I1 may be extracted based on the edge feature sub-network 203.
- the feature extraction network 200 of the disparity estimation system 100 may further extract the image-self-based feature of 1/2 size
- the basic structure feature of 1/8 size of the first image I1, the basic structure feature of 1/8 size of the second image I2, the semantic feature of 1/8 size of the first image I1, and the edge feature of 1/8 size of the first image I1 may be output by the corresponding feature extraction sub-network to the initial disparity generation sub-network 301 for the first stage disparity processing, to obtain the initial disparity map dispS1 of 1/8 size.
- the four disparity refinement sub-networks 302 successively cascaded with the initial disparity generation sub-network 301 may successively perform disparity refinement of different stages on the initial disparity map dispS 1 respectively, so as to obtain multiple refined disparity maps with successively increasing sizes.
- the first disparity refinement sub-network may perform disparity refinement on the initial disparity map dispS1 of 1/8 size output by the initial disparity generation sub-network 301 based on (part or all of) the edge feature of 1/8 size of the first image I1 from the edge feature sub-network 203, to obtain a first refined disparity map dispS1_refine of 1/8 size.
- the first disparity refinement sub-network may obtain a first residual map of 1/8 size based on (part or all of) the edge feature of 1/8 size of the first image I1 and the initial disparity map dispS1 of 1/8 size, and add the first residual map of the 1/8 size to the initial disparity map dispS1 of the 1/8 size to obtain the first refined disparity map dispS1_refine of the 1/8 size.
- the second disparity refinement sub-network may perform disparity refinement on the first refined disparity map dispS1_refine of the 1/8 size output by the first disparity refinement sub-network based on (part or all of) the edge feature of 1/4 size of the first image I1 from the edge feature sub-network 203, to obtain a second refined disparity map dispS2_refine of 1/4 size.
- the second disparity refinement sub-network may upsample the first refined disparity map of the 1/8 size output by the first disparity refinement sub-network to 1/4 size corresponding to the current stage disparity processing, obtain the second residual map of 1/4 size based on (part or all of) the edge feature of 1/4 size of the first image I1 and the upsampled first refined disparity map of 1/4 size, and add the second residual map of the 1/4 size to the upsampled first refined disparity map of the 1/4 size to obtain the second refined disparity map dispS2_refine of the 1/4 size.
- the third disparity refinement sub-network may perform disparity refinement on the second refined disparity map dispS2_refine of the 1/4 size output by the second disparity refinement sub-network based on (part or all of) the image-self-based feature of 1/2 size of the first image I1 extracted by the feature extraction network 200, to obtain a third refined disparity map dispS3_refine of 1/2 size.
- the third disparity refinement sub-network may upsample the second refined disparity map of the 1/4 size output by the second disparity refinement sub-network to 1/2 size corresponding to the current stage disparity processing, obtain a third residual map of 1/2 size based on (part or all of) the image-self-based feature of 1/2 size of the first image and the upsampled second refined disparity map of 1/2 size, and add the third residual map of the 1/2 size to the upsampled second refined disparity map of the 1/2 size to obtain the third refined disparity map dispS3_refine of the 1/2 size.
- the fourth disparity refinement sub-network may perform disparity refinement on the third refined disparity map dispS3_refine of the 1/2 size output by the third disparity refinement sub-network based on (part or all of) the image-self-based feature of full size of the first image I1 extracted by the feature extraction network 200, to obtain a fourth refined disparity map dispS4_refine of full size.
- the fourth disparity refinement sub-network may upsample the third refined disparity map of the 1/2 size output by the third disparity refinement sub-network to full size corresponding to the current stage disparity processing, obtain a fourth residual map of full size based on (part or all of) the image-self-based feature of full size of the first image and the upsampled third refined disparity map of full size, and add the fourth residual map of the full size to the upsampled third refined disparity map of the full size to obtain the fourth refined disparity map dispS4_refine of the full size.
- the third and the fourth disparity refinement sub-networks perform disparity refinement using the image-self-based feature of the first image, such that the amount of computation is reduced.
- one or both of them may also use the edge feature or other features of the first image.
- the first disparity refinement sub-network and/or the second disparity refinement sub-network may also use the image-self-based feature of the first image instead of the edge feature, so as to further reduce the amount of computation, which is not limited in the present disclosure.
- the first refined disparity map dispS1_refine of the 1/8 size, the second refined disparity map dispS2_refine of the 1/4 size, the third refined disparity map dispS3_refine of the 1/2 size, and the fourth refined disparity map dispS4_refine of the full size may be used as the multiple disparity maps with successively increasing sizes obtained by the disparity estimation system 100 shown in FIG. 5.
- the operating process of the disparity estimation system 100 shown in FIG. 6 is similar to the operating process of the disparity estimation system 100 shown in FIG. 5 , except that the size of the initial disparity map generated by the initial disparity generation sub-network 301 is less than the size of the refined disparity map generated by the first disparity refinement sub-network, and that the initial disparity map generated by the initial disparity generation sub-network 301 is used as one of the multiple disparity maps with successively increasing sizes obtained by the disparity estimation system 100 , and details are not described herein.
- the initial disparity generation sub-network 301 and each of the at least one disparity refinement sub-network 302 may be any convolutional neural network that can implement corresponding disparity processing functions such as a two-dimensional deep convolutional neural network (2DCNN) or a three-dimensional deep convolutional neural network (3DCNN).
- when the initial disparity generation sub-network 301 adopts a 2DCNN structure for obtaining the disparity, it may include a first number of successively cascaded convolutional layers (e.g., 5 convolutional layers; other numbers may also be selected according to actual requirements).
- the convolution manner of each convolutional layer may adopt, for example, depthwise separable convolution.
- an initial disparity generation sub-network 301 will be illustrated below by using Table 1. The initial disparity generation sub-network 301 may be applied to the disparity estimation system shown in FIG. 5 and adopts the 2DCNN structure including 5 successively cascaded convolutional layers (e.g., conv1 to conv5 in Table 1). As an example, the initial disparity generation sub-network 301 adopts a MobileNetV2 network architecture.
- the corr1d layer may be configured to perform corresponding operations on the basic structure feature of 1/8 size of the first image and the basic structure feature of 1/8 size of the second image extracted by the feature extraction network 200 in FIG. 5.
- the semanS1_conv layer may be configured to perform convolution processing on the semantic feature of 1/8 size of the first image based on a 3×3 convolution kernel.
- the edgeS1_conv layer may be configured to perform convolution processing on the edge feature of 1/8 size of the first image based on a 3×3 convolution kernel.
- the concat layer may be configured to combine the features output by the corr1d layer, the semanS1_conv layer, and the edgeS1_conv layer.
- the MB_conv operation involved in the conv1 to conv5 layers may be the depthwise separable convolution operation in MobileNetV2, and the MB_conv_res operation refers to the residual depthwise separable convolution operation in MobileNetV2.
- the conv1 layer, the conv2 layer, and the conv4 layer may be respectively configured to perform a depthwise separable convolution operation on the feature output by the previous layer
- the conv3 layer and the conv5 layer may be respectively configured to perform a residual depthwise separable convolution operation on the feature output by the previous layer.
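
One reason depthwise separable convolution suits a lightweight disparity network is its reduced weight count. The sketch below compares weight counts for a standard convolution and a plain depthwise-plus-pointwise pair, ignoring biases. The channel counts are hypothetical, and the sketch does not model the full MobileNetV2 block (for example, its expansion layer).

```python
def standard_conv_weights(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (no bias)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_weights(kernel, c_in, c_out):
    """Weights in a depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = kernel * kernel * c_in  # one kernel x kernel filter per input channel
    pointwise = c_in * c_out            # 1x1 convolution that mixes channels
    return depthwise + pointwise


print(standard_conv_weights(3, 32, 32))        # 9216
print(depthwise_separable_weights(3, 32, 32))  # 1312
```

For a 3×3 kernel with 32 input and 32 output channels, the separable form needs roughly one seventh of the weights of the standard form.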
- the dispS1 layer may be configured to perform soft argmin computation on the feature output by the previous layer, to obtain the initial disparity map dispS1 of the corresponding size (i.e., the 1/8 size).
- H and W in Table 1 may represent the height and width of the image in the image pair input to the disparity estimation system 100 respectively, and D may represent the maximum disparity range of the image.
- the unit of H, W and D may be pixels.
- the value of D may be related to the focal length of each lens and/or the spacing between the lenses in the multiocular camera configured to capture the image pair.
- the number of convolutional layers of the initial disparity generation sub-network 301 adopting the 2DCNN structure may be determined according to the number of features obtained by the concat layer. For example, when the number of features obtained by the concat layer is large, the number of convolutional layers included in the initial disparity generation sub-network 301 may also be increased.
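
The soft argmin computation performed by the dispS1 layer can be sketched for a single pixel as follows. This uses the commonly used formulation (a softmax over negated costs, followed by the expected disparity index); the assumption that the preceding layer outputs one cost per candidate disparity, with lower meaning better, is not stated in this description, and the function name is hypothetical.

```python
import math


def soft_argmin(costs):
    """Differentiable disparity estimate for one pixel.

    costs[d] is the matching cost for candidate disparity d; lower is better.
    Returns the softmax(-cost)-weighted average of the disparity indices.
    """
    lowest = min(costs)  # subtract the minimum for numerical stability
    weights = [math.exp(-(c - lowest)) for c in costs]
    total = sum(weights)
    return sum(d * w for d, w in enumerate(weights)) / total


print(soft_argmin([5.0, 0.0, 5.0]))   # close to 1.0
print(soft_argmin([9.0, 9.0, 0.0]))   # close to 2.0
```

Unlike a hard argmin, the result is a continuous value, so sub-pixel disparities are possible and gradients can flow through the layer during training.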
- the initial disparity generation sub-network 301 may also adopt a 3DCNN structure for obtaining the disparity.
- the initial disparity generation sub-network 301 adopting the 3DCNN structure may include a second number of convolutional layers successively cascaded (e.g., 7 convolutional layers, which may also be other numbers according to actual requirements).
- an initial disparity generation sub-network 301 will be illustrated below by using Table 2. The initial disparity generation sub-network 301 may be applied to the disparity estimation system shown in FIG. 5 and adopts the 3DCNN structure including 7 successively cascaded convolutional layers (e.g., conv1 to conv7 in Table 2).
- the edgeS1_conv layer may be configured to perform convolution processing on the extracted edge feature of 1/8 size of the first image based on the 3×3 convolution kernel.
- the semanS1_conv layer may be configured to perform convolution processing on the extracted semantic feature of 1/8 size of the first image based on the 3×3 convolution kernel.
- the concat layer may be configured to combine the features output by the featS1 layer, the semanS1_conv layer, and the edgeS1_conv layer.
- featS1 may refer to the extracted basic structure feature of 1/8 size of the first image and the extracted basic structure feature of 1/8 size of the second image.
- the cost layer may be configured to shift features output by the concat layer.
- the conv1 layer to the conv7 layer may be respectively configured to perform the convolution operation on the feature output by the previous layer, based on the 3×3×3 convolution kernel.
- the conv2 layer, the conv4 layer, and the conv6 layer may be residual modules of the 3DCNN network.
- the conv2 layer, the conv4 layer, and the conv6 layer may be configured to perform the convolution operation on the feature output by the previous layer, and add the convolution result to the result output by the previous layer.
- the dispS1 layer may be configured to perform soft argmin computation on the feature output by the previous layer, to obtain the initial disparity map dispS1 of the corresponding size (i.e., the 1/8 size).
- H and W in Table 2 may represent the height and width of the image in the image pair input to the disparity estimation system 100, respectively.
- F may represent the number of feature channels
- 1F represents that the number of channels is F
- 3F represents that the number of channels is 3 × F, and so on.
- the number of convolutional layers of the initial disparity generation sub-network 301 adopting the 3DCNN structure may be determined according to the number of features obtained by the concat layer. For example, when the number of features obtained by the concat layer is large, the number of convolutional layers included in the initial disparity generation sub-network 301 may also be increased.
- the number of convolutional layers included in each of the at least one disparity refinement sub-network 302 may be less than the number of convolutional layers included in the initial disparity generation sub-network 301 .
- the number of convolutional layers included in each disparity refinement sub-network 302 may be 3, and may alternatively be set to other values according to actual requirements.
- each disparity refinement sub-network 302 may also adopt the 3DCNN structure, which is not limited.
- Table 3 to Table 6 successively describe the 2DCNN network structures of the first to fourth disparity refinement sub-networks of the disparity estimation system shown in FIG. 5 .
| Layer | Description | Output size |
|---|---|---|
| edgeS1_conv | from edge feature, conv 3×3, 8 features | 1/8H × 1/8W × 8 |
| concat | concat (dispS1, edgeS1_conv) | 1/8H × 1/8W × 9 |
| conv1 | MB_conv, 4 features | 1/8H × 1/8W × 4 |
| conv2 | MB_conv_res, 4 features | 1/8H × 1/8W × 4 |
| conv3 | MB_conv, 1 feature | 1/8H × 1/8W × 1 |
| dispS1_refine | add(dispS1, conv3) | 1/8H × 1/8W × 1 |
- the edgeS1_conv layer may be configured to perform convolution processing on the extracted edge feature of 1/8 size of the first image based on the 3×3 convolution kernel.
- the concat layer may be configured to combine the initial disparity map dispS1 of 1/8 size generated by disparity processing of the immediate previous stage (i.e., the initial disparity generation processing) and the feature output by the edgeS1_conv layer.
- the conv1 layer and the conv3 layer may be respectively configured to perform the depthwise separable convolution operation on the feature output by the previous layer
- the conv2 layer may be configured to perform the residual depthwise separable convolution operation on the feature output by the previous layer.
- the dispS1_refine layer may be configured to perform superimposition computation on the feature output by the previous layer (i.e., the conv3 layer) and the initial disparity map dispS1 of the 1/8 size generated by disparity processing of the immediate previous stage, to obtain the first refined disparity map dispS1_refine of the corresponding size (i.e., the 1/8 size).
- the dispS1_up layer may be configured to upsample the first refined disparity map dispS1_refine of the 1/8 size generated by disparity processing of the immediate previous stage (i.e., the first stage disparity refinement) to obtain the refined disparity map dispS1_up of 1/4 size.
- the edgeS2_conv layer may be configured to perform convolution processing on the extracted edge feature of 1/4 size of the first image based on the 3×3 convolution kernel.
- the concat layer may be configured to combine the upsampled refined disparity map dispS1_up of the 1/4 size and the feature output by the edgeS2_conv layer.
- the conv1 layer and the conv3 layer may be respectively configured to perform the depthwise separable convolution operation on the feature output by the previous layer
- the conv2 layer may be configured to perform the residual depthwise separable convolution operation on the feature output by the previous layer.
- the dispS2_refine layer may be configured to perform an addition operation on the feature output by the previous layer (i.e., the conv3 layer) and the upsampled refined disparity map dispS1_up of the 1/4 size, to obtain the second refined disparity map dispS2_refine of the corresponding size (i.e., the 1/4 size).
- the dispS2_up layer may be configured to upsample the second refined disparity map dispS2_refine of the 1/4 size generated by disparity processing of the immediate previous stage (i.e., the second stage disparity refinement) to obtain the refined disparity map dispS2_up of 1/2 size.
- the imgS3 layer may be configured to downsample the first image itself to obtain the image-self-based feature of 1/2 size of the first image.
- I 1 represents the first image.
- the concat layer may be configured to combine the upsampled refined disparity map dispS 2 _up of 1 ⁇ 2 size and the feature output by the imgS 3 layer.
- each of the conv1 layer, the conv2 layer, and the conv3 layer may be configured to perform the convolution operation on the feature output by its previous layer, respectively.
- the dispS 3 _refine layer may be configured to perform an addition operation on the feature output by the previous layer (i.e., the conv3 layer) and the upsampled refined disparity map dispS 2 _up of the 1 ⁇ 2 size, to obtain the third refined disparity map dispS 3 _refine of the corresponding size (i.e., 1 ⁇ 2 size).
- the dispS 3 _up layer may be configured to upsample the third refined disparity map dispS 3 _refine of the 1 ⁇ 2 size generated by disparity processing of the immediate previous stage (i.e., the third stage disparity refinement) to obtain the refined disparity map dispS 3 _up of full size.
- the concat layer may be configured to combine the upsampled refined disparity map dispS 3 _up of the full size and the first image itself.
- I 1 represents the first image.
- each of the conv1 layer, the conv2 layer and the conv3 layer may be configured to perform the convolution operation on the feature output by its previous layer, respectively.
- the dispS 4 _refine layer may be configured to perform an addition operation on the feature output by the previous layer (i.e., the conv3 layer) and the upsampled refined disparity map dispS 3 _up of the full size, to obtain the fourth refined disparity map dispS 4 _refine of the corresponding size (i.e., the full size).
- H and W in Tables 3-6 may respectively represent the height and width of the image in the image pair input to the disparity estimation system 100 .
- the number of convolutional layers of each disparity refinement sub-network 302 adopting the 2DCNN structure may be determined according to the number of features obtained by the concat layer. For example, when the number of features obtained by the concat layer is large, the number of convolutional layers included in each disparity refinement sub-network 302 may also be increased.
- each sub-network of the initial disparity generation sub-network 301 and the at least one disparity refinement sub-network 302 may be pre-trained based on a training sample set, such that the efficiency of disparity processing can be improved.
- each sub-network of the initial disparity generation sub-network 301 and the at least one disparity refinement sub-network 302 may be obtained by real-time training based on a training sample set, or obtained by refining pre-trained network in real time or periodically based on the updated training sample set, so as to improve the accuracy of disparity generation.
- each sub-network of the initial disparity generation sub-network 301 and the at least one disparity refinement sub-network 302 may be trained with supervised training or unsupervised training, which may be flexibly selected according to actual requirements.
- for supervised training and unsupervised training, reference may be made to the relevant descriptions in the above embodiments, and details are not described herein again.
- each sub-network of the initial disparity generation sub-network 301 and the at least one disparity refinement sub-network 302 may be configured to compute a loss function.
- the loss function may represent an error between a disparity in a disparity map generated by the sub-network and a corresponding real disparity. In this way, by computing the loss function, the accuracy of each disparity map generated by the disparity estimation system can be determined. In addition, the corresponding system may be refined based on the loss function.
- each sub-network of the initial disparity generation sub-network 301 and the at least one disparity refinement sub-network 302 is trained with supervised training
- the function f represents the difference between the estimated disparity (Disp Sn ) and the real disparity (Disp GTn )
- the function g represents the disparity continuity constraint.
- the edge feature may also be taken as a regular term of the loss function, which is not limited. Accordingly, the final loss function of the disparity estimation system 100 may be the sum of loss functions output by respective disparity processing sub-networks or disparity processing of respective stages.
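- As an illustrative sketch of the supervised loss described above, the following Python assumes that f is a mean absolute difference and that g is a first-order smoothness penalty weighted by a hypothetical factor `lam`. The text only states that f measures the difference between the estimated and the real disparity and that g is a disparity continuity constraint, so these concrete choices, like the function names, are assumptions.

```python
def stage_loss(disp, disp_gt, lam=0.1):
    """Loss of one stage: f (data term) + lam * g (continuity term).

    disp and disp_gt are 2-D lists of equal shape. The L1 data term, the
    gradient-based continuity term and the weight lam are assumptions.
    """
    h, w = len(disp), len(disp[0])
    # f: mean absolute difference between estimated and real disparity.
    f = sum(abs(disp[y][x] - disp_gt[y][x])
            for y in range(h) for x in range(w)) / (h * w)
    # g: mean absolute difference between horizontally and vertically
    # adjacent disparities, which penalises discontinuities.
    diffs = [abs(disp[y][x + 1] - disp[y][x])
             for y in range(h) for x in range(w - 1)]
    diffs += [abs(disp[y + 1][x] - disp[y][x])
              for y in range(h - 1) for x in range(w)]
    g = sum(diffs) / len(diffs) if diffs else 0.0
    return f + lam * g


def total_loss(stage_outputs, lam=0.1):
    """Final loss: sum of the per-stage losses over (disp, disp_gt) pairs."""
    return sum(stage_loss(d, gt, lam) for d, gt in stage_outputs)
```

- Summing the per-stage values in `total_loss` mirrors the statement that the final loss may be the sum of the loss functions of the respective stages.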
- the loss function of each disparity processing sub-network or disparity processing of each stage may be obtained by reconstructing the image and computing the reconstruction error.
- taking as an example the case where each sub-network of the initial disparity generation sub-network 301 and the at least one disparity refinement sub-network 302 is trained with supervised training, Scene Flow is used as the training set, and the structure of the disparity estimation system is as shown in FIG. 5, a reference image, a corresponding disparity map with ground truth, and the result obtained by applying the trained parameters to the reference image of a Middlebury dataset are illustrated with reference to FIGS. 7A, 7B, and 8.
- FIGS. 7A and 7B are respectively schematic diagrams illustrating a reference image and a corresponding disparity map with ground truth on which the network is based according to exemplary embodiments of the present disclosure.
- FIG. 8 is a schematic diagram illustrating multiple disparity maps with successively increasing sizes from right to left (i.e., the result obtained by applying the trained parameters to pictures of a Middlebury dataset) according to exemplary embodiments of the present disclosure. The multiple disparity maps are obtained by performing cascaded multi-stage disparity processing on the reference image shown in FIG. 7A by using a trained disparity estimation system.
- FIGS. 7A, 7B, and 8 respectively illustrate the reference image, the disparity map with ground truth, and the generated multiple disparity maps in the form of grayscale images. It may be understood that when the reference image shown in FIG. 7A is a color image, the disparity maps shown in FIGS. 7B and 8 may be corresponding color images.
- the disparity generation network 300 may further be configured to select, according to performance of a target device, a disparity map whose size matches the performance of the target device from the multiple disparity maps as a disparity map to be provided to the target device. For example, when the performance of the target device is high and/or accuracy of the disparity map required by the target device is high, a disparity map of a large size may be selected from the multiple disparity maps and provided to the target device.
- the target device may also actively acquire, according to its performance, the required disparity map from the multiple disparity maps obtained by the disparity estimation system, which is not limited.
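- The selection of a disparity map that matches the target device can be sketched as follows. The text does not define how device performance is measured, so the pixel budget `max_pixels` and the function name are hypothetical stand-ins.

```python
def select_disparity_map(disparity_maps, max_pixels):
    """Return the largest map whose pixel count fits within max_pixels.

    disparity_maps is a list of 2-D lists ordered by increasing size.
    max_pixels is a hypothetical proxy for the target device's performance.
    Falls back to the smallest map when none fits.
    """
    chosen = disparity_maps[0]
    for disp in disparity_maps:
        if len(disp) * len(disp[0]) <= max_pixels:
            chosen = disp
    return chosen
```

- A device with a larger budget therefore receives a larger, more detailed map, while a constrained device receives a smaller one.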
- the multiple disparity maps obtained by the disparity estimation system may also be provided to the corresponding target device for further processing.
- the multiple disparity maps may be provided to the corresponding target device, such that the target device obtains the depth map based on the disparity map, and then obtains depth information of the scene, so as to be applied to various application scenarios such as three-dimensional reconstruction, automated driving, and obstacle detection.
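- For a rectified image pair, depth follows from disparity by the standard relation depth = focal length × baseline / disparity. The sketch below applies that relation per pixel; the parameter names are illustrative, and the focal length is assumed to be expressed in pixels at the resolution of the disparity map.

```python
def disparity_to_depth(disp, focal_length_px, baseline_m):
    """Convert a disparity map (pixels) to a depth map (metres).

    Pixels with zero or negative disparity are mapped to infinity, because
    the relation depth = f * B / d is undefined there.
    """
    return [[focal_length_px * baseline_m / d if d > 0 else float("inf")
             for d in row] for row in disp]
```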
- The exemplary disparity estimation system according to the present disclosure has been described above with reference to FIGS. 1 to 8.
- Exemplary embodiments of an exemplary disparity estimation method and an exemplary electronic device according to the present disclosure will be described below with reference to FIGS. 9, 10, and 11 .
- It should be noted that various definitions, embodiments, implementations, examples and the like described above with reference to FIGS. 1 to 8 may also be applied to or combined with the exemplary embodiments described below.
- FIG. 9 is a flowchart illustrating a disparity estimation method according to exemplary embodiments of the present disclosure.
- the disparity estimation method may include the following operations: performing feature extraction on each image in an image pair (block S 901 ); and performing cascaded multi-stage disparity processing according to extracted image features to obtain multiple disparity maps with increasing sizes (block S 902 ).
- the input of the first stage disparity processing in the multi-stage disparity processing includes multiple image features each having a size corresponding to the first stage disparity processing.
- the input of disparity processing of each stage other than the first stage disparity processing in the multi-stage disparity processing includes: one or more image features each having a size corresponding to the disparity processing of the stage and a disparity map generated by disparity processing of an immediate previous stage.
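- The data flow of block S902 can be sketched as a simple driver loop. The three callables are hypothetical placeholders for the first stage disparity processing, the per-stage refinement, and the resizing of the previous disparity map; they are not components named by the disclosure.

```python
def cascaded_disparity(features_by_size, generate_initial, refine, upsample):
    """Run cascaded multi-stage disparity processing.

    features_by_size: per-size feature inputs in ascending order of size.
    generate_initial, refine, upsample: hypothetical callables standing in
    for the first stage, the per-stage refinement and the resizing step.
    Returns the disparity maps of all stages in ascending order of size.
    """
    # First stage: only image features of the minimum size are used.
    disp = generate_initial(features_by_size[0])
    maps = [disp]
    # Later stages: features of the stage's size plus the previous map.
    for feats in features_by_size[1:]:
        disp = refine(upsample(disp), feats)
        maps.append(disp)
    return maps
```

- Each later stage receives exactly two inputs, the features of its own size and the disparity map of the immediate previous stage, which is the dependency described above.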
- the image pair may be an image pair for the same scene captured by a multiocular camera.
- the images in the image pair have the same size but correspond to different angles of view.
- each image in the image pair may be a grayscale image or a color image.
- the extracted image feature of each image in the image pair may include one or more of the following features: the basic structure feature, the semantic feature, the edge feature, the texture feature, the color feature, the object shape feature, or the image-self-based feature.
- the image feature of the first image (e.g., the left-view image) in the image pair may include the basic structure feature, the semantic feature, and the edge feature
- the image feature of the second image (e.g., the right-view image) in the image pair may include the basic structure feature.
- the image feature of the first image and the second image in the image pair may include the basic structure feature, the semantic feature, the edge feature, etc.
- a size of the disparity map having the maximum size in the multiple disparity maps may be consistent with the size of each image in the image pair.
- the size of respective disparity maps of the multiple disparity maps may be less than the size of each image in the image pair.
- the height and width of the latter disparity map may be respectively twice the height and width of the previous disparity map.
- the height and width of the latter disparity map may also be respectively 3 times, 4 times, or another multiple (e.g., a positive integer greater than 1) of the height and width of the former disparity map, according to the actually required accuracy.
- the size of the last disparity map in the multiple disparity maps is H×W (which may be consistent with the size of each image in the image pair), and then the sizes of the other three disparity maps arranged before the last disparity map may be successively:
- (H/2)×(W/2) (which may be referred to as ½ size if H×W size is referred to as full size),
- (H/4)×(W/4) (which may be referred to as ¼ size), and
- (H/8)×(W/8) (which may be referred to as ⅛ size).
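- The size progression above can be computed with a short helper. This is a sketch that assumes H and W are divisible by the largest scale factor; the function and parameter names are illustrative.

```python
def disparity_map_sizes(height, width, num_maps=4, factor=2):
    """Return (height, width) of each disparity map in ascending order of size.

    Each map is `factor` times larger per dimension than the previous one,
    and the last map has the full size height x width.
    """
    sizes = [(height // factor ** k, width // factor ** k)
             for k in range(num_maps)]
    return sizes[::-1]
```

- For a 480×640 image pair with the default arguments, the helper yields the ⅛, ¼, ½ and full sizes in that order.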
- the extracted image features may include image features of N sizes, where N may be a positive integer not less than 2. Accordingly, as shown in FIG. 10 , which is a flowchart illustrating multi-stage disparity processing according to exemplary embodiments of the present disclosure, performing cascaded multi-stage disparity processing according to extracted image features to obtain multiple disparity maps with increasing sizes may include the following operations.
- an initial disparity map having a minimum size is generated according to at least a part of the image feature of the minimum size in the image features of the N sizes.
- for example, in the case where the extracted image features of the N sizes include image features of ⅛ size, ¼ size, ½ size, and the full size, the initial disparity map having the minimum size (i.e., the ⅛ size) may be generated according to at least a part of the image feature of the minimum size (i.e., the ⅛ size).
- the initial disparity map may be obtained by performing disparity shift on the corresponding image features having the corresponding size by using the 3DCNN, or the initial disparity map may be obtained by computing the difference between the shifted image features of the corresponding size by using the 2DCNN.
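- The idea of comparing shifted image features can be illustrated with a non-learned stand-in. The sketch below is not the 2DCNN or 3DCNN of the disclosure: it assumes single-channel feature maps and simply picks, per pixel, the horizontal shift with the smallest absolute feature difference (a winner-takes-all rule).

```python
def initial_disparity(left_feat, right_feat, max_disp):
    """Pick, per pixel, the horizontal shift with the smallest difference.

    left_feat and right_feat are 2-D lists (single-channel features of the
    minimum size). This winner-takes-all rule is a non-learned stand-in for
    the network described in the text.
    """
    h, w = len(left_feat), len(left_feat[0])
    disp = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best_cost = None
            # Only shifts that keep x - d inside the right feature map.
            for d in range(min(max_disp, x) + 1):
                cost = abs(left_feat[y][x] - right_feat[y][x - d])
                if best_cost is None or cost < best_cost:
                    best_cost, disp[y][x] = cost, d
    return disp
```

- A learned network would replace the hand-written cost and the argmin with convolutional layers, but the shifted comparison along one direction is the same.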
- disparity refinement is performed on a disparity map generated by disparity processing of an immediate previous stage according to at least a part of the image feature having a corresponding size in the image features of the N sizes, to generate a refined disparity map having the corresponding size, in which the multiple disparity maps include at least each refined disparity map.
- the multi-stage disparity processing may include disparity processing of N+1 stages. Accordingly, performing, in disparity processing of each subsequent stage in the multi-stage disparity processing, disparity refinement on the disparity map generated by disparity processing of the immediate previous stage according to at least a part of the image feature having the corresponding size in the image features of the N sizes to generate the refined disparity map having the corresponding size, in which the multiple disparity maps include at least each refined disparity map, may include: successively performing, in disparity processing of N stages other than the first stage disparity processing, disparity refinement on a disparity map generated by disparity processing of an immediate previous stage based on at least a part of an image feature having a corresponding size in the image features of the N sizes in ascending order of sizes, to obtain N refined disparity maps with successively increasing sizes, and taking the N refined disparity maps as the multiple disparity maps.
- the sizes of the N refined disparity maps correspond to the N sizes, respectively.
- for example, in the case where the extracted image features of the N sizes include image features of ⅛ size, ¼ size, ½ size, and full size, and the multi-stage disparity processing includes disparity processing of 4+1 stages, in disparity processing of the four stages other than the first stage disparity processing, disparity refinement may be successively performed on the disparity map generated by disparity processing of the immediate previous stage based on at least a part of the image feature having the corresponding size in the image features of the 4 sizes in ascending order of sizes, to obtain four refined disparity maps with successively increasing sizes (e.g., the refined disparity map of ⅛ size, the refined disparity map of ¼ size, the refined disparity map of ½ size, and the refined disparity map of full size), and then the four refined disparity maps may be taken as the multiple disparity maps.
- the multi-stage disparity processing may include disparity processing of N stages. Accordingly, performing, in disparity processing of each subsequent stage in the multi-stage disparity processing, disparity refinement on the disparity map generated by disparity processing of the immediate previous stage according to at least a part of the image feature having the corresponding size in the image features of the N sizes, to generate the refined disparity map having the corresponding size, in which the multiple disparity maps include at least each refined disparity map, may include: successively performing, in disparity processing of N−1 stages other than the first stage disparity processing, disparity refinement on the disparity map generated by disparity processing of the immediate previous stage based on at least a part of the image feature having the corresponding size in image features of N−1 non-minimum sizes of the image features of the N sizes in ascending order of sizes, to obtain N−1 refined disparity maps with successively increasing sizes; and taking the initial disparity map and the N−1 refined disparity maps as the multiple disparity maps.
- for example, in the case where the extracted image features of the N sizes include image features of ⅛ size, ¼ size, ½ size, and full size, and the multi-stage disparity processing includes disparity processing of 4 stages, in disparity processing of the three stages other than the first stage disparity processing, disparity refinement may be successively performed on the disparity map generated by disparity processing of the immediate previous stage based on at least a part of the image feature having the corresponding size in image features of the other 3 non-minimum sizes in ascending order of sizes, to obtain three refined disparity maps with successively increasing sizes (e.g., the refined disparity map of ¼ size, the refined disparity map of ½ size, and the refined disparity map of full size), and the initial disparity map and the three refined disparity maps may be used as the multiple disparity maps.
- the obtained multiple disparity maps may or may not include the initial disparity map generated by the first stage disparity processing, so as to improve the flexibility of disparity generation.
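- The difference between the N+1-stage and the N-stage variants can be made concrete by listing the stages of each. In this sketch the size labels, the role names and the flag `refine_min_size` are illustrative assumptions.

```python
def stage_plan(sizes, refine_min_size):
    """List (stage_index, size, role) for each stage of the cascade.

    sizes: size labels in ascending order, e.g. ["1/8", "1/4", "1/2", "full"].
    refine_min_size=True gives the N+1-stage variant, in which the minimum
    size is also refined and only the refined maps are output; False gives
    the N-stage variant, in which the initial map is one of the outputs.
    """
    plan = [(1, sizes[0], "initial")]
    refine_sizes = sizes if refine_min_size else sizes[1:]
    for i, size in enumerate(refine_sizes, start=2):
        plan.append((i, size, "refine"))
    return plan
```

- With four sizes, the first variant has five stages and outputs four refined maps, while the second has four stages and outputs the initial map plus three refined maps.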
- performing, in disparity processing of each subsequent stage in the multi-stage disparity processing, disparity refinement on the disparity map generated by disparity processing of the immediate previous stage according to at least a part of the image feature having the corresponding size in the image features of the N sizes to generate the refined disparity map having the corresponding size may include: performing, in disparity processing of each stage other than the first stage disparity processing of the multi-stage disparity processing, residual computation on a disparity map generated by disparity processing of an immediate previous stage based on at least a part of the image feature having the corresponding size, to obtain a residual map having the corresponding size, and combining the residual map having the corresponding size with the disparity map generated by disparity processing of the immediate previous stage to obtain a refined disparity map having the corresponding size.
- for example, in the case where the extracted image features of the N sizes include image features of ⅛ size, ¼ size, ½ size, and full size, and the multi-stage disparity processing includes disparity processing of 4+1 stages, a first residual map of ⅛ size may be obtained based on part or all of the extracted image feature of the ⅛ size and the initial disparity map generated by disparity processing of the immediate previous stage, and the first refined disparity map of ⅛ size can be obtained based on the first residual map and the initial disparity map.
- a second residual map of ¼ size may be obtained based on part or all of the extracted image feature of ¼ size and the first refined disparity map generated by disparity processing of the immediate previous stage, and a second refined disparity map of ¼ size can be obtained based on the second residual map and the first refined disparity map, and so on.
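- The residual form of refinement can be sketched as an element-wise addition. The callable `predict_residual` is a hypothetical stand-in for the convolutional layers of a refinement stage, and addition is assumed as the way of combining the residual map with the previous disparity map.

```python
def refine_with_residual(prev_disp, features, predict_residual):
    """Refine a disparity map by adding a residual map of the same size.

    prev_disp: 2-D list, already at the size of the current stage.
    predict_residual: hypothetical callable (prev_disp, features) -> 2-D list.
    """
    residual = predict_residual(prev_disp, features)
    return [[p + r for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(prev_disp, residual)]
```

- Because each stage predicts only a correction, the previous estimate passes through unchanged wherever the predicted residual is zero.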
- the method may further include: in response to the size of the disparity map generated by disparity processing of the immediate previous stage being less than the size corresponding to the current stage disparity processing, upsampling the disparity map generated by disparity processing of the immediate previous stage to the size corresponding to the current stage disparity processing.
- for example, in the case where the extracted image features of the N sizes still include image features of ⅛ size, ¼ size, ½ size, and full size, and the multi-stage disparity processing includes disparity processing of 4+1 stages, in disparity processing corresponding to ¼ size (i.e., disparity refinement corresponding to ¼ size) of disparity processing of 4 stages other than the first stage disparity processing, the first refined disparity map of the ⅛ size generated by disparity processing of the immediate previous stage may be upsampled to the ¼ size corresponding to the current stage disparity processing, and then disparity refinement may be performed on the upsampled first refined disparity map of the ¼ size based on part or all of the extracted image feature of the ¼ size, to obtain the second refined disparity map of the ¼ size.
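- The upsampling step can be sketched with nearest-neighbour interpolation, chosen here only for brevity since the text does not name an interpolation method. The optional `scale_values` flag is likewise an assumption: some implementations multiply disparity values by the scale factor when enlarging a map, and the text does not state which convention applies.

```python
def upsample_disparity(disp, factor=2, scale_values=False):
    """Nearest-neighbour upsampling of a 2-D disparity map by an integer factor.

    scale_values=True also multiplies each disparity by factor, a common
    convention when disparities are measured in pixels of the map's own
    resolution; the text does not state which convention is used.
    """
    gain = factor if scale_values else 1
    return [[v * gain for v in row for _ in range(factor)]
            for row in disp for _ in range(factor)]
```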
- the image feature of the minimum size in the image features of the N sizes may include, for example, at least one type of image feature of a first image and at least one type of image feature of a second image in the image pair.
- the image feature of the minimum size in the image features of the N sizes may include the basic structure feature, the semantic feature and the edge feature of the first image (e.g., the left-view image) in the image pair, and the basic structure feature of the second image (e.g., the right-view image) in the image pair.
- the image feature of each non-minimum size in the image features of the N sizes may include, for example, at least one type of image feature of the first image and/or at least one type of image feature of the second image in the image pair.
- the image feature of each non-minimum size in the image features of the N sizes may include the edge feature of the first image or the image-self-based feature of the first image in the image pair.
- image features based on which different refined disparity maps are generated may be image features of a same type or image features of different types.
- the image feature based on which different refined disparity maps are generated may be image features of the same image or different images in the image pair.
- the image feature based on which each refined disparity map is generated may include, for example, the edge feature of at least one image in the image pair and/or the image-self-based feature of at least one image in the image pair.
- the image-self-based feature of the at least one image in the image pair may include, for example, the at least one image itself, or the image obtained by downsampling the at least one image itself according to the size of the refined disparity map to be generated.
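- Obtaining an image-self-based feature by downsampling the image itself can be sketched as follows. Average pooling over a single-channel image is an assumed choice; the text only says that the image is downsampled according to the size of the refined disparity map to be generated.

```python
def downsample_image(image, factor=2):
    """Average-pool a 2-D grayscale image by an integer factor.

    Rows and columns that do not fill a complete factor x factor block
    are dropped.
    """
    h, w = len(image) // factor, len(image[0]) // factor
    return [[sum(image[y * factor + dy][x * factor + dx]
                 for dy in range(factor) for dx in range(factor)) / factor ** 2
             for x in range(w)] for y in range(h)]
```

- With `factor=1` the function returns the image values unchanged (as floats), which corresponds to using the image itself at full size.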
- the disparity estimation method may further include: computing a loss function of disparity processing of each stage in the multi-stage disparity processing.
- the loss function may represent an error between the disparity in the disparity map generated by the disparity processing of the stage and the corresponding real disparity.
- the disparity estimation method may further include: selecting, according to performance of a target device, the disparity map whose size matches the performance of the target device from the multiple disparity maps as the disparity map to be provided to the target device. For example, when the performance of the target device is high and/or accuracy of the disparity map required by the target device is high, the disparity map of a large size may be selected from the multiple disparity maps and provided to the target device.
- the target device may also actively acquire, according to its performance, the required disparity map from the multiple disparity maps obtained by the disparity estimation system.
- the disparity estimation method may further include: before image feature extraction is performed on each image in the image pair, performing epipolar rectification on the images in the image pair, such that the images in the image pair have disparity in one direction (e.g., a horizontal direction).
- the disparity search range of the image can be limited to one direction, thereby improving the efficiency of subsequent feature extraction and disparity generation.
- An aspect of the present disclosure may include an electronic device.
- the electronic device may include: a processor; and a memory that stores a program, the program including instructions that, when executed by the processor, cause the processor to perform any of the methods above.
- An aspect of the present disclosure may include a computer-readable storage medium that stores a program, the program including instructions that, when executed by a processor of an electronic device, cause the electronic device to perform any of the methods.
- the computing device 2000 is an example of a hardware device that can be applied to various aspects of the present disclosure.
- the computing device 2000 may be any machine configured to perform processing and/or computing, which may be, but is not limited to, a workstation, a server, a desktop computer, a laptop computer, a tablet computer, a personal digital assistant, a smart phone, an on-board computer, or any combination thereof.
- the above electronic device may be implemented, in whole or at least in part, by the computing device 2000 or a similar device or system.
- the computing device 2000 may include elements in connection with a bus 2002 or in communication with a bus 2002 (possibly via one or more interfaces).
- the computing device 2000 may include a bus 2002 , one or more processors 2004 , one or more input devices 2006 , and one or more output devices 2008 .
- the one or more processors 2004 may be any type of processors and may include, but are not limited to, one or more general purpose processors and/or one or more dedicated processors (e.g., special processing chips).
- the input device 2006 may be any type of device capable of inputting information to the computing device 2000 , and may include, but is not limited to, a mouse, a keyboard, a touch screen, a microphone, and/or a remote controller.
- the output device 2008 may be any type of device capable of presenting information, and may include, but is not limited to, a display, a loudspeaker, an audio/video output terminal, a vibrator, and/or a printer.
- the computing device 2000 may further include a storage device 2010 or be connected to the storage device 2010 .
- the storage device may be non-transitory and may be any storage device capable of implementing data storage, and may include, but is not limited to, a disk drive, an optical storage device, a solid-state memory, a floppy disk, a flexible disk, a hard disk, a magnetic tape, or any other magnetic medium, an optical disk or any other optical medium, a read-only memory (ROM), a random access memory (RAM), a cache memory and/or any other memory chip or cartridge, and/or any other medium from which a computer can read data, instructions, and/or codes.
- the storage device 2010 may be detachable from an interface.
- the storage device 2010 may have data/programs (including instructions)/codes for implementing the above methods and steps.
- the computing device 2000 may also include a communication device 2012 .
- the communication device 2012 may be any type of device or system capable of communicating with an external device and/or a network, and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication device, and/or a chipset, for example, a Bluetooth™ device, an 802.11 device, a WiFi device, a WiMax device, a cellular communication device, and/or the like.
- the computing device 2000 may further include a working memory 2014 , which may be any type of working memory capable of storing programs (including instructions) and/or data useful to the working of the processor 2004 , and may include, but is not limited to, a random-access memory and/or a read-only memory.
- Software elements may be located in the working memory 2014 , and may include, but are not limited to, an operating system 2016 , one or more applications (that is, application programs) 2018 , drivers, and/or other data and codes. Instructions for performing the above-mentioned methods and steps may be included in the one or more applications 2018 , and the feature extraction network 200 and the disparity generation network 300 of the above disparity estimation system 100 may be implemented by the processor 2004 by reading and executing instructions of the one or more applications 2018 . More specifically, the feature extraction network 200 of the above disparity estimation system 100 may be implemented, for example, by the processor 2004 by executing the application 2018 having an instruction for performing block S 901 .
- the disparity generation network 300 of the above disparity estimation system 100 may be implemented, for example, by the processor 2004 by executing the application 2018 having an instruction for performing block S 902 , and so on.
- Executable codes or source codes of the instructions of the software elements (programs) may be stored in a non-transitory computer-readable storage medium (e.g., the above storage device 2010), and may be stored in the working memory 2014 when executed (possibly after being compiled and/or installed).
- the executable codes or source codes of the instructions of the software elements (programs) may also be downloaded from a remote location.
- custom hardware may also be used, and/or specific elements may be implemented in hardware, software, firmware, middleware, microcodes, hardware description languages, or any combination thereof.
- some or all of the disclosed methods and devices may be implemented by programming hardware (e.g., a programmable logic circuit including a field programmable gate array (FPGA) and/or a programmable logic array (PLA)) in an assembly language or a hardware programming language (such as VERILOG, VHDL, and C++) by using the logic and the algorithm according to the present disclosure.
- the above method may be implemented in a server-client mode.
- the client may receive data input by a user and send the data to the server.
- the client can also receive data input by the user, perform a part of the processing in the above method, and send the data obtained through the processing to the server.
- the server may receive the data from the client, perform the above method or another part of the above method, and return an execution result to the client.
- the client may receive the execution result of the method from the server, and may present the execution result to the user by means of, for example, an output device.
- the components of the computing device 2000 may be distributed over a network. For example, some processing may be performed by one processor while other processing may be performed by another processor away from the processor. Other components of the computing device 2000 may also be similarly distributed. In this way, the computing device 2000 can be interpreted as a distributed computing system that performs processing at multiple positions.
Description
size (which may be referred to as ½ size) extracted for the image may be the image feature obtained by performing 2 times downsampling on the image to obtain an image of ½ size and then performing feature extraction on the image of the ½ size.
(which may be referred to as ½ size if H×W size is referred to as full size),
(which may be referred to as ¼ size), and
(which may be referred to as ⅛ size). In other words, in the present disclosure, the
¼ size
½ size
and full size (H×W, which may refer to a size consistent with the size of the original image in the image pair), respectively. The disparity generation network may be configured to: generate, in the first stage disparity processing of the multi-stage disparity processing, an initial disparity map having the minimum size (i.e., ⅛ size) according to at least a part of the image feature of the minimum size (e.g., part or all of the basic structure feature of ⅛ size, the semantic feature of ⅛ size and the edge feature of ⅛ size of the first image, and basic structure feature of ⅛ size of the second image) in the image features of the 4 sizes; and successively perform, in disparity processing of 4 stages other than the first stage disparity processing, disparity refinement on the disparity map generated by disparity processing of an immediate previous stage based on at least a part of the image feature having a corresponding size in the image features of the 4 sizes in ascending order of sizes (e.g., successively based on part or all of the edge feature of ⅛ size of the first image, part or all of the edge feature of ¼ size of the first image, the image-self-based feature of ½ size of the first image, and the image-self-based feature of full size of the first image), to obtain 4 refined disparity maps with successively increasing sizes (e.g., the refined disparity map of ⅛ size, the refined disparity map of ¼ size, the refined disparity map of ½ size, and the refined disparity map of full size). The 4 refined disparity maps may be used as the multiple disparity maps.
¼ size, ½ size,
and full size (H×W, which may refer to a size consistent with the size of each image in the image pair), respectively. The disparity generation network may be configured to: generate, in the first stage disparity processing of the multi-stage disparity processing, an initial disparity map having a minimum size (i.e., ⅛ size) according to at least a part of the image feature of the minimum size (e.g., part or all of the basic structure feature of ⅛ size, the semantic feature of ⅛ size and the edge feature of ⅛ size of the first image, and basic structure feature of ⅛ size of the second image) in the image features of the 4 sizes; and successively perform, in disparity processing of 3 stages other than the first stage disparity processing, disparity refinement on the disparity map generated by disparity processing of an immediate previous stage based on at least a part of the image feature having a corresponding size in image features of other 3 non-minimum sizes in ascending order of sizes (e.g., successively based on part or all of the edge feature of ¼ size of the first image, the image-self-based feature of ½ size of the first image, and the image-self-based feature of full size of the first image), to obtain 3 refined disparity maps with successively increasing sizes (e.g., the refined disparity map of ¼ size, the refined disparity map of ½ size, and the refined disparity map of full size). The initial disparity map and the 3 refined disparity maps may be used as the multiple disparity maps.
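The two stage layouts described above (a refinement stage at ⅛ size, or refinement starting at ¼ size) can be sketched as a size schedule. This is a minimal illustration only: the function name `stage_sizes`, the stage labels, and the assumption that H and W are divisible by 8 are hypothetical and not taken from the disclosure.

```python
def stage_sizes(height, width, refine_at_min_size):
    """List (stage_label, (h, w)) for each disparity processing stage.

    The first stage always yields the initial disparity map at 1/8 size.
    If refine_at_min_size is True, refinement starts at 1/8 size
    (4 refinement stages); otherwise it starts at 1/4 size (3 stages).
    Assumes height and width are divisible by 8.
    """
    divisors = [8, 4, 2, 1]  # ascending order of sizes: 1/8, 1/4, 1/2, full
    stages = [("initial", (height // 8, width // 8))]
    refine_divisors = divisors if refine_at_min_size else divisors[1:]
    for div in refine_divisors:
        stages.append((f"refine_1/{div}", (height // div, width // div)))
    return stages


if __name__ == "__main__":
    for label, size in stage_sizes(256, 512, refine_at_min_size=False):
        print(label, size)
```

With `refine_at_min_size=False` the initial map and the 3 refined maps together form the multiple disparity maps; with `True`, the 4 refined maps do.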
feature of ⅛ size of the first image I1 and the basic structure feature of ⅛ size of the second image I2 may be extracted based on the basic
of the first image I1 may be extracted based on the
and the image-self-based feature of full size (H×W) (i.e., the first image I1 itself).
TABLE 1: related description of a 2DCNN network structure of the initial
Name | Layer Description | Output Tensor Dim |
---|---|---|
corr1d | correlation layer | ⅛H × ⅛W × ⅛D |
semanS1_conv | from semantic feature: conv 3 × 3, ⅛D features | ⅛H × ⅛W × ⅛D |
edgeS1_conv | from edge feature: conv 3 × 3, ⅛D features | ⅛H × ⅛W × ⅛D |
concat | concat (corr1d, semanS1_conv, edgeS1_conv) | ⅛H × ⅛W × ⅜D |
conv1 | MB_conv, ⅜D features | ⅛H × ⅛W × ⅜D |
conv2 | MB_conv, 2/8D features | ⅛H × ⅛W × 2/8D |
conv3 | MB_conv_res, 2/8D features | ⅛H × ⅛W × 2/8D |
conv4 | MB_conv, ⅛D features | ⅛H × ⅛W × ⅛D |
conv5 | MB_conv_res, ⅛D features | ⅛H × ⅛W × ⅛D |
dispS1 | soft argmin | ⅛H × ⅛W × 1 |
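The first and last rows of Table 1 (the correlation layer and the soft argmin) can be illustrated with a small standard-library sketch. The function names, the nested-list layout `[row][col][channel]`, the zero fill outside the image, and the channel averaging are assumptions made for illustration; the table does not specify them. The soft argmin follows the common definition as the expected disparity under a softmax of negated costs, which is likewise an assumption about the layer named in the table.

```python
import math


def correlation_1d(left, right, max_disp):
    """Horizontal correlation between two feature maps.

    left, right: nested lists indexed [row][col][channel], same shape.
    Returns scores[row][col][d]: channel-averaged dot product of
    left[row][col] with right[row][col - d]; 0.0 where col - d < 0.
    """
    rows, cols = len(left), len(left[0])
    channels = len(left[0][0])
    volume = []
    for r in range(rows):
        row_out = []
        for c in range(cols):
            scores = []
            for d in range(max_disp):
                if c - d < 0:
                    scores.append(0.0)
                else:
                    dot = sum(a * b for a, b in zip(left[r][c], right[r][c - d]))
                    scores.append(dot / channels)
            row_out.append(scores)
        volume.append(row_out)
    return volume


def soft_argmin(costs):
    """Expected disparity index under softmax(-cost); lower cost is better."""
    lowest = min(costs)  # shift so the largest exponent is 0 (numerical stability)
    weights = [math.exp(-(c - lowest)) for c in costs]
    total = sum(weights)
    return sum(d * w for d, w in enumerate(weights)) / total


if __name__ == "__main__":
    # One row, four columns, one channel: the feature at left column 3
    # matches the feature at right column 1, i.e. a disparity of 2.
    left = [[[0.0], [0.0], [0.0], [5.0]]]
    right = [[[0.0], [5.0], [0.0], [0.0]]]
    scores = correlation_1d(left, right, max_disp=4)[0][3]
    # Correlation is a similarity, so negate it before treating it as a cost.
    print(soft_argmin([-s for s in scores]))
```

In the network of Table 1 the correlation output passes through several convolution layers before the soft argmin; the direct connection in the usage example is only for demonstration.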
TABLE 2: related description of a 3DCNN network structure of the initial
Name | Layer Description | Output Tensor Dim |
---|---|---|
edgeS1_conv | from edge feature: F features | ⅛H × ⅛W × F |
semanS1_conv | from semantic feature: conv 3 × 3, F features | ⅛H × ⅛W × |
concat | concat (featS1, semanS1_conv, edgeS1_conv) | ⅛H × ⅛W × 3F |
cost | shift concatenate layer | ⅛D × ⅛H × ⅛W × 6F |
conv1 | 3DCNN, 3 × 3 × 3, 4F features | ⅛D × ⅛H × ⅛W × 4F |
conv2 | 3DCNN, 3 × 3 × 3, 4F features, add conv1 | ⅛D × ⅛H × ⅛W × 4F |
conv3 | 3DCNN, 3 × 3 × 3, 2F features | ⅛D × ⅛H × ⅛W × 2F |
conv4 | 3DCNN, 3 × 3 × 3, 2F features, add conv3 | ⅛D × ⅛H × ⅛W × 2F |
conv5 | 3DCNN, 3 × 3 × 3, 1F features | ⅛D × ⅛H × ⅛W × 1F |
conv6 | 3DCNN, 3 × 3 × 3, 2F features, add conv5 | ⅛D × ⅛H × ⅛W × 1F |
conv7 | 3DCNN, 3 × 3 × 3, 1F features | ⅛D × ⅛H × ⅛W × 1 |
dispS1 | soft argmin | ⅛H × ⅛W × 1 |
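The "shift concatenate layer" row of Table 2 doubles the channel count (3F to 6F) and adds a disparity axis. A minimal sketch of one common way to build such a volume is given below; the function name, the nested-list layout, and the zero padding for out-of-range columns are assumptions, since the table gives only the output dimensions.

```python
def shift_concat_volume(left, right, max_disp):
    """Build a cost volume indexed [d][row][col][channel].

    For each candidate disparity d, the left feature vector at (row, col)
    is concatenated with the right feature vector at (row, col - d).
    Columns with col - d < 0 receive a zero vector for the right part.
    Inputs are nested lists indexed [row][col][channel] with C channels;
    the output has 2 * C channels per position.
    """
    rows, cols = len(left), len(left[0])
    channels = len(left[0][0])
    zeros = [0.0] * channels
    volume = []
    for d in range(max_disp):
        plane = []
        for r in range(rows):
            plane.append([
                left[r][c] + (right[r][c - d] if c - d >= 0 else zeros)
                for c in range(cols)
            ])
        volume.append(plane)
    return volume


if __name__ == "__main__":
    left = [[[1.0, 2.0], [3.0, 4.0]]]
    right = [[[5.0, 6.0], [7.0, 8.0]]]
    vol = shift_concat_volume(left, right, max_disp=2)
    print(len(vol), len(vol[0]), len(vol[0][0]), len(vol[0][0][0]))
```

The 3D convolutions listed after the cost row would then operate over the disparity, height, and width axes of this volume.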
TABLE 3: related description of a 2DCNN network structure of the first disparity refinement sub-network
Name | Layer Description | Output Tensor Dim |
---|---|---|
edgeS1_conv | from edge feature: 8 features | ⅛H × ⅛W × 8 |
concat | concat (dispS1, edgeS1_conv) | ⅛H × ⅛W × 9 |
conv1 | MB_conv, 4 features | ⅛H × ⅛W × 4 |
conv2 | MB_conv_res, 4 features | ⅛H × ⅛W × 4 |
conv3 | MB_conv, 1 features | ⅛H × ⅛W × 1 |
dispS1_refine | add(dispS1, conv3) | ⅛H × ⅛W × 1 |
TABLE 4: related description of a possible 2DCNN network structure of the second disparity refinement sub-network
Name | Layer Description | Output Tensor Dim |
---|---|---|
dispS1_up | upsample(dispS1_refine) | ¼H × ¼W × 1 |
edgeS2_conv | from edge feature: 8 features | ¼H × ¼W × 8 |
concat | concat (dispS1_up, edgeS2_conv) | ¼H × ¼W × 9 |
conv1 | MB_conv, 4 features | ¼H × ¼W × 4 |
conv2 | MB_conv_res, 4 features | ¼H × ¼W × 4 |
conv3 | MB_conv, 1 features | ¼H × ¼W × 1 |
dispS2_refine | add (dispS1_up, conv3) | ¼H × ¼W × 1 |
TABLE 5: related description of a possible 2DCNN network structure of the third disparity refinement sub-network
Name | Layer Description | Output Tensor Dim |
---|---|---|
dispS2_up | upsample(dispS2_refine) | ½H × ½W × 1 |
imgS3 | downsample(I1) | ½H × ½W × 3 |
concat | concat (dispS2_up, imgS3) | ½H × ½W × 4 |
conv1 | conv 3 × 3, 4 features | ½H × ½W × 4 |
conv2 | conv 3 × 3, 2 features | ½H × ½W × 2 |
conv3 | conv 3 × 3, 1 features | ½H × ½W × 1 |
dispS3_refine | add(dispS2_up, conv3) | ½H × ½W × 1 |
TABLE 6: related description of a possible 2DCNN network structure of the fourth disparity refinement sub-network
Name | Layer Description | Output Tensor Dim |
---|---|---|
dispS3_up | upsample(dispS3_refine) | H × W × 1 |
concat | concat (dispS3_up, I1) | H × W × 4 |
conv1 | conv 3 × 3, 4 features | H × W × 4 |
conv2 | conv 3 × 3, 2 features | H × W × 2 |
conv3 | conv 3 × 3, 1 features | H × W × 1 |
dispS4_refine | add(dispS3_up, conv3) | H × W × 1 |
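The refinement sub-networks in the tables above share one pattern: upsample the previous disparity map, predict a one-channel correction, and add the two. The sketch below illustrates only that pattern. The nearest-neighbour interpolation, the optional doubling of disparity values on upsampling, and both function names are assumptions; the tables name the `upsample` and `add` steps without specifying them, and the residual here is a placeholder for the output of the `conv3` layer.

```python
def upsample_nearest_2x(disp, scale_values=True):
    """Upsample a 2D disparity map (list of rows) by 2 in each dimension.

    Nearest-neighbour interpolation is an assumption of this sketch.
    If scale_values is True, disparities are doubled as well, a common
    convention because pixel offsets grow with resolution; the source
    tables do not state whether this is done.
    """
    factor = 2.0 if scale_values else 1.0
    out = []
    for row in disp:
        wide = [v * factor for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))  # repeat each row
    return out


def add_residual(disp_up, residual):
    """Element-wise add(disp_up, residual), as in the last row of each table."""
    return [
        [d + r for d, r in zip(d_row, r_row)]
        for d_row, r_row in zip(disp_up, residual)
    ]


if __name__ == "__main__":
    coarse = [[1.0, 2.0]]
    up = upsample_nearest_2x(coarse)
    # Placeholder correction standing in for a learned conv3 output.
    residual = [[0.5, -0.5, 0.0, 0.0], [0.0, 0.0, 0.25, -0.25]]
    print(add_residual(up, residual))
```

Because each stage predicts only a correction to the upsampled map, the later stages can stay small, which matches the low channel counts listed for them.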
and $g(x) = |x_x| + |x_y|$. In addition, the edge feature may also be used as a regularization term of the loss function, which is not limited herein. Accordingly, the final loss function of the
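As an illustration of the function g defined above, the following sketch evaluates |x_x| + |x_y| per pixel on a 2D map. Reading x_x and x_y as horizontal and vertical derivatives, approximating them with forward differences, and using zero at the last column and row are all assumptions of this sketch; the function name `gradient_l1` is hypothetical.

```python
def gradient_l1(x):
    """Per-pixel |x_x| + |x_y| for a 2D map given as a list of rows.

    x_x and x_y are approximated by forward differences along columns
    and rows respectively; the difference is taken as 0.0 where the
    forward neighbour does not exist.
    """
    rows, cols = len(x), len(x[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            dx = x[r][c + 1] - x[r][c] if c + 1 < cols else 0.0
            dy = x[r + 1][c] - x[r][c] if r + 1 < rows else 0.0
            out_row.append(abs(dx) + abs(dy))
        out.append(out_row)
    return out


if __name__ == "__main__":
    print(gradient_l1([[0.0, 1.0], [3.0, 1.0]]))
```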
Claims (13)
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911281475.1 | 2019-12-13 | ||
CN201911281475.1A CN112991254A (en) | 2019-12-13 | 2019-12-13 | Disparity estimation system, method, electronic device, and computer-readable storage medium |
PCT/CN2020/121824 WO2021114870A1 (en) | 2019-12-13 | 2020-10-19 | Parallax estimation system and method, electronic device and computer-readable storage medium |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/121824 Continuation WO2021114870A1 (en) | 2019-12-13 | 2020-10-19 | Parallax estimation system and method, electronic device and computer-readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
US20210209782A1 (en) | 2021-07-08 |
US11158077B2 (en) | 2021-10-26 |
Family
ID=73789924
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/127,540 Active US11158077B2 (en) | 2019-12-13 | 2020-12-18 | Disparity estimation |
Country Status (6)
Country | Link |
---|---|
US (1) | US11158077B2 (en) |
EP (1) | EP3836083B1 (en) |
JP (1) | JP6902811B2 (en) |
KR (1) | KR102289239B1 (en) |
CN (1) | CN112991254A (en) |
WO (1) | WO2021114870A1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11645505B2 (en) * | 2020-01-17 | 2023-05-09 | Servicenow Canada Inc. | Method and system for generating a vector representation of an image |
CN113808187A (en) * | 2021-09-18 | 2021-12-17 | 京东鲲鹏(江苏)科技有限公司 | Disparity map generation method and device, electronic equipment and computer readable medium |
WO2024025851A1 (en) * | 2022-07-26 | 2024-02-01 | Becton, Dickinson And Company | System and method for estimating object distance and/or angle from an image capture device |
2019
- 2019-12-13 CN CN201911281475.1A patent/CN112991254A/en active Pending
2020
- 2020-10-19 WO PCT/CN2020/121824 patent/WO2021114870A1/en active Application Filing
- 2020-12-09 EP EP20212748.6A patent/EP3836083B1/en active Active
- 2020-12-10 JP JP2020205055A patent/JP6902811B2/en active Active
- 2020-12-11 KR KR1020200173186A patent/KR102289239B1/en active IP Right Grant
- 2020-12-18 US US17/127,540 patent/US11158077B2/en active Active
Patent Citations (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5727078A (en) * | 1993-01-19 | 1998-03-10 | Thomson-Csf | Process for estimating disparity between the monoscopic images making up a sterescopic image |
US20070024614A1 (en) * | 2005-07-26 | 2007-02-01 | Tam Wa J | Generating a depth map from a two-dimensional source image for stereoscopic and multiview imaging |
US20070071311A1 (en) * | 2005-09-28 | 2007-03-29 | Deere & Company, A Delaware Corporation | Method for processing stereo vision data using image density |
KR100762670B1 (en) | 2006-06-07 | 2007-10-01 | 삼성전자주식회사 | Method and device for generating disparity map from stereo image and stereo matching method and device therefor |
US20110176722A1 (en) * | 2010-01-05 | 2011-07-21 | Mikhail Sizintsev | System and method of processing stereo images |
US20120008857A1 (en) * | 2010-07-07 | 2012-01-12 | Electronics And Telecommunications Research Institute | Method of time-efficient stereo matching |
US8737723B1 (en) * | 2010-12-09 | 2014-05-27 | Google Inc. | Fast randomized multi-scale energy minimization for inferring depth from stereo image pairs |
KR101178015B1 (en) | 2011-08-31 | 2012-08-30 | 성균관대학교산학협력단 | Generating method for disparity map |
US20140147031A1 (en) * | 2012-11-26 | 2014-05-29 | Mitsubishi Electric Research Laboratories, Inc. | Disparity Estimation for Misaligned Stereo Image Pairs |
JP2015100065A (en) | 2013-11-20 | 2015-05-28 | キヤノン株式会社 | Image processing apparatus, control method of the same, control program of the same, and imaging apparatus |
US20150178936A1 (en) * | 2013-12-20 | 2015-06-25 | Thomson Licensing | Method and apparatus for performing depth estimation |
US9600889B2 (en) * | 2013-12-20 | 2017-03-21 | Thomson Licensing | Method and apparatus for performing depth estimation |
US20150254864A1 (en) * | 2014-03-07 | 2015-09-10 | Thomson Licensing | Method and apparatus for disparity estimation |
US9704252B2 (en) * | 2014-03-07 | 2017-07-11 | Thomson Licensing | Method and apparatus for disparity estimation |
US10074158B2 (en) * | 2014-07-08 | 2018-09-11 | Qualcomm Incorporated | Systems and methods for stereo depth estimation using global minimization and depth interpolation |
JP2017520852A (en) * | 2014-07-08 | 2017-07-27 | クアルコム,インコーポレイテッド | System and method for stereo depth estimation using global minimization and depth interpolation |
EP3070671A1 (en) * | 2015-03-18 | 2016-09-21 | Politechnika Poznanska | A system and a method for generating a depth map |
US20190014303A1 (en) * | 2016-02-25 | 2019-01-10 | SZ DJI Technology Co., Ltd. | Imaging system and method |
CN106600583A (en) | 2016-12-07 | 2017-04-26 | 西安电子科技大学 | Disparity map acquiring method based on end-to-end neural network |
JP2019096294A (en) | 2017-11-23 | 2019-06-20 | 三星電子株式会社Samsung Electronics Co.,Ltd. | Parallax estimation device and method |
JP2019121349A (en) | 2018-01-09 | 2019-07-22 | 緯創資通股▲ふん▼有限公司Wistron Corporation | Method for generating parallax map, image processing device and system |
CN108335322A (en) | 2018-02-01 | 2018-07-27 | 深圳市商汤科技有限公司 | Depth estimation method and device, electronic equipment, program and medium |
US20190295282A1 (en) | 2018-03-21 | 2019-09-26 | Nvidia Corporation | Stereo depth estimation using deep neural networks |
JP2019184587A (en) | 2018-03-30 | 2019-10-24 | キヤノン株式会社 | Parallax detection device, parallax detection method, and parallax detection device control program |
US10380753B1 (en) * | 2018-05-30 | 2019-08-13 | Aimotive Kft. | Method and apparatus for generating a displacement map of an input dataset pair |
CN109472819A (en) * | 2018-09-06 | 2019-03-15 | 杭州电子科技大学 | A kind of binocular parallax estimation method based on cascade geometry context neural network |
US20200334819A1 (en) * | 2018-09-30 | 2020-10-22 | Boe Technology Group Co., Ltd. | Image segmentation apparatus, method and relevant computing device |
US10839543B2 (en) * | 2019-02-26 | 2020-11-17 | Baidu Usa Llc | Systems and methods for depth estimation using convolutional spatial propagation networks |
CN110148179A (en) | 2019-04-19 | 2019-08-20 | 北京地平线机器人技术研发有限公司 | A kind of training is used to estimate the neural net model method, device and medium of image parallactic figure |
CN110427968A (en) | 2019-06-28 | 2019-11-08 | 武汉大学 | A kind of binocular solid matching process based on details enhancing |
Non-Patent Citations (9)
Title |
---|
Airborne Vehicle Detection in Dense Urban Areas Using HoG Features and Disparity Maps, Sebastian Tuermer et al., IEEE, 1939-1404, 2013, pp. 2327-2337 (Year: 2013). * |
Anytime Stereo Imag Depth Estimation—Devices, Yan Wang et al., arXiv: 1810.11408v2, Mar. 5, 2019, pp. 1-6 (Year: 2019). * |
Chen, "Stereo Matching Algorithm Based on Multi-Scale Information and Attention," China Academic Journal Electronic Publishing House, p. 10-22, 2019. |
GHRNET: Guided Hierarchical Refinement Network for Stereo Matching, Bin Tan et al., IEEE, 978-1-5386-6249-6, 2019, pp. 4459-4463 (Year: 2019). * |
Neural Disparity Map Estimation from Stereo Image, Nadia Baha et al., The International Arab Journal of Information Technology, vol. 9, No. 3, May 2012, pp. 217-224 (Year: 2012). * |
Song et al., "EdgeStereo: A Context Integrated Residual Pyramid Network for Stereo Matching", arXiv:1803.05196v3 [cs.CV], Sep. 23, 2018, (16 pages). |
Stereo Matching Algorithm Based on Multi-Scale Information and Attention, Chen, China Academic Journal Electronic Publishing House, 2019, p. 10-22, (Year: 2019). * |
Tan et al., "Ghrnet: Guided Hierarchical Refinement Network for Stereo Matching", School of Remote Sensing and Information Engineering, Wuhan University, 2019, (5 pages). |
Wang et al., "Anytime Stereo Image Depth Estimation on Mobile Devices", arXiv:1810.11408v2 [cs.CV], Mar. 5, 2019, (8 pages). |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220147776A1 (en) * | 2020-11-12 | 2022-05-12 | Ambarella International Lp | Unsupervised multi-scale disparity/optical flow fusion |
US11526702B2 (en) * | 2020-11-12 | 2022-12-13 | Ambarella International Lp | Unsupervised multi-scale disparity/optical flow fusion |
Also Published As
Publication number | Publication date |
---|---|
WO2021114870A1 (en) | 2021-06-17 |
US20210209782A1 (en) | 2021-07-08 |
EP3836083B1 (en) | 2023-08-09 |
KR20210076853A (en) | 2021-06-24 |
KR102289239B1 (en) | 2021-08-12 |
JP6902811B2 (en) | 2021-07-14 |
JP2021096850A (en) | 2021-06-24 |
CN112991254A (en) | 2021-06-18 |
EP3836083A1 (en) | 2021-06-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
 | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
 | AS | Assignment | Owner name: NEXTVPU (SHANGHAI) CO., LTD., CHINA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:FANG, SHU;ZHOU, JI;FENG, XINPENG;REEL/FRAME:056579/0442 Effective date: 20201019 |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
 | AS | Assignment | Owner name: NEXTVPU (SHANGHAI) CO., LTD., CHINA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:FANG, SHU;ZHOU, JI;FENG, XINPENG;REEL/FRAME:056850/0681 Effective date: 20201019 |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
 | STCF | Information on status: patent grant | Free format text: PATENTED CASE |