WO2023159757A1 - Disparity map generation method and apparatus, electronic device, and storage medium - Google Patents


Info

Publication number
WO2023159757A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
features
image
left view
target
Application number
PCT/CN2022/090665
Other languages
French (fr)
Chinese (zh)
Inventor
唐小初
张祎頔
舒畅
陈又新
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Application filed by 平安科技(深圳)有限公司
Publication of WO2023159757A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection

Definitions

  • The present application relates to the technical field of artificial intelligence, and in particular to a method and apparatus for generating a disparity map, an electronic device, and a storage medium.
  • Disparity estimation is a fundamental computer vision problem that aims to predict distance measurements for each point in a target scene.
  • Current stereo matching algorithms often struggle in ill-posed regions such as weak textures, repetitive textures, and occlusions, and frequently fail to estimate the disparity of the target object accurately, which produces large errors in the generated disparity map. How to improve the accuracy of disparity estimation and reduce the error of the disparity map has therefore become an urgent technical problem.
  • An embodiment of the present application proposes a method for generating a disparity map, the method including:
  • The target image includes a left view and a right view.
  • Semantic refinement is performed on the estimated disparity map by using a preset semantic refinement network and the first image feature to obtain a target disparity map.
  • An embodiment of the present application proposes an apparatus for generating a disparity map, the apparatus including:
  • An image acquisition module configured to acquire a target image, wherein the target image includes a left view and a right view of the target object;
  • A feature extraction module configured to perform feature extraction on the left view to obtain multiple left view features, and perform feature extraction on the right view to obtain multiple right view features;
  • An image segmentation module configured to perform image segmentation processing on the left view features to obtain a first image feature;
  • A fusion module configured to perform fusion processing on the left view features, the first image feature, and the right view features to obtain a target cost volume;
  • A disparity estimation module configured to perform disparity estimation on the target cost volume through a preset three-dimensional convolutional hourglass model to obtain an estimated disparity map;
  • A semantic refinement module configured to perform semantic refinement processing on the estimated disparity map through a preset semantic refinement network and the first image feature to obtain a target disparity map.
  • An embodiment of the present application provides an electronic device, which includes a memory, a processor, a program stored in the memory and executable on the processor, and a data bus for connection and communication between the processor and the memory.
  • When the program is executed by the processor, a method for generating a disparity map is implemented, the method including: acquiring a target image, where the target image includes a left view and a right view; performing feature extraction on the left view to obtain multiple left view features, and performing feature extraction on the right view to obtain multiple right view features; performing image segmentation processing on the left view features to obtain a first image feature; combining the left view features, the first image feature, and the right view features to obtain a target cost volume; performing disparity estimation on the target cost volume through a preset three-dimensional convolutional hourglass model to obtain an estimated disparity map; and performing semantic refinement processing on the estimated disparity map through a preset semantic refinement network and the first image feature to obtain a target disparity map.
  • An embodiment of the present application provides a storage medium, which is a computer-readable storage medium storing one or more programs. The one or more programs can be executed by one or more processors to implement a method for generating a disparity map, the method including: acquiring a target image, where the target image includes a left view and a right view; performing feature extraction on the left view to obtain multiple left view features, and performing feature extraction on the right view to obtain multiple right view features; performing image segmentation processing on the left view features to obtain a first image feature; combining the left view features, the first image feature, and the right view features to obtain a target cost volume; performing disparity estimation on the target cost volume through a preset three-dimensional convolutional hourglass model to obtain an estimated disparity map; and performing semantic refinement processing on the estimated disparity map through a preset semantic refinement network and the first image feature to obtain a target disparity map.
  • The disparity map generation method and apparatus, electronic device, and storage medium proposed in this application make the extracted left view features and right view features better suited to the requirements of disparity estimation.
  • Semantic information can be used to assist disparity estimation and improve its reliability.
  • Semantically refining the estimated disparity map through the preset semantic refinement network and the first image feature enhances scene understanding for the stereo matching task, improves the accuracy of disparity estimation, and reduces the error of the disparity map.
  • FIG. 1 is a flowchart of a method for generating a disparity map provided by an embodiment of the present application;
  • FIG. 2 is a flowchart of step S102 in FIG. 1;
  • FIG. 3 is a flowchart of step S103 in FIG. 1;
  • FIG. 4 is a flowchart of step S104 in FIG. 1;
  • FIG. 5 is a flowchart of step S402 in FIG. 4;
  • FIG. 6 is a flowchart of step S105 in FIG. 1;
  • FIG. 7 is a flowchart of step S106 in FIG. 1;
  • FIG. 8 is a schematic structural diagram of a disparity map generation apparatus provided by an embodiment of the present application;
  • FIG. 9 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present application.
  • Stereo matching algorithms often struggle in ill-posed regions such as weak textures, repetitive textures, and occlusions when performing disparity estimation, and frequently fail to estimate the disparity of target objects accurately. How to improve the accuracy of disparity estimation has therefore become an urgent technical problem.
  • Embodiments of the present application provide a method and apparatus for generating a disparity map, an electronic device, and a storage medium, which aim to improve the accuracy of disparity estimation and reduce the error of the disparity map.
  • The disparity map generation method and apparatus, electronic device, and storage medium provided in the embodiments of the present application are described in detail through the following embodiments, beginning with the method for generating a disparity map.
  • Artificial intelligence (AI)
  • The embodiments of the present application may acquire and process relevant data based on artificial intelligence technology.
  • Artificial intelligence is the theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • Artificial intelligence basic technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, robotics technology, biometrics technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • The disparity map generation method provided in the embodiments of the present application relates to the technical field of artificial intelligence.
  • The method may be applied to a terminal or a server, or may be software running on the terminal or the server.
  • The terminal may be a smartphone, a tablet computer, a notebook computer, a desktop computer, or the like.
  • The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms.
  • The software may be an application that implements the disparity map generation method, but is not limited to the above forms.
  • This application can be used in numerous general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
  • This application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer.
  • Program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types.
  • The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices linked through a communications network.
  • In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
  • FIG. 1 is an optional flowchart of a method for generating a disparity map provided by an embodiment of the present application.
  • The method in FIG. 1 may include, but is not limited to, steps S101 to S106.
  • Step S101 acquiring a target image, where the target image includes a left view and a right view;
  • Step S102 performing feature extraction on the left view to obtain multiple left view features, and performing feature extraction on the right view to obtain multiple right view features;
  • Step S103 performing image segmentation processing on the left view features to obtain a first image feature;
  • Step S104 combining the left view features, the first image feature, and the right view features to obtain a target cost volume;
  • Step S105 performing disparity estimation on the target cost volume through a preset three-dimensional convolutional hourglass model to obtain an estimated disparity map;
  • Step S106 performing semantic refinement processing on the estimated disparity map through a preset semantic refinement network and the first image feature to obtain a target disparity map.
  • In steps S101 to S106 of this embodiment, feature extraction on the left view yields multiple left view features and feature extraction on the right view yields multiple right view features, which makes the extracted features better suited to the needs of disparity estimation.
  • The first image feature is obtained by image segmentation of the left view features; the left view features, the first image feature, and the right view features are combined into the target cost volume; and the preset three-dimensional convolutional hourglass model performs disparity estimation on the target cost volume to obtain an estimated disparity map. This lets semantic information assist disparity estimation and improves its reliability.
  • The estimated disparity map is semantically refined through the preset semantic refinement network and the first image feature to obtain the target disparity map, which enhances scene understanding for the stereo matching task, improves the accuracy of disparity estimation, and reduces the error of the generated target disparity map.
  • The target image may be a two-dimensional or a three-dimensional image. In some embodiments, the target image may be obtained by computed tomography (CT); in other embodiments, it may be obtained by magnetic resonance imaging (MRI); in still other embodiments, it may be captured by a binocular camera, but is not limited thereto. The left view and the right view are the left and right images captured by the binocular camera.
  • The disparity map generation method further includes pre-constructing a stereo matching network. The stereo matching network mainly includes a feature extraction module, an image segmentation module, a disparity estimation module, and a semantic refinement module. The feature extraction module mainly consists of a residual network and performs feature extraction on the input target image; the image segmentation module mainly consists of a PSPNet decoding network and samples the target image after feature extraction; the disparity estimation module mainly consists of a three-dimensional convolutional network and estimates the disparity of the sampled target image to generate an estimated disparity map; and the semantic refinement module mainly consists of a semantic refinement network, which includes a convolutional layer and a fully connected layer and semantically refines the estimated disparity map to generate the target disparity map.
  • The feature extraction module includes a residual network and a pooling layer, and step S102 may include, but is not limited to, steps S201 to S202:
  • Step S201 performing convolution processing on the left view to obtain the convolution feature of the left view, and performing convolution processing on the right view to obtain the convolution feature of the right view;
  • Step S202 performing pyramid pooling on the left view convolution feature according to preset multi-scale feature resolution parameters to obtain multiple left view features, and performing pyramid pooling on the right view convolution feature according to the same multi-scale feature resolution parameters to obtain multiple right view features.
  • In step S201 of some embodiments, feature extraction is performed on the left view and the right view by the residual network of the feature extraction module in the stereo matching network. Specifically, the residual network consists of multiple residual dense blocks, and the left view and the right view are each convolved by the convolutional layers of different residual dense blocks to obtain the left view convolution feature and the right view convolution feature.
  • In step S202 of some embodiments, the left view convolution feature and the right view convolution feature are input to the pooling layer of the feature extraction module, which applies pyramid pooling to each of them according to its multi-scale feature resolution parameters to obtain the multi-scale features of the left view and the multi-scale features of the right view.
  • Pyramid pooling is performed on the left view convolution feature so that the resolutions of the resulting left view features are 1/4, 1/8, and 1/16 of the original left view resolution.
  • Likewise, pyramid pooling is performed on the right view convolution feature so that the resolutions of the resulting right view features are 1/4, 1/8, and 1/16 of the original right view resolution.
  • This combines view feature information across scales: at multiple scales, low-level features retain high resolution while high-level features carry richer semantic information, which improves the accuracy of disparity estimation.
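  • The multi-scale pooling in steps S201 and S202 can be illustrated with a minimal sketch. It assumes a single-channel convolution feature at the full view resolution and uses average pooling with a window and stride equal to each scale parameter (4, 8, 16). The text does not state which pooling operator or window sizes the pooling layer uses, and the names `avg_pool` and `pyramid_pool` are hypothetical.

```python
from typing import Dict, List, Sequence

Grid = List[List[float]]  # single-channel feature map, rows x columns


def avg_pool(grid: Grid, factor: int) -> Grid:
    """Average-pool a 2D map with a square window whose size and stride equal `factor`."""
    h, w = len(grid), len(grid[0])
    if h % factor or w % factor:
        raise ValueError("map size must be divisible by the pooling factor")
    pooled = []
    for i in range(0, h, factor):
        row = []
        for j in range(0, w, factor):
            window = [grid[i + di][j + dj] for di in range(factor) for dj in range(factor)]
            row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


def pyramid_pool(conv_feature: Grid, scales: Sequence[int] = (4, 8, 16)) -> Dict[int, Grid]:
    """Return one pooled map per scale parameter, i.e. 1/4, 1/8 and 1/16 of the input size."""
    return {s: avg_pool(conv_feature, s) for s in scales}
```

For a 16 × 16 map this yields 4 × 4, 2 × 2, and 1 × 1 maps, matching the 1/4, 1/8, and 1/16 resolutions in the text; the same function would be applied to the right view convolution feature.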
  • The image segmentation module includes a decoding layer and a convolutional layer.
  • Step S103 may include, but is not limited to, steps S301 to S303:
  • Step S301 performing upsampling on the left view features through a preset bilinear interpolation method to obtain first view feature latent variables;
  • Step S302 concatenating the first view feature latent variables in order through a preset first function to obtain a first view feature sequence;
  • Step S303 performing convolution processing on the first view feature sequence to obtain the first image feature.
  • The preset bilinear interpolation method uses the values of the four nearest neighbouring points, assigns each a weight according to its distance from the interpolation point, and interpolates linearly along both image axes.
  • Bilinear interpolation upsamples the left view features of different scales to one quarter of the original resolution. Because it acts as a low-pass filter on the left view features, it smooths the edges of the left view and produces a relatively coherent output, which can also improve the accuracy of the first view feature latent variables.
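  • A minimal sketch of bilinear upsampling on a single-channel map, for example to bring a 1/16-scale feature up to the 1/4 scale. The corner-aligned coordinate mapping (what PyTorch calls `align_corners=True`) is an assumption of this sketch; frameworks differ in how they map output coordinates back to the input grid, and `bilinear_upsample` is a hypothetical name.

```python
import math
from typing import List

Grid = List[List[float]]  # single-channel feature map, rows x columns


def bilinear_upsample(grid: Grid, out_h: int, out_w: int) -> Grid:
    """Resize a 2D map with bilinear interpolation (corner-aligned mapping)."""
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for i in range(out_h):
        # Map the output row back to a fractional input row.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(math.floor(y))
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(math.floor(x))
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # The four neighbours are weighted by their distance to (y, x):
            # interpolate along x on the two rows, then along y between them.
            top = (1 - wx) * grid[y0][x0] + wx * grid[y0][x1]
            bottom = (1 - wx) * grid[y1][x0] + wx * grid[y1][x1]
            row.append((1 - wy) * top + wy * bottom)
        out.append(row)
    return out
```

Resizing [[0, 2], [4, 6]] to 3 × 3 gives [[0, 1, 2], [2, 3, 4], [4, 5, 6]]: each new value is the distance-weighted average of its neighbours, which is the smoothing effect the text attributes to the method.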
  • The first function is a concat function.
  • The first view feature sequence is obtained by connecting the first view feature latent variables in sequence through the concat function.
  • In step S303 of some embodiments, convolution processing is performed on the first view feature sequence through a convolutional layer to obtain multiple first image features of different scales.
  • Step S104 may include, but is not limited to, steps S401 to S402:
  • Step S401 grouping and combining the left view features and right view features according to preset multi-scale feature resolution parameters to obtain an initial cost volume;
  • Step S402 concatenating the initial cost volume and the first image feature through a preset three-dimensional convolutional network to obtain the target cost volume.
  • The preset multi-scale feature resolution parameters may be 4, 8, 16, and so on, and may be set according to actual conditions, but are not limited thereto.
  • According to the different multi-scale feature resolution parameters, the multiple left view features and multiple right view features are grouped and combined by scale.
  • For example, the left view feature and the right view feature whose multi-scale feature resolution parameter is 4 are added element-wise (vector addition) to obtain the view feature whose multi-scale feature resolution parameter is 4.
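  • Step S401 as described here, pairing left and right view features by scale parameter and adding them element-wise, can be sketched as follows. Storing the features in a dictionary keyed by scale parameter and using single-channel 2D maps are assumptions made for brevity; `combine_by_scale` is a hypothetical name.

```python
from typing import Dict, List

Grid = List[List[float]]  # single-channel feature map, rows x columns


def combine_by_scale(left: Dict[int, Grid], right: Dict[int, Grid]) -> Dict[int, Grid]:
    """Pair left/right features sharing a scale parameter and add them element-wise."""
    combined: Dict[int, Grid] = {}
    for scale, lmap in left.items():
        rmap = right[scale]  # same scale parameter, hence same spatial size
        combined[scale] = [
            [a + b for a, b in zip(lrow, rrow)] for lrow, rrow in zip(lmap, rmap)
        ]
    return combined
```

Keying by scale parameter guarantees that only maps of identical size are added, which is what makes the element-wise sum well defined.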
  • The cost volume is a low-resolution cost volume constructed at different scales; it is the intermediate result produced by the image splicing process.
  • The input of the stereo matching network is usually two images, namely the left view and the right view.
  • The stereo matching network initializes a maximum disparity. For example, if the maximum disparity is 5, the left view and the right view are spliced five times with different offsets, corresponding to disparity values 0, 1, 2, 3, and 4.
  • When the disparity value is 0, the left view and the right view are spliced directly; when it is 1, they are spliced with a 1-pixel offset; when it is 2, with a 2-pixel offset; when it is 3, with a 3-pixel offset; and when it is 4, with a 4-pixel offset.
  • The tensor size of the original left view and right view is W*H*3, where W is the image width, H is the image height, and 3 is the number of channels.
  • The left view and right view tensors are three-dimensional; after splicing, the resulting target view has a tensor size of W*H*3*5.
  • This target view is the cost volume, and its tensor is four-dimensional.
  • In other words, the input images are spliced at different disparities, and the resulting intermediate product is the cost volume. The cost volume is then fed into the stereo matching network for per-pixel matching to obtain the fused cost volume.
  • The stereo matching network can remove the maximum-disparity dimension from the cost volume tensor (for example, the dimension of size 5 given by the maximum disparity), so that the output tensor is three-dimensional again, that is, W*H*3.
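  • The shift-and-splice construction described above can be sketched as a concatenation cost volume in the style of GC-Net and PSMNet; this layout is an assumption, since the text does not give one. Slice d pairs each left pixel (y, x) with right pixel (y, x - d) and zero-fills positions that fall outside the image. Because the left and right feature vectors are concatenated, each slice carries 2C channels, so this sketch produces a D × H × W × 2C volume rather than the text's W*H*3*5. The name `concat_cost_volume` is hypothetical.

```python
from typing import List

# Feature map indexed as [row][col][channel], i.e. shape H x W x C.
FeatureMap = List[List[List[float]]]


def concat_cost_volume(left: FeatureMap, right: FeatureMap, max_disp: int) -> List[FeatureMap]:
    """Build a D x H x W x 2C concatenation cost volume for disparities 0..max_disp-1.

    Slice d pairs left pixel (y, x) with right pixel (y, x - d); positions
    where x - d falls outside the image are zero-filled.
    """
    h, w, c = len(left), len(left[0]), len(left[0][0])
    volume = []
    for d in range(max_disp):
        slice_d = []
        for y in range(h):
            row = []
            for x in range(w):
                shifted = right[y][x - d] if x - d >= 0 else [0.0] * c
                row.append(left[y][x] + shifted)  # list concatenation along channels
            slice_d.append(row)
        volume.append(slice_d)
    return volume
```

With a maximum disparity of 5, the volume has five slices (disparities 0 to 4), one per splicing operation in the text; slice 0 is the direct splice and slice 4 the 4-pixel offset.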
  • Step S402 may include, but is not limited to, steps S501 to S503:
  • Step S501 regularizing the initial cost volume through the three-dimensional convolutional network to obtain a first intermediate cost volume, and regularizing the first image feature through the three-dimensional convolutional network to obtain a first intermediate image feature;
  • Step S502 downsampling the first intermediate cost volume through the three-dimensional convolutional network to obtain a second intermediate cost volume, and upsampling the first intermediate image feature to obtain a second intermediate image feature;
  • Step S503 concatenating the second intermediate cost volume and the second intermediate image feature through the three-dimensional convolutional network to obtain the target cost volume.
  • The initial cost volume is regularized by the three-dimensional convolutions of the three-dimensional convolutional network to obtain first intermediate cost volumes at three feature resolutions, namely A1, A2, and A3. The number of channels of the first image feature whose multi-scale feature resolution parameter is 16 is adjusted, this first semantic feature is lifted from two dimensions to four dimensions, and the four-dimensional first image feature is regularized by the three-dimensional convolutions of the three-dimensional convolutional network to obtain the first intermediate image feature B3. The same operation yields the first intermediate image features B2 and B1 at the corresponding feature resolutions.
  • The first intermediate cost volume A1 is downsampled and connected with the first intermediate cost volume A2 to obtain the second intermediate cost volume, whose number of channels is then adjusted by a three-dimensional convolution. The second intermediate cost volume is then downsampled by a three-dimensional convolution with a stride of 2 in the three-dimensional convolutional network, and the downsampled result is connected with the first intermediate cost volume A3 to obtain the target view cost volume.
  • The first intermediate image feature B3 is connected with the first intermediate cost volume A3 to obtain the second intermediate image feature C3. C3 is upsampled by a three-dimensional deconvolution to obtain the second intermediate image feature C2, and the number of channels of C2 is adjusted by a three-dimensional convolution.
  • The first intermediate image feature B2 is connected with the first intermediate cost volume A2 to obtain a further second intermediate image feature, and the first intermediate image feature B1 is connected with the first intermediate cost volume A1 to obtain the second intermediate image feature C1.
  • The second intermediate image feature C1 and the target view cost volume are concatenated to obtain the target cost volume.
  • The three-dimensional convolutional hourglass model includes an aggregation layer and a prediction layer.
  • Step S105 may include, but is not limited to, steps S601 to S602:
  • Step S601 performing cost aggregation on the target cost volume through the aggregation layer to obtain a fused cost volume;
  • Step S602 performing disparity estimation on the fused cost volume through a second function of the prediction layer to obtain an estimated disparity map.
  • The three-dimensional convolutional hourglass model includes two stacked aggregation layers whose structure is the same as that of the three-dimensional convolutional network described above. The target cost volume is input into each of the two aggregation layers, each layer aggregates it, and the outputs of the two aggregation layers are fused to obtain the final fused cost volume.
  • The second function is a soft argmin function, which performs more accurate disparity estimation on the fused cost volume obtained through aggregation to produce the estimated disparity map.
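  • The soft argmin in step S602, the operation used in GC-Net-style stereo networks, turns each pixel's matching costs over the disparity range into a sub-pixel disparity: the costs are negated, passed through a softmax, and the disparity indices are averaged with those weights, d_hat = sum over d of d * softmax(-c)_d. It is also one way the disparity axis of the cost volume is collapsed. A minimal sketch, assuming a D × H × W cost volume stored as nested lists; the function names are hypothetical.

```python
import math
from typing import List


def soft_argmin(costs: List[float]) -> float:
    """Sub-pixel disparity for one pixel: sum_d d * softmax(-cost)_d."""
    m = min(costs)  # shift by the minimum cost for numerical stability
    weights = [math.exp(-(c - m)) for c in costs]
    total = sum(weights)
    return sum(d * w for d, w in enumerate(weights)) / total


def disparity_map(cost_volume: List[List[List[float]]]) -> List[List[float]]:
    """Collapse a D x H x W cost volume into an H x W disparity map."""
    d_count, h, w = len(cost_volume), len(cost_volume[0]), len(cost_volume[0][0])
    return [
        [soft_argmin([cost_volume[d][y][x] for d in range(d_count)]) for x in range(w)]
        for y in range(h)
    ]
```

Unlike a hard argmin, the weighted average is differentiable and can land between integer disparities; for equal costs [0, 0] it returns 0.5.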
  • The semantic refinement network includes a convolutional layer and a fully connected layer.
  • Step S106 may include, but is not limited to, steps S701 to S704:
  • Step S701 performing probability calculation on the first image feature through a third function of the semantic refinement network to generate a semantic probability map;
  • Step S702 performing convolution processing on the estimated disparity map through the semantic refinement network to obtain an estimated disparity feature;
  • Step S703 fusing the semantic probability map and the estimated disparity feature through the semantic refinement network to obtain a preliminary disparity feature;
  • Step S704 decoding the preliminary disparity feature through the semantic refinement network to obtain the target disparity map.
  • A third function, the softmax function, is preset in the fully connected layer of the semantic refinement network. The softmax function computes probabilities for the first image feature and, from the result, forms a probability distribution over the preset semantic category labels; the resulting semantic probability map reflects how likely the first image feature is to belong to each semantic category label.
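  • A minimal per-pixel sketch of the softmax in step S701, assuming the fully connected layer has already produced K class scores (logits) per pixel for the preset semantic category labels. K, the function names, and the H × W × K nested-list layout are illustrative assumptions. Subtracting the maximum logit before exponentiating avoids overflow and does not change the result.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one pixel's class scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def semantic_probability_map(logits: List[List[List[float]]]) -> List[List[List[float]]]:
    """Apply softmax over the class axis at every pixel of an H x W x K logit map."""
    return [[softmax(pixel) for pixel in row] for row in logits]
```

Each pixel of the output sums to 1, so it can be read directly as the per-pixel distribution over semantic labels that the text describes.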
  • In step S702 of some embodiments, two-dimensional convolution is performed on the estimated disparity map through the convolutional layer of the semantic refinement network to capture its image features and obtain the estimated disparity feature.
  • In step S703 of some embodiments, the convolutional layer of the semantic refinement network multiplies the semantic probability map and the estimated disparity feature (vector multiplication) according to a preset weight ratio, fusing the semantic features with the estimated disparity feature to obtain a semantically weighted preliminary disparity feature.
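  • The text says only that the semantic probability map and the estimated disparity feature are multiplied according to a preset weight ratio. One plausible reading, sketched here purely as an assumption, reduces each pixel's class probabilities to a single score with hypothetical per-class weights w_k, score(x) = sum over k of w_k * p_k(x), and scales that pixel's disparity feature vector by it. The function name and weight values are illustrative, not taken from the source.

```python
from typing import List, Sequence


def semantic_weighting(prob_map: List[List[List[float]]],
                       disp_feat: List[List[List[float]]],
                       class_weights: Sequence[float]) -> List[List[List[float]]]:
    """Scale each pixel's disparity feature vector by a class-weighted semantic score.

    prob_map:      H x W x K semantic probabilities (softmax output).
    disp_feat:     H x W x C estimated disparity features.
    class_weights: K preset weights, one per semantic label (hypothetical values).
    """
    fused = []
    for prob_row, feat_row in zip(prob_map, disp_feat):
        row = []
        for probs, feats in zip(prob_row, feat_row):
            score = sum(w * p for w, p in zip(class_weights, probs))
            row.append([score * f for f in feats])
        fused.append(row)
    return fused
```

With weights (1.0, 0.5), a pixel that is certainly class 0 keeps its features unchanged, while a pixel split 0.25/0.75 is scaled by 0.625.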
  • In step S704 of some embodiments, convolutional decoding and deconvolution upsampling are performed on the preliminary disparity feature through the convolutional layer of the semantic refinement network to obtain the target disparity map, which reflects the disparity at the resolution of the target image.
  • The disparity map generation method uses the image segmentation results to weight the disparity estimation results by semantic category and then encodes and decodes them, which improves the scene-semantic reliability of the estimated disparity and enhances scene understanding for the stereo matching task.
  • Using the semantic information of the scene can improve disparity estimation in ill-posed regions, thereby improving the accuracy of disparity estimation and reducing the error of the disparity map.
  • In summary, a target image including a left view and a right view is acquired. Feature extraction on the left view and the right view yields multiple left view features and right view features that are better suited to the requirements of disparity estimation. Image segmentation on the left view features yields the first image feature; the left view features, the first image feature, and the right view features are combined into the target cost volume; and the preset three-dimensional convolutional hourglass model performs disparity estimation on the target cost volume to obtain an estimated disparity map, so that semantic information assists disparity estimation and improves its reliability. Finally, the estimated disparity map is semantically refined through the preset semantic refinement network and the first image feature to obtain the target disparity map, which enhances scene understanding for the stereo matching task, improves the accuracy of disparity estimation, and reduces the error of the disparity map.
  • An embodiment of the present application also provides a disparity map generation device that can implement the above disparity map generation method. The device includes:
  • an image acquisition module 801, configured to acquire a target image, wherein the target image includes a left view and a right view of the target object;
  • a feature extraction module 802, configured to perform feature extraction on the left view to obtain multiple left view features, and to perform feature extraction on the right view to obtain multiple right view features;
  • an image segmentation module 803, configured to perform image segmentation processing on the left view features to obtain the first image feature;
  • a fusion module 804, configured to fuse the left view features, the first image feature, and the right view features to obtain the target cost volume;
  • a disparity estimation module 805, configured to perform disparity estimation on the target cost volume through a preset three-dimensional convolutional hourglass model to obtain an estimated disparity map;
  • a semantic refinement module 806, configured to perform semantic refinement processing on the estimated disparity map through the preset semantic refinement network and the first image feature to obtain the target disparity map.
  • The specific implementation of the disparity map generation device is essentially the same as the specific embodiments of the above disparity map generation method and is not repeated here.
  • An embodiment of the present application also provides an electronic device, including a memory, a processor, a program stored in the memory and executable on the processor, and a data bus for connection and communication between the processor and the memory; when executed by the processor, the program implements the above disparity map generation method.
  • The electronic device may be any intelligent terminal, such as a tablet computer or a vehicle-mounted computer.
  • FIG. 9 illustrates a hardware structure of an electronic device in another embodiment.
  • the electronic device includes:
  • the processor 901 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and executes the related programs to implement the technical solutions provided by the embodiments of the present application;
  • the memory 902 may be implemented in the form of a read-only memory (Read-Only Memory, ROM), a static storage device, a dynamic storage device, or a random access memory (Random Access Memory, RAM).
  • the memory 902 can store operating systems and other application programs.
  • the relevant program code is stored in the memory 902 and invoked by the processor 901 to execute a disparity map generation method, which includes: acquiring a target image, wherein the target image includes a left view and a right view; performing feature extraction on the left view to obtain multiple left view features, and performing feature extraction on the right view to obtain multiple right view features; performing image segmentation processing on the left view features to obtain the first image feature; combining the left view features, the first image feature, and the right view features to obtain the target cost volume; performing disparity estimation on the target cost volume through the preset three-dimensional convolutional hourglass model to obtain the estimated disparity map; and performing semantic refinement processing on the estimated disparity map through the preset semantic refinement network and the first image feature to obtain the target disparity map;
  • the input/output interface 903 is used for information input and output;
  • the communication interface 904 enables communication between the device and other devices, either over a wired connection (such as USB or a network cable) or over a wireless connection (such as a mobile network, Wi-Fi, or Bluetooth);
  • the bus 905 transfers information between the components of the device (such as the processor 901, the memory 902, the input/output interface 903, and the communication interface 904);
  • the processor 901 , the memory 902 , the input/output interface 903 and the communication interface 904 are connected to each other within the device through the bus 905 .
  • An embodiment of the present application also provides a storage medium, which is a computer-readable storage medium.
  • the computer-readable storage medium may be non-volatile or volatile.
  • The storage medium stores one or more programs that can be executed by one or more processors to implement a disparity map generation method, which includes: acquiring a target image, wherein the target image includes a left view and a right view; performing feature extraction on the left view to obtain multiple left view features, and performing feature extraction on the right view to obtain multiple right view features; performing image segmentation processing on the left view features to obtain the first image feature; combining the left view features, the first image feature, and the right view features to obtain the target cost volume; performing disparity estimation on the target cost volume through the preset three-dimensional convolutional hourglass model to obtain the estimated disparity map; and performing semantic refinement processing on the estimated disparity map through the preset semantic refinement network and the first image feature to obtain the target disparity map.
  • memory can be used to store non-transitory software programs and non-transitory computer-executable programs.
  • the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage devices.
  • the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the processor via a network. Examples of the aforementioned networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • The disparity map generation method, disparity map generation device, electronic device, and storage medium provided in the embodiments of the present application acquire a target image that includes a left view and a right view. Feature extraction is performed on the left view to obtain multiple left view features and on the right view to obtain multiple right view features, making the resulting features better suited to disparity estimation. Image segmentation is then performed on the left view features to obtain the first image feature; the left view features, the first image feature, and the right view features are combined to obtain the target cost volume; and the preset three-dimensional convolutional hourglass model performs disparity estimation on the target cost volume to obtain an estimated disparity map.
  • The estimated disparity map is semantically refined through the preset semantic refinement network and the first image feature to obtain the target disparity map, which improves the scene-level semantic reliability of the estimated disparity and strengthens scene understanding in the stereo matching task.
  • The semantic information of the scene can improve disparity estimation in ill-posed regions, thereby improving the accuracy of disparity estimation and reducing the error of the disparity map.
  • the device embodiments described above are only illustrative, and the units described as separate components may or may not be physically separated, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
  • If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • The technical solution of the present application in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, can be embodied as a software product. The computer software product is stored in a storage medium and includes multiple instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods in the embodiments of the present application.
  • The aforementioned storage media include various media that can store programs, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.
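The softmax step and the weighted fusion of step S703 described above can be illustrated with a small sketch. The patent does not disclose layer sizes or how the "preset weight ratio" is defined, so the following are assumptions: the first image feature is represented as per-pixel category logits, the estimated disparity feature as one scalar per pixel, and the weight ratio as a hypothetical list `class_weights` with one weight per semantic category. Softmax turns the logits into the semantic probability map, and each disparity feature is scaled by its probability-weighted category weight.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one pixel's semantic category logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def semantic_weighting(
    class_logits: List[List[List[float]]],  # H x W x K category logits (assumed layout)
    disparity_feature: List[List[float]],   # H x W, one scalar per pixel for simplicity
    class_weights: List[float],             # K hypothetical per-category weights
) -> List[List[float]]:
    """Scale each pixel's disparity feature by its probability-weighted category weight."""
    fused = []
    for logit_row, feat_row in zip(class_logits, disparity_feature):
        row = []
        for logits, feat in zip(logit_row, feat_row):
            probs = softmax(logits)  # semantic probability map entry for this pixel
            weight = sum(p * w for p, w in zip(probs, class_weights))
            row.append(weight * feat)
        fused.append(row)
    return fused


# One pixel, two categories with equal logits -> probabilities [0.5, 0.5],
# expected weight 0.5 * 1.0 + 0.5 * 0.5 = 0.75, fused feature 0.75 * 2.0 = 1.5.
print(semantic_weighting([[[0.0, 0.0]]], [[2.0]], [1.0, 0.5]))  # [[1.5]]
```

In the actual network these quantities would be multi-channel tensors and the weighting would be learned; the explicit loops only make the per-pixel arithmetic visible.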

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Quality & Reliability (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application relate to the technical field of artificial intelligence, and provide a disparity map generation method and apparatus, an electronic device, and a storage medium. The method comprises: acquiring a target image, the target image comprising a left view and a right view; performing feature extraction on the left view to obtain a plurality of left view features, and performing feature extraction on the right view to obtain a plurality of right view features; performing image segmentation processing on the left view features to obtain a first image feature; performing combination processing on the left view features, the first image feature, and the right view features to obtain a target cost volume; performing disparity estimation on the target cost volume by means of a preset three-dimensional convolutional hourglass model to obtain an estimated disparity map; and performing semantic refinement processing on the estimated disparity map by means of a preset semantic refinement network and the first image feature to obtain a target disparity map. According to the embodiments of the present application, the accuracy of disparity estimation can be improved, and the error of the target disparity map is reduced.

Description

Disparity map generation method and apparatus, electronic device, and storage medium
This application claims priority to Chinese patent application No. 202210162805.0, entitled "Disparity map generation method and apparatus, electronic device, and storage medium" and filed with the China Patent Office on February 22, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of artificial intelligence, and in particular to a disparity map generation method and device, an electronic device, and a storage medium.
Background
Disparity estimation is a fundamental computer vision problem that aims to predict a distance measurement for each point in a target scene.
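For a rectified stereo pair, the link between disparity and the distance measurement mentioned above follows from standard pinhole-camera geometry (this relation is not stated in the source). With focal length $f$ in pixels, baseline $B$ between the two cameras, and disparity $d$ in pixels, the depth $Z$ of a point is

$$
Z = \frac{f \cdot B}{d}
$$

For example, with $f = 700$ px, $B = 0.12$ m, and $d = 35$ px, $Z = 700 \times 0.12 / 35 = 2.4$ m. Because $Z$ is inversely proportional to $d$, a fixed disparity error produces a larger depth error for distant points, where $d$ is small.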
Technical Problem
The following are technical problems of the prior art that the inventors have recognized:
When estimating disparity, current stereo matching algorithms usually run into difficulty in ill-posed regions such as weak textures, repetitive textures, and occlusions, and often cannot estimate the disparity of the target object accurately, so the generated disparity map has a large error. How to improve the accuracy of disparity estimation and reduce the error of the disparity map has therefore become a pressing technical problem.
Technical Solution
In a first aspect, an embodiment of the present application provides a disparity map generation method, the method including:
acquiring a target image, wherein the target image includes a left view and a right view;
performing feature extraction on the left view to obtain multiple left view features, and performing feature extraction on the right view to obtain multiple right view features;
performing image segmentation processing on the left view features to obtain a first image feature;
combining the left view features, the first image feature, and the right view features to obtain a target cost volume;
performing disparity estimation on the target cost volume through a preset three-dimensional convolutional hourglass model to obtain an estimated disparity map; and
performing semantic refinement processing on the estimated disparity map through a preset semantic refinement network and the first image feature to obtain a target disparity map.
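The combination step above (left view features, first image feature, and right view features into a target cost volume) can be sketched as a concatenation-style cost volume, a common construction in 3D-convolution stereo networks. The source does not say exactly how the three features are combined, so this layout is an assumption: for each candidate disparity `d`, the left view feature and first image feature at column `x` are concatenated with the right view feature at column `x - d`, with zero padding where `x - d` falls outside the image. The function name `build_cost_volume` is hypothetical, and a single image row is used to keep the sketch short.

```python
from typing import List

Vector = List[float]


def build_cost_volume(
    left: List[Vector],   # W left view feature vectors for one row
    seg: List[Vector],    # W first-image-feature vectors (segmentation of the left view)
    right: List[Vector],  # W right view feature vectors for the same row
    max_disp: int,        # number of candidate disparities 0 .. max_disp - 1
) -> List[List[Vector]]:
    """Return volume[d][x] = left[x] + seg[x] + right[x - d] (list concatenation)."""
    width = len(left)
    right_dim = len(right[0])
    volume = []
    for d in range(max_disp):
        slice_d = []
        for x in range(width):
            # Guard explicitly: a negative index would silently wrap around in Python.
            shifted = right[x - d] if x - d >= 0 else [0.0] * right_dim
            slice_d.append(left[x] + seg[x] + shifted)
        volume.append(slice_d)
    return volume


left = [[1.0], [2.0], [3.0]]
seg = [[0.1], [0.2], [0.3]]
right = [[10.0], [20.0], [30.0]]
volume = build_cost_volume(left, seg, right, max_disp=2)
print(volume[1][2])  # [3.0, 0.3, 20.0]
```

The resulting `D x W x C` structure (plus a height axis in a full implementation) is the kind of 4D input that a three-dimensional convolutional hourglass model is designed to process.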
In a second aspect, an embodiment of the present application provides a disparity map generation device, the device including:
an image acquisition module, configured to acquire a target image, wherein the target image includes a left view and a right view of a target object;
a feature extraction module, configured to perform feature extraction on the left view to obtain multiple left view features, and to perform feature extraction on the right view to obtain multiple right view features;
an image segmentation module, configured to perform image segmentation processing on the left view features to obtain a first image feature;
a fusion module, configured to fuse the left view features, the first image feature, and the right view features to obtain a target cost volume;
a disparity estimation module, configured to perform disparity estimation on the target cost volume through a preset three-dimensional convolutional hourglass model to obtain an estimated disparity map; and
a semantic refinement module, configured to perform semantic refinement processing on the estimated disparity map through a preset semantic refinement network and the first image feature to obtain a target disparity map.
In a third aspect, an embodiment of the present application provides an electronic device. The electronic device includes a memory, a processor, a program stored in the memory and executable on the processor, and a data bus for connection and communication between the processor and the memory. When executed by the processor, the program implements a disparity map generation method that includes: acquiring a target image, wherein the target image includes a left view and a right view; performing feature extraction on the left view to obtain multiple left view features, and performing feature extraction on the right view to obtain multiple right view features; performing image segmentation processing on the left view features to obtain a first image feature; combining the left view features, the first image feature, and the right view features to obtain a target cost volume; performing disparity estimation on the target cost volume through a preset three-dimensional convolutional hourglass model to obtain an estimated disparity map; and performing semantic refinement processing on the estimated disparity map through a preset semantic refinement network and the first image feature to obtain a target disparity map.
In a fourth aspect, an embodiment of the present application provides a storage medium. The storage medium is a computer-readable storage medium that stores one or more programs, and the one or more programs can be executed by one or more processors to implement a disparity map generation method that includes: acquiring a target image, wherein the target image includes a left view and a right view; performing feature extraction on the left view to obtain multiple left view features, and performing feature extraction on the right view to obtain multiple right view features; performing image segmentation processing on the left view features to obtain a first image feature; combining the left view features, the first image feature, and the right view features to obtain a target cost volume; performing disparity estimation on the target cost volume through a preset three-dimensional convolutional hourglass model to obtain an estimated disparity map; and performing semantic refinement processing on the estimated disparity map through a preset semantic refinement network and the first image feature to obtain a target disparity map.
Beneficial Effects
The disparity map generation method and device, electronic device, and storage medium proposed in this application make the extracted left view features and right view features better suited to disparity estimation. Performing disparity estimation on the target cost volume through the preset three-dimensional convolutional hourglass model allows semantic information to assist disparity estimation, improving its reliability. Performing semantic refinement on the estimated disparity map through the preset semantic refinement network and the first image feature enhances scene understanding in the stereo matching task, improves the accuracy of disparity estimation, and reduces the error of the disparity map.
Description of Drawings
The accompanying drawings provide a further understanding of the technical solution of the present application and constitute a part of the specification. Together with the embodiments of the present application, they serve to explain the technical solution of the present application and do not limit it.
FIG. 1 is a flowchart of a disparity map generation method provided by an embodiment of the present application;
FIG. 2 is a flowchart of step S102 in FIG. 1;
FIG. 3 is a flowchart of step S103 in FIG. 1;
FIG. 4 is a flowchart of step S104 in FIG. 1;
FIG. 5 is a flowchart of step S402 in FIG. 4;
FIG. 6 is a flowchart of step S105 in FIG. 1;
FIG. 7 is a flowchart of step S106 in FIG. 1;
FIG. 8 is a schematic structural diagram of a disparity map generation device provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present application.
Embodiments of the Present Invention
To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here merely explain the present application and do not limit it.
It should be noted that although functional modules are divided in the device schematic and a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed with a module division different from that of the device, or in an order different from that in the flowcharts. The terms "first", "second", and the like in the specification, the claims, and the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by a person skilled in the technical field to which this application belongs. The terms used herein are only for describing the embodiments of the present application and are not intended to limit the present application.
At present, when estimating disparity, stereo matching algorithms usually run into difficulty in ill-posed regions such as weak textures, repetitive textures, and occlusions, and often cannot estimate the disparity of the target object accurately. How to improve the accuracy of disparity estimation has therefore become a pressing technical problem.
On this basis, the embodiments of the present application provide a disparity map generation method and device, an electronic device, and a storage medium, aiming to improve the accuracy of disparity estimation and reduce the error of the disparity map.
The disparity map generation method and device, electronic device, and storage medium provided by the embodiments of the present application are described through the following embodiments, beginning with the disparity map generation method.
The embodiments of the present application may acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.
Basic artificial intelligence technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. AI software technology mainly covers computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning/deep learning.
The disparity map generation method provided by the embodiments of the present application relates to the technical field of artificial intelligence. It may be applied in a terminal or on a server, or it may be software running in a terminal or on a server. In some embodiments, the terminal may be a smartphone, a tablet computer, a notebook computer, a desktop computer, or the like. The server may be configured as an independent physical server, as a server cluster or distributed system composed of multiple physical servers, or as a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The software may be an application that implements the disparity map generation method, but is not limited to the above forms.
The present application can be used in numerous general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices. The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present application may also be practiced in distributed computing environments where tasks are performed by remote processing devices linked through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
FIG. 1 is an optional flowchart of the disparity map generation method provided by an embodiment of the present application. The method in FIG. 1 may include, but is not limited to, steps S101 to S106.
Step S101: acquire a target image, wherein the target image includes a left view and a right view;
Step S102: perform feature extraction on the left view to obtain multiple left view features, and perform feature extraction on the right view to obtain multiple right view features;
Step S103: perform image segmentation processing on the left view features to obtain a first image feature;
Step S104: combine the left view features, the first image feature, and the right view features to obtain a target cost volume;
Step S105: perform disparity estimation on the target cost volume through the preset three-dimensional convolutional hourglass model to obtain an estimated disparity map;
Step S106: perform semantic refinement processing on the estimated disparity map through the preset semantic refinement network and the first image feature to obtain a target disparity map.
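The source does not describe how the output of the three-dimensional convolutional hourglass model in step S105 is turned into the estimated disparity map. One common approach in hourglass-based stereo networks, assumed here rather than taken from the patent, is soft-argmin disparity regression: the matching costs for each pixel are converted into a probability distribution over candidate disparities with a softmax of the negated costs, and the estimated disparity is the expected value of that distribution. The sketch treats the aggregated cost volume as nested lists indexed `[d][x]` for one image row; `soft_argmin` and `regress_disparity` are hypothetical names.

```python
import math
from typing import List


def soft_argmin(costs: List[float]) -> float:
    """Expected disparity for one pixel: sum over d of d * softmax(-cost)[d]."""
    neg = [-c for c in costs]
    m = max(neg)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in neg]
    total = sum(exps)
    return sum(d * e / total for d, e in enumerate(exps))


def regress_disparity(cost_volume: List[List[float]]) -> List[float]:
    """Turn an aggregated cost volume indexed [d][x] into one disparity per column x."""
    num_disp = len(cost_volume)
    width = len(cost_volume[0])
    return [
        soft_argmin([cost_volume[d][x] for d in range(num_disp)])
        for x in range(width)
    ]


# Column 0 has its lowest cost at d = 1, column 1 at d = 0.
print(regress_disparity([[5.0, 0.0], [0.0, 5.0]]))
```

Unlike a hard argmin, the expected value is differentiable and can produce sub-pixel disparities, which is why this regression is often used for end-to-end training.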
In steps S101 to S106 of this embodiment, extracting multiple features from the left view and from the right view makes the resulting left-view and right-view features better suited to disparity estimation. Segmenting the left-view features yields the first image features; combining the left-view features, the first image features, and the right-view features yields the target cost volume; and running the preset 3D convolutional hourglass model on the target cost volume yields the estimated disparity map. In this way, semantic information assists disparity estimation and improves its reliability. Finally, semantically refining the estimated disparity map with the preset semantic refinement network and the first image features yields the target disparity map, which strengthens the stereo matching task's understanding of the scene, improves the accuracy of disparity estimation, and reduces the error of the generated target disparity map.
In step S101 of some embodiments, the target image may be a two-dimensional or a three-dimensional image. In some embodiments it may be obtained by computed tomography (CT); in another embodiment, by magnetic resonance imaging (MRI); and in still other embodiments, by shooting with a binocular camera or a similar device, without being limited thereto. The left view and the right view are the left and right images captured by a binocular camera.
Before step S102 in some embodiments, the disparity map generation method further includes pre-constructing a stereo matching network. The stereo matching network mainly includes a feature extraction module, an image segmentation module, a disparity estimation module, and a semantic refinement module. The feature extraction module consists mainly of a residual network and extracts features from the input target image. The image segmentation module consists mainly of a PSPNet decoding network and samples the feature-extracted target image. The disparity estimation module consists mainly of a 3D convolutional network and estimates disparity from the sampled target image to generate the estimated disparity map. The semantic refinement module consists mainly of a semantic refinement network, which mainly includes convolutional layers and a fully connected layer, and semantically refines the estimated disparity map to generate the target disparity map.
Referring to Fig. 2, in some embodiments the feature extraction module includes a residual network and a pooling layer, and step S102 may include, but is not limited to, steps S201 and S202:
Step S201: convolve the left view to obtain left-view convolution features, and convolve the right view to obtain right-view convolution features;
Step S202: according to preset multi-scale feature resolution parameters, apply pyramid pooling to the left-view convolution features to obtain multiple left-view features, and apply pyramid pooling to the right-view convolution features according to the same parameters to obtain multiple right-view features.
In step S201 of some embodiments, the preset residual network of the feature extraction module in the stereo matching network extracts features from the left view and the right view separately. Specifically, the residual network consists of multiple residual dense blocks, and the convolutional layers of these blocks convolve the left view and the right view to produce the left-view and right-view convolution features.
In step S202 of some embodiments, the left-view and right-view convolution features are fed into the pooling layer of the feature extraction module, which applies pyramid pooling to each of them according to its multi-scale feature resolution parameters, yielding multi-scale features of the left view and of the right view.
For example, pyramid pooling of the left-view convolution features according to the preset multi-scale feature resolution parameters produces left-view features at 1/4, 1/8, and 1/16 of the original left-view resolution; likewise, pyramid pooling of the right-view convolution features produces right-view features at 1/4, 1/8, and 1/16 of the original right-view resolution. This combines view features across scales: low-level features retain high resolution, while high-level features carry richer semantic information, which improves the accuracy of disparity estimation.
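The multi-scale output described above can be illustrated with a minimal NumPy sketch. It assumes the convolution features are still at the full view resolution and uses plain s × s average pooling with stride s as a stand-in for the pyramid pooling layer; the patent does not specify the pooling operator, and the function names here are hypothetical.

```python
import numpy as np

def avg_pool(feat: np.ndarray, s: int) -> np.ndarray:
    """Average-pool a (C, H, W) feature map with an s x s window and stride s.

    H and W are assumed to be divisible by s (crop or pad beforehand otherwise).
    """
    c, h, w = feat.shape
    # Split each spatial axis into (blocks, s) and average over the s-sized axes.
    return feat.reshape(c, h // s, s, w // s, s).mean(axis=(2, 4))

def multiscale_features(feat: np.ndarray, scales=(4, 8, 16)) -> dict:
    """Return one pooled map per downsampling rate s, keyed by s."""
    return {s: avg_pool(feat, s) for s in scales}

# Hypothetical 32-channel convolution features of a 64 x 128 view.
left_conv = np.random.rand(32, 64, 128).astype(np.float32)
pyramid = multiscale_features(left_conv)
for s, f in pyramid.items():
    print(s, f.shape)
# 4 (32, 16, 32)
# 8 (32, 8, 16)
# 16 (32, 4, 8)
```

The same function would be applied to the right-view convolution features so that both views share one set of scales.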
Referring to Fig. 3, in some embodiments the image segmentation module includes a decoding layer and a convolutional layer, and step S103 may include, but is not limited to, steps S301 to S303:
Step S301: upsample the left-view features with a preset bilinear interpolation method to obtain first-view latent features;
Step S302: arrange the first-view latent features into a sequence with a preset first function to obtain a first-view feature sequence;
Step S303: convolve the first-view feature sequence to obtain the first image features.
In step S301 of some embodiments, the preset bilinear interpolation method takes the pixel values of the four neighbouring points, weights them according to their distance from the interpolation point, and interpolates linearly. Bilinear interpolation upsamples the left-view features of different scales to one quarter of the original resolution. This acts as an averaging low-pass filter on the left view that smooths its edges, producing a relatively coherent output image and improving the accuracy of the first-view latent features.
For example, to compute the value f(x, y) at a point (x, y) of the left view by bilinear interpolation, take the four neighbours of (x, y), interpolate twice in the y direction (or x direction), and then once in the x direction (or y direction). Let the four neighbours be (i, j), (i, j+1), (i+1, j), and (i+1, j+1), where i is the row index and j the column index, with the origin at the top-left corner. Let α = x - i and β = y - j. A line through (x, y) parallel to the x-axis intersects the sides of the cell formed by the four neighbours at the points (i, y) and (i+1, y). First interpolate in the y direction to compute the values at these intersection points, f(i, y) and f(i+1, y): f(i, y) is interpolated from f(i, j) and f(i, j+1), and f(i+1, y) from f(i+1, j) and f(i+1, j+1). Finally, f(x, y) is interpolated from f(i, y) and f(i+1, y) in the x direction.
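The worked example above corresponds to the standard bilinear formulas $f(i,y) = (1-\beta)f(i,j) + \beta f(i,j+1)$, $f(i+1,y) = (1-\beta)f(i+1,j) + \beta f(i+1,j+1)$ and $f(x,y) = (1-\alpha)f(i,y) + \alpha f(i+1,y)$. The following is a minimal pure-Python sketch of those three steps for a single point; the grid layout (a list of rows) and the clamping at the last row/column are illustrative choices, not taken from the patent.

```python
import math

def bilinear(f, x: float, y: float) -> float:
    """Bilinear interpolation of a 2D grid f (list of rows) at fractional (x, y).

    Convention as in the text: x indexes rows (i), y indexes columns (j),
    origin at the top-left. (x, y) must lie inside the grid.
    """
    i, j = math.floor(x), math.floor(y)
    # Clamp so points on the last row/column still have a valid 2 x 2 cell.
    i = min(i, len(f) - 2)
    j = min(j, len(f[0]) - 2)
    alpha, beta = x - i, y - j
    # Two interpolations along y, at rows i and i+1 ...
    f_iy = (1 - beta) * f[i][j] + beta * f[i][j + 1]
    f_i1y = (1 - beta) * f[i + 1][j] + beta * f[i + 1][j + 1]
    # ... then one interpolation along x between the two intersection values.
    return (1 - alpha) * f_iy + alpha * f_i1y

grid = [[0.0, 10.0],
        [20.0, 30.0]]
print(bilinear(grid, 0.5, 0.5))   # 15.0
print(bilinear(grid, 0.0, 0.25))  # 2.5
```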
In step S302 of some embodiments, the first function is the concat function, which concatenates the first-view latent features in sequence to obtain the first-view feature sequence.
In step S303 of some embodiments, a convolutional layer convolves the first-view feature sequence to obtain first image features at multiple scales.
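Steps S301 to S303 can be sketched in NumPy as: bilinearly upsample every left-view feature map to the size of the finest (1/4-scale) map, concatenate along the channel axis, and apply a 1 × 1 convolution (a per-pixel linear map over channels). This is an illustrative sketch under stated assumptions: the corner-aligned resize, the 1 × 1 kernel, the random weights, and the 19 output classes are all hypothetical, and the sketch produces a single 1/4-scale output rather than the multi-scale first image features of the original.

```python
import numpy as np

def resize_bilinear(feat: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Bilinearly resize a (C, H, W) map to (C, out_h, out_w), corners aligned."""
    _, h, w = feat.shape
    ys = np.linspace(0.0, h - 1, out_h)
    xs = np.linspace(0.0, w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[None, :, None]  # fractional offsets along rows
    wx = (xs - x0)[None, None, :]  # fractional offsets along columns
    top = feat[:, y0][:, :, x0] * (1 - wx) + feat[:, y0][:, :, x1] * wx
    bottom = feat[:, y1][:, :, x0] * (1 - wx) + feat[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bottom * wy

def segmentation_head(features, weight: np.ndarray) -> np.ndarray:
    """S301-S303 sketch: upsample to the first map's size, concat, 1x1 conv."""
    h, w = features[0].shape[1:]
    upsampled = [resize_bilinear(f, h, w) for f in features]  # S301
    sequence = np.concatenate(upsampled, axis=0)               # S302 (concat)
    return np.einsum("oc,chw->ohw", weight, sequence)          # S303 (1x1 conv)

rng = np.random.default_rng(0)
feats = [rng.standard_normal((32, 16, 32)),  # 1/4 scale
         rng.standard_normal((32, 8, 16)),   # 1/8 scale
         rng.standard_normal((32, 4, 8))]    # 1/16 scale
weight = rng.standard_normal((19, 96))       # 19 hypothetical classes
out = segmentation_head(feats, weight)
print(out.shape)  # (19, 16, 32)
```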
Referring to Fig. 4, in some embodiments step S104 may include, but is not limited to, steps S401 and S402:
Step S401: according to the preset multi-scale feature resolution parameters, group and combine the left-view features and the right-view features to obtain initial cost volumes;
Step S402: concatenate the initial cost volumes and the first image features with a preset 3D convolutional network to obtain the target cost volume.
In step S401 of some embodiments, the preset multi-scale feature resolution parameters may be 4, 8, 16, and so on; they can be set according to actual conditions and are not limited thereto. The left-view features and right-view features are grouped by their multi-scale feature resolution parameter and combined; for example, the left-view and right-view features whose parameter is 4 are added as vectors to obtain the view feature for parameter 4. The size of each initial cost volume can be expressed as

$$\frac{H}{s} \times \frac{W}{s} \times \frac{D}{s} \times C$$

where H and W are the height and width of the target image, D is the disparity search range, C is the number of feature channels, and s is the downsampling rate, s = 4, 8, 16.
Note that a cost volume here is a low-resolution cost volume constructed at a given scale; it is the intermediate result of the image concatenation process. Because most stereo matching is binocular, the input of the stereo matching network is usually two images, the left view and the right view. Before concatenating them, the network is initialized with a maximum disparity. For example, with a maximum disparity of 5, the left and right views are concatenated five times, once for each disparity value 0, 1, 2, 3, and 4: at disparity 0 the views are concatenated directly; at disparity 1 they are concatenated after shifting by 1 pixel; at disparity 2, after shifting by 2 pixels; at disparity 3, after shifting by 3 pixels; and at disparity 4, after shifting by 4 pixels. The original left and right views are three-dimensional tensors of size W×H×3, where W is the image width, H is the image height, and 3 is the number of channels, while the concatenated target view is a four-dimensional tensor of size W×H×3×5; this target view is the cost volume. In short, the cost volume is the intermediate product obtained by concatenating the input images at each candidate disparity up to the preset maximum. The cost volume is then fed into the stereo matching network for per-pixel matching to obtain a fused cost volume, and the network also removes the maximum-disparity dimension (5 in this example) from the tensor, so the output is again three-dimensional, of size W×H×3.
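The shift-and-concatenate construction described above can be sketched in NumPy as follows. The sketch assumes channel-first (C, H, W) inputs and pairs left column x with right column x - d for each candidate disparity d, leaving columns without a partner at zero; padding policy and axis order are illustrative choices. With channel concatenation the two C-channel inputs occupy 2C channels of the resulting 4D volume.

```python
import numpy as np

def concat_cost_volume(left: np.ndarray, right: np.ndarray, max_disp: int) -> np.ndarray:
    """Shift-and-concatenate cost volume from (C, H, W) left/right maps.

    Returns shape (2C, max_disp, H, W). For disparity d, left column x is
    paired with right column x - d; columns x < d have no partner and stay 0.
    """
    c, h, w = left.shape
    volume = np.zeros((2 * c, max_disp, h, w), dtype=left.dtype)
    for d in range(max_disp):
        volume[:c, d, :, d:] = left[:, :, d:]          # left features
        volume[c:, d, :, d:] = right[:, :, : w - d]    # right features shifted by d
    return volume

# Two hypothetical 3-channel 4 x 6 images, maximum disparity 5 (d = 0..4).
left = np.ones((3, 4, 6))
right = np.arange(72, dtype=float).reshape(3, 4, 6)
vol = concat_cost_volume(left, right, 5)
print(vol.shape)  # (6, 5, 4, 6)
```

A downstream 3D network would then reduce the disparity axis (for example with a soft argmin) to obtain a per-pixel disparity.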
Referring to Fig. 5, in some embodiments step S402 may further include, but is not limited to, steps S501 to S503:
Step S501: regularize the initial cost volumes with the 3D convolutional network to obtain first intermediate cost volumes, and regularize the first image features with the 3D convolutional network to obtain first intermediate image features;
Step S502: downsample the first intermediate cost volumes with the 3D convolutional network to obtain a second intermediate cost volume, and upsample the first intermediate image features to obtain second intermediate image features;
Step S503: concatenate the second intermediate cost volume and the second intermediate image features with the 3D convolutional network to obtain the target cost volume.
In step S501 of some embodiments, the 3D convolutions of the 3D convolutional network regularize the initial cost volumes to obtain the first intermediate cost volumes A: A1 at 1/4 scale, A2 at 1/8 scale, and A3 at 1/16 scale. In addition, a 2D convolution adjusts the number of channels of the first image feature whose multi-scale feature resolution parameter is 16, this feature is lifted from two dimensions to four, and the 3D convolutions of the 3D convolutional network regularize the four-dimensional first image feature to obtain the first intermediate image feature B3 at 1/16 scale. The same operations yield the first intermediate image feature B2 at 1/8 scale and the first intermediate image feature B1 at 1/4 scale.
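One plausible way to lift a 2D feature map to a 4D volume, as in step S501, is to have the 2D convolution produce C × D' channels and then reshape them into C channels by D' disparity slices. The sketch below assumes exactly that; the patent gives the target channel count only as a formula image, so the channel grouping and `depth` value here are hypothetical.

```python
import numpy as np

def lift_to_4d(feat2d: np.ndarray, depth: int) -> np.ndarray:
    """Reshape a (C * depth, H, W) map into a (C, depth, H, W) volume.

    The channel axis must be divisible by depth; each group of `depth`
    consecutive channels becomes one channel's slices along the new axis.
    """
    ch, h, w = feat2d.shape
    if ch % depth:
        raise ValueError(f"{ch} channels not divisible by depth {depth}")
    return feat2d.reshape(ch // depth, depth, h, w)

sem = np.random.rand(8 * 4, 4, 8)  # hypothetical 1/16-scale semantic feature
vol = lift_to_4d(sem, depth=4)
print(vol.shape)  # (8, 4, 4, 8)
```

After the reshape, the volume has the same (channel, disparity, height, width) layout as a cost volume, so 3D convolutions can process the two together.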
In steps S502 and S503 of some embodiments, a 3D convolution with stride 2 in the 3D convolutional network downsamples the 1/4-scale first intermediate cost volume A1, and the downsampled A1 is concatenated with the 1/8-scale first intermediate cost volume A2 to obtain the second intermediate cost volume, whose channel count is then adjusted by a 3D convolution. A further stride-2 3D convolution downsamples the second intermediate cost volume, and the result is concatenated with the 1/16-scale first intermediate cost volume A3 to obtain the target-view cost volume.
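The coarse-to-fine fusion of A1, A2, and A3 can be traced at the level of tensor shapes with a short NumPy sketch. As stand-ins (assumptions, not the patent's layers), 2 × 2 × 2 average pooling replaces the learned stride-2 3D convolution, and a random 1 × 1 × 1 channel mix replaces the channel-adjusting 3D convolution; layout is (C, D, H, W).

```python
import numpy as np

rng = np.random.default_rng(0)

def down2(vol: np.ndarray) -> np.ndarray:
    """Stand-in for a stride-2 3D convolution: 2x2x2 average pooling over (D, H, W)."""
    c, d, h, w = vol.shape
    return vol.reshape(c, d // 2, 2, h // 2, 2, w // 2, 2).mean(axis=(2, 4, 6))

def conv1x1x1(vol: np.ndarray, out_ch: int) -> np.ndarray:
    """Stand-in for a channel-adjusting 3D convolution: random 1x1x1 channel mixing."""
    weight = rng.standard_normal((out_ch, vol.shape[0]))
    return np.einsum("oc,cdhw->odhw", weight, vol)

C = 8
A1 = rng.standard_normal((C, 16, 16, 32))  # 1/4 scale
A2 = rng.standard_normal((C, 8, 8, 16))    # 1/8 scale
A3 = rng.standard_normal((C, 4, 4, 8))     # 1/16 scale

second = np.concatenate([down2(A1), A2], axis=0)           # (2C, 8, 8, 16)
second = conv1x1x1(second, C)                               # back to C channels
target_view = np.concatenate([down2(second), A3], axis=0)  # (2C, 4, 4, 8)
print(second.shape, target_view.shape)  # (8, 8, 8, 16) (16, 4, 4, 8)
```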
Similarly, the 1/16-scale first intermediate image feature B3 is concatenated with the 1/16-scale first intermediate cost volume A3 to obtain the 1/16-scale second intermediate image feature C3. A 3D deconvolution upsamples C3 to obtain the 1/8-scale second intermediate image feature C2, and a 3D convolution adjusts the number of channels of C2.
In the same way, the 1/8-scale first intermediate image feature B2 is concatenated with the 1/8-scale first intermediate cost volume A2 to obtain a 1/8-scale second intermediate image feature, and the 1/4-scale first intermediate image feature B1 is concatenated with the 1/4-scale first intermediate cost volume A1 to obtain the 1/4-scale second intermediate image feature C1.
Finally, the 1/4-scale second intermediate image feature C1 and the target-view cost volume are concatenated to obtain the target cost volume.
Referring to Fig. 6, in some embodiments the 3D convolutional hourglass model includes aggregation layers and a prediction layer, and step S105 includes, but is not limited to, steps S601 and S602:
Step S601: perform cost aggregation on the target cost volume with the aggregation layers to obtain a fused cost volume;
Step S602: perform disparity estimation on the fused cost volume with a second function of the prediction layer to obtain the estimated disparity map.
In step S601 of some embodiments, the 3D convolutional hourglass model includes two stacked aggregation layers whose structure is the same as that of the 3D convolutional network described above. The target cost volume is input to each of the two aggregation layers, each layer performs cost aggregation on it, and the outputs of the two layers are fused to obtain the final fused cost volume.
In step S602 of some embodiments, the second function is the soft argmin function, which produces a relatively accurate disparity estimate from the aggregated, fused cost volume and yields the estimated disparity map.
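Soft argmin turns a cost volume into a sub-pixel disparity by taking a softmax over the negated costs along the disparity axis and returning the probability-weighted mean disparity, $\hat d = \sum_d d \cdot \mathrm{softmax}(-c_d)$. A minimal NumPy sketch, assuming the fused cost volume has already been reduced to shape (D, H, W):

```python
import numpy as np

def soft_argmin(cost: np.ndarray) -> np.ndarray:
    """Soft argmin over the disparity axis of a (D, H, W) cost volume.

    Lower cost means a better match, so costs are negated before the softmax;
    the result is the probability-weighted mean disparity, shape (H, W).
    """
    logits = -cost
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    prob = np.exp(logits)
    prob /= prob.sum(axis=0, keepdims=True)
    disparities = np.arange(cost.shape[0], dtype=cost.dtype).reshape(-1, 1, 1)
    return (prob * disparities).sum(axis=0)

# One pixel, five candidate disparities; disparity 3 has by far the lowest cost.
cost = np.full((5, 1, 1), 50.0)
cost[3, 0, 0] = 0.0
print(soft_argmin(cost))  # very close to 3.0
```

Unlike a hard argmin, this operation is differentiable, which is why it is commonly used as the final layer of cost-volume stereo networks.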
Referring to Fig. 7, in some embodiments the semantic refinement network includes convolutional layers and a fully connected layer, and step S106 may further include, but is not limited to, steps S701 to S704:
Step S701: compute probabilities from the first image features with a third function of the semantic refinement network to generate a semantic probability map;
Step S702: convolve the estimated disparity map with the semantic refinement network to obtain estimated disparity features;
Step S703: fuse the semantic probability map and the estimated disparity features with the semantic refinement network to obtain preliminary disparity features;
Step S704: decode the preliminary disparity features with the semantic refinement network to obtain the target disparity map.
In step S701 of some embodiments, the third function, a softmax function, is preset on the fully connected layer of the semantic refinement network. The softmax function computes, from the first image features, a probability distribution over the preset semantic class labels, and the resulting semantic probability map reflects how likely the first image features are to belong to each semantic class label.
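The semantic probability map of step S701 amounts to a per-pixel softmax over the class axis. The NumPy sketch below assumes the fully connected layer has already produced class logits of shape (K, H, W); the three-class, one-pixel input is purely illustrative.

```python
import numpy as np

def semantic_probability_map(logits: np.ndarray) -> np.ndarray:
    """Per-pixel softmax over the class axis of (K, H, W) segmentation logits."""
    z = logits - logits.max(axis=0, keepdims=True)  # avoid overflow in exp
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)

logits = np.array([[[2.0]], [[1.0]], [[0.0]]])  # 3 hypothetical classes, 1 pixel
prob = semantic_probability_map(logits)
print(prob[:, 0, 0])  # probabilities that sum to 1, largest for class 0
```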
In step S702 of some embodiments, the convolutional layers of the semantic refinement network apply 2D convolution to the estimated disparity map to capture its image features, yielding the estimated disparity features.
In step S703 of some embodiments, the convolutional layers of the semantic refinement network multiply the semantic probability map and the estimated disparity features as vectors according to a preset weight ratio, fusing the semantic features with the estimated disparity features to obtain semantically weighted preliminary disparity features.
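The patent does not spell out how the preset weight ratio enters the multiplication. One possible reading, sketched below with NumPy, assigns a hypothetical preset weight to each semantic class, computes each pixel's weight as the probability-weighted class weight, and scales the disparity features by it. The `class_weights` parameter and the function name are assumptions for illustration only.

```python
import numpy as np

def semantic_weighting(prob: np.ndarray, disp_feat: np.ndarray,
                       class_weights: np.ndarray) -> np.ndarray:
    """Weight (C, H, W) disparity features by a (K, H, W) semantic probability map.

    class_weights has shape (K,): a hypothetical preset weight per semantic class.
    Each pixel's scalar weight is the probability-weighted class weight.
    """
    pixel_weight = np.tensordot(class_weights, prob, axes=(0, 0))  # (H, W)
    return disp_feat * pixel_weight[None, :, :]

# Two classes, a 1 x 2 image: left pixel is class 0, right pixel is class 1.
prob = np.array([[[1.0, 0.0]], [[0.0, 1.0]]])
disp_feat = np.ones((2, 1, 2))
out = semantic_weighting(prob, disp_feat, np.array([1.0, 0.5]))
print(out[:, 0, :])  # each channel becomes [1.0, 0.5]
```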
In step S704 of some embodiments, the convolutional layers of the semantic refinement network apply convolutional decoding and deconvolution-based upsampling to the preliminary disparity features to obtain the target disparity map, which reflects the disparity of the target image at its original resolution.
Through steps S701 to S704, the disparity map generation method uses the image segmentation result to weight the disparity estimate by semantic class and then encodes and decodes it. This improves the scene-semantic reliability of the estimated disparity and strengthens the stereo matching task's understanding of the scene; the scene's semantic information can improve disparity estimation in ill-posed regions, thereby improving the accuracy of disparity estimation and reducing the error of the disparity map.
In this embodiment of the present application, a target image including a left view and a right view is acquired. Feature extraction on the left view and on the right view then yields multiple left-view and right-view features that better meet the needs of disparity estimation. Image segmentation of the left-view features yields the first image features; combining the left-view features, the first image features, and the right-view features yields the target cost volume; and the preset 3D convolutional hourglass model estimates disparity from the target cost volume to produce the estimated disparity map. In this way, semantic information assists disparity estimation and improves its reliability. Finally, semantic refinement of the estimated disparity map with the preset semantic refinement network and the first image features yields the target disparity map, which strengthens the stereo matching task's understanding of the scene, improves the accuracy of disparity estimation, and reduces the error of the disparity map.
Referring to Fig. 8, an embodiment of the present application further provides a disparity map generation apparatus that can implement the above disparity map generation method. The apparatus includes:
an image acquisition module 801, configured to acquire a target image, where the target image includes a left view and a right view of a target object;
a feature extraction module 802, configured to perform feature extraction on the left view to obtain multiple left-view features, and on the right view to obtain multiple right-view features;
an image segmentation module 803, configured to perform image segmentation on the left-view features to obtain first image features;
a fusion module 804, configured to fuse the left-view features, the first image features, and the right-view features to obtain a target cost volume;
a disparity estimation module 805, configured to perform disparity estimation on the target cost volume with a preset 3D convolutional hourglass model to obtain an estimated disparity map;
a semantic refinement module 806, configured to perform semantic refinement on the estimated disparity map with a preset semantic refinement network and the first image features to obtain a target disparity map.
The specific implementation of the disparity map generation apparatus is substantially the same as the specific embodiments of the disparity map generation method described above and is not repeated here.
An embodiment of the present application further provides an electronic device, including a memory, a processor, a program stored in the memory and executable on the processor, and a data bus for connection and communication between the processor and the memory; when the program is executed by the processor, the above disparity map generation method is implemented. The electronic device may be any intelligent terminal, such as a tablet computer or an in-vehicle computer.
Referring to Fig. 9, which illustrates the hardware structure of an electronic device according to another embodiment, the electronic device includes:
a processor 901, which may be implemented as a general-purpose CPU (central processing unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and which executes the relevant programs to implement the technical solutions provided by the embodiments of the present application;
a memory 902, which may be implemented as a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 902 may store an operating system and other application programs. When the technical solutions provided by the embodiments of this specification are implemented in software or firmware, the relevant program code is stored in the memory 902 and invoked by the processor 901 to execute a disparity map generation method, which includes: acquiring a target image that includes a left view and a right view; performing feature extraction on the left view to obtain multiple left-view features, and on the right view to obtain multiple right-view features; performing image segmentation on the left-view features to obtain first image features; combining the left-view features, the first image features, and the right-view features to obtain a target cost volume; performing disparity estimation on the target cost volume with a preset 3D convolutional hourglass model to obtain an estimated disparity map; and performing semantic refinement on the estimated disparity map with a preset semantic refinement network and the first image features to obtain a target disparity map;
an input/output interface 903, configured to implement information input and output;
a communication interface 904, configured to implement communication between this device and other devices, either over a wired connection (such as USB or a network cable) or wirelessly (such as a mobile network, Wi-Fi, or Bluetooth);
a bus 905, which transfers information between the components of the device (such as the processor 901, the memory 902, the input/output interface 903, and the communication interface 904);
where the processor 901, the memory 902, the input/output interface 903, and the communication interface 904 are communicatively connected to one another within the device through the bus 905.
An embodiment of the present application further provides a storage medium, which is a computer-readable storage medium for computer-readable storage. The computer-readable storage medium may be non-volatile or volatile. The storage medium stores one or more programs that can be executed by one or more processors to implement a disparity map generation method, which includes: acquiring a target image, where the target image includes a left view and a right view; performing feature extraction on the left view to obtain a plurality of left view features, and performing feature extraction on the right view to obtain a plurality of right view features; performing image segmentation on the left view features to obtain first image features; combining the left view features, the first image features and the right view features to obtain a target cost volume; performing disparity estimation on the target cost volume through a preset three-dimensional convolutional hourglass model to obtain an estimated disparity map; and performing semantic refinement on the estimated disparity map through a preset semantic refinement network and the first image features to obtain a target disparity map.
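The combination step above is described only in terms of its inputs and outputs. The following is a minimal sketch that assumes a concatenation-based cost volume in the style of PSMNet. For every candidate disparity d, the left feature at column x is stacked with the right feature at column x - d. The semantic (first image) features are then appended as extra channels and broadcast across all disparity planes. The nested-list layout and the name `build_cost_volume` are hypothetical. The sketch also omits the 3D-convolutional regularization and resampling that claims 4 and 5 apply before the semantic features are concatenated.

```python
from typing import List

# Hypothetical layout: feature maps are nested lists indexed [channel][row][col],
# and the cost volume is indexed [channel][disparity][row][col].
Feature = List[List[List[float]]]
Volume = List[List[List[List[float]]]]


def build_cost_volume(left: Feature, right: Feature, semantic: Feature, max_disp: int) -> Volume:
    """Concatenate left, disparity-shifted right and semantic features per disparity."""
    c_feat, c_sem = len(left), len(semantic)
    height, width = len(left[0]), len(left[0][0])
    volume: Volume = [
        [[[0.0] * width for _ in range(height)] for _ in range(max_disp)]
        for _ in range(2 * c_feat + c_sem)
    ]
    for d in range(max_disp):
        for y in range(height):
            for x in range(width):
                # Semantic features are broadcast to every disparity plane.
                for s in range(c_sem):
                    volume[2 * c_feat + s][d][y][x] = semantic[s][y][x]
                # Left pixel x matches right pixel x - d; no match exists when x < d,
                # so those entries stay zero.
                if x < d:
                    continue
                for c in range(c_feat):
                    volume[c][d][y][x] = left[c][y][x]
                    volume[c_feat + c][d][y][x] = right[c][y][x - d]
    return volume
```

The output has 2C + S channels, where C is the number of left/right feature channels and S is the number of semantic channels. A 3D convolutional network can consume this directly. Real implementations would use tensors instead of nested lists.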
As a non-transitory computer-readable storage medium, the memory can be used to store non-transitory software programs and non-transitory computer-executable programs. In addition, the memory may include high-speed random access memory and may further include non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or another non-transitory solid-state storage device. In some implementations, the memory optionally includes memory located remotely from the processor, and such remote memory may be connected to the processor via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The disparity map generation method, disparity map generation apparatus, electronic device and storage medium provided by the embodiments of the present application first acquire a target image that includes a left view and a right view. Feature extraction is performed on the left view to obtain a plurality of left view features and on the right view to obtain a plurality of right view features, so that the resulting features better meet the requirements of disparity estimation. Next, image segmentation is performed on the left view features to obtain first image features. The left view features, the first image features and the right view features are then combined into a target cost volume. Disparity estimation is performed on this target cost volume through a preset three-dimensional convolutional hourglass model to obtain an estimated disparity map. In this way, semantic information assists disparity estimation and improves its reliability. Finally, semantic refinement is performed on the estimated disparity map through a preset semantic refinement network and the first image features to obtain a target disparity map. This improves the scene-semantic reliability of the estimated disparity and strengthens scene understanding in the stereo matching task. Using the semantic information of the scene improves disparity estimation in ill-posed regions, which increases the accuracy of disparity estimation and reduces the error of the disparity map.
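The text does not name the function that turns the aggregated cost volume into an estimated disparity map. Claim 6 calls it only the "second function" of the prediction layer. A common choice in 3D-convolutional stereo networks such as GC-Net and PSMNet is soft argmin. This applies a softmax over the negated costs along the disparity axis and then takes the expected disparity. The following sketch assumes that choice and a `[D][H][W]` nested-list cost volume. The names `soft_argmin` and `regress_disparity` are hypothetical and do not describe the applicant's implementation.

```python
import math
from typing import List


def soft_argmin(costs: List[float]) -> float:
    """Return the expected disparity sum_d d * softmax(-cost)_d for one pixel."""
    neg = [-c for c in costs]
    m = max(neg)  # subtract the maximum for numerical stability
    weights = [math.exp(v - m) for v in neg]
    total = sum(weights)
    return sum(d * w for d, w in enumerate(weights)) / total


def regress_disparity(cost_volume: List[List[List[float]]]) -> List[List[float]]:
    """Apply soft argmin to a [D][H][W] aggregated cost volume, giving an [H][W] map."""
    depth = len(cost_volume)
    height, width = len(cost_volume[0]), len(cost_volume[0][0])
    return [
        [soft_argmin([cost_volume[d][y][x] for d in range(depth)]) for x in range(width)]
        for y in range(height)
    ]
```

The result is a probability-weighted average rather than a hard argmin, so it is differentiable and can express sub-pixel disparities. For example, equal costs at disparities 0 and 1 give 0.5.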
The embodiments described herein are intended to illustrate the technical solutions of the embodiments of the present application more clearly and do not limit those technical solutions. Those skilled in the art will appreciate that, as technology evolves and new application scenarios emerge, the technical solutions provided by the embodiments of the present application are equally applicable to similar technical problems.
Those skilled in the art will understand that the technical solutions shown in FIGS. 1-7 do not limit the embodiments of the present application. An implementation may include more or fewer steps than illustrated, combine certain steps, or use different steps.
The apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solution of this embodiment.
Those of ordinary skill in the art will understand that all or some of the steps of the methods disclosed above, and the functional modules/units of the systems and apparatuses, may be implemented as software, firmware, hardware, or an appropriate combination thereof.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application may be embodied in the form of a software product. This applies to the technical solution in essence, to the part of it that contributes to the prior art, or to all or part of the technical solution. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods in the embodiments of the present application. The aforementioned storage media include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Preferred embodiments of the present application have been described above with reference to the accompanying drawings. This description does not limit the scope of rights of the embodiments of the present application. Any modification, equivalent replacement or improvement made by those skilled in the art without departing from the scope and essence of the embodiments of the present application shall fall within the scope of rights of the embodiments of the present application.

Claims (20)

  1. A disparity map generation method, comprising:
    acquiring a target image, wherein the target image comprises a left view and a right view;
    performing feature extraction on the left view to obtain a plurality of left view features, and performing feature extraction on the right view to obtain a plurality of right view features;
    performing image segmentation on the left view features to obtain first image features;
    combining the left view features, the first image features and the right view features to obtain a target cost volume;
    performing disparity estimation on the target cost volume through a preset three-dimensional convolutional hourglass model to obtain an estimated disparity map; and
    performing semantic refinement on the estimated disparity map through a preset semantic refinement network and the first image features to obtain a target disparity map.
  2. The disparity map generation method according to claim 1, wherein the step of performing feature extraction on the left view to obtain a plurality of left view features, and performing feature extraction on the right view to obtain a plurality of right view features, comprises:
    performing convolution on the left view to obtain left view convolution features, and performing convolution on the right view to obtain right view convolution features; and
    performing pyramid pooling on the left view convolution features according to preset multi-scale feature resolution parameters to obtain the plurality of left view features, and performing pyramid pooling on the right view convolution features according to the multi-scale feature resolution parameters to obtain the plurality of right view features.
  3. The disparity map generation method according to claim 1, wherein the step of performing image segmentation on the left view features to obtain first image features comprises:
    upsampling the left view features by a preset bilinear interpolation method to obtain a first view feature latent variable;
    performing feature sorting on the first view feature latent variable by a preset first function to obtain a first view feature sequence; and
    performing convolution on the first view feature sequence to obtain the first image features.
  4. The disparity map generation method according to claim 1, wherein the step of combining the left view features, the first image features and the right view features to obtain a target cost volume comprises:
    classifying and combining the left view features and the right view features according to preset multi-scale feature resolution parameters to obtain an initial cost volume; and
    concatenating the initial cost volume and the first image features through a preset three-dimensional convolutional network to obtain the target cost volume.
  5. The disparity map generation method according to claim 4, wherein the step of concatenating the initial cost volume and the first image features through a preset three-dimensional convolutional network to obtain the target cost volume comprises:
    regularizing the initial cost volume through the three-dimensional convolutional network to obtain a first intermediate cost volume, and regularizing the first image features through the three-dimensional convolutional network to obtain first intermediate image features;
    downsampling the first intermediate cost volume through the three-dimensional convolutional network to obtain a second intermediate cost volume, and upsampling the first intermediate image features to obtain second intermediate image features; and
    concatenating the second intermediate cost volume and the second intermediate image features through the three-dimensional convolutional network to obtain the target cost volume.
  6. The disparity map generation method according to claim 1, wherein the three-dimensional convolutional hourglass model comprises an aggregation layer and a prediction layer, and the step of performing disparity estimation on the target cost volume through the preset three-dimensional convolutional hourglass model to obtain an estimated disparity map comprises:
    performing cost aggregation on the target cost volume through the aggregation layer to obtain a fused cost volume; and
    performing disparity estimation on the fused cost volume through a second function of the prediction layer to obtain the estimated disparity map.
  7. The disparity map generation method according to any one of claims 1 to 6, wherein the step of performing semantic refinement on the estimated disparity map through the preset semantic refinement network and the first image features to obtain a target disparity map comprises:
    computing probabilities from the first image features through a third function of the semantic refinement network to generate a semantic probability map;
    performing convolution on the estimated disparity map through the semantic refinement network to obtain estimated disparity features;
    fusing the semantic probability map and the estimated disparity features through the semantic refinement network to obtain preliminary disparity features; and
    decoding the preliminary disparity features through the semantic refinement network to obtain the target disparity map.
  8. A disparity map generation apparatus, comprising:
    an image acquisition module, configured to acquire a target image, wherein the target image comprises a left view and a right view;
    a feature extraction module, configured to perform feature extraction on the left view to obtain a plurality of left view features, and to perform feature extraction on the right view to obtain a plurality of right view features;
    an image segmentation module, configured to perform image segmentation on the left view features to obtain first image features;
    a fusion module, configured to combine the left view features, the first image features and the right view features to obtain a target cost volume;
    a disparity estimation module, configured to perform disparity estimation on the target cost volume through a preset three-dimensional convolutional hourglass model to obtain an estimated disparity map; and
    a semantic refinement module, configured to perform semantic refinement on the estimated disparity map through a preset semantic refinement network and the first image features to obtain a target disparity map.
  9. An electronic device, comprising a memory, a processor, a program stored in the memory and executable on the processor, and a data bus for connection and communication between the processor and the memory, wherein the program, when executed by the processor, implements a disparity map generation method comprising:
    acquiring a target image, wherein the target image comprises a left view and a right view;
    performing feature extraction on the left view to obtain a plurality of left view features, and performing feature extraction on the right view to obtain a plurality of right view features;
    performing image segmentation on the left view features to obtain first image features;
    combining the left view features, the first image features and the right view features to obtain a target cost volume;
    performing disparity estimation on the target cost volume through a preset three-dimensional convolutional hourglass model to obtain an estimated disparity map; and
    performing semantic refinement on the estimated disparity map through a preset semantic refinement network and the first image features to obtain a target disparity map.
  10. The electronic device according to claim 9, wherein the step of performing feature extraction on the left view to obtain a plurality of left view features, and performing feature extraction on the right view to obtain a plurality of right view features, comprises:
    performing convolution on the left view to obtain left view convolution features, and performing convolution on the right view to obtain right view convolution features; and
    performing pyramid pooling on the left view convolution features according to preset multi-scale feature resolution parameters to obtain the plurality of left view features, and performing pyramid pooling on the right view convolution features according to the multi-scale feature resolution parameters to obtain the plurality of right view features.
  11. The electronic device according to claim 9, wherein the step of performing image segmentation on the left view features to obtain first image features comprises:
    upsampling the left view features by a preset bilinear interpolation method to obtain a first view feature latent variable;
    performing feature sorting on the first view feature latent variable by a preset first function to obtain a first view feature sequence; and
    performing convolution on the first view feature sequence to obtain the first image features.
  12. The electronic device according to claim 9, wherein the step of combining the left view features, the first image features and the right view features to obtain a target cost volume comprises:
    classifying and combining the left view features and the right view features according to preset multi-scale feature resolution parameters to obtain an initial cost volume; and
    concatenating the initial cost volume and the first image features through a preset three-dimensional convolutional network to obtain the target cost volume.
  13. The electronic device according to claim 12, wherein the step of concatenating the initial cost volume and the first image features through a preset three-dimensional convolutional network to obtain the target cost volume comprises:
    regularizing the initial cost volume through the three-dimensional convolutional network to obtain a first intermediate cost volume, and regularizing the first image features through the three-dimensional convolutional network to obtain first intermediate image features;
    downsampling the first intermediate cost volume through the three-dimensional convolutional network to obtain a second intermediate cost volume, and upsampling the first intermediate image features to obtain second intermediate image features; and
    concatenating the second intermediate cost volume and the second intermediate image features through the three-dimensional convolutional network to obtain the target cost volume.
  14. The electronic device according to claim 9, wherein the three-dimensional convolutional hourglass model comprises an aggregation layer and a prediction layer, and the step of performing disparity estimation on the target cost volume through the preset three-dimensional convolutional hourglass model to obtain an estimated disparity map comprises:
    performing cost aggregation on the target cost volume through the aggregation layer to obtain a fused cost volume; and
    performing disparity estimation on the fused cost volume through a second function of the prediction layer to obtain the estimated disparity map.
  15. A storage medium, the storage medium being a computer-readable storage medium for computer-readable storage, wherein the storage medium stores one or more programs, and the one or more programs are executable by one or more processors to implement a disparity map generation method comprising:
    acquiring a target image, wherein the target image comprises a left view and a right view;
    performing feature extraction on the left view to obtain a plurality of left view features, and performing feature extraction on the right view to obtain a plurality of right view features;
    performing image segmentation on the left view features to obtain first image features;
    combining the left view features, the first image features and the right view features to obtain a target cost volume;
    performing disparity estimation on the target cost volume through a preset three-dimensional convolutional hourglass model to obtain an estimated disparity map; and
    performing semantic refinement on the estimated disparity map through a preset semantic refinement network and the first image features to obtain a target disparity map.
  16. The storage medium according to claim 15, wherein the step of performing feature extraction on the left view to obtain a plurality of left view features, and performing feature extraction on the right view to obtain a plurality of right view features, comprises:
    performing convolution on the left view to obtain left view convolution features, and performing convolution on the right view to obtain right view convolution features; and
    performing pyramid pooling on the left view convolution features according to preset multi-scale feature resolution parameters to obtain the plurality of left view features, and performing pyramid pooling on the right view convolution features according to the multi-scale feature resolution parameters to obtain the plurality of right view features.
  17. The storage medium according to claim 15, wherein the step of performing image segmentation on the left view features to obtain first image features comprises:
    upsampling the left view features by a preset bilinear interpolation method to obtain a first view feature latent variable;
    performing feature sorting on the first view feature latent variable by a preset first function to obtain a first view feature sequence; and
    performing convolution on the first view feature sequence to obtain the first image features.
  18. The storage medium according to claim 15, wherein the step of combining the left view features, the first image features and the right view features to obtain a target cost volume comprises:
    classifying and combining the left view features and the right view features according to preset multi-scale feature resolution parameters to obtain an initial cost volume; and
    concatenating the initial cost volume and the first image features through a preset three-dimensional convolutional network to obtain the target cost volume.
  19. The storage medium according to claim 18, wherein the step of concatenating the initial cost volume and the first image features through a preset three-dimensional convolutional network to obtain the target cost volume comprises:
    regularizing the initial cost volume through the three-dimensional convolutional network to obtain a first intermediate cost volume, and regularizing the first image features through the three-dimensional convolutional network to obtain first intermediate image features;
    downsampling the first intermediate cost volume through the three-dimensional convolutional network to obtain a second intermediate cost volume, and upsampling the first intermediate image features to obtain second intermediate image features; and
    concatenating the second intermediate cost volume and the second intermediate image features through the three-dimensional convolutional network to obtain the target cost volume.
  20. The storage medium according to claim 15, wherein the three-dimensional convolutional hourglass model comprises an aggregation layer and a prediction layer, and the step of performing disparity estimation on the target cost volume through the preset three-dimensional convolutional hourglass model to obtain an estimated disparity map comprises:
    performing cost aggregation on the target cost volume through the aggregation layer to obtain a fused cost volume; and
    performing disparity estimation on the fused cost volume through a second function of the prediction layer to obtain the estimated disparity map.
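Claims 2 and 3 rely on pyramid pooling and bilinear upsampling but do not specify the pooling levels or the interpolation convention. The following is a minimal sketch that assumes PSMNet/PSPNet-style spatial pyramid pooling on a single-channel feature grid. The grid is adaptively average-pooled to several bin sizes, and each pooled map is bilinearly upsampled (with corners aligned) back to the input size. The results are stacked as channels alongside the input. The bin sizes stand in for the "multi-scale feature resolution parameters". All function names are hypothetical, and the learned convolutions that a real network applies between pooling and upsampling are omitted.

```python
from typing import List, Sequence

# Hypothetical single-channel feature grid indexed [row][col].
Grid = List[List[float]]


def adaptive_avg_pool(grid: Grid, bins: int) -> Grid:
    """Average-pool an H x W grid into bins x bins cells.

    Cell bounds follow the usual adaptive-pooling rule:
    start = floor(i * H / bins), end = ceil((i + 1) * H / bins).
    """
    h, w = len(grid), len(grid[0])
    pooled = []
    for i in range(bins):
        y0, y1 = (i * h) // bins, -(-((i + 1) * h) // bins)
        row = []
        for j in range(bins):
            x0, x1 = (j * w) // bins, -(-((j + 1) * w) // bins)
            cells = [grid[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def bilinear_resize(grid: Grid, out_h: int, out_w: int) -> Grid:
    """Bilinear interpolation with corner pixels aligned (align_corners=True convention)."""
    h, w = len(grid), len(grid[0])
    out = []
    for i in range(out_h):
        fy = i * (h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(fy)
        y1 = min(y0 + 1, h - 1)
        wy = fy - y0
        row = []
        for j in range(out_w):
            fx = j * (w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(fx)
            x1 = min(x0 + 1, w - 1)
            wx = fx - x0
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bottom = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


def spatial_pyramid_pool(grid: Grid, bin_sizes: Sequence[int]) -> List[Grid]:
    """Return the input grid followed by one pooled-and-upsampled grid per pyramid level."""
    h, w = len(grid), len(grid[0])
    levels = [bilinear_resize(adaptive_avg_pool(grid, b), h, w) for b in bin_sizes]
    return [grid] + levels  # stacked along a new channel axis
```

Coarse levels (small bin sizes) capture global context, and fine levels preserve local detail. Concatenating them gives each pixel a multi-scale descriptor before the cost volume is built.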
PCT/CN2022/090665 2022-02-22 2022-04-29 Disparity map generation method and apparatus, electronic device, and storage medium WO2023159757A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210162805.0A CN114519710A (en) 2022-02-22 2022-02-22 Disparity map generation method and device, electronic equipment and storage medium
CN202210162805.0 2022-02-22

Publications (1)

Publication Number Publication Date
WO2023159757A1 (en) 2023-08-31

Family

ID=81599939

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/090665 WO2023159757A1 (en) 2022-02-22 2022-04-29 Disparity map generation method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN114519710A (en)
WO (1) WO2023159757A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117078984A (en) * 2023-10-17 2023-11-17 腾讯科技(深圳)有限公司 Binocular image processing method and device, electronic equipment and storage medium
CN117747056A (en) * 2024-02-19 2024-03-22 遂宁市中心医院 Preoperative image estimation method, device and equipment for minimally invasive surgery and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115661429B (en) * 2022-11-11 2023-03-10 四川川锅环保工程有限公司 System and method for identifying defects of boiler water wall pipe and storage medium

Citations (6)

Publication number Priority date Publication date Assignee Title
CN110287964A (en) * 2019-06-13 2019-09-27 浙江大华技术股份有限公司 Stereo matching method and device
US20200273192A1 (en) * 2019-02-26 2020-08-27 Baidu Usa Llc Systems and methods for depth estimation using convolutional spatial propagation networks
CN111696148A (en) * 2020-06-17 2020-09-22 中国科学技术大学 End-to-end stereo matching method based on convolutional neural network
CN112581517A (en) * 2020-12-16 2021-03-30 电子科技大学中山学院 Binocular stereo matching device and method
CN113762267A (en) * 2021-09-02 2021-12-07 北京易航远智科技有限公司 Multi-scale binocular stereo matching method and device based on semantic association
CN113763446A (en) * 2021-08-17 2021-12-07 沈阳工业大学 Stereo matching method based on guide information

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US9552633B2 (en) * 2014-03-07 2017-01-24 Qualcomm Incorporated Depth aware enhancement for stereo video
CN109118490B (en) * 2018-06-28 2021-02-26 厦门美图之家科技有限公司 Image segmentation network generation method and image segmentation method
CN111340077B (en) * 2020-02-18 2024-04-12 平安科技(深圳)有限公司 Attention mechanism-based disparity map acquisition method and device

Non-Patent Citations (1)

Title
YANG GUORUN; ZHAO HENGSHUANG; SHI JIANPING; DENG ZHIDONG; JIA JIAYA: "SegStereo: Exploiting Semantic Information for Disparity Estimation", Computer Vision (ECCV 2018), Lecture Notes in Computer Science, vol. 11211, chap. 39, pages 660-676, Springer, 6 October 2018, XP047488261, DOI: 10.1007/978-3-030-01234-2_39 *

Cited By (3)

Publication number Priority date Publication date Assignee Title
CN117078984A (en) * 2023-10-17 2023-11-17 腾讯科技(深圳)有限公司 Binocular image processing method and device, electronic equipment and storage medium
CN117078984B (en) * 2023-10-17 2024-02-02 腾讯科技(深圳)有限公司 Binocular image processing method and device, electronic equipment and storage medium
CN117747056A (en) * 2024-02-19 2024-03-22 遂宁市中心医院 Preoperative image estimation method, device and equipment for minimally invasive surgery and storage medium

Also Published As

Publication number Publication date
CN114519710A (en) 2022-05-20

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22928029

Country of ref document: EP

Kind code of ref document: A1