WO2023184526A1 - System and method of real-time stereoscopic visualization based on monocular camera - Google Patents


Info

Publication number
WO2023184526A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
interpolation
depth map
monocular
processing device
Prior art date
Application number
PCT/CN2022/085011
Other languages
French (fr)
Inventor
Pengjia CAO
Kun FANG
Qin LUO
Xiaofang GAN
Yingying LIU
Original Assignee
Covidien Lp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Covidien Lp filed Critical Covidien Lp
Priority to PCT/CN2022/085011 priority Critical patent/WO2023184526A1/en
Publication of WO2023184526A1 publication Critical patent/WO2023184526A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106 Processing image signals
    • H04N13/128 Adjusting depth or disparity
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106 Processing image signals
    • H04N13/139 Format conversion, e.g. of frame-rate or size
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20 Image signal generators
    • H04N13/261 Image signal generators with monoscopic-to-stereoscopic image conversion
    • H04N13/268 Image signal generators with monoscopic-to-stereoscopic image conversion based on depth image-based rendering [DIBR]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/50 Constructional details
    • H04N23/555 Constructional details for picking-up images in sites, inaccessible due to their dimensions or hazardous conditions, e.g. endoscopes or borescopes

Definitions

  • Minimally invasive surgery has become an indispensable part in surgical procedures and is performed with the aid of an endoscope, which allows for viewing of the surgical site through a natural opening, a small incision, or an access port.
  • conventional minimally invasive surgeries mostly employ monocular endoscopes, which only display two-dimensional (2D) images lacking depth information. Therefore, it is challenging for a surgeon to accurately move surgical instruments to specific locations inside a patient’s body. Surgeons usually perceive depth in 2D images according to motion parallax, monocular cues, and other indirect visual feedback for positioning accuracy. Stereoscopic visualization provides better imaging of the surgical site during minimally invasive surgery, providing the surgeon with depth perception. Despite the advantages of depth information or stereoscopic images, dual-camera endoscopes have the drawback of being much more expensive than monocular endoscopes.
  • the present disclosure relates to a stereoscopic visualization system for endoscopes and, more particularly, to a stereoscopic visualization system generating stereoscopic images based on monocular images.
  • an image processing device for generating a stereoscopic image.
  • the image processing device may include a processor; and a memory, including instructions stored thereon, which when executed by the processor cause the image processing device to: resize a monocular image having a first resolution yielding a resized image having a second resolution; calculate an estimated depth map of the monocular image based on the resized image; resize the estimated depth map yielding a resized estimated depth map having the first resolution; generate a counterpart monocular image based on the resized estimated depth map; and generate a stereoscopic image based on the monocular image and the counterpart monocular image.
  • Implementations of the above embodiment may include one or more of the following features.
  • the second resolution may be smaller than the first resolution.
  • the monocular image may be a frame from a video stream.
  • the image processing device may be configured to execute a convolutional neural network to calculate the estimated depth map.
  • the image processing device may be further configured to resize the monocular image using at least one of an area interpolation, a nearest-neighbor interpolation, a bilinear interpolation, or a bicubic interpolation.
  • the image processing device may also be configured to resize the estimated depth map using at least one of a bilinear interpolation, a nearest-neighbor interpolation, a linear interpolation, a bicubic interpolation, a trilinear interpolation, or an area interpolation.
  • an imaging system for generating a stereoscopic image includes a monocular endoscope configured to capture a monocular image.
  • the system also includes an image processing device having a processor and a memory, with instructions stored thereon, which when executed by the processor cause the image processing device to: resize the monocular image having a first resolution yielding a resized image having a second resolution; calculate an estimated depth map of the monocular image based on the resized image; resize the estimated depth map yielding a resized estimated depth map having the first resolution; generate a counterpart monocular image based on the resized estimated depth map; and generate a stereoscopic image based on the monocular image and the counterpart monocular image.
  • the imaging system may include a stereoscopic display configured to display the stereoscopic image.
  • the monocular image may be a frame from a video stream.
  • the image processing device may be configured to execute a convolutional neural network to calculate the estimated depth map.
  • the image processing device may be further configured to resize the monocular image using at least one of an area interpolation, a nearest-neighbor interpolation, a bilinear interpolation, or a bicubic interpolation.
  • the image processing device may also be configured to resize the estimated depth map using at least one of a bilinear interpolation, a nearest-neighbor interpolation, a linear interpolation, a bicubic interpolation, a trilinear interpolation, or an area interpolation.
  • a method for generating a stereoscopic image includes resizing a monocular image having a first resolution yielding a resized image having a second resolution.
  • the method also includes calculating an estimated depth map of the monocular image based on the resized image.
  • the method further includes resizing the estimated depth map yielding a resized estimated depth map having the first resolution.
  • the method additionally includes generating a counterpart monocular image based on the resized estimated depth map and generating a stereoscopic image based on the monocular image and the counterpart monocular image.
  • Implementations of the above embodiment may include one or more of the following features.
  • the method may also include receiving the monocular image as a frame from a video stream.
  • the second resolution may be smaller than the first resolution.
  • the method may further include outputting the stereoscopic image on a stereoscopic display.
  • Calculating the estimated depth map may further include executing a convolutional neural network.
  • Resizing the monocular image may further include using at least one of an area interpolation, a nearest-neighbor interpolation, a bilinear interpolation, or a bicubic interpolation.
  • Resizing the estimated depth map may further include using at least one of a bilinear interpolation, a nearest-neighbor interpolation, a linear interpolation, a bicubic interpolation, a trilinear interpolation, or an area interpolation.
  • FIG. 1 is a schematic view of an imaging system according to an embodiment of the present disclosure.
  • FIG. 2 is a flow chart of a stereoscopic image generating algorithm according to an embodiment of the present disclosure.
  • FIG. 3 is a flow chart of a stereoscopic image generating algorithm according to another embodiment of the present disclosure.
  • an imaging system 10 includes a monocular endoscope 20 and an image processing device 30.
  • the endoscope 20 is configured to capture 2D image data, which includes still images or a video stream having a plurality of monocular endoscopic images captured over a period of time.
  • the endoscope 20 may be any device structurally configured for internally imaging an anatomical region of a body (e.g., human or animal) and may include fiber optics, lenses, miniaturized (e.g., complementary metal oxide semiconductor (CMOS) sensor) imaging systems or the like.
  • Suitable endoscopes 20 include, but are not limited to, any type of scope (e.g., a bronchoscope, a colonoscope, a laparoscope, etc. ) and any device similar to a scope that is equipped with an image system (e.g., an imaging cannula) .
  • the endoscope 20 is coupled to the image processing device 30 that is configured to receive image data from the endoscope 20 for further processing.
  • the image processing device 30 may include a processor 32, which may be operably connected to a memory 34, which may include one or more of volatile, non-volatile, magnetic, optical, or electrical media, such as read-only memory (ROM) , random access memory (RAM) , electrically-erasable programmable ROM (EEPROM) , non-volatile RAM (NVRAM) , or flash memory.
  • the processor 32 is configured to perform the operations, calculations, and/or set of instructions stored in the memory 34.
  • the processor 32 may be any suitable processor including, but not limited to, a hardware processor, a field programmable gate array (FPGA) , a digital signal processor (DSP) , a central processing unit (CPU) , a microprocessor, a graphic processing unit ( “GPU” ) , and combinations thereof.
  • the image processing device 30 is also coupled to a display 40, which may be a stereoscopic monitor and is configured to display the stereoscopic images or stereoscopic video stream generated by and transmitted from the image processing device 30.
  • the display 40 may be configured to display stereoscopic images in a side-by-side format or an interlaced format to be viewed with the aid of 3D glasses.
  • the display 40 may be an autostereoscopic display (e.g., using a parallax barrier, lenticular lens, or other display technologies) configured to display stereoscopic images without 3D glasses.
  • the image processing device 30 receives monocular images from the endoscope 20 as input, and generates the corresponding stereoscopic images which are displayed on the display 40.
  • the input monocular image may be the left image or the right image in the generated stereoscopic images and the generated image is the counterpart image (e.g., left or right) .
  • the image processing device 30 is configured to execute an image generation algorithm based on deep learning, which performs stereoscopic image generation.
  • the algorithm is illustrated in FIG. 2 and may be embodied as a software application or instructions stored in the memory 34 and executable by the processor 32.
  • the image processing device 30 receives an input image (e.g., left image) which may be a still image or a frame of a video stream, from the endoscope 20.
  • the input image may be a right image.
  • the image generation algorithm calculates an estimated depth map for the input image using a convolutional neural network.
  • the convolutional neural network may have any suitable convolutional architecture, such as a U-Net architecture, which may be used in medical image processing.
  • the parameters to be optimized in the algorithm include those of the convolutional neural network. There are no learnable parameters in the sampling step, which is thus excluded from optimization.
  • training of the neural network may happen on a separate system, e.g., graphics processing unit ("GPU") workstations, high-performance computing clusters, etc., and the trained algorithm may then be deployed on the image processing device 30.
  • the stereoscopic image generation algorithm may be trained in an end-to-end manner using actual stereoscopic endoscopic images as training data.
  • the algorithm receives a left image of the stereoscopic images as input, and outputs one estimated right image using the process described above with respect to FIG. 2.
  • the parameters in the algorithm are optimized via backpropagating the gradients with respect to the differences. Given a large enough training set and appropriate training settings, the algorithm training converges, and the differences between estimated and actual images are reduced to a locally minimal value, which indicates that the stereoscopic image generation algorithm has been fully trained.
  • the image processing algorithm generates another image (e.g., right) by sampling the input image based on the estimated depth map. After a counterpart image is generated, the input image and the generated image are combined as a stereoscopic image and displayed on the display 40.
  • FIG. 3 shows a method for stereoscopic visualization using the imaging system 10 including the process and algorithm of FIG. 2.
  • a video stream from the endoscope 20 is received at the image processing device 30. More specifically, the image processing device 30 reads one frame (i.e., a still monocular image) at a time from the video stream.
  • the image may be of any suitable resolution, e.g., 4K, 1080p, 720p, etc.
  • the single frame is resized (e.g., downsized) to a smaller size (i.e., resolution), which may be reduced by a factor of from about 1.5 to about 5. Resizing may be accomplished using any suitable image resizing algorithm to reduce the resolution of the image to a desired image size.
  • the first resizing operation, i.e., resizing the input image, may be implemented using any suitable interpolation technique, including, but not limited to, an area interpolation, a nearest-neighbor interpolation, a bilinear interpolation, and/or a bicubic interpolation.
  • the resized image is processed using a convolutional neural network yielding an estimated depth map, as described above with respect to FIG. 2.
  • the depth map estimation is performed on the resized image, allowing for faster processing and generation of the depth map due to the smaller resolution at which depth estimation is performed.
  • the estimated depth map is resized (e.g., enlarged) to the original image size, since the estimated depth map was obtained from the smaller image.
  • the second resizing operation, i.e., resizing the depth map, may be implemented using any suitable interpolation technique, including, but not limited to, a bilinear interpolation, a nearest-neighbor interpolation, a linear interpolation, a bicubic interpolation, a trilinear interpolation, an area interpolation, and combinations thereof.
  • Two resizing operations allow for faster image generation while maintaining the quality and resolution of the generated image.
  • the input image is first resized to a smaller size to perform the depth estimation portion of the algorithm. Thereafter, the estimated depth map is resized back to the original size of the input image to generate the right image. Without the resizing operations, the processing speed of the algorithm would be adversely affected.
  • the image processing device 30 samples the original input image and generates the counterpart (e.g., right) image based on the resized depth map. Finally, at step 210, the original left image and the generated right image are combined as a stereoscopic image and displayed on the display 40.
  • the image generation algorithm according to the present disclosure was tested to demonstrate the effect of two resizing operations on stereoscopic image generation from a single image.
  • Two algorithms, one with two resizing operations, and one without resizing operations, were executed on a personal computer (PC) with an NVIDIA GTX 1070 GPU, running Windows 10, CUDA 10.2.89, cuDNN 8.0.5, and PyTorch 1.6.0 (hereinafter “Windows PC” ) .
  • Net inference and total time were measured.
  • “Net inference” indicates the time that the Windows PC took to calculate the estimated depth map using the convolutional neural network.
  • Total time indicates the total processing time of one frame, i.e., from reading one input frame to generating the stereoscopic images. The statistics were averaged over 500 frames.
  • the image generation algorithm according to the present disclosure was also tested using the TensorRT™ package developed by NVIDIA to increase the processing speed of the convolutional neural network.
  • TensorRT TM is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications.
  • the Windows PC was used to execute the algorithm of this disclosure without TensorRT.
  • Table 2 shows that both “net inference” and “total time” were significantly improved by TensorRT taking advantage of the GPU processing.
  • Computer-readable media may include non-transitory computer-readable media, which corresponds to a tangible medium such as data storage media (e.g., RAM, ROM, EEPROM, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer) .
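The per-frame timing described above (net inference vs. total time, averaged over many frames) can be sketched as follows. `estimate_depth` and `make_stereo` are hypothetical stand-ins for the two stages of the pipeline, not names from the disclosure; for accurate GPU timing a synchronization call such as `torch.cuda.synchronize()` would also be needed before each timestamp.

```python
import time

def benchmark(frames, estimate_depth, make_stereo):
    """Return (avg net-inference time, avg total time) per frame, in the
    spirit of the 500-frame measurement described above. estimate_depth
    and make_stereo are hypothetical stand-ins for the pipeline stages."""
    net, total = 0.0, 0.0
    for frame in frames:
        t0 = time.perf_counter()
        depth = estimate_depth(frame)   # CNN forward pass only ("net inference")
        t1 = time.perf_counter()
        make_stereo(frame, depth)       # remaining steps of the pipeline
        t2 = time.perf_counter()
        net += t1 - t0
        total += t2 - t0
    n = len(frames)
    return net / n, total / n
```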

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)

Abstract

An imaging system includes a monocular endoscope configured to capture a monocular image. The system also includes an image processing device having a processor and a memory, with instructions stored thereon, which when executed by the processor cause the image processing device to: resize the monocular image having a first resolution yielding a resized image having a second resolution; calculate an estimated depth map of the monocular image based on the resized image; resize the estimated depth map yielding a resized estimated depth map having the first resolution; generate a counterpart monocular image based on the resized estimated depth map; and generate a stereoscopic image based on the monocular image and the counterpart monocular image.

Description

SYSTEM AND METHOD OF REAL-TIME STEREOSCOPIC VISUALIZATION BASED ON MONOCULAR CAMERA BACKGROUND
Minimally invasive surgery has become an indispensable part in surgical procedures and is performed with the aid of an endoscope, which allows for viewing of the surgical site through a natural opening, a small incision, or an access port. However, conventional minimally invasive surgeries mostly employ monocular endoscopes, which only display two-dimensional (2D) images lacking depth information. Therefore, it is challenging for a surgeon to accurately move surgical instruments to specific locations inside a patient’s body. Surgeons usually perceive depth in 2D images according to motion parallax, monocular cues, and other indirect visual feedback for positioning accuracy. Stereoscopic visualization provides better imaging of the surgical site during minimally invasive surgery, providing the surgeon with depth perception. Despite the advantages of depth information or stereoscopic images, dual-camera endoscopes have the drawback of being much more expensive than monocular endoscopes.
SUMMARY
The present disclosure relates to a stereoscopic visualization system for endoscopes and, more particularly, to a stereoscopic visualization system generating stereoscopic images based on monocular images.
According to one embodiment of the present disclosure, an image processing device for generating a stereoscopic image is disclosed. The image processing device may include a processor; and a memory, including instructions stored thereon, which when executed by the processor cause the image processing device to: resize a monocular image having a first resolution yielding a resized image having a second resolution; calculate an estimated depth map of the monocular image based on the resized image; resize the estimated depth map yielding a resized estimated depth map having the first resolution; generate a counterpart monocular image based on the resized estimated depth map; and generate a stereoscopic image based on the monocular image and the counterpart monocular image.
Implementations of the above embodiment may include one or more of the following features. According to one aspect of the above embodiment, the second resolution may be smaller than the first resolution. The monocular image may be a frame from a video stream. The image processing device may be configured to execute a convolutional neural network to calculate the estimated depth map. The image processing device may be further configured to resize the monocular image using at least one of an area interpolation, a nearest-neighbor interpolation, a bilinear interpolation, or a bicubic interpolation. The image processing device may also be configured to resize the estimated depth map using at least one of a bilinear interpolation, a nearest-neighbor interpolation, a linear interpolation, a bicubic interpolation, a trilinear interpolation, or an area interpolation.
According to another embodiment of the present disclosure, an imaging system for generating a stereoscopic image is disclosed. The imaging system includes a monocular endoscope configured to capture a monocular image. The system also includes an image processing device having a processor and a memory, with instructions stored thereon, which when executed by the processor cause the image processing device to: resize the monocular image having a first resolution yielding a resized image having a second resolution; calculate an estimated depth map of the monocular image based on the resized image; resize the estimated depth map yielding a resized estimated depth map having the first resolution; generate a counterpart monocular image based on the resized estimated depth map; and generate a stereoscopic image based on the monocular image and the counterpart monocular image.
Implementations of the above embodiment may include one or more of the following features. According to one aspect of the above embodiment, the second resolution is smaller than the first resolution. The imaging system may include a stereoscopic display configured to display the stereoscopic image. The monocular image may be a frame from a video stream. The image processing device may be configured to execute a convolutional neural network to calculate the estimated depth map. The image processing device may be further configured to resize the monocular image using at least one of an area interpolation, a nearest-neighbor interpolation, a bilinear interpolation, or a bicubic interpolation. The image processing device may also be configured to resize the estimated depth map using at least one of a bilinear interpolation, a nearest-neighbor interpolation, a linear interpolation, a bicubic interpolation, a trilinear interpolation, or an area interpolation.
According to a further embodiment of the present disclosure, a method for generating a stereoscopic image is disclosed. The method includes resizing a monocular image having a first resolution yielding a resized image having a second resolution. The method also includes  calculating an estimated depth map of the monocular image based on the resized image. The method further includes resizing the estimated depth map yielding a resized estimated depth map having the first resolution. The method additionally includes generating a counterpart monocular image based on the resized estimated depth map and generating a stereoscopic image based on the monocular image and the counterpart monocular image.
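The five steps of the claimed method can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `depth_net` and `warp_with_depth` are hypothetical stand-ins for the disclosure's convolutional neural network and its sampling step, and side-by-side concatenation is used as one possible stereoscopic output format.

```python
import torch
import torch.nn.functional as F

def generate_stereo(left, depth_net, warp_with_depth, scale=2):
    """left: (1, 3, H, W) tensor at the first (full) resolution.
    depth_net and warp_with_depth are hypothetical stand-ins."""
    _, _, h, w = left.shape
    # Step 1: resize to a smaller second resolution for fast depth estimation.
    small = F.interpolate(left, size=(h // scale, w // scale), mode="area")
    # Step 2: calculate an estimated depth map from the resized image.
    depth_small = depth_net(small)
    # Step 3: resize the depth map back to the first resolution.
    depth = F.interpolate(depth_small, size=(h, w),
                          mode="bilinear", align_corners=False)
    # Step 4: generate the counterpart (e.g., right) view by sampling the input.
    right = warp_with_depth(left, depth)
    # Step 5: combine the pair, here side-by-side, as the stereoscopic output.
    return torch.cat([left, right], dim=3)
```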
Implementations of the above embodiment may include one or more of the following features. According to one aspect of the above embodiment, the method may also include receiving the monocular image as a frame from a video stream. The second resolution may be smaller than the first resolution. The method may further include outputting the stereoscopic image on a stereoscopic display. Calculating the estimated depth map may further include executing a convolutional neural network. Resizing the monocular image may further include using at least one of an area interpolation, a nearest-neighbor interpolation, a bilinear interpolation, or a bicubic interpolation. Resizing the estimated depth map may further include using at least one of a bilinear interpolation, a nearest-neighbor interpolation, a linear interpolation, a bicubic interpolation, a trilinear interpolation, or an area interpolation.
BRIEF DESCRIPTION OF THE DRAWINGS
The present disclosure may be understood by reference to the accompanying drawings, when considered in conjunction with the subsequent, detailed description, in which:
FIG. 1 is a schematic view of an imaging system according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of a stereoscopic image generating algorithm according to an embodiment of the present disclosure; and
FIG. 3 is a flow chart of a stereoscopic image generating algorithm according to another embodiment of the present disclosure.
DETAILED DESCRIPTION
Embodiments of the presently disclosed system are described in detail with reference to the drawings, in which like reference numerals designate identical or corresponding elements in each of the several views. In the following description, well-known functions or constructions are not described in detail to avoid obscuring the present disclosure in unnecessary detail. Those  skilled in the art will understand that the present disclosure may be adapted for use with any imaging system.
With reference to FIG. 1, an imaging system 10 includes a monocular endoscope 20 and an image processing device 30. The endoscope 20 is configured to capture 2D image data, which includes still images or a video stream having a plurality of monocular endoscopic images captured over a period of time. The endoscope 20 may be any device structurally configured for internally imaging an anatomical region of a body (e.g., human or animal) and may include fiber optics, lenses, miniaturized (e.g., complementary metal oxide semiconductor (CMOS) sensor) imaging systems or the like. Suitable endoscopes 20 include, but are not limited to, any type of scope (e.g., a bronchoscope, a colonoscope, a laparoscope, etc. ) and any device similar to a scope that is equipped with an image system (e.g., an imaging cannula) .
The endoscope 20 is coupled to the image processing device 30 that is configured to receive image data from the endoscope 20 for further processing. The image processing device 30 may include a processor 32, which may be operably connected to a memory 34, which may include one or more of volatile, non-volatile, magnetic, optical, or electrical media, such as read-only memory (ROM), random access memory (RAM), electrically-erasable programmable ROM (EEPROM), non-volatile RAM (NVRAM), or flash memory. The processor 32 is configured to perform the operations, calculations, and/or set of instructions stored in the memory 34. The processor 32 may be any suitable processor including, but not limited to, a hardware processor, a field programmable gate array (FPGA), a digital signal processor (DSP), a central processing unit (CPU), a microprocessor, a graphics processing unit ("GPU"), and combinations thereof. Those skilled in the art will appreciate that the processor may be substituted with any logic processor (e.g., control circuit) adapted to execute the algorithms, calculations, and/or set of instructions described herein.
The image processing device 30 is also coupled to a display 40, which may be a stereoscopic monitor and is configured to display the stereoscopic images or stereoscopic video stream generated by and transmitted from the image processing device 30. The display 40 may be configured to display stereoscopic images in a side-by-side format or an interlaced format to be viewed with the aid of 3D glasses. In further embodiments, the display 40 may be an autostereoscopic display (e.g., using a parallax barrier, lenticular lens, or other display technologies) configured to display stereoscopic images without 3D glasses.
The image processing device 30 receives monocular images from the endoscope 20 as input, and generates the corresponding stereoscopic images which are displayed on the display 40. Specifically, the input monocular image may be the left image or the right image in the generated stereoscopic images and the generated image is the counterpart image (e.g., left or right) .
The image processing device 30 is configured to execute an image generation algorithm based on deep learning, which performs stereoscopic image generation. The algorithm is illustrated in FIG. 2 and may be embodied as a software application or instructions stored in the memory 34 and executable by the processor 32. Initially, at step 100, the image processing device 30 receives an input image (e.g., left image) which may be a still image or a frame of a video stream, from the endoscope 20. In embodiments, the input image may be a right image.
At step 102, the image generation algorithm calculates an estimated depth map for the input image using a convolutional neural network. The convolutional neural network may have any suitable convolutional architecture, such as a U-Net architecture, which may be used in medical image processing. The parameters to be optimized in the algorithm include those of the convolutional neural network. There are no learnable parameters in the sampling step, which is thus excluded from optimization.
In various embodiments, training of the neural network may happen on a separate system, e.g., graphics processing unit ("GPU") workstations, high-performance computing clusters, etc., and the trained algorithm may then be deployed on the image processing device 30. The stereoscopic image generation algorithm may be trained in an end-to-end manner using actual stereoscopic endoscopic images as training data. During the training phase, the algorithm receives a left image of the stereoscopic images as input, and outputs one estimated right image using the process described above with respect to FIG. 2. By measuring and minimizing the differences between the estimated right images and the actual right images, the parameters in the algorithm are optimized via backpropagating the gradients with respect to the differences. Given a large enough training set and appropriate training settings, the algorithm training converges, and the differences between estimated and actual images are reduced to a locally minimal value, which indicates that the stereoscopic image generation algorithm has been fully trained.
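The end-to-end training described above can be sketched as a standard supervised loop. The L1 photometric loss below is an assumption; the disclosure only states that the differences between estimated and actual right images are measured and minimized, without naming a loss function.

```python
import torch
import torch.nn.functional as F

def train_epoch(model, loader, optimizer):
    """One epoch of end-to-end training on (left, right) stereo pairs.
    model is the full left-to-right generation algorithm; the L1 loss
    is an illustrative choice, not one specified by the disclosure."""
    model.train()
    for left, right in loader:
        est_right = model(left)                 # estimated right image
        loss = F.l1_loss(est_right, right)      # difference vs. actual right
        optimizer.zero_grad()
        loss.backward()                         # backpropagate the gradients
        optimizer.step()
```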
At step 104, the image processing algorithm generates another image (e.g., a right image) by sampling the input image based on the estimated depth map. After the counterpart image is generated, the input image and the generated image are combined into a stereoscopic image and displayed on the display 40.
FIG. 3 shows a method for stereoscopic visualization using the imaging system 10, including the process and algorithm of FIG. 2. Initially, at step 200, a video stream from the endoscope 20 is received at the image processing device 30. More specifically, the image processing device 30 reads one frame (i.e., a still monocular image) at a time from the video stream. The image may be of any suitable resolution, e.g., 4K, 1080p, 720p, etc. At step 202, the single frame is resized (e.g., downsized) to a smaller size (i.e., resolution), which may be reduced by a factor of from about 1.5 to about 5. Resizing may be accomplished using any suitable image resizing algorithm to reduce the resolution of the image to a desired image size. The first resizing operation, i.e., resizing the input image, may be implemented using any suitable interpolation technique, including, but not limited to, an area interpolation, a nearest-neighbor interpolation, a bilinear interpolation, and/or a bicubic interpolation.
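In PyTorch, for instance, this first resizing operation might be expressed as follows; the 1080p frame size and the roughly 2.4x reduction factor are illustrative choices within the stated range:

```python
import torch
import torch.nn.functional as F

frame = torch.rand(1, 3, 1080, 1920)   # one 1080p video frame, N x C x H x W

# Downsize by a factor of about 2.4 using area interpolation; "nearest",
# "bilinear", or "bicubic" could be passed as the mode instead.
small = F.interpolate(frame, size=(448, 800), mode="area")
```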
At step 204, the resized image is processed using a convolutional neural network, yielding an estimated depth map, as described above with respect to FIG. 2. Performing depth estimation on the resized image allows for faster processing and generation of the depth map because of the smaller resolution at which the estimation is performed.
At step 206, the estimated depth map is resized (e.g., enlarged) to the original image size, since the estimated depth map was obtained from the smaller image. The second resizing operation, i.e., resizing the depth map, may be implemented using any suitable interpolation technique, including, but not limited to, a bilinear interpolation, a nearest-neighbor interpolation, a linear interpolation, a bicubic interpolation, a trilinear interpolation, an area interpolation, and combinations thereof.
The two resizing operations allow for faster image generation while maintaining the quality and resolution of the generated image. As noted above, the input image is first resized to a smaller size to perform the depth estimation portion of the algorithm. Thereafter, the estimated depth map is resized back to the original size of the input image to generate the right image. Without the resizing operations, the processing speed of the algorithm would be adversely affected.
At step 208, the image processing device 30 samples the original input image and generates the counterpart (e.g., right) image based on the resized depth map. Finally, at step 210, the original left image and the generated right image are combined into a stereoscopic image and displayed on the display 40.
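The full per-frame pipeline of FIG. 3 (steps 202 through 208) can be sketched as follows in PyTorch; the toy depth network, the factor-of-2 downsizing, and the fixed disparity scale used for sampling are illustrative assumptions rather than the disclosed implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_stereo_pair(left, depth_net, scale=2):
    """Steps 202-208: downsize, estimate depth, upsize depth map, sample right."""
    n, c, h, w = left.shape
    small = F.interpolate(left, size=(h // scale, w // scale), mode="area")  # step 202
    depth_small = depth_net(small)                                           # step 204
    depth = F.interpolate(depth_small, size=(h, w), mode="bilinear",
                          align_corners=False)                               # step 206
    # Step 208: sample the *original* left image on a grid whose x-coordinates
    # are shifted by a depth-derived disparity (0.05 is an illustrative scale).
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    xs = xs.unsqueeze(0) + 0.05 * depth.squeeze(1)
    grid = torch.stack([xs, ys.unsqueeze(0).expand_as(xs)], dim=-1)
    right = F.grid_sample(left, grid, align_corners=False)
    return left, right                               # stereoscopic pair (step 210)

# Toy stand-in depth network producing a one-channel map in (0, 1).
depth_net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid())
left = torch.rand(1, 3, 512, 1728)                   # the frame size used in Table 1
left_out, right = make_stereo_pair(left, depth_net)
```

Note how the expensive network inference runs at the reduced resolution, while the sampling at step 208 still reads from the full-resolution input, preserving the output resolution.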
The image generation algorithm according to the present disclosure was tested to demonstrate the effect of the two resizing operations on stereoscopic image generation from a single image. Two algorithms, one with the two resizing operations and one without, were executed on a personal computer (PC) with an NVIDIA GTX 1070 GPU, running Windows 10, CUDA 10.2.89, cuDNN 8.0.5, and PyTorch 1.6.0 (hereinafter "Windows PC"). Net inference and total time were measured. "Net inference" indicates the time that the Windows PC took to calculate the estimated depth map using the convolutional neural network. "Total time" indicates the total processing time for one frame, i.e., from reading one input frame to generating the stereoscopic images. The statistics were averaged over 500 frames. The original size of each frame was 1728 x 512. The resized smaller size was 896 x 256. The results are summarized in Table 1 below and demonstrate that the resizing operations significantly reduce the time to generate depth maps and obtain stereoscopic images.
                              Net inference (ms)   Total time (ms)
Without resizing operations   42.535               77.321
With resizing operations      18.808               48.842

Table 1
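Per-frame timings like those in Table 1 can be gathered with a simple harness such as the following; this is an illustrative measurement sketch, not the instrumentation used in the reported tests:

```python
import time
import torch

def benchmark(fn, frame, n_frames=500, warmup=10):
    """Average wall time per frame in milliseconds over n_frames runs."""
    for _ in range(warmup):                  # warm-up runs excluded from the average
        fn(frame)
    if torch.cuda.is_available():
        torch.cuda.synchronize()             # flush queued GPU work before timing
    start = time.perf_counter()
    for _ in range(n_frames):
        fn(frame)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) * 1000.0 / n_frames

# Example with a stand-in per-frame operation on a CPU tensor.
frame = torch.rand(1, 3, 512, 1728)
ms_per_frame = benchmark(lambda x: (x * 0.5).sum(), frame, n_frames=50)
```

The explicit CUDA synchronization matters: GPU kernels launch asynchronously, so timing without it would measure only the launch overhead rather than the actual per-frame compute.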
The image generation algorithm according to the present disclosure was also tested using the TensorRT™ package developed by the NVIDIA corporation to increase the processing speed of the convolutional neural network. TensorRT™ is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. The Windows PC was used to execute the algorithm of this disclosure without TensorRT. Another PC with an NVIDIA GTX 1070 GPU, running Ubuntu 18.04, CUDA 10.0, cuDNN 7.6.5, PyTorch 1.6.0, and TensorRT 7.0.0.11, executed the algorithm with TensorRT. As the data in Table 2 shows, both "net inference" and "total time" were significantly improved by TensorRT taking advantage of the GPU processing.
                   Net inference (ms)   Total time (ms)
Without TensorRT   42.535               77.321
With TensorRT      13.284               27.454

Table 2
The disclosed method and techniques may be implemented in hardware, software, firmware, virtualized computer environments, or any combination thereof. If implemented in software, the functions may be stored as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include non-transitory computer-readable media, which corresponds to a tangible medium such as data storage media (e.g., RAM, ROM, EEPROM, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer) .
While several embodiments of the disclosure have been shown in the drawings and/or described herein, it is not intended that the disclosure be limited thereto, as it is intended that the disclosure be as broad in scope as the art will allow and that the specification be read likewise. Therefore, the above description should not be construed as limiting, but merely as exemplifications of particular embodiments. Those skilled in the art will envision other modifications within the scope of the claims appended hereto.

Claims (20)

  1. An image processing device for generating a stereoscopic image, the image processing device comprising:
    a processor; and
    a memory, including instructions stored thereon, which when executed by the processor cause the image processing device to:
    resize a monocular image having a first resolution yielding a resized image having a second resolution;
    calculate an estimated depth map of the monocular image based on the resized image;
    resize the estimated depth map yielding a resized estimated depth map having the first resolution;
    generate a counterpart monocular image based on the resized estimated depth map; and
    generate a stereoscopic image based on the monocular image and the counterpart monocular image.
  2. The image processing device according to claim 1, wherein the second resolution is smaller than the first resolution.
  3. The image processing device according to claim 1, wherein the monocular image is a frame from a video stream.
  4. The image processing device according to claim 1, wherein the instructions, when executed by the processor, further cause the image processing device to execute a convolutional neural network to calculate the estimated depth map.
  5. The image processing device according to claim 1, wherein the instructions, when executed by the processor, further cause the image processing device to resize the monocular image using at least one of an area interpolation, a nearest-neighbor interpolation, a bilinear interpolation, or a bicubic interpolation.
  6. The image processing device according to claim 1, wherein the instructions, when executed by the processor, further cause the image processing device to resize the estimated depth map using at least one of a bilinear interpolation, a nearest-neighbor interpolation, a linear interpolation, a bicubic interpolation, a trilinear interpolation, or an area interpolation.
  7. An imaging system for generating a stereoscopic image, the imaging system comprising:
    a monocular endoscope configured to capture a monocular image;
    an image processing device including:
    a processor; and
    a memory, including instructions stored thereon, which when executed by the processor cause the image processing device to:
    resize the monocular image having a first resolution yielding a resized image having a second resolution;
    calculate an estimated depth map of the monocular image based on the resized image;
    resize the estimated depth map yielding a resized estimated depth map having the first resolution;
    generate a counterpart monocular image based on the resized estimated depth map; and
    generate a stereoscopic image based on the monocular image and the counterpart monocular image.
  8. The imaging system according to claim 7, further comprising a stereoscopic display configured to display the stereoscopic image.
  9. The imaging system according to claim 7, wherein the second resolution is smaller than the first resolution.
  10. The imaging system according to claim 7, wherein the monocular image is a frame from a video stream.
  11. The imaging system according to claim 7, wherein the instructions, when executed by the processor, further cause the image processing device to execute a convolutional neural network to calculate the estimated depth map.
  12. The imaging system according to claim 7, wherein the instructions, when executed by the processor, further cause the image processing device to resize the monocular image using at least one of an area interpolation, a nearest-neighbor interpolation, a bilinear interpolation, or a bicubic interpolation.
  13. The imaging system according to claim 7, wherein the instructions, when executed by the processor, further cause the image processing device to resize the estimated depth map using at least one of a bilinear interpolation, a nearest-neighbor interpolation, a linear interpolation, a bicubic interpolation, a trilinear interpolation, or an area interpolation.
  14. A method for generating a stereoscopic image, the method comprising:
    resizing a monocular image having a first resolution yielding a resized image having a second resolution;
    calculating an estimated depth map of the monocular image based on the resized image;
    resizing the estimated depth map yielding a resized estimated depth map having the first resolution;
    generating a counterpart monocular image based on the resized estimated depth map; and
    generating a stereoscopic image based on the monocular image and the counterpart monocular image.
  15. The method according to claim 14, further comprising:
    receiving the monocular image as a frame from a video stream.
  16. The method according to claim 14, wherein the second resolution is smaller than the first resolution.
  17. The method according to claim 14, further comprising:
    outputting the stereoscopic image on a stereoscopic display.
  18. The method according to claim 14, wherein calculating the estimated depth map further includes executing a convolutional neural network.
  19. The method according to claim 14, wherein resizing the monocular image further includes using at least one of an area interpolation, a nearest-neighbor interpolation, a bilinear interpolation, or a bicubic interpolation.
  20. The method according to claim 14, wherein resizing the estimated depth map further includes using at least one of a bilinear interpolation, a nearest-neighbor interpolation, a linear interpolation, a bicubic interpolation, a trilinear interpolation, or an area interpolation.
PCT/CN2022/085011 2022-04-02 2022-04-02 System and method of real-time stereoscopic visualization based on monocular camera WO2023184526A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/085011 WO2023184526A1 (en) 2022-04-02 2022-04-02 System and method of real-time stereoscopic visualization based on monocular camera


Publications (1)

Publication Number Publication Date
WO2023184526A1 true WO2023184526A1 (en) 2023-10-05

Family

ID=88198823


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6847392B1 (en) * 1996-10-31 2005-01-25 Nec Corporation Three-dimensional structure estimation apparatus
US20120274629A1 (en) * 2011-04-28 2012-11-01 Baek Heumeil Stereoscopic image display and method of adjusting stereoscopic image thereof
US20170235277A1 (en) * 2016-02-12 2017-08-17 Samsung Electronics Co., Ltd. Method and apparatus for processing holographic image
US20170366795A1 (en) * 2016-06-17 2017-12-21 Altek Semiconductor Corp. Stereo image generating method and electronic apparatus utilizing the method
CN111179326A (en) * 2019-12-27 2020-05-19 精英数智科技股份有限公司 Monocular depth estimation algorithm, system, equipment and storage medium



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22934355

Country of ref document: EP

Kind code of ref document: A1