WO2024050918A1 - Endoscope positioning method, electronic device, and non-transitory computer-readable storage medium - Google Patents


Info

Publication number
WO2024050918A1
WO2024050918A1 · PCT/CN2022/125009 (CN2022125009W)
Authority
WO
WIPO (PCT)
Prior art keywords
image
depth
endoscope
virtual
network
Prior art date
Application number
PCT/CN2022/125009
Other languages
French (fr)
Chinese (zh)
Inventor
刘宏斌
田庆瑶
张子惠
Original Assignee
中国科学院自动化研究所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院自动化研究所 (Institute of Automation, Chinese Academy of Sciences)
Publication of WO2024050918A1 publication Critical patent/WO2024050918A1/en

Classifications

    • A - HUMAN NECESSITIES
    • A61 - MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B - DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 34/00 - Computer-aided surgery; Manipulators or robots specially adapted for use in surgery
    • A61B 34/20 - Surgical navigation systems; Devices for tracking or guiding surgical instruments, e.g. for frameless stereotaxis
    • A - HUMAN NECESSITIES
    • A61 - MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B - DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00 - Measuring for diagnostic purposes; Identification of persons
    • A61B 5/06 - Devices, other than using radiation, for detecting or locating foreign bodies; determining position of probes within or on the body of the patient
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/70 - Determining position or orientation of objects or cameras

Definitions

  • the present application relates to the technical field of endoscope positioning, and in particular to an endoscope positioning method, electronic device and non-transitory computer-readable storage medium.
  • An endoscope is an inspection instrument that integrates traditional optics, ergonomics, precision machinery, modern electronics, mathematics, and software. It comprises an image sensor, optical lenses, an illumination source, mechanical components, and so on, and can enter the stomach through the mouth or enter the body through other natural orifices. Because an endoscope can reveal lesions that X-ray imaging cannot show, it has become a commonly used tool in medical examinations.
  • Currently, commonly used endoscope positioning methods include: (1) Extracting depth from the endoscopic image with the shape-from-shading (SFS) method and identifying the deepest regions as airways. After the airways are extracted, they are compared with the model reconstructed from preoperative CT, and the current image is mapped to the airway branch in which the camera is located, or the endoscope motion is estimated from changes in the position of the deepest airway point between adjacent images. This works at airway bifurcations, but it is difficult to provide continuous endoscope positioning information when no airway, or only one airway, is in the field of view. (2) Extracting feature points from the endoscopic image with the Structure-from-Motion (SFM) method, matching the feature points of two adjacent frames one by one, and solving a Perspective-n-Point (PnP) problem to estimate the endoscope pose. When the endoscopic image has few or no feature points, PnP cannot be solved and the endoscope position is lost. (3) 2D/3D registration, which registers the 2D image captured by the endoscope to the virtual model reconstructed before surgery to obtain the position of the endoscope in the model. Because this method is based on an iterative optimization algorithm, each frame requires a long computation time, while the endoscope pose changes quickly during an actual examination, so an excessive computation time easily causes positioning loss.
  • This application provides an endoscope positioning method, an electronic device, and a non-transitory computer-readable storage medium to address the shortcomings of the prior art, which cannot provide continuous positioning information and is prone to positioning loss, and to achieve rapid and accurate positioning of the endoscope together with continuous pose information.
  • This application provides an endoscope positioning method, including:
  • obtain, based on a pre-trained depth extraction network, the depth image d̂_t of the t-th frame image collected by the real endoscope; obtain the depth image d_{t-n} of the (t-n)-th frame target virtual image collected by the virtual endoscope at the (t-n)-th frame positioning pose in the target virtual model, or obtain, based on the pre-trained depth extraction network, the depth image d̂_{t-n} of the (t-n)-th frame image collected by the real endoscope, wherein the virtual endoscope is determined based on the real endoscope;
  • input the depth image d̂_t and the depth image d_{t-n}, or the depth image d̂_t and the depth image d̂_{t-n}, into a pre-trained depth registration network to obtain the relative pose estimation information p̂_{t-n,t} between the t-th frame image and the (t-n)-th frame image collected by the real endoscope;
  • superimpose the relative pose estimation information p̂_{t-n,t} on the pose estimation information p̂_{t-n} of the real endoscope when the (t-n)-th frame image was collected, to obtain the pose estimation information p̂_t of the real endoscope for the t-th frame image, and position the real endoscope based on the pose estimation information p̂_t.
  • the depth extraction network is a depth extraction network based on a cycle generative adversarial network (CycleGAN) and the pre-trained depth registration network;
  • the cycle generative adversarial network includes a first generator, a first discriminator, a second generator, and a second discriminator;
  • the first generator is used to convert a depth image into a real-style endoscopic image;
  • the second generator is used to convert a real-style endoscopic image into a depth image;
  • the depth extraction network based on the cycle generative adversarial network and the depth registration network is trained in the following way:
  • establish a virtual model, obtain the depth images of the virtual images collected by the virtual endoscope in the virtual model, and obtain the virtual pose information of the virtual endoscope when collecting the virtual images;
  • obtain a loss function as a weighted sum of the cycle consistency loss, identity loss, generative adversarial loss, reconstruction loss, and geometric consistency loss that constrain the initial depth extraction network;
  • the depth extraction network is a depth extraction network based on SfMLearner or a depth extraction network based on a cycle generative adversarial network;
  • before inputting the depth image d̂_t and the depth image d_{t-n}, or the depth image d̂_t and the depth image d̂_{t-n}, into the pre-trained depth registration network, the method further includes:
  • performing scale calibration on the depth images d̂_t and d̂_{t-n} to determine their units.
  • the depth registration network is trained in the following manner:
  • the loss function is obtained by performing a weighted sum of the translation loss and rotation loss between the relative pose estimation information and the virtual relative pose information;
  • An endoscope positioning method provided according to this application also includes:
  • a registration method based on an iterative optimization algorithm runs in parallel with the depth registration network, and the pose estimation information of the real endoscope is corrected according to the corrected pose obtained by the iterative-optimization-based registration method, eliminating the cumulative error.
  • a method for obtaining corrected posture according to a registration method based on an iterative optimization algorithm includes:
  • based on an image similarity measure and a semantic segmentation similarity measure, perform an optimization with the pose estimation information p̂_k as the initial value to obtain the corrected pose of the current corrected image.
  • a method for obtaining corrected posture according to a registration method based on an iterative optimization algorithm includes:
  • An endoscope positioning method provided according to this application also includes:
  • an RGB image feature extraction method is used to extract the feature information of the t-th frame image collected by the real endoscope, and the feature information of the t-th frame image and the depth image d̂_t are input into the pre-trained depth registration network together;
  • the RGB image feature extraction method is used to extract the feature information of the (t-n)-th frame image collected by the real endoscope or the feature information of the (t-n)-th frame target virtual image collected by the virtual endoscope, wherein the feature information of the (t-n)-th frame target virtual image is extracted after texture mapping is applied to the (t-n)-th frame target virtual image;
  • This application also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • When the processor executes the program, it implements any one of the above endoscope positioning methods.
  • The present application also provides a non-transitory computer-readable storage medium on which a computer program is stored.
  • When the computer program is executed by a processor, it implements any one of the above endoscope positioning methods.
  • The present application also provides a computer program product, which includes a computer program.
  • When the computer program is executed by a processor, it implements any one of the above endoscope positioning methods.
  • With the endoscope positioning method provided by this application, when the initial pose of the real endoscope is known, the current pose information of the real endoscope can be obtained quickly, accurately, and continuously by using the pre-trained depth extraction network and depth registration network.
  • After training, the depth extraction network and depth registration network in this method can be used directly for different patients; they do not need to be retrained before surgery, which is convenient and time-saving.
  • Figure 1 is one of the flow diagrams of the endoscope positioning method provided by this application.
  • FIG. 2 is a schematic diagram of the depth extraction network structure provided by this application.
  • Figure 3 is a schematic flow chart of the training method of the depth extraction network provided by this application.
  • Figure 4a is a schematic diagram of the depth extraction network generator architecture provided by this application.
  • FIG. 4b is a schematic diagram of the Resnet block architecture of the depth extraction network provided by this application.
  • Figure 4c is a schematic diagram of the depth extraction network discriminator architecture provided by this application.
  • Figure 5 is a schematic flow chart of the training method of the depth registration network provided by this application.
  • Figure 6 is a schematic diagram of the depth registration network architecture provided by this application.
  • Figure 7 is one of the flow diagrams of the method for obtaining the corrected pose using the registration method based on the iterative optimization algorithm provided by this application;
  • Figure 8 is the second schematic flow chart of the method for obtaining the corrected pose using the registration method based on the iterative optimization algorithm provided by this application;
  • Figure 9 is the second schematic flow chart of the endoscope positioning method provided by this application.
  • Figure 10 is a schematic structural diagram of an electronic device provided by this application.
  • the endoscope positioning method of the present application is described below in conjunction with Figures 1-9. As shown in Figure 1, the method includes:
  • the endoscope positioning method can be used in the natural cavities of the human body such as the respiratory tract, biliary tract, and cerebral ventricle.
  • A depth image, also known as a range image, is an image in which the distance (depth) from the image collector to each point in the scene is used as the pixel value; it directly reflects the geometry of the visible surfaces of the scene.
  • Depth images can be calculated into point cloud data after coordinate conversion, and point cloud data with rules and necessary information can also be back-calculated into depth image data.
  • S102: Obtain the depth image d_{t-n} of the (t-n)-th frame target virtual image collected by the virtual endoscope at the (t-n)-th frame positioning pose in the target virtual model, or obtain the depth image d̂_{t-n} of the (t-n)-th frame image collected by the real endoscope based on the pre-trained depth extraction network, wherein the virtual endoscope is determined based on the real endoscope.
  • In other words, either the depth image d_{t-n} of the (t-n)-th frame target virtual image collected by the virtual endoscope at the (t-n)-th frame positioning pose in the target virtual model is obtained, or the depth image d̂_{t-n} of the (t-n)-th frame image collected by the real endoscope is obtained.
  • The virtual endoscope moves together with the real endoscope in the target virtual model.
  • The (t-n)-th frame positioning pose of the virtual endoscope in the target virtual model is the pose in the target virtual model that corresponds to the positioning pose of the real endoscope when it collected the (t-n)-th frame image.
  • Optionally, n ≤ 10, that is, the (t-n)-th frame is within ten frames before the current frame image, so that frame t-n and frame t share more similar feature points.
  • n in this method is not fixed.
  • The virtual endoscope is determined based on the real endoscope, so the intrinsic parameters of the virtual endoscope need to be consistent with the intrinsic parameters of the real endoscope.
  • Illustratively, MATLAB can be used to perform checkerboard calibration on the real endoscope to obtain its intrinsic parameters.
  • The intrinsic parameters of the real endoscope include the camera intrinsic matrix and the image resolution in pixels (width × height).
  • The parameters of the virtual endoscope are then set to match the intrinsic parameters of the real endoscope.
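  • As an illustration only (the patent does not give concrete intrinsic values), the following is a minimal sketch of how calibrated intrinsics from the real endoscope might be copied to the virtual camera; the focal lengths, principal point, and resolution below are placeholder values:

```python
import numpy as np

# Placeholder intrinsics obtained from checkerboard calibration (assumed values).
fx, fy = 220.0, 220.0        # focal lengths in pixels
cx, cy = 128.0, 128.0        # principal point in pixels
width, height = 256, 256     # image resolution (width x height)

# Pinhole intrinsic matrix of the real endoscope.
K_real = np.array([[fx, 0.0, cx],
                   [0.0, fy, cy],
                   [0.0, 0.0, 1.0]])

# The virtual endoscope simply reuses the same intrinsics and resolution,
# so that virtual renderings and real images are geometrically comparable.
virtual_camera = {"K": K_real.copy(), "width": width, "height": height}
```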
  • S103: The depth image d̂_t and the depth image d_{t-n} can be input into the pre-trained depth registration network to obtain the relative pose estimation information p̂_{t-n,t} between the t-th frame image and the (t-n)-th frame image collected by the real endoscope.
  • Alternatively, the depth images d̂_t and d̂_{t-n} can be input into the pre-trained depth registration network to obtain the relative pose estimation information p̂_{t-n,t} between the t-th frame image and the (t-n)-th frame image collected by the real endoscope.
  • S104: Superimpose the relative pose estimation information p̂_{t-n,t} on the pose estimation information p̂_{t-n} of the real endoscope when the (t-n)-th frame image was collected, to obtain the pose estimation information p̂_t of the real endoscope for the t-th frame image, and position the real endoscope based on p̂_t.
  • That is, the obtained relative pose estimation information p̂_{t-n,t} is superimposed on the pose estimation information p̂_{t-n} of the real endoscope at the (t-n)-th frame to obtain the pose estimation information p̂_t of the real endoscope at the t-th frame, and the real endoscope is positioned according to p̂_t.
  • The pose information of the initial position of the real endoscope is known and can be obtained when the depth registration network is initialized.
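  • As a sketch of the superposition step only (the patent does not specify the pose parameterization), assuming each pose is represented as a 4×4 homogeneous transform, composing the relative pose with the previous absolute pose might look like this:

```python
import numpy as np

def euler_translation_to_matrix(rx, ry, rz, tx, ty, tz):
    """Build a 4x4 homogeneous transform from Euler angles (rad) and a translation.
    The Z-Y-X rotation order here is an assumption, not taken from the patent."""
    cx_, sx_ = np.cos(rx), np.sin(rx)
    cy_, sy_ = np.cos(ry), np.sin(ry)
    cz_, sz_ = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx_, -sx_], [0, sx_, cx_]])
    Ry = np.array([[cy_, 0, sy_], [0, 1, 0], [-sy_, 0, cy_]])
    Rz = np.array([[cz_, -sz_, 0], [sz_, cz_, 0], [0, 0, 1]])
    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx
    T[:3, 3] = [tx, ty, tz]
    return T

# p_prev: absolute pose at frame t-n; p_rel: relative pose (t-n -> t) from the
# depth registration network, both as 4x4 transforms. "Superposition" is then
# simply matrix composition, giving the absolute pose at frame t.
def superimpose(p_prev, p_rel):
    return p_prev @ p_rel
```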
  • Optionally, the depth extraction network is a depth extraction network based on a cycle generative adversarial network (CycleGAN) and the pre-trained depth registration network.
  • The cycle generative adversarial network includes a first generator, a first discriminator, a second generator, and a second discriminator.
  • The first generator is used to convert a depth image into a real-style endoscopic image.
  • The second generator is used to convert a real-style endoscopic image into a depth image.
  • The depth extraction network based on the cycle generative adversarial network and the depth registration network is trained in the following way:
  • S301: Establish a virtual model, obtain the depth images of the virtual images collected by the virtual endoscope in the virtual model, and obtain the virtual pose information of the virtual endoscope when collecting the virtual images.
  • a depth registration network needs to be trained first, and the depth extraction network needs to apply the trained depth registration network.
  • the style of an image refers to the texture, color, and visual patterns at different spatial scales in the image.
  • Supervising the training of the depth extraction network in this way can improve the robustness of the depth extraction network.
  • virtual models such as virtual models for the respiratory tract, virtual models for the biliary tract, etc. Corresponding virtual models can be established according to the needs of use.
  • the target body corresponding to the preset real endoscopic image is consistent with the target body corresponding to the virtual model.
  • the virtual model is a virtual model of the respiratory tract established based on the respiratory tract
  • the preset real endoscopic image is also an image of the collected respiratory tract.
  • S303 Use the preset real endoscopic image, the depth image of the virtual image, and the virtual pose information as training data to perform weakly supervised training on the initial depth extraction network.
  • the depth image and virtual pose information obtained in the above steps are used as training data to perform weakly supervised training on the initial depth extraction network.
  • S304 Obtain a loss function based on the weighted summation of cycle consistency loss, identity loss, generative adversarial loss, reconstruction loss, and geometric consistency loss that constrain the initial depth extraction network.
  • CycleGAN includes a first generator G_image, a first discriminator D_image, a second generator G_depth, and a second discriminator D_depth; the depth image domain and the endoscopic image domain are denoted Z and X, respectively.
  • For an endoscopic image x ∈ X, the depth extraction algorithm aims to learn a mapping G_depth: X → Z. The mapping G_image: Z → X then reconstructs G_depth(x) back to domain X, and the difference between the reconstruction and x_t is penalized. The conversion from domain Z to domain X is handled analogously. In this reconstruction cycle, the network imposes a cycle consistency loss on G_image and G_depth:
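  • The cycle consistency formula itself is given as an image and is not reproduced in this text; a standard form consistent with the description above (assuming the usual L1 formulation) is:

$$
\mathcal{L}_{cyc}(G_{image}, G_{depth}) =
\mathbb{E}_{x \sim p(x)}\big[\lVert G_{image}(G_{depth}(x)) - x \rVert_1\big] +
\mathbb{E}_{z \sim p(z)}\big[\lVert G_{depth}(G_{image}(z)) - z \rVert_1\big]
$$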
  • y is a variable, representing a certain frame of image
  • p represents the probability distribution
  • The discriminators D_image and D_depth learn to judge whether an input endoscopic image and an input depth image, respectively, are real or fake, while the generators try to fool the discriminators by producing images that the discriminators judge to be real. Therefore, a generative adversarial loss is introduced, for which the LS-GAN loss can be used:
  • The subscript * stands for either image or depth.
  • y ∼ p(data) denotes a sample drawn from the data distribution of the corresponding domain.
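  • The adversarial loss formula is likewise omitted from this text; a standard least-squares GAN formulation matching the notation above (an assumption about the exact form) is:

$$
\mathcal{L}_{GAN}(G_{*}, D_{*}) =
\mathbb{E}_{y \sim p(data)}\big[(D_{*}(y) - 1)^2\big] +
\mathbb{E}\big[\big(D_{*}(G_{*}(\cdot))\big)^2\big]
$$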
  • the motion trajectory of the virtual endoscope can be collected from the virtual model, and the pose and corresponding depth image of the virtual endoscope at each moment can be recorded.
  • The recorded poses and corresponding depth images impose a view consistency constraint between the generated image frames and the frames collected by the real endoscope; on top of the adversarial loss, an image view consistency loss based on Perspective-n-Point (PnP) is added.
  • where t_{t-n,t} = (t_x, t_y, t_z) is the translation vector of the camera from time t-n to time t, and the camera rotation matrix R_{t-n,t} from time t-n to time t is computed from the Euler angles (α, β, γ) through their sines and cosines (sin α, sin β, sin γ, cos α, cos β, cos γ).
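  • The rotation-matrix formula is not legible in this text; assuming a Z-Y-X Euler angle convention (the patent's exact convention is not recoverable here), R_{t-n,t} would take the standard form:

$$
R_{t-n,t} =
\begin{bmatrix}
\cos\beta\cos\gamma & \sin\alpha\sin\beta\cos\gamma - \cos\alpha\sin\gamma & \cos\alpha\sin\beta\cos\gamma + \sin\alpha\sin\gamma \\
\cos\beta\sin\gamma & \sin\alpha\sin\beta\sin\gamma + \cos\alpha\cos\gamma & \cos\alpha\sin\beta\sin\gamma - \sin\alpha\cos\gamma \\
-\sin\beta & \sin\alpha\cos\beta & \cos\alpha\cos\beta
\end{bmatrix}
$$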
  • View consistency is also imposed between x_{t-n} and x_t and the generated depth maps d̂_{t-n} and d̂_t. Although the relative pose of the endoscope cannot be measured for these real frames, the pre-trained depth registration network, which estimates pose from depth, can compute the relative pose of the endoscope corresponding to d̂_{t-n} and d̂_t. The pre-trained pose estimation network is therefore loaded during training to estimate the relative motion p_{t-n,t} of the endoscope. An ideal depth estimate should contain the information needed for the pose estimation network to capture the motion of the endoscope, which yields the reconstruction loss derived from view consistency:
  • The inconsistency z_diff between the depth maps d̂_{t-n} and d̂_t is defined as:
  • the geometric consistency loss is defined as:
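  • The definitions of z_diff and the geometric consistency loss are given as formula images that are not reproduced here; a commonly used formulation in scale-consistent SfMLearner-style methods, offered only as a plausible reconstruction, is:

$$
z_{diff} = \frac{\lvert \hat d_{t}^{warp} - \hat d_{t-n} \rvert}{\hat d_{t}^{warp} + \hat d_{t-n}}, \qquad
\mathcal{L}_{geo} = \frac{1}{\lvert V \rvert} \sum_{p \in V} z_{diff}(p)
$$

  • where d̂_t^{warp} is the depth map d̂_t warped into the view of frame t-n using the estimated relative pose, and V is the set of valid pixels.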
  • The total loss function for training the depth extraction network is the weighted sum of the above losses:
  • α, β, γ, λ_1, λ_2, and η are hyperparameters that adjust the weight of each loss.
  • S305: Optimize the loss function and update the parameters of the initial depth extraction network based on the cycle generative adversarial network and the depth registration network for a preset number of rounds, to obtain the depth extraction network based on the cycle generative adversarial network and the depth registration network.
  • In other words, the depth extraction network is obtained by optimizing the loss function and updating the parameters of the initial depth extraction network until the preset number of rounds is reached.
  • FIGS. 4a, 4b, and 4c are schematic diagrams of the depth extraction network architecture, showing (a) the generator, (b) the Resnet block in the generator, and (c) the discriminator.
  • The tensor dimensions shown in the figures are based on an input image of size 1×256×256; Res(256, 256) denotes a Resnet block with 256 input and 256 output channels; IN denotes an Instance Norm layer; and Leaky ReLU denotes the Leaky ReLU activation function.
  • The depth extraction network can be trained with 7 preset real endoscopic video segments and 8 segments of data collected by virtual endoscopy, comprising multiple preset real endoscopic images and 2187 depth images with their corresponding virtual endoscope poses.
  • The generator is a conventional encoder-decoder architecture in which the bottleneck consists of six Resnet blocks, and the discriminator consists of five convolutional layers.
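  • A minimal PyTorch-style sketch of the generator and discriminator shapes described above (encoder-decoder generator with a six-Resnet-block bottleneck, five-convolution discriminator); layer widths, kernel sizes, and strides are assumptions for illustration, not values taken from the patent:

```python
import torch
import torch.nn as nn

class ResnetBlock(nn.Module):
    """Residual block with two 3x3 convolutions, InstanceNorm, and LeakyReLU."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x):
        return x + self.body(x)

class Generator(nn.Module):
    """Encoder -> 6 Resnet blocks -> decoder, mapping 1-channel 256x256 images."""
    def __init__(self, in_ch=1, out_ch=1, base=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, base, 7, padding=3),
            nn.InstanceNorm2d(base), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base, base * 2, 3, stride=2, padding=1),
            nn.InstanceNorm2d(base * 2), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base * 2, base * 4, 3, stride=2, padding=1),
            nn.InstanceNorm2d(base * 4), nn.LeakyReLU(0.2, inplace=True),
        )
        self.bottleneck = nn.Sequential(*[ResnetBlock(base * 4) for _ in range(6)])
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(base * 4, base * 2, 3, stride=2, padding=1, output_padding=1),
            nn.InstanceNorm2d(base * 2), nn.LeakyReLU(0.2, inplace=True),
            nn.ConvTranspose2d(base * 2, base, 3, stride=2, padding=1, output_padding=1),
            nn.InstanceNorm2d(base), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base, out_ch, 7, padding=3),
        )

    def forward(self, x):
        return self.decoder(self.bottleneck(self.encoder(x)))

class Discriminator(nn.Module):
    """Five convolutional layers ending in a patch-level real/fake score map."""
    def __init__(self, in_ch=1, base=64):
        super().__init__()
        layers, ch = [], in_ch
        for out in [base, base * 2, base * 4, base * 8]:
            layers += [nn.Conv2d(ch, out, 4, stride=2, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]
            ch = out
        layers += [nn.Conv2d(ch, 1, 4, padding=1)]  # fifth convolution: score map
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
```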
  • The Adam optimizer is used to train for 100 epochs.
  • λ_1, λ_2, and η are set to 0.3, 5, and 5, respectively.
  • α, β, and γ are set to 10, 5, and 1, respectively, throughout the training process.
  • the parameters of the depth extraction network are updated by continuously optimizing the loss function obtained in the above steps until the final depth extraction network is determined by the preset number of rounds.
  • the preset number of rounds can be 50 to 300 rounds, and further can be 100 rounds to 200 rounds.
  • The trained depth extraction network can generate depth images with clearer contours than depth extraction networks such as SfMLearner. Compared with using only a CycleGAN-style depth extraction network, it ensures that the structure of the input image is not changed, and it can generate depth images with a stable, known scale (essentially the same scale as the training data).
  • Optionally, the depth extraction network is a depth extraction network based on SfMLearner or a depth extraction network based on a cycle generative adversarial network.
  • Before inputting the depth image d̂_t and the depth image d_{t-n}, or the depth image d̂_t and the depth image d̂_{t-n}, into the pre-trained depth registration network, the method further includes scale calibration of the depth images.
  • In SfMLearner, the depth estimation network estimates the depth information z from a single input endoscopic image.
  • The pose network estimates the relative pose T and R of the camera between two input endoscopic images.
  • For two frames of images, the depth estimation network can estimate their depth images d̂_{t-n} and d̂_t.
  • The pose network can estimate the relative camera motion t_{t-n,t} and R_{t-n,t}.
  • warping refers to manipulating the image to deform the pixels in the image.
  • the geometric consistency loss is defined as:
  • the loss function can include the following losses:
  • The depth extraction algorithm aims to learn a mapping G_depth: X → Z.
  • The mapping G_image: Z → X then reconstructs the result back to domain X, completing the cycle.
  • The conversion from domain Z to domain X is handled analogously.
  • The network model imposes a cycle consistency loss on G_image and G_depth.
  • p denotes the probability distribution and E denotes the expectation.
  • The discriminators D_image and D_depth learn to judge whether the input endoscopic image and depth image are real or fake, while the generators try to fool the discriminators by producing images that are judged to be real.
  • Therefore, a generative adversarial loss is introduced, here the LS-GAN loss.
  • The subscript * stands for either image or depth.
  • y ∼ p(data) denotes a sample drawn from the data distribution of the corresponding domain.
  • The scale of the depth images obtained by the above two depth extraction networks is ambiguous and unitless, so it needs to be calibrated.
  • At least one of the following two calibration methods can be used:
  • (1) The field of view of the real endoscope is segmented by a depth threshold, and the diameter of the above-threshold region is compared with the diameter of the corresponding deep lumen region in the virtual model established before surgery; the depths of the matched regions are then compared to obtain the scale of the real endoscope depth image.
  • For example, if the depth threshold is set to 5 and the above-threshold portion of the depth image d̂_0 extracted from the real endoscope segments into a circle with a diameter of 10 pixels,
  • the corresponding contour in the virtual depth image can be found as a circle whose peak region also has a diameter of 10 pixels, and the known depth of that region gives the scale.
  • (2) The pose network and the depth network share the same ambiguous scale, so calibrating the scale of one also calibrates the other.
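  • A rough sketch of calibration method (1): threshold the predicted depth, locate the matching deep-lumen region in the virtual (CT-derived) depth map, and compare the depths of the two regions. The threshold, the region-matching heuristic, and the variable names are illustrative assumptions:

```python
import numpy as np

def depth_scale_factor(pred_depth, virtual_depth, pred_threshold=5.0, virt_percentile=90.0):
    """Estimate a multiplicative scale for the unitless predicted depth map.
    The deep-lumen region is located in the predicted depth map by a fixed
    threshold and in the metric virtual depth map by a high percentile, and
    the mean depths of the two regions are compared."""
    pred_mask = pred_depth > pred_threshold                      # deep region in prediction
    virt_mask = virtual_depth > np.percentile(virtual_depth, virt_percentile)
    if pred_mask.sum() == 0 or virt_mask.sum() == 0:
        return 1.0  # nothing deep enough in view; leave the scale unchanged
    return float(virtual_depth[virt_mask].mean()) / float(pred_depth[pred_mask].mean())

# Usage (illustrative): d_metric = depth_scale_factor(d_hat_0, d_virtual_0) * d_hat_0
```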
  • Optionally, the depth registration network is trained in the following manner:
  • S501: Establish a virtual model, obtain the depth images of the virtual images collected by the virtual endoscope in the virtual model, and obtain the corresponding virtual pose information of the virtual endoscope when collecting the virtual images.
  • The depth registration network is a deep neural network in encoder-decoder form.
  • The network input is two frames of depth information.
  • The encoder uses the structure of the FlowNetC encoder (the optical flow extracted by FlowNet approximates the motion field).
  • The decoder uses several CNN (convolutional neural network) layers to transform the encoded information into a 6-DOF pose output (3D translation and 3D Euler angles).
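  • A minimal PyTorch-style sketch of a depth registration network of this shape: two depth maps in, a shared convolutional encoder (standing in for the FlowNetC-style encoder), and a small convolutional regressor producing a 6-DOF pose. All layer sizes are assumptions for illustration:

```python
import torch
import torch.nn as nn

class DepthRegistrationNet(nn.Module):
    """Regress the 6-DOF relative pose (tx, ty, tz, rx, ry, rz) between two depth maps."""
    def __init__(self):
        super().__init__()
        # Stand-in for the FlowNetC-style encoder: the two depth maps are stacked
        # along the channel axis and encoded by strided convolutions.
        self.encoder = nn.Sequential(
            nn.Conv2d(2, 64, 7, stride=2, padding=3), nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(64, 128, 5, stride=2, padding=2), nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(128, 256, 5, stride=2, padding=2), nn.LeakyReLU(0.1, inplace=True),
        )
        # Three convolutional blocks regress the pose, followed by global pooling.
        self.regressor = nn.Sequential(
            nn.Conv2d(256, 256, 3, stride=2, padding=1), nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(256, 128, 3, stride=2, padding=1), nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(128, 6, 1),
        )

    def forward(self, depth_prev, depth_curr):
        x = torch.cat([depth_prev, depth_curr], dim=1)   # (B, 2, H, W)
        x = self.regressor(self.encoder(x))              # (B, 6, h, w)
        return x.mean(dim=[2, 3])                        # (B, 6): 3D translation + Euler angles

# Usage (illustrative): pose = DepthRegistrationNet()(d_hat_prev, d_hat_curr)
```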
  • S502 Input the depth image of the virtual image into an initial depth registration network, and the initial depth registration network outputs the relative pose estimation information of the virtual endoscope when two adjacent frames of virtual images are collected.
  • the depth image of the virtual image obtained in the above steps is input into the initial depth registration network for weak supervision training.
  • The initial depth registration network outputs the relative pose estimation information of the virtual endoscope for the two adjacent frames of virtual images.
  • S503: Use the virtual pose information as the training ground truth, and obtain from it the virtual relative pose information of the virtual endoscope when collecting the two adjacent frames of virtual images.
  • That is, the virtual pose information is used as the training ground truth.
  • From the virtual pose information, the virtual relative pose information between the two adjacent frames of virtual images is obtained.
  • In this way, both the ground-truth relative pose information and the estimated relative pose information of the endoscope for the two adjacent frames are available.
  • S504 Obtain the loss function by performing a weighted sum of the translation loss and rotation loss between the relative pose estimation information and the virtual relative pose information.
  • The translation loss and rotation loss between the relative pose estimation information of the virtual endoscope and the ground-truth relative pose are calculated separately, and their weighted sum gives the final loss function:
  • L_t is the translation loss, computed between the translation vectors of the ground-truth relative pose information and of the relative pose estimation information;
  • L_r is the rotation loss, computed between the rotation vectors of the ground-truth relative pose information and of the relative pose estimation information;
  • λ is a hyperparameter used to adjust the ratio between the rotation loss and the translation loss.
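  • The exact loss formula is given as an image and not reproduced here; one plausible form consistent with the description (assuming L2 norms over the translation vectors T and the Euler-angle rotation vectors R, with "gt" the ground truth and "est" the network estimate) is:

$$
\mathcal{L} = L_t + \lambda L_r, \qquad
L_t = \lVert T_{gt} - T_{est} \rVert_2, \qquad
L_r = \lVert R_{gt} - R_{est} \rVert_2
$$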
  • The pose estimation network is trained with pose and depth images collected along 37 virtual endoscope trajectories, comprising 11,904 frames in total.
  • the network uses a pre-trained FlowNetC encoder to regress pose vectors with three convolutional blocks.
  • the network is trained by using the Adam optimizer with an initial learning rate of 1e-5 and training time of 300 epochs. ⁇ is set to 100.
  • S505 Optimize the loss function and update the parameters of the initial depth registration network until convergence to obtain the depth registration network.
  • The depth registration network learns the endoscope pose transformation parameters between two input depth images through deep learning, thereby updating the endoscope pose for each input endoscopic image.
  • This depth registration network is based on depth registration rather than image intensity, allowing the algorithm to have no additional requirements for the rendering of virtual images acquired by virtual endoscopes in the simulator.
  • the deep learning algorithm directly estimates pose transformation, allowing the algorithm to run quickly and in real time to obtain real-time positioning results.
  • it also includes:
  • A registration method based on an iterative optimization algorithm runs in parallel with the depth registration network, and the corrected pose obtained by this registration method is used to correct the pose estimation information of the real endoscope and eliminate the cumulative error.
  • The iterative-optimization-based registration method is slow, so it runs in parallel with the depth registration network for pose correction; it corrects the pose estimation information of the real endoscope lazily, so that the cumulative error does not keep growing and the positioning accuracy is improved.
  • a method for obtaining a corrected pose according to a registration method based on an iterative optimization algorithm includes:
  • S701: Obtain the k-th frame image collected by the real endoscope as the current corrected image, and obtain the depth image d̂_k of the k-th frame image through the depth extraction network, where k ≤ t.
  • This correction method runs more slowly than the network that estimates the pose of the real endoscope, so when running the parallel correction, not every frame is corrected.
  • The k-th frame image with k ≤ t is taken as the current corrected image; that is, the pose estimation information of the real endoscope for the corrected image frame has already been estimated.
  • S702: Obtain the pose estimation information p̂_k of the k-th frame image collected by the real endoscope from the depth registration network.
  • The pose estimation information p̂_k of the k-th frame image has already been estimated and can be obtained directly.
  • Segmentation here refers to regional segmentation of all cavity images in the detection field of view, that is, partitioning.
  • The lumen can be segmented using the depth image d̂_k, the RGB image x_k, or the RGBD image (x_k together with d̂_k).
  • The segmentation can use a depth threshold on the depth image, or a network can be trained to segment the lumens of RGB or RGBD images.
  • S704: Based on the image similarity measure and the semantic segmentation similarity measure, perform an optimization with the pose estimation information p̂_k as the initial value to obtain the corrected pose of the current corrected image.
  • this method is a correction method based on image registration.
  • The segmentation operation is denoted Seg(·), and the optimization seeks the corrected pose of the real endoscope at time k.
  • Starting from the pose estimation information p̂_k as the initial value, the corrected pose and the corresponding airway segmentation result are solved by optimization.
  • The optimization process is described as follows:
  • SIM1(·) is the image similarity measure;
  • SIM2(·) is the segmentation similarity measure;
  • P′_t is the optimization variable;
  • Seg(P′_t) is the segmentation result of the image or depth map rendered when the virtual endoscope is at the virtual pose P′_t.
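  • The optimization formula itself is not reproduced; a plausible reconstruction of the objective from the quantities defined above (the exact weighting and arguments are assumptions) is:

$$
P_k^{*} = \arg\max_{P'_t}\; \mathrm{SIM1}\big(x_k,\; I(P'_t)\big) + \mathrm{SIM2}\big(\mathrm{Seg}(x_k),\; \mathrm{Seg}(P'_t)\big)
$$

  • where I(P′_t) denotes the virtual image (or depth map) rendered at pose P′_t, and the optimization is initialized at p̂_k.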
  • This compensates for the case where two lumens, one deep and one shallow, appear in the view: similarity measures such as NCC (Normalized Cross-Correlation) concentrate on aligning the deep lumen in the two depth maps and ignore the characteristics of the shallow lumen, which leads to inaccurate registration.
  • a method for obtaining a corrected pose according to a registration method based on an iterative optimization algorithm includes:
  • S801: Obtain the k-th frame image collected by the real endoscope as the current corrected image, and obtain the depth image d̂_k of the k-th frame image through the depth extraction network, where k ≤ t.
  • This correction method runs more slowly than the network that estimates the pose of the real endoscope, so when running the parallel correction, not every frame is corrected; the k-th frame image with k ≤ t is taken as the current corrected image.
  • The virtual endoscope moves together with the real endoscope in the target virtual model.
  • The k-th frame positioning pose of the virtual endoscope in the target virtual model is the pose in the target virtual model that corresponds to the positioning pose of the real endoscope when it collected the k-th frame image.
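  • A rough sketch of the ICP-based variant described in the claims (back-project both depth maps to point clouds, register them with ICP, and use the resulting relative pose to correct p̂_k). This uses Open3D for ICP; the intrinsics, correspondence distance, and variable names are illustrative assumptions:

```python
import numpy as np
import open3d as o3d

def depth_to_point_cloud(depth, K):
    """Back-project a depth map (H x W) to a 3D point cloud using intrinsics K."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.reshape(-1)
    x = (u.reshape(-1) - K[0, 2]) * z / K[0, 0]
    y = (v.reshape(-1) - K[1, 2]) * z / K[1, 1]
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(np.stack([x, y, z], axis=1))
    return pcd

def icp_correction(d_hat_k, d_k, K, max_corr_dist=2.0):
    """Relative pose between the real-frame point cloud and the virtual-frame point
    cloud at frame k, obtained by point-to-point ICP; it can then be used to
    correct the accumulated pose estimate p_hat_k."""
    source = depth_to_point_cloud(d_hat_k, K)   # from the real endoscope depth
    target = depth_to_point_cloud(d_k, K)       # from the virtual endoscope depth
    result = o3d.pipelines.registration.registration_icp(
        source, target, max_corr_dist, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation                # 4x4 corrective transform

# Usage (illustrative): p_hat_k_corrected = icp_correction(d_hat_k, d_k, K) @ p_hat_k
```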
  • the method further includes:
  • S901: Use an RGB image feature extraction method to extract the feature information of the t-th frame image collected by the real endoscope, and input the feature information of the t-th frame image together with the depth image d̂_t into the pre-trained depth registration network.
  • S902: Use the RGB image feature extraction method to extract the feature information of the (t-n)-th frame image collected by the real endoscope, or extract the feature information of the (t-n)-th frame target virtual image collected by the virtual endoscope, wherein the feature information of the (t-n)-th frame target virtual image is extracted after texture mapping is applied to that virtual image.
  • S903: Input the feature information of the (t-n)-th frame target virtual image together with the depth image d_{t-n}, or the feature information of the (t-n)-th frame image together with the depth image d̂_{t-n}, into the pre-trained depth registration network.
  • In this way, RGB feature extraction is integrated into the relative pose calculation of real-time positioning.
  • This input compensates for the difficulty of estimating the endoscope pose when the depth map structure is monotonous, and assists in estimating the motion of the real endoscope.
  • When features are extracted from virtual images, texture mapping needs to be applied to the virtual endoscope image first, and the texture should be close to that of the images collected by the real endoscope.
  • In summary, with the endoscope positioning method provided by this application, once the initial pose of the real endoscope is known, the current pose information of the real endoscope can be obtained quickly, accurately, and continuously by using the pre-trained depth extraction network and depth registration network.
  • After training, the depth extraction network and depth registration network in this method can be used directly for different patients; they do not need to be retrained before surgery, which is convenient and time-saving.
  • Figure 10 illustrates a schematic diagram of the physical structure of an electronic device.
  • the electronic device may include: a processor (processor) 1010, a communications interface (Communications Interface) 1020, a memory (memory) 1030 and a communication bus 1040.
  • the processor 1010, the communication interface 1020, and the memory 1030 complete communication with each other through the communication bus 1040.
  • The processor 1010 can call logical instructions in the memory 1030 to perform an endoscope positioning method, which includes: obtaining, based on a pre-trained depth extraction network, the depth image d̂_t of the current frame, that is, the t-th frame image, collected by the real endoscope; and obtaining the depth image d_{t-n} of the (t-n)-th frame target virtual image collected by the virtual endoscope at the (t-n)-th frame positioning pose in the target virtual model, or obtaining, based on the pre-trained depth extraction network, the depth image d̂_{t-n} of the (t-n)-th frame image collected by the real endoscope.
  • The virtual endoscope is determined based on the real endoscope; the depth image d̂_t and the depth image d_{t-n}, or the depth image d̂_t and the depth image d̂_{t-n}, are input into the pre-trained depth registration network to obtain the relative pose estimation information p̂_{t-n,t} between the t-th frame image and the (t-n)-th frame image collected by the real endoscope.
  • The relative pose estimation information p̂_{t-n,t} is superimposed on the pose estimation information p̂_{t-n} of the real endoscope when the (t-n)-th frame image was collected, to obtain the pose estimation information p̂_t of the real endoscope for the t-th frame image, and the real endoscope is positioned based on p̂_t, where the pose information of the initial position of the real endoscope is known.
  • the above-mentioned logical instructions in the memory 1030 can be implemented in the form of software functional units and can be stored in a computer-readable storage medium when sold or used as an independent product.
  • In essence, the technical solution of the present application, or the part that contributes over the existing technology, or a part of the technical solution, can be embodied in the form of a software product.
  • The computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which can be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application.
  • The aforementioned storage media include media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disk.
  • the present application also provides a computer program product.
  • the computer program product includes a computer program.
  • the computer program can be stored on a non-transitory computer-readable storage medium.
  • When the computer program is executed, the computer can perform the endoscope positioning method provided by each of the above method embodiments.
  • The method includes: obtaining, based on the pre-trained depth extraction network, the depth image d̂_t of the current frame, that is, the t-th frame image, collected by the real endoscope; and obtaining the depth image d_{t-n} of the (t-n)-th frame target virtual image collected by the virtual endoscope at the (t-n)-th frame positioning pose in the target virtual model, or obtaining, based on the pre-trained depth extraction network, the depth image d̂_{t-n} of the (t-n)-th frame image collected by the real endoscope.
  • The virtual endoscope is determined based on the real endoscope; the depth image d̂_t and the depth image d_{t-n}, or the depth image d̂_t and the depth image d̂_{t-n}, are input into the pre-trained depth registration network to obtain the relative pose estimation information p̂_{t-n,t} between the t-th frame image and the (t-n)-th frame image collected by the real endoscope.
  • The relative pose estimation information p̂_{t-n,t} is superimposed on the pose estimation information p̂_{t-n} of the real endoscope when the (t-n)-th frame image was collected, to obtain the pose estimation information p̂_t of the real endoscope for the t-th frame image, and the real endoscope is positioned based on p̂_t, where the pose information of the initial position of the real endoscope is known.
  • the present application also provides a non-transitory computer-readable storage medium on which a computer program is stored.
  • When the computer program is executed by a processor, it performs the endoscope positioning method provided by each of the above method embodiments.
  • The method includes: obtaining, based on the pre-trained depth extraction network, the depth image d̂_t of the current frame, that is, the t-th frame image, collected by the real endoscope; and obtaining the depth image d_{t-n} of the (t-n)-th frame target virtual image collected by the virtual endoscope at the (t-n)-th frame positioning pose in the target virtual model, or obtaining, based on the pre-trained depth extraction network, the depth image d̂_{t-n} of the (t-n)-th frame image collected by the real endoscope.
  • The virtual endoscope is determined based on the real endoscope; the depth image d̂_t and the depth image d_{t-n}, or the depth image d̂_t and the depth image d̂_{t-n}, are input into the pre-trained depth registration network to obtain the relative pose estimation information p̂_{t-n,t} between the t-th frame image and the (t-n)-th frame image collected by the real endoscope.
  • The relative pose estimation information p̂_{t-n,t} is superimposed on the pose estimation information p̂_{t-n} of the real endoscope when the (t-n)-th frame image was collected, to obtain the pose estimation information p̂_t of the real endoscope for the t-th frame image, and the real endoscope is positioned based on p̂_t, where the pose information of the initial position of the real endoscope is known.
  • the device embodiments described above are only illustrative.
  • the units described as separate components may or may not be physically separated.
  • The components shown as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the embodiments. Persons of ordinary skill in the art can understand and implement the embodiments without creative effort.
  • each embodiment can be implemented by software plus a necessary general hardware platform, and of course, it can also be implemented by hardware.
  • the computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., including a number of instructions to cause a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods described in various embodiments or certain parts of the embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Surgery (AREA)
  • Biomedical Technology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Veterinary Medicine (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Public Health (AREA)
  • General Health & Medical Sciences (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Robotics (AREA)
  • Human Computer Interaction (AREA)
  • Biophysics (AREA)
  • Pathology (AREA)
  • Image Analysis (AREA)
  • Endoscopes (AREA)

Abstract

Provided in the present application are an endoscope positioning method, an electronic device, and a non-transitory computer-readable storage medium. The method comprises: on the basis of a depth extraction network, acquiring a depth image d̂_t of the t-th image frame collected by a real endoscope; acquiring a depth image d_{t-n} of the (t-n)-th target virtual image frame collected by a virtual endoscope, or, on the basis of the depth extraction network, acquiring a depth image d̂_{t-n} of the (t-n)-th image frame collected by the real endoscope; inputting the depth image d̂_t and the depth image d_{t-n}, or the depth image d̂_t and the depth image d̂_{t-n}, into a depth registration network to obtain the relative pose estimation information p̂_{t-n,t} of the real endoscope; and superimposing the relative pose estimation information p̂_{t-n,t} on the pose estimation information p̂_{t-n} of the real endoscope when collecting the (t-n)-th image frame, so as to obtain the pose estimation information p̂_t of the real endoscope when collecting the t-th image frame. The method can quickly, accurately, and continuously obtain the current pose information of the real endoscope.

Description

Endoscope positioning method, electronic device and non-transitory computer-readable storage medium
Cross-reference to related applications
This application claims priority to the Chinese patent application No. 202211086312.X, filed on September 6, 2022 and entitled "Endoscope positioning method, electronic device and non-transitory computer-readable storage medium", which is incorporated herein by reference in its entirety.
Technical field
The present application relates to the technical field of endoscope positioning, and in particular to an endoscope positioning method, an electronic device, and a non-transitory computer-readable storage medium.
Background
An endoscope is an inspection instrument that integrates traditional optics, ergonomics, precision machinery, modern electronics, mathematics, and software. It comprises an image sensor, optical lenses, an illumination source, mechanical components, and so on, and can enter the stomach through the mouth or enter the body through other natural orifices. Because an endoscope can reveal lesions that X-ray imaging cannot show, it has become a commonly used tool in medical examinations.
Currently, commonly used endoscope positioning methods include: (1) Extracting depth from the endoscopic image with the shape-from-shading (SFS) method and identifying the deepest regions as airways. After the airways are extracted, they are compared with the model reconstructed from preoperative CT, and the current image is mapped to the airway branch in which the camera is located, or the endoscope motion is estimated from changes in the position of the deepest airway point between adjacent images. This works at airway bifurcations, but it is difficult to provide continuous endoscope positioning information when no airway, or only one airway, is in the field of view. (2) Extracting feature points from the endoscopic image with the Structure-from-Motion (SFM) method, matching the feature points of two adjacent frames one by one, and solving a Perspective-n-Point (PnP) problem to estimate the endoscope pose. When the endoscopic image has few or no feature points, PnP cannot be solved and the endoscope position is lost. (3) 2D/3D registration, which registers the 2D image captured by the endoscope to the virtual model reconstructed before surgery to obtain the position of the endoscope in the model. Because this method is based on an iterative optimization algorithm, each frame requires a long computation time, while the endoscope pose changes quickly during an actual examination, so an excessive computation time easily causes positioning loss.
Summary of the invention
This application provides an endoscope positioning method, an electronic device, and a non-transitory computer-readable storage medium to address the shortcomings of the prior art, which cannot provide continuous positioning information and is prone to positioning loss, and to achieve rapid and accurate positioning of the endoscope together with continuous pose information.
This application provides an endoscope positioning method, including:
obtaining, based on a pre-trained depth extraction network, the depth image d̂_t of the t-th frame image collected by the real endoscope;
obtaining the depth image d_{t-n} of the (t-n)-th frame target virtual image collected by the virtual endoscope at the (t-n)-th frame positioning pose in the target virtual model, or obtaining, based on the pre-trained depth extraction network, the depth image d̂_{t-n} of the (t-n)-th frame image collected by the real endoscope, wherein the virtual endoscope is determined based on the real endoscope;
inputting the depth image d̂_t and the depth image d_{t-n}, or the depth image d̂_t and the depth image d̂_{t-n}, into a pre-trained depth registration network to obtain the relative pose estimation information p̂_{t-n,t} of the real endoscope between the t-th frame image and the (t-n)-th frame image;
superimposing the relative pose estimation information p̂_{t-n,t} on the pose estimation information p̂_{t-n} of the real endoscope when the (t-n)-th frame image was collected, to obtain the pose estimation information p̂_t of the real endoscope for the t-th frame image, and positioning the real endoscope based on the pose estimation information p̂_t.
According to an endoscope positioning method provided by this application, the depth extraction network is a depth extraction network based on a cycle generative adversarial network (CycleGAN) and the pre-trained depth registration network. The cycle generative adversarial network includes a first generator, a first discriminator, a second generator, and a second discriminator; the first generator is used to convert a depth image into a real-style endoscopic image, and the second generator is used to convert a real-style endoscopic image into a depth image.
The depth extraction network based on the cycle generative adversarial network and the depth registration network is trained in the following way:
establishing a virtual model, obtaining the depth images of the virtual images collected by the virtual endoscope in the virtual model, and obtaining the virtual pose information of the virtual endoscope when collecting the virtual images;
obtaining preset real endoscopic images;
performing weakly supervised training of the initial depth extraction network with the preset real endoscopic images, the depth images of the virtual images, and the virtual pose information as training data;
obtaining a loss function as a weighted sum of the cycle consistency loss, identity loss, generative adversarial loss, reconstruction loss, and geometric consistency loss that constrain the initial depth extraction network;
optimizing the loss function and updating the parameters of the initial depth extraction network based on the cycle generative adversarial network and the depth registration network for a preset number of rounds, to obtain the depth extraction network based on the cycle generative adversarial network and the depth registration network.
According to an endoscope positioning method provided by this application, the depth extraction network is a depth extraction network based on SfMLearner or a depth extraction network based on a cycle generative adversarial network.
Before inputting the depth image d̂_t and the depth image d_{t-n}, or the depth image d̂_t and the depth image d̂_{t-n}, into the pre-trained depth registration network, the method further includes:
performing scale calibration on the depth image d̂_t and the depth image d̂_{t-n} to determine the units of the depth images d̂_t and d̂_{t-n}.
According to an endoscope positioning method provided by this application, the depth registration network is trained in the following manner:
establishing a virtual model, obtaining the depth images of the virtual images collected by the virtual endoscope in the virtual model, and obtaining the corresponding virtual pose information of the virtual endoscope when collecting the virtual images;
inputting the depth images of the virtual images into an initial depth registration network, which outputs the relative pose estimation information of the virtual endoscope when collecting two adjacent frames of virtual images;
using the virtual pose information as the training ground truth, and obtaining from it the virtual relative pose information of the virtual endoscope when collecting the two adjacent frames of virtual images;
obtaining the loss function as a weighted sum of the translation loss and the rotation loss between the relative pose estimation information and the virtual relative pose information;
optimizing the loss function and updating the parameters of the initial depth registration network until convergence, to obtain the depth registration network.
An endoscope positioning method provided according to this application also includes:
running a registration method based on an iterative optimization algorithm in parallel with the depth registration network, and correcting the pose estimation information of the real endoscope according to the corrected pose obtained by the iterative-optimization-based registration method, thereby eliminating the cumulative error.
According to an endoscope positioning method provided by this application, the method of obtaining the corrected pose by the iterative-optimization-based registration method includes:
obtaining the k-th frame image collected by the real endoscope as the current corrected image, and obtaining the depth image d̂_k of the k-th frame image through the depth extraction network, where k ≤ t;
obtaining the pose estimation information p̂_k of the k-th frame image collected by the real endoscope, as obtained by the depth registration network;
performing semantic segmentation of the lumen image in the field of view of the real endoscope using the current corrected image, or the depth image d̂_k, or the current corrected image together with the depth image d̂_k;
based on an image similarity measure and a semantic segmentation similarity measure, performing an optimization with the pose estimation information p̂_k as the initial value to obtain the corrected pose of the current corrected image;
replacing the pose estimation information p̂_k of the real endoscope for the k-th frame image with the corrected pose.
According to an endoscope positioning method provided by this application, the method for obtaining the corrected pose by the registration method based on the iterative optimization algorithm includes:

obtaining the k-th frame image collected by the real endoscope as the current correction image, and obtaining the depth image d̂ k of the k-th frame image through the depth extraction network, where k ≤ t;

obtaining the depth image d k of the k-th frame target virtual image collected by the virtual endoscope at the k-th frame positioning pose in the target virtual model;

converting the depth image d̂ k into the corresponding point cloud Ŷ k, and converting the depth image d k into the point cloud image Y k;

solving the relative pose ΔP k between Y k and Ŷ k through the ICP algorithm;

using the relative pose ΔP k to correct the pose estimation information P̂ k of the real endoscope when collecting the k-th frame image.
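As an illustration of the point-cloud conversion and ICP step above, the following is a minimal sketch, not the implementation of this application: it assumes a standard pinhole camera with intrinsic matrix K and uses Open3D's point-to-point ICP; the helper names (depth_to_pointcloud, icp_relative_pose) are illustrative.

```python
import numpy as np
import open3d as o3d

def depth_to_pointcloud(depth, K):
    """Back-project an HxW depth map (in metric units) into an Nx3 point cloud."""
    h, w = depth.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]            # drop invalid (zero-depth) pixels

def icp_relative_pose(real_depth, virtual_depth, K, threshold=2.0):
    """Estimate the 4x4 transform aligning the real-depth cloud to the virtual-depth cloud."""
    src = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(depth_to_pointcloud(real_depth, K)))
    dst = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(depth_to_pointcloud(virtual_depth, K)))
    result = o3d.pipelines.registration.registration_icp(
        src, dst, threshold, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation          # relative pose used to correct the estimate of frame k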
According to an endoscope positioning method provided by this application, the method further includes:

extracting feature information of the t-th frame image collected by the real endoscope using an RGB image feature extraction method, and inputting the feature information of the t-th frame image together with the depth image d̂ t into the pre-trained depth registration network;

extracting, using the RGB image feature extraction method, feature information of the (t-n)-th frame image collected by the real endoscope, or feature information of the (t-n)-th frame target virtual image collected by the virtual endoscope, where the feature information of the (t-n)-th frame target virtual image is extracted after texture mapping is applied to the (t-n)-th frame target virtual image;

inputting the feature information of the (t-n)-th frame target virtual image together with the depth image d t-n, or the feature information of the (t-n)-th frame image together with the depth image d̂ t-n, into the pre-trained depth registration network.
This application also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements any one of the endoscope positioning methods described above.

This application also provides a non-transitory computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements any one of the endoscope positioning methods described above.

This application also provides a computer program product including a computer program, where the computer program, when executed by a processor, implements any one of the endoscope positioning methods described above.

With the endoscope positioning method provided by this application, when the initial pose of the real endoscope is known, the current pose information of the real endoscope can be obtained quickly, accurately and continuously by using the pre-trained depth extraction network and depth registration network. After training, the depth extraction network and depth registration network in this method can be used directly for different patients without preoperative training, which is convenient and saves time.
Description of the Drawings

In order to explain the technical solutions of this application or the prior art more clearly, the drawings required for the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show some embodiments of this application, and for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.

Figure 1 is the first schematic flowchart of the endoscope positioning method provided by this application;

Figure 2 is a schematic diagram of the depth extraction network structure provided by this application;

Figure 3 is a schematic flowchart of the training method of the depth extraction network provided by this application;

Figure 4a is a schematic diagram of the generator architecture of the depth extraction network provided by this application;

Figure 4b is a schematic diagram of the Resnet block architecture of the depth extraction network provided by this application;

Figure 4c is a schematic diagram of the discriminator architecture of the depth extraction network provided by this application;

Figure 5 is a schematic flowchart of the training method of the depth registration network provided by this application;

Figure 6 is a schematic diagram of the depth registration network architecture provided by this application;

Figure 7 is the first schematic flowchart of the method for obtaining the corrected pose with the registration method based on the iterative optimization algorithm provided by this application;

Figure 8 is the second schematic flowchart of the method for obtaining the corrected pose with the registration method based on the iterative optimization algorithm provided by this application;

Figure 9 is the second schematic flowchart of the endoscope positioning method provided by this application;

Figure 10 is a schematic structural diagram of the electronic device provided by this application.
Detailed Description of the Embodiments

To make the purpose, technical solutions and advantages of this application clearer, the technical solutions in this application are described clearly and completely below with reference to the drawings of this application. Obviously, the described embodiments are part of the embodiments of this application, not all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of this application.

The endoscope positioning method of this application is described below with reference to Figures 1 to 9. As shown in Figure 1, the method includes:
S101: obtaining, based on a pre-trained depth extraction network, the depth image d̂ t of the t-th frame image collected by the real endoscope.

In the embodiments of this application, the endoscope positioning method can be used in natural human body lumens such as the respiratory tract, the biliary tract and the cerebral ventricles. In this method, the depth image d̂ t of the current frame, i.e., the t-th frame image collected by the real endoscope, needs to be obtained first. A depth image, also called a range image, is an image in which the distance (depth) from the image collector to each point in the scene is taken as the pixel value; it directly reflects the geometry of the visible surface of the scene. A depth image can be converted into point cloud data through coordinate transformation, and point cloud data with a regular structure and the necessary information can also be back-calculated into depth image data.
S102: obtaining the depth image d t-n of the (t-n)-th frame target virtual image collected by the virtual endoscope at the (t-n)-th frame positioning pose in the target virtual model, or obtaining, based on the pre-trained depth extraction network, the depth image d̂ t-n of the (t-n)-th frame image collected by the real endoscope, where the virtual endoscope is determined based on the real endoscope.

Specifically, the depth image d t-n of the (t-n)-th frame target virtual image collected by the virtual endoscope at the (t-n)-th frame positioning pose in the target virtual model is obtained, or the depth image d̂ t-n of the (t-n)-th frame image collected by the real endoscope is obtained. The virtual endoscope moves in the target virtual model together with the real endoscope; the (t-n)-th frame positioning pose of the virtual endoscope in the target virtual model is obtained by mapping the positioning pose of the real endoscope when it collected the (t-n)-th frame image into the target virtual model. Here n ≤ 10, i.e., the reference frame is within the ten frames before the current frame, so that frame t-n and frame t share enough similar feature points. The value of n in this method is not fixed. For example, when the current frame is the 8th frame, t-n can be 7, i.e., the 7th frame, in which case n = 1, or 3, i.e., the 3rd frame, in which case n = 5. When the current frame is the 9th frame, t-n can be 7, i.e., the 7th frame, in which case n = 2.
The virtual endoscope needs to be determined based on the real endoscope, so the intrinsic parameters of the virtual endoscope need to be consistent with the intrinsic parameters of the real endoscope.

As an example, checkerboard calibration is performed on the real endoscope using MATLAB software to obtain the intrinsic parameters of the endoscope.
The intrinsic matrix of the real endoscope is:

K = [[f x, 0, c x], [0, f y, c y], [0, 0, 1]]

The image size in pixels is: width × height.

Let:

average focal length: focal_length = (f x + f y) / 2

window center x-axis coordinate: wcx = -2 × (c x - width/2) / width

window center y-axis coordinate: wcy = 2 × (c y - height/2) / height

Then, when designing the virtual endoscope, the parameters of the virtual endoscope are:

field of view: ViewAngle = 180/π × (2.0 × atan2(height/2.0, focal_length))

window size: WindowSize = [width, height]

window center position: WindowCenter = [wcx, wcy]
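The mapping from calibrated intrinsics to virtual-endoscope parameters above can be written down directly. The sketch below simply encodes the stated formulas; the numeric intrinsics in the example call are placeholders, not calibration results from this application.

```python
import math

def virtual_camera_params(fx, fy, cx, cy, width, height):
    focal_length = (fx + fy) / 2.0                      # average focal length
    wcx = -2.0 * (cx - width / 2.0) / width             # window center, x
    wcy = 2.0 * (cy - height / 2.0) / height            # window center, y
    view_angle = 180.0 / math.pi * (2.0 * math.atan2(height / 2.0, focal_length))
    return {
        "ViewAngle": view_angle,
        "WindowSize": [width, height],
        "WindowCenter": [wcx, wcy],
    }

# placeholder intrinsics, for illustration only
print(virtual_camera_params(fx=500.0, fy=502.0, cx=320.0, cy=240.0, width=640, height=480))
```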
S103: inputting the depth image d̂ t and the depth image d t-n, or the depth image d̂ t and the depth image d̂ t-n, into a pre-trained depth registration network, to obtain the relative pose estimation information P̂ t-n,t of the real endoscope between collecting the t-th frame image and collecting the (t-n)-th frame image.

Specifically, the depth image d̂ t and the depth image d t-n can be input into the pre-trained depth registration network to obtain the relative pose estimation information P̂ t-n,t of the real endoscope between collecting the t-th frame image and collecting the (t-n)-th frame image; alternatively, the depth image d̂ t and the depth image d̂ t-n can be input into the pre-trained depth registration network to obtain the relative pose estimation information P̂ t-n,t of the real endoscope between collecting the t-th frame image and collecting the (t-n)-th frame image.
S104: superimposing the relative pose estimation information P̂ t-n,t on the pose estimation information P̂ t-n of the real endoscope when collecting the (t-n)-th frame image, to obtain the pose estimation information P̂ t of the real endoscope when collecting the t-th frame image, and positioning the real endoscope according to the pose estimation information P̂ t.

Specifically, the obtained relative pose estimation information P̂ t-n,t is superimposed on the pose estimation information P̂ t-n of the real endoscope when collecting the (t-n)-th frame image, so that the pose estimation information P̂ t of the real endoscope when collecting the t-th frame image is obtained, and the real endoscope is positioned according to the pose estimation information P̂ t.
The pose information P̂ 0 of the initial position of the real endoscope can be obtained when the depth registration network is initialized.
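The following is a minimal sketch of the frame-to-frame positioning loop of S101 to S104, assuming hypothetical callables depth_net (the pre-trained depth extraction network), reg_net (the pre-trained depth registration network) and compose (superimposing a relative pose on an absolute pose); it only illustrates how the relative pose estimates are accumulated from the known initial pose.

```python
def track_endoscope(frames, depth_net, reg_net, compose, P0, n=1):
    """Accumulate endoscope poses frame by frame (S101-S104).

    frames:    list of real endoscopic images x_0, x_1, ...
    depth_net: image -> depth map (pre-trained depth extraction network)
    reg_net:   (depth_ref, depth_cur) -> 6-DOF relative pose (depth registration network)
    compose:   (absolute_pose, relative_pose) -> absolute_pose
    P0:        known initial pose of the real endoscope
    """
    poses = [P0]
    depths = [depth_net(frames[0])]
    for t in range(1, len(frames)):
        d_t = depth_net(frames[t])              # S101: depth of the t-th frame
        ref = max(t - n, 0)                     # S102: reference frame t-n
        rel = reg_net(depths[ref], d_t)         # S103: relative pose between t-n and t
        poses.append(compose(poses[ref], rel))  # S104: superimpose on the pose of frame t-n
        depths.append(d_t)
    return poses
```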
In one embodiment, as shown in Figure 2, the depth extraction network is a depth extraction network based on a cycle-consistent generative adversarial network (Cycle GAN) and the pre-trained depth registration network. The Cycle GAN includes a first generator, a first discriminator, a second generator and a second discriminator; the first generator is used to convert depth images into real-style endoscopic images, and the second generator is used to convert real-style endoscopic images into depth images.

As shown in Figure 3, the depth extraction network based on the Cycle GAN and the depth registration network is trained in the following manner:

S301: establishing a virtual model, obtaining depth images of virtual images collected by the virtual endoscope in the virtual model, and obtaining the corresponding virtual pose information of the virtual endoscope when collecting the virtual images.

Specifically, before the above depth extraction network is trained, the depth registration network needs to be trained first, because the depth extraction network uses the trained depth registration network. The style of an image refers to the textures, colors and visual patterns of the image at different spatial scales.

In practice, since it is difficult to obtain the pose of the endoscope during a real endoscopy, a virtual model needs to be established, and a large number of depth images and virtual pose information obtained through the virtual endoscope are used to supervise the training of the depth extraction network, which improves the robustness of the depth extraction network. There can be a variety of virtual models, such as a virtual model of the respiratory tract, a virtual model of the biliary tract, etc.; the corresponding virtual model can be established according to the needs of use.
S302: obtaining preset real endoscopic images.

The target body corresponding to the preset real endoscopic images is consistent with the target body from which the virtual model is built. For example, if the virtual model is a respiratory tract virtual model built from the respiratory tract, the preset real endoscopic images are also collected images of the respiratory tract.

S303: performing weakly supervised training on the initial depth extraction network using the preset real endoscopic images, the depth images of the virtual images and the virtual pose information as training data.

Specifically, the depth images and virtual pose information obtained in the above steps are used, together with the preset real endoscopic images, as training data for weakly supervised training of the initial depth extraction network.

S304: obtaining the loss function by a weighted sum of the cycle consistency loss, identity loss, generative adversarial loss, reconstruction loss and geometric consistency loss that constrain the initial depth extraction network.
Specifically, referring to Figure 2, the Cycle GAN includes a first generator G image, a first discriminator D image, a second generator G depth and a second discriminator D depth. The depth image domain and the endoscopic image domain are denoted Z and X respectively.

Cycle consistency loss:
For an endoscopic image x ∈ X, the depth extraction algorithm aims to learn a mapping G depth: X → Z, generating from x t its corresponding depth image ẑ t. Then, the mapping G image: Z → X reconstructs ẑ t back into domain X, completing the cycle; the cycle consistency loss refers to the difference between the reconstruction of ẑ t in domain X and x t. The conversion from domain Z to domain X is similar. In this reconstruction cycle, the network model imposes a cycle consistency loss on G image and G depth:

L cyc = E y~p(X) ‖G image(G depth(y)) − y‖ 1 + E y~p(Z) ‖G depth(G image(y)) − y‖ 1

where y is a variable representing a frame of image, p represents the probability distribution, and E represents the expectation.
Identity loss:

To add a constraint on the learning of the mappings, an identity loss is introduced:

L idt = E y~p(X) ‖G image(y) − y‖ 1 + E y~p(Z) ‖G depth(y) − y‖ 1
Generative adversarial loss:

While the generators complete the mapping cycle, the discriminators D image and D depth respectively learn to judge whether the input endoscopic image and depth image are real or fake, while the generators try to fool the discriminators so that the generated images are judged to be real. A generative adversarial loss is therefore introduced; the LS-GAN loss can be adopted here:

L gan(G ·, D ·) = E y~p(data) (D ·(y) − 1) 2 + E (D ·(G ·(·))) 2

where · is used to stand for image or depth, and y~p(data) indicates that the sample follows the distribution of domain X or Z.
Reconstruction loss:

In order for the network to learn depth image estimation at a given scale, motion trajectories of the virtual endoscope can be collected from the virtual model, recording the virtual endoscope pose and the corresponding depth image at each moment. Using the collected virtual endoscope poses and corresponding depth images, a view consistency constraint is imposed between the generated real-style endoscopic image frames, and an image view consistency loss based on Perspective-n-Point (PnP) is added on top of the adversarial loss.
Given depth images z t-n and z t, inputting them into the generator G image yields the generated endoscopic images x̂ t-n and x̂ t. Since the virtual pose information at time t-n and time t is also recorded when the data is collected, the virtual relative pose from time t-n to time t can be computed as p t-n,t = (t x, t y, t z, θ, φ, ψ). With the camera intrinsic matrix K known, a pixel q t-n in homogeneous coordinates can be warped to q̂ t by:

q̂ t = K (R t-n,t z t-n(q t-n) K −1 q t-n + t t-n,t)

where t t-n,t = (t x, t y, t z) is the translation vector of the camera from time t-n to time t, and R t-n,t is the camera rotation matrix from time t-n to time t, computed from the Euler angles (θ, φ, ψ) as the composition of the rotations about the three coordinate axes, with α 1 = sinθ, α 2 = sinφ, α 3 = sinψ, β 1 = cosθ, β 2 = cosφ, β 3 = cosψ.
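A NumPy sketch of this warping step is given below. The Euler-angle composition order (Rz·Ry·Rx) is an assumption for illustration; the text above only fixes the sine and cosine shorthand, not the exact composition.

```python
import numpy as np

def euler_to_rotation(theta, phi, psi):
    """Rotation matrix from Euler angles; Rz(psi) @ Ry(phi) @ Rx(theta) is assumed here."""
    a1, a2, a3 = np.sin(theta), np.sin(phi), np.sin(psi)
    b1, b2, b3 = np.cos(theta), np.cos(phi), np.cos(psi)
    Rx = np.array([[1, 0, 0], [0, b1, -a1], [0, a1, b1]])
    Ry = np.array([[b2, 0, a2], [0, 1, 0], [-a2, 0, b2]])
    Rz = np.array([[b3, -a3, 0], [a3, b3, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def warp_pixel(q, depth_q, K, R, t):
    """Warp homogeneous pixel q = (u, v, 1) from frame t-n to frame t using its depth."""
    p3d = depth_q * (np.linalg.inv(K) @ q)   # back-project into the t-n camera frame
    p3d = R @ p3d + t                        # move into the t camera frame
    q_t = K @ p3d
    return q_t[:2] / q_t[2]                  # perspective division back to pixel coordinates
```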
During training, n ≤ 5 is used; an excessively large n cannot guarantee a sufficiently large co-visible region between the two images.
Since q̂ t is usually non-integer, bilinear sampling to integer pixel coordinates is needed, finally obtaining the image x̂′ t warped from x̂ t-n. x̂′ t should be consistent with x̂ t, so a reconstruction loss is obtained from view consistency:

L rec1 = Σ q ‖x̂ t(q) − w(x̂ t-n)(q)‖ 1

where w(·) is the operator that warps into the space of x̂ t, using the depth image obtained from z t-n by reprojection through the relative translation vector t t-n,t and relative rotation matrix R t-n,t, and q represents a pixel in image x. In this way, G image is encouraged to learn an unbiased estimate from depth images to the corresponding endoscopic images. Due to the cycle consistency constraint, G depth will also be encouraged to learn an unbiased estimate from endoscopic images to depth images, i.e., to generate depth images whose scale is consistent with that of the input depth maps.
To further constrain the learning of the generator G depth: X → Z, view consistency is also applied to x t-n and x t and the generated depth maps ẑ t-n and ẑ t. Although the relative pose of the real endoscope cannot be collected in this case, the pre-trained depth registration network provides a depth-based pose estimation algorithm, and the corresponding relative pose of the endoscope can be computed from ẑ t-n and ẑ t. The pre-trained pose estimation network is loaded during training to estimate the relative motion p t-n,t of the endoscope. An ideal depth image estimate should then contain the information that allows the pose estimation network to capture the endoscope motion, which gives a second reconstruction loss obtained from view consistency:

L rec2 = Σ q ‖x t(q) − w(x t-n)(q)‖ 1

Therefore, the total view-consistency reconstruction loss is obtained:

L rec = L rec1 + L rec2
Geometric consistency loss:

For the generated depth maps ẑ t-n and ẑ t, if they correspond to the same 3D scene, then the depth information of the two should be consistent. The inconsistency z diff between the depth maps ẑ t-n and ẑ t is defined as:

z diff = |ẑ′ t − ẑ″ t| / (ẑ′ t + ẑ″ t)

where ẑ′ t is the depth image obtained from ẑ t-n by reprojection through the relative pose p t-n,t of the virtual endoscope computed by the pre-trained depth registration network, and ẑ″ t is the depth map sampled from ẑ t. The error between ẑ′ t and ẑ″ t is computed here, rather than the error between ẑ′ t and ẑ t, because the reprojection result of ẑ t-n does not lie on an integer coordinate grid, and ẑ t must be sampled into the same coordinate system in order to compute the difference between the two.

The geometric consistency loss is defined as:

L gc = (1/N) Σ q z diff(q)

where q represents a pixel in image z and N is the number of pixels.
In summary, the total loss function for training the depth extraction network is:

L = δ L gan + β L cyc + γ L idt + θ 1 L gc + θ 2 L rec1 + η L rec2

where β, γ, δ, θ 1, θ 2 and η are hyperparameters that adjust the weight of each loss.
S305: optimizing the loss function and updating the parameters of the initial depth extraction network based on the Cycle GAN and the depth registration network for a preset number of epochs, to obtain the depth extraction network based on the Cycle GAN and the depth registration network.

Figures 4(a), 4(b) and 4(c) are schematic diagrams of the architecture of the depth extraction network, showing (a) the generator, (b) the Resnet block in the generator, and (c) the discriminator. The tensor dimensions shown in the figures are based on an input image size of 1×256×256; Res(256, 256) denotes a Resnet block with 256 input and output channels; IN denotes the Instance Norm layer, and Leaky ReLU denotes the Leaky ReLU activation function.

As an example, the depth extraction network can be trained with data collected from 7 preset real endoscopic videos and 8 virtual endoscopy sequences, including multiple preset real endoscopic images, 2187 depth images and the corresponding virtual endoscope poses. In the depth extraction network architecture, the generator is a conventional encoder-decoder architecture in which the bottleneck consists of six Resnet blocks, and the discriminator consists of five convolutional layers. The Adam optimizer is used to train for 100 epochs; at the beginning of training, the learning rate is set to 0.001 and θ 1 = θ 2 = η = 0, to avoid imposing consistency constraints on the poorly generated depth maps of the early stage. After 10 epochs of training, θ 1, θ 2 and η are set to 0.3, 5 and 5 respectively. β, γ and δ are set to 10, 5 and 1 respectively throughout the training process.
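The weighted total loss and the delayed activation of the consistency terms can be sketched as follows. The assignment of θ1, θ2 and η to the individual consistency losses is an assumption; only the set of losses, the weight values and the 10-epoch schedule are taken from the description above.

```python
def total_loss(losses, epoch):
    """Weighted sum of the depth-extraction losses (a sketch; weights per the text above).

    losses: dict with scalar entries 'gan', 'cyc', 'idt', 'rec1', 'rec2', 'gc'.
    """
    beta, gamma, delta = 10.0, 5.0, 1.0          # fixed throughout training
    if epoch < 10:
        theta1 = theta2 = eta = 0.0              # no consistency constraints early on
    else:
        theta1, theta2, eta = 0.3, 5.0, 5.0
    return (delta * losses["gan"] + beta * losses["cyc"] + gamma * losses["idt"]
            + theta2 * losses["rec1"] + eta * losses["rec2"] + theta1 * losses["gc"])
```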
During the training process, the loss function obtained in the above steps is continuously optimized to update the parameters of the depth extraction network until the preset number of epochs is reached, at which point the final depth extraction network is determined. The preset number of epochs can be 50 to 300 epochs, and further can be 100 to 200 epochs. Compared with depth extraction networks such as SfMLearner, the trained depth extraction network can generate depth images with clearer contours. Compared with a depth extraction network that only uses a Cycle GAN, it can ensure that the structure of the input image is not changed, and it can generate depth images with a stable and known scale (basically the same scale as the training data).

In one embodiment, the depth extraction network is a depth extraction network based on SfMLearner or a depth extraction network based on a cycle generative adversarial network;
before the depth image d̂ t and the depth image d t-n, or the depth image d̂ t and the depth image d̂ t-n, are input into the pre-trained depth registration network, the method further includes:

performing scale calibration on the depth image d̂ t and the depth image d̂ t-n to obtain the units of the depth image d̂ t and the depth image d̂ t-n.
Specifically, for the depth extraction network based on SfMLearner:

A depth estimation network and a pose network are trained simultaneously. The depth estimation network estimates the depth information z from a single input endoscopic image, and the pose network estimates, from two input endoscopic images, the relative camera pose T and R between the two images.
For two consecutive input frames of endoscopic images x t-n and x t, the depth estimation network can estimate the depth images ẑ t-n and ẑ t of the two frames, and the pose network can estimate the relative camera motion t t-n,t and R t-n,t.
With the camera intrinsic matrix K known, a pixel q t-n in homogeneous coordinates can be warped to q̂ t by:

q̂ t = K (R t-n,t ẑ t-n(q t-n) K −1 q t-n + t t-n,t)
Since q̂ t is usually non-integer, bilinear sampling to integer pixel coordinates is needed, and the image x̂ t finally warped from x t-n should be consistent with x t. The reconstruction loss is obtained from view consistency:

L rec = Σ q ‖x t(q) − w(x t-n)(q)‖ 1

where w(·) is the operator that warps into the space of x t, using the depth image obtained from ẑ t-n by reprojection through the relative translation vector t t-n,t and relative rotation matrix R t-n,t, and q represents a pixel in image x; warping refers to manipulating an image so that its pixels are deformed. With this loss function, the pose network and the depth estimation network can be trained in a self-supervised manner, thereby completing the network training.
To stabilize the scale of the depth images generated by the network, a geometric consistency loss is added. For the generated depth images ẑ t-n and ẑ t, if they correspond to the same 3D scene, then the depth information of the two should be consistent. The inconsistency z diff between the depth maps ẑ t-n and ẑ t is defined as:

z diff = |ẑ′ t − ẑ″ t| / (ẑ′ t + ẑ″ t)

where ẑ′ t is the depth map obtained from ẑ t-n by reprojection through the relative motion of the real endoscope computed by the pose network, and ẑ″ t is the depth map sampled from ẑ t. The error between ẑ′ t and ẑ″ t is computed here, rather than the error between ẑ′ t and ẑ t, because the reprojection result of ẑ t-n does not lie on an integer coordinate grid, and ẑ t must be sampled into the same coordinate system in order to compute the difference between the two.

The geometric consistency loss is defined as:

L gc = (1/N) Σ q z diff(q)

where q represents a pixel in image z and N is the number of pixels.
In summary, the loss function is obtained as L = a L rec + b L gc, where a and b are hyperparameters that adjust the weight of each loss.
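A NumPy sketch of the depth inconsistency z diff and the geometric consistency loss as reconstructed above, assuming that the reprojected depth map and the sampled depth map are already available on the same pixel grid:

```python
import numpy as np

def geometric_consistency_loss(z_reproj, z_sampled, eps=1e-6):
    """z_reproj:  depth of frame t-n reprojected into frame t
       z_sampled: depth of frame t sampled at the same (non-integer) reprojected locations"""
    z_diff = np.abs(z_reproj - z_sampled) / (z_reproj + z_sampled + eps)
    return z_diff.mean()            # L_gc: average inconsistency over the pixels
```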
Specifically, for the depth extraction network based on Cycle GAN, the loss function can include the following losses:
For an endoscopic image x ∈ X, the depth extraction algorithm aims to learn a mapping G depth: X → Z, generating from x its corresponding depth map ẑ. Then, the mapping G image: Z → X reconstructs ẑ back into domain X, completing the cycle. The conversion from domain Z to domain X is similar. In this reconstruction cycle, the network model imposes a cycle consistency loss on G image and G depth:

L cyc = E y~p(X) ‖G image(G depth(y)) − y‖ 1 + E y~p(Z) ‖G depth(G image(y)) − y‖ 1

where p represents the probability distribution and E represents the expectation.
To add constraints on the learning of the mappings, the other loss functions include the identity loss:

L idt = E y~p(X) ‖G image(y) − y‖ 1 + E y~p(Z) ‖G depth(y) − y‖ 1
While the generators complete the mapping cycle, the discriminators D image and D depth respectively learn to judge whether the input endoscopic image and depth image are real or fake, while the generators try to fool the discriminators so that the generated images are judged to be real; a generative adversarial loss is introduced, and the LS-GAN loss is used here:

L gan(G ·, D ·) = E y~p(data) (D ·(y) − 1) 2 + E (D ·(G ·(·))) 2

where · is used to stand for image or depth, and y~p(data) indicates that the sample follows the distribution of domain X or Z.
It is difficult to guarantee scale-stable depth images using only a Cycle GAN, so adding the geometric consistency loss can also be considered.

The scale of the depth images obtained by the above two depth extraction networks is ambiguous and has no unit, so calibration is required. When calibrating the scale, the specific calibration methods include the following two, and at least one of the two methods below can be used for calibration:
(1) When the real endoscope enters the lumen, the visible range of the real endoscope is segmented according to a depth threshold, and the diameter of the region above the threshold is compared with the depth at the contour of the same diameter around the depth peak of the lumen in the virtual model established before surgery, thereby obtaining the scale of the real endoscope. For example, with a depth threshold of 5, the part of depth image 0 extracted by the real endoscope whose depth is above the threshold is segmented as a circle with a diameter of 10 pixels. For the virtual model established for the main airway, assuming that the real endoscope is located at the center of the main airway, a contour circle with a diameter of 10 pixels around the peak can be found in the corresponding depth map. If the depth corresponding to this contour is 1 cm, the scale of the depth network is 1/5 = 0.2 cm.
(2) Based on the depth extraction network in the above embodiment, the pose network and the depth network have the same ambiguous scale. When the real endoscope is advanced, the robot control signal can be used as a reference to calibrate the relative pose estimation information of the pose network. For example, if the robot control signal advances the endoscope by 1 cm while the relative translation vector obtained by the pose network is a translation of 2 in the advancing direction, then the scale is 1/2 = 0.5 cm.
In one embodiment, as shown in Figure 5, the depth registration network is obtained by training in the following manner:

S501: establishing a virtual model, obtaining depth images of virtual images collected by the virtual endoscope in the virtual model, and obtaining the corresponding virtual pose information of the virtual endoscope when collecting the virtual images.

Specifically, the depth registration network is a deep neural network in encoder-decoder form. The network input is two frames of depth information; the encoder adopts the structure of the FlowNetC encoder (the optical flow extracted by FlowNet is an approximation of the motion field), and the decoder uses several CNN (Convolutional Neural Network) layers to finally convert the encoded information into a 6-DOF (i.e., 3D translation and 3D Euler angle) pose parameter output.
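A minimal PyTorch-style sketch of this encoder-decoder structure is shown below. The FlowNetC encoder itself is not reproduced; a plain convolutional stack stands in for it, so this illustrates only the overall structure (two depth frames in, a 6-DOF pose vector out), not the trained network of this application.

```python
import torch
import torch.nn as nn

class DepthRegistrationNet(nn.Module):
    """Two depth frames in, 6-DOF relative pose (tx, ty, tz, theta, phi, psi) out."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(                   # stand-in for the FlowNetC encoder
            nn.Conv2d(2, 64, 7, stride=2, padding=3), nn.LeakyReLU(0.1),
            nn.Conv2d(64, 128, 5, stride=2, padding=2), nn.LeakyReLU(0.1),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.LeakyReLU(0.1),
        )
        self.head = nn.Sequential(                      # a few conv blocks regressing the pose
            nn.Conv2d(256, 128, 3, stride=2, padding=1), nn.LeakyReLU(0.1),
            nn.Conv2d(128, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.1),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 6),
        )

    def forward(self, depth_prev, depth_cur):
        x = torch.cat([depth_prev, depth_cur], dim=1)   # stack the two depth frames
        return self.head(self.encoder(x))
```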
When training the depth registration network, a virtual model first needs to be established, and a large number of depth images and virtual pose information obtained through the virtual endoscope are used to supervise the training of the depth registration network, in order to improve the robustness of the depth registration network.

S502: inputting the depth images of the virtual images into an initial depth registration network, the initial depth registration network outputting the relative pose estimation information of the virtual endoscope when collecting two adjacent frames of virtual images.

Specifically, the depth images of the virtual images obtained in the above steps are input into the initial depth registration network for weakly supervised training, and the output of the initial depth registration network gives the relative pose estimation information of the virtual endoscope when collecting two adjacent frames of virtual images.

S503: using the virtual pose information as the training ground truth, and obtaining, according to the virtual pose information, the virtual relative pose information of the virtual endoscope when collecting the two adjacent frames of virtual images.

At the same time, the virtual pose information is used as the training ground truth, and the virtual relative pose information of the virtual endoscope when collecting the two adjacent frames of virtual images can be obtained by calculation from the virtual pose information. At this point, both the ground-truth relative pose information and the relative pose estimation information of the virtual endoscope when collecting the two adjacent frames of images are available.
S504: obtaining the loss function by a weighted sum of the translation loss and the rotation loss between the relative pose estimation information and the virtual relative pose information.

Specifically, the translation loss and the rotation loss between the relative pose estimation information of the virtual endoscope and the ground-truth relative pose are computed respectively, and the translation loss and the rotation loss are weighted and summed to obtain the final loss function:

L(z t-m, z t) = L t(z t-m, z t) + ω L r(z t-m, z t)

where L t is the translation loss: L t = ‖T t-m,t − T̂ t-m,t‖ 2, with T t-m,t and T̂ t-m,t being the translation vectors in the ground-truth relative pose information and in the relative pose estimation information respectively; L r is the rotation loss: L r = ‖R t-m,t − R̂ t-m,t‖ 2, with R t-m,t and R̂ t-m,t being the rotation vectors in the ground-truth relative pose information and in the relative pose estimation information respectively; and ω is a hyperparameter used to adjust the relative weight of the rotation loss and the translation loss.
Figure 6 is a schematic diagram of the depth registration architecture. The pose estimation network is trained with virtual endoscope poses and depth images collected from 37 virtual endoscope trajectories, comprising 11904 frames. The network uses a pre-trained FlowNetC encoder and regresses the pose vector with three convolutional blocks. The network is trained using the Adam optimizer with an initial learning rate of 1e-5 for 300 epochs, and ω is set to 100.

S505: optimizing the loss function and updating the parameters of the initial depth registration network until convergence, to obtain the depth registration network.

This depth registration network learns, through a deep learning method, the endoscope pose transformation parameters between two input depth images, so that the endoscope pose transformation is updated for each input endoscopic image. Because the registration is based on depth rather than image intensity, the algorithm places no additional requirements on the rendering of the virtual images collected by the virtual endoscope in the simulator. The deep learning algorithm directly estimates the pose transformation, so the algorithm can run quickly in real time and obtain real-time positioning results.
In one embodiment, the method further includes:

running a registration method based on an iterative optimization algorithm in parallel with the depth registration network, and correcting the pose estimation information of the real endoscope according to the corrected pose obtained by the registration method based on the iterative optimization algorithm, to eliminate accumulated error.

Specifically, the registration method based on the iterative optimization algorithm is slow to compute; running it in parallel with the depth registration network for pose correction allows the pose estimation information of the real endoscope to be corrected with a delay, so that the accumulated error does not keep growing, which improves the positioning accuracy.
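One way to realize this parallel, delayed correction is to run the iterative registration in a background worker and splice the corrected pose back into the trajectory when it arrives. The sketch below uses Python threading; slow_registration and compose are assumed callables, and for simplicity it assumes that consecutive-frame relative poses are stored so that later frames can be re-chained after the correction.

```python
import threading

class PoseCorrector:
    """Runs the slow iterative registration in the background and patches the trajectory."""
    def __init__(self, slow_registration, compose):
        self.slow_registration = slow_registration   # iterative-optimization registration (slow)
        self.compose = compose                       # superimpose a relative pose on an absolute one
        self.lock = threading.Lock()

    def correct_async(self, k, frame_k, poses, rel_poses):
        """poses[i] is the pose estimate of frame i; rel_poses[i] is the stored relative pose from frame i-1 to i."""
        def worker():
            corrected = self.slow_registration(frame_k, poses[k])    # may take several frames to finish
            with self.lock:
                poses[k] = corrected                                 # replace the estimate with the corrected pose
                for i in range(k + 1, len(poses)):                   # re-chain the later frames
                    poses[i] = self.compose(poses[i - 1], rel_poses[i])
        threading.Thread(target=worker, daemon=True).start()
```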
In one embodiment, as shown in Figure 7, the method for obtaining the corrected pose by the registration method based on the iterative optimization algorithm includes:
S701: obtaining the k-th frame image collected by the real endoscope as the current correction image, and obtaining the depth image d̂ k of the k-th frame image through the depth extraction network, where k ≤ t.

Specifically, this correction method runs more slowly than the network that estimates the pose estimation information of the real endoscope, so when performing parallel correction, the correction is not done frame by frame. The k-th frame image with k ≤ t is obtained as the current correction image, i.e., the pose estimation information of the real endoscope corresponding to the image frame serving as the correction image has already been estimated.
S702: obtaining the pose estimation information P̂ k of the real endoscope when collecting the k-th frame image, obtained based on the depth registration network.

Specifically, since k ≤ t, when the k-th frame image is being corrected, the pose estimation information P̂ k of the k-th frame image has already been estimated and can be obtained directly.
S703: performing semantic segmentation on the lumen images in the field of view of the real endoscope using the current correction image, or the depth image d̂ k, or the current correction image and the depth image d̂ k.

It was found in experiments that, because a similarity measure is used in the registration process, when a deeper lumen and several shallower lumens appear in the image at the same time, the optimization preferentially satisfies the alignment of the deeper lumen, whose depth is larger than that of the other lumens, and the registration of the other, shallower lumens is easily neglected. The registration then tends to ignore the structural information of the shallower lumens. To solve this problem, the depth image is used to segment the lumen images before registration, so that the registration process must register not only to similar depths but also to similar lumen structures.
Segmentation here refers to region segmentation of all lumen images in the detection field of view, i.e., partitioning. For an input endoscopic image x t, the lumens can be segmented using the depth image d̂ t, the RGB image x t, or the RGBD image (x t and d̂ t). The segmentation method can be segmenting the depth image with a depth threshold, or training a network to learn lumen segmentation of RGB or RGBD images.
S704: performing an optimization solution based on an image similarity measure and a semantic segmentation similarity measure, with the pose estimation information P̂ k as the initial value, to obtain the corrected pose P′ k of the current correction image.
Specifically, this is a correction method based on image registration. Denote the segmentation process as Seg(·); the corrected pose of the real endoscope at time k has a corresponding airway segmentation result produced by Seg(·). Given the camera pose at time t-1, the optimization starts from the initial pose value, i.e. the pose estimate of the k-th frame, and solves for the corrected pose. The optimization process is described as:
P′_t = argmax over P_t of [ SIM1( real image or depth map, virtual image or depth map rendered at P_t ) + SIM2( Seg(real image or depth map), Seg(P_t) ) ]
where SIM1(·) is the image similarity measure, SIM2(·) is the segmentation similarity measure, and P_t is the optimization variable. Seg(P′_t) denotes the result of segmenting the image or depth map rendered when the virtual endoscope is at virtual pose P′_t. The Powell algorithm is again used as the optimization strategy. For example, taking k = t, i.e. using the most recently computed pose estimation information as the initial value, improves the convergence of the algorithm and reduces the number of iterations.
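A minimal sketch of this correction step is given below, assuming the pose is parameterized as a 6-DoF vector; the renderer render_depth, the segmenter segment, and the similarity callables sim1/sim2 are placeholders for components the text does not specify.

```python
import numpy as np
from scipy.optimize import minimize

def correct_pose(p_init, real_depth, real_seg, render_depth, segment, sim1, sim2):
    """Refine a 6-DoF pose vector by maximizing image similarity plus
    segmentation similarity, starting from the registration network's estimate."""
    def cost(p):
        virt_depth = render_depth(p)      # virtual depth rendered at the candidate pose
        virt_seg = segment(virt_depth)    # its lumen segmentation
        # negated because scipy minimizes while SIM1/SIM2 are similarities
        return -(sim1(real_depth, virt_depth) + sim2(real_seg, virt_seg))

    result = minimize(cost, np.asarray(p_init, dtype=float), method="Powell")
    return result.x                       # corrected pose parameters
```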
This compensates for a weakness of using only an image similarity measure: when one deep and one shallow lumen appear together, a measure such as NCC (Normalized Cross Correlation) concentrates on aligning the deep-lumen parts of the two depth maps and ignores the features of the shallow lumen, which makes the computation inaccurate.
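For reference, a generic NCC between two equally sized images can be written as below; this is the textbook formulation rather than the exact measure used in the patent.

```python
import numpy as np

def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized cross correlation: mean-centred dot product divided by the
    product of the two norms, giving a value in [-1, 1]."""
    a = a.astype(np.float64).ravel()
    b = b.astype(np.float64).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0
```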
S705: Replace the pose estimation information recorded when the real endoscope collected the k-th frame image with the corrected pose.
After the corrected pose is obtained, the pose estimation information recorded when the real endoscope collected the k-th frame image is replaced with the corrected pose; at this point the pose at which the t-th frame image was collected on the real endoscope trajectory is corrected. In one embodiment, as shown in Figure 8, the method for obtaining the corrected pose with a registration method based on an iterative optimization algorithm includes:
S801: Obtain the k-th frame image collected by the real endoscope as the current corrected image, and obtain the depth image of the k-th frame image through the depth extraction network, where k ≤ t.
Specifically, this correction method runs more slowly than the network that estimates the pose of the real endoscope, so when correction runs in parallel it is not performed frame by frame; when a correction is performed, a k-th frame image with k ≤ t is taken as the current corrected image.
S802: Obtain the depth image d_k of the k-th frame target virtual image collected by the virtual endoscope at the k-th frame positioning pose in the target virtual model.
Specifically, the virtual endoscope moves in the target virtual model along with the real endoscope; its positioning pose at the k-th frame in the target virtual model is obtained by mapping into the target virtual model the positioning pose of the real endoscope at the time the k-th frame image was collected.
S803: Convert the depth image of the k-th frame image into the corresponding point cloud, and convert the depth image d_k into the point cloud image Y_k.
S804: Solve, with the ICP algorithm, the relative pose between Y_k and the point cloud obtained from the depth image of the k-th frame image.
S805: Use the relative pose to correct the pose estimation information recorded when the real endoscope collected the k-th frame image.
Specifically, the relative pose solved by the ICP algorithm is used to correct the pose estimation information recorded when the real endoscope collected the k-th frame image; at this point the pose at which the k-th frame image was collected on the real endoscope trajectory is corrected.
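A compact sketch of S803–S805 using Open3D is given below; the pinhole intrinsics, the correspondence distance, and the point-to-point ICP variant are assumptions made for illustration rather than values fixed by the patent.

```python
import numpy as np
import open3d as o3d

def depth_to_cloud(depth: np.ndarray, fx: float, fy: float, cx: float, cy: float):
    """Back-project an H x W metric depth image into an Open3D point cloud
    with an assumed pinhole camera model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pts = np.stack([(u - cx) * depth / fx, (v - cy) * depth / fy, depth], axis=-1)
    cloud = o3d.geometry.PointCloud()
    cloud.points = o3d.utility.Vector3dVector(pts.reshape(-1, 3))
    return cloud

def icp_correction(real_depth_k, virtual_depth_k, intrinsics, max_dist=5.0):
    """Align the cloud of the real k-th frame depth image to the virtual cloud Y_k
    with point-to-point ICP and return the 4x4 relative transform used as the correction."""
    src = depth_to_cloud(real_depth_k, *intrinsics)     # cloud from the real depth image
    tgt = depth_to_cloud(virtual_depth_k, *intrinsics)  # Y_k from the virtual depth d_k
    result = o3d.pipelines.registration.registration_icp(
        src, tgt, max_dist, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation
```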
In one embodiment, the method further includes:
S901: Use an RGB image feature extraction method to extract the feature information of the t-th frame image collected by the real endoscope, and input the feature information of the t-th frame image together with the depth image of the t-th frame image into the pre-trained depth registration network;
S902: Use the RGB image feature extraction method to extract the feature information of the (t-n)-th frame image collected by the real endoscope, or the feature information of the (t-n)-th frame target virtual image collected by the virtual endoscope, where the feature information of the (t-n)-th frame target virtual image is extracted after texture mapping has been applied to the (t-n)-th frame target virtual image;
S903: Input the feature information of the (t-n)-th frame target virtual image together with the depth image d_{t-n}, or the feature information of the (t-n)-th frame image together with the depth image of the (t-n)-th frame image, into the pre-trained depth registration network.
Current algorithms use either RGB image information alone or depth information alone. Although depth-based positioning has been shown to be more robust, in practice, when only one lumen is in the field of view the depth image contains a single circular depth-peak region, and the rotational and translational motion of the endoscope then becomes difficult to estimate.
Therefore, RGB feature extraction is fused into the relative pose computation for real-time positioning. Specifically, features such as lumen texture can be extracted from the RGB images of two endoscopic frames with feature descriptors (e.g. SIFT or ORB) or with a pre-trained feature extraction network, and then fed into the depth registration network together with the depth images. This compensates for the difficulty of estimating the endoscope pose when the depth map has a simple structure and assists in estimating the motion of the real endoscope. In this case, virtual endoscope images, depth images and the corresponding virtual endoscope poses need to be collected to train the depth extraction network.
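As an illustration of the descriptor-based option, the sketch below extracts ORB features from a single RGB frame with OpenCV; how the descriptors are combined with the depth images at the registration network input is not specified by the text and is therefore omitted.

```python
import cv2

def extract_orb_features(rgb_frame, n_features: int = 500):
    """Extract ORB keypoints and descriptors (lumen texture cues) from one
    endoscopic RGB frame."""
    gray = cv2.cvtColor(rgb_frame, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(nfeatures=n_features)
    keypoints, descriptors = orb.detectAndCompute(gray, None)
    return keypoints, descriptors
```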
During data collection, the virtual endoscope images need to be texture mapped, and the texture needs to be close to that of images collected by a real endoscope.
With the endoscope positioning method provided by this application, once the initial pose of the real endoscope is known, the current pose of the real endoscope can be obtained quickly and continuously by using the pre-trained depth extraction network and depth registration network. After training, the depth extraction network and depth registration network in this method can be used directly for different patients without preoperative training, which is convenient and saves time.
Figure 10 illustrates a schematic diagram of the physical structure of an electronic device. As shown in Figure 10, the electronic device may include a processor 1010, a communications interface 1020, a memory 1030 and a communication bus 1040, where the processor 1010, the communications interface 1020 and the memory 1030 communicate with one another through the communication bus 1040. The processor 1010 can call logic instructions in the memory 1030 to perform an endoscope positioning method, which includes: obtaining, based on a pre-trained depth extraction network, the depth image of the current frame, i.e. the t-th frame image, collected by the real endoscope; obtaining the depth image d_{t-n} of the (t-n)-th frame target virtual image collected by the virtual endoscope at the (t-n)-th frame positioning pose in the target virtual model, or obtaining, based on the pre-trained depth extraction network, the depth image of the (t-n)-th frame image collected by the real endoscope, where the virtual endoscope is determined based on the real endoscope; inputting the depth image of the t-th frame image together with the depth image d_{t-n}, or the depth image of the t-th frame image together with the depth image of the (t-n)-th frame image, into a pre-trained depth registration network to obtain the relative pose estimation information between the collection of the t-th frame image and the collection of the (t-n)-th frame image by the real endoscope; and superimposing the relative pose estimation information on the pose estimation information of the real endoscope when collecting the (t-n)-th frame image to obtain the pose estimation information of the real endoscope when collecting the t-th frame image, and positioning the real endoscope according to that pose estimation information, where the pose information of the initial position of the real endoscope is known.
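The superposition of the relative pose estimate with the previous absolute pose can be read as a composition of rigid transforms; the sketch below assumes 4x4 homogeneous matrices, which is only one possible representation since the text does not fix one.

```python
import numpy as np

def chain_pose(prev_pose: np.ndarray, rel_pose: np.ndarray) -> np.ndarray:
    """Compose the absolute pose of frame t-n with the estimated relative pose
    (t-n -> t) to obtain the absolute pose of frame t; both are 4x4 homogeneous
    transforms under this assumed convention."""
    return prev_pose @ rel_pose

# Starting from the known initial pose P0, each new frame is localized by chaining
# the relative estimates produced by the depth registration network:
# P_t = chain_pose(P_{t-n}, relative_pose_estimate)
```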
In addition, the logic instructions in the memory 1030 described above may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.
In another aspect, the present application further provides a computer program product. The computer program product includes a computer program that can be stored on a non-transitory computer-readable storage medium; when the computer program is executed by a processor, the computer can perform the endoscope positioning method provided by each of the methods above, the method including: obtaining, based on a pre-trained depth extraction network, the depth image of the current frame, i.e. the t-th frame image, collected by the real endoscope; obtaining the depth image d_{t-n} of the (t-n)-th frame target virtual image collected by the virtual endoscope at the (t-n)-th frame positioning pose in the target virtual model, or obtaining, based on the pre-trained depth extraction network, the depth image of the (t-n)-th frame image collected by the real endoscope, where the virtual endoscope is determined based on the real endoscope; inputting the depth image of the t-th frame image together with the depth image d_{t-n}, or the depth image of the t-th frame image together with the depth image of the (t-n)-th frame image, into a pre-trained depth registration network to obtain the relative pose estimation information between the collection of the t-th frame image and the collection of the (t-n)-th frame image by the real endoscope; and superimposing the relative pose estimation information on the pose estimation information of the real endoscope when collecting the (t-n)-th frame image to obtain the pose estimation information of the real endoscope when collecting the t-th frame image, and positioning the real endoscope according to that pose estimation information, where the pose information of the initial position of the real endoscope is known.
In yet another aspect, the present application further provides a non-transitory computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, it implements the endoscope positioning method provided by each of the methods above, the method including: obtaining, based on a pre-trained depth extraction network, the depth image of the current frame, i.e. the t-th frame image, collected by the real endoscope; obtaining the depth image d_{t-n} of the (t-n)-th frame target virtual image collected by the virtual endoscope at the (t-n)-th frame positioning pose in the target virtual model, or obtaining, based on the pre-trained depth extraction network, the depth image of the (t-n)-th frame image collected by the real endoscope, where the virtual endoscope is determined based on the real endoscope; inputting the depth image of the t-th frame image together with the depth image d_{t-n}, or the depth image of the t-th frame image together with the depth image of the (t-n)-th frame image, into a pre-trained depth registration network to obtain the relative pose estimation information between the collection of the t-th frame image and the collection of the (t-n)-th frame image by the real endoscope; and superimposing the relative pose estimation information on the pose estimation information of the real endoscope when collecting the (t-n)-th frame image to obtain the pose estimation information of the real endoscope when collecting the t-th frame image, and positioning the real endoscope according to that pose estimation information, where the pose information of the initial position of the real endoscope is known.
The device embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Persons of ordinary skill in the art can understand and implement this without creative effort.
Through the description of the above embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware. Based on this understanding, the part of the above technical solution that in essence contributes to the prior art can be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk or an optical disc, and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the various embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of the technical features; such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.

Claims (10)

  1. An endoscope positioning method, comprising:
    obtaining, based on a pre-trained depth extraction network, the depth image of the t-th frame image collected by a real endoscope;
    obtaining the depth image d_{t-n} of the (t-n)-th frame target virtual image collected by a virtual endoscope at the (t-n)-th frame positioning pose in a target virtual model, or obtaining, based on the pre-trained depth extraction network, the depth image of the (t-n)-th frame image collected by the real endoscope, wherein the virtual endoscope is determined based on the real endoscope;
    inputting the depth image of the t-th frame image together with the depth image d_{t-n}, or the depth image of the t-th frame image together with the depth image of the (t-n)-th frame image, into a pre-trained depth registration network to obtain relative pose estimation information between the collection of the t-th frame image and the collection of the (t-n)-th frame image by the real endoscope; and
    superimposing the relative pose estimation information on the pose estimation information of the real endoscope when collecting the (t-n)-th frame image to obtain the pose estimation information of the real endoscope when collecting the t-th frame image, and positioning the real endoscope according to that pose estimation information.
  2. The endoscope positioning method according to claim 1, wherein the depth extraction network is a depth extraction network based on a cycle generative adversarial network and the pre-trained depth registration network, the cycle generative adversarial network comprising a first generator, a first discriminator, a second generator and a second discriminator, the first generator being used to convert depth images into real-style endoscopic images and the second generator being used to convert real-style endoscopic images into depth images;
    the depth extraction network based on the cycle generative adversarial network and the depth registration network being trained in the following way:
    establishing a virtual model, obtaining the depth images of the virtual images collected by the virtual endoscope in the virtual model, and obtaining the virtual pose information of the virtual endoscope when collecting the virtual images;
    obtaining preset real endoscopic images;
    performing weakly supervised training of an initial depth extraction network using the preset real endoscopic images, the depth images of the virtual images and the virtual pose information as training data;
    obtaining a loss function as a weighted sum of the cycle consistency loss, identity loss, generative adversarial loss, reconstruction loss and geometric consistency loss that constrain the initial depth extraction network; and
    optimizing the loss function and updating the parameters of the initial depth extraction network based on the cycle generative adversarial network and the depth registration network for a preset number of rounds, to obtain the depth extraction network based on the cycle generative adversarial network and the depth registration network.
  3. The endoscope positioning method according to claim 1, wherein the depth extraction network is a depth extraction network based on SfMLearner or a depth extraction network based on a cycle generative adversarial network;
    before inputting the depth image of the t-th frame image together with the depth image d_{t-n}, or the depth image of the t-th frame image together with the depth image of the (t-n)-th frame image, into the pre-trained depth registration network, the method further comprising:
    performing scale calibration on the depth images to obtain the units of the depth images.
  4. The endoscope positioning method according to claim 1, wherein the depth registration network is trained in the following way:
    establishing a virtual model, obtaining the depth images of the virtual images collected by the virtual endoscope in the virtual model, and obtaining the corresponding virtual pose information of the virtual endoscope when collecting the virtual images;
    inputting the depth images of the virtual images into an initial depth registration network, the initial depth registration network outputting the relative pose estimation information of the virtual endoscope when collecting two adjacent frames of virtual images;
    using the virtual pose information as the training ground truth, and obtaining from it the virtual relative pose information of the virtual endoscope when collecting the two adjacent frames of virtual images;
    obtaining the loss function as a weighted sum of the translation loss and the rotation loss between the relative pose estimation information and the virtual relative pose information; and
    optimizing the loss function and updating the parameters of the initial depth registration network until convergence, to obtain the depth registration network.
  5. The endoscope positioning method according to any one of claims 1 to 4, further comprising:
    running a registration method based on an iterative optimization algorithm in parallel with the depth registration network, and correcting the pose estimation information of the real endoscope with the corrected pose obtained by the registration method based on the iterative optimization algorithm, so as to eliminate accumulated error.
  6. The endoscope positioning method according to claim 5, wherein the method for obtaining the corrected pose with the registration method based on the iterative optimization algorithm comprises:
    obtaining the k-th frame image collected by the real endoscope as the current corrected image, and obtaining the depth image of the k-th frame image through the depth extraction network, where k ≤ t;
    obtaining the pose estimation information of the k-th frame image collected by the real endoscope, obtained from the depth registration network;
    performing semantic segmentation of the lumen images in the real endoscopic field of view using the current corrected image, or the depth image of the k-th frame image, or the current corrected image together with the depth image;
    based on an image similarity measure and a semantic-segmentation similarity measure, performing an optimization initialized with the pose estimation information of the k-th frame image to obtain the corrected pose of the current corrected image; and
    replacing the pose estimation information obtained when the real endoscope collected the k-th frame image with the corrected pose.
  7. The endoscope positioning method according to claim 5, wherein the method for obtaining the corrected pose with the registration method based on the iterative optimization algorithm comprises:
    obtaining the k-th frame image collected by the real endoscope as the current corrected image, and obtaining the depth image of the k-th frame image through the depth extraction network, where k ≤ t;
    obtaining the depth image d_k of the k-th frame target virtual image collected by the virtual endoscope at the k-th frame positioning pose in the target virtual model;
    converting the depth image of the k-th frame image into the corresponding point cloud, and converting the depth image d_k into the point cloud image Y_k;
    solving, with the ICP algorithm, the relative pose between Y_k and the point cloud corresponding to the depth image of the k-th frame image; and
    using the relative pose to correct the pose estimation information obtained when the real endoscope collected the k-th frame image.
  8. The endoscope positioning method according to any one of claims 1 to 4, further comprising:
    using an RGB image feature extraction method to extract feature information of the t-th frame image collected by the real endoscope, and inputting the feature information of the t-th frame image together with the depth image of the t-th frame image into the pre-trained depth registration network;
    using the RGB image feature extraction method to extract feature information of the (t-n)-th frame image collected by the real endoscope, or feature information of the (t-n)-th frame target virtual image collected by the virtual endoscope, wherein the feature information of the (t-n)-th frame target virtual image is extracted after texture mapping has been applied to the (t-n)-th frame target virtual image; and
    inputting the feature information of the (t-n)-th frame target virtual image together with the depth image d_{t-n}, or the feature information of the (t-n)-th frame image together with the depth image of the (t-n)-th frame image, into the pre-trained depth registration network.
  9. An electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the endoscope positioning method according to any one of claims 1 to 8.
  10. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the endoscope positioning method according to any one of claims 1 to 8.
PCT/CN2022/125009 2022-09-06 2022-10-13 Endoscope positioning method, electronic device, and non-transitory computer-readable storage medium WO2024050918A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211086312.X 2022-09-06
CN202211086312.XA CN117710279A (en) 2022-09-06 2022-09-06 Endoscope positioning method, electronic device, and non-transitory computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2024050918A1 true WO2024050918A1 (en) 2024-03-14

Family

ID=90142942

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/125009 WO2024050918A1 (en) 2022-09-06 2022-10-13 Endoscope positioning method, electronic device, and non-transitory computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN117710279A (en)
WO (1) WO2024050918A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070013710A1 (en) * 2005-05-23 2007-01-18 Higgins William E Fast 3D-2D image registration method with application to continuously guided endoscopy
CN104540439A (en) * 2012-08-14 2015-04-22 直观外科手术操作公司 Systems and methods for registration of multiple vision systems
CN111325797A (en) * 2020-03-03 2020-06-23 华东理工大学 Pose estimation method based on self-supervision learning
CN111772792A (en) * 2020-08-05 2020-10-16 山东省肿瘤防治研究院(山东省肿瘤医院) Endoscopic surgery navigation method, system and readable storage medium based on augmented reality and deep learning
CN114022527A (en) * 2021-10-20 2022-02-08 华中科技大学 Monocular endoscope depth and pose estimation method and device based on unsupervised learning

Also Published As

Publication number Publication date
CN117710279A (en) 2024-03-15

Similar Documents

Publication Publication Date Title
CN109448041B (en) Capsule endoscope image three-dimensional reconstruction method and system
Song et al. Mis-slam: Real-time large-scale dense deformable slam system in minimal invasive surgery based on heterogeneous computing
Visentini-Scarzanella et al. Deep monocular 3D reconstruction for assisted navigation in bronchoscopy
Song et al. Dynamic reconstruction of deformable soft-tissue with stereo scope in minimal invasive surgery
JP5797352B1 (en) Method for tracking a three-dimensional object
CN111080778B (en) Online three-dimensional reconstruction method of binocular endoscope soft tissue image
US20180174311A1 (en) Method and system for simultaneous scene parsing and model fusion for endoscopic and laparoscopic navigation
CN112614169B (en) 2D/3D spine CT (computed tomography) level registration method based on deep learning network
CN111783582A (en) Unsupervised monocular depth estimation algorithm based on deep learning
CN108090954A (en) Abdominal cavity environmental map based on characteristics of image rebuilds the method with laparoscope positioning
Wu et al. Three-dimensional modeling from endoscopic video using geometric constraints via feature positioning
CN110992431B (en) Combined three-dimensional reconstruction method for binocular endoscope soft tissue image
US20220198693A1 (en) Image processing method, device and computer-readable storage medium
CN112598649A (en) 2D/3D spine CT non-rigid registration method based on generation of countermeasure network
CN116452752A (en) Intestinal wall reconstruction method combining monocular dense SLAM and residual error network
CN111260765A (en) Dynamic three-dimensional reconstruction method for microsurgery operative field
WO2024050918A1 (en) Endoscope positioning method, electronic device, and non-transitory computer-readable storage medium
Liu et al. Sparse-to-dense coarse-to-fine depth estimation for colonoscopy
CN114399527A (en) Method and device for unsupervised depth and motion estimation of monocular endoscope
CN115018890A (en) Three-dimensional model registration method and system
WO2021213053A1 (en) System and method for estimating motion of target inside tissue on basis of soft tissue surface deformation
Luo et al. Bronchoscopy navigation beyond electromagnetic tracking systems: a novel bronchoscope tracking prototype
CN114298986A (en) Thoracic skeleton three-dimensional construction method and system based on multi-viewpoint disordered X-ray film
CN113538335A (en) In-vivo relative positioning method and device of wireless capsule endoscope
CN114092643A (en) Soft tissue self-adaptive deformation method based on mixed reality and 3DGAN

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22957884

Country of ref document: EP

Kind code of ref document: A1