WO2024050918A1 - Endoscope positioning method, electronic device, and non-transitory computer-readable storage medium - Google Patents
- Publication number: WO2024050918A1
- Application: PCT/CN2022/125009 (CN2022125009W)
- Authority: WIPO (PCT)
- Prior art keywords: image, depth, endoscope, virtual, network
Classifications
- A61B34/20 — Surgical navigation systems; devices for tracking or guiding surgical instruments, e.g. for frameless stereotaxis (under A61B34/00 — Computer-aided surgery; manipulators or robots specially adapted for use in surgery)
- A61B5/06 — Devices, other than those using radiation, for detecting or locating foreign bodies; determining position of probes within or on the body of the patient (under A61B5/00 — Measuring for diagnostic purposes; identification of persons)
- G06T7/00 — Image analysis
- G06T7/70 — Determining position or orientation of objects or cameras
- the present application relates to the technical field of endoscope positioning, and in particular to an endoscope positioning method, electronic device and non-transitory computer-readable storage medium.
- An endoscope is an inspection instrument that integrates traditional optics, ergonomics, precision machinery, modern electronics, mathematics, and software. It has image sensors, optical lenses, light sources, mechanical devices, etc. It can enter the stomach through the mouth or enter the body through other natural orifices. Endoscopes can reveal lesions that X-rays cannot show, so they have become a commonly used technique in medical examinations.
- Existing endoscope positioning methods include: (1) extracting the depth of the endoscopic image through the shape-from-shading (SFS) method and identifying the part with greater depth as the airway; after the airway is extracted, it is compared with the model reconstructed from the preoperative CT, and the current image is mapped to the airway branch where the camera is located, or the endoscope movement is estimated from changes in the deepest position of the airway in adjacent images. This approach works at airway bifurcations, but it is difficult for it to provide continuous positioning information when there is no airway, or only one airway, in the field of view. (2) Extracting the feature points of the endoscopic image through the Structure from Motion (SFM) method.
- This application provides an endoscope positioning method, electronic device and non-transitory computer-readable storage medium to overcome the shortcomings of the existing technology, namely the inability to provide continuous positioning information and the tendency to lose positioning, and to achieve rapid positioning, accurate positioning and continuous pose information for the endoscope.
- This application provides an endoscope positioning method, including:
- obtaining, based on a pre-trained depth extraction network, the depth image of the t-th frame image collected by the real endoscope, and obtaining the depth image d_{t-n} of the (t-n)-th frame target virtual image collected by the virtual endoscope, or the depth image of the (t-n)-th frame image collected by the real endoscope; wherein the virtual endoscope is determined based on the real endoscope;
- inputting the depth image of the t-th frame and the depth image d_{t-n}, or the depth images of the t-th and (t-n)-th frame real images, into the pre-trained depth registration network to obtain the relative pose estimation information of the real endoscope between the t-th frame image and the (t-n)-th frame image;
- superimposing the relative pose estimation information onto the pose estimation information of the real endoscope at the (t-n)-th frame image, to obtain the pose estimation information of the t-th frame image collected by the real endoscope, and positioning the real endoscope based on this pose estimation information.
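By way of illustration, the superposition step can be sketched as the composition of homogeneous transforms; the names `pose_to_matrix` and `superimpose` and the toy values are illustrative, not part of the application:

```python
import numpy as np

def pose_to_matrix(R, t):
    """Pack a rotation matrix R (3x3) and translation t (3,) into a 4x4 homogeneous pose."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def superimpose(T_prev, T_rel):
    """Superimpose the relative pose T_rel (frame t-n -> frame t) onto the
    absolute pose T_prev of frame t-n to get the absolute pose of frame t."""
    return T_prev @ T_rel

# toy example: previous pose and relative motion are both pure translations along z
T_prev = pose_to_matrix(np.eye(3), np.array([0.0, 0.0, 5.0]))
T_rel = pose_to_matrix(np.eye(3), np.array([0.0, 0.0, 1.0]))
T_curr = superimpose(T_prev, T_rel)
print(T_curr[:3, 3])  # absolute position of frame t
```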
- the depth extraction network is a depth extraction network based on a cycle generative adversarial network (Cycle GAN) and the pre-trained depth registration network;
- the cycle generative adversarial network includes a first generator, a first discriminator, a second generator and a second discriminator;
- the first generator is used to convert a depth image into a real-style endoscopic image;
- the second generator is used to convert a real-style endoscopic image into a depth image;
- the depth extraction network based on the cycle generative adversarial network and the depth registration network is trained in the following way:
- Establish a virtual model, obtain the depth image of the virtual image collected by the virtual endoscope in the virtual model, and obtain the virtual pose information corresponding to the virtual endoscope when collecting the virtual image;
- a loss function is obtained based on the weighted summation of cycle consistency loss, identity loss, generative adversarial loss, reconstruction loss, and geometric consistency loss that constrain the initial depth extraction network;
- the depth extraction network is a depth extraction network based on SfMLearner or a depth extraction network based on a cycle generative adversarial network;
- before the depth image of the t-th frame and the depth image d_{t-n}, or the depth images of the t-th and (t-n)-th frame real images, are input into the pre-trained depth registration network,
- the method further includes:
- the depth registration network is trained in the following manner:
- the loss function is obtained by performing a weighted sum of the translation loss and rotation loss between the relative pose estimation information and the virtual relative pose information;
- An endoscope positioning method provided according to this application also includes:
- a registration method based on an iterative optimization algorithm is used to run in parallel with the depth registration network, and the pose estimation information of the real endoscope is corrected according to the corrected pose obtained by the registration method based on the iterative optimization algorithm, to eliminate cumulative error.
- a method for obtaining the corrected pose according to the registration method based on an iterative optimization algorithm includes:
- taking the pose estimation information of the current corrected image as an initial value, optimizing and solving to obtain the corrected pose of the current corrected image.
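A minimal sketch of such an iterative optimization, assuming a sum-of-squared-differences similarity measure and a toy one-parameter "rendering"; `render_depth`, the learning rate, and the initial value are all illustrative stand-ins, not the application's actual procedure:

```python
import numpy as np

TRUE_Z = 2.0  # hypothetical true pose parameter

def render_depth(z):
    """Toy stand-in for rendering a virtual depth image at a candidate pose."""
    return np.full((8, 8), z)

# pretend this depth came from the depth extraction network for the corrected frame
observed = render_depth(TRUE_Z)

def similarity(z):
    """Sum-of-squared-differences similarity measure between rendered and observed depth."""
    return np.sum((render_depth(z) - observed) ** 2)

# iterative optimization, starting from the network's pose estimate as initial value
z = 0.5      # initial value: pose estimate of the current corrected image
lr = 1e-3    # step size for gradient descent
for _ in range(200):
    eps = 1e-4
    # numerical gradient of the similarity measure
    grad = (similarity(z + eps) - similarity(z - eps)) / (2 * eps)
    z -= lr * grad

print(round(z, 3))  # converges toward the corrected pose parameter
```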
- a method for obtaining the corrected pose according to the registration method based on an iterative optimization algorithm includes:
- An endoscope positioning method provided according to this application also includes:
- the RGB image feature extraction method is used to extract the feature information of the t-th frame image collected by the real endoscope, and the feature information of the t-th frame image and the depth image are input into the pre-trained depth registration network together;
- the RGB image feature extraction method is used to extract the feature information of the (t-n)-th frame image collected by the real endoscope, or the feature information of the (t-n)-th frame target virtual image collected by the virtual endoscope, wherein the feature information of the (t-n)-th frame target virtual image is extracted after texture mapping is applied to the (t-n)-th frame target virtual image;
- This application also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, it implements any one of the above endoscope positioning methods.
- The present application also provides a non-transitory computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, it implements any one of the above endoscope positioning methods.
- The present application also provides a computer program product, which includes a computer program; when the computer program is executed by a processor, it implements any one of the above endoscope positioning methods.
- The endoscope positioning method provided by this application can quickly, accurately and continuously obtain the current pose information of the real endoscope by using the pre-trained depth extraction network and depth registration network, when the initial pose of the real endoscope is known.
- The depth extraction network and depth registration network in this method can be used directly for different patients after training; they do not need to be retrained before surgery, which is convenient and time-saving.
- Figure 1 is one of the flow diagrams of the endoscope positioning method provided by this application.
- FIG. 2 is a schematic diagram of the depth extraction network structure provided by this application.
- Figure 3 is a schematic flow chart of the training method of the depth extraction network provided by this application.
- Figure 4a is a schematic diagram of the depth extraction network generator architecture provided by this application.
- FIG. 4b is a schematic diagram of the depth extraction network Resnet block architecture provided by this application.
- Figure 4c is a schematic diagram of the depth extraction network discriminator architecture provided by this application.
- Figure 5 is a schematic flow chart of the training method of the deep registration network provided by this application.
- Figure 6 is a schematic diagram of the deep registration network architecture provided by this application.
- Figure 7 is one of the flow diagrams of the method for obtaining the corrected pose using the registration method based on the iterative optimization algorithm provided by this application;
- Figure 8 is the second schematic flow chart of the method for obtaining the corrected pose using the registration method based on the iterative optimization algorithm provided by this application;
- Figure 9 is the second schematic flow chart of the endoscope positioning method provided by this application.
- Figure 10 is a schematic structural diagram of an electronic device provided by this application.
- the endoscope positioning method of the present application is described below in conjunction with Figures 1-9. As shown in Figure 1, the method includes:
- the endoscope positioning method can be used in the natural cavities of the human body such as the respiratory tract, biliary tract, and cerebral ventricle.
- A depth image, also known as a range image, is an image in which the distance (depth) from the image collector to each point in the scene is used as the pixel value. It directly reflects the geometry of the visible surfaces of the scene.
- A depth image can be converted into point cloud data through coordinate conversion, and regular point cloud data with the necessary information can also be back-calculated into depth image data.
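The depth-to-point-cloud conversion can be sketched with a pinhole back-projection; the intrinsic values below are illustrative, not the application's calibration result:

```python
import numpy as np

def depth_to_pointcloud(depth, fx, fy, cx, cy):
    """Back-project a depth image into a point cloud using pinhole intrinsics
    (fx, fy: focal lengths in pixels; cx, cy: principal point)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

depth = np.ones((4, 4))  # toy depth image: every pixel 1 unit away
pts = depth_to_pointcloud(depth, fx=2.0, fy=2.0, cx=1.5, cy=1.5)
print(pts.shape)  # one 3D point per pixel
```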
- S102 Obtain the depth image d_{t-n} of the (t-n)-th frame target virtual image collected by the virtual endoscope at the (t-n)-th frame positioning pose in the target virtual model, or obtain, based on the pre-trained depth extraction network, the depth image of the (t-n)-th frame image collected by the real endoscope; wherein the virtual endoscope is determined based on the real endoscope.
- That is, either the depth image d_{t-n} of the (t-n)-th frame target virtual image collected by the virtual endoscope at the (t-n)-th frame positioning pose in the target virtual model is obtained, or the depth image of the (t-n)-th frame image collected by the real endoscope is obtained.
- the virtual endoscope moves together with the movement of the real endoscope in the target virtual model.
- The (t-n)-th frame positioning pose of the virtual endoscope in the target virtual model is the pose in the target virtual model corresponding to the positioning pose of the real endoscope when it collected the (t-n)-th frame image.
- Optionally, n ≤ 10; that is, the (t-n)-th frame is within ten frames before the current frame image, so that the (t-n)-th frame and the t-th frame have more similar feature points.
- n in this method is not fixed.
- The virtual endoscope needs to be determined based on the real endoscope, so the internal parameters of the virtual endoscope need to be consistent with those of the real endoscope.
- Illustratively, MATLAB software can be used to perform checkerboard calibration on the real endoscope to obtain its internal parameters.
- the internal reference of the real endoscope is:
- the image pixels are:
- width × height
- the parameters of the virtual endoscope are:
- The depth image of the t-th frame and the depth image d_{t-n} can be input into the pre-trained depth registration network to obtain the relative pose estimation information of the real endoscope between the t-th frame image and the (t-n)-th frame image.
- Alternatively, the depth images of the t-th and (t-n)-th frame real images can be input into the pre-trained depth registration network to obtain the same relative pose estimation information.
- S104 Superimpose the relative pose estimation information onto the pose estimation information of the real endoscope at the (t-n)-th frame image, to obtain the pose estimation information of the t-th frame image collected by the real endoscope, and position the real endoscope based on this pose estimation information.
- That is, by superimposing the obtained relative pose estimation information onto the pose estimation information of the (t-n)-th frame image, the pose estimation information of the t-th frame image collected by the real endoscope is obtained, and the real endoscope is positioned according to it.
- The pose information of the initial position of the real endoscope can be obtained when the depth registration network is initialized.
- The depth extraction network is a depth extraction network based on a cycle generative adversarial network (Cycle GAN) and the pre-trained depth registration network;
- the cycle generative adversarial network includes a first generator, a first discriminator, a second generator and a second discriminator;
- the first generator is used to convert a depth image into a real-style endoscopic image;
- the second generator is used to convert a real-style endoscopic image into a depth image;
- the depth extraction network based on the cycle generative adversarial network and the depth registration network is trained in the following way:
- S301 Establish a virtual model, obtain the depth image of the virtual image collected by the virtual endoscope in the virtual model, and obtain the virtual pose information corresponding to the virtual endoscope when collecting the virtual image.
- In this structure, the depth registration network needs to be trained first, and the depth extraction network applies the trained depth registration network.
- the style of an image refers to the texture, color, and visual patterns at different spatial scales in the image.
- Using the virtual pose information to supervise the training of the depth extraction network can improve the robustness of the depth extraction network.
- Virtual models include, for example, virtual models of the respiratory tract and of the biliary tract; a corresponding virtual model can be established according to the needs of use.
- the target body corresponding to the preset real endoscopic image is consistent with the target body corresponding to the virtual model.
- For example, if the virtual model is a virtual model of the respiratory tract established based on the respiratory tract, the preset real endoscopic images are also images collected from the respiratory tract.
- S303 Use the preset real endoscopic image, the depth image of the virtual image, and the virtual pose information as training data to perform weakly supervised training on the initial depth extraction network.
- the depth image and virtual pose information obtained in the above steps are used as training data to perform weakly supervised training on the initial depth extraction network.
- S304 Obtain a loss function based on the weighted summation of cycle consistency loss, identity loss, generative adversarial loss, reconstruction loss, and geometric consistency loss that constrain the initial depth extraction network.
- Cycle GAN includes a first generator G_image, a first discriminator D_image, a second generator G_depth and a second discriminator D_depth. The depth image domain and the endoscopic image domain are denoted Z and X respectively.
- For an endoscopic image x ∈ X, the depth extraction algorithm aims to learn a mapping G_depth: X → Z. Next, the mapping G_image: Z → X reconstructs the result back to domain X, and the difference from the original x_t after reconstruction to domain X is penalized. The conversion from domain Z to domain X is similar. In this reconstruction loop, the network model imposes a cycle consistency loss on G_image and G_depth: L_cyc = E_{y∼p(X)}[‖G_image(G_depth(y)) − y‖_1] + E_{y∼p(Z)}[‖G_depth(G_image(y)) − y‖_1]
- y is a variable, representing a certain frame of image
- p represents the probability distribution
- The discriminators D_image and D_depth respectively learn to determine whether the input endoscopic image and depth image are real or generated, while the generators try to fool the discriminators by generating images that the discriminators judge to be real; therefore, a generative adversarial loss is introduced, for which the LS-GAN loss can be used:
- the symbol • is a placeholder for "image" or "depth";
- y ∼ p(data) represents a sample drawn from the distribution of domain X or domain Z.
- the motion trajectory of the virtual endoscope can be collected from the virtual model, and the pose and corresponding depth image of the virtual endoscope at each moment can be recorded.
- The recorded poses and corresponding depth images impose view consistency constraints between the generated image frames collected by the real endoscope; on top of the adversarial loss, an image view consistency loss based on Perspective-n-Point (PnP) is added.
- t_{t-n,t} = (t_x, t_y, t_z) is the translation vector of the camera from time t-n to time t;
- the camera rotation matrix R_{t-n,t} from time t-n to time t is calculated from the Euler angles (α, β, γ), its entries expressed through the shorthand s_1 = sin α, s_2 = sin β, s_3 = sin γ and c_1 = cos α, c_2 = cos β, c_3 = cos γ.
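A sketch of building the rotation matrix from the three Euler angles; the Z-Y-X composition order below is an assumption (the application fixes its own convention in the original formula):

```python
import numpy as np

def euler_to_matrix(alpha, beta, gamma):
    """Rotation matrix from Euler angles (alpha, beta, gamma).
    Assumes Z-Y-X composition: R = Rz(gamma) @ Ry(beta) @ Rx(alpha)."""
    c1, s1 = np.cos(alpha), np.sin(alpha)
    c2, s2 = np.cos(beta), np.sin(beta)
    c3, s3 = np.cos(gamma), np.sin(gamma)
    Rx = np.array([[1, 0, 0], [0, c1, -s1], [0, s1, c1]])
    Ry = np.array([[c2, 0, s2], [0, 1, 0], [-s2, 0, c2]])
    Rz = np.array([[c3, -s3, 0], [s3, c3, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

R = euler_to_matrix(0.0, 0.0, np.pi / 2)  # 90 degrees about the z axis
print(np.round(R @ np.array([1.0, 0.0, 0.0]), 6))  # the x axis maps to the y axis
```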
- View consistency is also imposed on x_{t-n} and x_t and the generated depth maps. Although the relative pose of the endoscope cannot be collected for real images, the pre-trained depth registration network provides a depth-based pose estimation algorithm, so the relative pose of the corresponding endoscope can be calculated from the two generated depth maps. The pre-trained pose estimation network is therefore loaded during training to estimate the relative motion p_{t-n,t} of the endoscope. An ideal depth image estimate should contain the information that allows the pose estimation network to capture the motion of the endoscope, which yields the reconstruction loss obtained from view consistency:
- The inconsistency z_diff between the two generated depth maps is defined as:
- the geometric consistency loss is defined as:
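A sketch of a geometric consistency term between two depth maps; the normalized-difference form of z_diff below follows common practice in self-supervised depth estimation and is an assumption, not the application's exact formula:

```python
import numpy as np

def geometry_consistency_loss(d_a, d_b_warped):
    """Geometric consistency between depth map d_a and the second frame's
    depth warped into the same view. z_diff is the per-pixel normalized
    absolute difference (assumed form); the loss is its mean."""
    z_diff = np.abs(d_a - d_b_warped) / (d_a + d_b_warped)
    return z_diff.mean()

d_a = np.full((4, 4), 2.0)
d_b = np.full((4, 4), 2.0)
print(geometry_consistency_loss(d_a, d_b))    # identical depths -> 0.0
d_b2 = np.full((4, 4), 6.0)
print(geometry_consistency_loss(d_a, d_b2))   # |2-6|/(2+6) = 0.5 per pixel
```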
- the total loss function of depth extraction network training is:
- λ, μ, ν, ω_1, ω_2 and η are hyperparameters that adjust the weight of each loss.
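The weighted summation can be sketched as follows; the default weights echo the values reported later in the text (λ, μ, ν = 10, 5, 1 and ω_1, ω_2, η = 0.3, 5, 5), but which weight attaches to which term is our reading and should be treated as an assumption:

```python
def total_loss(l_cyc, l_idt, l_gan, l_rec1, l_rec2, l_geo,
               lam=10.0, mu=5.0, nu=1.0, w1=0.3, w2=5.0, eta=5.0):
    """Weighted sum of the loss terms: cycle consistency, identity,
    generative adversarial, the two view-consistency/reconstruction terms,
    and geometric consistency."""
    return (lam * l_cyc + mu * l_idt + nu * l_gan
            + w1 * l_rec1 + w2 * l_rec2 + eta * l_geo)

# hypothetical per-term values for one batch
print(total_loss(0.8, 0.3, 0.5, 0.2, 0.2, 0.1))
```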
- S305 Optimize the loss function and update the parameters of the initial depth extraction network based on the cycle generative adversarial network and the depth registration network for a preset number of rounds, to obtain the trained depth extraction network based on the cycle generative adversarial network and the depth registration network.
- Figures 4(a), 4(b) and 4(c) show schematic diagrams of the depth extraction network architecture: (a) the generator, (b) the Resnet block in the generator, and (c) the discriminator.
- The dimensionality of the tensors shown in the figure is based on an input image of size 1×256×256; Res(256, 256) represents a Resnet block with 256 input and output channels; IN represents the Instance Norm layer, and Leaky ReLU represents the Leaky ReLU activation function.
- The depth extraction network can be trained with 7 preset real endoscopic video segments and 8 segments of data collected by virtual endoscopy, comprising multiple preset real endoscopic images, 2,187 depth images and the corresponding virtual endoscope poses.
- the generator is a conventional encoder-decoder architecture, in which the bottleneck layer consists of six Resnet blocks and the discriminator consists of five convolutional layers.
- the Adam optimizer is used to train for 100 rounds.
- ω_1, ω_2 and η are set to 0.3, 5 and 5 respectively.
- λ, μ and ν are set to 10, 5 and 1 respectively throughout the training process.
- the parameters of the depth extraction network are updated by continuously optimizing the loss function obtained in the above steps until the final depth extraction network is determined by the preset number of rounds.
- the preset number of rounds can be 50 to 300 rounds, and further can be 100 rounds to 200 rounds.
- The trained depth extraction network can generate depth images with clearer outlines than depth extraction networks such as SfMLearner. Compared with using a depth extraction network such as Cycle GAN alone, it ensures that the structure of the input image is not changed, and it can generate depth images with a stable, known scale (essentially the same scale as the training data).
- the depth extraction network is a depth extraction network based on SfMLearner or a depth extraction network based on a cycle generative adversarial network;
- before the depth image of the t-th frame and the depth image d_{t-n}, or the depth images of the t-th and (t-n)-th frame real images, are input into the pre-trained depth registration network, the method further includes:
- the depth estimation network estimates the depth information z from an input endoscopic image
- the pose network estimates the relative poses T and R of the camera between the two images through the input two endoscopic images.
- the depth estimation network can estimate the depth images of the two frames;
- the pose network can estimate the relative camera motion t_{t-n,t} and R_{t-n,t}.
- warping refers to manipulating the image to deform the pixels in the image.
- the geometric consistency loss is defined as:
- the loss function can include the following losses:
- For an endoscopic image x ∈ X, the depth extraction algorithm aims to learn a mapping G_depth: X → Z;
- the mapping G_image: Z → X then rebuilds the result back to domain X, completing the loop;
- the conversion from domain Z to domain X is similar;
- the network model imposes a cycle consistency loss on G_image and G_depth:
- p represents the probability distribution, and E denotes the expectation.
- The discriminators D_image and D_depth respectively learn to determine whether the input endoscopic image and depth image are real or generated, while the generators try to fool the discriminators by generating images that the discriminators judge to be real.
- Therefore, a generative adversarial loss is introduced, here the LS-GAN loss:
- the symbol • is a placeholder for "image" or "depth";
- y ∼ p(data) represents a sample drawn from the distribution of domain X or domain Z.
- The scale of the depth images obtained by the above two depth extraction networks is ambiguous and unitless, so it needs to be calibrated.
- Two specific calibration methods are available; at least one of them can be used when calibrating:
- (1) The field of view of the real endoscope is segmented according to a depth threshold, and the above-threshold region is matched to the region around the corresponding depth peak in the lumen of the virtual model established before surgery. The depths of the matched regions are compared to obtain the scale of the real endoscope. For example, if the depth threshold is set to 5 and the above-threshold portion of the depth image extracted by the real endoscope segments into a circle 10 pixels in diameter, the corresponding contour in the virtual depth image can likewise be found as a circle of 10-pixel diameter around its depth peak, and comparing the depth values of the two regions yields the scale.
- (2) The pose network and the depth network have the same ambiguous scale, so a scale found for one applies to the other.
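A sketch of the region-matching scale calibration; the threshold value and the mean-depth comparison are illustrative choices, not the application's exact procedure:

```python
import numpy as np

def calibrate_scale(real_depth, virtual_depth, threshold=5.0):
    """Estimate the metric scale of a unitless depth map by comparing its
    above-threshold (deep lumen) region with the matched region of the
    metrically scaled virtual-model depth map."""
    real_region = real_depth[real_depth > threshold]
    virt_region = virtual_depth[virtual_depth > threshold]
    return virt_region.mean() / real_region.mean()

real = np.full((6, 6), 1.0)
real[2:4, 2:4] = 8.0        # unitless deep-lumen peak in the extracted depth image
virtual = np.full((6, 6), 1.0)
virtual[2:4, 2:4] = 16.0    # same lumen region in the metric virtual model
print(calibrate_scale(real, virtual))  # scale factor relating the two depth maps
```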
- the deep registration network is trained in the following manner:
- S501 Establish a virtual model, obtain the depth image of the virtual image collected by the virtual endoscope in the virtual model, and obtain the corresponding virtual pose information when the virtual endoscope collects the virtual image.
- the deep registration network is a deep neural network in the form of an encoder-decoder.
- the network input is two frames of depth information.
- the encoder uses the structure of the FlowNetC encoder (the optical flow extracted by FlowNet approximates the motion field).
- the decoder uses several layers of CNN (convolutional neural network) to finally transform the encoded information into the 6-DOF pose parameters (i.e., 3D translation and 3D Euler angles) as output.
- S502 Input the depth image of the virtual image into an initial depth registration network, and the initial depth registration network outputs the relative pose estimation information of the virtual endoscope when two adjacent frames of virtual images are collected.
- the depth image of the virtual image obtained in the above steps is input into the initial depth registration network for weak supervision training.
- The output of the initial depth registration network gives the relative pose estimation information of the virtual endoscope when collecting two adjacent frames of virtual images.
- S503 Use the virtual pose information as a training true value, and obtain the virtual relative pose information when the virtual endoscope collects the two adjacent frames of virtual images according to the virtual pose information.
- The virtual pose information is used as the training ground truth.
- From the virtual pose information, the virtual relative pose information of the virtual endoscope when collecting the two adjacent frames of virtual images can be obtained.
- In this way, both the ground-truth relative pose information and the relative pose estimation information for two adjacent frames are available.
- S504 Obtain the loss function by performing a weighted sum of the translation loss and rotation loss between the relative pose estimation information and the virtual relative pose information.
- the translation loss and rotation loss between the relative pose estimation information of the virtual endoscope and the real relative pose are calculated respectively, and the translation loss and rotation loss are weighted and summed to obtain the final loss function:
- L_t is the translation loss; T_{t-n,t} and its estimate are the translation vectors in the ground-truth relative pose information and the relative pose estimation information respectively;
- L_r is the rotation loss; R_{t-n,t} and its estimate are the rotation vectors in the ground-truth relative pose information and the relative pose estimation information respectively;
- β is a hyperparameter used to adjust the proportion between the rotation loss and the translation loss.
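The weighted registration loss can be sketched as follows; the L2 form of each term is an assumption, while β = 100 matches the hyperparameter value given for training below:

```python
import numpy as np

def pose_loss(t_true, t_pred, r_true, r_pred, beta=100.0):
    """Weighted sum of translation and rotation losses between the
    ground-truth and estimated relative poses (L2 form assumed)."""
    l_t = np.linalg.norm(t_true - t_pred)   # translation loss
    l_r = np.linalg.norm(r_true - r_pred)   # rotation (Euler-vector) loss
    return l_t + beta * l_r

# hypothetical ground truth vs. network estimate
t_true = np.array([1.0, 0.0, 0.0]); t_pred = np.array([1.0, 0.0, 0.3])
r_true = np.zeros(3);               r_pred = np.array([0.0, 0.01, 0.0])
print(pose_loss(t_true, t_pred, r_true, r_pred))  # 0.3 + 100 * 0.01
```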
- The pose estimation network is trained with pose and depth images collected along 37 virtual endoscope trajectories, comprising 11,904 frames.
- the network uses a pre-trained FlowNetC encoder to regress pose vectors with three convolutional blocks.
- The network is trained using the Adam optimizer with an initial learning rate of 1e-5 for 300 epochs; β is set to 100.
- S505 Optimize the loss function and update the parameters of the initial depth registration network until convergence to obtain the depth registration network.
- The depth registration network learns the endoscope pose transformation parameters between two input depth images through deep learning, thereby updating the endoscope pose for each input endoscopic image.
- This depth registration network is based on depth registration rather than image intensity, allowing the algorithm to have no additional requirements for the rendering of virtual images acquired by virtual endoscopes in the simulator.
- the deep learning algorithm estimates the pose transformation directly, allowing the algorithm to run quickly and in real time to produce real-time positioning results.
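The per-frame update described above, chaining the network's relative pose estimate onto the previous absolute estimate, can be sketched with 4x4 homogeneous transforms; representing poses this way and the left-multiplication order are assumptions made for illustration.

```python
def mat4_mul(a, b):
    # multiply two 4x4 homogeneous transform matrices
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def chain_pose(prev_pose, rel_pose):
    # superimpose the relative pose estimate onto the previous absolute pose
    return mat4_mul(rel_pose, prev_pose)

def translation(tx, ty, tz):
    # helper: pure-translation homogeneous transform
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

# moving by (1,0,0) and then by a relative (0,2,0) yields a (1,2,0) pose
pose = chain_pose(translation(1.0, 0.0, 0.0), translation(0.0, 2.0, 0.0))
```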
- in some embodiments, the method further includes:
- a registration method based on an iterative optimization algorithm runs in parallel with the depth registration network; the corrected pose obtained by this registration method is used to correct the pose estimation information of the real endoscope and eliminate the cumulative error.
- the registration method based on the iterative optimization algorithm is slow, so it runs in parallel with the depth registration network for pose correction; it corrects the pose estimation information of the real endoscope lazily, so that the cumulative error does not keep growing, improving positioning accuracy.
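The lazy parallel correction described above can be sketched with a single background worker: the slow iterative registration runs off the main loop, and its corrected pose overwrites the fast network's estimate for the frame it was started on whenever it becomes available. The class and its interface are hypothetical, for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor

class LazyCorrector:
    """Runs a slow iterative registration in the background and applies its
    corrected pose when ready, so estimates are corrected lazily rather than
    frame by frame."""
    def __init__(self, slow_register):
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._slow_register = slow_register
        self._pending = None  # (frame index k, future)

    def submit(self, k, frame):
        # start a new slow registration only if the previous one has finished
        if self._pending is None:
            self._pending = (k, self._pool.submit(self._slow_register, frame))

    def apply_if_ready(self, poses):
        # overwrite the fast estimate for frame k once the corrected pose is
        # available, removing the drift accumulated up to that frame
        if self._pending is not None and self._pending[1].done():
            k, fut = self._pending
            poses[k] = fut.result()
            self._pending = None
```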
- a method for obtaining a corrected pose according to a registration method based on an iterative optimization algorithm includes:
- S701 Obtain the k-th frame image collected by the real endoscope as the current corrected image, and obtain the depth image of the k-th frame image through the depth extraction network, where k ≤ t.
- this correction method runs more slowly than the network that estimates the real endoscope's pose, so when correcting in parallel, not every frame is corrected.
- the k-th frame image with k ≤ t is taken as the current corrected image; that is, the pose estimation information of the real endoscope corresponding to this image frame has already been estimated.
- S702 Obtain the pose estimation information of the k-th frame image collected by the real endoscope based on the depth registration network.
- the pose estimation information of the k-th frame image has already been estimated and can be obtained directly.
- S703 Use the current corrected image, or the depth image of the k-th frame image, or both together, to perform semantic segmentation of the lumen images in the real endoscope's field of view.
- segmentation here refers to the regional segmentation, i.e. partitioning, of all lumen images in the detection field of view.
- the lumen can be segmented using the depth image, the RGB image x_t, or the RGBD image (x_t together with the depth image).
- the segmentation method can use a depth threshold to segment the depth image, or a network can be trained to segment the lumens in RGB or RGBD images.
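A minimal sketch of the depth-threshold variant mentioned above: pixels deeper than a threshold are labeled as lumen, partitioning the field of view. The fixed threshold value is an assumption; as the text notes, a trained network over RGB or RGBD input could be used instead.

```python
def segment_lumen(depth, threshold):
    # label pixels deeper than the threshold as lumen (1), the rest as wall (0)
    return [[1 if d > threshold else 0 for d in row] for row in depth]

# toy 2x2 depth map: the bottom row is much deeper, so it is labeled as lumen
mask = segment_lumen([[0.2, 0.3],
                      [5.0, 6.0]], threshold=1.0)
```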
- S704 Based on the image similarity measure and the semantic segmentation similarity measure, optimize with the pose estimation information as the initial value to obtain the corrected pose of the current corrected image.
- this method is a correction method based on image registration.
- the segmentation operation is denoted Seg(·); the corrected pose of the real endoscope at time k, with its corresponding airway segmentation result, is obtained by starting the optimization from the pose estimation information as the initial value.
- the optimization process can be described as maximizing, over the candidate pose, the sum of the image similarity measure and the segmentation similarity measure, where:
- SIM1(·) is the image similarity measure;
- SIM2(·) is the segmentation similarity measure;
- P′_t is the pose variable being optimized;
- Seg(P′_t) is the result of segmenting the image or depth map rendered when the virtual endoscope is at the virtual pose P′_t.
- this compensates for the case in which two lumens, one deep and one shallow, appear in the field of view when only an image similarity measure is used: similarity measures such as NCC (Normalized Cross-Correlation) focus on aligning the deep lumen in the two depth maps and ignore the features of the shallow lumen, leading to inaccurate results.
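To make the combined objective of S704 concrete, here is a sketch pairing NCC as the image similarity SIM1 with a Dice overlap as the segmentation similarity SIM2; using Dice and weighting the terms equally are assumptions, since the text only names NCC as an example.

```python
import math

def ncc(a, b):
    # Normalized Cross-Correlation between two equally sized depth maps (flattened)
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sa = math.sqrt(sum((x - ma) ** 2 for x in a) / n) + 1e-8
    sb = math.sqrt(sum((x - mb) ** 2 for x in b) / n) + 1e-8
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n * sa * sb)

def dice(m1, m2):
    # Dice overlap between two binary lumen segmentations (stand-in for SIM2)
    inter = sum(1 for x, y in zip(m1, m2) if x and y)
    return 2.0 * inter / (sum(m1) + sum(m2) + 1e-8)

def objective(real_depth, virt_depth, real_seg, virt_seg, w=1.0):
    # combined score to maximize over the candidate pose P'_t:
    # image similarity SIM1 (NCC) plus segmentation similarity SIM2 (Dice)
    return ncc(real_depth, virt_depth) + w * dice(real_seg, virt_seg)
```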
- a method for obtaining a corrected pose according to a registration method based on an iterative optimization algorithm includes:
- S801 Obtain the k-th frame image collected by the real endoscope as the current corrected image, and obtain the depth image of the k-th frame image through the depth extraction network, where k ≤ t.
- this correction method runs more slowly than the network that estimates the real endoscope's pose, so when correcting in parallel, not every frame is corrected; when a correction is performed, the k-th frame image with k ≤ t is taken as the current corrected image.
- the virtual endoscope moves together with the movement of the real endoscope in the target virtual model.
- the positioning pose of the virtual endoscope at the k-th frame in the target virtual model corresponds, within the target virtual model, to the pose of the real endoscope when collecting the k-th frame image.
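The depth images used by this point-cloud variant can be back-projected with a standard pinhole camera model; the intrinsics (fx, fy, cx, cy) are assumed to come from endoscope calibration, which the text does not spell out here.

```python
def depth_to_pointcloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (2D list, one depth value per pixel) into a
    3D point cloud with a pinhole camera model, as done before registering
    the real and virtual point clouds."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            x = (u - cx) * z / fx   # pixel column -> camera X
            y = (v - cy) * z / fy   # pixel row    -> camera Y
            points.append((x, y, z))
    return points

pts = depth_to_pointcloud([[1.0, 2.0]], fx=2.0, fy=2.0, cx=0.0, cy=0.0)
```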
- the method further includes:
- S901 Use an RGB image feature extraction method to extract the feature information of the t-th frame image collected by the real endoscope, and input the feature information of the t-th frame image together with its depth image into the pre-trained depth registration network;
- S902 Use the RGB image feature extraction method to extract the feature information of the (t-n)-th frame image collected by the real endoscope, or to extract the feature information of the (t-n)-th frame target virtual image collected by the virtual endoscope, where the feature information of the (t-n)-th frame target virtual image is extracted after texture mapping is applied to the (t-n)-th frame target virtual image;
- S903 Input the feature information of the (t-n)-th frame target virtual image together with the depth image d_{t-n}, or the feature information of the (t-n)-th frame image together with its depth image, into the pre-trained depth registration network.
- RGB feature extraction is thus integrated into the relative pose calculation for real-time positioning.
- this input compensates for the difficulty of estimating the endoscope pose when the depth map structure is monotonous, and assists in estimating the motion of the real endoscope.
- texture mapping must be applied to the virtual endoscope image, and the texture should be close to that of the images collected by the real endoscope.
- given the initial pose of the real endoscope, the endoscope positioning method provided by this application can quickly, accurately and continuously obtain the current pose information of the real endoscope by using the pre-trained depth extraction network and depth registration network.
- once trained, the depth extraction network and depth registration network in this method can be used directly for different patients; they do not need to be retrained before surgery, which is convenient and saves time.
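Putting the pieces together, the overall loop can be sketched with toy one-dimensional poses and stand-in networks; `depth_net` and `registration_net` are hypothetical placeholders for the pre-trained depth extraction and depth registration networks.

```python
def track_endoscope(frames, initial_pose, depth_net, registration_net, n=1):
    """Toy end-to-end loop: extract a depth image per frame, estimate the
    relative pose between frame t and frame t-n, and superimpose it onto the
    earlier absolute estimate. Poses are scalars here purely for illustration;
    the sketch assumes n=1 so every index t-n is valid."""
    poses = [initial_pose]              # the initial pose is known
    depths = [depth_net(frames[0])]
    for t in range(1, len(frames)):
        depths.append(depth_net(frames[t]))
        rel = registration_net(depths[t], depths[t - n])  # relative pose
        poses.append(rel + poses[t - n])                  # superposition
    return poses

# stand-ins: depth is the frame itself, relative pose is the depth difference
trajectory = track_endoscope([0, 1, 2, 3], initial_pose=10,
                             depth_net=lambda f: f,
                             registration_net=lambda a, b: a - b)
```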
- Figure 10 illustrates a schematic diagram of the physical structure of an electronic device.
- the electronic device may include: a processor (processor) 1010, a communications interface (Communications Interface) 1020, a memory (memory) 1030 and a communication bus 1040.
- the processor 1010, the communication interface 1020, and the memory 1030 complete communication with each other through the communication bus 1040.
- the processor 1010 can call logical instructions in the memory 1030 to perform an endoscope positioning method, which includes: obtaining, based on a pre-trained depth extraction network, the depth image of the current frame, i.e. the t-th frame image, collected by the real endoscope; obtaining the depth image d_{t-n} of the (t-n)-th frame target virtual image collected by the virtual endoscope at the (t-n)-th frame positioning pose in the target virtual model, or obtaining, based on the pre-trained depth extraction network, the depth image of the (t-n)-th frame image collected by the real endoscope, where the virtual endoscope is determined based on the real endoscope; inputting the depth image of the t-th frame image and the depth image d_{t-n}, or the depth image of the t-th frame image and the depth image of the (t-n)-th frame image, into a pre-trained depth registration network to obtain the relative pose estimation information of the real endoscope between collecting the t-th frame image and the (t-n)-th frame image; and superimposing the relative pose estimation information on the pose estimation information of the real endoscope when collecting the (t-n)-th frame image, to obtain the pose estimation information of the real endoscope when collecting the t-th frame image, and positioning the real endoscope based on this pose estimation information, where the pose information of the initial position of the real endoscope is known.
- the above-mentioned logical instructions in the memory 1030 can be implemented in the form of software functional units and can be stored in a computer-readable storage medium when sold or used as an independent product.
- the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product.
- the computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which can be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application.
- the aforementioned storage media include: USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks and other media that can store program code.
- the present application also provides a computer program product.
- the computer program product includes a computer program.
- the computer program can be stored on a non-transitory computer-readable storage medium.
- the computer can execute the endoscope positioning method provided by each of the above methods.
- the method includes: obtaining, based on a pre-trained depth extraction network, the depth image of the current frame, i.e. the t-th frame image, collected by the real endoscope; obtaining the depth image d_{t-n} of the (t-n)-th frame target virtual image collected by the virtual endoscope at the (t-n)-th frame positioning pose in the target virtual model, or obtaining, based on the pre-trained depth extraction network, the depth image of the (t-n)-th frame image collected by the real endoscope, where the virtual endoscope is determined based on the real endoscope; inputting the depth image of the t-th frame image and the depth image d_{t-n}, or the depth image of the t-th frame image and the depth image of the (t-n)-th frame image, into a pre-trained depth registration network to obtain the relative pose estimation information of the real endoscope between collecting the t-th frame image and the (t-n)-th frame image; and superimposing the relative pose estimation information on the pose estimation information of the real endoscope when collecting the (t-n)-th frame image, to obtain the pose estimation information of the real endoscope when collecting the t-th frame image, and positioning the real endoscope based on this pose estimation information, where the pose information of the initial position of the real endoscope is known.
- the present application also provides a non-transitory computer-readable storage medium on which a computer program is stored.
- the computer program, when executed by a processor, implements the endoscope positioning method provided by each of the above methods.
- the method includes: obtaining, based on a pre-trained depth extraction network, the depth image of the current frame, i.e. the t-th frame image, collected by the real endoscope; obtaining the depth image d_{t-n} of the (t-n)-th frame target virtual image collected by the virtual endoscope at the (t-n)-th frame positioning pose in the target virtual model, or obtaining, based on the pre-trained depth extraction network, the depth image of the (t-n)-th frame image collected by the real endoscope, where the virtual endoscope is determined based on the real endoscope; inputting the depth image of the t-th frame image and the depth image d_{t-n}, or the depth image of the t-th frame image and the depth image of the (t-n)-th frame image, into a pre-trained depth registration network to obtain the relative pose estimation information of the real endoscope between collecting the t-th frame image and the (t-n)-th frame image; and superimposing the relative pose estimation information on the pose estimation information of the real endoscope when collecting the (t-n)-th frame image, to obtain the pose estimation information of the real endoscope when collecting the t-th frame image, and positioning the real endoscope based on this pose estimation information, where the pose information of the initial position of the real endoscope is known.
- the device embodiments described above are only illustrative.
- the units described as separate components may or may not be physically separated.
- the components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of this embodiment. Persons of ordinary skill in the art can understand and implement the method without creative effort.
- each embodiment can be implemented by software plus a necessary general hardware platform, and of course, it can also be implemented by hardware.
- the computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., including a number of instructions to cause a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods described in various embodiments or certain parts of the embodiments.
Abstract
Provided in the present application are an endoscope positioning method, an electronic device, and a non-transitory computer-readable storage medium. The method comprises: on the basis of a depth extraction network, acquiring a depth image (I) of a t-th image frame collected by a real endoscope; acquiring a depth image dt-n of a (t-n)-th target virtual image frame collected by a virtual endoscope, or on the basis of the depth extraction network, acquiring a depth image (II) of a (t-n)-th image frame collected by the real endoscope; inputting the depth image (I) and the depth image dt-n or inputting the depth image (I) and the depth image (II) into a depth registration network to obtain the relative position and orientation estimation information (III) of the real endoscope; and superposing the relative position and orientation estimation information (III) with the position and orientation estimation information (IIII) of the real endoscope collecting the (t-n)-th image frame, so as to obtain the position and orientation estimation information (IV) of the real endoscope collecting the t-th image frame. The method can quickly, accurately and continuously obtain the current position and orientation information of the real endoscope.
Description
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 202211086312.X, filed on September 6, 2022 and entitled "Endoscope positioning method, electronic device, and non-transitory computer-readable storage medium", which is incorporated herein by reference in its entirety.
The present application relates to the technical field of endoscope positioning, and in particular to an endoscope positioning method, an electronic device and a non-transitory computer-readable storage medium.
An endoscope is an inspection instrument that integrates traditional optics, ergonomics, precision machinery, modern electronics, mathematics and software. It comprises image sensors, optical lenses, illumination sources, mechanical devices and so on, and can enter the stomach through the mouth or enter the body through other natural orifices. Endoscopes can reveal lesions that X-rays cannot show, so they have become a commonly used technical means in medical examinations.
Currently, commonly used methods for endoscope positioning include: (1) The shape-from-shading (SFS) method extracts depth from the endoscopic image and identifies the deeper parts as airways. After the airways are extracted, they are compared against the model reconstructed from preoperative CT, and the current image is mapped to the airway branch in which the camera is located, or the endoscope motion is estimated from the change of the deepest airway position between adjacent images. This method may work at airway bifurcations, but it is difficult to provide continuous endoscope positioning information when there is no airway, or only one airway, in the field of view. (2) The structure-from-motion (SFM) method extracts feature points from the endoscopic images, matches the feature points one by one between two adjacent frames, and solves Perspective-n-Point (PnP) accordingly to estimate the endoscope pose. When the endoscopic image has few or no feature points, PnP cannot be solved, and the endoscope positioning is lost. (3) The 2D/3D registration method registers the 2D image captured by the endoscope to the virtual model reconstructed before surgery, thereby obtaining the position of the endoscope in the model. This method is based on an iterative optimization algorithm, so obtaining the positioning of each frame requires a long computation time; however, the pose of the endoscope changes rapidly during actual inspection, and an excessive computation time easily causes positioning loss.
Summary of the invention
The present application provides an endoscope positioning method, an electronic device and a non-transitory computer-readable storage medium, to overcome the defects of the prior art of being unable to provide continuous positioning information and easily causing positioning loss, and to achieve fast and accurate positioning of the endoscope while obtaining continuous pose information.
The present application provides an endoscope positioning method, including:
obtaining, based on a pre-trained depth extraction network, the depth image of the t-th frame image collected by a real endoscope;
obtaining the depth image d_{t-n} of the (t-n)-th frame target virtual image collected by a virtual endoscope at the (t-n)-th frame positioning pose in a target virtual model, or obtaining, based on the pre-trained depth extraction network, the depth image of the (t-n)-th frame image collected by the real endoscope, wherein the virtual endoscope is determined based on the real endoscope;
inputting the depth image of the t-th frame image and the depth image d_{t-n}, or the depth image of the t-th frame image and the depth image of the (t-n)-th frame image, into a pre-trained depth registration network to obtain the relative pose estimation information of the real endoscope between collecting the t-th frame image and the (t-n)-th frame image;
superimposing the relative pose estimation information on the pose estimation information of the real endoscope when collecting the (t-n)-th frame image, to obtain the pose estimation information of the real endoscope when collecting the t-th frame image, and positioning the real endoscope according to this pose estimation information.
According to the endoscope positioning method provided by this application, the depth extraction network is a depth extraction network based on a recurrent generative adversarial network and the pre-trained depth registration network. The recurrent generative adversarial network includes a first generator, a first discriminator, a second generator and a second discriminator; the first generator is used to convert a depth image into a real-style endoscopic image, and the second generator is used to convert a real-style endoscopic image into a depth image.
The depth extraction network based on the recurrent generative adversarial network and the depth registration network is trained in the following way:
establishing a virtual model, obtaining the depth images of the virtual images collected by the virtual endoscope in the virtual model, and obtaining the virtual pose information of the virtual endoscope when collecting the virtual images;
obtaining preset real endoscopic images;
using the preset real endoscopic images, the depth images of the virtual images and the virtual pose information as training data to perform weakly supervised training on the initial depth extraction network;
obtaining the loss function as a weighted sum of the cycle consistency loss, identity loss, generative adversarial loss, reconstruction loss and geometric consistency loss that constrain the initial depth extraction network;
optimizing the loss function and updating the parameters of the initial depth extraction network based on the recurrent generative adversarial network and the depth registration network, for a preset number of rounds, to obtain the depth extraction network based on the recurrent generative adversarial network and the depth registration network.
According to the endoscope positioning method provided by this application, the depth extraction network is a depth extraction network based on SfMLearner or a depth extraction network based on a recurrent generative adversarial network.
Before inputting the depth image of the t-th frame image and the depth image d_{t-n}, or the depth image of the t-th frame image and the depth image of the (t-n)-th frame image, into the pre-trained depth registration network, the method further includes:
performing scale calibration on the depth image of the t-th frame image and the depth image of the (t-n)-th frame image to determine their units.
According to the endoscope positioning method provided by this application, the depth registration network is trained in the following manner:
establishing a virtual model, obtaining the depth images of the virtual images collected by the virtual endoscope in the virtual model, and obtaining the corresponding virtual pose information when the virtual endoscope collects the virtual images;
inputting the depth images of the virtual images into an initial depth registration network, which outputs the relative pose estimation information of the virtual endoscope when collecting two adjacent frames of virtual images;
using the virtual pose information as the training ground truth, and obtaining from it the virtual relative pose information of the virtual endoscope when collecting the two adjacent frames of virtual images;
obtaining the loss function as a weighted sum of the translation loss and the rotation loss between the relative pose estimation information and the virtual relative pose information;
optimizing the loss function and updating the parameters of the initial depth registration network until convergence, to obtain the depth registration network.
The endoscope positioning method provided by this application further includes:
running a registration method based on an iterative optimization algorithm in parallel with the depth registration network, and correcting the pose estimation information of the real endoscope with the corrected pose obtained by this registration method, so as to eliminate the cumulative error.
According to the endoscope positioning method provided by this application, the method of obtaining the corrected pose by the registration method based on the iterative optimization algorithm includes:
obtaining the k-th frame image collected by the real endoscope as the current corrected image, and obtaining the depth image of the k-th frame image through the depth extraction network, where k ≤ t;
obtaining the pose estimation information of the real endoscope when collecting the k-th frame image, as estimated by the depth registration network;
performing semantic segmentation of the lumen images in the field of view of the real endoscope using the current corrected image, or the depth image of the k-th frame image, or the current corrected image together with the depth image of the k-th frame image;
based on the image similarity measure and the semantic segmentation similarity measure, optimizing with the pose estimation information as the initial value to obtain the corrected pose of the current corrected image;
replacing the pose estimation information of the real endoscope when collecting the k-th frame image with the corrected pose.
According to the endoscope positioning method provided by this application, the method of obtaining the corrected pose by the registration method based on the iterative optimization algorithm includes:
obtaining the k-th frame image collected by the real endoscope as the current corrected image, and obtaining the depth image of the k-th frame image through the depth extraction network, where k ≤ t;
obtaining the depth image d_k of the k-th frame target virtual image collected by the virtual endoscope at the k-th frame positioning pose in the target virtual model;
converting the depth image of the k-th frame image into the corresponding point cloud, and converting the depth image d_k into a point cloud image Y_k;
using the relative pose between the two point clouds to correct the pose estimation information of the real endoscope when collecting the k-th frame image.
The endoscope positioning method provided by this application further includes:
using an RGB image feature extraction method to extract the feature information of the t-th frame image collected by the real endoscope, and inputting the feature information of the t-th frame image together with its depth image into the pre-trained depth registration network;
using the RGB image feature extraction method to extract the feature information of the (t-n)-th frame image collected by the real endoscope, or to extract the feature information of the (t-n)-th frame target virtual image collected by the virtual endoscope, wherein the feature information of the (t-n)-th frame target virtual image is extracted after texture mapping is applied to the (t-n)-th frame target virtual image;
inputting the feature information of the (t-n)-th frame target virtual image together with the depth image d_{t-n}, or the feature information of the (t-n)-th frame image together with its depth image, into the pre-trained depth registration network.
The present application further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the endoscope positioning method according to any one of the above is implemented.
The present application further provides a non-transitory computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the endoscope positioning method according to any one of the above is implemented.
The present application further provides a computer program product, including a computer program; when the computer program is executed by a processor, the endoscope positioning method according to any one of the above is implemented.
With the endoscope positioning method provided by this application, given the initial pose of the real endoscope, the pre-trained depth extraction network and depth registration network can quickly, accurately and continuously obtain the current pose information of the real endoscope. Once trained, the depth extraction network and depth registration network in this method can be used directly for different patients; they do not need to be retrained before surgery, which is convenient and saves time.
In order to explain the technical solutions of this application or of the prior art more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings in the following description show some embodiments of this application; for those of ordinary skill in the art, other drawings can be obtained based on these drawings without creative effort.
Figure 1 is the first schematic flowchart of the endoscope positioning method provided by this application;
Figure 2 is a schematic diagram of the depth extraction network structure provided by this application;
Figure 3 is a schematic flowchart of the training method of the depth extraction network provided by this application;
Figure 4a is a schematic diagram of the generator architecture of the depth extraction network provided by this application;
Figure 4b is a schematic diagram of the ResNet block architecture of the depth extraction network provided by this application;
Figure 4c is a schematic diagram of the discriminator architecture of the depth extraction network provided by this application;
Figure 5 is a schematic flowchart of the training method of the depth registration network provided by this application;
Figure 6 is a schematic diagram of the depth registration network architecture provided by this application;
Figure 7 is the first schematic flowchart of the method of obtaining the corrected pose by the registration method based on the iterative optimization algorithm provided by this application;
Figure 8 is the second schematic flowchart of the method of obtaining the corrected pose by the registration method based on the iterative optimization algorithm provided by this application;
Figure 9 is the second schematic flowchart of the endoscope positioning method provided by this application;
Figure 10 is a schematic structural diagram of the electronic device provided by this application.
To make the purpose, technical solutions and advantages of the present application clearer, the technical solutions of the present application are described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present application.
The endoscope positioning method of the present application is described below in conjunction with Figures 1-9. As shown in Figure 1, the method includes:
S101: Obtain the depth image of the t-th frame image captured by the real endoscope, based on a pre-trained depth extraction network.

In the embodiments of the present application, the endoscope positioning method can be used in natural body cavities such as the respiratory tract, the biliary tract, and the cerebral ventricles. The method first obtains the depth image of the current frame captured by the real endoscope, i.e., the t-th frame image.

A depth image, also called a range image, is an image whose pixel values are the distances (depths) from the image collector to points in the scene; it directly reflects the geometry of the visible surface of the scene. A depth image can be converted into point cloud data through coordinate transformation, and point cloud data with regular structure and the necessary information can likewise be back-calculated into depth image data.
S102: Obtain the depth image d_{t-n} of the (t-n)-th frame target virtual image captured by the virtual endoscope at the (t-n)-th frame positioning pose in the target virtual model, or obtain the depth image of the (t-n)-th frame image captured by the real endoscope based on the pre-trained depth extraction network, wherein the virtual endoscope is determined based on the real endoscope.

The virtual endoscope moves in the target virtual model together with the real endoscope. The (t-n)-th frame positioning pose of the virtual endoscope in the target virtual model is obtained by mapping the positioning pose of the real endoscope at the time the (t-n)-th frame image was captured into the target virtual model. Here n ≤ 10, i.e., the (t-n)-th frame lies within the ten frames preceding the current frame, so that frames t-n and t share a sufficient number of similar feature points. The value of n is not fixed in this method. For example, when the current frame is the 8th frame, t-n may equal 7 (the 7th frame, so n = 1) or 3 (the 3rd frame, so n = 5). When the current frame is the 9th frame, t-n may equal 7 (the 7th frame, so n = 2).
The virtual endoscope is determined based on the real endoscope, so the intrinsic parameters of the virtual endoscope must be consistent with those of the real endoscope.
Illustratively, checkerboard calibration of the real endoscope is performed using MATLAB to obtain the intrinsic parameters of the endoscope.

The intrinsic parameters of the real endoscope include the principal point (cx, cy) and the focal length focal_length, with an image size of width×height pixels.

Let:

x-axis coordinate of the window center: wcx = -2×(cx - width/2)/width
y-axis coordinate of the window center: wcy = 2×(cy - height/2)/height

The virtual endoscope is then designed with the following parameters:

Field of view: ViewAngle = 180/π × (2.0 × atan2(height/2.0, focal_length))
Window size: WindowSize = [width, height]
Window center position: WindowCenter = [wcx, wcy]
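As a hedged illustration of the parameter mapping above, the virtual camera settings can be computed directly from the real endoscope intrinsics. The intrinsic values below are made-up placeholders, not values from the application:

```python
import math

def virtual_endoscope_params(cx, cy, focal_length, width, height):
    """Map real-endoscope intrinsics to virtual-endoscope window/view parameters."""
    # Normalized window-center offsets (formulas from the description above)
    wcx = -2.0 * (cx - width / 2.0) / width
    wcy = 2.0 * (cy - height / 2.0) / height
    # Vertical field of view in degrees
    view_angle = 180.0 / math.pi * (2.0 * math.atan2(height / 2.0, focal_length))
    return {
        "ViewAngle": view_angle,
        "WindowSize": [width, height],
        "WindowCenter": [wcx, wcy],
    }

# Hypothetical intrinsics for illustration only: centered principal point,
# focal length equal to half the image height gives a 90-degree field of view
params = virtual_endoscope_params(cx=128.0, cy=128.0, focal_length=128.0,
                                  width=256, height=256)
print(params["ViewAngle"])     # approximately 90 degrees for these placeholders
print(params["WindowCenter"])  # zero offsets when the principal point is centered
```

A principal point off the image center produces nonzero WindowCenter offsets, which shift the virtual rendering window accordingly.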
S103: Input the depth image of the t-th frame and the depth image d_{t-n}, or the depth images of the t-th and (t-n)-th frames extracted by the depth extraction network, into the pre-trained depth registration network, to obtain the relative pose estimate of the real endoscope between capturing the t-th frame image and capturing the (t-n)-th frame image.

Specifically, the relative pose estimate of the real endoscope between the t-th and (t-n)-th frames can be obtained either by feeding the extracted depth image of the t-th frame together with the virtual depth image d_{t-n} into the pre-trained depth registration network, or by feeding the extracted depth images of the t-th and (t-n)-th frames into the network.
S104: Superimpose the relative pose estimate on the pose estimate of the real endoscope at the time the (t-n)-th frame image was captured, to obtain the pose estimate of the real endoscope at the t-th frame, and position the real endoscope according to this pose estimate.

Specifically, superimposing the obtained relative pose estimate on the pose estimate of the real endoscope at the (t-n)-th frame yields the pose estimate of the real endoscope at the t-th frame, and the real endoscope is positioned according to this pose estimate.
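The superposition in S104 amounts to composing rigid transforms. A minimal sketch, assuming 4×4 homogeneous pose matrices and that the relative pose is expressed in the frame of the (t-n)-th pose (the application itself does not fix a representation):

```python
def compose_pose(pose_prev, relative_pose):
    """Chain 4x4 homogeneous poses: T_t = T_{t-n} @ T_{t-n -> t}."""
    n = 4
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            out[i][j] = sum(pose_prev[i][k] * relative_pose[k][j] for k in range(n))
    return out

def translation(tx, ty, tz):
    """Identity rotation with a pure translation, for illustration only."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

pose_tn = translation(0.0, 0.0, 5.0)   # pose at frame t-n
rel = translation(0.0, 0.0, 1.0)       # estimated relative motion
pose_t = compose_pose(pose_tn, rel)    # pose at frame t
print(pose_t[2][3])                    # accumulated z-translation: 5 + 1 = 6
```

Chaining per-frame relative estimates in this way also explains why the initial pose must be known: every subsequent pose is accumulated from it.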
The pose information of the initial position of the real endoscope can be obtained when the depth registration network is initialized.
In one embodiment, as shown in Figure 2, the depth extraction network is a depth extraction network based on a cycle generative adversarial network (Cycle GAN) and the pre-trained depth registration network. The cycle generative adversarial network includes a first generator, a first discriminator, a second generator and a second discriminator; the first generator is used to convert a depth image into a real-style endoscopic image, and the second generator is used to convert a real-style endoscopic image into a depth image.
As shown in Figure 3, the depth extraction network based on the cycle generative adversarial network and the depth registration network is trained as follows:
S301: Establish a virtual model, obtain the depth images of the virtual images captured by the virtual endoscope in the virtual model, and obtain the virtual pose information of the virtual endoscope at the time each virtual image was captured.

Specifically, before the above depth extraction network is trained, the depth registration network must be trained first, since the depth extraction network relies on the trained depth registration network. The style of an image refers to its textures, colors and visual patterns at different spatial scales.

In practice, it is difficult to obtain the pose of the endoscope during real endoscopy. Therefore, a virtual model is established, and a large number of depth images and virtual pose information are obtained with the virtual endoscope to supervise the training of the depth extraction network, which improves its robustness. Various virtual models are possible, such as a virtual model of the respiratory tract or a virtual model of the biliary tract; the corresponding virtual model can be established as needed.
S302: Obtain preset real endoscopic images.

The target body corresponding to the preset real endoscopic images is consistent with the target body on which the virtual model is based. For example, if the virtual model is a virtual model of the respiratory tract, the preset real endoscopic images are also images of the respiratory tract.
S303: Use the preset real endoscopic images, the depth images of the virtual images and the virtual pose information as training data to perform weakly supervised training of the initial depth extraction network.

Specifically, the depth images and virtual pose information obtained in the above steps are used, together with the preset real endoscopic images, as training data for weakly supervised training of the initial depth extraction network.
S304: Obtain the loss function as a weighted sum of the cycle consistency loss, identity loss, generative adversarial loss, reconstruction loss and geometric consistency loss that constrain the initial depth extraction network.
Specifically, referring to Figure 2, the cycle generative adversarial network (Cycle GAN) includes a first generator G_image, a first discriminator D_image, a second generator G_depth and a second discriminator D_depth. The depth image domain and the endoscopic image domain are denoted Z and X, respectively.
Cycle consistency loss:

For an endoscopic image x_t ∈ X, the depth extraction algorithm aims to learn a mapping G_depth: X→Z that generates the corresponding depth image ẑ_t from x_t. The mapping G_image: Z→X then reconstructs ẑ_t back to domain X, completing the cycle; the cycle consistency loss is the gap between this reconstruction and x_t. The conversion from domain Z to domain X is analogous. In this reconstruction cycle, the network model imposes a cycle consistency loss on G_image and G_depth:

L_cyc = E_{y~p(X)} ||G_image(G_depth(y)) − y||_1 + E_{y~p(Z)} ||G_depth(G_image(y)) − y||_1

where y is a variable denoting an image, p denotes a probability distribution, and E denotes expectation.
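The cycle above can be illustrated numerically with toy stand-in generators (simple scalar functions here; the real generators are CNNs, so this is only a sketch of the loss structure):

```python
def cycle_consistency_l1(g_depth, g_image, xs, zs):
    """Mean L1 gap after a full X->Z->X and Z->X->Z cycle on toy scalar 'images'."""
    loss_x = sum(abs(g_image(g_depth(x)) - x) for x in xs) / len(xs)
    loss_z = sum(abs(g_depth(g_image(z)) - z) for z in zs) / len(zs)
    return loss_x + loss_z

# Toy generators that are exact inverses: the cycle loss vanishes
g_depth = lambda x: x / 2.0    # "endoscopic image" -> "depth"
g_image = lambda z: z * 2.0    # "depth" -> "endoscopic image"
print(cycle_consistency_l1(g_depth, g_image, [1.0, 2.0], [0.5, 1.0]))   # 0.0

# An imperfect reconstruction incurs a positive cycle loss
g_image_bad = lambda z: z * 2.0 + 0.1
print(cycle_consistency_l1(g_depth, g_image_bad, [1.0, 2.0], [0.5, 1.0]))
```

Minimizing this term pushes each generator toward being the inverse of the other, which is what lets the unpaired depth and endoscopic domains be linked.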
Identity loss:

To add constraints on the learning of the mappings, an identity loss is introduced:

L_identity = E_{y~p(X)} ||G_image(y) − y||_1 + E_{y~p(Z)} ||G_depth(y) − y||_1
Generative adversarial loss:

While the generators complete the mapping cycle, the discriminators D_image and D_depth learn to judge whether an input endoscopic image or depth image, respectively, is real or fake, while each generator tries to fool its discriminator by generating images that the discriminator judges to be real. A generative adversarial loss is therefore introduced; the LS-GAN loss can be used here:

L_GAN(G_image, D_image) = E_{x~p(X)} (D_image(x) − 1)^2 + E_{z~p(Z)} (D_image(G_image(z)))^2

with the loss for G_depth and D_depth defined analogously; the subscript · is used below as a placeholder for image or depth, and y~p(data) denotes that samples follow the distribution of domain X or Z.
Reconstruction loss:

To make the network learn depth image estimation at a given scale, motion trajectories of the virtual endoscope can be collected from the virtual model, recording the virtual endoscope pose and the corresponding depth image at each moment. Using the collected virtual poses and corresponding depth images, a view consistency constraint is imposed between the generated real-style endoscopic image frames; on top of the adversarial loss, an image view consistency loss is added based on Perspective-n-Point (PnP).
Given depth images z_{t-n} and z_t, feeding each into the generator G_image yields generated endoscopic images x̂_{t-n} and x̂_t. Since the virtual pose information at times t-n and t is recorded during data collection, the virtual relative pose p_{t-n,t} = (t_x, t_y, t_z, θ, φ, ψ) from time t-n to time t can be computed. With the known camera intrinsic matrix K, a pixel u_t in homogeneous coordinates can be warped to û_{t-n}:

û_{t-n} ~ K (R_{t-n,t} · z_t(u_t) · K^{-1} u_t + t_{t-n,t})

where t_{t-n,t} = (t_x, t_y, t_z) is the camera translation vector from time t-n to time t, and the camera rotation matrix R_{t-n,t} from time t-n to time t is computed as:

R_{t-n,t} =
[ β_2β_3    α_1α_2β_3 − α_3β_1    α_1α_3 + α_2β_1β_3 ]
[ α_3β_2    α_1α_2α_3 + β_1β_3    α_2α_3β_1 − α_1β_3 ]
[ −α_2      α_1β_2                β_1β_2             ]

where α_1 = sinθ, α_2 = sinφ, α_3 = sinψ, β_1 = cosθ, β_2 = cosφ, β_3 = cosψ.
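As a hedged cross-check of this Euler-angle parameterization, the rotation matrix can be built in code. The R_z(ψ)·R_y(φ)·R_x(θ) composition is an assumption here, since the application does not state the convention explicitly:

```python
import math

def rotation_matrix(theta, phi, psi):
    """Rotation from Euler angles (theta, phi, psi), composed as Rz(psi) @ Ry(phi) @ Rx(theta)."""
    a1, a2, a3 = math.sin(theta), math.sin(phi), math.sin(psi)
    b1, b2, b3 = math.cos(theta), math.cos(phi), math.cos(psi)
    return [
        [b2 * b3, a1 * a2 * b3 - a3 * b1, a1 * a3 + a2 * b1 * b3],
        [a3 * b2, a1 * a2 * a3 + b1 * b3, a2 * a3 * b1 - a1 * b3],
        [-a2, a1 * b2, b1 * b2],
    ]

R = rotation_matrix(0.1, 0.2, 0.3)
# A proper rotation matrix has orthonormal rows (R @ R^T = I)
dot = sum(R[0][k] * R[1][k] for k in range(3))
print(abs(dot) < 1e-12)   # rows are orthogonal
```

Zero angles reproduce the identity matrix, and any angle triple yields an orthonormal matrix, which is a quick sanity check for an implementation of this step.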
n ≤ 5 is used during training; an overly large n cannot guarantee a sufficient co-visible region between the two images.
Since û_{t-n} is generally non-integer, bilinear sampling to integer pixel coordinates is required, finally yielding the image x̂'_{t-n} warped from x̂_t. Since x̂'_{t-n} should be consistent with x̂_{t-n}, view consistency gives the reconstruction loss:

L_rec1 = Σ_{u∈x} | x̂_{t-n}(u) − w(x̂_t)(u) |

where w(·) is the operator that warps into the x̂_{t-n} space using the depth image obtained by reprojecting z_t through the relative translation vector t_{t-n,t} and the relative rotation matrix R_{t-n,t}, and u denotes a pixel in image x. In this way, G_image is encouraged to learn an unbiased estimate from depth images to the corresponding endoscopic images. Due to the cycle consistency constraint, G_depth will likewise be encouraged to learn an unbiased estimate from endoscopic images to depth images, i.e., to generate depth images consistent in scale with the input depth maps.
To further constrain the learning of the generator G_depth: X→Z, view consistency is also imposed between x_{t-n} and x_t and the generated depth maps ẑ_{t-n} and ẑ_t. Although the relative pose of the real endoscope cannot be collected in this case, the pre-trained depth registration network provides a depth-based pose estimation algorithm, so the corresponding relative endoscope pose can be computed from ẑ_{t-n} and ẑ_t. The pre-trained pose estimation network is loaded during training to estimate the relative motion p̂_{t-n,t} of the endoscope. An ideal depth image estimate should then contain the information that allows the pose estimation network to capture the endoscope motion, giving a second view-consistency reconstruction loss:

L_rec2 = Σ_{u∈x} | x_{t-n}(u) − w(x_t)(u) |

The total view-consistency reconstruction loss is therefore:

L_rec = L_rec1 + L_rec2
Geometric consistency loss:

For the generated depth maps ẑ_{t-n} and ẑ_t, if they correspond to the same 3D scene, their depth information should agree. The inconsistency z_diff between ẑ_{t-n} and ẑ_t is defined as:

z_diff = | z̃_{t-n} − z̄_{t-n} | / ( z̃_{t-n} + z̄_{t-n} )

where z̃_{t-n} is the depth image obtained by reprojecting ẑ_t using the virtual endoscope relative pose p_{t-n,t} computed by the pre-trained depth registration network, and z̄_{t-n} is the depth map sampled from ẑ_{t-n}. The error is computed between z̃_{t-n} and z̄_{t-n} rather than between z̃_{t-n} and ẑ_{t-n} because the reprojection result does not lie on an integer coordinate grid; ẑ_{t-n} must be sampled into the same coordinate system before the difference can be computed.

The geometric consistency loss is defined as:

L_gc = (1/|V|) Σ_{u∈V} z_diff(u)

where V denotes the set of valid pixels.
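A hedged numeric sketch of this geometric consistency measure, operating on two small aligned depth maps given as nested lists (the symmetric normalized per-pixel difference follows the definition above; warping and sampling are assumed to have already produced the two aligned maps):

```python
def geometric_consistency_loss(z_reproj, z_sampled):
    """Mean normalized depth inconsistency over all pixels of two aligned depth maps."""
    diffs = []
    for row_a, row_b in zip(z_reproj, z_sampled):
        for a, b in zip(row_a, row_b):
            diffs.append(abs(a - b) / (a + b))   # z_diff per pixel
    return sum(diffs) / len(diffs)

# Toy 2x2 depth maps: identical maps give zero loss
print(geometric_consistency_loss([[1.0, 2.0], [3.0, 4.0]],
                                 [[1.0, 2.0], [3.0, 4.0]]))   # 0.0
# One mismatched pixel contributes |1-3|/(1+3) = 0.5 to the 4-pixel mean
print(geometric_consistency_loss([[1.0, 2.0], [3.0, 4.0]],
                                 [[3.0, 2.0], [3.0, 4.0]]))   # 0.125
```

The normalization by the sum of the two depths keeps the measure in [0, 1) and makes it insensitive to the absolute depth scale.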
In summary, the total loss function for depth extraction network training is:

L = β·L_cyc + γ·L_identity + δ·L_GAN + θ_1·L_rec1 + θ_2·L_rec2 + η·L_gc

where β, γ, δ, θ_1, θ_2 and η are hyperparameters that adjust the weight of each loss.
S305: Optimize the loss function and update the parameters of the initial depth extraction network based on the cycle generative adversarial network and the depth registration network until a preset number of rounds is reached, to obtain the depth extraction network based on the cycle generative adversarial network and the depth registration network.
Figures 4(a), 4(b) and 4(c) are schematic diagrams of the depth extraction network architecture: (a) the generator, (b) the Resnet block in the generator, and (c) the discriminator. The tensor dimensions shown in the figures assume an input image of size 1×256×256; Res(256, 256) denotes a Resnet block with 256 input and output channels; IN denotes an Instance Norm layer, and Leaky ReLU denotes the Leaky ReLU activation function.
Illustratively, the depth extraction network can be trained with 7 preset real endoscopic videos and data collected in 8 virtual endoscopy sessions, including a number of preset real endoscopic images, 2187 depth images and the corresponding virtual endoscope poses. In the depth extraction network architecture, the generator is a conventional encoder-decoder, in which the bottleneck consists of six Resnet blocks, and the discriminator consists of five convolutional layers. The Adam optimizer is used to train for 100 epochs. At the beginning of training, the learning rate is set to 0.001 and θ_1 = θ_2 = η = 0, to avoid imposing consistency constraints on the poor depth maps generated early in training. After 10 epochs, θ_1, θ_2 and η are set to 0.3, 5 and 5, respectively. β, γ and δ are set to 10, 5 and 1, respectively, throughout training.
During training, the parameters of the depth extraction network are updated by continuously optimizing the loss function obtained in the above steps, until the preset number of rounds is reached and the final depth extraction network is determined. The preset number of rounds may be 50 to 300, and further may be 100 to 200. Compared with depth extraction networks of the SfMLearner type, the trained depth extraction network can generate depth images with clearer contours. Compared with networks that use only a Cycle GAN, it can guarantee that the structure of the input image is not changed, and it can generate depth images with a stable and known scale (essentially the same scale as the training data).
In one embodiment, the depth extraction network is a depth extraction network based on SfMLearner or a depth extraction network based on a cycle generative adversarial network;
before inputting the depth image of the t-th frame and the depth image d_{t-n}, or the depth images of the t-th and (t-n)-th frames, into the pre-trained depth registration network, the method further includes:

performing scale calibration on the depth images to obtain their units.
Specifically, for the depth extraction network based on SfMLearner:

A depth estimation network and a pose network are trained simultaneously. The depth estimation network estimates the depth information z from a single input endoscopic image; the pose network estimates the relative camera pose, T and R, between two input endoscopic images.
For two consecutive input endoscopic frames x_{t-n} and x_t, the depth estimation network can estimate the depth images ẑ_{t-n} and ẑ_t of the two frames, and the pose network can estimate the relative camera motion t_{t-n,t} and R_{t-n,t}.

With the known camera intrinsic matrix K, a pixel u_t in homogeneous coordinates can be warped to û_{t-n}:

û_{t-n} ~ K (R_{t-n,t} · ẑ_t(u_t) · K^{-1} u_t + t_{t-n,t})
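The warping step can be sketched numerically. The intrinsic matrix below is an identity placeholder, not a real calibration, and the function name is illustrative; it back-projects a homogeneous pixel with its depth, applies the rigid motion, and reprojects:

```python
def warp_pixel(K, K_inv, R, t, depth, u):
    """Warp homogeneous pixel u from frame t to frame t-n: K (R * depth * K^-1 u + t)."""
    def matvec(M, v):
        return [sum(M[i][k] * v[k] for k in range(3)) for i in range(3)]
    ray = [depth * c for c in matvec(K_inv, u)]        # back-project to a 3D point
    cam = [a + b for a, b in zip(matvec(R, ray), t)]   # rigid transform to the other frame
    proj = matvec(K, cam)
    return [proj[0] / proj[2], proj[1] / proj[2]]      # dehomogenize

# Identity intrinsics and identity rotation, pure z-translation (placeholders)
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
u = [2.0, 1.0, 1.0]   # homogeneous pixel coordinates
warped = warp_pixel(I3, I3, I3, [0.0, 0.0, 1.0], depth=2.0, u=u)
print(warped)   # translating along +z moves the point farther, so the pixel shifts toward the principal point
```

The resulting coordinates are generally non-integer, which is why the surrounding text resamples to the integer pixel grid before comparing images.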
Since û_{t-n} is generally non-integer, bilinear sampling to integer pixel coordinates is required, finally yielding the image x'_{t-n} warped from x_t, which should be consistent with x_{t-n}. View consistency gives the reconstruction loss:

L_rec = Σ_{u∈x} | x_{t-n}(u) − w(x_t)(u) |

where w(·) is the operator that warps into the x_{t-n} space using the depth image obtained by reprojecting ẑ_t through the relative translation vector t_{t-n,t} and the relative rotation matrix R_{t-n,t}; u denotes a pixel in image x, and warping refers to manipulating an image so that its pixels are deformed. Through this loss function, the pose network and the depth estimation network are self-supervised, completing the network training.
To stabilize the scale of the depth images generated by the network, a geometric consistency loss is added. For the generated depth images ẑ_{t-n} and ẑ_t, if they correspond to the same 3D scene, their depth information should agree. The inconsistency z_diff between ẑ_{t-n} and ẑ_t is defined as:

z_diff = | z̃_{t-n} − z̄_{t-n} | / ( z̃_{t-n} + z̄_{t-n} )

where z̃_{t-n} is the depth map obtained by reprojecting ẑ_t using the relative motion of the real endoscope computed by the pose network, and z̄_{t-n} is the depth map sampled from ẑ_{t-n}. The error is computed between z̃_{t-n} and z̄_{t-n} rather than between z̃_{t-n} and ẑ_{t-n} because the reprojection result does not lie on an integer coordinate grid; ẑ_{t-n} must be sampled into the same coordinate system before the difference can be computed.

The geometric consistency loss is defined as:

L_gc = (1/|V|) Σ_{u∈V} z_diff(u)

where V denotes the set of valid pixels.
In summary, the loss function is L = a·L_rec + b·L_gc, where a and b are hyperparameters that adjust the weight of each loss.
Specifically, for the depth extraction network based on Cycle GAN, the loss function can include the following losses:
For an endoscopic image x ∈ X, the depth extraction algorithm aims to learn a mapping G_depth: X→Z that generates the corresponding depth map ẑ from x. The mapping G_image: Z→X then reconstructs ẑ back to domain X, completing the cycle. The conversion from domain Z to domain X is analogous. In this reconstruction cycle, the network model imposes a cycle consistency loss on G_image and G_depth:

L_cyc = E_{y~p(X)} ||G_image(G_depth(y)) − y||_1 + E_{y~p(Z)} ||G_depth(G_image(y)) − y||_1
To add constraints on the learning of the mappings, the other loss functions include an identity loss:

L_identity = E_{y~p(X)} ||G_image(y) − y||_1 + E_{y~p(Z)} ||G_depth(y) − y||_1
While the generators complete the mapping cycle, the discriminators D_image and D_depth learn to judge whether an input endoscopic image or depth image, respectively, is real or fake, while each generator tries to fool its discriminator by generating images that the discriminator judges to be real. A generative adversarial loss is introduced; the LS-GAN loss is used here:

L_GAN(G_image, D_image) = E_{x~p(X)} (D_image(x) − 1)^2 + E_{z~p(Z)} (D_image(G_image(z)))^2

with the loss for G_depth and D_depth defined analogously; y~p(data) denotes that samples follow the distribution of domain X or Z.
It is difficult to guarantee scale-stable depth images using only a Cycle GAN, so adding the geometric consistency loss may also be considered.
The scale of the depth images obtained by the above two depth extraction networks is ambiguous and unitless, so calibration is required. Specific calibration methods include the following two; at least one of them can be used:
(1) When the real endoscope enters the lumen, the visible range of the real endoscope is segmented according to a depth threshold, and the diameter of the region above the threshold is compared with the depth at the same diameter around the depth peak of the lumen in the preoperatively established virtual model, thereby obtaining the scale of the real endoscope. Illustratively, suppose the depth threshold is set to 5, and the region above this threshold segmented from the depth image (frame 0) extracted from the real endoscope is a circle with a diameter of 10 pixels. For the virtual model established for the main airway, assuming the real endoscope is at the center of the main airway, a circle of 10-pixel diameter around the peak can be found on the contour lines of the corresponding depth map. If the depth corresponding to this contour is 1 cm, the scale of the depth network is 1/5 = 0.2 cm.
(2) Based on the depth extraction network of the above embodiment, the pose network and the depth network share the same ambiguous scale. As the real endoscope advances, the robot control signal can be used as a reference against which the relative pose estimate of the pose network is calibrated. For example, if the robot control signal advances the endoscope by 1 cm while the relative translation vector obtained by the pose network is a translation of 2 in the advancing direction, the scale is 1/2 = 0.5 cm.
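Both calibration strategies reduce to dividing a known physical length by the corresponding unitless network quantity. A hedged sketch reproducing the two worked examples above (the function names are illustrative, not from the application):

```python
def scale_from_depth_peak(contour_depth_cm, network_depth_value):
    """Method (1): physical depth at the matched contour vs. the thresholded network depth."""
    return contour_depth_cm / network_depth_value

def scale_from_robot_motion(commanded_advance_cm, estimated_translation):
    """Method (2): commanded physical advance vs. pose-network translation magnitude."""
    return commanded_advance_cm / estimated_translation

print(scale_from_depth_peak(1.0, 5.0))      # 0.2 cm per depth unit, as in example (1)
print(scale_from_robot_motion(1.0, 2.0))    # 0.5 cm per translation unit, as in example (2)
```

The resulting scale factor converts the unitless network outputs into metric units before they are passed to the depth registration network.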
In one embodiment, as shown in Figure 5, the depth registration network is trained as follows:
S501: Establish a virtual model, obtain the depth images of the virtual images captured by the virtual endoscope in the virtual model, and obtain the virtual pose information of the virtual endoscope at the time each virtual image was captured.
Specifically, the depth registration network is a deep neural network in encoder-decoder form. The network input is two frames of depth information; the encoder adopts the structure of the FlowNetC encoder (the optical flow extracted by FlowNet is a simulation of the motion field), and the decoder uses several CNN (Convolutional Neural Network) layers to finally convert the encoded information into a 6-DOF pose parameter output (i.e., 3D translation and 3D Euler angles).

When training the depth registration network, a virtual model must first be established, and a large number of depth images and virtual pose information are obtained with the virtual endoscope to supervise the training of the depth registration network, so as to improve its robustness.
S502: Input the depth images of the virtual images into the initial depth registration network, which outputs the relative pose estimate of the virtual endoscope between two adjacent frames of virtual images.

Specifically, the depth images of the virtual images obtained in the above steps are input into the initial depth registration network for weakly supervised training; the network output gives the relative pose estimate of the virtual endoscope between two adjacent virtual frames.
S503: Use the virtual pose information as the training ground truth, and obtain from it the virtual relative pose information of the virtual endoscope between the two adjacent frames of virtual images.

Meanwhile, with the virtual pose information used as the training ground truth, the virtual relative pose between the two adjacent virtual frames can be computed from it. At this point, both the ground-truth relative pose information and the estimated relative pose information for two adjacent frames are available.
S504: Obtain the loss function as a weighted sum of the translation loss and the rotation loss between the relative pose estimation information and the virtual relative pose information.
Specifically, the translation loss and the rotation loss between the estimated relative pose of the virtual endoscope and the true relative pose are computed separately, and the final loss function is obtained as their weighted sum:

L(z_{t-m}, z_t) = L_t(z_{t-m}, z_t) + ω·L_r(z_{t-m}, z_t)

where L_t is the translation loss, defined on the translation vectors T_{t-m,t} of the true relative pose and of the estimated relative pose respectively; L_r is the rotation loss, defined on the rotation vectors R_{t-m,t} of the true relative pose and of the estimated relative pose respectively; and ω is a hyperparameter that adjusts the relative contributions of the rotation loss and the translation loss.
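As a non-limiting illustration, the weighted loss above can be sketched as follows. The use of the L2 norm for both loss terms is an assumption; the disclosure specifies only a weighted sum of translation and rotation losses:

```python
import numpy as np

def pose_loss(t_pred, t_gt, r_pred, r_gt, omega=100.0):
    """Weighted pose loss L = L_t + omega * L_r.

    t_* and r_* are (B, 3) arrays of translation and rotation vectors for
    the estimated and ground-truth relative poses. The L2 distance used for
    each term is an assumption, as is the batch averaging.
    """
    l_t = np.linalg.norm(t_pred - t_gt, axis=-1).mean()  # translation loss
    l_r = np.linalg.norm(r_pred - r_gt, axis=-1).mean()  # rotation loss
    return l_t + omega * l_r
```

With ω = 100, as in the training configuration described below the formula, a unit rotation error contributes one hundred times as much to the loss as a unit translation error.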
Figure 6 shows a schematic diagram of the depth registration architecture.
The pose estimation network is trained with virtual endoscope poses and depth images collected along 37 virtual endoscope trajectories, totalling 11,904 frames. The network uses a pre-trained FlowNetC encoder and regresses the pose vector with three convolutional blocks. It is trained with the Adam optimizer at an initial learning rate of 1e-5 for 300 epochs, and ω is set to 100.
S505: Optimize the loss function and update the parameters of the initial depth registration network until convergence, so as to obtain the depth registration network.
The depth registration network learns, by deep learning, the endoscope pose transformation parameters between two input depth images, thereby updating the endoscope pose transformation for each input endoscopic image. Because this network registers depth rather than image intensity, the algorithm places no additional requirements on the rendering of the virtual images acquired by the virtual endoscope in the simulator. The deep learning algorithm estimates the pose transformation directly, so the algorithm can run quickly in real time and produce real-time positioning results.
In one embodiment, the method further includes:
running a registration method based on an iterative optimization algorithm in parallel with the depth registration network, and correcting the pose estimation information of the real endoscope with the corrected pose obtained from that registration method, so as to eliminate accumulated error.
Specifically, the registration method based on the iterative optimization algorithm is computationally slower; by running it in parallel with the depth registration network for pose correction, the pose estimation information of the real endoscope can be corrected with some latency, so the accumulated error does not keep growing and positioning accuracy is improved.
In one embodiment, as shown in Figure 7, the method of obtaining the corrected pose according to the registration method based on an iterative optimization algorithm includes:
S701: Obtain the k-th frame image collected by the real endoscope as the current correction image, and obtain the depth image of the k-th frame image through the depth extraction network, where k ≤ t.
Specifically, this correction method runs more slowly than the network that estimates the real endoscope's pose, so when correction is run in parallel it is not performed frame by frame. The k-th frame image with k ≤ t is taken as the current correction image, i.e. the pose estimation information of the real endoscope for the frame being corrected has already been estimated.
S702: Obtain the pose estimation information of the real endoscope at the k-th frame, as obtained from the depth registration network.
Specifically, since k ≤ t, the pose estimation information of the k-th frame has already been estimated by the time the k-th frame is corrected, and can be obtained directly.
S703: Perform semantic segmentation of the lumen image in the real endoscope's field of view using the current correction image, or its depth image, or both the current correction image and the depth image.
Experiments show that, because a similarity measure is used during registration, when one deep lumen and several shallow lumens appear in the image at the same time, the optimization preferentially aligns the deep lumen, whose depth values are larger than those of the others, and easily neglects the registration of the shallower lumens; the structural information of the shallow lumens is then ignored. To address this, the depth image is used to segment the lumen image before registration, so that the registration must match not only similar depths but also similar lumen structures.
Segmentation here means partitioning all lumen regions in the detection field of view into separate areas. For an input endoscopic image x_t, the lumens can be segmented from the depth image, from the RGB image x_t, or from the RGBD image (x_t together with its depth image). The segmentation method may be depth-threshold segmentation of the depth image, or a network trained to segment lumens in RGB or RGBD images.
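As a non-limiting illustration of the depth-threshold option named above, lumen regions can be separated by thresholding followed by connected-component labelling. The threshold value and the use of scipy.ndimage are assumptions for illustration only:

```python
import numpy as np
from scipy import ndimage

def segment_lumens(depth: np.ndarray, thresh: float) -> np.ndarray:
    """Label connected lumen regions in a depth image by thresholding.

    Pixels deeper than `thresh` are treated as lumen candidates, and
    connected-component labelling separates multiple lumens. Returns an
    integer label image (0 = background, 1..n = lumen regions).
    """
    mask = depth > thresh
    labels, _n = ndimage.label(mask)
    return labels
```

A label image of this kind gives each lumen its own region, so a later similarity measure can be evaluated per lumen rather than being dominated by the deepest one.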
S704: Based on an image similarity measure and a semantic-segmentation similarity measure, perform an optimization with the pose estimation information of the k-th frame as the initial value, obtaining the corrected pose of the current correction image.
Specifically, this is a correction method based on image registration. Denote the segmentation process as Seg(·); the airway segmentation result corresponding to the corrected pose of the real endoscope at time k is obtained through Seg(·). Given the camera pose at time t-1, the corrected pose is solved by optimization starting from the initial pose estimate. The optimization process is described as follows:
where SIM1(·) is the image similarity measure, SIM2(·) is the segmentation similarity measure, and P′_t is the variable. Seg(P′_t) is the result of segmenting the image or depth map corresponding to the virtual endoscope at virtual pose P′_t. The Powell algorithm is again used as the optimization strategy. Illustratively, taking k = t, i.e. using the most recently computed pose estimation information as the initial value, improves the convergence of the algorithm and reduces the number of iterations.
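As a non-limiting illustration, the Powell-based optimization combining the two similarity measures can be sketched as follows. Here render_depth(p) and seg(p) stand in for the simulator's virtual-camera depth rendering and its segmentation Seg(·), and maximizing the summed similarities (by minimizing their negation) is an assumed sign convention:

```python
import numpy as np
from scipy.optimize import minimize

def register_pose(p0, render_depth, seg, d_obs, s_obs, sim1, sim2):
    """Refine a 6-DoF pose so the rendered depth and segmentation match
    the observed ones, starting from the network's estimate p0.

    sim1/sim2 are similarity measures (larger = more similar); the cost
    negates their sum so that scipy's minimizer maximizes similarity.
    """
    def cost(p):
        return -(sim1(render_depth(p), d_obs) + sim2(seg(p), s_obs))
    res = minimize(cost, p0, method="Powell")  # derivative-free, as in the text
    return res.x
```

Powell's method is derivative-free, which suits this setting: the rendering and segmentation steps are generally not differentiable with respect to the pose.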
This method compensates for the case where only an image similarity measure is used and one deep and one shallow lumen appear together: a similarity measure such as NCC (Normalized Cross-Correlation) would concentrate on aligning the deep-lumen parts of the two depth maps and ignore the features of the shallow lumen, causing inaccurate results.
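As a non-limiting illustration of the NCC measure mentioned above, a minimal implementation on two equal-shaped images is:

```python
import numpy as np

def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized cross-correlation between two images of equal shape.

    Returns a value in [-1, 1]; 1 means the (mean-subtracted) images are
    identical up to a positive scale.
    """
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0
```

Because NCC sums contributions over all pixels, regions with large depth values dominate the score, which is exactly the deep-lumen bias the segmentation term is introduced to counter.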
S705: Replace the pose estimation information of the real endoscope at the k-th frame with the corrected pose.
After the corrected pose is obtained, the pose estimation information of the real endoscope at the k-th frame is replaced with the corrected pose; the pose at which the k-th frame image was collected on the real endoscope trajectory is thereby corrected.

In one embodiment, as shown in Figure 8, the method of obtaining the corrected pose according to the registration method based on an iterative optimization algorithm includes:
S801: Obtain the k-th frame image collected by the real endoscope as the current correction image, and obtain the depth image of the k-th frame image through the depth extraction network, where k ≤ t.
Specifically, this correction method runs more slowly than the network that estimates the real endoscope's pose, so when correction is run in parallel it is not performed frame by frame. When a correction is performed, the k-th frame image with k ≤ t is taken as the current correction image.
S802: Obtain the depth image d_k of the k-th frame target virtual image collected by the virtual endoscope at the k-th frame positioning pose in the target virtual model.
Specifically, the virtual endoscope moves in the target virtual model together with the movement of the real endoscope; the virtual endoscope's positioning pose at the k-th frame in the target virtual model is obtained by mapping the positioning pose of the real endoscope at the k-th frame into the target virtual model.
S803: Convert the depth image of the k-th real frame into the corresponding point cloud, and convert the depth image d_k into the point cloud image Y_k.
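As a non-limiting illustration of step S803, a depth image can be back-projected into a point cloud with a pinhole camera model. The intrinsic parameters fx, fy, cx, cy are assumptions, since the disclosure does not state the virtual camera's intrinsics:

```python
import numpy as np

def depth_to_points(depth: np.ndarray, fx: float, fy: float,
                    cx: float, cy: float) -> np.ndarray:
    """Back-project an (H, W) depth image into an (H*W, 3) point cloud.

    Each pixel (u, v) with depth z maps to camera coordinates
    ((u - cx) * z / fx, (v - cy) * z / fy, z) under the pinhole model.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinate grids
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```

Two point clouds produced this way can then be registered (for example with an ICP-style method) to obtain the relative pose used in the subsequent correction step.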
S805: Use the relative pose to correct the pose estimation information of the real endoscope at the k-th frame.
Specifically, the relative pose is used to correct the pose estimation information of the real endoscope at the k-th frame; the pose at which the k-th frame image was collected on the real endoscope trajectory is thereby corrected.
In one embodiment, the method further includes:
S901: Use an RGB image feature extraction method to extract feature information of the t-th frame image collected by the real endoscope, and input the feature information of the t-th frame image together with its depth image into the pre-trained depth registration network.
S902: Use the RGB image feature extraction method to extract feature information of the (t-n)-th frame image collected by the real endoscope, or feature information of the (t-n)-th frame target virtual image collected by the virtual endoscope, where the feature information of the (t-n)-th frame target virtual image is extracted after texture mapping has been applied to that image.
S903: Input the feature information of the (t-n)-th frame target virtual image together with the depth image d_{t-n}, or the feature information of the (t-n)-th frame image together with its depth image, into the pre-trained depth registration network.
Current algorithms use only RGB image information or only depth information. Although depth-based positioning has been shown to be more robust, in practice, when only one lumen is in the field of view, the depth image contains a single circular depth-peak region, and the rotational and translational motion of the endoscope then becomes difficult to estimate.
Therefore, RGB feature extraction is fused into the relative pose computation for real-time positioning. Specifically, features such as lumen texture can be extracted from two endoscopic frames with feature descriptors (e.g. SIFT, ORB) or a pre-trained feature extraction network, and then fed into the depth registration network together with the depth images. This compensates for the difficulty of estimating the endoscope pose when the depth map structure is simple, and assists in estimating the motion of the real endoscope. In this case, virtual endoscope images, depth images and the corresponding virtual endoscope poses need to be collected to train the depth extraction network.
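As a non-limiting illustration, one way to feed the extracted RGB features "together with" the depth images into the registration network is channel-wise concatenation. This fusion choice is an assumption; the disclosure does not specify how the inputs are combined:

```python
import numpy as np

def fuse_inputs(feat_a: np.ndarray, feat_b: np.ndarray,
                depth_a: np.ndarray, depth_b: np.ndarray) -> np.ndarray:
    """Stack per-frame RGB feature maps and depth maps into one
    multi-channel input tensor for the registration network.

    feat_a/feat_b are (H, W, C) feature maps for the two frames;
    depth_a/depth_b are (H, W) depth images appended as extra channels.
    """
    return np.concatenate(
        [feat_a, feat_b, depth_a[..., None], depth_b[..., None]], axis=-1
    )
```

When the depth map is nearly rotationally symmetric (the single-lumen case above), the texture channels supply the asymmetry needed to disambiguate rotation.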
During data collection, texture mapping must be applied to the virtual endoscope images, and the texture needs to be close to that of images collected by a real endoscope.
With the endoscope positioning method provided by this application, given the initial pose of the real endoscope, the current pose information of the real endoscope can be obtained quickly and continuously using the pre-trained depth extraction network and depth registration network. Once trained, the depth extraction network and depth registration network in this method can be used directly for different patients without pre-operative training, which is convenient and saves time.
Figure 10 illustrates a schematic diagram of the physical structure of an electronic device. As shown in Figure 10, the electronic device may include a processor 1010, a communications interface 1020, a memory 1030 and a communication bus 1040, where the processor 1010, the communications interface 1020 and the memory 1030 communicate with one another through the communication bus 1040. The processor 1010 can call logical instructions in the memory 1030 to perform an endoscope positioning method, the method including: obtaining, based on a pre-trained depth extraction network, the depth image of the current frame, i.e. the t-th frame image, collected by the real endoscope; obtaining the depth image d_{t-n} of the (t-n)-th frame target virtual image collected by the virtual endoscope at the (t-n)-th frame positioning pose in the target virtual model, or obtaining, based on the pre-trained depth extraction network, the depth image of the (t-n)-th frame image collected by the real endoscope, where the virtual endoscope is determined based on the real endoscope; inputting the depth image of the t-th frame and the depth image d_{t-n}, or the depth images of the t-th and (t-n)-th frames, into a pre-trained depth registration network to obtain the relative pose estimation information between the t-th frame image and the (t-n)-th frame image collected by the real endoscope; and superimposing the relative pose estimation information onto the pose estimation information of the real endoscope at the (t-n)-th frame to obtain the pose estimation information of the real endoscope at the t-th frame, and positioning the real endoscope according to that pose estimation information, where the pose information of the initial position of the real endoscope is known.
In addition, the logical instructions in the memory 1030 described above may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
On another aspect, the present application further provides a computer program product. The computer program product includes a computer program that can be stored on a non-transitory computer-readable storage medium; when the computer program is executed by a processor, the computer can perform the endoscope positioning method provided by each of the above methods, the method including: obtaining, based on a pre-trained depth extraction network, the depth image of the current frame, i.e. the t-th frame image, collected by the real endoscope; obtaining the depth image d_{t-n} of the (t-n)-th frame target virtual image collected by the virtual endoscope at the (t-n)-th frame positioning pose in the target virtual model, or obtaining, based on the pre-trained depth extraction network, the depth image of the (t-n)-th frame image collected by the real endoscope, where the virtual endoscope is determined based on the real endoscope; inputting the depth image of the t-th frame and the depth image d_{t-n}, or the depth images of the t-th and (t-n)-th frames, into a pre-trained depth registration network to obtain the relative pose estimation information between the t-th frame image and the (t-n)-th frame image collected by the real endoscope; and superimposing the relative pose estimation information onto the pose estimation information of the real endoscope at the (t-n)-th frame to obtain the pose estimation information of the real endoscope at the t-th frame, and positioning the real endoscope according to that pose estimation information, where the pose information of the initial position of the real endoscope is known.
In yet another aspect, the present application further provides a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the endoscope positioning method provided by each of the above methods, the method including: obtaining, based on a pre-trained depth extraction network, the depth image of the current frame, i.e. the t-th frame image, collected by the real endoscope; obtaining the depth image d_{t-n} of the (t-n)-th frame target virtual image collected by the virtual endoscope at the (t-n)-th frame positioning pose in the target virtual model, or obtaining, based on the pre-trained depth extraction network, the depth image of the (t-n)-th frame image collected by the real endoscope, where the virtual endoscope is determined based on the real endoscope; inputting the depth image of the t-th frame and the depth image d_{t-n}, or the depth images of the t-th and (t-n)-th frames, into a pre-trained depth registration network to obtain the relative pose estimation information between the t-th frame image and the (t-n)-th frame image collected by the real endoscope; and superimposing the relative pose estimation information onto the pose estimation information of the real endoscope at the (t-n)-th frame to obtain the pose estimation information of the real endoscope at the t-th frame, and positioning the real endoscope according to that pose estimation information, where the pose information of the initial position of the real endoscope is known.
The device embodiments described above are merely illustrative. Units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Persons of ordinary skill in the art can understand and implement this without creative effort.
From the description of the above embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware. Based on this understanding, the part of the above technical solution that in essence contributes to the prior art can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk or an optical disk, and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the various embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present application, not to limit it. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some of the technical features; such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.
Claims (10)
- An endoscope positioning method, comprising: obtaining, based on a pre-trained depth extraction network, the depth image of the t-th frame image collected by a real endoscope; obtaining the depth image d_{t-n} of the (t-n)-th frame target virtual image collected by a virtual endoscope at the (t-n)-th frame positioning pose in a target virtual model, or obtaining, based on the pre-trained depth extraction network, the depth image of the (t-n)-th frame image collected by the real endoscope, wherein the virtual endoscope is determined based on the real endoscope; inputting the depth image of the t-th frame and the depth image d_{t-n}, or the depth images of the t-th and (t-n)-th frames, into a pre-trained depth registration network to obtain relative pose estimation information between the t-th frame image and the (t-n)-th frame image collected by the real endoscope; and superimposing the relative pose estimation information onto the pose estimation information of the real endoscope at the (t-n)-th frame to obtain the pose estimation information of the real endoscope at the t-th frame, and positioning the real endoscope according to the pose estimation information.
- The endoscope positioning method according to claim 1, wherein the depth extraction network is a depth extraction network based on a recurrent generative adversarial network and the pre-trained depth registration network; the recurrent generative adversarial network includes a first generator, a first discriminator, a second generator and a second discriminator, the first generator being used to convert depth images into real-style endoscopic images, and the second generator being used to convert real-style endoscopic images into depth images; the depth extraction network based on the recurrent generative adversarial network and the depth registration network is trained as follows: establishing a virtual model, obtaining depth images of the virtual images collected by the virtual endoscope in the virtual model, and obtaining the virtual pose information of the virtual endoscope when collecting the virtual images; obtaining preset real endoscopic images; performing weakly supervised training of an initial depth extraction network with the preset real endoscopic images, the depth images of the virtual images and the virtual pose information as training data; obtaining a loss function as a weighted sum of the cycle-consistency loss, identity loss, generative adversarial loss, reconstruction loss and geometric-consistency loss that constrain the initial depth extraction network; and optimizing the loss function and updating the parameters of the initial depth extraction network based on the recurrent generative adversarial network and the depth registration network for a preset number of rounds, so as to obtain the depth extraction network based on the recurrent generative adversarial network and the depth registration network.
- The endoscope positioning method according to claim 1, wherein the depth extraction network is a depth extraction network based on SfMLearner or a depth extraction network based on a recurrent generative adversarial network; before inputting the depth image of the t-th frame and the depth image d_{t-n}, or the depth images of the t-th and (t-n)-th frames, into the pre-trained depth registration network, the method further includes:
- The endoscope positioning method according to claim 1, wherein the depth registration network is trained as follows: establishing a virtual model, obtaining depth images of the virtual images collected by the virtual endoscope in the virtual model, and obtaining the corresponding virtual pose information when the virtual endoscope collects the virtual images; inputting the depth images of the virtual images into an initial depth registration network, the initial depth registration network outputting the relative pose estimation information of the virtual endoscope between two adjacent frames of virtual images; using the virtual pose information as the training ground truth, and obtaining from it the virtual relative pose information of the virtual endoscope between the two adjacent frames of virtual images; obtaining the loss function as a weighted sum of the translation loss and the rotation loss between the relative pose estimation information and the virtual relative pose information; and optimizing the loss function and updating the parameters of the initial depth registration network until convergence, so as to obtain the depth registration network.
- The endoscope positioning method according to any one of claims 1 to 4, further comprising: running a registration method based on an iterative optimization algorithm in parallel with the depth registration network, and correcting the pose estimation information of the real endoscope with the corrected pose obtained from the registration method based on the iterative optimization algorithm, so as to eliminate accumulated error.
- The endoscope positioning method according to claim 5, wherein obtaining the corrected pose from the iterative-optimization-based registration method comprises: obtaining the k-th frame image collected by the real endoscope as the current correction image, and obtaining the depth image of the k-th frame image through the depth extraction network, where k ≤ t; obtaining the pose estimation information of the real endoscope for the k-th frame image as produced by the depth registration network; performing semantic segmentation of the lumen in the real endoscope's field of view using the current correction image, the depth image, or both the current correction image and the depth image; and, taking the pose estimation information as the initial value, solving an optimization based on an image similarity measure and a semantic-segmentation similarity measure to obtain the corrected pose of the current correction image.
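A minimal sketch of the combined objective used in the optimization of this claim, assuming normalized cross-correlation (NCC) as the image similarity measure and the Dice coefficient as the segmentation similarity measure; the claim does not name specific measures, and the blending weight `alpha` is introduced here only for illustration.

```python
import numpy as np

def combined_similarity(real_img, virt_img, real_seg, virt_seg, alpha=0.5):
    """Weighted combination of an image similarity measure (NCC, assumed)
    between the real endoscope image and the virtual rendering, and a
    segmentation similarity measure (Dice, assumed) between the lumen masks.
    Higher is better; an iterative optimizer would maximize this over the
    candidate pose, starting from the network's pose estimate."""
    a = real_img - real_img.mean()
    b = virt_img - virt_img.mean()
    ncc = (a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    inter = np.logical_and(real_seg, virt_seg).sum()
    dice = 2.0 * inter / (real_seg.sum() + virt_seg.sum() + 1e-8)
    return alpha * ncc + (1 - alpha) * dice
```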
- The endoscope positioning method according to claim 5, wherein obtaining the corrected pose from the iterative-optimization-based registration method comprises: obtaining the k-th frame image collected by the real endoscope as the current correction image, and obtaining the depth image of the k-th frame image through the depth extraction network, where k ≤ t; obtaining the depth image d_k of the k-th frame target virtual image collected by the virtual endoscope at the k-th frame positioning pose in the target virtual model; and converting the depth image of the k-th frame image into a corresponding point cloud, and converting the depth image d_k into a point cloud image Y_k;
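The depth-to-point-cloud conversion in this claim can follow standard pinhole back-projection. A sketch, assuming the camera intrinsics `fx`, `fy`, `cx`, `cy` are known from endoscope calibration (the claim does not specify the conversion):

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (H x W, metric depth per pixel) into a
    3-D point cloud under a pinhole camera model. Returns an (H*W, 3)
    array of camera-frame points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```

Two such clouds (from the real depth image and from d_k) could then be aligned with a rigid registration such as ICP to obtain the corrected pose.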
- The endoscope positioning method according to any one of claims 1 to 4, further comprising: using an RGB image feature extraction method to extract feature information of the t-th frame image collected by the real endoscope, and inputting the feature information of the t-th frame image together with the depth image into the pre-trained depth registration network; and using the RGB image feature extraction method to extract feature information of the (t-n)-th frame image collected by the real endoscope, or feature information of the (t-n)-th frame target virtual image collected by the virtual endoscope, wherein the feature information of the (t-n)-th frame target virtual image is extracted after texture mapping is applied to the (t-n)-th frame target virtual image;
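As a rough illustration of forming a multi-channel network input from RGB feature information plus a depth image, the sketch below uses simple finite-difference image gradients as a stand-in for the RGB feature extraction method, which the claim leaves unspecified:

```python
import numpy as np

def build_registration_input(rgb, depth):
    """Stack a grayscale channel and its x/y gradients (a hand-crafted
    stand-in for learned RGB features) with the depth image, producing a
    (4, H, W) input tensor for a registration network."""
    gray = rgb.mean(axis=-1)  # (H, W) grayscale from (H, W, 3) RGB
    gx = np.zeros_like(gray)
    gx[:, 1:] = gray[:, 1:] - gray[:, :-1]  # horizontal gradient
    gy = np.zeros_like(gray)
    gy[1:, :] = gray[1:, :] - gray[:-1, :]  # vertical gradient
    return np.stack([gray, gx, gy, depth], axis=0)
```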
- An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the endoscope positioning method according to any one of claims 1 to 8.
- A non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the endoscope positioning method according to any one of claims 1 to 8.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211086312.X | 2022-09-06 | ||
CN202211086312.XA CN117710279A (en) | 2022-09-06 | 2022-09-06 | Endoscope positioning method, electronic device, and non-transitory computer-readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024050918A1 true WO2024050918A1 (en) | 2024-03-14 |
Family
ID=90142942
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/125009 WO2024050918A1 (en) | 2022-09-06 | 2022-10-13 | Endoscope positioning method, electronic device, and non-transitory computer-readable storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN117710279A (en) |
WO (1) | WO2024050918A1 (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070013710A1 (en) * | 2005-05-23 | 2007-01-18 | Higgins William E | Fast 3D-2D image registration method with application to continuously guided endoscopy |
CN104540439A (en) * | 2012-08-14 | 2015-04-22 | 直观外科手术操作公司 | Systems and methods for registration of multiple vision systems |
CN111325797A (en) * | 2020-03-03 | 2020-06-23 | 华东理工大学 | Pose estimation method based on self-supervision learning |
CN111772792A (en) * | 2020-08-05 | 2020-10-16 | 山东省肿瘤防治研究院(山东省肿瘤医院) | Endoscopic surgery navigation method, system and readable storage medium based on augmented reality and deep learning |
CN114022527A (en) * | 2021-10-20 | 2022-02-08 | 华中科技大学 | Monocular endoscope depth and pose estimation method and device based on unsupervised learning |
- 2022-09-06 CN CN202211086312.XA patent/CN117710279A/en active Pending
- 2022-10-13 WO PCT/CN2022/125009 patent/WO2024050918A1/en unknown
Also Published As
Publication number | Publication date |
---|---|
CN117710279A (en) | 2024-03-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109448041B (en) | Capsule endoscope image three-dimensional reconstruction method and system | |
Song et al. | Mis-slam: Real-time large-scale dense deformable slam system in minimal invasive surgery based on heterogeneous computing | |
Visentini-Scarzanella et al. | Deep monocular 3D reconstruction for assisted navigation in bronchoscopy | |
Song et al. | Dynamic reconstruction of deformable soft-tissue with stereo scope in minimal invasive surgery | |
JP5797352B1 (en) | Method for tracking a three-dimensional object | |
CN111080778B (en) | Online three-dimensional reconstruction method of binocular endoscope soft tissue image | |
US20180174311A1 (en) | Method and system for simultaneous scene parsing and model fusion for endoscopic and laparoscopic navigation | |
CN112614169B (en) | 2D/3D spine CT (computed tomography) level registration method based on deep learning network | |
CN111783582A (en) | Unsupervised monocular depth estimation algorithm based on deep learning | |
CN108090954A (en) | Abdominal cavity environmental map based on characteristics of image rebuilds the method with laparoscope positioning | |
Wu et al. | Three-dimensional modeling from endoscopic video using geometric constraints via feature positioning | |
CN110992431B (en) | Combined three-dimensional reconstruction method for binocular endoscope soft tissue image | |
US20220198693A1 (en) | Image processing method, device and computer-readable storage medium | |
CN112598649A (en) | 2D/3D spine CT non-rigid registration method based on generation of countermeasure network | |
CN116452752A (en) | Intestinal wall reconstruction method combining monocular dense SLAM and residual error network | |
CN111260765A (en) | Dynamic three-dimensional reconstruction method for microsurgery operative field | |
WO2024050918A1 (en) | Endoscope positioning method, electronic device, and non-transitory computer-readable storage medium | |
Liu et al. | Sparse-to-dense coarse-to-fine depth estimation for colonoscopy | |
CN114399527A (en) | Method and device for unsupervised depth and motion estimation of monocular endoscope | |
CN115018890A (en) | Three-dimensional model registration method and system | |
WO2021213053A1 (en) | System and method for estimating motion of target inside tissue on basis of soft tissue surface deformation | |
Luo et al. | Bronchoscopy navigation beyond electromagnetic tracking systems: a novel bronchoscope tracking prototype | |
CN114298986A (en) | Thoracic skeleton three-dimensional construction method and system based on multi-viewpoint disordered X-ray film | |
CN113538335A (en) | In-vivo relative positioning method and device of wireless capsule endoscope | |
CN114092643A (en) | Soft tissue self-adaptive deformation method based on mixed reality and 3DGAN |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22957884; Country of ref document: EP; Kind code of ref document: A1 |