CN112802185B - Endoscope image three-dimensional reconstruction method and system facing minimally invasive surgery space perception - Google Patents

Endoscope image three-dimensional reconstruction method and system facing minimally invasive surgery space perception

Info

Publication number
CN112802185B
CN112802185B (application CN202110106321.XA)
Authority
CN
China
Prior art keywords
point cloud
image
depth
endoscope
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110106321.XA
Other languages
Chinese (zh)
Other versions
CN112802185A (en)
Inventor
李霄剑
李玲
丁帅
杨善林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology filed Critical Hefei University of Technology
Priority to CN202110106321.XA priority Critical patent/CN112802185B/en
Publication of CN112802185A publication Critical patent/CN112802185A/en
Application granted granted Critical
Publication of CN112802185B publication Critical patent/CN112802185B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
          • G06T 7/00 Image analysis
            • G06T 7/0002 Inspection of images, e.g. flaw detection
              • G06T 7/0012 Biomedical image inspection
            • G06T 7/30 Determination of transform parameters for the alignment of images, i.e. image registration
              • G06T 7/33 Image registration using feature-based methods
            • G06T 7/50 Depth or shape recovery
          • G06T 2207/00 Indexing scheme for image analysis or image enhancement
            • G06T 2207/10 Image acquisition modality
              • G06T 2207/10028 Range image; depth image; 3D point clouds
              • G06T 2207/10068 Endoscopic image
            • G06T 2207/20 Special algorithmic details
              • G06T 2207/20081 Training; learning
              • G06T 2207/20084 Artificial neural networks [ANN]
            • G06T 2207/30 Subject of image; context of image processing
              • G06T 2207/30004 Biomedical image processing
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 3/00 Computing arrangements based on biological models
            • G06N 3/02 Neural networks
              • G06N 3/04 Architecture, e.g. interconnection topology
              • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Quality & Reliability (AREA)
  • Radiology & Medical Imaging (AREA)
  • Medical Informatics (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Endoscopes (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an endoscope image three-dimensional reconstruction method and system for spatial perception in minimally invasive surgery, and relates to the technical field of three-dimensional reconstruction. The method acquires an endoscope image and performs depth estimation on its current frame with a preset multi-task neural network model to obtain the point cloud depth of the current frame; acquires a local point cloud from the point cloud depth and the camera model; registers and fuses the local point clouds; and stitches the registered and fused local point clouds into a global point cloud that deforms non-rigidly over time, which is displayed visually. The invention overcomes the technical problem that existing deep-learning-based endoscope image three-dimensional reconstruction methods can only estimate the depth of field of the current endoscope image and cannot reconstruct and dynamically update the whole three-dimensional model, and thereby realizes endoscope image three-dimensional reconstruction for spatial perception in minimally invasive surgery.

Description

Endoscope image three-dimensional reconstruction method and system facing minimally invasive surgery space perception
Technical Field
The invention relates to the technical field of three-dimensional reconstruction, in particular to a minimally invasive surgery space perception-oriented endoscope image three-dimensional reconstruction method and system.
Background
Minimally invasive surgery refers to surgery performed with modern medical instruments such as endoscopes and related equipment. Over the past decade, owing to advantages such as small incisions, less pain, less bleeding and faster recovery, minimally invasive surgery has become an important diagnostic and therapeutic approach in many departments, including general surgery, urology, neurosurgery and cardiac surgery.
In minimally invasive surgery, the limited field of view of the endoscope makes it difficult for the surgeon to obtain comprehensive information about the in-vivo environment. In addition, organ displacement between the preoperative and intraoperative stages, as well as intraoperative manipulation, may cause the loss of anatomical features, which complicates intraoperative tasks such as locating, suturing and cutting the lesion and reduces surgical precision. Three-dimensional reconstruction of the in-vivo model can address these problems and assist minimally invasive surgery.
Existing deep-learning-based endoscope image three-dimensional reconstruction methods can only estimate the depth of field of the current endoscope image and cannot reconstruct and dynamically update the whole three-dimensional model.
Disclosure of Invention
Technical problem to be solved
Aiming at the shortcomings of the prior art, the invention provides an endoscope image three-dimensional reconstruction method and system for spatial perception in minimally invasive surgery, which solves the technical problem that existing methods cannot reconstruct and dynamically update the whole three-dimensional model.
(II) technical scheme
In order to achieve the purpose, the invention is realized by the following technical scheme:
the invention provides a minimally invasive surgery space perception-oriented endoscope image three-dimensional reconstruction method, which comprises the following steps:
S1, acquiring an endoscope image;
S2, performing depth estimation on the current frame of the endoscope image based on a preset multi-task neural network model to obtain the point cloud depth of the current frame;
S3, acquiring a local point cloud based on the point cloud depth and a camera model;
S4, performing registration and fusion on the local point clouds;
S5, stitching the registered and fused local point clouds to form a global point cloud that deforms non-rigidly over time, and displaying the global point cloud visually.
Preferably, the preset multi-task neural network model comprises three types of convolution blocks and a global pooling layer, the three types of convolution blocks being convolution block I, convolution block II and convolution block III, and the processing of the endoscope image by the multi-task neural network model comprises:
extracting feature maps from a pair of endoscope image frames through two convolution blocks I to obtain a first feature map and a second feature map, wherein the two convolution blocks I share their network parameter weights;
concatenating the first feature map and the second feature map, and extracting features from the concatenated feature map through convolution block II to obtain the inter-frame motion vector estimation features;
pooling the inter-frame motion vector estimation features through the global pooling layer to obtain the camera motion vector between the two endoscope image frames;
performing adapted feature extraction on the concatenated feature map through convolution block III to obtain the depth information features; the second feature map is connected to the depth information features through skip connections, and a multi-scale disparity map for the second endoscope image is output.
Preferably, the training process of the preset multi-task neural network model comprises:
acquiring and processing endoscope images;
inputting the processed endoscope images into an initial neural network model, and training the initial neural network model in a self-supervised manner to obtain the multi-task neural network model;
wherein the loss functions used during training comprise:
camera inter-frame motion estimation loss:

L_motion = Huber_δ1(t, t̂) + Huber_δ2(r, r̂)

where t̂ denotes the camera translation vector predicted by the neural network model; r̂ denotes the camera rotation vector predicted by the neural network model; δ1 and δ2 are the parameters of the Huber function applied to the translation vector and the rotation vector, respectively;

the Huber function Huber_δ(y, ŷ) is calculated as:

Huber_δ(y, ŷ) = (1/2)(y - ŷ)^2, if |y - ŷ| ≤ δ
Huber_δ(y, ŷ) = δ(|y - ŷ| - δ/2), otherwise

where y and ŷ denote the two numbers to be compared;
the image reconstruction loss comprises a pixel error loss and a similarity error loss, specifically:

pixel error loss:

L_pixel = (1 / (M·N)) · Σ_{i=1..M} Σ_{j=1..N} Huber_θ(I(i, j), Î(i, j))

where M and N denote the pixel width and height of the image, respectively; I(i, j) denotes the true pixel value of the second image at coordinate (i, j); Î(i, j) denotes the pixel value at coordinate (i, j) of the second image reconstructed by the algorithm; and θ is the parameter of the Huber function applied to the pixel error;
similarity error loss:

L_sim = 1 - Sim(I, Î)

where Sim denotes an image similarity evaluation function whose value lies between 0 and 1; I denotes the real second image; and Î denotes the second image reconstructed by the algorithm;
depth smoothing error loss:

L_smooth = (1 / (M·N)) · Σ_{i,j} ( |D(i+1, j) - D(i, j)| + |D(i, j+1) - D(i, j)| )

where D(i, j) denotes the reciprocal of the estimated depth of the second image at coordinate (i, j);
the total loss function is a weighted sum of the above loss functions, and the weights of the individual terms are obtained through neural network hyper-parameter learning.
Preferably, the S3 includes:
distortion correction is performed on the endoscope image based on the camera distortion parameters, and the pixel value restoration step for an undistorted image pixel coordinate (u, v) comprises:
for the normalized plane of the undistorted image:

x′ = (u - c_x) / f_x
y′ = (v - c_y) / f_y

where (x′, y′) denotes the coordinates on the normalized plane corresponding to the undistorted image pixel coordinate (u, v);
the coordinates are then distorted, and the coordinates on the distorted normalized plane are (x″, y″), with:

x″ = x′·(1 + k_1·r^2 + k_2·r^4 + k_3·r^6) + 2·p_1·x′·y′ + p_2·(r^2 + 2·x′^2)
y″ = y′·(1 + k_1·r^2 + k_2·r^4 + k_3·r^6) + p_1·(r^2 + 2·y′^2) + 2·p_2·x′·y′

where r^2 = x′^2 + y′^2;
projecting the distorted normalized plane coordinates onto the pixel plane gives the pixel coordinates:

u_d = f_x·x″ + c_x
v_d = f_y·y″ + c_y

therefore, the pixel value at the undistorted image coordinate (u, v) is the pixel value at the distorted image coordinate (u_d, v_d); since u_d and v_d are usually non-integers, the pixel value at (u_d, v_d) can be obtained by bilinear interpolation;
bilinear interpolation is as follows:
if u_d and v_d are both non-integers, take u_1 < u_d < u_1 + 1 and v_1 < v_d < v_1 + 1, where u_1 and v_1 are integers; then:

I(u_d, v_d) = (v_1 + 1 - v_d)·I(u_d, v_1) + (v_d - v_1)·I(u_d, v_1 + 1)

where

I(u_d, v_1) = (u_1 + 1 - u_d)·I(u_1, v_1) + (u_d - u_1)·I(u_1 + 1, v_1)
I(u_d, v_1 + 1) = (u_1 + 1 - u_d)·I(u_1, v_1 + 1) + (u_d - u_1)·I(u_1 + 1, v_1 + 1);
solving x and y of the point cloud from the restored pixel coordinates and the camera model, specifically:

x = z·(u - c_x) / f_x
y = z·(v - c_y) / f_y

and taking the point cloud depth from step S2 as z, together with x and y of the point cloud, the local point cloud in the endoscope camera coordinate system corresponding to the current frame is obtained.
Preferably, the S4 includes:
if the endoscope is supported by the robot, the camera pose corresponding to each frame of image is obtained, and the inter-frame motion information of the endoscope camera is obtained through pose conversion;
and taking the inter-frame motion information as the initial value for point cloud registration, and performing registration and fusion on the local point clouds by using a coherent point drift algorithm.
Preferably, the S4 further includes:
if the endoscope is not supported by the robot, acquiring interframe motion information through a multitask neural network model;
and taking the inter-frame motion information as the initial value for point cloud registration, and performing registration and fusion on the local point clouds by using a coherent point drift algorithm.
Preferably, the S5 includes:
and stitching the registered and fused local point clouds using a dynamic update mechanism to form a global point cloud that deforms non-rigidly over time, and displaying the global point cloud visually with a three-dimensional data processing library.
The invention also provides a minimally invasive surgery space perception oriented endoscope image three-dimensional reconstruction system, which is characterized by comprising:
the acquisition module is used for acquiring an endoscope image;
the depth estimation module is used for carrying out depth estimation on a current frame of the endoscope image based on a preset multitask neural network model to obtain the point cloud depth of the current frame;
the local point cloud obtaining module is used for obtaining a local point cloud based on the point cloud depth and the camera model;
the registration fusion module is used for performing registration fusion on the local point clouds;
and the global point cloud generating module is used for stitching the registered and fused local point clouds to form a global point cloud that deforms non-rigidly over time, and displaying the global point cloud visually.
The invention also provides a computer readable storage medium for storing program code for performing the method of any of claims 1 to 7.
The present invention also provides an electronic device, comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the method of any one of claims 1 to 7 according to instructions in the program code.
(III) advantageous effects
The invention provides an endoscope image three-dimensional reconstruction method and system for spatial perception in minimally invasive surgery. Compared with the prior art, the invention has the following beneficial effects:
The method acquires an endoscope image and performs depth estimation on its current frame with a preset multi-task neural network model to obtain the point cloud depth of the current frame; acquires a local point cloud from the point cloud depth and the camera model; registers and fuses the local point clouds; and stitches the registered and fused local point clouds into a global point cloud that deforms non-rigidly over time, which is displayed visually. The invention overcomes the technical problem that existing deep-learning-based endoscope image three-dimensional reconstruction methods can only estimate the depth of field of the current endoscope image and cannot reconstruct and dynamically update the whole three-dimensional model, and thereby realizes endoscope image three-dimensional reconstruction for spatial perception in minimally invasive surgery.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a block diagram of a minimally invasive surgery space perception oriented endoscopic image three-dimensional reconstruction method according to an embodiment of the invention;
fig. 2 is a structural diagram of a multitasking neural network model in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are clearly and completely described, and it is obvious that the described embodiments are a part of the embodiments of the present invention, but not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
The embodiment of the application provides a minimally invasive surgery space perception-oriented endoscope image three-dimensional reconstruction method and system, solves the technical problem that the existing method cannot reconstruct and dynamically update the whole three-dimensional model, and achieves minimally invasive surgery space perception-oriented endoscope image three-dimensional reconstruction.
In order to solve the technical problems, the general idea of the embodiment of the application is as follows:
three-dimensional reconstruction of endoscope images can assist minimally invasive surgery, give the surgeon a better perceptual experience and improve surgical precision. Existing deep-learning-based endoscope image three-dimensional reconstruction methods are limited to three-dimensional reconstruction (depth estimation) of a single image and rarely address global three-dimensional reconstruction. Most research on global three-dimensional reconstruction has focused on non-rigid registration against a preoperative CT/MRI three-dimensional model, and such methods fail when no preoperative three-dimensional model is available. To solve the above problems, the method of the embodiment of the invention is proposed, which overcomes the limitation that existing deep-learning-based endoscope image three-dimensional reconstruction systems can only estimate the depth of field of the current endoscope image and cannot reconstruct and dynamically update the whole three-dimensional model; moreover, the embodiment of the invention does not require support from preoperative CT/MRI images and realizes unsupervised, real-time, dynamic global three-dimensional reconstruction of endoscope images.
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
The embodiment of the invention provides a minimally invasive surgery space perception-oriented endoscope image three-dimensional reconstruction method, which comprises the following steps of S1-S5:
S1, acquiring an endoscope image;
S2, performing depth estimation on the current frame of the endoscope image based on a preset multi-task neural network model to obtain the point cloud depth of the current frame;
S3, acquiring a local point cloud based on the point cloud depth and the camera model;
S4, performing registration and fusion on the local point clouds;
S5, stitching the registered and fused local point clouds to form a global point cloud that deforms non-rigidly over time, and displaying the global point cloud visually.
The embodiment of the invention overcomes the technical problems that the existing endoscope image three-dimensional reconstruction method based on deep learning can only estimate the depth of field information of the current endoscope image and cannot reconstruct and dynamically update the whole three-dimensional model, and realizes the endoscope image three-dimensional reconstruction facing minimally invasive surgery space perception.
The following describes each step in detail:
in step S1, an endoscopic image is acquired. The specific implementation process is as follows:
The endoscope is calibrated with OpenCV and a checkerboard to obtain the camera intrinsic parameters f_x, f_y, c_x, c_y and the distortion parameters k_1, k_2, k_3, p_1, p_2, where k_1, k_2, k_3 are radial distortion parameters and p_1, p_2 are tangential distortion parameters.
Soft tissue images are captured with the calibrated endoscope, the endoscope images are acquired with OpenCV, and their resolution is adjusted to match the input requirement of the multi-task neural network model.
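As an illustration of this step, the following is a minimal sketch of checkerboard calibration and image capture with OpenCV in Python; the board size, image paths, video source index and network input resolution are assumed values, not taken from the patent.

```python
# Minimal sketch (assumed values) of step S1: checkerboard calibration and frame capture.
import glob
import cv2
import numpy as np

pattern = (9, 6)                                      # inner checkerboard corners (assumption)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_pts, img_pts = [], []
for path in glob.glob("calib/*.png"):                 # hypothetical calibration image folder
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_pts.append(objp)
        img_pts.append(corners)

# Calibration yields the intrinsics f_x, f_y, c_x, c_y and the distortion coefficients.
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(obj_pts, img_pts, gray.shape[::-1], None, None)
fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
k1, k2, p1, p2, k3 = dist.ravel()[:5]                 # OpenCV order: k1, k2, p1, p2, k3

cap = cv2.VideoCapture(0)                             # endoscope video source (assumed index)
ok, frame = cap.read()
if ok:
    frame = cv2.resize(frame, (320, 256))             # network input resolution (assumption)
```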
In step S2, depth estimation is performed on the current frame of the endoscopic image based on a preset multitask neural network model, and a point cloud depth of the current frame is obtained. The specific implementation process is as follows:
in the embodiment of the present invention, the process of constructing the preset multitask neural network model includes:
a1, acquiring and processing an endoscope image, comprising:
The endoscope is calibrated with OpenCV and a checkerboard to obtain the camera intrinsic parameters f_x, f_y, c_x, c_y and the distortion parameters k_1, k_2, k_3, p_1, p_2, where k_1, k_2, k_3 are radial distortion parameters and p_1, p_2 are tangential distortion parameters.
Soft tissue images are captured with the calibrated, robot-held endoscope, the endoscope images are acquired with OpenCV, and their resolution is adjusted to match the model input. While the endoscope images are acquired, the camera pose corresponding to each frame is solved from the forward kinematics model of the robot, and the pose transformation between every two endoscope image frames is calculated.
Forward kinematics modeling uses the robot's equations of motion to solve the pose of the end effector from the state parameters of each robot joint, and then transforms the end-effector pose into the endoscope camera pose, finally yielding the pose of the endoscope camera. A common forward kinematics solution method is the D-H (Denavit-Hartenberg) parameter method. This process is common knowledge to those skilled in the art and is not described further here.
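For illustration, the following is a minimal sketch in Python/NumPy of how the per-frame camera pose can be obtained under the standard D-H convention: the joint transforms are chained to get the end-effector pose, and a fixed hand-eye transform T_ee_cam (assumed known from hand-eye calibration) maps it to the endoscope camera pose. The patent does not prescribe a specific implementation; the function names and parameters are assumptions.

```python
# Minimal sketch of forward kinematics with D-H parameters (assumed convention).
import numpy as np

def dh_transform(theta, d, a, alpha):
    ct, st, ca, sa = np.cos(theta), np.sin(theta), np.cos(alpha), np.sin(alpha)
    return np.array([[ct, -st * ca,  st * sa, a * ct],
                     [st,  ct * ca, -ct * sa, a * st],
                     [0.0,      sa,       ca,      d],
                     [0.0,     0.0,      0.0,    1.0]])

def camera_pose(dh_params, joint_angles, T_ee_cam):
    """dh_params: list of (d, a, alpha) per joint; joint_angles: theta per joint."""
    T = np.eye(4)
    for (d, a, alpha), theta in zip(dh_params, joint_angles):
        T = T @ dh_transform(theta, d, a, alpha)      # end-effector pose in the robot base frame
    return T @ T_ee_cam                               # endoscope camera pose

# The inter-frame motion used later as the registration initial value is then
# T_rel = inv(T_cam_i) @ T_cam_j for two consecutive frames i and j.
```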
And A2, inputting the processed endoscope image into an initial neural network model, and training the initial neural network model to obtain the multitask neural network model. The method specifically comprises the following steps:
the initial neural network model takes as input any two endoscope image frames that share a sufficient number of matching points, and outputs the camera pose transformation vector between the two frames, comprising a rotation (r_x, r_y, r_z) and a translation (t_x, t_y, t_z), as well as the reciprocal of the depth of each pixel of the latter frame.
The structure of the multi-task neural network model is shown in fig. 2 and comprises three types of convolution blocks and one global pooling layer, where a convolution block denotes a block composed of a series of convolutional layers.
The feature maps obtained from the two endoscope image frames are concatenated and then passed through convolution block II, which performs feature extraction suited to inter-frame motion vector estimation and yields the inter-frame motion vector estimation features; the global pooling layer then produces the camera motion vector between the two endoscope image frames. Meanwhile, the concatenated feature map is passed through convolution block III, which performs feature extraction suited to solving the depth information of the second endoscope image and yields the depth information features; the multi-scale feature maps produced when the second endoscope image passes through convolution block I are connected by skip connections to the depth information features generated by convolution block III, and finally the multi-scale disparity map (i.e., a matrix formed by the reciprocals of the depths) for the second endoscope image is output.
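To make the data flow concrete, the following is a minimal sketch of such a two-branch network in Python/PyTorch. The channel counts, block depths and the single-scale disparity head are illustrative assumptions; the patent specifies only the overall structure (shared convolution block I per frame, convolution block II plus global pooling for the motion vector, convolution block III with skip connections for the disparity map).

```python
# Minimal sketch (assumed layer sizes) of the multi-task network described above.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, stride=2):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1),
        nn.ReLU(inplace=True),
    )

class MultiTaskNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Convolution block I: shared encoder applied to each frame separately.
        self.block1 = nn.ModuleList([conv_block(3, 32), conv_block(32, 64), conv_block(64, 128)])
        # Convolution block II: motion branch on the concatenated features.
        self.block2 = nn.Sequential(conv_block(256, 256), conv_block(256, 256))
        self.pose_head = nn.Conv2d(256, 6, kernel_size=1)            # 3 rotation + 3 translation
        # Convolution block III: depth branch (decoder) with skip connections.
        self.block3 = nn.ModuleList([
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1),
            nn.ConvTranspose2d(128 + 64, 64, 4, stride=2, padding=1),  # skip from encoder level 2
            nn.ConvTranspose2d(64 + 32, 32, 4, stride=2, padding=1),   # skip from encoder level 1
        ])
        self.disp_head = nn.Conv2d(32, 1, kernel_size=3, padding=1)

    def encode(self, x):
        feats = []
        for layer in self.block1:
            x = layer(x)
            feats.append(x)
        return feats                                   # multi-scale feature maps

    def forward(self, frame1, frame2):
        f1, f2 = self.encode(frame1), self.encode(frame2)
        fused = torch.cat([f1[-1], f2[-1]], dim=1)     # concatenate the deepest feature maps
        # Motion branch: block II, then global average pooling -> 6-DoF motion vector.
        motion = self.pose_head(self.block2(fused)).mean(dim=[2, 3])
        # Depth branch: block III with skips from the second frame's encoder features.
        d = self.block3[0](fused)
        d = self.block3[1](torch.cat([d, f2[1]], dim=1))
        d = self.block3[2](torch.cat([d, f2[0]], dim=1))
        disparity = torch.sigmoid(self.disp_head(d))   # inverse depth of the second frame
        return motion, disparity
```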
In the training process, the network branch for camera inter-frame motion estimation is trained first; after that training, the weights of the shared part are fixed and the network weights of the depth estimation part are trained further. The advantage of this is that the model can achieve a good result even when the scales of the two tasks are not uniform.
In the training process of the model, a self-supervision mode is adopted for training. The loss function is as follows:
Camera inter-frame motion estimation loss:

L_motion = Huber_δ1(t, t̂) + Huber_δ2(r, r̂)

where t̂ denotes the camera translation vector predicted by the neural network model; r̂ denotes the camera rotation vector predicted by the neural network model; δ1 and δ2 are the parameters of the Huber function applied to the translation vector and the rotation vector, respectively.

The Huber function Huber_δ(y, ŷ) is calculated as:

Huber_δ(y, ŷ) = (1/2)(y - ŷ)^2, if |y - ŷ| ≤ δ
Huber_δ(y, ŷ) = δ(|y - ŷ| - δ/2), otherwise

where y and ŷ denote the two numbers to be compared.
The image reconstruction loss comprises a pixel error loss and a similarity error loss, specifically:

Pixel error loss:

L_pixel = (1 / (M·N)) · Σ_{i=1..M} Σ_{j=1..N} Huber_θ(I(i, j), Î(i, j))

where M and N denote the pixel width and height of the image, respectively; I(i, j) denotes the true pixel value of the second image at coordinate (i, j); Î(i, j) denotes the pixel value at coordinate (i, j) of the second image reconstructed by the algorithm; and θ is the parameter of the Huber function applied to the pixel error.
Similarity error loss:

L_sim = 1 - Sim(I, Î)

where Sim denotes an image similarity evaluation function (SSIM, PSNR-based measures and the like may be used) whose value lies between 0 and 1; I denotes the actual second image; and Î denotes the second image reconstructed by the algorithm.
Depth smoothing error loss:

L_smooth = (1 / (M·N)) · Σ_{i,j} ( |D(i+1, j) - D(i, j)| + |D(i, j+1) - D(i, j)| )

where D(i, j) denotes the reciprocal of the estimated depth of the second image at coordinate (i, j).
The final total loss function is a weighted sum of the above loss functions; the weights of the individual terms are obtained through neural network hyper-parameter learning.
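The following is a minimal sketch of these losses in Python/PyTorch. The Huber thresholds, the similarity function and the exact smoothness form are assumptions consistent with the reconstructions given above, and the loss weights are assumed hyper-parameters.

```python
# Minimal sketch (assumed forms) of the self-supervised training losses.
import torch

def huber(y, y_hat, delta):
    # Huber(y, y_hat): quadratic within delta, linear outside.
    err = torch.abs(y - y_hat)
    return torch.where(err <= delta, 0.5 * err ** 2, delta * (err - 0.5 * delta))

def motion_loss(t, t_hat, r, r_hat, delta1, delta2):
    # Camera inter-frame motion estimation loss on translation and rotation vectors.
    return huber(t, t_hat, delta1).sum() + huber(r, r_hat, delta2).sum()

def pixel_loss(I, I_hat, theta):
    # Mean Huber error between the real and reconstructed second image.
    return huber(I, I_hat, theta).mean()

def similarity_loss(I, I_hat, sim_fn):
    # 1 - Sim(I, I_hat), with Sim in [0, 1] (e.g. SSIM).
    return 1.0 - sim_fn(I, I_hat)

def smoothness_loss(D):
    # First-order smoothness on the inverse-depth map D (assumed form).
    dx = torch.abs(D[:, :, :, 1:] - D[:, :, :, :-1])
    dy = torch.abs(D[:, :, 1:, :] - D[:, :, :-1, :])
    return dx.mean() + dy.mean()

def total_loss(losses, weights):
    # Weighted sum; the weights come from hyper-parameter learning in the patent.
    return sum(w * l for w, l in zip(weights, losses))
```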
It should be noted that after the model is trained, the model can be used repeatedly without repeated training.
In the actual use process, endoscope image data in the use process can be collected, the model is updated regularly, and the precision of the model is guaranteed.
Depth estimation is performed on the current frame with the trained multi-task neural network model using a time window of M frames, where M is usually 3: assuming the current frame is frame i, each of frames i-3, i-2 and i-1 is paired with frame i and fed into the neural network model to compute a depth for frame i, and the average of the results is taken as the depth of frame i.
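As a small illustration, a sketch of this time-window averaging in Python follows; the model call signature is an assumption.

```python
# Minimal sketch of M-frame time-window depth estimation (assumed model interface).
import numpy as np

def windowed_depth(model, frames, i, M=3):
    """frames: list of images; model(prev, cur) -> depth map of cur (assumed signature)."""
    depths = [model(frames[i - k], frames[i]) for k in range(1, M + 1) if i - k >= 0]
    return np.mean(depths, axis=0)   # average of the per-pair estimates for frame i
```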
In step S3, a local point cloud is acquired based on the point cloud depth and the camera model. The specific implementation process is as follows:
first, distortion correction is performed on the endoscope image based on the camera distortion parameters. For an undistorted image pixel coordinate (u, v), the pixel value restoration steps are as follows:
for the normalized plane of the undistorted image:

x′ = (u - c_x) / f_x
y′ = (v - c_y) / f_y

where (x′, y′) denotes the coordinates on the normalized plane corresponding to the undistorted image pixel coordinate (u, v).
The coordinates are then distorted, and the coordinates on the distorted normalized plane are (x″, y″), with:

x″ = x′·(1 + k_1·r^2 + k_2·r^4 + k_3·r^6) + 2·p_1·x′·y′ + p_2·(r^2 + 2·x′^2)
y″ = y′·(1 + k_1·r^2 + k_2·r^4 + k_3·r^6) + p_1·(r^2 + 2·y′^2) + 2·p_2·x′·y′

where r^2 = x′^2 + y′^2.
The distorted normalized plane coordinates are projected onto the pixel plane to obtain the pixel coordinates:

u_d = f_x·x″ + c_x
v_d = f_y·y″ + c_y

Therefore, the pixel value at the undistorted image coordinate (u, v) is the pixel value at the distorted image coordinate (u_d, v_d). Since u_d and v_d are usually non-integers, the pixel value at (u_d, v_d) can be obtained by bilinear interpolation.
Bilinear interpolation is as follows:
if u_d and v_d are both non-integers, take u_1 < u_d < u_1 + 1 and v_1 < v_d < v_1 + 1, where u_1 and v_1 are integers; then:

I(u_d, v_d) = (v_1 + 1 - v_d)·I(u_d, v_1) + (v_d - v_1)·I(u_d, v_1 + 1)

where

I(u_d, v_1) = (u_1 + 1 - u_d)·I(u_1, v_1) + (u_d - u_1)·I(u_1 + 1, v_1)
I(u_d, v_1 + 1) = (u_1 + 1 - u_d)·I(u_1, v_1 + 1) + (u_d - u_1)·I(u_1 + 1, v_1 + 1)
then solving x and y of the point cloud according to the camera model;
the solving formulas are as follows:

x = z·(u - c_x) / f_x
y = z·(v - c_y) / f_y

With x and y of the point cloud, the depth from step S2 is taken as z, and the local point cloud in the endoscope camera coordinate system corresponding to the current frame is obtained.
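For illustration, the following is a minimal sketch of this step in Python with OpenCV and NumPy; cv2.undistort is used here as one possible way to realize the undistortion and bilinear resampling described above, and the helper name and argument layout are assumptions.

```python
# Minimal sketch (assumed helper) of step S3: undistort and back-project to a point cloud.
import cv2
import numpy as np

def local_point_cloud(image, depth, fx, fy, cx, cy, dist_coeffs):
    """dist_coeffs in OpenCV order (k1, k2, p1, p2, k3); depth is the per-pixel z map."""
    K = np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1]], dtype=np.float64)
    # cv2.undistort performs the normalized-plane distortion model and bilinear
    # resampling described in the text above.
    undistorted = cv2.undistort(image, K, np.asarray(dist_coeffs, dtype=np.float64))
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = z * (u - cx) / fx
    y = z * (v - cy) / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    colors = undistorted.reshape(-1, image.shape[2]) if image.ndim == 3 else None
    return points, colors
```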
In step S4, registration fusion is performed on the plurality of local point clouds. The specific implementation process is as follows:
in the specific implementation, the local point cloud needs to be filtered before registration and fusion; in the embodiment of the invention, a filtering algorithm is used to remove outliers and noise from the local point cloud.
Registration fusion is divided into two cases:
in the first situation, an endoscope is supported by a robot, the camera pose corresponding to each frame of image is obtained, and the motion information between frames of the endoscope camera is obtained through pose conversion.
And in the second case, the endoscope is not supported by a robot, and the interframe motion information is obtained through a multitask neural network model.
The inter-frame motion information is taken as the initial value for point cloud registration, and the local point clouds are then registered and fused with a coherent point drift algorithm suited to non-rigid point cloud registration.
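As an illustration, the following is a minimal sketch of this registration step in Python, assuming the open-source pycpd package as one possible coherent point drift implementation (the patent does not name a library); the rigid initial alignment comes from the inter-frame motion estimate, and the function name and argument layout are assumptions.

```python
# Minimal sketch of non-rigid CPD registration with a rigid initial alignment (assumed).
from pycpd import DeformableRegistration

def register_local_cloud(global_pts, local_pts, R_init, t_init):
    # Apply the inter-frame motion estimate (from robot kinematics or the network)
    # as the initial value before the non-rigid CPD refinement.
    init = local_pts @ R_init.T + t_init
    reg = DeformableRegistration(X=global_pts, Y=init)
    aligned, _ = reg.register()
    return aligned
```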
In step S5, the registered and fused local point clouds are spliced to form a global point cloud that is flexibly transformed over time, and the global point cloud is visually displayed. The specific implementation process is as follows:
and splicing the point clouds by adopting a dynamic updating mechanism, and performing visual display on the global point clouds by adopting PCL (polycaprolactone), Open3D, Chai3D and other libraries to form the global point clouds flexibly transformed along with the time.
Based on the same inventive concept, the embodiment of the invention also provides an endoscope image three-dimensional reconstruction system facing minimally invasive surgery space perception, which comprises:
the acquisition module is used for acquiring an endoscope image;
the depth estimation module is used for carrying out depth estimation on a current frame of the endoscope image based on a preset multitask neural network model to obtain the point cloud depth of the current frame;
the local point cloud obtaining module is used for obtaining a local point cloud based on the point cloud depth and the camera model;
the registration fusion module is used for performing registration fusion on the local point clouds;
and the global point cloud generating module is used for stitching the registered and fused local point clouds to form a global point cloud that deforms non-rigidly over time, and displaying the global point cloud visually.
It can be understood that the endoscope image three-dimensional reconstruction system facing minimally invasive surgery space sensing provided by the embodiment of the invention corresponds to the endoscope image three-dimensional reconstruction method facing minimally invasive surgery space sensing provided by the invention, and explanations, examples, beneficial effects and other parts of relevant contents can refer to corresponding parts in the endoscope image three-dimensional reconstruction method facing minimally invasive surgery space sensing, and are not repeated here.
Based on the same inventive concept, the embodiment of the invention also provides a computer-readable storage medium for storing program codes, wherein the program codes are used for executing the endoscopic image three-dimensional reconstruction method facing the minimally invasive surgery space perception.
Based on the same inventive concept, an embodiment of the present invention further provides an electronic device, where the electronic device includes a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is used for executing the endoscopic image three-dimensional reconstruction method facing minimally invasive surgery space perception according to the instructions in the program code.
In summary, compared with the prior art, the method has the following beneficial effects:
1. The embodiment of the invention overcomes the limitation that existing deep-learning-based endoscope image three-dimensional reconstruction methods can only estimate the depth of field of the current endoscope image and cannot reconstruct and dynamically update the whole three-dimensional model, and realizes endoscope image three-dimensional reconstruction for spatial perception in minimally invasive surgery.
2. The training data of the multi-task neural network model of the embodiment of the invention only require a robot-held endoscope to acquire endoscope image data and camera pose data; no depth information is needed. The data are easy to obtain and the applicability is strong.
3. The embodiment of the invention designs a multi-task neural network model in which a single network recovers both the depth-of-field information and the camera inter-frame motion information.
4. The embodiment of the invention realizes unsupervised, real-time, dynamic global three-dimensional reconstruction without support from preoperative CT/MRI images.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (7)

1. A minimally invasive surgery space perception-oriented endoscopic image three-dimensional reconstruction method is characterized by comprising the following steps:
S1, acquiring an endoscope image;
S2, performing depth estimation on the current frame of the endoscope image based on a preset multi-task neural network model to obtain the point cloud depth of the current frame;
S3, acquiring a local point cloud based on the point cloud depth and a camera model;
S4, performing registration and fusion on the local point clouds;
S5, stitching the registered and fused local point clouds to form a global point cloud that deforms non-rigidly over time, and displaying the global point cloud visually;
wherein,
the preset multi-task neural network model comprises three types of convolution blocks and a global pooling layer, the three types of convolution blocks being convolution block I, convolution block II and convolution block III, and the processing of the endoscope image by the multi-task neural network model comprises:
extracting feature maps from a pair of endoscope image frames through two convolution blocks I to obtain a first feature map and a second feature map, wherein the two convolution blocks I share their network parameter weights;
concatenating the first feature map and the second feature map, and extracting features from the concatenated feature map through convolution block II to obtain the inter-frame motion vector estimation features;
pooling the inter-frame motion vector estimation features through the global pooling layer to obtain the camera motion vector between the two endoscope image frames;
performing adapted feature extraction on the concatenated feature map through convolution block III to obtain the depth information features; the second feature map is connected to the depth information features through skip connections, and a multi-scale disparity map for the second endoscope image is output;
the training process of the preset multi-task neural network model comprises:
acquiring and processing endoscope images;
inputting the processed endoscope images into an initial neural network model, and training the initial neural network model in a self-supervised manner to obtain the multi-task neural network model;
wherein the loss functions used during training comprise:
camera inter-frame motion estimation loss:

L_motion = Huber_δ1(t, t̂) + Huber_δ2(r, r̂)

where t̂ denotes the camera translation vector predicted by the neural network model; r̂ denotes the camera rotation vector predicted by the neural network model; δ1 and δ2 are the parameters of the Huber function applied to the translation vector and the rotation vector, respectively;

the Huber function Huber_δ(y, ŷ) is calculated as:

Huber_δ(y, ŷ) = (1/2)(y - ŷ)^2, if |y - ŷ| ≤ δ
Huber_δ(y, ŷ) = δ(|y - ŷ| - δ/2), otherwise

where y and ŷ denote the two numbers to be compared;
the image reconstruction loss comprises a pixel error loss and a similarity error loss, specifically:

pixel error loss:

L_pixel = (1 / (M·N)) · Σ_{i=1..M} Σ_{j=1..N} Huber_θ(I(i, j), Î(i, j))

where M and N denote the pixel width and height of the image, respectively; I(i, j) denotes the true pixel value of the second image at coordinate (i, j); Î(i, j) denotes the pixel value at coordinate (i, j) of the second image reconstructed by the algorithm; and θ is the parameter of the Huber function applied to the pixel error;
similarity error loss:

L_sim = 1 - Sim(I, Î)

where Sim denotes an image similarity evaluation function whose value lies between 0 and 1; I denotes the real second image; and Î denotes the second image reconstructed by the algorithm;
depth smoothing error loss:

L_smooth = (1 / (M·N)) · Σ_{i,j} ( |D(i+1, j) - D(i, j)| + |D(i, j+1) - D(i, j)| )

where D(i, j) denotes the reciprocal of the estimated depth of the second image at coordinate (i, j);
the total loss function is a weighted sum of the above loss functions, and the weights of the individual terms are obtained through neural network hyper-parameter learning;
the S3 includes:
distortion correction is performed on the endoscope image based on the camera distortion parameters, and the pixel value restoration step for an undistorted image pixel coordinate (u, v) comprises:
for the normalized plane of the undistorted image:

x′ = (u - c_x) / f_x
y′ = (v - c_y) / f_y

where (x′, y′) denotes the coordinates on the normalized plane corresponding to the undistorted image pixel coordinate (u, v); c_x, c_y, f_x and f_y denote the intrinsic parameters of the endoscope camera;
after the coordinates are distorted, the coordinates on the distorted normalized plane are (x″, y″), with:

x″ = x′·(1 + k_1·r^2 + k_2·r^4) + 2·p_1·x′·y′ + p_2·(r^2 + 2·x′^2)
y″ = y′·(1 + k_1·r^2 + k_2·r^4) + p_1·(r^2 + 2·y′^2) + 2·p_2·x′·y′

where r^2 = x′^2 + y′^2; k_1 and k_2 are radial distortion parameters; p_1 and p_2 are tangential distortion parameters;
projecting the distorted normalized plane coordinates onto the pixel plane gives the pixel coordinates:

u_d = f_x·x″ + c_x
v_d = f_y·y″ + c_y

therefore, the pixel value at the undistorted image coordinate (u, v) is the pixel value at the distorted image coordinate (u_d, v_d); u_d and v_d are non-integers, and the pixel value at (u_d, v_d) is obtained by bilinear interpolation;
bilinear interpolation is as follows:
if u_d and v_d are both non-integers, take u_1 < u_d < u_1 + 1 and v_1 < v_d < v_1 + 1, where u_1 and v_1 are integers; then:

I(u_d, v_d) = (v_1 + 1 - v_d)·I(u_d, v_1) + (v_d - v_1)·I(u_d, v_1 + 1)

where

I(u_d, v_1) = (u_1 + 1 - u_d)·I(u_1, v_1) + (u_d - u_1)·I(u_1 + 1, v_1)
I(u_d, v_1 + 1) = (u_1 + 1 - u_d)·I(u_1, v_1 + 1) + (u_d - u_1)·I(u_1 + 1, v_1 + 1)
solving x and y of the point cloud from the restored pixel coordinates and the camera model, specifically:

x = z·(u - c_x) / f_x
y = z·(v - c_y) / f_y

and taking the point cloud depth from step S2 as z, together with x and y of the point cloud, the local point cloud in the endoscope camera coordinate system corresponding to the current frame is obtained.
2. The minimally invasive surgery spatially-aware-oriented endoscopic image three-dimensional reconstruction method according to claim 1, wherein said S4 includes:
if the endoscope is supported by the robot, the camera pose corresponding to each frame of image is obtained, and the inter-frame motion information of the endoscope camera is obtained through pose conversion;
and taking the inter-frame motion information as the initial value for point cloud registration, and performing registration and fusion on the local point clouds by using a coherent point drift algorithm.
3. The minimally invasive surgery spatially-aware-oriented endoscopic image three-dimensional reconstruction method according to claim 1, wherein said S4 further comprises:
if the endoscope is not supported by the robot, acquiring interframe motion information through a multitask neural network model;
and taking the inter-frame motion information as the initial value for point cloud registration, and performing registration and fusion on the local point clouds by using a coherent point drift algorithm.
4. The minimally invasive surgery spatially-aware-oriented endoscopic image three-dimensional reconstruction method according to claim 1, wherein said S5 includes:
and stitching the registered and fused local point clouds using a dynamic update mechanism to form a global point cloud that deforms non-rigidly over time, and displaying the global point cloud visually with a three-dimensional data processing library.
5. An endoscopic image three-dimensional reconstruction system facing minimally invasive surgery space perception, which is used for executing the method of any one of claims 1-4;
the method comprises the following steps:
the acquisition module is used for acquiring an endoscope image;
the depth estimation module is used for carrying out depth estimation on a current frame of the endoscope image based on a preset multitask neural network model to obtain the point cloud depth of the current frame;
the local point cloud obtaining module is used for obtaining a local point cloud based on the point cloud depth and the camera model;
the registration fusion module is used for performing registration fusion on the local point clouds;
and the global point cloud generating module is used for stitching the registered and fused local point clouds to form a global point cloud that deforms non-rigidly over time, and displaying the global point cloud visually.
6. A computer-readable storage medium, characterized in that the computer-readable storage medium is configured to store a program code for performing the method of any of claims 1 to 4.
7. An electronic device, comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the method of any one of claims 1-4 according to instructions in the program code.
CN202110106321.XA 2021-01-26 2021-01-26 Endoscope image three-dimensional reconstruction method and system facing minimally invasive surgery space perception Active CN112802185B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110106321.XA CN112802185B (en) 2021-01-26 2021-01-26 Endoscope image three-dimensional reconstruction method and system facing minimally invasive surgery space perception

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110106321.XA CN112802185B (en) 2021-01-26 2021-01-26 Endoscope image three-dimensional reconstruction method and system facing minimally invasive surgery space perception

Publications (2)

Publication Number Publication Date
CN112802185A CN112802185A (en) 2021-05-14
CN112802185B true CN112802185B (en) 2022-08-02

Family

ID=75811926

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110106321.XA Active CN112802185B (en) 2021-01-26 2021-01-26 Endoscope image three-dimensional reconstruction method and system facing minimally invasive surgery space perception

Country Status (1)

Country Link
CN (1) CN112802185B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113435573B (en) * 2021-06-07 2022-04-29 华中科技大学 Method for establishing parallax prediction model of endoscope image and depth estimation method
CN114387153B (en) * 2021-12-13 2023-07-04 复旦大学 Visual field expanding method for intubation robot
CN113925441B (en) * 2021-12-17 2022-05-03 极限人工智能有限公司 Imaging method and imaging system based on endoscope
CN117671012B (en) * 2024-01-31 2024-04-30 临沂大学 Method, device and equipment for calculating absolute and relative pose of endoscope in operation

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109448041A (en) * 2018-10-29 2019-03-08 重庆金山医疗器械有限公司 A kind of capsule endoscope 3-dimensional reconstruction method and system
CN111772792A (en) * 2020-08-05 2020-10-16 山东省肿瘤防治研究院(山东省肿瘤医院) Endoscopic surgery navigation method, system and readable storage medium based on augmented reality and deep learning
WO2020259248A1 (en) * 2019-06-28 2020-12-30 Oppo广东移动通信有限公司 Depth information-based pose determination method and device, medium, and electronic apparatus

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111386550A (en) * 2017-11-15 2020-07-07 谷歌有限责任公司 Unsupervised learning of image depth and ego-motion predictive neural networks
US10733745B2 (en) * 2019-01-07 2020-08-04 The University Of North Carolina At Chapel Hill Methods, systems, and computer readable media for deriving a three-dimensional (3D) textured surface from endoscopic video

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109448041A (en) * 2018-10-29 2019-03-08 重庆金山医疗器械有限公司 A kind of capsule endoscope 3-dimensional reconstruction method and system
WO2020259248A1 (en) * 2019-06-28 2020-12-30 Oppo广东移动通信有限公司 Depth information-based pose determination method and device, medium, and electronic apparatus
CN111772792A (en) * 2020-08-05 2020-10-16 山东省肿瘤防治研究院(山东省肿瘤医院) Endoscopic surgery navigation method, system and readable storage medium based on augmented reality and deep learning

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Diagnostic value of endoscopic ultrasonography for submucosal tumors of upper gastrointestinal tract; Wu, Airong et al.; Chinese Journal of Gastrointestinal Surgery; 30 Nov. 2015; Vol. 18, No. 11 *
Key technologies in an interactive real-time virtual endoscopy system; Geng Guohua et al.; Computer Applications; 28 Nov. 2002; No. 11 *
Indoor three-dimensional color point cloud map construction based on an RGB-D camera; Zhao Kuangjun; Journal of Harbin University of Commerce (Natural Sciences Edition); 15 Feb. 2018; No. 01 *
Three-dimensional bladder scene reconstruction from sequential endoscopic video images; Heng Yiling et al.; Science Technology and Engineering; 28 Sep. 2018; No. 27 *

Also Published As

Publication number Publication date
CN112802185A (en) 2021-05-14

Similar Documents

Publication Publication Date Title
CN112802185B (en) Endoscope image three-dimensional reconstruction method and system facing minimally invasive surgery space perception
US20210158510A1 (en) Estimating object thickness with neural networks
JP5153620B2 (en) System for superimposing images related to a continuously guided endoscope
CN111161290B (en) Image segmentation model construction method, image segmentation method and image segmentation system
US20120253170A1 (en) Method and apparatus for generating medical image of body organ by using 3-d model
JP4885138B2 (en) Method and system for motion correction in a sequence of images
CN111080778B (en) Online three-dimensional reconstruction method of binocular endoscope soft tissue image
Wu et al. Three-dimensional modeling from endoscopic video using geometric constraints via feature positioning
US20220198693A1 (en) Image processing method, device and computer-readable storage medium
CN114842154B (en) Method and system for reconstructing three-dimensional image based on two-dimensional X-ray image
CN111063441A (en) Liver deformation prediction method and system and electronic equipment
CN116188677A (en) Three-dimensional reconstruction method, system and device for vascular intervention operation area
CN112261399B (en) Capsule endoscope image three-dimensional reconstruction method, electronic device and readable storage medium
CN114387392A (en) Method for reconstructing three-dimensional human body posture according to human shadow
CN112562070A (en) Craniosynostosis operation cutting coordinate generation system based on template matching
CN116993805A (en) Intraoperative residual organ volume estimation system oriented to operation planning assistance
CN113538335A (en) In-vivo relative positioning method and device of wireless capsule endoscope
Deligianni et al. Non-rigid 2d-3d registration with catheter tip em tracking for patient specific bronchoscope simulation
CN112150404B (en) Global-to-local non-rigid image registration method and device based on joint saliency map
JP2022052210A (en) Information processing device, information processing method, and program
CN113850710A (en) Cross-modal medical image accurate conversion method
Ahmad et al. 3D reconstruction of gastrointestinal regions using shape-from-focus
JP2010005109A (en) Image forming device, program, and image forming method
Bouattour et al. 4D reconstruction of coronary arteries from monoplane angiograms
CN115281584B (en) Flexible endoscope robot control system and flexible endoscope robot simulation method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant