CN112802185A - Endoscope image three-dimensional reconstruction method and system for minimally invasive surgery spatial perception - Google Patents
Endoscope image three-dimensional reconstruction method and system for minimally invasive surgery spatial perception
- Publication number: CN112802185A
- Application number: CN202110106321.XA
- Authority: CN (China)
- Prior art keywords: point cloud; image; depth; endoscope; neural network
- Prior art date: 2021-01-26
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T17/00: Three-dimensional [3D] modelling, e.g. data description of 3D objects
- G06N3/02: Neural networks
- G06N3/04: Neural networks; architecture, e.g. interconnection topology
- G06N3/08: Neural networks; learning methods
- G06T7/0012: Image analysis; inspection of images; biomedical image inspection
- G06T7/33: Determination of transform parameters for the alignment of images, i.e. image registration, using feature-based methods
- G06T7/50: Depth or shape recovery
- G06T2207/10028: Range image; depth image; 3D point clouds
- G06T2207/10068: Endoscopic image
- G06T2207/20081: Training; learning
- G06T2207/20084: Artificial neural networks [ANN]
- G06T2207/30004: Biomedical image processing
Abstract
The invention provides an endoscope image three-dimensional reconstruction method and system for minimally invasive surgery spatial perception, in the technical field of three-dimensional reconstruction. The method acquires an endoscope image; performs depth estimation on the current frame of the endoscope image based on a preset multitask neural network model to obtain the point cloud depth of the current frame; acquires a local point cloud based on the point cloud depth and the camera model; registers and fuses the plurality of local point clouds; and splices the registered and fused local point clouds to form a global point cloud that deforms flexibly over time, which is displayed visually. The invention overcomes the technical problem that existing deep-learning-based endoscope image three-dimensional reconstruction methods can only estimate the depth-of-field information of the current endoscope image and cannot reconstruct and dynamically update the whole three-dimensional model, and realizes endoscope image three-dimensional reconstruction for minimally invasive surgery spatial perception.
Description
Technical Field
The invention relates to the technical field of three-dimensional reconstruction, and in particular to an endoscope image three-dimensional reconstruction method and system for minimally invasive surgery spatial perception.
Background
Minimally invasive surgery refers to surgery performed with modern medical instruments such as endoscopes and related equipment. Over the past decade, owing to its advantages of small wounds, mild pain, little bleeding and quick recovery, minimally invasive surgery has become an important diagnostic and treatment means in many departments, including general surgery, urology, neurosurgery and cardiac surgery.
In minimally invasive surgery, it is difficult for the surgeon to obtain comprehensive in-vivo environmental information because the endoscope's field of view is restricted. In addition, organ displacement before and during the operation, as well as intraoperative manipulation, may remove anatomical features, which makes intraoperative tasks such as locating, suturing and cutting the lesion more challenging and reduces surgical precision. Three-dimensional reconstruction of an in-vivo model can solve these problems and assist minimally invasive surgery.
Existing deep-learning-based endoscope image three-dimensional reconstruction methods can only estimate the depth-of-field information of the current endoscope image and cannot reconstruct and dynamically update the whole three-dimensional model.
Disclosure of Invention
Technical problem to be solved
In view of the defects of the prior art, the invention provides an endoscope image three-dimensional reconstruction method and system for minimally invasive surgery spatial perception, solving the technical problem that existing methods cannot reconstruct and dynamically update the whole three-dimensional model.
(II) Technical solution
To achieve the above purpose, the invention is realized by the following technical solution:
The invention provides an endoscope image three-dimensional reconstruction method for minimally invasive surgery spatial perception, comprising the following steps:
s1, acquiring an endoscope image;
s2, depth estimation is carried out on the current frame of the endoscope image based on a preset multitask neural network model, and the point cloud depth of the current frame is obtained;
s3, acquiring local point cloud based on the point cloud depth and a camera model;
s4, carrying out registration fusion on the local point clouds;
and S5, splicing the registered and fused local point clouds to form a global point cloud that deforms flexibly over time, and visually displaying the global point cloud.
Preferably, the preset multitask neural network model comprises three types of convolution blocks and a global pooling layer, the three types being convolution block I, convolution block II and convolution block III, and the processing of the endoscope image by the multitask neural network model comprises:
extracting feature maps from a pair of endoscope image frames through two convolution blocks I to obtain a first feature map and a second feature map, wherein the network parameter weights of the two convolution blocks I are shared;
splicing the first feature map and the second feature map, and extracting features from the spliced feature map through convolution block II to obtain inter-frame motion vector estimation features;
pooling the inter-frame motion vector estimation features through the global pooling layer to obtain the camera motion vector between the two endoscope image frames;
performing further feature extraction on the spliced feature map through convolution block III to obtain depth information features; and skip-connecting the second feature map with the depth information features to output a multi-scale disparity map for the second endoscope image.
Preferably, the training process of the preset multitask neural network model comprises:
acquiring and processing endoscope images;
inputting the processed endoscope images into an initial neural network model, and training the initial neural network model in a self-supervised manner to obtain the multitask neural network model;
wherein the loss functions used in the training process comprise:
a camera inter-frame motion estimation loss:
L_motion = Huber_δ1(t - t̂) + Huber_δ2(r - r̂)
wherein t̂ denotes the camera translation vector predicted by the neural network model; r̂ denotes the camera rotation vector predicted by the neural network model; t and r denote the reference translation and rotation vectors; and δ1 and δ2 are the parameters of the Huber functions applied to the translation vector and the rotation vector, respectively;
an image restoration loss comprising a pixel error loss and a similarity error loss, specifically:
the pixel error loss:
L_pixel = (1/(M·N)) · Σ_{i=1..M} Σ_{j=1..N} Huber_θ(I(i, j) - Î(i, j))
wherein M and N denote the pixel width and height of the image, respectively; I(i, j) denotes the true pixel value of the second image at coordinate (i, j); Î(i, j) denotes the pixel value at coordinate (i, j) of the second image reconstructed by the algorithm; and θ is the Huber function parameter for the pixel error;
the similarity error loss:
L_sim = 1 - Sim(I, Î)
wherein Sim denotes an image similarity evaluation function whose value lies between 0 and 1; I denotes the true second image; and Î denotes the second image reconstructed by the algorithm;
a depth smoothing error loss:
L_smooth = (1/(M·N)) · Σ_{i,j} (|∂_x D(i, j)| + |∂_y D(i, j)|)
wherein D(i, j) denotes the reciprocal of the estimated depth of the second image at coordinate (i, j);
the total loss function is a weighted sum of the above loss functions, and the weight of each part is obtained through neural network hyper-parameter learning.
Preferably, said S3 comprises:
performing distortion correction on the endoscope image based on the camera distortion parameters, wherein the pixel value restoration step for an undistorted image pixel coordinate (u, v) comprises:
for the normalized plane of the undistorted image:
x′ = (u - c_x) / f_x
y′ = (v - c_y) / f_y
wherein (x′, y′) denotes the coordinate on the normalized plane corresponding to the undistorted image pixel coordinate (u, v);
after distortion, the coordinate on the normalized plane is (x″, y″), with:
x″ = x′·(1 + k1·r² + k2·r⁴ + k3·r⁶) + 2·p1·x′·y′ + p2·(r² + 2·x′²)
y″ = y′·(1 + k1·r² + k2·r⁴ + k3·r⁶) + p1·(r² + 2·y′²) + 2·p2·x′·y′
wherein r² = x′² + y′²;
projecting the distorted normalized plane coordinate onto the pixel plane to obtain the pixel coordinate:
u_d = f_x·x″ + c_x
v_d = f_y·y″ + c_y
therefore, the pixel value at the undistorted image coordinate (u, v) is the pixel value at the distorted image coordinate (u_d, v_d); and since u_d and v_d are usually non-integer, the pixel value at (u_d, v_d) can be obtained by bilinear interpolation;
the bilinear interpolation is as follows: if u_d and v_d are both non-integer, take the integers u_1 and v_1 such that u_1 < u_d < u_1 + 1 and v_1 < v_d < v_1 + 1; then:
I(u_d, v_d) = (v_1 + 1 - v_d)·I(u_d, v_1) + (v_d - v_1)·I(u_d, v_1 + 1)
wherein
I(u_d, v_1) = (u_1 + 1 - u_d)·I(u_1, v_1) + (u_d - u_1)·I(u_1 + 1, v_1)
I(u_d, v_1 + 1) = (u_1 + 1 - u_d)·I(u_1, v_1 + 1) + (u_d - u_1)·I(u_1 + 1, v_1 + 1);
solving the x and y of the point cloud from the restored coordinates and the camera model, specifically:
x = z·(u - c_x) / f_x
y = z·(v - c_y) / f_y
and taking the point cloud depth of step S2 as z, which together with x and y gives the local point cloud in the endoscope camera coordinate system corresponding to the current frame.
Preferably, said S4 comprises:
if the endoscope is supported by a robot, obtaining the camera pose corresponding to each frame of image, and obtaining the inter-frame motion information of the endoscope camera through pose transformation;
taking the inter-frame motion information as the initial value of point cloud registration, and registering and fusing the plurality of local point clouds with a coherent point drift algorithm.
Preferably, said S4 further comprises:
if the endoscope is not supported by a robot, obtaining the inter-frame motion information through the multitask neural network model;
taking the inter-frame motion information as the initial value of point cloud registration, and registering and fusing the plurality of local point clouds with a coherent point drift algorithm.
Preferably, said S5 comprises:
splicing the registered and fused local point clouds with a dynamic updating mechanism to form a global point cloud that deforms flexibly over time, and visually displaying the global point cloud with a three-dimensional data processing library.
The invention also provides an endoscope image three-dimensional reconstruction system for minimally invasive surgery spatial perception, comprising:
the acquisition module is used for acquiring an endoscope image;
the depth estimation module is used for carrying out depth estimation on a current frame of the endoscope image based on a preset multitask neural network model to obtain the point cloud depth of the current frame;
the local point cloud obtaining module is used for obtaining a local point cloud based on the point cloud depth and the camera model;
the registration fusion module is used for performing registration fusion on the local point clouds;
and the global point cloud generating module is used for splicing the registered and fused local point clouds to form a global point cloud that deforms flexibly over time, and visually displaying the global point cloud.
The invention also provides a computer readable storage medium for storing program code for performing the method of any of claims 1 to 7.
The present invention also provides an electronic device, comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the method of any one of claims 1 to 7 according to instructions in the program code.
(III) Advantageous effects
The invention provides an endoscope image three-dimensional reconstruction method and system for minimally invasive surgery spatial perception. Compared with the prior art, the invention has the following beneficial effects:
The method acquires an endoscope image; performs depth estimation on the current frame of the endoscope image based on a preset multitask neural network model to obtain the point cloud depth of the current frame; acquires a local point cloud based on the point cloud depth and the camera model; registers and fuses the plurality of local point clouds; and splices the registered and fused local point clouds to form a global point cloud that deforms flexibly over time, which is displayed visually. The invention overcomes the technical problem that existing deep-learning-based endoscope image three-dimensional reconstruction methods can only estimate the depth-of-field information of the current endoscope image and cannot reconstruct and dynamically update the whole three-dimensional model, and realizes endoscope image three-dimensional reconstruction for minimally invasive surgery spatial perception.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a block diagram of an endoscope image three-dimensional reconstruction method for minimally invasive surgery spatial perception according to an embodiment of the invention;
FIG. 2 is a structural diagram of the multitask neural network model in an embodiment of the invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from the given embodiments without creative effort fall within the protection scope of the present invention.
The embodiments of the present application provide an endoscope image three-dimensional reconstruction method and system for minimally invasive surgery spatial perception, solving the technical problem that existing methods cannot reconstruct and dynamically update the whole three-dimensional model, and realizing endoscope image three-dimensional reconstruction for minimally invasive surgery spatial perception.
In order to solve the technical problems, the general idea of the embodiment of the application is as follows:
the development of minimally invasive surgery can be assisted by endoscope image three-dimensional reconstruction, better perception experience is brought to a doctor, and the surgery precision is improved. The existing endoscope image three-dimensional reconstruction method based on depth learning is limited to three-dimensional reconstruction (depth estimation) of a single image, and is less related to global three-dimensional reconstruction. Global three-dimensional reconstruction models most of the research has focused on non-rigid registration with preoperative CT/MRI three-dimensional models, and the method fails when there is no preoperative three-dimensional model. In order to solve the above problems, the method of the embodiment of the present invention is provided, so as to overcome the current situation that the existing endoscope image three-dimensional reconstruction system based on deep learning can only estimate the depth of field information of the current endoscope image, and cannot reconstruct and dynamically update the whole three-dimensional model, and meanwhile, the embodiment of the present invention does not need the support of the preoperative CT/MRI image, and realizes the unsupervised real-time dynamic global three-dimensional reconstruction of the endoscope image.
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
The embodiment of the invention provides a minimally invasive surgery space perception-oriented endoscope image three-dimensional reconstruction method, which comprises the following steps of S1-S5:
s1, acquiring an endoscope image;
s2, depth estimation is carried out on the current frame of the endoscope image based on a preset multitask neural network model, and the point cloud depth of the current frame is obtained;
s3, acquiring local point cloud based on the point cloud depth and the camera model;
s4, carrying out registration fusion on the local point clouds;
and S5, splicing the registered and fused local point clouds to form a global point cloud that deforms flexibly over time, and visually displaying the global point cloud.
The embodiment of the invention overcomes the technical problem that existing deep-learning-based endoscope image three-dimensional reconstruction methods can only estimate the depth-of-field information of the current endoscope image and cannot reconstruct and dynamically update the whole three-dimensional model, and realizes endoscope image three-dimensional reconstruction for minimally invasive surgery spatial perception.
The following describes each step in detail:
in step S1, an endoscopic image is acquired. The specific implementation process is as follows:
The endoscope is calibrated with OpenCV and a checkerboard to obtain the endoscope camera intrinsic parameters f_x, f_y, c_x, c_y and the distortion parameters k1, k2, k3, p1, p2, where k1, k2, k3 are radial distortion parameters and p1, p2 are tangential distortion parameters.
Soft tissue images are shot with the calibrated endoscope, the endoscope images are captured with OpenCV, and their resolution is modified to meet the input requirement of the multitask neural network model.
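As an illustration of this calibration step, the following sketch uses OpenCV's checkerboard calibration; the board size, square size and file names are assumptions, and OpenCV returns the distortion coefficients in the order (k1, k2, p1, p2, k3):

```python
import cv2
import numpy as np

pattern = (9, 6)          # inner-corner grid of the checkerboard (assumed)
square = 25.0             # square edge length in mm (assumed)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_points, img_points = [], []
for path in ["calib_00.png", "calib_01.png"]:   # hypothetical image files
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# K contains f_x, f_y, c_x, c_y; dist contains (k1, k2, p1, p2, k3).
_, K, dist, _, _ = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
```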
In step S2, depth estimation is performed on the current frame of the endoscopic image based on a preset multitask neural network model, and a point cloud depth of the current frame is obtained. The specific implementation process is as follows:
in the embodiment of the present invention, the process of constructing the preset multitask neural network model includes:
A1, acquiring and processing endoscope images, comprising:
The endoscope is calibrated with OpenCV and a checkerboard to obtain the endoscope camera intrinsic parameters f_x, f_y, c_x, c_y and the distortion parameters k1, k2, k3, p1, p2, where k1, k2, k3 are radial distortion parameters and p1, p2 are tangential distortion parameters.
Soft tissue images are shot with the calibrated, robot-supported endoscope, the endoscope images are captured with OpenCV, and their resolution is modified to meet the model input requirement. While the endoscope images are acquired, the camera pose corresponding to each frame of image is solved from the forward kinematics model of the robot, and the pose transformation between every two frames of endoscope images is computed.
Forward kinematics modeling uses the motion equations of the robot to solve the pose of the end effector from the state parameters of each robot joint, and then converts the end-effector pose into the endoscope camera pose. A common forward kinematics solution is the D-H parameter method. This process is common knowledge for those skilled in the art and is not described further here.
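By way of illustration, once each frame's camera pose has been obtained from forward kinematics, the pose transformation between two frames reduces to a single matrix product; in this sketch T_i and T_j are assumed to be 4x4 homogeneous camera poses expressed in the robot base frame:

```python
import numpy as np

def relative_motion(T_i: np.ndarray, T_j: np.ndarray) -> np.ndarray:
    """Pose of frame j's camera expressed in frame i's camera coordinates."""
    return np.linalg.inv(T_i) @ T_j
```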
A2, inputting the processed endoscope images into the initial neural network model, and training the initial neural network model to obtain the multitask neural network model. This specifically comprises the following steps:
The initial neural network model takes as input any two endoscope image frames having many matching points, and outputs the camera pose transformation vector between the two frames, comprising the rotation (r_x, r_y, r_z) and the translation (t_x, t_y, t_z), together with the reciprocal of the depth of each pixel of the latter frame.
The structure of the multitask neural network model is shown in FIG. 2. It comprises three types of convolution block layers and one global pooling layer, where a convolution block denotes a series of blocks composed of convolutional layers.
The two endoscope image frames first pass separately through convolution block I, with shared weights, to extract feature maps. The resulting first and second feature maps are spliced, and the spliced feature map passes through convolution block II to extract features suited to inter-frame motion vector estimation; the global pooling layer then yields the camera motion vector between the two frames. Meanwhile, the spliced feature map also passes through convolution block III to extract features suited to solving the depth information of the second endoscope image, giving the depth information features; the multi-scale feature maps output while the second endoscope image passes through convolution block I are skip-connected with the depth information features generated by convolution block III, and finally the multi-scale disparity map (i.e., the matrix formed by the reciprocals of the depths) for the second endoscope image is output.
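A minimal PyTorch sketch of this topology is given below; the channel widths, block depths and single-scale output are simplifying assumptions, not the configuration claimed by the patent:

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):  # one "convolution block": conv + BN + ReLU
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                         nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

class MultiTaskNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Convolution block I: shared feature extractor for both frames.
        self.block1 = nn.Sequential(conv_block(3, 32), conv_block(32, 64))
        # Convolution block II: motion branch on the spliced features.
        self.block2 = nn.Sequential(conv_block(128, 128), conv_block(128, 256))
        self.pool = nn.AdaptiveAvgPool2d(1)   # global pooling layer
        self.motion = nn.Linear(256, 6)       # (r_x, r_y, r_z, t_x, t_y, t_z)
        # Convolution block III: depth branch, skip-connected with the
        # second frame's features.
        self.block3 = nn.Sequential(nn.Conv2d(128, 64, 3, padding=1),
                                    nn.ReLU(inplace=True))
        self.disp = nn.Conv2d(64 + 64, 1, 3, padding=1)

    def forward(self, img1, img2):
        f1, f2 = self.block1(img1), self.block1(img2)   # shared weights
        fused = torch.cat([f1, f2], dim=1)              # feature splicing
        motion = self.motion(self.pool(self.block2(fused)).flatten(1))
        depth_feat = self.block3(fused)
        disp = torch.sigmoid(self.disp(torch.cat([depth_feat, f2], dim=1)))
        return motion, disp                             # disp = 1 / depth
```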
During training, the network branch for camera inter-frame motion estimation is trained first; after it converges, the weights of the shared part are fixed and the network weights of the depth estimation part are trained further. This ensures that the model achieves good results even though the two tasks have inconsistent scales.
The model is trained in a self-supervised manner. The loss functions are as follows:
camera inter-frame motion estimation loss:
L_motion = Huber_δ1(t - t̂) + Huber_δ2(r - r̂)
where t̂ denotes the camera translation vector predicted by the neural network model; r̂ denotes the camera rotation vector predicted by the neural network model; t and r denote the reference translation and rotation vectors obtained from the robot poses; and δ1 and δ2 are the parameters of the Huber functions applied to the translation vector and the rotation vector, respectively.
The image restoration loss comprises a pixel error loss and a similarity error loss, specifically:
pixel error loss:
L_pixel = (1/(M·N)) · Σ_{i=1..M} Σ_{j=1..N} Huber_θ(I(i, j) - Î(i, j))
where M and N denote the pixel width and height of the image, respectively; I(i, j) denotes the true pixel value of the second image at coordinate (i, j); Î(i, j) denotes the pixel value at coordinate (i, j) of the second image reconstructed by the algorithm; and θ is the Huber function parameter for the pixel error.
similarity error loss:
L_sim = 1 - Sim(I, Î)
where Sim denotes an image similarity evaluation function, such as SSIM or PSNR, whose value lies between 0 and 1; I denotes the true second image; and Î denotes the second image reconstructed by the algorithm.
depth smoothing error loss:
L_smooth = (1/(M·N)) · Σ_{i,j} (|∂_x D(i, j)| + |∂_y D(i, j)|)
where D(i, j) denotes the reciprocal of the estimated depth of the second image at coordinate (i, j).
The final total loss function is a weighted sum of the above loss functions; the weight of each part is obtained through neural network hyper-parameter learning.
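A hedged PyTorch sketch of these four terms follows; the delta values, the use of PyTorch's Huber loss and the externally supplied similarity function are assumptions, not the patented settings:

```python
import torch
import torch.nn.functional as F

def motion_loss(t_pred, t_true, r_pred, r_true, d1=1.0, d2=0.1):
    # Huber penalties on translation and rotation, with parameters d1, d2.
    return (F.huber_loss(t_pred, t_true, delta=d1)
            + F.huber_loss(r_pred, r_true, delta=d2))

def pixel_loss(I_rec, I_true, theta=0.5):
    # Mean Huber error over the M*N pixels of the reconstructed image.
    return F.huber_loss(I_rec, I_true, delta=theta)

def similarity_loss(I_rec, I_true, sim_fn):
    # sim_fn returns a similarity in [0, 1], e.g. an SSIM implementation.
    return 1.0 - sim_fn(I_rec, I_true)

def smoothness_loss(disp):
    # First-order smoothness of the disparity (inverse depth) map D.
    dx = (disp[..., :, 1:] - disp[..., :, :-1]).abs().mean()
    dy = (disp[..., 1:, :] - disp[..., :-1, :]).abs().mean()
    return dx + dy
```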
It should be noted that once the model has been trained it can be reused without retraining.
In actual use, endoscope image data collected during use can be used to update the model periodically, which preserves the precision of the model.
The trained multitask neural network model performs depth estimation on the current frame over a time window of M frames, where M is usually 3: if the current frame is i, frames i-3, i-2 and i-1 are each paired with frame i and fed into the neural network model to compute the depth of frame i, and the average of the three estimates is taken as the depth of frame i.
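This time-window averaging can be sketched as follows, assuming M = 3, a model like the sketch above and a list of image tensors:

```python
import torch

def windowed_depth(model, frames, i, M=3):
    disps = []
    for k in range(i - M, i):
        _, disp = model(frames[k], frames[i])  # pair (k, i): disparity of frame i
        disps.append(disp)
    mean_disp = torch.stack(disps).mean(dim=0)
    return 1.0 / mean_disp.clamp(min=1e-6)     # depth is the reciprocal
```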
In step S3, a local point cloud is acquired based on the point cloud depth and the camera model. The specific implementation process is as follows:
First, distortion correction is performed on the endoscope image based on the camera distortion parameters. For an undistorted image pixel coordinate (u, v), the pixel value restoration steps are as follows.
For the normalized plane of the undistorted image:
x′ = (u - c_x) / f_x
y′ = (v - c_y) / f_y
where (x′, y′) denotes the coordinate on the normalized plane corresponding to the undistorted image pixel coordinate (u, v).
After distortion, the coordinate on the normalized plane becomes (x″, y″), with:
x″ = x′·(1 + k1·r² + k2·r⁴ + k3·r⁶) + 2·p1·x′·y′ + p2·(r² + 2·x′²)
y″ = y′·(1 + k1·r² + k2·r⁴ + k3·r⁶) + p1·(r² + 2·y′²) + 2·p2·x′·y′
where r² = x′² + y′².
The distorted normalized plane coordinate is projected onto the pixel plane to obtain the pixel coordinate:
u_d = f_x·x″ + c_x
v_d = f_y·y″ + c_y
Therefore, the pixel value at the undistorted image coordinate (u, v) is the pixel value at the distorted image coordinate (u_d, v_d). Since u_d and v_d are usually non-integer, the pixel value at (u_d, v_d) is obtained by bilinear interpolation.
The bilinear interpolation is as follows. If u_d and v_d are both non-integer, take the integers u_1 and v_1 such that u_1 < u_d < u_1 + 1 and v_1 < v_d < v_1 + 1. Then:
I(u_d, v_d) = (v_1 + 1 - v_d)·I(u_d, v_1) + (v_d - v_1)·I(u_d, v_1 + 1)
where
I(u_d, v_1) = (u_1 + 1 - u_d)·I(u_1, v_1) + (u_d - u_1)·I(u_1 + 1, v_1)
I(u_d, v_1 + 1) = (u_1 + 1 - u_d)·I(u_1, v_1 + 1) + (u_d - u_1)·I(u_1 + 1, v_1 + 1)
Then the x and y of the point cloud are solved from the camera model; the solution formulas are:
x = z·(u - c_x) / f_x
y = z·(v - c_y) / f_y
Taking the point cloud depth of step S2 as z, together with x and y this yields the local point cloud in the endoscope camera coordinate system corresponding to the current frame.
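As an illustration of this back-projection, the sketch below leaves the undistortion to cv2.undistort, which applies the same distortion model and bilinear resampling described above; K and dist are the calibration outputs, and z is the depth map from step S2:

```python
import cv2
import numpy as np

def local_point_cloud(image, z, K, dist):
    undist = cv2.undistort(image, K, dist)      # pixel value restoration
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    h, w = z.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = z * (u - cx) / fx                       # x = z(u - c_x)/f_x
    y = z * (v - cy) / fy                       # y = z(v - c_y)/f_y
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    colors = undist.reshape(-1, 3) / 255.0      # per-point BGR colors in [0, 1]
    return points, colors
```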
In step S4, registration fusion is performed on the plurality of local point clouds. The specific implementation process is as follows:
In the specific implementation, the local point cloud must be filtered before registration fusion; in the embodiment of the invention, a filtering algorithm removes outliers and noisy data from the local point cloud.
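One possible filtering step, assuming Open3D's statistical outlier removal with illustrative parameter values:

```python
import open3d as o3d

def filter_cloud(pcd: o3d.geometry.PointCloud) -> o3d.geometry.PointCloud:
    # Drop points whose mean neighbor distance deviates strongly from
    # the cloud-wide average (outliers and noisy data).
    clean, _ = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)
    return clean
```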
Registration fusion is divided into two cases:
In the first case, the endoscope is supported by a robot: the camera pose corresponding to each frame of image is obtained, and the inter-frame motion information of the endoscope camera is obtained through pose transformation.
In the second case, the endoscope is not supported by a robot: the inter-frame motion information is obtained from the multitask neural network model.
The inter-frame motion information is taken as the initial value of the point cloud registration, and the local point clouds are then registered and fused with a coherent point drift algorithm, which is suited to flexible (non-rigid) point cloud registration.
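A sketch of this registration step using the third-party pycpd implementation of coherent point drift; pre-transforming the source cloud with the inter-frame motion estimate is one way to supply the initial value, and is an assumption here:

```python
import numpy as np
from pycpd import DeformableRegistration

def register_cpd(source, target, T_init):
    # Apply the inter-frame motion estimate as the registration initial value.
    src_h = np.hstack([source, np.ones((len(source), 1))])
    src_init = (T_init @ src_h.T).T[:, :3]
    # Non-rigid (coherent point drift) registration of source onto target.
    reg = DeformableRegistration(X=target, Y=src_init)
    aligned, _ = reg.register()
    return aligned
```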
In step S5, the registered and fused local point clouds are spliced to form a global point cloud that is flexibly transformed over time, and the global point cloud is visually displayed. The specific implementation process is as follows:
and splicing the point clouds by adopting a dynamic updating mechanism, and performing visual display on the global point clouds by adopting PCL (polycaprolactone), Open3D, Chai3D and other libraries to form the global point clouds flexibly transformed along with the time.
Based on the same inventive concept, the embodiment of the invention also provides an endoscope image three-dimensional reconstruction system for minimally invasive surgery spatial perception, comprising:
the acquisition module is used for acquiring an endoscope image;
the depth estimation module is used for carrying out depth estimation on a current frame of the endoscope image based on a preset multitask neural network model to obtain the point cloud depth of the current frame;
the local point cloud obtaining module is used for obtaining a local point cloud based on the point cloud depth and the camera model;
the registration fusion module is used for performing registration fusion on the local point clouds;
and the global point cloud generating module is used for splicing the registered and fused local point clouds to form a global point cloud that deforms flexibly over time, and visually displaying the global point cloud.
It can be understood that the endoscope image three-dimensional reconstruction system for minimally invasive surgery spatial perception provided by the embodiment of the invention corresponds to the endoscope image three-dimensional reconstruction method described above; for explanations, examples and beneficial effects of its content, refer to the corresponding parts of the method, which are not repeated here.
Based on the same inventive concept, the embodiment of the invention also provides a computer-readable storage medium storing program code, where the program code is used for executing the endoscope image three-dimensional reconstruction method for minimally invasive surgery spatial perception.
Based on the same inventive concept, an embodiment of the present invention further provides an electronic device, where the electronic device includes a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute the endoscope image three-dimensional reconstruction method for minimally invasive surgery spatial perception according to the instructions in the program code.
In summary, compared with the prior art, the method has the following beneficial effects:
1. The embodiment of the invention overcomes the limitation that existing deep-learning-based endoscope image three-dimensional reconstruction methods can only estimate the depth-of-field information of the current endoscope image and cannot reconstruct and dynamically update the whole three-dimensional model, and realizes endoscope image three-dimensional reconstruction for minimally invasive surgery spatial perception.
2. The training data of the multitask neural network model of the embodiment of the invention only require a robot-supported endoscope to collect endoscope image data and camera pose data; no depth information is needed. The data are easy to obtain and the applicability is strong.
3. The embodiment of the invention designs a multitask neural network model in which a single network recovers both the depth-of-field information and the camera inter-frame motion information.
4. The embodiment of the invention realizes unsupervised real-time dynamic global three-dimensional reconstruction without support from preoperative CT/MRI images.
It is noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between the entities or actions. The terms "comprises", "comprising" or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus comprising a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to it. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article or apparatus comprising that element.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. An endoscope image three-dimensional reconstruction method for minimally invasive surgery spatial perception, characterized by comprising the following steps:
s1, acquiring an endoscope image;
s2, depth estimation is carried out on the current frame of the endoscope image based on a preset multitask neural network model, and the point cloud depth of the current frame is obtained;
s3, acquiring local point cloud based on the point cloud depth and a camera model;
s4, carrying out registration fusion on the local point clouds;
and S5, splicing the registered and fused local point clouds to form a global point cloud that deforms flexibly over time, and visually displaying the global point cloud.
2. The endoscope image three-dimensional reconstruction method for minimally invasive surgery spatial perception according to claim 1, wherein the preset multitask neural network model comprises three types of convolution blocks and a global pooling layer, the three types being convolution block I, convolution block II and convolution block III, and the processing of the endoscope image by the multitask neural network model comprises:
extracting feature maps from a pair of endoscope image frames through two convolution blocks I to obtain a first feature map and a second feature map, wherein the network parameter weights of the two convolution blocks I are shared;
splicing the first feature map and the second feature map, and extracting features from the spliced feature map through convolution block II to obtain inter-frame motion vector estimation features;
pooling the inter-frame motion vector estimation features through the global pooling layer to obtain the camera motion vector between the two endoscope image frames;
performing further feature extraction on the spliced feature map through convolution block III to obtain depth information features; and skip-connecting the second feature map with the depth information features to output a multi-scale disparity map for the second endoscope image.
3. The endoscope image three-dimensional reconstruction method for minimally invasive surgery spatial perception according to claim 1, wherein the training process of the preset multitask neural network model comprises:
acquiring and processing endoscope images;
inputting the processed endoscope images into an initial neural network model, and training the initial neural network model in a self-supervised manner to obtain the multitask neural network model;
wherein the loss functions used in the training process comprise:
a camera inter-frame motion estimation loss:
L_motion = Huber_δ1(t - t̂) + Huber_δ2(r - r̂)
wherein t̂ denotes the camera translation vector predicted by the neural network model; r̂ denotes the camera rotation vector predicted by the neural network model; t and r denote the reference translation and rotation vectors; and δ1 and δ2 are the parameters of the Huber functions applied to the translation vector and the rotation vector, respectively;
an image restoration loss comprising a pixel error loss and a similarity error loss, specifically:
the pixel error loss:
L_pixel = (1/(M·N)) · Σ_{i=1..M} Σ_{j=1..N} Huber_θ(I(i, j) - Î(i, j))
wherein M and N denote the pixel width and height of the image, respectively; I(i, j) denotes the true pixel value of the second image at coordinate (i, j); Î(i, j) denotes the pixel value at coordinate (i, j) of the second image reconstructed by the algorithm; and θ is the Huber function parameter for the pixel error;
the similarity error loss:
L_sim = 1 - Sim(I, Î)
wherein Sim denotes an image similarity evaluation function whose value lies between 0 and 1; I denotes the true second image; and Î denotes the second image reconstructed by the algorithm;
a depth smoothing error loss:
L_smooth = (1/(M·N)) · Σ_{i,j} (|∂_x D(i, j)| + |∂_y D(i, j)|)
wherein D(i, j) denotes the reciprocal of the estimated depth of the second image at coordinate (i, j);
and the total loss function is a weighted sum of the above loss functions, the weight of each part being obtained through neural network hyper-parameter learning.
4. The endoscope image three-dimensional reconstruction method for minimally invasive surgery spatial perception according to claim 1, wherein said S3 comprises:
performing distortion correction on the endoscope image based on the camera distortion parameters, wherein the pixel value restoration step for an undistorted image pixel coordinate (u, v) comprises:
for the normalized plane of the undistorted image:
x′ = (u - c_x) / f_x
y′ = (v - c_y) / f_y
wherein (x′, y′) denotes the coordinate on the normalized plane corresponding to the undistorted image pixel coordinate (u, v);
after distortion, the coordinate on the normalized plane is (x″, y″), with:
x″ = x′·(1 + k1·r² + k2·r⁴ + k3·r⁶) + 2·p1·x′·y′ + p2·(r² + 2·x′²)
y″ = y′·(1 + k1·r² + k2·r⁴ + k3·r⁶) + p1·(r² + 2·y′²) + 2·p2·x′·y′
wherein r² = x′² + y′²;
projecting the distorted normalized plane coordinate onto the pixel plane to obtain the pixel coordinate:
u_d = f_x·x″ + c_x
v_d = f_y·y″ + c_y
whereby the pixel value at the undistorted image coordinate (u, v) is the pixel value at the distorted image coordinate (u_d, v_d); and since u_d and v_d are usually non-integer, the pixel value at (u_d, v_d) can be obtained by bilinear interpolation;
the bilinear interpolation being as follows: if u_d and v_d are both non-integer, taking the integers u_1 and v_1 such that u_1 < u_d < u_1 + 1 and v_1 < v_d < v_1 + 1, then:
I(u_d, v_d) = (v_1 + 1 - v_d)·I(u_d, v_1) + (v_d - v_1)·I(u_d, v_1 + 1)
wherein
I(u_d, v_1) = (u_1 + 1 - u_d)·I(u_1, v_1) + (u_d - u_1)·I(u_1 + 1, v_1)
I(u_d, v_1 + 1) = (u_1 + 1 - u_d)·I(u_1, v_1 + 1) + (u_d - u_1)·I(u_1 + 1, v_1 + 1);
solving the x and y of the point cloud from the restored coordinates and the camera model, specifically:
x = z·(u - c_x) / f_x
y = z·(v - c_y) / f_y
and taking the point cloud depth of step S2 as z, which together with x and y gives the local point cloud in the endoscope camera coordinate system corresponding to the current frame.
5. The endoscope image three-dimensional reconstruction method for minimally invasive surgery spatial perception according to claim 1, wherein said S4 comprises:
if the endoscope is supported by a robot, obtaining the camera pose corresponding to each frame of image, and obtaining the inter-frame motion information of the endoscope camera through pose transformation;
taking the inter-frame motion information as the initial value of point cloud registration, and registering and fusing the plurality of local point clouds with a coherent point drift algorithm.
6. The endoscope image three-dimensional reconstruction method for minimally invasive surgery spatial perception according to claim 1, wherein said S4 further comprises:
if the endoscope is not supported by a robot, obtaining the inter-frame motion information through the multitask neural network model;
taking the inter-frame motion information as the initial value of point cloud registration, and registering and fusing the plurality of local point clouds with a coherent point drift algorithm.
7. The endoscope image three-dimensional reconstruction method for minimally invasive surgery spatial perception according to claim 1, wherein said S5 comprises:
splicing the registered and fused local point clouds with a dynamic updating mechanism to form a global point cloud that deforms flexibly over time, and visually displaying the global point cloud with a three-dimensional data processing library.
8. An endoscope image three-dimensional reconstruction system for minimally invasive surgery spatial perception, characterized by comprising:
the acquisition module is used for acquiring an endoscope image;
the depth estimation module is used for carrying out depth estimation on a current frame of the endoscope image based on a preset multitask neural network model to obtain the point cloud depth of the current frame;
the local point cloud obtaining module is used for obtaining a local point cloud based on the point cloud depth and the camera model;
the registration fusion module is used for performing registration fusion on the local point clouds;
and the global point cloud generating module is used for splicing the registered and fused local point clouds to form a global point cloud that deforms flexibly over time, and visually displaying the global point cloud.
9. A computer-readable storage medium for storing program code for performing the method of any one of claims 1-7.
10. An electronic device, comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the method of any one of claims 1 to 7 according to instructions in the program code.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202110106321.XA (granted as CN112802185B) | 2021-01-26 | 2021-01-26 | Endoscope image three-dimensional reconstruction method and system for minimally invasive surgery spatial perception
Publications (2)

Publication Number | Publication Date
---|---
CN112802185A | 2021-05-14
CN112802185B | 2022-08-02
Family ID: 75811926
Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202110106321.XA (CN112802185B, active) | Endoscope image three-dimensional reconstruction method and system for minimally invasive surgery spatial perception | 2021-01-26 | 2021-01-26
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112802185B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200402250A1 (en) * | 2017-11-15 | 2020-12-24 | Google Llc | Unsupervised learning of image depth and ego-motion prediction neural networks |
CN109448041A (en) * | 2018-10-29 | 2019-03-08 | 重庆金山医疗器械有限公司 | A kind of capsule endoscope 3-dimensional reconstruction method and system |
US20200219272A1 (en) * | 2019-01-07 | 2020-07-09 | The University Of North Carolina At Chapel Hill | Methods, systems, and computer readable media for deriving a three-dimensional (3d) textured surface from endoscopic video |
WO2020259248A1 (en) * | 2019-06-28 | 2020-12-30 | Oppo广东移动通信有限公司 | Depth information-based pose determination method and device, medium, and electronic apparatus |
CN111772792A (en) * | 2020-08-05 | 2020-10-16 | 山东省肿瘤防治研究院(山东省肿瘤医院) | Endoscopic surgery navigation method, system and readable storage medium based on augmented reality and deep learning |
Non-Patent Citations (4)

- WU, Airong et al.: "Diagnostic value of endoscopic ultrasonography for submucosal tumors of upper gastrointestinal tract", Chinese Journal of Gastrointestinal Surgery
- GENG, Guohua et al.: "Key techniques in an interactive real-time virtual endoscopy system", Journal of Computer Applications (《计算机应用》)
- HENG, Yiling et al.: "Three-dimensional bladder scene reconstruction based on sequential endoscopic video images", Science Technology and Engineering (《科学技术与工程》)
- ZHAO, Kuangjun: "Indoor three-dimensional color point cloud map construction based on an RGB-D camera", Journal of Harbin University of Commerce, Natural Sciences Edition (《哈尔滨商业大学学报(自然科学版)》)
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113435573A (en) * | 2021-06-07 | 2021-09-24 | 华中科技大学 | Method for establishing parallax prediction model of endoscope image and depth estimation method |
CN113435573B (en) * | 2021-06-07 | 2022-04-29 | 华中科技大学 | Method for establishing parallax prediction model of endoscope image and depth estimation method |
CN114387153A (en) * | 2021-12-13 | 2022-04-22 | 复旦大学 | Visual field expanding method for intubation robot |
CN113925441A (en) * | 2021-12-17 | 2022-01-14 | 极限人工智能有限公司 | Imaging method and imaging system based on endoscope |
CN117671012A (en) * | 2024-01-31 | 2024-03-08 | 临沂大学 | Method, device and equipment for calculating absolute and relative pose of endoscope in operation |
CN117671012B (en) * | 2024-01-31 | 2024-04-30 | 临沂大学 | Method, device and equipment for calculating absolute and relative pose of endoscope in operation |
Also Published As
Publication number | Publication date |
---|---|
CN112802185B (en) | 2022-08-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112802185B (en) | Endoscope image three-dimensional reconstruction method and system for minimally invasive surgery spatial perception | |
CN111161290B (en) | Image segmentation model construction method, image segmentation method and image segmentation system | |
JP5153620B2 (en) | System for superimposing images related to a continuously guided endoscope | |
JP5335280B2 (en) | Alignment processing apparatus, alignment method, program, and storage medium | |
JP4885138B2 (en) | Method and system for motion correction in a sequence of images | |
Wu et al. | Three-dimensional modeling from endoscopic video using geometric constraints via feature positioning | |
CN111080778B (en) | Online three-dimensional reconstruction method of binocular endoscope soft tissue image | |
EP1685538A1 (en) | Device and method for generating a three-dimensional vascular model | |
US20220198693A1 (en) | Image processing method, device and computer-readable storage medium | |
CN114842154B (en) | Method and system for reconstructing three-dimensional image based on two-dimensional X-ray image | |
JP4613172B2 (en) | Method and apparatus for three-dimensional reconstruction of object from projection image | |
CN111161330B (en) | Non-rigid image registration method, device, system, electronic equipment and storage medium | |
CN112562070A (en) | Craniosynostosis operation cutting coordinate generation system based on template matching | |
CN113538335A (en) | In-vivo relative positioning method and device of wireless capsule endoscope | |
Deligianni et al. | Non-rigid 2d-3d registration with catheter tip em tracking for patient specific bronchoscope simulation | |
JP5051025B2 (en) | Image generating apparatus, program, and image generating method | |
CN114399527A (en) | Method and device for unsupervised depth and motion estimation of monocular endoscope | |
CN112150404B (en) | Global-to-local non-rigid image registration method and device based on joint saliency map | |
JP2022052210A (en) | Information processing device, information processing method, and program | |
Bouattour et al. | 4D reconstruction of coronary arteries from monoplane angiograms | |
CN115281584B (en) | Flexible endoscope robot control system and flexible endoscope robot simulation method | |
Tsuda et al. | Recovering size and shape of polyp from endoscope image by RBF-NN modification | |
JP5706933B2 (en) | Processing apparatus, processing method, and program | |
CN115100092B (en) | Subtraction method and device for coronary CT image, electronic equipment and storage medium | |
WO2024050918A1 (en) | Endoscope positioning method, electronic device, and non-transitory computer-readable storage medium |
Legal Events

- PB01: Publication
- SE01: Entry into force of request for substantive examination
- GR01: Patent grant