CN117671012B - Method, device and equipment for calculating absolute and relative pose of endoscope in operation - Google Patents


Info

Publication number
CN117671012B
Authority
CN
China
Prior art keywords
endoscope
pose
pose estimation
decoder
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410129170.3A
Other languages
Chinese (zh)
Other versions
CN117671012A
Inventor
宋华建
王越
郭明
张安彩
邱建龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Linyi University
Original Assignee
Linyi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Linyi University
Priority to CN202410129170.3A
Publication of CN117671012A
Application granted
Publication of CN117671012B
Legal status: Active


Landscapes

  • Endoscopes (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a method, a device and equipment for calculating the absolute and relative pose of an intraoperative endoscope, belonging to the technical field of computer vision and image processing. The method comprises the following steps: constructing a virtual data set and a real world-simulated data set in an RMIS scene; preprocessing the data sets; establishing a single-encoder, dual-decoder endoscope pose estimation model; training and evaluating the endoscope pose estimation model to obtain a trained endoscope pose estimation model; acquiring in real time the in-vivo image data captured by the endoscope system during the RMIS process and inputting it into the trained endoscope pose estimation model to obtain the absolute endoscope pose data corresponding to the real-time image; and calculating the real-time relative pose of the endoscope based on the absolute pose data. The invention improves the level of automation in the RMIS scene and the accuracy with which the surgical robot responds to the surgeon, and ensures patient safety in robot-assisted minimally invasive surgery.

Description

Method, device and equipment for calculating absolute and relative pose of endoscope in operation
Technical Field
The invention relates to a method, a device and equipment for calculating the absolute and relative pose of an intraoperative endoscope, belonging to the technical field of machine vision and computer vision.
Background
With the rapid development of robotics, Robot-Assisted Minimally Invasive Surgery ("RMIS"), which combines robotics with minimally invasive surgery, has come into wide use. RMIS is surgery performed with a robot, an endoscope, a computer and other devices, and offers advantages such as small wounds, less pain and fast recovery.
In RMIS procedures, in which an endoscope is inserted into the patient, blood stains, splatter during the operation and the like often block part or all of the endoscope's lens area, so the image captured by the endoscope becomes unclear. In such cases the endoscope must be withdrawn from the surgical robot immediately for lens cleaning and then reinserted into the patient, which changes the relative position and posture ("pose") relationship between the endoscope coordinate system and the coordinate system of the robot holding the endoscope, leaving it unknown. From a safety point of view, one vital link in accurately re-acquiring this pose relationship is determining the relative pose of the endoscope during the operation, that is, determining the relative rotation and translation matrices of the endoscope coordinate system between two adjacent movements of the endoscope-holding mechanical arm. In industrial machine-vision applications this is usually aided by spatial markers (e.g., checkerboard calibration targets).
However, in the RMIS scene, owing to the sterile-environment requirements of surgery and the narrow space inside the patient, such markers cannot be placed, which makes determining the endoscope pose in an RMIS scene challenging. A technical measure that can effectively acquire the absolute and relative pose of the endoscope during surgery is therefore needed.
Disclosure of Invention
To solve these problems, the invention provides a method, a device and equipment for calculating the absolute and relative pose of an intraoperative endoscope, which can acquire the absolute and relative pose of the intraoperative endoscope in an RMIS scene and can therefore be better applied to robot-assisted minimally invasive surgery (RMIS) scenes.
The technical scheme adopted for solving the technical problems is as follows:
In a first aspect, an embodiment of the present invention provides a method for calculating absolute and relative pose of an intraoperative endoscope, including the following steps:
Collecting data when a virtual camera simulates the action of an endoscope, and constructing a virtual data set in an RMIS scene;
Based on the pose of the endoscope coordinate system relative to the sensor coordinate system obtained by an external calibration technique, acquiring data while a simulated operation drives the endoscope to move, and constructing a real world-simulated data set in an RMIS scene;
Performing data preprocessing on the virtual data set and the real world-simulation data set, dividing the preprocessed virtual data set into a first training set and a first test set, and dividing the preprocessed real world-simulation data set into a second training set and a second test set;
Establishing a single-encoder, dual-decoder endoscope pose estimation model based on an encoder-decoder architecture, wherein the endoscope pose estimation model comprises an encoder, a feature skip connector, a semantic segmentation decoder and a pose estimation decoder; the encoder is used for extracting feature maps at all levels from the image data in the RMIS process, the feature maps at all levels containing information at different abstraction levels of the image; the feature skip connector is used for sending the feature maps of each level extracted by the encoder to the semantic segmentation decoder; the semantic segmentation decoder is used for learning the ability to recover image details from the feature maps and for providing implicit geometric constraints for the pose estimation decoder; and the pose estimation decoder is used for regressing the absolute pose of the endoscope from the feature maps and outputting the absolute endoscope pose corresponding to the image in the RMIS process;
Training and evaluating the endoscope pose estimation model by adopting the preprocessed virtual data set and the real world-simulated data set to obtain a trained endoscope pose estimation model;
acquiring in-vivo image data of a patient captured by an endoscope system in the RMIS process in real time, and inputting the in-vivo image data into a trained endoscope pose estimation model to obtain absolute endoscope pose data corresponding to the real-time image in the RMIS process;
Based on absolute pose data of the endoscope corresponding to the real-time image in the RMIS process, calculating the real-time relative pose of the endoscope in the RMIS process.
As a possible implementation manner of this embodiment, the collecting data when the virtual camera simulates the endoscope action, and constructing a virtual data set in an RMIS scene includes:
Three-dimensional models of the surgical robot's surgical instruments and of different biological tissue backgrounds are imported into three-dimensional rendering software, and a virtual camera is set up to simulate an endoscope, move along a preset trajectory and stay aimed at the surgical instrument three-dimensional model;
Surgical instrument and biological tissue background images under the virtual camera's field of view are continuously rendered at a certain frame rate, while the surgical instrument segmentation mask and the absolute pose annotation of the virtual endoscope corresponding to each frame of image are collected to form the virtual data set.
As a possible implementation manner of this embodiment, the acquiring data when the simulated operation drives the endoscope to move based on the pose of the endoscope coordinate system relative to the sensor coordinate system acquired by the external calibration technology, and constructing a real world-simulated data set in an RMIS scene includes:
Outside the actual surgical scene, an external sensor capable of automatically recording the pose of its own coordinate system relative to a world coordinate system is fixedly attached to the endoscope, and the pose of the endoscope coordinate system relative to the sensor coordinate system in this fixed state is obtained through an external calibration technique;
The endoscope and the sensor fixedly attached to it are clamped by a robot mechanical arm, and the endoscope is driven to move through a simulated operation; the RMIS field image data collected by the endoscope and the pose data of the sensor coordinate system relative to the world coordinate system are recorded, and the pose of the endoscope coordinate system relative to the world coordinate system is calculated using the externally calibrated pose of the endoscope coordinate system relative to the sensor coordinate system in the fixed state;
A surgical instrument segmentation mask is annotated on each piece of field image data using an annotation tool, and the field image data are combined with the corresponding pose of the endoscope coordinate system relative to the world coordinate system and the surgical instrument segmentation mask to construct the real world-simulated data set.
As a possible implementation manner of this embodiment, the performing data preprocessing on the virtual data set and the real world-analog data set, dividing the preprocessed virtual data set into a first training set and a first test set, and dividing the preprocessed real world-analog data set into a second training set and a second test set, includes:
First, image resizing and image normalization are performed on the data in the virtual data set; the virtual endoscope pose labels corresponding to all image samples in the virtual data set are then converted into dual-quaternion form; finally, the virtual data set is divided into a first training set and a first test set, where each image in the virtual data set corresponds at least to a surgical instrument segmentation mask and a pose ground-truth annotation, parameterized as a dual quaternion, of the coordinate system of the virtual endoscope that captured the image relative to the world coordinate system;
Likewise, image resizing and image normalization are first performed on the data in the real world-simulated data set; the endoscope pose labels corresponding to all image samples are then converted into dual-quaternion form; finally, the real world-simulated data set is divided into a second training set and a second test set, where each image in the real world-simulated data set corresponds at least to a surgical instrument segmentation mask and a pose ground-truth annotation, parameterized as a dual quaternion, of the coordinate system of the endoscope that captured the image relative to the world coordinate system.
As one possible implementation manner of the present embodiment, the building an endoscope pose estimation model of a single encoder-dual decoder based on an encoder-decoder architecture includes:
Acquiring initial model parameters of the encoder (a feature extraction network), loading them into the encoder, and removing the encoder's fully connected classification layer to form a fully convolutional network;
Inputting a number of images from the training set of the selected data set into the encoder;
Dividing all the convolution layers into a plurality of levels according to the output size of each convolution block of the fully convolutional network, and performing the encoding operation on the training images to obtain a first feature map of each training image at each convolution level;
According to the number of levels into which the selected encoder is divided, setting the same number of cascaded sub-decoders whose output sizes correspond one-to-one to the encoder levels, and forming the semantic segmentation decoder from all the sub-decoders and a prediction module, where the output size of the final-stage sub-decoder block of the semantic segmentation decoder is the same as the original training image size;
Setting a high-dimensional fully connected layer and a fully connected layer whose dimension equals the pose vector length to form the pose estimation decoder;
The lowest-level sub-decoder of the semantic segmentation decoder receives the first feature map with the smallest size as input, the next-lowest-level sub-decoder receives the output of the lowest-level sub-decoder as input, and so on; finally, the pose estimation decoder outputs the absolute pose vector of the endoscope.
As a possible implementation manner of this embodiment, the training and evaluating the endoscope pose estimation model by using the preprocessed virtual data set and the real world-simulated data set to obtain a trained endoscope pose estimation model includes:
Based on the first training set, setting a loss function for the semantic segmentation decoder and for the pose estimation decoder of the endoscope pose estimation model, taking a weighted combination of the two loss functions as the total loss function, substituting the semantic segmentation result and the pose estimation result output by the model, together with the corresponding labels, into the total loss function in sequence to calculate the loss value, performing pre-training of the endoscope pose estimation model with an optimizer and updating the weights until the endoscope pose estimation model converges, and storing the weights of the whole network model after pre-training converges;
Loading the stored pre-trained and converged overall network model weight for the endoscope pose estimation model, inputting image samples in a first test set, obtaining endoscope pose estimation values corresponding to the samples, measuring errors of the endoscope pose estimation values and corresponding pose true values, and evaluating the effect of the endoscope pose estimation model on virtual endoscope pose estimation;
Based on the second training set, setting a loss function for the semantic segmentation decoder and for the pose estimation decoder of the endoscope pose estimation model, weighting and combining the two loss functions to obtain the total loss, substituting the semantic segmentation result and the pose estimation result output by the model, together with the corresponding labels, into the total loss function in sequence to calculate the loss value, performing fine-tuning training of the endoscope pose estimation model with an optimizer and updating the weights until the endoscope pose estimation model converges, and storing the weights of the whole network model after fine-tuning training converges;
And loading the model weight after the fine tuning training for the endoscope pose estimation model, inputting the image sample in the second test set, obtaining the endoscope pose estimation value corresponding to each sample, measuring the error between the endoscope pose estimation value and the ground truth value of the corresponding pose, and evaluating the pose estimation effect of the endoscope pose estimation model.
As a possible implementation manner of this embodiment, the loss function L 1 of the semantic segmentation decoder is:
where n is the number of samples, C is the class cross entropy loss, I and Representing a predicted value and a corresponding labeling value of each pixel in the output-labeling image pair, wherein alpha is a weight scalar;
The loss function L 2 of the pose estimation decoder is:
wherein p is the pose estimation value, The pose is true;
The total loss function is:
Wherein, Is a weight scalar.
In a second aspect, an embodiment of the present invention provides an apparatus for calculating absolute and relative pose of an intra-operative endoscope, including:
The virtual data set construction module is used for acquiring data when the virtual camera simulates the action of the endoscope and constructing a virtual data set in an RMIS scene;
The real world-simulated data set construction module is used for acquiring the pose of the endoscope coordinate system relative to the sensor coordinate system based on an external calibration technology, acquiring data when the simulated operation drives the endoscope to move, and constructing a real world-simulated data set in an RMIS scene;
the data preprocessing module is used for preprocessing the data of the virtual data set and the real world-simulation data set, dividing the preprocessed virtual data set into a first training set and a first test set, and dividing the preprocessed real world-simulation data set into a second training set and a second test set;
An endoscope pose estimation model building module for building a single-encoder, dual-decoder endoscope pose estimation model based on an encoder-decoder architecture, wherein the endoscope pose estimation model comprises an encoder, a feature skip connector, a semantic segmentation decoder and a pose estimation decoder; the encoder is used for extracting feature maps at all levels from the image data in the RMIS process, the feature maps at all levels containing information at different abstraction levels of the image; the feature skip connector is used for sending the feature maps of each level extracted by the encoder to the semantic segmentation decoder; the semantic segmentation decoder is used for learning the ability to recover image details from the feature maps and for providing implicit geometric constraints for the pose estimation decoder; and the pose estimation decoder is used for regressing the absolute pose of the endoscope from the feature maps and outputting the absolute endoscope pose corresponding to the image in the RMIS process;
The model training module is used for training and evaluating the endoscope pose estimation model by adopting the preprocessed virtual data set and the real world-simulation data set to obtain a trained endoscope pose estimation model;
The absolute pose output module is used for acquiring in-vivo image data of a patient captured by an endoscope system in the RMIS process in real time, inputting the in-vivo image data into a trained endoscope pose estimation model and obtaining absolute pose data of the endoscope corresponding to the real-time image in the RMIS process;
The relative pose calculation module is used for calculating the real-time relative pose of the endoscope in the RMIS process based on absolute pose data of the endoscope corresponding to the real-time image in the RMIS process.
In a third aspect, an embodiment of the present invention provides a computer device, including a processor, a memory and a bus, where the memory stores machine-readable instructions executable by the processor; when the computer device runs, the processor communicates with the memory through the bus, and the processor executes the machine-readable instructions to perform the steps of any of the above methods for calculating the absolute and relative pose of an intraoperative endoscope.
In a fourth aspect, embodiments of the present invention provide a storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of any of the above methods for calculating the absolute and relative pose of an intraoperative endoscope.
The technical scheme of the embodiment of the invention has the following beneficial effects:
On the basis of regressing the absolute pose of the endoscope with a deep neural network, the invention adds an extra semantic segmentation branch to the endoscope pose estimation model to provide an implicit geometric constraint for the pose regression task, and calculates the relative pose of the endoscope according to its physical meaning once two or more absolute endoscope poses have been acquired. The invention thereby realizes absolute pose estimation and relative pose calculation of the endoscope in an RMIS scene, improves the level of automation in the RMIS scene and the accuracy with which the surgical robot responds to the surgeon, and ensures patient safety in robot-assisted minimally invasive surgery.
Drawings
FIG. 1 is a flow chart illustrating a method of intra-operative endoscope absolute and relative pose calculation according to an exemplary embodiment;
FIG. 2 is a block diagram illustrating an apparatus for intra-operative endoscope absolute and relative pose calculation according to an exemplary embodiment;
FIG. 3 is an overall block diagram of an endoscope pose estimation model, according to an exemplary embodiment;
FIG. 4 is a schematic view of an endoscope relative pose calculation, according to an exemplary embodiment.
Detailed Description
The invention is further illustrated by the following examples in conjunction with the accompanying drawings:
In order to clearly illustrate the technical features of the present solution, the present invention will be described in detail below with reference to the following detailed description and the accompanying drawings. The following disclosure provides many different embodiments, or examples, for implementing different structures of the invention. In order to simplify the present disclosure, components and arrangements of specific examples are described below. Furthermore, the present invention may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. It should be noted that the components illustrated in the figures are not necessarily drawn to scale. Descriptions of well-known components and processing techniques and processes are omitted so as to not unnecessarily obscure the present invention.
As shown in fig. 1, an embodiment of the present invention provides a method for calculating absolute and relative pose of an endoscope during operation, comprising the following steps:
Collecting data when a virtual camera simulates the action of an endoscope, and constructing a virtual data set in an RMIS scene;
Based on the pose of the endoscope coordinate system relative to the sensor coordinate system obtained by an external calibration technique, acquiring data while a simulated operation drives the endoscope to move, and constructing a real world-simulated data set in an RMIS scene;
Performing data preprocessing on the virtual data set and the real world-simulation data set, dividing the preprocessed virtual data set into a first training set and a first test set, and dividing the preprocessed real world-simulation data set into a second training set and a second test set;
Establishing a single-encoder, dual-decoder endoscope pose estimation model based on an encoder-decoder architecture, wherein the endoscope pose estimation model comprises an encoder, a feature skip connector, a semantic segmentation decoder and a pose estimation decoder; the encoder is used for extracting feature maps at all levels from the image data in the RMIS process, the feature maps at all levels containing information at different abstraction levels of the image; the feature skip connector is used for sending the feature maps of each level extracted by the encoder to the semantic segmentation decoder; the semantic segmentation decoder is used for learning the ability to recover image details from the feature maps and for providing implicit geometric constraints for the pose estimation decoder; and the pose estimation decoder is used for regressing the absolute pose of the endoscope from the feature maps and outputting the absolute endoscope pose corresponding to the image in the RMIS process;
Training and evaluating the endoscope pose estimation model by adopting the preprocessed virtual data set and the real world-simulated data set to obtain a trained endoscope pose estimation model;
acquiring in-vivo image data of a patient captured by an endoscope system in the RMIS process in real time, and inputting the in-vivo image data into a trained endoscope pose estimation model to obtain absolute endoscope pose data corresponding to the real-time image in the RMIS process;
Based on absolute pose data of the endoscope corresponding to the real-time image in the RMIS process, calculating the real-time relative pose of the endoscope in the RMIS process.
As a possible implementation manner of this embodiment, the collecting data when the virtual camera simulates the endoscope action, and constructing a virtual data set in an RMIS scene includes:
Three-dimensional models of the surgical robot's surgical instruments and of different biological tissue backgrounds are imported into three-dimensional rendering software, and a virtual camera is set up to simulate an endoscope, move along a preset trajectory and stay aimed at the surgical instrument three-dimensional model;
Surgical instrument and biological tissue background images under the virtual camera's field of view are continuously rendered at a certain frame rate, while the surgical instrument segmentation mask and the absolute pose annotation of the virtual endoscope corresponding to each frame of image are collected to form the virtual data set.
As a possible implementation manner of this embodiment, the acquiring data when the simulated operation drives the endoscope to move based on the pose of the endoscope coordinate system relative to the sensor coordinate system acquired by the external calibration technology, and constructing a real world-simulated data set in an RMIS scene includes:
Outside the actual surgical scene, an external sensor capable of automatically recording the pose of its own coordinate system relative to a world coordinate system is fixedly attached to the endoscope, and the pose of the endoscope coordinate system relative to the sensor coordinate system in this fixed state is obtained through an external calibration technique;
The endoscope and the sensor fixedly attached to it are clamped by a robot mechanical arm, and the endoscope is driven to move through a simulated operation; the RMIS field image data collected by the endoscope and the pose data of the sensor coordinate system relative to the world coordinate system are recorded, and the pose of the endoscope coordinate system relative to the world coordinate system is calculated using the externally calibrated pose of the endoscope coordinate system relative to the sensor coordinate system in the fixed state;
A surgical instrument segmentation mask is annotated on each piece of field image data using an annotation tool, and the field image data are combined with the corresponding pose of the endoscope coordinate system relative to the world coordinate system and the surgical instrument segmentation mask to construct the real world-simulated data set.
As a possible implementation manner of this embodiment, the performing data preprocessing on the virtual data set and the real world-analog data set, dividing the preprocessed virtual data set into a first training set and a first test set, and dividing the preprocessed real world-analog data set into a second training set and a second test set, includes:
First, image resizing and image normalization are performed on the data in the virtual data set to enhance the robustness and generalization ability of the model and to ensure the quality and consistency of its output; the virtual endoscope pose labels corresponding to all image samples in the virtual data set are then converted into dual-quaternion form; finally, the virtual data set is divided into a first training set and a first test set, where each image in the virtual data set corresponds at least to a surgical instrument segmentation mask and a pose ground-truth annotation, parameterized as a dual quaternion, of the coordinate system of the virtual endoscope that captured the image relative to the world coordinate system;
Likewise, image resizing and image normalization are first performed on the data in the real world-simulated data set to enhance the robustness and generalization ability of the model and to ensure the quality and consistency of its output; the endoscope pose labels corresponding to all image samples are then converted into dual-quaternion form; finally, the real world-simulated data set is divided into a second training set and a second test set, where each image in the real world-simulated data set corresponds at least to a surgical instrument segmentation mask and a pose ground-truth annotation, parameterized as a dual quaternion, of the coordinate system of the endoscope that captured the image relative to the world coordinate system.
As one possible implementation manner of the present embodiment, the building an endoscope pose estimation model of a single encoder-dual decoder based on an encoder-decoder architecture includes:
Step S1: acquiring initial model parameters of the encoder (a feature extraction network) to be pre-trained, loading them into the encoder, and removing the encoder's fully connected classification layer to form a fully convolutional network, where the initial model parameters of the encoder to be pre-trained are the initial weight matrices of all layers pre-trained on a large-scale data set;
Step S2: inputting a number of images from the training set of the selected data set into the encoder;
Step S3: dividing all the convolution layers into a plurality of levels according to the output size of each convolution block of the fully convolutional network, and performing the encoding operation on the training images to obtain a first feature map of each training image at each convolution level;
Step S4: according to the number of levels into which the selected encoder is divided, setting the same number of cascaded sub-decoders whose output sizes correspond one-to-one to the encoder levels, and forming the semantic segmentation decoder from all the sub-decoders and a prediction module, where the output size of the final-stage sub-decoder block of the semantic segmentation decoder is the same as the original training image size;
Step S5: setting a high-dimensional fully connected layer and a fully connected layer whose dimension equals the pose vector length to form the pose estimation decoder;
Step S6: the lowest-level sub-decoder of the semantic segmentation decoder receives the first feature map with the smallest size as input, the next-lowest-level sub-decoder receives the output of the lowest-level sub-decoder as input, and so on; finally, the pose estimation decoder outputs the absolute pose vector of the endoscope.
The logical relationship and the data transfer relationship of the encoder, the semantic segmentation decoder and the pose estimation decoder in the pose estimation model in the steps S2 to S6 are as follows:
A deep neural network (DNN) is selected as the encoder $\mathrm{Enc}$ for extracting the first image feature map set of the input image $I$. According to the network characteristics of the selected deep neural network (the number $N$ of output-size levels), the extracted feature maps are grouped by level:

$$\{f_1, f_2, \ldots, f_N\} = \mathrm{Enc}(I) \qquad (1)$$

The semantic segmentation decoder $D$ is composed of several cascaded sub-decoders $D_i$, $i \in \{1, 2, \ldots, N\}$, and a prediction module. Each sub-decoder is constructed with components in the following order: an upsampling layer, a 3×3 convolution layer, a Batch Norm layer, a ReLU layer, a 1×1 convolution layer, a Batch Norm layer and a ReLU layer, used to recover image details step by step and obtain the second image feature map set $\{d_i\}$. The prediction module $P$ consists of one 1×1 convolution layer and one Softmax layer. The prediction module receives the output of the final sub-decoder as input and outputs the semantic segmentation result $O$ corresponding to the image $I$, providing an implicit geometric constraint for the pose estimation decoder.
The feature map transfer relationship between the encoder and the semantic segmentation decoder is as follows:

$$d_1 = D_1(f_1), \qquad d_i = D_i\!\left(d_{i-1} \oplus f_i\right),\ i > 1 \qquad (2)$$

where $\oplus$ represents the feature skip connection, i.e. the channel-dimension concatenation of feature maps.
The pose estimation decoder $E$ consists of a high-dimensional fully connected layer and a fully connected layer whose dimension equals the pose vector length; it receives $f_1$ as input and outputs the absolute pose vector $v$ of the endoscope corresponding to the image $I$.
The feature map transfer relationship between the encoder and the pose estimation decoder is as follows:

$$v = E(f_1) \qquad (3)$$

where $f_1$ is the first image feature map (the one with the smallest size).
As a possible implementation manner of this embodiment, the training and evaluating the endoscope pose estimation model by using the preprocessed virtual data set and the real world-simulated data set to obtain a trained endoscope pose estimation model includes:
Based on the first training set, setting a loss function for the semantic segmentation decoder and for the pose estimation decoder of the endoscope pose estimation model, taking a weighted combination of the two loss functions as the total loss function, substituting the semantic segmentation result and the pose estimation result output by the model, together with the corresponding labels, into the total loss function in sequence to calculate the loss value, performing pre-training of the endoscope pose estimation model with an optimizer and updating the weights until the endoscope pose estimation model converges, and storing the weights of the whole network model after pre-training converges;
Loading the stored pre-trained and converged overall network model weight for the endoscope pose estimation model, inputting image samples in a first test set, obtaining endoscope pose estimation values corresponding to the samples, measuring errors of the endoscope pose estimation values and corresponding pose true values, and evaluating the effect of the endoscope pose estimation model on virtual endoscope pose estimation;
Based on the second training set, setting a loss function for the semantic segmentation decoder and for the pose estimation decoder of the endoscope pose estimation model, weighting and combining the two loss functions to obtain the total loss, substituting the semantic segmentation result and the pose estimation result output by the model, together with the corresponding labels, into the total loss function in sequence to calculate the loss value, performing fine-tuning training of the endoscope pose estimation model with an optimizer and updating the weights until the endoscope pose estimation model converges, and storing the weights of the whole network model after fine-tuning training converges;
And loading the model weight after the fine tuning training for the endoscope pose estimation model, inputting the image sample in the second test set, obtaining the endoscope pose estimation value corresponding to each sample, measuring the error between the endoscope pose estimation value and the ground truth value of the corresponding pose, and evaluating the pose estimation effect of the endoscope pose estimation model.
As a possible implementation manner of this embodiment, the loss function $L_1$ of the semantic segmentation decoder is:

$$L_1 = \frac{1}{n}\sum_{i=1}^{n}\left[C\!\left(I_i,\hat{I}_i\right) - \alpha \ln \mathrm{IoU}\!\left(I_i,\hat{I}_i\right)\right]$$

where $n$ is the number of samples, $C$ is the categorical cross-entropy loss, $I$ and $\hat{I}$ represent the predicted value and the corresponding annotated value of each pixel in an output-annotation image pair, $\mathrm{IoU}(\cdot,\cdot)$ denotes the intersection-over-union of the predicted and annotated segmentation masks, and $\alpha$ is a weight scalar;
The loss function $L_2$ of the pose estimation decoder is:

$$L_2 = \frac{1}{n}\sum_{i=1}^{n}\left\|p_i - \hat{p}_i\right\|_2^2$$

where $p$ is the pose estimation value and $\hat{p}$ is the pose ground truth;
The total loss function is:

$$L = \beta L_1 + \gamma L_2$$

where $\beta$ and $\gamma$ are weight scalars.
As shown in fig. 2, an apparatus for calculating absolute and relative pose of an endoscope in operation according to an embodiment of the present invention includes:
The virtual data set construction module is used for acquiring data when the virtual camera simulates the action of the endoscope and constructing a virtual data set in an RMIS scene;
The real world-simulated data set construction module is used for acquiring the pose of the endoscope coordinate system relative to the sensor coordinate system based on an external calibration technology, acquiring data when the simulated operation drives the endoscope to move, and constructing a real world-simulated data set in an RMIS scene;
the data preprocessing module is used for preprocessing the data of the virtual data set and the real world-simulation data set, dividing the preprocessed virtual data set into a first training set and a first test set, and dividing the preprocessed real world-simulation data set into a second training set and a second test set;
An endoscope pose estimation model building module for building a single-encoder, dual-decoder endoscope pose estimation model based on an encoder-decoder architecture, wherein the endoscope pose estimation model comprises an encoder, a feature skip connector, a semantic segmentation decoder and a pose estimation decoder; the encoder is used for extracting feature maps at all levels from the image data in the RMIS process, the feature maps at all levels containing information at different abstraction levels of the image; the feature skip connector is used for sending the feature maps of each level extracted by the encoder to the semantic segmentation decoder; the semantic segmentation decoder is used for learning the ability to recover image details from the feature maps and for providing implicit geometric constraints for the pose estimation decoder; and the pose estimation decoder is used for regressing the absolute pose of the endoscope from the feature maps and outputting the absolute endoscope pose corresponding to the image in the RMIS process;
The model training module is used for training and evaluating the endoscope pose estimation model by adopting the preprocessed virtual data set and the real world-simulation data set to obtain a trained endoscope pose estimation model;
The absolute pose output module is used for acquiring in-vivo image data of a patient captured by an endoscope system in the RMIS process in real time, inputting the in-vivo image data into a trained endoscope pose estimation model and obtaining absolute pose data of the endoscope corresponding to the real-time image in the RMIS process;
The relative pose calculation module is used for calculating the real-time relative pose of the endoscope in the RMIS process based on absolute pose data of the endoscope corresponding to the real-time image in the RMIS process.
The specific process of carrying out the absolute and relative pose of the endoscope in operation by adopting the technique for calculating the absolute and relative pose of the endoscope in operation is as follows.
1. Construction of a virtual data set.
Three-dimensional models of the surgical robot's surgical instruments and of different biological tissue backgrounds are obtained and imported into the three-dimensional rendering software Blender, where the surgical instrument coordinate system is fixed to the world coordinate system and kept relatively static. Different trajectories with sufficiently large motion angles (such as a spiral) are planned, and a virtual camera is set up to simulate the endoscope; one trajectory is set as the following path of the virtual endoscope, i.e. the optical center of the virtual endoscope is attached to the trajectory and moves along it while always pointing at the surgical instrument model. Keeping this scene configuration in Blender, the different trajectories are combined with the biological tissue backgrounds, thousands of images are rendered at a rate of 30 frames per second, and the VisionBlender plug-in is enabled to collect the surgical instrument segmentation mask and the virtual endoscope pose label (in homogeneous-matrix form) corresponding to each image, forming the virtual data set. The data in the virtual data set are preprocessed, the virtual endoscope pose labels corresponding to all image samples are converted into dual-quaternion form, and the virtual data set is divided into a training set and a test set; each image in the virtual data set corresponds at least to a pose ground-truth annotation, parameterized as a dual quaternion, of the coordinate system of the virtual endoscope that captured the image relative to the world coordinate system.
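Purely as an illustrative sketch of such a follow-path, always-aimed virtual endoscope in Blender's Python API, the object names ("Camera", "HelixPath", "Instrument") are hypothetical and not taken from this description:

```python
import bpy  # Blender's built-in Python API; run inside Blender

cam = bpy.data.objects["Camera"]                   # the virtual endoscope
follow = cam.constraints.new(type='FOLLOW_PATH')   # attach the optical center to a track
follow.target = bpy.data.objects["HelixPath"]      # e.g. a planned spiral trajectory

track = cam.constraints.new(type='TRACK_TO')       # keep the camera aimed at the model
track.target = bpy.data.objects["Instrument"]
track.track_axis = 'TRACK_NEGATIVE_Z'
track.up_axis = 'UP_Y'

bpy.context.scene.render.fps = 30                  # render at 30 frames per second
```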
2. construction of real world-analog datasets.
2.1, External sensor configuration: an Intel RealSense tracking camera T265 (hereinafter "T265"), which automatically records the pose of its built-in inertial measurement unit relative to a world coordinate system during shooting, is fixedly attached to the endoscope outside the scene where the operation is actually performed.
2.2, External calibration: the endoscope and the T265 are used simultaneously to acquire images of dozens of checkerboard calibration-board poses, and stereo calibration is performed between the chosen fisheye lens of the T265 (left or right) and the endoscope using these image pairs, yielding the relative pose of the endoscope coordinate system with respect to the optical-center coordinate system of that fisheye lens of the T265.
2.3, Data acquisition: after the above is finished, the endoscope, with the T265 fixedly attached, is clamped by a robot mechanical arm and an operation is simulated on a simulated surgical platform; hundreds of pieces of field image data in the RMIS process are collected with both devices simultaneously. For each image, the pose of the endoscope coordinate system relative to the world coordinate system is calculated and recorded from the pose of the T265's inertial measurement unit in the world coordinate system, the fixed physical relationship between that inertial measurement unit and the chosen fisheye lens, and the relative pose of the endoscope coordinate system with respect to the coordinate system of that fisheye lens of the T265, thereby constructing the real world-simulated data set.
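A minimal sketch of this pose chaining, assuming all quantities are expressed as 4×4 homogeneous matrices (the function and variable names are illustrative):

```python
import numpy as np

def endoscope_pose_in_world(T_world_imu, T_imu_fisheye, T_fisheye_endo):
    """Chain homogeneous transforms: world <- T265 IMU <- fisheye lens <- endoscope."""
    return T_world_imu @ T_imu_fisheye @ T_fisheye_endo

# toy usage: identity placeholders stand in for the recorded and calibrated values
T_world_endo = endoscope_pose_in_world(np.eye(4), np.eye(4), np.eye(4))
```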
2.4, Data preprocessing: the data in the real world-simulated data set are preprocessed, the endoscope pose labels corresponding to all image samples are converted into dual-quaternion form, and the real world-simulated data set is then divided into a training set and a test set. Each image in the real world-simulated data set corresponds at least to a pose ground-truth annotation, parameterized as a dual quaternion, of the endoscope optical-center coordinate system that captured the image relative to the world coordinate system.
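For illustration, a minimal NumPy sketch of converting a homogeneous pose matrix into the 8-dimensional dual-quaternion annotation form; for brevity it assumes the rotation's quaternion scalar part is well conditioned:

```python
import numpy as np

def quat_mul(q1, q2):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def pose_to_dual_quaternion(T):
    """4x4 homogeneous pose -> 8-vector [q_real | q_dual]."""
    R, t = T[:3, :3], T[:3, 3]
    w = np.sqrt(max(1.0 + np.trace(R), 1e-12)) / 2.0   # assumes w > 0 branch suffices
    q_real = np.array([w,
                       (R[2, 1] - R[1, 2]) / (4 * w),
                       (R[0, 2] - R[2, 0]) / (4 * w),
                       (R[1, 0] - R[0, 1]) / (4 * w)])
    q_real /= np.linalg.norm(q_real)
    q_dual = 0.5 * quat_mul(np.array([0.0, *t]), q_real)  # translation folded in
    return np.concatenate([q_real, q_dual])
```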
3. An endoscope pose estimation deep neural network model as shown in fig. 3 is established.
ResNet-50 is selected as the encoder to extract the first feature map set $\{f_1, f_2, f_3, f_4\}$ of the input image $I$; according to the ResNet-50 network characteristics (four output-size levels), the feature maps are grouped by level.
Four cascaded sub-decoders $D_1, D_2, D_3, D_4$ and a prediction module are arranged according to the ResNet-50 layer levels. Each sub-decoder includes an upsampling layer, a 3×3 convolution layer, a Batch Norm layer, a ReLU layer, a 1×1 convolution layer, a Batch Norm layer and a ReLU layer. The prediction module $P$ consists of one 1×1 convolution layer and one Softmax layer. Meanwhile, a pose estimation decoder is provided, consisting of a 2048-dimensional fully connected layer and an 8-dimensional fully connected layer.
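Purely as an illustrative sketch of one way this single-encoder, dual-decoder wiring could look in Keras: the input size, filter widths, ImageNet initialization, the standard Keras ResNet50 block-output layer names and the global-average-pooling step before the fully connected layers are assumptions, not the exact configuration of this description:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model, applications

def sub_decoder(x, skip, filters):
    """One sub-decoder: upsample, fuse the skip feature map, then 3x3 and 1x1
    convolutions, each followed by Batch Norm and ReLU."""
    x = layers.UpSampling2D(2)(x)
    if skip is not None:
        x = layers.Concatenate()([x, skip])   # channel-dimension feature skip connection
    x = layers.Conv2D(filters, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(filters, 1)(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

def build_pose_net(input_shape=(224, 224, 3), num_classes=2):
    inputs = layers.Input(shape=input_shape)
    backbone = applications.ResNet50(include_top=False, weights="imagenet",
                                     input_tensor=inputs)
    skips = [backbone.get_layer(name).output for name in
             ("conv2_block3_out", "conv3_block4_out", "conv4_block6_out")]
    f1 = backbone.output                       # smallest (deepest) feature map

    # semantic segmentation branch: four cascaded sub-decoders plus prediction module
    d = sub_decoder(f1, skips[2], 256)
    d = sub_decoder(d, skips[1], 128)
    d = sub_decoder(d, skips[0], 64)
    d = sub_decoder(d, None, 32)
    d = layers.UpSampling2D(2)(d)              # restore the original image size
    seg = layers.Conv2D(num_classes, 1)(d)
    seg = layers.Softmax(name="segmentation")(seg)

    # pose estimation branch: 2048-d fully connected layer, then an 8-d output
    p = layers.GlobalAveragePooling2D()(f1)    # pooling step is an assumption
    p = layers.Dense(2048, activation="relu")(p)
    pose = layers.Dense(8, name="pose")(p)     # 8-d dual-quaternion pose vector
    return Model(inputs, [seg, pose])

model = build_pose_net()
```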
4. Training of the model.
The model proposed by the present application was implemented using TensorFlow 2.12.1 and Keras 2.12.0. An ExponentialDecay learning-rate schedule and the Adam optimizer are adopted to improve the effect and stability of model training.
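A short sketch of this training setup; the schedule hyper-parameters are assumed values, as only the strategy, not its constants, is named here:

```python
import tensorflow as tf

lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-4,  # assumed starting learning rate
    decay_steps=1000,            # assumed decay interval
    decay_rate=0.96)             # assumed decay factor
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
```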
Based on the training set of the virtual data set, a weighted combination of the categorical cross-entropy loss and the logarithmic intersection-over-union loss is used as the loss function $L_1$ of the semantic segmentation branch to pre-train the network model:

$$L_1 = \frac{1}{n}\sum_{i=1}^{n}\left[C\!\left(I_i,\hat{I}_i\right) - \alpha \ln \mathrm{IoU}\!\left(I_i,\hat{I}_i\right)\right] \qquad (4)$$

where $n$ is the number of samples, $C$ is the categorical cross-entropy loss, $I$ and $\hat{I}$ represent the predicted value and the corresponding annotated value of each pixel in an output-annotation image pair, $\mathrm{IoU}(\cdot,\cdot)$ denotes the intersection-over-union of the predicted and annotated segmentation masks, and $\alpha$ is a weight scalar.
Meanwhile, the average mean squared error is used as the loss function $L_2$ of the pose estimation branch:

$$L_2 = \frac{1}{n}\sum_{i=1}^{n}\left\|p_i - \hat{p}_i\right\|_2^2 \qquad (5)$$

where $n$ is the number of samples, $p$ is the pose estimation value and $\hat{p}$ is the pose ground truth.
The loss functions of the two branches are weighted and combined to form the total loss function $L$:

$$L = \beta L_1 + \gamma L_2$$

where $\beta$ and $\gamma$ are weight scalars.
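A TensorFlow sketch of these loss terms, following the reconstruction above; the weighting constants alpha, beta and gamma are assumed defaults:

```python
import tensorflow as tf

def segmentation_loss(y_true, y_pred, alpha=0.5, eps=1e-7):
    """L1: categorical cross-entropy plus an alpha-weighted logarithmic IoU term."""
    ce = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(y_true, y_pred))
    inter = tf.reduce_sum(y_true * y_pred)
    union = tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) - inter
    return ce - alpha * tf.math.log((inter + eps) / (union + eps))

def pose_loss(y_true, y_pred):
    """L2: mean squared error over the 8-d dual-quaternion pose vectors."""
    return tf.reduce_mean(tf.square(y_true - y_pred))

def total_loss(seg_t, seg_p, pose_t, pose_p, beta=1.0, gamma=1.0):
    """L = beta * L1 + gamma * L2."""
    return beta * segmentation_loss(seg_t, seg_p) + gamma * pose_loss(pose_t, pose_p)
```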
Pre-training and updating weights of the whole network model by adopting an Adam optimizer, and storing the weights of the whole network model after pre-training is converged;
Loading stored pre-trained and converged overall network model weight for the endoscope pose estimation model, inputting image samples in a test set of a virtual data set, obtaining endoscope pose estimation values corresponding to the samples, and measuring rotation errors and translation errors of the endoscope pose estimation values and ground truth values of corresponding poses so as to evaluate the effect of the model on virtual endoscope pose estimation;
The overall network model weights converged after pre-training on the virtual data set are loaded into the endoscope pose estimation model, and fine-tuning training of the network model is performed based on the training set of the real world-simulated data set, again using the weighted combination of the categorical cross-entropy loss and the logarithmic intersection-over-union loss as the loss function $L_1$ of the semantic segmentation branch:

$$L_1 = \frac{1}{n}\sum_{i=1}^{n}\left[C\!\left(I_i,\hat{I}_i\right) - \alpha \ln \mathrm{IoU}\!\left(I_i,\hat{I}_i\right)\right] \qquad (6)$$

where $n$ is the number of samples, $C$ is the categorical cross-entropy loss, $I$ and $\hat{I}$ represent the predicted value and the corresponding annotated value of each pixel in an output-annotation image pair, $\mathrm{IoU}(\cdot,\cdot)$ denotes the intersection-over-union of the predicted and annotated segmentation masks, and $\alpha$ is a weight scalar.
The average mean squared error is used as the loss function $L_2$ of the pose estimation branch:

$$L_2 = \frac{1}{n}\sum_{i=1}^{n}\left\|p_i - \hat{p}_i\right\|_2^2$$

where $p$ is the pose estimation value, $\hat{p}$ is the pose ground truth and $n$ is the number of samples.
The loss functions of the two branches are weighted and combined to form the total loss $L$:

$$L = \beta L_1 + \gamma L_2$$

where $\beta$ and $\gamma$ are weight scalars.
Adopting an Adam optimizer to perform fine tuning training of the whole network model and update weights, and storing the weights of the whole network model after convergence of the fine tuning training;
And loading the overall network model weight after fine tuning training convergence for the endoscope pose estimation model, inputting image samples in a test set of a real world-simulation data set, obtaining endoscope pose estimation values corresponding to the samples, and measuring rotation errors and translation errors of the endoscope pose estimation values and ground truth values of corresponding poses so as to evaluate the effect of the model on real world-simulation endoscope pose estimation.
The rotation error $E_R$ and the translation error $E_T$ are calculated as follows:

$$E_R = 2\arccos\!\left(\left|\operatorname{Re}\!\left(p_q \otimes \hat{p}_q^{-1}\right)\right|\right) \qquad (7)$$

$$E_T = \left\|p_t - \hat{p}_t\right\|_2 \qquad (8)$$

where $p$ is the pose estimation value, $\hat{p}$ is the pose ground truth, $p_q$ and $\hat{p}_q$ represent the rotation quaternions extracted from the pose estimation value and the pose ground truth in dual-quaternion form, $p_t$ and $\hat{p}_t$ represent the translation vectors extracted from the pose estimation value and the pose ground truth in dual-quaternion form, $\operatorname{Re}(\cdot)$ denotes the scalar part of a quaternion, and $\otimes$ represents quaternion multiplication.
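A sketch of these metrics over dual-quaternion predictions, consistent with the reconstruction of equations (7) and (8) above; the residual-angle form of $E_R$ is a standard choice and is assumed here:

```python
import numpy as np

def quat_mul(q1, q2):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def rotation_error(dq_est, dq_gt):
    """E_R: angle of the residual rotation between the two real quaternion parts."""
    q, q_gt = dq_est[:4], dq_gt[:4]
    q_gt_inv = q_gt * np.array([1.0, -1.0, -1.0, -1.0])  # inverse of a unit quaternion
    residual = quat_mul(q, q_gt_inv)
    return 2.0 * np.arccos(np.clip(abs(residual[0]), 0.0, 1.0))

def translation_error(t_est, t_gt):
    """E_T: Euclidean distance between the extracted translation vectors."""
    return np.linalg.norm(np.asarray(t_est) - np.asarray(t_gt))
```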
5. Testing and application of the model.
After the model is converged, the image shot by the endoscope in the RMIS scene is directly input into the model, and then the absolute pose estimation result of the endoscope can be obtained.
6. And calculating the relative pose of the endoscope.
After repeating the previous step two or more times, the absolute endoscope pose estimate corresponding to each image is obtained. Taking any two of them, $p_1$ and $p_2$, as shown in fig. 4, the relative pose $p_r$ of the endoscope can be calculated by the following formula:

$$H(p_r) = H(p_2)\,H(p_1)^{-1}$$

where $H(\cdot)$ is the operator that converts a pose into a homogeneous matrix and $H^{-1}$ represents the inverse of the matrix $H$.
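A minimal sketch of this relative-pose computation with 4×4 homogeneous matrices; the multiplication order follows the reconstructed formula above and is an assumption:

```python
import numpy as np

def relative_pose(T1, T2):
    """Relative endoscope motion between two absolute poses: H(p_r) = H(p2) @ inv(H(p1))."""
    return T2 @ np.linalg.inv(T1)

# toy usage: two absolute pose estimates as homogeneous matrices
T_rel = relative_pose(np.eye(4), np.eye(4))
```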
The invention provides a method for calculating the absolute and relative pose of an intraoperative endoscope that adds semantic segmentation information to provide an implicit constraint for pose estimation. It can realize absolute pose estimation and relative pose calculation of the endoscope in an RMIS scene, improves the level of automation in the scene and the accuracy with which the surgical robot responds to the surgeon, and ensures patient safety in robot-assisted minimally invasive surgery.
An embodiment of the invention provides computer equipment comprising a processor, a memory and a bus, where the memory stores machine-readable instructions executable by the processor; when the device runs, the processor communicates with the memory through the bus, and the processor executes the machine-readable instructions to perform the steps of any of the above methods for calculating the absolute and relative pose of an intraoperative endoscope.
In particular, the above memory and processor can be general-purpose memory and processor, and are not particularly limited herein, and when the processor runs a computer program stored in the memory, the above method for calculating absolute and relative pose of the endoscope during operation can be performed.
It will be appreciated by those skilled in the art that the structure of the computer device is not limiting of the computer device and may include more or fewer components than shown, or may be combined with or separated from certain components, or may be arranged in a different arrangement of components.
In some embodiments, the computer device may further include a touch screen operable to display a graphical user interface (e.g., a launch interface of an application) and to receive user operations with respect to the graphical user interface (e.g., launch operations with respect to the application). A particular touch screen may include a display panel and a touch panel. The display panel may be configured in the form of an LCD (Liquid CRYSTAL DISPLAY), an OLED (Organic Light-Emitting Diode), or the like. The touch panel may collect touch or non-touch operations on or near the user and generate preset operation instructions, for example, operations of the user on or near the touch panel using any suitable object or accessory such as a finger, a stylus, or the like. In addition, the touch panel may include two parts, a touch detection device and a touch controller. The touch detection device detects the touch azimuth and the touch gesture of a user, detects signals brought by touch operation and transmits the signals to the touch controller; the touch controller receives touch information from the touch detection device, converts the touch information into information which can be processed by the processor, sends the information to the processor, and can receive and execute commands sent by the processor. In addition, the touch panel may be implemented by various types such as resistive, capacitive, infrared, and surface acoustic wave, or may be implemented by any technology developed in the future. Further, the touch panel may overlay the display panel, and a user may operate on or near the touch panel overlaid on the display panel according to a graphical user interface displayed by the display panel, and upon detection of an operation thereon or thereabout, the touch panel is transferred to the processor to determine a user input, and the processor then provides a corresponding visual output on the display panel in response to the user input. In addition, the touch panel and the display panel may be implemented as two independent components or may be integrated.
Corresponding to the above method, an embodiment of the invention also provides a storage medium storing a computer program which, when executed by a processor, performs the steps of any of the above methods for calculating the absolute and relative pose of an intraoperative endoscope.
The device provided by the embodiment of the application may be specific hardware on the equipment, or software or firmware installed on the equipment. The device provided by the embodiment of the present application has the same implementation principle and technical effects as the foregoing method embodiment; for brevity, where the device embodiment is silent, reference may be made to the corresponding content in the foregoing method embodiment. It will be clear to those skilled in the art that, for convenience and brevity, the specific operation of the system, apparatus and units described above may refer to the corresponding process in the above method embodiment, which is not described in detail herein.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of modules is merely a logical function division, and there may be additional divisions in actual implementation, and for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not performed. In addition, the coupling or direct coupling or communication connection shown or discussed with respect to each other may be through some communication interface, indirect coupling or communication connection of devices or modules, electrical, mechanical, or other form.
The modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in the embodiment provided by the application may be integrated in one processing module, or each module may exist alone physically, or two or more modules may be integrated in one module.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting them. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that modifications and equivalents may be made to the specific embodiments of the invention without departing from its spirit and scope, and such modifications and equivalents are intended to be covered by the claims.

Claims (8)

1. A method for calculating absolute and relative pose of an endoscope in operation, comprising the following steps:
Collecting data when a virtual camera simulates the action of an endoscope, and constructing a virtual data set in an RMIS scene;
Acquiring data while a simulated surgical operation drives the endoscope to move, based on the pose of the endoscope coordinate system relative to the sensor coordinate system obtained by an external calibration technique, and constructing a real world-simulated data set in an RMIS scene;
Performing data preprocessing on the virtual data set and the real world-simulation data set, dividing the preprocessed virtual data set into a first training set and a first test set, and dividing the preprocessed real world-simulation data set into a second training set and a second test set;
Establishing a single encoder-double decoder endoscope pose estimation model based on an encoder-decoder architecture, wherein the endoscope pose estimation model comprises an encoder, a feature skip connector, a semantic segmentation decoder, and a pose estimation decoder; the encoder is used for extracting feature maps of all levels from the image data in the RMIS process, the feature maps of all levels containing information of the image at different abstraction levels; the feature skip connector is used for sending the feature maps of each level extracted by the encoder to the semantic segmentation decoder; the semantic segmentation decoder is used for learning the capability of recovering image details from the feature maps and providing implicit geometric constraints for the pose estimation decoder; the pose estimation decoder is used for regressing the absolute pose of the endoscope from the feature maps and outputting the absolute pose of the endoscope corresponding to the image in the RMIS process;
Training and evaluating the endoscope pose estimation model by adopting the preprocessed virtual data set and the real world-simulated data set to obtain a trained endoscope pose estimation model;
Acquiring in real time the in-vivo image data of a patient captured by the endoscope system in the RMIS process, and inputting it into the trained endoscope pose estimation model to obtain the absolute endoscope pose data corresponding to the real-time image in the RMIS process;
Based on absolute pose data of the endoscope corresponding to the real-time image in the RMIS process, calculating the real-time relative pose of the endoscope in the RMIS process;
The establishing of the single encoder-double decoder endoscope pose estimation model based on the encoder-decoder architecture comprises the following steps:
Acquiring initial model parameters of the encoder, loading the initial model parameters into the encoder, and removing the fully connected classification layer of the encoder to form a fully convolutional network;
Inputting a number of images from the training set of the selected data set into the encoder;
Dividing all convolution layers into a plurality of levels according to the output size of each convolution block of the fully convolutional network, and encoding the training images to obtain a first feature map of each training image at each level of convolution layers;
According to the number of levels into which the selected encoder is divided, setting an equal number of cascaded sub-decoders whose output sizes correspond one-to-one to the levels of the encoder, all sub-decoders together with a prediction module forming the semantic segmentation decoder, wherein the output size of the final-stage sub-decoder block of the semantic segmentation decoder is the same as the size of the original training image;
Setting a high-dimensional fully connected layer and a fully connected layer whose dimension equals the length of the pose vector to form the pose estimation decoder;
The lowest-level sub-decoder of the semantic segmentation decoder receives the first feature map of the smallest size as input, the second-lowest-level sub-decoder receives the output of the lowest-level sub-decoder as input, and so on; the pose estimation decoder finally outputs the absolute pose vector of the endoscope;
The training and evaluating of the endoscope pose estimation model with the preprocessed virtual data set and real world-simulated data set to obtain a trained endoscope pose estimation model comprises the following steps:
Based on the first training set, respectively setting a loss function for the semantic segmentation decoder and the pose estimation decoder of the endoscope pose estimation model, taking the weighted combination of the two loss functions as the total loss function, substituting the semantic segmentation results and pose estimation results output by the model, together with the corresponding labels, into the total loss function in sequence to calculate the loss value, pre-training the endoscope pose estimation model with an optimizer and updating the weights until the model converges, and storing the weights of the whole network model after pre-training convergence;
Loading the stored pre-trained converged whole-network model weights into the endoscope pose estimation model, inputting the image samples of the first test set to obtain the endoscope pose estimation value corresponding to each sample, measuring the error between each endoscope pose estimation value and the corresponding pose ground truth, and evaluating the effect of the endoscope pose estimation model on virtual endoscope pose estimation;
Based on the second training set, respectively setting a loss function for the semantic segmentation decoder and the pose estimation decoder of the endoscope pose estimation model, taking the weighted combination of the two loss functions as the total loss, substituting the semantic segmentation results and pose estimation results output by the model, together with the corresponding labels, into the total loss function in sequence to calculate the loss value, fine-tuning the endoscope pose estimation model with an optimizer and updating the weights until the model converges, and storing the weights of the whole network model after fine-tuning convergence;
Loading the fine-tuned model weights into the endoscope pose estimation model, inputting the image samples of the second test set to obtain the endoscope pose estimation value corresponding to each sample, measuring the error between each endoscope pose estimation value and the corresponding pose ground truth, and evaluating the pose estimation effect of the endoscope pose estimation model.
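For concreteness, a minimal PyTorch sketch of the single encoder-double decoder structure described in claim 1 is given below. The ResNet-18 backbone, channel widths, input resolution, and class count are illustrative assumptions and are not specified by the patent; the patent fixes only the structure (a pretrained encoder with its fully connected classification layer removed, cascaded sub-decoders fed by skip connections, a prediction module restoring the original image size, and a two-layer fully connected pose head whose output length matches the pose vector, e.g. 8 for a dual quaternion).

```python
import torch
import torch.nn as nn
import torchvision.models as models


class SubDecoder(nn.Module):
    """One cascade stage: upsample, fuse the skip-connected feature map, refine."""

    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch + skip_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        return self.conv(torch.cat([self.up(x), skip], dim=1))


class EndoPoseNet(nn.Module):
    """Single encoder, double decoder: a segmentation branch that restores
    image detail and a pose branch that regresses the absolute pose."""

    def __init__(self, n_classes=2, pose_dim=8):  # pose_dim 8 = dual quaternion
        super().__init__()
        backbone = models.resnet18(weights="IMAGENET1K_V1")  # initial parameters
        # Remove the fully connected classification layer -> fully convolutional.
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)
        self.enc1, self.enc2 = backbone.layer1, backbone.layer2
        self.enc3, self.enc4 = backbone.layer3, backbone.layer4
        # Semantic segmentation decoder: one cascaded sub-decoder per level.
        self.dec3 = SubDecoder(512, 256, 256)
        self.dec2 = SubDecoder(256, 128, 128)
        self.dec1 = SubDecoder(128, 64, 64)
        # Prediction module: restores the original training image size.
        self.predict = nn.Sequential(
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(64, n_classes, 1),
        )
        # Pose estimation decoder: high-dimensional FC layer followed by an
        # FC layer whose width equals the pose vector length.
        self.pose_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(512, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, pose_dim),
        )

    def forward(self, x):                # x: (B, 3, 224, 224)
        x = self.stem(x)                 # -> (B, 64, 56, 56)
        f1 = self.enc1(x)                # -> (B, 64, 56, 56)
        f2 = self.enc2(f1)               # -> (B, 128, 28, 28)
        f3 = self.enc3(f2)               # -> (B, 256, 14, 14)
        f4 = self.enc4(f3)               # -> (B, 512, 7, 7), smallest map
        d = self.dec3(f4, f3)            # skip connections feed each stage
        d = self.dec2(d, f2)
        d = self.dec1(d, f1)             # -> (B, 64, 56, 56)
        seg = self.predict(d)            # -> (B, n_classes, 224, 224)
        pose = self.pose_head(f4)        # -> (B, pose_dim) absolute pose
        return seg, pose
```

With a 224x224 input, the final sub-decoder output is upsampled back to the input resolution, matching the requirement that the final-stage sub-decoder output equals the original image size; both branches read from the same encoder, so the segmentation task supplies the implicit geometric constraint to the pose branch through shared features.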
2. The method of claim 1, wherein the collecting data from the virtual camera simulating the motion of the endoscope to construct a virtual dataset in an RMIS scene comprises:
Importing three-dimensional models of surgical robot instruments and different biological tissue backgrounds into three-dimensional rendering software, and arranging a virtual camera to simulate an endoscope, making it move along a preset trajectory while aimed at the surgical instrument three-dimensional model;
Continuously rendering the surgical instrument and biological tissue background images in the virtual camera's field of view at a certain frame rate, while collecting the surgical instrument segmentation mask and the absolute pose label of the virtual endoscope corresponding to each frame, to form the virtual data set.
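The patent does not name the rendering software; as one possibility, the per-frame render-and-record loop of claim 2 could be prototyped with Blender's Python API (bpy). The object name "Endoscope" and the output paths below are purely illustrative, and the segmentation masks (e.g., via an object-index render pass) are omitted from this sketch.

```python
# Sketch: render each frame of the preset camera trajectory and record the
# virtual endoscope's absolute pose. Assumes a .blend scene containing the
# instrument/tissue models and an animated camera object named "Endoscope";
# the output directory must already exist.
import json
import bpy

scene = bpy.context.scene
cam = bpy.data.objects["Endoscope"]   # hypothetical object name
scene.camera = cam

labels = []
for frame in range(scene.frame_start, scene.frame_end + 1):
    scene.frame_set(frame)                        # advance the preset trajectory
    scene.render.filepath = f"//virtual_ds/img_{frame:05d}.png"
    bpy.ops.render.render(write_still=True)       # instruments + tissue view
    m = cam.matrix_world                          # absolute camera pose
    labels.append({"frame": frame,
                   "pose_4x4": [list(row) for row in m]})

with open(bpy.path.abspath("//virtual_ds/poses.json"), "w") as f:
    json.dump(labels, f, indent=2)
```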
3. The method for calculating absolute and relative pose of an intraoperative endoscope according to claim 1, wherein the acquiring of data while the simulated surgical operation drives the endoscope to move, based on the pose of the endoscope coordinate system relative to the sensor coordinate system obtained by an external calibration technique, and the constructing of a real world-simulated data set in an RMIS scene comprise:
Outside the actual operation scene, fixedly connecting to the endoscope an external sensor capable of automatically recording the pose of its own coordinate system relative to a world coordinate system, and obtaining, through an external calibration technique, the pose of the endoscope coordinate system relative to the sensor coordinate system in the fixedly connected state;
Clamping the endoscope together with its fixedly connected sensor with a robot mechanical arm, driving the endoscope to move through simulated operations, recording the RMIS field image data collected by the endoscope and the pose data of the sensor coordinate system relative to the world coordinate system, and calculating the pose of the endoscope coordinate system relative to the world coordinate system using the externally calibrated pose of the endoscope coordinate system relative to the sensor coordinate system in the fixedly connected state;
Marking a surgical instrument segmentation mask on each piece of field image data with a labeling tool, and combining the field image data with the corresponding pose of the endoscope coordinate system relative to the world coordinate system and the surgical instrument segmentation mask to construct the real world-simulated data set.
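The pose composition in claim 3 is a single product of homogeneous transforms; a minimal numpy sketch, with the frame-naming convention as an assumption:

```python
# Sketch: the external sensor reports T_world_sensor at each frame, and
# external calibration gives the fixed T_sensor_endo; the label for each
# frame is then T_world_endo = T_world_sensor @ T_sensor_endo.
import numpy as np

def world_from_endoscope(T_world_sensor: np.ndarray,
                         T_sensor_endo: np.ndarray) -> np.ndarray:
    """Both inputs are 4x4 homogeneous transforms; returns the endoscope
    pose relative to the world coordinate system."""
    return T_world_sensor @ T_sensor_endo
```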
4. The method of claim 1, wherein the pre-processing the virtual data set and the real world-simulation data set, dividing the pre-processed virtual data set into a first training set and a first test set, and dividing the pre-processed real world-simulation data set into a second training set and a second test set, comprises:
First performing image resizing and image normalization on the data in the virtual data set, then converting the virtual endoscope pose labels corresponding to all image samples in the virtual data set into dual quaternion form, and finally dividing the virtual data set into a first training set and a first test set, wherein each image in the virtual data set corresponds at least to a surgical instrument segmentation mask and a dual-quaternion-parameterized pose truth label of the coordinate system of the virtual endoscope that captured the image relative to the world coordinate system;
First performing image resizing and image normalization on the data in the real world-simulated data set, then converting the endoscope pose labels corresponding to all image samples into dual quaternion form, and finally dividing the real world-simulated data set into a second training set and a second test set, wherein each image in the real world-simulated data set corresponds at least to a surgical instrument segmentation mask and a dual-quaternion-parameterized pose truth label of the coordinate system of the endoscope that captured the image relative to the world coordinate system.
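The dual quaternion parameterization of claim 4 maps a rotation quaternion q_r and a translation t to the 8-vector (q_r, q_d) with q_d = (1/2)(0, t) q_r. A self-contained numpy sketch follows; the Hamilton product and w-first quaternion ordering are convention assumptions, not specified by the patent.

```python
import numpy as np

def quat_mul(a, b):
    """Hamilton product of quaternions in (w, x, y, z) order."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def pose_to_dual_quaternion(q_r, t):
    """q_r: unit rotation quaternion (w, x, y, z); t: translation (x, y, z).
    Returns the 8-vector [q_r, q_d] with q_d = 0.5 * (0, t) * q_r."""
    t_quat = np.array([0.0, *t])
    q_d = 0.5 * quat_mul(t_quat, q_r)
    return np.concatenate([q_r, q_d])
```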
5. The method for calculating absolute and relative pose of an intraoperative endoscope according to claim 1, wherein the loss function L1 of the semantic segmentation decoder is:
L1 = (α/n) · Σ_{i=1..n} C(I_i, Î_i)
where n is the number of samples, C is the class cross-entropy loss, I_i and Î_i represent the predicted value and the corresponding labeled value of each pixel in the i-th output-label image pair, and α is a weight scalar;
the loss function L2 of the pose estimation decoder is:
L2 = (1/n) · Σ_{i=1..n} ‖p_i − p̂_i‖
where p_i is the pose estimation value and p̂_i is the corresponding pose ground truth;
the total loss function is:
L = L1 + β · L2
where β is a weight scalar.
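One way to implement a weighted combination of a segmentation loss and a pose loss as in claim 5 is sketched below in PyTorch. The patent states the loss forms only through their variable definitions (the formula images are not reproduced in the text), so the per-pixel cross-entropy term and the norm-based pose term are assumptions consistent with those definitions; alpha and beta are the weight scalars.

```python
import torch
import torch.nn.functional as F

def total_loss(seg_logits, seg_labels, pose_pred, pose_gt,
               alpha=1.0, beta=1.0):
    """seg_logits: (B, n_classes, H, W); seg_labels: (B, H, W) class indices;
    pose_pred, pose_gt: (B, pose_dim). Returns the scalar weighted loss."""
    l1 = alpha * F.cross_entropy(seg_logits, seg_labels)   # per-pixel CE, mean
    l2 = torch.norm(pose_pred - pose_gt, dim=-1).mean()    # pose regression
    return l1 + beta * l2
```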
6. An apparatus for calculating absolute and relative pose of an endoscope during operation, comprising:
The virtual data set construction module is used for acquiring data when the virtual camera simulates the action of the endoscope and constructing a virtual data set in an RMIS scene;
The real world-simulated data set construction module is used for acquiring data while a simulated surgical operation drives the endoscope to move, based on the pose of the endoscope coordinate system relative to the sensor coordinate system obtained by an external calibration technique, and constructing a real world-simulated data set in an RMIS scene;
the data preprocessing module is used for preprocessing the data of the virtual data set and the real world-simulation data set, dividing the preprocessed virtual data set into a first training set and a first test set, and dividing the preprocessed real world-simulation data set into a second training set and a second test set;
An endoscope pose estimation model building module, used for establishing a single encoder-double decoder endoscope pose estimation model based on an encoder-decoder architecture, wherein the endoscope pose estimation model comprises an encoder, a feature skip connector, a semantic segmentation decoder, and a pose estimation decoder; the encoder is used for extracting feature maps of all levels from the image data in the RMIS process, the feature maps of all levels containing information of the image at different abstraction levels; the feature skip connector is used for sending the feature maps of each level extracted by the encoder to the semantic segmentation decoder; the semantic segmentation decoder is used for learning the capability of recovering image details from the feature maps and providing implicit geometric constraints for the pose estimation decoder; the pose estimation decoder is used for regressing the absolute pose of the endoscope from the feature maps and outputting the absolute pose of the endoscope corresponding to the image in the RMIS process;
The model training module is used for training and evaluating the endoscope pose estimation model by adopting the preprocessed virtual data set and the real world-simulation data set to obtain a trained endoscope pose estimation model;
The absolute pose output module is used for acquiring in-vivo image data of a patient captured by an endoscope system in the RMIS process in real time, inputting the in-vivo image data into a trained endoscope pose estimation model and obtaining absolute pose data of the endoscope corresponding to the real-time image in the RMIS process;
The relative pose calculation module is used for calculating the real-time relative pose of the endoscope in the RMIS process based on absolute pose data of the endoscope corresponding to the real-time image in the RMIS process;
The establishing of the single encoder-double decoder endoscope pose estimation model based on the encoder-decoder architecture comprises the following steps:
Acquiring initial model parameters of the encoder, loading the initial model parameters into the encoder, and removing the fully connected classification layer of the encoder to form a fully convolutional network;
Inputting a number of images from the training set of the selected data set into the encoder;
Dividing all convolution layers into a plurality of levels according to the output size of each convolution block of the fully convolutional network, and encoding the training images to obtain a first feature map of each training image at each level of convolution layers;
According to the number of levels into which the selected encoder is divided, setting an equal number of cascaded sub-decoders whose output sizes correspond one-to-one to the levels of the encoder, all sub-decoders together with a prediction module forming the semantic segmentation decoder, wherein the output size of the final-stage sub-decoder block of the semantic segmentation decoder is the same as the size of the original training image;
Setting a high-dimensional fully connected layer and a fully connected layer whose dimension equals the length of the pose vector to form the pose estimation decoder;
The lowest-level sub-decoder of the semantic segmentation decoder receives the first feature map of the smallest size as input, the second-lowest-level sub-decoder receives the output of the lowest-level sub-decoder as input, and so on; the pose estimation decoder finally outputs the absolute pose vector of the endoscope;
The training and evaluating of the endoscope pose estimation model with the preprocessed virtual data set and real world-simulated data set to obtain a trained endoscope pose estimation model comprises the following steps:
Based on the first training set, respectively setting a loss function for the semantic segmentation decoder and the pose estimation decoder of the endoscope pose estimation model, taking the weighted combination of the two loss functions as the total loss function, substituting the semantic segmentation results and pose estimation results output by the model, together with the corresponding labels, into the total loss function in sequence to calculate the loss value, pre-training the endoscope pose estimation model with an optimizer and updating the weights until the model converges, and storing the weights of the whole network model after pre-training convergence;
Loading the stored pre-trained converged whole-network model weights into the endoscope pose estimation model, inputting the image samples of the first test set to obtain the endoscope pose estimation value corresponding to each sample, measuring the error between each endoscope pose estimation value and the corresponding pose ground truth, and evaluating the effect of the endoscope pose estimation model on virtual endoscope pose estimation;
Based on the second training set, respectively setting a loss function for the semantic segmentation decoder and the pose estimation decoder of the endoscope pose estimation model, taking the weighted combination of the two loss functions as the total loss, substituting the semantic segmentation results and pose estimation results output by the model, together with the corresponding labels, into the total loss function in sequence to calculate the loss value, fine-tuning the endoscope pose estimation model with an optimizer and updating the weights until the model converges, and storing the weights of the whole network model after fine-tuning convergence;
Loading the fine-tuned model weights into the endoscope pose estimation model, inputting the image samples of the second test set to obtain the endoscope pose estimation value corresponding to each sample, measuring the error between each endoscope pose estimation value and the corresponding pose ground truth, and evaluating the pose estimation effect of the endoscope pose estimation model.
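The relative pose calculation of claims 1 and 6 composes consecutive absolute poses. A numpy sketch follows, assuming the absolute poses are expressed as 4x4 homogeneous camera-to-world transforms (the patent's dual quaternion outputs would first be converted to this matrix form):

```python
# Sketch: relative motion between two frames from their absolute poses,
# T_rel = inv(T_prev) @ T_curr, using the closed-form rigid-transform inverse.
import numpy as np

def relative_pose(T_prev: np.ndarray, T_curr: np.ndarray) -> np.ndarray:
    R, t = T_prev[:3, :3], T_prev[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T            # inverse rotation
    T_inv[:3, 3] = -R.T @ t        # inverse translation
    return T_inv @ T_curr          # pose of the current frame in the previous one
```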
7. A computer device comprising a processor, a memory, and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the computer device is in operation, and the processor executing the machine-readable instructions to perform the steps of the method for calculating the absolute and relative pose of an intraoperative endoscope according to any of claims 1-5.
8. A storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of intra-operative endoscope absolute and relative pose calculation as claimed in any of claims 1-5.
CN202410129170.3A 2024-01-31 2024-01-31 Method, device and equipment for calculating absolute and relative pose of endoscope in operation Active CN117671012B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410129170.3A CN117671012B (en) 2024-01-31 2024-01-31 Method, device and equipment for calculating absolute and relative pose of endoscope in operation

Publications (2)

Publication Number Publication Date
CN117671012A CN117671012A (en) 2024-03-08
CN117671012B true CN117671012B (en) 2024-04-30

Family

ID=90064411

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410129170.3A Active CN117671012B (en) 2024-01-31 2024-01-31 Method, device and equipment for calculating absolute and relative pose of endoscope in operation

Country Status (1)

Country Link
CN (1) CN117671012B (en)

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105962879A (en) * 2016-04-22 2016-09-28 重庆金山科技(集团)有限公司 Pose control system and control method of capsule endoscope and capsule endoscope
CN107456278A (en) * 2016-06-06 2017-12-12 北京理工大学 A kind of ESS air navigation aid and system
CN109288591A (en) * 2018-12-07 2019-02-01 微创(上海)医疗机器人有限公司 Surgical robot system
CN112802185A (en) * 2021-01-26 2021-05-14 合肥工业大学 Endoscope image three-dimensional reconstruction method and system facing minimally invasive surgery space perception
WO2022170562A1 (en) * 2021-02-10 2022-08-18 中国科学院深圳先进技术研究院 Digestive endoscope navigation method and system
CN112975973A (en) * 2021-03-02 2021-06-18 中山大学 Hybrid calibration method and device applied to flexible robot
WO2023030523A1 (en) * 2021-09-06 2023-03-09 北京字节跳动网络技术有限公司 Tissue cavity positioning method and apparatus for endoscope, medium and device
CN114022527A (en) * 2021-10-20 2022-02-08 华中科技大学 Monocular endoscope depth and pose estimation method and device based on unsupervised learning
WO2023129562A1 (en) * 2021-12-29 2023-07-06 Noah Medical Corporation Systems and methods for pose estimation of imaging system
CN115222878A (en) * 2022-06-17 2022-10-21 浙江大学 Scene reconstruction method applied to lung bronchoscope surgical robot
CN115829978A (en) * 2022-12-13 2023-03-21 北京柏惠维康科技股份有限公司 Endoscope image processing method, endoscope image processing device, electronic apparatus, and computer storage medium
CN116740170A (en) * 2023-05-09 2023-09-12 华中农业大学 Monocular endoscope video depth and pose estimation method
CN117011381A (en) * 2023-08-08 2023-11-07 赫丽佰(合肥)智能科技有限公司 Real-time surgical instrument pose estimation method and system based on deep learning and stereoscopic vision
CN117115448A (en) * 2023-10-23 2023-11-24 临沂大学 Image semantic segmentation method, device and equipment based on deep neural network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Three-dimensional posture estimation of robot forceps using endoscope with convolutional neural network; Mikada T et al.; International Journal of Medical Robotics and Computer Assisted Surgery; 2020-01-08; pp. 1-9 *
Research on a part pose measurement system based on binocular vision; Guo Ming et al.; Machinery Design & Manufacture; 2023-09-27; pp. 145-149 *
Optimization simulation of moving target tracking for minimally invasive surgical robots; Wang Lianxiang, Yang Dewei, Li Yao; Computer Simulation; 2016-10-15 (No. 10); pp. 330-335 *

Also Published As

Publication number Publication date
CN117671012A (en) 2024-03-08

Similar Documents

Publication Publication Date Title
KR20190100011A (en) Method and apparatus for providing surgical information using surgical video
Wu et al. Three-dimensional modeling from endoscopic video using geometric constraints via feature positioning
US20190362651A1 (en) System for simulation of soft bodies
CN113662573B (en) Mammary gland focus positioning method, device, computer equipment and storage medium
CN113197665A (en) Minimally invasive surgery simulation method and system based on virtual reality
CN108090954A (en) Abdominal cavity environmental map based on characteristics of image rebuilds the method with laparoscope positioning
US20220198693A1 (en) Image processing method, device and computer-readable storage medium
WO2024094227A1 (en) Gesture pose estimation method based on kalman filtering and deep learning
CN112802185A (en) Endoscope image three-dimensional reconstruction method and system facing minimally invasive surgery space perception
Menegozzo et al. Surgical gesture recognition with time delay neural network based on kinematic data
CN114792326A (en) Surgical navigation point cloud segmentation and registration method based on structured light
KR20240015109A (en) Endoscopic image identification methods, electronic devices and storage media
CN111080676A (en) Method for tracking endoscope image sequence feature points through online classification
CN117671012B (en) Method, device and equipment for calculating absolute and relative pose of endoscope in operation
WO2019152566A1 (en) Systems and methods for subject specific kinematic mapping
WO2021200438A1 (en) Calibration system, information processing system, robot control system, calibration method, information processing method, robot control method, calibration program, information processing program, calibration device, information processing device, and robot control device
Xiao et al. Automated assessment of neonatal endotracheal intubation measured by a virtual reality simulation system
CN110837751B (en) Human motion capturing and gait analysis method based on RGBD depth camera
Gong et al. Real-Time Camera Localization during Robot-Assisted Telecystoscopy for Bladder Cancer Surveillance
CN115281584B (en) Flexible endoscope robot control system and flexible endoscope robot simulation method
WO2024029502A1 (en) Endoscopic examination assistance device, endoscopic examination assistance method, and recording medium
US20220338836A1 (en) System and method for guiding positioning and orienting of an ultrasound probe
KR102426925B1 (en) Method and program for acquiring motion information of a surgical robot using 3d simulation
TWI756996B (en) Automatic bio-specimen inspection system and inspection method thereof as well as non-volatile computer readable storage media
Luo et al. Multi-Modal Autonomous Ultrasound Scanning for Efficient Human–Machine Fusion Interaction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant