CN115797448A - Digestive endoscopy visual reconstruction navigation system and method


Info

Publication number: CN115797448A
Application number: CN202211389856.3A
Authority: CN (China)
Prior art keywords: map, topological, module, network, image
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 熊璟, 谭敏, 夏泽洋, 谢高生
Current Assignee: Shenzhen Institute of Advanced Technology of CAS
Original Assignee: Shenzhen Institute of Advanced Technology of CAS
Priority date / Filing date: 2022-11-08
Application filed by Shenzhen Institute of Advanced Technology of CAS

Landscapes

  • Endoscopes (AREA)

Abstract

The invention relates to the technical field of medical image processing, and in particular to a digestive endoscopy visual reconstruction navigation system and method. The system comprises a data acquisition module, a map construction module, and a path planning module. The data acquisition module acquires virtual camera pose data and image depth data and sends them to the map construction module. The map construction module constructs an optical flow self-supervised network and an improved residual network from the virtual camera pose data and image depth data, then performs camera pose estimation and depth map estimation with the two networks respectively to construct an environment map. The path planning module extracts a topological centerline from the environment map and performs path planning and navigation around the topological centerline. The invention addresses the problems that feature points are difficult to extract from endoscope images under low-illumination, low-texture conditions and that direction cannot be accurately determined.

Description

Digestive endoscopy visual reconstruction navigation system and method
Technical Field
The embodiment of the application relates to the technical field of medical image processing, in particular to a digestive endoscope visual reconstruction navigation system and a digestive endoscope visual reconstruction navigation method.
Background
Colorectal cancer is the digestive-system cancer with the highest prevalence in China, and colonoscopy is the best means of finding malignant polyps. During a colonoscopy, the doctor observes the endoscopic images and, drawing on clinical experience, operates the control handle of the colonoscope to advance it. However, images of tissue interiors acquired by a digestive endoscope have weak texture, much repeated texture, and large changes in scene illumination. Moreover, camera motion produces motion blur, making feature extraction difficult. As a result, when an endoscopic image degenerates into a "no-information frame", the lumen channel is lost and the correct direction cannot be identified.
Existing digestive endoscopy visual navigation methods fall into traditional image processing algorithms and deep-learning-based algorithms. Traditional image processing algorithms use features such as the salient contour and dark region of the intestinal lumen as the basis for navigation; deep-learning-based algorithms estimate the camera pose and depth map from the input image stream. For traditional image processing algorithms, however, effectiveness drops sharply when the image is occluded or blurred; and when the endoscope is too close to the intestinal wall, the angle of light received by the endoscope tip is too narrow, and intestinal muscle lines may even be confused with dark regions, making feature extraction difficult. For supervised deep learning methods, clinical operation videos are relatively easy to obtain in a digestive endoscopy environment, but ground-truth labels such as the camera pose and depth corresponding to each frame are very difficult to obtain, so the endoscope image cannot be accurately interpreted.
Disclosure of Invention
The embodiments of the present application provide a digestive endoscopy visual reconstruction navigation system and method, which solve the problems that feature points are difficult to extract from endoscope images under low-illumination, low-texture conditions and that direction cannot be accurately determined.
To solve the above technical problem, in a first aspect, an embodiment of the present application provides a digestive endoscopy visual reconstruction navigation system, comprising a data acquisition module, a map construction module, and a path planning module connected in sequence. The data acquisition module acquires virtual camera pose data and image depth data and sends them to the map construction module. The map construction module constructs an optical flow self-supervised network and an improved residual network from the virtual camera pose data and image depth data, performs camera pose estimation and depth map estimation with the two networks respectively, and constructs an environment map. The path planning module extracts a topological centerline from the environment map and performs path planning and navigation around the topological centerline.
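For concreteness, the three-module pipeline of the claimed system can be sketched as follows. This is a minimal illustration only: the class and method names, and the placeholder data, are hypothetical and not taken from the patent.

```python
import numpy as np

class DataAcquisitionModule:
    """Collects virtual-camera pose and depth frames (here: synthetic placeholders)."""
    def acquire(self, n_frames=4):
        return [(np.eye(4), np.ones((64, 64))) for _ in range(n_frames)]

class MapConstructionModule:
    """Stands in for the pose network + depth network + 3D reconstruction."""
    def build_map(self, frames):
        # In the real system, poses come from the optical flow self-supervised
        # network and depths from the improved residual network.
        poses = [pose for pose, _ in frames]
        depths = [depth for _, depth in frames]
        return {"poses": poses, "depths": depths}

class PathPlanningModule:
    """Extracts a topological centerline from the map and plans along it."""
    def plan(self, env_map):
        # Placeholder centerline: one waypoint per frame along the z-axis.
        return [np.array([0.0, 0.0, float(i)]) for i in range(len(env_map["poses"]))]

# Wiring the three modules in sequence, as in the claimed system.
frames = DataAcquisitionModule().acquire()
env_map = MapConstructionModule().build_map(frames)
path = PathPlanningModule().plan(env_map)
print(len(path), "waypoints")
```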
In some exemplary embodiments, the map construction module comprises a camera pose estimation module and a depth map estimation module; the camera pose estimation module obtains an estimated camera pose from the optical flow self-supervised network, and the depth map estimation module obtains an estimated endoscope image depth from the improved residual network.
In some exemplary embodiments, the map construction module constructs the environment map through three-dimensional reconstruction based on the estimated camera pose and the estimated endoscope image depth.

In some exemplary embodiments, the path planning module comprises a topological centerline acquisition module and a navigation module; the topological centerline acquisition module obtains the topological centerline of the intestinal lumen channel by exploiting the tubular character of the intestinal lumen, and the navigation module extracts the topological centerline and performs path planning and navigation around it.
In a second aspect, an embodiment of the present application further provides a digestive endoscopy visual reconstruction navigation method that navigates with the above digestive endoscopy visual reconstruction navigation system, comprising the following steps: acquiring virtual camera pose data and image depth data; constructing an optical flow self-supervised network based on the virtual camera pose data and obtaining an estimated camera pose from it; constructing an improved residual network based on the image depth data and obtaining an estimated endoscope image depth from it; constructing an environment map based on the estimated camera pose and the estimated endoscope image depth; and extracting a topological centerline based on the environment map and performing path planning and navigation around the topological centerline.
In some exemplary embodiments, obtaining an estimated camera pose based on the optical flow self-supervised network comprises: taking at least two pictures as input and training the network to obtain a feature descriptor for each picture; matching the feature descriptors according to a sorting rule to obtain corresponding pixel points between different pictures; constructing a confidence score loss function and extracting feature points from the pixel points; and obtaining an estimated camera pose based on the feature points and the geometric relationship between different pictures.
In some exemplary embodiments, two pictures are used as input and the network is trained to obtain two feature descriptors; the feature descriptors of the two pictures are matched according to a sorting rule to obtain corresponding pixel points between the two pictures. The confidence score loss function is shown in equation (1):

$$\mathcal{L}_{conf}(i,j) = 1 - \left[ AP(i,j)\,R_{ij} + k\left(1 - R_{ij}\right) \right] \tag{1}$$

where $R_{ij}$ denotes the confidence score, $R_{ij} \in [0,1]$; the larger $R_{ij}$, the higher the probability that the feature descriptor is a feature point; $(i,j)$ denotes the position coordinates of the pixel point in the picture; $AP(i,j)$ denotes the average precision of the pixel point; and $k \in [0,1]$ is a threshold hyperparameter.
Feature points are extracted through the average-precision loss function, shown in equation (2):

$$\mathcal{L}_{AP} = \frac{1}{n}\sum_{i=1}^{n}\left[ 1 - AP\big((x_i, y), (x_i, y')\big) \right] \tag{2}$$

In the network, $k = 0.5$ in equation (1); when the computed $AP(i,j)$ is less than $k$, $R_{ij}$ becomes smaller. $(x_i, y)$ and $(x_i, y')$ are the position coordinates of the corresponding pixel points in the two images with an overlapping region.
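As an illustration, a confidence-weighted loss of this form takes only a few lines of PyTorch. This is a sketch under the assumption that equation (1) has the R2D2-style form reconstructed above, with the per-pixel average precision `ap` computed elsewhere:

```python
import torch

def confidence_loss(ap, r, k=0.5):
    """Eq. (1): L = 1 - (AP * R + k * (1 - R)), averaged over pixels.

    Assumes the R2D2-style form of eq. (1) reconstructed in the text.
    ap : per-pixel average precision AP(i, j), shape (H, W), values in [0, 1]
    r  : per-pixel confidence scores R_ij, shape (H, W), values in [0, 1]
    k  : threshold hyperparameter; pixels with AP < k are pushed toward R -> 0
    """
    return (1.0 - (ap * r + k * (1.0 - r))).mean()

# Toy check: where AP > k the gradient pushes R up (keep the point);
# where AP < k it pushes R down (discard the point).
ap = torch.tensor([[0.9, 0.1]])
r = torch.tensor([[0.8, 0.2]], requires_grad=True)
loss = confidence_loss(ap, r)
loss.backward()
print(loss.item(), r.grad)  # gradient sign differs on the two pixels
```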
In some exemplary embodiments, an estimated endoscope image depth is obtained through convolution and batch normalization based on the improved residual network. The improved residual network comprises an encoder module and a decoder module; the decoder module decodes with convolution blocks equipped with an activation function and loss functions. The activation function is the exponential linear unit (ELU) function, shown in equation (3):

$$\mathrm{ELU}(x) = \begin{cases} x, & x > 0 \\ \alpha\left(e^{x} - 1\right), & x \le 0 \end{cases} \tag{3}$$

where $\mathrm{ELU}(x)$ denotes the exponential linear unit function.
the loss functions include a first loss function, a second loss function, and a third loss function; the first loss function is shown in equation (4):
Figure BDA0003931565670000032
wherein D is i (p) representing the true depth value image, D i ' (p) denotes a predicted depth map; h is i =logD i '(p)-logD i (p); t represents the number of effective values left after filtering, and p belongs to T;
the second loss function is shown in equation (5):
Figure BDA0003931565670000033
the third loss function is shown in equation (6):
Figure BDA0003931565670000034
wherein l i (p) represents a color image, and
Figure BDA0003931565670000035
the derivation of the color image and the depth image in the x and y directions is shown, and the gradient image of the color image and the depth image is obtained.
In some exemplary embodiments, extracting a topological centerline based on the environment map and performing path planning and navigation around it comprises: acquiring the topological centerline of the intestinal lumen channel based on the tubular characteristics of the intestinal lumen; constructing a topological map of the traversable cavity within the intestinal lumen based on the topological centerline; and planning a path from the current camera position to the target position based on the topological map.

In some exemplary embodiments, constructing the topological map of the traversable cavity within the intestinal lumen based on the topological centerline comprises: traversing all voxels in the free space of the metric map; comparing the parent direction of each voxel with the parent directions of its neighboring voxels, where the parent direction is the direction from the current voxel to the nearest occupied voxel; filtering the voxels based on the angle of the topological centerline and retaining key points as nodes of the topological map; and connecting the nodes to obtain the topological map.
The technical scheme provided by the embodiment of the application has at least the following advantages:
the embodiment of the application mainly aims at the problems that the feature point extraction of an endoscope image is difficult and the direction cannot be accurately distinguished under the characteristics of low illumination and less texture, and provides a digestive endoscopy visual reconstruction navigation system and a method, wherein the system comprises the following steps: the system comprises a data acquisition module, a map construction module and a path planning module; the data acquisition module is used for acquiring pose data and image depth data of the virtual camera and sending the acquired pose data and image depth data of the virtual camera to the map construction module; the map building module is used for building an optical flow self-supervision network and an improved residual error network according to the pose data and the image depth data of the virtual camera; respectively carrying out camera pose estimation and depth map estimation according to the optical flow self-supervision network and the improved residual error network to construct an environment map; and the path planning module is used for extracting a topological central line according to the environment map and planning and navigating a path around the topological central line.
Compared with the traditional digestive endoscopy navigation method, the digestive endoscopy visual reconstruction navigation system and the digestive endoscopy visual reconstruction navigation method can sense the endoscope environment globally, and record the historical track of the endoscope during visual reconstruction. Moreover, the feature point network based on the optical flow self-supervision constructed by the method is more suitable for the characteristics of weak texture, smooth surface and the like of the endoscope image, and can solve the problem that the feature points of the endoscope image are difficult to extract under the characteristics of low illumination and less texture. In addition, compared with a supervised deep learning method, the method has the advantages that the data acquisition module is built for solving the problem that the clinical image does not have a true value label, therefore, the method does not need a pose true value label to train the network, and the label is only used for calculating accuracy indexes and errors in a verification stage.
Drawings
One or more embodiments are illustrated by the corresponding figures in the drawings; these illustrations do not constitute limitations of the embodiments, and unless otherwise stated, the figures are not drawn to scale.
Fig. 1 is a schematic structural diagram of a digestive endoscopy visual reconstruction navigation system according to an embodiment of the present application;

Fig. 2 is a schematic diagram of the frame structure of the digestive endoscopy visual reconstruction navigation system according to an embodiment of the present application;

Fig. 3 is a schematic flowchart of data acquisition performed by the data acquisition module on the virtual colon simulation platform according to an embodiment of the present application;

Fig. 4 is a flowchart of a digestive endoscopy visual reconstruction navigation method according to an embodiment of the present application;

Fig. 5 is a schematic flowchart of obtaining an estimated camera pose based on the optical flow self-supervised network according to an embodiment of the present application;

Fig. 6 is a schematic flowchart of obtaining an estimated endoscope image depth based on the improved residual network according to an embodiment of the present application.
Detailed Description
As described in the Background section, existing digestive endoscopy visual navigation methods suffer from difficulty extracting feature points from endoscope images under low-illumination, low-texture conditions and from the inability to accurately determine direction.
At present, existing digestive endoscopy visual navigation methods fall into traditional image processing algorithms and deep-learning-based algorithms. Traditional image processing algorithms exploit features such as the salient contour and dark region of the intestinal lumen, dividing specifically into dark-region extraction methods, contour recognition methods, and the like, which the prior art combines as the basis for navigation. Because the endoscope advances in a closed intestinal lumen and its illumination falls off with distance, the dark region is the most important and most salient cue by which the doctor judges the direction of advance. In addition, the colon typically has fairly sharp muscle rings inside, and when the lumen is clearly visible, the semi-closed muscle curves of the intestine can be seen; contour recognition methods based on the colon's own structural characteristics therefore navigate toward the direction of the curvature radius as the deepest part of the intestine. Deep-learning-based algorithms, in order to map the environment and localize the robot, must estimate the camera pose and depth map from the input image stream. The pose of a camera is equivalent to the transformation from the world coordinate system to the camera coordinate system, also referred to as the extrinsics in three-dimensional vision. To complete the transformation from the camera coordinate system to the pixel coordinate system, an intrinsic matrix related to the camera's own properties is also required. Camera pose estimation is also called the front end in the SLAM (simultaneous localization and mapping) framework, known as visual odometry. Once the camera pose is obtained, if the per-pixel depth corresponding to each color image frame can also be obtained, a map of the environment can be reconstructed.
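To make these coordinate transformations concrete, the following sketch back-projects a pixel with known depth into the world frame using the intrinsic matrix K and the extrinsics (R, t); the numeric values are illustrative only:

```python
import numpy as np

# Intrinsics (illustrative values): focal lengths fx, fy and principal point (cx, cy)
K = np.array([[320.0, 0.0, 160.0],
              [0.0, 320.0, 120.0],
              [0.0, 0.0, 1.0]])

# Camera pose as world-to-camera rotation R and translation t (the extrinsics)
R = np.eye(3)
t = np.array([0.0, 0.0, 0.0])

def backproject(u, v, depth):
    """Pixel (u, v) + depth -> 3D point: pixel -> camera -> world frame."""
    p_cam = depth * (np.linalg.inv(K) @ np.array([u, v, 1.0]))  # camera frame
    return R.T @ (p_cam - t)                                     # world frame

print(backproject(160.0, 120.0, 0.05))  # a point 5 cm ahead on the optical axis
```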
With the rise of data-driven deep networks, studies combining deep learning with SLAM have begun to emerge. Methods combining deep learning and SLAM divide into supervised and unsupervised ones. A supervisedly trained artificial neural network has good generalization ability and fast prediction time, but it is difficult to select the optimal parameter set during training, and the result is sensitive to the choice of initial weights. Moreover, in a digestive endoscopy environment, clinical operation videos are relatively easy to obtain, but ground-truth labels such as the camera pose and depth corresponding to each frame are very difficult to obtain. Unsupervised methods remedy exactly this difficulty of obtaining ground-truth labels in the medical field: self-constraints are formed inside the network by means of loss functions. The self-constraint means that in the transformation from the imaged pixel coordinate system to the camera coordinate system to the world coordinate system, the position of a three-dimensional point can be recovered from the image given the depth map and the camera's position and attitude, satisfying a geometric consistency constraint.
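Written out, this self-constraint is the standard reprojection relation between two frames. In the usual notation (an assumption, since the patent does not spell out the formula), with $K$ the intrinsic matrix, $D_1(p_1)$ the predicted depth at pixel $p_1$, and $T_{1\to2}$ the relative pose, homogeneous pixel coordinates satisfy

$$p_2 \sim K \, T_{1 \to 2} \, D_1(p_1) \, K^{-1} \, p_1,$$

and penalizing the photometric or geometric discrepancy between $p_2$ and the observed correspondence provides the network's self-supervision signal.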
However, for traditional image processing algorithms, effectiveness drops sharply, sometimes to nothing, when the image is occluded or blurred; and when the endoscope is too close to the intestinal wall, the angle of light received by the endoscope tip is too narrow, and intestinal muscle lines may be confused with dark regions. Traditional image processing algorithms are robust when the intestinal lumen is clearly visible, but in that case they offer the doctor little assistance. They also focus on processing every frame in real time, and dark-region extraction or edge-contour extraction usually requires fixed threshold parameters, making adaptive parameter tuning difficult. For supervised deep learning methods, clinical operation videos are relatively easy to obtain in a digestive endoscopy environment, but ground-truth labels such as camera pose and depth for each frame are very difficult to obtain. For unsupervised, self-constrained deep learning methods, errors in motion estimation can accumulate over time and cause trajectory drift. Most existing work targets natural autonomous-driving scenes or medical laparoscope images, where the camera's motion trajectory is not complex and organs such as the liver have clear edge textures in the abdominal cavity, so pose and depth estimation is easier. In addition, existing pose and depth estimation work on colonoscope images still has large errors and can only estimate simple trajectories such as straight lines. Inside the intestinal tract, however, turns are often the critical part, and unsupervised frameworks migrated from the autonomous driving field often cannot handle the repeated textures and complex pose prediction of turns well.
To solve the above technical problems, an embodiment of the present application provides a digestive endoscopy visual reconstruction navigation system and method. The system comprises a data acquisition module, a map construction module, and a path planning module connected in sequence. The data acquisition module acquires virtual camera pose data and image depth data and sends them to the map construction module. The map construction module constructs an optical flow self-supervised network and an improved residual network from the virtual camera pose data and image depth data, performs camera pose estimation and depth map estimation with the two networks respectively, and constructs an environment map. The path planning module extracts a topological centerline from the environment map and performs path planning and navigation around the topological centerline. In the digestive endoscopy visual reconstruction navigation system provided by the embodiment of the present application, first, the data acquisition module is built to address the lack of ground-truth labels for clinical images; second, a pose estimation network and a depth map prediction network are built with deep networks to construct the environment map; finally, a topological-centerline navigation algorithm completes the path planning. The system and method thereby solve the problems in existing digestive endoscopy visual navigation methods that feature points are difficult to extract under low illumination and sparse texture and that direction cannot be accurately determined.
Embodiments of the present application will be described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that numerous technical details are set forth in the examples of the present application to provide a better understanding of the present application; however, the technical solution claimed in the present application can be implemented without these technical details, and various changes and modifications may be made based on the following embodiments.
Referring to fig. 1, an embodiment of the present application provides a digestive endoscopy visual reconstruction navigation system, comprising a data acquisition module 101, a map construction module 102, and a path planning module 103 connected in sequence. The data acquisition module 101 acquires virtual camera pose data and image depth data and sends them to the map construction module 102. The map construction module 102 constructs an optical flow self-supervised network and an improved residual network from the virtual camera pose data and image depth data, performs camera pose estimation and depth map estimation with the two networks respectively, and constructs an environment map. The path planning module 103 extracts a topological centerline from the environment map and performs path planning and navigation around the topological centerline.
As shown in fig. 1 and fig. 2, by designing a digestive endoscopy visual reconstruction and navigation system based on self-supervised deep learning, the system can better assist the doctor in navigation, and the constructed environment map can also assist diagnosis and record lesion positions. If the algorithm is provided to a robot, the robot can navigate autonomously.
Specifically, the data acquisition module 101 acquires virtual camera pose data and image depth data and sends them to the map construction module 102. Because clinical colon surgery video streams contain a large amount of blur, and the ground-truth depth maps and poses required to train the deep networks are difficult to annotate manually, the training data of the data acquisition module 101 are collected on a virtual colon simulation platform. Fig. 3 shows the flow of data acquisition by the data acquisition module 101 on the virtual colon simulation platform; the virtual camera pose and depth data it collects serve as ground-truth labels for deep network training and for subsequent evaluation. As shown in fig. 3, a colon model obtained through CT scanning is imported into the virtual simulation platform, which can render the environment and highlights; a custom script is written to record the corresponding camera poses and depth images, and these two kinds of data serve as ground-truth labels for the subsequently trained networks and for verification.
As shown in fig. 1, in some embodiments, the map construction module 102 comprises a camera pose estimation module 1021 and a depth map estimation module 1022; the camera pose estimation module 1021 obtains an estimated camera pose from the optical flow self-supervised network, and the depth map estimation module 1022 obtains an estimated endoscope image depth from the improved residual network.
In the digestive endoscopy visual reconstruction navigation system provided by the embodiment of the present application, the map construction module 102 is the core module. It divides into pose estimation based on the optical flow self-supervised network and monocular endoscope image depth map estimation based on the improved residual network: the camera pose estimation module 1021 implements the former, and the depth map estimation module 1022 implements the latter. From the camera poses and intestinal lumen depth maps output by the two networks, an environment map is obtained through reconstruction, and the path planning and navigation algorithm is developed on that basis.
The map construction module 102 constructs the environment map essentially as follows: (1) construct the optical flow self-supervised network and the improved residual network, which estimate the camera pose and the depth map respectively, and train them; (2) obtain the estimated camera pose with the optical-flow-based self-supervised network; (3) obtain the estimated endoscope image depth with the monocular endoscope image depth estimation network based on the improved residual network; (4) construct the environment map from the camera pose and endoscope image depth obtained in steps (2) and (3).
In some embodiments, the map construction module 102 constructs the environment map through three-dimensional reconstruction based on the estimated camera pose and the estimated endoscope image depth.
As shown in fig. 1, in some embodiments, the path planning module 103 comprises a topological centerline acquisition module 1031 and a navigation module 1032. The topological centerline acquisition module 1031 obtains the topological centerline of the intestinal lumen channel by exploiting the tubular character of the intestinal lumen; the navigation module 1032 extracts the topological centerline and performs path planning and navigation around it. Specifically, the navigation module 1032 constructs a topological map describing the traversable cavity from the topological centerline, and then plans a path from the current camera position to the target position around the topological centerline. Note that the target position is a special position seen during the endoscope's advance, and path planning to it is completed during withdrawal; such special positions include lesion positions and/or polyp positions.

Specifically, the main function of the path planning module 103 is to directly extract the topological centerline of the lumen channel by exploiting the channel characteristics of the intestinal lumen, construct a simple topological map describing the traversable cavity, and plan a path from the current camera position to the target position (also called the target point) around the topological map. The target point is defined as a special position, such as a lesion or polyp, seen during the advancing stage of the endoscope, with path planning to it completed during withdrawal. The special position may also be a position bearing a particular marker other than a lesion or polyp position.
Referring to fig. 4, an embodiment of the present application further provides a digestive endoscopy visual reconstruction navigation method, where the digestive endoscopy visual reconstruction navigation system is used for navigation, and the navigation method includes the following steps:
s1, acquiring pose data and image depth data of the virtual camera.
S2, constructing an optical flow self-supervised network based on the virtual camera pose data, and obtaining an estimated camera pose based on it; and constructing an improved residual network based on the image depth data, and obtaining an estimated endoscope image depth based on it.
S3, constructing an environment map based on the estimated camera pose and the estimated endoscope image depth.

S4, extracting a topological centerline based on the environment map, and performing path planning and navigation around the topological centerline.
The digestive endoscopy visual reconstruction navigation method provided by the present application is also called an endoscope navigation algorithm, which comprises: a pose estimation algorithm based on the optical flow self-supervised network, a monocular endoscope image depth map estimation algorithm based on the improved residual network, and a topological-centerline path planning and navigation algorithm. With this endoscope navigation algorithm, the environment and the robot's current pose can be perceived globally, and a prompt for the next action is given according to the current environment map.
Specifically, in step S1 the data acquisition module 101 acquires virtual camera pose data and image depth data; these serve as ground-truth labels for deep network training and for subsequent evaluation. After the virtual camera pose data and image depth data are acquired, step S2 is executed. Step S2 divides into two parts: step S201, pose estimation based on the optical flow self-supervised network; and step S202, monocular endoscope image depth map estimation based on the improved residual network.
The following embodiment explains the specific steps of the pose estimation based on the optical flow self-supervised network in step S201.
In some embodiments, obtaining the estimated camera pose based on the optical flow self-supervised network in step S2 comprises: taking at least two pictures as input and training the network to obtain a feature descriptor for each picture; matching the feature descriptors according to a sorting rule to obtain corresponding pixel points between different pictures; constructing a confidence score loss function and extracting feature points from the pixel points; and obtaining an estimated camera pose based on the feature points and the geometric relationship between different pictures.
It should be noted that two pictures are usually taken as input, and the network is trained to obtain a feature descriptor for each of the two pictures. The following describes network training with two pictures as input.
In some embodiments, two pictures are used as input and the network is trained to obtain two feature descriptors; the feature descriptors of the two pictures are matched according to a sorting rule to obtain corresponding pixel points between the two pictures; the confidence score loss function is shown in equation (1):

$$\mathcal{L}_{conf}(i,j) = 1 - \left[ AP(i,j)\,R_{ij} + k\left(1 - R_{ij}\right) \right] \tag{1}$$

where $R_{ij}$ denotes the confidence score, $R_{ij} \in [0,1]$; the larger $R_{ij}$, the higher the probability that the feature descriptor is a feature point; $(i,j)$ denotes the position coordinates of the pixel point in the picture; $AP(i,j)$ denotes the average precision of the pixel point; and $k \in [0,1]$ is a threshold hyperparameter.
Feature points are extracted through the average-precision loss function; the average-precision loss function is shown in equation (2):

$$\mathcal{L}_{AP} = \frac{1}{n}\sum_{i=1}^{n}\left[ 1 - AP\big((x_i, y), (x_i, y')\big) \right] \tag{2}$$

In the network, $k = 0.5$ in equation (1); when the computed $AP(i,j)$ is less than $k$, $R_{ij}$ becomes smaller. $(x_i, y)$ and $(x_i, y')$ are the position coordinates of the corresponding pixel points in the two images with an overlapping region.
The optical flow self-supervised network framework constructs a confidence score loss function with the help of optical flow and extracts more robust feature points; the essential matrix is subsequently estimated from the two views, and the pose is solved inversely using the essential matrix and the epipolar constraint. The parameters of each node of the optical flow self-supervised network are shown in fig. 5. The network takes two pictures, denoted img1 and img2, as input and finally outputs two feature descriptors after convolution (Conv), ReLU activation, and batch normalization (BN) layers. The feature descriptors of the two pictures are matched according to the sorting rule to find corresponding pixel points (called corresponding points) between the two pictures. This matching-and-sorting rule is self-supervised by pre-computed optical flow, while the additional confidence score helps select more stable and reliable feature points among the corresponding points and filter out feature points with low scores.

The optical flow between the two images is introduced in the design of the loss function and is generated at the data loading stage. The values in the optical flow vectors give, for each pixel of img1, its position coordinates (x, y) in img2. The confidence score loss function is shown in equation (1), where the confidence score R_ij ranges over [0, 1]; the larger R_ij, the higher the probability that the feature descriptor is a feature point; k ∈ [0, 1] is a threshold hyperparameter, usually set to 0.5 in the network; when the computed average precision AP(i, j) at a pixel's position coordinates is less than k, R_ij becomes smaller.

Average precision (AP) is an evaluation index used to measure classification results in multi-label classification, and is used here as a loss function to minimize the matching error between two feature descriptors. Matching of feature description vectors can be modeled as a ranking optimization problem: in two images I and I' with an overlapping region, the distance (e.g., Euclidean distance) between each feature description vector in image I and the feature description vectors in image I' is computed. The distances are then sorted in ascending order, and the smallest distance gives the matched feature. The ground truth of the labels is obtained by sparsely sampling the optical flow in fig. 5, which amounts to knowing the matching relationship between the two frames in advance. After the feature points are extracted, the pose is estimated using the classical geometric relationship between the two views.
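As an illustration, nearest-neighbor matching of two descriptor sets by sorted Euclidean distance can be sketched in numpy as follows; this minimal version omits the mutual cross-check and the optical-flow supervision:

```python
import numpy as np

def match_descriptors(desc1, desc2):
    """desc1: (N, D), desc2: (M, D) feature descriptors from images I and I'.
    Returns, for each descriptor of I, the index of its nearest neighbor
    in I' and the corresponding Euclidean distance."""
    # Pairwise Euclidean distances between all descriptor pairs
    d = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    nn = d.argmin(axis=1)  # ascending sort; the smallest distance is the match
    return nn, d[np.arange(len(desc1)), nn]

rng = np.random.default_rng(0)
desc1 = rng.normal(size=(5, 128))
desc2 = rng.normal(size=(7, 128))
idx, dist = match_descriptors(desc1, desc2)
print(idx, dist.round(2))
```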
The present invention provides a deep learning network based on self-supervised feature point extraction: the feature descriptions extracted by the network are directly self-supervised through information such as optical flow, so that more robust feature points are extracted from the image. Experimental tests show that this self-supervised learning route can solve feature point extraction for endoscope images under low illumination and sparse texture. After the feature points are extracted, the feature descriptors of each image pair are matched, and finally the camera pose is solved with the traditional multi-view geometry algorithm.
It should be noted that the embodiment of the present application acquires feature points with a self-supervised optical flow deep network. It will be appreciated that feature points may also be obtained by other algorithms: the feature point extraction for the two images can be replaced by traditional Scale-Invariant Feature Transform (SIFT) features or ORB (Oriented FAST and Rotated BRIEF) features, although the self-supervised optical flow deep network used in this application outputs more stable feature points than those algorithms for the subsequent pose solving. In addition, as an embodiment of the present application, the monocular endoscope image depth map prediction process may be replaced with another customized supervised network.
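For reference, the classical SIFT-plus-multi-view-geometry route mentioned above can be assembled from standard OpenCV calls. A minimal sketch, with the intrinsic matrix K assumed known and the translation recovered only up to scale:

```python
import cv2
import numpy as np

def estimate_relative_pose(img1, img2, K):
    """Relative camera pose from two grayscale frames via SIFT + essential matrix."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    # Epipolar geometry: essential matrix with RANSAC, then pose recovery
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t  # rotation matrix and unit-scale translation between the views
```

ORB can be swapped in by replacing SIFT_create with cv2.ORB_create and NORM_L2 with cv2.NORM_HAMMING.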
As mentioned above, step S2 comprises: S201, pose estimation based on the optical flow self-supervised network; and S202, monocular endoscope image depth map estimation based on the improved residual network. The specific procedure of step S202 is explained next.
In some embodiments, an estimated endoscope image depth is obtained through convolution and batch normalization based on the improved residual network; the improved residual network comprises an encoder module and a decoder module, wherein the decoder module decodes with convolution blocks equipped with an activation function and loss functions; the activation function is the exponential linear unit (ELU) function, shown in equation (3):

$$\mathrm{ELU}(x) = \begin{cases} x, & x > 0 \\ \alpha\left(e^{x} - 1\right), & x \le 0 \end{cases} \tag{3}$$

where $\mathrm{ELU}(x)$ denotes the exponential linear unit function;
the loss functions include a first loss function, a second loss function, and a third loss function; the first loss function is shown in equation (4):
Figure BDA0003931565670000112
wherein D is i (p) representing the true depth value image, D i ' (p) denotes a predicted depth map; h is i =logD i '(p)-logD i (p); t represents passing throughThe number of the effective values left after filtering is p belongs to T;
the second loss function is shown in equation (5):
Figure BDA0003931565670000113
the third loss function is shown in equation (6):
Figure BDA0003931565670000114
wherein l i (p) represents a color image, and
Figure BDA0003931565670000115
the derivation of the color image and the depth image in the x and y directions is shown, and the gradient image of the color image and the depth image is obtained.
In monocular endoscope image depth map estimation based on the improved residual network, depth estimation for monocular endoscope images is improved upon a classical 18-layer residual network (ResNet); the deep network architecture based on the improved residual network is shown in fig. 6, with features extracted mainly through convolution and batch normalization (BN). To output a depth map with the same width and height as the original image, the deep network consists mainly of an encoder and a decoder: the encoder uses a complete ResNet, while the decoding part directly uses convolution blocks with the exponential linear unit (ELU) activation function. In the encoding stage, each basic block performs a downsampling operation to gradually increase the number of channels of the feature vector, and an average pooling layer (Avg Pool) and a fully connected layer (FC) are appended to extract a 512-dimensional feature vector. In the decoding stage, 3×3 convolution blocks paired with the ELU activation function are used directly for upsampling.
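A decoder stage of this kind (upsampling followed by a 3×3 convolution block with ELU activation) can be sketched in PyTorch as follows; the layer widths are illustrative and not the patent's exact architecture:

```python
import torch
import torch.nn as nn

class UpBlock(nn.Module):
    """One decoder stage: upsample, then a 3x3 conv block with ELU activation."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ELU(inplace=True),
        )

    def forward(self, x):
        return self.conv(self.up(x))

# 512-channel encoder features decoded back toward full resolution
x = torch.randn(1, 512, 8, 8)
decoder = nn.Sequential(UpBlock(512, 256), UpBlock(256, 128), UpBlock(128, 64))
print(decoder(x).shape)  # torch.Size([1, 64, 64, 64])
```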
In the design of the loss functions, the present application designs three loss functions, namely the first, second, and third loss functions, as shown in equations (4), (5), and (6). In equation (4), T denotes the number of valid values left after validity-mask filtering, with p ∈ T. The first and second loss functions are both difference losses that directly compare two depth maps; in equations (4) and (5), D_i(p) denotes the ground-truth depth image and D_i'(p) the predicted depth map. Because the depth map predicted by ResNet is overly smooth and loses some detailed texture information, the smoothness loss is improved and a third loss function is proposed, as shown in equation (6), where l_i(p) denotes the color image and ∂_x, ∂_y denote the derivatives of the color RGB image and the depth image in the x and y directions, yielding their gradient images. The rationale is that in digestive endoscopy images, pixel gradients are usually larger and vary more sharply at curved edges.
After the estimated camera pose and the estimated endoscope image depth are obtained, step S3 is executed to construct the environment map through three-dimensional reconstruction from them. After the environment map is constructed, step S4 is executed to construct a topological map based on the environment map and plan paths around it.
In some embodiments, the step S4 of extracting a topological centerline based on the environment map, and performing path planning and navigation around the topological centerline includes the following steps:
step S401, acquiring a topological central line of an intestinal lumen channel based on the characteristics of the intestinal lumen channel.
And S402, constructing a topological map in a cavity capable of traveling in the intestinal cavity based on the topological central line.
Step S403, planning a path from the current position of the camera to the target position based on the topological map.
It should be noted that step S4 mainly performs path planning and navigation based on the topological centerline. The basis for path planning is the topological centerline of the map. The topological centerline may also be called the 3D Generalized Voronoi Diagram (GVD) skeleton, and generating the GVD depends on a Euclidean Signed Distance Field (ESDF) metric map. All points equidistant from two or more obstacles are found iteratively in the metric map and connected to form the ridge of free space, which may also be called the central axis. A complete sparse topological map description, used as the navigation route, is then obtained through post-processing such as skeleton refinement and pruning.
In some embodiments, step S402 constructs a topological map within a cavity traversable within the intestinal lumen based on the topological centerline, comprising the steps of:
step S4021, traversing all voxels in the free space in the metric map.
Step S4022, comparing the parent direction of each voxel with the parent directions of its neighboring voxels, where the parent direction is the direction from the current voxel to the nearest occupied voxel.

Step S4023, filtering the voxels based on the angle of the topological centerline and retaining key points as nodes of the topological map; then connecting the nodes to obtain the topological map.
Specifically, the GVD extraction process is: first, traverse all voxels in the free space of the ESDF; next, compare each voxel's parent direction with those of its 6-connected neighbors, the parent direction being defined as the direction from the current voxel to the nearest occupied voxel; then, discard voxels whose parent-direction angle exceeds the preset GVD angle; finally, after the redundant voxels are filtered out, keep the remaining key points as nodes of the topological map and connect the nodes to obtain the topological map.
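A minimal sketch of this voxel-filtering step over an ESDF grid might look as follows. The 6-neighborhood and the parent-direction angle test follow the description above, while the data layout and the threshold value are assumptions:

```python
import numpy as np

NEIGHBORS = [(1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1), (0,0,-1)]

def gvd_nodes(parent_dir, free, max_angle_deg=60.0):
    """parent_dir: (X,Y,Z,3) unit vectors toward each voxel's nearest obstacle;
    free: (X,Y,Z) bool mask of free space. Returns candidate topology nodes."""
    cos_thr = np.cos(np.radians(max_angle_deg))  # hypothetical GVD angle threshold
    nodes = []
    X, Y, Z = free.shape
    for x in range(1, X - 1):
        for y in range(1, Y - 1):
            for z in range(1, Z - 1):
                if not free[x, y, z]:
                    continue
                d = parent_dir[x, y, z]
                # Keep the voxel if any 6-connected neighbor's parent direction
                # diverges strongly: the voxel then lies between two obstacles,
                # i.e. on the ridge (central axis) of the free space.
                for dx, dy, dz in NEIGHBORS:
                    if free[x + dx, y + dy, z + dz] and \
                       np.dot(d, parent_dir[x + dx, y + dy, z + dz]) < cos_thr:
                        nodes.append((x, y, z))
                        break
    return nodes
```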
The feasibility of the digestive endoscopy visual reconstruction navigation system and method provided by the invention has been shown through experimental simulation. A total of 21671 colonoscope images collected on the virtual platform were used in the training stage, and two sets of virtual data, of 400 and 447 images, were used in the testing stage. Two sets of clinical data, of 82 and 109 images, were also used, and preliminary experimental verification shows the approach is feasible.
The computed a_1 and a_2 mean accuracy values of the predicted depth maps were 0.7637 and 0.9471 respectively, and the mean error RMSE was 0.0929, all in units of depth-value gray levels. These errors are within a controllable range. The evaluation indices are computed as follows:

$$a_n = \frac{1}{N}\sum_{p} \mathbf{1}\!\left[ \max\!\left( \frac{d_p^*}{d_p}, \frac{d_p}{d_p^*} \right) < 1.25^{\,n} \right], \quad n = 1, 2$$

$$\mathrm{RMSE} = \sqrt{ \frac{1}{N}\sum_{p} \left( d_p^* - d_p \right)^2 }$$

where $d^*$ denotes the predicted depth map and $d$ the ground-truth depth map.
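Assuming a_1 and a_2 are the usual threshold accuracies at 1.25 and 1.25² (the standard form reconstructed above), these indices can be computed as:

```python
import numpy as np

def depth_metrics(pred, gt):
    """a_n threshold accuracies and RMSE between predicted and true depth maps."""
    ratio = np.maximum(pred / gt, gt / pred)
    a1 = (ratio < 1.25).mean()          # fraction of pixels within 1.25x of truth
    a2 = (ratio < 1.25 ** 2).mean()     # fraction within 1.5625x of truth
    rmse = np.sqrt(((pred - gt) ** 2).mean())
    return a1, a2, rmse

rng = np.random.default_rng(1)
gt = rng.uniform(0.1, 1.0, size=(64, 64))
pred = gt * rng.uniform(0.9, 1.1, size=gt.shape)
print(depth_metrics(pred, gt))
```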
Through the above technical solution, the embodiments of the present application address the problems in existing endoscope navigation algorithms that feature points are difficult to extract from endoscope images under low-illumination, low-texture conditions and that direction cannot be accurately determined, and provide a digestive endoscopy visual reconstruction navigation system and method. The system comprises a data acquisition module, a map construction module, and a path planning module. The data acquisition module acquires virtual camera pose data and image depth data and sends them to the map construction module. The map construction module constructs an optical flow self-supervised network and an improved residual network from the virtual camera pose data and image depth data, performs camera pose estimation and depth map estimation with the two networks respectively, and constructs an environment map. The path planning module extracts a topological centerline from the environment map and performs path planning and navigation around the topological centerline.

Compared with traditional endoscope navigation algorithms, the digestive endoscopy visual reconstruction navigation system and method provided by the embodiments of the present application can perceive the endoscope environment globally and record the endoscope's historical trajectory through visual reconstruction. Moreover, the optical flow self-supervised feature point network constructed by the present application is better suited to characteristics of endoscope images such as weak texture and smooth surfaces, and can solve the difficulty of extracting feature points under low illumination and sparse texture. In addition, compared with supervised deep learning methods, the data acquisition module is built to address the lack of ground-truth labels for clinical images; consequently, the present application does not need ground-truth pose labels to train the network, and labels are used only to compute accuracy indices and errors in the verification stage.
Those skilled in the art will understand that the above embodiments are specific examples of implementing the present application, and that various changes in form and detail may be made to them in practice without departing from the spirit and scope of the present application, which should be defined only by the appended claims.

Claims (10)

1. A digestive endoscopy visual reconstruction navigation system, comprising: the system comprises a data acquisition module, a map construction module and a path planning module which are connected in sequence;
the data acquisition module is used for acquiring pose data and image depth data of the virtual camera and sending the acquired pose data and image depth data of the virtual camera to the map construction module;
the map construction module is used for constructing an optical flow self-supervised network and an improved residual network according to the virtual camera pose data and the image depth data, and for respectively performing camera pose estimation and depth map estimation according to the optical flow self-supervised network and the improved residual network to construct an environment map;
and the path planning module is used for extracting a topological centerline according to the environment map and performing path planning and navigation around the topological centerline.
2. The digestive endoscopy visual reconstruction navigation system according to claim 1, wherein the map construction module comprises a camera pose estimation module and a depth map estimation module;

the camera pose estimation module is used for obtaining an estimated camera pose according to the optical flow self-supervised network;

and the depth map estimation module is used for obtaining the estimated endoscope image depth according to the improved residual network.
3. The system according to claim 2, wherein the map construction module constructs the environment map by three-dimensional reconstruction based on the estimated camera pose and the estimated endoscope image depth.
4. The digestive endoscopy visual reconstruction navigation system according to claim 1, wherein the path planning module comprises a topological centerline acquisition module and a navigation module;

the topological centerline acquisition module is used for obtaining the topological centerline of the intestinal lumen channel in combination with the tubular characteristics of the intestinal lumen;

the navigation module is used for extracting the topological centerline and performing path planning and navigation around the topological centerline.
5. A digestive endoscopy visual reconstruction navigation method using the digestive endoscopy visual reconstruction navigation system of any one of claims 1 to 4 for navigation, comprising the following steps:
acquiring pose data and image depth data of a virtual camera;
constructing an optical flow self-supervised network based on the virtual camera pose data, and obtaining an estimated camera pose based on the optical flow self-supervised network; constructing an improved residual network based on the image depth data, and obtaining an estimated endoscope image depth based on the improved residual network;
constructing an environment map based on the estimated camera pose and the estimated endoscope image depth;
and extracting a topological centerline based on the environment map, and performing path planning and navigation around the topological centerline.
6. The digestive endoscopy visual reconstruction navigation method according to claim 5, wherein obtaining an estimated camera pose based on the optical flow self-supervised network comprises:
taking at least two pictures as input, and carrying out network training to obtain a feature descriptor corresponding to each picture;
the feature descriptors are matched according to a sorting rule to obtain corresponding pixel points among different pictures;
constructing a confidence score loss function, and extracting feature points from the pixel points;
and obtaining an estimated camera pose based on the feature points and the geometric relationship between different pictures.
7. The digestive endoscopy visual reconstruction navigation method according to claim 6, wherein two pictures are taken as input and network training is performed to obtain two feature descriptors; the feature descriptors of the two pictures are matched according to a sorting rule to obtain corresponding pixel points between the two pictures;
the confidence score loss function is shown in equation (1):
Figure FDA0003931565660000021
wherein R is ij Represents a confidence score, R ij =0~1;R ij The larger the probability that the feature descriptor is a feature point;
(i, j) representing the position coordinates of the pixel points in the picture; AP (i, j) represents the average precision of the pixel points; k is an over-parameter of a threshold value and belongs to [0,1 ];
the feature points are extracted through a loss function of average precision, shown in equation (2):

$$\mathcal{L}_{AP} = \frac{1}{n}\sum_{i=1}^{n}\left[ 1 - AP\big((x_i, y), (x_i, y')\big) \right] \tag{2}$$

where (x_i, y) and (x_i, y') are the position coordinates of the corresponding pixel points in the two images with an overlapping region; in the network, k in equation (1) is set to 0.5, so that when the computed AP(i, j) is less than k, minimizing the loss drives R_ij smaller.
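For illustration only: a PyTorch sketch of the confidence score loss of equation (1); the per-pixel AP and R maps are assumed inputs, and this is an illustrative reading of the claim, not code from the patent.

    import torch

    def confidence_loss(AP, R, k=0.5):
        # AP, R: (H, W) tensors of average precision and confidence in [0, 1].
        # Minimizing drives R toward 1 where AP > k and toward 0 where AP < k.
        return (1.0 - (AP * R + k * (1.0 - R))).mean()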
8. The digestive endoscopy visual reconstruction navigation method according to claim 5, wherein the estimated endoscope image depth is obtained through convolution and batch normalization processing based on the improved residual network;
the improved residual network comprises an encoder module and a decoder module, wherein the decoder module performs decoding using convolution blocks with an activation function and loss functions;
the activation function is an exponential linear unit function, as shown in equation (3):

$$ELU(x) = \begin{cases} x, & x > 0 \\ \alpha\left(e^{x} - 1\right), & x \le 0 \end{cases} \tag{3}$$

where ELU(x) denotes the exponential linear unit function and α is a positive scale coefficient;
the loss functions comprise a first loss function, a second loss function and a third loss function; the first loss function is shown in equation (4):

$$\mathcal{L}_{1} = \frac{1}{T}\sum_{p \in T} h_i(p)^2 - \frac{1}{T^2}\left( \sum_{p \in T} h_i(p) \right)^2 \tag{4}$$

where D_i(p) denotes the ground-truth depth image and D_i'(p) denotes the predicted depth map; h_i(p) = log D_i'(p) − log D_i(p); and T denotes the number of valid values remaining after filtering, with p ranging over those valid pixels;
the second loss function is shown in equation (5):

[equation (5): formula filed as an image and not reproduced in this text]
the third loss function is shown in equation (6):

$$\mathcal{L}_{3} = \frac{1}{T}\sum_{p \in T}\left( \left|\partial_x h_i(p)\right| e^{-\left|\partial_x l_i(p)\right|} + \left|\partial_y h_i(p)\right| e^{-\left|\partial_y l_i(p)\right|} \right) \tag{6}$$

where l_i(p) denotes the color image, and ∂_x, ∂_y denote derivatives in the x and y directions, by which the gradient images of the color image and the depth image are obtained.
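For illustration only: a PyTorch sketch of the scale-invariant term of equation (4) and of an ELU-activated decoder convolution block per equation (3); the channel sizes are illustrative, not the patent's architecture.

    import torch
    import torch.nn as nn

    def scale_invariant_loss(pred, gt):
        # pred, gt: depth maps; only pixels with valid ground truth contribute.
        valid = gt > 0
        h = torch.log(pred[valid].clamp(min=1e-6)) - torch.log(gt[valid])
        T = h.numel()
        return (h ** 2).sum() / T - (h.sum() ** 2) / (T ** 2)

    decoder_block = nn.Sequential(              # decoding block with ELU activation
        nn.Conv2d(256, 128, kernel_size=3, padding=1),
        nn.BatchNorm2d(128),
        nn.ELU(),
    )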
9. The digestive endoscopy visual reconstruction navigation method according to claim 5, wherein extracting the topological centerline based on the environment map and performing path planning and navigation along the topological centerline comprises:
acquiring the topological centerline of the intestinal lumen channel based on the characteristics of the intestinal lumen channel;
constructing a topological map of the traversable cavity within the intestinal lumen based on the topological centerline; and planning a path from the current camera position to the target position based on the topological map.
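By way of illustration (not part of the claims): once the topological map exists, planning from the current camera node to the target reduces to a shortest-path query; the networkx dependency and the Euclidean edge weights are assumptions of this sketch.

    import networkx as nx
    import numpy as np

    def plan_path(nodes, edges, start_idx, goal_idx):
        # nodes: list of (x, y, z) positions; edges: list of (i, j) index pairs.
        g = nx.Graph()
        for i, j in edges:
            w = float(np.linalg.norm(np.subtract(nodes[i], nodes[j])))
            g.add_edge(i, j, weight=w)                 # Euclidean edge cost
        return nx.shortest_path(g, start_idx, goal_idx, weight="weight")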
10. The digestive endoscopy visual reconstruction navigation method according to claim 9, wherein constructing the topological map of the traversable cavity within the intestinal lumen based on the topological centerline comprises:
traversing all voxels in the free space of the metric map;
comparing the parent direction of each voxel with the parent directions of its adjacent voxels, the parent direction being the direction from the current voxel to its nearest occupied voxel;
filtering the voxels based on the angle of the topological centerline, and retaining key points as nodes of the topological map;
and connecting the nodes to obtain the topological map.
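For illustration only: a sketch of claim 10's voxel filtering that approximates each free voxel's parent direction from a Euclidean distance transform and keeps voxels whose parent direction disagrees strongly with a neighbor's; the occupancy-grid input and the 60-degree threshold are assumptions, not the patent's parameters.

    import numpy as np
    from scipy import ndimage

    def topo_nodes(occupied, angle_thresh_deg=60.0):
        # occupied: boolean (D, H, W) occupancy grid.
        # Parent direction = vector from a free voxel to its nearest occupied voxel.
        _, idx = ndimage.distance_transform_edt(~occupied, return_indices=True)
        grid = np.indices(occupied.shape)
        parent = idx - grid
        unit = parent / (np.linalg.norm(parent, axis=0) + 1e-9)
        keep = np.zeros(occupied.shape, dtype=bool)
        for axis in (0, 1, 2):                    # compare with the 6-neighborhood
            shifted = np.roll(unit, 1, axis=axis + 1)
            cosang = np.clip((unit * shifted).sum(axis=0), -1.0, 1.0)
            keep |= np.degrees(np.arccos(cosang)) > angle_thresh_deg
        return np.argwhere(keep & ~occupied)      # candidate topological-map nodes

Connecting the surviving key points, for example by proximity along the centerline, then yields the topological map of claim 10.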
CN202211389856.3A 2022-11-08 2022-11-08 Digestive endoscopy visual reconstruction navigation system and method Pending CN115797448A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211389856.3A CN115797448A (en) 2022-11-08 2022-11-08 Digestive endoscopy visual reconstruction navigation system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211389856.3A CN115797448A (en) 2022-11-08 2022-11-08 Digestive endoscopy visual reconstruction navigation system and method

Publications (1)

Publication Number Publication Date
CN115797448A true CN115797448A (en) 2023-03-14

Family

ID=85436046

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211389856.3A Pending CN115797448A (en) 2022-11-08 2022-11-08 Digestive endoscopy visual reconstruction navigation system and method

Country Status (1)

Country Link
CN (1) CN115797448A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116740768A (en) * 2023-08-11 2023-09-12 南京诺源医疗器械有限公司 Navigation visualization method, system, equipment and storage medium based on nasoscope
CN116740768B (en) * 2023-08-11 2023-10-20 南京诺源医疗器械有限公司 Navigation visualization method, system, equipment and storage medium based on nasoscope

Similar Documents

Publication Publication Date Title
Mahmood et al. Unsupervised reverse domain adaptation for synthetic medical images via adversarial training
JP7177062B2 (en) Depth Prediction from Image Data Using Statistical Model
Freedman et al. Detecting deficient coverage in colonoscopies
CN111311666B (en) Monocular vision odometer method integrating edge features and deep learning
Shen et al. Context-aware depth and pose estimation for bronchoscopic navigation
CN111798451B (en) 3D guide wire tracking method and device based on blood vessel 3D/2D matching
US20180174311A1 (en) Method and system for simultaneous scene parsing and model fusion for endoscopic and laparoscopic navigation
CN112766416B (en) Digestive endoscope navigation method and digestive endoscope navigation system
CN111783582A (en) Unsupervised monocular depth estimation algorithm based on deep learning
WO2022170562A1 (en) Digestive endoscope navigation method and system
CN115861616A (en) Semantic segmentation system for medical image sequence
CN115797448A (en) Digestive endoscopy visual reconstruction navigation system and method
Wang et al. Deep convolutional network for stereo depth mapping in binocular endoscopy
Wei et al. Stereo dense scene reconstruction and accurate localization for learning-based navigation of laparoscope in minimally invasive surgery
Hwang et al. Self-supervised monocular depth estimation using hybrid transformer encoder
Moreau et al. Crossfire: Camera relocalization on self-supervised features from an implicit representation
Song et al. Combining deep learning with geometric features for image-based localization in the Gastrointestinal tract
CN114581499A (en) Multi-modal medical image registration method combining intelligent agent and attention mechanism
Wei et al. Stereo dense scene reconstruction and accurate laparoscope localization for learning-based navigation in robot-assisted surgery
Chatterjee et al. A survey on techniques used in medical imaging processing
Xia et al. A nested u-structure for instrument segmentation in robotic surgery
US20230110263A1 (en) Computer-implemented systems and methods for analyzing examination quality for an endoscopic procedure
Liu et al. Sparse-to-dense coarse-to-fine depth estimation for colonoscopy
WO2024098240A1 (en) Gastrointestinal endoscopy visual reconstruction navigation system and method
CN116993805A (en) Intraoperative residual organ volume estimation system oriented to operation planning assistance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination