WO2022170562A1

WO2022170562A1 - Digestive endoscope navigation method and system

Info

Publication number: WO2022170562A1
Application number: PCT/CN2021/076523
Authority: WO
Inventors: 熊璟; 谭敏; 夏泽洋; 谢高生
Original assignee: 中国科学院深圳先进技术研究院
Priority date: 2021-02-10
Filing date: 2021-02-10
Publication date: 2022-08-18

Abstract

Disclosed are a digestive endoscope navigation method and system. The method comprises: fusing key points of two frames of images, and performing offline marking by using a displacement vector distribution of a fused image, so as to obtain a true value of training data; constructing a twin neural network to perform supervised pre-training; and estimating position coordinates of the next frame of a digestive endoscopy lens by using a displacement vector distribution output by the trained model, and then outputting a complete movement trajectory, so as to realize the navigation of a digestive endoscope. By means of the method, the pose and the movement direction of a digestive endoscope itself can be accurately identified, such that a movement trajectory thereof is grasped on the whole, and the adaptability is strong.

Description

A digestive endoscope navigation method and system

technical field

The present invention relates to the technical field of medical image processing, and more particularly, to a digestive endoscope navigation method and system.

Background technique

Colonoscopy is one of the important methods for diagnosing malignant tumors in anorectal surgery. The doctor controls the colonoscope to be inserted from the patient's anus. The examination is divided into two stages: forward and backward. In the previous stage, the doctor looks for the cavity advancement lens based on clinical experience and colonoscopy images. , until the tail of the cecum is reached, and then the regression phase is performed to observe whether there are polyps or other lesions in the intestinal tract. In this traditional colonoscopy operation, doctors only rely on endoscopic imaging and their own experience to find the advancing lens in the center of the lumen. Due to the limitations of the field of view, the movement of the intestine itself, the influence of intestinal solubles and water mist, it is difficult to Correctly judging the moving direction of the lens will eventually lead to the loss of the cavity and it is difficult to return to the original position to continue advancing, which will bring psychological and physical pain to the patient, and even cause perforation, bleeding and many other problems in severe cases.

At present, colonoscopy robots (such as microcapsule endoscopes) that can move autonomously can eliminate the psychological discomfort of patients to a certain extent, and are an ideal solution. Whether it is to assist doctors in navigating during surgery or autonomously navigate colonoscopy robots, it is necessary to recognize and guide the movement of the endoscope. In the prior art, the solutions for endoscopic navigation mainly include:

(1) Dark area extraction method. Since the endoscope advances in the closed intestinal cavity, the light is from far to near, so the dark area is the most important and significant indicator for doctors to judge the direction of advancement. The method extracts features such as contour and dark area based on a single frame image to find the approximate center of the cavity, and uses the coordinates of the center point as the basis for navigation to guide the doctor or robot to move toward the coordinate point. However, this method of navigating with dark areas as features has great limitations, because this method is only applicable when the cavity is clearly visible, and most of these methods use traditional image segmentation such as threshold segmentation and edge extraction. The processing method, which relies on a single frame of image, has no global adaptability.

(2) Contour recognition method. The principle of the contour recognition method is based on the structural characteristics of the colon itself, such as using the inherent colon ring shape of the colon to calculate the direction of its curvature, and finally determine the direction of the center of the lumen. However, this texture analysis-based navigation method has the same disadvantages as the dark area extraction method, that is, the robustness is poor or even completely ineffective when the image is occluded or blurred. In addition, when the endoscope is too close to the intestinal wall, the light angle received by the endoscope head is too narrow, and the intestinal muscle lines and dark areas may even be confused.

(3) Three-dimensional reconstruction method. The principle of the three-dimensional reconstruction method is to obtain information such as brightness, contour, and feature points from the image, and finally estimate the approximate depth information, and use the deepest point as the direction of the lens movement. However, this 3D reconstruction method mostly uses 2D image shadows to obtain depth information for reconstruction, so it is sensitive to illumination, and the final navigation direction error is also large.

SUMMARY OF THE INVENTION

The purpose of the present invention is to overcome the above-mentioned defects of the prior art, and to provide a digestive endoscope navigation method and system to assist in solving the problem of cavity loss that often occurs in traditional operations.

According to a first aspect of the present invention, a digestive endoscope navigation method is provided. The method includes the following steps:

The twin neural network model is trained by using training data, wherein the training data reflects the correspondence between the distribution characteristics of the displacement vectors of two consecutive frames of digestive endoscope images and the motion pattern of the digestive endoscope, and each displacement vector is the relationship between the two consecutive frames of images. Feature points at the same location are connected;

The continuous video stream of digestive endoscope is acquired in real time, and two consecutive frames of images are input into the trained twin neural network model to identify the motion pattern of the digestive endoscope according to the distribution characteristics of the displacement vector, and calculate the position coordinates of the next frame of the corresponding motion pattern, and then Output motion trajectory.

According to a first aspect of the present invention, a digestive endoscope navigation system is provided. The system includes:

Training module: used to train the twin neural network model using the training data, wherein the training data reflects the correspondence between the distribution characteristics of the displacement vectors of two consecutive frames of images of the digestive endoscope and the motion pattern of the digestive endoscope, and each displacement vector is Two consecutive frames of images are connected with feature points at the same position;

Prediction module: It is used to obtain the continuous video stream of digestive endoscope in real time, and input two consecutive frames of images into the trained twin neural network model to identify the movement pattern of digestive endoscope according to the distribution characteristics of displacement vectors, and calculate the next step of the corresponding movement pattern. Frame position coordinates, and then output the motion trajectory.

Compared with the prior art, the present invention has the advantage that "learning" based on the data set is no longer dependent on features such as dark areas or contours of a single frame picture, and has better global adaptability; In some cases, the neural network is used to learn displacement features based on pure images, so that the global error is more controllable, thereby providing accurate digestive endoscopic navigation and positioning.

Other features and advantages of the present invention will become apparent from the following detailed description of exemplary embodiments of the present invention with reference to the accompanying drawings.

Description of drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention.

1 is a flowchart of a digestive endoscope navigation method according to an embodiment of the present invention;

2 is a schematic diagram of a process of a digestive endoscope navigation method according to an embodiment of the present invention;

3 is a schematic diagram of a displacement vector extraction algorithm according to an embodiment of the present invention;

4 is a schematic diagram of a displacement vector prediction process based on a twin neural network according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a process of calculating the position of the next frame when the digestive endoscope is in a forward posture according to an embodiment of the present invention.

Detailed ways

Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that the relative arrangement of components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the invention unless specifically stated otherwise.

The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.

Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods, and apparatus should be considered part of the specification.

In all examples shown and discussed herein, any specific values should be construed as illustrative only and not limiting. Accordingly, other instances of the exemplary embodiment may have different values.

It should be noted that like numerals and letters refer to like items in the following figures, so once an item is defined in one figure, it does not require further discussion in subsequent figures.

The digestive endoscope navigation method provided by the present invention includes: fusing the key points (or feature points) of two frames of images, and using the displacement vector distribution of the fused images to carry out offline marking to obtain the true value of the training data; building a twin neural network for supervised Pre-training, the model weight is obtained after the training is completed; the trained model is used for testing, and the position coordinates of the next frame of the digestive endoscope lens are estimated according to the distribution of the output displacement vector, and then the complete motion trajectory is output to realize the navigation of the digestive endoscope.

Specifically, as shown in FIG. 1 and FIG. 2 , the provided digestive endoscope navigation method includes the following steps.

Step S110, extracting displacement vectors of two consecutive frames of images in the video stream to construct training data.

Due to the continuity of motion, most of the two consecutive images in the video stream are the same (the same point in the world coordinate system has different position coordinates in the two image coordinate systems), and the feature point (Feature Point) refers to the two frames of images. similarities in . The displacement of the lens can be obtained by finding the feature points of the two images, and the connection of the feature points of the two images about the same position constitutes the displacement vector.

In one embodiment, the displacement vector extraction includes: first, using SURF (Speeded up Robust Features) feature point matching algorithm to extract feature points of two frames of images; The images of 50% transparency are also superimposed to obtain a fused image; the feature points of the two images are connected on the fused image to obtain multiple displacement vectors, and the different distributions of the displacement vectors represent different motion modes of the lens. After analyzing the video stream, it is found that the movement of the digestive endoscope lens in the intestine is classified into three movement modes, such as forward posture, backward posture, and movement in the image plane. The movement in the plane can be further subdivided into rotation and translation. . As shown in Figure 3, Figure 3 (a) is the basic process of the displacement vector extraction algorithm, and finally a fusion image with displacement vector is obtained; Figure 3 (b) is a schematic diagram of three types of motion modes, which are divided into forward, There are three basic modes of retreat and in-plane motion; Figure 3(c) is an enlarged schematic diagram of the forward attitude.

In another embodiment, the displacement vector can be extracted by the optical flow method. The optical flow is the instantaneous speed of the pixel motion of the spatially moving object on the observation imaging plane. The optical flow method describes the time domain of the adjacent frame pixels. To obtain the correlation between adjacent frames, the optical flow field is the projection of the displacement of the moving object in the three-dimensional space on the two-dimensional image plane. Therefore, the local optical flow method can also be used to extract the displacement vector, and then follow-up work such as offline labeling of the data set is performed.

For example, taking the colonoscopy video stream as an example, the training data set includes 6302 clear colonoscopy images obtained from the colonoscopy video stream, including the cases where the center of the lumen is visible and invisible, of which 5041 images (accounting for the total data set 80%) as training samples, and the remaining 1261 images as test samples.

In this step, by off-line labeling of the dataset, the correspondence between the distribution of the displacement vectors and the motion patterns can be obtained.

Step S120, using the training data to train the Siamese neural network model with the goal of minimizing the set loss function.

The supervised learning of the deep neural network in the deep learning method requires the ground truth of the given data sample. For example, to identify a cat from a picture, the network model needs to be trained with a large number of positive and negative samples in order to correctly simulate it. A complex mapping function with weighting coefficients from the training data to the final target. Such a training method needs to know in advance whether the corresponding picture is a cat or a dog, and such a label is the true value of the sample. The task of identifying a cat or dog from a picture is called a classification task, and the task of implicitly predicting the output of a value is called a regression task. The difference between the predicted output of the network and the true label is measured using a loss function.

In the embodiment of the present invention, the twin neural network is preferably used for learning and training. Referring to FIG. 4 , two consecutive images of the video stream are sequentially input in time series, and GoogleNet is used as the backbone network to extract image features respectively, and the classification module performs feature fusion and then Three motion modes are predicted, and the regression module directly calculates the angle and length of the feature. That is, the deep Siamese neural network constructed in this embodiment is divided into a classification module and a regression module, the classification module is responsible for the category output of the lens motion pattern, and the regression module predicts the distribution of the displacement vector in step S110. For example, the similarity of the distribution patterns of the displacement vectors is measured by the following three indicators: the coordinates of the feature points of the two frames of images, the length of the displacement vectors, and the angle θ of the displacement vectors.

Therefore, the displacement vector extraction algorithm in the above step S110 actually serves for offline labeling of the dataset required for network input. Since the continuous lens motion displacement needs to be estimated in the deep learning task of the present invention, the network needs to input the above-mentioned two frames of images at the same time, and the twin network is two neural networks with the same weight.

Since both classification and regression tasks are involved, a multi-task loss function is designed for the Siamese neural network, which is expressed as:

Among them, λ _coord , λ _angle , λ _length , λ _class are the weights of the corresponding items, which can be set as needed, x _ij , y _ij are the coordinate values of the feature points,

and

is the estimated feature point coordinate value, i represents the feature point index, j represents the image index of two consecutive frames,

represents the true value of the angle of the displacement vector,

represents the angle estimate of the displacement vector,

represents the true value of the length of the displacement vector,

represents the estimated length of the displacement vector, p _i (c) represents the true value of the category (ie, the corresponding motion mode category),

represents a category estimate.

is the coordinate loss of feature points, for example, λ _coord takes a small value of 0.1.

is the angular loss of the displacement vector,

is the length loss of the displacement vector, for example, λ _angle and λ _length are both 0.5.

is the classification loss, for example, the commonly used cross entropy loss is used, C represents the number of categories, and λ _class , for example, takes a larger weight value of 0.5.

In addition, in order to better describe the feature points output by the network, the distribution of the calculated angle and the true value, and the distribution of the length and the true value, the influence of the order of the angle and length vectors needs to be discarded when comparing. Therefore, the angle and length loss parts in the composite loss function no longer use a simple square loss, but use the Wasserstein distance to measure whether the two distributions are close or not, expressed as:

Wasserstein distance is used to measure the similarity of two distributions P ₁ and P _2. Compared with JS divergence and KL divergence, it has more advantages. Even if the two distributions do not overlap or overlap very little, they can still reflect the distance of the distribution.

For example, in the training phase, two consecutive colonoscopy images are input, and the predicted colonoscopy motion pattern category, the angle feature vector of size 16*1, the length feature vector of size 16*1, etc. are output. It should be noted that the present invention does not limit the specific structure of the twin neural network model, and the number of layers, the dimensions of input and output, etc. can be set as required.

The invention adopts the twin neural network model, takes two consecutive frames of images as input, and outputs the representation embedded in the high-dimensional space to compare the similarity of the displacement vector distribution patterns. By using the twin neural network, the representation of different labels can be maximized, and the minimum The representation of the same label is quantized, that is, in this way, the distances of the learned similar features are close, and the distances of different features are separated.

Step S130, calculating the position coordinates of the next frame.

The motion posture of the lens can be determined according to the distribution pattern of the displacement vector output by the twin neural network. Taking the forward posture as an example, the coordinates of the forward center need to be estimated. The forward center is the projection of the forward position coordinates of the next frame of the lens on the image coordinate system. The calculation method is to take the intersection of the inverse extension lines of each displacement vector. Further, after obtaining the forward center, the position coordinates of the next frame can be calculated according to the geometric relationship, as shown in FIG. 5 .

Specifically, assuming that the current position (x ₁ , y ₁ , z ₁ ) is known, and the forward center (x ₂ , y ₂ , z ₂ ) can be obtained by taking the reverse extension of the displacement vector according to the vector distribution, the forward distance l Taking the mean value of the displacement vectors, the calculation formula of the position coordinates (x ₃ , y ₃ , z ₃ ) of the next frame is expressed as:

Among them, the function d(p _i1 , p _i2 ) is used to calculate the distance of the feature matching point pair (ie the displacement vector), and p ₁ and p ₂ respectively represent the feature points extracted from two consecutive frames of images.

The corresponding descriptions of the retraction posture and the motion posture in the plane are as follows:

For the backward posture, when the lens is withdrawn, it is just opposite to the forward movement, the length of the displacement vector is quite different from the angle, and the direction pointed by the arrow of the displacement vector can be gathered into a backward center.

For the movement posture in the plane, it can be divided into rotation and translation. At this time, the length of the displacement vector is not much different, and the vector connection cannot be gathered into the center. Specifically, it can be subdivided into the following three situations:

1) When it is only translational motion, the position coordinates of the next frame are directly calculated by translation in the image plane, the length l of the translation is the mean value of the modulo length of the displacement vector, and the angle θ is the angle formed by the displacement vector and the x-axis. mean;

2) When there is only rotational motion, it is simplified to only consider the calculation of the rotation angle in the image plane, the rotation angle takes the maximum span angle |θ _max -θ _min |, and θ _max is the maximum angle formed by the displacement vector and the x-axis. value, θ _min is the minimum value of the angle formed by the displacement vector and the x-axis. The difference between the two is the range of the rotation angle of the lens when performing the rotation movement. The rotation center is the current position of the lens, and the result is the direction of the lens. Change;

In a further embodiment, an optional alternative to the maximum span angle for in-plane rotational motion is to average the angles

or the median as the rotation angle.

3) In the actual situation, the lens usually has both translation and rotation in the image plane. In this case, it can be considered to calculate the displacement length and angle of the translation motion first, and then calculate the change of the lens orientation caused by the rotation motion.

It should be noted that the present invention directly estimates the translation and rotation of the attitude transformation through the distribution of the displacement vector, and the translation and rotation can also be expressed by a 4×4 attitude transformation matrix T, and its elements can be obtained through neural network training. . The specific alternative is to set the weight of a certain layer in the network structure as the parameter of this pose matrix, and the weight can be updated with the back-propagation of the network during the training process, without the need to display the true value of the given matrix T to guide the training.

In step S140, the complete motion trajectory of the lens is acquired, which is used for the navigation of the digestive endoscope.

The complete motion trajectory of the lens can be obtained by concatenating the position coordinate points of the next frame obtained in step S130 in series, which can be compared with the actual displacement trajectory in the verification stage to verify the feasibility of the present invention. Based on the connection direction between the position coordinate point of the next frame and the current position, it assists the doctor to perform surgery, or provides visual navigation for the movement of the colonoscopy robot.

Correspondingly, the present invention also provides a digestive endoscope navigation system for implementing one or more aspects of the above method. For example, the system includes: a training module, which is used to train a twin neural network model by using training data, wherein the training data reflects the correspondence between the displacement vector distribution characteristics of two consecutive frames of digestive endoscope images and the movement pattern of digestive endoscope , each displacement vector is connected to the feature points of two consecutive frames of images about the same position; the prediction module, which is used to obtain the continuous video stream of digestive endoscopy in real time, inputs the two consecutive frames of images into the trained twin neural network model, to obtain according to Displacement vector distribution features identify the motion pattern of digestive endoscope, calculate the position coordinates of the next frame of the corresponding motion pattern, and then output the motion trajectory.

To sum up, the present invention is based on the displacement vector extraction of the fusion of the key points of the two frames of images, combined with the twin neural network to predict the displacement vector, and then outputs the position coordinates of the next frame of the digestive endoscope, which provides a vision for digestive endoscopy, especially colonoscopy. The navigation method can accurately identify the posture and movement direction of the digestive endoscope.

In order to further verify the effect of the present invention, experiments were carried out. Two sets of bronchial model data with magnetic localization are used, including 1441 and 3333 internal moving images of the bronchial model and their corresponding 6-DOF rotation angles and camera space pose coordinates.

It has been verified that the output path of the deep convolutional neural network is consistent with the actual magnetic positioning space path to a certain extent, but with the increase of the path, the later errors are gradually accumulated, and outliers with large errors will appear. Among them, the minimum error of the first set of data sets is 0.06144mm, and the maximum error is 4.5234mm. In the data correction of the second set of data sets, due to the influence of factors such as illumination, the prediction results fluctuate greatly, the minimum error is 0.0869mm, and the maximum error is 6.9547mm. But the error is still within the controllable range. Experiments show that, compared with the prior art, the present invention improves the navigation and positioning accuracy.

To sum up, the present invention proposes a displacement vector extraction algorithm based on the fusion of key points of two frames of images, and at the same time builds a twin neural network model based on the algorithm to estimate the current motion mode of the digestive endoscope and give the position coordinates of the next frame, without Then rely on the local image to extract the contour of the dark area, so that the algorithm has better adaptability. Based on the current improvement of big data and computer computing power, the lens motion pattern is learned from the video stream of colonoscopy surgery correctly operated by the doctor. In the early stage, image processing technology was used to remove the specular effect caused by the specular reflection of the intestinal wall caused by flushing during the operation. In the middle stage, the features of two frames of images were extracted and fused to obtain the displacement vector. The displacement vector of the neural network model was trained offline, and finally a set of digestive endoscopic navigation methods with better adaptability based on the deep learning method was formed.

The present invention may be a system, method and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions loaded thereon for causing a processor to implement various aspects of the present invention.

A computer-readable storage medium may be a tangible device that can hold and store instructions for use by the instruction execution device. The computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (non-exhaustive list) of computer readable storage media include: portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM) or flash memory), static random access memory (SRAM), portable compact disk read only memory (CD-ROM), digital versatile disk (DVD), memory sticks, floppy disks, mechanically coded devices, such as printers with instructions stored thereon Hole cards or raised structures in grooves, and any suitable combination of the above. Computer-readable storage media, as used herein, are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (eg, light pulses through fiber optic cables), or through electrical wires transmitted electrical signals.

The computer readable program instructions described herein may be downloaded to various computing/processing devices from a computer readable storage medium, or to an external computer or external storage device over a network such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from a network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in each computing/processing device .

The computer program instructions for carrying out the operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state setting data, or instructions in one or more programming languages. Source or object code written in any combination, including object-oriented programming languages, such as Smalltalk, C++, Python, etc., and conventional procedural programming languages, such as the "C" language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server implement. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (eg, using an Internet service provider through the Internet connect). In some embodiments, custom electronic circuits, such as programmable logic circuits, field programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), can be personalized by utilizing state information of computer readable program instructions. Computer readable program instructions are executed to implement various aspects of the present invention.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.

These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer or other programmable data processing apparatus to produce a machine that causes the instructions when executed by the processor of the computer or other programmable data processing apparatus , resulting in means for implementing the functions/acts specified in one or more blocks of the flowchart and/or block diagrams. These computer readable program instructions can also be stored in a computer readable storage medium, these instructions cause a computer, programmable data processing apparatus and/or other equipment to operate in a specific manner, so that the computer readable medium on which the instructions are stored includes An article of manufacture comprising instructions for implementing various aspects of the functions/acts specified in one or more blocks of the flowchart and/or block diagrams.

Computer readable program instructions can also be loaded onto a computer, other programmable data processing apparatus, or other equipment to cause a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other equipment to produce a computer-implemented process , thereby causing instructions executing on a computer, other programmable data processing apparatus, or other device to implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more functions for implementing the specified logical function(s) executable instructions. In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It is also noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented in dedicated hardware-based systems that perform the specified functions or actions , or can be implemented in a combination of dedicated hardware and computer instructions. It is well known to those skilled in the art that implementation in hardware, implementation in software, and implementation in a combination of software and hardware are all equivalent.

Various embodiments of the present invention have been described above, and the foregoing descriptions are exemplary, not exhaustive, and not limiting of the disclosed embodiments. Numerous modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.

Claims

A digestive endoscope navigation method, comprising the following steps:

The twin neural network model is trained by using training data, wherein the training data reflects the correspondence between the distribution characteristics of the displacement vectors of two consecutive frames of digestive endoscope images and the motion pattern of the digestive endoscope, and each displacement vector is the relationship between the two consecutive frames of images. Feature points at the same location are connected;

The continuous video stream of digestive endoscope is acquired in real time, and two consecutive frames of images are input into the trained twin neural network model to identify the motion pattern of the digestive endoscope according to the distribution characteristics of the displacement vector, and calculate the position coordinates of the next frame of the corresponding motion pattern, and then Output motion trajectory.
The method according to claim 1, wherein the displacement vector distribution characteristics of two consecutive frames of images are obtained according to the following steps:

The feature point matching algorithm is used to extract the feature points of two consecutive frames of images to represent the similar points in the two consecutive frames of images;

The previous frame image takes the transparency of the first set ratio, and the image of the current frame takes the second set ratio transparency and superimposes to obtain a fusion image;

The feature points of two consecutive frames of images are connected on the fusion image to obtain multiple displacement vectors.
The method according to claim 1, wherein the distribution features of the displacement vectors include the coordinates of the feature points of two consecutive frames of images, the length of the displacement vectors and the angle of the displacement vectors, and the motion modes are classified as forward, backward and in the image plane Inner movement mode.
The method according to claim 1, wherein, in the training process, the loss function of the Siamese neural network model is expressed as:

where λ coord is the coordinate loss weight, λ angle is the angle loss weight of the displacement vector, λ length is the length loss weight of the displacement vector, λ class is the motion mode category loss weight, x ij and y ij are the coordinate values of the feature points,
and
is the estimated feature point coordinate value, i represents the feature point index, j represents the image index of two consecutive frames,
represents the true value of the angle of the displacement vector,
represents the angle estimate of the displacement vector,
represents the true value of the length of the displacement vector,
represents the length estimate of the displacement vector, p i (c) represents the true value of the motion mode category,
represents the motion mode category estimate, and C is the number of motion mode categories.
5. The method of claim 4, wherein the angular loss of the displacement vector and the length loss of the displacement vector are measured using Wasserstein distance, which measures the similarity of two distributions.
The method according to claim 3, wherein calculating the position coordinates of the next frame of the corresponding motion mode comprises:

For the forward motion mode, take the intersection of the reverse extension lines of each displacement vector as the coordinates of the forward center, and calculate the position coordinates of the next frame according to the geometric relationship;

For the backward motion mode, gather into a backward center in the direction pointed by the displacement vector;

For in-plane motion mode, it is calculated according to the following steps:

When it is only a translation movement, the position coordinates of the next frame are directly calculated by translation in the image plane, the length l of the translation is the mean value of the modulo length of the displacement vector, and the angle θ is the mean value of the angle formed by the displacement vector and the x-axis;

When there is only rotational motion, the rotation angle takes the maximum span angle |θ max -θ min |, θ max is the maximum value of the angle formed by the displacement vector and the x-axis, and θ min is the angle formed by the displacement vector and the x-axis. The minimum value, the rotation center is the current position of the digestive endoscope lens;

For the case where the lens has both translation and rotation movement in the image plane, first calculate the displacement length and angle of the translation movement, and then calculate the change of the lens orientation caused by the rotation movement.
The method according to claim 6, wherein, for the forward motion mode, the position coordinates (x 3 , y 3 , z 3 ) of the next frame are expressed as:

Among them, the function d(p i1 , p i2 ) is used to calculate the distance of the displacement vector, p 1 and p 2 respectively represent the feature points extracted from two consecutive frames of images, (x 1 , y 1 , z 1 ) is the known current Position coordinates, (x 2 , y 2 , z 2 ) are the coordinates of the advancing center, l is the advancing distance, take the mean of the displacement vectors, and n represent the number of feature points.
A digestive endoscope navigation system, comprising:

Training module: used to train the twin neural network model by using training data, wherein the training data reflects the correspondence between the displacement vector distribution characteristics of two consecutive frames of images of digestive endoscope and the motion pattern of digestive endoscope, and each displacement vector is Two consecutive frames of images are connected with feature points at the same position;

Prediction module: used to obtain the continuous video stream of digestive endoscope in real time, and input two consecutive frames of images into the trained twin neural network model to identify the movement pattern of digestive endoscope according to the distribution characteristics of the displacement vector, and calculate the next step of the corresponding movement pattern. Frame position coordinates, and then output the motion trajectory.
A computer-readable storage medium having stored thereon a computer program, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
A computer device, comprising a memory and a processor, a computer program that can be run on the processor is stored in the memory, and characterized in that, when the processor executes the program, any one of claims 1 to 7 is implemented The steps of the method described in item.