WO2022170562A1 - Digestive endoscope navigation method and system - Google Patents

Digestive endoscope navigation method and system

Info

Publication number
WO2022170562A1
WO2022170562A1 (PCT/CN2021/076523)
Authority
WO
WIPO (PCT)
Prior art keywords
displacement vector
images
digestive endoscope
consecutive frames
displacement
Prior art date
Application number
PCT/CN2021/076523
Other languages
French (fr)
Chinese (zh)
Inventor
熊璟
谭敏
夏泽洋
谢高生
Original Assignee
中国科学院深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院 filed Critical 中国科学院深圳先进技术研究院
Priority to PCT/CN2021/076523 priority Critical patent/WO2022170562A1/en
Publication of WO2022170562A1 publication Critical patent/WO2022170562A1/en

Classifications

    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B: DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 34/00: Computer-aided surgery; Manipulators or robots specially adapted for use in surgery
    • A61B 34/20: Surgical navigation systems; Devices for tracking or guiding surgical instruments, e.g. for frameless stereotaxis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/20: Analysis of motion
    • G: PHYSICS
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B: EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B 23/00: Models for scientific, medical, or mathematical purposes, e.g. full-sized devices for demonstration purposes
    • G09B 23/28: Models for scientific, medical, or mathematical purposes, e.g. full-sized devices for demonstration purposes, for medicine

Definitions

  • the present invention relates to the technical field of medical image processing, and more particularly, to a digestive endoscope navigation method and system.
  • Colonoscopy is one of the important methods for diagnosing malignant tumors in anorectal surgery.
  • the doctor controls the colonoscope to be inserted from the patient's anus.
  • the examination is divided into two stages: forward and backward.
    • in the first stage, the doctor advances the lens along the lumen based on clinical experience and colonoscopy images until the tail of the cecum is reached, and then the withdrawal phase is performed to observe whether there are polyps or other lesions in the intestinal tract.
    • in this traditional colonoscopy operation, doctors rely only on endoscopic imaging and their own experience to keep the advancing lens centered in the lumen.
  • colonoscopy robots such as microcapsule endoscopes
  • solutions for endoscopic navigation mainly include:
    • the principle of the contour recognition method is based on the structural characteristics of the colon itself, for example using the colon's inherent ring shape to calculate the direction its curvature points to and so determine the direction of the lumen center.
    • this texture-analysis-based navigation method has the same disadvantages as the dark area extraction method: it is poorly robust, or even completely ineffective, when the image is occluded or blurred.
    • when the endoscope is too close to the intestinal wall, the angle of the light received by the endoscope head is too narrow, and intestinal muscle lines can even be confused with dark areas.
    • the principle of the three-dimensional reconstruction method is to obtain information such as brightness, contours, and feature points from the image, estimate approximate depth information, and take the deepest point as the direction of lens motion.
    • such 3D reconstruction methods mostly derive depth information from 2D image shading, so they are sensitive to illumination, and the resulting navigation direction error is also large.
  • the purpose of the present invention is to overcome the above-mentioned defects of the prior art, and to provide a digestive endoscope navigation method and system to assist in solving the problem of cavity loss that often occurs in traditional operations.
  • a digestive endoscope navigation method includes the following steps:
    • the twin neural network model is trained using training data, wherein the training data reflect the correspondence between the displacement vector distribution characteristics of two consecutive frames of digestive endoscope images and the motion pattern of the digestive endoscope, and each displacement vector connects the feature points of the two consecutive frames at the same position;
    • the continuous digestive endoscope video stream is acquired in real time and two consecutive frames are input into the trained twin neural network model, so as to identify the motion pattern of the digestive endoscope from the displacement vector distribution characteristics, calculate the next-frame position coordinates for the corresponding motion pattern, and then output the motion trajectory.
  • a digestive endoscope navigation system includes:
    • Training module: used to train the twin neural network model using the training data, wherein the training data reflect the correspondence between the displacement vector distribution characteristics of two consecutive frames of digestive endoscope images and the motion pattern of the digestive endoscope, and each displacement vector connects the feature points of the two consecutive frames at the same position;
    • Prediction module: used to acquire the continuous digestive endoscope video stream in real time and input two consecutive frames into the trained twin neural network model, so as to identify the motion pattern of the digestive endoscope from the displacement vector distribution characteristics, calculate the next-frame position coordinates for the corresponding motion pattern, and then output the motion trajectory.
    • the present invention has the advantage that "learning" from a data set no longer depends on features such as dark areas or contours of a single frame, giving better global adaptability;
    • in the absence of precise magnetic positioning data, the neural network learns displacement features from pure images, making the global error more controllable and thereby providing accurate digestive endoscope navigation and positioning.
  • FIG. 1 is a flowchart of a digestive endoscope navigation method according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of a process of a digestive endoscope navigation method according to an embodiment of the present invention
  • FIG. 3 is a schematic diagram of a displacement vector extraction algorithm according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a displacement vector prediction process based on a twin neural network according to an embodiment of the present invention
  • FIG. 5 is a schematic diagram of a process of calculating the position of the next frame when the digestive endoscope is in a forward posture according to an embodiment of the present invention.
    • the digestive endoscope navigation method provided by the present invention includes: fusing the key points (also called feature points) of two frames of images and using the displacement vector distribution of the fused image for offline labeling to obtain the ground truth of the training data; constructing a twin neural network and performing supervised pre-training to obtain the model weights once training is complete; and testing with the trained model, estimating the next-frame position coordinates of the digestive endoscope lens from the distribution of the output displacement vectors, and then outputting the complete motion trajectory to realize navigation of the digestive endoscope.
  • the provided digestive endoscope navigation method includes the following steps.
  • Step S110 extracting displacement vectors of two consecutive frames of images in the video stream to construct training data.
    • a feature point refers to a similar point appearing in both of the two frames of images.
    • the displacement of the lens can be obtained by finding the feature points of the two images, and connecting the feature points of the two images corresponding to the same position constitutes a displacement vector.
    • the displacement vector extraction includes: first, using the SURF (Speeded Up Robust Features) feature point matching algorithm to extract the feature points of the two frames; then superimposing the previous frame and the current frame, each at 50% transparency, to obtain a fused image; and finally connecting the feature points of the two images on the fused image to obtain multiple displacement vectors, whose different distributions represent different motion modes of the lens.
    • the motion of the digestive endoscope lens in the intestine is classified into three motion modes, for example forward posture, backward posture, and motion in the image plane.
    • the in-plane motion can be further subdivided into rotation and translation.
    • Figure 3(a) shows the basic flow of the displacement vector extraction algorithm, which finally yields a fused image bearing displacement vectors;
    • Figure 3(b) is a schematic diagram of the three types of motion modes, namely the three basic modes of forward, backward, and in-plane motion;
  • Figure 3(c) is an enlarged schematic diagram of the forward attitude.
  • the displacement vector can be extracted by the optical flow method.
  • the optical flow is the instantaneous speed of the pixel motion of the spatially moving object on the observation imaging plane.
    • the optical flow method describes the variation of adjacent-frame pixels in the time domain to derive the correlation between adjacent frames.
    • the optical flow field is the projection onto the two-dimensional image plane of the displacement of a moving object in three-dimensional space; therefore, a local optical flow method can also be used to extract the displacement vectors, followed by subsequent work such as offline labeling of the data set.
    • the training data set includes 6302 clear colonoscopy images obtained from the colonoscopy video stream, covering both cases where the lumen center is visible and where it is not; 5041 images (80% of the total data set) serve as training samples, and the remaining 1261 images serve as test samples.
  • Step S120 using the training data to train the Siamese neural network model with the goal of minimizing the set loss function.
    • supervised learning of a deep neural network requires the ground truth of the given data samples. For example, to recognize a cat in a picture, the network model must be trained with a large number of positive and negative samples in order to correctly fit a complex weighted mapping function from the training data to the final target. Such training requires knowing in advance whether the corresponding picture shows a cat or a dog; such a label is the ground-truth value of the sample.
    • the task of identifying a cat or a dog in a picture is called a classification task, while the task of implicitly predicting a numerical output is called a regression task.
  • the difference between the predicted output of the network and the true label is measured using a loss function.
  • the twin neural network is preferably used for learning and training.
    • two consecutive images of the video stream are input in time sequence, GoogLeNet is used as the backbone network to extract the features of each image, the classification module fuses the features and then predicts the three motion modes, and the regression module directly computes the angle and length from the features.
  • the deep Siamese neural network constructed in this embodiment is divided into a classification module and a regression module, the classification module is responsible for the category output of the lens motion pattern, and the regression module predicts the distribution of the displacement vector in step S110.
  • the similarity of the distribution patterns of the displacement vectors is measured by the following three indicators: the coordinates of the feature points of the two frames of images, the length of the displacement vectors, and the angle ⁇ of the displacement vectors.
    • the displacement vector extraction algorithm of step S110 in fact serves the offline labeling of the data set required as network input. Since the deep learning task of the present invention requires estimating continuous lens motion displacement, the network must take the two frames mentioned above as simultaneous inputs; a twin network is two neural networks sharing the same weights.
  • ⁇ coord , ⁇ angle , ⁇ length , ⁇ class are the weights of the corresponding items, which can be set as needed
  • x ij , y ij are the coordinate values of the feature points, and is the estimated feature point coordinate value
  • i represents the feature point index
  • j represents the image index of two consecutive frames
  • p i (c) represents the true value of the category (ie, the corresponding motion mode category), represents a category estimate.
  • ⁇ coord takes a small value of 0.1.
  • ⁇ angle and ⁇ length are both 0.5.
  • classification loss for example, the commonly used cross entropy loss is used
  • C represents the number of categories
  • ⁇ class for example, takes a larger weight value of 0.5.
  • the angle and length loss parts in the composite loss function no longer use a simple square loss, but use the Wasserstein distance to measure whether the two distributions are close or not, expressed as:
    • the Wasserstein distance is used to measure the similarity of two distributions P 1 and P 2 and has advantages over the JS divergence and the KL divergence: even if the two distributions do not overlap, or overlap very little, it can still reflect the distance between them.
  • the present invention does not limit the specific structure of the twin neural network model, and the number of layers, the dimensions of input and output, etc. can be set as required.
  • the invention adopts the twin neural network model, takes two consecutive frames of images as input, and outputs the representation embedded in the high-dimensional space to compare the similarity of the displacement vector distribution patterns.
    • the distance between the representations of different labels can be maximized and the distance between representations of the same label minimized; in this way, the learned similar features end up close together and different features far apart.
  • Step S130 calculating the position coordinates of the next frame.
  • the motion posture of the lens can be determined according to the distribution pattern of the displacement vector output by the twin neural network. Taking the forward posture as an example, the coordinates of the forward center need to be estimated.
  • the forward center is the projection of the forward position coordinates of the next frame of the lens on the image coordinate system.
  • the calculation method is to take the intersection of the inverse extension lines of each displacement vector. Further, after obtaining the forward center, the position coordinates of the next frame can be calculated according to the geometric relationship, as shown in FIG. 5 .
    • the forward center (x 2 , y 2 , z 2 ) can be obtained by taking the intersection of the reverse extensions of the displacement vectors according to the vector distribution, and the forward distance l is the mean of the displacement vector lengths; the position coordinates (x 3 , y 3 , z 3 ) of the next frame are then calculated from these quantities.
  • the function d(p i1 , p i2 ) is used to calculate the distance of the feature matching point pair (ie the displacement vector), and p 1 and p 2 respectively represent the feature points extracted from two consecutive frames of images.
    • for the backward posture, the displacement vectors differ considerably in length and angle from the forward case, and the directions pointed to by the displacement vector arrows converge to a backward center.
    • the in-plane motion posture can be divided into rotation and translation; in this case the displacement vector lengths differ little and the vector lines do not converge to a center. Specifically, it can be subdivided into the following three situations:
  • the rotation angle takes the maximum span angle
  • the difference between the two is the range of the rotation angle of the lens when performing the rotation movement.
    • the rotation center is the current position of the lens, and the result is a change of the lens orientation;
    • an optional alternative to the maximum span angle for in-plane rotational motion is to use the mean or the median of the angles as the rotation angle.
    • the lens usually has both translation and rotation in the image plane; in this case, the displacement length and angle of the translational motion can be computed first, followed by the change in lens orientation caused by the rotational motion.
  • the present invention directly estimates the translation and rotation of the attitude transformation through the distribution of the displacement vector, and the translation and rotation can also be expressed by a 4 ⁇ 4 attitude transformation matrix T, and its elements can be obtained through neural network training.
    • a specific alternative is to set the weights of a certain layer in the network structure as the parameters of this pose matrix; these weights are updated by backpropagation during training, so there is no need to explicitly provide the ground-truth matrix T to guide training.
  • step S140 the complete motion trajectory of the lens is acquired, which is used for the navigation of the digestive endoscope.
    • the complete motion trajectory of the lens can be obtained by concatenating the next-frame position coordinate points obtained in step S130; in the verification stage this trajectory can be compared with the actual displacement trajectory to verify the feasibility of the present invention. Based on the direction from the current position to the next-frame position coordinate point, it assists the doctor in performing the operation or provides visual navigation for the motion of a colonoscopy robot.
  • the present invention also provides a digestive endoscope navigation system for implementing one or more aspects of the above method.
    • the system includes: a training module for training a twin neural network model with training data, wherein the training data reflect the correspondence between the displacement vector distribution characteristics of two consecutive frames of digestive endoscope images and the motion pattern of the digestive endoscope, and each displacement vector connects the feature points of the two consecutive frames at the same position; and a prediction module for acquiring the continuous digestive endoscope video stream in real time and inputting two consecutive frames into the trained twin neural network model, so as to identify the motion pattern of the digestive endoscope from the displacement vector distribution characteristics, calculate the next-frame position coordinates for the corresponding motion pattern, and then output the motion trajectory.
    • the present invention extracts displacement vectors based on fusing the key points of two frames, predicts the displacement vectors with a twin neural network, and then outputs the next-frame position coordinates of the digestive endoscope, providing visual navigation for digestive endoscopy, especially colonoscopy.
  • the navigation method can accurately identify the posture and movement direction of the digestive endoscope.
  • Two sets of bronchial model data with magnetic localization are used, including 1441 and 3333 internal moving images of the bronchial model and their corresponding 6-DOF rotation angles and camera space pose coordinates.
    • the output path of the deep convolutional neural network is consistent with the actual magnetically localized spatial path to a certain extent, but as the path grows longer the later errors gradually accumulate, and outliers with large error appear.
  • the minimum error of the first set of data sets is 0.06144mm
  • the maximum error is 4.5234mm.
    • for the second data set, the prediction results fluctuate more, with a minimum error of 0.0869 mm and a maximum error of 6.9547 mm, but the error is still within a controllable range.
    • the present invention proposes a displacement vector extraction algorithm based on fusing the key points of two frames and, on that basis, builds a twin neural network model to estimate the current motion mode of the digestive endoscope and give the next-frame position coordinates, no longer relying on extraction of dark-area contours from local images, so that the algorithm has better adaptability.
  • the lens motion pattern is learned from the video stream of colonoscopy surgery correctly operated by the doctor.
    • image processing techniques were used to remove the specular highlights caused by reflection from the flushed intestinal wall during the operation.
    • the features of the two frames were extracted and fused to obtain the displacement vectors.
    • the neural network model was trained offline on the displacement vectors, finally forming a digestive endoscope navigation method with better adaptability based on deep learning.
  • the present invention may be a system, method and/or computer program product.
  • the computer program product may include a computer-readable storage medium having computer-readable program instructions loaded thereon for causing a processor to implement various aspects of the present invention.
  • a computer-readable storage medium may be a tangible device that can hold and store instructions for use by the instruction execution device.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
    • a non-exhaustive list of computer-readable storage media includes: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in grooves with instructions stored thereon, and any suitable combination of the above.
    • computer-readable storage media are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., light pulses through fiber-optic cables), or electrical signals transmitted through electrical wires.
  • the computer readable program instructions described herein may be downloaded to various computing/processing devices from a computer readable storage medium, or to an external computer or external storage device over a network such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • the network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
    • a network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium in that computing/processing device.
    • the computer program instructions for carrying out the operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++, or Python, and conventional procedural programming languages such as the "C" language or similar programming languages.
    • the computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
    • the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider).
    • custom electronic circuits, such as programmable logic circuits, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), may execute the computer-readable program instructions to implement various aspects of the present invention.
    • these computer-readable program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
    • these computer-readable program instructions may also be stored in a computer-readable storage medium and cause a computer, programmable data processing apparatus, and/or other equipment to operate in a specific manner, so that the computer-readable medium on which the instructions are stored comprises an article of manufacture including instructions that implement various aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
    • computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other equipment, causing a series of operational steps to be performed thereon to produce a computer-implemented process, so that the instructions executing on the computer, other programmable data processing apparatus, or other equipment implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
    • each block in the flowcharts or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
    • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks therein, can be implemented in dedicated hardware-based systems that perform the specified functions or actions, or in a combination of dedicated hardware and computer instructions. It is well known to those skilled in the art that implementation in hardware, implementation in software, and implementation in a combination of software and hardware are all equivalent.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Surgery (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Algebra (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Analysis (AREA)
  • Robotics (AREA)
  • Medicinal Chemistry (AREA)
  • Chemical & Material Sciences (AREA)
  • Biomedical Technology (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Molecular Biology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Public Health (AREA)
  • Veterinary Medicine (AREA)
  • Endoscopes (AREA)

Abstract

Disclosed are a digestive endoscope navigation method and system. The method comprises: fusing key points of two frames of images and performing offline labeling with the displacement vector distribution of the fused image to obtain the ground truth of the training data; constructing a twin neural network and performing supervised pre-training; and estimating the next-frame position coordinates of the digestive endoscope lens from the displacement vector distribution output by the trained model, then outputting the complete motion trajectory, so as to realize navigation of the digestive endoscope. By means of the method, the pose and movement direction of the digestive endoscope itself can be accurately identified, so that its motion trajectory is grasped as a whole, and the adaptability is strong.

Description

A digestive endoscope navigation method and system

Technical Field

The present invention relates to the technical field of medical image processing and, more particularly, to a digestive endoscope navigation method and system.

Background Art
Colonoscopy is one of the important means of diagnosing malignant tumors in anorectal surgery. The doctor controls the colonoscope inserted through the patient's anus, and the examination is divided into two stages, advance and withdrawal. In the first stage, the doctor advances the lens along the lumen based on clinical experience combined with colonoscopy images until the tail of the cecum is reached; the withdrawal stage then follows, observing whether there are polyps or other lesions in the intestinal tract. In this traditional colonoscopy operation, doctors rely only on endoscopic imaging and their own experience to keep the advancing lens centered in the lumen. Owing to the limited field of view, the bowel's own motion, and the influence of intestinal contents and water mist, it is difficult to judge the lens's direction of movement correctly; this eventually leads to loss of the lumen and difficulty returning to the original position to continue advancing, causing the patient psychological and physiological distress and, in severe cases, perforation, bleeding, and many other problems.

At present, colonoscopy robots capable of autonomous motion (such as microcapsule endoscopes) can to some extent eliminate the patient's psychological discomfort and are a relatively ideal solution. Whether assisting the doctor with navigation during surgery or enabling autonomous navigation of a colonoscopy robot, the motion of the endoscope must be recognized and guided. In the prior art, the solutions for endoscopic navigation mainly include:

(1) Dark area extraction. Since the endoscope advances in a closed intestinal lumen and the illumination decreases with distance, the dark area is the most important and most salient indicator by which doctors judge the direction of advance. This method extracts features such as contours and dark areas from a single frame to find the approximate center of the lumen, and uses the coordinates of that center point as the basis for navigation, guiding the doctor or robot toward it. However, navigating with the dark area as the feature has great limitations, because it is applicable only when the lumen is clearly visible; moreover, most such methods rely on traditional single-frame image processing such as threshold segmentation and edge extraction and therefore lack global adaptability.

(2) Contour recognition. The principle of contour recognition is based on the structural characteristics of the colon itself, for example using the colon's inherent ring shape to compute the direction its curvature points to and so determine the direction of the lumen center. However, this texture-analysis-based navigation method shares the disadvantages of dark area extraction: it is poorly robust, or fails entirely, when the image is occluded or blurred. In addition, when the endoscope is too close to the intestinal wall, the angle of the light received by the endoscope head is too narrow, and intestinal muscle lines can even be confused with dark areas.

(3) Three-dimensional reconstruction. The principle of three-dimensional reconstruction is to obtain information such as brightness, contours, and feature points from the image, estimate approximate depth information, and take the deepest point as the direction of lens motion. However, such 3D reconstruction methods mostly derive depth from 2D image shading, so they are sensitive to illumination and the resulting navigation direction error is also large.
Summary of the Invention

The purpose of the present invention is to overcome the above defects of the prior art and provide a digestive endoscope navigation method and system to help solve the lumen-loss problem that frequently occurs in traditional operations.

According to a first aspect of the present invention, a digestive endoscope navigation method is provided. The method includes the following steps:

training a twin neural network model with training data, wherein the training data reflect the correspondence between the displacement vector distribution characteristics of two consecutive frames of digestive endoscope images and the motion pattern of the digestive endoscope, and each displacement vector connects the feature points of the two consecutive frames at the same position;

acquiring the continuous digestive endoscope video stream in real time and inputting two consecutive frames into the trained twin neural network model, so as to identify the motion pattern of the digestive endoscope from the displacement vector distribution characteristics, calculate the next-frame position coordinates for the corresponding motion pattern, and then output the motion trajectory.

According to a second aspect of the present invention, a digestive endoscope navigation system is provided. The system includes:

a training module for training a twin neural network model with training data, wherein the training data reflect the correspondence between the displacement vector distribution characteristics of two consecutive frames of digestive endoscope images and the motion pattern of the digestive endoscope, and each displacement vector connects the feature points of the two consecutive frames at the same position;

a prediction module for acquiring the continuous digestive endoscope video stream in real time and inputting two consecutive frames into the trained twin neural network model, so as to identify the motion pattern of the digestive endoscope from the displacement vector distribution characteristics, calculate the next-frame position coordinates for the corresponding motion pattern, and then output the motion trajectory.

Compared with the prior art, the present invention has the advantage that "learning" from a data set no longer depends on features such as dark areas or contours of a single frame, giving better global adaptability; and, in the absence of precise magnetic positioning data, the neural network learns displacement features from pure images, making the global error more controllable and thereby providing accurate digestive endoscope navigation and positioning.
Other features and advantages of the present invention will become apparent from the following detailed description of exemplary embodiments of the present invention with reference to the accompanying drawings.

Brief Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.

FIG. 1 is a flowchart of a digestive endoscope navigation method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the overall process of a digestive endoscope navigation method according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the displacement vector extraction algorithm according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of the displacement vector prediction process based on a twin neural network according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of the process of calculating the next-frame position when the digestive endoscope is in the forward posture according to an embodiment of the present invention.

Detailed Description of the Embodiments

Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that, unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the invention.

The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or its uses.

Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail, but, where appropriate, such techniques, methods, and apparatus should be considered part of the specification.

In all examples shown and discussed herein, any specific value should be construed as illustrative only and not limiting. Accordingly, other instances of the exemplary embodiments may have different values.

It should be noted that similar reference numerals and letters denote similar items in the following figures; once an item is defined in one figure, it need not be discussed further in subsequent figures.
The digestive endoscope navigation method provided by the present invention includes: fusing the key points (also called feature points) of two frames of images and using the displacement vector distribution of the fused image for offline labeling to obtain the ground truth of the training data; constructing a twin neural network and carrying out supervised pre-training, obtaining the model weights once training is complete; and testing with the trained model, estimating the next-frame position coordinates of the digestive endoscope lens from the distribution of the output displacement vectors, and then outputting the complete motion trajectory to realize digestive endoscope navigation.
Specifically, as shown in FIG. 1 and FIG. 2, the provided digestive endoscope navigation method includes the following steps.

Step S110: extract the displacement vectors of two consecutive frames in the video stream to construct the training data.

Owing to the continuity of motion, most of the content of two consecutive frames in the video stream is the same (the same point in the world coordinate system has different position coordinates in the two image coordinate systems); a feature point refers to such a similar point in the two frames. The displacement of the lens can be obtained by finding the feature points of the two images, and connecting the feature points of the two images corresponding to the same position constitutes a displacement vector.
In one embodiment, displacement vector extraction includes: first, using the SURF (Speeded Up Robust Features) feature point matching algorithm to extract the feature points of the two frames; then rendering the previous frame at 50% transparency and the current frame likewise at 50% transparency and superimposing them to obtain a fused image; and finally connecting the feature points of the two images on the fused image to obtain multiple displacement vectors, whose different distributions represent different motion modes of the lens. Analysis of the video stream shows that the motion of the digestive endoscope lens in the intestine can be classified into three motion modes, for example forward posture, backward posture, and motion in the image plane, where in-plane motion can be further subdivided into rotation and translation. As shown in FIG. 3, FIG. 3(a) is the basic flow of the displacement vector extraction algorithm, which finally yields a fused image bearing displacement vectors; FIG. 3(b) is a schematic diagram of the three types of motion modes, namely the three basic modes of forward, backward, and in-plane motion; FIG. 3(c) is an enlarged schematic diagram of the forward posture.
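By way of illustration only (the patent publishes no code), a minimal Python/OpenCV sketch of this extraction step might look as follows; the Hessian threshold, the Lowe ratio test, and the helper name extract_displacement_vectors are assumptions, and SURF requires an opencv-contrib build (ORB is a patent-free substitute):

```python
import cv2
import numpy as np

def extract_displacement_vectors(prev_frame, curr_frame, ratio=0.7):
    """Match SURF feature points between two consecutive frames and return
    the 50/50 fused image together with the displacement vectors."""
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)  # contrib build
    kp1, des1 = surf.detectAndCompute(prev_frame, None)
    kp2, des2 = surf.detectAndCompute(curr_frame, None)

    # Brute-force matching with Lowe's ratio test to keep reliable pairs.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = []
    for pair in matcher.knnMatch(des1, des2, k=2):
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            matches.append(pair[0])

    # Superimpose the two frames, each at 50% transparency.
    fused = cv2.addWeighted(prev_frame, 0.5, curr_frame, 0.5, 0)

    vectors = []
    for m in matches:
        p1 = np.array(kp1[m.queryIdx].pt)  # feature point in frame t-1
        p2 = np.array(kp2[m.trainIdx].pt)  # same point in frame t
        vectors.append((p1, p2))
        cv2.arrowedLine(fused, tuple(map(int, p1)), tuple(map(int, p2)),
                        (0, 255, 0), 1)
    return fused, vectors
```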
In other embodiments, the displacement vectors can be extracted with an optical flow method. Optical flow is the instantaneous velocity of the pixel motion of a spatially moving object on the observed imaging plane; the optical flow method describes the variation of adjacent-frame pixels in the time domain to derive the correlation between adjacent frames, and the optical flow field is the projection onto the two-dimensional image plane of the displacement of a moving object in three-dimensional space. Therefore, a local optical flow method can also be used to extract the displacement vectors, followed by subsequent work such as offline labeling of the data set.
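A comparable sketch of this alternative, using sparse Lucas-Kanade flow on Shi-Tomasi corners (the corner-detector parameters are assumptions, not values from the patent):

```python
import cv2

def optical_flow_vectors(prev_gray, curr_gray, max_corners=200):
    """Track corners from the previous grayscale frame into the current one
    and return (start, end) point pairs, i.e., displacement vectors."""
    p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=max_corners,
                                 qualityLevel=0.01, minDistance=7)
    p1, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, p0, None)
    good = status.ravel() == 1
    # Each (start, end) pair is one displacement vector on the image plane.
    return list(zip(p0[good].reshape(-1, 2), p1[good].reshape(-1, 2)))
```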
For example, taking a colonoscopy video stream as an example, the training data set includes 6302 clear colonoscopy images obtained from the video stream, covering both cases where the lumen center is visible and where it is not; 5041 images (80% of the total data set) serve as training samples, and the remaining 1261 images serve as test samples.

In this step, by labeling the data set offline, the correspondence between displacement vector distributions and motion patterns can be obtained.
Step S120: use the training data to train the twin neural network model with the goal of minimizing a set loss function.

Supervised learning of a deep neural network requires the ground truth of the given data samples. For example, to recognize a cat in a picture, the network model must be trained with a large number of positive and negative samples in order to correctly fit a complex weighted mapping function from the training data to the final target. Such training requires knowing in advance whether the corresponding picture shows a cat or a dog; such a label is the ground-truth value of the sample. The task of identifying a cat or a dog in a picture is called a classification task, while the task of implicitly predicting a numerical output is called a regression task. The difference between the network's predicted output and the true label is measured with a loss function.
In an embodiment of the present invention, a twin (Siamese) neural network is preferably used for learning and training. Referring to FIG. 4, two consecutive images of the video stream are input in time sequence, GoogLeNet is used as the backbone network to extract the features of each image, the classification module fuses the features and then predicts the three motion modes, and the regression module directly computes the angle and length from the features. That is, the deep twin neural network constructed in this embodiment is divided into a classification module and a regression module: the classification module is responsible for outputting the category of the lens motion pattern, and the regression module predicts the distribution of the displacement vectors from step S110. For example, the similarity of displacement vector distribution patterns is measured by three indicators: the coordinates of the feature points in the two frames, the length of the displacement vectors, and the angle θ of the displacement vectors.
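As a hedged illustration of this two-branch arrangement (the patent gives no implementation), the following PyTorch sketch shares one GoogLeNet backbone between the two inputs and attaches a classification head and a regression head; the hidden sizes, the 1024-dimensional pooled feature, and the head layout are assumptions:

```python
import torch
import torch.nn as nn
from torchvision.models import googlenet

class SiameseEndoNet(nn.Module):
    """Two-branch sketch: a shared GoogLeNet backbone (the "twin" weights),
    a classification head for the 3 motion modes, and a regression head for
    16 angles + 16 lengths (matching the 16x1 vectors mentioned below)."""
    def __init__(self, n_modes=3, n_vectors=16):
        super().__init__()
        backbone = googlenet(weights=None, aux_logits=False)
        backbone.fc = nn.Identity()   # keep the 1024-d pooled feature
        self.backbone = backbone      # shared weights across both frames
        self.classifier = nn.Sequential(
            nn.Linear(2 * 1024, 256), nn.ReLU(), nn.Linear(256, n_modes))
        self.regressor = nn.Sequential(
            nn.Linear(2 * 1024, 256), nn.ReLU(), nn.Linear(256, 2 * n_vectors))

    def forward(self, frame_prev, frame_curr):
        f = torch.cat([self.backbone(frame_prev),
                       self.backbone(frame_curr)], dim=1)  # feature fusion
        logits = self.classifier(f)                        # motion-mode logits
        angles, lengths = self.regressor(f).chunk(2, dim=1)
        return logits, angles, lengths
```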
Therefore, the displacement vector extraction algorithm of step S110 in fact serves the offline labeling of the data set required as network input. Since the deep learning task of the present invention requires estimating continuous lens motion displacement, the network must take the two frames mentioned above as simultaneous inputs; a twin network is two neural networks sharing the same weights.
Since classification and regression tasks are both involved, a multi-task loss function is designed for the twin neural network, expressed as:

$$L=\lambda_{coord}\,L_{coord}+\lambda_{angle}\,L_{angle}+\lambda_{length}\,L_{length}+\lambda_{class}\,L_{class}$$

with

$$L_{coord}=\sum_{i}\sum_{j}\left[(x_{ij}-\hat{x}_{ij})^{2}+(y_{ij}-\hat{y}_{ij})^{2}\right],\quad L_{angle}=\sum_{i}(\theta_{i}-\hat{\theta}_{i})^{2},\quad L_{length}=\sum_{i}(l_{i}-\hat{l}_{i})^{2},\quad L_{class}=-\sum_{c=1}^{C}p_{i}(c)\log\hat{p}_{i}(c)$$

where $\lambda_{coord}$, $\lambda_{angle}$, $\lambda_{length}$, $\lambda_{class}$ are the weights of the corresponding terms and can be set as needed; $x_{ij}$, $y_{ij}$ are the coordinate values of the feature points and $\hat{x}_{ij}$, $\hat{y}_{ij}$ are the estimated feature point coordinate values; $i$ is the feature point index and $j$ is the image index over the two consecutive frames; $\theta_{i}$ is the true angle of a displacement vector and $\hat{\theta}_{i}$ its estimate; $l_{i}$ is the true length of a displacement vector and $\hat{l}_{i}$ its estimate; $p_{i}(c)$ is the true category value (i.e., the corresponding motion mode category) and $\hat{p}_{i}(c)$ is the category estimate. $L_{coord}$ is the coordinate loss of the feature points, for which $\lambda_{coord}$ takes a small value of 0.1. $L_{angle}$ is the angle loss and $L_{length}$ the length loss of the displacement vectors; $\lambda_{angle}$ and $\lambda_{length}$ are both 0.5. $L_{class}$ is the classification loss, for example the commonly used cross-entropy loss, where $C$ is the number of categories; $\lambda_{class}$ takes a comparatively large weight value of 0.5.
In addition, to better characterize how the distributions of the feature points output by the network, of the computed angles, and of the computed lengths compare with the corresponding true values, the influence of the ordering of the angle and length vectors must be discarded in the comparison. Therefore, the angle and length parts of the composite loss function no longer use a simple square loss but instead the Wasserstein distance, which measures how close two distributions are:

$$W(P_{1},P_{2})=\inf_{\gamma\in\Pi(P_{1},P_{2})}\;\mathbb{E}_{(x,y)\sim\gamma}\left[\|x-y\|\right]$$

The Wasserstein distance measures the similarity of two distributions $P_{1}$ and $P_{2}$ and has advantages over the JS divergence and the KL divergence: even if the two distributions do not overlap, or overlap very little, it still reflects how far apart they are.
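A possible PyTorch rendering of this composite objective is sketched below (not the patent's code): the batch reductions, the sorted-sample form of the 1-D Wasserstein distance, and the function names are assumptions, while the weights are the example values from the text:

```python
import torch
import torch.nn.functional as F

def wasserstein_1d(pred, target):
    """W1 distance between two equal-size 1-D empirical distributions:
    sort both samples and average the element-wise gaps. Sorting makes the
    comparison order-invariant, which is the point of using it here."""
    return (torch.sort(pred, dim=-1).values
            - torch.sort(target, dim=-1).values).abs().mean(dim=-1)

def composite_loss(coords, coords_gt, angles, angles_gt,
                   lengths, lengths_gt, logits, labels,
                   w_coord=0.1, w_angle=0.5, w_length=0.5, w_class=0.5):
    """Weighted multi-task loss following the description above."""
    l_coord = F.mse_loss(coords, coords_gt)               # feature-point coords
    l_angle = wasserstein_1d(angles, angles_gt).mean()    # angle distribution
    l_length = wasserstein_1d(lengths, lengths_gt).mean() # length distribution
    l_class = F.cross_entropy(logits, labels)             # motion-mode category
    return (w_coord * l_coord + w_angle * l_angle
            + w_length * l_length + w_class * l_class)
```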
For example, in the training phase, two consecutive colonoscopy images are input, and the outputs are the predicted colonoscopy motion-pattern category, a 16×1 angle feature vector, a 16×1 length feature vector, and so on. It should be noted that the present invention does not limit the specific structure of the twin neural network model; the number of layers and the input and output dimensions can be set as required.

The present invention adopts a twin neural network model that takes two consecutive frames as input and outputs their embeddings in a high-dimensional space so that the similarity of displacement vector distribution patterns can be compared. The twin network makes it possible to maximize the distance between the representations of different labels and minimize the distance between representations of the same label; in this way, learned similar features end up close together and different features far apart.
Step S130: calculate the next-frame position coordinates.

The motion posture of the lens can be determined from the distribution pattern of the displacement vectors output by the twin neural network. Taking the forward posture as an example, the coordinates of the forward center must be estimated; the forward center is the projection onto the image coordinate system of the next-frame forward position of the lens, and it is computed as the intersection of the reverse extension lines of the displacement vectors. Further, once the forward center is obtained, the next-frame position coordinates can be calculated from the geometric relationship, as shown in FIG. 5.
具体地,假设当前位置(x 1,y 1,z 1)已知,而前进中心(x 2,y 2,z 2)可根据向量分布取位移向量的反向延长线得出,前进距离l取位移向量的均值,则下一帧位置坐标(x 3,y 3,z 3)的推算公式表示为: Specifically, assuming that the current position (x 1 , y 1 , z 1 ) is known, and the forward center (x 2 , y 2 , z 2 ) can be obtained by taking the reverse extension of the displacement vector according to the vector distribution, the forward distance l Taking the mean value of the displacement vectors, the calculation formula of the position coordinates (x 3 , y 3 , z 3 ) of the next frame is expressed as:
$$l=\frac{1}{n}\sum_{i=1}^{n} d(p_{i1},p_{i2}),\qquad (x_{3},y_{3},z_{3})=(x_{1},y_{1},z_{1})+l\cdot\frac{(x_{2},y_{2},z_{2})-(x_{1},y_{1},z_{1})}{\left\|(x_{2},y_{2},z_{2})-(x_{1},y_{1},z_{1})\right\|}$$
where the function d(p_{i1}, p_{i2}) computes the distance of a matched feature-point pair (i.e., a displacement vector), p1 and p2 denote the feature points extracted from the two consecutive frames, and n is the number of matched points.
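A sketch of this geometric step, assuming NumPy and treating the forward center as the least-squares intersection of the reverse-extended displacement-vector lines; the solver choice and the function names are assumptions for illustration, not prescribed by the invention:

```python
import numpy as np

def forward_center(p1: np.ndarray, p2: np.ndarray) -> np.ndarray:
    """Least-squares intersection of the displacement-vector lines.

    p1, p2: (n, 2) matched feature points in frames t and t+1. Each pair
    defines a line whose reverse extension should pass near the forward
    center, so the center is the point closest to all n lines.
    """
    d = p2 - p1
    d = d / np.linalg.norm(d, axis=1, keepdims=True)        # unit directions
    eye = np.eye(2)
    # Residual of point x w.r.t. line i is (I - d_i d_i^T)(x - p1_i);
    # zeroing the gradient of the summed squares gives A x = b.
    proj = eye[None, :, :] - d[:, :, None] * d[:, None, :]  # (n, 2, 2)
    A = proj.sum(axis=0)
    b = np.einsum("nij,nj->i", proj, p1)
    return np.linalg.solve(A, b)

def next_position(current, center, p1, p2):
    """Step from the current position toward the forward center by the
    mean displacement length l; current and center must already be in
    the same coordinate frame."""
    current = np.asarray(current, dtype=float)
    center = np.asarray(center, dtype=float)
    l = np.mean(np.linalg.norm(p2 - p1, axis=1))
    direction = center - current
    return current + l * direction / np.linalg.norm(direction)
```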
The backward pose and the in-plane motion pose are described as follows.
For the backward pose, the lens withdraws in exactly the opposite manner to the forward case: the displacement vectors differ considerably in length and angle, and the directions indicated by the displacement-vector arrows converge to a backward center.
In-plane motion poses can be subdivided into rotation, translation, and so on. Here the displacement vectors are of roughly equal length, and their connecting lines do not converge to a center. Three sub-cases can be distinguished (a code sketch follows this list):
1) For pure translation, the next-frame position coordinates are computed directly by translating within the image plane: the translation length l is the mean of the displacement-vector magnitudes, and the angle θ is the mean of the angles the displacement vectors make with the x-axis;
2) For pure rotation, the computation simplifies to the rotation angle within the image plane alone. The rotation angle is taken as the maximum span angle |θ_max − θ_min|, where θ_max is the largest and θ_min the smallest angle any displacement vector makes with the x-axis; their difference is the range spanned by the lens rotation. The rotation center is the current lens position, and the result is a change in lens orientation;
In other embodiments, for in-plane rotational motion an optional alternative to the maximum span angle is to take the mean angle

$$\bar{\theta}=\frac{1}{n}\sum_{i=1}^{n}\theta_{i}$$

or the median angle as the rotation angle.
3) In practice, the lens usually undergoes both translation and rotation within the image plane. In that case, one can first compute the displacement length and angle of the translational component, and then compute the change in lens orientation caused by the rotational component.
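A sketch of these in-plane estimates, assuming NumPy; the function follows the three cases above, and its name and return convention are illustrative:

```python
import numpy as np

def in_plane_motion(p1: np.ndarray, p2: np.ndarray):
    """Estimate in-plane translation (l, theta) and rotation span angle.

    p1, p2: (n, 2) matched feature points in two consecutive frames.
    """
    vecs = p2 - p1
    lengths = np.linalg.norm(vecs, axis=1)
    angles = np.arctan2(vecs[:, 1], vecs[:, 0])

    # Case 1: pure translation -- mean magnitude and mean angle.
    l = lengths.mean()
    theta = angles.mean()

    # Case 2: pure rotation -- maximum span angle |theta_max - theta_min|
    # about the current lens position.
    span = angles.max() - angles.min()

    # Case 3: combined motion -- apply the translation (l, theta) first,
    # then account for the orientation change given by the span.
    return l, theta, span
```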
It should be noted that the present invention estimates the translation and rotation of the pose transformation directly from the distribution of the displacement vectors, case by case. The translation and rotation can also be expressed by a 4×4 pose transformation matrix T whose elements are obtained by neural network training. A concrete alternative is to set the weights of one layer of the network to the parameters of this pose matrix; those weights are then updated by backpropagation during training, so no explicit ground-truth matrix T is needed to supervise the training.
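A minimal sketch of this alternative, assuming PyTorch: the 4×4 pose matrix T is held as the trainable weights of one layer and updated by backpropagation, as described; the initialization and the class name are assumptions:

```python
import torch
import torch.nn as nn

class LearnablePoseLayer(nn.Module):
    """Holds a 4x4 pose transformation matrix T as trainable weights.

    T is updated by backpropagation like any other layer, so no explicit
    ground-truth pose matrix is required during training.
    """
    def __init__(self):
        super().__init__()
        # Initialize near the identity so the initial pose means "no motion".
        self.T = nn.Parameter(torch.eye(4) + 1e-3 * torch.randn(4, 4))

    def forward(self, points_h: torch.Tensor) -> torch.Tensor:
        # points_h: (n, 4) homogeneous 3-D points; apply the learned pose.
        return points_h @ self.T.transpose(0, 1)
```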
Step S140: obtain the complete lens motion trajectory, which is used for digestive endoscope navigation.
Concatenating the next-frame position coordinates produced by step S130 yields the complete motion trajectory of the lens; comparing it against the actual displacement trajectory in the verification stage verifies the feasibility of the present invention. The direction of the line from the current position to the next-frame position coordinate can assist the doctor during surgery, or provide visual navigation for the motion of a colonoscopy robot.
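Chaining the per-frame estimates is straightforward; a sketch, assuming the hypothetical next_position helper from the earlier sketch:

```python
import numpy as np

def build_trajectory(start, frame_pairs):
    """Concatenate next-frame position estimates into a full lens trajectory.

    frame_pairs: iterable of (p1, p2, center) tuples, one per pair of
    consecutive frames, where center is the estimated forward center;
    next_position is the helper defined in the earlier sketch.
    """
    trajectory = [np.asarray(start, dtype=float)]
    for p1, p2, center in frame_pairs:
        trajectory.append(next_position(trajectory[-1], center, p1, p2))
    return np.stack(trajectory)
```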
Correspondingly, the present invention also provides a digestive endoscope navigation system for implementing one or more aspects of the above method. For example, the system includes: a training module for training a Siamese neural network model on training data, where the training data reflect the correspondence between the displacement-vector distribution features of two consecutive digestive-endoscope frames and the endoscope's motion pattern, each displacement vector connecting the feature points of the two consecutive frames at the same location; and a prediction module for acquiring the continuous digestive-endoscope video stream in real time and feeding two consecutive frames into the trained Siamese neural network model, so as to identify the endoscope's motion pattern from the displacement-vector distribution features, compute the next-frame position coordinates for that motion pattern, and output the motion trajectory.
In summary, the present invention extracts displacement vectors by fusing the key points of two frames, predicts over the displacement vectors with a Siamese neural network, and then outputs the next-frame position coordinates of the digestive endoscope. It thereby provides a visual navigation method for digestive endoscopes, and colonoscopes in particular, that can accurately identify the endoscope's own pose and direction of motion.
To further verify the effect of the present invention, experiments were carried out using two sets of bronchial-phantom data with magnetic localization, comprising 1441 and 3333 images of motion inside the bronchial phantom, respectively, together with the corresponding six-degree-of-freedom rotation angles and camera pose coordinates in space.
Verification showed that the path output by the deep convolutional neural network remained consistent with the actual magnetically localized path to a certain extent, but errors accumulated as the path lengthened, and outliers with larger errors appeared in the later stages. For the first dataset, the minimum error was 0.06144 mm and the maximum error 4.5234 mm. In the data correction on the second dataset, the predictions fluctuated more because of factors such as illumination, with a minimum error of 0.0869 mm and a maximum error of 6.9547 mm; the error nonetheless remained within a controllable range. The experiments show that, relative to the prior art, the present invention improves navigation and positioning accuracy.
In summary, the present invention proposes a displacement-vector extraction algorithm based on fusing the key points of two frames, and builds on it a Siamese neural network model that estimates the current motion pattern of the digestive endoscope and gives the next-frame position coordinates. The method no longer depends on extracting dark-area contours from local images, which gives the algorithm better adaptability. Building on present-day big data and improved computing power, lens motion patterns are learned from video streams of colonoscopies correctly operated by doctors, so the motion trajectory is grasped as a whole rather than from single frames alone. In the early stage, image-processing techniques remove the highlights caused by specular reflection off the intestinal wall during intraoperative flushing; in the middle stage, features of two frames are extracted and fused to obtain displacement vectors; in the later stage, a learning approach trains the neural network model offline on displacement vectors extracted from large datasets. The result is a digestive endoscope navigation method, grounded in deep learning, with better adaptability.
The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement aspects of the present invention.
A computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction-execution device. The computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, or semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as being a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse through a fiber-optic cable), or an electrical signal transmitted through a wire.
The computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to the respective computing/processing devices, or to an external computer or external storage device via a network, for example the Internet, a local area network, a wide area network, and/or a wireless network. The network may comprise copper transmission cables, optical-fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.
The computer program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++, or Python, and conventional procedural programming languages such as the "C" language or similar languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, for example a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), can be personalized using state information of the computer-readable program instructions; the electronic circuit can then execute the computer-readable program instructions, thereby implementing aspects of the present invention.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data-processing apparatus to produce a machine, such that the instructions, when executed via the processor of the computer or other programmable apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium and can direct a computer, a programmable data-processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium having instructions stored therein comprises an article of manufacture including instructions that implement aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, other programmable data-processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other device to produce a computer-implemented process, such that the instructions that execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to multiple embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of instructions comprising one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in a block may occur out of the order noted in the figures; for example, two blocks shown in succession may in fact be executed substantially concurrently, or sometimes in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of such blocks, can be implemented by a special-purpose hardware-based system that performs the specified functions or acts, or by a combination of special-purpose hardware and computer instructions. It is well known to those skilled in the art that implementation in hardware, implementation in software, and implementation in a combination of software and hardware are all equivalent.
Embodiments of the present invention have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or technical improvements over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.

Claims (10)

1. A digestive endoscope navigation method, comprising the following steps:
    training a Siamese neural network model with training data, wherein the training data reflect the correspondence between the displacement-vector distribution features of two consecutive frames of digestive endoscope images and the motion pattern of the digestive endoscope, each displacement vector connecting the feature points of the two consecutive frames at the same location; and
    acquiring a continuous digestive endoscope video stream in real time, and inputting two consecutive frames into the trained Siamese neural network model, so as to identify the motion pattern of the digestive endoscope from the displacement-vector distribution features, compute the next-frame position coordinates for the corresponding motion pattern, and output a motion trajectory.
2. The method according to claim 1, wherein the displacement-vector distribution features of two consecutive frames are obtained by the following steps:
    extracting feature points of the two consecutive frames with a feature-point matching algorithm, to characterize similar points in the two consecutive frames;
    superimposing the previous frame at a first set transparency ratio and the current frame at a second set transparency ratio, to obtain a fused image; and
    connecting the feature points of the two consecutive frames on the fused image, to obtain a plurality of displacement vectors.
3. The method according to claim 1, wherein the displacement-vector distribution features comprise the coordinates of the feature points of the two consecutive frames, the lengths of the displacement vectors, and the angles of the displacement vectors, and the motion pattern is classified as forward, backward, or in-plane motion.
4. The method according to claim 1, wherein, during training, the loss function of the Siamese neural network model is expressed as:

    $$\mathrm{Loss}=\lambda_{coord}\sum_{i}\sum_{j}\left[\left(x_{ij}-\hat{x}_{ij}\right)^{2}+\left(y_{ij}-\hat{y}_{ij}\right)^{2}\right]+\lambda_{angle}\sum_{i}\left(\theta_{i}-\hat{\theta}_{i}\right)^{2}+\lambda_{length}\sum_{i}\left(l_{i}-\hat{l}_{i}\right)^{2}-\lambda_{class}\sum_{i}\sum_{c=1}^{C}p_{i}(c)\log\hat{p}_{i}(c)$$

    wherein $\lambda_{coord}$ is the coordinate loss weight, $\lambda_{angle}$ is the angle loss weight of the displacement vectors, $\lambda_{length}$ is the length loss weight of the displacement vectors, and $\lambda_{class}$ is the motion-pattern class loss weight; $x_{ij}$ and $y_{ij}$ are the coordinate values of the feature points, and $\hat{x}_{ij}$ and $\hat{y}_{ij}$ are the estimated feature-point coordinates, where i is the feature-point index and j is the index over the two consecutive frames; $\theta_{i}$ denotes the true angle of a displacement vector and $\hat{\theta}_{i}$ its estimated angle; $l_{i}$ denotes the true length of a displacement vector and $\hat{l}_{i}$ its estimated length; $p_{i}(c)$ denotes the true motion-pattern class, $\hat{p}_{i}(c)$ denotes the estimated motion-pattern class, and C is the number of motion-pattern classes.
5. The method according to claim 4, wherein the Wasserstein distance, which measures the similarity of two distributions, is used to measure the angle loss of the displacement vectors and the length loss of the displacement vectors.
6. The method according to claim 3, wherein computing the next-frame position coordinates for the corresponding motion pattern comprises:
    for the forward motion pattern, taking the intersection of the reverse extensions of the displacement vectors as the coordinates of the forward center, and computing the next-frame position coordinates from the geometric relationship;
    for the backward motion pattern, converging the directions indicated by the displacement vectors into a backward center; and
    for the in-plane motion pattern, computing according to the following steps:
    when the motion is pure translation, computing the next-frame position coordinates directly by translation within the image plane, the translation length l being the mean of the displacement-vector magnitudes and the angle θ being the mean of the angles the displacement vectors make with the x-axis;
    when the motion is pure rotation, taking the rotation angle as the maximum span angle |θ_max − θ_min|, where θ_max is the maximum and θ_min the minimum of the angles the displacement vectors make with the x-axis, the rotation center being the current position of the digestive endoscope lens; and
    when the lens undergoes both translation and rotation within the image plane, first computing the displacement length and angle of the translational motion, and then computing the change in lens orientation caused by the rotational motion.
7. The method according to claim 6, wherein, for the forward motion pattern, the next-frame position coordinates (x3, y3, z3) are expressed as:

    $$l=\frac{1}{n}\sum_{i=1}^{n} d(p_{i1},p_{i2})$$

    $$(x_{3},y_{3},z_{3})=(x_{1},y_{1},z_{1})+l\cdot\frac{(x_{2},y_{2},z_{2})-(x_{1},y_{1},z_{1})}{\left\|(x_{2},y_{2},z_{2})-(x_{1},y_{1},z_{1})\right\|}$$

    wherein the function d(p_{i1}, p_{i2}) computes the distance of a displacement vector; p1 and p2 denote the feature points extracted from the two consecutive frames; (x1, y1, z1) are the known current position coordinates; (x2, y2, z2) are the coordinates of the forward center; l is the advance distance, taken as the mean of the displacement-vector lengths; and n is the number of feature points.
8. A digestive endoscope navigation system, comprising:
    a training module for training a Siamese neural network model with training data, wherein the training data reflect the correspondence between the displacement-vector distribution features of two consecutive frames of digestive endoscope images and the motion pattern of the digestive endoscope, each displacement vector connecting the feature points of the two consecutive frames at the same location; and
    a prediction module for acquiring a continuous digestive endoscope video stream in real time and inputting two consecutive frames into the trained Siamese neural network model, so as to identify the motion pattern of the digestive endoscope from the displacement-vector distribution features, compute the next-frame position coordinates for the corresponding motion pattern, and output a motion trajectory.
9. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
10. A computer device comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 7.