CN113538335A - In-vivo relative positioning method and device of wireless capsule endoscope - Google Patents

In-vivo relative positioning method and device of wireless capsule endoscope Download PDF

Info

Publication number
CN113538335A
Authority
CN
China
Prior art keywords
loss
picture
matrix
network
pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110640677.1A
Other languages
Chinese (zh)
Inventor
孟庆虎
许杨昕
邢小涵
王建坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Research Institute of CUHK
Original Assignee
Shenzhen Research Institute of CUHK
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Research Institute of CUHK filed Critical Shenzhen Research Institute of CUHK
Priority to CN202110640677.1A priority Critical patent/CN113538335A/en
Publication of CN113538335A publication Critical patent/CN113538335A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B1/00Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor
    • A61B1/04Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor combined with photographic or television appliances
    • A61B1/041Capsule endoscopes for imaging
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10068Endoscopic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Radiology & Medical Imaging (AREA)
  • Surgery (AREA)
  • Biomedical Technology (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Public Health (AREA)
  • Data Mining & Analysis (AREA)
  • Animal Behavior & Ethology (AREA)
  • Pathology (AREA)
  • Optics & Photonics (AREA)
  • Veterinary Medicine (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Endoscopes (AREA)
  • Image Processing (AREA)

Abstract

A method and a device for in-vivo relative positioning of a wireless capsule endoscope achieve relative positioning of the wireless capsule endoscope in a deformable tubular in-vivo environment by training a visual odometry network based on unsupervised learning, without changing the hardware of the wireless capsule endoscope. Using the trained visual odometry network, the pose and the depth corresponding to each image of a given image sequence can be estimated accurately, thereby improving the positioning accuracy. The visual odometer is built directly on the visual information from the camera already carried by the capsule, so no additional sensor or modification of the capsule is needed; the unsupervised-learning-based visual odometer requires no laborious manual labeling, and the non-rigid deformation of the environment is handled by a confidence mask matrix. The invention performs in-vivo relative positioning of the wireless capsule endoscope from visual information and is therefore more practical in clinical application than existing methods.

Description

In-vivo relative positioning method and device of wireless capsule endoscope
Technical Field
The invention relates to positioning of a wireless capsule endoscope, in particular to a method and a device for in-vivo relative positioning of the wireless capsule endoscope.
Background
Wireless capsule endoscopy has become an important technical tool for examination of the digestive tract, especially of the small intestine, a region that ordinary gastroscopes and enteroscopes cannot reach. A wireless capsule endoscope is a robot the size of a capsule. It carries a camera module, an image processing module and a wireless transmission module, can capture images inside the patient's body and transmit them outside the body in real time, and the doctor can make a diagnosis from the transmitted images. At the same time, the patient is spared the pain associated with gastroscopy and enteroscopy. However, if the doctor finds a lesion in a certain picture, there is not enough information to judge in which part of the body the picture was taken. If the position of the capsule at the moment the lesion was photographed were known, the doctor could decide on a diagnosis and a subsequent treatment plan more quickly. Therefore, calculating and determining the position of the capsule is very important. Moreover, what doctors actually want is the distance of the position where the capsule photographed the lesion relative to an important position in the organ, for example the distance to the pylorus at the distal end of the stomach; such a relative-distance description better matches the doctor's needs. The present invention obtains the relative position of the capsule with respect to a biological anatomical landmark from the pictures taken by the capsule, by means of a visual odometry method based on unsupervised learning.
Existing wireless capsule endoscope positioning systems mainly comprise capsule radio-signal positioning, capsule magnetic positioning and capsule visual positioning. Capsule radio-signal positioning uses a sensor array outside the human body to measure the strength of the radio signal emitted by the capsule and thereby locate it, but the positioning error of this approach is large. Capsule magnetic positioning places a small permanent magnet inside the capsule and a magnetic sensor array outside the human body to measure the magnetic field strength and calculate the capsule pose. However, both of the above methods obtain six-degree-of-freedom coordinates in a fixed three-dimensional coordinate system, that is, spatial position information (x, y, z) and spatial rotation information (α, β, γ); such coordinate information is "absolute positioning" information, cannot provide "relative positioning" information, and does not meet the needs of doctors. Meanwhile, some existing capsule visual positioning techniques mainly serve as an aid to radio-signal positioning, which is again "absolute positioning", or only perform visual feature extraction and matching. Their feature extraction and matching is performed between two whole capsule endoscope images, mainly using mainstream blob feature extraction and matching techniques such as SIFT, SURF or ORB on two consecutive images; because the images captured by a capsule endoscope have low resolution and are blurry, and the human intestinal tract is constantly moving and elastically deforming, a large number of mismatches easily occur when two whole images are matched directly. In addition, some studies have proposed positioning the capsule with deep-learning-based visual algorithms, which can overcome the elastic deformation of the intestinal tract to some extent, but their tests are mainly performed in relatively large, open organs such as the pig stomach rather than in tubular environments such as the large intestine and the small intestine.
It is to be noted that the information disclosed in the above background section is only for understanding the background of the present application and thus may include information that does not constitute prior art known to a person of ordinary skill in the art.
Disclosure of Invention
The main purpose of the present invention is to overcome the above-mentioned drawbacks of the background art and to provide a method and a device for in-vivo relative positioning of a wireless capsule endoscope that improve both the positioning accuracy and the practicability of relative positioning of the wireless capsule endoscope in vivo.
In order to achieve the purpose, the invention adopts the following technical scheme:
a relative positioning method in vivo of a wireless capsule endoscope comprises a training process and a testing process;
wherein the training process comprises the following steps:
S1, inputting a wireless capsule endoscope picture sequence into a visual odometry network based on unsupervised learning, wherein the picture sequence comprises a source picture and a target picture, and the visual odometry network comprises a pose network and a depth network;
S2, outputting, by the pose network, a pose transformation matrix that converts the target picture to the source picture, and a confidence mask matrix of the target picture;
S3, outputting, by the depth network, a pixel depth prediction matrix of the target picture;
S4, reconstructing and projecting the source picture onto the target picture according to a camera model, based on the pose transformation matrix, the pixel depth prediction matrix and the confidence mask matrix, and calculating the pixel difference between the target picture and the reconstructed picture, which is called the pixel loss;
S5, calculating the cross-entropy loss of the confidence mask matrix, which is called the mask loss;
S6, directly estimating the relative pixel depth from the target picture to obtain a depth matrix, and calculating the difference between this depth matrix and the pixel depth prediction matrix, which is called the depth loss;
S7, calculating the smoothness loss of the pixel depth prediction matrix;
S8, forming a loss function from the pixel loss, the mask loss, the depth loss and the smoothness loss, optimizing the loss function, and updating the network parameters until convergence;
wherein the testing process comprises the following steps:
T1, inputting a wireless capsule endoscope picture sequence comprising a source picture and a target picture into the trained visual odometry network;
T2, outputting, by the pose network, a pose transformation matrix that converts the target picture to the source picture;
T3, calculating the current pose of the wireless capsule endoscope relative to the previous biological anatomical landmark according to the newly calculated pose transformation matrix and the historical pose transformation matrices accumulated from the previous biological anatomical landmark as the starting point.
Further:
the picture sequence comprises three continuous pictures, wherein the first picture and the third picture are the source pictures, and the second picture is the target picture.
The confidence mask matrix comprises confidence mask matrices of the target picture at four different scales, and the pixel depth prediction matrix comprises pixel depth prediction matrices of the target picture at four different scales.
In step S6, the relative pixel depth is directly estimated from the target picture according to the Shape from Shading algorithm, giving what is called the SfS depth matrix, and the difference between the SfS depth matrix and the pixel depth prediction matrix is then calculated.
In step S7, the sum of the absolute values of the derivatives of each element of the pixel depth prediction matrix is calculated as the smoothness loss of the pixel depth prediction matrix.
In step S8, the pixel loss, the mask loss, the depth loss, and the smoothness loss are added to form the loss function.
A method of training a visual odometry network for in-vivo relative positioning of a wireless capsule endoscope based on unsupervised learning, the method comprising the following steps:
S1, inputting a wireless capsule endoscope picture sequence into a visual odometry network based on unsupervised learning, wherein the picture sequence comprises a source picture and a target picture, and the visual odometry network comprises a pose network and a depth network;
S2, outputting, by the pose network, a pose transformation matrix that converts the target picture to the source picture, and a confidence mask matrix of the target picture;
S3, outputting, by the depth network, a pixel depth prediction matrix of the target picture;
S4, reconstructing and projecting the source picture onto the target picture according to a camera model, based on the pose transformation matrix, the pixel depth prediction matrix and the confidence mask matrix, and calculating the pixel difference between the target picture and the reconstructed picture, which is called the pixel loss;
S5, calculating the cross-entropy loss of the confidence mask matrix, which is called the mask loss;
S6, directly estimating the relative pixel depth from the target picture to obtain a depth matrix, and calculating the difference between this depth matrix and the pixel depth prediction matrix, which is called the depth loss;
S7, calculating the smoothness loss of the pixel depth prediction matrix;
S8, forming a loss function from the pixel loss, the mask loss, the depth loss and the smoothness loss, optimizing the loss function, and updating the network parameters until convergence.
An in-vivo relative positioning method of a wireless capsule endoscope, using a visual odometry network trained by the above training method, the in-vivo relative positioning method comprising the following steps:
T1, inputting a wireless capsule endoscope picture sequence comprising a source picture and a target picture into the trained visual odometry network;
T2, outputting, by the pose network, a pose transformation matrix that converts the target picture to the source picture;
T3, calculating the current pose of the wireless capsule endoscope relative to the previous biological anatomical landmark according to the newly calculated pose transformation matrix and the historical pose transformation matrices accumulated from the previous biological anatomical landmark as the starting point.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for relative in vivo positioning of a wireless capsule endoscope.
An in vivo relative positioning device of a wireless capsule endoscope comprises a processor and a computer readable storage medium, wherein the processor realizes the in vivo relative positioning method of the wireless capsule endoscope when executing a computer program stored on the computer readable storage medium.
The invention has the following beneficial effects:
the invention provides a method and a device for in-vivo relative positioning of a wireless capsule endoscope, which can realize the relative positioning of the wireless capsule endoscope in an in-vivo deformable tubular environment by training a visual odometer network based on unsupervised learning without changing the hardware of the wireless capsule endoscope, and can accurately estimate the position and the depth of each image according to a given image sequence by utilizing the trained visual odometer network, thereby improving the positioning accuracy. The invention can directly use the visual information of the camera carried by the capsule to develop the visual odometer without additionally adding a sensor or modifying the capsule, does not need complicated manual labeling process for the used visual odometer based on unsupervised learning, and can solve the problem of inelastic deformation of the environment by trusting a mask matrix. The invention utilizes the visual information to carry out relative positioning on the wireless capsule endoscope in vivo, and is more practical in practical clinical application compared with the prior method.
The preferred scheme of the invention adopts a depth constraint based on Shape from Shading (SfS), which can further improve the depth prediction accuracy and hence the positioning accuracy. The pixel depth constraint based on Shape from Shading theory improves the accuracy of the pixel depth estimation and thereby improves the positioning accuracy.
Drawings
FIG. 1 is a flow chart of a training process of a method for relative positioning in vivo of a wireless capsule endoscope according to an embodiment of the present invention.
FIG. 2 is a flow chart of a testing process of a method for relative positioning in vivo of a wireless capsule endoscope according to an embodiment of the present invention.
FIG. 3 is a block diagram of a visual odometry network in accordance with an embodiment of the invention.
FIG. 4 is a graph of the relationship between the inputs and outputs of a visual odometry network according to an embodiment of the invention.
In fig. 5, (a) shows a picture taken during the motion of the capsule in the pig intestine, and (b) shows the estimated pixel depth prediction matrix after the picture is input into the "depth network".
Detailed Description
The embodiments of the present invention will be described in detail below. It should be emphasized that the following description is merely exemplary in nature and is not intended to limit the scope of the invention or its application.
Referring to fig. 1-3, in some embodiments, an unsupervised learning-based relative positioning method for a wireless capsule endoscope in a deformable tubular environment includes the following typical but non-limiting training and testing procedures.
The training process comprises the following steps:
in a first step, a sequence of wireless capsule endoscopic pictures is input into an unsupervised learning-based visual odometry network. The visual odometry network comprises two sub-networks, namely a posture network and a depth network. The picture sequence comprises three continuous pictures, wherein the first picture and the third picture are two source pictures and are respectively called a first source picture, a second source picture and a target picture.
And secondly, inputting three continuous pictures into a first sub-network posture network, converting an output target picture into a pose transformation matrix of two source pictures by the network, and simultaneously, outputting the trust degree mask matrix of the output target picture under four different scales by the network.
And thirdly, inputting the target picture into a second sub-network, namely a depth network, and outputting a pixel depth prediction matrix of the target picture under four different scales by the depth network.
And fourthly, reconstructing and projecting the two source images onto a target image according to the camera model based on the two pose transformation matrixes, the pixel depth prediction matrix and the confidence mask matrix output by the two sub-networks in the previous two steps. The pixel difference between the target image and the two reconstructed images is calculated and is called pixel loss.
And fifthly, calculating the cross entropy loss of the confidence level mask matrix, namely the mask loss.
Sixthly, directly estimating the pixel relative depth from the target picture according to Shapefromsanding algorithm, which is called SfS depth matrix. The difference between the depth matrix and the pixel depth prediction matrix output by the network, referred to as the depth penalty, is then computed SfS.
In the seventh step, the smoothness of the pixel depth prediction matrix is calculated, i.e. the sum of the absolute values of the derivatives of each element of the matrix is calculated, called smoothness penalty.
And step eight, adding the four losses to form a loss function. The loss function is optimized and network parameters are updated until the loss values converge.
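By way of non-limiting illustration, the outer training loop can be organized as in the following Python sketch (PyTorch-style; the helper compute_total_loss, the network objects and the hyper-parameter values are hypothetical placeholders rather than part of the invention):

    import torch

    def train(pose_net, depth_net, loader, n_epochs=20, lr=1e-4):
        # Jointly optimize the two sub-networks with the summed loss.
        params = list(pose_net.parameters()) + list(depth_net.parameters())
        optimizer = torch.optim.Adam(params, lr=lr)
        for epoch in range(n_epochs):
            for src_prev, target, src_next in loader:
                # compute_total_loss is a hypothetical helper that evaluates
                # pixel loss + mask loss + depth loss + smoothness loss.
                loss = compute_total_loss(pose_net, depth_net,
                                          src_prev, target, src_next)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()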
The test process comprises the following steps:
firstly, inputting a wireless capsule endoscope picture sequence into a trained visual odometer network. The visual odometry network comprises two sub-networks, namely a posture network and a depth network. The picture sequence comprises three continuous pictures, wherein the first picture and the third picture are two source pictures and are respectively called a first source picture, a second source picture and a target picture.
And secondly, inputting three continuous pictures into a first sub-network posture network, converting an output target picture into a pose transformation matrix of two source pictures by the network, and simultaneously, outputting the trust degree mask matrix of the output target picture under four different scales by the network.
And thirdly, according to the two newly calculated pose transformation matrixes and the historical pose transformation matrix accumulated by taking the last biological anatomical landmark as a starting point, calculating the pose of the capsule relative to the last biological anatomical landmark at present.
The invention can be used for the relative positioning of the wireless capsule endoscope during the examination in the alimentary canal of the human body. The physician may pass the sequence of images or video captured by the wireless capsule endoscope to the software, which will analyze the sequence of images/video to estimate the relative distance of the capsule endoscope with respect to a certain bio-anatomical landmark.
The training and testing process of the present invention is further described below in conjunction with the appended drawings.
Training process
In the first step, a wireless capsule endoscope picture sequence is input into the visual odometry network based on unsupervised learning. As shown in Fig. 3, the visual odometry network comprises two sub-networks, the "pose network" and the "depth network". The picture sequence comprises three continuous pictures: the first and third pictures are the two source pictures, called the first source picture and the second source picture, and the second picture is the target picture.
In the second step, as shown in Fig. 3(a), the three pictures I_{t-1}, I_t, I_{t+1} are input into the first sub-network, the "pose network", which outputs the pose transformation matrices T_1, T_2 that convert the target picture I_t to the two source pictures I_s, s ∈ {t-1, t+1}; at the same time, the network outputs the confidence mask matrices M_{s,l}, s ∈ {t-1, t+1}, l ∈ {0, 1, 2, 3}, of the target picture at four different scales.
In the third step, as shown in Fig. 3(b), the target picture I_t is input into the second sub-network, the "depth network", which outputs the pixel depth prediction matrices D_{t,l}, l ∈ {0, 1, 2, 3}, of the target picture at four different scales.
In the fourth step, based on the two pose transformation matrices T_1, T_2, the pixel depth prediction matrices D_{t,l} and the confidence mask matrices M_{s,l} output by the two sub-networks in the previous two steps, the two source pictures are reconstructed and projected onto the target picture according to the camera model. The pixel difference between the target picture and the two reconstructed pictures is calculated and is called the pixel loss.
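For illustration only, the projection reconstruction of a source picture onto the target view under a pinhole camera model can be sketched in Python/NumPy as follows (the function name, the grayscale input and the nearest-neighbour sampling are simplifications assumed here; training would use a differentiable bilinear sampler):

    import numpy as np

    def reconstruct_from_source(source_img, depth_t, T_t2s, K):
        # source_img: (H, W) grayscale source picture I_s
        # depth_t:    (H, W) predicted depth D_t of the target picture
        # T_t2s:      (4, 4) pose transform from the target frame to the source frame
        # K:          (3, 3) camera intrinsic matrix
        H, W = depth_t.shape
        K_inv = np.linalg.inv(K)

        # Homogeneous coordinates of every target pixel.
        u, v = np.meshgrid(np.arange(W), np.arange(H))
        pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1).astype(float)

        # Back-project to 3-D points in the target camera frame, transform them
        # into the source frame, and re-project with the pinhole model.
        cam_t = (K_inv @ pix) * depth_t.reshape(1, -1)
        cam_s = (T_t2s @ np.vstack([cam_t, np.ones((1, cam_t.shape[1]))]))[:3]
        proj = K @ cam_s
        us = proj[0] / np.clip(proj[2], 1e-6, None)
        vs = proj[1] / np.clip(proj[2], 1e-6, None)

        # Sample the source picture at the projected locations.
        ui, vi = np.round(us).astype(int), np.round(vs).astype(int)
        valid = (proj[2] > 1e-6) & (ui >= 0) & (ui < W) & (vi >= 0) & (vi < H)
        recon = np.zeros(H * W)
        recon[valid] = source_img[vi[valid], ui[valid]]
        return recon.reshape(H, W), valid.reshape(H, W)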
Because unsupervised learning is used, only image sequences are input and no labels are needed during the whole training process; the outputs comprise the pose transformations, the pixel depth predictions and the confidence masks. The relationship between all these inputs and outputs is shown in Fig. 4, and together they form a self-supervision constraint.
The camera model is used as the self-supervision of the network. If I'_s denotes the picture obtained by projecting and reconstructing the source picture I_s, s ∈ {t-1, t+1}, onto the target picture I_t, then the pixel difference between the target picture and the projection-reconstructed picture should be 0, so the pixel loss is defined as:
L_{pixel} = \sum_{s \in \{t-1,\, t+1\}} \sum_{p} \left| I_t(p) - I'_s(p) \right|
where I_t(p) and I'_s(p) are the values at an arbitrary corresponding pixel point p of the target picture and of the projection-reconstructed picture, respectively.
Also, because the dynamic, deformable environment and the reflections caused by liquid make some pixels violate the camera model, it is advantageous to ignore these pixels. Therefore, the confidence mask matrix output by the "pose network" is used. At each scale, the corresponding mask matrix has the same pixel size as the source picture at that scale. Each element of the confidence mask matrix corresponds to one pixel of the projection-reconstructed picture I'_s(p); if an element of the confidence mask matrix is 1, the corresponding pixel of the projection-reconstructed picture can be trusted, otherwise it cannot. The pixel loss is therefore redefined as:
L_{pixel} = \sum_{s \in \{t-1,\, t+1\}} \sum_{p} M_s(p) \left| I_t(p) - I'_s(p) \right|
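A minimal Python/NumPy sketch of this masked pixel loss, assuming grayscale pictures and masks already brought to the same resolution, could read:

    import numpy as np

    def pixel_loss(target, recons, masks):
        # target: (H, W) target picture I_t
        # recons: list of (H, W) projection-reconstructed pictures I'_s, s in {t-1, t+1}
        # masks:  list of (H, W) confidence mask matrices M_s with values in [0, 1]
        loss = 0.0
        for recon, mask in zip(recons, masks):
            loss += float(np.sum(mask * np.abs(target - recon)))
        return loss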
In the fifth step, the cross-entropy loss of the confidence mask matrix, i.e. the mask loss, is calculated.
If the loss of the previous step were minimized directly, the confidence mask matrix M_{s,l} would simply be driven to all zeros, because that trivially makes L_{pixel} reach its minimum of 0. Such a solution is meaningless, so a cross-entropy loss, denoted L_{mask}(M_s), is additionally minimized for M_{s,l} to keep the mask values from collapsing to 0.
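As a sketch (the all-ones cross-entropy label is an assumption borrowed from common unsupervised visual odometry formulations and is not stated explicitly here), the mask loss can be computed as:

    import numpy as np

    def mask_loss(masks, eps=1e-7):
        # Cross-entropy of each confidence mask against an all-ones label,
        # which penalizes masks that collapse to 0 just to minimize L_pixel.
        loss = 0.0
        for mask in masks:
            m = np.clip(mask, eps, 1.0 - eps)
            loss += float(-np.mean(np.log(m)))
        return loss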
In the sixth step, the relative pixel depth is directly estimated from the target picture according to the Shape from Shading algorithm, giving what is called the SfS depth matrix. The difference between the SfS depth matrix and the pixel depth prediction matrix output by the network, called the depth loss, is then computed.
Fig. 5(a) shows a picture taken during the motion of the capsule in a pig intestine, and Fig. 5(b) shows the pixel depth prediction matrix estimated after the picture is input into the "depth network". It can be observed that the untextured surface of the porcine intestinal wall in the upper left corner is close to the camera lens and shows little motion between frames, yet the depth prediction incorrectly estimates it as distant. In other words, because normally only distant scenery changes little over time, the algorithm wrongly estimates this weakly textured, slowly moving region that is actually near the camera as a distant scene.
To address this problem, Shape from Shading (SfS) theory is used. The SfS algorithm estimates the relative depth of a scene from a grayscale picture. According to the SfS algorithm, the reflectance equation of a Lambertian surface is:
E(x, y) = \frac{1 + p\, p_s + q\, q_s}{\sqrt{1 + p^2 + q^2}\, \sqrt{1 + p_s^2 + q_s^2}}
E(x, y) is the gray value at pixel (x, y), and (p_s, q_s, 1) is the position of the light source above the target surface.
p = \frac{\partial Z}{\partial x}, \qquad q = \frac{\partial Z}{\partial y}
p and q are the partial derivatives of the depth Z in the x and y directions, respectively. The above equation can be rewritten as:
f\big(E(x, y), Z(x, y)\big) = E(x, y) - \frac{1 + p\, p_s + q\, q_s}{\sqrt{1 + p^2 + q^2}\, \sqrt{1 + p_s^2 + q_s^2}} = 0
For a fixed point (x, y) and a given picture E, after a first-order Taylor expansion the above equation can be written as:
0 = f\big(Z^{n}(x, y)\big) \approx f\big(Z^{n-1}(x, y)\big) + \big(Z^{n}(x, y) - Z^{n-1}(x, y)\big)\, \frac{\mathrm{d} f\big(Z^{n-1}(x, y)\big)}{\mathrm{d} Z(x, y)}
All values of the depth matrix are initialized to 0; as the iterative computation proceeds, the value of f tends to 0, which means the depth Z has converged. Thus, given a target picture I_t, the depth matrix based on the SfS algorithm can be calculated and is denoted D_{SfS}(I_t). After normalization, the difference between each element D_{SfS}(p) and the corresponding element D_t(p) of the pixel depth prediction matrix output by the network should be 0. The depth loss is therefore defined as:
L_{depth} = \sum_{p} \left| D_{SfS}(p) - D_t(p) \right|
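The iterative SfS depth estimation and the resulting depth loss can be illustrated with the following Python/NumPy sketch (the oblique default light direction, the clipped Newton step, the wrap-around borders of np.roll and the min-max normalization are simplifications assumed for this sketch):

    import numpy as np

    def sfs_depth(E, p_s=0.5, q_s=0.5, n_iter=30):
        # E: (H, W) grayscale picture with values in [0, 1];
        # (p_s, q_s, 1) is the assumed light-source direction (placeholder values).
        Z = np.zeros(E.shape)                      # depth initialized to 0
        norm_s = np.sqrt(1.0 + p_s ** 2 + q_s ** 2)
        for _ in range(n_iter):
            # Discrete surface gradients p ~ dZ/dx and q ~ dZ/dy.
            p = Z - np.roll(Z, 1, axis=1)
            q = Z - np.roll(Z, 1, axis=0)
            s = 1.0 + p ** 2 + q ** 2
            N = 1.0 + p * p_s + q * q_s
            R = np.maximum(N / (np.sqrt(s) * norm_s), 0.0)
            f = E - R                              # reflectance residual
            # Analytic derivative of f with respect to Z at each pixel.
            dR_dp = (p_s * s - p * N) / (s ** 1.5 * norm_s)
            dR_dq = (q_s * s - q * N) / (s ** 1.5 * norm_s)
            df_dZ = -(dR_dp + dR_dq)
            # One guarded, clipped Newton step per pixel.
            guard = np.where(df_dZ == 0.0, 1e-3,
                             np.sign(df_dZ) * np.maximum(np.abs(df_dZ), 1e-3))
            Z = Z - np.clip(f / guard, -1.0, 1.0)
        return Z

    def depth_loss(D_sfs, D_pred):
        # Normalize both depth maps before comparing them element-wise.
        n = lambda d: (d - d.min()) / (d.max() - d.min() + 1e-6)
        return float(np.sum(np.abs(n(D_sfs) - n(D_pred))))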
In the seventh step, the smoothness of the pixel depth prediction matrix is calculated, i.e. the sum of the absolute values of the derivatives of its elements, called the smoothness loss.
Smoothness loss is defined as:
L_{smooth} = \sum_{p} \Big( \big| \partial_x D_t(p) \big| + \big| \partial_y D_t(p) \big| \Big)
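A corresponding Python/NumPy sketch of the smoothness loss is:

    import numpy as np

    def smoothness_loss(D_pred):
        # Sum of the absolute finite-difference derivatives of the pixel depth
        # prediction matrix in the x and y directions.
        dx = np.abs(np.diff(D_pred, axis=1))
        dy = np.abs(np.diff(D_pred, axis=0))
        return float(np.sum(dx) + np.sum(dy))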
and step eight, adding the four losses to form a loss function. The loss function is optimized and network parameters are updated until the loss values converge.
The total loss is defined as:
L_{total} = L_{pixel} + \lambda_m L_{mask} + \lambda_d L_{depth} + \lambda_s L_{smooth}
where λ_m, λ_d and λ_s are the weights of the mask (confidence) loss, the depth loss and the smoothness loss, respectively.
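Combining the four terms, the total loss may be sketched as follows (the numerical weight values are placeholders chosen for illustration and are not values disclosed by the invention):

    def total_loss(l_pixel, l_mask, l_depth, l_smooth,
                   lambda_m=0.1, lambda_d=0.1, lambda_s=0.05):
        # Weighted sum of pixel, mask, depth and smoothness losses.
        return l_pixel + lambda_m * l_mask + lambda_d * l_depth + lambda_s * l_smooth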
Test procedure
In the first step, a wireless capsule endoscope picture sequence is input into the trained visual odometry network.
In the second step, the three continuous pictures are input into the trained first sub-network, the "pose network", which outputs the pose transformation matrices that convert the target picture to the two source pictures.
In the third step, the current capsule pose with respect to the previous biological anatomical landmark can be calculated from the two newly calculated pose transformation matrices and the historical pose transformation matrices previously accumulated from that landmark as the starting point.
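For illustration, accumulating the 4x4 pose transforms from the last biological anatomical landmark can be sketched as follows (whether the relative distance is reported as the straight-line displacement or as the accumulated path length is a design choice; both are returned here):

    import numpy as np

    def pose_from_landmark(pose_history, T_new):
        # pose_history: list of (4, 4) frame-to-frame transforms since the landmark
        # T_new:        (4, 4) newly estimated transform for the current frame
        T_acc = np.eye(4)
        path_length = 0.0
        for T in pose_history + [T_new]:
            T_acc = T_acc @ T
            path_length += float(np.linalg.norm(T[:3, 3]))
        displacement = float(np.linalg.norm(T_acc[:3, 3]))
        return T_acc, displacement, path_length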
The background of the present invention may contain background information related to the problem or environment of the present invention and does not necessarily describe the prior art. Accordingly, the inclusion in the background section is not an admission of prior art by the applicant.
The foregoing is a more detailed description of the invention in connection with specific/preferred embodiments and is not intended to limit the practice of the invention to those descriptions. It will be apparent to those skilled in the art that various substitutions and modifications can be made to the described embodiments without departing from the spirit of the invention, and these substitutions and modifications should be considered to fall within the scope of the invention. In the description herein, references to the description of the term "one embodiment," "some embodiments," "preferred embodiments," "an example," "a specific example," or "some examples" or the like are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Various embodiments or examples and features of various embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction. Although embodiments of the present invention and their advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the scope of the claims.

Claims (10)

1. An in vivo relative positioning method of a wireless capsule endoscope, characterized by comprising a training process and a testing process;
wherein the training process comprises the following steps:
S1, inputting a wireless capsule endoscope picture sequence into a visual odometry network based on unsupervised learning, wherein the picture sequence comprises a source picture and a target picture, and the visual odometry network comprises a pose network and a depth network;
S2, outputting, by the pose network, a pose transformation matrix that converts the target picture to the source picture, and a confidence mask matrix of the target picture;
S3, outputting, by the depth network, a pixel depth prediction matrix of the target picture;
S4, reconstructing and projecting the source picture onto the target picture according to a camera model, based on the pose transformation matrix, the pixel depth prediction matrix and the confidence mask matrix, and calculating the pixel difference between the target picture and the reconstructed picture, which is called the pixel loss;
S5, calculating the cross-entropy loss of the confidence mask matrix, which is called the mask loss;
S6, directly estimating the relative pixel depth from the target picture to obtain a depth matrix, and calculating the difference between this depth matrix and the pixel depth prediction matrix, which is called the depth loss;
S7, calculating the smoothness loss of the pixel depth prediction matrix;
S8, forming a loss function from the pixel loss, the mask loss, the depth loss and the smoothness loss, optimizing the loss function, and updating the network parameters until convergence;
wherein the testing process comprises the following steps:
T1, inputting a wireless capsule endoscope picture sequence comprising a source picture and a target picture into the trained visual odometry network;
T2, outputting, by the pose network, a pose transformation matrix that converts the target picture to the source picture;
T3, calculating the current pose of the wireless capsule endoscope relative to the previous biological anatomical landmark according to the newly calculated pose transformation matrix and the historical pose transformation matrices accumulated from the previous biological anatomical landmark as the starting point.
2. The in vivo relative positioning method of a wireless capsule endoscope as recited in claim 1, wherein said sequence of pictures comprises three consecutive pictures, wherein a first picture and a third picture are said source pictures and a second picture is said target picture.
3. The in vivo relative positioning method of a wireless capsule endoscope as recited in claim 1 or 2, wherein the confidence mask matrix is a confidence mask matrix of the target picture at four different scales, and the pixel depth prediction matrix is a pixel depth prediction matrix of the target picture at four different scales.
4. The in vivo relative positioning method of a wireless capsule endoscope as claimed in claim 1 or 2, characterized in that in step S6, the relative pixel depth is directly estimated from the target picture according to the Shape from Shading algorithm, giving what is called the SfS depth matrix, and then the difference between said SfS depth matrix and said pixel depth prediction matrix is calculated.
5. The in vivo relative positioning method of a wireless capsule endoscope according to claim 1 or 2, characterized in that in step S7, the sum of absolute values of the derivatives of each element of said pixel depth prediction matrix is calculated as smoothness loss of said pixel depth prediction matrix.
6. The in vivo relative positioning method of a wireless capsule endoscope according to claim 1 or 2, wherein in step S8, the pixel loss, the mask loss, the depth loss, and the smoothness loss are added to constitute the loss function.
7. A method for training a visual odometry network for in vivo relative positioning of a wireless capsule endoscope based on unsupervised learning, the method comprising the following steps:
S1, inputting a wireless capsule endoscope picture sequence into a visual odometry network based on unsupervised learning, wherein the picture sequence comprises a source picture and a target picture, and the visual odometry network comprises a pose network and a depth network;
S2, outputting, by the pose network, a pose transformation matrix that converts the target picture to the source picture, and a confidence mask matrix of the target picture;
S3, outputting, by the depth network, a pixel depth prediction matrix of the target picture;
S4, reconstructing and projecting the source picture onto the target picture according to a camera model, based on the pose transformation matrix, the pixel depth prediction matrix and the confidence mask matrix, and calculating the pixel difference between the target picture and the reconstructed picture, which is called the pixel loss;
S5, calculating the cross-entropy loss of the confidence mask matrix, which is called the mask loss;
S6, directly estimating the relative pixel depth from the target picture to obtain a depth matrix, and calculating the difference between this depth matrix and the pixel depth prediction matrix, which is called the depth loss;
S7, calculating the smoothness loss of the pixel depth prediction matrix;
S8, forming a loss function from the pixel loss, the mask loss, the depth loss and the smoothness loss, optimizing the loss function, and updating the network parameters until convergence.
8. An in vivo relative positioning method of a wireless capsule endoscope, using a visual odometry network trained by the training method of claim 7, comprising the following steps:
T1, inputting a wireless capsule endoscope picture sequence comprising a source picture and a target picture into the trained visual odometry network;
T2, outputting, by the pose network, a pose transformation matrix that converts the target picture to the source picture;
T3, calculating the current pose of the wireless capsule endoscope relative to the previous biological anatomical landmark according to the newly calculated pose transformation matrix and the historical pose transformation matrices accumulated from the previous biological anatomical landmark as the starting point.
9. A computer-readable storage medium, in which a computer program is stored which, when executed by a processor, carries out the method according to any one of claims 1 to 8.
10. An in vivo relative positioning device of a wireless capsule endoscope, comprising a processor and a computer readable storage medium, wherein the processor, when executing a computer program stored on the computer readable storage medium, implements the method of any of claims 1 to 8.
CN202110640677.1A 2021-06-09 2021-06-09 In-vivo relative positioning method and device of wireless capsule endoscope Pending CN113538335A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110640677.1A CN113538335A (en) 2021-06-09 2021-06-09 In-vivo relative positioning method and device of wireless capsule endoscope

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110640677.1A CN113538335A (en) 2021-06-09 2021-06-09 In-vivo relative positioning method and device of wireless capsule endoscope

Publications (1)

Publication Number Publication Date
CN113538335A true CN113538335A (en) 2021-10-22

Family

ID=78095731

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110640677.1A Pending CN113538335A (en) 2021-06-09 2021-06-09 In-vivo relative positioning method and device of wireless capsule endoscope

Country Status (1)

Country Link
CN (1) CN113538335A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114782470A (en) * 2022-06-22 2022-07-22 浙江鸿禾医疗科技有限责任公司 Three-dimensional panoramic recognition positioning method of alimentary canal, storage medium and equipment


Similar Documents

Publication Publication Date Title
Song et al. Mis-slam: Real-time large-scale dense deformable slam system in minimal invasive surgery based on heterogeneous computing
JP4631057B2 (en) Endoscope system
Song et al. Dynamic reconstruction of deformable soft-tissue with stereo scope in minimal invasive surgery
CN109448041B (en) Capsule endoscope image three-dimensional reconstruction method and system
JP5153620B2 (en) System for superimposing images related to a continuously guided endoscope
Wu et al. Three-dimensional modeling from endoscopic video using geometric constraints via feature positioning
CN112802185B (en) Endoscope image three-dimensional reconstruction method and system facing minimally invasive surgery space perception
Dimas et al. Intelligent visual localization of wireless capsule endoscopes enhanced by color information
US20220198693A1 (en) Image processing method, device and computer-readable storage medium
CN111080778A (en) Online three-dimensional reconstruction method of binocular endoscope soft tissue image
Dimas et al. Visual localization of wireless capsule endoscopes aided by artificial neural networks
Wei et al. Stereo dense scene reconstruction and accurate localization for learning-based navigation of laparoscope in minimally invasive surgery
CN114010314A (en) Augmented reality navigation method and system for endoscopic retrograde cholangiopancreatography
Zhou et al. Real-time nonrigid mosaicking of laparoscopy images
Van der Stap et al. The use of the focus of expansion for automated steering of flexible endoscopes
CN115530724A (en) Endoscope navigation positioning method and device
CN113538335A (en) In-vivo relative positioning method and device of wireless capsule endoscope
Liu et al. Capsule endoscope localization based on computer vision technique
CN114399527A (en) Method and device for unsupervised depth and motion estimation of monocular endoscope
CN116485850A (en) Real-time non-rigid registration method and system for surgical navigation image based on deep learning
Liu et al. Hybrid magnetic and vision localization technique of capsule endoscope for 3D recovery of pathological tissues
Deguchi et al. A method for bronchoscope tracking using position sensor without fiducial markers
Shoji et al. Camera motion tracking of real endoscope by using virtual endoscopy system and texture information
Lin et al. SuPerPM: A Large Deformation-Robust Surgical Perception Framework Based on Deep Point Matching Learned from Physical Constrained Simulation Data
WO2024050918A1 (en) Endoscope positioning method, electronic device, and non-transitory computer-readable storage medium

Legal Events

Code: Description
PB01: Publication
SE01: Entry into force of request for substantive examination