AU2020100044A4 - Method of tracking of Surgical Target and Tool - Google Patents


Info

Publication number
AU2020100044A4
Authority
AU
Australia
Prior art keywords
target
roi
model
tlst
tracking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2020100044A
Inventor
Xichan Lin
Xinyue Ming
Haomin Shao
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lin Xichan Miss
Ming Xinyue Miss
Original Assignee
Lin Xichan Miss
Ming Xinyue Miss
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lin Xichan Miss, Ming Xinyue Miss filed Critical Lin Xichan Miss
Priority to AU2020100044A
Application granted
Publication of AU2020100044A4
Legal status: Ceased
Anticipated expiration

Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 34/00 Computer-aided surgery; Manipulators or robots specially adapted for use in surgery
    • A61B 34/20 Surgical navigation systems; Devices for tracking or guiding surgical instruments, e.g. for frameless stereotaxis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00 Manipulating 3D models or images for computer graphics
    • G06T 19/006 Mixed reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/187 Segmentation; Edge detection involving region growing; involving region merging; involving connected component labelling
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 34/00 Computer-aided surgery; Manipulators or robots specially adapted for use in surgery
    • A61B 34/20 Surgical navigation systems; Devices for tracking or guiding surgical instruments, e.g. for frameless stereotaxis
    • A61B 2034/2046 Tracking techniques
    • A61B 2034/2065 Tracking using image or pattern recognition
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 2576/00 Medical imaging apparatus involving image processing or analysis
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B 5/0033 Features or image-related aspects of imaging apparatus classified in A61B5/00, e.g. for MRI, optical tomography or impedance tomography apparatus; arrangements of imaging apparatus in a room
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B 5/06 Devices, other than using radiation, for detecting or locating foreign bodies; determining position of probes within or on the body of the patient
    • A61B 5/061 Determining position of a probe within the body employing means separate from the probe, e.g. sensing internal probe position employing impedance electrodes on the surface of the body
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30004 Biomedical image processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Surgery (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Graphics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Animal Behavior & Ethology (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Veterinary Medicine (AREA)
  • Image Analysis (AREA)

Abstract

With the wide application of augmented-reality navigation systems in minimally invasive surgery, growing attention has been paid to target tracking. Owing to the complex conditions of surgery and the real-time requirements, most existing algorithms do not work effectively. By accounting for occlusion, deformation, and reflection, this invention ensures robust tracking of both surgical tools and targets based on superpixel segmentation and feature description. Meanwhile, it yields an accurate contour of the surgical tool, rather than merely a region of interest, by combining KCF with the region-growing algorithm. First, the traditional TLST model is optimized into TLST-I by traversing only the ROI instead of the whole image and classifying superpixels into three types according to gradient, which simplifies the handling of singular points and obviously reduces the complexity of calculation. Second, TLST-I integrates an image similarity weighting model to detect the target, while DSST achieves scale invariance; this multi-model TLST algorithm has a good performance on surgical target detection. Third, to detect the surgical tool, region growing is applied within the ROI detected by KCF to obtain the contour of the target. Finally, to make the tracking process more robust, the result can be verified, after the contours of the target are obtained, with a novel feature extraction method, making the contours more accurate. (Figure 1: the flow chart of the algorithm. Figure 2: the tracking result of surgical target and surgical tool.)

Description

Method of tracking of Surgical Target and Tool
FIELD OF THE INVENTION
This invention lies in the field of image processing and specifically aims at the application of augmented reality systems in minimally invasive surgery (MIS).
BACKGROUND
Against the backdrop of flourishing information technology, various techniques are coalescing with the medical field. Target tracking for assisting minimally invasive surgery is one significant branch of this field. At present, various target tracking algorithms have been proposed. Unmil proposed a single-camera detection algorithm using the CamShift algorithm and a Kalman filter, which is effective for multi-feature targets without occlusion. Zhao presented a tracking algorithm based on deep learning and convolutional neural networks (CNNs).
The feature descriptor is a significant part of a tracking algorithm. Color, gradient, texture, and shape, stored as parameters, are widely used to represent features.
Given a set of extracted features, parameter estimation, such as pose description, is a significant part of detection. Tracking algorithms fall into two categories, generative and discriminative. The main existing prototypes are SVM+HOG and decision forests, which belong to the generative approach, and CNN-based, point-based, and region-based methods, which are part of the discriminative approach.
Yet most existing methods aim only at tracking surgical tools, and the real-time requirements and complex conditions of MIS render some tracking methods ineffective.
SUMMARY
In order to solve the shortcomings of existing target tracking algorithms applied in MIS, this invention optimizes the TLST model and proposes a novel target tracking algorithm that combines KCF with the region-growing algorithm. Compared with the traditional TLST, the optimized multi-model TLST reduces the complexity of computation and performs better at handling singular points. Then, by combining KCF and the region-growing algorithm, the rigid surgical tool can be detected more accurately. Besides, this invention proposes a novel feature extraction method that can optionally be used for further verification.
The technical solution of this invention can be implemented as follows:
Step (1), the traditional TLST model is optimized into TLST-I by traversing only the ROI instead of the whole image and classifying superpixels into three types according to gradient, which simplifies the handling of singular points and obviously reduces the complexity of calculation.
Step (2), TLST-I integrates an image similarity weighting model to detect the target, while DSST achieves scale invariance. This multi-model TLST algorithm has a good performance on surgical target detection.
Step (3), to detect the surgical tool, region growing is applied within the ROI detected by KCF to obtain the contour of the target.
Step (4), to make the tracking process more robust, it is possible to verify the result, after obtaining the contours of the target, according to a novel feature extraction method, making the contours more accurate.
DESCRIPTION OF THE DRAWINGS
Figure 1 The flow chart of the algorithm
Figure 2 The tracking result of surgical target and surgical tool
Figure 3 The frame after preprocessed by bilateral filter
Figure 4 The result of contour and edge detection
Figure 5 The result of region-growing optimization
Figure 6 Primary model
Figure 7 The iterative clustering method is similar to k-means, but limiting the traversal area reduces the computational complexity
Figure 8 The algorithm block diagram
DESCRIPTION OF PREFERRED EMBODIMENT
1. Optimization of TLST model
Xiaofeng Ren and Jitendra Malik introduced the concept of the superpixel in 2003 [2]; it is a kind of over-segmentation that is effective in image preprocessing. This study utilizes the Simple Linear Iterative Clustering (SLIC) algorithm to segment the frames of the video.
1.1 SLIC
SLIC is based on gradient descent. The detailed process is as follows:
• Step 1: Initialize the clustering center.
According to the predetermined number of superpixels, seeds are distributed evenly over the image. Assume that the size of the image to be processed is N × N and the number of superpixels is k; then the distance between adjacent seeds is S = N/sqrt(k).
Besides, in order to avoid accidentally choosing a clustering center on an edge, a limited window of size n × n is set. The superpixel seed is chosen as the point with the smallest gradient within the limited window.
    • Step 2: Compute features and distances based on CIELAB and XY coordinates.
Pixels in the image are described in the CIELAB color space. Combined with the image coordinates, each pixel i carries a 5-D feature C_i = [l_i, a_i, b_i, x_i, y_i]. In color space, the Euclidean color distance is:
d_c = sqrt((l_i - l_j)^2 + (a_i - a_j)^2 + (b_i - b_j)^2)
In coordinate space, the Euclidean spatial distance is:
d_s = sqrt((x_i - x_j)^2 + (y_i - y_j)^2)
Normalization: N_s = S = N/sqrt(k) is the longest spatial distance, while N_c is the longest color distance. N_c is hard to calculate, so a constant value m is set to avoid this problem, giving the combined distance:
D' = sqrt((d_c)^2 + (d_s/S)^2 * m^2)
    • Step 3: Clustering. Each pixel is assigned to the nearest cluster center under D' within a local search window, and the centers are updated iteratively, in a manner similar to k-means (see Figure 7).
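As a concrete reference for this step, the following is a minimal sketch of the segmentation using the SLIC implementation in scikit-image; the file name and the parameter values (n_segments, compactness) are illustrative assumptions rather than values fixed by this invention, and compactness plays the role of the constant m above.

```python
import cv2
import numpy as np
from skimage.segmentation import slic

frame = cv2.imread("frame.png")                 # one video frame (BGR); path is hypothetical
rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)    # slic converts RGB to CIELAB internally
labels = slic(rgb, n_segments=400, compactness=10.0, start_label=0)
num_sp = labels.max() + 1

# P_sp(k): the mean (y, x) position of the pixels inside each superpixel
ys, xs = np.mgrid[0:labels.shape[0], 0:labels.shape[1]]
centers = np.array([(ys[labels == k].mean(), xs[labels == k].mean())
                    for k in range(num_sp)])
```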
1.2 Optimized TLST model
1.2.1 Traditional TLST
Jun Wang presented the TLST algorithm in 2017, using the SLIC algorithm to obtain superpixels and building connections between the superpixels of images at two levels.
Assume that X_t is the tracking result at the t-th frame, while Y_t is the observed result. The aim of this algorithm is to find the best-matching X_t given Y_t, based on the Bayesian model:
X_t = argmax p(X_t | Y_1:t)
p(X_t | Y_1:t) ∝ p(Y_t | X_t) * p(X_t | Y_1:t-1)
p(X_t | Y_1:t-1) = ∫ p(X_t | X_t-1) * p(X_t-1 | Y_1:t-1) dX_t-1
where p(Y_t | X_t) is the observation model and p(X_t | X_t-1) is the motion model.
In the primary model, N_sp is the number of superpixels, P_sp(k) represents the position information of superpixel k (the average position of the pixels within it), and C_sp(k) is the area of the superpixel within the target region (ROI).
Senior model
When tracking the target in the next frame, the motion is assumed by default to be slight; therefore, the assumption is that the target is still in the ROI. However, to cope with deformation of the target, more superpixels are needed to segment the ROI.
Assume sps denotes a superpixel in the next frame, while sp is the set of previous superpixels:
W_sps(r) = φ(r, k) * W_sp(k)
in which:
φ(r, k) = φ_f(r, k) * φ_p(r, k)
φ_f(r, k) = exp(-2 * d_fmax * ||F_sps(r) - F_sp(k)||^2)
φ_p(r, k) = exp(-2 * d_pmax * ||P_sps(r) - P_sp(k)||^2)
where d_fmax and d_pmax are normalization factors. For any superpixel r, the superpixel k maximizing φ(r, k) is found, and the senior model is:
M_s = {P_sps(k), F_sps(k), W_sps(k)}
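To make the weight propagation concrete, here is a hedged NumPy sketch reconstructed from the formulas above: every new-frame superpixel r inherits the weight of its best-matching previous superpixel k. F_* are feature vectors, P_* are centroid positions, and d_fmax, d_pmax are assumed to be precomputed normalization factors; all names are illustrative.

```python
import numpy as np

def propagate_weights(F_sps, P_sps, F_sp, P_sp, W_sp, d_fmax, d_pmax):
    """F_sps: (R, f) new-frame features; F_sp: (K, f) previous features;
    P_*: (R/K, 2) centroid positions; W_sp: (K,) previous weights."""
    # phi_f, phi_p: pairwise feature and position affinities, shape (R, K)
    df = np.linalg.norm(F_sps[:, None, :] - F_sp[None, :, :], axis=2) ** 2
    dp = np.linalg.norm(P_sps[:, None, :] - P_sp[None, :, :], axis=2) ** 2
    phi = np.exp(-2.0 * d_fmax * df) * np.exp(-2.0 * d_pmax * dp)
    k_best = phi.argmax(axis=1)                 # k maximizing phi(r, k)
    return phi[np.arange(phi.shape[0]), k_best] * W_sp[k_best]
```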
1.2.2 Improvement in TLST
In the traditional primary model, the weight within the object region is 1; however, some superpixels inside it do not belong to the target. Meanwhile, in the traditional senior model, every superpixel in the primary model has to be compared with every superpixel in the senior model, which dramatically increases the computational complexity.
1.2.2.1 Optimized primary model
First, the target region is divided into m × n blocks of size p × q.
Second, set a gradient threshold and determine the boundary: all blocks are traversed and their HOG values are calculated. If a block's HOG exceeds the threshold, the block is considered part of the target boundary; otherwise it is not.
Third, delete possible singularity blocks. The edge should clearly be continuous, so if a boundary block is isolated, it can be eliminated.
Finally, for each superpixel, its region covers s blocks. Those blocks can be divided into three categories: boundary, in-boundary, and out-of-boundary. If the ratio of the number of in-boundary blocks to s is bigger than the set threshold, the weight of this superpixel is defined as 1; otherwise it is set to 0.
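The block classification above can be sketched as follows; this is an assumption-laden illustration that uses the mean Sobel gradient magnitude as a stand-in for the per-block HOG value, with hypothetical block size and threshold.

```python
import cv2
import numpy as np

def classify_blocks(roi_gray, p=8, q=8, grad_thresh=50.0):
    """Mark p x q blocks of a grayscale ROI as boundary / non-boundary."""
    gx = cv2.Sobel(roi_gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(roi_gray, cv2.CV_32F, 0, 1)
    mag = np.sqrt(gx ** 2 + gy ** 2)
    m, n = roi_gray.shape[0] // p, roi_gray.shape[1] // q
    boundary = np.zeros((m, n), dtype=bool)
    for i in range(m):
        for j in range(n):
            block = mag[i * p:(i + 1) * p, j * q:(j + 1) * q]
            boundary[i, j] = block.mean() > grad_thresh   # high-gradient block
    # drop isolated (singularity) blocks: a true boundary is continuous
    for i in range(m):
        for j in range(n):
            if boundary[i, j]:
                nb = boundary[max(0, i - 1):i + 2, max(0, j - 1):j + 2]
                if nb.sum() <= 1:
                    boundary[i, j] = False
    return boundary
```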
1.2.2.2 Optimized senior model
In this model, the center of superpixel k is denoted (x_k, y_k), and the traversal is restricted to a controlling window of size D × D containing δ superpixels, which reduces the complexity of computation.
1.3 Multi-model target tracking based on TLST
Image Similarity Weighting Models
In an image retrieval system, the similarity between two images is often expressed as a distance between visual features. However, the human similarity judgment of two images is highly correlated with human cognition, because similarity is the result of subjective cognition. In order to simulate and analyze human cognition, we apply image similarity weighting models, based on the analysis of multidimensional scaling and the feature similarity model, to measure the similarity between two images.
In the model of this invention, image similarity metrics are used to assist target tracking. During our experiments we tried three metrics, MSE, PSNR, and SSIM, and finally chose MSE for the similarity comparison. These image similarity metrics are introduced below.
1. MSE
MSE (Mean Squared Error) is the simplest and most direct metric of image similarity. For given m×n images I and K, MSE is defined as:
MSE = (1/(m*n)) * Σ_{i=0}^{m-1} Σ_{j=0}^{n-1} (I(i,j) - K(i,j))^2
MSE directly compares the differences between corresponding pixels of the two images. The above is the calculation for grayscale images; for a color image, the MSE of the three RGB channels is usually calculated and then averaged.
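A minimal sketch of this metric, averaging over the channels for color images as described:

```python
import numpy as np

def mse(img_i, img_k):
    # mean over all pixels, and over the RGB channels if the image is color
    diff = img_i.astype(np.float64) - img_k.astype(np.float64)
    return np.mean(diff ** 2)
```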
2. PSNR
PSNR (Peak Signal-to-Noise Ratio) is defined, based on MSE, as:
PSNR = 10 * log10(MAX^2 / MSE)
where MAX is the maximum possible pixel value of the image; if each pixel is represented by 8 bits, MAX is 255. PSNR is currently the most common indicator for similarity evaluation. However, PSNR results do not necessarily agree with human visual perception, because the perception of the human eye is affected by many factors.
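A matching sketch of PSNR built directly on the MSE definition; MAX defaults to 255 for 8-bit images, as stated above:

```python
import numpy as np

def psnr(img_i, img_k, max_val=255.0):
    m = np.mean((img_i.astype(np.float64) - img_k.astype(np.float64)) ** 2)
    return float("inf") if m == 0 else 10.0 * np.log10(max_val ** 2 / m)
```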
3. SSIM
SSIM (Structural Similarity Index). Before SSIM, attempts were made to improve on MSE, for example by converting the color space to HVS or converting stored pixel values to displayed brightness values. However, such improvements still had many defects, so a new idea was considered: designing evaluation methods based on the highly structured nature of natural images. SSIM's design idea is to evaluate the similarity of two images through three aspects, namely luminance, contrast, and structure.
The basic flow is: for inputs x and y, first compute the luminance measurement and compare, obtaining the first similarity-related evaluation; then remove the effect of luminance, compute the contrast metric, and compare to get a second evaluation; then remove the contrast from the previous result and compare the structure. Finally, the results are combined to obtain the final evaluation. The specific calculation formulas are as follows:
l(x, y) = (2*μ_x*μ_y + c1) / (μ_x^2 + μ_y^2 + c1)
c(x, y) = (2*σ_x*σ_y + c2) / (σ_x^2 + σ_y^2 + c2)
s(x, y) = (σ_xy + c3) / (σ_x*σ_y + c3)
The definition of SSIM is then given based on the three equations above:
SSIM(x, y) = [l(x, y)]^α * [c(x, y)]^β * [s(x, y)]^γ
where c1, c2, c3 are constants introduced to avoid division by zero; generally c3 = c2/2. Setting α = β = γ = 1 and substituting the three equations into the definition gives:
SSIM(x, y) = ((2*μ_x*μ_y + c1) * (2*σ_xy + c2)) / ((μ_x^2 + μ_y^2 + c1) * (σ_x^2 + σ_y^2 + c2))
Each calculation takes an N×N window from the image, slides the window across the image, and finally averages the results to obtain the global SSIM.
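The windowed SSIM described here is available off the shelf; below is a minimal sketch using scikit-image, where win_size is the N×N sliding window and the file names are hypothetical:

```python
import cv2
from skimage.metrics import structural_similarity

a = cv2.imread("frame_a.png", cv2.IMREAD_GRAYSCALE)
b = cv2.imread("frame_b.png", cv2.IMREAD_GRAYSCALE)
score = structural_similarity(a, b, win_size=7, data_range=255)
```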
Application
The TLST algorithm establishes a weight-map relationship between the superpixel blocks of a newly segmented frame and the previous superpixel blocks, thereby building a confidence map to obtain the optimal position. The weight relationship is based on the HSI features and the position information of the superpixel blocks. Fundamentally, therefore, the TLST algorithm is a comprehensive use of superpixel color and position information, and the weight is only an intermediate carrier. In order to apply the color and position information of superpixel blocks more directly, this section proposes a target tracking method based on a down-sampling pyramid and image similarity.
In simple terms, the ROI region is down-sampled, similarly to the pyramid step in SIFT: the ROI is image-compressed into images of different resolutions, the initial size of the target box is used as a window to traverse each compressed ROI, image comparison finds the position with the highest probability, and the corresponding resolution ratio maps this back to the most likely position in the original picture. The algorithm is introduced below; a code sketch follows the four steps.
1. In the first step, bilateral-filter preprocessing is applied to each frame of the image.
In surgical scenes, lighting and other factors produce outliers, i.e., dark spots or bright spots in the field of view, which must be removed before superpixel processing; otherwise this noise may later form special blocks and affect the result. Simple Gaussian filtering considers only the positional information of the pixels and not the color difference between them, so it blurs the edges of object contours, which is bad for subsequent superpixel processing. What is needed is a filter that can both filter out noise and protect the boundary of the object during preprocessing, and bilateral filtering satisfies exactly this condition.
2. In the second step, the original ROI area is down-sampled.
The purpose of down-sampling is to reduce the time complexity of the subsequent image-similarity calculation. If the comparison were done at the original pixel resolution, the resolution would be too high and the processing time too long to meet the real-time requirements. The idea of down-sampling is the same as in SIFT feature extraction, except that the reduction ratio between layers differs. Assuming the original ROI is an m×n image, any resolution below a·m × a·n can meet the real-time requirements. The resolution factors of the series of images obtained by down-sampling are then [a_0, a_1, a_2, ..., a_n].
3. In the third step, the best tracking position is obtained for each layer of down-sampled image.
The target-box image at the corresponding compression ratio is constructed and used as a window, and the ROI region of that down-sampled layer is traversed to find the position with the best comparison result. For example, for the k-th down-sampling layer, if the original size of the target box is w×h, the corresponding compressed target size is a_k·w × a_k·h. A window of this size starts from the top-left corner of the compressed ROI image, then slides horizontally to the right and vertically downward to traverse the entire image. The similarity between the image in the window and the compressed target image is calculated at every slide, so that the position with the best result is obtained.
4. In the fourth step, the results are weighted in proportion.
Since information is lost when an image is compressed, the lower the compression ratio, the lower the weight given to the result of that down-sampling layer. So, supposing [a_0, a_1, ..., a_n] are the resolution scale factors of the down-sampled layers from bottom to top, the weight of each layer is taken proportional to its scale factor, normalized so that the weights sum to 1.
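A hedged end-to-end sketch of steps 1-4: bilateral preprocessing, down-sampling of the ROI, a sliding-window MSE comparison on each layer, and proportional weighting of the per-layer scores. The scale factors, step size, and the exact weighting rule are illustrative assumptions, not values fixed by this invention.

```python
import cv2
import numpy as np

def mse(a, b):
    return np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)

def pyramid_search(roi, target, scales=(1.0, 0.75, 0.5), step=2):
    roi = cv2.bilateralFilter(roi, 9, 75, 75)        # step 1: edge-preserving denoise
    th, tw = target.shape[:2]
    best = None                                      # (weighted score, x, y)
    for s in scales:                                 # step 2: down-sample the ROI
        layer = cv2.resize(roi, None, fx=s, fy=s)
        tgt = cv2.resize(target, (max(1, int(tw * s)), max(1, int(th * s))))
        h, w = tgt.shape[:2]
        for y in range(0, layer.shape[0] - h + 1, step):    # step 3: slide window
            for x in range(0, layer.shape[1] - w + 1, step):
                # step 4: down-weight coarse layers (MSE is "lower is better",
                # so dividing by s penalizes low-resolution layers)
                score = mse(layer[y:y + h, x:x + w], tgt) / s
                if best is None or score < best[0]:
                    best = (score, int(x / s), int(y / s))  # back to ROI coordinates
    return best
```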
Discriminative Scale Space Tracker
DSST, the Discriminative Scale Space Tracker, took first place in the VOT challenge in 2014 and is a correlation-filter-based algorithm. It builds on MOSSE, and its most eye-catching contribution is the addition of scale estimation.
The principle of DSST scale filtering
Considering that image features are multi-dimensional, DSST uses correlation filters for both position tracking and scale estimation: the former uses a position filter and the latter a scale filter, which are independent of each other, so different features can be selected for training and estimation. In the DSST algorithm, the cost function is:
ε = || Σ_{l=1}^{d} h^l ⋆ f^l - g ||^2 + λ * Σ_{l=1}^{d} ||h^l||^2
Only one training image is considered, and the image is characterized by d feature dimensions; λ is a regularization coefficient. Similarly to MOSSE, the optimal filter is (with ^* denoting the complex conjugate):
H^l = (G^* F^l) / (Σ_{k=1}^{d} (F^k)^* F^k + λ)
When a new frame arrives, the corresponding target position is obtained from the maximum response of the correlation filter:
y = F^{-1}{ (Σ_{l=1}^{d} (A^l)^* Z^l) / (B + λ) }
After the tracking result is obtained, the model is updated in turn for the tracking of the next frame:
A_t^l = (1 - η) * A_{t-1}^l + η * G_t^* F_t^l
B_t = (1 - η) * B_{t-1} + η * Σ_{k=1}^{d} (F_t^k)^* F_t^k
DSST scale selection method
The highlight of the DSST algorithm is the development of a portable scale-estimation filter. The specific operation method is: after detecting the target position, take the current center as the center point, acquire candidate blocks of different scales, resize them to the same size, and then extract features for scale-filter training and updating. The specific scale sizes are chosen as follows:
a^n * P × a^n * R, n ∈ { -⌊(S-1)/2⌋, ..., ⌊(S-1)/2⌋ }
where P and R are the width and height of the target in the previous frame, a = 1.02 is the scale factor, and S = 33 is the number of scales. The purpose of this definition is to sample scales nonlinearly from fine to coarse, from the inside out: near the current scale the steps are small, and the changes gradually increase.
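A hedged sketch of this sampling rule, cropping S = 33 patches of size a^n·P × a^n·R around the detected center and resizing them to a common size; it assumes the target stays inside the frame, and border handling is kept minimal.

```python
import cv2
import numpy as np

def scale_samples(frame, cx, cy, P, R, a=1.02, S=33, out=(32, 32)):
    samples = []
    for n in range(-(S // 2), S // 2 + 1):   # n in {-16, ..., 16} for S = 33
        w, h = P * a ** n, R * a ** n
        x0, y0 = int(cx - w / 2), int(cy - h / 2)
        patch = frame[max(0, y0):int(y0 + h), max(0, x0):int(x0 + w)]
        samples.append(cv2.resize(patch, out))  # resize all scales to a common size
    return np.stack(samples)
```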
Application
According to the preceding description of DSST, the method obtains the position of the target through the first two models and then applies the scale filter at the center of that position to adjust the scale. The scale filter involves three parts: initialization of the scale model, scale model prediction, and scale model update.
1. Scale model initialization
According to the DSST scale extraction method, 33 scale samples centered on the initial result of the first frame are extracted and then resized to the same size. The FHOG feature extracted from each sample is recorded as sf. The feature is converted into the frequency domain by the fast Fourier transform, giving sff. The element-wise product of sff and its complex conjugate is recorded as B. The fast Fourier transform of the desired Gaussian output is denoted ysf, and the element-wise product of the complex conjugate of ysf with sff is denoted A. A and B are the numerator and denominator of the initial scale filter.
2. Scale model prediction
When a new frame arrives and the target position has been obtained, 33 scale patches are likewise extracted around the target center and resized to the same size; the FHOG feature of each image is computed and fast-Fourier-transformed, recorded as xsf. Then xsf is multiplied element-wise by A, the result is divided element-wise by B + λ, the inverse fast Fourier transform is applied, and the real part is taken. This gives the scale-filter response of the new frame; the scale with the largest response is the best scale.
3. Scale model update
After the best scale has been detected, the FHOG features at that scale are extracted. Record the result of the fast Fourier transform as xsf, and record the fast Fourier transform of the corresponding Gaussian output as ysf. Then:
A = (1 - η) * A + η * ysf^* ⊙ xsf
B = (1 - η) * B + η * xsf^* ⊙ xsf
where η represents the learning rate. That is, the new scale filter is obtained by updating A and B based on the detection result.
DSST scale model
1. Input: input image I_t, previous frame scale S_{t-1}, scale model A_{t-1}, B_{t-1}, current position P_t.
2. Output: estimated target scale S_t, updated scale model A_t, B_t.
3. Extract 33 samples of different scales centered on the current new location of the target: Z_sample.
4. Using Z_sample, A_{t-1}, B_{t-1}, calculate the response y = F^{-1}( (A_{t-1})^* Z_sample / (B_{t-1} + λ) ).
5. Calculate max(y) to get the estimated target scale S_t.
6. Based on the current location P_t and the estimated scale S_t, extract the sample features f_t.
7. Update A_t = (1 - η) * A_{t-1} + η * ysf^* f_t.
8. Update B_t = (1 - η) * B_{t-1} + η * f_t^* f_t.
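The frequency-domain core of this scale model can be sketched as below, following the A/B notation of the box above; the real FHOG extraction is omitted, and the regularizer and learning rate are illustrative assumptions.

```python
import numpy as np

def scale_response(xsf, A, B, lam=1e-2):
    """xsf, A: (d, S) feature FFTs per dimension; B: (S,) denominator."""
    # y = IFFT( sum_d conj(A_d) * xsf_d / (B + lambda) ); take the real part
    return np.real(np.fft.ifft((np.conj(A) * xsf).sum(axis=0) / (B + lam)))

def update_scale_model(A, B, xsf, ysf, eta=0.025):
    # steps 7/8 of the box: exponential moving average of numerator and denominator
    A = (1 - eta) * A + eta * np.conj(ysf) * xsf
    B = (1 - eta) * B + eta * (np.conj(xsf) * xsf).sum(axis=0)
    return A, B
```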
2. KCF
2.1 Tracking of the target
The fundamental step in tracking the appointed target in real-time video is the generation of the tracker model, which can be accelerated by the KCF filter (Kernelized Correlation Filter). It is a very effective technique for generating models and speeding up processing.
The circulant matrix is built from the sample's n-dimensional vector and the cyclic shift operator P:
x = [x_1, x_2, ..., x_n]^T
P =
[0 0 ... 0 1]
[1 0 ... 0 0]
[0 1 ... 0 0]
[... ... ...]
[0 0 ... 1 0]
P x = [x_n, x_1, ..., x_{n-1}]^T
The rows of the circulant matrix X are the cyclic shifts P^i x of the base sample.
In order to calculate the best filter parameter w, ridge regression gives:
w = (X^T X + λI)^{-1} X^T y
Because X is circulant, it is diagonalized by the DFT matrix F as X = F diag(x̂) F^H, and it can be further inferred that, in the Fourier domain (with ^ denoting the DFT and ^* the complex conjugate):
ŵ = (x̂^* ⊙ ŷ) / (x̂^* ⊙ x̂ + λ)
The KCF filter is able to enhance the speed and accuracy of the whole processing.
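A minimal sketch of this closed-form training and detection in the linear (MOSSE-like) case, which is what the Fourier-domain formula above reduces to; 1-D signals are used for brevity, and the regularizer value is an assumption.

```python
import numpy as np

def train_filter(x, y, lam=1e-4):
    # w_hat = conj(x_hat) * y_hat / (conj(x_hat) * x_hat + lambda), element-wise
    xf, yf = np.fft.fft(x), np.fft.fft(y)
    return np.conj(xf) * yf / (np.conj(xf) * xf + lam)

def detect(w_hat, z):
    # correlation response over all cyclic shifts of z; the peak gives the shift
    return np.real(np.fft.ifft(w_hat * np.fft.fft(z)))
```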
2.2 The detection of the edge
For each frame in the video, the algorithm uses the Canny operator from the OpenCV library to obtain the preliminary edge of the target. The Canny operator is designed to be an optimal and comprehensive edge detector; to begin with, the input image is passed through a bilateral filter to remove certain noise.
The Canny operator provides gradient detection and has four stages. First, the Gaussian filter serves to filter out noise and can be adjusted through the choice of Gaussian kernel.
Secondly, using the Sobel operator, the algorithm finds the intensity gradient of the image. The gradient strength and direction are acquired by the following formulas:
G = sqrt(G_x^2 + G_y^2)
θ = arctan(G_y / G_x)
Thirdly, Canny applies non-maximum suppression, removing pixels that are not local maxima of the gradient in their spatial neighborhood, which implies that they are not thin enough to be candidates for the edge.
In the fourth step, Canny uses an upper and a lower threshold to confirm the detection of the edge. In this stage, a pixel is kept only if its gradient is higher than the upper threshold, or lies between the two thresholds while being connected to a neighboring pixel whose gradient is above the upper bound.
However, if this optimization is omitted, the original contour-detection results of the image contain too many small loop curves inside the target; thus the outcome of edge detection requires adjustment by other, more advanced methods.
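For reference, a minimal OpenCV sketch of this stage (bilateral prefiltering followed by Canny with hysteresis thresholds); the file path and threshold values are illustrative assumptions:

```python
import cv2

frame = cv2.imread("frame.png")                         # path is hypothetical
smooth = cv2.bilateralFilter(frame, 9, 75, 75)          # denoise while keeping edges
gray = cv2.cvtColor(smooth, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, threshold1=50, threshold2=150)  # lower/upper hysteresis thresholds
```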
3. Region-Growing
In order to adjust the edge detection of the surgical instruments, the region-growing algorithm is introduced.
1) Selection of the seeds
The first step is the selection of the ROI, which is appointed at the beginning of the video. In the first frame, the ROI is a rectangular region that covers the target surgical instrument. The user then needs to choose the initial seed points within the area, and segmentation of the region begins based on the region-growing algorithm.
2) Growth of the seed list
The second step is to enlarge the region by comparing neighboring pixels with the seeds and appending the points whose pixel values lie in the range determined by the seed list. Experiments on the characteristics of surgical instruments and the surrounding organs show that the RGB color image gives a better outcome than the grayscale map; therefore, the pixel distance is computed over the three color channels. For instance, during growth of the target region, pixels whose Euclidean distance is less than 40 are included in the target area and added to the renewed seed list.
3) Elimination of the impact of hollows
Because surgical instruments have irregular shapes, the shadows they cast on the intracorporeal organs are complicated, and both shadows and hollows create obstacles to the generation of the target region. By setting a threshold on the size of the detected region and discarding rather small parts, the obstruction from shadows and highlighted areas can be removed to some degree.
4) Adjusting the contour
The original purpose of the region-growing method is to refine the resulting contour line. Considering the processing speed of the region-growing algorithm, it is recommended that the adaptive region-growing adjustment be carried out every 15 frames under a GPU running environment.
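A hedged sketch of steps 1)-3): breadth-first region growing from user-chosen seeds over the three RGB channels, using the Euclidean color distance threshold of 40 mentioned above; the minimum-size filter for step 3) is a hypothetical value.

```python
from collections import deque
import numpy as np

def region_grow(img, seeds, dist_thresh=40.0, min_size=200):
    """img: HxWx3 RGB array; seeds: list of (y, x) points inside the ROI."""
    h, w = img.shape[:2]
    mask = np.zeros((h, w), dtype=bool)
    ref = np.mean([img[y, x] for y, x in seeds], axis=0).astype(np.float64)
    q = deque(seeds)
    for y, x in seeds:
        mask[y, x] = True
    while q:
        y, x = q.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx]:
                # 3-channel Euclidean distance, as the text prescribes
                if np.linalg.norm(img[ny, nx].astype(np.float64) - ref) < dist_thresh:
                    mask[ny, nx] = True
                    q.append((ny, nx))
    # step 3): discard regions that are too small (shadow/hollow artifacts)
    return mask if mask.sum() >= min_size else np.zeros_like(mask)
```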
4. Length-Ratio & Angles shape descriptor
A more accurate method can be utilized to obtain a precise edge of the target, avoiding unexpected errors from the previous process.
After the contour matrix of the target is obtained, N equidistant sample points P_i = (x_i, y_i) (i = 1, 2, ..., N) are stored. Then the coordinates of the centroid point G = (x_G, y_G) are obtained as the mean of the sample points:
x_G = (1/N) Σ_{i=1}^{N} x_i, y_G = (1/N) Σ_{i=1}^{N} y_i
Set a sample rate t and let H = 2t. Choose point P_i as vertex 1 (the start point), and choose another two points to form a triangle (vertex 2: P_j; vertex 3: P_k). All sample points are indexed from 1 to N clockwise; vertex 2 is the point at 'start point - m×H' and vertex 3 the point at 'start point + m×H', where m = 1, 2, ..., t. These three points on the contour form a triangle, whose centroid g is:
x_g = (1/3)(x_i + x_j + x_k)
y_g = (1/3)(y_i + y_j + y_k)
The interior angles of the triangle are denoted α_1, α_2, α_3. Then the Euclidean distances |P_iP_j| and |P_jP_k| are calculated, and the length ratio is obtained:
l = |P_iP_j| / |P_jP_k|
which is scale invariant. The shape descriptor LRA(P_i, P_j, P_k) is [α_1, α_2, α_3, l]. Stacking the descriptors over all start points, one row per offset m = 1, ..., t (indices taken modulo N), gives the descriptor matrix:
[ LRA(P_1, P_H, P_{N-H+1})     ...  LRA(P_N, ·, ·) ]
[ ...                                              ]
[ LRA(P_1, P_{tH}, P_{N-tH+1}) ...  LRA(P_N, ·, ·) ]
Here, the Fourier transform is applied to each row (denoted d_t) of the shape descriptor:
FD_t(i) = (1/N) Σ_{u=1}^{N} d_t(u) * exp(-j2π(u-1)i / N)
The absolute value of FD_t is invariant to the starting point of the shape contour.
The shape similarity is calculated as:
Dist(A, B) = (1 / ((N-1) * M)) Σ_{t=1}^{N-1} Σ_{v=1}^{M} | abs(FD_A(t)) - abs(FD_B(v)) |
where A denotes the original shape descriptor, B is the new descriptor obtained from the next frame, and M is the available number of starting points.
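Putting the descriptor together, here is a hedged NumPy sketch reconstructed from the definitions above: each triangle (P_i, P_{i-mH}, P_{i+mH}) contributes three interior angles plus the length ratio, and |FFT| over each row yields starting-point invariance. The function name and the default t are illustrative, and it assumes 2mH never wraps onto the same contour point.

```python
import numpy as np

def lra_descriptor(pts, t=4):
    """pts: (N, 2) equidistant contour sample points, indexed clockwise."""
    N, H = len(pts), 2 * t
    rows = []
    for m in range(1, t + 1):
        row = []
        for i in range(N):
            p1, p2, p3 = pts[i], pts[(i - m * H) % N], pts[(i + m * H) % N]
            a = np.linalg.norm(p2 - p3)        # side opposite vertex 1
            b = np.linalg.norm(p1 - p3)        # side opposite vertex 2
            c = np.linalg.norm(p1 - p2)        # side opposite vertex 3
            # interior angles via the law of cosines
            a1 = np.arccos(np.clip((b * b + c * c - a * a) / (2 * b * c), -1, 1))
            a2 = np.arccos(np.clip((a * a + c * c - b * b) / (2 * a * c), -1, 1))
            a3 = np.pi - a1 - a2
            row.append([a1, a2, a3, c / a])    # length ratio |P_iP_j| / |P_jP_k|
        # |FFT| over the start-point axis: invariance to the contour's start point
        rows.append(np.abs(np.fft.fft(np.asarray(row), axis=0)).ravel())
    return np.vstack(rows)
```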

Claims (1)

1. Method of tracking of surgical target and tool, wherein said method is as follows:
Step (1), the traditional TLST model is optimized into TLST-I by traversing only the ROI instead of the whole image and classifying superpixels into three types according to gradient, which simplifies the handling of singular points and obviously reduces the complexity of calculation;
Step (2), TLST-I integrates an image similarity weighting model to detect the target, while DSST achieves scale invariance; this multi-model TLST algorithm has a good performance on surgical target detection;
Step (3), to detect the surgical tool, region growing is applied within the ROI detected by KCF to obtain the contour of the target;
Step (4), to make the tracking process more robust, it is possible to verify the result, after obtaining the contours of the target, according to a novel feature extraction method, making the contours more accurate.
AU2020100044A 2020-01-10 2020-01-10 Method of tracking of Surgical Target and Tool Ceased AU2020100044A4 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2020100044A AU2020100044A4 (en) 2020-01-10 2020-01-10 Method of tracking of Surgical Target and Tool

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2020100044A AU2020100044A4 (en) 2020-01-10 2020-01-10 Method of tracking of Surgical Target and Tool

Publications (1)

Publication Number Publication Date
AU2020100044A4 true AU2020100044A4 (en) 2020-02-13

Family

ID=69412717

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2020100044A Ceased AU2020100044A4 (en) 2020-01-10 2020-01-10 Method of tracking of Surgical Target and Tool

Country Status (1)

Country Link
AU (1) AU2020100044A4 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111340846A (en) * 2020-02-25 2020-06-26 重庆邮电大学 Multi-feature fusion anti-occlusion target tracking method
CN111340846B (en) * 2020-02-25 2023-02-17 重庆邮电大学 Multi-feature fusion anti-occlusion target tracking method
CN111598925A (en) * 2020-05-15 2020-08-28 武汉卓目科技有限公司 Visual target tracking method and device based on ECO algorithm and region growth segmentation
CN113627185A (en) * 2021-07-29 2021-11-09 重庆邮电大学 Entity identification method for liver cancer pathological text naming
CN114972329A (en) * 2022-07-13 2022-08-30 江苏裕荣光电科技有限公司 Image enhancement method and system of surface defect detector based on image processing
CN114972329B (en) * 2022-07-13 2022-10-21 江苏裕荣光电科技有限公司 Image enhancement method and system of surface defect detector based on image processing

Similar Documents

Publication Publication Date Title
AU2020100044A4 (en) Method of tracking of Surgical Target and Tool
CN109242888B (en) Infrared and visible light image fusion method combining image significance and non-subsampled contourlet transformation
CN108319973B (en) Detection method for citrus fruits on tree
CN107194937B (en) Traditional Chinese medicine tongue picture image segmentation method in open environment
WO2018023916A1 (en) Shadow removing method for color image and application
CN109685045B (en) Moving target video tracking method and system
CN110309781B (en) House damage remote sensing identification method based on multi-scale spectrum texture self-adaptive fusion
CN108171715B (en) Image segmentation method and device
CN114118144A (en) Anti-interference accurate aerial remote sensing image shadow detection method
CN108470178B (en) Depth map significance detection method combined with depth credibility evaluation factor
CN109376641B (en) Moving vehicle detection method based on unmanned aerial vehicle aerial video
CN109448023B (en) Satellite video small target real-time tracking method
CN113191979B (en) Non-local mean denoising method for partitioned SAR (synthetic aperture radar) image
CN111199245A (en) Rape pest identification method
CN110021029A (en) A kind of real-time dynamic registration method and storage medium suitable for RGBD-SLAM
CN117237591A (en) Intelligent removal method for heart ultrasonic image artifacts
Mosquera-Lopez et al. Iterative local color normalization using fuzzy image clustering
CN114913463A (en) Image identification method and device, electronic equipment and storage medium
CN113298763B (en) Image quality evaluation method based on significance window strategy
Zhang et al. Salient target detection based on the combination of super-pixel and statistical saliency feature analysis for remote sensing images
CN109102520A (en) The moving target detecting method combined based on fuzzy means clustering with Kalman filter tracking
CN110111368B (en) Human body posture recognition-based similar moving target detection and tracking method
CN110136112B (en) Computer-aided detection algorithm based on mammary gland X-ray photography calcification
CN115861409B (en) Soybean leaf area measuring and calculating method, system, computer equipment and storage medium
CN117011381A (en) Real-time surgical instrument pose estimation method and system based on deep learning and stereoscopic vision

Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)
MK22 Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry