WO2020133636A1 - Method and system for intelligent envelope detection and warning in prostate surgery - Google Patents


Info

Publication number
WO2020133636A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
detection
early warning
outer envelope
data
Prior art date
Application number
PCT/CN2019/074084
Other languages
French (fr)
Chinese (zh)
Inventor
郭成城
王行环
毋世晓
赵亚楠
郝玉洁
Original Assignee
武汉唐济科技有限公司
Priority date
Filing date
Publication date
Application filed by 武汉唐济科技有限公司
Publication of WO2020133636A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing

Definitions

  • the invention relates to the technical field of target detection of artificial intelligence, in particular to a method and system for intelligent detection and early warning of an outer envelope in prostate surgery.
  • In the traditional image processing field, target detection is a popular key technology; widely studied applications include face detection and pedestrian detection.
  • Traditional target detection generally uses a sliding-window framework with three main steps: first, use a sliding window to select candidate regions; second, extract visual features from the candidate regions; third, use a classifier for recognition.
  • A classic algorithm of this kind is the multi-scale deformable part model, which can be regarded as an extension of the "histogram of gradients + support vector machine" approach; its disadvantages are complexity and slow computation, so it cannot support applications with strict real-time requirements.
  • SPP-net: Spatial Pyramid Pooling network
  • Fast R-CNN: Fast Region-based Convolutional Neural Network
  • Faster R-CNN: Faster Region-based Convolutional Neural Network
  • R-FCN: Region-based Fully Convolutional Network
  • YOLO: You Only Look Once, unified real-time object detection
  • SSD: Single Shot MultiBox Detector
  • Literature [1][2][3][4] adopted artificial neural networks, probabilistic neural networks, multi-layer neural networks, support vector machines, and other techniques to solve medical image processing problems.
  • Reference [5] uses a suitable filter as preprocessing to remove noise.
  • Reference [6] built an intelligent model using Principal Components Analysis (PCA) and segmentation.
  • Reference [7] uses gradient vector flow to extract tumor edges in images, and a combined principal component analysis and artificial neural network (PCA-ANN) method to detect regions of interest.
  • PCA-ANN: Principal Components Analysis combined with an artificial neural network
  • Reference [8] uses the discrete wavelet transform to obtain features of medical images, and PCA to reduce the features.
  • Reference [9] likewise uses the discrete wavelet transform to extract features and PCA to reduce them.
  • However, none of the above studies considered the real-time behavior of the algorithms; they are therefore unsuitable for minimally invasive plasma bipolar electroresection surgery, which demands high real-time performance.
  • The real-time deep-learning target detection algorithms are YOLO and SSD, but for detecting the outer envelope in prostate surgery video they still have problems guaranteeing real-time operation and localizing the target accurately. It is therefore necessary to design a new method that detects and judges the outer envelope in prostate surgery faster and more accurately.
  • The present invention proposes a method and system for intelligent detection and early warning of the outer envelope in prostate surgery, focusing on two problems: first, guaranteeing real-time outer envelope detection based on video images of the operation site; second, on the premise that no detections are missed, improving the accuracy of outer envelope localization as far as possible, to give the surgeon better warning guidance and help.
  • The method for intelligent detection and early warning of the outer envelope in prostate surgery proposed by the present invention includes the following steps:
  • Data collection: collect outer envelope image data from prostate surgery video recordings.
  • Second image preprocessing: use deep bilateral learning to enhance the outer envelope image produced by the first image preprocessing.
  • Neural network training: perform feature extraction and network training on the outer envelope image after the second image preprocessing to generate a trained detection model.
  • Detection and early warning: collect real-time dynamic images from the prostate surgery site video; the dynamic images pass through the first and second image preprocessing and are input to the detection model; when the detection model detects an outer envelope feature target, an alarm message is output.
  • Preferably, a data amplification step is included before step 2).
  • The training samples all come from prostate surgery video recordings, so it is inevitable that some captured images have indistinct or redundant features. Moreover, the video data is limited, and in application the differing habits and operating techniques of different surgeons mean the outer envelope will inevitably appear at different angles and in various shapes. The present invention therefore uses an "amplifier" to increase the number of images.
  • Step 4) is implemented on the YOLOv2 platform with the MobileNet deep learning model. Because the detection and warning system must run on an embedded device integrated with the surgical host, the greatest advantage of the mobilenet + YOLOv2 combination is that real-time performance is well guaranteed while balancing speed and accuracy, meeting the practical requirements of assisted early warning in prostate surgery.
  • The specific steps of step 3) include the following. The low-resolution stream is divided into a local path and a global path: the local path uses fully convolutional layers to learn the local features of the image data, while the global path uses convolutional layers and fully connected layers to learn the global features of the image; the outputs of the two paths are then fused into a common set of fusion features.
  • The specific steps of the data amplification step are: import the module, instantiate the pipeline object, and specify the directory containing the images to be processed; define the data augmentation operations, including perspective, angle deviation, shearing, elastic deformation, brightness, contrast, color, rotation, and cropping, and add them to the pipeline; call the pipeline's sample function to specify the total number of samples after augmentation.
  • The specific steps of step 4) include: 4.1) pre-training; 4.2) feature extraction; 4.3) bounding box prediction; 4.4) classification.
  • The invention also provides an intelligent detection and early warning system for the outer envelope in prostate surgery, comprising an image acquisition module, an image processing module, and an image detection and early warning module. The image acquisition module collects and stores image information and models; the image processing module performs the first and second image preprocessing on the collected image data; the image detection and early warning module performs network training on the processed images to generate a trained detection model, then inputs the image to be detected into the detection model to obtain detection and early warning results.
  • The image acquisition module includes a digital video interface for connecting to an endoscope, an image data memory for storing real-time image data during surgery, and an image model memory for storing processed images and deep-learning models.
  • the image processing module includes a data amplification component, an image feature extraction component and an image enhancement component.
  • the image detection and early warning module includes an image depth training component and an image detection and early warning component.
  • The working process of the present invention is as follows: first, extract a certain number of prostate outer envelope images from surgical video recordings; second, if too few envelope images are extracted, use data augmentation to increase their number; third, use PCA to extract image features as the first image preprocessing step; then use deep bilateral learning to apply a second preprocessing to images with indistinct features, and afterwards train on the images with mobilenet+YOLOv2; finally, perform outer envelope image target detection on the real-time surgical video shown on the monitor.
  • During surgery, the endoscope tracks the instrument's operating site through its probe to obtain a visual image of the operating region. Because patients' positions differ and surgeons' habitual techniques differ, the outer envelope image inevitably appears at different angles and in various shapes.
  • Data expansion can greatly enrich the original data set and avoid overfitting during deep learning, achieving better detection results.
  • FIG. 1 is a structural block diagram of a system for an intelligent detection and early warning method of an outer envelope in prostate surgery of the present invention.
  • FIG. 2 is a working flowchart of a method for intelligent detection and early warning of an outer envelope in prostate surgery of the present invention.
  • FIG. 3 is a detection effect diagram of an intelligent detection and early warning method of an outer envelope in prostate surgery of the present invention.
  • the invention is mainly for real-time early warning and recognition of the outer envelope image in the video image of minimally invasive prostate surgery.
  • the early warning system mainly includes an image acquisition module, an image processing module, and an image detection and early warning module.
  • Image acquisition module: collects and stores image information and models. It contains an adapter interface connected to the digital video interface (DVI) of the endoscopic imaging device, an image data memory, and an image model memory. The adapter interface converts the 1920×1200p/60Hz CVT-RB video stream output by the endoscope's digital video interface into a 1920×1080p/60Hz RGB24 video stream and feeds it into the management machine running the early warning analysis system.
  • The image data memory buffers real-time video data of the surgical images; the buffer space can be sized for 1080p (or 720p) image quality.
  • The image model memory stores the preprocessed images and the deep-learning-trained model.
  • the image processing module is used to perform the first image preprocessing and the second image preprocessing on the collected image data. It contains a data amplification component, an image feature extraction component, and an image enhancement component.
  • the data amplification component implements operations such as rotation, stretching, elastic deformation, and cropping on the marked outer envelope image.
  • the image feature extraction component realizes the feature acquisition of the outer envelope picture based on principal component analysis, and extracts a total of 300 feature values, including the following functions:
  • Eigenvalue decomposition is a good method for extracting matrix features, but it applies only to square matrices. Most matrices encountered in practice are not square; singular value decomposition, however, can describe the important characteristics of such general matrices. Any m×n matrix has a singular value decomposition, splitting it into the product of three matrices. Singular value decomposition can thus represent a relatively complex matrix as the product of several smaller, simpler sub-matrices that describe the important characteristics of the original matrix.
  • Since the singular vectors are ordered by singular value, the axis with the largest variance is the first singular vector and the axis with the second-largest variance is the second singular vector. The most important key features of the grayscale image can therefore be obtained from the singular value decomposition.
  • the image enhancement component implements image enhancement on some dark pictures, and determines the final training data set, including the following functions:
  • The fused features are treated as a bilateral grid unfolded along the third dimension. Image enhancement usually depends not only on local image features but also on global ones, such as the histogram, average intensity, and even scene category; our low-resolution stream is therefore further divided into a local path and a global path, and the architecture merges the two paths to produce the final coefficients representing the affine transformations.
  • The low-resolution stream's input image is resized to 256×256 and first processed by a series of convolutional layers that extract low-level features and reduce spatial resolution. The final low-level features are then processed by two asymmetric paths: one path is fully convolutional and specializes in learning local features of the image data while retaining spatial information; the second uses convolutional and fully connected layers to learn global features. Finally the outputs of the two paths are fused into a common feature set, and a pointwise linear layer outputs the final array A from the fused stream, called a bilateral grid of affine coefficients.
  • the image detection and early warning module is used to perform network training on the processed images to generate a trained detection model, and then input the image to be detected into the detection model to obtain detection and early warning results, which includes an image depth training component and an image detection and early warning component.
  • the image depth training component consists of the following functions:
  • Pre-train with the finalized data set: first train the network from scratch with 224×224 input for about 160 epochs (looping over all training data 160 times); then adjust the input to 448×448 and train for 10 more epochs.
  • Mobilenet is a lightweight deep network model proposed mainly for mobile applications. It mainly uses depthwise separable convolution to factor the standard convolution kernel and reduce the amount of computation; this network is adopted so the deep network can be deployed on embedded devices.
  • YOLOv2 (the second version of YOLO) is used for classification.
  • Although the mobilenet+YOLOv2 deep training network can achieve fast real-time detection, its detection accuracy alone is not high. We therefore expand the data before detection, extract features by principal component analysis, and use deep bilateral learning to enhance the dark images with indistinct features, finally achieving a balance between speed and accuracy.
  • The image detection and early warning component performs real-time detection, recognition, and early warning on the prostate surgery video images with the trained weights.
  • To accelerate detection, a neural network compute stick is used.
  • The Movidius Neural Compute Stick (NCS) is used; its most notable feature is delivering more than 100 billion floating-point operations per second at a power of 1 watt.
  • The steps are as follows: first, prepare the Mobilenet+Yolo deep neural network model, already trained on the caffe deep learning platform, together with the test data set.
  • The test data set for the video detection task is real-time video.
  • Second, compile the Caffe model into the NCS-specific graph file using the mvNCCompile tool provided by the NCS SDK; third, call the Python API provided by the NCS SDK to run the compiled neural network model on the compute stick, importing the mvnc module to invoke the stick for inference. When a detection's classification score reaches 94% or more, the system immediately issues an early warning signal.
  • The "amplifier" provides many image-processing classes, with operations including perspective, angle deviation, shear, elastic deformation, brightness, contrast, color, rotation, and cropping. It uses a pipeline-based processing style in which different operations are added to the pipeline in sequence to form the final operation pipeline. Its use divides into three steps:
  • 1) import the related modules, instantiate a pipeline object, and specify the directory containing the pictures to be processed; 2) define the augmentation operations and add them to the pipeline; 3) call the pipeline's sample function to specify the total number of augmented samples.
  • The expanded data set, built from the limited original image data, avoids overfitting during deep learning training and thus achieves better detection results.
  • First image preprocessing: gray-scale processing and singular value decomposition of the outer envelope data to extract the image's principal component feature values.
  • The present invention uses the dimensionality-reduction approach of principal component analysis to process the pictures and extract their main key features.
  • the advantage of this is that on the one hand, it reduces the model training time; on the other hand, it improves the location accuracy of detection and recognition.
  • the steps are: 1) Load the image. 2) Obtain the gray value of the image. 3) Perform singular value decomposition on the grayscale image.
  • The principal component analysis problem is a basis transformation, that is, a transformation from one matrix to another such that the transformed data has the largest variance.
  • the size of the variance describes the amount of information of a variable.
  • the direction with large variance is the direction of the signal, and the direction with small variance is the direction of noise.
  • Principal component analysis sequentially finds a set of mutually orthogonal coordinate axes in the original space: the first axis is the direction that maximizes the variance; the second axis is the direction of maximal variance within the plane orthogonal to the first; the third axis is the direction of maximal variance orthogonal to the first two; and so on.
  • In an n-dimensional space, n such coordinate axes can be found; taking the first r approximates the space, compressing the n-dimensional space into an r-dimensional one. The r coordinate axes should be chosen so that the loss of data during compression is minimal.
  • Each row of the matrix represents a sample and each column a feature. In matrix language, with A the original m×n image matrix and P an n×n transformation matrix that maps the n-dimensional space into another n-dimensional space (performing rotations, stretches, and other changes):

    Y(m×n) = A(m×n) · P(n×n)

  • Keeping only r (r < n) columns of P transforms the original samples with n features into samples with only r features; these r features are a refinement and compression of the original n. Compressing the original image through an n×r transformation matrix, whose columns are the selected feature vectors after sorting, gives the dimensionality-reduced matrix:

    Y(m×r) = A(m×n) · P(n×r)
  • The singular vectors obtained by singular value decomposition are likewise ordered by singular value from large to small; from the principal component analysis viewpoint, the axis with the largest variance is the first singular vector, and the axis with the second-largest variance is the second singular vector. The singular value decomposition formula is:

    A(m×n) = U(m×r) · E(r×r) · V^T(r×n)

  • Here A is an m×n matrix; U is an m×r matrix called the left singular vectors, whose column vectors are orthogonal; E is an r×r diagonal matrix whose off-diagonal elements are all 0 and whose diagonal values are called the singular values; and V^T (the transpose of V) is an r×n matrix called the right singular vectors, whose row vectors are also orthogonal.
  • Second image preprocessing: use deep bilateral learning to enhance the outer envelope image produced by the first image preprocessing.
  • Low-level feature extraction performs n_s convolutional layers on the low-resolution copy of the image; each layer convolves the previous layer's output and feeds the result into the activation function, producing the feature maps of the low-resolution image:

    S_c^i[x, y] = σ( b_c^i + Σ_{x',y',c'} w_{c,c'}^i[x', y'] · S_{c'}^{i-1}[s·x + x', s·y + y'] )

    where s is the stride of the convolution layer and i is the convolution layer index; x', y' are the pixel coordinates before convolution and x, y those after; c and c' index the channels; w is the convolution kernel weight matrix; and b is the bias.
  • The activation function σ is the ReLU, and the convolution is zero-padded: since convolution shrinks the image, pixels initialized to 0 are added around the periphery of the original picture to maintain, to some extent, the scale of the convolved image.
  • n_s is the maximum value of the convolution layer index i above. It serves two purposes: it balances learning on the low-resolution input against learning of the affine coefficients in the final grid (the larger n_s, the coarser the grid), and it controls the complexity of the prediction, since deeper layers yield more complex and abstract features.
  • Here n_s is set to 4, and the convolution kernel size is 3×3.
  • The low-resolution stream is then divided into a local path and a global path: the local path uses fully convolutional layers to learn the local features of the image data, while the global path uses convolutional layers and fully connected layers to learn the global features of the image. The outputs of the two paths are then fused into a common set of fused features.
  • The stride here is set to 1, meaning the resolution no longer changes in this part and the number of channels also stays constant; together with the convolution layers used for low-level feature extraction, the total is n_s + n_L layers.
  • A can be seen as a 16×16×8 bilateral grid, each cell of which holds a 3×4 affine color transformation matrix.
  • This conversion lets the preceding feature extraction operate in the bilateral domain: convolutions in the x and y dimensions learn features in which the z and c dimensions blend with each other. The earlier feature extraction is therefore more expressive than applying 3D convolution in a bilateral grid, since the latter can only relate slices along the z dimension.
  • It is also more efficient than a general bilateral grid, because it only needs to handle discretization along the c dimension. In short, by using 2D convolutions and treating only the last layer as a bilateral grid, the network can learn the optimal way to convert from 2D to 3D.
  • A_c[i, j, k] denotes the bilateral grid coefficients obtained from the low-resolution image, with i, j, k its three dimensions, and Ā_c[x, y] the coefficients in high-resolution space obtained by upsampling A_c[i, j, k]:

    Ā_c[x, y] = Σ_{i,j,k} τ(s_x·x − i) · τ(s_y·y − j) · τ(d·g[x, y] − k) · A_c[i, j, k]

    where τ(·) = max(1 − |·|, 0) denotes linear interpolation, s_x and s_y are the width and height ratios between the grid and the full-resolution original image, and d is the grid depth.
  • Each pixel is thus assigned a coefficient vector, the coefficients of the affine transformation above; the pixel's depth in the grid is determined by the image gray value g[x, y], i.e. A_c[x, y, g[x, y]], so the guide map is used to interpolate the grid.
  • The slicing is done using the OpenGL library. Through this operation the edges of the output image follow the edges of the input image, achieving an edge-preserving effect.
  • The guide map is obtained from the three channels of the original image through a learned pointwise transform:

    g[x, y] = b + Σ_c ρ_c( M_c^T · I_c[x, y] + b'_c )

    where M_c^T are the rows of a learned color transformation matrix, and ρ_c is a piecewise linear transfer function with thresholds t_{c,i} and slopes a_{c,i}, built from 16 ReLU activation units:

    ρ_c(x) = Σ_{i=0..15} a_{c,i} · max(x − t_{c,i}, 0)

  • The parameters M, a, t, b, b' are all obtained through learning.
  • Neural network training: perform feature extraction and network training on the outer envelope image after the second image preprocessing to generate a trained detection model. The specific steps include:
  • YOLOv2 divides pre-training into two steps: first train the network from scratch with 224×224 input for about 160 epochs (looping over all training data 160 times); then adjust the input to 448×448 and train for 10 more epochs.
  • The training structure adopted by the present invention uses MobileNet for feature extraction.
  • The core idea of MobileNet is to decompose the standard convolutional layer into two convolutional layers: a sub-channel (depthwise) convolution and a single-pixel (pointwise 1×1) convolution.
  • The sub-channel convolution uses M convolution kernels to generate M feature maps, and the single-pixel convolution linearly combines those feature maps.
  • The calculation of a MobileNet convolutional layer can thus be divided into two steps:
  • Sub-channel convolution: for each input channel, one D_K×D_K×1 convolution kernel is applied; M kernels in total give M feature maps of size D_F×D_F×1. These feature maps come from different input channels and are independent of one another. Single-pixel convolution: 1×1 kernels then combine the M feature maps across channels.
  • Compared with standard convolution, the MobileNet scheme saves roughly 8 to 9 times the computation (see the sketch below), which effectively reduces the parameter count of the Yolo algorithm, cuts the computation, and further guarantees the real-time performance of the early warning function.
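The 8- to 9-fold figure follows from the standard depthwise-separable cost analysis; a sketch in the D_K, D_F, M notation above, where N denotes the number of output channels (N is not defined in the text and is an assumption here):

```latex
% Standard convolution: M input channels, N output channels,
% D_K x D_K kernels, D_F x D_F output feature map.
C_{std} = D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F
% Depthwise separable: one D_K x D_K kernel per input channel,
% followed by a 1x1 pointwise convolution across channels.
C_{dws} = D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F
% Ratio of the two costs:
\frac{C_{dws}}{C_{std}} = \frac{1}{N} + \frac{1}{D_K^2}
% For D_K = 3 and moderately large N this is close to 1/9,
% i.e. the 8-9x saving cited above.
```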
  • YOLOv2's "anchor boxes" are obtained by clustering: the training samples are analyzed and the most frequent shapes are taken as the anchor boxes. Since this data comes from the training samples, predictions made per grid cell against these priors cover the most likely cases, so the recall rate is relatively high. YOLOv2 uses the anchor boxes to predict the bounding boxes.
  • YOLOv2 performs target detection by dividing the image into grid cells, each responsible for detecting part of the picture, with 5 anchor boxes per cell. For each anchor box YOLOv2 predicts four coordinate values (t_x, t_y, t_w, t_h); given the offset (c_x, c_y) of the cell from the image's upper-left corner and the previously obtained anchor width p_w and height p_h, the bounding box is:

    b_x = σ(t_x) + c_x
    b_y = σ(t_y) + c_y
    b_w = p_w · e^(t_w)
    b_h = p_h · e^(t_h)

    where σ is the logistic (sigmoid) function.
  • YOLOv2 predicts an objectness score for each bounding box through logistic regression: if a predicted bounding box overlaps the ground-truth box more than all other predictions, its target value is 1; if the overlap does not reach a threshold (YOLOv2's default is 0.5), the predicted bounding box is ignored and contributes no loss.
  • The output vector of the YOLOv2 neural network has size 13×13×30: 13×13 divides the picture into 13 rows and 13 columns (169 cells in total), and 30 is the number of data values per cell; a decoding sketch follows.
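As an illustration of how the 13×13×30 output could be decoded, here is a minimal sketch. It assumes the 30 values per cell are laid out as 5 anchor boxes × (t_x, t_y, t_w, t_h, objectness, one class score) for the single outer-envelope class; the actual tensor layout of a given implementation may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_yolov2(output, anchors, conf_thresh=0.5):
    """Decode a (13, 13, 30) YOLOv2 output into boxes in grid units.

    anchors: five (p_w, p_h) prior sizes, in grid units (assumed layout).
    """
    boxes = []
    for cy in range(13):
        for cx in range(13):
            cell = output[cy, cx].reshape(5, 6)  # 5 anchors x 6 values
            for (p_w, p_h), (tx, ty, tw, th, obj, cls) in zip(anchors, cell):
                score = sigmoid(obj) * sigmoid(cls)
                if score < conf_thresh:
                    continue
                bx = cx + sigmoid(tx)   # box center, offset from cell corner
                by = cy + sigmoid(ty)
                bw = p_w * np.exp(tw)   # size scaled from the anchor prior
                bh = p_h * np.exp(th)
                boxes.append((bx, by, bw, bh, score))
    return boxes
```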
  • Detection and early warning: collect real-time dynamic images from the prostate surgery site video; the dynamic images pass through the first and second image preprocessing and are input to the detection model; when the detection model detects an outer envelope feature target, an alarm message is output.
  • the management machine reads the endoscope equipment through a dedicated video adapter card to output real-time video;
  • the real-time video is analyzed by the detection and early warning module, and the detection result is output in the form of video.
  • the buzzer sounds to alert the doctor;
  • The minimum requirements for the management machine are a Windows 7 or Ubuntu 16.04 operating system, a quad-core i5 CPU, and 8 GB of memory. Equipping it with a graphics processing unit (GPU) that supports deep learning algorithms, or with multiple Movidius neural compute sticks, can further accelerate video processing.
  • GPU: graphics processing unit
  • Movidius: neural compute stick

Abstract

A method and system for intelligent envelope detection and warning in prostate surgery. The method comprises the following steps: (1) acquiring envelope image data from a prostate surgery video; (2) performing gray-scale processing and singular value decomposition on the envelope data and extracting principal component eigenvalues of the image; (3) performing, by means of deep bilateral learning, image enhancement on the envelope image that underwent the preprocessing of step (2); (4) training a neural network; and (5) performing detection and issuing a warning. A state-of-the-art artificial-intelligence image recognition technique is applied to prostate envelope target detection, providing intelligent warnings during prostate surgery. Compared with existing static medical image recognition techniques, the method performs recognition and warning analysis on the dynamic images of a surgery video. Image preprocessing measures such as data augmentation, principal component analysis, and image enhancement strike a balance between speed and accuracy and meet the practical application requirements of auxiliary warnings for prostate surgery.

Description

Method and system for intelligent detection and early warning of the outer envelope in prostate surgery

Technical field
The invention relates to the technical field of artificial-intelligence target detection, and in particular to a method and system for intelligent detection and early warning of the outer envelope in prostate surgery.
Background
In the traditional image processing field, target detection is a popular key technology; widely studied applications include face detection and pedestrian detection. Traditional target detection generally uses a sliding-window framework with three main steps: first, use a sliding window to select candidate regions; second, extract visual features from the candidate regions; third, use a classifier for recognition. A classic algorithm of this kind is the multi-scale deformable part model, which can be regarded as an extension of the "histogram of gradients + support vector machine" approach; its disadvantages are complexity and slow computation, so it cannot support applications with strict real-time requirements.
After target detection based on deep learning was developed, real-time performance improved greatly. In 2013 Region-based Convolutional Neural Networks (R-CNN) appeared, raising the mean average precision (mAP) of detection to 48%; in 2014, after the network structure was modified, the average precision rose to 66%, making it a solution truly usable in industrial applications. Later came faster and more accurate solutions: the Spatial Pyramid Pooling network (SPP-net), Fast Region-based Convolutional Neural Networks (Fast R-CNN), Faster Region-based Convolutional Neural Networks (Faster R-CNN), the Region-based Fully Convolutional Network (R-FCN), unified real-time target detection (You Only Look Once, YOLO), and the Single Shot MultiBox Detector (SSD). Deep-learning target detection algorithms fall into two categories: region-proposal algorithms, including R-CNN, SPP-net, Fast R-CNN, Faster R-CNN, and R-FCN; and end-to-end algorithms such as YOLO and SSD. The latter two, however, suffer from long training times and insufficiently accurate localization.
Literature [1][2][3][4] adopted artificial neural networks, probabilistic neural networks, multi-layer neural networks, support vector machines, and other techniques to solve medical image processing problems. Reference [5] uses a suitable filter as preprocessing to remove noise. Reference [6] built an intelligent model using Principal Components Analysis (PCA) and segmentation. Reference [7] uses gradient vector flow to extract tumor edges in images and a combined PCA and artificial neural network (PCA-ANN) method to detect regions of interest. Reference [8] uses the discrete wavelet transform to obtain features of medical images and PCA to reduce them. Reference [9] likewise uses the discrete wavelet transform to extract features and PCA to reduce them. However, none of these studies considered the algorithms' real-time behavior, so they are unsuitable for minimally invasive plasma bipolar electroresection surgery, which demands high real-time performance. At present the real-time deep-learning target detection algorithms are YOLO and SSD, but for detecting the outer envelope in prostate surgery video they still have problems guaranteeing real-time operation and localizing targets accurately. It is therefore necessary to design a new method that detects and judges the outer envelope in prostate surgery faster and more accurately.
Summary of the invention
In view of the specific needs of early-warning analysis for minimally invasive plasma bipolar electroresection surgery and the current state of medical image processing technology, the present invention proposes a method and system for intelligent detection and early warning of the outer envelope in prostate surgery, focusing on two problems: first, guaranteeing real-time outer envelope detection based on video images of the operation site; second, on the premise that no detections are missed, improving the accuracy of outer envelope localization as far as possible, to give the surgeon better warning guidance and help.
The method for intelligent detection and early warning of the outer envelope in prostate surgery proposed by the present invention includes the following steps:
1) Data collection: collect outer envelope image data from prostate surgery video recordings;
2) First image preprocessing: perform gray-scale processing and singular value decomposition on the outer envelope data, extracting an outer envelope image carrying the principal component feature values;
3) Second image preprocessing: use deep bilateral learning to enhance the outer envelope image produced by the first image preprocessing;
4) Neural network training: perform feature extraction and network training on the outer envelope image after the second image preprocessing, generating a trained detection model;
5) Detection and early warning: collect real-time dynamic images from the prostate surgery site video; the dynamic images pass through the first and second image preprocessing and are input to the detection model; when the detection model detects an outer envelope feature target, an alarm message is output.
Preferably, a data amplification step is included before step 2). The training samples all come from prostate surgery video recordings, so it is inevitable that some captured images have indistinct or redundant features. Moreover, the video data is limited, and in application the differing habits and operating techniques of different surgeons mean the outer envelope will inevitably appear at different angles and in various shapes. The present invention therefore uses an "amplifier" to increase the number of images.
Preferably, step 4) is implemented on the YOLOv2 platform with the MobileNet deep learning model. Because the detection and warning system must run on an embedded device integrated with the surgical host, the greatest advantage of the mobilenet+YOLOv2 combination is that real-time performance is well guaranteed while balancing speed and accuracy, meeting the practical requirements of assisted early warning in prostate surgery.
Preferably, the specific steps of step 3) include:
3.1) converting the high-resolution input image into a low-resolution stream;
3.2) dividing the low-resolution stream into a local path and a global path, where the local path uses fully convolutional layers to learn local features of the image data and the global path uses convolutional layers and fully connected layers to learn global features of the image, then fusing the outputs of the two paths into a common set of fusion features;
3.3) treating the fused features as a bilateral grid unfolded along the third dimension and outputting a bilateral grid of affine coefficients;
3.4) upsampling the bilateral grid of affine coefficients through a single-channel guide map;
3.5) applying the affine transformation to the fused features and outputting at full resolution.
Preferably, the specific steps of the data amplification step are: import the module, instantiate the pipeline object, and specify the directory containing the images to be processed; define the data augmentation operations, including perspective, angle deviation, shearing, elastic deformation, brightness, contrast, color, rotation, and cropping, and add them to the pipeline; call the pipeline's sample function to specify the total number of samples after augmentation, as sketched below.
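The pipeline described here maps naturally onto the Augmentor Python library (a plausible reading of the "amplifier"; the library choice, directory name, probabilities, and sample count below are illustrative assumptions):

```python
import Augmentor

# 1) Instantiate a pipeline over the directory of labeled envelope images.
p = Augmentor.Pipeline("data/envelope_images")

# 2) Define augmentation operations and add them to the pipeline.
p.skew(probability=0.5)                          # perspective / angle deviation
p.shear(probability=0.5, max_shear_left=10, max_shear_right=10)
p.random_distortion(probability=0.5, grid_width=4,
                    grid_height=4, magnitude=4)  # elastic deformation
p.random_brightness(probability=0.5, min_factor=0.7, max_factor=1.3)
p.random_contrast(probability=0.5, min_factor=0.7, max_factor=1.3)
p.random_color(probability=0.5, min_factor=0.7, max_factor=1.3)
p.rotate(probability=0.7, max_left_rotation=15, max_right_rotation=15)
p.crop_random(probability=0.3, percentage_area=0.9)

# 3) Specify the total number of augmented samples to generate.
p.sample(10000)
```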
Preferably, the specific steps of step 4) include: 4.1) pre-training; 4.2) feature extraction; 4.3) bounding box prediction; 4.4) classification.
The present invention also provides an intelligent detection and early warning system for the outer envelope in prostate surgery based on the above method, comprising an image acquisition module, an image processing module, and an image detection and early warning module. The image acquisition module collects and stores image information and models; the image processing module performs the first and second image preprocessing on the collected image data; the image detection and early warning module performs network training on the processed images to generate a trained detection model, then inputs the image to be detected into the detection model to obtain detection and early warning results.
Further, the image acquisition module includes a digital video interface for connecting to an endoscope, an image data memory for storing real-time image data during surgery, and an image model memory for storing processed images and deep-learning models.
Furthermore, the image processing module includes a data amplification component, an image feature extraction component, and an image enhancement component.
Furthermore, the image detection and early warning module includes an image depth training component and an image detection and early warning component.
The working process of the present invention is as follows: first, extract a certain number of prostate outer envelope images from surgical video recordings; second, if too few envelope images are extracted, use data augmentation to increase their number; third, use PCA to extract image features as the first image preprocessing step; then use deep bilateral learning to apply a second preprocessing to images with indistinct features, and afterwards train on the images with mobilenet+YOLOv2; finally, perform outer envelope image target detection on the real-time surgical video shown on the monitor.
The beneficial effects of the present invention are:
1) During the operation, the endoscope tracks the instrument's operating site through its probe to obtain a visual image of the operating region. Because patients' positions differ and surgeons' habitual techniques differ, the outer envelope image inevitably appears at different angles and in various shapes. Data expansion can greatly enrich the original data set and avoid overfitting during deep learning, achieving better detection results.
2) If a target has too many feature values in an image, localization becomes imprecise. In addition, the texture, color, and other characteristics of the outer envelope are similar to those of some polyp tissue and must be observed carefully to distinguish. Preprocessing the images with principal component analysis effectively selects the key image features, which on the one hand reduces deep learning training time and on the other optimizes the detection model, yielding more accurate outer envelope localization.
3) Endoscope focusing is operated manually by the surgeon, and the distance between the light source and the subject changes constantly, so some images are inevitably unclear. Using image enhancement together with the principal component analysis preprocessing described above makes the characteristic parts of dark images more distinct, so features are better extracted during training. Adding image enhancement to the detection process also effectively improves recognition accuracy.
4) Since the detection and early warning system must run on an embedded device integrated with the surgical host, the mobilenet+YOLOv2 combination is adopted; its greatest advantage is well-guaranteed real-time performance, while its drawback is lower detection accuracy. Image preprocessing measures such as data expansion, principal component analysis, and image enhancement restore the balance between speed and accuracy, meeting the practical requirements of assisted early warning in prostate surgery.
Brief description of the drawings
FIG. 1 is a structural block diagram of the system for the intelligent detection and early warning method of the outer envelope in prostate surgery of the present invention.
FIG. 2 is a working flowchart of the method for intelligent detection and early warning of the outer envelope in prostate surgery of the present invention.
FIG. 3 is a detection effect diagram of the intelligent detection and early warning method of the outer envelope in prostate surgery of the present invention.
Detailed description
The present invention is further described in detail below with reference to the drawings and an embodiment, but the embodiment should not be construed as limiting the invention.
The invention mainly performs real-time early warning and recognition of the outer envelope image in the video images of minimally invasive prostate surgery. As shown in Figure 1, the early warning system mainly includes an image acquisition module, an image processing module, and an image detection and early warning module.
The image acquisition module collects and stores image information and models. It contains an adapter interface connected to the digital video interface (DVI) of the endoscopic imaging device, an image data memory, and an image model memory. The adapter interface converts the 1920×1200p/60Hz CVT-RB video stream output by the endoscope's digital video interface into a 1920×1080p/60Hz RGB24 video stream and feeds it into the management machine running the early warning analysis system; the image data memory buffers real-time video data of the surgical images, with buffer space sized for 1080p (or 720p) image quality; the image model memory stores the preprocessed images and the deep-learning-trained model.
The image processing module performs the first and second image preprocessing on the collected image data. It contains a data amplification component, an image feature extraction component, and an image enhancement component.
The data amplification component applies rotation, stretching, elastic deformation, cropping, and similar operations to the labeled outer envelope images.
The image feature extraction component acquires outer envelope image features based on principal component analysis, extracting 300 feature values in total, and includes the following functions:
1) Gray-scale processing of the collected outer envelope images. The color of each pixel in a color image is determined by three components, R, G, and B, each of which can take 255 values, so a pixel can range over more than 16 million (255×255×255) colors. A grayscale image is a special color image in which the R, G, and B components are equal, so each pixel ranges over only 255 values; in digital image processing, images of various formats are therefore generally converted to grayscale first to reduce the computation in subsequent steps.
2) Singular value decomposition of the grayscale image. Eigenvalue decomposition is a good method for extracting matrix features, but it applies only to square matrices, and most matrices in practice are not square; singular value decomposition, however, can describe the important characteristics of such general matrices. Any m×n matrix has a singular value decomposition, splitting it into the product of three matrices; a relatively complex matrix can thus be represented by the product of several smaller, simpler sub-matrices that describe its important characteristics. Since the singular vectors are ordered by singular value from large to small, from the principal component analysis viewpoint the axis with the largest variance is the first singular vector and the axis with the second-largest variance is the second singular vector. The most important key features of the grayscale image can therefore be obtained from the singular value decomposition.
3) Regeneration and saving of the outer envelope image. From each 300×300 image, 300 features are extracted; a sketch of this preprocessing follows.
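A minimal sketch of this grayscale + SVD preprocessing is given below; it assumes the image is regenerated from its leading singular components, and the file names are illustrative:

```python
import numpy as np
from PIL import Image

def svd_preprocess(path, k=300, out_path="envelope_pca.png"):
    # 1) Load the image and convert it to grayscale.
    gray = np.asarray(Image.open(path).convert("L"), dtype=np.float64)

    # 2) Singular value decomposition of the grayscale matrix.
    U, s, Vt = np.linalg.svd(gray, full_matrices=False)

    # 3) Keep the k largest singular values/vectors (the key features)
    #    and regenerate the image from them.
    k = min(k, len(s))
    approx = (U[:, :k] * s[:k]) @ Vt[:k, :]
    Image.fromarray(np.clip(approx, 0, 255).astype(np.uint8)).save(out_path)
    return s[:k]  # the extracted feature values

# On a 300x300 image, k=300 corresponds to the 300 features mentioned above.
features = svd_preprocess("envelope_0001.png", k=300)
```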
The image enhancement component enhances the darker images and determines the final training data set, and includes the following functions:
1) Feature extraction from low-resolution images. Converting the high-resolution input image to low resolution and performing most of the learning and training at low resolution saves a large amount of computation and allows fast model evaluation. Most of the inference is performed in the low-resolution stream on a low-resolution copy of the input image I, finally predicting local affine transformations in a representation similar to a bilateral grid.
2) The fused features as a bilateral grid unfolded along the third dimension. Image enhancement usually depends not only on local image features but also on global ones, such as the histogram, average intensity, and even scene category; the low-resolution stream is therefore further divided into a local path and a global path, and the architecture merges the two paths to produce the final coefficients representing the affine transformations. The low-resolution stream's input image is resized to 256×256 and first processed by a series of convolutional layers that extract low-level features and reduce spatial resolution. The final low-level features are then processed by two asymmetric paths: one fully convolutional path learns local features of the image data while retaining spatial information; the second uses convolutional and fully connected layers to learn global features. Finally the outputs of the two paths are fused into a common feature set, and a pointwise linear layer outputs the final array A from the fused stream, called a bilateral grid of affine coefficients.
3) Upsampling with trainable slicing. A layer based on a bilateral-grid slicing operation transforms the information from the previous step into high-resolution space. This layer takes the single-channel guide map g and the feature map A (viewed as a bilateral grid) as inputs and performs a data lookup in A; the slicing operator upsamples by trilinearly interpolating the coefficients of A at positions defined by g, and outputs a new feature map whose spatial resolution equals that of g. The slicing is done with OpenGL (Open Graphics Library); through this operation the edges of the output image follow the edges of the input image, achieving an edge-preserving effect.
4) Producing the final output at full resolution. For the input image I, features are extracted whose purposes are, first, to obtain the guidance map and, second, to serve as the regression input for the full-resolution local affine model obtained above. The guidance map is obtained by applying per-channel operations to the three channels of the original image and summing the results; the final output can be regarded as the result of applying the affine transforms to the input features.
The image detection and early warning module performs network training on the processed images to produce a trained detection model, then feeds the images to be examined into the detection model to obtain detection and warning results. It contains an image deep-training component and an image detection and early warning component.
The image deep-training component consists of the following functions:
1) Pre-training with the finalized data set. The network is first trained from scratch with 224×224 input for about 160 epochs (all training data cycled 160 times); the input is then enlarged to 448×448 and the network is trained for 10 more epochs.
2) Feature extraction on the preprocessed outer envelope images with MobileNet to generate feature maps. MobileNet is a lightweight deep network model designed primarily for mobile platforms. Its key idea is the depthwise separable convolution, which factors a standard convolution into cheaper operations and thus reduces the amount of computation. This network was chosen so that the deep network can be deployed on embedded devices.
3) Classification with YOLOv2 (the second version of YOLO) after feature extraction. Although a deep network based on MobileNet+YOLOv2 supports fast real-time detection, its detection accuracy is limited. We therefore augmented the data before detection, extracted features with principal component analysis, and enhanced the darker images with indistinct features using deep bilateral learning, ultimately striking a balance between speed and accuracy.
The image detection and early warning component performs real-time detection, recognition, and warning on the prostate surgery video images using the trained weights. To accelerate detection, a neural network compute stick is used: the Movidius Neural Compute Stick (NCS), whose chief characteristic is that it delivers more than 100 billion floating-point operations per second at a power of about 1 watt. The steps are as follows. First, prepare the MobileNet+YOLO deep neural network model already trained on the Caffe deep learning platform, together with the test data set; for the video detection task the test data are real-time video. Second, compile the Caffe model into a graph file specific to the compute stick with the mvNCCompile tool provided by the NCS SDK. Third, call the Python API provided by the NCS SDK to run the compiled network model on the compute stick; the mvnc module is imported to invoke the stick for inference. When the detected classification score exceeds 94%, the system immediately issues an early warning signal.
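For illustration, the inference loop on the compute stick might look like the following minimal sketch, assuming the NCSDK v1 Python API (mvnc) and a graph file already produced by mvNCCompile; the file name, capture device, 416×416 input size, and the decoding and alarm stubs are illustrative, not taken from the original.

```python
import numpy as np
import cv2
from mvnc import mvncapi as mvnc

def decode_score(out):
    # Placeholder: a real system would decode YOLO boxes and take the best class score.
    return float(out.max())

def trigger_alarm():
    print("WARNING: outer envelope detected")  # stands in for the buzzer

GRAPH_PATH = 'mobilenet_yolo.graph'  # hypothetical file produced by mvNCCompile

# Open the first attached Neural Compute Stick and load the compiled graph.
devices = mvnc.EnumerateDevices()
device = mvnc.Device(devices[0])
device.OpenDevice()
with open(GRAPH_PATH, 'rb') as f:
    graph = device.AllocateGraph(f.read())

cap = cv2.VideoCapture(0)  # live endoscope feed, assumed on capture device 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    img = cv2.resize(frame, (416, 416)).astype(np.float16) / 255.0
    graph.LoadTensor(img, None)      # upload the frame to the stick
    out, _ = graph.GetResult()       # download the predictions
    if decode_score(out) >= 0.94:    # the 94% threshold from the text
        trigger_alarm()

graph.DeallocateGraph()
device.CloseDevice()
```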
The method for intelligent detection and early warning of the outer envelope in prostate surgery proposed by the present invention includes the following steps:
1) Data collection: collect the outer envelope image data from prostate surgery video recordings; the outer envelope image data come from prostate surgery videos, in which the images exhibiting outer envelope features are annotated.
2) Data augmentation: the training samples all come from prostate surgery video recordings. For various reasons, some captured images inevitably have indistinct or redundant features. Moreover, since the video material is limited, the different habits and operating techniques of different surgeons in practice mean that outer envelope images may appear at different angles and in a variety of shapes. An "augmentor" is used to expand the image set. The "augmentor" is a software package for image augmentation that can generate image data for machine learning. Data augmentation is usually a multi-stage process; the "augmentor" uses a pipeline-based approach, adding operations in sequence to form the final operation pipeline. Images are fed into the pipeline, the operations act on them in turn to produce new images, which are then saved. The operations defined in the "augmentor" pipeline are applied to the images randomly according to configured probabilities.
The "augmentor" provides many classes of image-processing functions, with operations including perspective transformation, angular deviation, shearing, elastic deformation, brightness, contrast, color, rotation, and cropping. It adopts a "pipeline"-based approach in which different operations are added to the pipeline in sequence to form the final operation pipeline. The procedure has three main steps (a minimal code sketch is given after this list):
① Import the relevant modules, instantiate a pipeline object, and specify the directory containing the images to be processed;
② Define the data augmentation operations, such as perspective transformation, angular deviation, shearing, elastic deformation, brightness, contrast, color, rotation, and cropping, and add them to the pipeline;
③ Call the pipeline's sample function and specify the total number of augmented samples; whatever the initial sample count, the specified number of samples can be generated.
On the basis of the limited original image data, the expanded data set helps avoid overfitting during deep learning training and thus yields better detection results.
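A minimal sketch of the three steps above, assuming the Python Augmentor package; the directory name, operation parameters, and sample count are illustrative:

```python
import Augmentor

# Step 1: instantiate a pipeline over the directory of envelope images (path is illustrative).
p = Augmentor.Pipeline("data/envelope_images")

# Step 2: add probabilistic operations; each is applied to a given image with its probability.
p.rotate(probability=0.7, max_left_rotation=15, max_right_rotation=15)
p.shear(probability=0.4, max_shear_left=10, max_shear_right=10)
p.skew(probability=0.4, magnitude=0.3)  # perspective-style distortion
p.random_distortion(probability=0.5, grid_width=4, grid_height=4, magnitude=4)  # elastic deformation
p.zoom(probability=0.5, min_factor=1.05, max_factor=1.25)

# Step 3: generate a fixed number of augmented samples regardless of the original count.
p.sample(5000)
```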
3) First image preprocessing: grayscale conversion and singular value decomposition are applied to the outer envelope data to extract the principal component eigenvalues of the image.
If an object in an image has too many feature values, localization becomes imprecise. In addition, the texture, color, and other characteristics of the outer envelope are close to those of some polyp tissue and must be observed carefully to be distinguished. The present invention therefore processes the images with the "dimensionality reduction" of principal component analysis, extracting their main key features. The benefit is twofold: model training time is reduced, and the positional accuracy of detection and recognition is improved. The steps are: 1) load the image; 2) obtain the grayscale values of the image; 3) perform singular value decomposition on the grayscale image.
Principal component analysis is a change of basis, i.e., a transformation from one matrix to another such that the transformed data has maximal variance. The magnitude of the variance describes the information content of a variable: for machine learning data, only directions of large variance are informative. Directions of large variance carry the signal; directions of small variance carry the noise. Put simply, principal component analysis sequentially finds a set of mutually orthogonal axes in the original space: the first axis is the direction that maximizes the variance; the second axis maximizes the variance within the plane orthogonal to the first; the third axis maximizes the variance within the plane orthogonal to the first two. In an n-dimensional space, if n such axes can be found, the first r of them are taken to approximate the space, compressing an n-dimensional space into an r-dimensional one; the r axes should be chosen so that the data loss during compression is as small as possible.
Given an m×n image, represent it as a matrix whose elements are the pixel gray levels, stored by rows and columns, and denote it $A_{m\times n}$. Suppose each row of the matrix represents a sample and each column represents a feature; in matrix language,

$$A_{m\times n}=\begin{pmatrix}a_{11}&\cdots&a_{1n}\\\vdots&\ddots&\vdots\\a_{m1}&\cdots&a_{mn}\end{pmatrix}.\qquad(1)$$
To change the coordinate axes of an m×n matrix A, P is the transformation matrix that maps one n-dimensional space onto another n-dimensional space, applying rotations, stretches, and other spatial changes; $\tilde{A}$ denotes the transformed matrix. That is, A is the original image matrix, and the purpose of principal component analysis is to pass the original image matrix A through a transformation matrix P to obtain the transformed matrix $\tilde{A}$.
Transforming an m×n matrix A into an m×r matrix turns samples that originally had n features into samples with only r (r<n) features; these r features are a distillation and compression of the original n. If we compress the original image in this way, then after multiplication by an n×r transformation matrix we obtain the dimension-reduced matrix $\tilde{A}_{m\times r}$; the columns of this transformation matrix are the eigenvectors selected after sorting. In mathematical notation,

$$\tilde{A}_{m\times r}=A_{m\times n}P_{n\times r}.\qquad(2)$$
The singular vectors produced by singular value decomposition are likewise ordered by singular value from largest to smallest. From the viewpoint of principal component analysis, the axis of largest variance is the first singular vector and the axis of second-largest variance is the second singular vector. The singular value decomposition reads
$$A_{m\times n}\approx U_{m\times r}E_{r\times r}V_{r\times n}^{T},\qquad(3)$$
where A is an m×n matrix; the decomposition yields the three matrices U, E, and $V^T$ (the transpose of V). U is an m×r matrix of left singular vectors whose columns are orthogonal; E is an r×r diagonal matrix whose off-diagonal elements are all 0 and whose diagonal entries are called the singular values; $V^T$ is an r×n matrix of right singular vectors whose rows are likewise orthogonal.
Multiplying both sides of the singular value decomposition by the orthogonal matrix V, formula (3) becomes

$$A_{m\times n}V_{n\times r}\approx U_{m\times r}E_{r\times r}V_{r\times n}^{T}V_{n\times r}=U_{m\times r}E_{r\times r}.\qquad(4)$$
Comparing formula (4) with formula (2), this compresses the columns of the matrix. Similarly, to compress the rows, simply multiply both sides of the singular value decomposition by the transpose of U:

$$U_{r\times m}^{T}A_{m\times n}\approx E_{r\times r}V_{r\times n}^{T}.\qquad(5)$$
Through formulas (4) and (5) we obtain the principal component eigenvalues compressed along both directions. Once the eigenvalues are computed, the eigenvalues of the covariance matrix are arranged in descending order and the eigenvectors are reordered correspondingly; taking the first 300 eigenvectors, the image can be reconstructed to generate a compressed outer envelope image carrying the principal component eigenvalues.
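As a small numpy illustration of this truncated SVD compression (the frame size is a stand-in; the text's choice of 300 retained components is kept):

```python
import numpy as np

# Stand-in for a grayscale endoscope frame; in practice it would be loaded
# with cv2.imread(path, cv2.IMREAD_GRAYSCALE).
gray = np.random.rand(480, 640)

# SVD of formula (3): gray = U @ diag(s) @ Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(gray, full_matrices=False)

r = 300                                   # keep the 300 leading components, as in the text
approx = (U[:, :r] * s[:r]) @ Vt[:r, :]   # rank-r reconstruction of the image

# Column compression of formula (4): A @ V_r = U_r @ E_r.
col_compressed = U[:, :r] * s[:r]
print(approx.shape, col_compressed.shape)
```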
4) Second image preprocessing: deep bilateral learning is applied to the outer envelope images from the first preprocessing step to enhance them.
In current minimally invasive prostate surgery, the endoscope is focused manually by the surgeon, and the distance between the light source and the subject changes constantly, so some images are inevitably not very clear. Based on the design requirement of a surgical early warning system, "better a false alarm than a missed detection", and given that outer envelope recognition relies mainly on texture features (color and shape matter less), we use deep bilateral learning to enhance the less clear images so that the features to be detected become more pronounced. This helps both the earlier model training and the later detection and warning. The new network architecture built by this algorithm can perform image enhancement in real time at full HD resolution on mobile devices. The processed results have an HDR (high dynamic range) quality, making the image expressive while preserving edge information, and only limited computation is needed at full resolution. The algorithm can therefore also be used for real-time image enhancement on embedded devices for minimally invasive surgery.
4.1) Feature extraction from low-resolution images. By converting the high-resolution input image to a low resolution and performing most of the learning and training at that low resolution, a large amount of computation is saved and the model can be evaluated quickly. Most of the inference is carried out on a low-resolution copy $\tilde{I}$ of the input image $I$ in the low-resolution stream, which finally predicts local affine transformations in a representation similar to a bilateral grid.
The image is resized to 256×256 and then downsampled through a series of convolutions with stride 2 (stride = 2), according to

$$S_{i}^{c}[x,y]=\sigma\!\Big(b_{i}^{c}+\sum_{x',y',c'}w_{i}^{cc'}[x',y']\,S_{i-1}^{c'}[2x+x',\,2y+y']\Big),\qquad(6)$$
where $S_i$ is the i-th strided convolutional layer and $i=1,\dots,n_s$ is the index of the convolutional layer; x′ and y′ are the horizontal and vertical pixel coordinates before convolution, and x and y those after convolution; c and c′ index the channels of the convolutional layer; w is the convolution kernel weight matrix; and b is the bias. The activation function σ is ReLU, and zero padding is used: since the image shrinks after convolution, padding the border of the original image with pixels initialized to 0 preserves the scale of the convolved image to some extent. The formula expresses an $n_s$-layer operation on the low-resolution copy of the image; each convolutional layer comprises the convolution of its kernels with the image and the feeding of the result into the activation function, which yields the feature maps of the low-resolution image.
The image is actually reduced by a factor of $2^{n_s}$ ($n_s$ being the maximum value of the convolutional layer index i above). $n_s$ has two roles: first, it drives the learning from the low-resolution input and of the affine coefficients in the final grid (the larger $n_s$, the coarser the grid); second, it controls the complexity of the prediction, since deeper layers yield more complex and more abstract features. Here $n_s=4$ and the kernel size is 3×3.
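A rough PyTorch sketch of this low-resolution stream, assuming $n_s=4$ stride-2 3×3 convolutions on a 256×256 input; the channel widths are illustrative:

```python
import torch
import torch.nn as nn

class LowResStream(nn.Module):
    """n_s = 4 stride-2 conv layers: 256x256 input -> 16x16 feature map (256 / 2**4)."""
    def __init__(self, in_ch=3, widths=(8, 16, 32, 64)):
        super().__init__()
        layers, prev = [], in_ch
        for w in widths:  # each layer halves the spatial resolution
            layers += [nn.Conv2d(prev, w, kernel_size=3, stride=2, padding=1),
                       nn.ReLU(inplace=True)]
            prev = w
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

lowres = LowResStream()
feat = lowres(torch.randn(1, 3, 256, 256))
print(feat.shape)  # torch.Size([1, 64, 16, 16])
```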
4.2) The low-resolution stream is split into a local path and a global path. The local path uses fully convolutional layers to learn local features of the image data, while the global path uses convolutional and fully connected layers to learn the global features of the image; the outputs of the two paths are then fused into a common set of fused features.
Local features: the low-resolution features are processed further by the local path $L_i$; that is, the $n_s$-th feature map $S_{n_s}$ obtained from formula (6) is passed through $n_L=2$ further convolutional layers to extract features. Here stride = 1, so the resolution of this part no longer changes; the number of channels also stays the same. Together with the convolutions used in step 4.1), there are $n_s+n_L$ layers in total.
Global features: the global path develops the features of the low-resolution feature map further; this part is denoted $G_i$, with $n_G=5$ layers. The $n_s$-th feature map $S_{n_s}$ obtained in step 4.1) is passed through two convolutional layers and three fully connected layers to extract global features. The global information carried by the global features can serve as a prior for local feature extraction; without a global feature describing a high-dimensional representation of the image information, the network may produce erroneous local features.
A pointwise affine transformation is used to fuse the global and local features, i.e., the local feature map $L_{n_L}$ and the global feature vector $G_{n_G}$ obtained above are combined by an affine addition and activated with the ReLU function. The calculation is as follows, where F denotes the fused feature map:

$$F^{c}[x,y]=\sigma\!\Big(b^{c}+\sum_{c'}w'^{\,cc'}G_{n_G}^{c'}+\sum_{c'}w^{cc'}L_{n_L}^{c'}[x,y]\Big).\qquad(7)$$
This yields a 16×16×64 feature array; feeding it into a 1×1 convolutional layer produces a 16×16 feature map with 96 output channels, computed as

$$A_{c}[x,y]=b_{c}+\sum_{c'}F_{c'}[x,y]\,w_{cc'}.\qquad(8)$$
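A rough PyTorch sketch of the fusion of formulas (7) and (8), assuming a 64-channel 16×16 local map, a 64-dimensional global vector, and 96 output channels; all widths are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FuseLocalGlobal(nn.Module):
    """Formula (7): pointwise affine mix of local map and broadcast global vector,
    then formula (8): a 1x1 conv producing the 96 affine-coefficient channels."""
    def __init__(self, ch=64, out_ch=96):
        super().__init__()
        self.w_local = nn.Conv2d(ch, ch, kernel_size=1, bias=False)
        self.w_global = nn.Linear(ch, ch, bias=True)   # bias plays the role of b^c
        self.pointwise = nn.Conv2d(ch, out_ch, kernel_size=1)

    def forward(self, local_map, global_vec):
        g = self.w_global(global_vec)[:, :, None, None]  # broadcast over x and y
        fused = F.relu(self.w_local(local_map) + g)      # formula (7)
        return self.pointwise(fused)                     # formula (8)

fuse = FuseLocalGlobal()
A = fuse(torch.randn(1, 64, 16, 16), torch.randn(1, 64))
print(A.shape)  # torch.Size([1, 96, 16, 16])
```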
4.3) Treat the fused features as a bilateral grid unrolled along the third dimension and output a bilateral grid of affine coefficients.
Treating the fused features as a bilateral grid whose third dimension has been unrolled, the channels are reinterpreted as

$$A_{c}[x,y,z]=A_{12\,z+c}[x,y],\qquad z=0,\dots,d_{c}-1,\qquad(9)$$

where $d_c=8$ is the depth of the grid. Through this conversion, A can be viewed as a 16×16×8 bilateral grid, each of whose cells holds a 3×4 affine color transformation matrix. The conversion means that the preceding feature extraction and operations take place in the bilateral domain: convolutions performed in the x and y dimensions learn features in which the z and c dimensions blend into each other. The earlier feature extraction is therefore more expressive than convolving inside a bilateral grid with 3D convolutions, which can only relate the z dimension; at the same time it is more efficient than a general bilateral grid, since only the discretization along the c dimension needs attention. In short, by using 2D convolutions and treating the last layer as a bilateral grid, the network can decide the optimal way to convert from 2D to 3D.
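Concretely, the unrolling of formula (9) amounts to a reshape of the 96 channels into depth × coefficients, as in this sketch; the 12 = 3×4 per-cell layout follows the text above:

```python
import torch

A_flat = torch.randn(1, 96, 16, 16)   # output of the pointwise layer, formula (8)
depth, coeffs = 8, 12                 # d_c = 8 grid depth, 12 = 3x4 affine entries

# Formula (9): channel index 12*z + c  ->  grid cell depth z with coefficient c.
A_grid = A_flat.view(1, depth, coeffs, 16, 16)   # (batch, z, c, y, x)
print(A_grid.shape)                              # torch.Size([1, 8, 12, 16, 16])

# Each grid cell holds a 3x4 affine color transform.
cell = A_grid[0, 0, :, 0, 0].view(3, 4)
print(cell.shape)                                # torch.Size([3, 4])
```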
4.4) Upsample the bilateral grid of affine coefficients through a single-channel guidance map.
The output of the previous step is transferred to the high-resolution space of the input by "upsampling" it through a single-channel guidance map. The upsampling of A based on the guidance map g is a trilinear interpolation of the coefficients of A at positions determined by g:

$$\bar{A}_{c}[x,y]=\sum_{i,j,k}\tau\!\left(s_{x}x-i\right)\,\tau\!\left(s_{y}y-j\right)\,\tau\!\left(d_{c}\,g[x,y]-k\right)A_{c}[i,j,k],\qquad(10)$$

where $A_c[i,j,k]$ are the bilateral grid coefficients obtained from the low-resolution image, with i, j, and k indexing its three dimensions, and $\bar{A}_c[x,y]$ are the coefficients in the high-resolution space obtained after upsampling. $\tau(\cdot)=\max(1-|\cdot|,0)$ performs the linear interpolation, and $s_x$ and $s_y$ are the width and height ratios of the grid relative to the full-resolution original image. In particular, each pixel is assigned a coefficient vector (the coefficients of the affine transform above) whose depth in the grid is determined by the image gray value g[x,y], i.e., $A_c[x,y,g[x,y]]$: the grid is interpolated using the guidance map, and after interpolation the depth assigned to each pixel follows the guidance map value at that pixel. Slicing is implemented with the OpenGL library; this operation makes the edges of the output image follow the edges of the input image, achieving the edge-preserving effect.
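A plain numpy sketch of the slicing of formula (10), written as explicit loops for clarity rather than the OpenGL implementation; the grid and image sizes are illustrative:

```python
import numpy as np

def tau(x):
    """Linear interpolation kernel: max(1 - |x|, 0)."""
    return max(1.0 - abs(x), 0.0)

def slice_grid(A, g):
    """A: (gh, gw, gd, nc) bilateral grid; g: (H, W) guidance map in [0, 1].
    Returns per-pixel coefficients of shape (H, W, nc), as in formula (10)."""
    gh, gw, gd, nc = A.shape
    H, W = g.shape
    sy, sx = gh / H, gw / W          # grid-to-image ratios s_y, s_x
    out = np.zeros((H, W, nc))
    for y in range(H):
        for x in range(W):
            gy, gx, gz = sy * y, sx * x, gd * g[y, x]
            for j in range(gh):
                wy = tau(gy - j)
                if wy == 0.0:
                    continue
                for i in range(gw):
                    wx = tau(gx - i)
                    if wx == 0.0:
                        continue
                    for k in range(gd):
                        w = wy * wx * tau(gz - k)
                        if w:
                            out[y, x] += w * A[j, i, k]
    return out

coeffs = slice_grid(np.random.rand(16, 16, 8, 12), np.random.rand(64, 64))
print(coeffs.shape)  # (64, 64, 12)
```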
4.5) Apply the affine transforms to the fused features and output at full resolution.
For the input image I, its features are extracted; their purposes are, first, to obtain the guidance map and, second, to serve as the regression input for the full-resolution local affine model obtained above.
The guidance map is obtained by applying per-channel operations to the three channels of the original image and summing the results:

$$g[x,y]=b+\sum_{c=0}^{2}\rho_{c}\!\left(M_{c}^{T}\,I[x,y]+b'_{c}\right),\qquad(11)$$
where the $M_c^T$ are the rows of a 3×3 color conversion matrix, and b and b′ are biases. $\rho_c$ is a piecewise-linear transfer function with thresholds $t_{c,i}$ and slopes $a_{c,i}$, built from 16 ReLU activation units and computed as

$$\rho_{c}(x)=\sum_{i=0}^{15}a_{c,i}\max\!\left(x-t_{c,i},\,0\right).\qquad(12)$$
The parameters M, a, t, b, and b′ are all obtained through learning.
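For illustration, the piecewise-linear transfer of formula (12) in numpy, with random stand-ins for the learned slopes a and thresholds t:

```python
import numpy as np

def rho(x, a, t):
    """Formula (12): sum of 16 shifted ReLUs with slopes a and thresholds t."""
    # x: array of intensities; a, t: shape-(16,) parameter vectors.
    return np.sum(a[:, None] * np.maximum(x[None, :] - t[:, None], 0.0), axis=0)

a = np.random.randn(16) * 0.1       # stand-in for the learned slopes a_{c,i}
t = np.linspace(0.0, 1.0, 16)       # stand-in thresholds t_{c,i} spread over [0, 1]
x = np.linspace(0.0, 1.0, 5)
print(rho(x, a, t))
```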
Applying to the original image I (identical here to the full-resolution input) the coefficient matrix $\bar{A}$ obtained in the process above, the final output O is computed; it can be regarded as the result of a per-pixel affine transform of the input:

$$O_{c}[x,y]=\bar{A}_{c,3}[x,y]+\sum_{c'=0}^{2}\bar{A}_{c,c'}[x,y]\,I_{c'}[x,y].\qquad(13)$$
5) Neural network training: feature extraction and network training are performed on the outer envelope images from the second preprocessing step, producing the trained detection model. The specific steps include:
5.1) Pre-training

YOLOv2 splits pre-training into two stages: the network is first trained from scratch with 224×224 input for about 160 epochs (all training data cycled 160 times); the input is then enlarged to 448×448 and the network is trained for 10 more epochs.
5.2) Feature extraction

The training structure adopted by the present invention uses MobileNet for feature extraction. The core idea of MobileNet is to factor the standard convolutional layer into two layers: a depthwise (per-channel) convolution and a pointwise (single-pixel) convolution. The depthwise convolution produces M feature maps with M kernels, and the pointwise convolution linearly combines those feature maps.
The computation of a MobileNet convolutional layer has two steps:

Depthwise convolution. Each input channel is convolved with its own $D_K\times D_K\times 1$ kernel; M kernels are used in total, producing M feature maps of size $D_F\times D_F\times 1$. These feature maps come from different input channels and are independent of one another.

Pointwise convolution. The M-channel input from the previous step is convolved with N standard 1×1×M kernels, producing a $D_F\times D_F\times N$ output.
Compared with a standard convolutional layer, the MobileNet convolution saves roughly a factor of 8 to 9 in computation, effectively reducing the parameter count and computational load of the YOLO algorithm and further ensuring the real-time behavior of the early warning function.
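A brief PyTorch sketch of one such depthwise separable block; channel counts are illustrative, and MobileNet additionally inserts batch normalization, omitted here:

```python
import torch
import torch.nn as nn

class DepthwiseSeparable(nn.Module):
    """Standard 3x3 conv factored into depthwise (groups=in_ch) + pointwise 1x1."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)  # one 3x3x1 kernel per channel
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)  # 1x1xM mixing
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.pointwise(self.relu(self.depthwise(x))))

block = DepthwiseSeparable(32, 64)
y = block(torch.randn(1, 32, 112, 112))
print(y.shape)  # torch.Size([1, 64, 112, 112])
```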
5.3) Bounding box prediction

YOLOv2's "anchor boxes" are obtained by clustering: statistics are gathered over the training samples, and the most frequent shapes are taken as the anchor boxes. Because the data come from the training samples, predictions made on this basis in every grid cell essentially cover the most likely cases, so the recall is relatively high. YOLOv2 predicts "bounding boxes" through the "anchor boxes".
YOLOv2 performs target detection by dividing the image into grid cells; each cell is responsible for detecting part of the picture, and each cell includes 5 "anchor boxes". For each anchor box, YOLOv2 predicts four coordinate values $(t_x, t_y, t_w, t_h)$; given the offset $(c_x, c_y)$ of the cell from the top-left corner of the image and the width $p_w$ and height $p_h$ of the prior bounding box, the equations are

$$b_{x}=\sigma(t_{x})+c_{x},$$
$$b_{y}=\sigma(t_{y})+c_{y},$$
$$b_{w}=p_{w}\,e^{t_{w}},$$
$$b_{h}=p_{h}\,e^{t_{h}}.$$
For each "bounding box", YOLOv2 predicts an objectness score through logistic regression: the value is 1 if the predicted box overlaps the ground-truth box more than all other predictions do. If the overlap does not reach a threshold (0.5 by default in YOLOv2), the predicted "bounding box" is ignored, i.e., it contributes no loss.
5.4) Classification

The vector output by the YOLOv2 neural network has size 13×13×30, where 13×13 divides the picture into 13 rows and 13 columns, 169 cells in total, and 30 is the number of values per cell. The 30 values of each cell decompose as 30=5×(5+1): each cell includes 5 "anchor boxes", and each "anchor box" carries 6 values: the object-presence confidence, the object center position (x, y), the object size (w, h), and the class information.
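A hedged numpy sketch of decoding such a 13×13×30 output with the box equations of 5.3; the anchor sizes are illustrative stand-ins for the clustered priors:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative anchor (prior) sizes in grid units; real values come from clustering the training boxes.
ANCHORS = [(1.0, 1.5), (2.5, 3.0), (4.0, 5.5), (7.0, 8.0), (10.0, 11.0)]

def decode(output, conf_thresh=0.94):
    """output: (13, 13, 30) network tensor; returns boxes (bx, by, bw, bh, score) in grid units."""
    boxes = []
    preds = output.reshape(13, 13, 5, 6)  # 5 anchors x (tx, ty, tw, th, objectness, class)
    for cy in range(13):
        for cx in range(13):
            for a, (pw, ph) in enumerate(ANCHORS):
                tx, ty, tw, th, to, _cls = preds[cy, cx, a]
                score = sigmoid(to)
                if score < conf_thresh:
                    continue
                bx = sigmoid(tx) + cx       # b_x = sigma(t_x) + c_x
                by = sigmoid(ty) + cy       # b_y = sigma(t_y) + c_y
                bw = pw * np.exp(tw)        # b_w = p_w * e^{t_w}
                bh = ph * np.exp(th)        # b_h = p_h * e^{t_h}
                boxes.append((bx, by, bw, bh, score))
    return boxes

print(len(decode(np.random.randn(13, 13, 30))))
```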
6) Detection and early warning: dynamic images from live video of the prostate surgery scene are captured in real time; the dynamic images are converted to image data, passed through the first and second image preprocessing steps, and fed into the detection model. When the detection model detects an outer envelope feature target, alarm information is output.
6.1) Detection workflow

The detection and early warning workflow of the system is shown in Figure 2.
● The management machine reads the real-time video output of the endoscope equipment through a dedicated video adapter card;
● The real-time video is analyzed by the detection and early warning module, and the detection results are output as video; when an outer envelope target appears, a buzzer sounds to alert the doctor;
● The doctor watches the detection results in real time and quickly locates the lesion.
6.2) Detection results

Part of the detection results are shown in Figure 3. The detection and recognition frame rate reaches 30 fps, and the average recognition accuracy reaches 90%.
6.3) System configuration requirements

The management machine requires at least Windows 7 or Ubuntu 16.04, a quad-core Intel Core i5 CPU, and 8 GB of memory; equipping it with a graphics processing unit (GPU) supporting deep learning algorithms or several Movidius neural compute sticks can further accelerate video processing.
The specific implementation examples described herein merely illustrate the spirit of the present invention. Those skilled in the art to which the present invention pertains can make various modifications or additions to the described examples or replace them in similar ways without departing from the spirit of the present invention or exceeding the scope defined by the appended claims.
References:
[1] Kadam D B, Gade S S, Uplane M D, et al. Neural network based brain tumor detection using MR images[J]. 2011, 2: 325-331.
[2] Othman M F, Basri M A M. Probabilistic neural network for brain tumor classification[C]// Second International Conference on Intelligent Systems, Modelling and Simulation. IEEE, 2011: 136-138.
[3] Selvam V S, Shenbagadevi S. Brain tumor detection using scalp EEG with modified Wavelet-ICA and multi layer feed forward neural network[C]// International Conference of the IEEE Engineering in Medicine & Biology Society. Conf Proc IEEE Eng Med Biol Soc, 2011: 6104.
[4] Du X, Li Y, Yao D. A support vector machine based algorithm for magnetic resonance image segmentation[C]// Fourth International Conference on Natural Computation. IEEE Computer Society, 2008: 49-53.
[5] Pujar J H, Gurjal P S, Shambhavi D S, et al. Medical image segmentation based on vigorous smoothing and edge detection ideology[J]. World Academy of Science, Engineering & Technology, 2010, 19(68): 444.
[6] Hota H S, Shukla S P, Gulhare K. Review of intelligent techniques applied for classification and preprocessing of medical image data[J]. International Journal of Computer Science Issues, 2013, 10(1).
[7] Vinod Kumar, Niranjan Khandelwal, et al. Classification of brain tumors using PCA-ANN. 978-1-4673-0126-8/11, IEEE, 2011.
[8] Rajini N H, Bhavani R. Classification of MRI brain images using k-nearest neighbor and artificial neural network[C]// International Conference on Recent Trends in Information Technology. IEEE, 2011: 563-568.
[9] Najafi S, Amirani M C, Sedghi Z. A new approach to MRI brain images classification[C]// Electrical Engineering. IEEE, 2011: 1-5.

Claims (10)

  1. A method for intelligent detection and early warning of the outer envelope in prostate surgery, characterized in that the method comprises the following steps:
    1) Data collection: collecting the outer envelope image data from prostate surgery video recordings;
    2) First image preprocessing: performing grayscale processing and singular value decomposition on the outer envelope data to extract an outer envelope image carrying the principal component eigenvalues;
    3) Second image preprocessing: enhancing the outer envelope image from the first preprocessing step by deep bilateral learning;
    4) Neural network training: performing feature extraction and network training on the outer envelope image from the second preprocessing step to produce a trained detection model;
    5) Detection and early warning: capturing dynamic images of the live prostate surgery video in real time, converting the dynamic images into image data that is passed through the first and second image preprocessing steps and fed into the detection model, and outputting alarm information when the detection model detects an outer envelope feature target.
  2. The method for intelligent detection and early warning of the outer envelope in prostate surgery according to claim 1, characterized in that a data augmentation step is further included before step 2).
  3. The method for intelligent detection and early warning of the outer envelope in prostate surgery according to claim 1, characterized in that step 4) is implemented on the basis of the YOLOv2 software platform and the MobileNet deep learning model.
  4. The method for intelligent detection and early warning of the outer envelope in prostate surgery according to claim 1, characterized in that the specific steps of step 3) comprise:
    3.1) converting the high-resolution input image into a low-resolution stream;
    3.2) splitting the low-resolution stream into a local path and a global path, the local path using fully convolutional layers to learn local features of the image data and the global path using convolutional and fully connected layers to learn global features of the image, and then fusing the outputs of the two paths into a common set of fused features;
    3.3) treating the fused features as a bilateral grid unrolled along the third dimension and outputting a bilateral grid of affine coefficients;
    3.4) upsampling the bilateral grid of affine coefficients through a single-channel guidance map;
    3.5) applying the affine transforms to the fused features and outputting at full resolution.
  5. The method for intelligent detection and early warning of the outer envelope in prostate surgery according to claim 2, characterized in that the specific steps of the data augmentation step are: importing the modules, instantiating a pipeline object, and specifying the directory containing the images to be processed; defining the data augmentation operations, including perspective transformation, angular deviation, shearing, elastic deformation, brightness, contrast, color, rotation, and cropping, and adding them to the pipeline; and calling the pipeline's sample function and specifying the total number of augmented samples.
  6. The method for intelligent detection and early warning of the outer envelope in prostate surgery according to claim 1, characterized in that the specific steps of step 4) comprise: 4.1) pre-training; 4.2) feature extraction; 4.3) bounding box prediction; 4.4) classification.
  7. A system for intelligent detection and early warning of the outer envelope in prostate surgery according to any one of claims 1 to 6, characterized by comprising an image acquisition module, an image processing module, and an image detection and early warning module; the image acquisition module is used to acquire and store image information and models; the image processing module is used to perform the first image preprocessing and the second image preprocessing on the acquired image data; the image detection and early warning module is used to perform network training on the processed images to produce a trained detection model, and then to feed images to be examined into the detection model to obtain detection and early warning results.
  8. The system for intelligent detection and early warning of the outer envelope in prostate surgery according to claim 7, characterized in that the image acquisition module comprises a digital video interface for connecting with the endoscope, an image data memory for storing real-time image data during surgery, and an image model memory for storing processed images and deep-learned models.
  9. The system for intelligent detection and early warning of the outer envelope in prostate surgery according to claim 7, characterized in that the image processing module comprises a data augmentation component, an image feature extraction component, and an image enhancement component.
  10. The system for intelligent detection and early warning of the outer envelope in prostate surgery according to claim 7, characterized in that the image detection and early warning module comprises an image depth training component and an image detection and early warning component.
PCT/CN2019/074084 2018-12-27 2019-01-31 Method and system for intelligent envelope detection and warning in prostate surgery WO2020133636A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811613042.7 2018-12-27
CN201811613042.7A CN109754007A (en) 2018-12-27 2018-12-27 Peplos intelligent measurement and method for early warning and system in operation on prostate

Publications (1)

Publication Number Publication Date
WO2020133636A1 true WO2020133636A1 (en) 2020-07-02

Family

ID=66404122

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/074084 WO2020133636A1 (en) 2018-12-27 2019-01-31 Method and system for intelligent envelope detection and warning in prostate surgery

Country Status (2)

Country Link
CN (1) CN109754007A (en)
WO (1) WO2020133636A1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110490232B (en) * 2019-07-18 2021-08-13 北京捷通华声科技股份有限公司 Method, device, equipment and medium for training character row direction prediction model
CN112545477B (en) * 2019-09-26 2022-07-15 北京赛迈特锐医疗科技有限公司 System and method for automatically generating mpMRI prostate cancer comprehensive evaluation report
CN112545476B (en) * 2019-09-26 2022-07-15 北京赛迈特锐医疗科技有限公司 System and method for detecting prostate cancer extracapsular invasion on mpMRI
CN112545481B (en) * 2019-09-26 2022-07-15 北京赛迈特锐医疗科技有限公司 System and method for automatically segmenting and localizing prostate cancer on mpMRI
CN111091559A (en) * 2019-12-17 2020-05-01 山东大学齐鲁医院 Depth learning-based auxiliary diagnosis system for small intestine sub-scope lymphoma
CN111583192B (en) * 2020-04-21 2023-09-26 天津大学 MRI image and deep learning breast cancer image processing method and early screening system
CN113538211A (en) * 2020-04-22 2021-10-22 华为技术有限公司 Image quality enhancement device and related method
CN111815613B (en) * 2020-07-17 2023-06-27 上海工程技术大学 Liver cirrhosis disease stage identification method based on envelope line morphological feature analysis
CN112734704B (en) * 2020-12-29 2023-05-16 上海索验智能科技有限公司 Skill training evaluation method under neural network machine learning recognition objective lens
CN114397929B (en) * 2022-01-18 2023-03-31 中山东菱威力电器有限公司 Intelligent toilet lid control system capable of improving initial temperature of flushing water
CN114145844B (en) * 2022-02-10 2022-06-10 北京数智元宇人工智能科技有限公司 Laparoscopic surgery artificial intelligence cloud auxiliary system based on deep learning algorithm

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106339591B (en) * 2016-08-25 2019-04-02 汤一平 A kind of self-service healthy cloud service system of prevention breast cancer based on depth convolutional neural networks
CN109087302A (en) * 2018-08-06 2018-12-25 北京大恒普信医疗技术有限公司 A kind of eye fundus image blood vessel segmentation method and apparatus

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104582622A (en) * 2012-04-16 2015-04-29 儿童国家医疗中心 Dual-mode stereo imaging system for tracking and control in surgical and interventional procedures
CN103976790A (en) * 2014-05-21 2014-08-13 周勇 Real-time evaluation and correction method in spine posterior approach operation
CN104899891A (en) * 2015-06-24 2015-09-09 重庆金山科技(集团)有限公司 Method and device for identifying gestational sac tissue, and uterine cavity suction device
US20170007778A1 (en) * 2015-07-07 2017-01-12 National Yang-Ming University Method of obtaining a classification boundary and automatic recognition method and system using the same
CN105389589A (en) * 2015-11-06 2016-03-09 北京航空航天大学 Random-forest-regression-based rib detection method of chest X-ray film
CN107705852A (en) * 2017-12-06 2018-02-16 北京华信佳音医疗科技发展有限责任公司 Real-time the lesion intelligent identification Method and device of a kind of medical electronic endoscope

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112231183A (en) * 2020-07-13 2021-01-15 国网宁夏电力有限公司电力科学研究院 Communication equipment alarm prediction method and device, electronic equipment and readable storage medium
CN111914937A (en) * 2020-08-05 2020-11-10 湖北工业大学 Lightweight improved target detection method and detection system
CN112669312A (en) * 2021-01-12 2021-04-16 中国计量大学 Chest radiography pneumonia detection method and system based on depth feature symmetric fusion
CN113408423A (en) * 2021-06-21 2021-09-17 西安工业大学 Aquatic product target real-time detection method suitable for TX2 embedded platform
CN113408423B (en) * 2021-06-21 2023-09-05 西安工业大学 Aquatic product target real-time detection method suitable for TX2 embedded platform
CN113627472A (en) * 2021-07-05 2021-11-09 南京邮电大学 Intelligent garden defoliating pest identification method based on layered deep learning model
CN113627472B (en) * 2021-07-05 2023-10-13 南京邮电大学 Intelligent garden leaf feeding pest identification method based on layered deep learning model

Also Published As

Publication number Publication date
CN109754007A (en) 2019-05-14

Similar Documents

Publication Publication Date Title
WO2020133636A1 (en) Method and system for intelligent envelope detection and warning in prostate surgery
Afza et al. A framework of human action recognition using length control features fusion and weighted entropy-variances based feature selection
US20210264599A1 (en) Deep learning based medical image detection method and related device
US10452899B2 (en) Unsupervised deep representation learning for fine-grained body part recognition
WO2020260936A1 (en) Medical image segmentation using an integrated edge guidance module and object segmentation network
CN106408001B (en) Area-of-interest rapid detection method based on depth core Hash
JP2022505498A (en) Image processing methods, devices, electronic devices and computer readable storage media
US20220189142A1 (en) Ai-based object classification method and apparatus, and medical imaging device and storage medium
CN109685768A (en) Lung neoplasm automatic testing method and system based on lung CT sequence
Keceli et al. Combining 2D and 3D deep models for action recognition with depth information
CN112750531A (en) Automatic inspection system, method, equipment and medium for traditional Chinese medicine
CN111916206B (en) CT image auxiliary diagnosis system based on cascade connection
Zhang et al. Deepgi: An automated approach for gastrointestinal tract segmentation in mri scans
Nie et al. Recent advances in diagnosis of skin lesions using dermoscopic images based on deep learning
Wang et al. Automatic measurement of fetal head circumference using a novel GCN-assisted deep convolutional network
Chatterjee et al. A survey on techniques used in medical imaging processing
Gu et al. AYOLOv5: Improved YOLOv5 based on attention mechanism for blood cell detection
Pavithra et al. An Overview of Convolutional Neural Network Architecture and Its Variants in Medical Diagnostics of Cancer and Covid-19
Xu et al. Application of artificial intelligence technology in medical imaging
CN114842238B (en) Identification method of embedded breast ultrasonic image
Qian et al. Multi-scale context UNet-like network with redesigned skip connections for medical image segmentation
Wang et al. Optic disc detection based on fully convolutional neural network and structured matrix decomposition
Pan et al. Preferential image segmentation using trees of shapes
CN112651363A (en) Micro-expression fitting method and system based on multiple characteristic points
Aburass Cubixel: A Novel Paradigm in Image Processing Using Three-Dimensional Pixel Representation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19905558

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19905558

Country of ref document: EP

Kind code of ref document: A1