CN112200795A

CN112200795A - Large intestine endoscope polyp detection method based on deep convolutional network

Info

Publication number: CN112200795A
Application number: CN202011143365.1A
Authority: CN
Inventors: 曹鱼; 王德纯; 刘本渊
Original assignee: Suzhou Huiwei Intelligent Medical Technology Co ltd
Current assignee: Suzhou Huiwei Intelligent Medical Technology Co ltd
Priority date: 2020-10-23
Filing date: 2020-10-23
Publication date: 2021-01-08

Abstract

The invention relates to a large intestine endoscope polyp detection method based on a deep convolutional network, which comprises the following steps: s1, constructing a convolutional neural network, pre-training the convolutional neural network in ImageNet through a backbone network, and training the convolutional neural network through a training image data set containing polyp types and position information; s2, preprocessing the color endoscope image; s3, inputting the preprocessed color endoscope image into a convolutional neural network, and extracting picture characteristic layer information; s4, feature enhancement and view field promotion are carried out on the picture feature layer information; s5, decoding the picture characteristic layer information into the type of polyp and the position information thereof, the method for detecting the polyp of the large intestine endoscope based on the deep convolutional network has convenient operation, can improve the reasoning speed and the sensitivity to the polyp image, thereby being capable of quickly positioning the polyp position in the endoscope image and avoiding the inconsistency of label setting caused by different rotation angles in the same polyp object under the endoscope image.

Description

Large intestine endoscope polyp detection method based on deep convolutional network

Technical Field

The invention belongs to the technical field of polyp detection technology, and particularly relates to a method for detecting polyps of a large intestine endoscope based on a deep convolutional network.

Background

Colon cancer belongs to high-incidence malignant tumors all over the world, 1.8 ten thousand cases of colon cancer and 881000 deaths are expected in 2018, intestinal cancer develops from benign polyp lesions, and therefore, early screening discovery is a very effective prevention means; for example, benign adenomatous polyps and hyperplastic polyps can be effectively prevented from developing malignant tumors by using an endoscope for removal.

In the article, "fans in the mistake Rate of Polyps in a Back-To-top-Back Colonoscopy Study" published by Leufkens et al in 2012, 25% of adenomatous Polyps in the experiment are missed by doctors, and in the actual operation, many Factors cause the missed detection of Polyps, which can be summarized as: polyp undersize, polyp is covered by the foreign matter, it is too fast to retreat the mirror, polyp does not appear in the camera lens, polyp appears in the camera lens but is neglected by the doctor, although some factors can reduce the miss detection rate through intestinal tract preparation before the operation, promote the equipment performance and promote doctor's operation, because endoscope operation relies on doctor's vision very much to look for the colon polyp that the characteristic is not especially obvious in the intestinal tract, because the missing detection that artificial factors such as endoscope equipment operator fatigue, experience and sensitivity cause is very difficult to avoid, therefore automatic polyp detects and is used for helping the doctor to reduce the miss detection rate very effective tool.

Ameling et al in Texture-based polyp detection in colonoscopy uses a method of distinguishing normal intestinal lining from polyps based on the Texture, color and shape of the polyp surface, but this method can cause a large amount of false positives in real environments, such as some normal tissues, bumps, spots on the lens, with obvious vascular regions and intestinal foreign bodies.

Regarding the Detection of colon cancer pictures by using the existing techniques such as deep learning, Mo et al use fast-RCNN for polyp Detection in "Polyps Detection in Endoscopic video Based on fast R-CNN", a dual-stage anchor frame-Based object detector, which is different from the traditional image processing method, deep learning calls a large-scale convolutional neural network CNN to extract image feature information, and uses more features to distinguish easily confused morphological features of Polyps from morphological features of normal intestinal walls, but because the dual-stage architecture is used, the first stage generates more candidate frames, which does not meet the requirement of real-time Detection.

Therefore, the two existing detection methods have some defects and cannot meet the actual use requirements.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provide a method for detecting polyps of a large intestine endoscope based on a deep convolutional network, which has high detection method precision, low false alarm rate and high detection speed.

In order to achieve the purpose, the invention adopts the technical scheme that: a large intestine endoscope polyp detection method based on a deep convolutional network comprises the following steps:

s1: constructing a convolutional neural network, wherein the convolutional neural network is pre-trained in ImageNet by a backbone network and then trained by a training image data set containing polyp categories and position information;

s2: preprocessing a color endoscope image;

s3: inputting the preprocessed color endoscope image into a convolutional neural network and extracting picture characteristic layer information;

s4: carrying out feature enhancement and visual field domain promotion on the image feature layer information;

s5: the picture feature layer information is decoded into the category of polyps and its location information.

Further, the convolutional neural network comprises sixteen layers of convolution, and each layer of network updates the weight parameters thereof through back propagation;

the convolutional neural network is divided into different stages with the step length of 4,8,16,32 and 64 by taking max pool as a boundary;

further, the step of preprocessing the color endoscope image is as follows:

cutting a black edge of the color endoscope picture;

the image size is adjusted to a uniform size and pixel mean normalization is performed.

Further, in S3, the preprocessed color endoscope image is input into a convolutional neural network for convolution operation, so as to obtain image feature layer information with sizes of 80x80,40x40,20x20,10x10, and 5x5, which respectively correspond to outputs with step sizes of 4,8,16,32, and 64 in the convolutional neural network.

Further, the step in S4 is as follows:

a, performing feature enhancement on the preprocessed picture feature layer information;

and b, inputting the image characteristic layer information subjected to characteristic enhancement into a visual field lifting module to obtain the image characteristic layer information subjected to visual field lifting.

Further, the processing method of the feature enhancement is as follows:

extracting k intermediate layers from a convolutional neural network, wherein k is 5;

secondly, when feature fusion is carried out on the feature layer Si, wherein i is 0-k-1, firstly, smoothing and dimensionality reduction operation is carried out on the feature layer Si +1 by using a convolution layer with a convolution kernel of 1x 1;

thirdly, performing upsampling by using the transposed convolution with the convolution kernel of 2x2 and the step length of 2, and adding the result and the characteristic layer Si to obtain an enhanced characteristic Ei, wherein the formula is as follows:

E_i＝S_i+deconv(smooth(S_i+1))；

wherein, deconv () is a transposed convolution and smooth () is a convolution operation of 1x 1;

further, the horizon promoting module comprises three branches, each branch processes 1/3 channels respectively, and comprises 1,2,3 sparse convolutions with 3 convolution kernels, wherein 3 × 3 step length is 1 sparse convolution and 3 sparse convolution; and after the picture characteristic layer information enters the visual field lifting module, outputting the convolution result of each branch in the cascade connection of the channel level.

Further, the decoding process in S5 is as follows:

for a bounding box regression feature layer, the output of a feature point x, y is predicted to be

Wherein the content of the first and second substances,

decoded to obtain the predicted object position c_x，c_y，w，h]Wherein c is_x，c_yThe value x, y, w, h of the center point of the object frame is the width and height of the object frame, and then the classification and position information [ s, cx, xy, w, h ] of the polyp is obtained by combining the classification information of the object]；

Where s is the object class.

Further, the training method of the convolutional neural network is as follows:

firstly, pre-training is carried out on ImageNet by a backbone network, and then training is carried out by a training image data set containing polyp types and position information to obtain a convolutional neural network;

preprocessing a color endoscope image;

inputting the preprocessed color endoscope image into a convolution neural network to extract image characteristic layer information;

fourthly, performing feature enhancement and visual field promotion on the image feature information extracted by the convolutional neural network;

giving different labels to the feature points of different feature layers according to the size of the target polyp;

suppose a polyp target bounding box b ═ c_x，c_y，w，h),(c_x，c_y) Is the center point of the bounding box, and (w, h) is width and height; it is allocated to the scaling step size s_lCharacteristic layer K of_lThe feature layer contains m²A feature point, where m is the feature layer side length, set

Representing feature points in an original drawingIn which x_i＝s_l×i，y_j＝s_l×j；

The positive sample region is

The non-positive sample is

Wherein

Area size according to epsilon_p＝0.75，ε_nScaling is performed as 1.25;

when the feature point is in the area of the positive sample, determining as the positive sample; when the feature point is between the regions of the positive sample and the non-positive sample, determining to ignore the sample; when the feature point is outside the region of the non-positive sample, determining as a negative sample;

sixthly, polyp target distribution of the characteristic points and polyp target screening of the current layer are carried out;

when positive and negative samples are judged, assuming that an object b belongs to a feature layer i, i belongs to [0, k-1], and d is a value of the distance scaling size between the current feature layer and the optimal feature layer, wherein the value is as follows:

δ¹＝max(L_cosin(λ，d，k)，0)；

the new non-positive samples for the current layer are therefore:

wherein, delta^lThe calculated scaling non-positive sample proportion;

and for different polyp targets, different distance heat force values can be generated for different feature layers to assist the determination of the polyp target at the current position by the feature points, and the specific formula is as follows:

wherein:

min (x, y) is the minimum value of x and y, and max (x, y) is the maximum value of x and y in the same way;

wherein left, right, top and bottom are respectively expressed as the distance from the feature point to the left frame, the right frame, the upper frame and the lower frame of the sample frame;

using binary cross entry as a loss function in the training phase of the generated thermodynamic diagram, and performing point multiplication on the generated thermodynamic diagram and the object class predicted value obtained in the step 8;

in the training stage, different weights are generated by using a formula when classification loss functions are calculated for different characteristic layers for different polyp targets; different distances are given different weights in calculating the loss by a Gaussian distribution formula

The concrete formula is as follows:

wherein alpha is 1, c^bIs the position of the target object，c^pIs the position of the characteristic point;

ninthly, calculating a bounding box regression loss L by using a smooth L1 loss function as a fitting loss function for the position sizes of the feature points in the training phase_locAnd encoding the feature points of the bounding box by using a function:

wherein the content of the first and second substances,

as a regression target, β is 0.45;

the r training phase uses the following loss function for the classification of feature points:

L_F(p_t)＝-α_t(1-p_t)^γlog(p_t)；

wherein L is_F(pt) is focal length. L is_CESorting the loss function for cross entry, L_clsAs a final classification loss function

Due to the application of the technical scheme, compared with the prior art, the invention has the following advantages:

the method for detecting the polyp of the large intestine endoscope based on the deep convolutional network has the advantages that the operation is convenient, the reasoning speed and the sensitivity to the polyp image can be improved, the polyp position can be rapidly positioned in the endoscope image, the inconsistency of label setting caused by different rotation angles in the same polyp object under the endoscope image is avoided, the actual use requirements are met, and the method has high practicability and popularization value.

Drawings

The technical scheme of the invention is further explained by combining the accompanying drawings as follows:

FIG. 1 is a block diagram of a first embodiment of the present invention;

FIG. 2 is a schematic diagram of the process of using the present invention;

FIG. 3 is a diagram illustrating the practical use of the method for determining positive and negative samples in the present invention;

FIG. 4 is a diagram of the actual detection effect of the present invention;

FIG. 5 is a diagram of another practical test result of the present invention;

FIG. 6 is a graph of the recognition effect of the present invention;

wherein: polyp 1, first solid frame 11, second solid frame 13.

Detailed Description

The invention is described in further detail below with reference to the figures and the embodiments.

Referring to fig. 1, the method for detecting polyps of a large intestine endoscope based on a deep convolutional network according to the present invention includes the following steps: s1: constructing a convolutional neural network, wherein the convolutional neural network is pre-trained in ImageNet by a backbone network and then trained by a training image data set containing polyp categories and position information; s2: preprocessing a color endoscope image; s3: inputting the preprocessed color endoscope image into a convolutional neural network and extracting picture characteristic layer information; s4: carrying out feature enhancement and visual field domain promotion on the image feature layer information; s5: the picture feature layer information in S4 is decoded into the category of polyp and its position information.

Referring to fig. 2, at S1, the backbone network includes but is not limited to: alexnet, VGGnet, Shufflenet, densenert, Squeezenet, ResNet; in the first embodiment, a convolutional neural network is constructed by using the pretrained VGG16 based on ImageNet, and a new network layer and module are additionally added on the convolutional neural network; the VGG16 convolutional neural network comprises 16 layers of convolution, each layer of which updates its weight parameters by back-propagation. The convolutional neural network is divided into different stages with step sizes of 4,8,16,32,64, bounded by max pool.

As the network deepens and the size of the feature map is reduced by half after each pooling, the number of channels is doubled continuously.

Moreover, the RELU activation function is used for each layer of convolution result, and the problem of gradient disappearance of the deep neural network can be effectively overcome and the overfitting problem can be relieved through the linearity of the segmented excitation function.

In S2, before extracting the image feature information, preprocessing the original color endoscopic image, including: and cutting the black edge of the input picture, adjusting the image size to be uniform in size and carrying out pixel mean value normalization.

This is because, in practical application, due to different endoscope devices, the output image resolution is also different, and in order to facilitate subsequent feature extraction, the present invention first scales the color endoscopic image to the image resolution of 320 × 320, and then performs pixel mean normalization on the image, so that each dimension of the data has zero mean and unit variance.

In S3, inputting the preprocessed color endoscope image into a convolutional neural network, and extracting picture characteristic layer information; inputting the preprocessed pictures into a convolutional neural network for convolution operation, and obtaining the outputs of picture feature layer information with the sizes of 80x80,40x40,20x20,10x10 and 5x5 corresponding to backbone networks Conv3_3, Conv4_3, Conv5_3, Conv6_2 and Conv7_2 respectively.

In S4, in order to increase the detection rate of small objects, feature enhancement is performed using the backbone networks Conv3_3, Conv4_3, Conv5_3, Conv6_2, and Conv7_ 2.

In order to solve the problems that after deep layers are subjected to repeated pooling, semantic information is increased but is insensitive to small objects, shallow layer semantic information is few, and accuracy is low, the method uses a top-down method of a characteristic pyramid structure to enhance the semantic information of a shallow layer characteristic layer, and accuracy of the small objects is improved.

The method comprises the following specific steps:

in order to extract k intermediate layers from the convolutional neural network, k is 5 in this embodiment; when the value [0, k-1] of i is taken as the characteristic layer Si, the characteristic layer Si +1 is subjected to smoothing and dimensionality reduction operation by using a convolution layer with a convolution kernel of 1x1, then the transposed convolution with the convolution kernel of 2x2 and the step length of 2 is used for up-sampling, and the result is added with the characteristic layer Si to obtain an enhanced characteristic Ei;

the formula is as follows: e_i＝S_i+deconv(smooth(S_i+1))

Where deconv () is a transposed convolution and smooth () is a convolution operation of 1x 1.

Next, in order to further improve the view field under the condition of saving computing resources, the feature-enhanced Ei is input to a view field improving module, please refer to fig. 2, which includes three branches, each branch processes 1/3 channels respectively and includes 1,2,3 sparse convolutions with 3 convolution kernels of 3 × 3 step length 1 and sparse convolution of 3, and the output of the module is the cascade of convolution results of each branch at the channel level.

In operation, by using two or three 3x3 convolutions, the effect can reach a 5x5 or 7x7 convolution visual field and use less computing resources, and the feature layer after the step contains more visual field, so that semantic information is increased, and the accuracy and the recall rate of small objects are improved.

In S5, decoding the extracted picture feature layer information: and decoding the digital characteristic information extracted by the network into the polyp category and the polyp position information, and updating the network weight by using a loss function in cooperation with back propagation in the training process of the convolutional neural network.

Wherein the content of the first and second substances,

decoded to obtain the predicted object position c_x，c_y，w，h]，c_x，c_yThe values of x and y are the central points of the object frame, and w and h are the width and height of the object frame.

In addition, when the convolutional neural network is actually tested and used, the convolutional neural network needs to be trained for many times, so that the convolutional neural network is fixed, and the test accuracy of the convolutional neural network meets the actual use requirement.

In this embodiment one, the training method of the convolutional neural network is as follows:

(ii) constructing a convolutional neural network based on ImageNet's pre-trained VGG 16.

And secondly, preprocessing the color endoscope image.

Inputting the preprocessed color endoscope image into a convolution neural network to extract the characteristic layer information of the image.

And fourthly, performing feature enhancement and visual field promotion on the image feature information extracted by the convolutional neural network.

The training stage gives different labels to the feature points of different feature layers according to the size of the target polyp, the labels are used for the subsequent calculation of function calculation and regression, and the method is different from the traditional method of IoU (overlapping rate) calculation based on an anchor frame, and the method uses a more concise means to calculate the labels.

We propose a method for determining positive and negative samples based on the region of the target bounding box, which saves the extra hyper-parameters and computational load due to the anchor box, and to represent this process more intuitively, we assume a polyp target bounding box b ═ (c)_x，c_y，w，h)，(c_x，c_y) Is the center point of the bounding box, and (w, h) is width and height; allocate it to scale to s by size_lCharacteristic layer K of_lThe feature layer contains m²A feature point, where m is the feature layer side length, set

Indicating the position of the feature point in the original image, where x_i＝s_l×i，y_j＝s_l×j。

The positive sample region is

The non-positive sample is

Wherein

Area size according to epsilon_p＝0.75，ε_nScaling is performed as 1.25.

When the feature point is in the area of the positive sample, determining as the positive sample; when the feature point is between the regions of the positive sample and the non-positive sample, determining to ignore the sample; when the feature point is outside the region of the non-positive sample, determining as a negative sample; a practical example is shown in fig. 3, where the second solid frame is a non-positive sample region, the first solid frame is a positive sample region, and the squares are feature points.

And sixthly, fitting polyp targets with different sizes for the feature layers with different scaling in a training stage, wherein the training stage comprises polyp target distribution of feature points and polyp target screening of the current layer.

In determining positive and negative samples, different polyp object sizes are assigned to different feature layers, but feature points of some adjacent feature layers, especially those located at the center of the object, have sufficient semantic information to predict the object position.

In order to use the characteristic points, a scaling formula is designed, so that the non-positive sample region belonging to a certain layer can be scaled to different characteristic layers according to the distance size, and the recall rate is improved.

Suppose that object b belongs to a feature layer i, i ∈ [0, k-1]]And d ═ i-l | is the scaling value of the distance between the current characteristic layer l and the optimal characteristic layer: delta¹＝max(L_cosin(λ，d，k)，0)；

The new non-positive samples for the current layer are therefore:

wherein, delta^lThe scaled non-positive sample proportion is calculated.

The training stage can generate different distance heat force values for different polyp targets for different feature layers to assist the feature points in judging the polyp target at the current position, and the specific formula is as follows:

wherein:

min (x, y) is the minimum value of x and y, and max (x, y) is the maximum value of x and y in the same way.

the generated thermodynamic diagram uses binary cross entry as a loss function in a training stage, and the prediction stage and the object class prediction value obtained in the step 8 are subjected to point multiplication.

And in the training stage, different weights are generated by using a formula when the classification loss function is calculated for different characteristic layers for different polyp targets.

When the characteristic point is closer to the central point of the target object boundary frame, the loss of the characteristic point is considered to be more important to a detection result in prediction, and therefore a Gaussian distribution formula is designed to give different weights to different distances in loss calculation

The concrete formula is as follows:

wherein alpha is 1, c^bIs the target object position, c^pIs the position of the feature point.

wherein the content of the first and second substances,

for regression objective, β is 0.45.

L_F(p_t)＝-α_t(1-p_t)^γlog(p_t)；

wherein L is_F(p_t) Is focal length. L is_CEIs cross entry scoreA class loss function; l is_clsIs the final classification loss function.

In practical use, the detection effect graphs of fig. 4 and 5 can accurately locate the position of polyp, fig. 6 is a graph of the recognition effect of the present invention, and it can be seen that when the recognition accuracy of the polyp image recognition system of the present invention is 89.8%, the recall rate is 83.87%.

The above is only a specific application example of the present invention, and the protection scope of the present invention is not limited in any way. All the technical solutions formed by equivalent transformation or equivalent replacement fall within the protection scope of the present invention.

Claims

1. A large intestine endoscope polyp detection method based on a deep convolutional network is characterized by comprising the following steps:

s2: preprocessing a color endoscope image;

2. The deep convolutional network-based large intestine endoscope polyp detection method according to claim 1, characterized in that: the convolutional neural network comprises sixteen layers of convolution, and each layer of network updates the weight parameters thereof through back propagation;

the convolutional neural network is divided into different stages with step sizes of 4,8,16,32,64, bounded by max pool.

3. The deep convolutional network-based large intestine endoscope polyp detection method as claimed in claim 1, wherein the step of preprocessing the color endoscope image is as follows:

cutting a black edge of the color endoscope picture;

4. The deep convolutional network-based large intestine endoscope polyp detection method according to claim 1, characterized in that: in S3, the preprocessed color endoscope image is input into a convolutional neural network for convolution operation, so as to obtain image feature layer information with sizes of 80x80,40x40,20x20,10x10, and 5x5, which respectively correspond to outputs with step sizes of 4,8,16,32, and 64 in the convolutional neural network.

5. The deep convolutional network-based large intestine endoscope polyp detection method as claimed in claim 4, wherein the step in S4 is as follows:

b: and inputting the image feature layer information subjected to feature enhancement into a visual field lifting module to obtain the image feature layer information subjected to visual field lifting.

6. The deep convolutional network-based large intestine endoscope polyp detection method as claimed in claim 5, wherein said feature enhancement processing method is as follows:

E_i＝S_i+deconv(smooth(S_i+1))；

7. The deep convolutional network-based large intestine endoscope polyp detection method according to claim 6, characterized in that: the horizon lifting module comprises three branches, each branch is used for processing the channel number of 1/3 and comprises 1,2 and 3 sparse convolutions with convolution kernels of 3x3 and step length of 1 and sparse convolution of 3; and after the picture characteristic layer information enters the visual field lifting module, outputting the convolution result of each branch in the cascade connection of the channel level.

8. The deep convolutional network-based large intestine endoscope polyp detection method as claimed in claim 1, wherein the decoding process in S5 is as follows:

Wherein the content of the first and second substances,

decoded to obtain the predicted object position c_x，c_y，w，h]Wherein c is_x，c_yThe central point x, y value of the object frame, w, h are the width and height of the object frame, and then the object class information is combinedDeriving polyp classification and location information [ s, cx, xy, w, h]；

Where s is the object class.

9. The deep convolutional network based large intestine endoscope polyp detection method as claimed in claim 7, wherein the training method of the convolutional neural network is as follows:

preprocessing a color endoscope image;

suppose a polyp target bounding box b ═ c_x，c_y，w，h)，(c_x，c_y) Is the center point of the bounding box, and (w, h) is width and height; it is allocated to the scaling step size s_lCharacteristic layer K of_lThe feature layer contains m²A feature point, where m is the feature layer side length, set