CN113012140A - Digestive endoscopy video frame effective information region extraction method based on deep learning - Google Patents
- Publication number: CN113012140A
- Application number: CN202110354852.0A
- Authority: CN (China)
- Legal status: Withdrawn (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/0012—Biomedical image inspection
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
- G06T3/40—Scaling the whole image or part thereof
- G06T7/11—Region-based segmentation
- G06T7/187—Segmentation; Edge detection involving region growing; involving region merging; involving connected component labelling
- G06T2207/10068—Endoscopic image
- G06T2207/20081—Training; Learning
- G06T2207/20084—Artificial neural networks [ANN]
- G06T2207/20132—Image cropping
- G06T2207/30028—Colon; Small intestine
- G06T2207/30092—Stomach; Gastric
Abstract
The invention relates to the technical field of medical image processing, and in particular to a method for extracting the effective information area of digestive endoscopy video frames based on deep learning. When artificial intelligence performs real-time assisted quality monitoring, lesion marking and lesion diagnosis on an endoscopy video, the method serves as a preprocessing module: the effective information area of the current picture frame is identified with an intelligent edge cutting model, and a new picture frame containing only the effective information area is generated as the input of the subsequent deep learning recognition algorithm. Interference from invalid information is thereby effectively prevented, the recognition precision of the subsequent deep learning algorithm is improved, and the assisted recognition capability of artificial intelligence is effectively enhanced.
Description
Technical Field
The invention relates to the technical field of medical image processing, in particular to a digestive endoscopy video frame effective information region extraction method based on deep learning.
Background
In recent years, artificial intelligence technology centered on deep neural networks has succeeded in many application fields. Recent research shows that, through deep neural network algorithms and artificial intelligence models trained on large-scale datasets, computers can reach near-human or even super-human performance in many applications. Zhang Hongna et al., in "Progress of the application of artificial intelligence in digestive endoscopy", note that artificial intelligence is gradually entering the medical field, that research on its application in endoscopy is increasing, and that its development provides new ideas for diagnosing digestive tract diseases, whether classified by lesion type (tumor, polyp, hemorrhage, inflammation, etc.) or by lesion site (upper, middle and lower digestive tract). The expert consensus "Quality control system for artificial intelligence data acquisition and labeling in digestive endoscopy (draft, Shanghai, 2019)" focuses on the acquisition and labeling of endoscope data for artificial intelligence, and raises key problems of data acquisition, labeling, storage, privacy protection, data security and labeling use, so as to better serve the training, optimization and evaluation of endoscopic artificial intelligence models. Such models are trained on large numbers of digestive endoscopy images, so the quality of the endoscope images is critical.
Before processing, a digestive endoscopy video frame carries information around the image itself, such as the endoscope model, image size and current mode. For image processing and model training, the areas holding this information are invalid and interfere to varying degrees, so the invalid areas must be removed and only the effective information area extracted.
Patent CN110613417A discloses cropping an image to its effective area to obtain a cropped image. Two approaches to effective-area cropping currently exist. The first directly specifies the coordinate range of the cropping area and crops accordingly; it is not adaptive and works only when the cropping area is fixed, so it is strongly limited. The second obtains the contour of the effective area with an edge detection algorithm such as Sobel or Canny and crops along that contour; it has some adaptability, but the parameter settings are limiting, since different parameters cut different areas, and it places requirements on picture brightness, color and so on. Digestive endoscopy images come from different endoscope models, light source types and digestive tract types, so their color, shape and brightness vary greatly. Both methods suit only specific models, scenes or images and are therefore strongly limited. A deep-learning-based method for extracting the effective information area of digestive endoscopy video frames is therefore proposed.
Disclosure of Invention
Based on the technical problems in the background art, the invention provides a method for extracting the effective information area of digestive endoscopy video frames based on deep learning. It extracts the effective information area of the digestive endoscopy image in real time during the endoscopy procedure and passes the image with the invalid area removed to subsequent image processing operations, thereby improving the subsequent image processing capability and the quality of digestive endoscopy inspection.
The invention provides the following technical scheme: the digestive endoscopy video frame effective information area extraction method based on deep learning comprises the following steps:
s1, collecting digestive endoscopy video clips, marking effective information areas, and generating a corresponding black-and-white mask image for each picture according to the marking file;
s2, constructing an intelligent edge cutting model;
s3, training the intelligent trimming model by using the picture of the digestive endoscopy and the corresponding black-and-white mask image as training samples to obtain weight parameters of the intelligent trimming model for subsequent calculation of an effective information area;
s4, acquiring a real-time video of the digestive endoscopy, unframing the video into a picture, and caching the image of the current frame;
s5, loading the trained network model of the intelligent trimming model and corresponding weight parameters, outputting the current frame endoscopic image as a probability distribution map of whether the whole image is an effective information area or not through the network model, and calculating to obtain the effective information area of the frame image according to the probability distribution map;
S6, processing the current frame image with the calculated effective information area range, extracting the effective information area of the frame image, and finally generating a video frame image containing only the effective information area.
Preferably, in step S1, digestive endoscopy video clips of different types, different light sources, and different types of digestive tracts are collected as a training set.
Preferably, in step S1, the video segment is unframed into a picture, and the effective information area of each frame of image is manually labeled.
Preferably, in step S1, the picture size is compressed, and the aspect ratio is kept unchanged by padding with black borders.
Preferably, the intelligent trimming model constructed in step S2 is a symmetric encoding-decoding structure, and the decoding structure and the encoding structure are respectively composed of convolution modules.
Preferably, the coding structure is a down-sampling network for receiving input and outputting feature vectors; the decoding structure is up-sampling and is used for acquiring the feature vector from the coding structure and outputting a result which is most similar to the expected output.
Preferably, the intelligent trimming model is trained according to a loss function, which is the binary cross-entropy:

L = -(1/N) · Σ_{i=1}^{N} [ y_i · log(ŷ_i) + (1 - y_i) · log(1 - ŷ_i) ]

where N represents the number of samples, y_i is the real label of pixel i (y_i = 0 indicates that the pixel is in an invalid region, y_i = 1 that it is in an effective information area), and ŷ_i is the probability value of the pixel predicted by the network model, with value range (0, 1).
Preferably, in step S5, the mask map of the effective information area of the picture is calculated from the probability distribution map as follows:

mask(i, j) = 255 if p(i, j) ≥ 0.5; 0 otherwise

where mask(i, j) is the pixel value of the generated mask image at position (i, j), p(i, j) is the probability value of the probability distribution map at position (i, j), and i, j are respectively the i-th row and j-th column of the image matrix.
Preferably, in step S6, an image containing only the effective information area is finally generated by taking the circumscribed rectangle of the irregularly shaped effective image area.
The invention innovatively adopts a deep learning network to extract the effective information area of digestive endoscopy video frames. An intelligent trimming model, SCNet, is constructed from basic convolution modules, trained on endoscope images collected from different models, and used to extract the effective information area of endoscopic image frames. Fig. 3 compares the black-and-white mask image obtained by a traditional edge detection algorithm with the one produced by the invention; extracting the effective information area of the digestive endoscopy image frame with the deep learning network model yields high accuracy and strong generalization and robustness.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention;
FIG. 2 is a network structure diagram of an intelligent edge cutting model (SCNet) constructed in an embodiment of the present invention;
FIG. 3 is a comparison of the processing results of a conventional edge detection algorithm and the algorithm of the present invention; (in the figure, a represents the original image, b represents the result of the black-and-white mask image processed by the edge detection algorithm, and c represents the result of the black-and-white mask image processed by the algorithm of the present invention.)
FIG. 4 is a schematic diagram illustrating annotation of video frame images according to an embodiment of the invention; (in the figure, a represents a polygon effective information area label, and b represents a circle effective information area label)
FIG. 5 is a black and white mask map generated according to the label in the embodiment of the present invention; (in the drawings, a represents a polygon effective information area label, b represents a black-and-white mask generated based on the polygon label, c represents a circle effective information area label, and d represents a black-and-white mask generated based on the circle label)
FIG. 6 is a black and white mask image outputted after the image passes through the intelligent edge cutting model (SCNet) in the embodiment of the present invention; (in the figure, a represents an image before trimming, and b represents a black-and-white mask image outputted after trimming)
FIG. 7 shows endoscopic images of different models, different light sources and different digestive tract types after extraction of the effective information area according to the present invention; (in the figure, a1, a2 and a3 represent endoscopic images from three different models, light sources and digestive tract types, and b1, b2 and b3 represent the corresponding images generated after the effective information areas are extracted)
FIG. 8 is a graph comparing the test accuracy of the edge detection algorithm and the algorithm of the present invention for the same test set;
fig. 9 is a diagram showing results of extracting effective areas from pictures selected in the test set by different algorithms. (in the figure, the left side a represents the result of the edge detection algorithm processing, and the right side b represents the result of the processing of the present invention.)
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, the present invention provides a technical solution: the method for extracting the effective information area of the digestive endoscopy video frame based on deep learning comprises the steps of firstly collecting digestive endoscopy video clips, marking the effective information area, and generating a corresponding black-and-white mask image for each picture according to a marking file; constructing an intelligent trimming model; training the intelligent trimming model by using the picture of the digestive endoscope and the corresponding black-white mask image as training samples to obtain weight parameters of the intelligent trimming model for subsequent calculation of an effective information area; acquiring a real-time video of digestive endoscopy, unframing the video into a picture, and caching an image of a current frame; loading the trained network model of the intelligent trimming model and corresponding weight parameters, outputting the current frame endoscope image as a probability distribution map of whether the whole image is an effective information area or not through the network model, and calculating to obtain the effective information area of the frame image according to the probability distribution map; and processing the current frame image in the effective information area range obtained by calculation, extracting the effective information area of the frame image, and finally generating a video frame image only containing the effective information area.
When artificial intelligence performs real-time assisted quality monitoring, lesion marking and lesion diagnosis on an endoscopy video, the method serves as a preprocessing module: the effective information area of the current picture frame is identified with an intelligent edge cutting model, and a new picture frame containing only the effective information area is generated as the input of the subsequent deep learning recognition algorithm. Interference from invalid information is thereby effectively prevented, the recognition precision of the subsequent deep learning algorithm is improved, and the assisted recognition capability of artificial intelligence is effectively enhanced.
First, 1000 endoscopic video clips of different models, different light sources and different digestive tract types were collected: 5 endoscope models, two light sources (white light and narrow-band imaging, NBI or BLI), and two digestive tract types (gastroscopy video and enteroscopy video), with 50 clips per combination, for a total of 50 × 5 × 2 × 2 = 1000. (Other numbers of endoscopic video clips can be collected as required.)
The collected video clips are unframed into pictures, and the effective information area boundaries of the continuous picture sets are manually marked by a professional using the VGG Image Annotator (VIA) labeling software; a labeling schematic diagram is shown in fig. 4.
The labeled picture set is preprocessed: the picture size is compressed to 256 × 256 with a bilinear interpolation algorithm, the aspect ratio is kept unchanged by padding with black borders, and a corresponding black-and-white mask image is generated for each picture according to the label file; each original image and its black-and-white mask image form a training sample for subsequent model training. The resulting mask image is shown schematically in fig. 5.
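The compress-and-pad preprocessing step can be sketched as follows, as a minimal dependency-free version in NumPy. The patent specifies bilinear interpolation for the resize; nearest-neighbor sampling is substituted here only to keep the sketch self-contained, and the function name `letterbox_256` is illustrative rather than from the patent.

```python
import numpy as np

def letterbox_256(img, size=256):
    """Pad an H x W x C image to a square with black borders (preserving
    the aspect ratio), then resample it to size x size with nearest-neighbor
    sampling (the patent uses bilinear interpolation; nearest is used here
    only to avoid extra dependencies)."""
    h, w = img.shape[:2]
    side = max(h, w)
    canvas = np.zeros((side, side, img.shape[2]), dtype=img.dtype)
    top, left = (side - h) // 2, (side - w) // 2
    canvas[top:top + h, left:left + w] = img   # center image on black canvas
    idx = np.arange(size) * side // size       # nearest source index per pixel
    return canvas[idx][:, idx]
```

For example, a 100 × 200 frame is centered on a 200 × 200 black canvas before being resampled to 256 × 256, so the anatomy is not stretched.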
An intelligent edge cutting model (SCNet) is constructed based on a convolutional neural network (CNN). As shown in fig. 2, the network is a symmetric encoding-decoding structure: the encoding structure is a down-sampling network that receives the input and outputs feature vectors, with pooling layers reducing the spatial dimensionality; the decoding structure is up-sampling, obtains the feature vectors from the encoding structure and outputs the result closest to the expected output, gradually recovering the spatial dimensions and detail information.
To reduce the spatial information loss caused by down-sampling, the encoding and decoding structures are connected in a layer-skipping (skip connection) manner; this lets the feature map recovered by up-sampling contain more low-level semantic information, so the result is better.
The encoding-decoding structure consists of basic convolution modules composed of 3 × 3 convolutional layers, ReLU activation layers, 2 × 2 max pooling layers and up-sampling layers. The convolutional layers extract different data features of the input digestive endoscopy image; the ReLU activation layer applies a nonlinear mapping to the convolution output; the pooling layer screens features at a reduced dimensionality; and the up-sampling layer enlarges the feature size.
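The skip-connected down-sampling and up-sampling data flow described above can be sketched in NumPy. This shows only the pooling, nearest-neighbor up-sampling and channel-wise skip concatenation; the real SCNet interleaves learned 3x3 convolutions and ReLU activations between these steps, and the function names here are illustrative.

```python
import numpy as np

def max_pool2x2(x):
    """2x2 max pooling on a (C, H, W) feature map: halves H and W."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample2x(x):
    """Nearest-neighbor 2x up-sampling on a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def encode_decode(x):
    """One encoder level plus one decoder level with a skip connection:
    the pooled features are up-sampled back to the original resolution and
    concatenated with the encoder features along the channel axis."""
    skip = x                      # encoder feature map kept for the skip path
    down = max_pool2x2(x)         # spatial dimension halved by pooling
    up = upsample2x(down)         # decoder recovers the spatial dimension
    return np.concatenate([skip, up], axis=0)  # channel-wise fusion
```

Stacking several such levels, with convolution modules in between, yields the symmetric structure of fig. 2.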
The dimension calculation formula of the convolutional layer's output feature map is:

N = (W - F + 2P) / S + 1

where the input feature map has size W × W, the convolution kernel has size F × F, the convolution stride is S, the number of pixels padded around the border is P, and the output feature map has size N × N. The activation function (a ReLU with a leak coefficient) is:

f(x) = x if x > 0; f(x) = αx if x ≤ 0

where x refers to the input weighted feature vector and α is a coefficient. The up-sampling layer performs up-sampling with a nearest-neighbor interpolation algorithm.
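Both formulas can be checked numerically with a small sketch; the leak value α = 0.01 below is an assumption, since the patent only says α is a coefficient.

```python
import numpy as np

def conv_output_size(W, F, S, P):
    """Output side length of a convolution: N = (W - F + 2P) / S + 1,
    for an input of size W x W, kernel F x F, stride S and padding P."""
    return (W - F + 2 * P) // S + 1

def leaky_relu(x, alpha=0.01):
    """Activation with a small slope alpha on the negative side (the value
    0.01 is an assumed default; the patent does not state it)."""
    return np.where(x > 0, x, alpha * x)
```

With a 3 × 3 kernel, stride 1 and padding 1, as in the basic convolution module above, the spatial size is preserved: conv_output_size(256, 3, 1, 1) returns 256.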
And (3) forming a training sample by the original image with the size of 256 × 256 generated by preprocessing and the corresponding black and white mask image, training by using the constructed intelligent edge cutting network model (SCNet), and obtaining a weight parameter of the network model after training for the subsequent prediction calculation of the effective information area.
The intelligent edge cutting model (SCNet) is trained according to a loss function, which is the binary cross-entropy:

L = -(1/N) · Σ_{i=1}^{N} [ y_i · log(ŷ_i) + (1 - y_i) · log(1 - ŷ_i) ]

where N represents the number of samples, y_i is the real label of pixel i (y_i = 0 indicates that the pixel is in an invalid region, y_i = 1 that it is in an effective information area), and ŷ_i is the probability value of the pixel predicted by the network model, with value range (0, 1). The model is trained with a stochastic gradient descent method and an adaptive moment (Adam) optimizer, with a learning rate of 0.0001 over 50 training rounds. After training is finished, the weight coefficients of the model are obtained.
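A minimal NumPy sketch of this pixel-wise binary cross-entropy; the function name and the numerical-stability clipping are illustrative additions, not from the patent.

```python
import numpy as np

def bce_loss(y_true, y_pred, eps=1e-7):
    """Pixel-wise binary cross-entropy:
    L = -(1/N) * sum_i [ y_i*log(p_i) + (1 - y_i)*log(1 - p_i) ],
    where y_i in {0, 1} is the ground-truth mask label and p_i in (0, 1)
    is the probability predicted by the network for pixel i."""
    p = np.clip(y_pred, eps, 1 - eps)   # keep log() away from 0 and 1
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
```

The training loop then minimizes this loss over the image/mask training samples with the optimizer described above.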
A real-time digestive endoscopy video is acquired with the endoscopy equipment and unframed into pictures to obtain the image of the current frame. The network structure and weight parameters of the trained intelligent edge cutting model are loaded, and the probability distribution map of the effective information area of the current frame image is calculated with the model.
From the calculated probability distribution map of the effective information area of the current frame image, the mask map of the effective information area is calculated as follows:

mask(i, j) = 255 if p(i, j) ≥ 0.5; 0 otherwise

where mask(i, j) is the pixel value of the generated mask image at position (i, j), p(i, j) is the probability value of the probability distribution map at position (i, j), and i, j are respectively the i-th row and j-th column of the image matrix. From the generated black-and-white mask image of the effective information area, an image containing only the effective information area of the frame is finally obtained. Fig. 6 shows a black-and-white mask image of the effective information area generated by the algorithm, and fig. 7 shows final images containing only the effective information area.
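The mask thresholding, together with the circumscribed-rectangle crop used to produce the final image, can be sketched in NumPy. The function names are illustrative, and the 0.5 threshold is the conventional cut-off assumed here.

```python
import numpy as np

def prob_to_mask(p, thresh=0.5):
    """Threshold the probability distribution map into a black-and-white
    mask: 255 where p(i, j) >= thresh, 0 elsewhere (0.5 is an assumed,
    conventional cut-off)."""
    return np.where(p >= thresh, 255, 0).astype(np.uint8)

def crop_to_mask(img, mask):
    """Take the circumscribed rectangle of the (possibly irregular) white
    region of the mask and crop the frame to it."""
    rows = np.where(mask.any(axis=1))[0]   # rows containing effective pixels
    cols = np.where(mask.any(axis=0))[0]   # columns containing effective pixels
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
```

Applied to the current frame, this yields the video frame image containing only the effective information area.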
110 digestive endoscopy pictures of different light source types and different digestive tract types were selected as a test set, and effective information areas were extracted with the edge detection algorithm and with the algorithm of the invention respectively. Counting accuracy over the extracted results, the edge detection algorithm reaches 87.2% while the algorithm of the invention reaches 100%, a clear improvement in the accuracy of effective information area extraction. The statistical results are shown in fig. 8, and examples of effective areas extracted by the different algorithms on pictures from the test set are shown in fig. 9.
The invention provides a method for extracting the effective information area of digestive endoscopy video frames based on deep learning, in which the effective area is cropped by artificial intelligence technology based on big data: a deep learning model is trained on a large number of images collected from different endoscope models, different light source types and different digestive tract types, and by training on and learning the effective areas of this large image set the algorithm attains strong adaptability and accuracy.
The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited thereto; any equivalent substitution or change of the technical solution and inventive concept of the present invention that a person skilled in the art could conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention.
Claims (9)
1. A digestive endoscopy video frame effective information area extraction method based on deep learning, characterized by comprising the following steps:
s1, collecting digestive endoscopy video clips, marking effective information areas, and generating a corresponding black-and-white mask image for each picture according to the marking file;
s2, constructing an intelligent edge cutting model;
s3, training the intelligent trimming model by using the picture of the digestive endoscopy and the corresponding black-and-white mask image as training samples to obtain weight parameters of the intelligent trimming model for subsequent calculation of an effective information area;
s4, acquiring a real-time video of the digestive endoscopy, unframing the video into a picture, and caching the image of the current frame;
s5, loading the trained network model of the intelligent trimming model and corresponding weight parameters, outputting the current frame endoscopic image as a probability distribution map of whether the whole image is an effective information area or not through the network model, and calculating to obtain the effective information area of the frame image according to the probability distribution map;
S6, processing the current frame image with the calculated effective information area range, extracting the effective information area of the frame image, and finally generating a video frame image containing only the effective information area.
2. The deep learning based digestive endoscopy video frame effective information area extraction method according to claim 1, wherein: in step S1, digestive endoscopy video clips from endoscopes of different models, with different light sources and different digestive tract types, are collected as the training set.
3. The deep learning based digestive endoscopy video frame effective information area extraction method according to claim 1 or 2, wherein: in step S1, the video clips are decomposed into frames, and the effective information area of each frame image is manually annotated.
4. The deep learning based digestive endoscopy video frame effective information area extraction method according to claim 3, wherein: in step S1, the picture size is compressed while the aspect ratio is kept unchanged by padding with black borders.
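The resize-with-black-padding of claim 4 (often called letterboxing) can be sketched as below. This is a dependency-free illustration, assuming a square target size and nearest-neighbour scaling; a real pipeline would typically use `cv2.resize` or PIL instead.

```python
import numpy as np

def letterbox(img, size):
    """Resize an HxWxC image to size x size, preserving the aspect ratio
    and filling the remainder with black borders, as in claim 4.
    Nearest-neighbour scaling keeps the sketch numpy-only."""
    h, w = img.shape[:2]
    scale = size / max(h, w)
    nh, nw = max(1, round(h * scale)), max(1, round(w * scale))
    # nearest-neighbour index maps back into the source image
    rows = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = img[rows][:, cols]
    canvas = np.zeros((size, size, img.shape[2]), dtype=img.dtype)  # black border
    top, left = (size - nh) // 2, (size - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized
    return canvas
```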
5. The deep learning based digestive endoscopy video frame effective information area extraction method according to claim 1, wherein: the intelligent trimming model constructed in step S2 is a symmetric encoding-decoding structure, in which the encoding structure and the decoding structure are each composed of convolution modules.
6. The deep learning based digestive endoscopy video frame effective information area extraction method according to claim 5, wherein: the encoding structure is a down-sampling network that receives the input and outputs a feature vector; the decoding structure performs up-sampling, acquiring the feature vector from the encoding structure and outputting a result as close as possible to the expected output.
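The symmetric shape flow of claims 5 and 6 can be traced with a minimal numpy sketch. Average pooling and nearest-neighbour upsampling stand in for the learned convolution modules; this only illustrates how the encoder halves and the decoder restores spatial resolution, not the patent's actual network.

```python
import numpy as np

def encode(x, levels=3):
    """Down-sampling path: each level halves the spatial resolution.
    Average pooling is a stand-in for a learned convolution module."""
    feats = []
    for _ in range(levels):
        h, w = x.shape[0] // 2, x.shape[1] // 2
        x = x[:h * 2, :w * 2].reshape(h, 2, w, 2).mean(axis=(1, 3))
        feats.append(x)
    return feats  # feats[-1] plays the role of the feature vector

def decode(feats):
    """Up-sampling path, symmetric to the encoder: each level doubles the
    spatial resolution until the output matches the input scale."""
    x = feats[-1]
    for _ in range(len(feats)):
        x = x.repeat(2, axis=0).repeat(2, axis=1)  # nearest-neighbour upsample
    return x
```

In a real model (e.g. a U-Net-style architecture) each level would also carry skip connections and learned filters, but the encode/decode symmetry is the same.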
7. The deep learning based digestive endoscopy video frame effective information region extraction method according to claim 1, wherein: the intelligent trimming model is trained according to a loss function, wherein the loss function is as follows:
8. The deep learning based digestive endoscopy video frame effective information region extraction method according to claim 1 or 7, wherein: in step S5, a mask map of the effective information area of the picture is obtained by calculating using the probability distribution map, and the formula is as follows:
wherein mask(i, j) is the pixel value of the generated mask image at position (i, j), p(i, j) is the probability value of the probability distribution map at position (i, j), and i and j are respectively the ith row and the jth column of the image matrix.
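Since the formula itself is omitted from this excerpt, the mask generation of claim 8 can only be sketched under an assumed rule: set mask(i, j) to 255 where p(i, j) exceeds a threshold (0.5 assumed here), else 0.

```python
import numpy as np

def prob_to_mask(prob, threshold=0.5):
    """Generate the black-and-white mask image from the probability
    distribution map: 255 where p(i, j) > threshold, else 0.
    The 0.5 threshold is an assumption; the patent's exact formula
    is not reproduced in this excerpt."""
    return np.where(prob > threshold, 255, 0).astype(np.uint8)
```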
9. The deep learning based digestive endoscopy video frame effective information area extraction method according to claim 1, wherein: in step S6, an image containing only the effective information area is finally generated by taking the circumscribed rectangle of the irregularly shaped effective image area.
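Taking the circumscribed rectangle of the irregular effective area amounts to finding the smallest axis-aligned rectangle containing all non-zero mask pixels. A minimal numpy sketch (an OpenCV pipeline would use `cv2.boundingRect` for the same job):

```python
import numpy as np

def bounding_rect(mask):
    """Circumscribed rectangle of the irregular effective area: the
    smallest axis-aligned rectangle containing all non-zero mask pixels,
    returned as (top, left, height, width). Assumes the mask contains at
    least one non-zero pixel."""
    ys, xs = np.nonzero(mask)
    top, left = ys.min(), xs.min()
    height = ys.max() - top + 1
    width = xs.max() - left + 1
    return int(top), int(left), int(height), int(width)
```

The returned rectangle is then used to slice the original frame, producing the final image containing only the effective information area.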
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110354852.0A CN113012140A (en) | 2021-03-31 | 2021-03-31 | Digestive endoscopy video frame effective information region extraction method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113012140A true CN113012140A (en) | 2021-06-22 |
Family
ID=76387627
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110354852.0A Withdrawn CN113012140A (en) | 2021-03-31 | 2021-03-31 | Digestive endoscopy video frame effective information region extraction method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113012140A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110838100A (en) * | 2019-10-11 | 2020-02-25 | 浙江大学 | Colonoscope pathological section screening and segmenting system based on sliding window |
CN111444844A (en) * | 2020-03-26 | 2020-07-24 | 苏州腾辉达网络科技有限公司 | Liquid-based cell artificial intelligence detection method based on variational self-encoder |
CN111862095A (en) * | 2020-08-25 | 2020-10-30 | 首都医科大学附属北京朝阳医院 | Convolutional neural network model for generating NBI image according to endoscope white light image prediction and construction method and application thereof |
CN112435246A (en) * | 2020-11-30 | 2021-03-02 | 武汉楚精灵医疗科技有限公司 | Artificial intelligent diagnosis method for gastric cancer under narrow-band imaging amplification gastroscope |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113630658A (en) * | 2021-07-23 | 2021-11-09 | 重庆天如生物科技有限公司 | System and method for collecting and labeling gastrointestinal endoscope video image data |
CN113344926A (en) * | 2021-08-05 | 2021-09-03 | 武汉楚精灵医疗科技有限公司 | Method, device, server and storage medium for recognizing biliary-pancreatic ultrasonic image |
CN113344927A (en) * | 2021-08-05 | 2021-09-03 | 武汉楚精灵医疗科技有限公司 | Image recognition method and device based on deep learning, server and storage medium |
CN113344926B (en) * | 2021-08-05 | 2021-11-02 | 武汉楚精灵医疗科技有限公司 | Method, device, server and storage medium for recognizing biliary-pancreatic ultrasonic image |
WO2023010797A1 (en) * | 2021-08-05 | 2023-02-09 | 武汉楚精灵医疗科技有限公司 | Pancreaticobiliary ultrasound image recognition method and apparatus, and server |
CN113793335A (en) * | 2021-11-16 | 2021-12-14 | 武汉大学 | Method and device for identifying alimentary tract tumor infiltration layer, computer equipment and medium |
CN113793335B (en) * | 2021-11-16 | 2022-02-08 | 武汉大学 | Method and device for identifying alimentary tract tumor infiltration layer, computer equipment and medium |
CN114141382A (en) * | 2021-12-10 | 2022-03-04 | 厦门影诺医疗科技有限公司 | Digestive endoscopy video data screening and labeling method, system and application |
CN114767268A (en) * | 2022-03-31 | 2022-07-22 | 复旦大学附属眼耳鼻喉科医院 | Anatomical structure tracking method and device suitable for endoscope navigation system |
CN114767268B (en) * | 2022-03-31 | 2023-09-22 | 复旦大学附属眼耳鼻喉科医院 | Anatomical structure tracking method and device suitable for endoscope navigation system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113012140A (en) | Digestive endoscopy video frame effective information region extraction method based on deep learning | |
CN113706526B (en) | Training method and device for endoscope image feature learning model and classification model | |
CN111369565B (en) | Digital pathological image segmentation and classification method based on graph convolution network | |
US20210118144A1 (en) | Image processing method, electronic device, and storage medium | |
CN111915573A (en) | Digestive endoscopy focus tracking method based on time sequence feature learning | |
CN109840485B (en) | Micro-expression feature extraction method, device, equipment and readable storage medium | |
CN110648331B (en) | Detection method for medical image segmentation, medical image segmentation method and device | |
CN111784668A (en) | Digestive endoscopy image automatic freezing method based on perceptual hash algorithm | |
CN116664397B (en) | TransSR-Net structured image super-resolution reconstruction method | |
CN113223668A (en) | Capsule endoscopy image redundant data screening method | |
CN114708258B (en) | Eye fundus image detection method and system based on dynamic weighted attention mechanism | |
CN114445715A (en) | Crop disease identification method based on convolutional neural network | |
CN113838067A (en) | Segmentation method and device of lung nodule, computing equipment and storable medium | |
CN113781489B (en) | Polyp image semantic segmentation method and device | |
CN117392153B (en) | Pancreas segmentation method based on local compensation and multi-scale adaptive deformation | |
CN117522896A (en) | Self-attention-based image segmentation method and computer equipment | |
CN112634308A (en) | Nasopharyngeal carcinoma target area and endangered organ delineation method based on different receptive fields | |
CN115439470B (en) | Polyp image segmentation method, computer readable storage medium and computer device | |
CN116664952A (en) | Image direction identification method integrating convolution and ViT | |
CN116543429A (en) | Tongue image recognition system and method based on depth separable convolution | |
Mahanty et al. | SRGAN Assisted Encoder-Decoder Deep Neural Network for Colorectal Polyp Semantic Segmentation. | |
CN116935044B (en) | Endoscopic polyp segmentation method with multi-scale guidance and multi-level supervision | |
CN110852377A (en) | Method for identifying pneumonia according to lung X-ray film | |
CN117372437B (en) | Intelligent detection and quantification method and system for facial paralysis | |
CN111833991A (en) | Auxiliary interpretation method and device based on artificial intelligence, terminal and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | ||
Application publication date: 2021-06-22