CN111080639A - Multi-scene digestive tract endoscope image identification method and system based on artificial intelligence - Google Patents

Multi-scene digestive tract endoscope image identification method and system based on artificial intelligence Download PDF

Info

Publication number: CN111080639A
Application number: CN201911393700.0A
Authority: CN (China)
Prior art keywords: scene, lesion, training, shooting part, recognition
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 唐承薇, 宋捷
Current Assignee: Sichuan Novuseeeds Medtech Co ltd (the listed assignees may be inaccurate)
Original Assignee: Sichuan Novuseeeds Medtech Co ltd
Application filed by Sichuan Novuseeeds Medtech Co ltd
Priority to CN201911393700.0A

Classifications

    • G06T7/0012 Biomedical image inspection (G Physics › G06 Computing; calculating or counting › G06T Image data processing or generation, in general › G06T7/00 Image analysis › G06T7/0002 Inspection of images, e.g. flaw detection)
    • G06N3/045 Combinations of networks (G06N Computing arrangements based on specific computational models › G06N3/00 Computing arrangements based on biological models › G06N3/02 Neural networks › G06N3/04 Architecture, e.g. interconnection topology)
    • G06N3/08 Learning methods (G06N3/02 Neural networks)
    • G06T7/11 Region-based segmentation (G06T7/00 Image analysis › G06T7/10 Segmentation; edge detection)
    • G06T2207/20081 Training; Learning (G06T2207/00 Indexing scheme for image analysis or image enhancement › G06T2207/20 Special algorithmic details)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Endoscopes (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of image recognition and discloses a multi-scene digestive tract endoscope image recognition method and system based on artificial intelligence. The method comprises the following steps: acquiring data samples of digestive tract endoscope images, labeling each sample with the shooting part and examination scene of the image, and marking the segmentation region of any lesion; training a corresponding shooting-part recognition model on the data sample set of each shooting part, and a corresponding examination-scene recognition model on the data sample set of each examination scene; for each combination of shooting part and examination scene, training a lesion recognition model specific to that shooting part under that examination scene; and, for an endoscope image to be recognized, first identifying its shooting part and examination scene with the corresponding recognition models, then selecting the lesion recognition model for that shooting part under that examination scene, and identifying the lesion region. The scheme helps improve the recognition accuracy and generalization capability of the system.

Description

Multi-scene digestive tract endoscope image identification method and system based on artificial intelligence
Technical Field
The invention relates to the technical field of image recognition, in particular to a multi-scene digestive tract endoscope image recognition method and system based on artificial intelligence.
Background
Digestive system tumors account for more than one third of malignant tumor incidence. In clinical practice, the early symptoms of digestive tract tumors are not obvious, and by the time symptoms appear most patients have already reached the middle or late stage. The most direct and effective means of detecting digestive tract tumors early is digestive endoscopy. However, recognition technology based on digestive endoscopy images still has many shortcomings: the scenes involved in the images are highly varied, the imaged parts differ greatly, the types of lesions are numerous, and the feature expression of each lesion is complicated. Under such conditions, conventional manual identification is inefficient and suffers from high miss and false-positive rates.
Artificial intelligence has developed rapidly in recent years. Its image recognition technology in particular has matured, has already been applied successfully in this field, and the extraction and recognition of fine image features has advanced dramatically in some respects.
Therefore, using artificial intelligence to assist the analysis and judgment of digestive endoscopy videos and pictures can effectively improve the discovery rate of early lesions in digestive tract images and raise recognition precision.
Disclosure of Invention
The technical problem to be solved by the invention: in view of the problems described above, to provide a multi-scene digestive tract endoscope image recognition method and system based on artificial intelligence.
The technical scheme adopted by the invention is as follows: a multi-scene digestive tract endoscope image identification method based on artificial intelligence comprises the following steps:
acquiring data samples of digestive tract endoscope images, labeling each sample with the shooting part and the examination scene of the image, and marking the segmentation region of any lesion;
training a corresponding shooting-part recognition model based on the data sample set of each shooting part; training a corresponding examination-scene recognition model based on the data sample set of each examination scene;
combining each shooting-part condition with each examination-scene condition to obtain a plurality of training data sample sets for lesion recognition; training on each training data sample set to obtain the lesion recognition model of the corresponding shooting part under the corresponding examination scene;
for an endoscope image to be recognized, using the shooting-part recognition model and the examination-scene recognition model to identify the image's shooting part and examination scene, selecting the lesion recognition model of that specific shooting part under that examination scene, and identifying the lesion region.
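The two-stage routing described above can be sketched as follows. This is a hypothetical illustration, not the patent's implementation: the classifiers, the model table and all names are stand-ins.

```python
# Hypothetical sketch of the dispatch step: classify the shooting part and the
# examination scene first, then route the image to the lesion model trained
# for that (part, scene) combination.

def recognize(image, part_classifier, scene_classifier, lesion_models):
    """Return (part, scene, lesion_result) for one endoscope image."""
    part = part_classifier(image)                 # e.g. "esophagus"
    scene = scene_classifier(image)               # e.g. "white_light"
    lesion_model = lesion_models[(part, scene)]   # pick the specialised model
    return part, scene, lesion_model(image)

# Toy stand-in models for demonstration only:
part_clf = lambda img: "esophagus"
scene_clf = lambda img: "white_light"
models = {("esophagus", "white_light"):
          lambda img: [("early_lesion", [(1, 2), (3, 4), (5, 6)])]}

part, scene, lesions = recognize("frame0", part_clf, scene_clf, models)
```

The key design point is that the lesion models are looked up by the (part, scene) pair, so each one only ever sees images whose background conditions match its training data.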
Further, the examination scenes include ordinary white light, narrow-band imaging (NBI), magnification, and iodine staining.
Further, the process of labeling the shooting part and the examination scene of the image and marking the segmentation region of a lesion comprises: determining the classification categories for the digestive tract parts (shooting parts) and for the examination scenes; and importing the desensitized endoscope images into a labeling tool and annotating them with a picture-labeling tool, where each endoscope image must be labeled with its digestive tract part, its examination scene, and whether its content contains a lesion.
Further, when the picture content contains a lesion, the segmentation region of the lesion is marked, and a corresponding label is assigned and attached to the image.
Further, if one picture contains several lesions, the segmentation region of each lesion is marked and assigned its corresponding label.
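A per-image annotation following this labelling scheme might look like the record below. The field names and values are illustrative assumptions, not the patent's actual annotation format.

```python
# Illustrative annotation record for one image containing two lesions:
# one site label, one scene label, and one polygon plus tag per lesion.
annotation = {
    "image": "case_0001_frame_042.png",   # desensitized image file (hypothetical name)
    "site": "stomach",                    # digestive tract part label
    "scene": "NBI",                       # examination scene label
    "lesions": [                          # one entry per lesion in the image
        {"label": "ulcer",
         "polygon": [(120, 80), (180, 85), (175, 140), (118, 135)]},
        {"label": "polyp",
         "polygon": [(300, 200), (340, 205), (338, 250), (298, 245)]},
    ],
}

labels = [lesion["label"] for lesion in annotation["lesions"]]
```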
Further, the training method of the shooting-part recognition model, the examination-scene recognition model and the lesion recognition models comprises the following steps:
(a) acquiring the data sample set for each recognition model and balancing the number of pictures corresponding to each label in the set;
(b) dividing the balanced data sample set into a training set, a verification set and a test set in a fixed proportion (the training set and verification set are used for model training; the test set is used for the final verification of the training result);
(c) training based on a deep convolutional neural network, with the training set used to optimize the network parameters and the verification set used to test the training effect of each round; training stops when the accuracy and loss values obtained on the training and verification sets reach the set target values, yielding a trained model; the test set is then used to verify whether the trained model reaches the preset accuracy, and if so, the final recognition model is obtained.
Further, the balanced data sample set is divided into a training set, a verification set and a test set in the ratio 6:2:2.
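A minimal sketch of the 6:2:2 split. Shuffling before splitting is an assumption on our part; the patent only specifies the ratio.

```python
import random

def split_622(samples, seed=0):
    """Split samples 60/20/20 into train, verification and test sets."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)   # assumed: shuffle for a random split
    n = len(samples)
    n_train = int(n * 0.6)
    n_val = int(n * 0.2)
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]       # remainder, roughly 20%
    return train, val, test

train, val, test = split_622(range(1000))
```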
Further, when identifying the shooting part and the examination scene of an endoscope image, the shooting-part recognition model and the examination-scene recognition model return only the corresponding labels.
Further, when identifying the lesion region, if several polygonal lesion segmentation areas are recognized, each lesion area is marked with its corresponding label, and the lesion recognition model returns the labels together with the polygonal segmentation areas.
The invention also discloses a multi-scene digestive tract endoscope image recognition system based on artificial intelligence, which comprises:
the data acquisition module, used for acquiring data samples of gastrointestinal endoscope images, labeling each sample with the shooting part and examination scene of the image, and marking the segmentation region of any lesion;
the recognition model training module, used for training a corresponding shooting-part recognition model based on the data samples of each shooting part, and a corresponding examination-scene recognition model based on the data samples of each examination scene;
the recognition model training module is also used for combining each shooting-part condition with each examination-scene condition to obtain a plurality of training data sample sets for lesion recognition, and for training on each of them to obtain the lesion recognition model of the corresponding shooting part under the corresponding examination scene;
and the recognition module, used for first identifying the shooting part and examination scene of an endoscope image to be recognized with the shooting-part and examination-scene recognition models, then selecting the lesion recognition model of that specific shooting part under that examination scene, and identifying the lesion region.
Compared with the prior art, the beneficial effects of this technical scheme are as follows: by classifying the shooting parts and examination scenes of endoscope images and training a recognition model for each class, the lesions within any single scene share more consistent commonalities and background conditions. This improves the recognition accuracy and generalization capability of the system and solves, step by step, the problem of recognizing lesion regions in endoscope images across the whole digestive tract.
Drawings
FIG. 1 is a flow chart of the multi-scene digestive tract endoscope image identification method based on artificial intelligence.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
As shown in fig. 1, a multi-scene digestive tract endoscope image recognition method based on artificial intelligence comprises:
acquiring data samples of digestive tract endoscope images and labeling each image's shooting part (in this embodiment the shooting-part categories are esophagus, stomach, duodenum and colorectal), its examination scene (the scene categories are ordinary white light, NBI, magnification and iodine staining), and the segmentation region of any lesion;
training a corresponding shooting-part recognition model for each shooting part based on its data samples, namely shooting-part recognition models for the esophagus, the stomach, the duodenum and the colorectal region; and training a corresponding examination-scene recognition model for each examination scene based on its data samples, namely recognition models for the ordinary white light, NBI, magnification and iodine-staining examination scenes;
combining each shooting-part condition with each examination-scene condition to obtain a plurality of training data sample sets for lesion recognition, as shown in Table 1 below;
TABLE 1
(Table 1 is reproduced as an image in the original publication; it enumerates the valid shooting-part and examination-scene combinations, which correspond to the thirteen lesion data sample sets listed below.)
Combining the different shooting-part and examination-scene conditions, and excluding the combinations for which no data samples exist, yields the cases shown in the table. Each lesion recognition model can recognize digestive tract lesions at its specific digestive tract part under its specific examination scene; for example, the esophagus white-light model accurately recognizes lesions in the esophagus under the ordinary white-light examination scene. Each training data sample set is trained to obtain the lesion recognition model of the corresponding shooting part under the corresponding examination scene;
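Building the combinations and excluding the empty ones can be sketched as below. The excluded set is an inference from the embodiment's list of thirteen lesion data sample sets (iodine staining appears only for the esophagus); the patent does not state the exclusion rule explicitly.

```python
from itertools import product

# Shooting parts and examination scenes from this embodiment:
PARTS = ["esophagus", "stomach", "duodenum", "colorectal"]
SCENES = ["white_light", "NBI", "magnification", "iodine_staining"]

# Combinations with no data samples (inferred from the listed sample sets):
EXCLUDED = {(p, "iodine_staining") for p in ["stomach", "duodenum", "colorectal"]}

# One lesion recognition model is trained per remaining combination:
combinations = [(p, s) for p, s in product(PARTS, SCENES)
                if (p, s) not in EXCLUDED]
```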
for an endoscope image to be recognized, using the shooting-part recognition model and the examination-scene recognition model to identify the image's shooting part and examination scene, selecting the lesion recognition model of that specific shooting part under that examination scene, and identifying the lesion region.
Preferably, the process of labeling the shooting part and the examination scene of an image and marking the segmentation region of a lesion comprises: determining the digestive tract part categories (for example esophagus, stomach, duodenum and colorectal) and the examination scene categories (for example ordinary white light, NBI, magnification and iodine staining); and importing the desensitized endoscope images into a labeling tool and annotating them with a picture-labeling tool, where each endoscope image must be labeled with its digestive tract part, its examination scene, and any lesion region contained in the picture content. Each image in the data sample thus carries a digestive tract part attribute, an examination scene attribute and a lesion region attribute. Specifically, if a picture contains several lesions, the segmentation region of each lesion is marked and assigned its corresponding label. For example, Table 2 below shows the labels of one data sample:
table 2:
(Table 2 is reproduced as an image in the original publication.)
In practice, since the correctness of the labeling plays an important role in training the whole model, each picture can be manually reviewed and finally confirmed after labeling is completed; if the review fails, the picture is returned to the collection of unlabeled pictures for manual re-labeling.
Preferably, the training method of the shooting-part recognition model, the examination-scene recognition model and the lesion recognition models comprises the following steps:
(a) creating the digestive tract part data sample sets (esophagus, stomach, duodenum and colorectal data sample sets) according to the digestive-tract-part labels; creating the examination scene data sample sets (ordinary white light, NBI, magnification and iodine-staining data sample sets) according to the examination-scene labels; and generating the lesion data sample set for each combination of shooting part and examination scene according to the digestive-tract-part, examination-scene and lesion labels together. In this embodiment, these are the following lesion data sample sets:
esophageal white light scene lesion data sample set
Esophageal NBI scene lesion data sample set
Esophagus magnification scene lesion data sample set
Esophageal iodine-stained scene lesion data sample set
Stomach white light scene lesion data sample set
Stomach NBI scene lesion data sample set
Stomach magnification scene lesion data sample set
Duodenum white light scene lesion data sample set
Duodenum NBI scene lesion data sample set
Duodenum enlarged scene lesion data sample set
Colorectal white light scene lesion data sample set
Colorectal NBI scene lesion data sample set
Colorectal magnification scene lesion data sample set
If the numbers of pictures or labeled regions under different labels in a data sample set differ too much, the trained model will be clearly biased toward the labels with more data at inference time, which is very disadvantageous to model training. In reality the data come from real cases, and imbalance is almost inevitable. Therefore, before model training, the amount of data corresponding to each label is balanced: for labels with less data, data-enhancement means such as rotation, brightness transformation, shifting, flipping, cropping and Gaussian noise can be used to increase the data quantity; for labels with clearly more data, a portion of the data can be appropriately removed.
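The balancing step can be sketched as below. Duplicating a sample here stands in for producing an augmented copy (rotation, brightness shift, flip, crop, noise); the target count and the strategy are illustrative assumptions, not the patent's parameters.

```python
import random

def balance(samples_by_label, target, seed=0):
    """Bring every label to `target` samples: subsample large classes,
    oversample small ones (duplicates stand in for augmented copies)."""
    rng = random.Random(seed)
    balanced = {}
    for label, samples in samples_by_label.items():
        if len(samples) >= target:
            balanced[label] = rng.sample(samples, target)  # drop surplus
        else:
            extra = [rng.choice(samples) for _ in range(target - len(samples))]
            balanced[label] = samples + extra              # augmented copies
    return balanced

data = {"erosion": list(range(50)), "ulcer": list(range(500))}
balanced = balance(data, target=200)
```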
the recognition models to be trained can be divided into two types, and the shooting part recognition model and the inspection scene recognition model belong to classification models; the lesion recognition model belongs to a segmentation model. The basic flow of these model training is the same, as follows:
(b) dividing the balanced data sample set into a training set, a verification set and a test set according to a specific proportion; the training set and the verification set are used for model training, and the test set is used for finally verifying the training result;
(c) the training set is used for optimizing the network parameters of the model, the verification set is used for testing the training effect of each training turn, and training is carried out based on the deep convolutional neural network, so that the basis for optimizing the training hyper-parameters is provided for an AI engineer. And obtaining a training model after the accuracy and the loss value obtained by the training set and the verification set are stable and reach a relatively optimized value. At this point, a test set, which is a data set that the training model never has been exposed to, can be used, thus verifying the final performance of the training model. If the accuracy obtained through the test of the test set meets the expected requirement, the model meets the requirements in the aspects of accuracy and generalization, and the recognition model can be obtained preliminarily through model training. If the test effect on the test set is not good, which indicates that the generalization capability of the model is not good, at this time, the training of the model is restarted, the hyper-parameters of the data sample set and the model are rearranged and optimized, a new model is trained, and the steps are repeated until a recognition model which can pass the test set is obtained.
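The stopping logic of step (c) can be sketched with a simple patience rule. The `train`/`evaluate` machinery is abstracted away and the validation curve below is synthetic; the patience criterion is our assumption, since the patent only says training stops when accuracy and loss stabilize at good values.

```python
def train_with_early_stopping(epoch_val_accuracies, patience=3):
    """Return (best_epoch, best_accuracy): stop once validation accuracy
    has not improved for `patience` consecutive rounds."""
    best_acc, best_epoch, stale = 0.0, -1, 0
    for epoch, val_acc in enumerate(epoch_val_accuracies):
        if val_acc > best_acc:
            best_acc, best_epoch, stale = val_acc, epoch, 0  # keep this checkpoint
        else:
            stale += 1                                       # no improvement
            if stale >= patience:
                break                                        # metrics stabilized
    return best_epoch, best_acc

# Synthetic validation-accuracy curve that plateaus after epoch 4:
curve = [0.60, 0.72, 0.80, 0.85, 0.88, 0.87, 0.88, 0.86]
best_epoch, best_acc = train_with_early_stopping(curve)
```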
Preferably, since the recognition models for the digestive tract shooting part and the examination scene are both classification models, these two models can identify, for a single digestive tract picture, the digestive tract part (one of esophagus, stomach, duodenum and colorectal) and the examination scene (one of ordinary white light, NBI, iodine staining and magnification) in which it was taken. Here the shooting-part recognition model and the examination-scene recognition model return only the corresponding labels, not a specific rectangle or segmented area on the picture.
Preferably, since the lesion recognition models are segmentation models, for the several polygonal lesion segmentation areas of a gastrointestinal endoscope image each lesion area is marked with its corresponding label, and when a lesion area is recognized the lesion recognition model returns the corresponding label and polygonal segmentation area.
The classification models of the invention are trained on deep convolutional neural networks (DCNN). Using deep convolutional neural networks for lesion classification and segmentation in medical images is currently one of the most effective methods. Through a series of convolution and pooling operations, a DCNN gradually extracts the useful information in medical image features, progressively combines simple image features into high-order features, and on that basis judges whether a lesion is present and accurately segments the lesion area. A DCNN can also automatically learn deep, more discriminative features from the data. The network consists of neurons with learnable weights and biases between them: the neurons convolve the input data to produce an output (forward propagation), the output is compared with the preset labels to compute the error (residual), and the error is then propagated backward layer by layer to update the weights and biases between neurons (backward propagation); these operations are repeated until the error falls within a threshold range and the required recognition accuracy is met. A convolutional neural network is a multi-layer perceptron model designed to recognize two-dimensional (and higher-dimensional) images; its structure is highly invariant to scaling, tilting, translation and other deformations of the image, which makes it particularly suitable for medical image classification. The structure rests on two features, sparse connectivity and weight sharing. For feature extraction, each neuron takes its input from a locally connected region of the previous layer, so the processing extracts local features.
For feature mapping, each layer of the network consists of multiple two-dimensional feature maps whose neurons locally share the same weights; this guarantees invariance to image displacement and correspondingly reduces the network's free parameters. Convolutional neural networks also have a special structure, downsampling: in general each convolutional layer is followed by a downsampling layer, which performs local averaging, reduces the resolution of the feature maps, and at the same time reduces the model's sensitivity to deformations such as translation. Because medical images are mostly grayscale and the position, boundary, size and shape of a lesion are relatively fuzzy, the structure of a convolutional neural network adapts well to their characteristics. Based on these properties, the specialized deep convolutional neural network implemented by the invention performs the classification and recognition of digestive tract shooting parts and examination scenes well.
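The convolution and pooling operations a DCNN stacks can be illustrated with a toy, pure-Python example. Real networks use optimized libraries and learned kernels; this only shows the arithmetic, with a hand-written edge-detecting kernel.

```python
def conv2d(image, kernel):
    """Valid (no-padding) 2D cross-correlation of two lists-of-lists."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

def max_pool2(fmap):
    """2x2 max pooling with stride 2 (the downsampling layer)."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

# A vertical-edge kernel applied to an image with a step edge (0s then 1s):
img = [[0, 0, 1, 1] for _ in range(4)]
edge_kernel = [[1, -1], [1, -1]]    # responds (negatively) at the rising edge
fmap = conv2d(img, edge_kernel)     # strong response only at the edge column
pooled = max_pool2(fmap)            # reduced-resolution summary of fmap
```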
Our segmentation models are trained on a fully convolutional network (FCN) built on the DCNN. The FCN offers a very good solution for semantic segmentation of medical images. Using convolutional layers, pooling layers and the corresponding activation functions, the FCN model feeds the input image forward through the network and extracts feature representations layer by layer; it replaces the fully connected layers with 1x1 convolutional layers and upsamples the feature map of the last convolutional layer with deconvolution layers so that it is restored to the size of the input image, thereby producing a prediction for each pixel; the segmentation gold standard is then used to train and adjust the network parameters in a supervised manner by back-propagating the error. The FCN needs no image-patch computation, preserves the spatial information of the original input image, and can classify the upsampled feature map pixel by pixel, that is, perform pixel-to-pixel semantic segmentation. Because the edges of a prediction obtained by directly upsampling the final feature map may not be smooth, shallow features are fused with the high-level abstract features and the output is then obtained by upsampling. This takes both local and global information into account and achieves a very good segmentation effect on medical images.
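The FCN idea (coarse feature map, per-pixel scoring, upsampling back to input resolution) can be sketched in a toy form. Nearest-neighbour upsampling stands in for the learned deconvolution layer, and a threshold stands in for the 1x1 convolutional scoring; none of this is the patent's actual network.

```python
def upsample2(fmap):
    """Nearest-neighbour 2x upsampling of a list-of-lists feature map
    (a stand-in for a learned deconvolution layer)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # double each column
        out.append(wide)
        out.append(list(wide))                     # double each row
    return out

def per_pixel_class(score_map, threshold=0.5):
    """1x1-conv analogue: map each pixel score to lesion (1) / normal (0)."""
    return [[1 if v > threshold else 0 for v in row] for row in score_map]

# Coarse 2x2 score map (as produced after several pooling stages):
scores = [[0.1, 0.9],
          [0.2, 0.8]]
mask = per_pixel_class(upsample2(scores))  # dense 4x4 segmentation mask
```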
Corresponding system embodiment, a multi-scene digestive tract endoscope image recognition system based on artificial intelligence, includes:
the data acquisition module is used for acquiring a data sample of the gastrointestinal endoscope image, labeling a shooting part and an inspection scene of the image aiming at the data sample, and marking a segmentation area of a lesion;
the recognition model training module is used for training a recognition model of a corresponding shooting part based on data samples of the same shooting part; the system comprises a corresponding inspection scene recognition module, a corresponding inspection scene recognition module and a corresponding inspection scene recognition module, wherein the corresponding inspection scene recognition module is used for training the corresponding inspection scene recognition module based on the same inspection scene;
the recognition model training module is also used for combining any condition of the shot part and any condition of the inspection scene to obtain a plurality of lesion recognition training data sample sets; training each training data sample set to obtain a lesion recognition model of a corresponding shooting part in a corresponding examination scene;
and the identification module is used for identifying the shooting part and the inspection scene of the endoscope image by adopting the shooting part identification model and the inspection scene identification module firstly aiming at the endoscope image to be identified, selecting the lesion identification model of the corresponding shooting part in the corresponding inspection scene according to the specific shooting part and the inspection scene, and identifying the lesion part.
In practical application, a master control program is responsible for receiving information from the auxiliary recognition system and from outside, calling and controlling the several artificial intelligence recognition models, and displaying and outputting the final judgment result.
The auxiliary decision system can recognize either a single picture or a continuous video stream. For a single picture, the master control program starts the recognition process directly; for a video stream, it reads the data frame by frame from the video data buffer, splitting the stream into individual pictures that are recognized in sequence.
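The single-picture versus video-stream handling can be sketched as follows; the buffer, recognizer and names are illustrative stand-ins, not the patent's code.

```python
def frames_from_buffer(buffer):
    """Yield frames one by one until the video data buffer is exhausted."""
    while buffer:
        yield buffer.pop(0)

def run(inputs, recognize, is_stream=False):
    """Single picture: recognize directly. Stream: split into frames and
    recognize them in sequence through the same entry point."""
    if not is_stream:
        return [recognize(inputs)]
    return [recognize(f) for f in frames_from_buffer(list(inputs))]

recognize = lambda frame: ("result", frame)   # toy stand-in recognizer
single = run("picture.png", recognize)
stream = run(["f0", "f1", "f2"], recognize, is_stream=True)
```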
The running mode of the main control program comprises an automatic mode and a manual mode.
In the automatic mode, the main control program firstly identifies the received image by using a digestive tract shooting part identification model so as to determine the digestive tract part represented by the current image; then, identifying the current image by using an inspection scene identification model so as to obtain an inspection scene where the content of the current image is located; and finally, selecting a corresponding lesion recognition model according to the obtained digestive tract shooting part information and the examination scene information, and recognizing the current image by using the model, thereby finally determining the lesion condition of the current image content. Since the lesion recognition model is a segmentation model, the final recognition result is that 0 to a plurality of polygonal segmentation regions labeled with lesion names on a picture are recognized. In the auxiliary diagnosis process in the automatic mode, when the digestive tract shooting part or the examination scene is changed, the main control program automatically responds and immediately switches to the state of recognizing the lesion by using the corresponding lesion model.
The main control program also provides a function for manually setting the digestive tract shooting part and the examination scene. In the manual mode, the main control program does not call the relevant models to recognize the shooting part and examination scene of the current picture; instead, it selects the corresponding lesion recognition model directly from the user's settings. All other processing logic is the same as in the automatic mode.
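The mode-dependent model selection can be sketched as follows; this is a simplified illustration with hypothetical callables, not the patent's actual software:

```python
def select_lesion_model(image, lesion_models, part_model, scene_model,
                        manual_setting=None):
    """Pick the lesion recognition model for one image.

    Automatic mode (manual_setting is None): run the shooting part and
    examination scene classifiers first, then look up the lesion model
    keyed by that (part, scene) pair.
    Manual mode: use the user-supplied (part, scene) pair directly and
    skip both classifiers.
    """
    if manual_setting is None:                 # automatic mode
        part = part_model(image)
        scene = scene_model(image)
    else:                                      # manual mode
        part, scene = manual_setting
    return part, scene, lesion_models[(part, scene)]

# Toy setup: every (part, scene) pair has its own lesion recognizer.
lesion_models = {
    ("esophagus", "white_light"): lambda img: ["lesion_polygons_wl"],
    ("esophagus", "NBI"): lambda img: ["lesion_polygons_nbi"],
}
part, scene, model = select_lesion_model(
    "img", lesion_models,
    part_model=lambda img: "esophagus",
    scene_model=lambda img: "NBI",
)
```

Keying the model table by the (part, scene) pair is what lets the program switch lesion models immediately when either classifier's output changes.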
After the main control program obtains the result of the lesion recognition model, it displays the result visually in the auxiliary judgment system. The output information includes:
the original video or image content;
the recognized lesion segmentation regions (displayed separately or superimposed on the original picture or video);
the recognized digestive tract part and examination scene of the current image.
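A sketch of assembling this output payload (the field names are illustrative, not taken from the patent):

```python
def build_display_payload(original, part, scene, lesions):
    """Bundle the three outputs listed above: the original content, the
    recognized lesion polygons (each carrying its lesion-name label, so
    they can be shown separately or overlaid on the original), and the
    recognized digestive tract part and examination scene."""
    return {
        "original": original,
        "lesion_regions": [
            {"label": label, "polygon": polygon} for label, polygon in lesions
        ],
        "part": part,
        "scene": scene,
    }

payload = build_display_payload(
    original="frame_0042",
    part="stomach",
    scene="white_light",
    lesions=[("early_cancer", [(10, 12), (40, 15), (35, 48)])],
)
```

An empty `lesions` list naturally represents the "zero segmentation regions" case, so the same payload structure covers images with and without recognized lesions.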
The invention is not limited to the foregoing embodiments. The invention extends to any novel feature, or any novel combination of features, disclosed in this specification, and to any novel method or process step, or any novel combination of steps, disclosed. Those skilled in the art to which the invention pertains will appreciate that insubstantial changes or modifications may be made without departing from the spirit of the invention as defined by the appended claims.

Claims (10)

1. A multi-scene digestive tract endoscope image identification method based on artificial intelligence is characterized by comprising the following steps:
acquiring data samples of digestive tract endoscope images, labeling the shooting part and the examination scene of each image in the data samples, and marking the segmentation regions of lesions;
training a corresponding shooting part recognition model based on a data sample set of the same shooting part; training a corresponding examination scene recognition model based on a data sample set of the same examination scene;
combining each possible shooting part with each possible examination scene to obtain a plurality of training data sample sets for lesion recognition; training on each training data sample set to obtain a lesion recognition model of the corresponding shooting part in the corresponding examination scene;
for an endoscope image to be recognized, recognizing the shooting part and the examination scene of the image by using the shooting part recognition model and the examination scene recognition model, selecting the lesion recognition model of the corresponding shooting part in the corresponding examination scene according to the specific shooting part and examination scene, and recognizing the lesion region.
2. The artificial intelligence based multi-scene endoscope image recognition method of claim 1, wherein the examination scene comprises ordinary white light, narrow-band imaging (NBI), magnification, and iodine staining.
3. The artificial intelligence based multi-scene endoscope image recognition method of claim 1, wherein the process of labeling the shooting part and the examination scene of each image in the data samples and marking the segmentation regions of lesions comprises: determining the digestive tract part classes for the shooting part and the examination scene classes; and importing the desensitized endoscope images into a labeling tool and annotating them with a picture annotation tool, wherein each endoscope image is labeled with the corresponding digestive tract part, the corresponding examination scene, and whether the picture content contains a lesion.
4. The method as claimed in claim 3, wherein, when the image content contains a lesion, the segmentation region of the lesion is marked, and the corresponding label is assigned and recorded on the corresponding image.
5. The artificial intelligence based multi-scene endoscope image recognition method of claim 4, wherein, if a picture contains a plurality of lesions, the segmentation region of each lesion is marked and assigned its corresponding label.
6. The artificial intelligence based multi-scene endoscope image recognition method of claim 1, wherein the training method of the shooting part recognition model, the examination scene recognition model and the lesion recognition model comprises:
(a) acquiring a data sample set for each recognition model, and balancing the number of pictures corresponding to each label in the data sample set;
(b) dividing the balanced data sample set into a training set, a validation set and a test set in a specified ratio;
(c) using the training set to optimize the network parameters of the model and the validation set to evaluate the training effect of each round; training based on a deep convolutional neural network, and stopping when the accuracy and loss values obtained on the training set and validation set reach the set targets, thereby obtaining a trained model; then verifying with the test set whether the trained model reaches the preset accuracy, and, if so, taking it as the final recognition model.
7. The artificial intelligence based multi-scene endoscope image recognition method of claim 6, wherein the balanced data sample set is divided into a training set, a validation set and a test set in a ratio of 6:2:2.
8. The artificial intelligence based multi-scene endoscope image recognition method of claim 1, wherein, during recognition of the shooting part and the examination scene of an endoscope image, the shooting part recognition model and the examination scene recognition model return only the corresponding labels.
9. The artificial intelligence based multi-scene endoscope image recognition method of claim 1, wherein, during recognition of the lesion region, when a plurality of polygonal lesion segmentation regions are recognized, each segmentation region is marked with its corresponding label, and the lesion recognition model returns the corresponding labels together with the polygonal segmentation regions.
10. A multi-scene digestive tract endoscope image recognition system based on artificial intelligence is characterized by comprising:
the data acquisition module is used for acquiring data samples of digestive tract endoscope images, labeling the shooting part and the examination scene of each image in the data samples, and marking the segmentation regions of lesions;
the recognition model training module is used for training a corresponding shooting part recognition model based on data samples of the same shooting part, and for training a corresponding examination scene recognition model based on data samples of the same examination scene;
the recognition model training module is further used for combining each possible shooting part with each possible examination scene to obtain a plurality of training data sample sets for lesion recognition, and for training on each training data sample set to obtain a lesion recognition model of the corresponding shooting part in the corresponding examination scene;
and the recognition module is used for recognizing, for an endoscope image to be recognized, the shooting part and the examination scene of the image by using the shooting part recognition model and the examination scene recognition model, selecting the lesion recognition model of the corresponding shooting part in the corresponding examination scene according to the specific shooting part and examination scene, and recognizing the lesion region.
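The data-preparation steps of claims 6 and 7 (per-label balancing, then a 6:2:2 split) can be sketched as follows; the balancing strategy shown (downsampling every class to the size of the smallest one) is an assumption for illustration, since the claims do not fix a method:

```python
import random

def balance_by_label(samples):
    """Equalize the number of pictures per label by downsampling every
    class to the size of the smallest one (an assumed strategy)."""
    by_label = {}
    for picture, label in samples:
        by_label.setdefault(label, []).append(picture)
    n = min(len(pics) for pics in by_label.values())
    return [(pic, label) for label, pics in by_label.items() for pic in pics[:n]]

def split_622(samples, seed=0):
    """Shuffle, then split into training / validation / test sets at 6:2:2."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    i, j = int(0.6 * len(shuffled)), int(0.8 * len(shuffled))
    return shuffled[:i], shuffled[i:j], shuffled[j:]

# Toy data set: 7 "polyp" pictures and 5 "normal" pictures.
samples = [(f"img{k}", "polyp") for k in range(7)] + \
          [(f"img{7 + k}", "normal") for k in range(5)]
balanced = balance_by_label(samples)      # 2 labels x 5 pictures = 10 samples
train, val, test = split_622(balanced)    # 6 / 2 / 2 pictures
```

The training, validation, and test subsets then play the roles described in claim 6(c): the first optimizes network parameters, the second monitors accuracy and loss each round for early stopping, and the third verifies the preset accuracy of the final model.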
CN201911393700.0A 2019-12-30 2019-12-30 Multi-scene digestive tract endoscope image identification method and system based on artificial intelligence Pending CN111080639A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911393700.0A CN111080639A (en) 2019-12-30 2019-12-30 Multi-scene digestive tract endoscope image identification method and system based on artificial intelligence


Publications (1)

Publication Number Publication Date
CN111080639A 2020-04-28

Family

ID=70319513

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911393700.0A Pending CN111080639A (en) 2019-12-30 2019-12-30 Multi-scene digestive tract endoscope image identification method and system based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN111080639A (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106446933A (en) * 2016-08-31 2017-02-22 河南广播电视大学 Multi-target detection method based on context information
CN109035339A (en) * 2017-10-27 2018-12-18 重庆金山医疗器械有限公司 The location recognition method of capsule endoscope system and its operation area detection picture
CN109526021A (en) * 2018-11-30 2019-03-26 北京交通大学 A kind of classification method and recognition methods of high-speed railway wireless channel scene
CN109977942A (en) * 2019-02-02 2019-07-05 浙江工业大学 A kind of scene character recognition method based on scene classification and super-resolution
US20190311479A1 (en) * 2018-04-10 2019-10-10 Sun Yat-Sen University Cancer Center Method and device for identifying pathological picture
CN110495847A (en) * 2019-08-23 2019-11-26 重庆天如生物科技有限公司 Alimentary canal morning cancer assistant diagnosis system and check device based on deep learning


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhang Zhenhua et al.: "Research on Feature Parameter Extraction and Recognition of Pig Cough Sounds" *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112508827A (en) * 2020-11-06 2021-03-16 中南大学湘雅医院 Deep learning-based multi-scene fusion endangered organ segmentation method
CN112508827B (en) * 2020-11-06 2022-04-22 中南大学湘雅医院 Deep learning-based multi-scene fusion endangered organ segmentation method
CN112070777A (en) * 2020-11-10 2020-12-11 中南大学湘雅医院 Method and device for organ-at-risk segmentation under multiple scenes based on incremental learning
EP4302681A4 (en) * 2021-03-04 2024-08-21 Fujifilm Corp Medical image processing device, medical image processing method, and program
CN112949659A (en) * 2021-03-15 2021-06-11 东软集团股份有限公司 Method and device for processing check information, storage medium and electronic equipment
CN112949659B (en) * 2021-03-15 2024-09-27 东软集团股份有限公司 Method and device for processing inspection information, storage medium and electronic equipment
CN113129287A (en) * 2021-04-22 2021-07-16 武汉楚精灵医疗科技有限公司 Automatic lesion mapping method for upper gastrointestinal endoscope image
CN113630658A (en) * 2021-07-23 2021-11-09 重庆天如生物科技有限公司 System and method for collecting and labeling gastrointestinal endoscope video image data
CN115861298A (en) * 2023-02-15 2023-03-28 浙江华诺康科技有限公司 Image processing method and device based on endoscopy visualization
CN115861298B (en) * 2023-02-15 2023-05-23 浙江华诺康科技有限公司 Image processing method and device based on endoscopic visualization

Similar Documents

Publication Publication Date Title
CN111080639A (en) Multi-scene digestive tract endoscope image identification method and system based on artificial intelligence
CN110599448B (en) Migratory learning lung lesion tissue detection system based on MaskScoring R-CNN network
CN110211087B (en) Sharable semiautomatic marking method for diabetic fundus lesions
US20210118144A1 (en) Image processing method, electronic device, and storage medium
CN112733950A (en) Power equipment fault diagnosis method based on combination of image fusion and target detection
KR20200005409A (en) Fundus image management device and method for determining suitability of fundus image
CN108875821A (en) The training method and device of disaggregated model, mobile terminal, readable storage medium storing program for executing
CN112528782B (en) Underwater fish target detection method and device
CN107273870A (en) The pedestrian position detection method of integrating context information under a kind of monitoring scene
CN111144271B (en) Method and system for automatically identifying biopsy parts and biopsy quantity under endoscope
CN114581456B (en) Multi-image segmentation model construction method, image detection method and device
CN111461218B (en) Sample data labeling system for fundus image of diabetes mellitus
CN114581375A (en) Method, device and storage medium for automatically detecting focus of wireless capsule endoscope
CN110974179A (en) Auxiliary diagnosis system for stomach precancer under electronic staining endoscope based on deep learning
CN114511502A (en) Gastrointestinal endoscope image polyp detection system based on artificial intelligence, terminal and storage medium
CN115205520A (en) Gastroscope image intelligent target detection method and system, electronic equipment and storage medium
CN110742690A (en) Method for configuring endoscope and terminal equipment
CN113505634A (en) Double-flow decoding cross-task interaction network optical remote sensing image salient target detection method
CN116543386A (en) Agricultural pest image identification method based on convolutional neural network
Zhang et al. Artifact detection in endoscopic video with deep convolutional neural networks
CN117095169A (en) Ultrasonic image disease identification method and system
CN112488165A (en) Infrared pedestrian identification method and system based on deep learning model
CN116258686A (en) Method for establishing colon polyp parting detection model based on image convolution feature capture
CN115601792A (en) Cow face image enhancement method
CN116977253A (en) Cleanliness detection method and device for endoscope, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200428
