CN113129287A - Automatic lesion mapping method for upper gastrointestinal endoscope image - Google Patents
- Publication number
- CN113129287A (application CN202110437503.5A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G06T7/0012 — Biomedical image inspection
- G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415 — Classification based on parametric or probabilistic models, e.g. likelihood ratio or false acceptance rate versus false rejection rate
- G06N3/045 — Combinations of networks
- G06N3/047 — Probabilistic or stochastic networks
- G06N3/048 — Activation functions
- G06N3/08 — Learning methods
- G06T2207/10068 — Endoscopic image
- G06T2207/20076 — Probabilistic image processing
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/30028 — Colon; Small intestine
- G06T2207/30092 — Stomach; Gastric
Abstract
The invention discloses an automatic lesion image-retention (mapping) method for upper gastrointestinal endoscopic images, comprising the following steps: collecting and preprocessing digestive tract data; constructing a major-site classification model M-Net; constructing lesion detection models to detect lesions in each major site; constructing a lesion separation and summarization unit that numbers the input pictures by lesion and gathers them into separate folders; constructing a mucosa visualization model to remove invalid pictures or artifact pictures with poor mucosa visibility; constructing a 26-part localization model to classify the upper digestive tract into 26 anatomical parts; and automatically outputting the image-retention result. The invention automatically retains high-quality pictures of abnormal upper gastrointestinal lesions and quickly localizes them to a specific anatomical part. The method helps endoscopists to quickly and automatically localize and record abnormal upper gastrointestinal lesions, improves the completeness and validity of the images retained in endoscopy reports, homogenizes the overall quality of endoscopy reports, and consolidates their value for guiding treatment and for follow-up.
Description
Technical Field
The invention relates to the technical field of medical assistance, in particular to an automatic lesion image-retention method for upper gastrointestinal endoscopic images.
Background
An endoscopy report is a widely used medical image-text report: an intuitive, complete summary of a digestive endoscopy examination that is convenient for the patient to keep and for doctors to consult. It is the objective basis for diagnosis and treatment follow-up, consultation and exchange, medical appraisal, and similar scenarios.
During an upper gastrointestinal endoscopy, the endoscopist presses a capture button or pedal to record the examination findings, and all collected pictures are stored in an image-text workstation. The endoscopist then reviews this image library to compile the final endoscopy image-text report. Since the report's information is drawn from the captured image library, its quality depends on the overall quality of that library. Several guidelines have proposed quality-control requirements for digestive endoscopy reports, demanding that the recorded pictures completely cover all anatomical landmark sites and all abnormal lesions; picture quality itself, however, is not controlled. In real clinical settings, because of differences in endoscopists' experience, working habits, and the clinical environment, manual recording often leaves gaps: lesion pictures may be missing from the library, or the collected pictures may be unclear. Image-retention quality and validity therefore vary widely between endoscopists, which hinders high-level homogenization of endoscopy report quality.
In recent years, artificial intelligence has made great progress in digestive endoscopy. Deep learning, a powerful tool in image recognition, has been applied to many aspects of disease diagnosis in digestive endoscopy and shows great diagnostic-assistance potential. Artificial intelligence devices can capture pictures autonomously, but they do so blindly and inconsistently. A well-designed lesion image-retention algorithm is expected to let artificial intelligence generate high-quality endoscopy image-text reports stably and automatically. We therefore propose an automatic lesion image-retention method for upper gastrointestinal endoscopic images.
Disclosure of Invention
In view of the technical problems described in the background, the invention provides an automatic lesion image-retention method for upper gastrointestinal endoscopic images. It automatically and stably retains high-quality pictures of abnormal upper gastrointestinal lesions and quickly localizes them, making it easy for doctors to trace back and review the lesions. This addresses the prior-art problems that image-retention quality and validity vary widely between endoscopists and that high-level homogenization of endoscopy report quality is difficult to achieve.
The invention provides the following technical scheme: an automatic lesion image-retention method for upper gastrointestinal endoscopic images, comprising the following steps:
s1, collecting and preprocessing digestive tract data;
s2, constructing a major-site classification model M-Net, and classifying and filtering the original data to obtain a stomach picture set and an esophagus picture set;
s3, constructing lesion detection models to detect lesions in the stomach pictures and esophagus pictures respectively;
s4, constructing a lesion separation and summarization unit, numbering the input pictures by lesion, and gathering them into separate folders;
s5, constructing a mucosa visualization model, and removing invalid pictures or artifact pictures with poor mucosa visibility;
s6, constructing a 26-part localization model, and classifying the lesion pictures into 26 upper gastrointestinal parts;
and s7, automatically outputting the lesion image-retention result.
Preferably, in step S1, a gastroscope video is acquired with endoscopy equipment, the video is decoded into pictures, and size-normalization preprocessing is performed so that only the structural information of each picture is retained.
Preferably, in step S2, the original data are divided into three categories: stomach pictures, esophagus pictures, and masked pictures; the duodenum pictures are grouped with the stomach pictures for analysis, and the masked pictures are filtered out and discarded.
Preferably, in step S3, the stomach pictures and the esophagus pictures are input into a stomach-Yolov3 model and an esophagus-Yolov3 model respectively for lesion detection.
Preferably, in step S4, based on the lesion detection model of step S3, the single-picture results output are input into the lesion separation and summarization unit, which decides from the timing information whether a lesion period is activated; all lesion pictures within one activation period are regarded as the same lesion.
Preferably, in step S5, the mucosa visualization model filters the lesion pictures, removing machine-collected invalid pictures or artifact pictures with poor mucosa visibility.
Preferably, in step S7, each lesion picture is ranked by weighted confidence according to the outputs of the lesion detection model of step S3, the mucosa visualization model of step S5, and the 26-part localization model of step S6, and the picture with the highest value is output.
The invention provides an automatic lesion image-retention method for upper gastrointestinal endoscopic images: frames are extracted in real time from the video acquired by the upper gastrointestinal endoscope, and the M-Net model enables accurate lesion detection in the two different scenes of the esophagus and the stomach.
Drawings
FIG. 1 is a schematic diagram of the principles of the present invention;
FIG. 2 is a flow chart of the method of the present invention;
FIG. 3 is a schematic illustration of the dimension normalization of the present invention;
FIG. 4 is a schematic structural diagram of an M-Net model;
FIG. 5 is a schematic view of a lesion detection model;
- FIG. 6 is a schematic view of the lesion separation and summarization unit;
FIG. 7 is a schematic structural diagram of a mucosa visualization model;
FIG. 8 is a schematic view of a 26-part positioning model;
- FIG. 9 is a visualization of the results of the automatic image-retention method.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1 and 2, the present invention provides a technical solution: an automatic lesion image-retention method for upper gastrointestinal endoscopic images.
S1, data acquisition and preprocessing. A gastroscope video is acquired with endoscopy equipment and decoded into a picture set at 7 frames per second, followed by size normalization and other preprocessing: pictures with different aspect ratios are padded with black borders into a square and then scaled to 360 × 360 pixels, retaining only the structural information of the picture. The size-normalization scheme is shown in fig. 3.
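The patent describes this padding-and-scaling step only in prose. A minimal numpy sketch of the idea might look like this; the function names and the nearest-neighbour resize are illustrative stand-ins (a real pipeline would typically use cv2.resize):

```python
import numpy as np

def pad_to_square(img: np.ndarray) -> np.ndarray:
    """Pad an H x W x C frame with black borders so it becomes square."""
    h, w = img.shape[:2]
    size = max(h, w)
    out = np.zeros((size, size, img.shape[2]), dtype=img.dtype)
    top = (size - h) // 2
    left = (size - w) // 2
    out[top:top + h, left:left + w] = img
    return out

def resize_nearest(img: np.ndarray, size: int = 360) -> np.ndarray:
    """Nearest-neighbour resize to size x size (stand-in for cv2.resize)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]

def normalize_frame(img: np.ndarray) -> np.ndarray:
    """Black-pad to square, then scale to 360 x 360, as described in S1."""
    return resize_nearest(pad_to_square(img), 360)
```

Padding before scaling preserves the aspect ratio of the anatomy, which is presumably why the patent keeps "only the structural information" this way.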
S2, constructing the major-site classification model M-Net, which divides the whole set of upper digestive tract pictures into an esophagus part and a stomach part; cardia pictures are grouped with the esophagus part, and duodenum pictures are grouped with the stomach part for analysis.
Dataset preparation: the endoscope pictures are classified and labeled by professional physicians with the labels: stomach picture, esophagus picture, and masked picture. They are then preprocessed by scaling to 224 × 224.
Constructing and training the model: preferably, ResNet50 is used as the basic neural network structure, the final pooling layer is changed from global average pooling to max pooling, and Softmax is used as the activation function. As shown in fig. 4, the loss function of M-Net is the multiclass cross-entropy loss:

Loss = -(1/m) · Σ_{i=1..m} Σ_{j=1..n} Ŷ_ij · log(Y_ij)

where m is the number of input samples, n is the number of categories, Y is the model's predicted value, and Ŷ is the ground-truth value.
The M-Net output is a multi-dimensional column vector; each dimension corresponds to the probability of one category, and the higher the value, the more likely the picture belongs to that category.
The result of step S1 is input into the M-Net model for three-way classification; the output for each picture is one of: stomach picture, esophagus picture, or masked picture. The masked pictures (e.g. in-vitro pictures, oropharyngeal pictures, and pictures too blurred for the site or lesion to be identified) are filtered out and discarded, and the stomach picture set and esophagus picture set are retained.
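The three-way Softmax output and the multiclass cross-entropy loss used by M-Net can be sketched in numpy as follows. This is illustrative only: the actual M-Net is a ResNet50 trained in a deep learning framework, and these two functions only show the output layer and loss it is described as using.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Row-wise softmax over class logits (numerically stabilized)."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def multiclass_cross_entropy(y_pred: np.ndarray, y_true: np.ndarray) -> float:
    """Loss = -(1/m) * sum_i sum_j Yhat_ij * log(Y_ij), matching the
    cross-entropy formula above; the epsilon guards log(0)."""
    m = y_pred.shape[0]
    return float(-np.sum(y_true * np.log(y_pred + 1e-12)) / m)
```

A confident correct prediction drives the loss toward zero, while a uniform prediction over the three classes (stomach / esophagus / masked) gives a loss of log 3.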
S3, constructing the lesion detection models to detect and box lesions, as shown in fig. 5.
Dataset preparation: professional physicians annotate the lesion regions in the stomach and esophagus pictures; the lesion category and rectangular-box coordinates are saved as xml files, the original pictures are uniformly scaled to 352 × 352, and each xml file together with its corresponding jpg picture forms one pair of training data.
Constructing and training the model: preferably, a Yolov3 model is used as the basic network structure; the model input is 352 × 352, features are extracted by the Darknet-53 feature network, and predictions are output on 3 channels corresponding to large, medium, and small targets.
The loss function of the lesion detection model is divided into three parts:
Loss=lbox+lobj+lcls
(1) The error from the coordinate-calculation part, i.e. the bbox loss:
where S × S is the grid size, B is the number of candidate boxes generated per grid cell, and the indicator denotes whether the j-th anchor box of the i-th grid cell is responsible for this object: it is 1 if responsible, otherwise 0.
To take the distance, overlap, and scale between the target and the anchor into account, the IoU loss is modified into the CIoU loss:

L_CIoU = 1 − IoU + ρ²(b, b_gt)/c² + α·v

where α is a weight function, v is a parameter measuring aspect-ratio similarity, ρ denotes the Euclidean distance, b denotes the box center point, and c is the diagonal length of the minimum enclosing rectangle.
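A plain-Python sketch of a CIoU loss with exactly these terms: 1 − IoU, the centre-distance penalty ρ²/c², and the aspect-ratio term α·v. The epsilon guard is an implementation detail, not from the patent.

```python
import math

def ciou_loss(box1, box2):
    """CIoU loss for two (x1, y1, x2, y2) boxes: 1 - IoU + rho^2/c^2 + alpha*v."""
    # Intersection and union for the IoU term.
    ix1, iy1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    ix2, iy2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    a1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    a2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    iou = inter / (a1 + a2 - inter)
    # Squared distance rho^2 between centre points b.
    cx1, cy1 = (box1[0] + box1[2]) / 2, (box1[1] + box1[3]) / 2
    cx2, cy2 = (box2[0] + box2[2]) / 2, (box2[1] + box2[3]) / 2
    rho2 = (cx1 - cx2) ** 2 + (cy1 - cy2) ** 2
    # Squared diagonal c^2 of the minimum enclosing rectangle.
    ex1, ey1 = min(box1[0], box2[0]), min(box1[1], box2[1])
    ex2, ey2 = max(box1[2], box2[2]), max(box1[3], box2[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    # Aspect-ratio similarity v and its weight alpha.
    w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
    w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
    v = (4 / math.pi ** 2) * (math.atan(w2 / h2) - math.atan(w1 / h1)) ** 2
    alpha = v / (1 - iou + v + 1e-12)
    return 1 - iou + rho2 / c2 + alpha * v
```

Identical boxes give a loss of 0, while distant boxes are penalized beyond plain 1 − IoU by the centre-distance term, which is what makes CIoU converge faster than IoU loss for non-overlapping anchors.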
(2) The error from the confidence, i.e. the probability that a bounding box contains an object:
(3) The error from the category, i.e. the classification loss:
Based on the stomach picture set and the esophagus picture set obtained in step S2, the pictures enter the stomach-Yolov3 model and the esophagus-Yolov3 model respectively as input; each picture obtains a prediction of 0 or 1, where 0 means no lesion and 1 means a lesion is present.
S4, constructing the lesion separation and summarization unit. As shown in fig. 6, based on the output of the lesion detection model in step S3 and the timing information, if 7 out of 10 consecutive pictures are predicted as 1, a lesion activation period begins, and all pictures within it are regarded as the observation period of the same lesion. If 7 out of 10 consecutive pictures are predicted as 0, the lesion activation period ends. Pictures predicted as 1 outside an activation period are treated as false positives and rejected. All pictures in each activation period are numbered by lesion and gathered into separate folders.
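The activation logic can be sketched as a sliding-window state machine. The exact boundary handling (how the episode's start and end indices are chosen) is an assumption here, since the patent only states the 7-of-10 rule:

```python
def segment_lesions(preds, window=10, threshold=7):
    """Group per-frame lesion predictions (0/1) into lesion episodes.

    Enter an 'active' episode when >= threshold of the last `window`
    frames are 1; leave it when >= threshold of the last `window`
    frames are 0. Returns a list of (start, end) frame-index pairs.
    """
    episodes, active, start = [], False, 0
    for i in range(len(preds)):
        recent = preds[max(0, i - window + 1):i + 1]
        ones = sum(recent)
        if not active and len(recent) == window and ones >= threshold:
            active, start = True, i - window + 1
        elif active and len(recent) == window and (window - ones) >= threshold:
            episodes.append((start, i))
            active = False
    if active:  # video ended while still inside an episode
        episodes.append((start, len(preds) - 1))
    return episodes
```

The hysteresis (7-of-10 to enter, 7-of-10 zeros to leave) is what suppresses isolated false-positive frames: a lone 1 outside an episode never activates, and a lone 0 inside an episode never splits it.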
S5, constructing the mucosa visualization model to evaluate the mucosa visibility of each picture, in order to eliminate machine-collected invalid pictures that cannot be analyzed. As shown in fig. 7.
Preparation of a data set: the endoscope pictures are classified and marked by a professional physician, and the labels are as follows: invalid pictures, valid pictures. The original pictures are uniformly scaled to 224 x 224 size.
Constructing and training the model: preferably, ResNet34 is used as the basic neural network structure with a 224 × 224 input; the input is convolved, passes through a series of building-block modules, and finally enters a fully connected layer that is activated by a Sigmoid function to output a 0/1 classification result. The loss function of HL-Net is the binary cross-entropy loss:

Loss = -(1/m) · Σ_{i=1..m} [ Ŷ_i · log(Y_i) + (1 − Ŷ_i) · log(1 − Y_i) ]

where m is the number of input samples, Y is the model's predicted value, and Ŷ is the ground-truth value.
The output of step S4 is input into the mucosa visualization model; the output for each picture is valid or invalid, and only the valid pictures in each lesion folder are retained.
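The filtering step amounts to keeping only the pictures the binary classifier scores as valid. A minimal sketch, with an illustrative sigmoid threshold of 0.5 (the patent does not state its decision threshold):

```python
import math

def sigmoid(z: float) -> float:
    """Sigmoid activation, as used by the model's final layer."""
    return 1.0 / (1.0 + math.exp(-z))

def filter_valid(pictures, logits, threshold=0.5):
    """Keep only pictures whose sigmoid score marks them 'valid'."""
    return [p for p, z in zip(pictures, logits) if sigmoid(z) >= threshold]
```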
S6, constructing the 26-part localization model to classify the precise part of the upper digestive tract. As shown in fig. 8.
Preparation of a data set: the endoscope pictures are classified and marked by a professional physician, and the labels are 26 parts of the upper digestive tract. The original pictures are uniformly scaled to 224 x 224 size.
Constructing and training the model: preferably, ResNet50 is used as the basic neural network structure; the output is reduced to the number of categories and activated with Softmax, so that each dimension of the vector represents the probability (0 to 1) of the corresponding category, and the category with the highest value is the predicted part.
Based on the output of step S5, the filtered lesion pictures are input into the 26-part model for 26-part classification of the upper digestive tract; the output for each picture is the index of the part category and its confidence value.
S7, based on the outputs of the lesion detection model of step S3, the mucosa visualization model of step S5, and the 26-part localization model of step S6, weighted-confidence ranking is carried out: a weighted confidence C_risk is computed for every picture in each lesion folder, and the picture with the highest value is output:

C_risk = λ1·C_yolo + λ2·C_del + λ3·C_part

where C_yolo is the confidence of the lesion detection model, C_del is the confidence of the mucosa visualization model, C_part is the confidence of the 26-part localization model, and λ1, λ2, λ3 are weight coefficients. As shown in fig. 9.
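The weighted selection can be sketched as follows. The weight values are illustrative placeholders, since the patent does not disclose its λ coefficients:

```python
def select_best_picture(pictures, weights=(0.5, 0.2, 0.3)):
    """Pick the picture with the highest weighted confidence
    C_risk = l1*C_yolo + l2*C_del + l3*C_part.

    `pictures` is a list of dicts carrying the three model confidences;
    the keys and default weights are illustrative, not the patent's.
    """
    def c_risk(p):
        return (weights[0] * p["c_yolo"]
                + weights[1] * p["c_del"]
                + weights[2] * p["c_part"])
    return max(pictures, key=c_risk)
```

Because C_risk combines detection confidence, mucosa visibility, and localization confidence, the picture chosen for the report tends to be the one that is simultaneously clearly a lesion, clearly visible, and clearly attributable to a part.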
To verify the robustness of the method, 100 videos were taken and experts evaluated the image-retention results: the lesion-miss rate was only 0.3%; the quality of the retained lesion images was high, with picture repetition and blur rates of only 4%; and on average the method saved the endoscopist 26 s per case in completing the endoscopy report.
The invention extracts frames in real time from the video acquired by the upper gastrointestinal endoscope and uses the M-Net model to detect lesions accurately in the two different scenes of the esophagus and the stomach. Through the lesion-activation logic, the influence of false positives from the lesion detection model is avoided and the accuracy of real-time lesion detection is improved; omissions and erroneous captures that an endoscopist might make during image acquisition are avoided, the missed-diagnosis and misdiagnosis rates during upper gastrointestinal endoscopy are reduced, and the quality of the endoscopist's diagnosis reports is ultimately improved;
the invention also standardizes the completeness and the effectiveness of the examination report of the endoscope physician, and can independently and separately summarize the focuses observed at different moments according to time sequence information after focus pictures are input into a focus separation summarizer, thereby avoiding repeated image retention of the same focus, and retaining images with the best focus angle and the clearest quality according to a confidence weighting method when each focus is observed, thereby improving the diversity of the selection and the efficiency of image selection when the endoscope physician prints the examination report;
the invention also can accurately position the part of the upper gastrointestinal tract abnormal focus so that an endoscope doctor can quickly know the position of the focus when backtracking the report, and can observe the focus part in a targeted manner during the reexamination, thereby being beneficial to improving the reexamination efficiency of the endoscope doctor, consolidating the guide treatment value and the follow-up value of the examination report and relieving the problem of resource shortage of the endoscope doctor in hospitals.
The above description is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any equivalent substitution or change that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention, according to its technical solution and inventive concept, shall fall within the protection scope of the present invention.
Claims (7)
1. An automatic lesion image-retention method for upper gastrointestinal endoscopic images, characterized by comprising the following steps:
s1, collecting and preprocessing digestive tract data;
s2, constructing a major-site classification model M-Net, and classifying and filtering the original data to obtain a stomach picture set and an esophagus picture set;
s3, constructing lesion detection models to detect lesions in the stomach pictures and esophagus pictures respectively;
s4, constructing a lesion separation and summarization unit, numbering the input pictures by lesion, and gathering them into separate folders;
s5, constructing a mucosa visualization model, and removing invalid pictures or artifact pictures with poor mucosa visibility;
s6, constructing a 26-part localization model, and classifying the lesion pictures into 26 upper gastrointestinal parts;
and s7, automatically outputting the lesion image-retention result.
2. The automatic lesion image-retention method for upper gastrointestinal endoscopic images according to claim 1, characterized in that: in step S1, a gastroscope video is acquired with endoscopy equipment, the video is decoded into pictures, and size-normalization preprocessing is performed so that only the structural information of each picture is retained.
3. The automatic lesion image-retention method for upper gastrointestinal endoscopic images according to claim 1, characterized in that: in step S2, the original data are divided into three categories: stomach pictures, esophagus pictures, and masked pictures; the duodenum pictures are grouped with the stomach pictures for analysis, and the masked pictures are filtered out and discarded.
4. The automatic lesion image-retention method for upper gastrointestinal endoscopic images according to claim 1, characterized in that: in step S3, the stomach pictures and the esophagus pictures are input into a stomach-Yolov3 model and an esophagus-Yolov3 model respectively for lesion detection.
5. The automatic lesion image-retention method for upper gastrointestinal endoscopic images according to claim 1 or 4, characterized in that: in step S4, based on the lesion detection model of step S3, the single-picture results output are input into the lesion separation and summarization unit, which decides from the timing information whether a lesion period is activated; all lesion pictures within one activation period are regarded as the same lesion.
6. The automatic lesion image-retention method for upper gastrointestinal endoscopic images according to claim 1, characterized in that: in step S5, the mucosa visualization model filters the lesion pictures, removing machine-collected invalid pictures or artifact pictures with poor mucosa visibility.
7. The automatic lesion image-retention method for upper gastrointestinal endoscopic images according to claim 1, characterized in that: in step S7, each lesion picture is ranked by weighted confidence according to the outputs of the lesion detection model of step S3, the mucosa visualization model of step S5, and the 26-part localization model of step S6, and the picture with the highest value is output.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110437503.5A CN113129287A (en) | 2021-04-22 | 2021-04-22 | Automatic lesion mapping method for upper gastrointestinal endoscope image |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113129287A true CN113129287A (en) | 2021-07-16 |
Family
ID=76779201
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110437503.5A Withdrawn CN113129287A (en) | 2021-04-22 | 2021-04-22 | Automatic lesion mapping method for upper gastrointestinal endoscope image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113129287A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109102491A (en) * | 2018-06-28 | 2018-12-28 | 武汉大学人民医院(湖北省人民医院) | A kind of gastroscope image automated collection systems and method |
CN109411084A (en) * | 2018-11-28 | 2019-03-01 | 武汉大学人民医院(湖北省人民医院) | A kind of intestinal tuberculosis assistant diagnosis system and method based on deep learning |
CN111080639A (en) * | 2019-12-30 | 2020-04-28 | 四川希氏异构医疗科技有限公司 | Multi-scene digestive tract endoscope image identification method and system based on artificial intelligence |
CN111915573A (en) * | 2020-07-14 | 2020-11-10 | 武汉楚精灵医疗科技有限公司 | Digestive endoscopy focus tracking method based on time sequence feature learning |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113344927A (en) * | 2021-08-05 | 2021-09-03 | 武汉楚精灵医疗科技有限公司 | Image recognition method and device based on deep learning, server and storage medium |
CN113808137A (en) * | 2021-11-19 | 2021-12-17 | 武汉楚精灵医疗科技有限公司 | Method, device, equipment and storage medium for screening image map of upper gastrointestinal endoscope |
CN113989125A (en) * | 2021-12-27 | 2022-01-28 | 武汉楚精灵医疗科技有限公司 | Method and device for splicing endoscope images, computer equipment and storage medium |
CN113989125B (en) * | 2021-12-27 | 2022-04-12 | 武汉楚精灵医疗科技有限公司 | Method and device for splicing endoscope images, computer equipment and storage medium |
CN114419050A (en) * | 2022-03-31 | 2022-04-29 | 武汉大学 | Gastric mucosa visualization degree quantification method and device, terminal and readable storage medium |
CN114419050B (en) * | 2022-03-31 | 2022-06-21 | 武汉大学 | Gastric mucosa visualization degree quantification method and device, terminal and readable storage medium |
CN115546206A (en) * | 2022-11-23 | 2022-12-30 | 武汉楚精灵医疗科技有限公司 | Biopsy image retention method and device for gastrointestinal endoscope image and storage medium |
CN115546206B (en) * | 2022-11-23 | 2023-03-14 | 武汉楚精灵医疗科技有限公司 | Biopsy image retention method and device for gastrointestinal endoscope image and storage medium |
CN115578385A (en) * | 2022-12-01 | 2023-01-06 | 武汉楚精灵医疗科技有限公司 | Method and device for acquiring disease information under enteroscope, electronic equipment and storage medium |
CN115578385B (en) * | 2022-12-01 | 2023-03-14 | 武汉楚精灵医疗科技有限公司 | Method and device for acquiring disease information under enteroscope, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113129287A (en) | Automatic lesion mapping method for upper gastrointestinal endoscope image | |
CN109886273B (en) | CMR image segmentation and classification system | |
CN107564580B (en) | Gastroscope visual aids processing system and method based on integrated study | |
US20220172828A1 (en) | Endoscopic image display method, apparatus, computer device, and storage medium | |
CN112733950A (en) | Power equipment fault diagnosis method based on combination of image fusion and target detection | |
CN113379693A (en) | Capsule endoscopy key focus image detection method based on video abstraction technology | |
CN110335241B (en) | Method for automatically scoring intestinal tract preparation after enteroscopy | |
CN110738655B (en) | Image report generation method, device, terminal and storage medium | |
Souaidi et al. | A new automated polyp detection network MP-FSSD in WCE and colonoscopy images based fusion single shot multibox detector and transfer learning | |
Bourbakis | Detecting abnormal patterns in WCE images | |
CN112151167A (en) | Intelligent screening method for six-age dental caries of children based on deep learning | |
CN114399465B (en) | Benign and malignant ulcer identification method and system | |
CN114708258B (en) | Eye fundus image detection method and system based on dynamic weighted attention mechanism | |
US20240005494A1 (en) | Methods and systems for image quality assessment | |
CN113781489A (en) | Polyp image semantic segmentation method and device | |
CN115205520A (en) | Gastroscope image intelligent target detection method and system, electronic equipment and storage medium | |
CN112419246B (en) | Depth detection network for quantifying esophageal mucosa IPCLs blood vessel morphological distribution | |
CN111340760B (en) | Knee joint positioning method based on multitask two-stage convolution neural network | |
Wang et al. | A ROI extraction method for wrist imaging applied in smart bone-age assessment system | |
Bravo et al. | Automatic classification of esophagogastroduodenoscopy sub-anatomical regions | |
CN117197836A (en) | Traditional Chinese medicine physique identification method based on multi-modal feature depth fusion | |
Garcia-Peraza-Herrera et al. | Interpretable fully convolutional classification of intrapapillary capillary loops for real-time detection of early squamous neoplasia | |
Li et al. | Ulcer recognition in capsule endoscopy images by texture features | |
CN114581402A (en) | Capsule endoscope quality inspection method, device and storage medium | |
CN114332858A (en) | Focus detection method and device and focus detection model acquisition method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20210716 |