CN110335240B - Method for automatically grabbing characteristic pictures of tissues or foreign matters in alimentary canal in batches - Google Patents

Method for automatically grabbing characteristic pictures of tissues or foreign matters in alimentary canal in batches

Info

Publication number
CN110335240B
CN110335240B · Application CN201910385767.3A
Authority
CN
China
Prior art keywords
target
picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910385767.3A
Other languages
Chinese (zh)
Other versions
CN110335240A (en)
Inventor
曾凡
黄锦
柯钦瑜
黄勇
邰海军
段惠峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan Xuanwei Digital Medical Technology Co.,Ltd.
Original Assignee
Henan Xuan Yongtang Medical Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan Xuan Yongtang Medical Information Technology Co ltd
Priority to CN201910385767.3A
Publication of CN110335240A
Application granted
Publication of CN110335240B
Active legal status
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/56Extraction of image or video features relating to colour
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03Recognition of patterns in medical or anatomical images

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Radiology & Medical Imaging (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for automatically capturing tissue feature pictures in the alimentary tract in batches. The method converts the video's color channel format, removes the background from each video frame, applies grayscale conversion and binarization to the target features, uses contour detection of the target features to output a cropped image of them, and stores the cropped image. The method is fast and accurate.

Description

Method for automatically grabbing characteristic pictures of tissues or foreign matters in alimentary canal in batches
Technical Field
The invention relates to the technical field of image recognition, and in particular to a method for automatically capturing tissue feature pictures from the digestive tract in batches.
Background
Intelligent computer-aided diagnosis and treatment under digestive endoscopy adopts deep learning as the most effective algorithm for achieving intelligence. Deep learning depends on a feature data set and a trained model, and in most cases a deep-learning model cannot learn from arbitrary data: the data must first be labeled and classified. Labeling and classification are generally done by people who capture and sort pictures of the precise target features, but manually capturing and screening target pictures from video requires a large amount of labor, and the accuracy of manually cropped pictures is low. If pictures with the same features are cropped with different regions, sizes, and segments, the model training of machine learning is affected. Furthermore, the environment in the digestive tract is non-geometric and dynamic, has fractal structures, and forms the space of a closed pipe. As the endoscope moves through the digestive tract and identifies target tissues on the intestinal wall, features of the inner wall of the digestive tract contaminate the training data used for identifying tissue features, and overfitting occurs during prediction.
Disclosure of Invention
In order to solve the above problems, the present invention provides a method for automatically capturing tissue feature pictures in the digestive tract in batches, which can continuously and automatically capture formatted, specified feature pictures in batches from historical operation videos.
The invention is realized by the following technical scheme:
The method for automatically capturing tissue feature pictures in the alimentary tract in batches comprises the following steps:
a) Video reading and color channel format conversion: read a video of the digestive-tract endoscopy procedure from the storage device, and convert the color channel format of the video from RGB to HSV;
b) Locating the target in the video and removing the video background: adjust the ranges of the parameters H, S and V in the HSV color space to locate the video content, and adjust H, S and V to remove all background except the target feature, the target feature being any of tissue organs, feces, and examination and surgical instruments within the alimentary tract;
c) Acquiring a target feature picture: acquire a target feature picture according to the target feature;
d) Performing grayscale conversion and binarization on the target feature picture;
e) Performing contour detection and locating the target features: contour detection is carried out on the binary image using a Freeman chain code, and the position of the target feature in the picture, the contour range of the target feature, and the total count of target feature points are returned;
f) Calculating the proportion of the target features in the picture: map the target feature image in the binary image to a matrix, join the matrix rows end to end into a vector, sum the vector values and divide by 255 to obtain the number of white (feature-value) pixels, and compute the ratio of white pixels to black background pixels to obtain the size of the target feature in the image;
g) Judge frame by frame whether the target features in the video meet the interception conditions; if so, crop the target feature image and store the result.
The method is further characterized as follows. In step c), the target feature picture is obtained by performing a mask operation on each pixel of the target feature with a mask; the target feature picture comprises a target feature area image and a non-target feature area image, the pixel values in the target feature area image remain unchanged, and the pixel values of the non-target feature area image are zero.
In step d), a grayscale conversion formula is applied to the target feature picture, a binary image is obtained from the grayscale image through a binary threshold algorithm, and the binary image is denoised through morphological erosion and dilation operations; the grayscale image of the target feature picture is a single-channel grayscale image with values in the range 0-255, and the binary image is a single-channel image whose values are 0 or 255.
In step g), judging whether a frame in the video satisfies the interception conditions comprises the following steps:
g1) Judge whether the total count of target feature points from step e) is larger than 5000; if so, go to step g2), otherwise move directly to the next frame;
g2) Judge whether the width-to-height ratio of the target feature contour from step e) is less than 5 and greater than one fifth; if so, go to step g3), otherwise move directly to the next frame;
g3) Judge whether the proportion of the target features from step f) in the whole picture is in the range of 2%-20%; if so, crop the target features in the frame and store them in a result set, otherwise move directly to the next frame.
The disclosed method for automatically capturing tissue feature pictures in the alimentary tract in batches converts the format of the video, removes the background in each video frame to highlight the target feature, applies grayscale conversion, binarization, denoising and dilation to the target feature to highlight it further, uses contour detection to output the position information of the target feature, compares target features at adjacent identical positions to judge whether frames contain the same target feature, captures pictures using several groups of video frame format units, and stores the captured images. The method is fast and precise.
Drawings
FIG. 1 is a flow chart of a method for automatically batch-grabbing pictures of tissue features in the alimentary tract.
FIG. 2 is a schematic diagram of a progress bar for parameter H, S and V adjustment.
Fig. 3 is a feature map after binarization when the target feature is a surgical instrument.
Fig. 4 is a picture in which the position and width of a target feature are determined.
Fig. 5 shows pictures, captured from a video, in which the target feature is a surgical instrument.
Fig. 6 is a schematic diagram of a storage structure for vectorization of pictures in each classification data set.
Fig. 7 is a diagram of the results of the neural network model identifying tissue or foreign matter in the real-time picture.
FIG. 8 is a graph of the results of the tissue or foreign object identified in FIG. 7 after storage.
Fig. 9 shows the number of identical feature points in the feature point sets of two pictures.
FIG. 10 shows pictures in a data set before comparison and archiving.
FIG. 11 is a result of comparing and archiving the pictures in the data set of FIG. 10.
FIG. 12 is a diagram of the results of a high-precision convolutional neural network identifying and classifying surgical instruments during an operation.
FIG. 13 is a pictorial result of identifying a metal collar of an electrosurgical resection ring during a surgical procedure.
Fig. 14 is a pictorial result of identifying the opening of a metal clip during a surgical procedure.
Figure 15 is an image identifying the hemostatic titanium clip not yet detached after closure during surgery.
Figure 16 is an image identifying the hemostatic titanium clip detached after closure during surgery.
Detailed Description
The technical solutions in the embodiments of the present invention will now be described clearly and completely with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention; all other embodiments obtained by those skilled in the art, without inventive work, based on the embodiments of the present invention fall within the protection scope of the present invention.
As shown in fig. 1, the method for automatically capturing the tissue feature pictures in the alimentary tract in batches comprises the following steps:
the method comprises the following steps: capturing operation video characteristic pictures from the video in batch,
a) Video reading and color channel format conversion: read a video of the digestive-tract endoscopy procedure from the storage device, and convert the color channel format of the video from RGB to HSV so as to find a background mask that can strip the background from the specific target identification area;
b) Locating the target in the video and removing the video background: as shown in fig. 2, adjust the ranges of the parameters H, S and V in the HSV color space to locate the video content, obtain the corresponding HSV mask from the HSV color space of the video background, locate the target feature in the video through the HSV mask, and adjust H, S and V to remove all background except the target feature, the target feature being any of tissue organs, feces, and examination and surgical instruments within the alimentary tract;
c) Acquiring a target feature picture: perform a mask operation on each pixel of the target feature with the mask; the target feature picture comprises a target feature area image and a non-target feature area image, the pixel values in the target feature area image remain unchanged, and the pixel values of the non-target feature area image are zero;
d) Performing grayscale conversion and binarization on the target feature picture: obtain a grayscale image of the target feature picture using the conversion formula Gray = (R*299 + G*587 + B*114 + 500)/1000, obtain a binary image from the grayscale image through a binary threshold algorithm, and remove noise from the binary image with morphological erosion and dilation operations; the grayscale image is a single-channel image with values in the range 0-255, and the binary image is a single-channel image whose values are 0 or 255, as shown in fig. 3;
e) Performing contour detection and locating the target features: perform contour detection on the binary image with a Freeman chain code, and return the position of the target feature in the picture, the contour range of the target feature, and the total count of target feature points. As shown in fig. 4, where the target feature is a surgical instrument, the box marks the position of the target feature in the picture, and the width of the box is the contour range of the target feature;
f) Calculating the proportion of the target features in the picture: map the target feature image in the binary image to a matrix, join the matrix rows end to end into a vector, sum the vector values and divide by 255 to obtain the number of white (feature-value) pixels, and compute the ratio of white pixels to black background pixels to obtain the size of the target feature in the image;
g) Judge frame by frame whether the video frame meets the interception conditions; if so, crop the target feature from the picture and store the result. Fig. 5 shows some of the pictures captured from the video in which the target feature is a surgical instrument.
In step g), the step of determining whether the video frame meets the interception determination condition includes the following steps:
g1) Judge whether the total count of target feature points from step e) is larger than 5000; if so, go to step g2), otherwise move directly to the next video frame;
g2) Judge whether the width-to-height ratio of the target feature contour from step e) is less than 5 and greater than one fifth; if so, go to step g3), otherwise move directly to the next video frame;
g3) Judge whether the proportion of the target features from step f) in the picture is in the range of 2%-20%; if so, crop the target features in the frame and store them in a result set, otherwise move to the next video frame. A code sketch of steps a) through g3) is given below.
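The following is a minimal sketch of steps a) through g3) using OpenCV and NumPy, assuming OpenCV 4. The 5000-point count, the one-fifth to five-fold aspect-ratio window, and the 2%-20% area range come from the text; the video path, the example HSV bounds, the binary threshold of 127, and the output naming are illustrative assumptions, since the patent fixes no concrete values for them.

```python
import cv2
import numpy as np

LOWER_HSV = np.array([0, 0, 200])     # assumed H, S, V lower bounds
UPPER_HSV = np.array([180, 60, 255])  # assumed H, S, V upper bounds

cap = cv2.VideoCapture("video.avi")   # a) read the endoscopy video
saved = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)       # a) convert to HSV
    mask = cv2.inRange(hsv, LOWER_HSV, UPPER_HSV)      # b) background mask
    target = cv2.bitwise_and(frame, frame, mask=mask)  # c) masked picture
    gray = cv2.cvtColor(target, cv2.COLOR_BGR2GRAY)    # d) grayscale
    _, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
    kernel = np.ones((3, 3), np.uint8)                 # d) erosion + dilation
    binary = cv2.dilate(cv2.erode(binary, kernel), kernel)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)  # e) contours
    if not contours:
        continue
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    points = int(binary.sum() // 255)      # f) count of white feature pixels
    ratio = points / binary.size           # f) proportion in the whole picture
    # g1)-g3): interception conditions from the text
    if points > 5000 and 0.2 < w / h < 5 and 0.02 <= ratio <= 0.20:
        cv2.imwrite(f"result/feature_{saved:05d}.png",  # assumes result/ exists
                    frame[y:y + h, x:x + w])
        saved += 1
cap.release()
```

Note that OpenCV decodes frames as BGR, so the conversions above use the BGR variants of the RGB formulas in the text.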
The doctor then manually screens the result set, deleting pictures with irrelevant features; what remains is a standard, accurate set of feature pictures.
Based on this batch capture of target feature pictures, the work of identifying and recording endoscopic submucosal dissection polyp-removal videos can be further implemented, specifically comprising the following steps:
step two: establishing a neural network model, and training the neural network model:
h) Establishing data sets: classify and store target feature pictures acquired from gastrointestinal endoscopy to establish classified data sets.
A mathematical and business model of the target feature picture is established according to the target feature attributes, the target feature pictures appearing in gastrointestinal endoscopy are automatically captured in batches, and the pictures are stored by class to establish the classified data sets.
The target feature attributes include: the target features are irregularly distributed in the video; the target feature occupies 3%-20% of the picture; the color of the target feature differs from the color of the digestive tract; after the endoscope lens moves and the digestive-tract background is masked out, the target feature appears to move within the region; the number of video frames containing the target feature is high; professional medical personnel are required to label the pictures; and the amount of labeled data obtained is small.
The classified data sets are stored in storage space, preferably in folder format, allocated on a storage device such as a magnetic disk or removable hard disk. The classified data sets comprise a background data set, a digestive-tract tissue data set and a foreign-body data set. Target feature pictures in the background data set are pictures of non-identified content such as the intestinal wall, gastric wall and esophagus; target feature pictures in the digestive-tract tissue data set are intestinal tissues that need to be identified and recorded, such as the cardia, fundus, polyps and tumors; and target feature pictures in the foreign-body data set are non-intestinal-tissue contents that need to be identified and recorded, such as feces, clips, ferrules and straws.
i) Establishing a training set, a validation set and a test set: more than 60% of the data is extracted from each classified data set to generate a test set; each classified data set is divided into a training set and a validation set according to the K-fold cross-validation method, and the test, training and validation sets are vectorized.
In the K-fold cross-validation method, each data set is divided into K partitions; each time, K-1 partitions are taken at random as the training set and the remaining partition is used as the validation set.
The training and validation sets are used to train the deep convolutional neural network model, and the test set is used to evaluate the actual recognition performance of the model.
Because labeled medical data is scarce and the content extracted from video is highly similar, a single validation set would be very small and validation would fluctuate considerably; dividing out a fixed validation set would therefore give the deep-learning neural network model a large variance during evaluation, which is why K-fold cross-validation is used.
In step i), the vectorization processing of the test set, the training set and the verification set comprises the following steps:
i1) Create a picture-path vector imagePaths as a storage unit, and store the address information of each class of data set in imagePaths in sequence;
i2) Create data and label storage units, traverse all pictures stored in imagePaths, compress each picture to a size of 96x96, traverse the picture values by column, and join the rows end to end to obtain the picture vector;
i3) Divide the color values of the picture vectors by 255 to convert them into decimals in the range 0 to 1, store them in data in sequence, and store the class name corresponding to each picture vector in label in sequence, as in the sketch below.
Fig. 6 is a schematic diagram of the storage structure for vectorization of pictures in each classified data set.
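A minimal sketch of steps i1) through i3), assuming the classified data sets live in one folder per class under a hypothetical dataset/ directory; the 96x96 size and the division by 255 follow the text.

```python
import os
import cv2
import numpy as np

imagePaths, data, label = [], [], []
for class_name in sorted(os.listdir("dataset")):       # i1) collect the paths
    class_dir = os.path.join("dataset", class_name)
    for name in sorted(os.listdir(class_dir)):
        imagePaths.append((os.path.join(class_dir, name), class_name))

for path, class_name in imagePaths:                    # i2) traverse pictures
    img = cv2.resize(cv2.imread(path), (96, 96))       # compress to 96x96
    vec = img.reshape(-1)                              # rows joined end to end
    data.append(vec / 255.0)                           # i3) scale to [0, 1]
    label.append(class_name)

data = np.array(data, dtype=np.float32)
print(data.shape, len(label))
```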
j) Create a neural network model from 3D convolution, max pooling, fully-linked neurons, data flattening and probability output, and regularize the test, training and validation sets. The neural network model comprises an input layer, a first convolutional layer, a first max-pooling layer, a second convolutional layer, a second max-pooling layer, a third convolutional layer, a third max-pooling layer, a data flattening transition layer, a fully-linked data layer and a probability output layer.
The input layer is the entry point for a vectorized picture; the model input width and height are both 150, with three color channels.
The first convolutional layer feeds the input into a convolution kernel of size 3 x 3 with 64 hidden nodes, and its activation function is a rectified linear unit;
the first max-pooling layer performs 2 x 2 pooling on the output of the first convolutional layer;
the second convolutional layer has a 3 x 3 convolution kernel with 128 hidden nodes, and its activation function is a rectified linear unit;
the second max-pooling layer performs 2 x 2 pooling on the output of the second convolutional layer;
the third convolutional layer has a 3 x 3 convolution kernel with 256 hidden nodes, and its activation function is a rectified linear unit;
the third max-pooling layer performs 2 x 2 pooling on the output of the third convolutional layer;
the data flattening transition layer flattens multi-dimensional data into one dimension for the transition from the convolutional layers to the fully-linked layer;
the fully-linked data layer feeds its input into 1024 hidden nodes, and its activation function is a rectified linear unit;
the probability output layer produces a probability distribution over the different classes through gradient-logarithm normalization (softmax) of a finite discrete probability distribution;
the regularization of the neural network model uses weight regularization with the L2 norm to reduce overfitting of the neural network model. A sketch of this architecture is given below.
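A sketch of the described architecture, assuming Keras under TensorFlow 2; the L2 coefficient of 0.001 and the placeholder NUM_CLASSES are assumptions, while the 150x150x3 input, the 3x3 kernels with 64/128/256 nodes, the 2x2 pooling, the 1024-node fully-linked layer and the softmax output follow the text.

```python
from tensorflow.keras import layers, models, regularizers

NUM_CLASSES = 3  # assumed placeholder, e.g. background / tissue / foreign body

model = models.Sequential([
    layers.Input(shape=(150, 150, 3)),                 # input layer
    layers.Conv2D(64, (3, 3), activation="relu"),      # first convolution
    layers.MaxPooling2D((2, 2)),                       # first max pooling
    layers.Conv2D(128, (3, 3), activation="relu"),     # second convolution
    layers.MaxPooling2D((2, 2)),                       # second max pooling
    layers.Conv2D(256, (3, 3), activation="relu"),     # third convolution
    layers.MaxPooling2D((2, 2)),                       # third max pooling
    layers.Flatten(),                                  # flattening transition
    layers.Dense(1024, activation="relu",              # fully-linked layer
                 kernel_regularizer=regularizers.l2(0.001)),
    layers.Dense(NUM_CLASSES, activation="softmax"),   # probability output
])
model.summary()
```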
k) Training the neural network model: set the loss function of the neural network model, initialize the network parameters of each layer, and feed in the vectorized and regularized training and validation sets for training; set a root-mean-square (RMSProp) optimizer, and update the weight parameters in each layer by gradient descent on the multi-class cross-entropy loss to obtain a trained model.
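Continuing the sketch above (model and NUM_CLASSES are reused), a minimal sketch of step k) with the RMSprop optimizer and multi-class cross-entropy loss; the epoch count, batch size and the random arrays standing in for the vectorized sets are assumptions.

```python
import numpy as np
from tensorflow.keras.utils import to_categorical

# Random placeholders standing in for the vectorized, regularized sets.
x_train = np.random.rand(80, 150, 150, 3).astype("float32")
y_train = to_categorical(np.random.randint(0, NUM_CLASSES, 80), NUM_CLASSES)
x_val = np.random.rand(20, 150, 150, 3).astype("float32")
y_val = to_categorical(np.random.randint(0, NUM_CLASSES, 20), NUM_CLASSES)

# RMSprop optimizer, multi-class cross-entropy loss, per step k).
model.compile(optimizer="rmsprop",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=30, batch_size=32)
```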
l): testing the neural network model: and testing the vector regularized test set by using a training model to test the generalization ability and the recognition ability of the test set.
If the generalization capability and the recognition capability are insufficient, retraining is needed;
m): acquiring a real-time gastrointestinal endoscope video, identifying and recording the real-time gastrointestinal endoscope video: the method comprises the steps of obtaining a real-time gastrointestinal endoscope video image, uniformly dividing the real-time gastrointestinal endoscope video image into a plurality of sub-regions, compressing each sub-region to the size of a picture format input by a training model, traversing all the sub-regions of the gastrointestinal endoscope image, vectorizing each sub-region, inputting the vectorized sub-region into a neural network model, returning an identification probability vector by the model, using a probability scalar with the maximum value as a result, judging whether the probability scalar is larger than 95%, and if so, storing the identified target characteristic sub-region.
In step m), uniformly dividing the real-time gastrointestinal endoscope image into sub-regions comprises the following steps:
m1) Acquire the image width and height of the real-time endoscope image, and divide each by ten, dividing the gastrointestinal endoscope image into 100 sub-regions;
m2) Traverse all sub-regions, compress each sub-region picture and vectorize it, then divide the color values of each vectorized sub-region by 255, compressing the RGB three-channel values into decimals in the range 0 to 1.
Each picture sub-region vector is fed into the deep-learning neural network model, which outputs a probability prediction value and the index corresponding to that value. The prediction is multiplied by 100; if it is larger than 95, it is marked in the picture. In fig. 7, tissues and foreign bodies in the intestinal tract are identified with boxes. The corresponding value is found in label according to the index, the name of the tissue or foreign body is written on the feature region in the real-time picture, and the grid picture of the feature tissue or foreign body is named with the system time and stored, as shown in fig. 8.
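A minimal sketch of steps m1), m2) and the prediction step: split a frame into a 10x10 grid, score each cell with the trained model, and keep cells whose top probability exceeds 95%. The function signature, the 150x150 cell size (matching the model input above) and the green box drawing are assumptions.

```python
import cv2
import numpy as np

def classify_grid(frame, model, label_names, threshold=0.95):
    """Split the frame into 10x10 sub-regions and classify each one."""
    h, w = frame.shape[:2]
    cell_h, cell_w = h // 10, w // 10                # m1) 100 sub-regions
    hits = []
    for row in range(10):
        for col in range(10):
            cell = frame[row * cell_h:(row + 1) * cell_h,
                         col * cell_w:(col + 1) * cell_w]
            cell = cv2.resize(cell, (150, 150))      # compress to model input
            vec = cell.astype("float32") / 255.0     # m2) scale to [0, 1]
            probs = model.predict(vec[np.newaxis], verbose=0)[0]
            idx = int(np.argmax(probs))
            if probs[idx] > threshold:               # keep confident cells only
                hits.append((row, col, label_names[idx], float(probs[idx])))
                cv2.rectangle(frame, (col * cell_w, row * cell_h),
                              ((col + 1) * cell_w, (row + 1) * cell_h),
                              (0, 255, 0), 2)        # mark the cell with a box
    return hits
```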
Step three: traverse videos in batches to validate the neural network model, and generate prediction pictures with the neural network model.
Step four: intelligently compare pictures with high similarity, and archive the pictures without similarity into a data set;
p): the processor acquires an input path and an output path of the pictures and sequences the pictures in the data set according to the picture modification time;
q): sequentially reading two pictures in a data set, wherein the two pictures are any one picture in the data set and a previous picture or a next picture adjacent to the picture in the modification time;
r): judging whether the ratio value of the sizes of the two pictures is within a preset ratio range, if so, turning to the step s), otherwise, simultaneously storing the two pictures in a data set pointed by an output path, and turning to the step q), wherein the ratio value of the sizes of the two pictures is the size of the picture before modification time divided by the size of the picture after modification time, the size of the picture is the product of the height and the width of the picture, and the preset ratio range is less than 0.5 or more than 1.5;
s): converting the two pictures into gray-scale pictures with the same size, performing sub-region conversion processing on the gray-scale pictures, and creating a gray-scale mean matrix;
t): judging whether the standard deviation of a matrix obtained by subtracting the mean value matrixes of the two pictures is smaller than a specified threshold, if so, turning to the step u), otherwise, simultaneously storing the two pictures in a data set pointed by an output path, and turning to the step q), wherein the specified threshold is 15;
u): carrying out characteristic value detection on the two pictures to respectively obtain two picture characteristic point sets, wherein the characteristic value detection is an SIFT (Scale innovative feature transform) characteristic value detector;
v): counting the number of the same feature points in the feature point sets of the two pictures, and performing matching and KNN by adopting LANN to obtain the number of the same feature points in the feature point sets, wherein as shown in FIG. 9, the horizontal axis represents the width of the image, and the vertical axis represents the height of the image, and the LANN is (Library for Approximate Nearest Neighbors) fast Approximate Nearest neighbor search;
w): calculating to obtain a threshold value of the number of the same characteristic points, judging whether the number of the same characteristic points exceeds the threshold value of the number of the characteristic points, if not, storing the picture after the modification time to a data set pointed by an output path, if so, not processing, and entering a step q) to compare the next picture again after the comparison is finished, wherein the threshold value of the number of the characteristic points is as follows: the ratio of the average of the sizes of the two pictures to the total number of pictures in the data set.
Fig. 10 shows the pictures in the data set before comparison and archiving, and fig. 11 shows the result of comparing and archiving the pictures of fig. 10.
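A minimal sketch of the matching in steps u) and v), assuming an OpenCV build (4.4 or later) that includes SIFT; the KD-tree FLANN parameters and the 0.7 Lowe ratio test used to decide that two feature points are "identical" are assumptions, since the patent does not specify them.

```python
import cv2

def count_matching_points(img_a, img_b):
    """Return the number of matching SIFT feature points of two pictures."""
    sift = cv2.SIFT_create()                     # u) SIFT feature detection
    _, des_a = sift.detectAndCompute(img_a, None)
    _, des_b = sift.detectAndCompute(img_b, None)
    if des_a is None or des_b is None:
        return 0
    flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5),  # KD-tree index
                                  dict(checks=50))
    matches = flann.knnMatch(des_a, des_b, k=2)  # v) FLANN + KNN matching
    good = 0
    for pair in matches:
        if len(pair) == 2 and pair[0].distance < 0.7 * pair[1].distance:
            good += 1                            # keep unambiguous matches
    return good
```

Per step w), this count would then be compared against the ratio of the average size of the two pictures to the total number of pictures in the data set.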
In step s), converting the two pictures into grayscale images of the same size comprises the following steps:
s1) Acquire the width, height and color channel information of the two pictures in turn;
s2) Acquire the RGB single-channel color values of the two pictures according to the channel information, and convert both pictures to grayscale using the grayscale conversion formula;
s3) Compute the width-height product of each picture, and resize the picture with the larger product to the size of the picture with the smaller product, as in the sketch below.
In step s), performing the sub-region conversion processing on the grayscale image and creating the grayscale mean matrix comprises the following steps:
s1) Acquire the width and height information of the picture;
s2) Divide the picture's width and height by the same constant to obtain the sub-region width CellWidth and sub-region height CellHeight, the constant being an integer equal to the number of sub-regions across the picture's width or height;
s3) Create a matrix whose number of rows and columns equals the number of sub-regions across the picture's width or height;
s4) Traverse the picture's width pixels, dividing the current pixel index by the sub-region width CellWidth to determine which sub-region in the width direction the current pixel belongs to; traverse the picture's height pixels, dividing the current pixel index by the sub-region height CellHeight to determine which sub-region in the height direction it belongs to; accumulate the current pixel value with the previously accumulated value of that sub-region, and store the running total at the matrix row-column position corresponding to the current pixel position;
s5) Divide each value in the matrix by the total number of pixels in a sub-region to obtain the average gray value, subtract that average from 255 to obtain its inverted value, and store the inverted average value back into the corresponding matrix position, as in the sketch below.
Step five: retrain the neural network model on the data set of non-similar pictures to obtain a high-precision neural network model. Following the method of step two, the data set without similar pictures is used as the training set, and network model training is performed again until the overall classification precision reaches 95%.
Step six: reading and classifying the operation process pictures by the high-precision neural network model;
the method comprises the steps of marking pictures of opening and closing of the hemostatic forceps as training data to identify the hemostatic forceps in the operation process, marking pictures of opening and closing of metal clips as training data to identify the metal clips in the operation process, marking pictures of opening and tightening of an electric burning metal ferrule as training data to identify the electric burning metal ferrule, marking pictures of non-falling and non-falling after the hemostatic titanium clips are closed as training data to identify the hemostatic titanium clips, and identifying and classifying results are shown in figure 12, wherein (I) the pictures are classified hemostatic forceps, (II) the pictures are classified electric burning metal ferrules, (III) the pictures are classified metal clips, and (IV) the pictures are classified hemostatic titanium clips.
Step seven: the neural network model identifies the video start time confirmed by a specific surgical instrument and starts recording the video.
As shown in fig. 13, the high-precision neural network model identifies the first picture of the metal ferrule of the electrosurgical resection ring during the operation and records its time of appearance;
as shown in fig. 14, the high-precision neural network model identifies the first picture of the metal clip opening and records the opening time of the metal clip.
The recorded appearance time of the electrosurgical resection ring's metal ferrule and the recorded opening time of the metal clip are then judged: if the high-precision neural network model has seen more than three images of the metal ferrule or of metal clip opening and no video is being recorded, video recording starts, taking the earlier recorded time as the time reference. A small sketch of this start rule follows.
Step eight: the neural network model identifies the video end time confirmed by a specific surgical instrument and ends the recording.
The high-precision neural network model identifies pictures in which the hemostatic titanium clip has not detached after closure, and records the time of the last such picture, as shown in fig. 15;
the high-precision neural network model identifies pictures in which the hemostatic titanium clip has detached after closure, and records the time of the last such picture, as shown in fig. 16.
If pictures of the undetached closed hemostatic titanium clip keep appearing, the time of the picture of the detached closed hemostatic titanium clip is taken as the end time;
if a picture of the detached closed hemostatic titanium clip appears, the time of that picture is taken as the final end time.
Step nine: clip and save the video.
The video is clipped using the recorded start time and end time, and saved to a default specified path for archiving.
The technical means disclosed in the solution of the present invention are not limited to those disclosed in the above embodiments, and also include technical solutions formed by any combination of the above technical features. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and such improvements and modifications are also considered to be within the protection scope of the present invention.

Claims (3)

1. A method for automatically capturing tissue feature pictures in the alimentary tract in batches, characterized in that it comprises the following steps:
a) Video reading and color channel format conversion: reading a video of the digestive-tract endoscopy procedure from the storage device, and converting the color channel format of the video from RGB to HSV;
b) Locating the target in the video and removing the video background: adjusting the ranges of the parameters H, S and V in the HSV color space to locate the video content, and adjusting H, S and V to remove all background except the target feature, the target feature being any of tissue organs, feces, and examination and surgical instruments within the alimentary tract;
c) Acquiring a target feature picture: acquiring a target feature picture according to the target feature;
d) Performing grayscale conversion and binarization on the target feature picture;
e) Performing contour detection and locating the target features: contour detection is carried out on the binary image using a Freeman chain code, and the position of the target feature in the picture, the contour range of the target feature, and the total count of target feature points are returned;
f) Calculating the proportion of the target features in the picture: mapping the target feature image in the binary image to a matrix, joining the matrix rows end to end into a vector, summing the vector values and dividing by 255 to obtain the number of white (feature-value) pixels, and computing the ratio of white pixels to black background pixels to obtain the size of the target feature in the image;
g) Judging frame by frame whether the target features in the video meet the interception conditions; if so, cropping the target feature picture and storing the result,
wherein judging whether a frame in the video meets the interception conditions comprises the following steps:
g1) Judging whether the total count of target feature points from step e) is larger than 5000; if so, going to step g2), otherwise moving directly to the next frame;
g2) Judging whether the width-to-height ratio of the target feature contour from step e) is less than 5 and greater than one fifth; if so, going to step g3), otherwise moving directly to the next frame;
g3) Judging whether the proportion of the target features from step f) in the whole picture is in the range of 2%-20%; if so, cropping the target features in the frame and storing them in a result set, otherwise moving directly to the next frame.
2. The method for automatically capturing tissue feature pictures in the digestive tract in batches according to claim 1, characterized in that in step c) the target feature picture is obtained as follows: a mask operation is performed on each pixel of the target feature with a mask; the target feature picture comprises a target feature area image and a non-target feature area image, the pixel values in the target feature area image remain unchanged, and the pixel values of the non-target feature area image are zero.
3. The method for automatically capturing tissue feature pictures in the digestive tract in batches according to claim 1, characterized in that in step d) a grayscale conversion formula is applied to the target feature picture, a binary image is obtained from the grayscale image through a binary threshold algorithm, and the binary image is denoised through morphological erosion and dilation operations; the grayscale image of the target feature picture is a single-channel grayscale image with values in the range 0-255, and the binary image is a single-channel image whose values are 0 or 255.
CN201910385767.3A 2019-05-09 2019-05-09 Method for automatically grabbing characteristic pictures of tissues or foreign matters in alimentary canal in batches Active CN110335240B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910385767.3A CN110335240B (en) 2019-05-09 2019-05-09 Method for automatically grabbing characteristic pictures of tissues or foreign matters in alimentary canal in batches

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910385767.3A CN110335240B (en) 2019-05-09 2019-05-09 Method for automatically grabbing characteristic pictures of tissues or foreign matters in alimentary canal in batches

Publications (2)

Publication Number Publication Date
CN110335240A CN110335240A (en) 2019-10-15
CN110335240B true CN110335240B (en) 2021-07-27

Family

ID=68140049

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910385767.3A Active CN110335240B (en) 2019-05-09 2019-05-09 Method for automatically grabbing characteristic pictures of tissues or foreign matters in alimentary canal in batches

Country Status (1)

Country Link
CN (1) CN110335240B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8565525B2 (en) * 2005-12-30 2013-10-22 Telecom Italia S.P.A. Edge comparison in segmentation of video sequences
US8488863B2 (en) * 2008-11-06 2013-07-16 Los Alamos National Security, Llc Combinational pixel-by-pixel and object-level classifying, segmenting, and agglomerating in performing quantitative image analysis that distinguishes between healthy non-cancerous and cancerous cell nuclei and delineates nuclear, cytoplasm, and stromal material objects from stained biological tissue materials
CN102831166B (en) * 2012-07-24 2015-05-27 武汉大千信息技术有限公司 Criminal investigation video preprocessing method based on color feature detection
CN108520226B (en) * 2018-04-03 2020-07-28 东北大学 Pedestrian re-identification method based on body decomposition and significance detection
CN109685084A (en) * 2019-03-07 2019-04-26 无锡众创未来科技应用有限公司 A kind of image processing apparatus based on the description of image object feature

Also Published As

Publication number Publication date
CN110335240A (en) 2019-10-15

Similar Documents

Publication Publication Date Title
CN110335241B (en) Method for automatically scoring intestinal tract preparation after enteroscopy
Jin et al. DUNet: A deformable network for retinal vessel segmentation
CN110120040B (en) Slice image processing method, slice image processing device, computer equipment and storage medium
Rahim et al. A survey on contemporary computer-aided tumor, polyp, and ulcer detection methods in wireless capsule endoscopy imaging
CN111985536B (en) Based on weak supervised learning gastroscopic pathology image Classification method
CN110889853B (en) Tumor segmentation method based on residual error-attention deep neural network
CN111524137B (en) Cell identification counting method and device based on image identification and computer equipment
Panchal et al. Plant diseases detection and classification using machine learning models
CN111667489A (en) Cancer hyperspectral image segmentation method and system based on double-branch attention deep learning
Siraj et al. Digital image classification for malaysian blooming flower
CN108830149B (en) Target bacterium detection method and terminal equipment
CN110309329A (en) The method of Weigh sensor and record alimentary canal tissue and foreign matter in endoscopy
CN115797352B (en) Tongue picture image processing system for traditional Chinese medicine health-care physique detection
CN114266786A (en) Gastric lesion segmentation method and system based on generation countermeasure network
CN108596176B (en) Method and device for identifying diatom types of extracted diatom areas
CN113450305B (en) Medical image processing method, system, equipment and readable storage medium
e Silva et al. Automatic measurement of pressure ulcers using support vector machines and grabcut
CN110334730B (en) Method for comparing and filing high-similarity pictures in artificial intelligence training data set
CN110334582B (en) Method for intelligently identifying and recording polyp removing video of endoscopic submucosal dissection
CN111862123B (en) Deep learning-based CT abdominal artery blood vessel hierarchical recognition method
CN110335240B (en) Method for automatically grabbing characteristic pictures of tissues or foreign matters in alimentary canal in batches
Khan et al. Segmentation of single and overlapping leaves by extracting appropriate contours
Lagergren et al. Region growing with convolutional neural networks for biomedical image segmentation
Bulut et al. Polyp Segmentation in Colonoscopy Images using U-Net and Cyclic Learning Rate
Lian Rotation invariant color texture classification using multiple sub-DLBPs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 450000 rooms 109 and 113, 1st floor, building 4, No. 220 Huashan Road, Zhongyuan District, Zhengzhou City, Henan Province

Patentee after: Henan Xuanwei Digital Medical Technology Co.,Ltd.

Address before: 450007 1st floor, building 4, 220 Huashan Road, Zhongyuan District, Zhengzhou City, Henan Province

Patentee before: Henan Xuan Yongtang Medical Information Technology Co.,Ltd.