WO2022148216A1 - Capsule endoscope image recognition method based on deep learning, device and medium - Google Patents
Capsule endoscope image recognition method based on deep learning, device and medium
- Publication number
- WO2022148216A1 (PCT/CN2021/137938, CN2021137938W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- rgb
- optical flow
- image
- images
- neural network
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 61
- 239000002775 capsule Substances 0.000 title claims abstract description 34
- 238000013135 deep learning Methods 0.000 title claims abstract description 23
- 238000013527 convolutional neural network Methods 0.000 claims abstract description 43
- 230000003287 optical effect Effects 0.000 claims description 54
- 230000003902 lesion Effects 0.000 claims description 15
- 238000012545 processing Methods 0.000 claims description 14
- 230000011218 segmentation Effects 0.000 claims description 14
- 230000008569 process Effects 0.000 claims description 13
- 238000004364 calculation method Methods 0.000 claims description 12
- 230000004913 activation Effects 0.000 claims description 9
- 238000004590 computer program Methods 0.000 claims description 9
- 238000010606 normalization Methods 0.000 claims description 9
- 238000001839 endoscopy Methods 0.000 claims description 7
- 238000012549 training Methods 0.000 claims description 7
- 238000011176 pooling Methods 0.000 claims description 6
- 238000012795 verification Methods 0.000 claims description 5
- 238000011478 gradient descent method Methods 0.000 claims description 3
- 230000006872 improvement Effects 0.000 description 7
- 238000010586 diagram Methods 0.000 description 4
- 230000002123 temporal effect Effects 0.000 description 4
- 230000006870 function Effects 0.000 description 3
- 238000010276 construction Methods 0.000 description 2
- 210000001035 gastrointestinal tract Anatomy 0.000 description 2
- 230000033001 locomotion Effects 0.000 description 2
- 208000025865 Ulcer Diseases 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 230000015572 biosynthetic process Effects 0.000 description 1
- 230000000740 bleeding effect Effects 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000013136 deep learning model Methods 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000018109 developmental process Effects 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 230000003628 erosive effect Effects 0.000 description 1
- 238000003384 imaging method Methods 0.000 description 1
- 238000007689 inspection Methods 0.000 description 1
- 230000004044 response Effects 0.000 description 1
- 210000000813 small intestine Anatomy 0.000 description 1
- 210000002784 stomach Anatomy 0.000 description 1
- 231100000397 ulcer Toxicity 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/254—Fusion techniques of classification results, e.g. of results related to same input data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/20—Image enhancement or restoration using local operators
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0012—Biomedical image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/771—Feature selection, e.g. selecting representative features from a multi-dimensional feature space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/809—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of classification results, e.g. where the classifiers operate on the same input data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/84—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using probabilistic graphical models from image or video features, e.g. Markov models or Bayesian networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/49—Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10068—Endoscopic image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/03—Recognition of patterns in medical or anatomical images
- G06V2201/031—Recognition of patterns in medical or anatomical images of internal organs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/03—Recognition of patterns in medical or anatomical images
- G06V2201/032—Recognition of patterns in medical or anatomical images of protuberances, polyps nodules, etc.
Definitions
- the present invention relates to the field of medical equipment imaging, in particular to a deep learning-based capsule endoscope image recognition method, an electronic device and a readable storage medium.
- A capsule endoscope is a medical device that integrates core components such as a camera and a wireless transmission antenna; it captures images inside the digestive tract and transmits them outside the body synchronously, so that medical examinations can be performed on the acquired image data. A capsule endoscope collects tens of thousands of images during an examination, and this large volume of image data makes image reading difficult and time-consuming. With the development of technology, the use of image processing and computer vision techniques for lesion identification has gained extensive attention.
- the Chinese patent application with publication number CN103984957A discloses an automatic early-warning system for suspicious lesion regions in capsule endoscope images; texture features of flat lesions are detected, and a classification and early-warning module then performs the classification, realizing detection and early warning of flat lesions of the small intestine.
- the Chinese patent application with publication number CN111462082A discloses a lesion image recognition device, method, equipment and readable storage medium, which utilizes a trained 2D target deep learning model to perform lesion recognition on a single image.
- the purpose of the present invention is to provide a capsule endoscope image recognition method, device and medium based on deep learning.
- an embodiment of the present invention provides a deep learning-based image recognition method for a capsule endoscope, the method comprising: collecting N original images in a time-generated sequence through the capsule endoscope;
- the N original images are divided into M groups of original image sequences of the same size
- Each of the RGB image sequences is composed of image data in RGB format, and each of the optical flow image sequences is composed of image data formed by calculating optical flow fields of adjacent RGB images;
- the RGB image sequence and the optical flow image sequence are respectively input into a 3D convolutional neural network model to output a recognition result;
- the recognition result is a probability value of the occurrence of preset parameters;
- the 3D convolutional neural network model includes: RGB branch and optical flow branch;
- the RGB image sequence and the optical flow image sequence are respectively input into the 3D convolutional neural network model to output the recognition result, including: inputting the RGB image sequence into the RGB branch for calculation to output a first classification probability p1; inputting the optical flow image sequence into the optical flow branch for calculation to output a second classification probability p2; and fusing the first classification probability and the second classification probability to form the recognition result p, where p = w1*p1 + w2*p2, w1 = T1/(T1+T2), w2 = T2/(T1+T2);
- T1 and T2 respectively represent the recognition accuracy of the validation set on the RGB branch and on the optical flow branch during construction of the 3D convolutional neural network model.
- a sliding window segmentation method is used to segment N original images into M groups of original image sequences of the same size, including:
- according to the preset window size K and the preset sliding step S, the N images are sequentially divided into M groups of original image sequences, where M = ⌈(N-K)/S⌉ + 1;
- the value range of the preset window size K is 2 ⁇ K ⁇ 1000, and the value range of the preset sliding step S is 1 ⁇ S ⁇ K.
- the training method of the 3D convolutional neural network model includes:
- the 2D convolution kernel parameters of size N*N in the pre-trained 2D recognition model are copied N times; the 2D recognition model is obtained by training with lesion-labeled images, its input is a single-frame image, and it can only be used for single-frame image recognition;
- the 3D convolutional neural network model after parameter initialization is trained by using the stochastic gradient descent method, the parameters of the model are iteratively updated until the iterative stop condition is satisfied, and the 3D convolutional neural network model for outputting the recognition result is formed.
- arranged in the order of the processing flow, the 3D convolutional neural network model includes: a 7*7*7 3D convolutional layer, a 3*3*3 3D pooling layer, at least one collaborative spatiotemporal feature structure, a 3D pooling layer, and a fully connected layer;
- the number of the cooperative spatiotemporal feature structures is P, P ⁇ (4, 16);
- arranged in the order of the processing flow from input to output, the collaborative spatiotemporal feature structure includes: a first collaborative spatiotemporal convolution layer, a first normalization layer, and an activation layer; as well as a shortcut connection from the input to the output of the collaborative spatiotemporal feature structure, executed in parallel with the first collaborative spatiotemporal convolution layer, the first normalization layer, and the activation layer.
- arranged in the order of the processing flow from input to output, the collaborative spatiotemporal feature structure further includes: a second collaborative spatiotemporal convolution layer and a second normalization layer after the activation layer.
- the process of processing data by the first collaborative spatiotemporal convolution layer includes: decomposing its input feature map into three views, denoted H-W, T-H and T-W, with the output features of the three views denoted x_hw, x_tw and x_th, respectively;
- x is the input data of size (t×h×w)×c1, where t×h×w is the size of the input feature map and c1 is the number of channels of the input feature map, and w represents the convolution filter kernel shared by the three views;
- a weighted summation of the three sets of view features gives the output y of the first collaborative spatiotemporal convolution layer: y = a_hw·x_hw + a_tw·x_tw + a_th·x_th (see the reconstructed formula sketch below);
- [a_hw, a_tw, a_th] is a coefficient matrix of size c2×3 that is normalized with softmax, where c2 is the number of output channels and the number 3 represents the three views.
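The weighted-summation formula itself appears only as an image in the published text; from the surrounding definitions it can be reconstructed as the sketch below, where the per-view convolutions of x with the shared kernel w are an assumption consistent with the shared-kernel description given later in the document:

```latex
x_{hw} = w \ast_{H\text{-}W} x,\qquad
x_{tw} = w \ast_{T\text{-}W} x,\qquad
x_{th} = w \ast_{T\text{-}H} x,
```
```latex
y = a_{hw}\,x_{hw} + a_{tw}\,x_{tw} + a_{th}\,x_{th},\qquad
[a_{hw},\, a_{tw},\, a_{th}] = \mathrm{softmax}\big([\alpha_{hw},\, \alpha_{tw},\, \alpha_{th}]\big) \in \mathbb{R}^{c_2 \times 3}.
```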
- an embodiment of the present invention provides an electronic device, including a memory and a processor, the memory storing a computer program that can be executed on the processor; when the processor executes the program, the steps in the above-mentioned deep learning-based capsule endoscope image recognition method are implemented.
- an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps in the above-mentioned deep learning-based capsule endoscope image recognition method are implemented.
- the beneficial effects of the present invention are: the deep learning-based capsule endoscope image recognition method, device and medium of the present invention form image sequences of a specific format from multiple continuously captured frames, then use a 3D convolutional neural network model to perform multi-channel recognition on the multiple frames, and combine the recognition probabilities of each channel to output the recognition result, thereby improving the image recognition accuracy.
- FIG. 1 is a schematic flowchart of a method for recognizing images of capsule endoscopes based on deep learning according to the first embodiment of the present invention
- FIG. 2 is a schematic diagram of sliding window segmentation provided by a specific example of the present invention.
- FIG. 3 is a schematic diagram of using the convolution kernel parameters of the trained 2D recognition model to generate the initialization parameters of the convolution kernels of the 3D convolutional neural network model, provided by a specific example of the present invention
- FIG. 4 is a schematic structural diagram of a 3D convolutional neural network model provided by the present invention.
- FIG. 5 is a schematic structural diagram of a collaborative spatiotemporal feature structure provided by the present invention.
- FIG. 6 is a schematic flow chart of data processing by coordinating spatiotemporal convolutional layers in a specific example of the present invention.
- a first embodiment of the present invention provides a deep learning-based capsule endoscope image recognition method, the method comprising:
- Each of the RGB image sequences is composed of image data in RGB format, and each of the optical flow image sequences is composed of image data formed by calculating optical flow fields of adjacent RGB images;
- in step S1, during the operation of the capsule endoscope, images are continuously captured by the camera provided on the endoscope, and are collected and stored synchronously or asynchronously to form the original images;
- the sliding window segmentation method is used to divide the N original images into M groups of original image sequences of the same size, including: numbering the N original images according to the time generation sequence, which are 1, 2, ... N in sequence;
- according to the preset window size K and the preset sliding step S, the N images are sequentially divided into M groups of original image sequences, where M = ⌈(N-K)/S⌉ + 1;
- the first group of original image sequences after segmentation consists of original images numbered 1, 2,...,K
- the second group of original image sequences consists of original images numbered S+1, S+2, ..., S+K;
- the last group of original image sequences consists of original images numbered N-K+1, N-K+2, ..., N; the N original images are thus divided into M = ⌈(N-K)/S⌉ + 1 groups of original image sequences, where the notation ⌈·⌉ in the formula indicates rounding up.
- the value range of K is 2 ⁇ K ⁇ 1000
- the value range of S is 1 ⁇ S ⁇ K.
- when N is not divisible by K, the group whose number of original images is not K is set as the first group or the last group;
- in the following example, the number N of original images selected for calculation is divisible by K, which will not be further described here.
- for example, when N = 10000, K = 10 and S = 5, the first group of original image sequences consists of original images 1, 2, ..., 10,
- the second set of original image sequences consists of original images 6, 7, ..., 15, until the last set of original image sequences consists of original images 9991, 9992, ..., 10000, which are divided into 1999 original image sequences.
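The grouping described above can be illustrated with a short Python sketch (a hypothetical helper, not part of the patent; it pins the leftover shorter window to the end, one of the two placements permitted above):

```python
def sliding_window_groups(n, k, s):
    """Split frame indices 1..n into overlapping groups of size k with step s.

    A minimal sketch of the sliding-window segmentation described above; when
    (n - k) is not a multiple of s, a final full window ending at frame n is
    appended so the last group always covers the last frames.
    """
    assert 2 <= k <= 1000 and 1 <= s < k <= n
    starts = list(range(1, n - k + 2, s))
    if starts[-1] != n - k + 1:           # leftover frames: add a last window ending at n
        starts.append(n - k + 1)
    return [list(range(st, st + k)) for st in starts]

groups = sliding_window_groups(10000, 10, 5)
print(len(groups))                  # 1999 = ceil((10000 - 10) / 5) + 1
print(groups[0], groups[1][:3], groups[-1][-3:])
```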
- analyzing N original images or analyzing M groups of original image sequences forms M groups of RGB image sequences, and each of the RGB image sequences is composed of image data in RGB format. Specifically, each original image in the original image sequence is converted into an image in RGB format, so that each original image sequence forms a corresponding RGB image sequence. It should be noted here that N original images can also be converted to RGB format first, and then M groups of RGB image sequences can be formed by the same sliding window segmentation method as the original image sequence. The RGB image sequences formed by the above two methods are the same.
- if the original image is already an image in RGB format, it does not need to be converted again, and the original image sequence is itself an RGB image sequence, which will not be described further here.
- N original images or M groups of RGB image sequences are analyzed to form M groups of optical flow image sequences.
- specifically, the original images can be directly analyzed to obtain optical flow images, and the optical flow images are then grouped by the same sliding window segmentation method as the original image sequences to form M groups of optical flow image sequences; alternatively, each original image sequence can be analyzed to directly form the corresponding optical flow image sequence.
- in the latter case, the original image sequence is first converted into an RGB image sequence, and the optical flow image data is then obtained by calculating the optical flow field of adjacent RGB images; when the original images are known, obtaining the corresponding RGB images and optical flow images is prior art and is therefore not described in detail in this patent.
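Since the patent treats the optical flow calculation between adjacent RGB images as prior art, the following is only an illustrative sketch using OpenCV's dense Farneback flow; the helper name and parameter values are assumptions for the example, not taken from the patent:

```python
import cv2
import numpy as np

def optical_flow_sequence(rgb_frames):
    """Compute dense optical flow between each pair of adjacent RGB frames.

    rgb_frames: list of HxWx3 uint8 arrays in RGB order.
    Returns a list of HxWx2 float32 arrays (dx, dy per pixel), one per adjacent pair.
    """
    grays = [cv2.cvtColor(f, cv2.COLOR_RGB2GRAY) for f in rgb_frames]
    flows = []
    for prev, nxt in zip(grays[:-1], grays[1:]):
        # Positional args: flow=None, pyr_scale=0.5, levels=3, winsize=15,
        # iterations=3, poly_n=5, poly_sigma=1.2, flags=0 (typical illustrative values).
        flow = cv2.calcOpticalFlowFarneback(prev, nxt, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        flows.append(flow.astype(np.float32))
    return flows
```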
- the 3D convolutional neural network model includes an RGB branch and an optical flow branch; the RGB image sequence is input into the RGB branch for calculation to output a first classification probability p1, the optical flow image sequence is input into the optical flow branch for calculation to output a second classification probability p2, and the two classification probabilities are fused to form the recognition result p, where p = w1*p1 + w2*p2, w1 = T1/(T1+T2), w2 = T2/(T1+T2);
- T1 and T2 respectively represent the recognition accuracy of the validation set on the RGB branch and on the optical flow branch during construction of the 3D convolutional neural network model.
- the recognition accuracy is the probability of successful recognition.
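A minimal sketch of the accuracy-weighted fusion described above, assuming p1 and p2 are the classification probabilities output by the two branches and T1 and T2 are their validation accuracies (the numbers in the usage example are illustrative only):

```python
import numpy as np

def fuse_branch_probabilities(p1, p2, t1, t2):
    """Fuse RGB-branch and optical-flow-branch probabilities.

    Weights are proportional to each branch's validation accuracy:
    w1 = T1 / (T1 + T2), w2 = T2 / (T1 + T2), p = w1 * p1 + w2 * p2.
    """
    w1 = t1 / (t1 + t2)
    w2 = t2 / (t1 + t2)
    return w1 * np.asarray(p1) + w2 * np.asarray(p2)

# Example: RGB branch at 92% validation accuracy, flow branch at 88% (illustrative numbers).
p = fuse_branch_probabilities([0.15, 0.85], [0.30, 0.70], 0.92, 0.88)
print(p)   # result is weighted towards the more accurate RGB branch
```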
- the displayed recognition result is the probability that the current image sequence contains lesions, such as bleeding, ulcers, polyps, erosions, etc.
- the RGB branch models local spatiotemporal information, which describes well the contours of the captured content; the optical flow branch models the changes between adjacent frames, which captures well the motion of the capsule endoscope;
- the resulting dynamic change of the captured content helps to recover global spatial information; therefore, the same image sequence is transformed into two kinds of data, which are recognized and output by the two constructed branches respectively, and the results of the two branches are then fused, which improves the recognition effect.
- the construction methods of the RGB branch and the optical flow branch are the same.
- here, the term 3D convolutional neural network model is used to refer to the two branches collectively;
- by extending the convolution kernel from two dimensions to three dimensions, the 3D convolutional neural network model can encode spatial and temporal information at the same time; to identify lesions across multiple frames, the information captured from different angles in successive adjacent images is used comprehensively, and compared with a 2D convolutional neural network model that recognizes single frames, more information can be used, thereby improving the recognition accuracy.
- the training methods of the 3D convolutional neural network model include:
- the 3*3 convolution kernel of the 2D recognition model is copied 3 times to expand its dimension; the data in each dimension is then divided by 3, forming a 3*3*3 3D convolution kernel whose parameters serve as the initialization parameters of the 3D convolution kernels in the 3D convolutional neural network model;
- the training method of the 3D convolutional neural network model further includes: M4, training the parameter-initialized 3D convolutional neural network model with the stochastic gradient descent method, and iteratively updating the model parameters until the iteration stop condition is met, to form the 3D convolutional neural network model used to output the recognition result.
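The kernel-inflation initialization described above (copy the 3*3 2D kernel three times along a new dimension and divide by 3) can be sketched as follows; the PyTorch layer sizes and names are illustrative assumptions, not the patented implementation:

```python
import torch

def inflate_2d_kernel(conv2d_weight):
    """Inflate a pre-trained 2D conv kernel into a 3D kernel for initialization.

    conv2d_weight: tensor of shape (out_ch, in_ch, 3, 3) from a trained 2D model.
    The kernel is copied 3 times along a new temporal dimension and divided by 3,
    so a constant-in-time input produces the same response as the 2D kernel.
    """
    k = conv2d_weight.unsqueeze(2)          # (out_ch, in_ch, 1, 3, 3)
    return k.repeat(1, 1, 3, 1, 1) / 3.0    # (out_ch, in_ch, 3, 3, 3)

# Illustrative use: initialize a 3D conv layer from a (hypothetical) trained 2D layer.
conv2d = torch.nn.Conv2d(64, 128, kernel_size=3, padding=1)
conv3d = torch.nn.Conv3d(64, 128, kernel_size=3, padding=1)
with torch.no_grad():
    conv3d.weight.copy_(inflate_2d_kernel(conv2d.weight))
```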
- arranged in the order of the processing flow, the 3D convolutional neural network model includes: a 7*7*7 3D convolutional layer, a 3*3*3 3D pooling layer, at least one collaborative spatiotemporal feature structure, a 3D pooling layer, and a fully connected layer.
- arranged in the order of the processing flow from input to output, the collaborative spatiotemporal feature structure includes: a first collaborative spatiotemporal convolution layer, a first normalization layer, and an activation layer; as well as a shortcut connection, executed in parallel with the first collaborative spatiotemporal convolution layer, the first normalization layer and the activation layer, from the input to the output of the collaborative spatiotemporal feature structure.
- the collaborative spatiotemporal feature structure further includes: a second collaborative spatiotemporal convolution layer and a second normalization layer after the activation layer.
- the processing flow of the first collaborative spatiotemporal convolution layer and the second collaborative spatiotemporal convolution layer is the same, and both are referred to here as the collaborative spatiotemporal convolution layer;
- the process by which the collaborative spatiotemporal convolution layer processes data includes: decomposing its input feature map into three views, denoted H-W, T-H and T-W, with the output features of the three views denoted x_hw, x_tw and x_th, respectively;
- x is the input data of size (t×h×w)×c1, where t×h×w is the size of the input feature map and c1 is the number of channels of the input feature map, and w represents the convolution filter kernel shared by the three views;
- a weighted summation of the three sets of view features gives the output y of the collaborative spatiotemporal convolution layer: y = a_hw·x_hw + a_tw·x_tw + a_th·x_th, where [a_hw, a_tw, a_th] is a coefficient matrix of size c2×3 normalized with softmax, c2 is the number of output channels, and the number 3 represents the three views.
- the collaborative spatiotemporal convolution layer convolves three orthogonal views of the input data, learns spatial appearance and temporal motion information respectively, and collaboratively learns spatial and temporal features by sharing convolution kernels of different views.
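As a concrete illustration of such a layer, the following sketch applies a shared 3×3 kernel bank to the three orthogonal views of a (batch, channels, T, H, W) tensor and mixes the three responses with per-output-channel softmax weights; the class name, kernel size, padding and initialization are illustrative assumptions, not the patent's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CollaborativeSpatioTemporalConv(nn.Module):
    """Sketch of a collaborative spatiotemporal convolution layer.

    The same 3x3 kernel bank is applied to the three orthogonal views of an
    (N, C, T, H, W) tensor -- H-W, T-W and T-H -- by materializing the 2D kernel
    as three 3D kernels (1x3x3, 3x1x3, 3x3x1); the three view responses are then
    mixed per output channel with softmax-normalized coefficients.
    """

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_channels, in_channels, 3, 3) * 0.01)
        self.coeff = nn.Parameter(torch.zeros(out_channels, 3))   # [a_hw, a_tw, a_th] before softmax

    def forward(self, x):                                          # x: (N, C, T, H, W)
        w = self.weight
        y_hw = F.conv3d(x, w.unsqueeze(2), padding=(0, 1, 1))      # 1x3x3 kernel, H-W view
        y_tw = F.conv3d(x, w.unsqueeze(3), padding=(1, 0, 1))      # 3x1x3 kernel, T-W view
        y_th = F.conv3d(x, w.unsqueeze(4), padding=(1, 1, 0))      # 3x3x1 kernel, T-H view
        a = F.softmax(self.coeff, dim=1)                           # (out_channels, 3)
        a = a.view(1, -1, 3, 1, 1, 1)                              # broadcast over N, T, H, W
        return a[:, :, 0] * y_hw + a[:, :, 1] * y_tw + a[:, :, 2] * y_th

# Illustrative use on a dummy clip: 8 frames of 32x32 feature maps with 16 channels.
layer = CollaborativeSpatioTemporalConv(16, 32)
out = layer(torch.randn(2, 16, 8, 32, 32))
print(out.shape)   # torch.Size([2, 32, 8, 32, 32])
```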
- an embodiment of the present invention provides an electronic device, including a memory and a processor, the memory storing a computer program that can be executed on the processor; when the processor executes the program, the steps in the above deep learning-based capsule endoscope image recognition method are implemented.
- an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps in the above deep learning-based capsule endoscope image recognition method are implemented.
- the deep learning-based capsule endoscope image recognition method, device and medium of the present invention form image sequences of a specific format from multiple continuously captured frames, then use a 3D convolutional neural network model to perform multi-channel recognition on the multiple frames, and combine the recognition probabilities of each channel to output the recognition result, thereby improving the image recognition accuracy.
- the modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed over multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Mathematical Physics (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Probability & Statistics with Applications (AREA)
- Quality & Reliability (AREA)
- Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
- Radiology & Medical Imaging (AREA)
- Image Analysis (AREA)
- Endoscopes (AREA)
- Image Processing (AREA)
Claims (10)
- A capsule endoscope image recognition method based on deep learning, characterized in that the method comprises: collecting N original images through a capsule endoscope in the order of their generation time; segmenting the N original images into M groups of original image sequences of the same size using a sliding window segmentation method; analyzing the N original images or analyzing M groups of RGB image sequences to form M groups of optical flow image sequences, wherein each of the RGB image sequences is composed of image data in RGB format, and each of the optical flow image sequences is composed of image data formed by calculating the optical flow fields of adjacent RGB images; and inputting the RGB image sequence and the optical flow image sequence respectively into a 3D convolutional neural network model to output a recognition result, the recognition result being a probability value of the occurrence of a preset parameter; the 3D convolutional neural network model comprises an RGB branch and an optical flow branch; wherein inputting the RGB image sequence and the optical flow image sequence respectively into the 3D convolutional neural network model to output the recognition result comprises: inputting the RGB image sequence into the RGB branch for calculation to output a first classification probability p1; inputting the optical flow image sequence into the optical flow branch for calculation to output a second classification probability p2; and fusing the first classification probability and the second classification probability to form the recognition result p; p = w1*p1 + w2*p2; w1 = T1/(T1+T2), w2 = T2/(T1+T2); where T1 and T2 respectively represent the recognition accuracy of the validation set on the RGB branch and on the optical flow branch during construction of the 3D convolutional neural network model.
- The deep learning-based capsule endoscope image recognition method according to claim 2, characterized in that the value range of the preset window size K is 2 ≤ K ≤ 1000, and the value range of the preset sliding step S is 1 ≤ S < K.
- The deep learning-based capsule endoscope image recognition method according to claim 1, characterized in that the training method of the 3D convolutional neural network model comprises: copying the 2D convolution kernel parameters of size N*N in a pre-trained 2D recognition model N times, wherein the 2D recognition model is obtained by training with lesion-labeled images, its input is a single-frame image, and it can only recognize single-frame images; dividing each copied kernel parameter by N, so that the kernel parameter at each position becomes 1/3 of its original value; recombining the new kernel parameters into convolution kernel parameters of size N*N*N to constitute the initialization parameters of the 3D convolution kernels in the 3D convolutional neural network model; and training the parameter-initialized 3D convolutional neural network model with the stochastic gradient descent method, iteratively updating the model parameters until the iteration stop condition is met, to form the 3D convolutional neural network model used to output the recognition result.
- The deep learning-based capsule endoscope image recognition method according to claim 1, characterized in that, arranged in the order of the processing flow, the 3D convolutional neural network model comprises: a 7*7*7 3D convolutional layer, a 3*3*3 3D pooling layer, at least one collaborative spatiotemporal feature structure, a 3D pooling layer, and a fully connected layer.
- The deep learning-based capsule endoscope image recognition method according to claim 5, characterized in that the number of the collaborative spatiotemporal feature structures is P, P ∈ (4, 16); arranged in the order of the processing flow from input to output, the collaborative spatiotemporal feature structure comprises: a first collaborative spatiotemporal convolution layer, a first normalization layer, and an activation layer; as well as a shortcut connection, executed in parallel with the first collaborative spatiotemporal convolution layer, the first normalization layer and the activation layer, from the input to the output of the collaborative spatiotemporal feature structure.
- The deep learning-based capsule endoscope image recognition method according to claim 6, characterized in that, arranged in the order of the processing flow from input to output, the collaborative spatiotemporal feature structure further comprises: a second collaborative spatiotemporal convolution layer and a second normalization layer after the activation layer.
- The deep learning-based capsule endoscope image recognition method according to claim 6, characterized in that the process of processing data by the first collaborative spatiotemporal convolution layer comprises: decomposing its input feature map into three views, denoted H-W, T-H and T-W respectively, with the output features of the three views denoted x_hw, x_tw and x_th respectively; and performing a weighted summation of the three sets of input data to obtain the output y of the first collaborative spatiotemporal convolution layer: y = a_hw·x_hw + a_tw·x_tw + a_th·x_th; where [a_hw, a_tw, a_th] is a coefficient of size c2×3, [a_hw, a_tw, a_th] is normalized with softmax, c2 is the number of output channels, and the number 3 represents the three views.
- An electronic device, comprising a memory and a processor, the memory storing a computer program executable on the processor, characterized in that, when the processor executes the program, the steps in a deep learning-based capsule endoscope image recognition method are implemented, the method comprising: collecting N original images through a capsule endoscope in the order of their generation time; segmenting the N original images into M groups of original image sequences of the same size using a sliding window segmentation method; analyzing the N original images or analyzing M groups of RGB image sequences to form M groups of optical flow image sequences, wherein each of the RGB image sequences is composed of image data in RGB format, and each of the optical flow image sequences is composed of image data formed by calculating the optical flow fields of adjacent RGB images; and inputting the RGB image sequence and the optical flow image sequence respectively into a 3D convolutional neural network model to output a recognition result, the recognition result being a probability value of the occurrence of a preset parameter; the 3D convolutional neural network model comprises an RGB branch and an optical flow branch; wherein inputting the RGB image sequence and the optical flow image sequence respectively into the 3D convolutional neural network model to output the recognition result comprises: inputting the RGB image sequence into the RGB branch for calculation to output a first classification probability p1; inputting the optical flow image sequence into the optical flow branch for calculation to output a second classification probability p2; and fusing the first classification probability and the second classification probability to form the recognition result p; p = w1*p1 + w2*p2; w1 = T1/(T1+T2), w2 = T2/(T1+T2); where T1 and T2 respectively represent the recognition accuracy of the validation set on the RGB branch and on the optical flow branch during construction of the 3D convolutional neural network model.
- A computer-readable storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the steps in a deep learning-based capsule endoscope image recognition method are implemented, the method comprising: collecting N original images through a capsule endoscope in the order of their generation time; segmenting the N original images into M groups of original image sequences of the same size using a sliding window segmentation method; analyzing the N original images or analyzing M groups of RGB image sequences to form M groups of optical flow image sequences, wherein each of the RGB image sequences is composed of image data in RGB format, and each of the optical flow image sequences is composed of image data formed by calculating the optical flow fields of adjacent RGB images; and inputting the RGB image sequence and the optical flow image sequence respectively into a 3D convolutional neural network model to output a recognition result, the recognition result being a probability value of the occurrence of a preset parameter; the 3D convolutional neural network model comprises an RGB branch and an optical flow branch; wherein inputting the RGB image sequence and the optical flow image sequence respectively into the 3D convolutional neural network model to output the recognition result comprises: inputting the RGB image sequence into the RGB branch for calculation to output a first classification probability p1; inputting the optical flow image sequence into the optical flow branch for calculation to output a second classification probability p2; and fusing the first classification probability and the second classification probability to form the recognition result p; p = w1*p1 + w2*p2; w1 = T1/(T1+T2), w2 = T2/(T1+T2); where T1 and T2 respectively represent the recognition accuracy of the validation set on the RGB branch and on the optical flow branch during construction of the 3D convolutional neural network model.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP21917257.4A EP4276684A4 (en) | 2021-01-06 | 2021-12-14 | CAPSULE ENDOSCOPE IMAGE RECOGNITION METHOD BASED ON DEEP LEARNING, DEVICE AND MEDIUM |
KR1020237022485A KR20230113386A (ko) | 2021-01-06 | 2021-12-14 | 딥러닝 기반의 캡슐 내시경 영상 식별 방법, 기기 및매체 |
US18/260,528 US20240070858A1 (en) | 2021-01-06 | 2021-12-14 | Capsule endoscope image recognition method based on deep learning, and device and medium |
JP2023540947A JP7507318B2 (ja) | 2021-01-06 | 2021-12-14 | 深層学習に基づくカプセル内視鏡画像認識方法、機器及び媒体 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110010379.4 | 2021-01-06 | ||
CN202110010379.4A CN112348125B (zh) | 2021-01-06 | 2021-01-06 | 基于深度学习的胶囊内窥镜影像识别方法、设备及介质 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022148216A1 true WO2022148216A1 (zh) | 2022-07-14 |
Family
ID=74427399
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/137938 WO2022148216A1 (zh) | 2021-01-06 | 2021-12-14 | 基于深度学习的胶囊内窥镜影像识别方法、设备及介质 |
Country Status (6)
Country | Link |
---|---|
US (1) | US20240070858A1 (zh) |
EP (1) | EP4276684A4 (zh) |
JP (1) | JP7507318B2 (zh) |
KR (1) | KR20230113386A (zh) |
CN (1) | CN112348125B (zh) |
WO (1) | WO2022148216A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116309604A (zh) * | 2023-05-24 | 2023-06-23 | 中山大学肿瘤防治中心(中山大学附属肿瘤医院、中山大学肿瘤研究所) | 动态分析时序mr图像的方法、系统、设备和存储介质 |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112348125B (zh) * | 2021-01-06 | 2021-04-02 | 安翰科技(武汉)股份有限公司 | 基于深度学习的胶囊内窥镜影像识别方法、设备及介质 |
CN113159238B (zh) * | 2021-06-23 | 2021-10-26 | 安翰科技(武汉)股份有限公司 | 内窥镜影像识别方法、电子设备及存储介质 |
CN113591961A (zh) * | 2021-07-22 | 2021-11-02 | 深圳市永吉星光电有限公司 | 一种基于神经网络的微创医用摄像头图像识别方法 |
CN113591761B (zh) * | 2021-08-09 | 2023-06-06 | 成都华栖云科技有限公司 | 一种视频镜头语言识别方法 |
CN113487605B (zh) * | 2021-09-03 | 2021-11-19 | 北京字节跳动网络技术有限公司 | 用于内窥镜的组织腔体定位方法、装置、介质及设备 |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103984957A (zh) | 2014-05-04 | 2014-08-13 | 中国科学院深圳先进技术研究院 | 胶囊内窥镜图像可疑病变区域自动预警系统 |
CN109886358A (zh) * | 2019-03-21 | 2019-06-14 | 上海理工大学 | 基于多时空信息融合卷积神经网络的人体行为识别方法 |
CN110222574A (zh) * | 2019-05-07 | 2019-09-10 | 杭州智尚云科信息技术有限公司 | 基于结构化双流卷积神经网络的生产操作行为识别方法、装置、设备、系统及存储介质 |
CN110705463A (zh) * | 2019-09-29 | 2020-01-17 | 山东大学 | 基于多模态双流3d网络的视频人体行为识别方法及系统 |
US20200210708A1 (en) * | 2019-01-02 | 2020-07-02 | Boe Technology Group Co., Ltd. | Method and device for video classification |
CN111383214A (zh) * | 2020-03-10 | 2020-07-07 | 苏州慧维智能医疗科技有限公司 | 实时内窥镜肠镜息肉检测系统 |
CN111462082A (zh) | 2020-03-31 | 2020-07-28 | 重庆金山医疗技术研究院有限公司 | 一种病灶图片识别装置、方法、设备及可读存储介质 |
CN112348125A (zh) * | 2021-01-06 | 2021-02-09 | 安翰科技(武汉)股份有限公司 | 基于深度学习的胶囊内窥镜影像识别方法、设备及介质 |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5191240B2 (ja) * | 2008-01-09 | 2013-05-08 | オリンパス株式会社 | シーン変化検出装置およびシーン変化検出プログラム |
JP5281826B2 (ja) * | 2008-06-05 | 2013-09-04 | オリンパス株式会社 | 画像処理装置、画像処理プログラムおよび画像処理方法 |
CN108292366B (zh) * | 2015-09-10 | 2022-03-18 | 美基蒂克艾尔有限公司 | 在内窥镜手术中检测可疑组织区域的系统和方法 |
US10572996B2 (en) * | 2016-06-28 | 2020-02-25 | Contextvision Ab | Method and system for detecting pathological anomalies in a digital pathology image and method for annotating a tissue slide |
CN109934276B (zh) * | 2019-03-05 | 2020-11-17 | 安翰科技(武汉)股份有限公司 | 基于迁移学习的胶囊内窥镜图像分类系统及方法 |
CN111950444A (zh) * | 2020-08-10 | 2020-11-17 | 北京师范大学珠海分校 | 一种基于时空特征融合深度学习网络的视频行为识别方法 |
-
2021
- 2021-01-06 CN CN202110010379.4A patent/CN112348125B/zh active Active
- 2021-12-14 WO PCT/CN2021/137938 patent/WO2022148216A1/zh active Application Filing
- 2021-12-14 US US18/260,528 patent/US20240070858A1/en active Pending
- 2021-12-14 EP EP21917257.4A patent/EP4276684A4/en active Pending
- 2021-12-14 KR KR1020237022485A patent/KR20230113386A/ko active Search and Examination
- 2021-12-14 JP JP2023540947A patent/JP7507318B2/ja active Active
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103984957A (zh) | 2014-05-04 | 2014-08-13 | 中国科学院深圳先进技术研究院 | 胶囊内窥镜图像可疑病变区域自动预警系统 |
US20200210708A1 (en) * | 2019-01-02 | 2020-07-02 | Boe Technology Group Co., Ltd. | Method and device for video classification |
CN109886358A (zh) * | 2019-03-21 | 2019-06-14 | 上海理工大学 | 基于多时空信息融合卷积神经网络的人体行为识别方法 |
CN110222574A (zh) * | 2019-05-07 | 2019-09-10 | 杭州智尚云科信息技术有限公司 | 基于结构化双流卷积神经网络的生产操作行为识别方法、装置、设备、系统及存储介质 |
CN110705463A (zh) * | 2019-09-29 | 2020-01-17 | 山东大学 | 基于多模态双流3d网络的视频人体行为识别方法及系统 |
CN111383214A (zh) * | 2020-03-10 | 2020-07-07 | 苏州慧维智能医疗科技有限公司 | 实时内窥镜肠镜息肉检测系统 |
CN111462082A (zh) | 2020-03-31 | 2020-07-28 | 重庆金山医疗技术研究院有限公司 | 一种病灶图片识别装置、方法、设备及可读存储介质 |
CN112348125A (zh) * | 2021-01-06 | 2021-02-09 | 安翰科技(武汉)股份有限公司 | 基于深度学习的胶囊内窥镜影像识别方法、设备及介质 |
Non-Patent Citations (1)
Title |
---|
See also references of EP4276684A4 |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116309604A (zh) * | 2023-05-24 | 2023-06-23 | 中山大学肿瘤防治中心(中山大学附属肿瘤医院、中山大学肿瘤研究所) | 动态分析时序mr图像的方法、系统、设备和存储介质 |
CN116309604B (zh) * | 2023-05-24 | 2023-08-22 | 中山大学肿瘤防治中心(中山大学附属肿瘤医院、中山大学肿瘤研究所) | 动态分析时序mr图像的方法、系统、设备和存储介质 |
Also Published As
Publication number | Publication date |
---|---|
EP4276684A1 (en) | 2023-11-15 |
JP7507318B2 (ja) | 2024-06-27 |
EP4276684A4 (en) | 2024-05-29 |
US20240070858A1 (en) | 2024-02-29 |
KR20230113386A (ko) | 2023-07-28 |
CN112348125B (zh) | 2021-04-02 |
JP2024502105A (ja) | 2024-01-17 |
CN112348125A (zh) | 2021-02-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022148216A1 (zh) | 基于深度学习的胶囊内窥镜影像识别方法、设备及介质 | |
WO2020238734A1 (zh) | 图像分割模型的训练方法、装置、计算机设备和存储介质 | |
JP5417494B2 (ja) | 画像処理方法およびシステム | |
CN112041912A (zh) | 用于诊断肠胃肿瘤的系统和方法 | |
CN111276240B (zh) | 一种基于图卷积网络的多标签多模态全息脉象识别方法 | |
Zhang et al. | Dual encoder fusion u-net (defu-net) for cross-manufacturer chest x-ray segmentation | |
WO2023044605A1 (zh) | 极端环境下脑结构的三维重建方法、装置及可读存储介质 | |
CN108090954A (zh) | 基于图像特征的腹腔环境地图重建与腹腔镜定位的方法 | |
WO2022052782A1 (zh) | 图像的处理方法及相关设备 | |
WO2024193622A1 (zh) | 一种三维构建网络训练方法、三维模型生成方法以及装置 | |
Zhang et al. | An efficient spatial-temporal polyp detection framework for colonoscopy video | |
Zhang et al. | Unsupervised depth estimation from monocular videos with hybrid geometric-refined loss and contextual attention | |
CN115330876A (zh) | 基于孪生网络和中心位置估计的目标模板图匹配定位方法 | |
Wang et al. | Paul: Procrustean autoencoder for unsupervised lifting | |
CN117934308A (zh) | 一种基于图卷积网络的轻量化自监督单目深度估计方法 | |
Yang et al. | 3D reconstruction from endoscopy images: A survey | |
Amirthalingam et al. | Improved Water Strider Optimization with Deep Learning based Image Classification for Wireless Capsule Endoscopy | |
Dabhi et al. | High fidelity 3d reconstructions with limited physical views | |
US20240087115A1 (en) | Machine learning enabled system for skin abnormality interventions | |
Vishwakarma et al. | Learning salient features in radar micro-Doppler signatures using Attention Enhanced Alexnet | |
CN115937963A (zh) | 一种基于步态识别的行人识别方法 | |
Chen et al. | Relation-balanced graph convolutional network for 3D human pose estimation | |
CN118037963B (zh) | 消化腔内壁三维模型的重建方法、装置、设备和介质 | |
Alkhalisy et al. | Students Behavior Detection Based on Improved YOLOv5 Algorithm Combining with CBAM Attention Mechanism. | |
US20240127468A1 (en) | Anatomical structure complexity determination and representation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21917257; Country of ref document: EP; Kind code of ref document: A1 |
| ENP | Entry into the national phase | Ref document number: 20237022485; Country of ref document: KR; Kind code of ref document: A |
| WWE | Wipo information: entry into national phase | Ref document number: 2023540947; Country of ref document: JP |
| WWE | Wipo information: entry into national phase | Ref document number: 18260528; Country of ref document: US |
| NENP | Non-entry into the national phase | Ref country code: DE |
| ENP | Entry into the national phase | Ref document number: 2021917257; Country of ref document: EP; Effective date: 20230807 |