WO2022148216A1 - 基于深度学习的胶囊内窥镜影像识别方法、设备及介质 - Google Patents

基于深度学习的胶囊内窥镜影像识别方法、设备及介质 Download PDF

Info

Publication number
WO2022148216A1
WO2022148216A1 (PCT/CN2021/137938)
Authority
WO
WIPO (PCT)
Prior art keywords
rgb
optical flow
image
images
neural network
Prior art date
Application number
PCT/CN2021/137938
Other languages
English (en)
French (fr)
Inventor
张行
张皓
袁文金
张楚康
刘慧
黄志威
Original Assignee
安翰科技(武汉)股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 安翰科技(武汉)股份有限公司 filed Critical 安翰科技(武汉)股份有限公司
Priority to EP21917257.4A priority Critical patent/EP4276684A4/en
Priority to JP2023540947A priority patent/JP2024502105A/ja
Priority to KR1020237022485A priority patent/KR20230113386A/ko
Priority to US18/260,528 priority patent/US20240070858A1/en
Publication of WO2022148216A1 publication Critical patent/WO2022148216A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/254Fusion techniques of classification results, e.g. of results related to same input data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/20Image enhancement or restoration using local operators
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/771Feature selection, e.g. selecting representative features from a multi-dimensional feature space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/809Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of classification results, e.g. where the classifiers operate on the same input data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/84Arrangements for image or video recognition or understanding using pattern recognition or machine learning using probabilistic graphical models from image or video features, e.g. Markov models or Bayesian networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/49Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10068Endoscopic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03Recognition of patterns in medical or anatomical images
    • G06V2201/031Recognition of patterns in medical or anatomical images of internal organs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03Recognition of patterns in medical or anatomical images
    • G06V2201/032Recognition of patterns in medical or anatomical images of protuberances, polyps nodules, etc.

Definitions

  • the present invention relates to the field of medical equipment imaging, in particular to a deep learning-based capsule endoscope image recognition method, an electronic device and a readable storage medium.
  • A capsule endoscope is a medical device that integrates core components such as a camera and a wireless transmission antenna; it collects images inside the digestive tract and transmits them synchronously to the outside of the body, so that a medical examination can be performed on the obtained image data. A capsule endoscope collects tens of thousands of images during an examination, and this large amount of image data makes image reading difficult and time-consuming. With the development of technology, the use of image processing and computer vision techniques for lesion identification has gained extensive attention.
  • in the prior art, the Chinese patent application with publication number CN103984957A discloses an automatic early warning system for suspicious lesion regions in capsule endoscope images; the system uses an image enhancement module to adaptively enhance the images, then detects the texture features of flat lesions with a texture feature extraction module, and finally performs classification with a classification and early warning module, realizing detection and early warning of flat lesions of the small intestine.
  • the Chinese patent application with publication number CN111462082A discloses a lesion image recognition device, method, equipment and readable storage medium, which uses a trained 2D target deep learning model to perform lesion recognition on a single image. The solutions mentioned in the prior art all recognize a single image and can only use the information captured in that single image, without making comprehensive use of the images captured before and after it. An image captured from a single angle cannot intuitively reflect the overall condition of a lesion; in particular, images of digestive tract folds, the stomach wall and the like captured at certain angles are easily confused with lesions such as polyps and protrusions. In addition, the prior art cannot simultaneously obtain the spatial and temporal information of the captured content, so the accuracy of lesion recognition is low.
  • the purpose of the present invention is to provide a capsule endoscope image recognition method, device and medium based on deep learning.
  • an embodiment of the present invention provides a deep learning-based image recognition method for a capsule endoscope, the method comprising: collecting N original images in a time-generated sequence through the capsule endoscope;
  • a sliding window segmentation method is used to divide the N original images into M groups of original image sequences of the same size; the N original images or the M groups of original image sequences are analyzed to form M groups of RGB image sequences, and the N original images or the M groups of RGB image sequences are analyzed to form M groups of optical flow image sequences;
  • Each of the RGB image sequences is composed of image data in RGB format, and each of the optical flow image sequences is composed of image data formed by calculating optical flow fields of adjacent RGB images;
  • the RGB image sequence and the optical flow image sequence are respectively input into a 3D convolutional neural network model to output a recognition result;
  • the recognition result is a probability value of the occurrence of preset parameters;
  • the 3D convolutional neural network model includes: RGB branch and optical flow branch;
  • the RGB image sequence and the optical flow image sequence are respectively input into the 3D convolutional neural network model to output the recognition result, including: inputting the RGB image sequence into the RGB branch for calculation to output a first classification probability p1; inputting the optical flow image sequence into the optical flow branch for calculation to output a second classification probability p2; and fusing the first classification probability and the second classification probability to form the recognition result p, where p = w1*p1 + w2*p2, w1 = T1/(T1+T2), and w2 = T2/(T1+T2);
  • T1 and T2 respectively represent the recognition accuracy on the verification set of the RGB branch and the optical flow branch during construction of the 3D convolutional neural network model.
  • a sliding window segmentation method is used to segment N original images into M groups of original image sequences of the same size, including:
  • numbering the N original images according to their order of generation in time as 1, 2, ..., N, and sequentially dividing the N images into M groups of original image sequences according to a preset window size K and a preset sliding step S, where M = ⌈(N-K)/S⌉ + 1;
  • the value range of the preset window size K is 2 ⁇ K ⁇ 1000, and the value range of the preset sliding step S is 1 ⁇ S ⁇ K.
  • the training method of the 3D convolutional neural network model includes:
  • the 2D convolution kernel parameters of size N*N in the pre-trained 2D recognition model are copied N times; the 2D recognition model is obtained by training on images with lesion labels, its input is a single-frame image, and it can only recognize single-frame images;
  • each copied kernel parameter is divided by N, so that the kernel parameter at each position becomes 1/N of its original value;
  • the new kernel parameters are recombined into convolution kernel parameters of size N*N*N, which form the initialization parameters of the 3D convolution kernels in the 3D convolutional neural network model;
  • the 3D convolutional neural network model after parameter initialization is trained by using the stochastic gradient descent method, the parameters of the model are iteratively updated until the iterative stop condition is satisfied, and the 3D convolutional neural network model for outputting the recognition result is formed.
  • arranged in the order of the processing flow, the 3D convolutional neural network model includes: a 7*7*7 3D convolution layer, a 3*3*3 3D pooling layer, at least one collaborative spatiotemporal feature structure, a 3D pooling layer, and a fully connected layer;
  • the number of the cooperative spatiotemporal feature structures is P, P ⁇ (4, 16);
  • arranged in the order of the processing flow from input to output, the collaborative spatiotemporal feature structure includes: a first collaborative spatiotemporal convolution layer, a first normalization layer and an activation layer, together with a shortcut connection from the input to the output of the collaborative spatiotemporal feature structure that runs in parallel with these three layers.
  • arranged in the order of the processing flow from input to output, the collaborative spatiotemporal feature structure further includes: a second collaborative spatiotemporal convolution layer and a second normalization layer after the activation layer.
  • the process of processing data by the first collaborative spatiotemporal convolution layer includes: decomposing the feature map at its input into three views, denoted H-W, T-H and T-W, whose output features are denoted x_hw, x_tw and x_th respectively, so that x_hw = x ⊛ w_hw, x_tw = x ⊛ w_tw, x_th = x ⊛ w_th;
  • x is the input data of size (t×h×w)×c_1, t×h×w is the size of the input feature map, c_1 is the number of channels of the input feature map, ⊛ denotes three-dimensional convolution, and w represents the convolution filter kernel (the subscript indicating the view it acts on);
  • the weighted summation of the three sets of data gives the output y of the first collaborative spatiotemporal convolution layer: y = a_hw·x_hw + a_tw·x_tw + a_th·x_th;
  • [a_hw, a_tw, a_th] is a coefficient of size c_2×3 and is normalized by softmax, c_2 is the number of output channels, and the number 3 represents the three views.
  • an embodiment of the present invention provides an electronic device, including a memory and a processor, the memory storing a computer program executable on the processor; when the processor executes the program, the steps in the above deep learning-based capsule endoscope image recognition method are implemented.
  • an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps in the above deep learning-based capsule endoscope image recognition method are implemented.
  • the beneficial effects of the present invention are: the deep learning-based capsule endoscope image recognition method, device and medium of the present invention form an image sequence of a specific format from multiple continuously captured frames, then use a 3D convolutional neural network model to perform multi-channel recognition on the multi-frame images, and combine the recognition probabilities of the channels to output the recognition result, thereby improving image recognition accuracy.
  • FIG. 1 is a schematic flowchart of a method for recognizing images of capsule endoscopes based on deep learning according to the first embodiment of the present invention
  • FIG. 2 is a schematic diagram of sliding window segmentation provided by a specific example of the present invention.
  • FIG. 3 is a schematic diagram, provided by a specific example of the present invention, of generating the initialization parameters of the 3D convolutional neural network model convolution kernels from the convolution kernel parameters of a trained 2D recognition model;
  • FIG. 4 is a schematic structural diagram of a 3D convolutional neural network model provided by the present invention.
  • FIG. 5 is a schematic structural diagram of a collaborative spatiotemporal feature structure provided by the present invention.
  • FIG. 6 is a schematic flow chart of data processing by coordinating spatiotemporal convolutional layers in a specific example of the present invention.
  • referring to FIG. 1, a first embodiment of the present invention provides a deep learning-based capsule endoscope image recognition method, the method comprising: S1, collecting N original images through the capsule endoscope in their order of generation in time; S2, using a sliding window segmentation method to divide the N original images into M groups of original image sequences of the same size, analyzing the N original images or the M groups of original image sequences to form M groups of RGB image sequences, and analyzing the N original images or the M groups of RGB image sequences to form M groups of optical flow image sequences;
  • each of the RGB image sequences is composed of image data in RGB format, and each of the optical flow image sequences is composed of image data formed by calculating the optical flow fields of adjacent RGB images;
  • S3, inputting the RGB image sequence and the optical flow image sequence respectively into the 3D convolutional neural network model to output a recognition result, the recognition result being a probability value of the occurrence of preset parameters.
  • for step S1, during operation of the capsule endoscope, images are continuously captured by the camera arranged on it, and are collected and stored synchronously or asynchronously to form the original images;
  • the sliding window segmentation method is used to divide the N original images into M groups of original image sequences of the same size, including: numbering the N original images according to the time generation sequence, which are 1, 2, ... N in sequence;
  • according to the preset window size K and the preset sliding step S, the N images are sequentially divided into M groups of original image sequences, where M = ⌈(N-K)/S⌉ + 1;
  • the first group of original image sequences after segmentation consists of the original images numbered 1, 2, ..., K;
  • the second group of original image sequences consists of the original images numbered S+1, S+2, ..., S+K;
  • after sequential segmentation, the last group of original image sequences consists of the original images numbered N-K, N-K+1, ..., N, giving M = ⌈(N-K)/S⌉ + 1 groups of original image sequences in total, where the symbol ⌈·⌉ denotes rounding up.
  • the value range of K is 2 ⁇ K ⁇ 1000
  • the value range of S is 1 ⁇ S ⁇ K.
  • if N is not divisible by K, one group of original image sequences will not contain K images; preferably, this group is set as the first group or the last group;
  • in general, for convenience of calculation, the number N of original images selected for calculation is divisible by K, which will not be further described here.
  • referring to FIG. 2, in a specific example of the present invention, the total number of original images is N=10000, the sliding window size is K=10 and the sliding step is S=5; the first group of original image sequences then consists of original images 1, 2, ..., 10, the second group consists of original images 6, 7, ..., 15, and so on until the last group consists of original images 9991, 9992, ..., 10000, giving 1999 original image sequences in total.
  • analyzing N original images or analyzing M groups of original image sequences forms M groups of RGB image sequences, and each of the RGB image sequences is composed of image data in RGB format. Specifically, each original image in the original image sequence is converted into an image in RGB format, so that each original image sequence forms a corresponding RGB image sequence. It should be noted here that N original images can also be converted to RGB format first, and then M groups of RGB image sequences can be formed by the same sliding window segmentation method as the original image sequence. The RGB image sequences formed by the above two methods are the same.
  • the original image is an image in RGB format, it does not need to be transformed again, and the original image sequence is an RGB image sequence, which will not be described further here.
  • N original images or M groups of RGB image sequences are analyzed to form M groups of optical flow images.
  • similarly to the formation of the RGB image sequences, the original images can be directly analyzed to obtain optical flow images, and the optical flow images are then grouped with the same sliding window segmentation method as the original image sequences to form M groups of optical flow image sequences; alternatively, the original image sequences can be analyzed to directly form the optical flow image sequences.
  • specifically, taking the original image sequence as an example, the original image sequence is first converted into an RGB image sequence, and the optical flow field image data is then obtained by calculating the optical flow field of adjacent RGB images; obtaining the RGB image and the optical flow image corresponding to a known original image is prior art and is therefore not described in detail in this patent.
  • the 3D convolutional neural network model includes: RGB branch and optical flow branch;
  • the RGB image sequence is input into the RGB branch for calculation to output a first classification probability p1; the optical flow image sequence is input into the optical flow branch for calculation to output a second classification probability p2; the two probabilities are fused to form the recognition result p = w1*p1 + w2*p2, where w1 = T1/(T1+T2) and w2 = T2/(T1+T2); T1 and T2 respectively represent the recognition accuracy on the verification set of the RGB branch and the optical flow branch during construction of the 3D convolutional neural network model.
  • the recognition accuracy is the probability of successful recognition.
  • the displayed recognition result is the probability that the current image sequence contains lesions, such as bleeding, ulcers, polyps, erosions, etc.
  • the RGB branch models the local spatiotemporal information, which can well describe the outline of the shooting content
  • the optical flow branch models the changes between adjacent frame images, which can well capture the dynamic change of the captured content caused by the movement of the capsule endoscope and is conducive to recovering global spatial information. Therefore, the same image sequence is transformed into two kinds of data, which are recognized separately by the two constructed branches, and the results of the two branches are then fused, which improves the recognition effect.
  • the construction methods of the RGB branch and the optical flow branch are the same.
  • in the following description, the 3D convolutional neural network model is used to refer to both branches collectively.
  • the 3D convolutional neural network model can encode spatial and temporal information at the same time by extending the convolution kernel from two dimensions to three dimensions; to identify lesions across multiple frames, it makes comprehensive use of the shooting information from different angles obtained from continuously captured adjacent images, and compared with a 2D convolutional neural network model that recognizes single frames, it can use more information, thereby improving recognition accuracy.
  • the training methods of the 3D convolutional neural network model include:
  • referring to FIG. 3, the 3*3 convolution kernel of the 2D recognition model is copied 3 times to expand the dimension; further, the data of each dimension is divided by 3, forming the initialization parameters of a 3*3*3 3D convolution kernel.
  • the training method of the 3D convolutional neural network model also includes: M4, training the parameter-initialized 3D convolutional neural network model with the stochastic gradient descent method, iteratively updating the parameters of the model until the iteration stop condition is met, to form the 3D convolutional neural network model used for outputting the recognition result.
  • arranged in the order of the processing flow, the 3D convolutional neural network model includes: a 7*7*7 3D convolution layer, a 3*3*3 3D pooling layer, at least one collaborative spatiotemporal feature structure, a 3D pooling layer, and a fully connected layer.
  • arranged in the order of the processing flow from input to output, the collaborative spatiotemporal feature structure includes: a first collaborative spatiotemporal convolution layer, a first normalization layer, and an activation layer; together with a shortcut connection from the input to the output of the collaborative spatiotemporal feature structure that runs in parallel with the first collaborative spatiotemporal convolution layer, the first normalization layer and the activation layer.
  • the collaborative spatiotemporal feature structure further includes: a second collaborative spatiotemporal convolution layer and a second normalization layer after the activation layer.
  • the processing flow of the first collaborative spatio-temporal convolution layer and the second collaborative spatio-temporal convolution layer are the same, and here, they are both expressed as collaborative spatio-temporal convolution layers;
  • the process by which the collaborative spatiotemporal convolution layer processes data includes: decomposing the feature map at its input into three views, denoted H-W, T-H and T-W, whose output features are denoted x_hw, x_tw and x_th respectively, so that x_hw = x ⊛ w_hw, x_tw = x ⊛ w_tw, x_th = x ⊛ w_th;
  • x is the input data of size (t×h×w)×c_1, t×h×w is the size of the input feature map, c_1 is the number of channels of the input feature map, ⊛ denotes three-dimensional convolution, and w represents the convolution filter kernel (the subscript indicating the view it acts on);
  • the weighted summation of the three sets of data gives the output y of the collaborative spatiotemporal convolution layer: y = a_hw·x_hw + a_tw·x_tw + a_th·x_th;
  • [a_hw, a_tw, a_th] is a coefficient of size c_2×3 and is normalized by softmax, c_2 is the number of output channels, and the number 3 represents the three views.
  • the collaborative spatiotemporal convolution layer convolves three orthogonal views of the input data, learns spatial appearance and temporal motion information respectively, and collaboratively learns spatial and temporal features by sharing convolution kernels of different views.
  • an embodiment of the present invention provides an electronic device, including a memory and a processor, the memory storing a computer program executable on the processor; when the processor executes the program, it implements the steps in the above deep learning-based capsule endoscope image recognition method.
  • an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, it implements the steps in the above deep learning-based capsule endoscope image recognition method.
  • the deep learning-based capsule endoscope image recognition method, device and medium of the present invention form an image sequence of a specific format from multiple continuously captured frames, then use a 3D convolutional neural network model to perform multi-channel recognition on the multi-frame images, and combine the recognition probabilities of the channels to output the recognition result, thereby improving image recognition accuracy.
  • modules described as separate components may or may not be physically separated, and components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed over multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this implementation. Those of ordinary skill in the art can understand and implement it without creative effort.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)
  • Endoscopes (AREA)
  • Image Processing (AREA)

Abstract

一种基于深度学习的胶囊内窥镜影像识别方法、设备及介质,将连续拍摄的多帧图像形成特定格式的图像序列后,通过3D卷积神经网络模型对多帧图像进行多通道识别,进而联合各通道的识别概率输出识别结果,提高图像识别精度。

Description

基于深度学习的胶囊内窥镜影像识别方法、设备及介质
本申请要求了申请日为2021年01月06日,申请号为202110010379.4,发明名称为“基于深度学习的胶囊内窥镜影像识别方法、设备及介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本发明涉及医疗设备成像领域,尤其涉及一种基于深度学习的胶囊内窥镜影像识别方法、电子设备及可读存储介质。
背景技术
胶囊内窥镜是一种医疗设备,其将摄像头、无线传输天线等核心器件集成;并在体内的消化道内采集图像并同步传送到体外,以根据获得的图像数据进行医疗检查。胶囊内窥镜在检测过程中会采集几万张图像,大量的图像数据使得阅片工作变得艰巨且耗时。随着技术的发展,利用图像处理和计算机视觉技术进行病灶识别获得了广泛的关注。
现有技术中,公开号为CN103984957A的中国专利申请,公开了一种胶囊内窥镜图像可疑病变区域自动预警系统,该系统采用图像增强模块对图像进行自适应增强,再通过纹理特征提取模块对平坦性病变的纹理特征进行检测,最后用分类预警模块进行分类,实现了对小肠平坦性病变的检测和预警功能。
公开号为CN111462082A的中国专利申请,公开了一种病灶图片识别装置、方法、设备及可读存储介质,其利用训练好的2D目标深 度学习模型对单张图像进行病灶识别。
现有技术所提及的方案都是对单张图像进行识别,识别过程中只能利用单张图像拍摄的信息,不能综合利用前后拍摄的图像信息。如此,单一角度拍摄的图像并不能直观的反映出病灶的整体情况,尤其是在某些特定角度下拍摄的消化道褶皱、胃壁等图像容易和息肉、隆起等病变相混淆;另外,现有技术不能同时获得拍摄内容的空间和时间信息,病灶识别的准确率较低。
发明内容
为解决上述技术问题,本发明的目的在于提供一种基于深度学习的胶囊内窥镜影像识别方法、设备及介质。
为了实现上述发明目的之一,本发明一实施方式提供一种基于深度学习的胶囊内窥镜影像识别方法,所述方法包括:通过胶囊内窥镜按照时间生成顺序收集N幅原始图像;
采用滑动窗口分割方法将N幅原始图像分割为大小相同的M组原始图像序列;
解析N幅原始图像或解析M组RGB图像序列形成M组光流图像序列;
每一所述RGB图像序列由RGB格式的图像数据构成,每一所述光流图像序列由通过计算相邻RGB图像的光流场所形成的图像数据构成;
将所述RGB图像序列和所述光流图像序列分别输入到3D卷积神经网络模型以输出识别结果;所述识别结果为预设参数出现的概率值; 所述3D卷积神经网络模型包括:RGB支路和光流支路;
其中,将所述RGB图像序列和所述光流图像序列分别输入到3D卷积神经网络模型以输出识别结果,包括:
将RGB图像序列输入RGB支路进行计算以输出第一分类概率p1;
将光流图像序列输入光流支路进行计算以输出第二分类概率p2;
对所述第一分类概率和所述第二分类概率进行融合形成所述识别结果p;
p = w1*p1 + w2*p2;
w1 = T1/(T1+T2),
w2 = T2/(T1+T2);
其中T1,T2分别表示构建3D卷积神经网络模型过程中,验证集分别在RGB支路和光流支路的识别精度。
作为本发明一实施方式的进一步改进,采用滑动窗口分割方法将N幅原始图像分割为大小相同的M组原始图像序列,包括:
依据时间生成顺序为N幅原始图像进行编号,其依次为1,2,……N;
以预设窗口大小K,预设滑动步长S依次分割N幅图像,将其划分为M组原始图像序列,其中,
M = ⌈(N-K)/S⌉ + 1。
作为本发明一实施方式的进一步改进,所述预设窗口大小K的取值范围为2≤K≤1000,所述预设滑动步长S的取值范围为1≤S<K。
作为本发明一实施方式的进一步改进,3D卷积神经网络模型的训练方式包括:
将预训练的2D识别模型中尺寸为N*N的2D卷积核参数复制N遍;所述2D识别模型通过有病灶标签的图像训练获得,其输入为单帧图像,且只能对单帧图像进行识别;
将复制后的各核参数分别除以N,使得每一位置的核参数为原来的1/3;
将新的核参数重新组合形成尺寸为N*N*N的卷积核参数,以构成3D卷积神经网络模型中3D卷积核的初始化参数;
利用随机梯度下降法训练参数初始化后的3D卷积神经网络模型,迭代更新模型的参数,直到满足迭代停止条件,形成用于输出识别结果的所述3D卷积神经网络模型。
作为本发明一实施方式的进一步改进,自处理流程的先后顺序排布,所述3D卷积神经网络模型包括:
7*7*7的3D卷积层,3*3*3的3D池化层,至少1个协同时空特征结构,3D池化层,全连接层。
作为本发明一实施方式的进一步改进,所述协同时空特征结构的数量为P个,P∈(4,16);
自输入至输出的处理流程的先后顺序排布,所述协同时空特征结构包括:第一协同时空卷积层,第一归一化层,激活层;以及与第一协同时空卷积层,第一归一化层,激活层并行执行、且从所述协同时空特征结构输入到输出的快连接。
作为本发明一实施方式的进一步改进,自输入至输出的处理流程的先后顺序排布,所述协同时空特征结构还包括:处于激活层之后的 第二协同时空卷积层,第二归一化层。
作为本发明一实施方式的进一步改进,所述第一协同时空卷积层处理数据的流程包括:
将其入口输入特征图分解为三个视图,分别以H-W、T-H和T-W表示,
配置三个视图的输出特征分别以x_hw、x_tw和x_th表示，则：
x_hw = x ⊛ w_hw，
x_tw = x ⊛ w_tw，
x_th = x ⊛ w_th，
其中，x为(t×h×w)×c_1的输入数据，t×h×w为输入特征图的尺寸，c_1为输入特征图的通道数，⊛表示三维卷积，w表示卷积滤波核，下标hw、tw、th表示其作用的视图；
对三组输入数据进行加权求和得到第一协同时空卷积层的输出y：
y = a_hw·x_hw + a_tw·x_tw + a_th·x_th，
其中，[a_hw, a_tw, a_th]为尺寸c_2×3的系数，且[a_hw, a_tw, a_th]使用softmax进行归一化，c_2为输出的通道数，数字3表示三个视图。
为了解决上述发明目的之一,本发明一实施方式提供一种电子设备,包括存储器和处理器,所述存储器存储有可在所述处理器上运行的计算机程序,所述处理器执行所述程序时实现如上所述基于深度学习的胶囊内窥镜影像识别方法中的步骤。
为了解决上述发明目的之一,本发明一实施方式提供一种计算机可读存储介质,其上存储有计算机程序,所述计算机程序被处理器执 行时实现如上所述基于深度学习的胶囊内窥镜影像识别方法中的步骤。
与现有技术相比,本发明的有益效果是:本发明的基于深度学习的胶囊内窥镜影像识别方法、设备及介质,将连续拍摄的多帧图像形成特定格式的图像序列后,通过3D卷积神经网络模型对多帧图像进行多通道识别,进而联合各通道的识别概率输出识别结果,提高图像识别精度。
附图说明
图1是本发明第一实施方式基于深度学习的胶囊内窥镜影像识别方法的流程示意图;
图2是本发明一具体示例提供的滑动窗口分割示意图;
图3是本发明一具体示例提供的利用已训练好的2D识别模型卷积核参数生成3D卷积神经网络模型卷积核初始化参数的示意图;
图4是本发明提供的3D卷积神经网络模型的结构示意图;
图5是本发明提供的协同时空特征结构的结构示意图;
图6是本发明具体示例中协同时空卷积层处理数据的流程示意图。
具体实施方式
以下将结合附图所示的具体实施方式对本发明进行详细描述。但这些实施方式并不限制本发明,本领域的普通技术人员根据这些实施方式所做出的结构、方法、或功能上的变换均包含在本发明的保护范围内。
如图1所示,本发明第一实施方式中提供一种基于深度学习的胶 囊内窥镜影像识别方法,所述方法包括:
S1、通过胶囊内窥镜按照时间生成顺序收集N幅原始图像;
S2、采用滑动窗口分割方法将N幅原始图像分割为大小相同的M组原始图像序列;
解析N幅原始图像或解析M组原始图像序列形成M组RGB图像序列,以及解析N幅原始图像或解析M组RGB图像序列形成M组光流图像序列;
每一所述RGB图像序列由RGB格式的图像数据构成,每一所述光流图像序列由通过计算相邻RGB图像的光流场所形成的图像数据构成;
S3、将所述RGB图像序列和所述光流图像序列分别输入到3D卷积神经网络模型以输出识别结果;所述识别结果为预设参数出现的概率值。
对于步骤S1,胶囊内窥镜运行过程中,通过其上设置的摄像头连续拍摄图像,并同步或异步地进行收集存储以形成原始图像;
对于步骤S2,采用滑动窗口分割方法将N幅原始图像分割为大小相同的M组原始图像序列,包括:依据时间生成顺序为N幅原始图像进行编号,其依次为1,2,……N;以预设窗口大小K,预设滑动步长S依次分割N幅图像,将其划分为M组原始图像序列,其中,
M = ⌈(N-K)/S⌉ + 1。
具体的，经过分割后的第一组原始图像序列由编号为1、2、...、K的原始图像组成，第二组原始图像序列由编号为S+1、S+2、...、S+K的原始图像组成，经过依次分割后，最后一组原始图像序列由编号为N-K、N-K+1、...、N的原始图像组成，共分割成M=⌈(N-K)/S⌉+1组原始图像序列，公式中符号⌈·⌉表示向上取整。较佳的，K的取值范围为2≤K≤1000，S的取值范围为1≤S<K。
需要说明的是,若N不能被K整除,则存在一组原始图像序列的数量不为K,较佳的,将该数量不为K的原始图像序列设定为第一组或者最后一组。通常情况下,为了计算方便,选取用于计算的原始图像的数量N可以被K整除,在此不做进一步的赘述。
结合图2所示,本发明一具体示例中,原始图像总张数为N=10000张,滑动窗口的大小设置为K=10,滑动步长设置为S=5,则分割后的第一组原始图像序列由原始图像1、2、...、10组成,第二组原始图像序列由原始图像6、7、...、15组成,一直到最后一组原始图像序列由原始图像9991、9992、...、10000组成,共分割成1999个原始图像序列。
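As a minimal illustration of the sliding-window grouping described above, the following Python sketch reproduces the 1999 groups for N=10000, K=10, S=5; the function name `sliding_window_split` and the clamping of the last group to the final K frames are our own choices, matching the worked example rather than quoting the patent.

```python
import math

def sliding_window_split(n_images: int, window: int = 10, step: int = 5):
    """Group frame numbers 1..n_images into overlapping windows of size `window`,
    advancing by `step`; M = ceil((N - K) / S) + 1 groups in total."""
    assert 2 <= window <= 1000 and 1 <= step < window
    m = math.ceil((n_images - window) / step) + 1
    groups = []
    for i in range(m):
        start = i * step + 1
        if start + window - 1 > n_images:          # last group: keep the final K frames
            start = n_images - window + 1
        groups.append(list(range(start, start + window)))
    return groups

groups = sliding_window_split(10000, window=10, step=5)
print(len(groups))            # 1999, matching the example above
print(groups[0], groups[1])   # [1..10] and [6..15]
print(groups[-1][-1])         # 10000
```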
相应的,解析N幅原始图像或解析M组原始图像序列形成M组RGB图像序列,每一所述RGB图像序列由RGB格式的图像数据构成。具体的,将原始图像序列中的每一原始图像分别转换为RGB格式的图像,以将每一原始图像序列分别形成一对应的RGB图像序列。这里需要说明的是,也可以对N幅原始图像先做RGB格式转换,再采用与形成原始图像序列相同的滑动窗口分割方法形成M组RGB图像序列,上述两种方式形成的RGB图像序列相同。
另外,若原始图像为RGB格式的图像,则无需再次变换,原始图 像序列即为RGB图像序列,在此不做进一步的赘述。
相应的,解析N幅原始图像或解析M组RGB图像序列形成M组光流图像,与RGB图像序列形成过程相类似的,可直接解析原始图像获取光流图像,再将光流图像按照形成原始图像序列相同的滑动窗口分割方法形成M组光流图像序列;也可以解析原始图像序列直接形成光流图像序列。具体的,以原始图像序列为例,先将原始图像序列转换为RGB图像序列,之后,通过计算相邻RGB图像的光流场得到光流场图像数据;在原始图像已知,获得原始图像相对应的RGB图像,光流图像均为现有技术,因此,在本专利中不做过多赘述。
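The patent treats computing the optical flow field of adjacent RGB images as prior art and does not fix a particular algorithm; as one hedged possibility, dense Farneback flow from OpenCV could be used (the function name `flow_fields` and the Farneback parameters below are illustrative assumptions):

```python
import cv2

def flow_fields(rgb_frames):
    """Compute a dense optical flow field between each pair of adjacent RGB frames.
    Returns len(rgb_frames) - 1 arrays of shape (H, W, 2) holding (dx, dy)."""
    flows = []
    prev_gray = cv2.cvtColor(rgb_frames[0], cv2.COLOR_RGB2GRAY)
    for frame in rgb_frames[1:]:
        next_gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
        # Farneback dense optical flow (pyramid scale 0.5, 3 levels, 15x15 window)
        flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        flows.append(flow)
        prev_gray = next_gray
    return flows
```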
对于步骤S3,3D卷积神经网络模型包括:RGB支路和光流支路;
将RGB图像序列输入RGB支路进行计算以输出第一分类概率p1;
将光流图像序列输入光流支路进行计算以输出第二分类概率p2;
对所述第一分类概率和所述第二分类概率进行融合形成所述识别结果p;
p = w1*p1 + w2*p2;
w1 = T1/(T1+T2),
w2 = T2/(T1+T2);
其中T1,T2分别表示构建3D卷积神经网络模型过程中,验证集分别在RGB支路和光流支路的识别精度。
具体的,所述识别精度为成功识别的概率。
本发明一具体示例中，T1=0.9，T2=0.8，则w1=0.9/(0.9+0.8)=0.53，w2=0.8/(0.9+0.8)=0.47；
在具体应用中,所示识别结果为当前图像序列中包含病灶的概率,所述病灶例如:出血,溃疡,息肉,糜烂等,所述识别结果p的值越大,表示出现病灶的概率越大。
相应的,RGB支路对局部时空信息进行建模,能够很好的描述拍摄内容的外形轮廓;光流支路对相邻帧图像的变化进行建模,能够很好的捕捉胶囊内窥镜运动造成的拍摄内容的动态变化过程,有利于恢复全局的空间信息。因此,同一图像序列经过变换形成两种数据,并分别通过构建的两个支路进行识别输出,并进一步的将两个支路的结果进行融合,能够提高识别效果。
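A minimal sketch of the late fusion of the two branch probabilities described above (function and argument names are illustrative; T1 = 0.9 and T2 = 0.8 follow the example in the text):

```python
def fuse_branch_probabilities(p1: float, p2: float, t1: float = 0.9, t2: float = 0.8) -> float:
    """Fuse the RGB-branch probability p1 and the optical-flow-branch probability p2,
    weighting each branch by its validation accuracy T1, T2."""
    w1 = t1 / (t1 + t2)          # 0.9 / 1.7 ≈ 0.53
    w2 = t2 / (t1 + t2)          # 0.8 / 1.7 ≈ 0.47
    return w1 * p1 + w2 * p2

# e.g. fuse_branch_probabilities(0.7, 0.4) ≈ 0.56: the fused lesion probability for one sequence
```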
本发明具体实施方式中,RGB支路和光流支路的构建方式相同,本发明以下描述中以3D卷积神经网络模型概括两种支路。3D卷积神经网络模型通过将卷积核从二维扩展到三维,能够同时编码空间和时间信息;以对多帧图像进行病灶识别,综合利用连续拍摄的相邻图像得到的不同角度的拍摄信息,相对于2D卷积神经网络模型对单帧图像识别,能够利用的信息更多,从而提高识别精度。
具体的,3D卷积神经网络模型的训练方式包括:
M1、将预训练的2D识别模型中尺寸为N*N的2D卷积核参数复制N遍;所述2D识别模型通过有病灶标签的图像训练获得,其输入为单帧图像,且只能对单帧图像进行识别。2D识别模型的构建及应用均为现有技术,例如:背景技术CN111462082A的中国专利申请所公开内容,在此不做赘述。
M2、将复制后的各核参数分别除以N,使得每一位置的核参数为 原来的1/3;
M3、将新的核参数重新组合形成尺寸为N*N*N的卷积核参数,以构成3D卷积神经网络模型中3D卷积核的初始化参数;
具体参考图3所示,将2D识别模型的3*3的卷积核复制3遍,进行维度扩充;进一步的,将每一维的数据单独除以3,形成3*3*3的3D卷积核的初始化参数。
进一步的,3D卷积神经网络模型的训练方式还包括:M4、利用随机梯度下降法训练参数初始化后的3D卷积神经网络模型,迭代更新模型的参数,直到满足迭代停止条件,形成用于输出识别结果的所述3D卷积神经网络模型。
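A hedged PyTorch sketch of the kernel-inflation initialization in steps M1–M3 (layer construction details such as padding are our own assumptions; the patent only fixes the copy-and-divide rule shown in FIG. 3):

```python
import torch
import torch.nn as nn

def inflate_2d_to_3d(conv2d: nn.Conv2d, depth: int = 3) -> nn.Conv3d:
    """Build a 3D convolution whose N*N*N kernel is the pretrained N*N 2D kernel
    repeated `depth` times along the temporal axis and divided by `depth`."""
    conv3d = nn.Conv3d(conv2d.in_channels, conv2d.out_channels,
                       kernel_size=(depth, *conv2d.kernel_size),
                       padding=(depth // 2, *conv2d.padding),
                       bias=conv2d.bias is not None)
    with torch.no_grad():
        w2d = conv2d.weight                                   # (out, in, N, N)
        conv3d.weight.copy_(w2d.unsqueeze(2).repeat(1, 1, depth, 1, 1) / depth)
        if conv2d.bias is not None:
            conv3d.bias.copy_(conv2d.bias)
    return conv3d

# e.g. inflate a pretrained 3*3 kernel into a 3*3*3 one:
conv3d = inflate_2d_to_3d(nn.Conv2d(64, 64, kernel_size=3, padding=1), depth=3)
```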
较佳的,结合图4所示,自处理流程的先后顺序排布,所述3D卷积神经网络模型包括:7*7*7的3D卷积层,3*3*3的3D池化层,至少1个协同时空特征结构,3D池化层,全连接层。
所述协同时空特征结构的数量为P个,P∈(4,16),本发明具体实施方式中,配置P=8。
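A hedged sketch of one branch following the layer order of FIG. 4 (channel width, strides and the two-class softmax output are illustrative assumptions; `block_factory` stands for one collaborative spatiotemporal feature structure, of which a convolution-layer sketch is given further below):

```python
import torch.nn as nn

def build_branch(block_factory, channels: int = 64, p_blocks: int = 8, num_classes: int = 2):
    """7*7*7 3D convolution -> 3*3*3 3D pooling -> P collaborative spatiotemporal
    feature structures -> 3D pooling -> fully connected layer (P = 8 here)."""
    return nn.Sequential(
        nn.Conv3d(3, channels, kernel_size=7, stride=(1, 2, 2), padding=3),
        nn.MaxPool3d(kernel_size=3, stride=2, padding=1),
        *[block_factory(channels) for _ in range(p_blocks)],
        nn.AdaptiveAvgPool3d(1),
        nn.Flatten(),
        nn.Linear(channels, num_classes),
        nn.Softmax(dim=1),        # branch output: classification probability (p1 or p2)
    )
```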
较佳的,结合图5所示,自输入至输出的处理流程的先后顺序排布,所述协同时空特征结构包括:第一协同时空卷积层,第一归一化层,激活层;以及与第一协同时空卷积层,第一归一化层,激活层并行执行、且从所述协同时空特征结构输入到输出的快连接。
进一步的,自输入至输出的处理流程的先后顺序排布,所述协同时空特征结构还包括:处于激活层之后的第二协同时空卷积层,第二归一化层。
较佳的,结合图6所示,第一协同时空卷积层和第二协同时空卷积层的处理流程相同,这里,将其均以协同时空卷积层表述;具体的,协同时空卷积层处理数据的流程包括:
将其入口输入特征图分解为三个视图,分别以H-W、T-H和T-W表示,
配置三个视图的输出特征分别以x_hw、x_tw和x_th表示，则：
x_hw = x ⊛ w_hw，
x_tw = x ⊛ w_tw，
x_th = x ⊛ w_th，
其中，x为(t×h×w)×c_1的输入数据，t×h×w为输入特征图的尺寸，c_1为输入特征图的通道数，⊛表示三维卷积，w表示卷积滤波核，下标hw、tw、th表示其作用的视图；
对三组输入数据进行加权求和得到协同时空卷积层的输出y：
y = a_hw·x_hw + a_tw·x_tw + a_th·x_th，
其中，[a_hw, a_tw, a_th]为尺寸c_2×3的系数，且[a_hw, a_tw, a_th]使用softmax进行归一化，c_2为输出的通道数，数字3表示三个视图。
所述协同时空卷积层对输入数据的三个正交视图进行卷积,分别学习空间外观和时间运动信息,通过共享不同视图的卷积核,协作学习空间和时间特征。
对[a_hw, a_tw, a_th]使用softmax进行归一化，可以防止响应的数量级爆炸。
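A possible PyTorch sketch of the collaborative spatiotemporal convolution layer described above: one shared k*k kernel is applied to the three orthogonal views H-W, T-W and T-H, and the three responses are mixed with softmax-normalized, per-output-channel coefficients (the exact way the kernel is shared across views is our reading of the text, not a quotation of the patent):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CollaborativeSTConv(nn.Module):
    """Convolve the input on the H-W, T-W and T-H views with a shared k*k kernel and
    combine the three views with softmax-normalized coefficients of size c2 x 3."""

    def __init__(self, c_in: int, c_out: int, k: int = 3):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(c_out, c_in, k, k))
        nn.init.kaiming_normal_(self.weight)
        self.coeff = nn.Parameter(torch.zeros(c_out, 3))   # [a_hw, a_tw, a_th]
        self.k = k

    def forward(self, x):                                   # x: (B, c1, T, H, W)
        p = self.k // 2
        x_hw = F.conv3d(x, self.weight.unsqueeze(2), padding=(0, p, p))  # H-W view
        x_tw = F.conv3d(x, self.weight.unsqueeze(3), padding=(p, 0, p))  # T-W view
        x_th = F.conv3d(x, self.weight.unsqueeze(4), padding=(p, p, 0))  # T-H view
        a = F.softmax(self.coeff, dim=1).view(1, -1, 3, 1, 1, 1)         # normalize per channel
        return a[:, :, 0] * x_hw + a[:, :, 1] * x_tw + a[:, :, 2] * x_th

# e.g. y = CollaborativeSTConv(64, 64)(torch.randn(2, 64, 8, 56, 56))  # -> (2, 64, 8, 56, 56)
```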
进一步的,本发明一实施方式提供一种电子设备,包括存储器和 处理器,所述存储器存储有可在所述处理器上运行的计算机程序,所述处理器执行所述程序时实现如上所述基于深度学习的胶囊内窥镜影像识别方法中的步骤。
进一步的,本发明一实施方式提供一种计算机可读存储介质,其上存储有计算机程序,所述计算机程序被处理器执行时实现如上所述基于深度学习的胶囊内窥镜影像识别方法中的步骤。
综上所述,本发明的基于深度学习的胶囊内窥镜影像识别方法、设备及介质,将连续拍摄的多帧图像形成特定格式的图像序列后,通过3D卷积神经网络模型对多帧图像进行多通道识别,进而联合各通道的识别概率输出识别结果,提高图像识别精度。
为了描述的方便,描述以上装置时以功能分为各种模块分别描述。当然,在实施本发明时可以把各模块的功能在同一个或多个软件和/或硬件中实现。
以上所描述的装置实施方式仅仅是示意性的,其中所述作为分离部件说明的模块可以是或者也可以不是物理上分开的,作为模块显示的部件可以是或者也可以不是物理模块,即可以位于一个地方,或者也可以分布到多个网络模块上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施方式方案的目的。本领域普通技术人员在不付出创造性劳动的情况下,即可以理解并实施。
应当理解,虽然本说明书按照实施方式加以描述,但并非每个实施方式仅包含一个独立的技术方案,说明书的这种叙述方式仅仅是为清楚起见,本领域技术人员应当将说明书作为一个整体,各实施方式 中的技术方案也可以经适当组合,形成本领域技术人员可以理解的其他实施方式。
上文所列出的一系列的详细说明仅仅是针对本发明的可行性实施方式的具体说明,它们并非用以限制本发明的保护范围,凡未脱离本发明技艺精神所作的等效实施方式或变更均应包含在本发明的保护范围之内。

Claims (10)

  1. 一种基于深度学习的胶囊内窥镜影像识别方法,其特征在于,所述方法包括:
    通过胶囊内窥镜按照时间生成顺序收集N幅原始图像;
    采用滑动窗口分割方法将N幅原始图像分割为大小相同的M组原始图像序列;
    解析N幅原始图像或解析M组RGB图像序列形成M组光流图像序列;
    每一所述RGB图像序列由RGB格式的图像数据构成,每一所述光流图像序列由通过计算相邻RGB图像的光流场所形成的图像数据构成;
    将所述RGB图像序列和所述光流图像序列分别输入到3D卷积神经网络模型以输出识别结果;所述识别结果为预设参数出现的概率值;所述3D卷积神经网络模型包括:RGB支路和光流支路;
    其中,将所述RGB图像序列和所述光流图像序列分别输入到3D卷积神经网络模型以输出识别结果,包括:
    将RGB图像序列输入RGB支路进行计算以输出第一分类概率p1;
    将光流图像序列输入光流支路进行计算以输出第二分类概率p2;
    对所述第一分类概率和所述第二分类概率进行融合形成所述识别结果p;
    p = w1*p1 + w2*p2;
    w1 = T1/(T1+T2),
    w2 = T2/(T1+T2);
    其中T1,T2分别表示构建3D卷积神经网络模型过程中,验证集分别在RGB支路和光流支路的识别精度。
  2. 根据权利要求1所述的基于深度学习的胶囊内窥镜影像识别方法,其特征在于,采用滑动窗口分割方法将N幅原始图像分割为大小相同的M组原始图像序列,包括:
    依据时间生成顺序为N幅原始图像进行编号,其依次为1,2,……N;
    以预设窗口大小K,预设滑动步长S依次分割N幅图像,将其划分为M组原始图像序列,其中,
    M = ⌈(N-K)/S⌉ + 1。
  3. 根据权利要求2所述的基于深度学习的胶囊内窥镜影像识别方法,其特征在于,所述预设窗口大小K的取值范围为2≤K≤1000,所述预设滑动步长S的取值范围为1≤S<K。
  4. 根据权利要求1所述的基于深度学习的胶囊内窥镜影像识别方法,其特征在于,3D卷积神经网络模型的训练方式包括:
    将预训练的2D识别模型中尺寸为N*N的2D卷积核参数复制N遍;所述2D识别模型通过有病灶标签的图像训练获得,其输入为单帧图像,且只能对单帧图像进行识别;
    将复制后的各核参数分别除以N,使得每一位置的核参数为原来的1/3;
    将新的核参数重新组合形成尺寸为N*N*N的卷积核参数,以构成3D卷积神经网络模型中3D卷积核的初始化参数;
    利用随机梯度下降法训练参数初始化后的3D卷积神经网络模型, 迭代更新模型的参数,直到满足迭代停止条件,形成用于输出识别结果的所述3D卷积神经网络模型。
  5. 根据权利要求1所述的基于深度学习的胶囊内窥镜影像识别方法,其特征在于,自处理流程的先后顺序排布,所述3D卷积神经网络模型包括:
    7*7*7的3D卷积层,3*3*3的3D池化层,至少1个协同时空特征结构,3D池化层,全连接层。
  6. 根据权利要求5所述的基于深度学习的胶囊内窥镜影像识别方法,其特征在于,所述协同时空特征结构的数量为P个,P∈(4,16);
    自输入至输出的处理流程的先后顺序排布,所述协同时空特征结构包括:第一协同时空卷积层,第一归一化层,激活层;以及与第一协同时空卷积层,第一归一化层,激活层并行执行、且从所述协同时空特征结构输入到输出的快连接。
  7. 根据权利要求6所述的基于深度学习的胶囊内窥镜影像识别方法,其特征在于,自输入至输出的处理流程的先后顺序排布,所述协同时空特征结构还包括:处于激活层之后的第二协同时空卷积层,第二归一化层。
  8. 根据权利要求6所述的基于深度学习的胶囊内窥镜影像识别方法,其特征在于,所述第一协同时空卷积层处理数据的流程包括:
    将其入口输入特征图分解为三个视图,分别以H-W、T-H和T-W表示,
    配置三个视图的输出特征分别以x_hw、x_tw和x_th表示，则：
    x_hw = x ⊛ w_hw，
    x_tw = x ⊛ w_tw，
    x_th = x ⊛ w_th，
    其中，x为(t×h×w)×c_1的输入数据，t×h×w为输入特征图的尺寸，c_1为输入特征图的通道数，⊛表示三维卷积，w表示卷积滤波核，下标hw、tw、th表示其作用的视图；
    对三组输入数据进行加权求和得到第一协同时空卷积层的输出y：
    y = a_hw·x_hw + a_tw·x_tw + a_th·x_th，
    其中，[a_hw, a_tw, a_th]为尺寸c_2×3的系数，且[a_hw, a_tw, a_th]使用softmax进行归一化，c_2为输出的通道数，数字3表示三个视图。
  9. 一种电子设备,包括存储器和处理器,所述存储器存储有可在所述处理器上运行的计算机程序,其特征在于,所述处理器执行所述程序时实现一种基于深度学习的胶囊内窥镜影像识别方法中的步骤,所述方法包括:
    通过胶囊内窥镜按照时间生成顺序收集N幅原始图像;
    采用滑动窗口分割方法将N幅原始图像分割为大小相同的M组原始图像序列;
    解析N幅原始图像或解析M组RGB图像序列形成M组光流图像序列;
    每一所述RGB图像序列由RGB格式的图像数据构成,每一所述光流图像序列由通过计算相邻RGB图像的光流场所形成的图像数据构 成;
    将所述RGB图像序列和所述光流图像序列分别输入到3D卷积神经网络模型以输出识别结果;所述识别结果为预设参数出现的概率值;所述3D卷积神经网络模型包括:RGB支路和光流支路;
    其中,将所述RGB图像序列和所述光流图像序列分别输入到3D卷积神经网络模型以输出识别结果,包括:
    将RGB图像序列输入RGB支路进行计算以输出第一分类概率p1;
    将光流图像序列输入光流支路进行计算以输出第二分类概率p2;
    对所述第一分类概率和所述第二分类概率进行融合形成所述识别结果p;
    p = w1*p1 + w2*p2;
    w1 = T1/(T1+T2),
    w2 = T2/(T1+T2);
    其中T1,T2分别表示构建3D卷积神经网络模型过程中,验证集分别在RGB支路和光流支路的识别精度。
  10. 一种计算机可读存储介质,其上存储有计算机程序,其特征在于,所述计算机程序被处理器执行时实现一种基于深度学习的胶囊内窥镜影像识别方法中的步骤,所述方法包括:
    通过胶囊内窥镜按照时间生成顺序收集N幅原始图像;
    采用滑动窗口分割方法将N幅原始图像分割为大小相同的M组原始图像序列;
    解析N幅原始图像或解析M组RGB图像序列形成M组光流图像 序列;
    每一所述RGB图像序列由RGB格式的图像数据构成,每一所述光流图像序列由通过计算相邻RGB图像的光流场所形成的图像数据构成;
    将所述RGB图像序列和所述光流图像序列分别输入到3D卷积神经网络模型以输出识别结果;所述识别结果为预设参数出现的概率值;所述3D卷积神经网络模型包括:RGB支路和光流支路;
    其中,将所述RGB图像序列和所述光流图像序列分别输入到3D卷积神经网络模型以输出识别结果,包括:
    将RGB图像序列输入RGB支路进行计算以输出第一分类概率p1;
    将光流图像序列输入光流支路进行计算以输出第二分类概率p2;
    对所述第一分类概率和所述第二分类概率进行融合形成所述识别结果p;
    p = w1*p1 + w2*p2;
    w1 = T1/(T1+T2),
    w2 = T2/(T1+T2);
    其中T1,T2分别表示构建3D卷积神经网络模型过程中,验证集分别在RGB支路和光流支路的识别精度。
PCT/CN2021/137938 2021-01-06 2021-12-14 基于深度学习的胶囊内窥镜影像识别方法、设备及介质 WO2022148216A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP21917257.4A EP4276684A4 (en) 2021-01-06 2021-12-14 CAPSULE ENDOSCOPE IMAGE RECOGNITION METHOD BASED ON DEEP LEARNING, DEVICE AND MEDIUM
JP2023540947A JP2024502105A (ja) 2021-01-06 2021-12-14 深層学習に基づくカプセル内視鏡画像認識方法、機器及び媒体
KR1020237022485A KR20230113386A (ko) 2021-01-06 2021-12-14 딥러닝 기반의 캡슐 내시경 영상 식별 방법, 기기 및매체
US18/260,528 US20240070858A1 (en) 2021-01-06 2021-12-14 Capsule endoscope image recognition method based on deep learning, and device and medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110010379.4A CN112348125B (zh) 2021-01-06 2021-01-06 基于深度学习的胶囊内窥镜影像识别方法、设备及介质
CN202110010379.4 2021-01-06

Publications (1)

Publication Number Publication Date
WO2022148216A1 true WO2022148216A1 (zh) 2022-07-14

Family

ID=74427399

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/137938 WO2022148216A1 (zh) 2021-01-06 2021-12-14 基于深度学习的胶囊内窥镜影像识别方法、设备及介质

Country Status (6)

Country Link
US (1) US20240070858A1 (zh)
EP (1) EP4276684A4 (zh)
JP (1) JP2024502105A (zh)
KR (1) KR20230113386A (zh)
CN (1) CN112348125B (zh)
WO (1) WO2022148216A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116309604A (zh) * 2023-05-24 2023-06-23 中山大学肿瘤防治中心(中山大学附属肿瘤医院、中山大学肿瘤研究所) 动态分析时序mr图像的方法、系统、设备和存储介质

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112348125B (zh) * 2021-01-06 2021-04-02 安翰科技(武汉)股份有限公司 基于深度学习的胶囊内窥镜影像识别方法、设备及介质
CN113159238B (zh) * 2021-06-23 2021-10-26 安翰科技(武汉)股份有限公司 内窥镜影像识别方法、电子设备及存储介质
CN113591961A (zh) * 2021-07-22 2021-11-02 深圳市永吉星光电有限公司 一种基于神经网络的微创医用摄像头图像识别方法
CN113591761B (zh) * 2021-08-09 2023-06-06 成都华栖云科技有限公司 一种视频镜头语言识别方法
CN113487605B (zh) * 2021-09-03 2021-11-19 北京字节跳动网络技术有限公司 用于内窥镜的组织腔体定位方法、装置、介质及设备

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103984957A (zh) 2014-05-04 2014-08-13 中国科学院深圳先进技术研究院 胶囊内窥镜图像可疑病变区域自动预警系统
CN109886358A (zh) * 2019-03-21 2019-06-14 上海理工大学 基于多时空信息融合卷积神经网络的人体行为识别方法
CN110222574A (zh) * 2019-05-07 2019-09-10 杭州智尚云科信息技术有限公司 基于结构化双流卷积神经网络的生产操作行为识别方法、装置、设备、系统及存储介质
CN110705463A (zh) * 2019-09-29 2020-01-17 山东大学 基于多模态双流3d网络的视频人体行为识别方法及系统
US20200210708A1 (en) * 2019-01-02 2020-07-02 Boe Technology Group Co., Ltd. Method and device for video classification
CN111383214A (zh) * 2020-03-10 2020-07-07 苏州慧维智能医疗科技有限公司 实时内窥镜肠镜息肉检测系统
CN111462082A (zh) 2020-03-31 2020-07-28 重庆金山医疗技术研究院有限公司 一种病灶图片识别装置、方法、设备及可读存储介质
CN112348125A (zh) * 2021-01-06 2021-02-09 安翰科技(武汉)股份有限公司 基于深度学习的胶囊内窥镜影像识别方法、设备及介质

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5191240B2 (ja) * 2008-01-09 2013-05-08 オリンパス株式会社 シーン変化検出装置およびシーン変化検出プログラム
JP5281826B2 (ja) * 2008-06-05 2013-09-04 オリンパス株式会社 画像処理装置、画像処理プログラムおよび画像処理方法
CN108292366B (zh) * 2015-09-10 2022-03-18 美基蒂克艾尔有限公司 在内窥镜手术中检测可疑组织区域的系统和方法
US10572996B2 (en) * 2016-06-28 2020-02-25 Contextvision Ab Method and system for detecting pathological anomalies in a digital pathology image and method for annotating a tissue slide
CN109934276B (zh) * 2019-03-05 2020-11-17 安翰科技(武汉)股份有限公司 基于迁移学习的胶囊内窥镜图像分类系统及方法
CN111950444A (zh) * 2020-08-10 2020-11-17 北京师范大学珠海分校 一种基于时空特征融合深度学习网络的视频行为识别方法

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103984957A (zh) 2014-05-04 2014-08-13 中国科学院深圳先进技术研究院 胶囊内窥镜图像可疑病变区域自动预警系统
US20200210708A1 (en) * 2019-01-02 2020-07-02 Boe Technology Group Co., Ltd. Method and device for video classification
CN109886358A (zh) * 2019-03-21 2019-06-14 上海理工大学 基于多时空信息融合卷积神经网络的人体行为识别方法
CN110222574A (zh) * 2019-05-07 2019-09-10 杭州智尚云科信息技术有限公司 基于结构化双流卷积神经网络的生产操作行为识别方法、装置、设备、系统及存储介质
CN110705463A (zh) * 2019-09-29 2020-01-17 山东大学 基于多模态双流3d网络的视频人体行为识别方法及系统
CN111383214A (zh) * 2020-03-10 2020-07-07 苏州慧维智能医疗科技有限公司 实时内窥镜肠镜息肉检测系统
CN111462082A (zh) 2020-03-31 2020-07-28 重庆金山医疗技术研究院有限公司 一种病灶图片识别装置、方法、设备及可读存储介质
CN112348125A (zh) * 2021-01-06 2021-02-09 安翰科技(武汉)股份有限公司 基于深度学习的胶囊内窥镜影像识别方法、设备及介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4276684A4

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116309604A (zh) * 2023-05-24 2023-06-23 中山大学肿瘤防治中心(中山大学附属肿瘤医院、中山大学肿瘤研究所) 动态分析时序mr图像的方法、系统、设备和存储介质
CN116309604B (zh) * 2023-05-24 2023-08-22 中山大学肿瘤防治中心(中山大学附属肿瘤医院、中山大学肿瘤研究所) 动态分析时序mr图像的方法、系统、设备和存储介质

Also Published As

Publication number Publication date
CN112348125A (zh) 2021-02-09
EP4276684A4 (en) 2024-05-29
CN112348125B (zh) 2021-04-02
KR20230113386A (ko) 2023-07-28
US20240070858A1 (en) 2024-02-29
EP4276684A1 (en) 2023-11-15
JP2024502105A (ja) 2024-01-17

Similar Documents

Publication Publication Date Title
WO2022148216A1 (zh) 基于深度学习的胶囊内窥镜影像识别方法、设备及介质
CN110378381B (zh) 物体检测方法、装置和计算机存储介质
WO2021036616A1 (zh) 一种医疗图像处理方法、医疗图像识别方法及装置
WO2020238734A1 (zh) 图像分割模型的训练方法、装置、计算机设备和存储介质
CN110097130B (zh) 分类任务模型的训练方法、装置、设备及存储介质
JP5417494B2 (ja) 画像処理方法およびシステム
CN112041912A (zh) 用于诊断肠胃肿瘤的系统和方法
US20170277977A1 (en) Image classifying apparatus, image classifying method, and image classifying program
CN111276240B (zh) 一种基于图卷积网络的多标签多模态全息脉象识别方法
Zhang et al. Dual encoder fusion u-net (defu-net) for cross-manufacturer chest x-ray segmentation
WO2023044605A1 (zh) 极端环境下脑结构的三维重建方法、装置及可读存储介质
CN111091536A (zh) 医学图像处理方法、装置、设备、介质以及内窥镜
WO2022052782A1 (zh) 图像的处理方法及相关设备
Wang et al. Paul: Procrustean autoencoder for unsupervised lifting
CN115330876A (zh) 基于孪生网络和中心位置估计的目标模板图匹配定位方法
CN116935044B (zh) 一种多尺度引导和多层次监督的内镜息肉分割方法
Amirthalingam et al. Improved Water Strider Optimization with Deep Learning based Image Classification for Wireless Capsule Endoscopy
CN115861490A (zh) 一种基于注意力机制的图像动画构建方法和系统
Vishwakarma et al. Learning salient features in radar micro-Doppler signatures using Attention Enhanced Alexnet
CN115937963A (zh) 一种基于步态识别的行人识别方法
Dabhi et al. High fidelity 3d reconstructions with limited physical views
Zhang et al. Semantic feature attention network for liver tumor segmentation in large-scale CT database
Sun et al. Devil in the details: Delving into accurate quality scoring for DensePose
US20240127468A1 (en) Anatomical structure complexity determination and representation
WO2024098240A1 (zh) 一种消化内镜视觉重建导航系统及方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21917257

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 20237022485

Country of ref document: KR

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 2023540947

Country of ref document: JP

WWE Wipo information: entry into national phase

Ref document number: 18260528

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2021917257

Country of ref document: EP

Effective date: 20230807