CN117152042A - Fundus color photograph processing device based on attention mechanism - Google Patents

Fundus color photograph processing device based on attention mechanism

Info

Publication number
CN117152042A
Authority
CN
China
Prior art keywords
image
module
fundus
color
channel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210542445.7A
Other languages
Chinese (zh)
Inventor
张冀聪
开力木江·阿合买提江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202210542445.7A priority Critical patent/CN117152042A/en
Publication of CN117152042A publication Critical patent/CN117152042A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30041Eye; Retina; Ophthalmic
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Radiology & Medical Imaging (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a fundus color photograph processing device based on an attention mechanism, which comprises an image preprocessing module, a blood vessel feature extraction module and a convolutional neural network module. The image preprocessing module comprises a mask layer extraction module and an image enhancement module. The mask layer extraction module is used for extracting a mask layer, which divides the fundus color photograph into a region-of-interest image and a non-region-of-interest image; the image enhancement module is used for enhancing the contrast of the fundus color photograph. The image preprocessing module crops the fundus color photograph using the mask layer to obtain a preprocessed fundus color photograph image. The blood vessel feature extraction module extracts a vascular distribution saliency map from the preprocessed fundus color photograph image. The saliency map and the preprocessed fundus color photograph image are multiplied point-to-point on each color channel (activation) and the result is used as the input of the convolutional neural network module, so that the convolutional neural network model focuses on learning the image regions related to hypertension.

Description

Fundus color photograph processing device based on attention mechanism
Technical Field
The application relates to image data processing technology, in particular to a fundus color photograph image data processing method and device based on an attention mechanism, and more particularly to an attention-based fundus color photograph image data processing method and system for early screening of hypertension.
Background
The microvasculature of the retina is the only microvasculature of the human body that is not covered by skin or tissue and can be observed directly, and fundus color photographs can be obtained in a non-invasive and very economical way, making them well suited to large-scale screening. With the development of artificial intelligence in the field of medical imaging, screening for fundus diseases such as diabetic retinopathy, age-related macular degeneration and glaucoma based on fundus photographs has been widely applied, so that screening for systemic diseases based on fundus photographs has become a challenging direction of technical development.
In recent years, data-driven deep learning methods have been widely applied to disease screening on fundus color photographs. Compared with traditional methods that rely on hand-crafted features, a deep learning model that learns features from fundus color photographs automatically can extract features of higher complexity from the raw input image data and, through automatic optimization, obtain better performance and realize end-to-end screening. Because diseases such as diabetic retinopathy (DR), glaucoma and age-related macular degeneration (AMD) directly cause specific lesions of differing degrees on the fundus, they carry richer semantic features; when such diseases are screened with deep learning, the model converges and focuses on the lesion area more quickly, and therefore obtains better results.
For systemic chronic diseases such as hypertension, the lack of strongly labeled lesion areas makes screening directly through a model difficult, so existing algorithms appear weak in such applications. In particular, the changes hypertension produces on the retina are a long-term, slow process: no obvious lesion features usually appear in the early stage of the disease, and only subtle changes gradually develop at the optic disc, the microvasculature and similar sites. Applying deep learning to hypertension screening on fundus color photographs is therefore often limited by difficult feature extraction, fuzzy semantic information and similar issues, making the application harder. Accordingly, there are currently few studies that screen for hypertension from fundus photographs with deep learning methods. In 2020, Dai et al. used fundus color photographs and fundus vessel segmentation images, respectively, for cross-sectional predictive screening of hypertension with a simple five-layer CNN; the AUCs obtained were only 0.65 and 0.60. In the same year, Zhang L et al. used a neural network model to predict hypertension, hyperglycemia, dyslipidemia and a range of risk factors, screening with the Inception-V3 model, and the predicted AUC for hypertension was only 0.76. Since early hypertension usually does not cause obvious lesions of the retinal microvasculature, deep learning models cannot extract effective features to learn from; research on early screening for hypertension based on fundus photographs therefore remains sparse, and the methods used in existing screening studies are limited and their results unsatisfactory.
Disclosure of Invention
To solve the above technical problems, the application provides a fundus color photograph processing device based on an attention mechanism, which comprises an image preprocessing module, a blood vessel feature extraction module and a convolutional neural network module. The image preprocessing module comprises a mask layer extraction module and an image enhancement module. The mask layer extraction module is used for extracting a mask layer, which divides the fundus color photograph into a region-of-interest image and a non-region-of-interest image; the image enhancement module is used for enhancing the contrast of the fundus color photograph. The image preprocessing module crops the contrast-enhanced fundus color photograph produced by the image enhancement module using the mask layer extracted by the mask layer extraction module, obtaining a preprocessed fundus color photograph image. The blood vessel feature extraction module extracts a vascular distribution saliency map from the preprocessed fundus color photograph image. The convolutional neural network module performs automatic feature extraction; the vascular distribution saliency map and the preprocessed fundus color photograph image are multiplied point-to-point on each color channel (activation) and then used as the input of the convolutional neural network module.
In the above technical solution, the features automatically extracted by the convolutional neural network module are the image data features related to hypertension in the fundus color photograph.
In the above technical solution, the point-to-point activation multiplication of each color channel between the vascular distribution saliency map and the preprocessed fundus color photograph image specifically adopts the formula:

I_SM(k) = (SM + C) ⊙ I_k ⊙ Mask,  k = 0, 1, 2

where SM is the data of the vascular distribution saliency map; I_k is the data of channel k of the preprocessed fundus color photograph image; k is the channel index of the fundus color photograph image, with 0, 1 and 2 denoting the R, G and B channels respectively; ⊙ denotes the pixel-level product; C is a constant matrix; Mask is the data of the mask layer; and I_SM is the saliency-weighted activated image data.
In the above technical solution, the convolutional neural network module takes the saliency-weighted activated image data I_SM as input, passes it through convolution layers, and feeds the result into stacked ResNeSt modules for feature extraction.
In the above technical solution, the ResNeSt module splits input data of size H×W×C into K structurally identical cardinal groups along the channel dimension; within each cardinal group, each group of features is further fed into R branches, yielding G = K·R branches. Convolutions with kernel sizes 1×1 and 3×3 are applied in each branch, giving deep features in G different channel dimensions. Within each cardinal group, the outputs of the R branches serve as the input of a split-attention module: the R branch outputs are concatenated along the channel dimension and passed through a global pooling layer to obtain a global representation; two fully connected layers then compress along the channel dimension to obtain an attention factor for each branch feature group, the R attention factors representing the importance of the corresponding branch in the channel dimension. After an r-Softmax computation, the channel-wise attention weight of each branch within the cardinal group is obtained, and finally the weights are multiplied with the corresponding branch outputs and summed to produce the output of the cardinal group. Here H, W and C are the height, width and per-channel color values of the input data image, and K and R are positive integers greater than 2.
In the above technical solution, the mask layer extraction module extracts the image data of the red channel in the RGB color image of the fundus color photograph; the image data of the red channel is segmented at a low threshold to obtain a background segmentation map, which is divided into a region of interest and a non-region of interest; and the mask layer used to separate the region-of-interest image from the non-region-of-interest image is extracted from the background segmentation map.
In the above technical solution, the region of interest image is a quasi-circular region, and the non-region of interest is connected.
In the above technical solution, the image enhancement module transforms the fundus color photograph into the LAB color space and splits off the image data of the luminance channel; contrast enhancement is applied to the luminance channel image data using the CLAHE algorithm; and the resulting contrast-enhanced luminance channel image data is recombined and converted from the LAB color space back to the RGB color space.
In the above technical solution, the device further comprises a multi-layer perceptron module, which fuses the image features with non-image features from clinical metadata for classification. Preferably, the multi-layer perceptron module comprises three hidden layers connected in sequence, the first hidden layer having two input parts: the first input takes preprocessed metadata, including one or more of age, gender, height, weight, heart rate and BMI index; the second input takes the 512-dimensional feature map output by the convolutional neural network module.
The application has the following technical effects:
In the application, the vascular distribution saliency map obtained by segmenting the blood vessels and post-processing the result is used as prior knowledge to guide the convolutional neural network, making feature extraction more targeted, and clinical metadata is fused in to correct for the influence of individual factors on the screening result. Compared with the prior art, the introduced blood vessel feature extraction module extracts vascular distribution information, helping the deep learning model focus on important parts such as the main fundus vessels and perform more targeted feature extraction; fusing non-image features from clinical metadata into the classification corrects for the influence of individual factors on disease screening while providing richer multi-modal feature information for early hypertension screening. This further improves the accuracy of early screening for hypertension, enables large-scale early screening to be carried out more efficiently, and helps patients discover their condition in time and receive early intervention and treatment, which has important social value.
Drawings
FIG. 1 is a system frame structure diagram of the present application;
FIG. 2 is a flowchart of fundus color photograph preprocessing of the present application;
FIG. 3 is a flow chart of a method for extracting blood vessel features according to the present application;
FIG. 4 is a flow chart of a post-extraction processing of vascular features of the present application;
FIG. 5 is a schematic diagram of a convolutional neural network module of the present application;
FIG. 6 is a schematic diagram of the ResNeSt module architecture of the present application;
FIG. 7 is a schematic diagram of the multi-layer perceptron module of the present application.
Detailed Description
The present application is described in further detail below with reference to the drawings and specific embodiments, so that those skilled in the art can understand and practice it.
To address the problems in the prior art that research on screening for hypertension based on fundus color photographs is sparse, the methods used are limited and the results are not ideal, the application provides a fundus color photograph data processing method based on an attention mechanism that is particularly suitable for early screening for hypertension. Its main steps, shown in FIG. 1, are: preprocess the fundus color photograph image, extract refined blood vessel features with a pre-trained segmentation network, and extract a saliency map representing the vascular distribution by post-processing; fuse the saliency map with the fundus color photograph as the input of the convolutional neural network, so that the features in the fundus color photograph are extracted automatically and in a targeted way; and fuse the image features with non-image features from clinical metadata through a multi-layer perceptron model for classification, realizing early screening for hypertension.
According to another preferred embodiment of the present application, there is provided a fundus color photograph data processing system based on an attention mechanism, comprising:
the image preprocessing module, used for preprocessing the fundus color photograph image;
the blood vessel feature extraction module, used for extracting refined blood vessel features through a pre-trained segmentation network and extracting a saliency map representing the vascular distribution by post-processing;
the convolutional neural network module, used for fusing the saliency map with the fundus color photograph as the input of the convolutional neural network, realizing targeted automatic extraction of the features in the fundus color photograph;
and the multi-layer perceptron module, used for fusing the image features with non-image features from clinical metadata through the multi-layer perceptron model for classification.
Preferably, the fundus color photograph data processing system based on the attention mechanism is particularly suitable for early screening for hypertension.
According to yet another preferred embodiment of the present application, there is provided a system for early screening of hypertension based on fundus color images, characterized by comprising:
the image preprocessing module, used for preprocessing the fundus color photograph image;
the blood vessel feature extraction module, used for extracting refined blood vessel features through a pre-trained segmentation network and extracting a saliency map representing the vascular distribution by post-processing;
the convolutional neural network module, used for fusing the saliency map with the fundus color photograph as the input of the convolutional neural network, realizing targeted automatic extraction of the features in the fundus color photograph;
and the multi-layer perceptron module, used for fusing the image features with non-image features from clinical metadata through the multi-layer perceptron model for classification, realizing early screening for hypertension.
According to another preferred embodiment of the present application, the fundus color photograph data processing system based on the attention mechanism, as shown in FIG. 1, comprises an image preprocessing module, a blood vessel feature extraction module, a convolutional neural network module and a multi-layer perceptron module.
(1) Image preprocessing
The preprocessing of fundus color photographs in the application consists mainly of extraction of the region of interest and contrast enhancement of the fundus color photograph, as shown in FIG. 2.
The image preprocessing of the fundus color photograph image includes region-of-interest extraction and image cropping. A region of interest (ROI) is the portion of the fundus color photograph image containing the information to be processed, a subset of the image; the non-region of interest is the subset of the image that contains no required information. The non-region of interest should be removed as far as possible without affecting the region of interest, reducing the computational cost of image processing and making it easier to unify pictures of different specifications to a common size. The purpose of extracting the region of interest in the application is therefore to avoid wasting time and computing resources on an oversized non-region of interest, and to use the extraction result as a mask layer in subsequent data processing.
For a fundus image obtained by photographing with a fundus camera, i.e., a fundus color photograph image, the quasi-circular region at its center is the portion of interest, i.e., the region of interest.
The region of interest is typically extracted using segmentation and morphological processing methods from conventional image processing, such as manual thresholding, iterative thresholding, bimodal thresholding and Otsu's method, where Otsu's method is a segmentation method that automatically selects a threshold based on global information.
The method for extracting the region of interest adopted by the application mainly comprises the following steps:
step S1110, extracts red channel image data in the RBG color image thereof for the fundus color photograph image.
In step S1120, the image data of the red channel is segmented at a low threshold to obtain a background segmentation map.
Specifically, the Otsu algorithm is preferably adopted. Let u be the overall mean of the image data obtained in step S1110, B the variance between the region of interest and the non-region of interest, and t the candidate threshold. Let u_0(t) be the mean of the extracted region of interest and w_0(t) the ratio of the number of pixels in the region of interest to the total number of pixels of the original image; likewise, let u_1(t) be the mean of the extracted non-region of interest and w_1(t) the corresponding pixel ratio, so that w_0 + w_1 = 1.

The between-class variance B is then:

B = w_0(t) × (u_0(t) − u)² + w_1(t) × (u_1(t) − u)²

The threshold t is traversed with this formula, and the value t_s that maximizes the variance B is found by search, giving the final Otsu threshold; the gray-scale map is then binarized with t_s as the threshold to obtain the best segmentation.
The non-region of interest extracted in the above steps should in general be connected, and the extracted region of interest should in general be a quasi-circular area, so morphological processing can further be used to delete small, possibly misclassified regions of interest, and opening and closing operations can remove burrs so that the edges of the resulting binary mask are smoothed.
Step S1120 may therefore further include deleting small connected regions with morphological methods, removing regions in the foreground that may have been mistakenly classified as background, and finally applying dilation and erosion to smooth the field-of-view boundary and obtain a relatively complete non-region-of-interest image. The dilation and erosion operations preferably use a kernel of size 5×5.
Step S1130: extract the mask layer that separates the region-of-interest image from the non-region-of-interest image.
The circumscribed rectangle of the segmented quasi-circular region of interest is extracted and extended by 10 pixels on both the left and right sides of the region of interest to improve the fault tolerance of the extraction. Finally, binarization is performed by threshold cutoff, with the background area set to 0 and the region of interest set to 1, giving the final mask layer.
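For illustration, the following is a minimal sketch of steps S1110 to S1130, assuming OpenCV and NumPy; the function name, the reuse of one 5x5 kernel for both opening and closing, and the exact cropping logic are illustrative assumptions rather than the patent's reference implementation.

```python
import cv2
import numpy as np

def extract_roi_mask(fundus_bgr: np.ndarray) -> np.ndarray:
    # S1110: take the red channel (index 2 in OpenCV's BGR ordering)
    red = fundus_bgr[:, :, 2]

    # S1120: Otsu's method searches for the threshold t_s maximizing the
    # between-class variance B = w0(t)(u0(t)-u)^2 + w1(t)(u1(t)-u)^2
    _, binary = cv2.threshold(red, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Morphological opening/closing with a 5x5 kernel removes burrs and
    # small misclassified regions, smoothing the binary mask edges
    kernel = np.ones((5, 5), np.uint8)
    binary = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
    binary = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)

    # S1130: circumscribed rectangle of the quasi-circular foreground,
    # widened by 10 pixels on the left and right for fault tolerance
    cols = np.where(binary.max(axis=0) > 0)[0]
    x0 = max(int(cols[0]) - 10, 0)
    x1 = min(int(cols[-1]) + 10, binary.shape[1] - 1)
    mask = np.zeros_like(binary)
    mask[:, x0:x1 + 1] = binary[:, x0:x1 + 1]
    return (mask > 0).astype(np.uint8)  # background 0, region of interest 1
```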
The image preprocessing of the fundus color photograph image also includes image enhancement processing. Image enhancement is a family of image transformation algorithms that add random information to, or apply a series of transformation operations to, the elements of an original image, purposefully highlighting or restoring lost information in the region of interest, or task-relevant context in the original image, while suppressing the non-region of interest or interfering parts. The specific purposes of the image enhancement processing adopted by the application are: to expand the data set through image enhancement, obtaining image data with richer characteristics; and to highlight, through a feature enhancement algorithm, information such as the optic disc and the vascular veins that matters most to the downstream task.
The method for enhancing the fundus color photograph mainly comprises the following steps:
Step S1210: transform the fundus color photograph image to the LAB color space and split off the luminance (L) channel image data.
Step S1220: perform contrast enhancement on the luminance channel image data using the CLAHE algorithm.
Step S1230: recombine the contrast-enhanced luminance channel image data obtained in step S1220, convert from the LAB color space back to the RGB color space, and crop the image with the mask layer extracted in step S1130 of the preprocessing to obtain the preprocessed image.
After preprocessing by this method, the texture features in the fundus color photograph are markedly enhanced while the color features of the image are preserved to the greatest extent.
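A minimal sketch of steps S1210 to S1230 under the same assumptions; the CLAHE clip limit and tile grid values are illustrative, since the patent does not specify them.

```python
import cv2
import numpy as np

def enhance_fundus(fundus_bgr: np.ndarray, mask: np.ndarray) -> np.ndarray:
    # S1210: convert to the LAB color space, split off the luminance channel
    l, a, b = cv2.split(cv2.cvtColor(fundus_bgr, cv2.COLOR_BGR2LAB))

    # S1220: contrast-limited adaptive histogram equalization on L only,
    # so the color information in the A/B channels is preserved
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    l_eq = clahe.apply(l)

    # S1230: recombine, convert back, and crop with the S1130 mask (0/1)
    enhanced = cv2.cvtColor(cv2.merge((l_eq, a, b)), cv2.COLOR_LAB2BGR)
    return enhanced * mask[:, :, None]
```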
(2) Vascular feature extraction
The blood vessel feature extraction module is used to extract a retinal vessel segmentation map from the fundus color photograph and to convert it, by post-processing, into a saliency map that represents the distribution of the blood vessels, as shown in FIG. 3. The module has a network training stage and an application stage. In the training stage, following the idea of transfer learning, the model is pre-trained on a general public data set and then fine-tuned on a small private data set, avoiding the poor feature extraction that inconsistent data size specifications would otherwise cause when the model is applied to the subsequent screening task. In the application stage, the weights of the trained segmentation network selected in the previous stage are frozen, the threshold cutoff operation at the end of the network is removed, and a post-processing module is added to convert the refined vessel feature mapping information into a vascular distribution saliency map that represents the vessel distribution over a larger range. With this saliency map, targeted saliency weights can be assigned to the vessel-rich parts of the fundus color photograph, helping the downstream convolutional neural network pay more attention to those regions.
The application adopts U-Net and a series of its variants as the segmentation network, which is trained with refined fundus vessel annotations and then frozen so that it serves as a feature extractor. The application preferably adopts an improved multi-scale Res-UNet segmentation network as the backbone of this module: training gives the model the ability to produce a refined vessel feature map, and a pooling-and-upsampling post-processing step converts the refined feature mapping information into a saliency map over a larger range. In the multi-scale Res-UNet adopted by the application, residual short-circuit structures are added to the convolution modules of a traditional U-Net: the skip links in the residual units simplify network training while ensuring a better combination of shallow and deep information, and allow deep information at the required scales to be learned adaptively while preventing the model gradient from vanishing. In a traditional U-Net, the encoder and decoder are joined by convolutions with kernel size 3; to extract more multi-scale information, the bridging part is upgraded to an atrous spatial pyramid pooling (ASPP) module. The ASPP module convolves the input in parallel with atrous convolutions whose dilation rates and padding sizes are 6, 12 and 18 respectively, concatenates the resulting features along the channel dimension, and finally fuses the multi-channel features with a 1×1 convolution. Applying atrous convolutions with different dilation rates in parallel to the input feature mapping yields feature information at different receptive scales and improves the model's ability to capture multi-scale feature context, as sketched below.
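A minimal PyTorch sketch of such an ASPP bridge; the channel arguments and the BatchNorm/ReLU inside each branch are assumptions, since the description fixes only the dilation rates (6, 12, 18) and the 1x1 fusion.

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Parallel 3x3 atrous convolutions with dilation (and padding) 6, 12
    and 18, concatenated along the channel dimension and fused by a 1x1
    convolution, as described above."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for d in (6, 12, 18)
        ])
        # 1x1 convolution fuses the multi-scale features
        self.fuse = nn.Conv2d(3 * out_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.fuse(feats)
```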
The blood vessel feature extraction method of the application is shown in fig. 4, and mainly comprises the following steps:
step S2110, slicing the fundus image after preprocessing obtained by preprocessing;
step S2120, extracting blood vessel features in the slice images by adopting a segmentation network obtained by pre-training;
step S2130, recombining the extracted blood vessel characteristics to obtain a blood vessel characteristic diagram;
and step S2140, performing post-processing on the blood vessel feature map to obtain a blood vessel distribution significance map.
After the segmentation network has been trained, the feature layer output at the end of the network is post-processed: the vessel features are recombined into a vessel feature map and then subjected to pooling and upsampling operations. The pooling operation comprises three average pooling operations applied to the vessel feature map in sequence: kernel size 4x4 with stride 2, kernel size 8x8 with stride 8, and a final pooling of the image down to a size of 16x16. The upsampling operation stretches the resulting gray-scale map back to the size of the input image with a bilinear interpolation upsampling algorithm.
In the application, the average pooling operation yields the average pixel density within a sliding window, representing the vessel density in that window; bilinear interpolation is adopted as the upsampling method so that pixel transitions between different windows become gentle, giving a softer saliency map that represents the global distribution of the vessels, as sketched below.
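A minimal PyTorch sketch of this post-processing; the (N, 1, H, W) tensor layout and the reading of the schedule as two fixed average poolings followed by an adaptive pooling to 16x16 are assumptions based on the description above.

```python
import torch
import torch.nn.functional as F

def saliency_from_vessel_map(vessel_map: torch.Tensor) -> torch.Tensor:
    """Average pooling condenses the refined vessel probability map into
    local vessel-density estimates; bilinear upsampling stretches the
    16x16 result back to the input size as a soft saliency map."""
    h, w = vessel_map.shape[-2:]
    x = F.avg_pool2d(vessel_map, kernel_size=4, stride=2)
    x = F.avg_pool2d(x, kernel_size=8, stride=8)
    x = F.adaptive_avg_pool2d(x, output_size=(16, 16))
    # bilinear interpolation keeps transitions between windows gentle
    return F.interpolate(x, size=(h, w), mode="bilinear", align_corners=False)
```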
The vessel feature map contains refined feature mapping information that represents the vascular veins at the scale of the original image. After the post-processing above, this refined information is converted into a saliency map over a larger range, in which the main vessels and vessel crossings obtain higher saliency values. In the subsequent hypertension screening and verification, this feature map is used to activate the original fundus color photograph, providing saliency guidance for the depth feature extraction module: it supplies additional prior semantic knowledge for the downstream deep information extraction and helps the model focus on important parts such as the main vessels and vessel intersections.
(3) Automatic feature extraction using convolutional neural networks
The application adopts the convolutional neural network module to automatically extract the features related to hypertension in the fundus color photograph. A convolutional neural network is a type of feedforward neural network, generally composed of convolution layers, pooling layers and fully connected layers. Specifically, as shown in FIG. 5, the backbone of the convolutional neural network module used in the application extracts features automatically, mainly through the following steps:
step S3110, performing point-to-point activation multiplication of each color channel on the preprocessed fundus color photograph obtained in step S2130 by using the vascularity saliency map obtained in step S2140.
The application uses the following formula to realize the activation of the fundus image by the saliency map, i.e., the saliency map re-weights the preprocessed fundus color photograph to obtain the saliency-weighted activated image data I_SM as the input of the downstream screening task:

I_SM(k) = (SM + C) ⊙ I_k ⊙ Mask,  k = 0, 1, 2

where SM is the vascular distribution saliency map data obtained in step S2140; I_k is the preprocessed fundus color photograph image data obtained in step S2130; k is the channel index of the fundus color photograph image, with 0, 1 and 2 denoting the R, G and B channels respectively; ⊙ denotes the pixel-level product; C is a constant matrix that prevents information loss in the vessel-sparse parts of the fundus image; and Mask is the mask layer data obtained in step S1130.
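A minimal NumPy sketch of this activation; the reconstructed form of the formula and the scalar value standing in for the constant matrix C are assumptions drawn from the variable definitions above.

```python
import numpy as np

def saliency_activate(image: np.ndarray, sm: np.ndarray, mask: np.ndarray,
                      c: float = 0.5) -> np.ndarray:
    """image: (H, W, 3) RGB; sm and mask: (H, W); c stands in for the
    constant matrix C that keeps vessel-sparse regions from vanishing."""
    weight = (sm + c) * mask                   # elementwise (SM + C) * Mask
    out = np.empty_like(image, dtype=np.float32)
    for k in range(3):                         # per-channel pixel-level product
        out[:, :, k] = image[:, :, k] * weight
    return out
```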
Step S3120: pass the input data successively through three convolution layers with 3x3 kernels, then feed it into stacked ResNeSt modules for feature extraction;
as shown in fig. 6, the resnett module splits the input feature map (input sizes of h×w×c, H, W, and C are the height, width, and color values of each channel of the input data image) into K structurally uniform coefficient groups (carondinals) in the channel dimension, and further inputs each group of features into R branches (branches) in each coefficient group, thereby obtaining g=kr branches. The number of channels of the feature map in each branch is C k and/K/R, and convolutions (conv) with convolution kernel sizes of 1 multiplied by 1 and 3 multiplied by 3 are respectively applied to each branch, so that deep features under G different channel dimensions are obtained. In each group, R branches are output as Split Attention module (Split Attention) input, R branches are output in cascade along channel dimension and then are subjected to global pooling layer to obtain global expression mode, then feature map is compressed in channel dimension by two layers of full-connection layer (Dense) to obtain respective Attention factors of each sub-feature group, and R Attention factors respectively represent important processes of corresponding branches in channel dimensionAnd (3) after the degree is calculated by the r-Softmax function, calculating the attention weight of the channel dimension of each branch in the base group, and finally, respectively multiplying and summing the weight and the input branch output to obtain the output of the base group. The split attention mechanism can achieve the weighted tie that each output channel is a corresponding channel input on different branches, and the contribution of each channel is automatically learned through model training to form the channel attention mechanism similar to SENet. Finally, after the outputs of the K coefficient groups are added, the outputs return to the characteristic diagram with the same size as the original input through a convolution kernel of 1 multiplied by 1, and then the characteristic diagram is connected with the characteristic diagram which is input initially in a short circuit mode, so that a residual error module is formed.
That is, after the upstream-processed fundus color photograph image data is input into the convolutional neural network, it passes successively through three convolution layers with kernel size 3×3 (each accompanied by a BN layer and a ReLU activation function), and is then fed into consecutive ResNeSt modules to extract high-dimensional features; the features are then compressed through a global average pooling layer, and a fully connected layer outputs the dimension-reduced high-dimensional depth features as the input of the subsequent classification module.
Step S3130: compress the features through a global average pooling layer, and finally compress the depth feature dimension to 512x1 with a fully connected layer. The backbone assembly is sketched below.
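A minimal sketch assembling steps S3120 to S3130 around the SplitAttentionBlock sketch above; the stem width and the number of stacked blocks are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ScreeningBackbone(nn.Module):
    """Three 3x3 convolution layers (with BN and ReLU), stacked
    split-attention blocks, global average pooling and a fully
    connected layer reducing to a 512-dimensional feature."""
    def __init__(self, width: int = 64, blocks: int = 4):
        super().__init__()
        stem, in_ch = [], 3
        for _ in range(3):  # three successive 3x3 conv layers
            stem += [nn.Conv2d(in_ch, width, 3, padding=1, bias=False),
                     nn.BatchNorm2d(width), nn.ReLU(inplace=True)]
            in_ch = width
        self.stem = nn.Sequential(*stem)
        self.blocks = nn.Sequential(
            *[SplitAttentionBlock(width) for _ in range(blocks)])
        self.pool = nn.AdaptiveAvgPool2d(1)       # global average pooling
        self.fc = nn.Linear(width, 512)           # 512-d depth feature

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.blocks(self.stem(x))
        return self.fc(self.pool(f).flatten(1))   # (N, 512)
```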
According to clinical research, because the blood pressure of hypertensive patients remains high over long periods and is prone to large fluctuations, functional stenosis of the retinal arteries readily occurs and the vessels tend toward sclerosis; in fundus images this generally appears as uneven arterial vessel diameter and easily overlooked arteriovenous crossing compression signs, and long-term uncontrolled hypertension can also cause changes such as optic disc papilledema, hemorrhage and allergic leukoplakia. In the saliency map generated by the application, vessel-rich parts such as the optic disc and the main vessels are assigned higher activation values. After the saliency map is used to weight and activate the preprocessed fundus color photograph image data, the important vessel-rich regions in the fundus color photograph, such as the main vessels and the optic disc, obtain higher activation weights, and the arteriovenous crossing parts beyond the secondary bifurcation of the vessels are also assigned higher activation weights, guiding the downstream model to pay more attention to these important parts. From a clinical perspective, these areas with high activation weights are also the areas a professional doctor attends to most when reading fundus images. Transforming the finer, pixel-level segmented vessel feature map into a saliency map of the vascular distribution therefore matches the areas a specialist focuses on when observing the image, achieving the aim of using prior knowledge to "guide" the subsequent feature extraction module on where to "look".
During training, the convolutional neural network module uses back propagation to automatically extract deep feature information highly correlated with hypertension from the fundus color photograph. Activated through the saliency map, the module is guided to focus on the areas with larger gray values (higher brightness) in the saliency map, i.e., the areas where vessels are densest, especially structures such as the optic disc, the main vessels, and the bifurcations and crossings of vessels together with their distribution areas; with this prior information as guidance, the convolutional neural network model concentrates its learning on the regions correlated with hypertension.
(4) Multi-layer perceptron module
The application further adopts a multi-layer perceptron module to fuse the image features with non-image features for classification. A multi-layer perceptron (MLP) is a neural network model based on the perceptron learning algorithm which, in addition to the input and output layers, may contain multiple hidden layers in between.
The structure of the MLP used in the present application is shown in FIG. 7. The MLP includes three hidden layers. The first hidden layer a contains 64 nodes in total and takes the preprocessed metadata containing clinical data information as input, where the metadata include data types such as age, gender, height, weight, heart rate and BMI index. The feature map obtained in step S3130, compressed to 512 dimensions by the global average pooling layer and the fully connected layer after the network processing, is taken as the second part of the MLP input. After the two input parts are concatenated, they pass successively through hidden layer b with 512 nodes and hidden layer c with 128 nodes, each activated with ReLU to realize a nonlinear affine transformation. The MLP output layer contains two nodes and gives the final classification result, as sketched below.
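A minimal PyTorch sketch of this fusion head; the metadata dimension of 6 and the concatenation of hidden layer a's output with the 512-dimensional image feature are assumptions consistent with the description above.

```python
import torch
import torch.nn as nn

class FusionMLP(nn.Module):
    """Hidden layer a (64 nodes) embeds the clinical metadata; its output
    is concatenated with the 512-d image feature, then hidden layers b
    (512 nodes) and c (128 nodes) with ReLU lead to a two-node output."""
    def __init__(self, meta_dim: int = 6):
        super().__init__()
        self.hidden_a = nn.Sequential(nn.Linear(meta_dim, 64), nn.ReLU())
        self.hidden_b = nn.Sequential(nn.Linear(64 + 512, 512), nn.ReLU())
        self.hidden_c = nn.Sequential(nn.Linear(512, 128), nn.ReLU())
        self.out = nn.Linear(128, 2)  # two nodes for the two classes

    def forward(self, meta: torch.Tensor, img_feat: torch.Tensor):
        m = self.hidden_a(meta)                    # (N, 64)
        x = torch.cat([m, img_feat], dim=1)        # fuse with (N, 512)
        logits = self.out(self.hidden_c(self.hidden_b(x)))
        return torch.softmax(logits, dim=1)        # prediction probabilities
```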
The MLP is usually trained with a back propagation algorithm under supervised learning, so the MLP and the upstream convolutional neural network module can be combined and trained end-to-end through gradient descent optimization. In addition, the activation function used in the MLP has the character of a nonlinear affine transformation and can fuse the high-dimensional features output by the convolutional neural network with the non-image features obtained from clinical metadata; the combination of the different features is realized through continuous gradient updates during supervised learning. After the features of the two nodes finally output by the MLP module are passed through a Softmax activation, the prediction probabilities of the different classes are obtained, the larger of which gives the final classification result of the model.
Test results:
To verify the method and model of the application, the THCS data set used for testing was compiled from fundus images of hypertensive patients from hospitals together with the relevant demographic information. It comprises 1507 fundus images from 822 confirmed hypertensive patients in northern China, 2540 fundus images from 1340 non-hypertensive subjects, and clinical metadata such as the subjects' age, gender, height, ethnicity, weight and heart rate. The fundus images in the data set are 2464 x 2248 high-definition fundus color photographs, taken with a Canon CR-2 fundus camera at a 45-degree field of view. Community physical examination data provided by the hospital were also used as an additional supplementary data set; this extended data set covers a community population in the Beijing area and comprises fundus images together with a series of clinical metadata such as blood pressure, blood sugar, serum creatinine and self-reported medical history, including 1344 fundus images from 699 suspected hypertension-positive subjects.
Experimental results show that on the THCS data set the average AUC is 0.870, the average accuracy 0.805, the average precision 0.830 and the average recall 0.761. Compared with a method using only a convolutional neural network, the average AUC, average accuracy, average precision and average recall are improved by 6.8%, 4.2%, 14.9% and 10.9% respectively, a marked improvement in the early screening of hypertension.
To achieve the above object, there is also provided according to the present application a computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.
The processor may be a central processing unit (Central Processing Unit, CPU). The processor may also be any other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof.
The memory, as a non-transitory computer-readable storage medium, is used to store non-transitory software programs, non-transitory computer-executable programs and units, such as the program units corresponding to the method embodiments of the application described above. By running the non-transitory software programs, instructions and modules stored in the memory, the processor executes its various functional applications and data processing, i.e., implements the methods of the method embodiments described above.
The memory may include a memory program area and a memory data area, wherein the memory program area may store an operating system, at least one application program required for a function; the storage data area may store data created by the processor, etc. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory may optionally include memory located remotely from the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more units are stored in the memory and, when executed by the processor, perform the methods in the above embodiments.
The details of the computer device may be correspondingly understood by referring to the corresponding relevant descriptions and effects in the above embodiments, and will not be repeated here.
To achieve the above object, according to another aspect of the present application, there is also provided a computer-readable storage medium storing a computer program which, when executed by a computer processor, implements the steps of the above method. It will be appreciated by those skilled in the art that all or part of the above-described embodiment methods may be implemented by a computer program instructing related hardware; the program may be stored in a computer-readable storage medium and, when executed, may include the methods of the above embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk (HDD) or a solid state drive (SSD); the storage medium may also comprise a combination of the above kinds of memories.
It will be apparent to those skilled in the art that the modules or steps of the application described above may be implemented on a general-purpose computing device; they may be concentrated on a single computing device or distributed across a network of computing devices, and they may alternatively be implemented in program code executable by computing devices, so that they may be stored in a storage device and executed by the computing devices, or fabricated separately as individual integrated circuit modules, or multiple modules or steps among them may be fabricated as a single integrated circuit module. Thus, the present application is not limited to any specific combination of hardware and software.
The above description is only of the preferred embodiments of the present application and is not intended to limit the present application, but various modifications and variations can be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the concept and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. A fundus color photograph processing device based on an attention mechanism, characterized in that: the device comprises an image preprocessing module, a blood vessel feature extraction module and a convolutional neural network module; wherein,
the image preprocessing module comprises a mask layer extraction module and an image enhancement module;
the mask layer extraction module is used for extracting a mask layer, which divides the fundus color photograph into a region-of-interest image and a non-region-of-interest image;
the image enhancement module is used for enhancing the contrast of the fundus color photograph;
the image preprocessing module is used for cropping the fundus color photograph contrast-enhanced by the image enhancement module using the mask layer extracted by the mask layer extraction module, so as to obtain a preprocessed fundus color photograph image;
the blood vessel feature extraction module is used for extracting a vascular distribution saliency map from the preprocessed fundus color photograph image;
the convolutional neural network module is used for automatic feature extraction; and the vascular distribution saliency map and the preprocessed fundus color photograph image are multiplied point-to-point on each color channel (activation) and then used as the input of the convolutional neural network module;
and the multi-layer perceptron module is used for fusing the image features with non-image features from clinical metadata through the multi-layer perceptron model for classification.
2. The fundus color photograph processing device based on an attention mechanism as defined in claim 1, wherein: the features automatically extracted by the convolutional neural network module are the image data features related to hypertension in the fundus color photograph.
3. The fundus color photograph processing device based on an attention mechanism as defined in claim 2, wherein: the point-to-point activation multiplication of each color channel between the vascular distribution saliency map and the preprocessed fundus color photograph image specifically adopts the formula:

I_SM(k) = (SM + C) ⊙ I_k ⊙ Mask,  k = 0, 1, 2

where SM is the data of the vascular distribution saliency map; I_k is the data of channel k of the preprocessed fundus color photograph image; k is the channel index of the fundus color photograph image, with 0, 1 and 2 denoting the R, G and B channels respectively; ⊙ denotes the pixel-level product; C is a constant matrix; Mask is the data of the mask layer; and I_SM is the saliency-weighted activated image data.
4. The fundus color photograph processing device based on an attention mechanism as defined in claim 3, wherein: the convolutional neural network module takes the saliency-weighted activated image data I_SM as input, passes it through convolution layers, and feeds the result into stacked ResNeSt modules for feature extraction.
5. The fundus color photograph processing device based on an attention mechanism as defined in claim 4, wherein: the ResNeSt module splits input data of size H×W×C into K structurally identical cardinal groups along the channel dimension; within each cardinal group, each group of features is further fed into R branches, yielding G = K·R branches; convolutions with kernel sizes 1×1 and 3×3 are applied in each branch, giving deep features in G different channel dimensions; within each cardinal group, the outputs of the R branches serve as the input of a split-attention module: the R branch outputs are concatenated along the channel dimension and passed through a global pooling layer to obtain a global representation, two fully connected layers then compress along the channel dimension to obtain an attention factor for each branch feature group, the R attention factors representing the importance of the corresponding branches in the channel dimension; after an r-Softmax computation, the channel-wise attention weight of each branch within the cardinal group is obtained, and finally the weights are multiplied with the corresponding branch outputs and summed to produce the output of the cardinal group;
wherein H, W and C are the height, width and per-channel color values of the input data image, and K and R are positive integers greater than 2.
6. The fundus color photograph processing device based on an attention mechanism as defined in claim 5, wherein: the mask layer extraction module extracts the image data of the red channel in the RGB color image of the fundus color photograph; the image data of the red channel is segmented at a low threshold to obtain a background segmentation map, which is divided into a region of interest and a non-region of interest; and the mask layer that separates the region-of-interest image from the non-region-of-interest image is extracted from the background segmentation map.
7. The fundus color photograph processing device based on an attention mechanism as defined in claim 6, wherein: the region-of-interest image is a quasi-circular region, and the non-region of interest is connected.
8. The fundus color photograph processing device based on an attention mechanism as defined in claim 7, wherein: the image enhancement module transforms the fundus color photograph into the LAB color space and splits off the image data of the luminance channel; contrast enhancement is applied to the luminance channel image data using the CLAHE algorithm; and the resulting contrast-enhanced luminance channel image data is recombined and converted from the LAB color space back to the RGB color space.
9. The fundus color photograph processing device based on an attention mechanism as claimed in any one of claims 1 to 8, wherein: the multi-layer perceptron module comprises three hidden layers connected in sequence, the first hidden layer having two input parts; the first input takes preprocessed metadata, including one or more of age, gender, height, weight, heart rate and BMI index; the second input takes the 512-dimensional feature map output by the convolutional neural network module.
10. A fundus color photograph processing system for early screening of hypertension, comprising at least one processor and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to function as the attention-mechanism-based fundus color photograph processing device of any one of claims 1-9.
CN202210542445.7A 2022-05-17 2022-05-17 Fundus color photograph processing device based on attention mechanism Pending CN117152042A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210542445.7A CN117152042A (en) 2022-05-17 2022-05-17 Fundus color photograph processing device based on attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210542445.7A CN117152042A (en) 2022-05-17 2022-05-17 Fundus color photograph processing device based on attention mechanism

Publications (1)

Publication Number Publication Date
CN117152042A true CN117152042A (en) 2023-12-01

Family

ID=88901339

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210542445.7A Pending CN117152042A (en) 2022-05-17 2022-05-17 Fundus color photograph processing device based on attention mechanism

Country Status (1)

Country Link
CN (1) CN117152042A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117788473A (en) * 2024-02-27 2024-03-29 北京大学第一医院(北京大学第一临床医学院) Method, system and equipment for predicting blood pressure based on binocular fusion network
CN117788473B (en) * 2024-02-27 2024-05-14 北京大学第一医院(北京大学第一临床医学院) Method, system and equipment for predicting blood pressure based on binocular fusion network

Similar Documents

Publication Publication Date Title
Li et al. Applications of deep learning in fundus images: A review
CN111656357B (en) Modeling method, device and system for ophthalmic disease classification model
US20200250497A1 (en) Image classification method, server, and computer-readable storage medium
Canayaz Classification of diabetic retinopathy with feature selection over deep features using nature-inspired wrapper methods
WO2019085793A1 (en) Image classification method, computer device and computer readable storage medium
Xie et al. Cross-attention multi-branch network for fundus diseases classification using SLO images
CN113012163A (en) Retina blood vessel segmentation method, equipment and storage medium based on multi-scale attention network
CN115908241A (en) Retinal vessel segmentation method based on fusion of UNet and Transformer
CN110610480B (en) MCASPP neural network eyeground image optic cup optic disc segmentation model based on Attention mechanism
Zhuang et al. Classification of diabetic retinopathy via fundus photography: Utilization of deep learning approaches to speed up disease detection
CN117152042A (en) Fundus color photograph processing device based on attention mechanism
Arjmand et al. Transfer learning versus custom CNN architectures in NAFLD biopsy images
Phridviraj et al. A bi-directional Long Short-Term Memory-based Diabetic Retinopathy detection model using retinal fundus images
Qin et al. A review of retinal vessel segmentation for fundus image analysis
CN117038088B (en) Method, device, equipment and medium for determining onset of diabetic retinopathy
Fu et al. Automatic grading of Diabetic macular edema based on end-to-end network
Tariq et al. Diabetic retinopathy detection using transfer and reinforcement learning with effective image preprocessing and data augmentation techniques
Li et al. Retinal vessel segmentation network based on patch-GAN
Jayachandran et al. Retinal vessels segmentation of colour fundus images using two stages cascades convolutional neural networks
De Silva et al. A thickness sensitive vessel extraction framework for retinal and conjunctival vascular tortuosity analysis
Yang et al. Adaptive enhancement of cataractous retinal images for contrast standardization
Escorcia-Gutierrez et al. Grading Diabetic Retinopathy Using Transfer Learning-Based Convolutional Neural Networks
Madhura Prakash et al. A Systematic Study of Deep Learning Architectures for Analysis of Glaucoma and Hypertensive Retinopathy
Yang Human-in-the-loop for efficient training of retinal image analysis methods
Ali et al. Classifying Three Stages of Cataract Disease using CNN

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination