CN114627091A - Retinal age identification method and device - Google Patents
- Publication number
- CN114627091A (application number CN202210287825.0A)
- Authority
- CN
- China
- Prior art keywords
- model
- image
- training
- attention
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T7/0012 — Image analysis; inspection of images; biomedical image inspection
- G06N3/045 — Neural networks; architecture; combinations of networks
- G06N3/047 — Neural networks; architecture; probabilistic or stochastic networks
- G06N3/08 — Neural networks; learning methods
- G06T5/70 — Image enhancement or restoration; denoising, smoothing
- G06T5/90 — Image enhancement or restoration; dynamic range modification of images or parts thereof
- G06T2207/10101 — Image acquisition modality; optical tomography, optical coherence tomography [OCT]
- G06T2207/20081 — Special algorithmic details; training, learning
- G06T2207/20084 — Special algorithmic details; artificial neural networks [ANN]
- G06T2207/30041 — Subject of image; biomedical image processing; eye, retina, ophthalmic
Abstract
The application relates to a retinal age identification method and device, wherein the method comprises the following steps: building a convolutional neural network model and adding an attention-mechanism module to it; preprocessing the image to be recognized to obtain a standard image; inputting the standard image into the built model for training; predicting retinal age through the trained model; and visualizing the attention mechanism to obtain the region the model attends to most. Because the scheme uses a neural network model to identify the retinal age of the examined person, it can process big data, overcoming the drawbacks of traditional methods, whose steps are complex and which cannot handle big data; the network extracts features automatically, so no manual feature selection is needed; an attention mechanism added to the network improves the accuracy of retinal age prediction; finally, visualizing the result shows which retinal regions the convolutional neural network attends to most, which assists manual identification by medical workers.
Description
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a retina age identification method and device.
Background
Fundus imaging technologies currently available to assist examination include color fundus photography, fluorescein fundus angiography (FFA), indocyanine green angiography (ICGA), and optical coherence tomography (OCT). Unlike the first three, OCT is a non-invasive tomographic technique with a signal-acquisition speed high enough for real-time image display.
Ophthalmic medicine is a highly instrument-dependent discipline, relying on imaging technology in particular. Because the eye is optically transparent and OCT is non-contact and non-invasive, uses weak coherent light, emits no electrical radiation, and causes no damage, it can acquire high-resolution cross-sectional images of the fundus retina, accurately displaying the microstructure of each retinal layer, which is important for diagnosing and guiding the treatment of retinopathy. In the field of fundus retinal diagnosis, OCT has become the mainstream clinical and research technology: it visualizes retinal morphology and layers, is commonly used to examine retinal diseases such as age-related macular degeneration (AMD) and diabetic macular edema (DME), and can also support intelligent identification of the degree of retinal aging.
Deep learning has succeeded in many areas over the last decade. In the image field especially, with the rapid development of deep learning, artificial intelligence has surpassed human performance on visual tasks and large-scale image recognition, and the combination of artificial intelligence and medicine has quickly become an industry consensus. Compared with traditional processing techniques, artificial-intelligence algorithms represented by deep learning have achieved breakthrough progress and changed the direction of the medical field.
Research shows that traditional digital image processing methods involve complex steps and cannot handle medical-image big data well; judging by the final results of retinal OCT image recognition, the performance of conventional digital image processing algorithms still needs improvement.
Disclosure of Invention
To overcome, at least to some extent, the problems in the related art, the present application provides a retinal age identification method and apparatus.
According to a first aspect of embodiments of the present application, there is provided a retinal age identification method including:
building a convolutional neural network model, and adding an attention mechanism module into the model;
preprocessing an image to be recognized to obtain a standard image;
inputting the standard image into a built model for training;
predicting the age of the retina through the trained model;
and visualizing the attention mechanism to obtain the region with the highest attention of the model.
Further, the preprocessing the image to be recognized includes:
carrying out brightness transformation and denoising on the image to be recognized, and then cropping the processed image to a fixed size;
wherein the image to be identified is an OCT image.
Further, the brightness transformation step includes: increasing the brightness of the image;
and the denoising step includes: denoising with a three-dimensional block-matching algorithm.
Further, the building of the convolutional neural network model includes:
adopting a ResNet18 network as a model;
the convolution attention module is added to the last convolutional layer of the ResNet18 network.
Further, the inputting the standard image into the built model for training includes:
the hyper-parameters are set to: the size of each batch of samples is 32;
when the number of training epochs is 1 to 135, the learning rate is 0.1;
when the number of training epochs is 136 to 185, the learning rate is …;
when the number of training epochs is 186 or more, the learning rate is ….
Further, the inputting the standard image into the built model for training includes:
taking the feature map F ∈ R^(C×H×W) in the convolutional layer as input;
deriving from F a one-dimensional channel attention map M_C ∈ R^(C×1×1) and a two-dimensional spatial attention map M_S ∈ R^(1×H×W).
Further, the deriving a one-dimensional channel attention map and a two-dimensional spatial attention map from F includes:
according to a second aspect of embodiments of the present application, there is provided a retinal age identification device including:
the model building module is used for building a convolutional neural network model and adding an attention mechanism module into the model;
the preprocessing module is used for preprocessing the image to be recognized to obtain a standard image;
the training module is used for inputting the standard image into a built model for training;
the prediction module is used for predicting the age of the retina through the trained model;
and the visualization module is used for visualizing the attention mechanism and obtaining the region with the highest attention degree of the model.
According to a third aspect of embodiments of the present application, there is provided a computer apparatus comprising: a memory for storing a computer program; a processor for executing the computer program in the memory to implement the operational steps of the method according to any of the above embodiments.
According to a fourth aspect of embodiments of the present application, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the operational steps of the method according to any one of the above embodiments.
The technical scheme provided by the embodiment of the application has the following beneficial effects:
the scheme of the application adopts the neural network model to identify the retinal age of the detected person, can process big data, and overcomes the defects that the traditional method has complex steps and can not process big data; moreover, the network can automatically extract the characteristics without manually selecting the characteristics; an attention mechanism is added into the network, so that the certainty rate of omentum age prediction is improved; finally, the result is visualized, and the convolutional neural network can observe which part of retina area is more concerned, thereby promoting the manual identification of medical workers.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
Fig. 1 is a flow chart illustrating a retinal age identification method according to an exemplary embodiment.
Fig. 2 is a diagram illustrating a ResNet18 architecture, according to an exemplary embodiment.
FIG. 3 is a schematic diagram illustrating an overview of a CBAM in accordance with an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of methods and apparatus consistent with certain aspects of the present application, as detailed in the appended claims.
Fig. 1 is a flow chart illustrating a retinal age identification method according to an exemplary embodiment. The method may comprise the steps of:
Step S1: building a convolutional neural network model, and adding an attention-mechanism module into the model;
Step S2: preprocessing the image to be recognized to obtain a standard image;
Step S3: inputting the standard image into the built model for training;
Step S4: predicting the retinal age through the trained model;
Step S5: visualizing the attention mechanism to obtain the region with the highest model attention.
According to this scheme, a neural network model identifies the retinal age of the examined person, so big data can be processed, overcoming the drawbacks of traditional methods, whose steps are complex and which cannot handle big data; moreover, the network extracts features automatically, without manual feature selection; an attention mechanism added to the network improves the accuracy of retinal age prediction; finally, the result is visualized, showing which retinal regions the convolutional neural network attends to most, thereby assisting manual identification by medical workers.
It should be understood that although the steps in the flowchart of Fig. 1 are shown in the order indicated by the arrows, they are not necessarily performed in that order; unless explicitly stated otherwise, there is no strict ordering restriction, and the steps may be performed in other orders. Moreover, at least some of the steps in Fig. 1 may comprise multiple sub-steps or stages that need not be completed at the same time; they may be executed at different times and need not be executed sequentially, but may be performed in turn or alternately with other steps or with sub-steps or stages of other steps.
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
1. In step S2, the image to be recognized is an OCT image. The OCT image is first preprocessed; the whole preprocessing pipeline comprises: performing contrast optimization (brightness transformation) and denoising on the image to be recognized, then cropping the processed image to a fixed size. The brightness-transformation step increases the brightness of the image; the denoising step uses a three-dimensional block-matching algorithm.
The contrast of the image is improved because the inventors found in practice that, in OCT images of ocular structures, the feature regions have higher contrast and are brighter than the irrelevant regions; raising the brightness therefore effectively increases the contrast of the feature regions. The BM3D image-denoising method is then used to reduce the influence of granular noise on the image. The three-dimensional block-matching algorithm (Block Matching 3D, abbreviated BM3D) is an effective image-denoising algorithm: it matches neighboring image patches, stacks similar patches into a three-dimensional array, filters in the three-dimensional domain, and inverse-transforms the fused result back to two dimensions to form the denoised image. Its denoising effect is pronounced and it attains a high peak signal-to-noise ratio.
This image preprocessing effectively brings out the age-related feature regions in the image. Because the gray values of pixels in the feature regions differ greatly from those in non-feature regions, all image data can be cropped, guided by the pixel gray values, into images of size 224 × 224, preserving the feature region as much as possible to facilitate neural network training.
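As a concrete illustration, the preprocessing pipeline above (brightness transformation, denoising, fixed-size cropping) might be sketched as follows. The gain factor, the 3×3 box filter standing in for BM3D, and the centre-crop strategy are illustrative assumptions, not details taken from the patent; a faithful implementation would call a dedicated BM3D library (e.g. the `bm3d` Python package).

```python
import numpy as np

def preprocess_oct(image: np.ndarray, gain: float = 1.3, size: int = 224) -> np.ndarray:
    """Sketch of the preprocessing: brighten, denoise, crop to a fixed size.

    `gain` and the box-filter denoiser are illustrative stand-ins; the
    patent uses BM3D, which requires a dedicated library.
    """
    img = image.astype(np.float32)

    # 1. Brightness transformation: raise intensity, clip to the valid range.
    img = np.clip(img * gain, 0.0, 255.0)

    # 2. Denoising placeholder: 3x3 box filter instead of BM3D.
    padded = np.pad(img, 1, mode="edge")
    img = sum(padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
              for dy in range(3) for dx in range(3)) / 9.0

    # 3. Crop to size x size around the image centre (pad first if too small).
    pad_h = max(0, size - img.shape[0])
    pad_w = max(0, size - img.shape[1])
    if pad_h or pad_w:
        img = np.pad(img, ((0, pad_h), (0, pad_w)), mode="edge")
    top = (img.shape[0] - size) // 2
    left = (img.shape[1] - size) // 2
    return img[top:top + size, left:left + size]
```

A uniform 300×400 image with value 100 comes out as a 224×224 image with value 130 under the default gain, since the box filter leaves a constant image unchanged.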
2. A deep-learning convolutional neural network model is built; the model adopts ResNet18.
In some embodiments, step S1 builds a convolutional neural network model, including: adopting a ResNet18 network as a model; the convolution attention module is added to the last convolutional layer of the ResNet18 network.
Fig. 2 shows the network structure of ResNet18. ResNet18 consists of 17 convolutional layers and one fully connected layer. The preprocessed image data are input to the first convolutional layer, whose kernel parameters are as shown in the figure (size 7 × 7, stride 2), then down-sampled through MaxPool (a maximum pooling layer), and then passed through 8 BasicBlocks (basic units). The BasicBlock is the basic unit of ResNet and comes in two types: the first, with the solid-line skip connection, is the residual structure for the case where the number of channels is unchanged; in the second, with the dotted-line skip connection, the number of channels changes, so it is made a separate BasicBlock type. The kernel parameters of each convolutional layer are shown in the figure; the features then pass through AvgPool (an average pooling layer) and are finally classified and output by the fully connected layer using the softmax function.
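The two BasicBlock variants described above can be sketched in PyTorch (assumed here as the framework; the patent does not name one):

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """Sketch of the ResNet BasicBlock described above.

    When `stride == 1` and the channel counts match, the solid-line
    identity shortcut is used; otherwise a dotted-line 1x1 projection
    shortcut matches the output shape, as in the standard ResNet18 design.
    """
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        if stride != 1 or in_ch != out_ch:
            # Dotted-line shortcut: project the input to the new shape.
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch))
        else:
            self.shortcut = nn.Identity()  # solid-line identity shortcut

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))
        return self.relu(out + self.shortcut(x))
```

`BasicBlock(64, 64)` preserves the input shape via the identity shortcut, while `BasicBlock(64, 128, stride=2)` halves the spatial size and doubles the channels via the projection shortcut.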
It should be noted that the deep-learning convolutional neural network model may also be VGG, DenseNet, etc., and during network optimization Grad-CAM (Gradient-weighted Class Activation Mapping) may be added for visualization.
3. An attention-mechanism module is added to the model: a CBAM (Convolutional Block Attention Module) is inserted at the last convolutional layer so that pre-trained parameters can still be used. As shown in Fig. 3, CBAM takes the feature map F ∈ R^(C×H×W) of the convolutional layer as input. Since each channel of F is regarded as a feature detector, channel attention focuses on what the informative parts of the input image are, and a one-dimensional channel attention map M_C ∈ R^(C×1×1) is derived from this information. A two-dimensional spatial attention map M_S ∈ R^(1×H×W) is then generated from the spatial relationships of the features; unlike channel attention, spatial attention focuses on where the informative parts are, which is complementary to channel attention. The whole attention process can be summarized as:

F′ = M_C(F) ⊗ F
F″ = M_S(F′) ⊗ F′

where ⊗ denotes element-wise multiplication. During the multiplication the attention values are broadcast: channel attention values are broadcast along the spatial dimensions, and vice versa. F″ is the refined feature obtained after the attention module.
4. The preprocessed images are input into the built model for training, and finally the retinal age is predicted. Step S3, inputting the standard image into the built model for training, includes setting the hyper-parameters as follows: the size of each batch of samples is 32; for epochs 1 to 135 the learning rate is 0.1; for epochs 136 to 185 the learning rate is …; for epochs 186 and beyond the learning rate is ….
In machine learning, a hyper-parameter is a parameter whose value is set before the learning process starts, rather than parameter data obtained through training. In general the hyper-parameters need to be optimized: a group of optimal hyper-parameters is selected for the learner so as to improve its learning performance and effect. In this embodiment the hyper-parameters are set to BATCH_SIZE = 32, with a learning rate of 0.1 for epochs [1–135], … for epochs [136–185], and … for epochs from 186 on.
Batch (batch of samples): the entire training set is divided into several batches. Batch_Size: the number of samples in each batch. Epoch: when the complete data set has passed through the neural network once and returned once, that process is called one epoch; briefly, an epoch is one pass of training over all the training samples.
The learning rate is one of the quantities governing how the input weights of a neural network are adjusted. If the network predicts correctly, the corresponding weights are not changed; otherwise the weights are adjusted according to a loss function, and the magnitude of the adjustment is scaled by the learning rate, i.e. each update is applied in proportion to it.
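The stepwise schedule can be written as a small function. Only the first value (0.1 for epochs 1–135) is explicit in the source text; the later steps are truncated there, so the conventional divide-by-ten values 0.01 and 0.001 are assumed below:

```python
def learning_rate(epoch: int) -> float:
    """Stepwise learning-rate schedule from the training description.

    Only the first value (0.1 for epochs 1-135) is explicit in the source;
    the later steps assume the usual divide-by-ten ResNet convention.
    """
    if epoch <= 135:
        return 0.1
    if epoch <= 185:
        return 0.01   # assumed value (truncated in the source)
    return 0.001      # assumed value (truncated in the source)

BATCH_SIZE = 32  # each batch contains 32 samples
```

An optimizer would query this function once per epoch and apply the returned rate to every weight update in that epoch.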
5. Through iterative training, the retinal age of the examined person is predicted; the attention mechanism is visualized and displayed as a heat map (e.g. rendered with a plotting library such as matplotlib or seaborn), so the most-attended areas within the feature region can be observed directly.
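The Grad-CAM-style heat map mentioned above can be sketched without a deep-learning framework: given the activations of the last convolutional layer and the gradients of the age score with respect to them, the map is a ReLU of the gradient-weighted channel sum. The array shapes and inputs below are illustrative:

```python
import numpy as np

def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Gradient-weighted class activation map (Grad-CAM) sketch.

    activations: (C, H, W) feature maps of the last convolutional layer.
    gradients:   (C, H, W) gradients of the age prediction w.r.t. them.
    Returns an (H, W) heat map normalised to [0, 1].
    """
    # Channel weights: global-average-pool the gradients.
    weights = gradients.mean(axis=(1, 2))                  # (C,)
    cam = np.tensordot(weights, activations, axes=(0, 0))  # (H, W)
    cam = np.maximum(cam, 0.0)                             # ReLU keeps positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                              # normalise for display
    return cam
```

The resulting (H, W) array can be upsampled to the 224 × 224 input size and overlaid on the OCT image to show the region with the highest model attention.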
The invention provides a deep-learning convolutional neural network model that can be trained on OCT images and is suitable for predicting the retinal age of the examined person from an OCT image. OCT images serve as the data set and are divided into a training set and a test set. Image preprocessing removes irrelevant information and reduces noise; the images are then input into the deep-learning convolutional neural network for training, and an attention mechanism added to the network improves the accuracy of retinal age prediction. Finally, by visualizing the result, one can observe which retinal regions the convolutional neural network attends to most, thereby assisting doctors' manual identification.
The technical scheme of the invention has the following features: 1. The ResNet18 model replaces traditional machine-learning methods such as random forests and SVMs, so the model can process big data; no manual feature selection is needed, and the network extracts features automatically. 2. An attention mechanism is added to the model, improving its accuracy in identifying retinal age; visualizing the attention mechanism shows the regions the network attends to, which assists doctors' manual discrimination. 3. Traditional statistical-analysis methods predict poorly, and machine-learning image-processing pipelines involve complex steps and cannot handle big data well; this model overcomes those drawbacks and improves prediction accuracy.
Embodiments of the present application also provide a retinal age identification apparatus including: the model building device comprises a model building module, a preprocessing module, a training module, a prediction module and a visualization module.
The model building module is used for building a convolutional neural network model, and an attention mechanism module is added into the model. The preprocessing module is used for preprocessing the image to be recognized to obtain a standard image. And the training module is used for inputting the standard image into the built model for training. And the prediction module is used for predicting the age of the retina through the trained model. The visualization module is used for visualizing the attention mechanism and obtaining the region with the highest attention degree of the model.
With regard to the apparatus in the above embodiment, the specific operations performed by each module have been described in detail in the method embodiment and are not repeated here. The modules in the retinal age identification device may be implemented wholly or partly in software, hardware, or a combination thereof. Each module may be embedded in hardware in, or independent of, a processor in the computer device, or stored in software in the memory of the computer device, so that the processor can invoke and execute the operations corresponding to each module.
In some embodiments, a computer device is also provided, comprising a processor and a memory connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium, in which an operating system and a computer program are stored, and an internal memory, which provides an environment for running the operating system and the computer program in the non-volatile storage medium. When executed by the processor, the computer program implements the retinal age identification method: building a convolutional neural network model, and adding an attention-mechanism module into the model; preprocessing the image to be recognized to obtain a standard image; inputting the standard image into the built model for training; predicting the retinal age through the trained model; and visualizing the attention mechanism to obtain the region with the highest model attention.
In some embodiments, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a retinal age identification method: building a convolutional neural network model, and adding an attention mechanism module into the model; preprocessing an image to be recognized to obtain a standard image; inputting the standard image into the built model for training; predicting the age of the retina through the trained model; and visualizing the attention mechanism to obtain the region to which the model pays the most attention.
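A minimal sketch of the last step of the method, locating the region with the highest attention, assuming the attention map is already available as a 2-D array. The function name, window size, and brute-force search are illustrative choices, not taken from the patent:

```python
import numpy as np

def highest_attention_region(att_map, region=32):
    """Return the top-left corner of the region x region window with the
    largest summed attention, i.e. the region the model attends to most."""
    h, w = att_map.shape
    best, best_pos = -np.inf, (0, 0)
    for top in range(0, h - region + 1):
        for left in range(0, w - region + 1):
            s = att_map[top:top + region, left:left + region].sum()
            if s > best:
                best, best_pos = s, (top, left)
    return best_pos

att = np.zeros((64, 64))
att[40:56, 8:24] = 1.0          # a synthetic attention hot spot
print(highest_attention_region(att, region=16))   # → (40, 8)
```

A real system would likely overlay this map on the OCT image as a heatmap; only the argmax-region logic is sketched here.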
It is understood that the same or similar parts in the above embodiments may be mutually referred to, and the same or similar parts in other embodiments may be referred to for the content which is not described in detail in some embodiments.
It should be noted that, in the description of the present application, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present application, "a plurality" means at least two unless otherwise specified.
Any process or method description in a flow chart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present application also includes implementations in which functions are executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those skilled in the art.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art may be used: discrete logic circuits with logic gates for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and the like.
It will be understood by those skilled in the art that all or part of the steps of the above method embodiments may be implemented by a program instructing related hardware; the program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units may be integrated into one module. The integrated module can be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and sold or used as a stand-alone product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description of the present specification, reference to the description of "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.
Claims (10)
1. A retinal age identification method, comprising:
building a convolutional neural network model, and adding an attention mechanism module into the model;
preprocessing an image to be recognized to obtain a standard image;
inputting the standard image into a built model for training;
predicting the age of the retina through the trained model;
and visualizing the attention mechanism to obtain the region to which the model pays the most attention.
2. The method according to claim 1, wherein the preprocessing the image to be recognized comprises:
carrying out brightness transformation and denoising on the image to be recognized, and then cropping the processed image to a fixed size;
wherein the image to be identified is an OCT image.
3. The method of claim 2, wherein:
the brightness transformation step comprises: increasing the brightness of the image; and
the denoising step comprises: denoising with a three-dimensional block-matching (BM3D) algorithm.
4. The method according to any one of claims 1-3, wherein the building of the convolutional neural network model comprises:
adopting a ResNet18 network as the model; and
adding a convolutional attention module to the last convolutional layer of the ResNet18 network.
5. The method of claim 4, wherein the inputting the canonical image into a built model for training comprises:
setting the hyper-parameters as follows: the batch size is 32;
for training epochs 1 to 135, the learning rate is 0.1;
for training epochs 136 to …, the learning rate is …;
for training epochs 186 to …, the learning rate is ….
6. The method of claim 4, wherein the inputting the canonical image into a built model for training comprises:
mapping the feature map F ∈ R^{C×H×W} of the convolutional layer as the input; and
deriving from F a one-dimensional channel attention map M_C ∈ R^{C×1×1} and a two-dimensional spatial attention map M_S ∈ R^{1×H×W}.
8. A retinal age identification device, comprising:
the model building module is used for building a convolutional neural network model and adding an attention mechanism module into the model;
the preprocessing module is used for preprocessing the image to be identified to obtain a standard image;
the training module is used for inputting the standard image into a built model for training;
the prediction module is used for predicting the age of the retina through the trained model;
and the visualization module is used for visualizing the attention mechanism to obtain the region to which the model pays the most attention.
9. A computer device, comprising:
a memory for storing a computer program;
a processor for executing the computer program in the memory to carry out the operational steps of the method of any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the operational steps of the method of one of claims 1 to 7.
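The preprocessing chain of claims 2-3 (brightness increase, block-matching denoising, fixed-size crop) can be sketched roughly as follows. BM3D itself is not reimplemented here; a simple box filter stands in for it, and the gamma value, kernel size, and 224-pixel crop are illustrative assumptions, not values from the patent:

```python
import numpy as np

def increase_brightness(img, gamma=0.8):
    # Gamma < 1 lifts mid-tone brightness (one common brightness transform).
    return np.clip(img, 0.0, 1.0) ** gamma

def denoise_box(img, k=3):
    # Stand-in for BM3D: a k x k box filter via edge-padded neighborhood mean.
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

def center_crop(img, size=224):
    h, w = img.shape
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def preprocess(oct_image):
    # Brightness transform -> denoise -> fixed-size crop, per claims 2-3.
    return center_crop(denoise_box(increase_brightness(oct_image)))

rng = np.random.default_rng(0)
std = preprocess(rng.random((256, 320)))   # "standard image", 224 x 224
```

A production pipeline would use an actual BM3D implementation and the image geometry of the OCT scanner; only the ordering of the three steps follows the claims.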
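Claim 5 fixes the batch size at 32 and a learning rate of 0.1 for epochs 1-135; the later epoch boundaries and rates are truncated in the source text, so the tail values in this sketch are placeholders only:

```python
def learning_rate(epoch, boundaries=(135,), rates=(0.1, 0.01)):
    """Piecewise-constant schedule: rates[i] applies up to boundaries[i].
    Only the first segment (epochs 1-135 at 0.1) is stated in the source;
    the boundary/rate tail here is an assumed placeholder."""
    for bound, rate in zip(boundaries, rates):
        if epoch <= bound:
            return rate
    return rates[-1]

BATCH_SIZE = 32                      # stated in claim 5
assert learning_rate(1) == 0.1       # stated segment
assert learning_rate(135) == 0.1     # stated segment
assert learning_rate(136) == 0.01    # placeholder tail value
```

In a framework such as PyTorch the same effect is usually obtained with a multi-step scheduler rather than a hand-written function.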
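Claim 6 describes the two attention maps of a convolutional block attention module: a channel map M_C ∈ R^{C×1×1} and a spatial map M_S ∈ R^{1×H×W} derived from the feature map F ∈ R^{C×H×W}. A numpy sketch of that standard formulation follows; the shared-MLP weights are random placeholders, the reduction ratio is assumed, and a pixelwise mean stands in for the usual 7x7 convolution of the spatial branch:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam_maps(F, rng):
    C, H, W = F.shape
    r = max(C // 2, 1)                       # reduction ratio (assumed)
    W1 = rng.standard_normal((r, C))         # shared MLP, layer 1 (placeholder)
    W2 = rng.standard_normal((C, r))         # shared MLP, layer 2 (placeholder)

    # Channel attention: shared MLP over avg- and max-pooled channel descriptors.
    avg_c = F.mean(axis=(1, 2))              # (C,)
    max_c = F.max(axis=(1, 2))               # (C,)
    mlp = lambda v: W2 @ np.maximum(W1 @ v, 0.0)
    M_C = sigmoid(mlp(avg_c) + mlp(max_c)).reshape(C, 1, 1)

    # Spatial attention: pool across channels; a mean replaces the 7x7 conv.
    avg_s = F.mean(axis=0)                   # (H, W)
    max_s = F.max(axis=0)                    # (H, W)
    M_S = sigmoid((avg_s + max_s) / 2.0).reshape(1, H, W)
    return M_C, M_S

rng = np.random.default_rng(0)
F = rng.random((8, 5, 5))                    # F in R^{C x H x W}
M_C, M_S = cbam_maps(F, rng)
refined = F * M_C * M_S                      # attention-refined feature map
```

The shapes match the claim: M_C is (C, 1, 1) and M_S is (1, H, W), so both broadcast against F when refining the features.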
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210287825.0A CN114627091A (en) | 2022-03-23 | 2022-03-23 | Retinal age identification method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114627091A | 2022-06-14 |
Family
ID=81903123
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210287825.0A Pending CN114627091A (en) | 2022-03-23 | 2022-03-23 | Retinal age identification method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114627091A (en) |
Non-Patent Citations (1)
Title |
---|
YAN YAN, ET AL.: "Attention-based deep learning system for automated diagnoses of age-related macular degeneration in optical coherence tomography images", 《MEDICAL PHYSICS》, 30 September 2021 (2021-09-30) * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115171204A (en) * | 2022-09-06 | 2022-10-11 | 北京鹰瞳科技发展股份有限公司 | Method for training prediction model for predicting retinal age and related product |
CN115171204B (en) * | 2022-09-06 | 2023-02-21 | 北京鹰瞳科技发展股份有限公司 | Method for training prediction model for predicting retinal age and related product |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10878601B2 (en) | Generalizable medical image analysis using segmentation and classification neural networks | |
US20210390696A1 (en) | Medical image processing apparatus, medical image processing method and computer-readable storage medium | |
Abràmoff et al. | Retinal imaging and image analysis | |
JP6930283B2 (en) | Image processing device, operation method of image processing device, and image processing program | |
Akram et al. | Automated detection of dark and bright lesions in retinal images for early detection of diabetic retinopathy | |
GB2589250A (en) | Medical image processing device, medical image processing method and program | |
CN109584254A (en) | A kind of heart left ventricle's dividing method based on the full convolutional neural networks of deep layer | |
US11922601B2 (en) | Medical image processing apparatus, medical image processing method and computer-readable medium | |
JP7019815B2 (en) | Learning device | |
CN108764342B (en) | Semantic segmentation method for optic discs and optic cups in fundus image | |
CN113557714A (en) | Medical image processing apparatus, medical image processing method, and program | |
CN112884729A (en) | Auxiliary diagnosis method and device for fundus diseases based on bimodal deep learning | |
Viedma et al. | Deep learning in retinal optical coherence tomography (OCT): A comprehensive survey | |
Sharif et al. | Medical image classification based on artificial intelligence approaches: a practical study on normal and abnormal confocal corneal images | |
CN112819818B (en) | Image recognition module training method and device | |
Singh et al. | Deep-learning based system for effective and automatic blood vessel segmentation from Retinal fundus images | |
JP2022018060A (en) | Medical information processing apparatus and medical information processing program | |
CN114627091A (en) | Retinal age identification method and device | |
CN113256638A (en) | Eyeball cornea nerve segmentation method and device based on convolutional neural network model | |
Patel et al. | DiaRet: a browser-based application for the grading of diabetic retinopathy with integrated gradients | |
CN116309346A (en) | Medical image detection method, device, equipment, storage medium and program product | |
Baskaran et al. | Performance Analysis of Deep Learning based Segmentation of Retinal Lesions in Fundus Images | |
Ali et al. | Diabetic retinopathy detection and classification based on deep learning: A review | |
Sinha et al. | Transfer learning-based detection of retina damage from optical coherence tomography images | |
Akella et al. | A novel hybrid model for automatic diabetic retinopathy grading and multi-lesion recognition method based on SRCNN & YOLOv3 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||