CN112545452B

CN112545452B - High myopia fundus lesion image recognition device

Info

Publication number: CN112545452B
Application number: CN202011429632.1A
Authority: CN
Inventors: 杨卫华; 李晗; 万程; 蒋沁; 曹国凡; 张�杰
Original assignee: Eye Hospital Nanjing Medical University
Current assignee: Eye Hospital Nanjing Medical University
Priority date: 2020-12-07
Filing date: 2020-12-07
Publication date: 2021-11-30
Anticipated expiration: 2040-12-07
Also published as: CN112545452A

Abstract

The invention discloses a method for predicting the risk of high myopia fundus lesions based on a convolutional neural network and knowledge distillation, comprising the following steps: acquiring fundus images and randomly dividing them into training set data and test set data; applying preprocessing such as random flipping, cropping, color jittering and normalization to the training set, and applying only normalization to the test set; training a classification network model by knowledge distillation, feeding the training data into both a pre-trained teacher network and a student network to be trained; taking the soft label values output by the teacher network and the true label values as supervision information, and computing the KL Loss and the Focal Loss, respectively, against the predicted values output by the student network; and taking the weighted sum of the two Loss values as the final Loss function for updating the parameters of the student network. The trained student network can then perform three-class prediction (normal, low-risk high myopia fundus lesions, high-risk high myopia fundus lesions) on the fundus image test set.

Description

High myopia fundus lesion image recognition device

Technical Field

The invention relates to a high myopia fundus lesion image recognition method based on a convolutional neural network and knowledge distillation, and belongs to the field of medical image (fundus image) processing.

Background

China has a high prevalence of myopia; the number of patients with high myopia (myopia of more than 600 degrees, i.e. beyond -6.00 D) is increasing year by year, and patients are becoming younger. High myopia may cause a number of serious complications, even blindness, with the vision impairment being permanent and irreversible, and it is currently recognized as a blinding eye disease. According to the expert consensus on the prevention and control of high myopia, formulated in 2017 by the Optometry Group of the Ophthalmology Branch of the Chinese Medical Association, high myopia can be classified into low-risk simple high myopia and high-risk pathological myopia. In low-risk high myopia the degree of myopia is high, but serious fundus lesions are generally absent and the condition tends to stabilize after adulthood. High-risk high myopia, i.e. pathological myopia, can cause irreversible and even blinding fundus lesions, and the degree of myopia can keep increasing as the disease progresses throughout the patient's life. In retinal fundus images, simple high myopia mainly appears as a large, tilted oval optic disc with an arc-shaped spot around the optic disc, accompanied by a tessellated (leopard-pattern) fundus. Compared with low-risk simple high myopia, high-risk pathological myopia appears as a series of more serious degenerative changes of the fundus, such as posterior scleral staphyloma, lacquer cracks in the macular area, choroidal neovascularization (CNV), Fuchs' spot in the macular area, chorioretinal atrophy in the macular area (including diffuse chorioretinal atrophy and macular atrophy) and other lesions.

At present, the detection of high myopia in China mainly relies on a comprehensive evaluation that combines diopter measurement, axial length measurement, color fundus photography, optical coherence tomography and other auxiliary examinations; an experienced expert can combine these examinations to predict, to a limited extent, the risk of high myopia fundus lesions. However, manual risk prediction for high myopia is time-consuming and labor-intensive, has low accuracy, and is difficult to deploy widely. In areas where medical resources are relatively scarce, regular follow-up monitoring of highly myopic patients is even more difficult, and shortages of ophthalmologists and medical equipment may cause highly myopic patients to miss the optimal window for prevention and treatment. Artificial-intelligence-assisted diagnosis can automatically predict the risk of high myopia fundus lesions from simple, easily obtained fundus images, even where no qualified ophthalmologist or specialized high myopia examination instrument is available. It can be deployed in optical shops, physical examination centers or primary hospitals, enables regular fundus monitoring and risk prediction for high myopia through simple examinations, supports early detection, early intervention and early treatment, and is therefore of great significance for myopia prevention and control.

Disclosure of Invention

The purpose of the invention is as follows: the invention aims to overcome the defects of the prior art, and provides a method for acquiring relevant characteristics in a retinal fundus image, which can automatically predict the risk of high-myopia fundus lesions and has high precision and high speed.

The technical scheme is as follows: the invention discloses a method for predicting the risk of high myopia fundus lesions based on a convolutional neural network and knowledge distillation, which comprises the following steps:

acquiring fundus images, and randomly dividing training set data and testing set data;

preprocessing the training set for the purpose of data augmentation, and applying normalization preprocessing to the test set;

training a classification network model by knowledge distillation, feeding the training data into both a teacher network pre-trained on ImageNet and a student network to be trained;

taking the soft label values output by the teacher network and the true label values as supervision information, and computing the KL Loss and the Focal Loss, respectively, against the predicted values output by the student network;

weighting and summing the two different Loss values to serve as a final Loss function for updating parameters of the student network;

and the trained student network is used as a classification network model to perform three-classification prediction of normal-low risk high myopia fundus lesions-high risk high myopia fundus lesions on the fundus image test set, so as to complete the risk prediction of the high myopia fundus lesions.

Optionally, the preprocessing comprises: transforming the training set fundus images by scaling them to a uniform size, rotating them in a random direction, and modifying brightness, contrast and saturation to achieve color jittering, and then normalizing the augmented training set fundus images.

Optionally, a normalization pre-process of scaling and normalizing the test set fundus images is also included.

Optionally, training the classification network model by knowledge distillation comprises: using a teacher network that is an ensemble of several deep convolutional neural networks and is trained on the ImageNet database.

Optionally, the student network is a convolutional neural network consisting essentially of convolutional modules.

Optionally, the knowledge contained by the teacher network is transferred to the student network by knowledge distillation.

Optionally, the specific operation of calculating the KL Loss and the Focal Loss includes: introducing the output of the teacher network as additional supervision information, taking the KL Loss as a Loss function, and training the student network so that its output predictions approach the soft label values (soft labels) output by the teacher network.

Optionally, the specific operation of updating the parameters of the student network includes: using the Loss function obtained as the weighted sum of the KL Loss and the Focal Loss as the network update criterion, wherein each iteration only needs to update the parameters of the student network, and the weighted-sum Loss function is as follows:

wherein, alpha is a weighting coefficient, q is a soft label value output by a teacher network, p is a prediction label value output by a student network, y is a real label value input into a fundus image, and KL and FL respectively represent two different Loss calculation functions of KL Loss and Focal Loss.
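The formula itself is not reproduced in this text. One form that is consistent with the symbol definitions above is $L = \alpha \cdot \mathrm{KL}(q \,\|\, p) + (1 - \alpha) \cdot \mathrm{FL}(y, p)$; the $(1 - \alpha)$ weight on the second term is an assumption. A minimal sketch of this reading for a single image is given below. The focusing parameter `gamma`, the default `alpha` and all function names are illustrative assumptions, not values taken from the patent.

```python
import math

def kl_loss(q, p, eps=1e-12):
    """KL(q || p): teacher soft labels q against student probabilities p."""
    return sum(qi * math.log((qi + eps) / (pi + eps)) for qi, pi in zip(q, p))

def focal_loss(y, p, gamma=2.0, eps=1e-12):
    """Focal loss for the true class index y; gamma=2.0 is an assumed default."""
    pt = p[y]
    return -((1.0 - pt) ** gamma) * math.log(pt + eps)

def distillation_loss(q, p, y, alpha=0.5):
    """Weighted sum of both losses; the (1 - alpha) weight is an assumption."""
    return alpha * kl_loss(q, p) + (1.0 - alpha) * focal_loss(y, p)

# Hypothetical three-class outputs: normal, low-risk, high-risk.
q = [0.1, 0.2, 0.7]   # teacher soft labels
p = [0.2, 0.2, 0.6]   # student predictions
loss = distillation_loss(q, p, y=2, alpha=0.5)
print(loss)
```

With `alpha=1.0` the student is supervised by the teacher alone, and with `alpha=0.0` by the true labels alone; intermediate values trade the two signals off.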

Optionally, the classification of fundus images according to the present invention specifically includes: classifying fundus images without obvious pathological signs into the "normal" category; classifying fundus images with the characteristics of simple high myopia, such as a large, tilted oval optic disc or a tessellated (leopard-pattern) fundus, into the "low-risk high myopia fundus lesions" category; and classifying fundus images that show a series of more serious degenerative changes, such as the posterior scleral staphyloma, lacquer cracks in the macular area, CNV, Fuchs' spot in the macular area, chorioretinal atrophy in the macular area (including diffuse chorioretinal atrophy, macular chorioretinal atrophy and macular atrophy) and other lesions seen in the pathological myopic fundus, into the "high-risk high myopia fundus lesions" category.

The invention also proposes an electronic device comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for predicting risk of highly myopic ocular fundus lesions based on convolutional neural networks and knowledge distillation as set forth in the present invention.

The invention also provides a computer readable storage medium storing a computer program, which when executed by a processor implements the method for predicting the risk of the high myopia fundus lesion based on the convolutional neural network and knowledge distillation.

Beneficial effects: the invention discloses a method for predicting the risk of high myopia fundus lesions based on a convolutional neural network and knowledge distillation, and designs and trains a three-class model using deep learning. At the image input, different preprocessing is applied to the randomly divided training set data and test set data, which augments the data during training and improves prediction speed during testing. When the classification model based on a convolutional neural network is trained, knowledge distillation is used and a weighted loss function is designed to update the parameters of the student network, so that the lightweight student network retains classification performance close to that of the complex network and the model is compressed. The trained classification network model has strong feature extraction and classification capability while needing fewer parameters and less computing resources, so it better meets the requirements of practical applications. The invention combines medical knowledge with deep learning and, according to the lesion characteristics of the high myopia fundus, automatically classifies input fundus images into three grades: normal fundus, low-risk high myopia fundus and high-risk high myopia fundus. It is accurate and fast, can provide strong auxiliary technical support in practice, and helps doctors analyze the disease quickly and effectively.

Drawings

FIG. 1 is a flow chart illustrating a prediction method according to an embodiment of the present invention.

FIG. 2 is a schematic flow chart of training a classification network model by using a knowledge distillation method according to an embodiment of the present invention.

FIG. 3 is a schematic diagram of fundus images of three categories in the prediction of risk of highly myopic fundus lesions in accordance with one embodiment of the present invention.

Fig. 4 is a schematic structural diagram of a classification network model according to an embodiment of the present invention.

FIG. 5 is a schematic structural diagram of a single convolution module in the classification network model according to an embodiment of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the embodiments described are only some of the embodiments of the present invention, and not all of them. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the invention without any inventive step, are within the scope of protection of the invention.

In the method for predicting the risk of high myopia fundus lesions based on a convolutional neural network and knowledge distillation, different preprocessing is applied to the randomly divided training set data and test set data, which supports data augmentation during training and faster prediction during testing. Knowledge distillation is then used with a weighted loss function, so that the model is compressed while the strong feature extraction and classification capability of the classification network model is maintained. Finally, the trained classification network model is used for prediction and outputs the risk prediction result for high myopia fundus lesions. The overall process and the training method are shown in fig. 1 and fig. 2, respectively.

The method and technical effects of the present invention are described below by way of specific examples.

Step one: Select a data set. The present invention uses a local fundus image dataset for training and testing. It contains 1200 left-eye and right-eye images from different subjects, with resolutions ranging from 1620 × 1444 to 2124 × 2056. A professional ophthalmologist labelled the images into three classes according to their lesions: 500 normal (healthy) fundus images; 150 low-risk high myopia fundus images, mainly characterized by a large, tilted oval optic disc and an arc-shaped spot around the optic disc, possibly accompanied by a tessellated (leopard-pattern) fundus; and 550 high-risk high myopia fundus images, also called pathological myopia, showing lesions such as posterior scleral staphyloma, lacquer cracks in the macular area, CNV, Fuchs' spot in the macular area, and chorioretinal atrophy in the macular area (including diffuse chorioretinal atrophy and macular atrophy). The images were randomly divided into a training set and a test set at a 3:1 ratio, i.e., 900 images were used as training data and the remaining 300 as test data. The three types of fundus images are shown in fig. 3.
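The random 3:1 division can be illustrated with a short sketch. The file names, the fixed seed and the function name are hypothetical, and the text does not state whether the split was stratified by class, so a plain shuffle is assumed here.

```python
import random

def split_dataset(items, train_ratio=0.75, seed=0):
    """Shuffle a copy of items and split it into training and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_ratio)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical file names standing in for the 1200 labelled fundus images.
images = [f"fundus_{i:04d}.jpg" for i in range(1200)]
train_set, test_set = split_dataset(images)
print(len(train_set), len(test_set))  # 900 300
```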

Step two: Image preprocessing. On the one hand, the input training set images are scaled and randomly cropped to a uniform size, such as 512 × 512, and then subjected to a series of data augmentation operations: horizontal flipping with a probability of about 50%, vertical flipping with a probability of about 50%, rotation by about 20° in a random direction, and adjustment of brightness, contrast, saturation and hue (adjustment factor set to 0.05). Finally the images are normalized, with the mean and standard deviation of the three RGB channels set to [0.485, 0.456, 0.406] and [0.229, 0.224, 0.225], respectively. On the other hand, to improve prediction speed in the test stage and ensure real-time performance, the test set images are only scaled uniformly to 512 × 512 and normalized, with the same parameters as for the training set.
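The difference between the two preprocessing paths can be sketched without any image library. The example below covers only the random flips and the per-channel normalization; resizing, cropping, rotation and color adjustment are omitted. Scaling pixel values to [0, 1] before normalization is an assumption, as are all function names and the nested-list image representation.

```python
import random

MEAN = (0.485, 0.456, 0.406)  # per-channel mean from the text (R, G, B)
STD = (0.229, 0.224, 0.225)   # per-channel standard deviation from the text

def normalize(image):
    """Scale 0-255 RGB pixels to [0, 1] (assumed), then standardize each channel."""
    return [[tuple((c / 255.0 - m) / s for c, m, s in zip(px, MEAN, STD))
             for px in row] for row in image]

def random_flip(image, rng, p=0.5):
    """Flip horizontally and vertically, each with probability p."""
    if rng.random() < p:
        image = [row[::-1] for row in image]   # horizontal flip
    if rng.random() < p:
        image = image[::-1]                    # vertical flip
    return image

def preprocess(image, train, rng=None):
    """Training images are augmented first; test images are only normalized."""
    if train:
        image = random_flip(image, rng or random.Random())
    return normalize(image)

# A hypothetical 2 x 2 RGB image as rows of (R, G, B) tuples.
tiny = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]
test_out = preprocess(tiny, train=False)
train_out = preprocess(tiny, train=True, rng=random.Random(0))
```

Keeping the test path free of random operations makes predictions deterministic and avoids extra work at inference time, which matches the stated aim of faster prediction.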

Step three: Build the convolutional neural network model. The structure of the classification network model of the present invention is shown in fig. 4. The classification network model consists of a convolution layer at the head, several stacked groups of convolution modules, and a convolution, pooling and fully connected layer at the tail. The structure of a single convolution module is shown in fig. 5. In a single convolution module, the convolution layer in the middle of the usual residual structure is replaced by a depthwise convolution (Depth-wise Conv), and the ReLU activation function is replaced by the Swish activation, which is computed as follows:

$$f(x) = x \cdot \mathrm{sigmoid}(\beta x) = \frac{x}{1 + e^{-\beta x}}$$

where x is the input to the activation function and β is a tunable parameter. The Swish function can be viewed as a smooth function between a linear function and the ReLU function. In contrast to the usual residual structure, a single convolution module first increases and then decreases the dimension of its input; in addition, an SE (squeeze-and-excitation) channel is added as an attention mechanism to improve classification performance.
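The claim that Swish lies between a linear function and ReLU can be checked numerically. The sketch below uses the standard definition of Swish, x · sigmoid(βx); the sample inputs and β values are arbitrary choices for illustration.

```python
import math

def swish(x, beta=1.0):
    """Swish activation: x * sigmoid(beta * x)."""
    return x / (1.0 + math.exp(-beta * x))

def relu(x):
    """Standard ReLU, for comparison."""
    return max(0.0, x)

# beta = 0 gives the linear function x / 2; a large beta approaches ReLU.
for x in (-2.0, 0.0, 2.0):
    print(x, swish(x, beta=0.0), swish(x, beta=1.0), swish(x, beta=20.0), relu(x))
```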

Step four: Train the model by knowledge distillation. The training data are fed into both a teacher network pre-trained on ImageNet and a student network to be trained; the teacher network is an ensemble of several deep convolutional neural networks, and the student network is the lightweight model built in step three. With the KL Loss as a Loss function, the student network is trained so that its output predictions approach the soft label values (soft labels) output by the teacher network; at the same time, with the Focal Loss as a Loss function, the student network is trained so that its predictions approach the true labels of the input fundus images. The weighted sum of the KL Loss and the Focal Loss is used as the network update criterion, and only the parameters of the student network are updated in each iteration, until a satisfactory classification result is reached. Once test images are input, the trained classification network model automatically predicts one of the three categories: normal, low-risk high myopia fundus lesions, or high-risk high myopia fundus lesions. The specific parameter settings for the training process are shown in table 1.

TABLE 1

Experimental hardware: the central processing unit is a 3.60 GHz Intel(R) Core(TM) i7-7700, the graphics processing unit is an NVIDIA GeForce RTX 2080 Ti, and the video memory is 16 GB. Experimental software: the operating system is Ubuntu 14.04 LTS, and the programming language version is Python 3.6.5.

In the invention, the classification model is trained by knowledge distillation, a machine learning technique for model compression and transfer learning that uses a trained complex network model, or several such models, to guide the training of a lightweight model. During knowledge distillation, a generalized softmax function is introduced:

$$q_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}$$

where z is the input feature vector, z_i and z_j are its elements, T is a temperature hyper-parameter, and q_i is the output corresponding to z_i after the generalized softmax transformation. When a new model is trained, a higher T makes the softmax output distribution more uniform; in the test stage the normal temperature T = 1 is restored, so that the knowledge in the original model is extracted. Compared with most methods that train a convolutional neural network directly, the knowledge distillation used here effectively reduces the number of model parameters and the computing resources required, while retaining as much of the accuracy of the original complex model as possible, and so better meets the requirements of practical applications. The experimental verification results are shown in table 2.
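The effect of the temperature T on the generalized softmax can be seen in a small sketch. The logits below are hypothetical values for the three classes, and subtracting the maximum is a standard numerical-stability step that does not change the result.

```python
import math

def softmax_t(z, T=1.0):
    """Generalized softmax: exp(z_i / T) normalized over all elements of z."""
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp((zi - m) / T) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]

logits = [1.0, 2.0, 5.0]            # hypothetical teacher outputs
sharp = softmax_t(logits, T=1.0)    # normal temperature, as used at test time
soft = softmax_t(logits, T=4.0)     # higher temperature, more uniform output
print(sharp)
print(soft)
```

At the higher temperature the smaller probabilities grow and the largest one shrinks, so the relative similarity between classes that the teacher has learned becomes visible to the student.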

TABLE 2

Model	Parameters (millions)	Total memory (MB)	Prediction accuracy
Without knowledge distillation	25.6	573	96.7%
With knowledge distillation	5.3	279	95.3%

Here, Parameters refers to the total number of parameters of the model and Total memory to the amount of memory the model requires; both can be used to measure model complexity. Prediction accuracy refers to the overall accuracy of the model in classifying the validation set data into the three grades of normal fundus, low-risk high myopia fundus and high-risk high myopia fundus, and can be used to measure the accuracy of the model.
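The trade-off reported in table 2 can be expressed as ratios with a few lines of arithmetic. The figures are taken from the table; the variable names are illustrative.

```python
# Figures taken from table 2.
baseline = {"params_m": 25.6, "memory_mb": 573, "accuracy": 96.7}   # without distillation
distilled = {"params_m": 5.3, "memory_mb": 279, "accuracy": 95.3}   # with distillation

param_ratio = baseline["params_m"] / distilled["params_m"]
memory_ratio = baseline["memory_mb"] / distilled["memory_mb"]
accuracy_drop = baseline["accuracy"] - distilled["accuracy"]

print(f"parameters: {param_ratio:.1f}x fewer")         # parameters: 4.8x fewer
print(f"memory: {memory_ratio:.1f}x smaller")          # memory: 2.1x smaller
print(f"accuracy: {accuracy_drop:.1f} points lower")   # accuracy: 1.4 points lower
```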

The invention also relates to an electronic device comprising: at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

Where the memory and processor are connected by a bus, the bus may comprise any number of interconnected buses and bridges, the buses connecting together one or more of the various circuits of the processor and the memory. The bus may also connect various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. A bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. The data processed by the processor is transmitted over a wireless medium via an antenna, which further receives the data and transmits the data to the processor.

The processor is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And the memory may be used to store data used by the processor in performing operations.

The invention also provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the method for predicting risk of ocular fundus lesions with high myopia based on convolutional neural network and knowledge distillation.

That is, as those skilled in the art will understand, all or part of the steps of the method in the embodiments described above may be implemented by a program instructing the relevant hardware, where the program is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the method described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.

As noted above, while the present invention has been shown and described with reference to certain preferred embodiments, it is not to be construed as limited thereto. Various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform high myopia fundus lesion image recognition based on convolutional neural network and knowledge distillation; the identification of the high myopia fundus lesion images based on the convolutional neural network and knowledge distillation comprises the following steps:

and the trained student network is used as a classification network model to identify the eye fundus image test set.

2. The electronic device of claim 1, wherein the preprocessing comprises: transforming the training set fundus images by scaling them to a uniform size, rotating them in a random direction, and modifying brightness, contrast and saturation to achieve color jittering, and then normalizing the augmented training set fundus images.

3. The electronic device of claim 1 or 2, further comprising a normalization pre-process to scale and normalize the test set fundus image.

4. The electronic device of claim 1, wherein: training the classification network model by knowledge distillation comprises: using a teacher network that is an ensemble of several deep convolutional neural networks and is trained on the ImageNet database.

5. The electronic device of claim 1 or 4, wherein: the student network is a convolution neural network mainly composed of convolution modules.

6. The electronic device of claim 5, wherein: through knowledge distillation, the knowledge contained in the teacher network is transferred to the student network.

7. The electronic device of claim 1, wherein: the specific operation of calculating the KL Loss and the Focal Loss comprises: introducing the output of the teacher network as additional supervision information, taking the KL Loss as a Loss function, and training the student network so that its output predictions approach the soft label values (soft labels) output by the teacher network.

8. The electronic device of claim 1, wherein: the specific operation of updating the parameters of the student network comprises: using the Loss function obtained as the weighted sum of the KL Loss and the Focal Loss as the network update criterion, wherein each iteration only needs to update the parameters of the student network, and the weighted-sum Loss function is as follows: