CN114926460A - Training method of fundus image classification model, and fundus image classification method and system - Google Patents

Training method of fundus image classification model, and fundus image classification method and system

Info

Publication number
CN114926460A
CN114926460A (application CN202210845237.4A; granted as CN114926460B)
Authority
CN
China
Prior art keywords
fundus image
classification
training
matrix
classification model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210845237.4A
Other languages
Chinese (zh)
Other versions
CN114926460B (en)
Inventor
谷宗运
赵士博
韩啸
李传富
Current Assignee
Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Original Assignee
Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Priority date
Filing date
Publication date
Application filed by Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Priority to CN202210845237.4A
Publication of CN114926460A
Application granted
Publication of CN114926460B
Legal status: Active

Classifications

    • G06T7/0012 Image analysis; inspection of images; biomedical image inspection
    • G06F18/241 Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N3/045 Neural networks; architecture; combinations of networks
    • G06N3/08 Neural networks; learning methods
    • G06V10/44 Extraction of image or video features; local feature extraction by analysis of parts of the pattern
    • G06V10/764 Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V10/82 Image or video recognition or understanding using neural networks
    • G06T2207/10004 Still image; photographic image
    • G06T2207/10024 Color image
    • G06T2207/20081 Training; learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30041 Eye; retina; ophthalmic

Abstract

The invention belongs to the technical field of intelligent medical diagnosis, and particularly relates to a training method of a fundus image classification model, a fundus image classification method, and a fundus image classification system.

Description

Training method of fundus image classification model, and fundus image classification method and system
Technical Field
The invention belongs to the technical field of intelligent medical diagnosis, and particularly relates to a training method of a fundus image classification model, a fundus image classification method, and a fundus image classification system.
Background
Diabetic retinopathy, an ocular complication of diabetes mellitus, is a leading cause of impaired vision and even blindness, and has become a major medical problem worldwide. Diabetic retinopathy grading refers to classifying a set of retinal fundus images according to the severity of the patient's diabetic retinopathy.
In a fundus image, the high-brightness area where the blood vessels converge is the optic disc, and the dark region on the opposite side is the fovea centralis. Mild diabetic retinopathy appears as deep-red spots, i.e., small bleeding points or very small red-dot microaneurysms. Moderate diabetic retinopathy adds a few yellow lesions on top of the small red spots, namely a small number of yellowish-white punctate hard exudates. Severe diabetic retinopathy adds white, cotton-wool-like soft exudates of various shapes, from small dots to large plaques, on top of the red and yellow foci; the greater the number of such lesions, the more severe the condition. Proliferative diabetic retinopathy implies that new retinal blood vessels have formed in or around the optic disc, which can cause vitreous hemorrhage, retinal hemorrhage, and in severe cases retinal detachment. All of these changes are reflected in the fundus image set.
DR is classified into five grades according to the international classification standard for diabetic retinopathy: grade 0 (no diabetic retinopathy), grade 1 (mild non-proliferative diabetic retinopathy), grade 2 (moderate non-proliferative diabetic retinopathy), grade 3 (severe non-proliferative diabetic retinopathy), and grade 4 (proliferative diabetic retinopathy).
To date, there is no treatment that can completely cure the disease. Research shows that early diagnosis and timely treatment of diabetic retinopathy help to prevent blindness, a goal that can be achieved through periodic screening programs. Therefore, many national health agencies are promoting diabetic retinopathy screening, which has been proven effective in reducing the blindness rate caused by diabetic retinopathy.
The digital color fundus image is the most widely used imaging modality for ophthalmologists to screen for and identify the severity of diabetic retinopathy, as it can show the severity of the pathology. However, for many underdeveloped countries, diabetic retinopathy screening is a heavy burden because ophthalmologists are in very short supply. Intelligent medical-image-assisted diagnosis can greatly improve the efficiency of screening for and identifying fundus diseases; it has become an effective way to address this problem and is the trend for intelligent diagnosis of diabetic retinopathy severity.
The defects and shortcomings of the prior art are as follows:
the proportions of diabetic retinopathy cases of different severities differ greatly, so samples of different grades are unevenly distributed in diabetic retinopathy image data sets; in addition, uneven illumination in the imaging environment makes features difficult to extract;
retinal images have very high resolution, so training a neural network on them directly can suffer from problems such as vanishing or exploding gradients; moreover, the computational cost is high and model training is difficult;
fundus lesion images require finer-grained classification than other image categories, and because the lesion points are extremely tiny, the differences between categories are very subtle and hard to distinguish, resulting in low classification accuracy of the model.
Disclosure of Invention
In view of the above drawbacks of the prior art, an object of the present invention is to provide a method for training a fundus image classification model, a method and a system for classifying fundus images, so as to solve the problem of low prediction accuracy of the existing classification model.
In order to achieve the above objects and other related objects, the present invention provides a method for training a fundus image classification model, comprising the steps of:
acquiring a fundus image sample set, wherein the fundus image sample set comprises a fundus image set and a classification label set corresponding to the fundus image set;
performing feature extraction on the fundus image set by using a Transformer encoder to obtain a feature matrix;
classifying the feature matrix by using a residual attention mechanism to obtain a classification result;
calculating a loss value according to the classification result and the classification label set;
and adjusting parameters of the fundus image classification model until the loss value is smaller than a preset threshold value, and finishing training of the fundus image classification model.
In an optional embodiment of the present invention, the step of performing feature extraction on the fundus image set by using a Transformer encoder to obtain a feature matrix includes:
assuming that the dimension of an input fundus image is H × W × C, where H denotes the height of the image, W denotes the width of the image, and C denotes the number of channels of the image; dividing the fundus image into N image blocks, wherein the dimension of each image block is M = P × P × C, and the number of blocks is N = H × W / P², where P is the side length of each image block;
rearranging the image block sequence into an N × M matrix, and linearly mapping each row to a specified dimension D; adding a special classification token cls to the matrix, after which the dimension of the matrix is (N + 1) × D;
adding the position information of each image block to the matrix, keeping the dimension of the matrix unchanged, and obtaining an input item;
and inputting the input items into a Transformer encoder to obtain the feature matrix.
In an optional embodiment of the present invention, the step of classifying the feature matrix by using a residual attention mechanism to obtain a classification result includes:
and performing linear classification processing on the extracted feature matrix by using a residual attention mechanism and a Softmax classifier to obtain a classification result.
In an optional embodiment of the present invention, the step of performing linear classification processing on the extracted feature matrix using a residual attention mechanism and a Softmax classifier includes:
calculating class specific residual attention scores corresponding to each classification result according to the feature matrix;
calculating class specific residual attention corresponding to each classification result according to the class specific residual attention scores;
calculating single-head attention logic output corresponding to each classification result according to the class specific residual attention;
and calculating multi-head attention logic output corresponding to each classification result according to the single-head attention logic output, and taking the multi-head attention logic output as the prediction probability of the fundus image set corresponding to each classification result.
In an optional embodiment of the invention, the step of calculating a loss value according to the classification result and the classification label set comprises:
and substituting the classification result and the classification labels into a cross-entropy function to calculate the loss value.
In an optional embodiment of the present invention, the fundus image sample set is a diabetic retinopathy fundus image sample set, and the classification label set is a diabetic retinopathy grade.
To achieve the above and other related objects, the present invention further provides a fundus image classifying method, comprising the steps of:
acquiring a fundus image to be diagnosed;
and inputting the fundus image to be diagnosed into the fundus image classification model, obtaining a predicted probability value for the fundus image to be diagnosed under each classification label, and taking the classification label corresponding to the largest predicted value as the lesion grade of the fundus image to be diagnosed.
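As a minimal sketch of this prediction step (the probability vector below is a hypothetical stand-in for the model's output, not from the patent), the lesion grade is simply the label with the largest predicted probability:

```python
import numpy as np

def predict_grade(probs):
    """Pick the lesion grade (0-4) whose predicted probability is largest."""
    return int(np.argmax(probs))

# Hypothetical model output for one fundus image over the five DR grades.
probs = np.array([0.05, 0.10, 0.60, 0.15, 0.10])
grade = predict_grade(probs)  # the grade-2 (moderate NPDR) label wins here
```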
To achieve the above and other related objects, the present invention further provides a system for training a classification model of a fundus image, comprising:
a sample acquisition module, configured to acquire a fundus image sample set, wherein the fundus image sample set comprises a fundus image set and a classification label set corresponding to the fundus image set;
the characteristic extraction module is used for extracting characteristics of the fundus image set by using a Transformer encoder to obtain a characteristic matrix;
the classification module is used for classifying the characteristic matrix by utilizing a residual error attention mechanism to obtain a classification result;
the loss calculation module is used for calculating a loss value according to the classification result and the classification label set;
and the control training module is used for adjusting the parameters of the fundus image classification model until the loss value is smaller than a preset threshold value so as to finish the training of the fundus image classification model.
To achieve the above and other related objects, the present invention further provides an electronic device, which includes a memory, a processor and a computer program stored in the memory and running on the processor, wherein the processor implements the steps of the method when executing the computer program.
To achieve the above and other related objects, the present invention also provides a computer-readable storage medium on which a computer program is stored, the computer program implementing the steps of the method when executed by a processor.
The invention has the technical effects that:
the method uses the transform coder to extract the image characteristics, the transform coder has a multi-head attention mechanism, the weight of key information in the operation process can be strengthened, meanwhile, the extracted characteristics are classified by using a residual attention mechanism, the key information is further strengthened, and therefore the grading effect of the diabetic retinopathy fundus image is improved.
Drawings
Fig. 1 is a block diagram of a structure of a fundus image classification model provided by an embodiment of the present invention;
FIG. 2 is a flow chart of a method for training a classification model of fundus images according to an embodiment of the present invention;
FIG. 3 is a flow chart of a feature extraction method provided by an embodiment of the invention;
FIG. 4 is a flow chart of a classification method provided by an embodiment of the present invention;
FIG. 5 is a block diagram of a fundus image classification model training system provided by an embodiment of the present invention;
FIG. 6 is a functional block diagram of a Transformer encoder provided by an embodiment of the present invention;
FIG. 7 is a schematic flow chart of a residual attention mechanism provided by an embodiment of the present invention;
FIG. 8 is a graph of loss value variation for a model training process provided by an embodiment of the present invention;
fig. 9 is a block diagram of an electronic device provided in an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention.
Please refer to fig. 1-9. It should be noted that the drawings provided in the present embodiment are only for illustrating the basic idea of the present invention, and the components related to the present invention are only shown in the drawings rather than drawn according to the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.
The embodiment of the application can acquire and process related data based on an artificial intelligence technology. Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result.
Fig. 1 shows a flowchart of a preferred embodiment of the fundus image classification model training method of the present invention.
The fundus image classification model training method is applied to one or more electronic devices, wherein the electronic devices are devices capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and the hardware of the electronic devices includes but is not limited to a microprocessor, an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The electronic device may be any electronic product capable of performing human-computer interaction with a user, for example, a personal computer, a tablet computer, a smart phone, and the like.
The electronic device may also include a network device and/or a user device. The network device includes, but is not limited to, a single network server, a server group consisting of a plurality of network servers, or a Cloud Computing (Cloud Computing) based Cloud consisting of a large number of hosts or network servers.
The Network where the electronic device is located includes, but is not limited to, the internet, a wide area Network, a metropolitan area Network, a local area Network, a Virtual Private Network (VPN), and the like.
The fundus image classification model training method of the present invention, which can be applied to training a classification model of a diabetic retinopathy fundus image set, for example, will be described in detail below with reference to fig. 1.
The fundus image classification model can be used for disease multi-label classification of fundus image sets such as diabetic retinopathy, glaucoma, cataract, age-related macular degeneration, hypertension, myopia and the like; screening a diabetic retinopathy fundus image set; grading the disease condition of the diabetic retinopathy fundus image set and the like.
Referring to fig. 1 and 2, the training method of the fundus image classification model first acquires a data set of diabetic retinopathy fundus images and divides it into a training set, a test set and a validation set. During training on the training set, a feature extractor performs feature extraction on the fundus image set, and the extracted feature vectors are fed into a classifier to predict the classification probabilities. Finally, the disease grade is judged according to the classification probabilities. As described in detail below, the feature extractor uses a pre-trained Transformer encoder model, and the classifier uses residual attention, with the lesion threshold set to the optimal threshold obtained from the Youden index.
The method specifically comprises the following steps:
s1: acquiring a fundus image sample set, wherein the fundus image sample set comprises a fundus image set and a classification label set corresponding to the fundus image set; in this embodiment, the fundus image sample set is a diabetic retinopathy fundus image sample set, and the classification label set is a diabetic retinopathy grade.
In one embodiment, the grade labels are represented by the five numbers 0, 1, 2, 3 and 4. The international classification criteria for diabetic retinopathy are shown in Table 1:
TABLE 1 Diabetic retinopathy grading standard

Grade 0, no apparent diabetic retinopathy: no abnormalities.
Grade 1, mild non-proliferative diabetic retinopathy: microaneurysms only.
Grade 2, moderate non-proliferative diabetic retinopathy: more than microaneurysms alone, but less than the severe non-proliferative stage.
Grade 3, severe non-proliferative diabetic retinopathy: any of the following changes, without signs of proliferative retinopathy: (1) more than 20 intraretinal hemorrhages in each of the 4 quadrants; (2) definite venous beading in 2 or more quadrants; (3) prominent intraretinal microvascular abnormalities in 1 or more quadrants.
Grade 4, proliferative diabetic retinopathy: one or more of the following changes: neovascularization, vitreous hemorrhage, preretinal hemorrhage.
In this embodiment, the fundus image sample set is randomly divided into a training set, a verification set, and a test set, which account for 60%, 20%, and 20%, respectively.
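A minimal sketch of this 60%/20%/20% random split over sample indices (the seed is an illustrative choice, not from the patent):

```python
import numpy as np

def split_dataset(n_samples, seed=0):
    """Randomly split sample indices into 60% training, 20% validation
    and 20% test, as in this embodiment (the seed is illustrative)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)          # shuffle all sample indices
    n_train = int(0.6 * n_samples)
    n_val = int(0.2 * n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = split_dataset(100)  # 60 / 20 / 20 indices
```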
S2: and performing feature extraction on the fundus image set by using a Transformer encoder to obtain a feature matrix. The structure of the Transformer encoder is shown in fig. 6.
Referring to fig. 3, in an embodiment, the step S2 includes:
s21: assuming that the dimension of an input fundus image is H × W × C, where H denotes the height of the image, W denotes the width of the image, and C denotes the number of channels of the image; dividing the fundus image into N image blocks, wherein the dimension of each image block is M = P × P × C, and the number of blocks N is
N = H × W / P²

where P is the side length of each image block. In a specific embodiment, N = 9.
S22: rearranging the image block sequences into N multiplied by M matrixes, and linearly mapping each matrix to a specified dimension D; adding a special character cls to the matrix, wherein the dimension of the matrix is (N + 1) xM;
s23: adding the position information of each image block to the matrix, keeping the dimension of the matrix unchanged, and obtaining an input item;
s24: and inputting the input items into a Transformer encoder to obtain the feature matrix.
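As an illustration of steps S21 to S24, the block rearrangement and embedding can be sketched in numpy as follows; the projection matrix, cls token and position embeddings are randomly initialized stand-ins for learned parameters, and the toy sizes (a 21 × 21 × 3 image with P = 7) reproduce the embodiment's N = 9 blocks:

```python
import numpy as np

def patch_embed(image, P, W_proj, cls_token, pos_embed):
    """S21-S24: split an H x W x C image into N = H*W / P^2 blocks of
    dimension M = P*P*C, linearly map each block to dimension D, prepend
    the cls token, and add position embeddings (shape stays (N + 1) x D)."""
    H, W, C = image.shape
    n_h, n_w = H // P, W // P
    # S21/S22: rearrange the image blocks into an N x M matrix.
    patches = (image.reshape(n_h, P, n_w, P, C)
                    .transpose(0, 2, 1, 3, 4)
                    .reshape(n_h * n_w, P * P * C))
    tokens = patches @ W_proj                 # N x D linear mapping
    tokens = np.vstack([cls_token, tokens])   # prepend cls: (N + 1) x D
    return tokens + pos_embed                 # S23: add position information

rng = np.random.default_rng(0)
P, D = 7, 16
img = rng.normal(size=(21, 21, 3))            # toy image: N = 9 blocks
W_proj = rng.normal(size=(P * P * 3, D))      # assumed learned projection
cls_token = rng.normal(size=(1, D))
pos_embed = rng.normal(size=(10, D))          # (N + 1) position embeddings
tokens = patch_embed(img, P, W_proj, cls_token, pos_embed)
```

The resulting `tokens` matrix, of shape (N + 1) × D, is what would be fed to the Transformer encoder in S24.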
S3: and classifying the characteristic matrix by using a residual attention mechanism of the fundus image classification model to obtain a classification result.
Referring to fig. 4 and 7, in an embodiment of the present invention, a residual attention mechanism and a Softmax classifier are used to perform a linear classification process on an extracted feature matrix to obtain a classification result, which includes the following steps:
s31: and calculating the class-specific residual attention score corresponding to each classification result according to the position feature matrix.
Specifically, a feature matrix x ∈ R^(d×h×w) is extracted from the fundus image set, where d, h and w respectively denote the dimension (number of channels), height and width of the feature matrix; in this embodiment, for example, it can be assumed that d is 2048, h is 7 and w is 7. The feature matrix x is decoupled into a set of location feature vectors {x_1, x_2, ..., x_(h·w)}, each x_j ∈ R^d. A fully-connected layer (a 1 × 1 convolution) is used as the classifier, each class having its own fully-connected classifier; the classifier corresponding to the i-th class has parameters m_i ∈ R^d. The class-specific residual attention score of the i-th class at the j-th location is defined as:

s_j^i = exp(T · x_j^T m_i) / Σ_(k=1..h·w) exp(T · x_k^T m_i)

where T is a temperature control factor that controls the sharpness of the attention scores, with T > 0. The score s_j^i is regarded as the probability that the i-th class is present at the j-th location.
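The score computation in S31 is a temperature-scaled softmax over the h · w locations. A minimal numpy sketch, using small toy sizes rather than the d = 2048, 7 × 7 feature maps of the embodiment:

```python
import numpy as np

def csra_scores(x, m, T=1.0):
    """s[i, j] = exp(T * x_j . m_i) / sum_k exp(T * x_k . m_i).
    x: (d, h*w) location features; m: (C, d) per-class classifier weights."""
    logits = T * (m @ x)                         # (C, h*w) dot products
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)      # softmax over locations

rng = np.random.default_rng(1)
d, hw, C = 8, 49, 5                              # toy sizes
x = rng.normal(size=(d, hw))
m = rng.normal(size=(C, d))
s = csra_scores(x, m, T=1.0)                     # each row sums to 1
```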
S32: and calculating the class-specific residual attention corresponding to each classification result according to the class-specific residual attention score.
Specifically, for the i-th class, the location feature vector x_j carries the corresponding weight s_j^i. Weighted summation of all location feature vectors with their corresponding weights gives the class-specific feature vector:

a^i = Σ_(j=1..h·w) s_j^i x_j

Since average pooling is widely used in practice and yields good results, the vector a^i is fused with the global average-pooled feature g = (1/(h·w)) Σ_j x_j. Finally, the class-specific residual attention f^i of the i-th class is:

f^i = g + λ a^i

where λ is a hyper-parameter (set to 0.3) and g is the global, class-independent feature; together these constitute the whole class-specific residual attention module.
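A sketch of S32 under the same toy shapes; note that with uniform scores the weighted sum a^i collapses to the average-pooled feature g, so f^i = (1 + λ) g, which makes the computation easy to check:

```python
import numpy as np

def csra_feature(x, s, lam=0.3):
    """f^i = g + lam * a^i, with g the average-pooled global feature and
    a^i = sum_j s[i, j] * x[:, j] the score-weighted class feature."""
    g = x.mean(axis=1)             # global, class-independent feature
    a = s @ x.T                    # (C, d) class-specific feature vectors
    return g[None, :] + lam * a    # (C, d) class-specific residual attention

rng = np.random.default_rng(2)
d, hw, C = 8, 49, 5
x = rng.normal(size=(d, hw))
s_uniform = np.full((C, hw), 1.0 / hw)  # uniform scores: a^i equals g
f = csra_feature(x, s_uniform)          # equals 1.3 * g for every class
```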
S33: and calculating the single-head attention logic output corresponding to each classification result according to the class specific residual attention.
Specifically, the dot product of the class-specific residual attention f^i of the i-th class with the classifier m_i corresponding to that class gives the final logit output:

y^i = m_i^T f^i, i = 1, 2, ..., C

where C is the number of classification categories.
S34: and calculating multi-head attention logic output corresponding to each classification result according to the single-head attention logic output, and taking the multi-head attention logic output as the prediction probability of the fundus image set corresponding to each classification result.
Specifically, each head of the multi-head attention mechanism uses a different temperature hyper-parameter T_h. Let the number of attention heads be H. The temperatures are set by the following rule: when H = 1, T_1 = 1; when H > 1, the first H - 1 heads take increasing finite temperatures and the last head takes T_H = ∞ (implemented as a sufficiently large value, which is equivalent to max pooling over locations).

After introducing the multi-head attention mechanism, the logit output of each head can be obtained, respectively y_1, y_2, ..., y_H. Note that each head here is a class-specific residual attention module, and each head corresponds to one of the temperature hyper-parameters set above. The logit outputs of all heads are summed directly to obtain the final multi-head attention logit output:

y = Σ_(h=1..H) y_h
s4: calculating a loss value according to the classification result and the classification label set; in a specific embodiment, the classification result and the classification index are brought into a cross entropy function, and the loss value is calculated.
Specifically, the loss value is:

L = -(1/N) Σ_(i=1..N) y_i log(p_i)

where N is the total number of fundus image samples in the training set, y_i is the disease label of the i-th fundus image, and p_i is the probability predicted by the model for the disease label of the i-th fundus image.
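A sketch of the cross-entropy computation in S4, on a hypothetical two-image batch over the five grades (the probability values are illustrative):

```python
import numpy as np

def cross_entropy_loss(probs, labels):
    """L = -(1/N) * sum_i log p_i, where p_i is the probability the model
    assigns to the true disease label of the i-th fundus image."""
    p_true = probs[np.arange(len(labels)), labels]  # pick p_i per image
    return float(-np.mean(np.log(p_true)))

# Hypothetical predicted probabilities for two images over grades 0-4.
probs = np.array([[0.70, 0.10, 0.10, 0.05, 0.05],
                  [0.20, 0.50, 0.10, 0.10, 0.10]])
labels = np.array([0, 1])                 # true grades of the two images
loss = cross_entropy_loss(probs, labels)  # -(log 0.7 + log 0.5) / 2
```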
S5: adjusting the parameters of the fundus image classification model until the loss value is smaller than a preset threshold value, completing the training of the fundus image classification model.
Figure 316781DEST_PATH_IMAGE044
And (4) performing back propagation for updating the model parameters, repeating the steps S2-S4, continuously performing iterative training, taking the corresponding verification set as an evaluation data set according to the training result until the loss value reaches a specified threshold value, and storing the corresponding model parameters of the final loss value.
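The iterate-until-threshold control flow of step S5 can be sketched generically. This skeleton, including the name `train_until_threshold` and the `max_epochs` safeguard, is illustrative only; the actual optimizer and back-propagation details are those of the trained model.

```python
def train_until_threshold(step_fn, threshold, max_epochs=100):
    """Repeat one training iteration (forward pass, loss, back-propagation,
    parameter update) until the loss falls below `threshold`, mirroring the
    S2-S4 loop of step S5. `step_fn()` runs one epoch and returns its loss.
    `max_epochs` bounds the loop in case the threshold is never reached."""
    history = []
    for _ in range(max_epochs):
        loss = step_fn()
        history.append(loss)
        if loss < threshold:
            break  # threshold reached: keep the current parameters
    return history
```

Usage: pass a closure that performs one epoch of training and returns the epoch loss; the returned history ends at the first epoch whose loss is below the threshold.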
To evaluate the performance of the model, the present invention trains and tests the model on both the DDR and IDRiD datasets. FIG. 8 shows the loss curves of the model's training process on the DDR dataset. As can be seen from FIG. 8, the loss value decreases as the number of training epochs increases: the losses on the test and validation sets decrease significantly over the first 6 epochs, while from the 7th epoch, and in particular after the 9th epoch, the test-set loss no longer decreases and model training tends to saturate. The training loss and the test loss remain close throughout, indicating that the model does not overfit.
The DDR dataset is provided by Ophthalmic Disease Intelligent Recognition (ODIR-2019) for lesion segmentation and lesion detection. The dataset comprises 13673 fundus images from 147 hospitals in 23 provinces of China. For the classification task, DDR already provides a split into training, validation and test sets: 6835 images are used for training, 2733 for validation, and the remaining 4105 for testing.
IDRiD is an Indian diabetic retinopathy image dataset consisting of typical DR and normal retinal structures, containing 413 training images and 103 test images. The test set is divided into five classes according to the grading rules and is also used as the validation set to evaluate the experimental results. The dataset provides, for each image, the disease severity of DR and of diabetic macular edema, which makes it well suited for developing and evaluating image analysis algorithms for early detection of diabetic retinopathy.
In order to verify the performance of the model, the invention also uses sensitivity, specificity and accuracy as evaluation indexes for the model's predictions. Sensitivity is the probability that a lesioned DR image is not missed, i.e. not wrongly judged as negative; the higher its value, the stronger the model's ability to discover abnormalities. The sensitivity calculation is shown in Equation 1.
Sensitivity = TP / (TP + FN)    (1)
Specificity is the probability that a normal DR image is not erroneously judged to be positive. Higher values represent a greater ability of the model to recognize normality. The calculation formula of the specificity is shown in formula 2.
Specificity = TN / (TN + FP)    (2)
Accuracy represents the correct proportion of the model classification, with higher values representing more accurate model classifications. The accuracy calculation is shown in equation 3.
Accuracy = (TP + TN) / (TP + TN + FP + FN)    (3)
where TP (true positive) denotes positive samples predicted as positive by the model, TN (true negative) denotes negative samples predicted as negative, FP (false positive) denotes negative samples predicted as positive, and FN (false negative) denotes positive samples predicted as negative.
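Equations 1-3 can be checked with a small helper; the function name `binary_metrics` is an assumption for illustration.

```python
def binary_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity and accuracy from the confusion counts."""
    sensitivity = tp / (tp + fn)                 # Eq. 1: lesioned images not missed
    specificity = tn / (tn + fp)                 # Eq. 2: normal images not flagged
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # Eq. 3: overall correct proportion
    return sensitivity, specificity, accuracy
```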
The results are shown in Table 2:
TABLE 2 Performance parameters of DDR and IDRiD datasets
[Table 2 is rendered as an image in the source; it lists the sensitivity, specificity and accuracy obtained on the DDR and IDRiD datasets.]
Based on the fundus image classification model, the invention also provides a diabetic retinopathy fundus image classification method, which comprises the following steps: acquiring a fundus image to be diagnosed; and inputting the fundus image to be diagnosed into the fundus image classification model to obtain a predicted probability for the fundus image under each classification label, and taking the classification label corresponding to the largest predicted value as the lesion grade of the fundus image to be diagnosed.
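The diagnosis flow above can be sketched as follows, where `predict_fn` is an assumed stand-in for the trained classification model returning a per-grade probability vector for one fundus image.

```python
import numpy as np

def classify_fundus_image(predict_fn, image):
    """Return (lesion_grade, probability) for one image to diagnose.

    predict_fn: assumed wrapper around the trained model; maps an image
                to a probability vector over the classification labels.
    """
    probs = np.asarray(predict_fn(image))
    grade = int(probs.argmax())        # label with the largest predicted value
    return grade, float(probs[grade])
```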
According to the method, a Transformer encoder is used to extract image features. The Transformer encoder has a multi-head attention mechanism, which strengthens the weight of key information during computation; meanwhile, the extracted features are classified using a residual attention mechanism, which further strengthens the key information, thereby improving the grading effect on diabetic retinopathy fundus images.
It should be noted that the steps of the above methods are divided only for clarity of description; in implementation they may be combined into one step, or individual steps may be split into multiple steps. As long as the same logical relationship is preserved, such variations fall within the scope of this patent, as do insignificant modifications to the algorithms or processes, or insignificant design changes that do not alter the core design.
Fig. 5 is a functional block diagram of a fundus image classification model training system according to a preferred embodiment of the present invention. The fundus image classification model training system comprises: a sample acquisition module 10, a feature extraction module 20, a classification module 30, a loss calculation module 40, and a control training module 50. The module referred to in the present invention refers to a series of computer program segments that can be executed by the processor 100 and can perform a fixed function, and is stored in the memory 200.
The sample acquisition module 10 is configured to acquire a fundus image sample set, where the fundus image sample set includes a fundus image set and a classification label set corresponding to the fundus image set; the feature extraction module 20 is configured to perform feature extraction on the fundus image set by using a Transformer encoder to obtain a feature matrix; the classification module 30 is configured to classify the feature matrix by using a residual attention mechanism to obtain a classification result; the loss calculating module 40 is configured to calculate a loss value according to the classification result and the classification label set; the control training module 50 updates the model parameters according to the loss value until the loss value is smaller than a preset threshold value.
It should be noted that the fundus image classification model training system of this embodiment is the apparatus corresponding to the above fundus image classification model training method: its functional modules correspond respectively to the steps of that method, and the system can be implemented in cooperation with the method. Accordingly, the implementation details mentioned for the training system of this embodiment also apply to the training method described above.
It should be noted that, when the above functional modules are actually implemented, all or part of the functional modules may be integrated into one physical entity, or may be physically separated. And these modules can be realized in the form of software called by processing element; or can be implemented in the form of hardware; and part of the modules can be realized in the form of calling software by the processing element, and part of the modules can be realized in the form of hardware. In addition, all or part of the modules can be integrated together or can be independently realized. The processing element described herein may be an integrated circuit having signal processing capabilities. In the implementation process, part or all of the steps of the method or each functional module above may be implemented by an integrated logic circuit of hardware in a processor element or instructions in the form of software.
Fig. 9 is a schematic structural diagram of an electronic device according to a preferred embodiment of the present invention for implementing a method for training a fundus image classification model.
The electronic device may include a memory 200, a processor 100, and a bus, and may further include a computer program, such as a fundus image classification model training program, stored in the memory and executable on the processor.
Wherein the memory includes at least one type of readable storage medium including flash memory, removable hard disks, multimedia cards, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disks, optical disks, etc. The memory may in some embodiments be an internal storage unit of the electronic device, for example a removable hard disk of the electronic device. The memory may also be an external storage device of the electronic device in other embodiments, such as a plug-in removable hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the electronic device. Further, the memory may also include both an internal storage unit and an external storage device of the electronic device. The memory may be used not only to store application software installed in the electronic device and various types of data such as a code of a fundus image classification model training program, but also to temporarily store data that has been output or is to be output.
A processor may be composed of an integrated circuit in some embodiments, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips. The processor is a Control Unit (Control Unit) of the electronic device, connects various components of the entire electronic device by using various interfaces and lines, executes various functions of the electronic device and processes data by running or executing programs or modules stored in the memory (for example, executing a fundus image classification model training program and the like), and calls data stored in the memory.
The processor executes an operating system of the electronic device and various installed application programs. The processor executes the application program to implement the steps in the above-mentioned embodiments of the fundus image classification model training method, for example, the steps shown in the figure.
Illustratively, the computer program may be partitioned into one or more modules that are stored in the memory and executed by the processor to implement the invention. The one or more modules may be a series of computer program instruction segments capable of performing certain functions, which are used to describe the execution of the computer program in the electronic device. For example, the computer program may be segmented into a sample acquisition module 10, a feature extraction module 20, a classification module 30, a loss calculation module 40, and a control training module 50.
The integrated unit implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a computer device, or a network device, etc.) or a Processor (Processor) to execute partial functions of the fundus image classification model training method according to various embodiments of the present invention.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one arrow is shown in FIG. 5, but this does not indicate only one bus or one type of bus. The bus is arranged to enable connected communication between the memory and at least one processor or the like.
In summary, the invention uses a Transformer encoder to extract image features; the Transformer encoder has a multi-head attention mechanism, which strengthens the weight of key information during computation. Meanwhile, the extracted features are classified using a residual attention mechanism, further strengthening the key information and improving the classification effect on diabetic retinopathy fundus images.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
Although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the spirit and scope of the present invention.

Claims (10)

1. A training method of a fundus image classification model is characterized by comprising the following steps:
acquiring a fundus image sample set, wherein the fundus image sample set comprises a fundus image set and a classification label set corresponding to the fundus image set;
performing feature extraction on the fundus image set by using a Transformer encoder to obtain a feature matrix;
classifying the feature matrix by using a residual attention mechanism to obtain a classification result;
calculating a loss value according to the classification result and the classification label set;
and adjusting parameters of the fundus image classification model until the loss value is smaller than a preset threshold value, and finishing training of the fundus image classification model.
2. The training method of a fundus image classification model according to claim 1, wherein said step of performing feature extraction on said fundus image set by using a Transformer encoder to obtain a feature matrix comprises:
assuming that the dimension of an input fundus image is H × W × C, where H denotes the height of the image, W denotes the width of the image, and C denotes the number of channels of the image; dividing the fundus image into N image blocks, each image block having dimension M = P × P × C, where N = (H × W)/(P × P); rearranging the image block sequence into an N × M matrix, and linearly mapping each matrix to a specified dimension D; adding a special character to the matrix, so that the dimension of the matrix becomes (N + 1) × M;
adding the position information of each image block to the matrix, keeping the dimension of the matrix unchanged, and obtaining an input item;
and inputting the input items into a Transformer encoder to obtain the feature matrix.
3. The training method of the fundus image classification model according to claim 1, wherein said step of classifying said feature matrix using a residual attention mechanism to obtain a classification result comprises:
and performing linear classification processing on the extracted feature matrix by using a residual attention mechanism and a Softmax classifier to obtain a classification result.
4. A training method of a fundus image classification model according to claim 3, wherein said step of linear classification processing of the extracted feature matrix using a residual attention mechanism and a Softmax classifier comprises:
calculating the class specific residual attention score corresponding to each classification result according to the feature matrix;
calculating the class specific residual attention corresponding to each classification result according to the class specific residual attention score;
calculating single-head attention logic output corresponding to each classification result according to the class specific residual attention;
and calculating the multi-head attention logic output corresponding to each classification result according to the single-head attention logic output, and taking the multi-head attention logic output as the prediction probability of each classification result corresponding to the fundus image set.
5. A training method for a fundus image classification model according to claim 1, wherein said step of calculating a loss value based on said classification result and said classification label set comprises:
substituting the classification result and the classification labels into a cross-entropy function, and calculating the loss value.
6. A training method for a fundus image classification model according to claim 1, wherein said fundus image sample set is a diabetic retinopathy fundus image sample set, and said classification label is a diabetic retinopathy grade.
7. A fundus image classification method is characterized by comprising the following steps:
acquiring a fundus image to be diagnosed;
inputting the fundus image to be diagnosed into the fundus image classification model according to any one of claims 1 to 6, obtaining a predicted value of the probability of the fundus image to be diagnosed under each classification label, and taking the classification label corresponding to the predicted value as the lesion grade of the fundus image to be diagnosed.
8. A system for training a fundus image classification model, comprising:
the system comprises a sample acquisition module, a classification module and a classification module, wherein the sample acquisition module is used for acquiring a fundus image sample set, and the fundus image sample set comprises a fundus image set and a classification label set corresponding to the fundus image set;
the characteristic extraction module is used for extracting characteristics of the fundus image set by using a Transformer encoder to obtain a characteristic matrix;
the classification module is used for classifying the characteristic matrix by using a residual attention mechanism to obtain a classification result;
the loss calculation module is used for calculating a loss value according to the classification result and the classification label set;
and the control training module is used for adjusting the parameters of the fundus image classification model until the loss value is smaller than a preset threshold value so as to finish the training of the fundus image classification model.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method of any one of claims 1 to 6 when executing the computer program or the processor implements the steps of the method of claim 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of one of claims 1 to 6, or which, when being executed by a processor, carries out the steps of the method of claim 7.
CN202210845237.4A 2022-07-19 2022-07-19 Training method of fundus image classification model, and fundus image classification method and system Active CN114926460B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210845237.4A CN114926460B (en) 2022-07-19 2022-07-19 Training method of fundus image classification model, and fundus image classification method and system


Publications (2)

Publication Number Publication Date
CN114926460A true CN114926460A (en) 2022-08-19
CN114926460B CN114926460B (en) 2022-10-25

Family

ID=82816120


Country Status (1)

Country Link
CN (1) CN114926460B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118675219A (en) * 2024-08-23 2024-09-20 杭州聚秀科技有限公司 Method and system for detecting diabetic retinopathy lesion based on fundus image

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109376767A (en) * 2018-09-20 2019-02-22 中国科学技术大学 Retina OCT image classification method based on deep learning
CN111523617A (en) * 2020-06-09 2020-08-11 天津大学 Epilepsy detection system based on white matter fusion characteristic diagram and residual error attention network
US20210312905A1 (en) * 2020-04-03 2021-10-07 Microsoft Technology Licensing, Llc Pre-Training With Alignments For Recurrent Neural Network Transducer Based End-To-End Speech Recognition
CN113887610A (en) * 2021-09-29 2022-01-04 内蒙古工业大学 Pollen image classification method based on cross attention distillation transducer
US20220004716A1 (en) * 2020-07-06 2022-01-06 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for training semantic representation model, device and computer storage medium
CN114048818A (en) * 2021-11-16 2022-02-15 浙江工商大学 Video classification method based on accelerated transform model
CN114266730A (en) * 2021-11-29 2022-04-01 河海大学 Disease and insect pest identification method based on attention high-order residual error network
CN114419054A (en) * 2022-01-19 2022-04-29 新疆大学 Retinal blood vessel image segmentation method and device and related equipment
CN114494222A (en) * 2022-02-09 2022-05-13 西安科技大学 Vision transducer-based rolling bearing fault intelligent identification method
CN114564991A (en) * 2022-02-28 2022-05-31 合肥工业大学 Electroencephalogram signal classification method based on Transformer guide convolution neural network
CN114882014A (en) * 2022-06-16 2022-08-09 深圳大学 Dual-model-based fundus image quality evaluation method and device and related medium


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YIHAN CHEN et al.: "A Fast Inference Vision Transformer for Automatic Pavement Image Classification and Its Visual Interpretation Method", Remote Sensing *
YUHAO QING et al.: "Improved Transformer Net for Hyperspectral Image Classification", Remote Sensing *
Fu Liyao et al.: "A survey of Transformer-based U-shaped medical image segmentation networks", Journal of Computer Applications *


Also Published As

Publication number Publication date
CN114926460B (en) 2022-10-25

Similar Documents

Publication Publication Date Title
Boudegga et al. Fast and efficient retinal blood vessel segmentation method based on deep learning network
Singh et al. Image processing based automatic diagnosis of glaucoma using wavelet features of segmented optic disc from fundus image
Da Rocha et al. Diabetic retinopathy classification using VGG16 neural network
Hassan et al. Joint segmentation and quantification of chorioretinal biomarkers in optical coherence tomography scans: A deep learning approach
Akil et al. Detection of retinal abnormalities in fundus image using CNN deep learning networks
Singh et al. Retinal optic disc segmentation using conditional generative adversarial network
Qiu et al. Self-supervised iterative refinement learning for macular OCT volumetric data classification
Yang et al. Classification of diabetic retinopathy severity based on GCA attention mechanism
Imran et al. Enhanced intelligence using collective data augmentation for CNN based cataract detection
Su et al. Superficial punctate keratitis grading for dry eye screening using deep convolutional neural networks
Hao et al. Hybrid variation-aware network for angle-closure assessment in AS-OCT
Sengupta et al. Ophthalmic diagnosis and deep learning–a survey
Elmoufidi et al. Diabetic retinopathy prevention using efficientnetb3 architecture and fundus photography
Haider et al. Modified Anam-Net Based Lightweight Deep Learning Model for Retinal Vessel Segmentation.
Babu et al. Efficient detection of glaucoma using double tier deep convolutional neural network
Panda et al. A detailed systematic review on retinal image segmentation methods
Helen et al. EYENET: An Eye Disease Detection System using Convolutional Neural Network
Tian et al. Learning discriminative representations for fine-grained diabetic retinopathy grading
Almansour et al. Peripapillary atrophy classification using CNN deep learning for glaucoma screening
Khudaier et al. Binary Classification of Diabetic Retinopathy Using CNN Architecture
Sujithra et al. Adaptive cluster-based superpixel segmentation and BMWMMBO-based DCNN classification for glaucoma detection
EP4049287A1 (en) Machine-learning techniques for prediction of future visual acuity
CN114926460B (en) Training method of fundus image classification model, and fundus image classification method and system
ElMOUFIDI et al. Efficientnetb3 architecture for diabetic retinopathy assessment using fundus images
Harithalakshmi et al. EfficientNet-based Diabetic Retinopathy Classification Using Data Augmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant