CN115424084B - Fundus photo classification method and device based on class weighting network - Google Patents
Fundus photo classification method and device based on class weighting network
- Publication number
- CN115424084B CN202211381516.6A
- Authority
- CN
- China
- Prior art keywords
- channel
- feature map
- category
- feature
- fundus
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/03—Recognition of patterns in medical or anatomical images
Abstract
The invention provides a fundus photo classification method and device based on a class weighting network, belonging to the technical fields of image classification and ophthalmic medicine. The classification method comprises the following steps: reading a plurality of fundus photo data items and their labels; inputting the fundus photo data and labels into a class weighting network to train and construct a class weighting network model; reading fundus photo data to be recognized; and inputting the fundus photo data to be recognized into the class weighting network model, taking the class with the highest output probability as the classification result of the fundus photo. The class weighting network model assigns different class weights to different classes of data, achieving balance between easy and hard samples. Calculating the class gradient norm provides a reference for setting the class weights, sparing researchers the considerable time and effort of manually tuning the weights through repeated experiments during model training.
Description
Technical Field
The invention belongs to the technical fields of image classification and ophthalmic medicine, and in particular relates to a fundus photo classification method and a fundus photo classification device based on a class weighting network.
Background
Vision is the channel through which people perceive the world and acquire most of their information. With the development of society, however, the strain placed on the eyes by work and daily life keeps increasing, and common systemic diseases can also indirectly damage ocular tissue, making eye health a challenge that cannot be ignored. One of the most common diseases causing ocular complications is diabetes mellitus (DM), among the most serious and widespread chronic diseases of our age. According to recent statistics of the International Diabetes Federation (IDF), the prevalence of diabetes in the 20-79 year old population worldwide was estimated at 10.5% (536.6 million people) in 2021, and is projected to rise to 12.2% (783.2 million people) by 2045. The ocular complication caused by diabetes is diabetic retinopathy (DR), the leading cause of blindness in adults; its primary lesions include microaneurysms (MAs), "dot" or "blot" hemorrhages (HEs), hard exudates (Ex), cotton wool spots (CWS) and neovascularization (NV). Among Chinese diabetic patients the prevalence of DR is 18.45%, meaning roughly one in five diabetic patients is at risk of blindness. The task of preventing and treating blindness, in China and worldwide, remains severe.
Computer-aided diagnosis based on machine learning and artificial intelligence has matured considerably; a large number of researchers work in this area across many medical fields, helping doctors diagnose patients more accurately and efficiently. DR screening is one of the important directions for this technology, for example: (1) grading the severity of DR lesions; (2) segmenting DR lesions and related anatomical structures; (3) interpretability studies of DR judgments. Accelerating the deep application of machine learning to ophthalmology may thoroughly transform existing diagnostic systems: computer-aided diagnosis can relieve the workload of ophthalmologists, improve clinical efficiency, facilitate large-scale population screening, and offer a new way to ease the shortage of medical resources.
However, classifying the degree of disease from fundus retinal images still faces many problems. First, because of the medical particularities of DR datasets, both their quantity and quality pose great challenges. Traditional deep learning algorithms usually require large amounts of data, but publicly available DR datasets such as DDR, APTOS and Messidor-2 are far from sufficient in scale. At the same time, grading standards differ across datasets; a common fine-grained scheme divides DR into five grades: normal, mild, moderate, severe and proliferative. Even under the same standard, doctors from different institutions and of different professional levels may grade the same case differently, so different datasets cannot simply be merged. Within a single dataset, fundus photographs come from highly varied sources and differ in color, sharpness, contrast, brightness, size and completeness of the visible eyeball, with uneven quality. Second, DR datasets exhibit severe data imbalance, both in quantity and in difficulty. Normal fundi usually account for about half of the data, or even more, while the rarest class may amount to only 1/20 to 1/30 of the total. Such imbalance causes a machine learning model to neglect the features of minority classes during training and to focus on majority-class samples, ultimately yielding a model that performs poorly despite a deceptively high overall accuracy. Moreover, in DR datasets some classes are both scarce and hard to distinguish, such as DR1 and DR3, while DR4 is scarce but relatively easy to distinguish.
Given these problems with DR datasets, machine learning is still not accurate enough on the DR classification task and leaves considerable room for improvement. The invention therefore provides a fundus photo classification method and device based on a class weighting network, to assist diabetic retinopathy screening and improve DR classification accuracy.
Disclosure of Invention
The invention aims to solve at least one technical problem in the prior art by providing a fundus photo classification method based on a class weighting network and a fundus photo classification device based on a class weighting network.
In one aspect of the present invention, a fundus photo classification method based on a class weighting network is provided, the method comprising the following steps:
reading a plurality of fundus photo data items and their labels;
inputting the fundus photo data and their labels into a class weighting network, and training and constructing a class weighting network model, which comprises:
performing preliminary feature extraction on the fundus photo data to obtain a preliminarily extracted feature map;
performing feature extraction on the preliminarily extracted feature map along the channel dimension, the pixel dimension and the class dimension respectively, to obtain a channel feature map, a pixel feature map and a class feature map;
fusing the channel feature map, the pixel feature map and the class feature map to obtain a target feature map;
converting the target feature map into a class recognition result corresponding to the fundus photo label;
reading fundus photo data to be recognized;
and inputting the fundus photo data to be recognized into the class weighting network model, taking the class with the highest output probability as the classification result of the fundus photo.
Optionally, performing the preliminary feature extraction on the fundus photo data to obtain the preliminarily extracted feature map includes:
performing preliminary feature extraction on the fundus photo with a modified pre-trained network to obtain the preliminarily extracted feature map;
wherein the modified pre-trained network omits the final fully-connected layer of the pre-trained network.
Optionally, performing feature extraction on the preliminarily extracted feature map along the channel dimension to obtain the channel feature map includes:
applying global average pooling over the pixel dimension to obtain features that ignore the pixel dimension, and passing the result through conv_block to obtain a channel weight distribution, wherein
the structure of conv_block is given by:
CB(x) = Sigmoid(ReLU(BN(Conv(x))))
where CB denotes the conv_block layer; x denotes the feature map input to the conv_block layer; Conv denotes a 1×1 convolutional layer serving as a transition layer, whose number of output channels equals the number of channels of the input x; BN denotes Batch Normalization; and ReLU and Sigmoid denote the ReLU and Sigmoid activation functions, which introduce nonlinear factors into the network;
and performing feature extraction on the preliminarily extracted feature map along the channel dimension with a channel feature extractor to obtain the channel feature map, wherein
the structure of the channel feature extractor is given by:
F_c = CB(GAP_p(F_B)) ⊙ F_B
where F_c denotes the channel feature map; F_B denotes the preliminarily extracted feature map; GAP_p denotes global average pooling over the pixel dimension; CB denotes the conv_block layer; and ⊙ denotes element-wise (dot) multiplication: the channel weight distribution produced by the CB layer is multiplied element-wise with the preliminarily extracted feature map F_B.
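By way of illustration only, a minimal PyTorch sketch of conv_block and the channel feature extractor follows; the class names are editorial, and the internal ordering of conv_block is inferred from the symbol list above rather than confirmed by the patent figures:

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Sketch of conv_block CB(x): 1x1 Conv (channel count preserved) -> BN -> ReLU -> Sigmoid."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=1)  # 1x1 transition layer
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        return torch.sigmoid(torch.relu(self.bn(self.conv(x))))

class ChannelFeatureExtractor(nn.Module):
    """Sketch of F_c = CB(GAP_p(F_B)) ⊙ F_B: pool away pixels, derive channel weights, reweight F_B."""
    def __init__(self, channels: int):
        super().__init__()
        self.gap_p = nn.AdaptiveAvgPool2d(1)  # global average pooling over the pixel dimension
        self.cb = ConvBlock(channels)

    def forward(self, f_b):
        weights = self.cb(self.gap_p(f_b))    # channel weight distribution, shape (B, C, 1, 1)
        return weights * f_b                  # element-wise multiplication, broadcast over pixels
```

The final Sigmoid maps each channel weight into (0, 1), so the extractor attenuates rather than amplifies channels.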
Optionally, performing feature extraction on the preliminarily extracted feature map along the pixel dimension to obtain the pixel feature map includes:
applying global average pooling over the channel dimension to obtain features that ignore the channel dimension, and passing the result through conv_block to obtain a pixel weight distribution;
and performing feature extraction on the preliminarily extracted feature map along the pixel dimension with a pixel feature extractor to obtain the pixel feature map, wherein
the structure of the pixel feature extractor is given by:
F_p = CB(GAP_c(F_B)) ⊙ F_B
where F_p denotes the pixel feature map; GAP_c denotes global average pooling over the channel dimension; F_B denotes the preliminarily extracted feature map; CB denotes the conv_block layer; and ⊙ denotes element-wise multiplication: the pixel weight distribution produced by the CB layer is multiplied element-wise with the preliminarily extracted feature map F_B.
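A corresponding sketch of the pixel branch, reusing the ConvBlock class from the sketch above (applied here to the single pooled channel); again an editorial illustration, not the patent's own code:

```python
import torch.nn as nn

class PixelFeatureExtractor(nn.Module):
    """Sketch of F_p = CB(GAP_c(F_B)) ⊙ F_B: average over channels, derive per-pixel weights."""
    def __init__(self):
        super().__init__()
        self.cb = ConvBlock(1)  # conv_block applied to the one-channel pooled map

    def forward(self, f_b):
        gap_c = f_b.mean(dim=1, keepdim=True)  # GAP over the channel dimension -> (B, 1, H, W)
        weights = self.cb(gap_c)               # pixel weight distribution
        return weights * f_b                   # broadcast over channels
```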
Optionally, performing feature extraction on the preliminarily extracted feature map along the class dimension to obtain the class feature map includes:
expanding the preliminarily extracted feature map F_B into K layers with a 1×1 convolutional layer to obtain F_K, where K is given by:
K = Σ_{i=1}^{N} k_i
where N denotes the number of picture classes, k_i denotes the number of channels assigned to the i-th class, and K is the total number of channels over all classes;
pooling the feature map F_K having K channels by class channel group to obtain a feature map F_N that ignores channel-dimension features, the feature map F_N having N channel layers, each layer indicating the features of one class, according to:
F_K = Conv_K(F_B), F_N = GMP_K(F_K)
where F_B denotes the preliminarily extracted feature map, Conv_K denotes the K 1×1 convolutional layers, and GMP_K denotes performing one max pooling over each group of k_i channel layers;
applying global average pooling over the pixel dimension to F_N to obtain a feature map that ignores the pixel dimension, obtaining the class weight distribution through conv_block, and multiplying it element-wise with F_N to obtain a preliminary class feature map F'_N:
F'_N = CB(GAP_p(F_N)) ⊙ F_N
and applying global average pooling over the channel dimension and conv_block to F'_N to obtain the final class weight distribution:
F_T = CB(GAP_c(F'_N)) ⊙ F'_N
where F_T is the class feature map, GAP_c denotes global average pooling over the channel dimension, CB denotes the conv_block layer, and ⊙ denotes element-wise multiplication of the weight distribution produced by the CB layer with the corresponding feature map.
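A minimal sketch of this class-dimension branch is given below; the grouping of the K channels into per-class groups of k_i channels, and the use of two small conv_blocks, follow the relations above, while the concrete class names and any example k_i values are hypothetical:

```python
import torch
import torch.nn as nn

def conv_block(channels: int) -> nn.Sequential:
    # Same conv_block sketch as above: 1x1 Conv -> BN -> ReLU -> Sigmoid.
    return nn.Sequential(nn.Conv2d(channels, channels, 1), nn.BatchNorm2d(channels),
                         nn.ReLU(), nn.Sigmoid())

class ClassFeatureExtractor(nn.Module):
    """Sketch: expand F_B to K = sum(k_i) channels, max-pool each class's k_i channels to one
    (GMP_K), then apply the pixel-wise and channel-wise weighting steps."""
    def __init__(self, in_channels: int, k: list):
        super().__init__()
        self.k = k                                       # per-class channel counts k_i (class weights)
        self.expand = nn.Conv2d(in_channels, sum(k), 1)  # Conv_K: 1x1 expansion to K layers
        self.cb_p = conv_block(len(k))                   # conv_block over the N class channels
        self.cb_c = conv_block(1)                        # conv_block over the single pooled channel

    def forward(self, f_b):
        f_k = self.expand(f_b)                                       # F_K, shape (B, K, H, W)
        groups = torch.split(f_k, self.k, dim=1)                     # one group of k_i maps per class
        f_n = torch.stack([g.max(dim=1).values for g in groups], 1)  # GMP per class -> (B, N, H, W)
        f_n1 = self.cb_p(f_n.mean(dim=(2, 3), keepdim=True)) * f_n   # F'_N = CB(GAP_p(F_N)) ⊙ F_N
        return self.cb_c(f_n1.mean(dim=1, keepdim=True)) * f_n1      # F_T = CB(GAP_c(F'_N)) ⊙ F'_N
```

For example, ClassFeatureExtractor(2048, [8, 5, 5, 5, 2]) would give harder classes more channels; these k_i values are purely illustrative and would in practice be set from the gradient-norm reference described below.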
Optionally, a reference for the class weights is obtained by calculating the class gradient norm, given by:
g_i = (1/n_i) · Σ_{t=1}^{n_i} ‖∂L_t/∂out_t‖
where g_i denotes the gradient norm of the i-th class, n_i denotes the number of samples of the i-th class, L_t denotes the cross entropy loss produced by sample t after passing through the model, and out_t denotes the direct output of the model for sample t of class i;
letting p = softmax(out) and y be the one-hot vector representation of the sample, and since ∂L_t/∂out_t = p_t − y_t,
the calculation of the class gradient norm simplifies to:
g_i = (1/n_i) · Σ_{t=1}^{n_i} ‖p_t − y_t‖
The relative sizes of the class weights are then obtained from the gradient norms g_i of the different classes.
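As an editorial illustration, the simplified norm can be computed from a batch of model outputs as follows; the choice of the L2 norm here is an assumption, as the text does not state which vector norm is used:

```python
import torch
import torch.nn.functional as F

def class_gradient_norms(logits: torch.Tensor, labels: torch.Tensor, num_classes: int):
    """g_i = mean over samples of class i of ||softmax(out_t) - y_t||."""
    p = F.softmax(logits, dim=1)                # p = softmax(out)
    y = F.one_hot(labels, num_classes).float()  # one-hot label vectors
    per_sample = (p - y).norm(dim=1)            # ||p_t - y_t|| for each sample t
    g = torch.zeros(num_classes)
    for i in range(num_classes):
        mask = labels == i
        if mask.any():
            g[i] = per_sample[mask].mean()      # average within class i
    return g                                    # relative sizes guide the class weights k_i
```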
Optionally, fusing the channel feature map, the pixel feature map and the class feature map to obtain the target feature map, and converting the target feature map into the class recognition result corresponding to the fundus photo label, includes:
passing the target feature map through a global average pooling layer and fully-connected layers to obtain the final output, given by:
out = FC_N(ReLU(FC_H(GAP_p(F_C ⊕ F_P ⊕ F_T))))
where out is the final output of the model, a batch of N-dimensional vectors whose element values indicate the likelihood of the class at the corresponding position, the index of the maximum value being taken as the final class recognition result of the model; ⊕ denotes averaging the elements at corresponding positions of the different matrices; FC_H and FC_N denote fully-connected layers, FC_H having half as many output channels as input channels and FC_N having N output channels, N being the number of classes.
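A minimal sketch of the fusion step follows; it assumes the three feature maps have already been brought to a common shape (the element-wise averaging requires this, which the text does not spell out), and the layer names are editorial:

```python
import torch.nn as nn

class FeatureFusionHead(nn.Module):
    """Sketch of out = FC_N(ReLU(FC_H(GAP_p(F_C ⊕ F_P ⊕ F_T)))), ⊕ = element-wise average."""
    def __init__(self, channels: int, num_classes: int):
        super().__init__()
        self.gap_p = nn.AdaptiveAvgPool2d(1)
        self.fc_h = nn.Linear(channels, channels // 2)     # FC_H: output channels = half of input
        self.relu = nn.ReLU()
        self.fc_n = nn.Linear(channels // 2, num_classes)  # FC_N: output channels = N

    def forward(self, f_c, f_p, f_t):
        fused = (f_c + f_p + f_t) / 3              # target feature map (element-wise average)
        v = self.gap_p(fused).flatten(1)           # (B, channels)
        return self.fc_n(self.relu(self.fc_h(v)))  # out: (B, N); argmax gives the class
```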
Optionally, before reading the plurality of fundus photo data items and their labels, the method further includes: applying at least one of random vertical flipping, random horizontal flipping and random rotation to the fundus photos to obtain enhanced fundus photos.
Optionally, after inputting the fundus photo data and their labels into the class weighting network and training and constructing the class weighting network model, the method further includes:
comparing the class recognition result of each fundus photo with its true label to calculate the cross entropy loss, and back-propagating to update the model parameters;
the cross entropy loss is given by:
loss(x, class) = −log(exp(x[class]) / Σ_j exp(x[j])) = −x[class] + log Σ_j exp(x[j])
where x[class] denotes the score for the class to which the input data x truly belongs, and x[j] denotes the model's recognition result for input data x with respect to class j.
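As a small numeric sanity check (the values are illustrative), this expression matches the standard cross entropy provided by PyTorch:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[2.0, 0.5, -1.0]])  # model output for one sample, N = 3 classes
cls = torch.tensor([0])               # true class index

manual = -x[0, 0] + torch.logsumexp(x[0], dim=0)  # -x[class] + log sum_j exp(x[j])
builtin = F.cross_entropy(x, cls)
print(manual.item(), builtin.item())  # both ≈ 0.2414
```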
In another aspect of the present invention, a fundus photo classification apparatus based on a class weighting network is provided, comprising: a first reading unit, a second reading unit, a model forming unit and a class output unit; wherein,
the first reading unit is used for reading a plurality of fundus photo data items and their labels;
the model forming unit is used for inputting the fundus photo data and their labels into a class weighting network, and training and constructing a class weighting network model; wherein the model forming unit further comprises: a basic feature extractor, a channel feature extractor, a pixel feature extractor, a class feature extractor and a feature converter;
the basic feature extractor is used for performing preliminary feature extraction on the fundus photo data to obtain a preliminarily extracted feature map;
the channel feature extractor is used for performing feature extraction on the preliminarily extracted feature map along the channel dimension to obtain a channel feature map;
the pixel feature extractor is used for performing feature extraction on the preliminarily extracted feature map along the pixel dimension to obtain a pixel feature map;
the class feature extractor is used for performing feature extraction on the preliminarily extracted feature map along the class dimension to obtain a class feature map;
the feature converter is used for fusing the channel feature map, the pixel feature map and the class feature map to obtain a target feature map, and converting the target feature map into a class recognition result corresponding to the fundus photo label;
the second reading unit is used for reading fundus photo data to be recognized;
and the class output unit is used for inputting the fundus photo data to be recognized into the class weighting network model, taking the class with the highest output probability as the classification result of the fundus photo.
The invention provides a fundus photo classification method and device based on a class weighting network, which balance data of different difficulty levels by assigning learning channels of different scales to different classes of data. Furthermore, in determining the weights of the different classes, the invention provides a reference by calculating the class gradient norm, sparing researchers the considerable time and effort of manually tuning the weight parameters from past experience and repeated experiments during the model training phase.
Drawings
FIG. 1 is a block flow diagram of a method for classifying fundus images based on a class-weighted network according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a class weighting network according to an embodiment of the present invention;
fig. 3 is a block diagram showing a configuration of a fundus image classification apparatus based on a class weighting network according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments. It is to be understood that the embodiments described are only some embodiments of the invention, not all of them. All other embodiments obtained by a person skilled in the art from the described embodiments without inventive effort fall within the scope of protection of the invention.
Unless otherwise specifically stated, technical or scientific terms used herein have the ordinary meaning understood by those of ordinary skill in the art to which this invention belongs. The use of "including" or "comprising" and the like in this disclosure does not preclude the presence or addition of one or more other numbers, steps, actions, operations, components, elements and/or groups thereof. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number or order of the indicated features.
As shown in fig. 1 and 2, the present invention provides a fundus image classification method S100 based on a class weighting network, which specifically includes steps S110 to S140:
it should be noted that the classification method of this embodiment includes two stages, which are a model training stage and a model practical application stage, respectively, where the model training stage includes steps S110 to S120, and the model practical application stage includes steps S130 to S140, that is, a classification model for identifying the category of the fundus image is established first, and then the category of the fundus image is identified based on the classification model, and a diabetic retinopathy grade can be obtained based on the identification result of the fundus image, that is, screening of diabetic retinopathy is achieved.
And S110, reading a plurality of fundus picture data and labels thereof.
Specifically, in this embodiment both the fundus photo data and the corresponding labels need to be read. A list consisting of training data file paths and training label data can be passed in; the corresponding pictures are opened through the opencv-python (cv2) library, and the read data are passed to the subsequent tasks.
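For illustration, a minimal opencv-python reading sketch; the file paths are hypothetical and the BGR-to-RGB conversion is an editorial assumption for downstream models:

```python
import cv2

def read_fundus_data(entries):
    """entries: list of (file_path, label) pairs, e.g. [("train/img001.jpg", 2), ...]."""
    images, labels = [], []
    for path, label in entries:
        img = cv2.imread(path)                      # opens the picture as a BGR numpy array
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # convert to RGB for the subsequent task
        images.append(img)
        labels.append(label)
    return images, labels
```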
Note that the fundus photo labels of this embodiment comprise five disease severity grades of diabetic retinopathy, denoted DR0 to DR4. In the model training stage, data enhancement must be applied to the fundus photo data after reading.
As a further preferred scheme, the invention applies random vertical flipping, random horizontal flipping and random rotation to the fundus photos during the training stage; that is, data augmentation and enhancement preprocessing are performed on the fundus photos to form an enhanced dataset, and the fundus photo data are read from the enhanced dataset.
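A minimal torchvision sketch of this augmentation step; the rotation range is an assumption, as the text specifies only that the rotation is random:

```python
import torchvision.transforms as T

train_transform = T.Compose([
    T.ToPILImage(),
    T.RandomVerticalFlip(p=0.5),    # random up-down flip
    T.RandomHorizontalFlip(p=0.5),  # random left-right flip
    T.RandomRotation(degrees=30),   # random rotation (angle range assumed)
    T.ToTensor(),
])
```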
And S120, training and constructing a class weighting network model from the fundus photo data and their labels; the constructed model serves as the fundus photo classification model for intelligently recognizing the class of a fundus photo.
Specifically, the fundus photo data are input into the class weighting network for class recognition training; the recognition result and the true label are then used together as parameters of a loss function to compute the loss, and the model parameters are updated by back propagation; these operations are repeated until the loss stabilizes and no longer decreases, yielding the constructed class weighting network model.
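For illustration, a minimal PyTorch training-loop sketch of this train-until-plateau procedure; the optimizer choice, learning rate and epoch count are assumptions not stated in the text:

```python
import torch

def train(model, loader, epochs=50, lr=1e-4, device="cuda"):
    """Loop sketch: recognition training, cross entropy loss, back propagation."""
    model.to(device)
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(epochs):
        running = 0.0
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)  # recognition result vs. true label
            loss.backward()                          # back propagation
            optimizer.step()                         # update model parameters
            running += loss.item()
        print(f"epoch {epoch}: mean loss {running / len(loader):.4f}")
```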
And S130, reading fundus picture data to be recognized.
Specifically, the corresponding picture is opened through the opencv-python (cv2) library, and the read data are passed to the subsequent task.
In practical application, after the fundus photo data are read, no data enhancement is performed; the original image is input directly into the class weighting network model for class recognition.
And S140, inputting the fundus photo data to be recognized into the class weighting network model, and taking the class with the highest output probability as the classification result of the fundus photo.
It should be noted that in the practical application stage no loss is computed on the recognition result and no model parameters are updated.
It should be further noted that the class weighting network used in the present invention implements data balancing at the level of the model structure; its architecture is shown in fig. 2. By providing learning channels of different scales for data of different classes, it balances data of different difficulty levels and improves the accuracy of fundus photo classification; this differs from the usual data balancing performed during loss calculation.
Furthermore, in determining the weights of the different classes, the method calculates the class gradient norm, avoiding the considerable time and effort of manually tuning the weight parameters from past experience and repeated experiments during training.
Specifically, the network modeling process in step S120 is as follows:
First, preliminary feature extraction is performed on the fundus photo with the basic feature extractor, obtaining the preliminarily extracted feature map F_B.
It should be noted that the basic feature extractor can use, with modification, any currently mature pre-trained network, such as ResNet, InceptionNet, DenseNet, or the EfficientNet recently introduced by Google Inc.
Specifically, the modification comprises: deleting the final fully-connected layer of the original network, so that the network does not directly output a classification result but instead performs preliminary feature extraction on the fundus photo to produce the preliminarily extracted feature map.
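As an illustrative sketch of this modification (ResNet-50 is chosen here only as an example of a mature pre-trained network, and the weights API assumes torchvision ≥ 0.13); note that to obtain a spatial feature map F_B, the global pooling feeding the fully-connected layer is dropped along with it:

```python
import torch.nn as nn
import torchvision.models as models

def make_basic_feature_extractor() -> nn.Sequential:
    """Backbone with the final FC (and its pooling) removed: outputs a feature map F_B."""
    backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    return nn.Sequential(*list(backbone.children())[:-2])
```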
Second, features are extracted from the channel dimension of the preliminarily extracted feature map with the channel feature extractor, obtaining the channel feature map F_c.
Specifically, the channel weight distribution is obtained using global average pooling over the pixel dimension followed by conv_block; the structure of conv_block is:
CB(x) = Sigmoid(ReLU(BN(Conv(x))))
where CB denotes the conv_block layer; x denotes the feature map input to the conv_block layer; Conv denotes a 1×1 convolutional layer serving as a transition layer, whose number of output channels equals the number of channels of the input x; BN denotes Batch Normalization; and ReLU and Sigmoid denote the ReLU and Sigmoid activation functions, which introduce nonlinear factors into the network.
The structure of the channel feature extractor is:
F_c = CB(GAP_p(F_B)) ⊙ F_B
where F_B denotes the preliminarily extracted feature map obtained after the fundus photo P passes through the basic feature extractor; GAP_p denotes global average pooling over the pixel dimension, so that the model here ignores pixel-dimension features and concentrates on the influence of the different channel-dimension features; CB denotes the conv_block layer described above; and ⊙ denotes element-wise multiplication: the channel weight distribution produced by the CB layer is multiplied element-wise with F_B to obtain the channel feature map F_c.
Third, features are extracted from the pixel dimension of the preliminarily extracted feature map with the pixel feature extractor, obtaining the pixel feature map F_p.
Specifically, the pixel weight distribution is obtained using global average pooling over the channel dimension followed by conv_block; the structure of the pixel feature extractor is:
F_p = CB(GAP_c(F_B)) ⊙ F_B
where CB, F_B and ⊙ have the same meanings as in the relations above; GAP_c denotes global average pooling over the channel dimension, which averages all channels of the feature map and compresses them into one channel, so that the model ignores channel-dimension features and concentrates on the contributions of the different pixels; the pixel weight distribution produced by the CB layer is multiplied element-wise with F_B to obtain the pixel feature map F_p.
Fourth, features are extracted from the class dimension of the preliminarily extracted feature map with the class feature extractor, obtaining the class feature map F_T.
Specifically, the preliminarily extracted feature map F_B is expanded into K layers with 1×1 convolutional layers to obtain F_K, where K is given by:
K = Σ_{i=1}^{N} k_i
where N denotes the number of picture classes; k_i denotes the number of channels assigned to the i-th class, i.e. the class weight; and K is the total number of channels over all classes, i.e. the total weight. The model can extract features for the i-th class through k_i channels; the larger the value of k_i, the more angles from which the model can understand and extract features. Assigning different class weights to data of different difficulty balances the data at the model level.
The feature map F_K having K channels is then pooled by class channel group to obtain a feature map F_N that ignores channel-dimension features; F_N has N channel layers, each indicating the features of one class, according to:
F_K = Conv_K(F_B), F_N = GMP_K(F_K)
where F_B denotes the preliminarily extracted feature map, Conv_K denotes the K 1×1 convolutional layers, and GMP_K denotes performing one max pooling over each group of k_i channel layers, i.e. pooling by class channel group, finally yielding F_N with channel-dimension features ignored.
Global average pooling over the pixel dimension is then applied to F_N, further ignoring pixel-dimension features and focusing only on learning class-dimension features; the class weight distribution is obtained through conv_block and multiplied element-wise with F_N to obtain the preliminary class feature map F'_N:
F'_N = CB(GAP_p(F_N)) ⊙ F_N
where F'_N denotes the preliminary class feature map and the other symbols have the same meanings as in the relations above.
Finally, global average pooling over the channel dimension and conv_block are applied to F'_N to obtain the final class weight distribution:
F_T = CB(GAP_c(F'_N)) ⊙ F'_N
where F_T is the resulting class feature map and the other symbols have the same meanings as above.
Fifth, the feature maps of the different dimensions are fused with the feature converter to obtain the final feature map, which is converted into the class recognition result corresponding to the fundus photo label.
Specifically, the channel feature map F_c, the pixel feature map F_P and the class feature map F_T above are passed through a global average pooling layer and fully-connected layers to obtain the final output:
out = FC_N(ReLU(FC_H(GAP_p(F_c ⊕ F_P ⊕ F_T))))
where F_c, F_P, F_T, ReLU and GAP_p have the same meanings as in the relations above; ⊕ denotes averaging the elements at corresponding positions of the different matrices; FC_H and FC_N denote fully-connected layers, the former having half as many output channels as input channels and the latter having N output channels, N being the number of classes; out is the final output of the model, a batch of N-dimensional vectors whose element values indicate the likelihood of the class at the corresponding position, the index of the maximum value being selected as the model's final recognition result.
Furthermore, to avoid the considerable time and effort of manually tuning the weight parameters from past experience and repeated experiments during training, the invention provides a reference for setting the class weights by calculating the class gradient norm:
g_i = (1/n_i) · Σ_{t=1}^{n_i} ‖∂L_t/∂out_t‖
where g_i denotes the gradient norm of the i-th class; n_i denotes the number of samples of the i-th class; L_t denotes the cross entropy loss produced by sample t after passing through the model; and out_t denotes the direct output (before the softmax layer) of the model for sample t of class i;
letting p = softmax(out) and y be the one-hot vector representation of the sample, and since ∂L_t/∂out_t = p_t − y_t,
the calculation of the class gradient norm can be simplified to:
g_i = (1/n_i) · Σ_{t=1}^{n_i} ‖p_t − y_t‖
The relative sizes of the class weights can then be obtained from the gradient norms g_i of the different classes.
It should be noted that in this embodiment, when calculating the class gradient norm, the initial class weights of the class weighting network are all set to 5; the fundus photo data are then input into the class weighting network for class recognition training; the recognition result and the true label are used together as parameters of the loss function to compute the loss, and the model parameters are updated by back propagation; these operations are repeated until the loss stabilizes and no longer decreases, yielding a preliminarily converged class weighting network model, from which the class gradient norms of the corresponding fundus photo data can be calculated.
It should be further noted that after the class recognition results are obtained in the training stage, the model needs to be updated so that the best-performing model can be selected as the final fundus photo classification model; therefore, after the modeling, the class recognition step S120 further includes the following step:
and sixthly, comparing the recognition result of the fundus picture with the real label of the fundus picture to calculate cross entropy loss, and reversely transmitting and updating the model parameters. And (3) a cyclic loss calculation and parameter updating process is carried out until the loss tends to be stable and does not decrease, and a category weighting network model is obtained, wherein the loss is calculated by adopting a cross entropy loss function, and the specific relation is as follows:
wherein,x[class]representing input dataxThe category to which the real thing belongs to,x[j]representation of model to input dataxClass of belongingjThe result of the recognition of (1).
It should be noted that, in the training stage, each round of loss calculation is used to update the model, and in this embodiment, the model is saved once every certain number of rounds, and finally, the model with the best effect is selected as the final fundus photo classification model. In the actual application stage, the model only outputs the classification result of the fundus picture without updating the model, so that the loss does not need to be calculated. Of course, if the later input data is gradually increased and the model is desired to be further updated, training can be continued on the basis of the existing model, and the model with better effect is selected again.
In this embodiment, a fundus photo classification model is constructed through the above process to screen the degree of diabetic retinopathy corresponding to a fundus photo. The process S130 to S140 of recognizing the class of a fundus photo to be recognized with the formed model includes:
reading the fundus photo data to be screened; inputting the read fundus photo data into the constructed class weighting network model, recognizing the class of the fundus photo data to be recognized, and taking the class with the highest output probability as the classification result of the fundus photo.
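A minimal inference sketch of steps S130 to S140 follows; the function name and device handling are editorial:

```python
import torch

@torch.no_grad()
def classify_fundus(model, image_tensor, device="cuda"):
    """Inference sketch: no augmentation, no loss, no parameter update."""
    model.eval().to(device)
    logits = model(image_tensor.unsqueeze(0).to(device))  # batch of one
    probs = torch.softmax(logits, dim=1)                  # output probabilities
    return probs.argmax(dim=1).item()                     # class index, e.g. 0..4 for DR0..DR4
```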
As shown in fig. 3, another aspect of the present invention provides a fundus photo classification apparatus 200 based on a class weighting network, comprising: a first reading unit 210, a model forming unit 220, a second reading unit 230 and a class output unit 240; wherein,
the first reading unit 210 is configured to read a plurality of fundus photo data items and their labels; that is, in the model training stage, the fundus photo data and the corresponding labels need to be read by the first reading unit for the subsequent model training;
the model forming unit 220 is configured to input the fundus photo data and their labels into a class weighting network, and to train and construct a class weighting network model, which serves as the fundus photo classification model for recognizing the class of a fundus photo; the model forming unit 220 further comprises: a basic feature extractor 221, a channel feature extractor 222, a pixel feature extractor 223, a class feature extractor 224 and a feature converter 225;
the basic feature extractor 221 is configured to perform preliminary feature extraction on the fundus photo data to obtain a preliminarily extracted feature map;
the channel feature extractor 222 is configured to perform feature extraction on the preliminarily extracted feature map along the channel dimension to obtain a channel feature map;
the pixel feature extractor 223 is configured to perform feature extraction on the preliminarily extracted feature map along the pixel dimension to obtain a pixel feature map;
the class feature extractor 224 is configured to perform feature extraction on the preliminarily extracted feature map along the class dimension to obtain a class feature map;
the feature converter 225 is configured to fuse the channel feature map, the pixel feature map and the class feature map to obtain a target feature map, and to convert the target feature map into a class recognition result corresponding to the fundus photo label;
the second reading unit 230 is configured to read fundus photo data to be recognized;
and the class output unit 240 is configured to input the fundus photo data to be recognized into the class weighting network model constructed above, and to take the class with the highest output probability as the classification result of the fundus photo.
It should be noted that in the model training stage the first reading unit of this embodiment needs to read not only the fundus photo data but also the labels corresponding to the fundus photos; a list consisting of training data file paths and training label data can be passed in, the corresponding pictures are opened through the opencv-python (cv2) library, and the read data are passed to the subsequent tasks. In the practical application stage, the second reading unit reads only the fundus photo data, and the label class of the fundus photo data is recognized with the model formed by the preceding training.
It should be further noted that the fundus photo labels of this embodiment comprise five disease severity grades of diabetic retinopathy, denoted DR0 to DR4. In the model training stage, data enhancement must be applied to the fundus photo data after reading with the first reading unit. That is, the apparatus of this embodiment further includes an enhancement unit 250 (as shown in fig. 3); in the training stage, the enhancement unit applies random vertical flipping, random horizontal flipping and random rotation to the fundus photos, i.e., data augmentation and enhancement preprocessing are performed on the fundus photos to form an enhanced dataset, from which the fundus photo data are then read.
Furthermore, this embodiment uses the basic feature extractor to perform preliminary feature extraction on the fundus photo, obtaining the preliminarily extracted feature map F_B. The basic feature extractor can use, with modification, any currently mature pre-trained network, such as ResNet, InceptionNet, DenseNet, or the EfficientNet recently introduced by Google Inc.
Specifically, the modification comprises: deleting the final fully-connected layer of the original network, so that the network does not directly output a classification result but instead performs preliminary feature extraction on the fundus photo to produce the preliminarily extracted feature map.
Further, in this embodiment the channel feature extractor is used to extract channel-dimension features from the preliminarily extracted feature map to obtain the channel feature map F_c, as follows:
the channel weight distribution is obtained using global average pooling over the pixel dimension followed by conv_block, whose structure is:
CB(x) = Sigmoid(ReLU(BN(Conv(x))))
where CB denotes the conv_block layer; x denotes the feature map input to the conv_block layer; Conv denotes a 1×1 convolutional layer serving as a transition layer, whose number of output channels equals the number of channels of the input x; BN denotes Batch Normalization; and ReLU and Sigmoid denote the ReLU and Sigmoid activation functions, which introduce nonlinear factors into the network;
the structure of the channel feature extractor is:
F_c = CB(GAP_p(F_B)) ⊙ F_B
where F_B denotes the preliminarily extracted feature map obtained after the fundus photo P passes through the basic feature extractor; GAP_p denotes global average pooling over the pixel dimension, so that the model ignores pixel-dimension features and concentrates on the influence of the different channel-dimension features; CB denotes the conv_block layer above; and ⊙ denotes element-wise multiplication: the channel weight distribution produced by the CB layer is multiplied element-wise with F_B to obtain the channel feature map F_c.
Further, this embodiment uses the pixel feature extractor to extract pixel-dimension features from the preliminarily extracted feature map to obtain the pixel feature map F_p, as follows:
the pixel weight distribution is obtained using global average pooling over the channel dimension followed by conv_block; the structure of the pixel feature extractor is:
F_p = CB(GAP_c(F_B)) ⊙ F_B
where CB, F_B and ⊙ have the same meanings as above; GAP_c denotes global average pooling over the channel dimension, which averages all channels of the feature map and compresses them into one channel, so that the model ignores channel-dimension features and concentrates on the contributions of the different pixels; the pixel weight distribution produced by the CB layer is multiplied element-wise with F_B to obtain the pixel feature map F_p.
Furthermore, this embodiment uses the class feature extractor to extract class-dimension features from the preliminarily extracted feature map, obtaining the class feature map F_T.
Specifically, the preliminarily extracted feature map F_B is expanded into K layers with 1×1 convolutional layers to obtain F_K, where K is given by:
K = Σ_{i=1}^{N} k_i
where N denotes the number of picture classes; k_i denotes the number of channels assigned to the i-th class, i.e. the class weight; and K is the total number of channels over all classes, i.e. the total weight. The model can extract features for the i-th class through k_i channels; the larger the value of k_i, the more angles from which the model can understand and extract features. Assigning different class weights to data of different difficulty balances the data at the model level.
The feature map F_K having K channels is then pooled by class channel group to obtain a feature map F_N that ignores channel-dimension features; F_N has N channel layers, each indicating the features of one class, according to:
F_K = Conv_K(F_B), F_N = GMP_K(F_K)
where F_B denotes the preliminarily extracted feature map, Conv_K denotes the K 1×1 convolutional layers, and GMP_K denotes performing one max pooling over each group of k_i channel layers, i.e. pooling by class channel group, finally yielding F_N with channel-dimension features ignored.
Global average pooling over the pixel dimension is then applied to F_N, further ignoring pixel-dimension features and focusing only on learning class-dimension features; the class weight distribution is obtained through conv_block and multiplied element-wise with F_N to obtain the preliminary class feature map F'_N:
F'_N = CB(GAP_p(F_N)) ⊙ F_N
where F'_N denotes the preliminary class feature map and the other symbols have the same meanings as in the relations above.
Finally, global average pooling over the channel dimension and conv_block are applied to F'_N to obtain the final class weight distribution:
F_T = CB(GAP_c(F'_N)) ⊙ F'_N
where F_T is the resulting class feature map and the other symbols have the same meanings as above.
Furthermore, to avoid the considerable time and effort of manually tuning the weight parameters from past experience and repeated experiments during training, the invention provides a reference for setting the above class weights by calculating the class gradient norm; accordingly, the model forming unit of this embodiment further includes a weight setting module. The relation is:
g_i = (1/n_i) · Σ_{t=1}^{n_i} ‖∂L_t/∂out_t‖
where g_i denotes the gradient norm of the i-th class; n_i denotes the number of samples of the i-th class; L_t denotes the cross entropy loss produced by sample t after passing through the model; and out_t denotes the direct output (before the softmax layer) of the model for sample t of class i;
letting p = softmax(out) and y be the one-hot vector representation of the sample, and since ∂L_t/∂out_t = p_t − y_t,
the calculation of the class gradient norm can be simplified to:
g_i = (1/n_i) · Σ_{t=1}^{n_i} ‖p_t − y_t‖
It should be noted that in this embodiment, when calculating the class gradient norm, the initial class weights of the class weighting network are all set to 5; the fundus photo data are then input into the class weighting network for class recognition training; the recognition result and the true label are used together as parameters of the loss function to compute the loss, and the model parameters are updated by back propagation; these operations are repeated until the loss stabilizes and no longer decreases, yielding a preliminarily converged class weighting network model, from which the class gradient norms of the corresponding fundus photo data can be calculated.
Furthermore, in this embodiment the feature converter is used to fuse the feature maps of the different dimensions into a final feature map, which is then converted into the overall class recognition result for the fundus photo label, as follows:
the channel feature map F_c, the pixel feature map F_P and the class feature map F_T above are passed through a global average pooling layer and fully-connected layers to obtain the final output:
out = FC_N(ReLU(FC_H(GAP_p(F_c ⊕ F_P ⊕ F_T))))
where F_c, F_P, F_T, ReLU and GAP_p have the same meanings as above; ⊕ denotes averaging the elements at corresponding positions of the different matrices; FC_H and FC_N denote fully-connected layers, the former having half as many output channels as input channels and the latter having N output channels, N being the number of classes; out is the final output of the model, a batch of N-dimensional vectors whose element values indicate the likelihood of the class at the corresponding position, the index of the maximum value being selected as the model's final class recognition result.
It should be noted that after the fundus photo classification results are obtained in the training stage, the model needs to be updated so that the best-performing model can be selected as the final fundus photo classification model; accordingly, the model forming unit of this embodiment further includes an updating module, which updates the model as follows:
the recognition result of each fundus photo is compared with its true label to calculate the cross entropy loss, which is back-propagated to update the model parameters; the loss calculation and parameter updating are looped until the loss stabilizes and no longer decreases, yielding the class weighting network model. The loss is calculated with the cross entropy loss function:
loss(x, class) = −log(exp(x[class]) / Σ_j exp(x[j])) = −x[class] + log Σ_j exp(x[j])
where x[class] denotes the score for the class to which the input data x truly belongs, and x[j] denotes the model's recognition result for input data x with respect to class j.
It should be noted that in the training stage each round of loss calculation is used to update the model; in this embodiment the model is saved every certain number of rounds, and the best-performing model is finally selected as the final fundus photo classification model. In the practical application stage, the model only outputs the classification result of the fundus photo without being updated, so the loss need not be calculated. Of course, if input data gradually accumulates later and further updating of the model is desired, training can continue on the basis of the existing model and a better-performing model can be reselected.
The fundus photo classification method based on the class weighting network is further described below with reference to a specific embodiment:
example 1
The present example recognizes the class of fundus photos with diabetic retinopathy, comprising the steps of:
S1, calculating the class gradient norm to provide a reference for setting the class weights of the class weighting network, i.e. determining the class gradient norm before using the network;
S2, reading fundus photo data and labels, and performing data enhancement on the fundus photo data;
S3, performing preliminary feature extraction on the fundus photo with the basic feature extractor to obtain a preliminarily extracted feature map;
S4, extracting channel-dimension features from the preliminarily extracted feature map with the channel feature extractor to obtain a channel feature map;
S5, extracting pixel-dimension features from the preliminarily extracted feature map with the pixel feature extractor to obtain a pixel feature map;
S6, extracting class-dimension features from the preliminarily extracted feature map with the class feature extractor to obtain a class feature map;
S7, fusing the feature maps of the different dimensions with the feature converter to obtain a final feature map, and converting it into the overall recognition result for the fundus photo label;
S8, comparing the recognition result of the fundus photo with its true label to calculate the cross entropy loss, and back-propagating to update the model parameters; looping the loss calculation and parameter updating until the loss stabilizes and no longer decreases, obtaining the class weighting network model;
S9, reading the fundus photo data to be screened, inputting it into the class weighting network model, and taking the class with the highest output probability as the classification result of the fundus photo.
The fundus photo classification method and device based on a category weighting network provided by the invention have the following beneficial effects:
First, the classification method and apparatus based on the class weighting network provided by the present invention balance data of different difficulty levels by allocating learning channels of different scales to data of different classes. Unlike conventional approaches, which balance data within the loss calculation, this realizes data balancing at the level of the model structure.
Secondly, on the basis of determining the different type weights, the invention provides a simple and effective scheme: by calculating the type gradient norm, it avoids the great expenditure of time and effort required to tune the weight parameters manually through researchers' past experience and repeated experiments. Such costs grow exponentially as the number of categories increases and the data scale grows.
Thirdly, the classification method and device based on the class weighting network provided by the invention extract features of the fundus picture data from three different dimensions, improving the performance and generalization capability of the model's feature extraction and reducing the influence of non-uniform data formats.
It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.
Claims (8)
1. A fundus photo classification method based on a class weighting network is characterized by comprising the following steps:
reading a plurality of fundus picture data and tags thereof;
inputting the fundus photo data and the labels thereof into a class weighting network to train and construct a class weighting network model, wherein the training comprises the following steps:
performing primary feature extraction on the fundus picture data to obtain a primary extracted feature map;
performing feature extraction on the preliminarily extracted feature map in the channel dimension, the pixel dimension and the category dimension respectively to obtain a channel feature map, a pixel feature map and a category feature map;
fusing the channel feature map, the pixel feature map and the category feature map to obtain a target feature map;
converting the target feature map into the type recognition result corresponding to the fundus photo label;
reading fundus picture data to be identified;
inputting the fundus picture data to be recognized into the category weighting network model, and taking the category with the maximum output probability of the category weighting network model as a category result of the fundus picture; wherein,
performing feature extraction on the preliminarily extracted feature map in the category dimension to obtain a category feature map, which comprises the following steps:
expanding the preliminarily extracted feature map F_B into K layers by using 1 × 1 convolution layers to obtain F_K, where K satisfies the following relation:

K = Σ_{i=1}^{N} k_i

wherein N represents the number of picture types, k_i represents the number of channels assigned to the i-th class, and K is the total number of channels over all types;
performing type-channel pooling on the feature map F_K having K channels to obtain a feature map F_N that ignores the channel-dimension features, the feature map F_N having N layer channels in total, each layer channel indicating the features of one type, with the specific relation as follows:

F_N = GMP_K(Conv_K(F_B))

wherein F_B represents the preliminarily extracted feature map, Conv_K represents the K 1 × 1 convolution layers, and GMP_K represents performing one maximum pooling over each group of k_i layer channels;
performing global average pooling on F_N in the pixel dimension to obtain a feature map that ignores the pixel dimension, obtaining the type weight distribution through conv_block, and dot-multiplying it with F_N to obtain a preliminary type feature map F'_T; the specific relation is as follows:

F'_T = CB(GAP_P(F_N)) ⊙ F_N

performing global average pooling in the channel dimension and conv_block on F'_T to obtain the final type weight distribution, with the specific relation as follows:

F_T = CB(GAMP_C(F'_T)) ⊙ F_B

wherein F_T is the type feature map, GAMP_C indicates global average pooling over the channel dimension, CB represents a conv_block layer, and ⊙ represents the matrix dot product: the type weight distribution obtained after the CB layer is dot-multiplied with the preliminarily extracted feature map F_B;
the type weights are obtained by calculating the type gradient norm, with the specific relation as follows:

g_i = (1/n_i) Σ_{t=1}^{n_i} ||∂L_t/∂out_t||

wherein g_i represents the type gradient norm of the i-th class, n_i represents the number of samples of the i-th class, L_t represents the cross entropy loss produced by sample t after passing through the model, and out_t represents the direct output of sample t of the i-th class after model calculation;
letting p = softmax(out) and letting y denote the one-hot vector representation of the sample, the calculation of the type gradient norm simplifies to the following relation:

g_i = (1/n_i) Σ_{t=1}^{n_i} ||p_t − y_t||
and obtaining the size ratio of the type weights according to the gradient norms g_i of the different types.
2. The method according to claim 1, wherein performing preliminary feature extraction on the fundus picture data to obtain a preliminarily extracted feature map comprises:
performing primary feature extraction on the fundus picture by using a modified pre-training network to obtain a primary extracted feature map;
wherein the modified pre-training network does not include the last fully connected layer of the pre-training network.
3. The method according to claim 2, wherein performing feature extraction on the preliminarily extracted feature map in the channel dimension to obtain a channel feature map comprises:
using global average pooling in the pixel dimension to obtain features that ignore the pixel dimension, and obtaining the channel weight distribution through conv_block, wherein
the conv_block layer CB consists of a 1 × 1 convolution layer Conv serving as a transition layer, whose number of output channels equals the number of channels of the input feature map x, Batch Normalization BN, and the ReLU and Sigmoid activation functions, which introduce nonlinear factors into the network;
and performing feature extraction on the preliminarily extracted feature map in the channel dimension by using the channel feature extractor to obtain the channel feature map, wherein the structure of the channel feature extractor satisfies the following relation:

F_c = CB(GAP_P(F_B)) ⊙ F_B

wherein F_c represents the channel feature map; F_B represents the preliminarily extracted feature map; GAP_P represents global average pooling in the pixel dimension; CB represents a conv_block layer; and ⊙ represents the matrix dot product, i.e., the channel weight distribution obtained after the CB layer is dot-multiplied with the preliminarily extracted feature map F_B.
4. The method according to claim 3, wherein performing feature extraction on the preliminarily extracted feature map in the pixel dimension to obtain a pixel feature map comprises:
using global average pooling in the channel dimension to obtain features that ignore the channel dimension, and obtaining the pixel weight distribution through conv_block;
and performing feature extraction on the preliminarily extracted feature map in the pixel dimension by using the pixel feature extractor to obtain the pixel feature map, wherein the structure of the pixel feature extractor satisfies the following relation:

F_p = CB(GAP_C(F_B)) ⊙ F_B

wherein F_p represents the pixel feature map; GAP_C represents global average pooling in the channel dimension; F_B represents the preliminarily extracted feature map; CB represents a conv_block layer; and ⊙ represents the matrix dot product, i.e., the pixel weight distribution obtained after the CB layer is dot-multiplied with the preliminarily extracted feature map F_B.
5. The method according to claim 4, wherein fusing the channel feature map, the pixel feature map and the category feature map to obtain a target feature map, and converting the target feature map into the type recognition result corresponding to the fundus picture label, comprises:
obtaining the final output from the target feature map through a global average pooling layer and fully connected layers;
the specific relation is as follows:

out = FC_N(FC_H(GAP_P(F_c ⊕ F_P ⊕ F_T)))

wherein out is the final output of the model, a batch of N-dimensional vectors whose element values represent the likelihood of the corresponding recognition type, the position index of the maximum value being selected as the final type recognition result of the model; ⊕ indicates averaging the elements at corresponding positions between different matrices; FC_H and FC_N represent fully connected layers, the number of output channels of FC_H being half its number of input channels and the number of output channels of FC_N being the classification number N.
6. The method according to claim 1, further comprising, before reading the plurality of fundus picture data and the tags thereof: performing at least one of random up-down flipping, random left-right flipping and random rotation on the fundus picture to obtain an enhanced fundus picture.
7. The method of claim 1, wherein after inputting the fundus picture data and the labels thereof into the class weighting network and training and constructing the class weighting network model, the method further comprises:
comparing the type recognition result of the fundus picture with its real label to calculate the cross entropy loss, and updating the parameters of the category weighting network model through back-propagation;
the cross entropy loss satisfies the following relation:

loss(x, class) = −log( exp(x[class]) / Σ_j exp(x[j]) ) = −x[class] + log Σ_j exp(x[j])

wherein x[class] represents the category to which the input data x really belongs, and x[j] represents the model's recognition result for the input data x on category j.
8. A fundus picture classification apparatus based on a class weighting network, comprising: a first reading unit, a second reading unit, a model forming unit and a category output unit; wherein,
the first reading unit is used for reading a plurality of fundus picture data and labels thereof;
the model forming unit is used for inputting the fundus photo data and the labels thereof into a category weighting network, training and constructing a category weighting network model; wherein the model forming unit further comprises: a basic feature extractor, a channel feature extractor, a pixel feature extractor, a category feature extractor and a feature converter;
the basic feature extractor is used for performing preliminary feature extraction on the fundus picture data to obtain a preliminarily extracted feature map;
the channel feature extractor is used for performing feature extraction on the preliminarily extracted feature map in the channel dimension to obtain a channel feature map;
the pixel feature extractor is used for performing feature extraction on the preliminarily extracted feature map in the pixel dimension to obtain a pixel feature map;
the category feature extractor is configured to perform feature extraction on the preliminarily extracted feature map in the category dimension to obtain a category feature map, which includes:
expanding the preliminarily extracted feature map F_B into K layers by using 1 × 1 convolution layers to obtain F_K, where K satisfies the following relation:

K = Σ_{i=1}^{N} k_i

wherein N represents the number of picture types, k_i represents the number of channels assigned to the i-th class, and K is the total number of channels over all types;
performing type-channel pooling on the feature map F_K having K channels to obtain a feature map F_N that ignores the channel-dimension features, the feature map F_N having N layer channels in total, each layer channel indicating the features of one type, with the specific relation as follows:

F_N = GMP_K(Conv_K(F_B))

wherein F_B represents the preliminarily extracted feature map, Conv_K represents the K 1 × 1 convolution layers, and GMP_K represents performing one maximum pooling over each group of k_i layer channels;
performing global average pooling on F_N in the pixel dimension to obtain a feature map that ignores the pixel dimension, obtaining the type weight distribution through conv_block, and dot-multiplying it with F_N to obtain a preliminary type feature map F'_T; the specific relation is as follows:

F'_T = CB(GAP_P(F_N)) ⊙ F_N

performing global average pooling in the channel dimension and conv_block on F'_T to obtain the final type weight distribution, with the specific relation as follows:

F_T = CB(GAMP_C(F'_T)) ⊙ F_B

wherein F_T is the type feature map, GAMP_C indicates global average pooling over the channel dimension, CB represents a conv_block layer, and ⊙ represents the matrix dot product: the type weight distribution obtained after the CB layer is dot-multiplied with the preliminarily extracted feature map F_B;
the type weights are obtained by calculating the type gradient norm, with the specific relation as follows:

g_i = (1/n_i) Σ_{t=1}^{n_i} ||∂L_t/∂out_t||

wherein g_i represents the type gradient norm of the i-th class, n_i represents the number of samples of the i-th class, L_t represents the cross entropy loss produced by sample t after passing through the model, and out_t represents the direct output of sample t of the i-th class after model calculation;
letting p = softmax(out) and letting y denote the one-hot vector representation of the sample, the calculation of the type gradient norm simplifies to the following relation:

g_i = (1/n_i) Σ_{t=1}^{n_i} ||p_t − y_t||
and obtaining the size ratio of the type weights according to the gradient norms g_i of the different types;
the feature converter is used for fusing the channel feature map, the pixel feature map and the category feature map to obtain a target feature map, and converting the target feature map into the type recognition result corresponding to the fundus photo label;
the second reading unit is used for reading the fundus picture data to be recognized;
and the category output unit is used for inputting the fundus picture data to be recognized into the category weighting network model, and taking the category with the maximum model output probability as the category result of the fundus picture.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211381516.6A CN115424084B (en) | 2022-11-07 | 2022-11-07 | Fundus photo classification method and device based on class weighting network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115424084A (en) | 2022-12-02
CN115424084B (en) | 2023-03-24
Family
ID=84207786
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211381516.6A Active CN115424084B (en) | 2022-11-07 | 2022-11-07 | Fundus photo classification method and device based on class weighting network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115424084B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112907604A (en) * | 2021-03-16 | 2021-06-04 | 南通大学 | Self-adaptive super-pixel FCM (pixel-frequency modulation) method for fundus velveteen speckle image segmentation |
CN114648806A (en) * | 2022-05-19 | 2022-06-21 | 山东科技大学 | Multi-mechanism self-adaptive fundus image segmentation method |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110335269A (en) * | 2018-05-16 | 2019-10-15 | 腾讯医疗健康(深圳)有限公司 | The classification recognition methods of eye fundus image and device |
CN109493954B (en) * | 2018-12-20 | 2021-10-19 | 广东工业大学 | SD-OCT image retinopathy detection system based on category distinguishing and positioning |
AU2020101450A4 (en) * | 2020-07-23 | 2020-08-27 | .B.M.S, Rani Ms | Retinal vascular disease detection from retinal fundus images using machine learning |
AU2020103938A4 (en) * | 2020-12-07 | 2021-02-11 | Capital Medical University | A classification method of diabetic retinopathy grade based on deep learning |
CN118537584A (en) * | 2020-12-11 | 2024-08-23 | 北京航空航天大学 | Fundus photo classification system and storage medium for chronic kidney disease detection |
CN112560948B (en) * | 2020-12-15 | 2024-04-26 | 中南大学 | Fundus image classification method and imaging method under data deviation |
CN112869704B (en) * | 2021-02-02 | 2022-06-17 | 苏州大学 | Diabetic retinopathy area automatic segmentation method based on circulation self-adaptive multi-target weighting network |
CN113011362A (en) * | 2021-03-29 | 2021-06-22 | 吉林大学 | Fine-grained fundus image grading algorithm based on bilinear pooling and attention mechanism |
CN113537395B (en) * | 2021-08-09 | 2022-07-08 | 同济大学 | Diabetic retinopathy image identification method based on fundus images |
CN113768460B (en) * | 2021-09-10 | 2023-11-14 | 北京鹰瞳科技发展股份有限公司 | Fundus image analysis system, fundus image analysis method and electronic equipment |
CN114019467B (en) * | 2021-10-25 | 2024-07-09 | 哈尔滨工程大学 | Radar signal identification and positioning method based on MobileNet model transfer learning |
CN114494195B (en) * | 2022-01-26 | 2024-06-04 | 南通大学 | Small sample attention mechanism parallel twin method for fundus image classification |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108021916B (en) | Deep learning diabetic retinopathy sorting technique based on attention mechanism | |
CN111815574B (en) | Fundus retina blood vessel image segmentation method based on rough set neural network | |
CN113011485B (en) | Multi-mode multi-disease long-tail distribution ophthalmic disease classification model training method and device | |
CN107423571A (en) | Diabetic retinopathy identifying system based on eye fundus image | |
CN110751637A (en) | Diabetic retinopathy detection system, method, equipment and training system | |
Peng et al. | Automatic staging for retinopathy of prematurity with deep feature fusion and ordinal classification strategy | |
CN108537282A (en) | A kind of diabetic retinopathy stage division using extra lightweight SqueezeNet networks | |
CN111938569A (en) | Eye ground multi-disease classification detection method based on deep learning | |
WO2022166399A1 (en) | Fundus oculi disease auxiliary diagnosis method and apparatus based on bimodal deep learning | |
Bhati et al. | Discriminative kernel convolution network for multi-label ophthalmic disease detection on imbalanced fundus image dataset | |
CN111080643A (en) | Method and device for classifying diabetes and related diseases based on fundus images | |
CN109464120A (en) | A kind of screening for diabetic retinopathy method, apparatus and storage medium | |
CN112101424A (en) | Generation method, identification device and equipment of retinopathy identification model | |
Liu | Construction and verification of color fundus image retinal vessels segmentation algorithm under BP neural network | |
CN112869697A (en) | Judgment method for simultaneously identifying stage and pathological change characteristics of diabetic retinopathy | |
CN113887662A (en) | Image classification method, device, equipment and medium based on residual error network | |
CN110473176B (en) | Image processing method and device, fundus image processing method and electronic equipment | |
CN114372985B (en) | Diabetic retinopathy focus segmentation method and system adapting to multi-center images | |
Yang et al. | Multi-classification of fundus diseases based on DSRA-CNN | |
Dong et al. | Supervised learning-based retinal vascular segmentation by m-unet full convolutional neural network | |
Feng et al. | Grading of diabetic retinopathy images based on graph neural network | |
Ali et al. | Cataract disease detection used deep convolution neural network | |
CN115424084B (en) | Fundus photo classification method and device based on class weighting network | |
Kolte et al. | Advancing Diabetic Retinopathy Detection: Leveraging Deep Learning for Accurate Classification and Early Diagnosis | |
Xiao et al. | SE-MIDNet based on deep learning for diabetic retinopathy classification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |