CN115424084B - Fundus photo classification method and device based on class weighting network - Google Patents

Fundus photo classification method and device based on class weighting network

Info

Publication number
CN115424084B
Authority
CN
China
Prior art keywords
channel
feature map
category
feature
fundus
Prior art date
Legal status
Active
Application number
CN202211381516.6A
Other languages
Chinese (zh)
Other versions
CN115424084A (en)
Inventor
沈婷
韩志科
洪朝阳
郑青青
杨斌
肖涵瑜
Current Assignee
Zhejiang Provincial Peoples Hospital
Zhejiang University City College ZUCC
Original Assignee
Zhejiang Provincial Peoples Hospital
Zhejiang University City College ZUCC
Priority date
Filing date
Publication date
Application filed by Zhejiang Provincial Peoples Hospital, Zhejiang University City College ZUCC
Priority to CN202211381516.6A
Publication of CN115424084A
Application granted
Publication of CN115424084B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03 Recognition of patterns in medical or anatomical images

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a fundus photo classification method and device based on a class weighting network, belonging to the technical fields of image classification and ophthalmic medicine. The classification method comprises the following steps: reading a plurality of fundus photo data items and their labels; inputting the fundus photo data and labels into a class weighting network to train and construct a class weighting network model; reading fundus photo data to be recognized; and inputting the fundus photo data to be recognized into the class weighting network model, taking the class with the highest model output probability as the classification result of the fundus photo. The class weighting network model assigns different class weights to different classes of data, achieving a balance between easy and hard classes. The type gradient norm is calculated to provide a reference for the class weights, which spares researchers the large amount of time and effort otherwise spent manually tuning the weights through repeated experiments during the model training stage.

Description

Fundus photo classification method and device based on class weighting network
Technical Field
The invention belongs to the technical fields of image classification and ophthalmic medicine, and particularly relates to a fundus photo classification method and a fundus photo classification device based on a class weighting network.
Background
Vision is the sense through which people learn about the world and acquire most of their knowledge. With the development of society, however, the strain that work and daily life place on the eyes keeps increasing, and common systemic diseases also damage ocular tissue indirectly, so eye health has become a challenge that cannot be ignored. One of the most common diseases causing ocular complications is Diabetes Mellitus (DM), one of the most serious and widespread chronic diseases of our age. According to recent statistics from the International Diabetes Federation (IDF), the prevalence of diabetes in the worldwide population aged 20-79 was estimated at 10.5% (536.6 million people) in 2021 and is expected to rise to 12.2% (783.2 million people) by 2045. The ocular complication caused by diabetes is called Diabetic Retinopathy (DR), which is the leading cause of blindness in adults; its main lesions include microaneurysms (MAs), "dot" or "blot" hemorrhages (HEs), hard exudates (EXs), cotton wool spots (CWS) and neovascularization (NV). Among Chinese diabetic patients the prevalence of DR is 18.45%, meaning that almost one in every five diabetic patients is at risk of blindness. The task of preventing and treating blindness, in China and worldwide, is therefore very demanding.
Computer-aided diagnosis based on machine learning and artificial intelligence has developed to the point where a large number of researchers are engaged in related work across many medical fields, helping doctors diagnose patients' conditions more accurately and efficiently. Applying this technology to DR screening is one of the important directions, for example: (1) grading the severity of diabetic retinopathy; (2) segmenting DR lesions and related structures; (3) interpretability studies of DR judgments. Accelerating the deep application of machine learning in ophthalmology may thoroughly change the existing disease diagnosis system: computer-aided diagnosis can effectively relieve the workload of ophthalmologists, improve the efficiency of clinical work, facilitate large-scale population screening, and provide a new way to alleviate the shortage of medical resources.
However, in classifying fundus retinal images and judging the degree of disease, many problems remain. First, because DR datasets carry the particularities of medical data, both their quantity and quality pose great challenges. Traditional deep learning algorithms usually require large amounts of data, but publicly available DR datasets such as DDR, APTOS and Messidor-2 are far from sufficient in scale. At the same time, different datasets use inconsistent, non-uniform grading standards; the more detailed grading of DR generally uses five grades: normal, mild, moderate, severe and proliferative DR. Even under the same standard, doctors from different institutions and of different professional levels may grade the same case differently, so different datasets cannot simply be combined. Within a single dataset the fundus photographs also come from widely varied sources and differ in color, sharpness, contrast, brightness, size and eyeball completeness, so their quality is uneven. Second, DR datasets suffer from severe data imbalance, both in quantity and in difficulty. Normally, healthy fundus images account for about half of the total data volume or even more, while the smallest class may amount to only 1/20 to 1/30 of the total. This imbalance causes a machine learning model to neglect feature learning for the minority classes during training and to pay more attention to the majority classes, which ultimately leads to a model that performs poorly even though its overall accuracy appears high. In DR datasets, some classes are both scarce and hard to distinguish, such as DR1 and DR3, whereas DR4 is scarce but relatively easy to distinguish.
Given these problems with DR datasets, machine learning is still not accurate enough on the DR classification task, and there is ample room for improvement. The invention therefore provides a fundus photo classification method and device based on a class weighting network, which assist in screening for diabetic retinopathy and improve DR classification accuracy.
Disclosure of Invention
The invention aims to solve at least one of the technical problems in the prior art by providing a fundus photo classification method based on a class weighting network and a fundus photo classification device based on a class weighting network.
In one aspect of the present invention, there is provided a fundus picture classification method based on a class weighting network, the method including the steps of:
reading a plurality of fundus picture data and tags thereof;
inputting the fundus photo data and the labels thereof into a class weighting network, training and constructing a class weighting network model, wherein the class weighting network model comprises the following steps:
performing primary feature extraction on the fundus picture data to obtain a primary extracted feature map;
performing feature extraction on the preliminarily extracted feature map along the channel dimension, the pixel dimension and the category dimension respectively, to obtain a channel feature map, a pixel feature map and a category feature map;
fusing the channel feature map, the pixel feature map and the category feature map to obtain a target feature map;
converting the target feature map into a category identification result corresponding to the fundus photo label;
reading fundus picture data to be recognized;
and inputting the fundus picture data to be recognized into the category weighting network model, and taking the category with the maximum model output probability as the category result of the fundus picture.
Optionally, the preliminary feature extraction is performed on the fundus image data to obtain a preliminary extracted feature map, including:
performing primary feature extraction on the fundus picture by using a modified pre-training network to obtain a primary extracted feature map;
wherein the modified pre-training network does not include the last fully-connected layer of the pre-training network.
Optionally, the performing feature extraction on the preliminary extracted feature map by using a channel dimension to obtain a channel feature map includes:
using global average pooling over the pixel dimension to obtain features that ignore the pixel dimension, and obtaining a channel weight distribution through conv_block, wherein
the structure of conv_block is given by the relation:

CB(x) = Sigmoid(ReLU(BN(Conv(x))))

wherein CB denotes a conv_block layer, x denotes the feature map input to the conv_block layer, Conv denotes a 1×1 convolutional layer serving as a transition layer whose number of output channels equals the number of channels of the input x, BN denotes Batch Normalization, and ReLU and Sigmoid denote the ReLU and Sigmoid activation functions respectively, which introduce nonlinear factors into the network;
and performing feature extraction on the preliminarily extracted feature map along the channel dimension with a channel feature extractor to obtain a channel feature map, wherein
the structure of the channel feature extractor is given by the relation:

F_c = CB(GAP_p(F_B)) ⊙ F_B

wherein F_c denotes the channel feature map; F_B denotes the preliminarily extracted feature map; GAP_p denotes global average pooling over the pixel dimension; CB denotes a conv_block layer; and ⊙ denotes element-wise (dot) multiplication of matrices, i.e. the channel weight distribution obtained after the CB layer is multiplied element-wise with the preliminarily extracted feature map F_B.
Optionally, the performing feature extraction on the preliminary extracted feature map by using a pixel dimension to obtain a pixel feature map includes:
using global average pooling over the channel dimension to obtain features that ignore the channel dimension, and obtaining a pixel weight distribution through conv_block;
and performing feature extraction on the preliminarily extracted feature map along the pixel dimension with a pixel feature extractor to obtain a pixel feature map, wherein
the structure of the pixel feature extractor is given by the relation:

F_p = CB(GAP_c(F_B)) ⊙ F_B

wherein F_p denotes the pixel feature map; GAP_c denotes global average pooling over the channel dimension; F_B denotes the preliminarily extracted feature map; CB denotes a conv_block layer; and ⊙ denotes element-wise (dot) multiplication of matrices, i.e. the pixel weight distribution obtained after the CB layer is multiplied element-wise with the preliminarily extracted feature map F_B.
Optionally, the performing feature extraction on the preliminary extracted feature map by using a category dimension to obtain a category feature map includes:
expanding the preliminarily extracted feature map F_B into K layers using 1×1 convolutional layers to obtain F_K, where K is given by:

K = k_1 + k_2 + ... + k_N = Σ_{i=1}^{N} k_i

wherein N denotes the number of picture types, k_i denotes the number of channels assigned to the i-th class, and K is the total number of channels over all types;
pooling the feature map F_K, which has K channels, by type channel to obtain a feature map F_N that ignores channel-dimension features; the feature map F_N has N channels in total, each channel indicating the features of one type, with the relation:

F_N = GMP_K(Conv_K(F_B))

wherein F_B denotes the preliminarily extracted feature map, Conv_K denotes the K 1×1 convolutional layers, and GMP_K denotes one maximum pooling performed over each group of k_i layer channels;
performing global average pooling over the pixel dimension on F_N to obtain a feature map that ignores the pixel dimension, obtaining a type weight distribution through conv_block, and multiplying it element-wise with F_N to obtain a preliminary type feature map F_N'; the relation is:

F_N' = CB(GAP_p(F_N)) ⊙ F_N

and performing global average pooling over the channel dimension and conv_block on F_N' to obtain the final type weight distribution, with the relation:

F_T = CB(GAP_C(F_N')) ⊙ F_B

wherein F_T is the type feature map, GAP_C denotes global average pooling over the channel dimension, CB denotes a conv_block layer, and ⊙ denotes element-wise (dot) multiplication of matrices, i.e. the weight distribution obtained after the CB layer is multiplied element-wise with the preliminarily extracted feature map F_B.
Optionally, the type weights are obtained by calculating a type gradient norm, with the relation:

g_i = (1/n_i) Σ_{t=1}^{n_i} ||∂L_t/∂out_t||

wherein g_i denotes the gradient norm of the i-th type, n_i denotes the number of samples of the i-th class, L_t denotes the cross-entropy loss produced by sample t after passing through the model, and out_t denotes the direct output of sample t of the i-th class after model calculation;
letting p = softmax(out) and y denote the one-hot vector representation of the sample, and since ∂L_t/∂out_t = p_t − y_t, the calculation of the type gradient norm simplifies to:

g_i = (1/n_i) Σ_{t=1}^{n_i} ||p_t − y_t||

and the size ratio of the type weights is obtained from the gradient norms g_i of the different types.
Optionally, the fusing the channel feature map, the pixel feature map and the category feature map to obtain a target feature map, and converting the target feature map into a type identification result corresponding to the fundus picture label, includes:
obtaining the final output by passing the target feature map through a global average pooling layer and fully connected layers, with the relation:

out = FC_N(FC_H(GAP_P(avg(ReLU(F_C), ReLU(F_P), ReLU(F_T)))))

wherein out is the final output value of the model, a batch of N-dimensional vectors whose element values represent the likelihood of the corresponding type recognized by the model; the position index of the maximum value is chosen as the final type recognition result of the model; avg denotes averaging the elements at corresponding positions of the different matrices; FC_H and FC_N denote fully connected layers, the number of output channels of FC_H being half its number of input channels and the number of output channels of FC_N being the number of classes N.
Optionally, before reading the plurality of fundus photo data and their labels, the method further includes: performing at least one of random vertical flipping, random horizontal flipping and random rotation on the fundus photos to obtain enhanced fundus photos.
Optionally, after inputting the fundus image data and the label thereof into the class weighting network, training and constructing a class weighting network model, the method further includes:
comparing the type recognition result of the fundus photo with its true label to calculate the cross-entropy loss, and updating the model parameters by back propagation;
the cross-entropy loss is given by the relation:

loss(x, class) = −x[class] + log( Σ_j exp(x[j]) )

wherein x[class] denotes the model output for the class to which the input data x truly belongs, and x[j] denotes the model's recognition result for the input data x with respect to class j.
In another aspect of the present invention, there is provided a fundus photo classification apparatus based on a class weighting network, including: a first reading unit, a second reading unit, a model forming unit and a category output unit; wherein,
the first reading unit is used for reading a plurality of fundus picture data and labels thereof;
the model forming unit is used for inputting the fundus photo data and the labels thereof into a category weighting network, training and constructing a category weighting network model; wherein the model forming unit further comprises: a basic feature extractor, a channel feature extractor, a pixel feature extractor, a category feature extractor and a feature converter;
the basic feature extractor is used for performing preliminary feature extraction on the fundus picture data to obtain a preliminary extraction feature map;
the channel feature extractor is used for performing feature extraction on the preliminarily extracted feature map by channel dimensions to obtain a channel feature map;
the pixel feature extractor is used for performing feature extraction on the preliminarily extracted feature map in pixel dimensions to obtain a pixel feature map;
the category feature extractor is used for extracting features of the preliminarily extracted feature map according to category dimensions to obtain a category feature map;
the characteristic converter is used for fusing the channel characteristic diagram, the pixel characteristic diagram and the category characteristic diagram to obtain a target characteristic diagram, and converting the target characteristic diagram into a category identification result corresponding to the fundus photo label;
the second reading unit is used for reading fundus photo data to be recognized;
and the class output unit is used for inputting the fundus picture data to be recognized into the class weighting network model and taking the class with the maximum model output probability as a class result of the fundus picture.
The invention provides a fundus photo classification method and device based on a class weighting network. Furthermore, for determining the weights of the different classes, the invention provides a reference by calculating the type gradient norm, avoiding the large amount of time and effort that researchers would otherwise spend in the model training stage manually adjusting the weight parameters based on past experience and repeated experiments.
Drawings
FIG. 1 is a block flow diagram of a method for classifying fundus images based on a class-weighted network according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a class weighting network according to an embodiment of the present invention;
fig. 3 is a block diagram showing a configuration of a fundus image classification apparatus based on a class weighting network according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the invention without any inventive step, are within the scope of protection of the invention.
Unless otherwise specifically stated, technical or scientific terms used herein shall have the ordinary meaning understood by those of ordinary skill in the art to which this invention belongs. The use of "including" or "comprising" and the like in this disclosure specifies the presence of the stated numbers, steps, actions, operations, components and/or elements, but does not preclude the presence or addition of one or more other numbers, steps, actions, operations, components, elements and/or groups thereof. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number or order of the indicated features.
As shown in fig. 1 and 2, the present invention provides a fundus image classification method S100 based on a class weighting network, which specifically includes steps S110 to S140:
it should be noted that the classification method of this embodiment includes two stages, which are a model training stage and a model practical application stage, respectively, where the model training stage includes steps S110 to S120, and the model practical application stage includes steps S130 to S140, that is, a classification model for identifying the category of the fundus image is established first, and then the category of the fundus image is identified based on the classification model, and a diabetic retinopathy grade can be obtained based on the identification result of the fundus image, that is, screening of diabetic retinopathy is achieved.
And S110, reading a plurality of fundus picture data and labels thereof.
Specifically, in this embodiment, both the fundus photo data and the labels corresponding to the fundus photos need to be read. A list composed of training data file paths and training data label data can be passed in, the corresponding pictures are opened through the opencv-python (cv2) library, and the read data are passed on to subsequent tasks.
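For illustration only, the following is a minimal sketch of this reading step using the opencv-python (cv2) library named above; the function name, the target image size and the skipping of unreadable files are assumptions not specified in the patent.

```python
import cv2

def read_fundus_dataset(file_paths, labels, size=(512, 512)):
    """Read fundus photographs from a list of file paths and pair them with their labels.

    file_paths: training data file paths
    labels:     integer labels, e.g. DR0-DR4 encoded as 0-4
    size:       assumed target resolution (not fixed by the patent)
    """
    samples = []
    for path, label in zip(file_paths, labels):
        img = cv2.imread(path)                      # BGR image as a NumPy array
        if img is None:                             # skip unreadable files
            continue
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # convert to RGB for later processing
        img = cv2.resize(img, size)
        samples.append((img, label))
    return samples
```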
Note that the fundus photo labels of this embodiment cover five severity levels of diabetic retinopathy, denoted DR0 to DR4. In the model training stage, data enhancement needs to be performed on the fundus photo data after reading.
As a further preferable scheme, the invention performs random vertical flipping, random horizontal flipping and random rotation on the fundus photos in the training stage. That is, data amplification and enhancement preprocessing is performed on the fundus photos to form an enhanced dataset, and the fundus photo data in the enhanced dataset are then read.
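A possible implementation of this augmentation step is sketched below with torchvision transforms; the flip probabilities and the rotation range are assumptions, since the patent only names the three operations.

```python
import torchvision.transforms as T

# Random vertical flip, random horizontal flip and random rotation,
# applied to the fundus photographs only during the training stage.
train_augment = T.Compose([
    T.ToPILImage(),
    T.RandomVerticalFlip(p=0.5),       # random up-down flip
    T.RandomHorizontalFlip(p=0.5),     # random left-right flip
    T.RandomRotation(degrees=30),      # assumed rotation range
    T.ToTensor(),
])
```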
And S120, training and constructing a class weighting network model according to the fundus photo data and the labels thereof, wherein the constructed model is used as a classification model of the fundus photos so as to intelligently identify the classes of the fundus photos.
Specifically, the fundus photo data are input into the category weighting network for type recognition training; the recognition results and the true labels are then used together as parameters of a loss function to calculate the loss, and the model parameters are updated by back propagation; these operations are repeated until the loss becomes stable and no longer decreases, yielding the constructed class weighting network model.
And S130, reading fundus picture data to be recognized.
Specifically, the corresponding picture is opened through an opencv-python (cv 2) library, and the read data is transmitted to a subsequent task.
In practical application, after fundus image data is read, the original image is directly input to the category weighting network model for type recognition without performing data enhancement processing.
And S140, inputting the fundus picture data to be recognized into a category weighting network model, and taking the category with the highest model output probability as a category result of the fundus picture.
It should be noted that, in the actual application process, no loss calculation is performed on the recognition result, and no model parameter is updated.
It should be further noted that the class weighting network used in the invention achieves data balance at the level of the model structure; the network structure is shown in fig. 2. By providing learning channels of different scales for data of different classes, it achieves balance between data of different difficulty levels and improves the accuracy of fundus photo type recognition, which is different from the common practice of balancing data within the loss calculation.
Furthermore, for determining the weights of the different types, the method avoids, by calculating the type gradient norm, the large amount of time and effort that researchers would otherwise spend during training manually adjusting the weight parameters based on past experience and repeated experiments.
Specifically, the process of network modeling in step S120 is as follows:
In the first step, preliminary feature extraction is performed on the fundus photo with the basic feature extractor to obtain a preliminarily extracted feature map F_B.
It should be noted that the basic feature extractor can use, with modification, any mature pre-trained network currently available, such as ResNet, InceptionNet, DenseNet, or EfficientNet, the last of which was recently proposed by Google.
Specifically, the modification comprises: deleting the last fully connected layer of the original network, so that the network does not directly give a classification result but performs preliminary feature extraction on the fundus photo to obtain a preliminarily extracted feature map.
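As an illustration, the sketch below builds such a basic feature extractor from torchvision's ResNet-50; any of the other named backbones could be substituted. Dropping the global pooling stage together with the fully connected layer, so that a spatial feature map F_B is returned, is an assumption made here so that the later channel, pixel and category extractors still have pixel positions to work on.

```python
import torch.nn as nn
import torchvision.models as models

class BasicFeatureExtractor(nn.Module):
    """Pre-trained backbone with its final fully connected layer removed."""
    def __init__(self):
        super().__init__()
        backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        # Keep everything up to (but not including) global pooling and the fc layer,
        # so the output is a spatial feature map F_B of shape (B, 2048, H/32, W/32).
        self.features = nn.Sequential(*list(backbone.children())[:-2])

    def forward(self, x):
        return self.features(x)
```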
In the second step, feature extraction is performed on the channel dimension of the preliminarily extracted feature map with the channel feature extractor to obtain a channel feature map F_c.
Specifically, the channel weight distribution is obtained using global average pooling over the pixel dimension followed by conv_block; the structure of conv_block is given by the relation:

CB(x) = Sigmoid(ReLU(BN(Conv(x))))

wherein CB denotes a conv_block layer, x denotes the feature map input to the conv_block layer, Conv denotes a 1×1 convolutional layer serving as a transition layer whose number of output channels equals the number of channels of the input x, BN denotes Batch Normalization, and ReLU and Sigmoid denote the ReLU and Sigmoid activation functions respectively, which introduce nonlinear factors into the network.
The structure of the channel feature extractor is given by the relation:

F_c = CB(GAP_p(F_B)) ⊙ F_B

wherein F_B denotes the preliminarily extracted feature map obtained after the fundus photo P passes through the basic feature extractor; GAP_p denotes global average pooling over the pixel dimension, so that the model here ignores pixel-dimension features and concentrates on the influence of the different channel-dimension features on the model; CB denotes the conv_block layer described above; and ⊙ denotes element-wise (dot) multiplication of matrices: the channel weight distribution obtained after the CB layer is multiplied element-wise with the preliminarily extracted feature map F_B to obtain the channel feature map F_c.
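A minimal PyTorch sketch of the conv_block and the channel feature extractor as reconstructed above; the exact ordering of BN, ReLU and Sigmoid inside conv_block is inferred from the symbol list and should be treated as an assumption.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """conv_block: 1x1 Conv -> BatchNorm -> ReLU -> Sigmoid, channel count preserved."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=1)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        return torch.sigmoid(torch.relu(self.bn(self.conv(x))))

class ChannelFeatureExtractor(nn.Module):
    """F_c = CB(GAP_p(F_B)) ⊙ F_B: channel weights from pixel-dimension global average pooling."""
    def __init__(self, channels):
        super().__init__()
        self.cb = ConvBlock(channels)
        self.gap_p = nn.AdaptiveAvgPool2d(1)    # global average pooling over the pixel dimension

    def forward(self, f_b):
        w = self.cb(self.gap_p(f_b))            # (B, C, 1, 1) channel weight distribution
        return w * f_b                          # element-wise multiplication with F_B
```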
In the third step, feature extraction is performed on the pixel dimension of the preliminarily extracted feature map with the pixel feature extractor to obtain a pixel feature map F_p.
Specifically, the pixel weight distribution is obtained using global average pooling over the channel dimension followed by conv_block; the structure of the pixel feature extractor is given by the relation:

F_p = CB(GAP_c(F_B)) ⊙ F_B

wherein CB, F_B and ⊙ have the same meanings as in the relations above; GAP_c denotes global average pooling over the channel dimension, which averages all channels of the feature map and compresses them into one channel, so that the model ignores channel-dimension features and concentrates on the contribution of the different pixels; the pixel weight distribution obtained after the CB layer is multiplied element-wise with the preliminarily extracted feature map F_B to obtain the pixel feature map F_p.
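The corresponding sketch for the pixel feature extractor, reusing the ConvBlock class from the sketch above; averaging over the channel dimension with a tensor mean is an implementation choice.

```python
import torch.nn as nn

class PixelFeatureExtractor(nn.Module):
    """F_p = CB(GAP_c(F_B)) ⊙ F_B: pixel weights from channel-dimension global average pooling."""
    def __init__(self):
        super().__init__()
        self.cb = ConvBlock(1)                  # the channel-pooled map has a single channel

    def forward(self, f_b):
        gap_c = f_b.mean(dim=1, keepdim=True)   # average over channels -> (B, 1, H, W)
        w = self.cb(gap_c)                      # pixel weight distribution
        return w * f_b                          # element-wise multiplication with F_B
```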
In the fourth step, feature extraction is performed on the type dimension of the preliminarily extracted feature map with the category feature extractor to obtain a type feature map F_T.
Specifically, 1×1 convolutional layers are used to expand the preliminarily extracted feature map F_B into K layers to obtain F_K, where K is given by:

K = k_1 + k_2 + ... + k_N = Σ_{i=1}^{N} k_i

wherein N denotes the number of picture types, k_i denotes the number of channels assigned to the i-th class, i.e. the type weight, and K is the total number of channels over all types, i.e. the total weight. The model can extract the features of the i-th class through its k_i channels: the larger the value of k_i, the more angles from which the model can understand and extract features. Giving different type weights to data of different difficulty levels balances the data at the model level.
The feature map F_K, which has K channels, is then pooled by type channel to obtain a feature map F_N that ignores channel-dimension features. F_N has N channels in total, each channel indicating the features of one type, with the relation:

F_N = GMP_K(Conv_K(F_B))

wherein F_B denotes the preliminarily extracted feature map, Conv_K denotes the K 1×1 convolutional layers, and GMP_K denotes one maximum pooling performed over each group of k_i layer channels, i.e. pooling by type channel, which finally yields F_N with channel-dimension features ignored.
Global average pooling over the pixel dimension is then performed on F_N, further ignoring pixel-dimension features and concentrating only on learning type-dimension features. A type weight distribution is obtained through conv_block and multiplied element-wise with F_N to obtain the preliminary type feature map F_N'. The relation is:

F_N' = CB(GAP_p(F_N)) ⊙ F_N

wherein F_N' denotes the preliminary type feature map and the other symbols have the same meanings as in the relations above.
Finally, global average pooling over the channel dimension and conv_block are applied to F_N' to obtain the final type weight distribution, with the relation:

F_T = CB(GAP_C(F_N')) ⊙ F_B

wherein F_T is the type feature map obtained, and the other symbols have the same meanings as in the relations above.
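A sketch of the category feature extractor following the relations reconstructed above (expansion into K channels, per-class max pooling, pixel-dimension weighting and channel-dimension weighting). The per-class channel counts k_i are passed in as a list, the decision to re-weight F_B in the final step mirrors the wording above but should be treated as an assumption, and ConvBlock is reused from the earlier sketch.

```python
import torch
import torch.nn as nn

class CategoryFeatureExtractor(nn.Module):
    """Type-dimension feature extraction: F_B -> F_K -> F_N -> F_N' -> F_T."""
    def __init__(self, in_channels, class_channels):
        # class_channels = [k_1, ..., k_N], e.g. chosen with reference to the type gradient norms
        super().__init__()
        self.class_channels = class_channels
        self.conv_k = nn.Conv2d(in_channels, sum(class_channels), kernel_size=1)  # K output channels
        self.cb_p = ConvBlock(len(class_channels))   # after per-class pooling: N channels
        self.cb_c = ConvBlock(1)                     # after channel-dimension pooling: 1 channel

    def forward(self, f_b):
        f_k = self.conv_k(f_b)                       # F_K: (B, K, H, W)
        groups, start = [], 0
        for k_i in self.class_channels:              # max pooling over each class's k_i channels
            groups.append(f_k[:, start:start + k_i].max(dim=1, keepdim=True).values)
            start += k_i
        f_n = torch.cat(groups, dim=1)               # F_N: (B, N, H, W)
        w_p = self.cb_p(f_n.mean(dim=(2, 3), keepdim=True))    # type weight distribution
        f_n_prime = w_p * f_n                        # F_N' = CB(GAP_p(F_N)) ⊙ F_N
        w_c = self.cb_c(f_n_prime.mean(dim=1, keepdim=True))   # final type weight distribution
        return w_c * f_b                             # F_T = CB(GAP_C(F_N')) ⊙ F_B
```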
In the fifth step, the feature maps of the different dimensions are fused with the feature converter to obtain a final feature map, which is converted into a type recognition result corresponding to the fundus photo label.
Specifically, the channel feature map F_C, the pixel feature map F_P and the type feature map F_T described above pass through a global average pooling layer, fully connected layers and the like to obtain the final output, with the relation:

out = FC_N(FC_H(GAP_P(avg(ReLU(F_C), ReLU(F_P), ReLU(F_T)))))

wherein F_C, F_P, F_T, ReLU and GAP_P have the same meanings as in the relations above; avg denotes averaging the elements at corresponding positions of the different matrices; FC_H and FC_N both denote fully connected layers, the number of output channels of the former being half its number of input channels and the number of output channels of the latter being the number of classes N; out is the final output of the model, a batch of N-dimensional vectors whose element values represent the likelihood of the corresponding type, and the position index of the maximum value is selected as the final recognition result of the model.
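A sketch of the feature converter corresponding to the relation above; where exactly the ReLU and the element-wise averaging sit in the chain is inferred, so this ordering is an assumption.

```python
import torch
import torch.nn as nn

class FeatureConverter(nn.Module):
    """Fuses F_C, F_P and F_T and maps the fused map to N class scores."""
    def __init__(self, channels, num_classes):
        super().__init__()
        self.gap_p = nn.AdaptiveAvgPool2d(1)
        self.fc_h = nn.Linear(channels, channels // 2)      # FC_H: output channels = half the input
        self.fc_n = nn.Linear(channels // 2, num_classes)   # FC_N: output channels = class count N

    def forward(self, f_c, f_p, f_t):
        fused = (torch.relu(f_c) + torch.relu(f_p) + torch.relu(f_t)) / 3.0  # element-wise average
        v = self.gap_p(fused).flatten(1)                    # (B, channels)
        return self.fc_n(self.fc_h(v))                      # out: (B, N) class scores
```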
Furthermore, to avoid the large amount of time and effort spent manually adjusting the weight parameters based on researchers' past experience and repeated experiments during training, the invention provides a reference for setting the type weights by calculating the type gradient norm, with the relation:

g_i = (1/n_i) Σ_{t=1}^{n_i} ||∂L_t/∂out_t||

wherein g_i denotes the gradient norm of the i-th type, n_i denotes the number of samples of the i-th class, L_t denotes the cross-entropy loss produced by sample t after passing through the model, and out_t denotes the direct output of sample t of the i-th class after model calculation (without the softmax layer).
Let p = softmax(out) and let y denote the one-hot vector representation of the sample; since ∂L_t/∂out_t = p_t − y_t, the calculation of the type gradient norm can be simplified to:

g_i = (1/n_i) Σ_{t=1}^{n_i} ||p_t − y_t||

and the size ratio of the type weights can be obtained from the gradient norms g_i of the different types.
It should be noted that, in this embodiment, when calculating the type gradient norm, the initial type weights of the category weighting network are first all set to 5, and the fundus photo data are then input into the category weighting network for category recognition training; the recognition results and the true labels are then used together as parameters of a loss function to calculate the loss, and the model parameters are updated by back propagation; these operations are repeated until the loss becomes stable and no longer decreases, yielding a preliminarily converged class weighting network model. From this model, the type gradient norms of the corresponding fundus photo data can be calculated.
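The following sketch estimates the per-class type gradient norms from such a preliminarily converged model; the choice of the L2 norm and the data-loader interface are assumptions.

```python
import torch
import torch.nn.functional as F

def type_gradient_norms(model, loader, num_classes, device="cpu"):
    """g_i = (1/n_i) * sum over samples t of class i of ||softmax(out_t) - y_t||."""
    sums = torch.zeros(num_classes)
    counts = torch.zeros(num_classes)
    model.to(device).eval()
    with torch.no_grad():
        for images, labels in loader:
            out = model(images.to(device))               # raw model output, no softmax layer
            p = F.softmax(out, dim=1).cpu()
            y = F.one_hot(labels, num_classes).float()
            norms = (p - y).norm(dim=1)                  # ||p_t - y_t|| per sample (L2 assumed)
            for c in range(num_classes):
                mask = labels == c
                sums[c] += norms[mask].sum()
                counts[c] += mask.sum()
    return sums / counts.clamp(min=1)                    # g_i for each class i
```

The ratios between the resulting g_i values then serve as the reference for choosing the per-class channel weights k_i.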
It should be further noted that, after the type recognition results are obtained in the training stage, the model needs to be updated so that the model with the best effect can be selected as the final fundus photo classification model. Therefore, after the modeling above, the type recognition step S120 for fundus photos further includes the following step:
and sixthly, comparing the recognition result of the fundus picture with the real label of the fundus picture to calculate cross entropy loss, and reversely transmitting and updating the model parameters. And (3) a cyclic loss calculation and parameter updating process is carried out until the loss tends to be stable and does not decrease, and a category weighting network model is obtained, wherein the loss is calculated by adopting a cross entropy loss function, and the specific relation is as follows:
Figure DEST_PATH_IMAGE033
wherein the content of the first and second substances,x[class]representing input dataxThe category to which the real thing belongs to,x[j]representation of model to input dataxClass of belongingjThe result of the recognition of (1).
It should be noted that during the training stage every round of loss calculation is used to update the model; in this embodiment the model is saved once every certain number of rounds, and the model with the best effect is finally selected as the final fundus photo classification model. In the application stage the model only outputs the classification result of the fundus photo without being updated, so the loss does not need to be calculated. Of course, if more input data gradually become available later and a further update of the model is desired, training can be continued on the basis of the existing model and a better-performing model can be selected again.
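A training-loop sketch matching this description (cross-entropy loss, back propagation, periodic saving so the best model can be kept); the optimizer, learning rate, epoch count and save interval are assumptions.

```python
import torch
import torch.nn.functional as F

def train_class_weighted_network(model, loader, epochs=50, save_every=5, device="cpu"):
    """Train the class weighting network and checkpoint it every few rounds."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # assumed optimizer settings
    model.to(device).train()
    for epoch in range(epochs):
        total_loss = 0.0
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            out = model(images)
            loss = F.cross_entropy(out, labels)   # loss(x, class) = -x[class] + log(sum_j exp(x[j]))
            optimizer.zero_grad()
            loss.backward()                       # back-propagate
            optimizer.step()                      # update the model parameters
            total_loss += loss.item()
        if (epoch + 1) % save_every == 0:         # save the model once every certain number of rounds
            torch.save(model.state_dict(), f"class_weighted_net_epoch{epoch + 1}.pt")
        print(f"epoch {epoch + 1}: mean loss {total_loss / max(len(loader), 1):.4f}")
```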
In this embodiment, a fundus photo classification model is constructed through the above process to screen for the degree of diabetic retinopathy corresponding to a fundus photo. The process S130 to S140 of performing type recognition on a fundus photo to be recognized with the constructed model includes:
reading the fundus photo data to be screened; inputting the read fundus photo data into the constructed class weighting network model, recognizing the type of the fundus photo data to be recognized, and taking the class with the highest model output probability as the class result of the fundus photo.
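For the application stage, a minimal inference sketch: the image is read, passed through the trained model and the class with the highest output probability is returned; the image size and the simple 1/255 normalization are assumptions.

```python
import cv2
import torch

def classify_fundus_photo(model, image_path, device="cpu", size=(512, 512)):
    """Return the predicted class index (e.g. 0-4 for DR0-DR4) for one fundus photo."""
    img = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, size)
    x = torch.from_numpy(img).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    model.to(device).eval()
    with torch.no_grad():
        probs = torch.softmax(model(x.to(device)), dim=1)   # output probabilities
    return int(probs.argmax(dim=1).item())                  # class with the highest probability
```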
As shown in fig. 3, another aspect of the present invention provides a fundus photo classification apparatus 200 based on a class weighting network, including: a first reading unit 210, a model forming unit 220, a second reading unit 230, and a category output unit 240; wherein,
the first reading unit 210 is configured to read data of a plurality of fundus pictures and tags thereof, that is, in the model training stage, the data of the fundus pictures and the corresponding tags need to be read by the first reading unit for subsequent model training;
a model forming unit 220, configured to input fundus picture data and labels thereof into a class weighting network, train and construct a class weighting network model, where the model is used as a classification model of fundus pictures to identify classes of fundus pictures; wherein, the model forming unit 220 further includes: a basic feature extractor 221, a channel feature extractor 222, a pixel feature extractor 223, a category feature extractor 224, and a feature converter 225;
a basic feature extractor 221, configured to perform preliminary feature extraction on fundus image data to obtain a preliminary extracted feature map;
a channel feature extractor 222, configured to perform feature extraction on the preliminary extracted feature map according to channel dimensions to obtain a channel feature map;
a pixel feature extractor 223, configured to perform feature extraction on the preliminarily extracted feature map according to pixel dimensions to obtain a pixel feature map;
a category feature extractor 224, configured to perform feature extraction on the preliminary extracted feature map according to category dimensions to obtain a category feature map;
a feature converter 225 for fusing the channel feature map, the pixel feature map and the category feature map to obtain a target feature map, and converting the target feature map into a category identification result corresponding to the fundus photo label;
a second reading unit 230 for reading fundus picture data to be recognized;
a category output unit 240, configured to input fundus picture data to be recognized to the category weighting network model formed by the above construction, and take the category with the highest model output probability as a category result of the fundus picture.
It should be noted that, in the model training stage, the first reading unit of this embodiment needs to read not only fundus image data, but also tags corresponding to fundus images, and may transmit a list composed of training data file paths and training data tag data, open corresponding images through an opencv-python (cv 2) library, and transmit the read data to subsequent tasks. In the practical application stage of the model, the second reading unit only reads the data of the fundus picture, and the label type of the fundus picture data is identified by using the model formed by training in the previous step.
It should be further noted that the fundus photo labels of this embodiment cover five severity levels of diabetic retinopathy, denoted DR0 to DR4. In the model training stage, data enhancement needs to be performed on the fundus photo data after they are read with the first reading unit. That is, the apparatus of this embodiment further includes an enhancing unit 250 (as shown in fig. 3); in the training stage the enhancing unit performs random vertical flipping, random horizontal flipping and random rotation on the fundus photos, i.e. data amplification and enhancement preprocessing is performed to form an enhanced dataset, and the fundus photo data in the enhanced dataset are then read.
Furthermore, this embodiment uses the basic feature extractor to perform preliminary feature extraction on the fundus photo to obtain the preliminarily extracted feature map F_B. The basic feature extractor can use, with modification, any mature pre-trained network currently available, such as ResNet, InceptionNet, DenseNet, or EfficientNet, the last of which was recently proposed by Google.
Specifically, the modification comprises: deleting the last fully connected layer of the original network, so that the network does not directly give a classification result but performs preliminary feature extraction on the fundus photo to obtain a preliminarily extracted feature map.
Further, in this embodiment the channel feature extractor is used to perform feature extraction on the channel dimension of the preliminarily extracted feature map to obtain the channel feature map F_c. The specific process is as follows:
the channel weight distribution is obtained using global average pooling over the pixel dimension followed by conv_block; the structure of conv_block is given by the relation:

CB(x) = Sigmoid(ReLU(BN(Conv(x))))

wherein CB denotes a conv_block layer, x denotes the feature map input to the conv_block layer, Conv denotes a 1×1 convolutional layer serving as a transition layer whose number of output channels equals the number of channels of the input x, BN denotes Batch Normalization, and ReLU and Sigmoid denote the ReLU and Sigmoid activation functions respectively, which introduce nonlinear factors into the network.
The structure of the channel feature extractor is given by the relation:

F_c = CB(GAP_p(F_B)) ⊙ F_B

wherein F_B denotes the preliminarily extracted feature map obtained after the fundus photo P passes through the basic feature extractor; GAP_p denotes global average pooling over the pixel dimension, so that the model here ignores pixel-dimension features and concentrates on the influence of the different channel-dimension features; CB denotes the conv_block layer described above; and ⊙ denotes element-wise (dot) multiplication of matrices: the channel weight distribution obtained after the CB layer is multiplied element-wise with the preliminarily extracted feature map F_B to obtain the channel feature map F_c.
Further, this embodiment uses the pixel feature extractor to perform feature extraction on the pixel dimension of the preliminarily extracted feature map to obtain the pixel feature map F_p. The specific process is as follows:
the pixel weight distribution is obtained using global average pooling over the channel dimension followed by conv_block; the structure of the pixel feature extractor is given by the relation:

F_p = CB(GAP_c(F_B)) ⊙ F_B

wherein CB, F_B and ⊙ have the same meanings as in the relations above; GAP_c denotes global average pooling over the channel dimension, which averages all channels of the feature map and compresses them into one channel, so that the model ignores channel-dimension features and concentrates on the contribution of the different pixels; the pixel weight distribution obtained after the CB layer is multiplied element-wise with the preliminarily extracted feature map F_B to obtain the pixel feature map F_p.
Furthermore, in this embodiment the category feature extractor is used to perform feature extraction on the type dimension of the preliminarily extracted feature map to obtain the type feature map F_T.
Specifically, 1×1 convolutional layers are used to expand the preliminarily extracted feature map F_B into K layers to obtain F_K, where K is given by:

K = k_1 + k_2 + ... + k_N = Σ_{i=1}^{N} k_i

wherein N denotes the number of picture types, k_i denotes the number of channels assigned to the i-th class, i.e. the type weight, and K is the total number of channels over all types, i.e. the total weight. The model can extract the features of the i-th class through its k_i channels: the larger the value of k_i, the more angles from which the model can understand and extract features. Giving different type weights to data of different difficulty levels balances the data at the model level.
The feature map F_K, which has K channels, is then pooled by type channel to obtain a feature map F_N that ignores channel-dimension features. F_N has N channels in total, each channel indicating the features of one type, with the relation:

F_N = GMP_K(Conv_K(F_B))

wherein F_B denotes the preliminarily extracted feature map, Conv_K denotes the K 1×1 convolutional layers, and GMP_K denotes one maximum pooling performed over each group of k_i layer channels, i.e. pooling by type channel, which finally yields F_N with channel-dimension features ignored.
Global average pooling over the pixel dimension is then performed on F_N, further ignoring pixel-dimension features and concentrating only on learning type-dimension features. A type weight distribution is obtained through conv_block and multiplied element-wise with F_N to obtain the preliminary type feature map F_N'. The relation is:

F_N' = CB(GAP_p(F_N)) ⊙ F_N

wherein F_N' denotes the preliminary type feature map and the other symbols have the same meanings as in the relations above.
Finally, global average pooling over the channel dimension and conv_block are applied to F_N' to obtain the final type weight distribution, with the relation:

F_T = CB(GAP_C(F_N')) ⊙ F_B

wherein F_T is the type feature map obtained, and the other symbols have the same meanings as in the relations above.
Furthermore, to avoid the large amount of time and effort spent manually adjusting the weight parameters based on researchers' past experience and repeated experiments during training, the invention provides a reference for setting the above type weights by calculating the type gradient norm; that is, the model forming unit of this embodiment further includes a weight setting module. The relation is:

g_i = (1/n_i) Σ_{t=1}^{n_i} ||∂L_t/∂out_t||

wherein g_i denotes the gradient norm of the i-th type, n_i denotes the number of samples of the i-th class, L_t denotes the cross-entropy loss produced by sample t after passing through the model, and out_t denotes the direct output of sample t of the i-th class after model calculation (without the softmax layer).
Let p = softmax(out) and let y denote the one-hot vector representation of the sample; since ∂L_t/∂out_t = p_t − y_t, the calculation of the type gradient norm can be simplified to:

g_i = (1/n_i) Σ_{t=1}^{n_i} ||p_t − y_t||

and the size ratio of the type weights can be obtained from the gradient norms g_i of the different types.
It should be noted that, in this embodiment, when calculating the type gradient norm, the initial type weights of the category weighting network are first all set to 5, and the fundus photo data are then input into the category weighting network for type recognition training; the recognition results and the true labels are then used together as parameters of a loss function to calculate the loss, and the model parameters are updated by back propagation; these operations are repeated until the loss becomes stable and no longer decreases, yielding a preliminarily converged class weighting network model. From this model, the type gradient norms of the corresponding fundus photo data can be calculated.
Furthermore, in this embodiment the feature converter is used to fuse the feature maps of the different dimensions to obtain a final feature map, which is then converted into the type recognition result corresponding to the fundus photo label. The specific process is as follows:
the channel feature map F_c, the pixel feature map F_P and the type feature map F_T described above pass through a global average pooling layer, fully connected layers and the like to obtain the final output, with the relation:

out = FC_N(FC_H(GAP_P(avg(ReLU(F_c), ReLU(F_P), ReLU(F_T)))))

wherein F_c, F_P, F_T, ReLU and GAP_P have the same meanings as in the relations above; avg denotes averaging the elements at corresponding positions of the different matrices; FC_H and FC_N both denote fully connected layers, the number of output channels of the former being half its number of input channels and the number of output channels of the latter being the number of classes N; out is the final output of the model, a batch of N-dimensional vectors whose element values represent the likelihood of the corresponding type, and the position index of the maximum value is selected as the final type recognition result of the model.
It should be noted that, after the fundus photo classification results are obtained in the training stage, the model needs to be updated so that the model with the best effect can be selected as the final fundus photo classification model. Therefore, the model forming unit of this embodiment further includes an updating module, and the process of updating the model with this module is as follows:
and comparing the recognition result of the fundus picture with the real label of the fundus picture to calculate the cross entropy loss, and reversely transmitting and updating the model parameters. And (3) a cyclic loss calculation and parameter updating process is carried out until the loss tends to be stable and does not decrease, and a category weighting network model is obtained, wherein the loss is calculated by adopting a cross entropy loss function, and the specific relation is as follows:
Figure DEST_PATH_IMAGE041
wherein the content of the first and second substances,x[class]representing input dataxThe category to which the real thing belongs to,x[j]representation of model to input dataxClass of belongingjThe result of the recognition of (1).
It should be noted that during the training stage every round of loss calculation is used to update the model; in this embodiment the model is saved once every certain number of rounds, and the model with the best effect is finally selected as the final fundus photo classification model. In the application stage the model only outputs the classification result of the fundus photo without being updated, so the loss does not need to be calculated. Of course, if more input data gradually become available later and a further update of the model is desired, training can be continued on the basis of the existing model and a better-performing model can be reselected.
The classification method of fundus images based on the class weighting network will be further described below with reference to specific embodiments:
example 1
This example identifies the category of fundus photos showing diabetic retinopathy and includes the following steps (a minimal inference sketch follows the step list):
S1, calculating the type gradient norm to provide a reference for setting the type weights of the category weighting network, i.e. the type gradient norm is determined before the network is used;
S2, reading the fundus photo data and their labels, and performing data enhancement on the fundus photo data;
S3, performing preliminary feature extraction on the fundus pictures with the basic feature extractor to obtain a preliminarily extracted feature map;
S4, performing feature extraction on the preliminarily extracted feature map in the channel dimension with the channel feature extractor to obtain a channel feature map;
S5, performing feature extraction on the preliminarily extracted feature map in the pixel dimension with the pixel feature extractor to obtain a pixel feature map;
S6, performing feature extraction on the preliminarily extracted feature map in the type dimension with the category feature extractor to obtain a type feature map;
S7, fusing the feature maps of the different dimensions with the feature converter to obtain a final feature map, and converting the final feature map into the comprehensive recognition result of the fundus photo label;
S8, comparing the recognition result of the fundus picture with its real label to calculate the cross entropy loss, and back-propagating to update the model parameters; the loss calculation and parameter updating are repeated until the loss stabilizes and no longer decreases, yielding the category weighting network model;
S9, reading the fundus picture data to be screened, inputting them into the category weighting network model, and taking the category with the highest model output probability as the category result of the fundus picture.
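As a usage illustration of step S9 only, a sketch of screening a single fundus photo with a trained model is given below; the preprocessing resolution and the helper name are assumptions of this sketch.

```python
import torch
from PIL import Image
from torchvision import transforms

def classify_fundus_photo(model, image_path, device="cpu"):
    """Read one fundus photo, run the trained category weighting network model,
    and return the category with the highest output probability (step S9)."""
    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),              # input resolution is an assumption
        transforms.ToTensor(),
    ])
    image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0).to(device)
    model.to(device).eval()
    with torch.no_grad():
        probs = torch.softmax(model(image), dim=1)  # class probabilities
    return probs.argmax(dim=1).item()               # index of the most probable category
```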
The classification method and device for fundus photos based on a category weighting network provided by the invention have the following beneficial effects:
First, the classification method and device based on the category weighting network balance data of different degrees of difficulty by assigning learning channels of different scales to data of different classes. Unlike conventional data balancing performed within the loss calculation, this balancing is achieved at the level of the model structure.
Second, the invention provides a simple and effective scheme for determining the different type weights: by calculating the type gradient norm, it avoids the large amount of time and effort that researchers would otherwise spend manually tuning the weight parameters from past experience and repeated experiments, costs that grow exponentially as the number of categories increases and the data scale grows.
Third, the classification method and device based on the category weighting network extract features of the fundus picture data from three different dimensions, which improves the feature extraction performance and generalization capability of the model and reduces the influence of non-uniform data formats.
It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.

Claims (8)

1. A fundus photo classification method based on a class weighting network is characterized by comprising the following steps:
reading a plurality of fundus picture data and tags thereof;
inputting the fundus photo data and the labels thereof into a class weighting network, and training and constructing a class weighting network model, wherein the training and construction comprise the following steps:
performing primary feature extraction on the fundus picture data to obtain a primary extracted feature map;
respectively extracting the characteristics of the preliminarily extracted characteristic graph by using a channel dimension, a pixel dimension and a category dimension to respectively obtain a channel characteristic graph, a pixel characteristic graph and a category characteristic graph;
fusing the channel characteristic diagram, the pixel characteristic diagram and the category characteristic diagram to obtain a target characteristic diagram;
converting the target characteristic diagram into a type recognition result corresponding to the fundus photo label;
reading fundus picture data to be identified;
inputting the fundus picture data to be recognized into the category weighting network model, and taking the category with the maximum output probability of the category weighting network model as the category result of the fundus picture; wherein,
performing feature extraction on the preliminarily extracted feature map by using a category dimension to obtain a category feature map, wherein the method comprises the following steps:
expanding the preliminarily extracted feature map F_B into K layers by adopting 1 × 1 convolution layers to obtain F_K, the specific relation of K being as follows:

K = Σ_{i=1}^{N} k_i

wherein N denotes the number of picture types, k_i denotes the number of channels assigned to the i-th class, and K is the total number of channels over all types;

performing type-channel pooling on the feature map F_K having K channels to obtain a feature map F_N that ignores the channel-dimension features, the feature map F_N having N layer channels, each layer channel indicating the features of one type, the specific relation being as follows:

F_N = GMP_K(Conv_K(F_B))

wherein F_B denotes the preliminarily extracted feature map, Conv_K denotes the K 1 × 1 convolution layers, and GMP_K denotes performing one maximum pooling over each group of k_i layer channels;
performing global average pooling on F_N in the pixel dimensions to obtain a feature map that ignores the pixel dimensions, obtaining a type weight distribution through conv_block, and dot-multiplying it with F_N to obtain a preliminary type feature map F_N'; the specific relation is as follows:

F_N' = CB(GAP_P(F_N)) ⊙ F_N

performing global average pooling in the channel dimension and conv_block on F_N' to obtain the final type weight distribution, the specific relation being as follows:

F_T = CB(GAMP_C(F_N')) ⊙ F_B

wherein F_T is the type feature map, GAMP_C denotes global average pooling over the channel dimension, CB denotes a conv_block layer, and ⊙ denotes the dot product of matrices, i.e. the weight distribution obtained after the CB layer is dot-multiplied with the preliminarily extracted feature map F_B;
the type weight is obtained by calculating the type gradient norm, the specific relation being as follows:

g_i = (1 / n_i) · Σ_{t=1}^{n_i} ‖ ∂L_t / ∂out_t ‖

wherein g_i denotes the type gradient norm of the i-th class, n_i denotes the number of samples of the i-th class, L_t denotes the cross entropy loss produced by sample t after passing through the model, and out_t denotes the direct output of sample t of the i-th class after model calculation;

letting p = softmax(out) and y be the one-hot vector representation of a sample, the derivative ∂L_t / ∂out_t = p_t − y_t simplifies the calculation of the type gradient norm, the specific relation becoming:

g_i = (1 / n_i) · Σ_{t=1}^{n_i} ‖ p_t − y_t ‖

the size ratio of the type weights is obtained from the gradient norms g_i of the different types.
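Purely as an illustrative sketch of the category-dimension branch described in claim 1 (the module names, the internal ordering of conv_block and the channels_per_class argument are assumptions of this sketch, not details fixed by the claim):

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """conv_block as read from the claims: 1x1 Conv (same channel count), BN, ReLU,
    Sigmoid; the exact internal ordering is an assumption of this sketch."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=1)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        return torch.sigmoid(torch.relu(self.bn(self.conv(x))))


class CategoryFeatureExtractor(nn.Module):
    """Expand F_B to K = sum(k_i) channels, max-pool each class's k_i channels into F_N,
    weight F_N via GAP over pixels + conv_block, then turn the channel-averaged result
    into a spatial weight map that modulates F_B into the type feature map F_T."""
    def __init__(self, in_channels: int, channels_per_class):
        super().__init__()
        self.k = list(channels_per_class)                      # k_i channels assigned to class i
        self.expand = nn.Conv2d(in_channels, sum(self.k), kernel_size=1)  # Conv_K
        self.cb_type = ConvBlock(len(self.k))                  # acts on the N-channel map
        self.cb_spatial = ConvBlock(1)                         # acts on the channel-averaged map

    def forward(self, f_b):
        f_k = self.expand(f_b)                                 # (B, K, H, W)
        groups = torch.split(f_k, self.k, dim=1)               # one group of k_i channels per class
        f_n = torch.cat([g.max(dim=1, keepdim=True).values for g in groups], dim=1)  # GMP_K
        w_type = self.cb_type(f_n.mean(dim=(2, 3), keepdim=True))         # GAP_P then conv_block
        f_n_prime = w_type * f_n                               # preliminary type feature map F_N'
        w_spatial = self.cb_spatial(f_n_prime.mean(dim=1, keepdim=True))  # GAMP_C then conv_block
        return w_spatial * f_b                                 # F_T, same shape as F_B
```

For example, CategoryFeatureExtractor(in_channels=512, channels_per_class=[2, 4, 8, 4, 2]) would give harder classes more channels; the actual k_i ratios would follow the type gradient norms g_i.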
2. The method according to claim 1, wherein the performing preliminary feature extraction on the fundus picture data to obtain a preliminary extracted feature map comprises:
performing primary feature extraction on the fundus picture by using a modified pre-training network to obtain a primary extracted feature map;
wherein the modified pre-training network does not include the last fully connected layer of the pre-training network.
3. The method according to claim 2, wherein the performing feature extraction on the preliminary extracted feature map by using a channel dimension to obtain a channel feature map comprises:
using global average pooling over the pixel dimensions to obtain features that ignore the pixel dimensions, and obtaining a channel weight distribution through conv_block, wherein,
the specific relation of the structure of conv_block is as follows:

CB(x) = Sigmoid(ReLU(BN(Conv(x))))

wherein CB denotes a conv_block layer, x denotes the feature map input to the conv_block layer, Conv denotes a 1 × 1 convolutional layer acting as a transition layer whose number of output channels equals the number of channels of the input data x, BN denotes Batch Normalization, and ReLU and Sigmoid respectively denote the ReLU activation function and the Sigmoid activation function, which introduce nonlinear factors into the network;
and performing feature extraction on the preliminarily extracted feature map in the channel dimension by using a channel feature extractor to obtain a channel feature map, wherein,
the specific relation of the structure of the channel feature extractor is as follows:

F_c = CB(GAP_P(F_B)) ⊙ F_B

wherein F_c denotes the channel feature map; F_B denotes the preliminarily extracted feature map; GAP_P denotes global average pooling in the pixel dimensions; CB denotes a conv_block layer; and ⊙ denotes the matrix dot product, i.e. the channel weight distribution obtained after the CB layer is dot-multiplied with the preliminarily extracted feature map F_B.
4. The method according to claim 3, wherein the performing feature extraction on the preliminary extracted feature map in pixel dimension to obtain a pixel feature map comprises:
using global average pooling of channel dimensions to obtain the characteristics of neglecting the channel dimensions, and obtaining pixel weight distribution through conv _ block;
and performing feature extraction on the preliminarily extracted feature map in the pixel dimension by using a pixel feature extractor to obtain a pixel feature map, wherein,
the specific relation of the structure of the pixel feature extractor is as follows:

F_p = CB(GAP_C(F_B)) ⊙ F_B

wherein F_p denotes the pixel feature map; GAP_C denotes global average pooling in the channel dimension; F_B denotes the preliminarily extracted feature map; CB denotes a conv_block layer; and ⊙ denotes the matrix dot product, i.e. the pixel weight distribution obtained after the CB layer is dot-multiplied with the preliminarily extracted feature map F_B.
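The pixel feature extractor of claim 4 is the spatial counterpart of the channel branch; a matching sketch under the same conv_block assumption:

```python
import torch
import torch.nn as nn

class PixelFeatureExtractor(nn.Module):
    """F_p = conv_block(GAP_C(F_B)) * F_B: pool away the channel dimension, derive a
    pixel weight distribution, and re-weight the preliminarily extracted feature map."""
    def __init__(self):
        super().__init__()
        self.cb = nn.Sequential(                      # conv_block on a single-channel map
            nn.Conv2d(1, 1, kernel_size=1),
            nn.BatchNorm2d(1),
            nn.ReLU(inplace=True),
            nn.Sigmoid(),
        )

    def forward(self, f_b):
        w = self.cb(f_b.mean(dim=1, keepdim=True))    # GAP_C: ignore the channel dimension
        return w * f_b                                # per-pixel weights modulate F_B
```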
5. The method according to claim 4, wherein the fusing the channel feature map, the pixel feature map and the category feature map to obtain a target feature map, and converting the target feature map into a type recognition result corresponding to the fundus picture label comprises:
the target feature graph is subjected to a global average pooling layer and a full connection layer to obtain final output;
the specific relation is as follows:
out = FC_N(FC_H(GAP_P(Avg(F_c, F_p, F_T))))

wherein out is the final output of the model, a batch of N-dimensional vectors whose element values represent the likelihood that the model identifies the type at the corresponding position, and the position index of the maximum value is chosen as the final type recognition result of the model; Avg(·) denotes averaging the elements at corresponding positions between the different matrices; FC_H and FC_N denote fully connected layers, the number of output channels of FC_H being half of its number of input channels and the number of output channels of FC_N being the number of classes N.
6. The method according to claim 1, wherein before reading the plurality of fundus picture data and the tags thereof, further comprising: and performing enhancement processing of at least one of random up-down overturning, random left-right overturning and random rotating processing on the fundus picture to obtain an enhanced fundus picture.
7. The method of claim 1, wherein after inputting the fundus picture data and the labels thereof into the class weighting network and training and constructing the class weighting network model, the method further comprises:
comparing the type recognition result of the fundus picture with the real label thereof to calculate cross entropy loss, and reversely transmitting and updating the parameters of the category weighting network model;
the specific relationship of the cross entropy loss is as follows:
loss(x, class) = −log( exp(x[class]) / Σ_j exp(x[j]) ) = −x[class] + log Σ_j exp(x[j])

wherein x[class] denotes the true class of the input data x, and x[j] denotes the model's recognition result for the input data x on class j.
8. A fundus photo classification apparatus based on a class weighting network, comprising: a first reading unit, a second reading unit, a model forming unit and a category output unit; wherein,
the first reading unit is used for reading a plurality of fundus picture data and labels thereof;
the model forming unit is used for inputting the fundus photo data and the labels thereof into a category weighting network, training and constructing a category weighting network model; wherein the model forming unit further comprises: a basic feature extractor, a channel feature extractor, a pixel feature extractor, a category feature extractor and a feature converter;
the basic feature extractor is used for performing preliminary feature extraction on the fundus picture data to obtain a preliminary extraction feature map;
the channel feature extractor is used for performing feature extraction on the preliminarily extracted feature map by channel dimensions to obtain a channel feature map;
the pixel feature extractor is used for performing feature extraction on the preliminarily extracted feature map according to pixel dimensions to obtain a pixel feature map;
the category feature extractor is configured to perform feature extraction on the preliminary extracted feature map by using a category dimension to obtain a category feature map, and includes:
expanding the preliminarily extracted feature map F_B into K layers by adopting 1 × 1 convolution layers to obtain F_K, the specific relation of K being as follows:

K = Σ_{i=1}^{N} k_i

wherein N denotes the number of picture types, k_i denotes the number of channels assigned to the i-th class, and K is the total number of channels over all types;

performing type-channel pooling on the feature map F_K having K channels to obtain a feature map F_N that ignores the channel-dimension features, the feature map F_N having N layer channels, each layer channel indicating the features of one type, the specific relation being as follows:

F_N = GMP_K(Conv_K(F_B))

wherein F_B denotes the preliminarily extracted feature map, Conv_K denotes the K 1 × 1 convolution layers, and GMP_K denotes performing one maximum pooling over each group of k_i layer channels;
performing global average pooling on F_N in the pixel dimensions to obtain a feature map that ignores the pixel dimensions, obtaining a type weight distribution through conv_block, and dot-multiplying it with F_N to obtain a preliminary type feature map F_N'; the specific relation is as follows:

F_N' = CB(GAP_P(F_N)) ⊙ F_N

performing global average pooling in the channel dimension and conv_block on F_N' to obtain the final type weight distribution, the specific relation being as follows:

F_T = CB(GAMP_C(F_N')) ⊙ F_B

wherein F_T is the type feature map, GAMP_C denotes global average pooling over the channel dimension, CB denotes a conv_block layer, and ⊙ denotes the dot product of matrices, i.e. the weight distribution obtained after the CB layer is dot-multiplied with the preliminarily extracted feature map F_B;
the type weight is obtained by calculating the type gradient norm, the specific relation being as follows:

g_i = (1 / n_i) · Σ_{t=1}^{n_i} ‖ ∂L_t / ∂out_t ‖

wherein g_i denotes the type gradient norm of the i-th class, n_i denotes the number of samples of the i-th class, L_t denotes the cross entropy loss produced by sample t after passing through the model, and out_t denotes the direct output of sample t of the i-th class after model calculation;

letting p = softmax(out) and y be the one-hot vector representation of a sample, the derivative ∂L_t / ∂out_t = p_t − y_t simplifies the calculation of the type gradient norm, the specific relation becoming:

g_i = (1 / n_i) · Σ_{t=1}^{n_i} ‖ p_t − y_t ‖

the size ratio of the type weights is obtained from the gradient norms g_i of the different types;
the characteristic converter is used for fusing the channel characteristic diagram, the pixel characteristic diagram and the category characteristic diagram to obtain a target characteristic diagram, and converting the target characteristic diagram into a type identification result corresponding to the fundus photo label;
the second reading unit is also used for reading fundus picture data to be identified;
and the category output unit is used for inputting the fundus picture data to be recognized into the category weighting network model, and taking the category with the maximum model output probability as the category result of the fundus picture.
CN202211381516.6A 2022-11-07 2022-11-07 Fundus photo classification method and device based on class weighting network Active CN115424084B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211381516.6A CN115424084B (en) 2022-11-07 2022-11-07 Fundus photo classification method and device based on class weighting network


Publications (2)

Publication Number Publication Date
CN115424084A CN115424084A (en) 2022-12-02
CN115424084B true CN115424084B (en) 2023-03-24

Family

ID=84207786

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211381516.6A Active CN115424084B (en) 2022-11-07 2022-11-07 Fundus photo classification method and device based on class weighting network

Country Status (1)

Country Link
CN (1) CN115424084B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112907604A (en) * 2021-03-16 2021-06-04 南通大学 Self-adaptive super-pixel FCM (pixel-frequency modulation) method for fundus velveteen speckle image segmentation
CN114648806A (en) * 2022-05-19 2022-06-21 山东科技大学 Multi-mechanism self-adaptive fundus image segmentation method

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108665457B (en) * 2018-05-16 2023-12-19 腾讯医疗健康(深圳)有限公司 Image recognition method, device, storage medium and computer equipment
CN109493954B (en) * 2018-12-20 2021-10-19 广东工业大学 SD-OCT image retinopathy detection system based on category distinguishing and positioning
AU2020101450A4 (en) * 2020-07-23 2020-08-27 .B.M.S, Rani Ms Retinal vascular disease detection from retinal fundus images using machine learning
AU2020103938A4 (en) * 2020-12-07 2021-02-11 Capital Medical University A classification method of diabetic retinopathy grade based on deep learning
CN114693961B (en) * 2020-12-11 2024-05-14 北京航空航天大学 Fundus photo classification method, fundus image processing method and fundus image processing system
CN112560948B (en) * 2020-12-15 2024-04-26 中南大学 Fundus image classification method and imaging method under data deviation
CN112869704B (en) * 2021-02-02 2022-06-17 苏州大学 Diabetic retinopathy area automatic segmentation method based on circulation self-adaptive multi-target weighting network
CN113011362A (en) * 2021-03-29 2021-06-22 吉林大学 Fine-grained fundus image grading algorithm based on bilinear pooling and attention mechanism
CN113537395B (en) * 2021-08-09 2022-07-08 同济大学 Diabetic retinopathy image identification method based on fundus images
CN113768460B (en) * 2021-09-10 2023-11-14 北京鹰瞳科技发展股份有限公司 Fundus image analysis system, fundus image analysis method and electronic equipment
CN114019467A (en) * 2021-10-25 2022-02-08 哈尔滨工程大学 Radar signal identification and positioning method based on MobileNet model transfer learning
CN114494195A (en) * 2022-01-26 2022-05-13 南通大学 Small sample attention mechanism parallel twinning method for fundus image classification


Also Published As

Publication number Publication date
CN115424084A (en) 2022-12-02

Similar Documents

Publication Publication Date Title
CN108021916B (en) Deep learning diabetic retinopathy sorting technique based on attention mechanism
CN113011485B (en) Multi-mode multi-disease long-tail distribution ophthalmic disease classification model training method and device
CN111815574B (en) Fundus retina blood vessel image segmentation method based on rough set neural network
WO2018201632A1 (en) Artificial neural network and system for recognizing lesion in fundus image
CN110197493A (en) Eye fundus image blood vessel segmentation method
CN107423571A (en) Diabetic retinopathy identifying system based on eye fundus image
CN108537282A (en) A kind of diabetic retinopathy stage division using extra lightweight SqueezeNet networks
CN111938569A (en) Eye ground multi-disease classification detection method based on deep learning
CN110751637A (en) Diabetic retinopathy detection system, method, equipment and training system
Peng et al. Automatic staging for retinopathy of prematurity with deep feature fusion and ordinal classification strategy
CN111080643A (en) Method and device for classifying diabetes and related diseases based on fundus images
WO2022166399A1 (en) Fundus oculi disease auxiliary diagnosis method and apparatus based on bimodal deep learning
CN109464120A (en) A kind of screening for diabetic retinopathy method, apparatus and storage medium
CN112101424A (en) Generation method, identification device and equipment of retinopathy identification model
Bhati et al. Discriminative kernel convolution network for multi-label ophthalmic disease detection on imbalanced fundus image dataset
Liu Construction and verification of color fundus image retinal vessels segmentation algorithm under BP neural network
CN110443105A (en) The immunofluorescence image kenel recognition methods of autoimmunity antibody
CN110473176B (en) Image processing method and device, fundus image processing method and electronic equipment
Agarwal et al. A survey on recent developments in diabetic retinopathy detection through integration of deep learning
CN113887662A (en) Image classification method, device, equipment and medium based on residual error network
Dong et al. Supervised learning-based retinal vascular segmentation by m-unet full convolutional neural network
Miao et al. Classification of Diabetic Retinopathy Based on Multiscale Hybrid Attention Mechanism and Residual Algorithm
CN115424084B (en) Fundus photo classification method and device based on class weighting network
Guo et al. Early detection of retinopathy of prematurity (ROP) in retinal fundus images via convolutional neural networks
Triyadi et al. Deep learning in image classification using vgg-19 and residual networks for cataract detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant