CN115424084B

CN115424084B - Fundus photo classification method and device based on class weighting network

Info

Publication number: CN115424084B
Application number: CN202211381516.6A
Authority: CN
Inventors: 沈婷; 韩志科; 洪朝阳; 郑青青; 杨斌; 肖涵瑜
Original assignee: Zhejiang Provincial Peoples Hospital; Hangzhou City University
Current assignee: Zhejiang Provincial Peoples Hospital; Hangzhou City University
Priority date: 2022-11-07
Filing date: 2022-11-07
Publication date: 2023-03-24
Anticipated expiration: 2042-11-07
Also published as: CN115424084A

Abstract

The invention provides a classification method and a classification device for fundus photos based on a class weighting network, and belongs to the technical field of image classification and ophthalmic medicine. The classification method comprises the following steps: reading a plurality of fundus picture data and tags thereof; inputting fundus picture data and labels thereof into a category weighting network, training and constructing a category weighting network model; reading fundus picture data to be recognized; and inputting the fundus picture data to be recognized into a category weighting network model, and taking the category with the maximum model output probability as the type result of the fundus picture. According to the class weighting network model, different class weights are given to different classes of data, balance among different difficult and easy data is achieved, reference is provided for the class weights by calculating the type gradient norm, and the consumption of a large amount of time and energy for manually adjusting the weights through repeated experiments of researchers in the model training stage is avoided.

Description

Fundus photo classification method and device based on class weighting network

Technical Field

The invention belongs to the technical field of image classification and ophthalmologic medicine, and particularly relates to a classification method and a classification device for fundus pictures based on a class weighting network.

Background

The vision is the information receiver which people know the world and obtain the most knowledge, however, with the development of society, the eye pressure of people on work and life is increased, in addition, the eye tissue is damaged indirectly by common diseases, and the eye health becomes a non-negligible challenge for people. One of the most common diseases that can cause ocular complications is Diabetes Mellitus (DM), which is one of the most serious and common chronic diseases in our age. According to the recent statistical data of the International Diabetes Federation (IDF), the prevalence of Diabetes in the 20-79 year old population worldwide was estimated to be 10.5% (5.366 million people) in 2021, and will rise to 12.2% (7.832 million people) by 2045. Ocular complications resulting from diabetes are called Diabetic Retinopathy (DR), which is the leading cause of blindness in adults, and the primary lesions include Microaneurysms (MAs), "punctate" or "spotted" bleeding (HEs), hard exudation (Ex), cotton Wool Spots (CWS), and Neovascularization (NV). In Chinese diabetics, the prevalence of DR is 18.45%, i.e., almost every 5 diabetics, there is a risk of blindness. The blind prevention and blind treatment work in China and even in the world is very severe.

The computer aided diagnosis technology based on machine learning and artificial intelligence has been developed so far, a great number of researchers have been put into relevant work, the related medical fields are also various, and doctors can be helped to diagnose the illness state of patients more accurately and efficiently. The development of this technology in the field of DR screening is one of the important directions. For example, (1) grading studies of the extent of glycoreticular lesions; (2) segmentation research of the glycoreticular lesion and related structures; (3) interpretation study of sugar network lesion judgment. The machine learning technology is accelerated to be deeply applied to ophthalmology, the existing disease diagnosis system is possibly thoroughly changed, the computer-aided diagnosis technology can effectively relieve the working pressure of ophthalmologists, the efficiency of clinical work is improved, the screening of large-scale population diseases is facilitated, and a new solution is provided for relieving the shortage of medical resources.

However, in the process of classifying and judging the degree of disease from the fundus retinal image, there are still many problems: first, because DR datasets have the particularity of their medical attributes, we face a great challenge in both the quantity and quality of datasets. Traditional deep learning algorithms usually require a large amount of data support, but DR data sets such as DDR, APTOS and Messidor-2 disclosed on the network are far from sufficient scale. Meanwhile, different data sets have the problem that the grading standards are inconsistent and nonuniform, and the more detailed grading of the sugar net is generally divided into five grades, namely normal, mild, moderate, severe and proliferative sugar nets. Even under the same standard, doctors with different organizations and different professional levels can make different judgments on the classification result of the disease condition, so that different data sets cannot be combined for use. In a single data set, the sources of fundus pictures are also five-flower eight-door, and the color, the definition, the contrast, the brightness, the size, the eyeball integrity and the like are different, and the quality levels are different. Second, there is a serious data imbalance phenomenon in the DR data set, including an imbalance in quantity and an imbalance in difficulty and ease. Normally, the number of normal eyegrounds will account for about half of the total data volume, and even higher, while the least significant data may be only 1/20 to 1/30 of the total data volume. The imbalance can cause that the machine learning model ignores the feature learning of a few categories in the training process, and pays more attention to samples of a plurality of categories, and finally causes the problems of poor model effect and high accuracy. In the DR data set, some types of data are difficult to distinguish while being small in quantity, such as DR1 and DR3, while DR4 is relatively easy to distinguish while being small in quantity.

Based on the problems of the DR data set, machine learning is still not accurate enough in the DR classification task, and a large improvement space exists. Therefore, the invention provides a fundus photo classification method and device based on a category weighting network, which are used for assisting in screening the diabetes mellitus and improving the DR classification accuracy.

Disclosure of Invention

The invention aims to solve at least one technical problem in the prior art and provides a fundus picture classifying method based on a class weighting network and a fundus picture classifying device based on the class weighting network.

In one aspect of the present invention, there is provided a fundus picture classification method based on a class weighting network, the method including the steps of:

reading a plurality of fundus picture data and tags thereof;

inputting the fundus photo data and the labels thereof into a class weighting network, training and constructing a class weighting network model, wherein the class weighting network model comprises the following steps:

performing primary feature extraction on the fundus picture data to obtain a primary extracted feature map;

respectively extracting the characteristics of the preliminarily extracted characteristic graph by using a channel dimension, a pixel dimension and a category dimension to respectively obtain a channel characteristic graph, a pixel characteristic graph and a category characteristic graph;

fusing the channel characteristic diagram, the pixel characteristic diagram and the category characteristic diagram to obtain a target characteristic diagram;

converting the target characteristic diagram into a category identification result corresponding to the fundus photo label;

reading fundus picture data to be recognized;

and inputting the fundus picture data to be recognized into the category weighting network model, and taking the category with the maximum model output probability as the category result of the fundus picture.

Optionally, the preliminary feature extraction is performed on the fundus image data to obtain a preliminary extracted feature map, including:

performing primary feature extraction on the fundus picture by using a modified pre-training network to obtain a primary extracted feature map;

wherein the modified pre-training network does not include the last fully-connected layer of the pre-training network.

Optionally, the performing feature extraction on the preliminary extracted feature map by using a channel dimension to obtain a channel feature map includes:

using global average pooling of pixel dimensions to obtain features that ignore pixel dimensions, obtaining a channel weight distribution over conv _ block, wherein,

the specific relation of the structure of conv _ block is as follows:

wherein,CBa conv _ block layer is represented,xindication inputEntering the characteristic diagram of the conv _ block layer,Convrepresents a 1-by-1 convolutional layer as a transition layer, the number of output channels and input dataxThe number of the channels of (a) is the same,BNrepresents the value of Batch Normalization,ReLUandSigmoidrespectively representing a ReLU activation function and a Sigmoid activation function, which introduce nonlinear factors into the network;

and extracting the features of the preliminary feature map by using a channel feature extractor according to channel dimensions to obtain a channel feature map, wherein,

the specific relational expression of the structure of the channel feature extractor is as follows:

wherein,F _c representing a channel feature map;F _B representing a primary extracted feature map;GAP _p representing global average pooling in pixel dimension;CBrepresenting a conv _ block layer;

representing the dot product, pass of the matrixCBChannel weight distribution and primary extraction characteristic diagram obtained after stratificationF _B Multiplication of the phase points.

Optionally, the performing feature extraction on the preliminary extracted feature map by using a pixel dimension to obtain a pixel feature map includes:

using global average pooling of channel dimensions to obtain the characteristics of neglecting the channel dimensions, and obtaining pixel weight distribution through conv _ block;

and performing feature extraction on the preliminary feature map by pixel dimension by using a pixel feature extractor to obtain a pixel feature map, wherein,

the specific relational expression of the structure of the pixel feature extractor is as follows:

wherein,F _p representing a pixel feature map;GAP _c global average pooling representing channel dimensions;F _B representing a primary extracted feature graph;CBrepresenting a conv _ block layer;

Optionally, the performing feature extraction on the preliminary extracted feature map by using a category dimension to obtain a category feature map includes:

adopting 1 × 1 convolution layer to extract the primary feature mapF _B Is expanded intoKLayer of to obtainF _K ，KThe specific relationship of (a) is as follows:

wherein,Nthe number of types of pictures that are represented,k _i is shown asiThe number of channels to which the class is assigned,Ktotal number of channels for all types;

to haveKCharacteristic diagram of individual channelF _K Pooling by type channel to obtain a profile that ignores channel dimensional featuresF _K Said feature map ignoring channel dimensional featuresF _N In common withNLayer channels, each layer channel indicating a type of feature, the specific relationship being as follows:

wherein,F _B showing the preliminary extraction of the feature map,Conv _K to representK1 by 1 of the convolution layer,GMP _K is shown for each

Performing one-time maximum pooling on the layer channel;

to pairF _N Performing global average pooling on pixel dimensions to obtain a characteristic graph ignoring the pixel dimensions, obtaining type weight distribution through conv _ block, and performing the operation ofF _N Dot multiplication to obtain a preliminary type feature map

(ii) a The specific relation is as follows:

to pair

And executing global average pooling and conv _ block of the channel dimension to obtain final type weight distribution, wherein the specific relation is as follows:

wherein,F _T in order to be a type feature map,GAP _C indicating that global average pooling is done over the channel dimension,CBa conv _ block layer is represented,

representing the dot product, pass of the matrixCBChannel weight distribution and primary extraction characteristic diagram obtained after stratificationF _B Multiplication of the phase points. />

Optionally, the type weight is obtained by calculating a type gradient norm, and the specific relation is as follows:

wherein,g _i denotes the firstiThe type of the class is a gradient norm,n _i is shown asiThe number of samples of a class,L _t representing a sampletCross entropy loss generated after model generation,out _t to representiSample of classestDirectly outputting after model calculation;

order top=softmax(out)，yA one-hot vector representation representing the sample,

the calculation of the type gradient norm is simplified, and the specific relation is as follows:

according to different types of gradient normg _i To obtain a size ratio of the type weight.

Optionally, the fusing the channel feature map, the pixel feature map and the category feature map to obtain a target feature map, and converting the target feature map into a type identification result corresponding to the fundus picture label, includes:

the target feature graph is subjected to a global average pooling layer and a full connection layer to obtain final output;

the specific relation is as follows:

wherein,outthe final output value of the model is a batchNVector of dimension, the value of vector element represents the possibility of corresponding position model identification type, choose the position subscript of the maximum value, as the final type identification result of the model;

representing pairs between different matricesThe elements that should be located are averaged out,FC _H andFC _N a fully-connected layer is shown,FC _H the number of output channels of (a) is half of the number of input channels,FC _N the number of output channels is the classification numberN。

Optionally, before reading the plurality of fundus picture data and the tags thereof, the method further includes: and performing enhancement processing of at least one of random up-down overturning, random left-right overturning and random rotating processing on the fundus picture to obtain an enhanced fundus picture.

Optionally, after inputting the fundus image data and the label thereof into the class weighting network, training and constructing a class weighting network model, the method further includes:

comparing the type recognition result of the fundus picture with a real label thereof to calculate cross entropy loss, and reversely transmitting and updating the model parameters;

the specific relationship of the cross entropy loss is as follows:

wherein,x[class]representing input dataxThe category to which the real thing belongs to,x[j]representation of model to input dataxClass of belongingjThe result of the recognition of (1).

In another aspect of the present invention, there is provided a fundus image classification apparatus based on a class weighting network, including: the device comprises a first reading unit, a second reading unit, a model forming unit and a category output unit; wherein,

the first reading unit is used for reading a plurality of fundus picture data and labels thereof;

the model forming unit is used for inputting the fundus photo data and the labels thereof into a category weighting network, training and constructing a category weighting network model; wherein the model forming unit further comprises: a basic feature extractor, a channel feature extractor, a pixel feature extractor, a category feature extractor and a feature converter;

the basic feature extractor is used for performing preliminary feature extraction on the fundus picture data to obtain a preliminary extraction feature map;

the channel feature extractor is used for performing feature extraction on the preliminarily extracted feature map by channel dimensions to obtain a channel feature map;

the pixel feature extractor is used for performing feature extraction on the preliminarily extracted feature map in pixel dimensions to obtain a pixel feature map;

the category feature extractor is used for extracting features of the preliminarily extracted feature map according to category dimensions to obtain a category feature map;

the characteristic converter is used for fusing the channel characteristic diagram, the pixel characteristic diagram and the category characteristic diagram to obtain a target characteristic diagram, and converting the target characteristic diagram into a category identification result corresponding to the fundus photo label;

the second reading unit is also used for reading fundus photo data to be recognized;

and the class output unit is used for inputting the fundus picture data to be recognized into the class weighting network model and taking the class with the maximum model output probability as a class result of the fundus picture.

The invention provides a classification method and a classification device for fundus photos based on a class weighting network. Furthermore, in the determination of different classes of weights, the invention also provides that the reference is provided by calculating the type gradient norm, so that the consumption of a great deal of time and energy for manually adjusting the weight parameters according to the past experience and repeated experiments of researchers in the model training phase is avoided.

Drawings

FIG. 1 is a block flow diagram of a method for classifying fundus images based on a class-weighted network according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a class weighting network according to an embodiment of the present invention;

fig. 3 is a block diagram showing a configuration of a fundus image classification apparatus based on a class weighting network according to an embodiment of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the invention without any inventive step, are within the scope of protection of the invention.

Unless otherwise specifically stated, technical or scientific terms used herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this invention belongs. The use of "including" or "comprising" and the like in this disclosure does not limit the presence or addition of any number, step, action, operation, component, element, and/or group thereof or does not preclude the presence or addition of one or more other different numbers, steps, actions, operations, components, elements, and/or groups thereof. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number and order of the indicated features.

As shown in fig. 1 and 2, the present invention provides a fundus image classification method S100 based on a class weighting network, which specifically includes steps S110 to S140:

it should be noted that the classification method of this embodiment includes two stages, which are a model training stage and a model practical application stage, respectively, where the model training stage includes steps S110 to S120, and the model practical application stage includes steps S130 to S140, that is, a classification model for identifying the category of the fundus image is established first, and then the category of the fundus image is identified based on the classification model, and a diabetic retinopathy grade can be obtained based on the identification result of the fundus image, that is, screening of diabetic retinopathy is achieved.

And S110, reading a plurality of fundus picture data and labels thereof.

Specifically, in this embodiment, the fundus photo data needs to be read, the label corresponding to the fundus photo also needs to be read, a list composed of a training data file path and training data label data can be transmitted, a corresponding picture is opened through an opencv-python (cv 2) library, and the read data is transmitted to a subsequent task.

Note that the fundus image label of this embodiment includes five disease severity levels of diabetic retinopathy, which are respectively denoted as DR0 to DR4. In the model training stage, after reading, data enhancement needs to be carried out on the data of the eye negative film.

As a further preferable scheme, the invention carries out random up-down turning, random left-right turning and random rotation processing on the fundus picture in the training stage. That is, pre-processing of data amplification and enhancement is performed on fundus pictures to form an enhanced data set, and fundus picture data in the enhanced data set is read.

And S120, training and constructing a class weighting network model according to the fundus photo data and the labels thereof, wherein the constructed model is used as a classification model of the fundus photos so as to intelligently identify the classes of the fundus photos.

Specifically, fundus picture data are input into a category weighting network to carry out type recognition training; then, the recognition result and the real label are used as parameters of a loss function together for loss calculation, and model parameters are updated through back propagation; and circulating the above operations until the loss tends to be stable and does not decrease any more, and obtaining the constructed class weighting network model.

And S130, reading fundus picture data to be recognized.

Specifically, the corresponding picture is opened through an opencv-python (cv 2) library, and the read data is transmitted to a subsequent task.

In practical application, after fundus image data is read, the original image is directly input to the category weighting network model for type recognition without performing data enhancement processing.

And S140, inputting the fundus picture data to be recognized into a category weighting network model, and taking the category with the highest model output probability as a category result of the fundus picture.

It should be noted that, in the actual application process, no loss calculation is performed on the recognition result, and no model parameter is updated.

It should be further noted that the class weighting network used in the present invention implements data balance on the model structure level, and the network structure is as shown in fig. 2, by providing learning channels of different scales for different classes of data, it implements balance between data of different difficulty levels, improves accuracy of fundus picture type identification, and is different from the common data balance performed in the loss calculation process.

Furthermore, in the determination of different types of weights, the method avoids the consumption of a great deal of time and energy for manually adjusting the weight parameters according to the past experience and repeated experiments of researchers in the training process by calculating the type gradient norm.

Specifically, the process of network modeling in step S120 is as follows:

firstly, carrying out primary feature extraction on an eye fundus picture by using a basic feature extractor to obtain a primary extracted feature mapF _B 。

It should be noted that any current mature pre-training network, such as ResNet, inceptionNet, denseNet, and EfficeientNet, recently introduced by Google, inc., can be used and modified by the primitive feature extractor.

Specifically, the modification comprises: and deleting the last full connection layer of the original network, so that the network does not directly give a classification result, but performs primary feature extraction on the fundus picture to obtain a primary extracted feature map.

Secondly, extracting the characteristics of the channel dimensionality of the preliminarily extracted characteristic graph by using a channel characteristic extractor to obtain a channel characteristic graphF _c 。

Specifically, the channel weight distribution is obtained by using global average pooling of pixel dimensions and conv _ block; the specific relation of the structure of conv _ block is as follows:

wherein,CBa conv _ block layer is represented,xa characteristic diagram representing the input conv _ block layers,Convrepresents a 1-by-1 convolutional layer as a transition layer, the number of output channels and input dataxThe number of the channels of (a) is the same,BNrepresents the value of Batch Normalization,ReLUandSigmoidrespectively representing a ReLU activation function and a Sigmoid activation function, which introduce nonlinear factors into the network;

the specific relation of the structure of the channel feature extractor is as follows:

wherein,F _B photograph showing the fundus of the eyePAfter passing through a basic feature extractor, obtaining a primary extracted feature map;GAP _p the global average pooling is performed on the pixel dimension, so that the characteristics of the pixel dimension are ignored by the model at the position, and the influence of the characteristics of different channel dimensions on the model is more concentrated;CBrepresenting the above-mentioned conv _ block layer,

representing a matrix dot product, passingCBChannel weight distribution and preliminary extraction feature map obtained after stratificationF _B Multiplying the phase points to obtain the channel characteristic diagramF _c 。

Thirdly, extracting the feature of the pixel dimension of the preliminarily extracted feature image by using a pixel feature extractor to obtain a pixel feature imageF _P 。

Specifically, the pixel weight distribution is obtained by using global average pooling of channel dimensions and conv _ block, and the specific relational expression of the structure of the pixel feature extractor is as follows:

wherein,CB，F _B and

the meaning is the same as the above-mentioned relation,GAP _c the global average pooling represents the channel dimension, and averages all channels in the feature map, compresses the average value to 1 channel, and enables the model to neglect the feature of the channel dimension and concentrate more on the contribution of different pixels to the model; throughCBPixel weight distribution and preliminary extraction feature map obtained after stratificationF _B Multiplying the phase points to obtain a pixel characteristic diagramF _p 。

Fourthly, extracting the characteristics of the type dimensionality of the preliminarily extracted characteristic graph by using a category characteristic extractor to obtain a type characteristic graphF _T 。

Specifically, 1 × 1 convolution layer is used for extracting a characteristic diagram preliminarilyF _B Is expanded intoKLayer of to obtainF _K ，KThe specific relationship of (a) is as follows:

wherein,Nthe number of types of pictures that are represented,k _i is shown asiThe number of channels, i.e. the type weight,Kthe total number of channels of all types, i.e. the total weight; the model can be paired withiA class passesk _i The characteristics of each channel are extracted,k _i the larger the value of (A), the more the model can understand and extract features from a plurality of angles; different types of weights are given to data with different difficulty degrees, and the data can be balanced on a model level.

Then to haveKCharacteristic diagram of individual channelF _K According to typePooling the channels to obtain a feature map with dimension features of the channels omittedF _N (ii) a The feature map ignoring channel dimension featuresF _N In common withNLayer channels, each layer channel indicating a type of feature, the specific relationship being as follows:

wherein,F _B showing the preliminary extraction of the feature map,Conv _K to representKA number of convolution layers of 1 x 1,GMP _K is shown for each

Performing one-time maximum pooling on layer channels, i.e. pooling according to type channels to finally obtain the layer channels with neglected channel dimension characteristicsF _N 。

Then toF _N And performing global average pooling in the pixel dimension, further neglecting the features of the pixel dimension, and only focusing on the features in the learning type dimension. Obtaining type weight distribution through conv _ block, and then performing the operationF _N Dot multiplication to obtain a preliminary type feature map

. The specific relation is as follows:

wherein,

representing a preliminary type characteristic diagram, and other symbols have the same meanings as the relational expressions described above;

finally, to

Performing global average pooling of channel dimensions and conv _ blockTo the final type weight distribution, the specific relation is as follows:

wherein,F _T the other symbols have the same meaning as the above relation for the type profile obtained.

And fifthly, fusing the feature maps with different dimensions by using a feature converter to obtain a final feature map, and converting the final feature map into a corresponding type identification result of the fundus photo label.

In particular, for the above channel characteristic diagramF _C Pixel feature mapF _P And type feature mapF _T Obtaining final output through a global average pooling layer, a full connection layer and the like;

the specific relation is as follows:

wherein,F _C ，F _P ，F _T ReLU and GAP _P Has the same meaning as the above-mentioned relation,

meaning that the elements at corresponding positions between different matrices are averaged,FC _H andFC _N all represent the full-connection layer, the number of output channels of the former is half of the number of input channels, and the number of output channels of the latter is the classification number N;outand the final output of the model is a batch N-dimensional vector, the value of the vector element represents the possibility of the corresponding position model identification type, and the position index with the maximum value is selected as the final identification result of the model.

Furthermore, in order to avoid the great time and labor consumption of manually adjusting the weight parameters according to the past experience and repeated experiments of researchers in the training process, the invention provides reference for the setting of the type weight by calculating the type gradient norm, and the specific relation formula is as follows:

wherein,g _i is shown asiThe type of the class is a gradient norm,n _i denotes the firstiThe number of samples of a class,L _t representing a sampletCross entropy loss generated after model generation,out _t to representiSample of classestDirect output after model calculation (without softmax layer);

order top=softmax(out)，yOne-hot vector representation representing samples, again because

The calculation of the type gradient norm can be simplified, and the specific relation is as follows:

according to different types of gradient normg _i The size ratio of the type weight can be obtained.

It should be noted that, in the present embodiment, when calculating the type gradient norm, it is necessary to set the initial type weight of the category weighting network to 5 uniformly, and then input the fundus image data into the category weighting network to perform the training of the category identification process; then, the recognition result and the real label are used as parameters of a loss function together for loss calculation, and model parameters are updated through back propagation; and (4) circulating the above operations until the loss tends to be stable and does not decrease any more, and obtaining a preliminarily converged class weighting network model. From this model, the type gradient norm of the corresponding fundus picture data can be calculated.

It should be further noted that, in the present invention, after obtaining the type recognition result in the training stage, the model needs to be updated to select the best effect as the final classification model of fundus images, so after the modeling, the type recognition step S120 for fundus images further includes the following steps:

and sixthly, comparing the recognition result of the fundus picture with the real label of the fundus picture to calculate cross entropy loss, and reversely transmitting and updating the model parameters. And (3) a cyclic loss calculation and parameter updating process is carried out until the loss tends to be stable and does not decrease, and a category weighting network model is obtained, wherein the loss is calculated by adopting a cross entropy loss function, and the specific relation is as follows:

It should be noted that, in the training stage, each round of loss calculation is used to update the model, and in this embodiment, the model is saved once every certain number of rounds, and finally, the model with the best effect is selected as the final fundus photo classification model. In the actual application stage, the model only outputs the classification result of the fundus picture without updating the model, so that the loss does not need to be calculated. Of course, if the later input data is gradually increased and the model is desired to be further updated, training can be continued on the basis of the existing model, and the model with better effect is selected again.

In this embodiment, a fundus picture classification model is constructed based on the above process to screen the lesion degree of the diabetes mellitus corresponding to the fundus picture, and the processes S130 to S140 of performing type recognition on the fundus picture to be recognized based on the formed model include:

reading fundus picture data to be screened; by adopting the constructed class weighting network model, the read fundus picture data is input into the class weighting network model, the type of the fundus picture data to be recognized is recognized, and the class with the maximum model output probability is used as the class result of the fundus picture.

As shown in fig. 3, another aspect of the present invention provides a fundus image classification apparatus 200 based on a class weighting network, including: a first reading unit 210, a model forming unit 220, a second reading unit 230, and a category output unit 240; wherein,

the first reading unit 210 is configured to read data of a plurality of fundus pictures and tags thereof, that is, in the model training stage, the data of the fundus pictures and the corresponding tags need to be read by the first reading unit for subsequent model training;

a model forming unit 220, configured to input fundus picture data and labels thereof into a class weighting network, train and construct a class weighting network model, where the model is used as a classification model of fundus pictures to identify classes of fundus pictures; wherein, the model forming unit 220 further includes: a basic feature extractor 221, a channel feature extractor 222, a pixel feature extractor 223, a category feature extractor 224, and a feature converter 225;

a basic feature extractor 221, configured to perform preliminary feature extraction on fundus image data to obtain a preliminary extracted feature map;

a channel feature extractor 222, configured to perform feature extraction on the preliminary extracted feature map according to channel dimensions to obtain a channel feature map;

a pixel feature extractor 223, configured to perform feature extraction on the preliminarily extracted feature map according to pixel dimensions to obtain a pixel feature map;

a category feature extractor 224, configured to perform feature extraction on the preliminary extracted feature map according to category dimensions to obtain a category feature map;

a feature converter 225 for fusing the channel feature map, the pixel feature map and the category feature map to obtain a target feature map, and converting the target feature map into a category identification result corresponding to the fundus photo label;

a second reading unit 230 for reading fundus picture data to be recognized;

a category output unit 240, configured to input fundus picture data to be recognized to the category weighting network model formed by the above construction, and take the category with the highest model output probability as a category result of the fundus picture.

It should be noted that, in the model training stage, the first reading unit of this embodiment needs to read not only fundus image data, but also tags corresponding to fundus images, and may transmit a list composed of training data file paths and training data tag data, open corresponding images through an opencv-python (cv 2) library, and transmit the read data to subsequent tasks. In the practical application stage of the model, the second reading unit only reads the data of the fundus picture, and the label type of the fundus picture data is identified by using the model formed by training in the previous step.

It should be further noted that the fundus picture label of the present embodiment includes five disease severity levels of diabetic retinopathy, which are respectively designated as DR0 to DR4. In the training stage of the model, after the first reading unit is used for reading, data enhancement needs to be carried out on the data of the eye negative film. That is, the apparatus of this embodiment further includes an enhancing unit 250 (as shown in fig. 3), and in the training stage, the enhancing unit performs random up-down flip, random left-right flip, and random rotation processing on the fundus oculi, that is, performs data amplification and enhancement preprocessing on the fundus oculi to form an enhanced data set, and then reads fundus oculi photograph data in the enhanced data set.

Furthermore, the embodiment uses the basic feature extractor to perform the preliminary feature extraction on the fundus picture to obtain the preliminary extracted feature mapF _B . The basic feature extractor can use and modify any current mature pre-training network, such as ResNet, inception Net, denseNet, and EfficeientNet, which is newly proposed by Google in recent years.

Further, in the embodiment, the channel feature extractor is used to extract the feature of the preliminarily extracted feature map channel dimension to obtain the channel feature mapF _C The specific process is as follows:

obtaining channel weight distribution by using global average pooling of pixel dimensions and conv _ block; the specific relation of the structure of conv _ block is as follows:

wherein,CBrepresenting a conv _ block layer of the video signal,xa characteristic diagram representing the input conv _ block layer,Convrepresents a 1-by-1 convolutional layer as a transition layer, the number of output channels and input dataxThe number of the channels of (a) is the same,BNrepresents the Batch Normalization of the mixture,ReLUandSigmoidrespectively representing a ReLU activation function and a Sigmoid activation function, which introduce nonlinear factors into the network;

the specific relationship of the structure of the channel feature extractor is as follows:

wherein,F _B photograph showing the fundus of the eyePAfter passing through a basic feature extractor, obtaining a primary extracted feature map;GAP _p representing global average pooling in pixel dimension, wherein the model ignores the characteristics of the pixel dimension and focuses more on the influence of the characteristics of different channel dimensions on the model;CBrepresenting the above-mentioned conv _ block layer,

representing the dot product, pass of the matrixCBChannel weight distribution and primary extraction characteristic diagram obtained after stratificationF _B Multiplying the phase points to obtain a channel characteristic diagramF _c 。

Further, the present embodiment uses a pixel feature extractor for preliminarily extracting features of the pixel dimensions of the feature imageExtracting to obtain a pixel feature mapF _P The specific process is as follows:

and obtaining pixel weight distribution by using global average pooling of channel dimensions and conv _ block, wherein the specific relational expression of the structure of the pixel feature extractor is as follows:

wherein,CB，F _B and

Furthermore, in the embodiment, the class feature extractor is used to extract the feature of the preliminarily extracted feature map type dimension to obtain the type feature mapF _T 。

Specifically, 1 × 1 convolution layer is used for extracting a characteristic diagram preliminarilyF _B Is expanded intoKLayer of to obtainF _K ，KThe specific relation of (2) is as follows:

wherein,Nthe number of types of pictures that are represented,k _i denotes the firstiThe number of channels, i.e. the type weight,Kthe total number of channels of all types, i.e. the total weight; the model can be compared withiA class passesk _i The characteristics of each channel are extracted,k _i the larger the value of (A), the more the model can understand and extract features from a plurality of angles; different types of weights are given to data with different difficulty degrees, and the data can be balanced on the model level.

Then to haveKCharacteristic diagram of each channelF _K Pooling by type channel to obtain a feature map ignoring channel dimension featuresF _N (ii) a The feature map ignoring the channel dimension featureF _N In common withNLayer channels, each layer channel indicating a type of feature, the specific relationship being as follows:

. The specific relation is as follows:

wherein,

finally, to

Furthermore, in order to avoid the consumption of a lot of time and effort in training process, which requires manual adjustment of the weight parameters according to the past experience and repeated experiments of researchers, the present invention needs to provide reference for the setting of the above-mentioned type weights by calculating the type gradient norm, that is, the model forming unit of the present embodiment further includes a weight setting module, and the specific relation formula is as follows:

wherein,g _i denotes the firstiThe type of the class is a gradient norm,n _i is shown asiThe number of samples of a class is,L _t representing a sampletCross entropy loss generated after model generation,out _t to representiSample of classestDirect output after model calculation (without softmax layer);

order top=softmax(out)，yOne-hot vector representation of the representation sample, again because

The type gradient norm can be calculatedThe line is simplified, and the specific relation is as follows:

It should be noted that, in the present embodiment, when calculating the type gradient norm, it is necessary to set the initial type weight of the category weighting network to 5 uniformly, and then input the fundus image data into the category weighting network to perform type identification training; then, the recognition result and the real label are used as parameters of a loss function together for loss calculation, and model parameters are updated through back propagation; and (4) circulating the above operations until the loss tends to be stable and does not decrease any more, and obtaining a preliminarily converged class weighting network model. From this model, the type gradient norm of the corresponding fundus picture data can be calculated.

Furthermore, in this embodiment, a feature converter is used to fuse feature maps of different dimensions to obtain a final feature map, and then the final feature map is converted into a comprehensive type identification result of the fundus photo label, which includes the following specific processes:

for the above channel characteristic diagramF _c Pixel feature mapF _P And type feature mapF _T Obtaining final output through a global average pooling layer, a full connection layer and the like;

the specific relation is as follows:

wherein,F _c ，F _P ，F _T ReLU andGAP _P has the same meaning as the above-mentioned relation,

indicating averaging of elements at corresponding positions between different matrices，FC _H AndFC _N all represent the full-connection layer, the number of output channels of the former is half of the number of input channels, and the number of output channels of the latter is the classification number N;outand the final output of the model is a batch N-dimensional vector, the value of the vector element represents the possibility of the corresponding position model identification type, and the position index of the maximum value is selected as the final type identification result of the model.

It should be noted that, in the present invention, after obtaining the fundus image classification result in the training stage, the model needs to be updated to select the model with the best effect as the final fundus image classification model, therefore, the model forming unit of this embodiment further includes an updating module, and the process of updating the model by using the updating module is as follows:

and comparing the recognition result of the fundus picture with the real label of the fundus picture to calculate the cross entropy loss, and reversely transmitting and updating the model parameters. And (3) a cyclic loss calculation and parameter updating process is carried out until the loss tends to be stable and does not decrease, and a category weighting network model is obtained, wherein the loss is calculated by adopting a cross entropy loss function, and the specific relation is as follows:

It should be noted that, in the training stage, each round of loss calculation is used to update the model, and in this embodiment, the model is saved once every certain number of rounds, and finally, the model with the best effect is selected as the final fundus photo classification model. In the actual application stage, the model only outputs the classification result of the fundus picture without updating the model, so that the loss does not need to be calculated. Of course, if the later input data is gradually increased and the model is desired to be further updated, the training can be continued on the basis of the existing model, and the model with better effect is reselected.

The classification method of fundus images based on the class weighting network will be further described below with reference to specific embodiments:

example 1

The present example identifies the category of fundus photos with diabetic retinopathy, including the steps of:

s1, calculating a type gradient norm, and providing reference for type weight setting of a class weighting network, namely determining the type gradient norm before using the network;

s2, reading fundus photo data and a label, and performing data enhancement on the fundus photo data;

s3, performing primary feature extraction on the fundus picture by using a basic feature extractor to obtain a primary extracted feature map;

s4, extracting the feature of the channel dimension of the preliminarily extracted feature map by using a channel feature extractor to obtain a channel feature map;

s5, extracting the feature of the pixel dimension of the feature image which is preliminarily extracted by using a pixel feature extractor to obtain a pixel feature image;

s6, extracting the features of the type dimensions of the preliminarily extracted feature map by using a category feature extractor to obtain a type feature map;

s7, fusing the feature maps with different dimensions by using a feature converter to obtain a final feature map, and converting the final feature map into a comprehensive identification result of the fundus photo label;

s8, comparing the recognition result of the fundus picture with the real label of the fundus picture to calculate the cross entropy loss, and reversely propagating and updating the model parameters; the process of calculating the cyclic loss and updating the parameters is carried out until the loss tends to be stable and does not decrease, and a category weighting network model is obtained;

and S9, reading fundus picture data to be screened, inputting a category weighting network model, and taking the category with the highest model output probability as the category result of the fundus picture.

The invention provides a classification method and a classification device of fundus photos based on a category weighting network, which have the following beneficial effects:

first, the classification method and apparatus based on class weighting network provided by the present invention implement the balance between data with different difficulty levels by giving different scales of learning channels to data with different classes. Unlike the conventional data balancing in the loss calculation process, the data balancing method realizes the data balancing on the model structure level.

Secondly, the invention provides a simple and effective scheme on the basis of determining different types of weights, namely, the invention avoids the consumption of a great deal of time and energy in the experimental process of manually adjusting the weight parameters according to the past experience and repeated experiments of researchers by calculating the type gradient norm. And these costs will appear exponentially larger as the number of categories increases and the size of the data scales.

Thirdly, the classification method and device based on the class weighting network provided by the invention can be used for extracting the characteristics of the fundus picture data from three different dimensions, thereby improving the performance and generalization capability of model characteristic extraction and reducing the influence caused by non-uniform data formats.

It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.

Claims

1. A fundus photo classification method based on a class weighting network is characterized by comprising the following steps:

reading a plurality of fundus picture data and tags thereof;

converting the target characteristic diagram into a type recognition result corresponding to the fundus photo label;

reading fundus picture data to be identified;

inputting the fundus picture data to be recognized into the category weighting network model, and taking the category with the maximum output probability of the category weighting network model as a category result of the fundus picture; wherein,

performing feature extraction on the preliminarily extracted feature map by using a category dimension to obtain a category feature map, wherein the method comprises the following steps:

adopting 1 × 1 convolution layer to extract the primary characteristic diagramF _B Is expanded intoKLayer of to obtainF _K ，KThe specific relation of (2) is as follows:

wherein,Nthe number of types of pictures that are represented,k _i is shown asiThe number of channels to which a class is assigned,Ktotal number of channels for all types;

to haveKCharacteristic diagram of individual channelF _K Pooling by type channel to obtain a profile that ignores channel dimensional featuresF _K Said feature map ignoring channel dimensional featuresF _N In common withNLayer channels, each layer channel indicating a type of feature, the specific relation being as follows:

Performing one-time maximum pooling on the layer channel;

(ii) a The specific relation is as follows:

to pair

wherein,F _T in order to be a type feature map,GAMP _C indicating that global average pooling is done over the channel dimension,CBa conv _ block layer is represented,

representing the dot product, pass of the matrixCBChannel weight distribution and primary extraction characteristic diagram obtained after stratificationF _B Multiplying the phase points;

the type weight is obtained by calculating a type gradient norm, and the specific relation is as follows:

wherein,g _i is shown asiThe type of the class is a gradient norm,n _i is shown asiThe number of samples of a class,L _t representing a sampletCross entry loss generated after model generation,out _t to representiSample of classestDirectly outputting after model calculation;

order top=softmax(out)，yA one-hot vector representation of the representation sample,

2. The method according to claim 1, wherein the performing preliminary feature extraction on the fundus picture data to obtain a preliminary extracted feature map comprises:

wherein the modified pre-training network does not include the last fully connected layer of the pre-training network.

3. The method according to claim 2, wherein the performing feature extraction on the preliminary extracted feature map by using a channel dimension to obtain a channel feature map comprises:

the specific relation of the structure of conv _ block is as follows:

wherein,CBa conv _ block layer is represented,xa characteristic diagram representing the input conv _ block layer,Convrepresents a 1-by-1 convolutional layer as a transition layer, the number of output channels and input dataxThe number of the channels of (a) is the same,BNrepresents the value of Batch Normalization,ReLUandSigmoidrespectively representing a ReLU activation function and a Sigmoid activation function, which introduce nonlinear factors into the network;

representing a matrix dot product, passingCBChannel weight distribution and primary extraction characteristic diagram obtained after stratificationF _B Multiplication of the phase points.

4. The method according to claim 3, wherein the performing feature extraction on the preliminary extracted feature map in pixel dimension to obtain a pixel feature map comprises:

5. The method according to claim 4, wherein the fusing the channel feature map, the pixel feature map and the category feature map to obtain a target feature map, and converting the target feature map into a type recognition result corresponding to the fundus picture label comprises:

the specific relation is as follows:

wherein,outas the final output value of the model, a batchNVector of dimension, the value of vector element represents the possibility of corresponding position model identification type, choose the position subscript of the maximum value, as the final type identification result of the model;

meaning that the elements at corresponding positions between different matrices are averaged,FC _H andFC _N a fully-connected layer is shown,FC _H the number of output channels of (a) is half of the number of input channels,FC _N the number of output channels is the classification numberN。

6. The method according to claim 1, wherein before reading the plurality of fundus picture data and the tags thereof, further comprising: and performing enhancement processing of at least one of random up-down overturning, random left-right overturning and random rotating processing on the fundus picture to obtain an enhanced fundus picture.

7. The method of claim 1, wherein after inputting the fundus picture data and the labels thereof into the class weighting network and training and constructing the class weighting network model, the method further comprises:

comparing the type recognition result of the fundus picture with the real label thereof to calculate cross entropy loss, and reversely transmitting and updating the parameters of the category weighting network model;

the specific relationship of the cross entropy loss is as follows:

wherein,x[class]representing input dataxThe class to which the actual belongs is a class,x[j]representation of model to input dataxClass of belongingjThe result of the recognition of (1).

8. An fundus image classification apparatus based on a class weighting network, comprising: the device comprises a first reading unit, a second reading unit, a model forming unit and a category output unit; wherein,

the pixel feature extractor is used for performing feature extraction on the preliminarily extracted feature map according to pixel dimensions to obtain a pixel feature map;

the category feature extractor is configured to perform feature extraction on the preliminary extracted feature map by using a category dimension to obtain a category feature map, and includes:

to haveKCharacteristic diagram of individual channelF _K Pooling by type channel to obtain ignore channel dimension characteristicsCharacteristic diagram of featuresF _K Said feature map ignoring channel dimensional featuresF _N In common withNLayer channels, each layer channel indicating a type of feature, the specific relationship being as follows:

wherein,F _B showing the preliminary extraction of the feature map,Conv _K representKA number of convolution layers of 1 x 1,GMP _K is shown for each

Performing one-time maximum pooling on the layer channel;

(ii) a The specific relation is as follows:

to pair

wherein,F _T in order to be a type feature map,GAMP _C indicating that global average pooling is done in the channel dimension,CBa conv _ block layer is represented,

according to different types of gradient normg _i To obtain the size ratio of the type weight;

the characteristic converter is used for fusing the channel characteristic diagram, the pixel characteristic diagram and the category characteristic diagram to obtain a target characteristic diagram, and converting the target characteristic diagram into a type identification result corresponding to the fundus photo label;

the second reading unit is also used for reading fundus picture data to be identified;

and the category output unit is used for inputting the fundus picture data to be recognized into the category weighting network model, and taking the category with the maximum model output probability as the category result of the fundus picture.