CN109670489B

CN109670489B - Weakly supervised classification method for early senile macular degeneration based on multi-instance learning

Info

Publication number: CN109670489B
Application number: CN201910120667.8A
Authority: CN
Inventors: 曹桂平; 谢新林
Original assignee: Guangzhou Shiyuan Electronics Technology Co Ltd
Current assignee: Guangzhou Shiyuan Electronics Technology Co Ltd
Priority date: 2019-02-18
Filing date: 2019-02-18
Publication date: 2023-06-27
Anticipated expiration: 2039-02-18
Also published as: CN109670489A

Abstract

The invention discloses a weakly supervised classification method for early senile macular degeneration based on multi-instance learning, comprising the following steps: step one, collecting fundus images; step two, preprocessing the image by cropping, extracting the green channel, sharpening and whitening; step three, detecting drusen and segmenting lesions using a method that combines a convolutional neural network with multi-instance learning. The convolutional neural network serves as the backbone and comprises three convolutional layers; a prediction of the drusen positions is obtained after the image passes through each convolutional layer, and at the output end of the network the three predictions are weighted and fused to give the final output. The algorithm uses a dedicated weakly supervised training method based on multi-instance learning to train the detection classifier: it only needs to know whether a fundus image contains drusen, and the classifier can be trained without knowing the specific lesion positions. The algorithm thereby saves the cost of labeling training data and improves efficiency while maintaining accuracy.

Description

Weakly supervised classification method for early senile macular degeneration based on multi-instance learning

Technical Field

The invention relates to a disease detection and classification method, in particular to an early senile macular degeneration classification method.

Background

Senile macular degeneration is a common eye disease that mainly affects people over 55 years of age. Clinically it is divided into three stages: early, intermediate and late.

Once a patient enters the late stage, vision loss can develop within weeks to months. To slow the progression of senile macular degeneration effectively, timely screening at the early stage is particularly important. However, early senile macular degeneration does not affect the patient's vision and is not easily noticed by the patient. Fundus examination is widely used clinically as a means of early screening, but relying on physicians to screen manually and mark lesion locations takes a significant amount of time. To improve screening efficiency, it is therefore important to develop an automatic screening system for early macular degeneration.

Clinically, the main sign of early senile macular degeneration is the appearance of drusen in the macular area, which show up as yellow-white spots in the macular region of the fundus. Based on this characteristic, the current mainstream classification methods for early macular degeneration preprocess the image, extract features, and finally design a classifier for classification. In the classifier training stage, a strongly supervised method is usually adopted; that is, a sufficient number of fundus images containing drusen lesions must be collected and the lesion positions marked. However, precisely labeled fundus images are costly and time-consuming to obtain in practice.

In the prior art, a fundus image macular degeneration segmentation method based on supervised descriptor learning has been studied. This method combines supervised descriptor learning with low-level image features, reduces the data dimensionality with a generalized low-rank matrix approximation, and then constructs a manifold regularization term using the data labels to extract image features; combined with an SVM classifier, it achieves a certain segmentation effect. However, feature extraction in this method involves iterative optimization of matrices, so the amount of computation is large and the method is time-consuming in practice; it also depends on pixel-level label information, which further limits its practical use.

Disclosure of Invention

The invention aims to solve the technical problem of providing a weak supervision type early senile macular degeneration classification method based on multi-instance learning, which can effectively save the cost of marking training data and improve the efficiency while ensuring the accuracy.

In order to solve the technical problems, the technical scheme of the invention is as follows: a weak supervision early senile macular degeneration classification method based on multi-instance learning comprises the following steps:

step one, collecting fundus images;

step two, preprocessing the image: cropping the image, extracting the green channel, sharpening and whitening;

step three, detecting drusen and segmenting lesions using a method that combines a convolutional neural network with multi-instance learning;

taking a convolutional neural network as the backbone, wherein the convolutional neural network comprises three convolutional layers, and a prediction of the drusen positions is obtained after the image passes through each convolutional layer; at the output end of the network, the three predictions are weighted and fused to give the final output.

In the second step, a fixed-size region is cropped from the center of the image so that the picture is 512 × 512 pixels, ensuring that the samples have a uniform size, and the green channel is extracted for detection; the image is then sharpened using a Laplacian template; finally, the image is whitened so that the mean of all pixels is 0 and the variance is 1.
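The preprocessing chain above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `preprocess`, the RGB channel order, the 4-neighbour Laplacian kernel with replicated edges, and the synthetic input image are all hypothetical choices, since the text does not specify the exact Laplacian template.

```python
import numpy as np

def preprocess(rgb, size=512):
    """Center-crop, take the green channel, sharpen, then whiten."""
    h, w, _ = rgb.shape
    if h < size or w < size:
        raise ValueError("image is smaller than the crop size")
    top, left = (h - size) // 2, (w - size) // 2
    # Green channel of the central size x size window (RGB order assumed).
    g = rgb[top:top + size, left:left + size, 1].astype(np.float64)

    # Assumed 4-neighbour Laplacian: sum of neighbours minus 4 * centre,
    # with edge pixels replicated so the output keeps the same shape.
    p = np.pad(g, 1, mode="edge")
    lap = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * g
    sharp = g - lap  # subtracting the Laplacian emphasises edges

    # Whitening is taken here as per-image standardization: mean 0, variance 1.
    std = sharp.std()
    if std == 0:
        raise ValueError("constant image cannot be whitened")
    return (sharp - sharp.mean()) / std

# Synthetic stand-in for a fundus photograph (random values, not real data).
rng = np.random.default_rng(0)
fundus = rng.integers(0, 256, size=(600, 700, 3), dtype=np.uint8)
x = preprocess(fundus)
```

The result `x` is a single-channel 512 × 512 array whose pixel mean is 0 and variance is 1 up to floating-point error, which is the input format the three-layer network expects under these assumptions.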

As a preferred technical solution, in the third step, the convolution kernel size of each convolutional layer is 3 × 3, the stride is 1, and the numbers of channels are 32, 64 and 128, respectively.
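To give a sense of how small this three-layer network is, the short calculation below counts its convolution parameters. It assumes a single-channel input (the green channel only) and one bias term per output channel; neither detail is stated in the text, and the helper `conv_params` is hypothetical.

```python
def conv_params(in_ch, out_ch, k=3, bias=True):
    """Number of learnable parameters in one k x k convolution layer."""
    return k * k * in_ch * out_ch + (out_ch if bias else 0)

# Assumed: 1 input channel (green), then 32, 64 and 128 output channels.
channels = [1, 32, 64, 128]
per_layer = [conv_params(i, o) for i, o in zip(channels, channels[1:])]
total = sum(per_layer)
print(per_layer, total)  # [320, 18496, 73856] 92672
```

Under these assumptions the three layers hold fewer than 100,000 parameters in total, excluding whatever layers produce the per-layer prediction maps.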

As a preferred technical solution, in the third step, a Generalized Mean (GM) function is used to convert the pixel-level prediction into an image-level prediction, and the prediction loss of each convolutional layer is calculated, where the GM function is defined as:

$$\hat{Y}_i = \left( \frac{1}{|X_i|} \sum_{j=1}^{|X_i|} \hat{y}_{ij}^{\,r} \right)^{1/r}$$

where $X_i$ is the i-th input picture, $|X_i|$ is its number of pixels, $\hat{y}_{ij}$ is the pixel prediction result output by the classifier, $\hat{Y}_i$ is the prediction result of the i-th input picture, and r is a hyperparameter, taken as 4 here;
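The pixel-to-image conversion can be illustrated with a few lines of Python. The sketch assumes the standard generalized (power) mean taken over all pixels with r = 4; the function name `generalized_mean` and the toy probability maps are hypothetical.

```python
import numpy as np

def generalized_mean(pixel_probs, r=4.0):
    """Pool per-pixel probabilities into one image-level probability."""
    p = np.asarray(pixel_probs, dtype=np.float64)
    return float(np.mean(p ** r) ** (1.0 / r))

# Toy maps: a uniform low-probability background, and the same background
# with a small patch of four high-confidence pixels.
background = np.full((8, 8), 0.05)
with_lesion = background.copy()
with_lesion[2:4, 2:4] = 0.9

gm_bg = generalized_mean(background)
gm_lesion = generalized_mean(with_lesion)
```

With r greater than 1 the pooled value lies between the arithmetic mean and the maximum of the map, so a small high-confidence patch raises the image-level score far more than plain averaging would, while a single noisy pixel counts for less than it would under max pooling.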

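the loss of each convolutional layer is as follows:

$$l_{mil}(\theta,\omega) = -\sum_i \left[ I(Y_i = 1)\,\log \hat{Y}_i + I(Y_i = 0)\,\log\left(1 - \hat{Y}_i\right) \right]$$

where $Y_i$ is the true result of the i-th input picture, $\hat{Y}_i$ is the image-level prediction given by the GM function, and I is the indicator function.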
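A minimal Python sketch of an image-level loss of this kind follows. It assumes the loss is a cross-entropy summed over images, with the indicator function selecting the positive or negative term according to the image label; the function name `mil_loss` and the clipping constant `eps` are hypothetical additions for illustration.

```python
import math

def mil_loss(image_probs, labels, eps=1e-7):
    """Sum of per-image cross-entropy terms selected by the indicator I(.)."""
    total = 0.0
    for p, y in zip(image_probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        # I(Y=1) picks log(p); I(Y=0) picks log(1 - p).
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total

# Two images: the first contains drusen (label 1), the second does not.
good = mil_loss([0.9, 0.1], [1, 0])  # confident and correct
bad = mil_loss([0.1, 0.9], [1, 0])   # confident and wrong
```

Only the image-level labels enter this computation, which is what allows training without pixel-level annotations.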

As a preferred technical solution, an area constraint is added to the lesion position prediction of each convolutional layer. Specifically, if the total pixel area predicted as positive exceeds a constraint value and the picture is predicted to contain drusen, the corresponding weights are penalized; the area constraint is defined as:

where $v_i$ is the predicted lesion area and $a_i$ is the given area constraint.
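One penalty consistent with this verbal description is a squared hinge on the excess area, applied only to images predicted positive. The Python below is an illustrative assumption, not the patent's formula: the function name `area_penalty`, the use of the mean pixel probability as a soft area fraction for $v_i$, the squared form and the 0.5 decision threshold are all hypothetical.

```python
import numpy as np

def area_penalty(pixel_probs, image_prob, area_limit, threshold=0.5):
    """Squared excess of the predicted lesion area over the limit.

    Hypothetical form: applies only when the image is predicted positive.
    """
    if image_prob < threshold:
        return 0.0
    v = float(np.mean(pixel_probs))  # soft lesion area as a fraction of the image
    return max(0.0, v - area_limit) ** 2

over = np.full((4, 4), 0.8)   # almost the whole image predicted as lesion
small = np.zeros((4, 4))
small[0, 0] = 0.8             # a single lesion pixel

p_over = area_penalty(over, image_prob=0.9, area_limit=0.25)
p_small = area_penalty(small, image_prob=0.9, area_limit=0.25)
p_negative = area_penalty(over, image_prob=0.2, area_limit=0.25)
```

In this sketch only `p_over` is non-zero: the penalty vanishes when the predicted area stays within the limit or when the image is predicted negative, which matches the two conditions named in the text.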

As a preferred technical solution, after the area constraint is added to the lesion position prediction of each convolutional layer, the total loss of each convolutional layer is:

$$L_{side}(\theta,\omega) = l_{mil}(\theta,\omega) + \eta \cdot l_{ac}(\theta,\omega)$$

where η is a given weight value.

As a preferred technical solution, the fusion loss of weighted fusion of three prediction results is:

the total loss of the convolutional neural network is:
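As a minimal sketch of how these pieces could fit together, the Python below fuses three per-layer prediction maps with a weighted sum and adds the per-layer losses to a fusion loss. The normalization of the fusion weights and the purely additive total are assumptions for illustration, not formulas taken from the patent; the names `fuse` and `total_loss` and all numeric values are hypothetical.

```python
import numpy as np

def fuse(side_maps, weights):
    """Weighted sum of the three per-layer pixel prediction maps."""
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()  # assumed normalization keeps the fused map in [0, 1]
    return np.tensordot(w, np.stack(side_maps), axes=1)

def total_loss(mil_terms, ac_terms, fuse_term, eta):
    """Per-layer losses (l_mil + eta * l_ac) summed, plus the fusion loss."""
    side = [m + eta * a for m, a in zip(mil_terms, ac_terms)]
    return sum(side) + fuse_term

# Toy per-layer maps and loss values (made-up numbers).
maps = [np.full((2, 2), 0.2), np.full((2, 2), 0.4), np.full((2, 2), 0.9)]
fused = fuse(maps, weights=[1.0, 1.0, 2.0])
loss = total_loss([0.5, 0.4, 0.3], [0.1, 0.0, 0.2], fuse_term=0.25, eta=0.5)
```

In a trainable network the fusion weights would typically be learned along with the convolution weights; they are fixed here only to keep the example short.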

By adopting the above technical solution, the weakly supervised classification method for early senile macular degeneration based on multi-instance learning comprises the following steps: step one, collecting fundus images; step two, preprocessing the image by cropping, extracting the green channel, sharpening and whitening; step three, detecting drusen and segmenting lesions using a method that combines a convolutional neural network with multi-instance learning. The convolutional neural network serves as the backbone and comprises three convolutional layers; a prediction of the drusen positions is obtained after the image passes through each convolutional layer, and at the output end of the network the three predictions are weighted and fused to give the final output. The algorithm uses a dedicated weakly supervised training method based on multi-instance learning to train the detection classifier: it only needs to know whether a fundus image contains drusen, and the classifier can be trained without knowing the specific lesion positions. The algorithm thereby saves the cost of labeling training data and improves efficiency while maintaining accuracy.

Drawings

Fig. 1 is a schematic diagram of an embodiment of the present invention.

Detailed Description

The invention is further described below with reference to Fig. 1 and an embodiment. In the following detailed description, certain exemplary embodiments of the present invention are described by way of illustration only. A person skilled in the art will recognize that the described embodiments may be modified in various ways without departing from the spirit and scope of the invention. Accordingly, the drawings and description are to be regarded as illustrative in nature and not as restrictive.

A weak supervision early senile macular degeneration classification method based on multi-instance learning comprises the following steps:

step one, collecting fundus images;

In the second step, a fixed-size region is first cropped from the center of the image so that the picture is 512 × 512 pixels, ensuring that the samples have a uniform size; next, the green channel is extracted for detection; the image is then sharpened using a Laplacian template; finally, the image is whitened so that the mean of all pixels is 0 and the variance is 1.

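In the third step, the convolution kernel size of each convolutional layer is 3 × 3, the stride is 1, and the numbers of channels are 32, 64 and 128, respectively. A Generalized Mean (GM) function is used to convert the pixel-level prediction into an image-level prediction, and the prediction loss of each convolutional layer is calculated, the GM function being defined as:

$$\hat{Y}_i = \left( \frac{1}{|X_i|} \sum_{j=1}^{|X_i|} \hat{y}_{ij}^{\,r} \right)^{1/r}$$

where $X_i$ is the i-th input picture, $|X_i|$ is its number of pixels, $\hat{y}_{ij}$ is the pixel prediction result output by the classifier, $\hat{Y}_i$ is the prediction result of the i-th input picture, and r is a hyperparameter.

The loss of each convolutional layer is as follows:

$$l_{mil}(\theta,\omega) = -\sum_i \left[ I(Y_i = 1)\,\log \hat{Y}_i + I(Y_i = 0)\,\log\left(1 - \hat{Y}_i\right) \right]$$

where $Y_i$ is the true result of the i-th input picture and I is the indicator function.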

To counter the tendency of multi-instance learning methods to over-predict positive samples, an area constraint is added to the lesion position prediction of each convolutional layer. Specifically, if the total pixel area predicted as positive exceeds a constraint value and the picture is predicted to contain drusen, the corresponding weights are penalized; the area constraint is defined as:

where $v_i$ is the predicted lesion area and $a_i$ is the given area constraint.

After the area constraint is added to the lesion position prediction of each convolutional layer, the total loss of each convolutional layer is as follows:

$$L_{side}(\theta,\omega) = l_{mil}(\theta,\omega) + \eta \cdot l_{ac}(\theta,\omega)$$

where η is a given weight value.

the total loss of the convolutional neural network is:

The neural network is trained using samples that carry only image-level labels, and the weights are updated by stochastic gradient descent.

In the lesion detection stage, the test image is input into the trained neural network to classify the image, giving a prediction of whether the image contains drusen. At the same time, the fused pixel predictions of the three convolutional layers are taken as the prediction of the lesion positions.
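The detection stage can be sketched as follows, taking a fused pixel-probability map as input. The generalized-mean pooling with r = 4 follows the description; the 0.5 decision threshold, the choice to return an empty mask for images classified as negative, and the name `detect` are assumptions made for illustration.

```python
import numpy as np

def detect(fused_map, r=4.0, threshold=0.5):
    """Return (has_drusen, lesion_mask) from a fused pixel-probability map."""
    image_prob = float(np.mean(fused_map ** r) ** (1.0 / r))
    has_drusen = image_prob >= threshold
    if has_drusen:
        mask = fused_map >= threshold
    else:
        mask = np.zeros(fused_map.shape, dtype=bool)
    return has_drusen, mask

# Toy fused maps: one with a 3 x 3 high-probability patch, one without.
sick = np.full((8, 8), 0.05)
sick[2:5, 2:5] = 0.95
healthy = np.full((8, 8), 0.05)

pos_flag, pos_mask = detect(sick)
neg_flag, neg_mask = detect(healthy)
```

Both outputs come from the same fused map, so the image-level classification and the lesion localization are obtained in a single forward pass.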

Compared with existing methods, the method differs as follows:

(1) Traditional drusen detection and lesion marking algorithms are strongly supervised methods that require a large amount of sample data with pixel-level labels. The training process of this algorithm is weakly supervised: only image-level labels need to be collected and no pixel-level labels are required, which greatly reduces the cost of labeling sample data.

(2) Similar multi-instance learning methods have weaker feature extraction capability. This method uses multi-scale fusion together with an area constraint to strengthen the feature extraction capability of the network, so that drusen positions can be detected better.

(3) The segmentation method based on supervised descriptor learning has strong feature extraction capability, but the algorithm involves a large number of iterative matrix optimization operations, requires a large amount of computation, and depends on pixel-level label information, so it does not resolve the problems of high cost and long processing time. By comparison, the present method needs only image-level labels, which greatly saves labeling time and cost, while still classifying senile macular degeneration and detecting lesion positions.

In this technical solution, the drusen detection network may alternatively use a residual network as the backbone, thereby increasing the depth of the network. The drusen detection and segmentation network may also combine multi-instance learning with an attention network.

The foregoing has shown and described the basic principles, main features and advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which merely illustrate the principles of the invention, and that various changes and modifications may be made without departing from the spirit and scope of the invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims

1. A weak supervision early age-related maculopathy classification method based on multi-instance learning, which is characterized by comprising the following steps:

step one, collecting fundus images;

taking a convolutional neural network as the backbone, wherein the convolutional neural network comprises three convolutional layers, and a prediction of the drusen positions is obtained after the image passes through each convolutional layer; at the output end of the network, the three predictions are weighted and fused to give the final output;

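in the third step, the pixel-level prediction is converted into an image-level prediction by using a Generalized Mean (GM) function, and the prediction loss of each convolutional layer is calculated, where the GM function is defined as:

$$\hat{Y}_i = \left( \frac{1}{|X_i|} \sum_{j=1}^{|X_i|} \hat{y}_{ij}^{\,r} \right)^{1/r}$$

wherein $X_i$ is the i-th input picture, $|X_i|$ is its number of pixels, $\hat{y}_{ij}$ is the pixel prediction result output by the classifier, $\hat{Y}_i$ is the prediction result of the i-th input picture, and r is a hyperparameter;

the loss of each convolutional layer is as follows:

$$l_{mil}^{(t)}(\theta,\omega) = -\sum_i \left[ I(Y_i = 1)\,\log \hat{Y}_i^{(t)} + I(Y_i = 0)\,\log\left(1 - \hat{Y}_i^{(t)}\right) \right]$$

wherein $Y_i$ is the true result of the i-th input picture, I is the indicator function, t is the sequence number of the convolutional layer, with t = 1, 2 or 3, and $\hat{Y}_i^{(t)}$ is the output result of the i-th picture at the corresponding convolutional layer t.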

2. The weak supervision early age-related macular degeneration classification method based on multi-instance learning as set forth in claim 1, wherein in the second step, a fixed size is cut out in the center of the image, the picture is cut out into a 512×512 pixel size picture, the sample is ensured to have a uniform size, and then the green channel is extracted for detection; then sharpening the image by using a Laplace template; and finally, performing whitening treatment on the image to ensure that the mean value of all pixels is 0 and the variance is 1.

3. The method for classifying early senile macular degeneration based on multi-instance learning of claim 1, wherein in the third step, the convolution kernel size of each convolution layer is 3*3, the step size is 1, and the channel number is 32, 64, 128, respectively.

4. The weak supervision early age-related macular degeneration classification method based on multi-instance learning of claim 1, wherein the area constraint is increased for the focus position prediction of each convolution layer in such a way that if the sum of the pixel areas predicted as positive instances is larger than the constraint value and the picture is predicted as containing drusen, the corresponding weight is penalized; the area constraint is defined as:

wherein $v_i$ is the predicted lesion area and $a_i$ is the given area constraint.