CN112101424A - Generation method, identification device and equipment of retinopathy identification model - Google Patents

Generation method, identification device and equipment of retinopathy identification model

Info

Publication number
CN112101424A
CN112101424A (application number CN202010857801.5A)
Authority
CN
China
Prior art keywords
fundus
convolution
module
feature map
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010857801.5A
Other languages
Chinese (zh)
Other versions
CN112101424B (en)
Inventor
雷柏英
张汝钢
汪天富
张国明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University filed Critical Shenzhen University
Priority to CN202010857801.5A priority Critical patent/CN112101424B/en
Publication of CN112101424A publication Critical patent/CN112101424A/en
Application granted granted Critical
Publication of CN112101424B publication Critical patent/CN112101424B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features

Abstract

The invention provides a method for generating a retinopathy recognition model, together with a recognition device and equipment. The trained retinopathy recognition model is obtained by training a preset network model with the fundus pictures contained in a training set. Each fundus picture in the training set carries classification label information, namely the retinal feature classification corresponding to that picture. Because the preset network model learns this classification label information during training, the trained retinopathy recognition model can recognize the retinal image contained in a detection picture and give an accurate classification of the retinal features. By training the network model for identifying retinopathy through deep learning, the method, device and equipment provided by the embodiment improve both the accuracy and the efficiency of fundus image identification.

Description

Generation method, identification device and equipment of retinopathy identification model
Technical field
The invention relates to the technical field of medical image processing, in particular to a method, a device and equipment for generating a retinopathy identification model.
Background
Acute progressive posterior pole retinopathy of prematurity (AP-ROP) is a particular type of retinopathy. Unlike conventional retinopathy of prematurity (ROP), AP-ROP may be accompanied by plus disease, retinal hemorrhage and flat neovascularization, and its diagnosis presents several clinical problems. First, the course of AP-ROP is not conventional and does not follow the stage 1 to stage 5 progression of conventional ROP. Moreover, the incidence of AP-ROP is low and its symptoms are atypical, so many ophthalmologists lack sufficient diagnostic experience, which increases the possibility that children with AP-ROP are not diagnosed in time.
Deep learning has been applied to the field of image recognition, and evaluation models for diagnosing early-stage diseases have been constructed based on deep learning in the prior art, but a neural network evaluation model for identifying this retinopathy has not yet appeared. In practical application, if the fundus image is identified by a doctor alone, identification errors can occur: the deviation of the identified information may be large, the identification efficiency is low, and objectivity is lacking.
Therefore, the prior art is subject to further improvement.
Disclosure of Invention
In view of the defects in the prior art, the invention aims to provide a method, a device and equipment for generating a retinopathy recognition model, which overcome the defects of the prior art that there is no neural network model for performing feature recognition on fundus images and that the information contained in a fundus image can only be recognized by human eyes, so that image feature recognition is inefficient and highly subjective.
The embodiment of the invention discloses the following scheme:
in a first aspect, the present embodiment discloses a method for generating a retinopathy identification model, where the method includes:
the method comprises the steps that a preset network model generates prediction classification information corresponding to fundus pictures according to the fundus pictures in a training set, wherein the training set comprises a plurality of groups of training samples, and each group of training samples comprises the fundus pictures and classification label information corresponding to the fundus pictures;
and the preset network model corrects model parameters according to the prediction classification information corresponding to the fundus pictures and the classification label information corresponding to the fundus pictures, and continues to execute the step of generating the prediction classification information corresponding to the fundus pictures according to the fundus pictures in the training set, until the training condition of the preset network model meets the preset condition, so as to obtain the retinopathy recognition model.
Optionally, the preset network model includes: a residual excitation module, a layered bilinear pooling module and a classification module;
the step of generating the prediction classification information corresponding to the fundus picture by the preset network model according to the fundus picture in the training set comprises the following steps:
inputting the fundus pictures in the training set into the residual excitation module to obtain a layer feature map, corresponding to the fundus pictures, output by the residual excitation module;
inputting the layer feature map into the layered bilinear pooling module to obtain an inter-layer interaction feature map output by the layered bilinear pooling module;
and inputting the inter-layer interaction feature map into the classification module to obtain the prediction classification information output by the classification module.
Optionally, the residual excitation module includes k residual blocks, and each residual block includes a convolution unit and a squeeze excitation unit;
the step of inputting the fundus pictures in the training set into the residual excitation module to obtain the layer feature map, corresponding to the fundus pictures, output by the residual excitation module comprises the following steps:
inputting the fundus pictures in the training set into a first convolution unit in a first residual block to obtain a first convolution feature map output by the first convolution unit;
inputting the first convolution feature map into a first squeeze excitation unit in the first residual block to obtain a first reconstructed feature map output by the first squeeze excitation unit;
adding the first reconstructed feature map and the first convolution feature map, and inputting the result to a second convolution unit of a second residual block to obtain a second convolution feature map output by the second convolution unit;
inputting the second convolution feature map into a second squeeze excitation unit in the second residual block to obtain a second reconstructed feature map output by the second squeeze excitation unit;
and continuing the step of adding the reconstructed feature map output in the previous residual block to the convolution feature map output by the convolution unit in the previous residual block and inputting the sum into the next residual block, until the squeeze excitation unit of the k-th residual block outputs the layer feature map, wherein k is a positive integer.
Optionally, the squeeze excitation unit includes: a pooling layer and a feature excitation layer;
the step of inputting the first convolution feature map into the first squeeze excitation unit in the first residual block to obtain the first reconstructed feature map output by the first squeeze excitation unit includes:
inputting the first convolution feature map into the pooling layer, and performing global maximum pooling on the first convolution feature map through the pooling layer to obtain a compressed feature map;
inputting the compressed feature map into the feature excitation layer to obtain a first reconstructed feature map output by the feature excitation layer; the first reconstructed feature map is obtained by weighting the input first convolution feature map channel by channel, through multiplication, according to the weight corresponding to each channel, and the weight corresponding to each channel is determined by the feature excitation layer according to the compressed feature map.
Optionally, the hierarchical bilinear pooling module includes P target convolution layers and P interaction layers;
the step of inputting the layer feature map into the layered bilinear pooling module to obtain the inter-layer interaction feature map output by the layered bilinear pooling module comprises:
respectively inputting the reconstructed feature maps output by the (k-P)-th to k-th residual blocks into the P target convolution layers in a one-to-one correspondence, to obtain the high-dimensional convolution feature maps output by the P target convolution layers;
and performing element-by-element multiplication on pairs of different high-dimensional convolution feature maps and inputting the results into the interaction layers, to obtain the P inter-layer interaction feature maps output by the interaction layers.
Optionally, the classification module includes a pooling layer and a full connection layer;
the step of inputting the inter-layer interaction feature map into the classification module to obtain the prediction classification information output by the classification module comprises:
respectively inputting the P interlayer interactive feature maps into the pooling layer to obtain a pooled feature map after pooling fusion;
and inputting the pooling feature map into the full-connection layer to obtain a prediction classification result output by the full-connection layer.
Optionally, the step of modifying, by the preset network model, the model parameter according to the prediction classification information corresponding to the fundus picture and the classification label information corresponding to the fundus picture includes:
and calculating a loss value between the predicted classification information and the classification label information by using the Focal Loss function, and correcting the model parameters by using the calculated loss value.
In a second aspect, the present embodiment discloses a retinopathy identification device which applies the retinopathy identification model obtained by the above method for generating a retinopathy identification model, the identification device comprising:
the image acquisition module is used for acquiring a fundus image to be identified;
and the classification identification module is used for carrying out image feature identification on the fundus image to be identified through the retinopathy identification model to obtain a prediction classification result.
In a third aspect, the present embodiment provides a terminal device, including a processor and a storage medium communicatively connected to the processor, the storage medium being adapted to store a plurality of instructions; the processor is adapted to invoke the instructions in the storage medium to perform the steps of the method for generating the retinopathy recognition model.
In a fourth aspect, the present embodiment discloses a computer-readable storage medium, wherein the computer-readable storage medium stores one or more programs, which are executable by one or more processors to implement the steps of the method for generating a retinopathy identification model.
The trained retinopathy recognition model is obtained by training a preset network model with the fundus pictures contained in a training set. Each fundus picture in the training set carries classification label information, namely the retinal feature classification corresponding to that picture. Because the preset network model learns this classification label information during training, the trained retinopathy recognition model can recognize the retinal image contained in a detection picture and provide an accurate classification of the retinal features. The method provided by this embodiment trains the network model for identifying retinopathy through deep learning, so that early screening of retinopathy is realized with computer-aided identification; computer-aided identification not only accelerates identification but also improves its accuracy, which is of great significance for reducing the blindness rate of this retinopathy.
Drawings
FIG. 1 is a schematic illustration of varying degrees of retinopathy in a fundus image;
FIG. 2 is a flowchart illustrating steps of a method for generating a retinopathy identification model according to the present embodiment;
fig. 3 is a schematic structural diagram of the retinopathy identification model provided in this embodiment;
FIG. 4 is a schematic block diagram of the identification device according to the present embodiment;
fig. 5 is a schematic diagram of a terminal device provided in the present embodiment;
FIGS. 6a-6d are comparisons of the T-SNE visualization effects of the ResNet-50, ResNet-101 and SE-HBP networks used in the retinopathy recognition model provided in this embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer and clearer, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As used herein, the singular forms "a", "an" and "the" include plural referents unless the context clearly dictates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Retinopathy of prematurity (ROP) is a retinal vascular proliferative blindness disease that occurs in preterm or low birth weight infants. The disease course of ROP progresses rapidly, which if not timely treated, can lead to blindness in the patient. With the advancement of neonatal care, the survival rate of premature infants is increased, and the incidence of ROP is also increasing in low-to-medium income countries.
Acute progressive posterior pole retinopathy of prematurity (AP-ROP) is a special type of retinopathy of prematurity (ROP) and one of the causes of blindness in premature infants. It is mainly manifested by vasodilation and severe tortuosity of the vessels in the posterior pole of the retina. This special form of ROP progresses rapidly, but because it is uncommon, atypical and prone to misdiagnosis, patients easily miss the optimal treatment window. Its diagnosis therefore deserves great attention: as illustrated in fig. 1, if a diagnosed patient does not receive immediate intervention, the disease rapidly progresses to the fifth stage of ROP, which is likely to cause retinal detachment and lead to blindness. Early screening for AP-ROP plays an important role in reducing the rate of blindness caused by the disease.
The automatic classification of AP-ROP faces the following challenges: (1) the boundary line or ridge of AP-ROP is blurred and has low contrast, making it difficult to see or identify; (2) in some cases of AP-ROP the tortuosity of the blood vessels is low, similar to a normal fundus image, which easily confuses a classification network; (3) in clinical diagnosis, an ophthalmologist for premature infants captures fundus images of the retina at multiple angles, so each fundus image has a different field of view and the lesions appear at different positions in the images; (4) the quality of AP-ROP fundus images varies, and many images are of poor quality owing to the imaging apparatus and the operator's photographing technique; (5) the incidence of AP-ROP is low and the number of cases is far lower than that of conventional ROP, so there is a serious class imbalance in the training data, which affects the training result of the classification network.
In order to overcome the above challenges, this embodiment provides a method for generating a retinopathy recognition model, which trains a preset network model to obtain a network model capable of accurately recognizing the retinal features contained in a fundus picture. Using the retinopathy recognition model provided in this embodiment to recognize fundus pictures increases the speed of retinal recognition, improves recognition accuracy and facilitates early screening of AP-ROP. In addition, a layered bilinear pooling unit and a squeeze excitation unit are added to the preset network model disclosed in this embodiment: the layered bilinear pooling unit increases the complementarity of inter-layer information and reduces the loss of useful information between layers, while the squeeze excitation unit improves the feature extraction capability of the network, thereby addressing the challenges listed above.
The method disclosed by the invention is explained in more detail below with reference to the drawings.
Exemplary method
In a first aspect, the present embodiment discloses a method for generating a retinopathy identification model, as shown in fig. 2, including:
step S1, the preset network model generates prediction classification information corresponding to the fundus picture according to the fundus picture in a training set, wherein the training set comprises a plurality of groups of training samples, and each group of training samples comprises the fundus picture and classification label information corresponding to the fundus picture.
In the step, fundus pictures used in the training set are collected firstly, a plurality of groups of training samples are established for the collected fundus pictures, and then the fundus pictures in each group of training samples are input into a preset network model.
In order to train a network model with better feature recognition, fundus pictures shot at different angles and different visual fields are selected in a training set, so that the network model can accurately recognize retina features in the fundus pictures shot at different visual fields.
In this step, the fundus picture may be captured by a camera of the terminal device, or obtained from another terminal device that captured it, or obtained from a cloud server. The fundus pictures in the training set may also be acquired through a combination of these three ways.
All fundus pictures in the training set are marked with their corresponding classifications; specifically, the classifications are normal fundus picture, AP-ROP picture and conventional ROP picture. By learning the information marked in the fundus pictures, the preset network model can identify the classification of an unmarked picture, i.e., obtain the retinal feature classification corresponding to the picture to be identified. In one embodiment, the labels are 0, 1 and 2 respectively: a network output of 0 indicates AP-ROP, 1 indicates conventional ROP, and 2 indicates a normal fundus image.
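To make the labeling convention concrete, the following is a minimal PyTorch-style sketch of a training-sample container; the class name, file paths and label strings are illustrative assumptions rather than part of the embodiment.

```python
import torch
from torch.utils.data import Dataset
from PIL import Image

# Label convention described above: 0 = AP-ROP, 1 = conventional ROP, 2 = normal fundus.
LABEL_MAP = {"ap_rop": 0, "conventional_rop": 1, "normal": 2}

class FundusDataset(Dataset):
    """Each training sample pairs a fundus picture with its classification label."""

    def __init__(self, samples, transform=None):
        # `samples` is a list of (image_path, label_name) pairs prepared elsewhere.
        self.samples = samples
        self.transform = transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, label_name = self.samples[idx]
        image = Image.open(path).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        return image, torch.tensor(LABEL_MAP[label_name], dtype=torch.long)
```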
And step S2, the preset network model corrects model parameters according to the prediction classification information corresponding to the fundus pictures and the classification label information corresponding to the fundus pictures, and continues to execute the step of generating the prediction classification information corresponding to the fundus pictures according to the fundus pictures in the training set, until the training condition of the preset network model meets the preset condition, so as to obtain the retinopathy recognition model.
When the fundus picture is input to the preset network model, the preset network model outputs prediction classification information corresponding to the fundus picture. The prediction classification information is a type corresponding to the retinal feature corresponding to the fundus picture, and the type includes: normal fundus pictures, conventional ROP pictures, and AP-ROP pictures.
Specifically, as shown in fig. 3, the preset network model includes: a residual excitation module 10, a hierarchical bilinear pooling module 20 and a classification module 30.
The fundus picture input to the preset network model is first processed by the residual excitation module (convolution, squeeze and excitation operations), and the resulting feature map is input to the layered bilinear pooling module. After convolution, pooling and related operations in the layered bilinear pooling module, the result is input to the classification module; the classification module contains a full connection layer, through which the final feature classification result is output, identifying whether the retinal features contained in the input fundus picture correspond to a normal fundus picture, a conventional ROP picture or an AP-ROP picture.
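This data flow can be summarized by the following minimal PyTorch sketch; the three sub-modules are passed in as placeholders whose internal structure is detailed in the following steps, and all names here are illustrative assumptions.

```python
import torch.nn as nn

class RetinopathyRecognitionModel(nn.Module):
    """Sketch of the three-stage pipeline: residual excitation module ->
    layered bilinear pooling module -> classification module."""

    def __init__(self, residual_excitation, bilinear_pooling, classifier):
        super().__init__()
        self.residual_excitation = residual_excitation  # residual blocks with squeeze excitation units
        self.bilinear_pooling = bilinear_pooling        # layered (hierarchical) bilinear pooling
        self.classifier = classifier                    # pooling + full connection layer

    def forward(self, fundus_batch):
        layer_features = self.residual_excitation(fundus_batch)       # layer feature maps
        interaction_features = self.bilinear_pooling(layer_features)  # inter-layer interaction features
        logits = self.classifier(interaction_features)                # 3-way prediction classification
        return logits
```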
Specifically, the step of generating, by the preset network model, prediction classification information corresponding to the fundus picture according to the fundus picture in the training set includes:
and step S11, inputting the fundus pictures in the training set into the residual excitation module to obtain a layer characteristic diagram which is output by the residual excitation module and corresponds to the fundus pictures.
Specifically, the residual excitation module includes k residual blocks, and each residual block includes a convolution unit and a squeeze excitation unit;
the step of inputting the fundus pictures in the training set into the residual excitation module to obtain the layer characteristic diagram which is output by the residual excitation module and corresponds to the fundus pictures comprises the following steps:
inputting the fundus pictures in the training set into a first convolution unit in a first residual block to obtain a first convolution characteristic diagram output by the first convolution unit;
inputting the first convolution characteristic diagram into a first extrusion excitation unit in a first residual block to obtain a first reconstruction characteristic diagram output by the first extrusion excitation unit;
adding the first reconstruction feature map and the first convolution feature map, and inputting the result to a second convolution unit of a second residual block to obtain a second convolution feature map output by the second convolution unit;
inputting the second convolution feature map into a second extrusion excitation unit in a second residual block to obtain a second reconstruction feature map output by the second extrusion excitation unit;
and continuing the step of adding the reconstructed feature map output in the previous residual block and the convolution feature map output by the convolution unit in the previous residual block and inputting the added result into the next residual block until the extrusion excitation unit of the kth residual block outputs the layer feature map, wherein k is a positive integer.
The squeeze excitation unit is a channel-based attention mechanism: it models the importance of each channel and then enhances or suppresses the relevant channels according to the task.
Further, the squeeze excitation unit includes: a pooling layer and a feature excitation layer;
the step of inputting the first convolution feature map into the first squeeze excitation unit in the first residual block to obtain the first reconstructed feature map output by the first squeeze excitation unit includes:
inputting the first convolution feature map into the pooling layer, and performing global maximum pooling on the first convolution feature map through the pooling layer to obtain a compressed feature map;
inputting the compressed feature map into the feature excitation layer to obtain a first reconstructed feature map output by the feature excitation layer; the first reconstructed feature map is obtained by weighting the input first convolution feature map channel by channel, through multiplication, according to the weight corresponding to each channel, and the weight corresponding to each channel is determined by the feature excitation layer according to the compressed feature map.
In a specific application embodiment, the squeeze excitation units are integrated into the Res modules of the ResNet50 network. First, in the Res module, the convolution unit applies its convolution operations to the input feature map to produce a feature F of size H × W × C; the squeeze operation then compresses this feature with one spatial global maximum pooling, turning each two-dimensional feature channel into a single real number with a global receptive field, i.e., 1 × 1 × C, which represents the global distribution of responses over the feature channels. Second, the excitation operation generates a weight for each feature channel by feeding the compressed features into a two-layer convolutional network, explicitly modeling the correlation between feature channels. Finally, the re-weighting operation multiplies the weights output by the excitation onto the previous features channel by channel, completing the re-calibration of the features.
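A minimal PyTorch sketch of such a squeeze excitation unit, and of its use inside one residual block, is given below. The reduction ratio of 16 is a common SE-Net default assumed here rather than a value stated in this embodiment, and the skip arrangement follows the step description above (reconstructed feature map added to the convolution feature map).

```python
import torch.nn as nn

class SqueezeExcitationUnit(nn.Module):
    """Global max pooling squeezes the H x W x C feature map to 1 x 1 x C, a two-layer
    1x1-convolution network produces per-channel weights, and the input feature map
    is re-weighted channel by channel (the 'reconstructed' feature map)."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.squeeze = nn.AdaptiveMaxPool2d(1)          # spatial global maximum pooling
        self.excite = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):                               # x: (N, C, H, W) convolution feature map
        weights = self.excite(self.squeeze(x))          # (N, C, 1, 1) channel weights
        return x * weights                              # re-calibrated feature map


class ResidualExcitationBlock(nn.Module):
    """One residual block: a convolution unit followed by the squeeze excitation unit;
    the reconstructed feature map is added to the convolution feature map before being
    passed on to the next block."""

    def __init__(self, conv_unit, channels):
        super().__init__()
        self.conv_unit = conv_unit                      # any convolution unit producing `channels` maps
        self.se = SqueezeExcitationUnit(channels)

    def forward(self, x):
        conv_features = self.conv_unit(x)
        reconstructed = self.se(conv_features)
        return conv_features + reconstructed
```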
And step S12, inputting the layer feature map into the layered bilinear pooling module to obtain the inter-layer interaction feature map output by the layered bilinear pooling module.
The layered bilinear pooling module comprises P target convolution layers and P interaction layers;
the step of inputting the layer feature map into the layered bilinear pooling module to obtain the inter-layer interaction feature map output by the layered bilinear pooling module comprises:
respectively inputting the reconstructed feature maps output by the (k-P)-th to k-th residual blocks into the P target convolution layers in a one-to-one correspondence, to obtain the high-dimensional convolution feature maps output by the P target convolution layers;
and performing element-by-element multiplication on pairs of different high-dimensional convolution feature maps and inputting the results into the interaction layers, to obtain the P inter-layer interaction feature maps output by the interaction layers.
Hierarchical bilinear pooling was proposed for fine-grained image recognition, where it has shown excellent results. In fine-grained image recognition the image classes are generally highly similar and can only be distinguished by small local differences, while the background of each image varies greatly.
Challenges (1) and (2) of AP-ROP fundus image classification are similar to the difficulties of fine-grained image recognition.
The layered bilinear pooling module first expands the features of different layers into a high-dimensional space through independent linear mappings established in the network, and then integrates the features of different convolutional layers by element-by-element multiplication. The module then compresses the high-dimensional features by summation to obtain the inter-layer interaction features, thereby realizing inter-layer interaction in the network. A single interaction layer is expressed as:
$z_{\mathrm{int}} = U^{T}x \circ V^{T}y$ (1)
$z = P^{T} z_{\mathrm{int}}$ (2)
connecting a plurality of interaction layers in series to obtain the interaction characteristics of the layered bilinear pooling module, wherein the expression is as follows:
$Z_{\mathrm{HBP}} = \mathrm{HBP}(x, y, z, \dots) = P^{T}\,\mathrm{concat}(U^{T}x \circ V^{T}y,\ U^{T}x \circ S^{T}z,\ V^{T}y \circ S^{T}z,\ \dots)$ (3)
where $\circ$ denotes element-by-element multiplication, P is the classification matrix, and U, V, S, … are the projection matrices of the convolutional-layer features x, y, z, …, respectively.
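Assuming the reconstructed feature maps x, y and z share the same spatial resolution (as for the final residual blocks of the last stage), equations (1)-(3) can be sketched in PyTorch as below; the projection dimension of 8192 follows a common hierarchical bilinear pooling setting and is an assumption, not a value stated in this embodiment, and the final projection P corresponds to the full connection layer applied by the classification module afterwards.

```python
import torch
import torch.nn as nn

class HierarchicalBilinearPooling(nn.Module):
    """Three 1x1 convolutions play the roles of the projection matrices U, V and S;
    pairs of projected maps are combined by element-by-element multiplication
    (the interaction layers) and each interaction map is compressed by spatial
    summation before the results are concatenated."""

    def __init__(self, in_channels, proj_dim=8192):
        super().__init__()
        self.proj_u = nn.Conv2d(in_channels[0], proj_dim, kernel_size=1)
        self.proj_v = nn.Conv2d(in_channels[1], proj_dim, kernel_size=1)
        self.proj_s = nn.Conv2d(in_channels[2], proj_dim, kernel_size=1)

    def forward(self, x, y, z):
        # x, y, z must share the same spatial size for the element-wise products below.
        ux, vy, sz = self.proj_u(x), self.proj_v(y), self.proj_s(z)
        interactions = [ux * vy, ux * sz, vy * sz]                 # P = 3 inter-layer interaction maps
        pooled = [feat.sum(dim=(2, 3)) for feat in interactions]   # compression by summation
        return torch.cat(pooled, dim=1)                            # fused HBP feature vector
```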
Optionally, the classification module includes a pooling layer and a full connection layer;
the step of inputting the inter-layer interaction feature map into the classification module to obtain the prediction classification information output by the classification module comprises:
respectively inputting the P interlayer interactive feature maps into the pooling layer to obtain a pooled feature map after pooling fusion;
and inputting the pooling feature map into the full-connection layer to obtain a prediction classification result output by the full-connection layer.
And step S13, inputting the inter-layer interaction feature map into the classification module to obtain the prediction classification information output by the classification module.
Specifically, the step of modifying the model parameters by the preset network model according to the prediction classification information corresponding to the fundus picture and the classification label information corresponding to the fundus picture includes:
and calculating a loss value between the predicted classification information and the classification label information by using the Focal Loss function, and correcting the model parameters by using the calculated loss value.
To address the problem of data imbalance, the Focal Loss is used in the network. When calculating the classification loss, a commonly used loss function is the cross-entropy function:
$\mathrm{CE}(p_{t}) = -\log(p_{t})$ (4)
where $p_{t}$ is the probability predicted by the network for the true class.
by adding the parameters α and γ to the formula (4), the formula becomes:
$\mathrm{FL}(p_{t}) = -\alpha\,(1 - p_{t})^{\gamma}\log(p_{t})$ (5)
Equation (5) is the Focal Loss. In general the values of α and γ are fixed; experiments show that α = 0.25 and γ = 2 give the best results.
When a sample is misclassified by the network, its p value is small, so the modulating weight of the Focal Loss has little effect on the loss. For samples that the network judges correctly with ease, the p value is large and the loss is greatly reduced, so such easily judged samples have less influence on network training. By weighting samples of different classification difficulty in this way, the Focal Loss addresses the imbalance in the amount of AP-ROP data.
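A minimal multi-class PyTorch sketch of equation (5), together with one parameter-correction step of the training loop of step S2, is shown below; the `model`, `optimizer` and `train_loader` names are assumptions for illustration, not part of the embodiment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FocalLoss(nn.Module):
    """Multi-class Focal Loss of equation (5): the cross-entropy term -log(p_t) is scaled
    by alpha * (1 - p_t) ** gamma so that easily classified samples contribute less."""

    def __init__(self, alpha=0.25, gamma=2.0):
        super().__init__()
        self.alpha = alpha
        self.gamma = gamma

    def forward(self, logits, targets):
        log_pt = F.log_softmax(logits, dim=1).gather(1, targets.unsqueeze(1)).squeeze(1)
        pt = log_pt.exp()                                    # probability of the true class
        return (-self.alpha * (1.0 - pt) ** self.gamma * log_pt).mean()
```

One parameter-correction step of step S2 then looks roughly as follows:

```python
criterion = FocalLoss(alpha=0.25, gamma=2.0)
for images, labels in train_loader:
    logits = model(images)             # prediction classification information
    loss = criterion(logits, labels)   # loss against the classification label information
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                   # corrects the model parameters
```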
Referring to fig. 3, the squeeze excitation unit is the SE module in fig. 3. The residual excitation module proposed in this embodiment combines ResNet50 with SE modules: specifically, an SE module is inserted into the convolution part of the ResNet50 network to perform squeeze-and-excitation processing on the features output by the convolution part, before the feature addition. That is, in the residual excitation module, the output of the convolution part is input to the SE module and the output of the SE module is then added in the residual connection, whereas in a residual module without an inserted SE module the output of the convolution part would be added directly to the input of the residual module. Inserting the SE module strengthens the useful information of the feature channels and suppresses their useless information, thereby improving the feature extraction capability of the network. This helps improve the network's ability to extract useful information from images of poor quality.
The layered bilinear pooling module is the HBP module in the figure. Each interaction layer in the HBP module undergoes a pooling operation, and the three layers are fused and then input into the full connection layer. By using the information interaction between the convolution layers, the layered bilinear pooling module complements information across layers and reduces the loss of useful information between layers. This helps reduce the effect of differences in image field of view and also helps address the similarity of features between fundus images of different classes.
Exemplary device
On the basis of the disclosed method for generating a retinopathy identification model, this embodiment further discloses a retinopathy identification device. As shown in fig. 4, the retinopathy identification device applies the retinopathy identification model obtained by the generation method, and the identification device comprises:
an image acquisition module 110, configured to acquire a fundus image to be identified; the retina contained in the fundus image to be identified can be a normal fundus retina or a retina with pathological changes.
And the classification identification module 120 is configured to perform image feature identification on the fundus image to be identified through the retinopathy identification model to obtain a prediction classification result.
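As an illustration, the two modules can be exercised with an inference sketch like the following; the preprocessing pipeline, function name and file path are assumptions, not part of the embodiment.

```python
import torch
from torchvision import transforms
from PIL import Image

# Preprocessing consistent with the 224 x 224 input size used elsewhere in this document.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

def identify(model, image_path):
    image = Image.open(image_path).convert("RGB")   # image acquisition module
    batch = preprocess(image).unsqueeze(0)          # (1, 3, 224, 224)
    model.eval()
    with torch.no_grad():
        logits = model(batch)                       # classification identification module
    return int(logits.argmax(dim=1))                # 0: AP-ROP, 1: conventional ROP, 2: normal
```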
On the basis of the method, the embodiment also discloses a terminal device, which comprises a processor and a storage medium in communication connection with the processor, wherein the storage medium is suitable for storing a plurality of instructions; the processor is adapted to invoke instructions in the storage medium to perform steps implementing the method of generating the retinopathy recognition model. In one embodiment, the terminal device may be a mobile phone, a tablet computer or a smart television.
Specifically, as shown in fig. 5, the terminal device includes at least one processor (processor) 20 and a memory (memory) 22, and may further include a display 21, a communication interface (Communications Interface) 23 and a bus 24. The processor 20, the display 21, the memory 22 and the communication interface 23 can communicate with each other through the bus 24. The display screen 21 is configured to display a user guidance interface preset in the initial setting mode. The communication interface 23 may transmit information. The processor 20 may call logic instructions in the memory 22 to perform the steps of the method in the above-described embodiment.
Furthermore, the logic instructions in the memory 22 may be implemented in software functional units and stored in a computer readable storage medium when sold or used as a stand-alone product.
The memory 22, as a computer-readable storage medium, may be configured to store a software program or a computer-executable program, such as the program instructions or modules corresponding to the methods in the embodiments of the present disclosure. The processor 20 executes the functional applications and data processing by running the software program, instructions or modules stored in the memory 22, i.e., implements the method in the above-described embodiment.
The memory 22 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created according to the use of the terminal device, and the like. Further, the memory 22 may include a high-speed random access memory and may also include a non-volatile memory. For example, it may be any of a variety of media that can store program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk, and may also be a transient storage medium.
In another aspect, the present embodiment provides a computer-readable storage medium, wherein the computer-readable storage medium stores one or more programs, which are executable by one or more processors, to implement the steps of the above-mentioned method for generating a retinopathy recognition model.
The following describes the training steps of the retinopathy recognition model generated by the method of the present invention and the performances thereof by using a specific application verification example of the method of the present invention.
The fundus pictures in the training set were collected by professional ophthalmologists using a RetCam 3. In order to obtain image information of the entire eyeball, fundus images of multiple visual fields must be captured when collecting data, so each data collection captures fundus images from several different angles. If lesion information is captured in even a single visual field, the infant patient can be judged to have AP-ROP. Images from different angles that contain lesion information are integrated into the fundus image dataset as individual image units.
Table 1 data set details
The data set was annotated by professional ophthalmologists: the same data set was distributed to three specialized ophthalmologists, who each completed its annotation. Fundus images on which their opinions differed were then sorted out and discussed, and voting was used to determine the final label of each image. Images of poor quality were discarded.
Table 1 lists the details of the data set. The conventional ROP data includes stage I, II and III ROP fundus images. Within the data set, the fundus images of the respective categories are randomly distributed. To save computational resources, the original fundus images are resized to 224 × 224.
This study divides the data into two parts: a training set and a test set. The training set is used only to train the network, while the test set is completely independent of the training set and is used to evaluate the performance of the network. Fundus image data of different visual fields are randomly distributed, which simulates, to a certain extent, the conditions of fundus disease examination in a hospital. Several of the most commonly used evaluation criteria were used to evaluate the model generated by the method described in this embodiment, including accuracy, sensitivity, specificity, precision and the F1 score.
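These evaluation criteria can be computed, for example, with scikit-learn as in the following sketch; macro averaging over the three classes is an assumption about how the per-class values are aggregated, and specificity is derived from the confusion matrix.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

def evaluate(y_true, y_pred):
    """Accuracy, sensitivity (recall), specificity, precision and F1 for the 3-class task."""
    cm = confusion_matrix(y_true, y_pred)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp
    tn = cm.sum() - cm.sum(axis=0) - cm.sum(axis=1) + tp
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "sensitivity": recall_score(y_true, y_pred, average="macro"),
        "specificity": float(np.mean(tn / (tn + fp))),
        "precision": precision_score(y_true, y_pred, average="macro"),
        "f1": f1_score(y_true, y_pred, average="macro"),
    }
```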
Multiple sets of comparison experiments were performed in this study, including comparing the proposed modules with the original networks, on both ResNet50 and ResNet101. The experimental results are shown in Table 2. Compared with the original network, the addition of each module effectively improves performance, and the network proposed in this embodiment obtains the best performance, which indicates that the method provided by this embodiment is effective.
Table 2 network results presentation
The T-SNE visualization effects of the ResNet-50, ResNet-101 and SE-HBP networks are shown in FIGS. 6a to 6d; it can be seen that the retinopathy recognition model provided by this embodiment has a better recognition effect for the classification of AP-ROP, conventional ROP and normal fundus images.
To better evaluate the SE-HBP network in this study, three specialized ophthalmologists were invited to participate in the network performance evaluation, using a new comparison data set. The comparison data set contained 100 fundus images, including 50 normal fundus images, 30 conventional ROP fundus images and 20 AP-ROP fundus images. The comparison data set was distributed to the three specialized ophthalmologists for diagnosis, the results were tallied, and the same data set was used to evaluate the SE-HBP 50 network. The final results are shown in Table 3. From Table 3 it can be seen that the network disclosed in this embodiment has certain advantages over the ophthalmologists in the diagnosis of AP-ROP, while its classification results for normal fundus images and conventional ROP fundus images approach those of the ophthalmologists. Meanwhile, because AP-ROP symptoms are atypical, ophthalmologists can easily misdiagnose them; the recognition model disclosed in this embodiment can effectively assist ophthalmologists in diagnosing AP-ROP.
Table 3 network evaluation results
The identification device of this study achieves an accuracy of 96.59% on a test set of 2408 pictures, so it can assist doctors in the early screening of ROP and AP-ROP to a certain extent. The system can also assist doctors who lack AP-ROP diagnostic experience in judging the disease. Meanwhile, automated detection is far faster than a human doctor, which greatly improves detection efficiency.
In the present study, this embodiment provides a novel convolutional neural network structure for the automatic identification and classification of fundus images. The HBP module complements information between the convolution layers, reducing inter-layer information loss and enhancing the expressive capability of the network; the SE module enhances the feature extraction capability of the convolutional neural network. Finally, to address the imbalanced amount of AP-ROP data, the Focal Loss is used in the network, giving different weights to data of different classification difficulty and different orders of magnitude, thereby balancing their influence on network training. SE-HBP 50 was constructed based on ResNet50. Experimental results show that the performance of the network provided by this embodiment is improved; in the comparison experiments, SE-HBP 50 achieves the best result, with an accuracy of 96.59%. The T-SNE results show that the classification produced by the proposed network is more distinctive. Using this network, automatic identification of AP-ROP is realized, which helps ophthalmologists with early screening and auxiliary diagnosis of AP-ROP.
The invention provides a method for generating a retinopathy recognition model, together with a recognition device and equipment. The trained retinopathy recognition model is obtained by training a preset network model with the fundus pictures contained in a training set. Each fundus picture in the training set carries classification label information, namely the retinal feature classification corresponding to that picture. Because the preset network model learns this classification label information during training, the trained retinopathy recognition model can recognize the retinal image contained in a detection picture and give an accurate classification of the retinal features. By training the network model for identifying retinopathy through deep learning, the method, device and equipment provided by this embodiment improve both the accuracy and the efficiency of fundus image identification.
It should be understood that equivalents and modifications of the present invention and its inventive concept may occur to those skilled in the art, and all such modifications and alterations are intended to fall within the scope of the appended claims.

Claims (10)

1. A method for generating a retinopathy recognition model, comprising:
the method comprises the steps that a preset network model generates prediction classification information corresponding to fundus pictures according to the fundus pictures in a training set, wherein the training set comprises a plurality of groups of training samples, and each group of training samples comprises the fundus pictures and classification label information corresponding to the fundus pictures;
and the preset network model corrects model parameters according to the prediction classification information corresponding to the fundus pictures and the classification label information corresponding to the fundus pictures, and continues to execute the step of generating the prediction classification information corresponding to the fundus pictures according to the fundus pictures in the training set, until the training condition of the preset network model meets the preset condition, so as to obtain the retinopathy recognition model.
2. The method for generating a retinopathy recognition model according to claim 1, wherein the preset network model includes: a residual excitation module, a layered bilinear pooling module and a classification module;
the step of generating the prediction classification information corresponding to the fundus picture by the preset network model according to the fundus picture in the training set comprises the following steps:
inputting the fundus pictures in the training set into the residual excitation module to obtain a layer feature map, corresponding to the fundus pictures, output by the residual excitation module;
inputting the layer feature map into the layered bilinear pooling module to obtain an inter-layer interaction feature map output by the layered bilinear pooling module;
and inputting the inter-layer interaction feature map into the classification module to obtain the prediction classification information output by the classification module.
3. The method of generating a retinopathy identification model according to claim 2, characterized in that the residual excitation module includes k residual blocks, each of which includes a convolution unit and a squeeze excitation unit;
the step of inputting the fundus pictures in the training set into the residual excitation module to obtain the layer feature map, corresponding to the fundus pictures, output by the residual excitation module comprises the following steps:
inputting the fundus pictures in the training set into a first convolution unit in a first residual block to obtain a first convolution feature map output by the first convolution unit;
inputting the first convolution feature map into a first squeeze excitation unit in the first residual block to obtain a first reconstructed feature map output by the first squeeze excitation unit;
adding the first reconstructed feature map and the first convolution feature map, and inputting the result to a second convolution unit of a second residual block to obtain a second convolution feature map output by the second convolution unit;
inputting the second convolution feature map into a second squeeze excitation unit in the second residual block to obtain a second reconstructed feature map output by the second squeeze excitation unit;
and continuing the step of adding the reconstructed feature map output in the previous residual block to the convolution feature map output by the convolution unit in the previous residual block and inputting the sum into the next residual block, until the squeeze excitation unit of the k-th residual block outputs the layer feature map, wherein k is a positive integer.
4. The method for generating a retinopathy recognition model according to claim 3, wherein the squeeze excitation unit includes: a pooling layer and a feature excitation layer;
the step of inputting the first convolution feature map into the first squeeze excitation unit in the first residual block to obtain the first reconstructed feature map output by the first squeeze excitation unit includes:
inputting the first convolution feature map into the pooling layer, and performing global maximum pooling on the first convolution feature map through the pooling layer to obtain a compressed feature map;
inputting the compressed feature map into the feature excitation layer to obtain a first reconstructed feature map output by the feature excitation layer; the first reconstructed feature map is obtained by weighting the input first convolution feature map channel by channel, through multiplication, according to the weight corresponding to each channel, and the weight corresponding to each channel is determined by the feature excitation layer according to the compressed feature map.
5. The method of generating a retinopathy recognition model according to claim 4 wherein the layered bilinear pooling module includes P target convolution layers and P interaction layers;
the step of inputting the layer feature map into the layered bilinear pooling module to obtain the inter-layer interaction feature map output by the layered bilinear pooling module comprises:
respectively inputting the reconstructed feature maps output by the (k-P)-th to k-th residual blocks into the P target convolution layers in a one-to-one correspondence, to obtain the high-dimensional convolution feature maps output by the P target convolution layers;
and performing element-by-element multiplication on pairs of different high-dimensional convolution feature maps and inputting the results into the interaction layers, to obtain the P inter-layer interaction feature maps output by the interaction layers.
6. The method of generating a retinopathy recognition model according to claim 5, characterized in that the classification module includes a pooling layer and a full connection layer;
the step of inputting the inter-layer interaction feature map into the classification module to obtain the prediction classification information output by the classification module comprises:
respectively inputting the P interlayer interactive feature maps into the pooling layer to obtain a pooled feature map after pooling fusion;
and inputting the pooling feature map into the full-connection layer to obtain a prediction classification result output by the full-connection layer.
7. The method for generating a retinopathy recognition model according to claim 1, wherein the step of modifying the model parameters by the preset network model according to the prediction classification information corresponding to the fundus picture and the classification label information corresponding to the fundus picture comprises:
and calculating a loss value between the predicted classification information and the classification label information by using the Focal Loss function, and correcting the model parameter by using the calculated loss value.
8. A retinopathy recognition device, characterized in that it applies the retinopathy recognition model obtained by the method for generating a retinopathy recognition model according to any one of claims 1 to 7, and comprises:
the image acquisition module is used for acquiring a fundus image to be identified;
and the classification identification module is used for carrying out image feature identification on the fundus image to be identified through the retinopathy identification model to obtain a prediction classification result.
9. A terminal device comprising a processor, a storage medium communicatively coupled to the processor, the storage medium adapted to store a plurality of instructions; the processor is adapted to invoke instructions in the storage medium to perform the steps of implementing the method of generating a retinopathy identification model of any of the above claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores one or more programs which are executable by one or more processors to implement the steps of the method for generating a retinopathy identification model according to any one of claims 1-7.
CN202010857801.5A 2020-08-24 2020-08-24 Method, device and equipment for generating retinopathy identification model Active CN112101424B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010857801.5A CN112101424B (en) 2020-08-24 2020-08-24 Method, device and equipment for generating retinopathy identification model

Publications (2)

Publication Number Publication Date
CN112101424A true CN112101424A (en) 2020-12-18
CN112101424B CN112101424B (en) 2023-08-04

Family

ID=73753297

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010857801.5A Active CN112101424B (en) 2020-08-24 2020-08-24 Method, device and equipment for generating retinopathy identification model

Country Status (1)

Country Link
CN (1) CN112101424B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108021916A (en) * 2017-12-31 2018-05-11 南京航空航天大学 Deep learning diabetic retinopathy sorting technique based on notice mechanism
CN108876775A (en) * 2018-06-12 2018-11-23 广州图灵人工智能技术有限公司 The rapid detection method of diabetic retinopathy
CN110598582A (en) * 2019-08-26 2019-12-20 深圳大学 Eye image processing model construction method and device
CN110930418A (en) * 2019-11-27 2020-03-27 江西理工大学 Retina blood vessel segmentation method fusing W-net and conditional generation confrontation network
CN111259982A (en) * 2020-02-13 2020-06-09 苏州大学 Premature infant retina image classification method and device based on attention mechanism
CN111369562A (en) * 2020-05-28 2020-07-03 腾讯科技(深圳)有限公司 Image processing method, image processing device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GUOZHEN CHEN ET AL.: "Automated Stage Analysis of Retinopathy of Prematurity Using Joint Segmentation and Multi-instance Learning", International Workshop on Ophthalmic Medical Image Analysis, pages 1-2 *
ZHENG YUANPAN; LI GUANGYANG; LI YE: "Review of the Application of Deep Learning in Image Recognition" (in Chinese), Computer Engineering and Applications, no. 12, pages 1-2 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112801942A (en) * 2020-12-31 2021-05-14 广西慧云信息技术有限公司 Citrus huanglongbing image identification method based on attention mechanism
CN112801942B (en) * 2020-12-31 2023-10-13 广西慧云信息技术有限公司 Citrus yellow dragon disease image identification method based on attention mechanism
CN113486969A (en) * 2021-07-15 2021-10-08 重庆邮电大学 X-ray image classification method based on improved Resnet network
CN113902036A (en) * 2021-11-08 2022-01-07 哈尔滨理工大学 Multi-feature fusion type fundus retinal disease identification method
CN113962995A (en) * 2021-12-21 2022-01-21 北京鹰瞳科技发展股份有限公司 Cataract model training method and cataract identification method
CN115496954A (en) * 2022-11-03 2022-12-20 中国医学科学院阜外医院 Method, device and medium for constructing eye fundus image classification model

Also Published As

Publication number Publication date
CN112101424B (en) 2023-08-04

Similar Documents

Publication Publication Date Title
Kwasigroch et al. Deep CNN based decision support system for detection and assessing the stage of diabetic retinopathy
CN112101424B (en) Method, device and equipment for generating retinopathy identification model
WO2018201633A1 (en) Fundus image-based diabetic retinopathy identification system
WO2018201632A1 (en) Artificial neural network and system for recognizing lesion in fundus image
Tennakoon et al. Image quality classification for DR screening using convolutional neural networks
Xie et al. Cross-attention multi-branch network for fundus diseases classification using SLO images
CN112017185A (en) Focus segmentation method, device and storage medium
CN112890815A (en) Autism auxiliary evaluation system and method based on deep learning
CN113012163A (en) Retina blood vessel segmentation method, equipment and storage medium based on multi-scale attention network
KR20190087681A (en) A method for determining whether a subject has an onset of cervical cancer
CN211862821U (en) Autism auxiliary evaluation system based on deep learning
Deshpande et al. Automated detection of Diabetic Retinopathy using VGG-16 architecture
Maiti et al. Automatic detection and segmentation of optic disc using a modified convolution network
CN113974627A (en) Emotion recognition method based on brain-computer generated confrontation
Miao et al. Classification of Diabetic Retinopathy Based on Multiscale Hybrid Attention Mechanism and Residual Algorithm
CN109493340A (en) Esophagus fundus ventricularis varication assistant diagnosis system and method under a kind of gastroscope
Tian et al. Learning discriminative representations for fine-grained diabetic retinopathy grading
CN116758038A (en) Infant retina disease information identification method and system based on training network
Calderon et al. CNN-based quality assessment for retinal image captured by wide field of view non-mydriatic fundus camera
de La Torre et al. Diabetic retinopathy detection through image analysis using deep convolutional neural networks
CN113576399B (en) Sugar net analysis method, system and electronic equipment
CN115311737A (en) Method for recognizing hand motion of non-aware stroke patient based on deep learning
CN113796850A (en) Parathyroid MIBI image analysis system, computer device, and storage medium
CN114170089A (en) Method and electronic device for diabetic retinopathy classification
Hatode et al. Evolution and Testimony of Deep Learning Algorithm for Diabetic Retinopathy Detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant