CN112101424B - Method, device and equipment for generating retinopathy identification model - Google Patents

Method, device and equipment for generating retinopathy identification model

Info

Publication number
CN112101424B
Authority
CN
China
Prior art keywords
fundus
feature map
convolution
inputting
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010857801.5A
Other languages
Chinese (zh)
Other versions
CN112101424A (en)
Inventor
雷柏英
张汝钢
汪天富
张国明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University filed Critical Shenzhen University
Priority to CN202010857801.5A priority Critical patent/CN112101424B/en
Publication of CN112101424A publication Critical patent/CN112101424A/en
Application granted granted Critical
Publication of CN112101424B publication Critical patent/CN112101424B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Eye Examination Apparatus (AREA)

Abstract

The invention provides a method for generating a retinopathy recognition model, together with a recognition device and equipment. A preset network model is trained with the fundus pictures contained in a training set to obtain the trained retinopathy recognition model, where each fundus picture in the training set carries classification mark information, namely the retinal feature classification corresponding to that fundus picture. Because the preset network model learns the classification mark information carried by the fundus pictures during training, the trained retinopathy recognition model can recognize the retinal image contained in a picture to be detected and give an accurate classification result of the retinal features. The method, device and equipment provided by the embodiment train the network model for retinopathy recognition by deep learning, thereby improving both the accuracy and the efficiency of fundus image recognition.

Description

Method, device and equipment for generating retinopathy identification model
Technical Field
The present invention relates to the field of medical image processing technologies, and in particular, to a method, an apparatus, and a device for generating a retinopathy recognition model.
Background
Aggressive posterior retinopathy of prematurity (AP-ROP) is a special type of retinopathy of prematurity. Unlike conventional retinopathy of prematurity (ROP), AP-ROP may be accompanied by plus disease, retinal hemorrhage, neovascularization and other conditions. Its clinical diagnosis faces several problems. First, the course of AP-ROP is not routine and does not follow the conventional ROP progression from stage 1 to stage 5. At the same time, the incidence of AP-ROP is low and its symptoms are atypical, so many ophthalmologists lack experience in diagnosing it, which increases the possibility that an infant suffering from AP-ROP is not diagnosed in time.
Deep learning has been applied to the field of image recognition, and evaluation models for diagnosing diseases at an early stage have been built on deep learning in the prior art, but a neural network evaluation model for recognizing retinopathy has not yet appeared. In practical application, fundus images therefore have to be interpreted by doctors alone, so fundus image information is easily misrecognized, the deviation of the recognized information tends to be large, the efficiency of recognition is low, and objectivity is lacking.
Accordingly, there is a need for further improvements in the art.
Disclosure of Invention
In view of the above-mentioned shortcomings of the prior art, the present invention aims to provide a method, an apparatus and a device for generating a retinopathy recognition model, which overcome the defects that, in the prior art, no neural network model for performing feature recognition on a fundus image exists and the information contained in a fundus image can only be recognized by the human eye, so that image feature recognition is inefficient and highly subjective.
The scheme disclosed by the embodiment of the invention is as follows:
in a first aspect, the present embodiment discloses a method for generating a retinopathy identification model, which includes:
a preset network model generates prediction classification information corresponding to fundus pictures according to the fundus pictures in a training set, wherein the training set comprises a plurality of groups of training samples, and each group of training samples comprises fundus pictures and classification mark information corresponding to the fundus pictures;
and correcting model parameters by the preset network model according to the prediction classification information corresponding to the fundus picture and the classification mark information corresponding to the fundus picture, and continuously executing the step of generating the prediction classification information corresponding to the fundus picture by the preset network model according to the fundus picture in the training set until the training condition of the preset network model meets the preset condition to obtain the retinopathy recognition model.
Optionally, the preset network model includes: the device comprises a residual excitation module, a layered bilinear pooling module and a classification module;
the step of generating prediction classification information corresponding to fundus pictures by the preset network model according to fundus pictures in a training set comprises the following steps:
inputting fundus pictures in the training set to the residual excitation module to obtain a layer characteristic diagram corresponding to the fundus pictures, which is output by the residual excitation module;
inputting the layer characteristic diagram into the layered bilinear pooling module to obtain an interlayer interaction characteristic diagram output by the layered bilinear pooling module;
and inputting the interlayer interaction characteristic diagram to the classification module to obtain the prediction classification information output by the classification module.
Optionally, the residual excitation module comprises k residual blocks, and each residual block comprises a convolution unit and an extrusion excitation unit;
the step of inputting the fundus picture in the training set to the residual excitation module to obtain a layer characteristic diagram corresponding to the fundus picture output by the residual excitation module comprises the following steps:
inputting fundus pictures in the training set to a first convolution unit in a first residual block to obtain a first convolution feature map output by the first convolution unit;
inputting the first convolution characteristic diagram to a first extrusion excitation unit in a first residual block to obtain a first reconstruction characteristic diagram output by the first extrusion excitation unit;
adding the first reconstruction feature map and the first convolution feature map, and then inputting the added first reconstruction feature map and the first convolution feature map into a second convolution unit of a second residual block to obtain a second convolution feature map output by the second convolution unit;
inputting the second convolution characteristic diagram to a second extrusion excitation unit in a second residual block to obtain a second reconstruction characteristic diagram output by the second extrusion excitation unit;
and continuously adding the reconstructed feature map output in the previous residual block and the convolution feature map output by the convolution unit in the previous residual block, and then inputting the added reconstructed feature map into the next residual block until the extrusion excitation unit of the kth residual block outputs a layer feature map, wherein k is a positive integer.
Optionally, the squeeze exciting unit includes: a pooling layer and a feature excitation layer;
the step of inputting the first convolution feature map to a first extrusion excitation unit in a first residual block to obtain a first reconstruction feature map output by the first extrusion excitation unit includes:
inputting the first convolution feature map to the pooling layer, and performing global maximum pooling treatment on the first convolution feature map through the pooling layer to obtain a compression feature map;
inputting the compressed feature map to the feature excitation layer to obtain a first reconstructed feature map output by the feature excitation layer; the first reconstruction feature map is a feature map obtained by weighting a first convolution feature map input to each channel through multiplication according to weights corresponding to each channel, and the weights corresponding to each channel are determined by the feature excitation layer according to the compression feature map.
Optionally, the hierarchical bilinear pooling module includes P target convolutional layers and P interaction layers;
the step of inputting the layer characteristic diagram to the hierarchical bilinear pooling module to obtain the interlayer interaction characteristic diagram output by the hierarchical bilinear pooling module comprises the following steps:
respectively inputting the reconstructed feature graphs output by the residual blocks arranged from k-P to k into the P target convolution layers in a one-to-one correspondence manner to obtain high-dimensional convolution feature graphs output by the P target convolution layers;
after element-by-element multiplication is carried out on each different high-dimensional convolution feature diagram, the element-by-element multiplication is input to each interaction layer, and P interlayer interaction feature diagrams output by each interaction layer are obtained.
Optionally, the classification module comprises a pooling layer and a full connection layer;
the step of inputting the interlayer interaction characteristic diagram to the classification module to obtain the prediction classification information output by the classification module comprises the following steps:
respectively inputting the P interlayer interaction feature images into a pooling layer to obtain pooled feature images after pooling fusion;
and inputting the pooling feature map to the full-connection layer to obtain a prediction classification result output by the full-connection layer.
Optionally, the step of correcting the model parameters by the preset network model according to the prediction classification information corresponding to the fundus picture and the classification mark information corresponding to the fundus picture includes:
and calculating a Loss value between the prediction classification information and the classification mark information by using a Focal Loss function, and correcting the model parameters by using the calculated Loss value.
In a second aspect, the present embodiment discloses a retinopathy recognition device that applies the retinopathy recognition model obtained by the above method for generating a retinopathy recognition model, and the device includes:
the image acquisition module is used for acquiring fundus images to be identified;
and the classification recognition module is used for carrying out image feature recognition on the fundus image to be recognized through the retinopathy recognition model to obtain a prediction classification result.
In a third aspect, the present embodiment provides a terminal device, including a processor, a storage medium communicatively coupled to the processor, the storage medium adapted to store a plurality of instructions; the processor is adapted to invoke instructions in the storage medium to perform steps implementing the method of generating a model of retinopathy identification.
In a fourth aspect, the present embodiment discloses a computer readable storage medium, where the computer readable storage medium stores one or more programs, and the one or more programs are executable by one or more processors to implement the steps of the method for generating a retinopathy identification model.
The invention has the beneficial effect that the retinopathy recognition model is obtained by training a preset network model with the fundus pictures contained in a training set, where each fundus picture in the training set carries classification mark information, namely the retinal feature classification corresponding to that fundus picture. Because the preset network model learns the classification mark information carried by the fundus pictures during training, the trained retinopathy recognition model can recognize the retinal image contained in a picture to be detected and give an accurate classification result of the retinal features. The method provided by the embodiment trains the network model for retinopathy recognition by deep learning, so that early screening of retinopathy can be realized with computer-aided recognition; computer-aided recognition not only speeds up recognition but also improves its accuracy, which is of great significance for reducing the blindness rate caused by retinopathy.
Drawings
FIG. 1 is a schematic illustration of different degrees of retinopathy in fundus images;
FIG. 2 is a flowchart showing steps of a method for generating a model for identifying retinopathy according to the present embodiment;
FIG. 3 is a schematic diagram of the structure of the retinopathy recognition model provided in the present embodiment;
FIG. 4 is a schematic block diagram of the identification device according to the present embodiment;
fig. 5 is a schematic diagram of a terminal device according to the present embodiment;
fig. 6a to 6d are comparisons of the t-SNE visualization effects of the ResNet-50, ResNet-101 and SE-HBP networks used in connection with the retinopathy recognition model provided in this embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more clear and clear, the present invention will be further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs unless defined otherwise. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Retinopathy of prematurity (ROP) is a retinal vascular proliferative blindness disease that occurs in premature or low birth weight infants. The course of ROP progresses rapidly, which can lead to blindness if not timely treated. With the advancement of neonatal care, survival rates of premature infants have increased, and the incidence of ROP has increased in low and medium income countries.
Aggressive posterior retinopathy of prematurity (AP-ROP) is a special type of retinopathy of prematurity (ROP) and is one of the causes of blindness in premature infants. It is mainly manifested by dilation and severe tortuosity of the blood vessels in the posterior pole of the retina. This special type of retinopathy of prematurity progresses rapidly, but because its course is unusual and atypical it is prone to misdiagnosis, so patients easily miss the optimal time for treatment. Great importance should therefore be attached once this specific lesion is diagnosed; if a diagnosed patient does not receive timely intervention, the disease rapidly progresses to stage 5 ROP, which easily causes retinal detachment and leads to blindness, as shown in fig. 1. Early screening of AP-ROP plays an important role in reducing the rate of blindness caused by the disease.
Automatic classification of AP-ROP faces the following challenges. The demarcation line or ridge of AP-ROP is blurred and of low contrast, making it difficult to see or recognize. In some AP-ROP cases the tortuosity of the blood vessels is low and the image is similar to a normal fundus image, which easily confuses the classification network. In clinical diagnosis, when acquiring fundus images of premature infants, the ophthalmologist captures retinal fundus images at multiple angles, so each fundus image has a different field of view and the position at which a lesion appears in the image also differs. The quality of AP-ROP fundus images varies, and many are of poor quality owing to the imaging device, the operator's shooting technique, and other factors. Finally, the morbidity of AP-ROP is low and the number of cases is far lower than that of conventional ROP, so there is a serious class imbalance in the training data, which can affect the training result of the classification network.
In order to overcome the above challenges, the present embodiment provides a method for generating a retinopathy recognition model: by training a preset network model, a network model capable of accurately recognizing the retinal features contained in a fundus image is obtained. Using the retinopathy recognition model provided by this embodiment to recognize fundus images can increase the speed of retinal recognition, improve recognition accuracy, and facilitate early screening of AP-ROP. In the preset network model disclosed in this embodiment, a hierarchical bilinear pooling unit and an extrusion excitation unit are added; the hierarchical bilinear pooling unit increases the complementation of inter-layer information and reduces the loss of useful information between layers, while the extrusion excitation unit improves the feature extraction capability of the network, thereby addressing the challenges described above.
The method disclosed in the present invention will be explained in more detail with reference to the accompanying drawings.
Exemplary method
In a first aspect, this embodiment discloses a method for generating a retinopathy identification model, as shown in fig. 2, including:
step S1, a preset network model generates prediction classification information corresponding to fundus pictures according to fundus pictures in a training set, wherein the training set comprises a plurality of groups of training samples, and each group of training samples comprises fundus pictures and classification mark information corresponding to the fundus pictures.
In the step, fundus pictures used in a training set are collected first, a plurality of groups of training samples are established from the collected fundus pictures, and then fundus pictures in each group of training samples are input into a preset network model.
In order to train a network model with better feature recognition, fundus images taken in different angles and different views in a training set are selected, so that the network model can accurately recognize retinal features in fundus images taken in different views.
The fundus pictures in this step may be captured with a camera of the terminal device itself, may be obtained from other terminal devices that captured them, or may be obtained from a cloud server; the fundus pictures in the training set can be collected in any combination of these ways.
Each fundus picture in the training set is marked with a corresponding classification; specifically, the classifications are normal fundus picture, AP-ROP picture and conventional ROP picture. The preset network model can learn the information marked in the fundus pictures so as to recognize the classification of an unlabeled picture, i.e. the retinal feature classification corresponding to the picture to be recognized. In one embodiment, the labels are 0, 1 and 2 respectively: a network result of 0 indicates AP-ROP, 1 indicates conventional ROP, and 2 indicates a normal fundus image.
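As a concrete illustration of this label scheme (a minimal sketch; the file names and sample structure below are hypothetical and not taken from the embodiment), a training sample can be represented as a fundus picture paired with its classification mark:

```python
# Hypothetical encoding of the classification marks described above
# (file names and sample structure are illustrative only).
LABEL_NAMES = {0: "AP-ROP", 1: "conventional ROP", 2: "normal fundus"}

# one training sample = (fundus picture, classification mark information)
training_samples = [
    ("fundus_0001.png", 0),   # annotated as AP-ROP
    ("fundus_0002.png", 2),   # annotated as a normal fundus image
]
```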
And S2, correcting model parameters according to the prediction classification information corresponding to the fundus picture and the classification mark information corresponding to the fundus picture by the preset network model, and continuously executing the step of generating the prediction classification information corresponding to the fundus picture by the preset network model according to the fundus picture in the training set until the training condition of the preset network model meets the preset condition so as to obtain the retinopathy recognition model.
After the fundus picture is input into the preset network model, the preset network model outputs prediction classification information corresponding to the fundus picture. The prediction classification information is the category of the retinal features corresponding to the fundus picture, and the categories comprise: normal fundus picture, conventional ROP picture, and AP-ROP picture.
Specifically, referring to fig. 3, the preset network model includes: a residual excitation module 10, a layered bilinear pooling module 20, and a classification module 30.
The fundus picture input into the preset network model is processed by the residual excitation module through convolution, squeezing, excitation and other operations; the resulting feature map is input into the layered bilinear pooling module, which performs convolution, pooling and related operations; the feature map thus obtained is input into the classification module, which contains a fully connected layer that outputs the final feature classification result, i.e. whether the retinal features contained in the fundus picture input into the preset network model correspond to a normal fundus picture, a conventional ROP picture or an AP-ROP picture.
Specifically, the step of generating the prediction classification information corresponding to the fundus picture by the preset network model according to the fundus picture in the training set includes:
and S11, inputting fundus pictures in the training set to the residual excitation module to obtain a layer characteristic diagram corresponding to the fundus pictures, which is output by the residual excitation module.
Specifically, the residual excitation module comprises k residual blocks, and each residual block comprises a convolution unit and an extrusion excitation unit;
the step of inputting the fundus picture in the training set to the residual excitation module to obtain a layer characteristic diagram corresponding to the fundus picture output by the residual excitation module comprises the following steps:
inputting fundus pictures in the training set to a first convolution unit in a first residual block to obtain a first convolution feature map output by the first convolution unit;
inputting the first convolution characteristic diagram to a first extrusion excitation unit in a first residual block to obtain a first reconstruction characteristic diagram output by the first extrusion excitation unit;
adding the first reconstruction feature map and the first convolution feature map, and then inputting the added first reconstruction feature map and the first convolution feature map into a second convolution unit of a second residual block to obtain a second convolution feature map output by the second convolution unit;
inputting the second convolution characteristic diagram to a second extrusion excitation unit in a second residual block to obtain a second reconstruction characteristic diagram output by the second extrusion excitation unit;
and continuously adding the reconstructed feature map output in the previous residual block and the convolution feature map output by the convolution unit in the previous residual block, and then inputting the added reconstructed feature map into the next residual block until the extrusion excitation unit of the kth residual block outputs a layer feature map, wherein k is a positive integer.
The squeeze-excitation unit is a channel-based attention mechanism. It models the important information of each channel and then enhances or suppresses the relevant channel according to the task.
Further, the squeeze exciting unit includes: a pooling layer and a feature excitation layer;
the step of inputting the first convolution feature map to a first extrusion excitation unit in a first residual block to obtain a first reconstruction feature map output by the first extrusion excitation unit includes:
inputting the first convolution feature map to the pooling layer, and performing global maximum pooling treatment on the first convolution feature map through the pooling layer to obtain a compression feature map;
inputting the compressed feature map to the feature excitation layer to obtain a first reconstructed feature map output by the feature excitation layer; the first reconstruction feature map is a feature map obtained by weighting a first convolution feature map input to each channel through multiplication according to weights corresponding to each channel, and the weights corresponding to each channel are determined by the feature excitation layer according to the compression feature map.
In a specific application embodiment, the squeeze excitation unit is incorporated into the Res module of the ResNet50 network. First, assume that in the Res module the convolution unit produces a feature F of size H×W×C from the input feature image. The squeeze operation compresses this feature: it performs global maximum pooling over the spatial dimensions, turning each two-dimensional feature channel into a single real number with a global receptive field, i.e. a 1×1×C descriptor that characterizes the global distribution of responses over the feature channels. Second, the excitation operation generates a weight for each feature channel by feeding the compressed feature into a two-layer convolutional neural network, which explicitly models the correlation between the feature channels. Finally, the re-weighting operation multiplies the output weights of the excitation onto the previous feature channel by channel, completing the re-calibration of the feature.
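For illustration, the following is a minimal sketch of such a squeeze-excitation unit and of one residual block of the residual excitation module. It assumes a PyTorch implementation (the embodiment does not name a framework); the channel counts, reduction ratio and layer shapes are illustrative choices rather than values specified here, and global maximum pooling is used for the squeeze step as described above.

```python
import torch
import torch.nn as nn

class SqueezeExcitation(nn.Module):
    """Squeeze-excitation unit: global max pooling squeezes each channel to one
    value, a two-layer bottleneck produces a per-channel weight in (0, 1), and the
    input convolution feature map is re-weighted channel by channel."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.squeeze = nn.AdaptiveMaxPool2d(1)            # H x W x C -> 1 x 1 x C
        self.excite = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        weights = self.excite(self.squeeze(x))            # per-channel weights
        return x * weights                                # re-calibrated feature map

class SEResidualBlock(nn.Module):
    """One residual block of the residual excitation module: a convolution unit
    followed by a squeeze-excitation unit, with a residual addition before the
    result is passed to the next block."""
    def __init__(self, channels):
        super().__init__()
        self.conv_unit = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.se = SqueezeExcitation(channels)

    def forward(self, x):
        conv_out = self.conv_unit(x)                      # convolution feature map
        recalibrated = self.se(conv_out)                  # reconstruction feature map
        return torch.relu(recalibrated + x)               # residual addition
```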
And step S12, inputting the layer characteristic diagram into the layered bilinear pooling module to obtain the interlayer interaction characteristic diagram output by the layered bilinear pooling module.
The hierarchical bilinear pooling module comprises P target convolution layers and P interaction layers;
the step of inputting the layer characteristic diagram to the hierarchical bilinear pooling module to obtain the interlayer interaction characteristic diagram output by the hierarchical bilinear pooling module comprises the following steps:
respectively inputting the reconstructed feature graphs output by the residual blocks arranged from k-P to k into the P target convolution layers in a one-to-one correspondence manner to obtain high-dimensional convolution feature graphs output by the P target convolution layers;
after element-by-element multiplication is carried out on each different high-dimensional convolution feature diagram, the element-by-element multiplication is input to each interaction layer, and P interlayer interaction feature diagrams output by each interaction layer are obtained.
The hierarchical bilinear pooling module itself is used for fine-grained image recognition and exhibits excellent results. In fine-grained image recognition, image categories are generally highly similar, and can be recognized only by using local differences, and at the same time, each image background is very different.
Challenge 1 and challenge 2 of AP-ROP fundus image classification described above are similar to the characteristics of fine-grained image recognition.
The hierarchical bilinear pooling module firstly expands the features of different layers into a high-dimensional space by establishing independent linear mapping in a network, and then integrates the features of different convolution layers through element-by-element multiplication. And then, the hierarchical bilinear pooling module compresses the high-dimensional characteristics through summation to obtain interlayer interaction characteristics, so that interlayer interaction of the network is realized. The single interaction layer expression is as follows:
$z_{\mathrm{int}} = U^{T}x \circ V^{T}y$  (1)
$z = P^{T} z_{\mathrm{int}}$  (2)
By connecting a plurality of interaction layers in series, the interaction features of the hierarchical bilinear pooling module are obtained, with the following expression:
$z_{\mathrm{HBP}} = \mathrm{HBP}(x, y, z, \dots) = P^{T} z_{\mathrm{int}} = P^{T}\,\mathrm{concat}(U^{T}x \circ V^{T}y,\; U^{T}x \circ S^{T}z,\; V^{T}y \circ S^{T}z,\; \dots)$  (3)
wherein $P$ is the classification matrix, $U, V, S, \dots$ are the projection matrices of the convolution layer features $x, y, z, \dots$ respectively, and $\circ$ denotes the element-wise (Hadamard) product.
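For illustration, a rough PyTorch sketch of the interaction described by equations (1) to (3) follows; the embedding dimension is an illustrative choice, the input layer feature maps are assumed to share the same spatial size, and the classification matrix $P$ is left to the downstream fully connected layer.

```python
import torch
import torch.nn as nn

class HierarchicalBilinearPooling(nn.Module):
    """Sketch of the interaction in equations (1)-(3): each layer feature map is
    projected into a shared high-dimensional space by a 1x1 "target convolution
    layer" (U, V, S, ...), pairs of projections are multiplied element by element,
    sum-pooled over spatial positions, and concatenated; the classification matrix
    P^T is realized by the downstream fully connected layer."""
    def __init__(self, in_channels, embed_dim=2048):
        super().__init__()
        # one projection (target convolution layer) per input layer feature map
        self.projections = nn.ModuleList(
            [nn.Conv2d(c, embed_dim, kernel_size=1) for c in in_channels]
        )

    def forward(self, feature_maps):
        # feature_maps: list of P tensors of shape (B, C_i, H, W), same H and W
        projected = [proj(f) for proj, f in zip(self.projections, feature_maps)]
        interactions = []
        for i in range(len(projected)):
            for j in range(i + 1, len(projected)):
                z = projected[i] * projected[j]           # element-wise product
                z = z.flatten(2).sum(dim=2)               # pool over spatial positions
                interactions.append(z)                    # one interlayer interaction
        return torch.cat(interactions, dim=1)             # fused interaction features
```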
Optionally, the classification module comprises a pooling layer and a full connection layer;
the step of inputting the interlayer interaction characteristic diagram to the classification module to obtain the prediction classification information output by the classification module comprises the following steps:
respectively inputting the P interlayer interaction feature images into a pooling layer to obtain pooled feature images after pooling fusion;
and inputting the pooling feature map to the full-connection layer to obtain a prediction classification result output by the full-connection layer.
And step S13, inputting the interlayer interaction characteristic diagram to the classification module to obtain the prediction classification information output by the classification module.
Specifically, the step of correcting the model parameters by the preset network model according to the prediction classification information corresponding to the fundus picture and the classification mark information corresponding to the fundus picture includes:
and calculating a Loss value between the prediction classification information and the classification mark information by using a Focal Loss function, and correcting the model parameters by using the calculated Loss value.
To address the problem of data imbalance, the Focal Loss function is used in the network. When calculating the classification loss, a common loss function is the cross-entropy function, written here for the probability $p$ that the network assigns to the true class:
$\mathrm{CE}(p) = -\log(p)$  (4)
By adding the parameters $\alpha$ and $\gamma$ to equation (4), the equation becomes:
$\mathrm{FL}(p) = -\alpha\,(1 - p)^{\gamma}\log(p)$  (5)
Equation (5) is the Focal Loss. The values of $\alpha$ and $\gamma$ are usually fixed, with $\alpha$ being 0.25 and $\gamma$ being 2, which the experimental results indicate gives the best performance.
When data are misclassified by the network, the value of $p$ is small, so the weight adjustment of the Focal Loss has little effect on the loss for such samples. For data that the network easily classifies correctly, however, $p$ is large, the loss is greatly reduced, and such easy samples contribute little to network training. By adjusting the weights of samples of different classification difficulty in this way, the Focal Loss addresses the problem of the unbalanced amount of AP-ROP data.
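A minimal sketch of equation (5) for the multi-class case follows, again assuming PyTorch; the fixed values α = 0.25 and γ = 2 are the ones quoted above, and the function and variable names are illustrative.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Multi-class Focal Loss of equation (5): the cross-entropy term -log(p) for
    the true class is scaled by alpha * (1 - p) ** gamma, so easy samples (large p)
    contribute little and hard or rare samples dominate the training signal."""
    log_p = F.log_softmax(logits, dim=1)                              # (B, classes)
    log_p_true = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)     # log p of true class
    p_true = log_p_true.exp()
    loss = -alpha * (1.0 - p_true) ** gamma * log_p_true
    return loss.mean()

# hypothetical training usage:
#   loss = focal_loss(model(fundus_batch), labels)
#   loss.backward(); optimizer.step()
```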
Referring to fig. 3, the extrusion excitation unit is the SE module in fig. 3. The residual excitation module proposed in this embodiment combines ResNet50 with SE modules: specifically, an SE module is inserted into the convolution part of the ResNet50 network and, after squeeze-and-excitation processing of the features output by the convolution part, feature addition is performed. That is, in the residual excitation module the output of the convolution part is input to the SE module and the output of the SE module is added to the input of the residual block, whereas in a residual block into which no SE module is inserted the output of the convolution part is added directly to the input of that block. Inserting the SE modules strengthens the useful information of the feature channels and suppresses their useless information, thereby improving the feature extraction capability of the network. This helps to improve the ability of the network to extract useful information from images of poor quality.
The layered bilinear pooling module is the HBP module in the figure. Each interaction layer in the HBP module is pooled, and the three layers are fused and then input into the fully connected layer. The module uses the information interaction between convolution layers to complement inter-layer information, thereby reducing the loss of useful information between layers. This helps to reduce the effect of differences in the field of view of the images, and also helps to address the problem of feature similarity between fundus images of different classes.
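To show how the pieces fit together, the following rough sketch assembles a ResNet-50 backbone, the HierarchicalBilinearPooling sketch above and a fully connected classifier for the three classes; it uses torchvision's stock resnet50 as a stand-in backbone and omits the SE units for brevity, so the layer choices and dimensions are assumptions rather than the exact architecture of fig. 3.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class SEHBPClassifier(nn.Module):
    """Rough assembly of the network in fig. 3: a ResNet-50 backbone whose last
    residual blocks provide the layer feature maps, the HierarchicalBilinearPooling
    sketch above as the HBP head, and a fully connected layer for the three classes
    (AP-ROP, conventional ROP, normal fundus). torchvision's stock resnet50 is used
    as a stand-in backbone; the SE units are omitted here for brevity."""
    def __init__(self, num_classes=3, embed_dim=2048):
        super().__init__()
        backbone = resnet50(weights=None)                 # torchvision >= 0.13 API
        self.stem = nn.Sequential(
            backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool,
            backbone.layer1, backbone.layer2, backbone.layer3,
        )
        # the three residual blocks of the last stage supply P = 3 layer feature maps
        self.blocks = nn.ModuleList(list(backbone.layer4.children()))
        self.hbp = HierarchicalBilinearPooling([2048, 2048, 2048], embed_dim)
        self.fc = nn.Linear(embed_dim * 3, num_classes)   # 3 pairwise interactions

    def forward(self, x):
        x = self.stem(x)
        layer_maps = []
        for block in self.blocks:
            x = block(x)
            layer_maps.append(x)                          # layer feature maps
        return self.fc(self.hbp(layer_maps))              # prediction classification
```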
Exemplary apparatus
On the basis of the disclosed method for generating the retinopathy identification model, this embodiment also discloses a retinopathy identification device, as shown in fig. 4. The retinopathy identification device applies the retinopathy identification model obtained by the method for generating the retinopathy identification model, and the identification device comprises:
an image acquisition module 110 for acquiring a fundus image to be identified; the retina contained in the fundus image to be identified may be a normal fundus retina, or may be a retina that has developed a lesion.
The classification recognition module 120 is configured to perform image feature recognition on the fundus image to be recognized through the retinopathy recognition model, so as to obtain a prediction classification result.
On the basis of the method, the embodiment also discloses a terminal device, which comprises a processor and a storage medium in communication connection with the processor, wherein the storage medium is suitable for storing a plurality of instructions; the processor is adapted to invoke instructions in the storage medium to perform steps implementing a method of generating the retinopathy identification model. In an implementation manner, the terminal device may be a mobile phone, a tablet computer or a smart television.
Specifically, as shown in fig. 5, the terminal device includes at least one processor (processor) 20 and a memory (memory) 22, and may further include a display 21, a communication interface (Communications Interface) 23, and a bus 24. Wherein the processor 20, the display 21, the memory 22 and the communication interface 23 may communicate with each other via a bus 24. The display screen 21 is configured to display a user guidance interface preset in the initial setting mode. The communication interface 23 may transmit information. The processor 20 may invoke logic instructions in the memory 22 to perform the steps of the method of the above-described embodiments.
Further, the logic instructions in the memory 22 described above may be implemented in the form of software functional units and stored in a computer readable storage medium when sold or used as a stand alone product.
The memory 22, as a computer readable storage medium, may be configured to store a software program or a computer executable program, such as program instructions or modules corresponding to the methods in the embodiments of the present disclosure. The processor 20 performs the functional applications and data processing by running the software programs, instructions or modules stored in the memory 22, i.e. implements the methods described in the embodiments above.
The memory 22 may include a storage program area that may store an operating system, at least one application program required for functions, and a storage data area; the storage data area may store data created according to the use of the terminal device, etc. In addition, the memory 22 may include high-speed random access memory, and may also include nonvolatile memory. For example, a plurality of media capable of storing program codes such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or a transitory storage medium may be used.
In another aspect, the present embodiment provides a computer readable storage medium, where the computer readable storage medium stores one or more programs, and the one or more programs are executable by one or more processors to implement the steps of the method for generating a retinopathy identification model described above.
The following describes the training steps of the retinopathy identification model generated by the method of the present invention and the various properties verified by the training steps with the specific application verification example of the method of the present invention.
The fundus pictures in the training set were collected by professional ophthalmologists with a RetCam 3 device. In order to obtain image information of the entire eyeball, fundus images of multiple fields of view must be captured when collecting data, so each data collection captures fundus images of several angularly different fields of view. As long as lesion information is captured in any single field of view, an infant suffering from AP-ROP can be identified. Images containing lesion information from different angles are integrated into the fundus image dataset as individual image units.
Table 1 data set details
Dataset annotation was performed by professional ophthalmologists. The same dataset was distributed to three specialized ophthalmologists, each of whom completed the annotation independently. Fundus images on which their opinions differed were then picked out and discussed, and the label of each such image was decided by voting. Images of poor quality were discarded.
Table 1 lists the details of the dataset. The conventional ROP data comprise stage I, II and III ROP fundus images. Within the dataset, the fundus images of each category are randomly distributed. To save computational resources, the original fundus images were resized to 224×224.
The study divided the data into two parts: training sets and test sets. The training set is used only to train the network, while the test set is completely independent of the training set for evaluating the performance of the network. Fundus image data of different fields of view will be randomly distributed, which can simulate the condition of examination of fundus diseases in hospitals to some extent. We use several of the most common evaluation criteria to evaluate the model generated by the method described in this example, including accuracy, sensitivity, specificity, precision and F1 coefficient.
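For illustration, the listed evaluation criteria could be computed as sketched below, assuming scikit-learn is available; the macro-averaging choice and the derivation of specificity from the confusion matrix are assumptions, since the averaging scheme is not spelled out here.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

def evaluate(y_true, y_pred):
    """Compute accuracy, sensitivity (macro recall), specificity (macro-averaged
    from the confusion matrix), precision and F1 for the three-class task."""
    cm = confusion_matrix(y_true, y_pred)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp                      # false positives per class
    tn = cm.sum() - cm.sum(axis=0) - cm.sum(axis=1) + tp   # true negatives per class
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "sensitivity": recall_score(y_true, y_pred, average="macro"),
        "specificity": float((tn / (tn + fp)).mean()),
        "precision": precision_score(y_true, y_pred, average="macro"),
        "f1": f1_score(y_true, y_pred, average="macro"),
    }
```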
Several groups of comparison experiments were performed in this study, including comparing the modules described in this study with the original networks, on both the ResNet50 and ResNet101 networks. The experimental results are shown in Table 2. Compared with the original network, adding each module effectively enhances the network, and the network proposed in this embodiment obtains the best performance, which indicates that the method provided in this embodiment is effective.
Table 2 network results display
The t-SNE visualization effects of the ResNet-50, ResNet-101 and SE-HBP networks are shown in figs. 6a to 6d. It can be seen that the retinopathy recognition model provided in this embodiment has a better recognition effect for the classification of AP-ROP, conventional ROP and normal fundus images.
To better evaluate the SE-HBP network in this study, three specialized ophthalmologists were invited to participate in the network performance evaluation, using a new comparison dataset of 100 fundus images in total, including 50 normal fundus images, 30 conventional ROP fundus images and 20 AP-ROP fundus images. The comparison dataset was distributed to the three specialized ophthalmologists for diagnosis and statistics, and the same dataset was used to evaluate the SE-HBP 50 network. The final results are shown in Table 3. Table 3 shows that the network disclosed in this embodiment has a certain advantage over the ophthalmologists in the diagnosis of AP-ROP, while its classification results for normal fundus images and conventional ROP fundus images are already close to those of the ophthalmologists. Moreover, ophthalmologists are prone to misdiagnosis because AP-ROP symptoms are atypical. The recognition model disclosed in this embodiment can therefore effectively assist ophthalmologists in diagnosing AP-ROP.
Table 3 network evaluation results
As can be seen from the test results, the identification device of this study reaches an accuracy of 96.59% on the test set of 2408 pictures, so it can to a certain extent assist doctors in early screening of ROP and AP-ROP. The system can also assist doctors who lack AP-ROP consultation experience in judging the disease. At the same time, the detection speed of the artificial intelligence is far faster than that of a human doctor, which greatly improves detection efficiency.
In this study, the present embodiment provides a novel convolutional neural network structure to automatically recognize and classify fundus images. The HBP module is used to complement information between convolution layers, reducing the loss of inter-layer information and enhancing the expressive power of the network, while the SE module is used to enhance the feature extraction capability of the convolutional neural network. Finally, to address the unbalanced amount of AP-ROP data, the Focal Loss is used in the network, balancing the influence of the data on training by giving different weights to data of different classification difficulty and different orders of magnitude. SE-HBP 50 was constructed based on ResNet50. The experimental results show that the network performance is improved; in the comparison experiments, the result of SE-HBP 50 was the best, with an accuracy of 96.59%. The t-SNE results show that the classes recognized by the network provided in this embodiment are more clearly separated. With this network, automatic recognition of AP-ROP is achieved, which will help ophthalmologists perform early screening and auxiliary diagnosis of AP-ROP.
The invention provides a method for generating a retinopathy recognition model, together with a recognition device and equipment. A preset network model is trained with the fundus pictures contained in a training set to obtain the trained retinopathy recognition model, where each fundus picture in the training set carries classification mark information, namely the retinal feature classification corresponding to that fundus picture. Because the preset network model learns the classification mark information carried by the fundus pictures during training, the trained retinopathy recognition model can recognize the retinal image contained in a picture to be detected and give an accurate classification result of the retinal features. The method, device and equipment provided by the embodiment train the network model for retinopathy recognition by deep learning, thereby improving both the accuracy and the efficiency of fundus image recognition.
It will be understood that equivalents and modifications will occur to persons skilled in the art in light of the present teachings and their spirit, and all such modifications and substitutions are intended to be included within the scope of the present invention as defined in the following claims.

Claims (7)

1. A method of generating a retinopathy recognition model, comprising:
a preset network model generates prediction classification information corresponding to fundus pictures according to the fundus pictures in a training set, wherein the training set comprises a plurality of groups of training samples, and each group of training samples comprises fundus pictures and classification mark information corresponding to the fundus pictures;
the preset network model corrects model parameters according to the prediction classification information corresponding to the fundus picture and the classification mark information corresponding to the fundus picture, and continues to execute the step of generating the prediction classification information corresponding to the fundus picture by the preset network model according to the fundus picture in the training set until the training condition of the preset network model meets the preset condition so as to obtain the retinopathy recognition model;
the preset network model comprises the following steps: the device comprises a residual excitation module, a layered bilinear pooling module and a classification module;
the step of generating prediction classification information corresponding to fundus pictures by the preset network model according to fundus pictures in a training set comprises the following steps:
inputting fundus pictures in the training set to the residual excitation module to obtain a layer characteristic diagram corresponding to the fundus pictures, which is output by the residual excitation module;
inputting the layer characteristic diagram into the layered bilinear pooling module to obtain an interlayer interaction characteristic diagram output by the layered bilinear pooling module;
inputting the interlayer interaction characteristic diagram to the classification module to obtain prediction classification information output by the classification module;
the residual excitation module comprises k residual blocks, and each residual block comprises a convolution unit and an extrusion excitation unit;
the step of inputting the fundus picture in the training set to the residual excitation module to obtain a layer characteristic diagram corresponding to the fundus picture output by the residual excitation module comprises the following steps:
inputting fundus pictures in the training set to a first convolution unit in a first residual block to obtain a first convolution feature map output by the first convolution unit;
inputting the first convolution characteristic diagram to a first extrusion excitation unit in a first residual block to obtain a first reconstruction characteristic diagram output by the first extrusion excitation unit;
adding the first reconstruction feature map and the first convolution feature map, and then inputting the added first reconstruction feature map and the first convolution feature map into a second convolution unit of a second residual block to obtain a second convolution feature map output by the second convolution unit;
inputting the second convolution characteristic diagram to a second extrusion excitation unit in a second residual block to obtain a second reconstruction characteristic diagram output by the second extrusion excitation unit;
continuously adding the reconstructed feature map output in the previous residual block and the convolution feature map output by the convolution unit in the previous residual block, and then inputting the added reconstructed feature map into the next residual block until an extrusion excitation unit of the kth residual block outputs a layer feature map, wherein k is a positive integer;
the squeeze exciting unit includes: a pooling layer and a feature excitation layer;
the step of inputting the first convolution feature map to a first extrusion excitation unit in a first residual block to obtain a first reconstruction feature map output by the first extrusion excitation unit includes:
inputting the first convolution feature map to the pooling layer, and performing global maximum pooling treatment on the first convolution feature map through the pooling layer to obtain a compression feature map;
inputting the compressed feature map to the feature excitation layer to obtain a first reconstructed feature map output by the feature excitation layer; the first reconstruction feature map is a feature map obtained by weighting a first convolution feature map input to each channel through multiplication according to weights corresponding to each channel, and the weights corresponding to each channel are determined by the feature excitation layer according to the compression feature map.
2. The method of claim 1, wherein the hierarchical bilinear pooling module comprises P target convolutional layers and P interaction layers;
the step of inputting the layer characteristic diagram to the hierarchical bilinear pooling module to obtain the interlayer interaction characteristic diagram output by the hierarchical bilinear pooling module comprises the following steps:
respectively inputting the reconstructed feature graphs output by the residual blocks arranged from k-P to k into the P target convolution layers in a one-to-one correspondence manner to obtain high-dimensional convolution feature graphs output by the P target convolution layers;
after element-by-element multiplication is carried out on each different high-dimensional convolution feature diagram, the element-by-element multiplication is input to each interaction layer, and P interlayer interaction feature diagrams output by each interaction layer are obtained.
3. The method of claim 2, wherein the classification module comprises a pooling layer and a fully connected layer;
the step of inputting the interlayer interaction characteristic diagram to the classification module to obtain the prediction classification information output by the classification module comprises the following steps:
respectively inputting the P interlayer interaction feature images into a pooling layer to obtain pooled feature images after pooling fusion;
and inputting the pooling feature map to the full-connection layer to obtain a prediction classification result output by the full-connection layer.
4. The method according to claim 1, wherein the step of correcting the model parameters by the preset network model according to the predicted classification information corresponding to the fundus picture and the classification flag information corresponding to the fundus picture includes:
and calculating a Loss value between the prediction classification information and the classification mark information by using a Focal Loss function, and correcting the model parameters by using the calculated Loss value.
5. A retinopathy recognition device, characterized in that it applies the retinopathy recognition model obtained by the method for generating a retinopathy recognition model according to any one of claims 1 to 4, and comprises:
the image acquisition module is used for acquiring fundus images to be identified;
and the classification recognition module is used for carrying out image feature recognition on the fundus image to be recognized through the retinopathy recognition model to obtain a prediction classification result.
6. A terminal device, comprising a processor and a storage medium in communication with the processor, the storage medium being adapted to store a plurality of instructions; the processor is adapted to invoke the instructions in the storage medium to perform the steps of the method for generating a retinopathy identification model according to any one of claims 1 to 4.
7. A computer-readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the steps of the method for generating a retinopathy identification model according to any one of claims 1 to 4.
CN202010857801.5A 2020-08-24 2020-08-24 Method, device and equipment for generating retinopathy identification model Active CN112101424B (en)

Publications (2)

Publication Number Publication Date
CN112101424A CN112101424A (en) 2020-12-18
CN112101424B (en) 2023-08-04

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112801942B (en) * 2020-12-31 2023-10-13 广西慧云信息技术有限公司 Citrus yellow dragon disease image identification method based on attention mechanism
CN113486969A (en) * 2021-07-15 2021-10-08 重庆邮电大学 X-ray image classification method based on improved Resnet network
CN113902036A (en) * 2021-11-08 2022-01-07 哈尔滨理工大学 Multi-feature fusion type fundus retinal disease identification method
CN113962995B (en) * 2021-12-21 2022-04-19 北京鹰瞳科技发展股份有限公司 Cataract model training method and cataract identification method
CN115496954B (en) * 2022-11-03 2023-05-12 中国医学科学院阜外医院 Fundus image classification model construction method, device and medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108021916A (en) * 2017-12-31 2018-05-11 南京航空航天大学 Deep learning diabetic retinopathy sorting technique based on notice mechanism
CN108876775A (en) * 2018-06-12 2018-11-23 广州图灵人工智能技术有限公司 The rapid detection method of diabetic retinopathy
CN110598582A (en) * 2019-08-26 2019-12-20 深圳大学 Eye image processing model construction method and device
CN110930418A (en) * 2019-11-27 2020-03-27 江西理工大学 Retina blood vessel segmentation method fusing W-net and conditional generation confrontation network
CN111259982A (en) * 2020-02-13 2020-06-09 苏州大学 Premature infant retina image classification method and device based on attention mechanism
CN111369562A (en) * 2020-05-28 2020-07-03 腾讯科技(深圳)有限公司 Image processing method, image processing device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Automated Stage Analysis of Retinopathy of Prematurity Using Joint Segmentation and Multi-instance Learning; Guozhen Chen et al.; International Workshop on Ophthalmic Medical Image Analysis; pp. 1-2 *
A Review of the Application of Deep Learning in Image Recognition; Zheng Yuanpan, Li Guangyang, Li Ye; Computer Engineering and Applications, No. 12; pp. 1-2 *

Similar Documents

Publication Publication Date Title
CN112101424B (en) Method, device and equipment for generating retinopathy identification model
Mishra et al. Diabetic retinopathy detection using deep learning
WO2018201632A1 (en) Artificial neural network and system for recognizing lesion in fundus image
CN110837803B (en) Diabetic retinopathy grading method based on depth map network
CN109543526B (en) True and false facial paralysis recognition system based on depth difference characteristics
CN108553079A (en) Lesion identifying system based on eye fundus image
CN111340776B (en) Method and system for identifying keratoconus based on multi-dimensional feature adaptive fusion
CN112869697A (en) Judgment method for simultaneously identifying stage and pathological change characteristics of diabetic retinopathy
Junjun et al. Diabetic retinopathy detection based on deep convolutional neural networks for localization of discriminative regions
CN112580580A (en) Pathological myopia identification method based on data enhancement and model fusion
CN110598724B (en) Cell low-resolution image fusion method based on convolutional neural network
CN111160431A (en) Method and device for identifying keratoconus based on multi-dimensional feature fusion
Aranha et al. Deep transfer learning strategy to diagnose eye-related conditions and diseases: An approach based on low-quality fundus images
Asirvatham et al. Hybrid deep learning network to classify eye diseases
Sengar et al. An efficient artificial intelligence-based approach for diagnosis of media haze disease
CN117237711A (en) Bimodal fundus image classification method based on countermeasure learning
CN116758038A (en) Infant retina disease information identification method and system based on training network
Calderon et al. CNN-based quality assessment for retinal image captured by wide field of view non-mydriatic fundus camera
CN113273959B (en) Portable diabetic retinopathy diagnosis and treatment instrument
CN113470815A (en) AI technology-based cataract patient vision impairment degree evaluation system
Santos et al. Generating photorealistic images of people's eyes with strabismus using Deep Convolutional Generative Adversarial Networks
Qiu et al. Semi-supervised framework for dual encoder attention network: classification of retinopathy in optical coherence tomography images
Venneti et al. Amdnet: Age-related macular degeneration diagnosis through retinal fundus images using lightweight convolutional neural network
Ali et al. Classifying Three Stages of Cataract Disease using CNN
CN113205500B (en) Auxiliary identification method and device for giant cell viral retinitis and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant