CN114266769B - System and method for identifying eye diseases based on neural network model - Google Patents


Info

Publication number: CN114266769B (grant of application CN202210192386.5A; earlier publication CN114266769A)
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 刘家明, 陈荡荡
Assignee (original and current): Beijing Airdoc Technology Co Ltd
Legal status: Active (granted)
Prior art keywords: image, network model, eye, classification, physiological structure


Abstract

The disclosure relates to a system and method for eye disease identification based on neural network models. The system includes one or more processors; a segmentation network model; a classification-recognition network model; and one or more computer-readable storage media storing program instructions that implement the segmentation network model and the classification-recognition network model and that, when executed by the one or more processors, cause: the segmentation network model to receive an anterior segment image and segment it to obtain an image of a physiological structure region in the anterior segment; and the classification-recognition network model to receive the physiological structure region image from the segmentation network model and perform classification recognition on it, so as to output a recognition result for eye disease identification. By means of this scheme, a high-precision eye disease recognition result can be obtained from an anterior segment image.

Description

System and method for identifying eye diseases based on neural network model
Technical Field
The present disclosure relates generally to the field of image processing technology. More particularly, the present disclosure relates to a system, method, and computer-readable storage medium for eye disease identification based on neural network models.
Background
The eyes are an important tool with which people perceive the world and carry out daily life and work. Biological data of the anterior segment are a key indicator of ocular health. With the development of artificial intelligence, analysis of anterior segment images using computer vision has become feasible: a physiological structure can be segmented from an acquired anterior segment image, and the biological data of the anterior segment can then be analyzed based on that structure. This has important practical significance in clinical medicine for the diagnosis of various eye diseases, the drafting of treatment plans, the evaluation of treatment effects, and so on.
Currently, eye diseases are generally diagnosed in one of two ways. In the first, an optical device processes an image of the eye to obtain a diagnostic image, and a clinician judges from experience whether the user has an eye disease. In the second, a corresponding amplitude spectrogram or phase spectrum of the eye image is obtained (for example, via AR glasses), and a diagnostic device automatically determines from it whether the user has an eye disease. However, the former depends mainly on subjective personal judgment: different doctors may reach inconsistent conclusions, and diagnosis takes a long time. The latter, although it removes the need for manual diagnosis, cannot finely distinguish between various eye diseases. How to identify eye diseases efficiently and accurately therefore remains a technical problem to be solved.
Disclosure of Invention
To at least partially solve the technical problems mentioned in the background, the present disclosure provides a scheme for eye disease identification based on neural network models. With this scheme, a high-precision eye disease recognition result can be obtained quickly. To this end, the present disclosure provides solutions in the following aspects.
In one aspect, the present disclosure provides a system for eye disease identification based on a neural network model, comprising: one or more processors; a segmentation network model; a classification-recognition network model; and one or more computer-readable storage media storing program instructions that implement the segmentation network model and the classification-recognition network model and that, when executed by the one or more processors, cause: the segmentation network model to receive an anterior segment image and segment it to obtain an image of a physiological structure region in the anterior segment; and the classification-recognition network model to receive the physiological structure region image from the segmentation network model and perform classification recognition on it, so as to output a recognition result for eye disease identification.
In one embodiment, the image of the physiological structure area in the anterior segment of the eye comprises at least a bulbar conjunctiva area image, a palpebral conjunctiva area image, an iris area image and/or a cornea area image.
In another embodiment, the one or more computer-readable storage media further store program instructions to pre-process the anterior ocular segment image, which when executed by the one or more processors, perform image transformation and/or image normalization operations on the anterior ocular segment image to pre-process the anterior ocular segment image.
In yet another embodiment, the one or more computer-readable storage media further store program instructions to implement a normalization operation that, when executed by the one or more processors, cause: performing a normalization operation on the segmentation result of the segmentation network model to obtain a physiological structure region image in the anterior segment of the eye; and performing normalization operation on the classification recognition result of the classification recognition network model to output a recognition result for eye disease recognition.
In yet another embodiment, the one or more computer-readable storage media further store program instructions to calculate a loss function in the segmentation network model that, when executed by the one or more processors, cause: calculating a first loss function and a second loss function in the segmentation network model respectively; and adding the first loss function and the second loss function to obtain a final loss function of the segmentation network model.
In yet another embodiment, the split network model comprises an encoding module and a decoding module, wherein: the encoding module comprises a plurality of first residual blocks and is used for performing down-sampling encoding on an anterior ocular segment image to extract a first characteristic image related to a physiological structure area image in the anterior ocular segment; and the decoding module comprises a plurality of second residual blocks and is used for performing upsampling decoding on the basis of the first characteristic image so as to obtain a physiological structure area image in the anterior segment of the eye.
In yet another embodiment, each of the first residual blocks includes a plurality of first convolutional layers and a pooling layer, each of the second residual blocks includes a deconvolution (transposed-convolution) layer and a plurality of second convolutional layers, and the output of one of the plurality of first residual blocks is connected to the input of the first of the plurality of second residual blocks.
In yet another embodiment, in the upsampling decoding based on the first feature image to obtain the image of the physiological structure region in the anterior segment, the decoding module is further configured to: performing up-sampling decoding based on the first feature image to extract a second feature image related to a physiological structure region image in the anterior segment of the eye; and performing feature fusion on the part of the second feature image and the part of the first feature image to obtain a physiological structure area image in the anterior segment of the eye.
In yet another embodiment, the classification recognition network model includes a feature extraction module and a classification recognition module, wherein: the feature extraction module comprises a plurality of third convolution layers and a plurality of depth separation convolution layers and is used for performing feature extraction on a physiological structure area image in the anterior segment of the eye to obtain a third feature image; the classification identification module comprises a flattening layer and a full connection layer and is used for performing classification identification on the third feature image so as to output an identification result for identifying the eye diseases, wherein the identification result comprises the probabilities of various eye diseases.
In yet another embodiment, the plurality of depth-separated convolutional layers are arranged between a plurality of third convolutional layers and are connected in series, and an output of the last third convolutional layer connected in series is connected to an input of the flattened layer, and an output of the flattened layer is connected to an input of the fully-connected layer.
In yet another embodiment, the one or more computer-readable storage media further store program instructions for determining a final recognition result based on the probabilities of the plurality of eye diseases, which, when executed by the one or more processors, cause: selecting the maximum of the probabilities of the respective eye diseases; and determining the eye disease corresponding to that maximum probability as the final recognition result.
In another aspect, the present disclosure also provides a method for eye disease identification based on a neural network model, comprising: inputting the anterior segment image into a segmentation network model; segmenting the anterior segment image using the segmentation network model to obtain a physiological structure region image in the anterior segment; and using a classification and identification network model to receive the physiological structure region image in the anterior segment of the eye from the segmentation network model and perform classification and identification on the physiological structure region image so as to output an identification result for eye disease identification.
In yet another aspect, the present disclosure also provides a computer-readable storage medium having stored thereon computer-readable instructions for neural network model-based eye disease identification, which, when executed by one or more processors, implement an embodiment as described in another aspect above.
According to the above scheme, the neural network models segment and then classify the anterior segment image, realizing a coarse-to-fine process so that a high-precision recognition result for eye diseases can be obtained quickly. Furthermore, the embodiments of the disclosure introduce residual blocks into the segmentation network model and feature-fuse part of the features extracted during upsampling decoding with part of the features extracted during downsampling encoding, which avoids loss of feature information and ensures the integrity of the extracted feature image information. In addition, the embodiments also linearly combine two different loss functions to obtain the loss function of the segmentation network model, thereby avoiding network degradation caused by vanishing gradients and improving the segmentation precision of the image.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. In the drawings, several embodiments of the disclosure are illustrated by way of example and not by way of limitation, and like or corresponding reference numerals indicate like or corresponding parts and in which:
fig. 1 is a block diagram illustrating an exemplary structure of a system for eye disease recognition based on a neural network model according to an embodiment of the present disclosure;
FIG. 2 is an exemplary diagram illustrating segmentation of an anterior segment image using a segmentation network model in accordance with an embodiment of the present disclosure;
FIG. 3 is an exemplary diagram illustrating the classification of images of a region of a physiological structure using a classification recognition network model according to an embodiment of the present disclosure;
FIG. 4 is an exemplary diagram of an ensemble of neural network model based ocular disease identification, according to an embodiment of the present disclosure; and
fig. 5 is an exemplary flowchart illustrating a method of eye disease identification based on a neural network model according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings. It should be understood that the embodiments described in this specification are only some of the embodiments of the present disclosure, provided to facilitate a clear understanding of the solutions and to comply with legal requirements, and not all of its embodiments. All other embodiments obtained by a person skilled in the art from the embodiments disclosed in this specification without creative effort shall fall within the protection scope of the present disclosure.
Fig. 1 is a block diagram illustrating an exemplary architecture of a system 100 for eye disease identification based on neural network models, according to an embodiment of the present disclosure. As shown in fig. 1, the system 100 may include one or more processors 101, which may include, for example, a general purpose processor ("CPU") or a special purpose graphics processor ("GPU"). Further, the system 100 of the embodiment of the present disclosure may further include a segmentation network model 102 and a classification recognition network model 103.
By way of example, the above-described segmentation network model 102 and classification-recognition network model 103 of embodiments of the present disclosure may be implemented as program instructions stored on a computer-readable storage medium of a computing device such as a computer or mobile computing device. Depending on the application scenario, there may be one or more computer-readable storage media, and they may be of any type capable of storing program instructions. During the eye disease recognition of the present disclosure, the processor 101 described above executes the program instructions stored on the computer-readable storage medium, thereby enabling the operations performed by the segmentation network model 102 and the classification-recognition network model 103.
In particular, the processor, when executing the one or more program instructions described above, may cause the segmentation network model of embodiments of the present disclosure to receive and segment an anterior segment image to obtain an image of a physiological structure region in the anterior segment. In one embodiment, the anterior segment image may be acquired by an optical camera placed close to the human eye. Based on the foregoing description, the computer-readable storage medium further stores program instructions for pre-processing an anterior segment image, which, when executed by one or more processors, perform image transformation and/or image normalization operations on the anterior segment image. For example, an image transformation operation changes the size of the anterior segment image, and an image normalization operation normalizes its pixel values from (0, 255) to (0, 1).
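As a concrete illustration of the normalization step just described, the following numpy sketch scales 8-bit pixel values into the (0, 1) range; the function name and the dummy image are illustrative, not part of the patent.

```python
import numpy as np

def preprocess(anterior_segment_img: np.ndarray) -> np.ndarray:
    """Scale 8-bit pixel values from [0, 255] into [0, 1].

    `anterior_segment_img` is an H x W x 3 uint8 image; the resize
    step mentioned in the text is omitted here for brevity.
    """
    return anterior_segment_img.astype(np.float32) / 255.0

# Example: a dummy 4 x 4 RGB "image" with all pixels at 255
img = np.full((4, 4, 3), 255, dtype=np.uint8)
out = preprocess(img)  # all values become 1.0
```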
It is to be understood that the physiological structure area in the anterior segment of the eye comprises at least, for example, a bulbar conjunctiva area, a palpebral conjunctiva area, an iris area and/or a cornea area, whereby the obtained physiological structure area image after segmentation via the aforementioned segmentation network model comprises at least a bulbar conjunctiva area image, a palpebral conjunctiva area image, an iris area image and/or a cornea area image. Accordingly, the classification recognition network model of the embodiment of the present disclosure is caused to receive the physiological structure region image in the aforementioned anterior segment from the segmentation network model and perform classification recognition on the physiological structure region image to output a recognition result for eye disease recognition. In one implementation scenario, the recognition result may include probabilities of a plurality of eye diseases.
In one application scenario, the above-mentioned segmentation network model may include an encoding module and a decoding module, both of which introduce residual blocks, thereby avoiding loss of image feature information. In particular, the encoding module may include a plurality of first residual blocks and is configured to downsample-encode the anterior segment image to extract a first feature image related to the physiological structure region image in the anterior segment. The decoding module may include a plurality of second residual blocks and is configured to perform upsampling decoding based on the first feature image to obtain the physiological structure region image in the anterior segment. More specifically, the decoding module is further configured to perform upsampling decoding on the basis of the first feature image to extract a second feature image related to the physiological structure region image in the anterior segment, and to perform feature fusion on a portion of the second feature image and a portion of the first feature image to obtain the physiological structure region image in the anterior segment. In some embodiments, each first residual block includes a plurality of first convolutional layers and a pooling layer, each second residual block includes a deconvolution layer and a plurality of second convolutional layers, and the output of one of the plurality of first residual blocks is connected to the input of the first of the plurality of second residual blocks. The segmentation network model of the embodiment of the present disclosure is described in detail later in conjunction with fig. 2.
In another application scenario, the classification-recognition network model may include a feature extraction module and a classification recognition module. The feature extraction module may include a plurality of third convolution layers and a plurality of depth-separated (i.e., depthwise-separable) convolution layers, and is configured to perform feature extraction on the physiological structure region image in the anterior segment to obtain a third feature image. The classification recognition module may include a flattening layer (e.g., a Flatten layer) and a fully-connected layer, and is configured to perform classification recognition on the third feature image to output a recognition result (e.g., probabilities of a plurality of eye diseases) for eye disease recognition. In some embodiments, the plurality of depth-separated convolutional layers are arranged between the plurality of third convolutional layers and connected in series, the output of the last third convolutional layer in the series is connected to the input of the flattening layer, and the output of the flattening layer is connected to the input of the fully-connected layer. The classification-recognition network model of the embodiment of the present disclosure is described in detail later in conjunction with fig. 3.
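The patent does not give parameter counts, but a small arithmetic sketch (with illustrative function names) shows why depthwise-separable convolution layers are attractive in a feature extraction module: they require far fewer parameters than standard convolutions.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    # A standard convolution learns one k x k kernel per (input, output) channel pair.
    return c_in * c_out * k * k

def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    # Depthwise step: one k x k kernel per input channel;
    # pointwise step: a 1 x 1 convolution that mixes channels.
    return c_in * k * k + c_in * c_out

# e.g. mapping 256 channels to 256 channels with 3 x 3 kernels
std = standard_conv_params(256, 256, 3)        # 589824 parameters
dsc = depthwise_separable_params(256, 256, 3)  # 67840 parameters, ~8.7x fewer
```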
In one embodiment, the computer-readable storage medium described above further stores program instructions that implement normalization operations and that, when executed by the one or more processors, cause: a normalization operation to be performed on the segmentation result of the segmentation network model to obtain the physiological structure region image in the anterior segment; and on the classification recognition result of the classification-recognition network model to output a recognition result for eye disease recognition. In particular, the normalization operation may be implemented by, for example, a softmax function. In one exemplary scenario, the softmax function may be expressed as follows:

S_i = e^{x_i} / Σ_{j=1}^{C} e^{x_j}    (1)

where x_i represents the value of the i-th output node of the segmentation network model or the classification-recognition network model, and the denominator represents the sum over the values of the C output nodes of the corresponding model.
In an implementation scenario, the segmentation result of the segmentation network model is normalized to output a two-dimensional matrix, in which each value represents the probability of a segmented physiological structure region. In this implementation scenario, each probability is compared with a preset threshold: values smaller than the preset threshold are set to 0 (for example, the black area in fig. 2), and the region formed by values larger than the preset threshold is the physiological structure region image in the anterior segment (for example, the white area in fig. 2). For the classification-recognition network model, the classification recognition result is normalized to output a one-dimensional matrix, in which each value represents, for example, the probability of the corresponding eye disease. In some embodiments, the maximum probability in the one-dimensional matrix can generally be used as the recognition result for eye disease recognition. Taking identification of bulbar conjunctival disease as an example, assume the one-dimensional matrix output after segmentation and classification includes three probability values, for example severe bulbar conjunctival hyperemia (0.1), mild bulbar conjunctival hyperemia (0.3), and normal bulbar conjunctiva (0.6); then 0.6 is taken as the identification result. That is, the system of the embodiment of the present disclosure finally identifies the bulbar conjunctiva as normal (no hyperemia).
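The selection of the maximum probability, together with the thresholding of segmentation probabilities, can be sketched in numpy as follows. The probability values and labels mirror the bulbar conjunctiva example above; the softmax form and the threshold value 0.3 are illustrative assumptions, not values fixed by the patent.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    # Normalize raw output-node values into probabilities summing to 1.
    e = np.exp(x - x.max())
    return e / e.sum()

# Classification branch: pick the eye-disease class with the largest probability.
probs = np.array([0.1, 0.3, 0.6])  # severe / mild / normal bulbar hyperemia
labels = ["severe hyperemia", "mild hyperemia", "normal"]
final = labels[int(np.argmax(probs))]  # -> "normal"

# Segmentation branch: threshold per-pixel probabilities into a binary mask
# (shown here on a tiny 1-D example; real masks are two-dimensional).
seg_probs = softmax(np.array([2.0, 0.5, -1.0]))
mask = (seg_probs > 0.3).astype(np.uint8)  # values below threshold set to 0
```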
As can be seen from the above description, the embodiment of the present disclosure first inputs the image of the anterior segment of the eye into the segmentation network model for segmentation to extract the image of the physiological structure region in the anterior segment of the eye. And then inputting the segmented physiological structure region image into a classification and identification network model for classification and identification. That is, the coarse-to-fine recognition process is realized by the combination of segmentation and classification, so that a high-precision recognition result for the eye disease can be obtained quickly.
Fig. 2 is an exemplary diagram illustrating segmentation of an anterior segment image using a segmentation network model according to an embodiment of the present disclosure. It is to be understood that fig. 2 is a specific embodiment of the segmented network model 102 of fig. 1, and therefore the description of fig. 1 regarding the segmented network model 102 is equally applicable to fig. 2.
As shown in fig. 2, the segmentation network model 102 of the embodiment of the present disclosure may include an encoding module 201 (the dashed box on the left of the figure) and a decoding module 202 (the dashed box on the right). The encoding module 201 may be configured to downsample-encode (i.e., reduce) the anterior segment image 203 to extract a first feature image related to the image of the physiological structure region in the anterior segment. The decoding module 202 may be used to perform upsampling decoding (i.e., enlarging the image) based on the first feature image to obtain an image of the physiological structure region in the anterior segment. As mentioned above, the encoding module 201 may comprise a plurality of first residual blocks and the decoding module 202 a plurality of second residual blocks; for example, 5 first residual blocks (rectangles in the dashed box on the left of the figure) and 6 second residual blocks (rectangles in the dashed box on the right) are exemplarily shown. Further, each first residual block comprises a plurality of first convolutional layers and a pooling layer, each second residual block comprises a deconvolution layer and a plurality of second convolutional layers, and the output end of one first residual block among the plurality of first residual blocks is connected to the input end of the first second residual block among the plurality of second residual blocks.
For example, in one exemplary scenario, each first residual block includes 5 convolutional layers and 1 pooling layer (not shown in the figure), connected in series. Each second residual block includes 1 deconvolution layer and 5 convolutional layers, likewise connected in series. As further shown, the output of the last first residual block is connected to the input of the first second residual block.
In an implementation scenario, before inputting the anterior segment image 203 into the segmentation network model, an image transformation and/or image normalization operation may, for example, be performed on the anterior segment image 203 to pre-process it. For example, the size of the anterior segment image is adjusted to 256 × 256 × 3 (where 3 denotes the number of channels), and its pixel values are normalized to between (0, 1). Next, the pre-processed anterior segment image 203 is received via the segmentation network model and segmented to extract, for example, a bulbar conjunctiva area image, a palpebral conjunctiva area image, an iris area image and/or a cornea area image, such as the bulbar conjunctiva area image 204 shown in the figure.
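The resize step just mentioned can be sketched with a dependency-free nearest-neighbour implementation. Real pipelines would more likely use bilinear interpolation via OpenCV or PIL; `resize_nearest` is an illustrative name, not the patent's actual routine.

```python
import numpy as np

def resize_nearest(img: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour resize of an H x W x C image to size x size."""
    h, w = img.shape[:2]
    # For each output row/column, pick the nearest source row/column.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]

# A dummy 300 x 400 RGB image becomes the 256 x 256 x 3 network input.
img = np.zeros((300, 400, 3), dtype=np.uint8)
resized = resize_nearest(img, 256)  # shape (256, 256, 3)
```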
Specifically, the anterior segment image 203 is first sequentially downsampled and encoded via the 5 first residual blocks in the encoding module 201 to obtain first feature images. The encoding module 201 can obtain 5 first feature images, whose sizes are 256×256×64, 256×256×64, 128×128×128, 64×64×256, and 32×32×512, respectively. Further, upsampling decoding is performed based on the first feature images via the decoding module 202 to extract second feature images. As described above, during upsampling decoding, part of the second feature images may also be feature-fused with part of the first feature images to obtain the physiological structure region image in the anterior segment. It is understood that feature fusion in the embodiments of the present disclosure refers to channel-wise concatenation of feature images. For example, as shown in the figure, the last first feature image extracted by the encoding module (i.e., the 32×32×512 feature image) is first feature-fused with the first second feature image extracted in the upsampling decoding, so as to obtain a fused second feature image (i.e., a 64×64×768 feature image). Next, the fused feature image is upsampled and decoded to obtain a second feature image (i.e., a 64×64×256 feature image).
Similarly, the third and fourth second feature images in the upsampling decoding are feature-fused with the third and fourth first feature images extracted in the downsampling encoding (i.e., the 128×128×128 and 256×256×64 feature images), respectively, to obtain corresponding fused second feature images of 128×128×384 and 256×256×192. Upsampling decoding then continues on the 256×256×192 feature image to obtain second feature images of 256×256×64, and the physiological structure region in the anterior segment is finally output. As mentioned previously, a softmax function may, for example, be used to normalize the output result of the segmentation network model (i.e., the segmentation result) to between (0, 1), finally outputting a two-dimensional matrix. Probabilities below the preset threshold in the two-dimensional matrix are set to 0, and probabilities above it are retained. That is, the area formed by the values larger than the preset threshold in the two-dimensional matrix is the image of the physiological structure region in the anterior segment, such as the bulbar conjunctiva area image 204 (of size 256 × 256) shown as the white area in the figure.
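The channel-wise concatenation that the text calls feature fusion can be sketched in numpy. The shapes (64 × 64 spatial, 512 + 256 = 768 channels) are an assumption consistent with the fusion step described above; the array contents are random placeholders.

```python
import numpy as np

# Hypothetical feature maps in H x W x C layout.
decoder_feat = np.random.rand(64, 64, 512)  # upsampled decoder feature
encoder_feat = np.random.rand(64, 64, 256)  # skip connection from the encoder

# Feature fusion here is simply concatenation along the channel axis.
fused = np.concatenate([decoder_feat, encoder_feat], axis=-1)  # 64 x 64 x 768
```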
It is to be understood that both the encoding and decoding processes extract features by convolution operations. In the embodiment of the present disclosure, the segmentation network model may interleave conventional convolution with residual convolution to extract feature images; two arrows in the figure exemplarily represent conventional convolution and residual convolution, respectively. As an example, the convolution kernels of both the residual convolution and the conventional convolution may be set to 3 × 3 with stride 1. In one embodiment, the segmentation network model of the disclosed embodiments may employ, for example, a UNet-ResNet network model.
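A minimal single-channel sketch of the residual-convolution idea follows (3 × 3 kernel, stride 1, identity shortcut). This is an illustration of the general technique only, not the patent's actual UNet-ResNet implementation.

```python
import numpy as np

def conv3x3(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """3 x 3 convolution, stride 1, zero padding, single channel."""
    h, w = x.shape
    padded = np.pad(x, 1)
    out = np.zeros_like(x, dtype=np.float64)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)
    return out

def residual_block(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    # y = ReLU(conv(x) + x): the identity shortcut is what distinguishes
    # residual convolution from conventional convolution.
    return np.maximum(conv3x3(x, kernel) + x, 0.0)

x = np.ones((4, 4))
identity_kernel = np.zeros((3, 3))
identity_kernel[1, 1] = 1.0  # conv with this kernel returns x unchanged
y = residual_block(x, identity_kernel)  # so y == 2 * x
```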
In addition, since the outer eye image information is complex, each physiological structure is easily affected by other background information, so that the encoder may fail to extract effective features. Moreover, when the segmentation network model is trained in advance, the small number of available training samples can cause an over-fitting phenomenon. Based on this, in the embodiment of the present disclosure, a first loss function (e.g., binary cross-entropy loss, BCELoss) and a second loss function (e.g., Dice loss) are calculated and linearly combined (e.g., linearly added); this avoids network degradation due to vanishing gradients and improves the segmentation accuracy of the image.
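The linear combination of the two loss functions can be sketched as follows. This is a minimal numpy sketch with equal (unit) weights, which is one possible reading of "linearly adding" the losses; the patent does not fix the weighting.

```python
import numpy as np

def bce_loss(pred, target, eps=1e-7):
    # Binary cross entropy (BCELoss) averaged over all pixels.
    pred = np.clip(pred, eps, 1.0 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

def dice_loss(pred, target, eps=1e-7):
    # Dice loss: one minus the Dice overlap coefficient of the two masks.
    inter = np.sum(pred * target)
    return 1.0 - (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)

def combined_loss(pred, target):
    # Linear addition of the first and second loss functions.
    return bce_loss(pred, target) + dice_loss(pred, target)

pred = np.array([[0.9, 0.1], [0.8, 0.2]])    # predicted probabilities
target = np.array([[1.0, 0.0], [1.0, 0.0]])  # ground-truth mask
loss = combined_loss(pred, target)
```

A perfect prediction drives both terms toward zero, while the Dice term keeps the gradient informative even when the foreground region is small.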
As can be seen from the above description, the embodiments of the present disclosure ensure sufficient extraction of image feature information and address the degradation problem in neural network training by using the segmentation network model and introducing residual blocks into it. Further, the embodiments of the disclosure adopt down-sampling encoding and up-sampling decoding for feature extraction, and fuse the feature maps obtained during encoding into the feature maps of the decoding part, so that the feature information of the image is better retained.
Fig. 3 is an exemplary diagram illustrating classification recognition of a physiological structure region image using a classification recognition network model according to an embodiment of the present disclosure. It is to be understood that fig. 3 is a specific embodiment of the classification recognition network model 103 in fig. 1, and therefore the description of fig. 1 regarding the classification recognition network model 103 is also applicable to fig. 3.
As shown in fig. 3, the classification recognition network model 103 of the embodiment of the present disclosure may include a feature extraction module 301 and a classification recognition module 302. The feature extraction module 301 may be configured to perform feature extraction on the physiological structure region image in the anterior segment of the eye to obtain a third feature image. The classification recognition module 302 may be configured to perform classification recognition on the third feature image to output a recognition result for eye disease recognition. Based on the above description, the feature extraction module 301 may include a plurality of third convolutional layers and a plurality of depth-separated (depthwise separable) convolutional layers, and the classification recognition module 302 may include a flattening layer (e.g., a Flatten layer) and a fully-connected layer. The plurality of depth-separated convolutional layers are disposed between the plurality of third convolutional layers and connected in series; the output terminal of the last third convolutional layer in the series is connected to the input terminal of the flattening layer, and the output terminal of the flattening layer is connected to the input terminal of the fully-connected layer.
For example, two third convolutional layers 303 and 5 depth-separated convolutional layers 304 are exemplarily shown in the figure; the 5 depth-separated convolutional layers 304 are arranged between the two third convolutional layers 303, and the layers are connected in series. Further shown are a Flatten layer 305 and a fully-connected layer 306, with the output of the last third convolutional layer 303 connected to the input of the Flatten layer 305 and the output of the Flatten layer 305 connected to the input of the fully-connected layer 306. In some embodiments, the 5 depth-separated convolutional layers may, for example, be depth-separated convolutional layers having an inverted residual structure.
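The parameter savings that motivate depth-separated (depthwise separable) convolutions can be illustrated with simple counting. The layer sizes below are assumed for illustration only (they are not taken from the patent), and bias terms are omitted.

```python
def standard_conv_params(c_in, c_out, k):
    # A standard k x k convolution mixes channels and space in one step.
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # Depthwise step: one k x k filter per input channel;
    # pointwise step: a 1 x 1 convolution that mixes the channels.
    return c_in * k * k + c_in * c_out

# Illustrative (assumed) layer sizes: 64 input channels, 128 output, 3 x 3 kernel.
std = standard_conv_params(64, 128, 3)
sep = depthwise_separable_params(64, 128, 3)
ratio = std / sep  # roughly 8x fewer weights in this configuration
```

This reduction is what makes the depth-separated layers attractive under the limited computational budget discussed below.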
In an implementation scenario, a segmented physiological structure region image 307 (e.g., the bulbar conjunctiva region image 204 shown in fig. 2) is first received from the segmentation network model by the classification recognition network model 103; feature extraction is then sequentially performed by one third convolutional layer 303, the 5 depth-separated convolutional layers 304, and one further third convolutional layer 303 in the feature extraction module 301 to obtain a third feature image related to the eye disease. Further, classification recognition is performed through the Flatten layer 305 and the fully-connected layer 306 in the classification recognition module 302, and finally a recognition result for eye disease recognition is output.
Similar to the segmentation network model described above, a normalization function (for example, a sigmoid function) may also be applied to the output result of the classification recognition network model (i.e., the classification recognition result) to obtain the final probability values. The aforementioned Flatten layer 305 performs data dimension reduction on the image data, for example converting three-dimensional data into one-dimensional data. In some embodiments, a Dropout layer may also be added to the fully-connected layer 306. The Dropout layer may perform, for example, a random drop operation with a drop rate of 0.5 to mitigate overfitting. Thus, the classification recognition network model finally outputs a one-dimensional matrix including the probabilities of a plurality of eye diseases. Taking the identification of bulbar conjunctival diseases as an example, the one-dimensional matrix output after classification includes a plurality of probability values, such as the probability of severe bulbar conjunctival hyperemia, the probability of mild bulbar conjunctival hyperemia, and the probability of normal bulbar conjunctiva. Assuming that the probability of bulbar conjunctival redness is the greatest, the eye disease identification result is bulbar conjunctival redness.
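Selecting the final result from the one-dimensional probability output reduces to an argmax over the class probabilities, as the following sketch shows. The class order is assumed for illustration; the patent lists these three bulbar-conjunctiva classes but does not fix their position in the output vector.

```python
# Assumed class order (illustrative only):
classes = ["severe bulbar conjunctival hyperemia",
           "mild bulbar conjunctival hyperemia",
           "normal bulbar conjunctiva"]
probs = [0.1, 0.3, 0.6]  # example one-dimensional output of the classifier

# The eye disease corresponding to the maximum probability is taken as
# the final recognition result.
best = max(range(len(probs)), key=lambda i: probs[i])
result = classes[best]
```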
In one embodiment, the classification recognition network model of the disclosed embodiments may be a model improved on the basis of MobileNetV2. Specifically, in the embodiment of the present disclosure, the network layers after the last 2 layers of MobileNetV2 are removed, and the number of channels of all convolutional layers is reduced to 1/4 of the original, so as to fit the computational resource constraints of the anterior segment image disease task while ensuring the accuracy of the recognition result. Further, the classification recognition network model of the embodiment of the present disclosure adopts a fully-connected layer structure, so that the probabilities of various eye diseases can be obtained.
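The effect of the channel reduction can be sketched with simple arithmetic: the weight count of a standard convolution scales with the product of its input and output channel counts, so quartering both ends cuts that layer's weights by roughly 16×. The layer sizes below are assumed for illustration, not taken from the patent, and bias terms are omitted.

```python
def conv_weights(c_in, c_out, k=3):
    # Weight count of a standard k x k convolution (no bias term).
    return c_in * c_out * k * k

# Illustrative (assumed) channel counts:
full = conv_weights(64, 128)               # original layer
quarter = conv_weights(64 // 4, 128 // 4)  # all channels reduced to 1/4
shrink = full / quarter                    # 16x fewer weights for this layer
```

For the depthwise parts of the separable layers the scaling is closer to 4×, since those weights grow only linearly in the channel count.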
Fig. 4 is an exemplary overall schematic diagram of neural network model-based eye disease identification according to an embodiment of the present disclosure. As shown in fig. 4, an anterior ocular segment image 401 may first be acquired by, for example, an optical camera. In an implementation scenario, the anterior ocular segment image 401 may be subjected to, for example, image transformation and/or image normalization operations for pre-processing. The anterior ocular segment image 401 is then segmented via the segmentation network model 102 to obtain an image of a physiological structure region (e.g., a bulbar conjunctiva region image, a palpebral conjunctiva region image, an iris region image, and/or a cornea region image) in the anterior segment of the eye. The segmentation network model 102 may include an encoding module and a decoding module, each introducing respective residual blocks to extract respective feature images, and finally outputs the physiological structure region image 402.
As further shown in the figure, the physiological structure region image 402 is received by the classification recognition network model 103 and subjected to classification recognition to output a recognition result 403. In an implementation scenario, the aforementioned classification recognition network model 103 may include a feature extraction module and a classification recognition module, in which a plurality of convolutional layers, and a flattening layer and a fully-connected layer, are respectively disposed to extract eye disease features and finally output the probabilities of a plurality of eye diseases. Taking the physiological structure area image 402 as a bulbar conjunctiva area image for example, after the bulbar conjunctiva area image is input to the classification recognition network model 103, the probability of severe bulbar conjunctival hyperemia, the probability of mild bulbar conjunctival hyperemia, and the probability of normal bulbar conjunctiva can be obtained. Assuming that these probabilities are 0.1, 0.3 and 0.6, respectively, the final recognition result is that the bulbar conjunctiva is normal. Regarding the aforementioned segmentation network model 102 and classification recognition network model 103, reference may be made to the descriptions of fig. 2 and fig. 3, respectively, which are not repeated here.
Fig. 5 is an exemplary flow chart illustrating a method 500 of eye disease identification based on a neural network model in accordance with an embodiment of the present disclosure. As shown in fig. 5, at step S502, the anterior ocular segment image is input to the segmentation network model. In one embodiment, the aforementioned anterior ocular segment image may be acquired by, for example, an optical camera. After the segmentation network model receives the anterior ocular segment image, at step S504, the anterior ocular segment image is segmented using the segmentation network model to obtain a physiological structure region image in the anterior segment of the eye. In one embodiment, the segmentation network model may include an encoding module and a decoding module that extract feature images by down-sampling encoding and up-sampling decoding to obtain the physiological structure region image in the anterior segment of the eye. The physiological structure area image may be, for example, a bulbar conjunctiva area image, a palpebral conjunctiva area image, an iris area image, and/or a cornea area image. In an application scenario, the aforementioned encoding module and decoding module may both introduce residual blocks, which may correspond to multiple convolutional layers and pooling layers, and to a deconvolution layer and multiple convolutional layers, respectively. Introducing residual blocks avoids the loss of image feature information. In addition, in the segmentation network model, performing feature fusion between the encoded partial features and the decoded partial features better preserves the integrity of the feature information and thereby improves the segmentation accuracy.
Based on the acquired physiological structure region image, at step S506, the physiological structure region image in the anterior segment of the eye is received from the segmentation network model using the classification recognition network model and subjected to classification recognition to output a recognition result for eye disease recognition. In one embodiment, the classification recognition network model may include a feature extraction module and a classification recognition module. The feature extraction module may include a plurality of convolutional layers and a plurality of depth-separated convolutional layers to extract features of the physiological structure region image. The extracted features then pass through, for example, a Flatten layer and a fully-connected layer, and finally the recognition result, i.e. the probabilities of various eye diseases, is output. In the embodiment of the present disclosure, the eye disease corresponding to the maximum probability may be determined as the final recognition result. Taking the identification of bulbar conjunctival disease as an example, assuming that the probability of severe bulbar conjunctival hyperemia is 0.1, the probability of mild bulbar conjunctival hyperemia is 0.3, and the probability of normal bulbar conjunctiva is 0.6, the class with probability 0.6 is taken as the identification result. That is, the final recognition result is that the bulbar conjunctival hyperemia status is normal.
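The two-stage flow of steps S502–S506 can be sketched end-to-end as follows. This is a structural sketch only: the `segment` and `classify` functions below are hypothetical stand-ins for the trained segmentation and classification recognition network models, so only the data flow, not the modelling, reflects the method.

```python
import numpy as np

def segment(anterior_segment_image):
    # Stand-in for the segmentation network model (step S504): a real model
    # would be the UNet-ResNet style network described above. Here we just
    # threshold against the mean to produce a (0, 1)-valued region mask.
    return (anterior_segment_image > anterior_segment_image.mean()).astype(float)

def classify(region_image):
    # Stand-in for the classification recognition network model (step S506):
    # returns probabilities for [severe, mild, normal] hyperemia.
    return np.array([0.1, 0.3, 0.6])

def identify(image):
    region = segment(image)       # segmentation stage
    probs = classify(region)      # classification stage
    return int(np.argmax(probs))  # index of the final recognition result

image = np.arange(16).reshape(4, 4) / 15.0  # toy anterior segment image
result_index = identify(image)
```

Swapping the two stubs for trained models yields the pipeline of fig. 5 unchanged, which is the point of the two-stage design: each stage can be developed and validated independently.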
Based on the above description, with the scheme of the present disclosure, by performing a two-stage (i.e., segmentation-classification) identification task on the physiological structure region of the anterior segment of the eye, high-precision eye disease identification can be achieved compared to single-stage identification (e.g., classification only). For example, Table 1 shows the accuracy of bulbar conjunctival redness identification obtained by two-stage recognition and single-stage recognition, respectively.
TABLE 1
Method                                               Accuracy in identifying redness of bulbar conjunctiva
Single-stage diagnosis (classification)              97.1%
Two-stage diagnosis (segmentation-classification)    98.3%
From the above description in conjunction with the accompanying drawings, those skilled in the art will also appreciate that embodiments of the present disclosure may also be implemented by software programs. The present disclosure thus also provides a computer program product. The computer program product may be used to implement the neural network model-based eye disease identification method described in this disclosure in conjunction with fig. 5.
It should be noted that while the operations of the disclosed methods are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Rather, the steps depicted in the flowcharts may change the order of execution. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
It should be understood that when the terms first, second, third, fourth, etc. are used in the claims, specification and drawings of the present disclosure, they are used only to distinguish one object from another, and not to describe a particular order. The terms "comprises" and "comprising," when used in the specification and claims of this disclosure, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the disclosure herein is for the purpose of describing particular embodiments only, and is not intended to be limiting of the disclosure. As used in the specification and claims of this disclosure, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the specification and claims of this disclosure refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
Although the embodiments of the present disclosure are described above, the descriptions are only examples for facilitating understanding of the present disclosure, and are not intended to limit the scope and application scenarios of the present disclosure. It will be understood by those skilled in the art of the present disclosure that various changes in form and details may be made therein without departing from the spirit and scope of the disclosure, and that the scope of the disclosure is to be limited only by the appended claims.

Claims (13)

1. A system for eye disease identification based on a neural network model, comprising:
one or more processors;
a segmentation network model;
a classification recognition network model; and
one or more computer-readable storage media storing program instructions implementing the segmentation network model and the classification recognition network model that, when executed by the one or more processors, cause:
the segmentation network model receives an anterior segment image and segments the anterior segment image to obtain a physiological structure area image in the anterior segment; and
the classification and recognition network model receives the physiological structure region image in the anterior segment of the eye from the segmentation network model and performs classification and recognition on the physiological structure region image so as to output a recognition result for eye disease recognition,
wherein the segmentation network model comprises an encoding module and a decoding module, the encoding module comprises a plurality of first residual blocks and is used for performing down-sampling encoding on an anterior ocular segment image to extract a first characteristic image related to a physiological structure area image in the anterior ocular segment;
the classification identification network model comprises a feature extraction module and a classification identification module, wherein the feature extraction module comprises a plurality of third convolution layers and a plurality of depth separation convolution layers, and the depth separation convolution layers are depth separation convolution layers with an inverted residual structure;
the classification recognition network model is a model improved based on MobilenetV2, wherein the network layers after the last 2 layers of MobilenetV2 are removed, and the number of channels of all convolutional layers is reduced to 1/4.
2. The system of claim 1, wherein the image of the area of the physiological structure in the anterior segment comprises at least an image of a bulbar conjunctiva area, an image of a palpebral conjunctiva area, an image of an iris area, and/or an image of a corneal area.
3. The system according to claim 1, wherein the one or more computer-readable storage media further store program instructions to pre-process the anterior ocular segment image, which when executed by the one or more processors perform image transformation and/or image normalization operations on the anterior ocular segment image to pre-process the anterior ocular segment image.
4. The system of claim 2, wherein the one or more computer-readable storage media further store program instructions to implement a normalization operation that, when executed by the one or more processors, cause:
performing a normalization operation on the segmentation result of the segmentation network model to obtain a physiological structure region image in the anterior segment of the eye; and
and performing normalization operation on the classification recognition result of the classification recognition network model to output a recognition result for eye disease recognition.
5. The system according to claim 4, wherein the one or more computer-readable storage media further store program instructions to calculate a loss function in the segmented network model that, when executed by the one or more processors, cause:
respectively calculating a first loss function and a second loss function in the segmentation network model; and
adding the first loss function and the second loss function to obtain a final loss function in the segmented network model.
6. The system of claim 1, wherein the decoding module comprises a plurality of second residual blocks and is configured to perform upsampling decoding based on the first feature image to obtain a physiological structure region image in the anterior segment of the eye.
7. The system of claim 6, wherein each of the first residual blocks comprises a plurality of first convolutional layers and a pooling layer, wherein each of the second residual blocks comprises an anti-convolutional layer and a plurality of second convolutional layers, and wherein an output of one of the plurality of first residual blocks is connected to an input of a first one of the plurality of second residual blocks.
8. The system of claim 6, wherein in the upsampling decoding based on the first feature image to obtain the image of the physiological structure region in the anterior segment of the eye, the decoding module is further configured to:
performing up-sampling decoding based on the first feature image to extract a second feature image related to a physiological structure region image in the anterior segment of the eye; and
and performing feature fusion on the part of the second feature image and the part of the first feature image to obtain a physiological structure area image in the anterior segment of the eye.
9. The system of claim 4,
the feature extraction module is used for performing feature extraction on the physiological structure region image in the anterior segment of the eye to obtain a third feature image;
the classification identification module comprises a flattening layer and a full connection layer and is used for performing classification identification on the third feature image so as to output an identification result for identifying the eye diseases, wherein the identification result comprises the probabilities of a plurality of eye diseases.
10. The system of claim 9, wherein the plurality of depth separated convolutional layers are disposed between a plurality of third convolutional layers and are connected in series, and an output of the last third convolutional layer connected in series is connected to an input of the flattening layer, an output of the flattening layer being connected to an input of the fully-connected layer.
11. The system according to claim 9, wherein the one or more computer-readable storage media further store program instructions that determine a final recognition result for the respective disease based on the probabilities of the plurality of ocular diseases, which when executed by the one or more processors, cause:
selecting a maximum probability of the probabilities for each eye disease; and
and determining the eye disease corresponding to the maximum probability as a final recognition result of the corresponding eye disease.
12. A method for identifying eye diseases based on a neural network model is characterized by comprising the following steps:
inputting the anterior segment image into a segmentation network model;
segmenting the anterior segment image using the segmentation network model to obtain a physiological structure region image in the anterior segment; and
receiving a physiological structure region image in the anterior segment of the eye from the segmentation network model using a classification recognition network model and performing classification recognition on the physiological structure region image to output a recognition result for eye disease recognition,
wherein the segmentation network model comprises an encoding module and a decoding module, the encoding module comprises a plurality of first residual blocks and is used for performing down-sampling encoding on an anterior ocular segment image to extract a first characteristic image related to a physiological structure area image in the anterior ocular segment;
the classification identification network model comprises a feature extraction module and a classification identification module, wherein the feature extraction module comprises a plurality of third convolution layers and a plurality of depth separation convolution layers, and the depth separation convolution layers are depth separation convolution layers with an inverted residual structure;
the classification recognition network model is a model improved based on MobilenetV2, wherein the network layers after the last 2 layers of MobilenetV2 are removed, and the number of channels of all convolutional layers is reduced to 1/4.
13. A computer-readable storage medium having stored thereon computer-readable instructions for neural network model-based ocular disease identification, which, when executed by one or more processors, implement the method of claim 12.
CN202210192386.5A 2022-03-01 2022-03-01 System and method for identifying eye diseases based on neural network model Active CN114266769B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210192386.5A CN114266769B (en) 2022-03-01 2022-03-01 System and method for identifying eye diseases based on neural network model


Publications (2)

Publication Number Publication Date
CN114266769A CN114266769A (en) 2022-04-01
CN114266769B true CN114266769B (en) 2022-06-21

Family

ID=80833794

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210192386.5A Active CN114266769B (en) 2022-03-01 2022-03-01 System and method for identifying eye diseases based on neural network model

Country Status (1)

Country Link
CN (1) CN114266769B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109558912A (en) * 2019-01-21 2019-04-02 广西师范大学 A kind of Alzheimer's disease classification method separating convolution based on depth
CN109727259A (en) * 2019-01-07 2019-05-07 哈尔滨理工大学 A kind of retinal images partitioning algorithm based on residual error U-NET network
CN109886971A (en) * 2019-01-24 2019-06-14 西安交通大学 A kind of image partition method and system based on convolutional neural networks
CN110120047A (en) * 2019-04-04 2019-08-13 平安科技(深圳)有限公司 Image Segmentation Model training method, image partition method, device, equipment and medium
CN111046835A (en) * 2019-12-24 2020-04-21 杭州求是创新健康科技有限公司 Eyeground illumination multiple disease detection system based on regional feature set neural network
CN113962311A (en) * 2021-10-27 2022-01-21 厦门大学 Knowledge data and artificial intelligence driven ophthalmic multi-disease identification system

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3514733A1 (en) * 2018-01-18 2019-07-24 Aptiv Technologies Limited A device and a method for image classification using a convolutional neural network
CN110555800A (en) * 2018-05-30 2019-12-10 北京三星通信技术研究有限公司 image processing apparatus and method
CN109190682B (en) * 2018-08-13 2020-12-18 北京安德医智科技有限公司 Method and equipment for classifying brain abnormalities based on 3D nuclear magnetic resonance image
CN109684920B (en) * 2018-11-19 2020-12-11 腾讯科技(深圳)有限公司 Object key point positioning method, image processing method, device and storage medium
CN112395905A (en) * 2019-08-12 2021-02-23 北京林业大学 Forest pest and disease real-time detection method, system and model establishment method
CN112446230B (en) * 2019-08-27 2024-04-09 中车株洲电力机车研究所有限公司 Lane line image recognition method and device
CN110674721A (en) * 2019-09-19 2020-01-10 安徽七天教育科技有限公司 Method for automatically detecting test paper layout formula
CN111539930B (en) * 2020-04-21 2022-06-21 浙江德尚韵兴医疗科技有限公司 Dynamic ultrasonic breast nodule real-time segmentation and identification method based on deep learning
CN111898683B (en) * 2020-07-31 2023-07-28 平安科技(深圳)有限公司 Image classification method and device based on deep learning and computer equipment
CN111915525B (en) * 2020-08-05 2024-03-01 湖北工业大学 Low-illumination image enhancement method capable of generating countermeasure network based on improved depth separation
CN112733924A (en) * 2021-01-04 2021-04-30 哈尔滨工业大学 Multi-patch component detection method


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
U-Net and Its Variants for Medical Image Segmentation: A Review of Theory and Applications; NAHIAN SIDDIQUE et al.; IEEE Access; 20210603; pp. 82031-82057 *
Automatic Detection of Foreign Bodies in Chest Digital Radiography Images Based on Faster Region-based Convolutional Neural Networks; Zhang Qinglei et al.; Journal of Clinical Radiology; 20220220; Vol. 41, No. 2; pp. 367-371, section 1.4 *
Histopathological Image Classification Based on Depthwise Separable Convolution; Wan Yali et al.; New Industrialization; 20210720; Vol. 11, No. 07; pp. 63-64, section 1.4 *
Research on Fundus Retinal Image Segmentation Based on Deep Learning; Feng Zhongwei; China Masters' Theses Full-text Database, Medicine & Health Sciences; 20200115 (No. 01); pp. E073-103 *

Also Published As

Publication number Publication date
CN114266769A (en) 2022-04-01

Similar Documents

Publication Publication Date Title
CN110399929B (en) Fundus image classification method, fundus image classification apparatus, and computer-readable storage medium
US11328392B2 (en) Inpainting via an encoding and decoding network
KR102058884B1 (en) Method of analyzing iris image for diagnosing dementia in artificial intelligence
CN115205300B (en) Fundus blood vessel image segmentation method and system based on cavity convolution and semantic fusion
CN112862830B (en) Multi-mode image segmentation method, system, terminal and readable storage medium
CN112258488A (en) Medical image focus segmentation method
CN111369565A (en) Digital pathological image segmentation and classification method based on graph convolution network
CN114708255B (en) Multi-center children X-ray chest image lung segmentation method based on TransUNet model
CN109919954B (en) Target object identification method and device
CN116433914A (en) Two-dimensional medical image segmentation method and system
CN112602114A (en) Image processing method and device, neural network and training method, and storage medium
CN113936011A (en) CT image lung lobe image segmentation system based on attention mechanism
CN112884788A (en) Cup optic disk segmentation method and imaging method based on rich context network
CN114066884A (en) Retinal blood vessel segmentation method and device, electronic device and storage medium
CN113240655A (en) Method, storage medium and device for automatically detecting type of fundus image
CN116433898A (en) Method for segmenting transform multi-mode image based on semantic constraint
CN115809998A (en) Based on E 2 Glioma MRI data segmentation method based on C-Transformer network
CN114494230A (en) Breast focus segmentation device, model training method and electronic equipment
CN114266769B (en) System and method for identifying eye diseases based on neural network model
CN115100731B (en) Quality evaluation model training method and device, electronic equipment and storage medium
CN116883420A (en) Choroidal neovascularization segmentation method and system in optical coherence tomography image
Adegun et al. Deep convolutional network-based framework for melanoma lesion detection and segmentation
CN114511798B (en) Driver distraction detection method and device based on transformer
CN115439470A (en) Polyp image segmentation method, computer-readable storage medium, and computer device
CN112614092A (en) Spine detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant