CN117237749A - Eye axis length prediction method, system and equipment - Google Patents

Eye axis length prediction method, system and equipment

Info

Publication number
CN117237749A
Authority
CN
China
Prior art keywords
convolution
layer
module
features
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311201719.7A
Other languages
Chinese (zh)
Inventor
邢世来
于晓光
杜政霖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Puxi And Optogene Technology Co ltd
Original Assignee
Shanghai Puxi And Optogene Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Puxi And Optogene Technology Co ltd filed Critical Shanghai Puxi And Optogene Technology Co ltd
Priority to CN202311201719.7A priority Critical patent/CN117237749A/en
Publication of CN117237749A publication Critical patent/CN117237749A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Image Analysis (AREA)

Abstract

The application relates to the field of intelligent medical treatment, in particular to an eye axis length prediction method, system and device. The method comprises: obtaining a fundus photo; and inputting the fundus photo into a regression model for prediction to obtain the eye axis length of the fundus photo, wherein the regression model consists of a first convolution layer, an attention convolution module and a fully connected layer. The fundus photo is passed through the first convolution layer to obtain a feature vector, the feature vector is passed through the attention convolution module to obtain enhanced features, and the enhanced features are passed through the fully connected layer to obtain the predicted eye axis length. The attention convolution module consists of a convolution module and a DWM module, and the DWM module comprises an EMA attention module and a self-attention module. The application uses deep learning to predict the eye axis length from a fundus photo; the automated algorithm is helpful for the diagnosis and treatment of ophthalmic diseases and has good practicality and universality.

Description

Eye axis length prediction method, system and equipment
Technical Field
The application relates to the field of intelligent medical treatment, in particular to an eye axis length prediction method, system, device and computer readable storage medium.
Background
The eye axis is one of the most important structures of the eye, and its length measures the distance between the anterior and posterior poles of the eyeball. Accurate measurement of the eye axis length is important to ophthalmologists and is closely related to the incidence of ophthalmic diseases such as myopia, amblyopia and ametropia; it is therefore one of the key links in ophthalmic diagnosis and treatment. Conventional eye axis length measurements are typically made with A-scan ultrasound, optical biometers and similar instruments. Such instruments can directly measure the anterior-posterior length of the eyeball, but the equipment is expensive, must be operated manually by professionals, carries considerable measurement error, and is inconvenient for ordinary medical institutions or for people who cannot easily travel to a hospital. In addition, the amount of clinical fundus photo data is small and the photo quality is uneven, which further hinders eye axis length measurement. The development of computer vision now offers a new way to measure the eye axis length: predicting it automatically avoids human interference and equipment limitations. However, existing approaches still suffer from insufficient prediction accuracy and limited model generalization, so measuring the eye axis length with intelligent techniques remains an active research direction.
Disclosure of Invention
To address the problems of insufficient accuracy in model-based eye axis length prediction and insufficient model generalization, the application provides an eye axis length prediction method that incorporates deep learning. Once trained, the method can automatically predict the eye axis length of an input fundus photo. It specifically comprises the following steps:
obtaining fundus photos;
the fundus photo is input into a regression model for prediction to obtain the eye axis length of the fundus photo, wherein the regression model consists of a first convolution layer, an attention convolution module and a fully connected layer; the fundus photo is passed through the first convolution layer to obtain a feature vector, the feature vector is passed through the attention convolution module to obtain enhanced features, and the enhanced features are passed through the fully connected layer to obtain the predicted eye axis length. The attention convolution module consists of a convolution module and a DWM module: the feature vector undergoes deep convolution in the convolution module to obtain convolution features, the convolution features are enhanced by the DWM module to obtain the enhanced features, and the DWM module comprises an EMA attention module and a self-attention module. The convolution features are weighted by the EMA module to obtain weighted features, the self-attention module learns the relations between features to obtain relation features, and the weighted features and the relation features are fused to obtain the enhanced features.
Further, the EMA module consists, in order, of a pooling layer, an attention layer, a grouping normalization layer, an activation function layer and a Matmul layer. The convolution features are reduced in dimension by the pooling layer and then passed to the attention layer, where m sub-channels are weighted in parallel and local cross-channel interaction is established to obtain m feature tensors, m being a natural number greater than or equal to 1; the grouping normalization layer performs grouped normalization on the feature tensors to obtain several groups of feature matrices; the activation function layer applies a nonlinear mapping; and the Matmul layer fuses the groups of feature matrices to obtain the weighted features. The self-attention module consists, in order, of an embedding layer, a self-attention layer and an activation function layer; the convolution features learn the relations between adjacent convolution features through the self-attention module to obtain the relation features.
Further, the EMA module also includes a dimension conversion layer, which performs dimension conversion on the input vector; the self-attention module also includes a dropout layer, which randomly drops neurons of the self-attention layer so that the set of neurons learned in each pass changes.
Further, the feature vector passes sequentially through n groups of attention convolution modules before the fully connected layer prediction, n being a natural number greater than or equal to 1. The convolution module within each attention convolution module comprises a downsampling layer and a residual convolution layer: the feature vector is reduced in dimension by the downsampling layer and then convolved by the residual convolution layer to obtain the convolution features. The regression model further comprises a max pooling layer, which reduces the dimensionality of the enhanced features to obtain low-dimensional features, and the low-dimensional features are passed through the fully connected layer for prediction.
Further, the method comprises a quality control model, which performs feature recognition on the fundus photo through a ConvNeXt network model to screen for high-quality photos; regression prediction is performed only when the input fundus photo passes the quality control model, and is stopped when the quality of the input fundus photo does not reach the standard.
Further, the regression model adopts one or more of the following: linear regression, AdaBoost, a support vector machine and a convolutional neural network. The convolutional neural network uses a ResNet50 network for feature learning to obtain the eye axis length prediction, and the training weights of the ResNet50 network are obtained by secondary training and adjustment on high-quality photos, starting from the pre-trained weights of an ImageNet model.
Further, the method comprises data preprocessing, which includes one or more of the following: graying, binarization, image augmentation, denoising and image enhancement. The image augmentation includes horizontal flipping, vertical flipping, contrast transformation and scale transformation; after augmentation the fundus photo is converted into a tensor, the tensor is normalized, and the normalization constrains the mean and variance values of the three RGB channels of the picture.
The application aims to provide an eye axis length prediction system, which comprises:
a data acquisition module: for obtaining fundus photos;
a regression prediction module: the fundus photo is input into a regression model for prediction to obtain the eye axis length of the fundus photo, wherein the regression model consists of a first convolution layer, an attention convolution module and a fully connected layer; the fundus photo is passed through the first convolution layer to obtain a feature vector, and the feature vector is passed through the attention convolution module to obtain enhanced features. The attention convolution module consists of a convolution module and a DWM module: the feature vector undergoes deep convolution in the convolution module to obtain convolution features, the convolution features are enhanced by the DWM module to obtain the enhanced features, and the DWM module comprises an EMA attention module and a self-attention module; the convolution features are weighted by the EMA module to obtain weighted features, the self-attention module learns the relations between features to obtain relation features, and the weighted features and the relation features are fused to obtain the enhanced features. The enhanced features are passed through the fully connected layer to obtain the predicted eye axis length.
An object of the present application is to provide an eye axis length prediction apparatus comprising:
a memory and a processor; the memory is used for storing program instructions; the processor is configured to invoke program instructions that when executed implement any of the above-described methods of eye axis length prediction.
It is an object of the present application to provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements any one of the above-mentioned methods of eye axis length prediction.
The application has the advantages that:
1. To enhance model performance, the application adds a DWM module to the regression model. The module is a special attention mechanism composed of an EMA attention module and a self-attention module; it adjusts and weights the input features and strengthens the model's attention to useful information.
2. To address the uneven quality of fundus photos, the application uses a quality control model (a ConvNeXt network) to screen the fundus photos and remove those of poor quality, ensuring better data quality before training, so that the model learns effective features and its usability improves.
3. To address the small data set and the limited generalization ability of the model, the application initializes the weight parameters of both the quality control model and the regression model from ImageNet pre-training before training on the fundus photo data set, and then trains and fine-tunes on the fundus photo data set; the whole training process thus benefits from deep feature learning while avoiding overfitting.
4. The method comprises a quality control model and a regression model; regression prediction is performed only after an input fundus photo passes the quality control model, and no further training is performed when the quality of an input fundus photo fails the quality control model, so the trained model learns high-quality features and the reliability of the overall model is improved.
5. To enhance the generalization ability of the model, the training data undergo data preprocessing, including image flipping, image scaling, contrast transformation and normalization, which facilitates model computation and processing.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of an eye axis length prediction method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an eye axis length prediction system according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an apparatus for predicting eye axis length according to an embodiment of the present application;
FIG. 4 is a flowchart of the eye axis regression provided in an embodiment of the present application;
FIG. 5 is a diagram of the fundus photo quality control module according to an embodiment of the present application;
FIG. 6 is a diagram of an eye axis regression module according to an embodiment of the present application;
FIG. 7 is a graph of eye axis regression training loss variation provided by an embodiment of the present application;
FIG. 8 is a graph of test set results provided by an embodiment of the present application.
Detailed Description
In order to enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings.
Some of the flows described in the specification, claims and drawings of the present application include operations that appear in a particular order, but it should be clearly understood that these operations may be performed out of the order in which they appear herein or in parallel. Sequence numbers such as S101 and S102 merely distinguish the operations and do not themselves represent any order of execution. In addition, the flows may include more or fewer operations, which may be performed sequentially or in parallel. It should also be noted that the terms "first" and "second" herein distinguish different messages, devices, modules, etc.; they do not represent a sequence and do not require that the "first" and the "second" be of different types.
FIG. 1 is a schematic flow chart of an eye axis length prediction method according to an embodiment of the present application, which specifically includes:
s101: obtaining fundus photos;
in one embodiment, the fundus is generally referred to as the inside of the back of the eyeball, and is composed of a optic disc, a macula, a retina and a part of blood vessels, wherein the optic disc is in bright yellowish color, is approximately elliptical, the retina is converged at the center of the optic disc, changes of the optic disc are observed, if abnormal changes occur, the condition of the fundus is usually probed by an imaging technology, such as a ophthalmoscope, peeping the changes of the retina, knowing the thickness and trend of the blood vessels on the retina, and defining the light reflection state of arteries and the conditions of bleeding, white spots and nipple edema.
In one embodiment, the axial eye length is an index reflecting the growth of the eye. It generally refers to the anterior-posterior diameter of the eye, that is, the length from the corneal vertex to the macula. The eye axis length usually increases with age, and if congenital dysplasia makes the eye axis too long or too short, the probability of developing eye disease increases. An abnormal eye axis length can cause common ophthalmic diseases such as myopia, hyperopia and glaucoma, with manifestations such as blurred vision and fundus lesions.
In a specific embodiment, the application collects eye axis data of university students in Wenzhou. The data set contains more than 10,000 samples and covers students from the first to the fourth year, ensuring both data volume and diversity. The data are divided into a training set of 10,847 fundus color photographs with eye axis data, a validation set of 2,171 fundus color photographs with eye axis data, and a test set of 543 fundus color photographs with eye axis data.
In one embodiment, the application realizes automatic detection of the eye axis length through a quality control part and a prediction part, as shown in FIG. 4: the quality control part screens out high-quality fundus images with a network model, and the prediction part automatically detects the eye axis length in the fundus images.
In one embodiment, the ConvNeXt network does not greatly change the overall network framework or design concept; it adjusts and improves the classic ResNet50/200 network according to advanced ideas from Transformer networks, introducing some of the latest Transformer concepts and techniques into the existing modules of a CNN so as to combine the advantages of both and improve CNN performance. The optimization mainly covers the following points: macro design, a ResNeXt-style structure, inverted residual modules, large convolution kernels, and various layer-level micro designs.
In one embodiment, the method further includes a quality control model, which performs feature recognition on the fundus photo through a ConvNeXt network model to screen for high-quality photos; regression prediction is performed only when the input fundus photo passes the quality control model, and is stopped when the quality of the input fundus photo does not reach the standard.
In a specific embodiment, before training the model, the application designs a quality control program for fundus photos that can autonomously filter out photos of unqualified quality. A deep learning ConvNeXt model controls the quality of the input picture; its training weights are obtained by pre-training on ImageNet and then training on the quality screening of actual fundus photos. As shown in FIG. 5, the fundus photo is input into a network model consisting, in order, of a two-dimensional convolution, a layer normalization, four stages of stacked blocks alternating with three downsampling layers, a GAP layer, a layer normalization and a fully connected layer; the data pass through these in sequence to produce the model output. A block consists of three two-dimensional convolutions, a layer normalization, an activation function, a Layer Scale layer and a Drop Path layer, and the data passing through these modules are added to the original input to form the block output. A downsampling layer consists of a layer normalization followed by a two-dimensional convolution, through which the data pass in sequence to obtain the downsampling output.
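The block structure just described can be sketched in a few lines of PyTorch. The sketch below is illustrative only: the 7x7 depthwise kernel, the 4x channel expansion, the GELU activation and the identity drop path are assumptions borrowed from the public ConvNeXt design rather than values stated in this application, and `QualityControlBlock` is a hypothetical name.

```python
# A minimal sketch of one quality-control block: three 2-D convolutions, a layer
# normalization, an activation, Layer Scale, a drop-path placeholder and a residual add.
import torch
import torch.nn as nn

class QualityControlBlock(nn.Module):
    def __init__(self, dim: int, layer_scale_init: float = 1e-6):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)  # depthwise convolution
        self.norm = nn.LayerNorm(dim)                                            # layer normalization
        self.pwconv1 = nn.Conv2d(dim, 4 * dim, kernel_size=1)                    # pointwise expansion
        self.act = nn.GELU()                                                     # activation function
        self.pwconv2 = nn.Conv2d(4 * dim, dim, kernel_size=1)                    # pointwise projection
        self.gamma = nn.Parameter(layer_scale_init * torch.ones(dim, 1, 1))      # Layer Scale
        self.drop_path = nn.Identity()   # drop path left as identity in this sketch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        x = self.dwconv(x)
        x = self.norm(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)  # normalize over the channel dimension
        x = self.pwconv2(self.act(self.pwconv1(x)))
        x = self.gamma * x
        return residual + self.drop_path(x)                       # residual add, as described above
```

Stacking four stages of such blocks with the three downsampling layers, the GAP layer and the fully connected layer described above would yield the quality control classifier.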
In one embodiment, the method further comprises data preprocessing, which includes one or more of the following: graying, binarization, image augmentation, denoising and image enhancement. The image augmentation includes horizontal flipping, vertical flipping, contrast transformation and scale transformation; after augmentation the fundus photo is converted into a tensor, the tensor is normalized, and the normalization constrains the mean and variance values of the three RGB channels of the picture.
In a specific embodiment, before training the eye axis regression model, the application applies the following data augmentation to the training set pictures in order to enhance model generalization (a minimal code sketch follows this list):
a. horizontally flipping the picture with 50% probability;
b. vertically flipping the picture with 50% probability;
c. adjusting the brightness, contrast and saturation of the picture;
d. to facilitate calculation in the model, scaling each picture up/down to 224 x 224 and converting it into tensor format;
e. to avoid bias in the picture data, normalizing the data, with the means and standard deviations of the three channels set to [0.485, 0.456, 0.406] and [0.229, 0.224, 0.225] respectively;
f. to reduce the time cost, employing multi-threaded parallelism, with the number of threads set in the range 6-10.
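A minimal torchvision sketch of steps a-f is given below. The ColorJitter magnitudes, the worker count of 8 (within the stated 6-10 range) and the `FundusDataset` wrapper are illustrative assumptions, not values from this application.

```python
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),            # a. horizontal flip with 50% probability
    transforms.RandomVerticalFlip(p=0.5),              # b. vertical flip with 50% probability
    transforms.ColorJitter(brightness=0.2,             # c. brightness, contrast and saturation
                           contrast=0.2, saturation=0.2),
    transforms.Resize((224, 224)),                     # d. scale every picture to 224 x 224
    transforms.ToTensor(),                             # d. convert the picture to a tensor
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # e. per-channel normalization
                         std=[0.229, 0.224, 0.225]),
])

# f. multi-threaded loading; `FundusDataset` is a hypothetical Dataset pairing fundus
# photos with their measured eye axis lengths.
# train_loader = torch.utils.data.DataLoader(FundusDataset(transform=train_transform),
#                                            batch_size=64, shuffle=True, num_workers=8)
```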
In a specific embodiment, the application scales the pictures with padding before they enter the model, so that pictures of various sizes are scaled at an equal ratio; this preserves the information of the picture as completely as possible and improves the generalization ability of the model.
S102: the fundus photo is input into a regression model for prediction to obtain the eye axis length of the fundus photo, wherein the regression model consists of a first convolution layer, an attention convolution module and a fully connected layer; the fundus photo is passed through the first convolution layer to obtain a feature vector, the feature vector is passed through the attention convolution module to obtain enhanced features, and the enhanced features are passed through the fully connected layer to obtain the predicted eye axis length. The attention convolution module consists of a convolution module and a DWM module: the feature vector undergoes deep convolution in the convolution module to obtain convolution features, the convolution features are enhanced by the DWM module to obtain the enhanced features, and the DWM module comprises an EMA attention module and a self-attention module. The convolution features are weighted by the EMA module to obtain weighted features, the self-attention module learns the relations between features to obtain relation features, and the weighted features and the relation features are fused to obtain the enhanced features.
In one embodiment, the EMA (Excitation and Modulation Attention) mechanism is an attention mechanism for neural networks. Its main principle is to focus the network's attention on the parts of the input most important to the current task by weighting the input data, thereby improving model performance. The core idea of EMA is to introduce two concepts, excitation and modulation, on top of the traditional attention mechanism: the excitation mechanism computes how important each part of the input data is to the current task, and the modulation mechanism adjusts the weights of the different parts to achieve better model performance. In the EMA mechanism, the excitation step computes the similarity between the features of the input data and the parameters to determine the importance of each part. Specifically, an inner product between the input data and the parameters yields a similarity matrix; each element of the matrix represents the similarity between a part of the input and the reference parameters, and the higher the similarity, the more important that part is for the current task.
In one embodiment, the self-attention mechanism addresses the following situation: the neural network receives many vectors of different sizes that are related to one another, but ordinary training cannot fully exploit the relations between these inputs, so the training result is poor. Examples include machine translation (a sequence-to-sequence problem in which the machine decides the number of labels itself), part-of-speech tagging (one vector corresponds to one label), semantic analysis (multiple vectors correspond to one label) and other text-processing problems. A fully connected neural network cannot establish correlations among multiple related inputs; the self-attention mechanism solves this by letting the machine notice the correlations between different parts of the whole input.
In one embodiment, the EMA module consists, in order, of a pooling layer, an attention layer, a grouping normalization layer, an activation function layer and a Matmul layer. The convolution features are reduced in dimension by the pooling layer and then passed to the attention layer, where m sub-channels are weighted in parallel and local cross-channel interaction is established to obtain m feature tensors, m being a natural number greater than or equal to 1; the grouping normalization layer performs grouped normalization on the feature tensors to obtain several groups of feature matrices; the activation function layer applies a nonlinear mapping; and the Matmul layer fuses the groups of feature matrices to obtain the weighted features. The self-attention module consists, in order, of an embedding layer, a self-attention layer and an activation function layer; the convolution features learn the relations between adjacent convolution features through the self-attention module to obtain the relation features.
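A heavily simplified sketch of this EMA weighting path is shown below: the features are split into m sub-channel groups, pooled, passed through a small attention layer for local cross-channel interaction, group-normalized, activated, and then used to re-weight the input. The 1x1-convolution attention, the sigmoid activation and the group count m = 4 are assumptions for illustration, not the application's reference implementation.

```python
import torch
import torch.nn as nn

class EMAWeighting(nn.Module):
    """Sketch of the EMA branch: produces weighted features from convolution features."""
    def __init__(self, channels: int, m: int = 4):
        super().__init__()
        assert channels % m == 0, "channel count must be divisible by the number of sub-channels m"
        self.m = m
        group_ch = channels // m
        self.pool = nn.AdaptiveMaxPool2d(1)            # pooling layer: per-channel descriptor
        self.attn = nn.Conv2d(group_ch, group_ch, 1)   # attention layer: local cross-channel interaction
        self.gn = nn.GroupNorm(1, group_ch)            # grouping normalization layer
        self.act = nn.Sigmoid()                        # activation function layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        g = x.reshape(n * self.m, c // self.m, h, w)          # dimension conversion into m sub-channel groups
        weights = self.act(self.gn(self.attn(self.pool(g))))  # per-group channel weights
        weighted = g * weights                                # Matmul-style fusion: re-weight the features
        return weighted.reshape(n, c, h, w)                   # weighted features, same shape as the input
```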
In one embodiment, the EMA module further comprises a dimension conversion layer, which performs dimension conversion on the input vector; the self-attention module further comprises a dropout layer, which randomly drops neurons of the self-attention layer so that the set of neurons learned in each pass changes.
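The self-attention branch and the fusion of the two DWM branches can be sketched as follows, treating each spatial position of the convolution features as a token. The embedding width, the head count of 4, the dropout rate and the additive fusion are assumptions; `EMAWeighting` refers to the sketch above.

```python
import torch
import torch.nn as nn

class RelationBranch(nn.Module):
    """Sketch of the self-attention branch: embedding, self-attention, activation, dropout."""
    def __init__(self, channels: int, num_heads: int = 4, p_drop: float = 0.1):
        super().__init__()
        self.embed = nn.Linear(channels, channels)                                  # embedding layer
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)    # self-attention layer
        self.act = nn.GELU()                                                        # activation function layer
        self.drop = nn.Dropout(p_drop)                                              # dropout layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        tokens = self.embed(x.flatten(2).transpose(1, 2))     # (n, h*w, c): each position is a token
        relation, _ = self.attn(tokens, tokens, tokens)       # learn relations between adjacent features
        relation = self.drop(self.act(relation))
        return relation.transpose(1, 2).reshape(n, c, h, w)   # relation features

class DWM(nn.Module):
    """Fuses EMA-weighted features with self-attention relation features."""
    def __init__(self, channels: int):
        super().__init__()
        self.ema = EMAWeighting(channels)        # weighted-feature branch (sketched earlier)
        self.relation = RelationBranch(channels) # relation-feature branch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.ema(x) + self.relation(x)    # fusion by addition -> enhanced features
```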
In one embodiment, the feature vector passes sequentially through n groups of attention convolution modules before the fully connected layer prediction, n being a natural number greater than or equal to 1. The convolution module within each attention convolution module comprises a downsampling layer and a residual convolution layer: the feature vector is reduced in dimension by the downsampling layer and then convolved by the residual convolution layer to obtain the convolution features. The regression model further comprises a max pooling layer, which reduces the dimensionality of the enhanced features to obtain low-dimensional features, and the low-dimensional features are passed through the fully connected layer for prediction.
In one embodiment, regression prediction is based on the principle of correlation: the factors that affect the prediction target are identified, an approximate mathematical expression of the functional relationship between these factors and the target is found, the parameters of the model are estimated from sample data, and the model is checked for error. Once the model is determined, it can be used to predict the target from changes in the factors.
In one embodiment, regression models include linear regression, polynomial regression, stepwise regression, ridge regression, lasso regression and elastic net regression.
In one embodiment, the regression model adopts one or more of the following: linear regression, AdaBoost, a support vector machine and a convolutional neural network. The convolutional neural network uses a ResNet50 network for feature learning to obtain the eye axis length prediction, and the training weights of the ResNet50 network are obtained by secondary training and adjustment on high-quality photos, starting from the pre-trained weights of an ImageNet model.
In one embodiment, in computer vision a pre-trained model is a model trained on a large reference data set to solve a similar problem. Because training such models is computationally expensive, common practice is to import published weights and use the corresponding model. For example, in object detection, feature extraction is first performed by a backbone neural network, typically VGG, ResNet or a similar network; when training an object detection model, the pre-trained weights of these networks can be used to initialize the backbone parameters, so that relatively effective features can be extracted from the start.
In one embodiment, in image processing ImageNet is used for network pre-training mainly for two reasons. On one hand, ImageNet is a large, pre-annotated image data set, which is a great advantage: the more data there are, the more reliable the trained parameters become. On the other hand, ImageNet covers 1,000 classes across many categories, so the image data are general rather than tied to a particular field; this gives it good universality, and the pre-trained weights can be used directly once pre-training is finished.
In a specific embodiment, the application uses the classic deep learning model ResNet50 and fine-tunes it from the ResNet50 pre-trained weights on ImageNet-1k, so that the model can extract deeper fundus information. The structure of ResNet50 is kept unchanged and a DWM module is added after each block; the DWM module introduces a special attention mechanism while preserving ResNet50's performance and is used to strengthen the model's attention to specific features. Specifically, the module keeps the adaptive pooling, grouping normalization and convolution operations of the EMA module and, borrowing ideas from ViT (Vision Transformer), adds self-attention, thereby adjusting and weighting the input features and improving the model's performance on the fundus-photo eye axis length prediction task.
In a specific embodiment, the eye axis regression model is a deep learning model, as shown in FIG. 6. The fundus photo is input into a network model consisting of a first convolution layer, four attention convolution layers, a max pooling layer and two fully connected layers, through which the data pass in sequence to produce the output. The first convolution layer consists of a two-dimensional convolution followed by a layer normalization. Each attention convolution layer consists, in order, of a downsampling layer, a Block and a DWM layer. A Block consists of three two-dimensional convolutions, two batch normalizations and a ReLU activation function; the data produced by stacking these modules are added to the Block's input to obtain its output. A downsampling layer consists of a layer normalization and a two-dimensional convolution, through which the data pass in sequence. A DWM layer consists of two modules, EMA and self-attention: the EMA module consists, in order, of a Reshape, an adaptive max pooling layer, an attention layer, a grouping normalization, an activation function, a Matmul operation and an activation function; the self-attention module consists, in order, of an embedding layer, a self-attention layer, an activation function and a dropout layer. The data flow through the two modules separately and are fused to obtain the DWM output.
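Putting the pieces together, a minimal sketch of the regression model is shown below: a ResNet50 backbone initialized from ImageNet weights, a DWM module (from the sketches above) after each of the four stages, max pooling and two fully connected layers producing a single eye axis length value. The hidden width of 256 and the use of the torchvision weights enum are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

class AxialLengthRegressor(nn.Module):
    def __init__(self, hidden: int = 256):
        super().__init__()
        backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)  # ImageNet pre-training
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)                 # first convolution layer
        self.stages = nn.ModuleList([backbone.layer1, backbone.layer2,
                                     backbone.layer3, backbone.layer4])
        self.dwms = nn.ModuleList([DWM(c) for c in (256, 512, 1024, 2048)])        # DWM after each stage
        self.pool = nn.AdaptiveMaxPool2d(1)          # max pooling for feature dimension reduction
        self.fc = nn.Sequential(                     # two fully connected layers
            nn.Linear(2048, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),                    # predicted eye axis length
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.stem(x)
        for stage, dwm in zip(self.stages, self.dwms):
            x = dwm(stage(x))                        # convolution features -> enhanced features
        return self.fc(self.pool(x).flatten(1))
```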
The classification weights trained on ImageNet are used as pre-training weights. To improve training efficiency, the front feature extraction layers of the model are frozen and only the feature prediction layers are learned and back-propagated. During training, the batch size is set to 64-256, AdamW is used as the optimizer, the learning rate is initialized to 5e-5 and then adjusts itself according to the loss, and, to enhance generalization, the weight decay is set in the range 0.05-0.1. The regression model is trained on the training set to learn the key features of the eye axis; the accuracy on the validation set is used as the measure of model performance, and the model weights with the highest validation accuracy are saved as the optimal weights.
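A minimal training-loop sketch following these settings is given below (frozen feature-extraction layers, AdamW at 5e-5 with weight decay 0.05, a loss-driven learning-rate schedule, and keeping the checkpoint that does best on the validation set). The MSE loss, the ReduceLROnPlateau scheduler, the epoch count and the `train_loader`/`val_loader` objects are assumptions for illustration.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = AxialLengthRegressor().to(device)          # sketched above, ImageNet-initialized

# Freeze the front feature-extraction layers; the DWM modules and the
# fully connected head remain trainable and are back-propagated.
for p in model.stem.parameters():
    p.requires_grad = False
for p in model.stages.parameters():
    p.requires_grad = False

optimizer = torch.optim.AdamW((p for p in model.parameters() if p.requires_grad),
                              lr=5e-5, weight_decay=0.05)          # lr 5e-5, weight decay in 0.05-0.1
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer)  # adjust the lr according to the loss
criterion = torch.nn.MSELoss()

best_val = float("inf")
for epoch in range(250):                           # `train_loader` / `val_loader`: hypothetical DataLoaders
    model.train()                                  # (batch size anywhere in the stated 64-256 range)
    for images, axial_length in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images.to(device)).squeeze(1), axial_length.to(device))
        loss.backward()
        optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = sum(criterion(model(x.to(device)).squeeze(1), y.to(device)).item()
                       for x, y in val_loader) / len(val_loader)
    scheduler.step(val_loss)
    if val_loss < best_val:                        # keep the weights that do best on the validation set
        best_val = val_loss
        torch.save(model.state_dict(), "best_axial_length.pt")
```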
In one embodiment, the trained regression model, a convolutional neural network, is used to make regression predictions for a fundus photo. It should be noted that in the prediction stage, for any input fundus photo, the eye axis length is obtained from the eye axis regression model only if the photo passes the quality control model; if it does not pass, a qualified photo must be provided.
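The two-stage prediction flow can be sketched as below: the photo must first pass the quality control classifier before the regression model is run. `quality_model` (whose positive class is assumed to mean a usable photo), `regression_model` and `transform` are placeholders for the trained models and preprocessing described above.

```python
import torch

@torch.no_grad()
def predict_axial_length(photo, quality_model, regression_model, transform):
    """Two-stage prediction: quality control first, then eye axis length regression."""
    x = transform(photo).unsqueeze(0)              # preprocess a single fundus photo
    if quality_model(x).argmax(dim=1).item() != 1: # assumed: class 1 = photo passes quality control
        raise ValueError("Fundus photo does not pass quality control; please provide a qualified photo.")
    return regression_model(x).item()              # predicted eye axis length
```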
In a specific embodiment, experiments verify the method. The change of the training LOSS is shown in FIG. 7: the LOSS on the training set stabilizes around round 217 and the LOSS on the validation set stabilizes around round 145. The model achieves an 81.53% regression rate on the test set; as shown in FIG. 8, most predicted values cluster within the upper and lower confidence bands of the fitted line, and only a small portion of the predictions are scattered. In addition, to verify the effectiveness of the method, an ablation experiment was carried out; as shown in Table 1, the ResNet50+DWM method provided by the application shows the best performance on evaluation indices such as Recall, R-squared and mean squared error (MSE).
Table 1 Ablation experiment results
FIG. 2 is a schematic diagram of an eye axis length prediction system according to an embodiment of the present application, which specifically includes:
a data acquisition module: for obtaining fundus photos;
a regression prediction module: the fundus photo is input into a regression model for prediction to obtain the eye axis length of the fundus photo, wherein the regression model consists of a first convolution layer, an attention convolution module and a fully connected layer; the fundus photo is passed through the first convolution layer to obtain a feature vector, the feature vector is passed through the attention convolution module to obtain enhanced features, and the enhanced features are passed through the fully connected layer to obtain the predicted eye axis length. The attention convolution module consists of a convolution module and a DWM module: the feature vector undergoes deep convolution in the convolution module to obtain convolution features, the convolution features are enhanced by the DWM module to obtain the enhanced features, and the DWM module comprises an EMA attention module and a self-attention module; the convolution features are weighted by the EMA module to obtain weighted features, the self-attention module learns the relations between features to obtain relation features, and the weighted features and the relation features are fused to obtain the enhanced features.
FIG. 3 is a schematic diagram of an eye axis length prediction device according to an embodiment of the present application, which specifically includes:
a memory and a processor; the memory is used for storing program instructions; the processor is configured to invoke program instructions that when executed perform any one of the methods of eye axis length prediction described above.
A computer readable storage medium stores a computer program which, when executed by a processor, implements any one of the above eye axis length prediction methods.
The verification results of this embodiment show that assigning an inherent weight to an indicator may improve the performance of the method relative to the default setting.

It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working procedures of the systems, apparatuses and units described above may refer to the corresponding procedures in the foregoing method embodiments and are not repeated here. In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division of units is merely a logical functional division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, and may be electrical, mechanical or in other forms.

The units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated units may be implemented in hardware or as software functional units.

Those of ordinary skill in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing the related hardware; the program may be stored in a computer readable storage medium, which may include: read-only memory (ROM), random access memory (RAM), magnetic disk, optical disk, and the like.
While the foregoing describes the computer device provided by the present application in detail, those skilled in the art will appreciate that the foregoing description is not intended to limit the application, the scope of which is defined by the appended claims.

Claims (10)

1. An eye axis length prediction method is characterized by comprising the following steps:
obtaining fundus photos;
the fundus photo is input into a regression model for prediction to obtain the eye axis length of the fundus photo, wherein the regression model consists of a first convolution layer, an attention convolution module and a fully connected layer; the fundus photo is passed through the first convolution layer to obtain a feature vector, the feature vector is passed through the attention convolution module to obtain enhanced features, and the enhanced features are passed through the fully connected layer to obtain the predicted eye axis length; the attention convolution module consists of a convolution module and a DWM module, the feature vector undergoes deep convolution in the convolution module to obtain convolution features, the convolution features are enhanced by the DWM module to obtain the enhanced features, and the DWM module comprises an EMA attention module and a self-attention module; the convolution features are weighted by the EMA module to obtain weighted features, the self-attention module learns the relations between features to obtain relation features, and the weighted features and the relation features are fused to obtain the enhanced features.
2. The eye axis length prediction method of claim 1, wherein the EMA module consists of a pooling layer, an attention layer, a grouping normalization layer, an activation function layer, and a Matmul layer in order;
optionally, the convolution features are input to the attention layer after dimension reduction in the pooling layer; m sub-channels are weighted in parallel in the attention layer and local cross-channel interaction is established to obtain m feature tensors, m being a natural number greater than or equal to 1; the grouping normalization layer performs grouped normalization on the feature tensors to obtain several groups of feature matrices; the activation function layer applies a nonlinear mapping; and the Matmul layer fuses the groups of feature matrices to obtain the weighted features;
preferably, the self-attention module sequentially comprises an embedding layer, a self-attention layer and an activation function layer, and the convolution features obtain the relation features by learning the relations between adjacent convolution features through the self-attention module.
3. The eye axis length prediction method of claim 2, wherein the EMA module further comprises a dimension conversion layer for dimension converting the input vector;
preferably, the self-attention module further comprises a dropout layer, and the dropout layer performs weighting processing on neurons of the self-attention layer to change the number of the neurons learned each time.
4. The eye axis length prediction method according to claim 1, wherein the feature vector sequentially passes through n groups of attention convolution modules before the fully connected layer prediction, n being a natural number greater than or equal to 1; the convolution module in each attention convolution module comprises a downsampling layer and a residual convolution layer, and the feature vector is reduced in dimension by the downsampling layer and then convolved by the residual convolution layer to obtain the convolution features;
preferably, the regression model further includes a maximum pooling layer, the maximum pooling layer performs feature dimension reduction on the enhanced features to obtain low-dimensional features, and the low-dimensional features are subjected to classification prediction through the fully connected layer.
5. The eye axis length prediction method according to claim 1, further comprising a quality control model, wherein the quality control model performs feature recognition on fundus photos through a ConvNeXt network model to obtain high-quality photos; regression prediction is performed only when an input fundus photo passes the quality control model, and regression prediction is stopped when the quality of the input fundus photo does not reach the standard.
6. The eye axis length prediction method of claim 1, wherein the regression model uses one or more of the following: linear regression, AdaBoost, a support vector machine and a convolutional neural network, wherein the convolutional neural network adopts a ResNet50 network for feature learning to obtain the eye axis length prediction result;
optionally, the training weights of the ResNet50 network are obtained by secondary training and adjustment on the high-quality photos, starting from the pre-trained weights of an ImageNet model.
7. The eye axis length prediction method of claim 1, further comprising data preprocessing, the data preprocessing comprising one or more of the following: graying, binarization, image augmentation, denoising and image enhancement;
preferably, the image augmentation comprises horizontal flipping, vertical flipping, contrast transformation and scale transformation, the fundus photo is converted into a tensor after the image augmentation is performed on the fundus photo, and the tensor is normalized;
optionally, the normalization process defines the mean and variance values of the three channels of the picture RGB.
8. An eye axis length prediction system, comprising:
a data acquisition module: for obtaining fundus photos;
a regression prediction module: the fundus photo is input into a regression model for prediction to obtain the eye axis length of the fundus photo, wherein the regression model consists of a first convolution layer, an attention convolution module and a fully connected layer; the fundus photo is passed through the first convolution layer to obtain a feature vector, the feature vector is passed through the attention convolution module to obtain enhanced features, and the enhanced features are passed through the fully connected layer to obtain the predicted eye axis length; the attention convolution module consists of a convolution module and a DWM module, the feature vector undergoes deep convolution in the convolution module to obtain convolution features, the convolution features are enhanced by the DWM module to obtain the enhanced features, and the DWM module comprises an EMA attention module and a self-attention module; the convolution features are weighted by the EMA module to obtain weighted features, the self-attention module learns the relations between features to obtain relation features, and the weighted features and the relation features are fused to obtain the enhanced features.
9. An eye axis length prediction apparatus, comprising:
a memory and a processor; the memory is used for storing program instructions; the processor is configured to invoke program instructions which when executed implement an eye axis length prediction method as claimed in any one of claims 1-7.
10. A computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements a method of predicting an eye axis length as claimed in any one of claims 1 to 7.
CN202311201719.7A 2023-09-15 2023-09-15 Eye axis length prediction method, system and equipment Pending CN117237749A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311201719.7A CN117237749A (en) 2023-09-15 2023-09-15 Eye axis length prediction method, system and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311201719.7A CN117237749A (en) 2023-09-15 2023-09-15 Eye axis length prediction method, system and equipment

Publications (1)

Publication Number Publication Date
CN117237749A true CN117237749A (en) 2023-12-15

Family

ID=89092517

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311201719.7A Pending CN117237749A (en) 2023-09-15 2023-09-15 Eye axis length prediction method, system and equipment

Country Status (1)

Country Link
CN (1) CN117237749A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110415245A (en) * 2019-08-05 2019-11-05 上海鹰瞳医疗科技有限公司 Optical data determines method, model training method and equipment
WO2021068528A1 (en) * 2019-10-11 2021-04-15 平安科技(深圳)有限公司 Attention weight calculation method and apparatus based on convolutional neural network, and device
CN112766376A (en) * 2021-01-20 2021-05-07 重庆邮电大学 Multi-label eye fundus image identification method based on GACNN
CN113611323A (en) * 2021-05-07 2021-11-05 北京至芯开源科技有限责任公司 Voice enhancement method and system based on dual-channel convolution attention network
CN116704591A (en) * 2023-06-27 2023-09-05 复旦大学附属眼耳鼻喉科医院 Eye axis prediction model training method, eye axis prediction method and device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination