CN116385806A - Method, system, equipment and storage medium for classifying strabismus type of eye image - Google Patents

Method, system, equipment and storage medium for classifying strabismus type of eye image

Info

Publication number: CN116385806A
Application number: CN202310613349.1A
Authority: CN (China)
Prior art keywords: network model, extraction network, feature, model, classification
Legal status: Granted; Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Other versions: CN116385806B (en)
Inventors: 刘陇黔, 张海仙, 吴达文, 李彦霏, 杨国渊, 毛轶绩, 封毅, 魏文远
Current Assignee: West China Hospital of Sichuan University (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Original Assignee: West China Hospital of Sichuan University
Application filed by West China Hospital of Sichuan University
Priority to CN202310613349.1A; granted and published as CN116385806B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method, system, equipment and storage medium for classifying the strabismus type of eye images, relating to strabismus classification of eye images in the field of artificial intelligence, and aims to solve the technical problem of low accuracy in classifying the strabismus types of eye images in the prior art. The method takes as input text data containing the patient's basic information and image data comprising an eye image of the patient; a feature extraction module based on the residual-block connection mechanism of the ResNet50V2 model extracts features from the eye image data; a feature fusion module based on a joint multi-head attention mechanism then fuses the image features extracted by ResNet50V2 with the normalized text features; and finally a multi-classification module based on a hierarchical classification method outputs the classification result (ten classes covering normal and strabismus). Through its multi-modal and hierarchical classification architecture, the whole model improves multi-class precision, reduces inter-class errors, and has strong practical significance and clinical value.

Description

Method, system, equipment and storage medium for classifying strabismus type of eye image
Technical Field
The invention relates to the technical field of artificial intelligence and to methods for classifying the strabismus type of eye images, and in particular to a method, system, equipment and storage medium for classifying the strabismus type of eye images.
Background
Strabismus is a clinically common eye disease with a prevalence of about 3%; it can cause monocular suppression and retinal abnormalities in the patient, resulting in permanent visual impairment. In addition, strabismus can have serious psychosocial consequences for the patient. In summary, strabismus has a significant and long-term impact on patients in terms of visual function, appearance, learning ability, work opportunities, mental health, and so on. The onset of strabismus is insidious, and many young strabismus patients would obtain better chances of cure if diagnosed early; screening and diagnosing strabismus as early as possible is therefore important. Currently, strabismus screening and diagnosis are performed mainly by an ophthalmologist through several manual tests, such as the cover-uncover test and the prism cover test, which require a high degree of cooperation between patient and doctor and a long examination time. These tests depend heavily on the skill and experience of the doctor, and the examination results are subjective; moreover, China currently has a huge shortfall of ophthalmology resources of uneven quality, so risks of missed diagnosis and misdiagnosis exist. Therefore, developing a reliable artificial intelligence system with deep learning methods to realize rapid, automatic strabismus screening and diagnosis, provide comparatively more objective diagnostic results, and start therapeutic intervention as soon as possible is of great significance for protecting the visual function of strabismus patients and improving their quality of life.
In the current field of artificial-intelligence eye-image classification, where eye images are classified to obtain the strabismus classification result, there are two main research approaches: eye key-region segmentation algorithms based on traditional stepwise learning, and classification algorithms based on end-to-end learning. In research on eye key-region segmentation algorithms, researchers often adopt a pre-trained face detection model to extract the eye region of a face image, obtain the coordinates of key regions such as the pupil centers and corneal light spots for computation, and then compare the result of the coordinate computation numerically against a preset threshold to judge whether strabismus exists and its type. Choi et al. proposed an image-processing-based strabismus screening model that uses first eye position (primary gaze) images, samples all pixel points on the eye contour edge obtained by a segmentation algorithm, and applies the least-squares method to obtain the coordinates of the pupil center; the similarity of the positions of the two eyes in the photograph is measured by calculating the distance from the pupil center to the inner and outer canthi, and whether strabismus exists is judged. Ma et al. obtained the coordinates of the corneal centers and corneal reflection points of the two eyes by a similar method, calculated the horizontal and vertical offsets of the corneal reflection points relative to the corresponding corneal centers from the relative positions of the coordinates, and judged whether strabismus exists. Kang et al. were the first to use first, second and third eye position images, obtaining the coordinates of corneal reflection points, pupil centers, inner and outer canthi, and upper and lower eyelid margin points with a U-Net-based segmentation algorithm; the coordinates on the second and third eye position images are translated onto the first eye position image according to a reference frame for coordinate computation, realizing a multi-class task over esotropia, exotropia, hypertropia and hypotropia. In research on end-to-end classification algorithms, Zheng et al. trained a deep learning model based on the R-CNN architecture with first eye position images of horizontal strabismus and orthotropia, realizing a horizontal-strabismus-versus-normal classification task; Lin et al. trained a deep learning model based on the InceptionResNetV2 architecture with first eye position images of various types of strabismus and orthotropia, realizing a strabismus-versus-normal classification task.
Research on eye key-region segmentation algorithms based on traditional stepwise learning judges whether strabismus exists, and its type, by comparing the result of the eye feature-point coordinate computation with a preset threshold; such methods select the threshold from statistics over a small range, which is subjective, and verify it on small data sets, so the selected threshold is prone to bias, difficult to popularize at scale, and of low accuracy in classifying strabismus types. Research on classification algorithms based on end-to-end learning trains on a large amount of image data and generalizes better, but it currently remains at strabismus-versus-normal classification tasks, offers limited help to clinical practice, and likewise has low accuracy in classifying strabismus types.
Disclosure of Invention
The aim of the invention is to solve the technical problem of low accuracy in classifying the strabismus types of eye images in the prior art; to this end, the invention provides a method, system, equipment and storage medium for classifying the strabismus type of eye images.
To achieve this aim, the invention adopts the following technical scheme:
a method for classifying strabismus type of eye images, comprising the following steps:
Step S1, obtaining sample data
Acquiring sample data, wherein the sample data comprises eye image sample data and text sample data;
Step S2, constructing a feature extraction network model
Constructing a feature extraction network model, wherein the feature extraction network model comprises a feature pre-extraction network model, a feature coarse extraction network model, a feature fine extraction network model and a classification network, and the feature pre-extraction network model comprises a ResNet50V2 image extraction model and a text extraction model;
Step S3, training the feature extraction network model
Training the feature extraction network model constructed in the step S2 by adopting the sample data acquired in the step S1;
taking eye image sample data as input of a ResNet50V2 image extraction model, taking text sample data as input of a text extraction model, taking output of the ResNet50V2 image extraction model and the text extraction model as input of a characteristic coarse extraction network model, taking output of the characteristic coarse extraction network model as input of a characteristic fine extraction network model, taking output of the characteristic fine extraction network model as input of a classification network, and outputting a classification result by the classification network;
step S4, strabismus real-time classification
Real-time eye image data and text data are acquired and input into the feature extraction network model trained in step S3, and the feature extraction network model outputs the classification result.
Further, in step S2, the ResNet50V2 image extraction model includes a zero-padding layer, a two-dimensional convolution layer, a zero-padding layer, a maximum pooling layer, a residual module, a batch normalization layer, a linear rectification unit, an average pooling layer, and a full connection layer, which are sequentially connected.
Still further, the residual module includes a plurality of residual blocks connected in sequence; the last residual block includes two basic blocks, and the remaining residual blocks include three basic blocks.
Further, each basic block comprises a first batch normalization layer and a first linear rectification unit which are sequentially connected; the output of the first linear rectification unit is divided into two parallel paths: one path passes sequentially through the first two-dimensional convolution layer, the second batch normalization layer, the second linear rectification unit, zero padding, the second two-dimensional convolution layer, the third batch normalization layer and the third linear rectification unit, and is input into the third two-dimensional convolution layer; the other path is input into the fourth two-dimensional convolution layer; and the outputs of the third and fourth two-dimensional convolution layers are fused as the output of the whole basic block.
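As an illustrative sketch only (not part of the claims), the basic block described above could be assembled with the Keras functional API roughly as follows; the filter counts, stride and input shape are assumptions for illustration:

```python
import tensorflow as tf
from tensorflow.keras import layers

def basic_block(x, filters, stride=1):
    # First batch normalization layer and first linear rectification unit;
    # their output feeds both parallel paths (pre-activation design).
    preact = layers.ReLU()(layers.BatchNormalization()(x))

    # Path 1: conv -> BN -> ReLU -> zero padding -> conv -> BN -> ReLU -> conv
    y = layers.Conv2D(filters, 1, use_bias=False)(preact)
    y = layers.ReLU()(layers.BatchNormalization()(y))
    y = layers.ZeroPadding2D(1)(y)
    y = layers.Conv2D(filters, 3, strides=stride, use_bias=False)(y)
    y = layers.ReLU()(layers.BatchNormalization()(y))
    y = layers.Conv2D(4 * filters, 1)(y)           # third two-dimensional convolution

    # Path 2: fourth two-dimensional convolution (projection shortcut)
    s = layers.Conv2D(4 * filters, 1, strides=stride)(preact)

    # Fuse the outputs of the two paths as the output of the whole basic block
    return layers.Add()([y, s])

inp = tf.keras.Input(shape=(56, 56, 64))
out = basic_block(inp, filters=64)
block = tf.keras.Model(inp, out)    # (56, 56, 64) -> (56, 56, 256)
```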
Further, in step S3, when the feature extraction network model is trained, the residual block of the feature pre-extraction network model is activated in a nonlinear manner by means of the ReLU function, according to the formula:

$$x_{l+1} = f\big(x_l + \mathcal{F}(x_l, W_l)\big)$$

wherein $\mathcal{F}$ represents the residual function of the sequence of residual units, $x_l$ represents the input of the residual unit, $W_l$ is the series of weights and biases associated with the residual unit, $l$ is the network-layer index of the residual unit, and $f$ represents the activation function, for which a ReLU is typically used.
Further, in step S2, the feature coarse extraction network model and the feature fine extraction network model are both feature extraction networks with a joint attention mechanism, comprising a plurality of self-attention network blocks arranged in sequence;

the image information is taken as K and V in the attention mechanism, and the text information as Q; throughout the attention mechanism, the model scores the degree of attention paid to the image information conditioned on the text information, obtaining the matrix $\mathrm{softmax}(QK^{T}/\sqrt{d_k})$, which acts as a weight on the result V obtained by analysing the image information, yielding the final attention result.
Further, in step S3, when the feature extraction network model is trained, the forward attention learned by the feature coarse extraction network model and the feature fine extraction network model is calculated as:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

wherein $d_k$ represents the dimension of the matrix, i.e. the dimension of the information contained in one sample of the matrices Q and K, and $T$ represents the transpose.
A classification system for an eye image strabismus type, comprising:
the sample data acquisition module is used for acquiring sample data, wherein the sample data comprises eye image sample data and text sample data;
the feature extraction network model construction module is used for constructing a feature extraction network model, wherein the feature extraction network model comprises a feature pre-extraction network model, a feature coarse extraction network model, a feature fine extraction network model and a classification network, and the feature pre-extraction network model comprises a ResNet50V2 image extraction model and a text extraction model;
the feature extraction network model training module is used for training the feature extraction network model constructed by the feature extraction network model construction module by adopting the sample data acquired by the sample data acquisition module;
taking eye image sample data as input of a ResNet50V2 image extraction model, taking text sample data as input of a text extraction model, taking output of the ResNet50V2 image extraction model and the text extraction model as input of a characteristic coarse extraction network model, taking output of the characteristic coarse extraction network model as input of a characteristic fine extraction network model, taking output of the characteristic fine extraction network model as input of a classification network, and outputting a classification result by the classification network;
The strabismus real-time classification module is used for acquiring real-time eye image data and text data and inputting them into the feature extraction network model trained by the feature extraction network model training module; the feature extraction network model outputs the classification result.
A computer device, characterized in that it comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method described above.
A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, causes the processor to perform the steps of the method described above.
The beneficial effects of the invention are as follows:
1. Compared with prior research on eye key-region segmentation algorithms based on traditional stepwise learning, which is prone to bias from small data sets and difficult to popularize at scale, the invention can train the network with a large amount of bimodal (image + text) data, reducing the model's bias and giving it generality.
2. Compared with prior research on classification algorithms based on end-to-end learning, which is limited to strabismus-versus-normal classification tasks and of limited help to clinical practice, the invention realizes strabismus classification covering all clinically common strabismus types, and thus has stronger practical significance and clinical value.
3. Compared with conventional artificial-intelligence strabismus screening and diagnosis research, which screens and diagnoses strabismus using eye-position pictures alone, the invention, through multi-modal feature fusion drawing on the two information sources of text and image, lets the model comprehensively learn the electronic-case features corresponding to the pictures, further improving the precision of the classification model and the accuracy of the classification result.
4. The invention takes into account that some horizontal strabismus subtypes and vertical strabismus subtypes are highly similar in their features, so that a directly performed classification task would have low accuracy on similar subclasses. By means of a hierarchical classification method, the invention resolves the class confusion caused by the high feature similarity of some horizontal and vertical strabismus subtypes, thereby improving multi-class precision and reducing inter-class errors.
Drawings
FIG. 1 is a schematic flow chart of the present invention;
FIG. 2 is a network architecture diagram of feature extractor ResNet50V2 of the present invention;
FIG. 3 is a schematic diagram of the structure of a residual block in the present invention;
fig. 4 is a diagram of a feature extraction network of a joint attention mechanism in the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention.
Thus, all other embodiments, which can be made by one of ordinary skill in the art without undue burden from the invention, are intended to be within the scope of the invention.
Example 1
This embodiment provides a method for classifying the strabismus type of eye images. In the data preprocessing stage, the text data are normalized to the same scale and range, reducing the correlation between features and thereby improving their distinguishability and interpretability; since the image training set has few samples, data augmentation is used to enlarge the data set so that the classes are as balanced as possible, improving the model's ability to recognize minority classes. In the feature extraction and fusion stage, features are extracted from the eye image data through the residual-block connection mechanism of the ResNet50V2 model, and a feature fusion module based on a joint multi-head attention mechanism then fuses the image features extracted by ResNet50V2 with the normalized text features, so that the model comprehensively learns the features of the pictures and the patients' associated electronic cases, further improving classification precision. The multi-class output stage adopts a multi-classification module based on a hierarchical classification method: the categories are first divided into the three major classes of normal, horizontal strabismus and vertical strabismus, and the major classes are then subdivided under the guidance of the major-class classification, splitting horizontal strabismus and vertical strabismus into several subtypes. This hierarchical classification resolves the class confusion caused by the high feature similarity of some horizontal and vertical strabismus subtypes, thereby improving multi-class precision and reducing inter-class errors. As shown in fig. 1, the specific classification steps are:
Step S1, obtaining sample data
Sample data is acquired, wherein the sample data comprises eye image sample data and text sample data.
The eye image sample data is an image of the eyes, and the text sample data includes the patient's sex, duration of disease, and age of onset. Because the value ranges of different features differ greatly, instability and oscillation can occur during gradient updates, reducing the convergence speed and accuracy of the model; the text sample data therefore need to be normalized. Normalizing the text sample data to the same scale and range reduces the correlation between features, thereby improving their distinguishability and interpretability. This embodiment therefore scales the text sample data into [0, 1], making comparisons between different features more reasonable and accurate. Specifically, the sexes male and female are encoded as 1 and 0, respectively, and the duration of disease and age of onset each have the minimum value in the sample subtracted and are divided by the difference between the maximum and minimum values.
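A minimal sketch of this preprocessing, assuming exactly the three text fields named above (the function and parameter names are illustrative, not taken from the patent):

```python
import numpy as np

def normalize_text_features(sex, duration, onset_age,
                            duration_min, duration_max,
                            onset_min, onset_max):
    """Encode sex as 1/0 and min-max scale the two numeric fields to [0, 1]."""
    sex_code = 1.0 if sex == "male" else 0.0
    duration_norm = (duration - duration_min) / (duration_max - duration_min)
    onset_norm = (onset_age - onset_min) / (onset_max - onset_min)
    return np.array([sex_code, duration_norm, onset_norm], dtype=np.float32)
```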
To address the small number of samples in the image training set, this embodiment uses simple, standardized data augmentation such as scaling, rotation, flipping and contrast enhancement to enlarge the data set, so that the classes in the data set are as balanced as possible, improving the model's ability to recognize minority classes while preventing overfitting and reducing the model's variance.
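One possible realization of these augmentations with standard Keras preprocessing layers (a sketch; the magnitudes are assumptions, not values given in the patent):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Scaling, rotation, flipping and contrast enhancement applied on the fly
# during training to enlarge and balance the image data set.
augment = tf.keras.Sequential([
    layers.RandomZoom(0.1),           # scaling
    layers.RandomRotation(0.05),      # rotation, as a fraction of 2*pi
    layers.RandomFlip("horizontal"),  # flipping
    layers.RandomContrast(0.2),       # contrast enhancement
])
```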
Step S2, constructing a feature extraction network model
Constructing a feature extraction network model, wherein the feature extraction network model comprises a feature pre-extraction network model, a feature coarse extraction network model, a feature fine extraction network model and a classification network, and the feature pre-extraction network model comprises a ResNet50V2 image extraction model and a text extraction model.
The feature pre-extraction network model is used for respectively extracting features of eye image sample data of an eye image and text sample data of basic information of a patient.
First, the ResNet50V2 image extraction model is used as the feature extractor for the image data, using the residual-block connection scheme; the final fully-connected layer used for classification is removed, and the result of the preceding layer is extracted as the feature vector of this modality, to facilitate the subsequent feature fusion. As shown in fig. 2, the ResNet50V2 image extraction model includes a zero-padding layer, a two-dimensional convolution layer, a zero-padding layer, a maximum pooling layer, a residual module, a batch normalization layer, a linear rectification unit, an average pooling layer, and a full connection layer, which are sequentially connected. The residual module comprises a plurality of residual blocks connected in sequence; the last residual block comprises two basic blocks, and the remaining residual blocks comprise three basic blocks. Each basic block comprises a first batch normalization layer and a first linear rectification unit which are sequentially connected; the output of the first linear rectification unit is divided into two parallel paths: one path passes sequentially through the first two-dimensional convolution layer, the second batch normalization layer, the second linear rectification unit, zero padding, the second two-dimensional convolution layer, the third batch normalization layer and the third linear rectification unit, and is input into the third two-dimensional convolution layer; the other path is input into the fourth two-dimensional convolution layer; and the outputs of the third and fourth two-dimensional convolution layers are fused as the output of the whole basic block.
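Such a feature extractor can be sketched from the stock Keras ResNet50V2, dropping the final classification layer and keeping the pooled output of the preceding layer; the 224×224 RGB input shape is an assumption:

```python
import tensorflow as tf

# ResNet50V2 without its final fully-connected classification layer;
# "avg" pooling exposes the preceding layer's result as the image
# feature vector used later for feature fusion.
image_backbone = tf.keras.applications.ResNet50V2(
    include_top=False,
    weights=None,               # or "imagenet" for pretrained weights
    input_shape=(224, 224, 3),
    pooling="avg",
)
image_in = tf.keras.Input(shape=(224, 224, 3))
image_features = image_backbone(image_in)    # shape: (batch, 2048)
```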
The ResNet50V2 image extraction model uses residual-block connections, which can solve the problems of gradient vanishing and gradient explosion in deep neural networks, thereby realizing a deeper network structure and higher classification precision. The residual block is the core of the ResNet50V2 image extraction model; its structure is shown in fig. 3.
For the preprocessed low-dimensional text data, the text extraction model in this embodiment adopts two fully-connected layers to extract features from the text data, obtaining its feature vectors. These low-dimensional feature vectors better describe the semantic information and structural characteristics of the text data and provide stronger support for the subsequent feature fusion.
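A sketch of the two fully-connected layers, with an assumed width of 64 units and the three normalized text fields as input:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Two fully-connected layers extract a low-dimensional feature vector
# from the normalized text data (sex, duration of disease, age of onset).
text_in = tf.keras.Input(shape=(3,))
t = layers.Dense(64, activation="relu")(text_in)
text_features = layers.Dense(64, activation="relu")(t)
```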
For the two modalities with relatively large differences, image data and text data, this embodiment adopts late fusion: features are extracted from the two modalities separately, and after the feature vectors of both modalities are obtained, feature fusion based on a joint multi-head attention mechanism is performed. To this end, as shown in fig. 4, the feature coarse extraction network model and the feature fine extraction network model are feature extraction networks with a joint attention mechanism; unlike the self-attention structure of the conventional Transformer, these networks take the image information as K and V in the attention mechanism and the text information as Q. The meanings of the Q, K and V matrices are as follows:
Q represents all the information of the text and is the content to be learned. For the network, the Q matrix indicates, as text information, which region of the corresponding picture the network attends to.

K is the keyword information, i.e. the prompt available to the model before it has seen the content. In the network it gives the region of the image that the text attends to.

V is the learning content, i.e. the information the model represents according to the keywords; for this reason V is typically initialized identically to Q. In the network it corresponds to the image information acted upon by the attention vector obtained from the text acting on the image.

$d_k$, the dimension of the matrix, refers to the dimension of the information contained in one sample of the matrices Q and K.
The feature coarse extraction network model and the feature fine extraction network model each comprise a plurality of self-attention network blocks arranged in sequence;

the image information is taken as K and V in the attention mechanism, and the text information as Q; throughout the attention mechanism, the model scores the degree of attention paid to the image information conditioned on the text information, obtaining the matrix $\mathrm{softmax}(QK^{T}/\sqrt{d_k})$, which acts as a weight on the result V obtained by analysing the image information, yielding the final attention result.
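In Keras terms this joint attention can be sketched with a MultiHeadAttention layer whose query comes from the text features and whose key and value come from the image features; the token counts, feature sizes and head count below are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Placeholder token sequences: N = 4 text tokens, M = 49 image tokens.
text_tokens = tf.keras.Input(shape=(4, 64))
image_tokens = tf.keras.Input(shape=(49, 256))

# Text acts as Q; image acts as K and V, so the text scores how much
# attention each image region receives.
cross_attention = layers.MultiHeadAttention(num_heads=8, key_dim=32)
fused = cross_attention(query=text_tokens,
                        value=image_tokens,
                        key=image_tokens)
# fused has shape (batch, 4, 64): the output scale follows the text
# (query) side, which is what shrinks the input to subsequent layers.
```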
The feature coarse extraction network model and the feature fine extraction network model have the following characteristic: since K and V are obtained from images, their feature dimension M is at the pixel level and relatively large, whereas the text feature dimension N, obtained from the text information, is far smaller than M. When the Q and K matrices are fused as features, each dimension of Q is mapped onto K and corresponds to a region, namely the part on which the attention mechanism focuses. After the first attention layer, the scale of the vector sent into the subsequent network becomes N × d_v, which greatly reduces the subsequent computational complexity; the time complexity is

$$O(N \cdot d_v)$$

wherein N is the dimension of the text information matrix, $d_v$ is the dimension of the image region onto which one piece of text information is mapped, and $N \cdot d_v$ expresses that each dimension of the text matrix is mapped to a corresponding region of the image.
The general strabismus classification task can be divided into normal, horizontal strabismus and vertical strabismus. The major classes of horizontal strabismus and vertical strabismus can each be divided into several subclasses, and the final purpose of this embodiment is to subdivide into these subclasses. Because some horizontal strabismus subtypes and vertical strabismus subtypes are highly similar in their features, a directly performed classification task would have low accuracy on similar subclasses, while a neural network's learning accuracy is higher the smaller the number of classes. If the major classes are first separated by a three-way classification and the subclasses are then subdivided under the guidance of the major-class classification, the overall classification accuracy is relatively high. The classification network of this embodiment therefore adopts a multi-level classification based on a fully-connected layer, using an existing classification network structure, and divides the whole classification task into two parts: parent-class classification and subclass classification. The parent class guides the subclass, but the training processes of the two are independent during model training: they have different loss functions and iterate separately. As for the gradient updates of the parameters, the parameters after the coarse feature extraction network in the classification network are determined only by the back-propagated loss of the subclass classification, while the network before the coarse feature extraction network is updated jointly by the losses of both classifications. Specifically:
For the parent-class classification: classifying the strabismus task into the three classes of normal, horizontal strabismus and vertical strabismus is a relatively simple and accurate task; the overall differences between the classes are large, and the complete features of the data need not be learned. This embodiment adopts a design that divides the feature extraction network into two parts, the front part being the coarse feature extraction network and the back part being the fine feature extraction network; when the coarse feature extraction network finishes and the coarse features enter the fine feature extraction network, a new network branch is created, and a linear layer is attached to the features to perform the three-way classification task, namely the parent-class classification task. Although the parent-class classification results are also updated continuously with the network, within the same training step the parent-class classification result is taken to be accurate and used to guide the subclass classification task.
For the subclass classification: after the whole feature extraction network finishes, a linear layer and a softmax function are added at the end to perform the final classification task. The result of the parent-class classification is obtained before this classification is performed, and at this point the parent-class result is fully trusted to be correct. All results in this classification that do not accord with the parent-class classification are therefore discarded, the result with the largest softmax value among the remainder is selected as the classification result, and finally one of the following categories is output: normal, comitant exotropia, A-pattern exotropia, V-pattern exotropia, other incomitant exotropia, comitant esotropia, A-pattern esotropia, V-pattern esotropia, other incomitant esotropia, left hypertropia, or right hypertropia.
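A sketch of the inference rule just described: the three-way parent result masks the eleven-way softmax output, and the largest remaining value is taken. The class-index layout is an assumption for illustration:

```python
import numpy as np

# Assumed index layout of the 11 final categories:
# 0 normal; 1-8 horizontal strabismus subtypes; 9-10 vertical subtypes.
PARENT_TO_CHILDREN = {
    0: [0],                         # normal
    1: [1, 2, 3, 4, 5, 6, 7, 8],    # horizontal strabismus subtypes
    2: [9, 10],                     # vertical strabismus subtypes
}

def hierarchical_predict(parent_probs, child_probs):
    """Discard subclass results that contradict the parent classification."""
    parent = int(np.argmax(parent_probs))
    allowed = PARENT_TO_CHILDREN[parent]
    masked = np.zeros_like(child_probs)
    masked[allowed] = child_probs[allowed]   # keep only consistent classes
    return int(np.argmax(masked))
```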
Step S3, training a feature extraction network model
Training the feature extraction network model constructed in the step S2 by adopting the sample data acquired in the step S1;
the method comprises the steps of taking eye image sample data as input of a ResNet50V2 image extraction model, taking text sample data as input of a text extraction model, taking output of the ResNet50V2 image extraction model and the text extraction model as input of a characteristic coarse extraction network model, taking output of the characteristic coarse extraction network model as input of a characteristic fine extraction network model, taking output of the characteristic fine extraction network model as input of a classification network, and outputting a classification result by the classification network.
When the feature extraction network model is trained, the residual block of the feature pre-extraction network model is activated nonlinearly by the ReLU function, reducing the redundancy of the information in the data. The upper-layer features are used directly in the x part, and the cross-layer connections realize feature sharing and information transfer between the front and back convolution layers, which can speed up the model's learning and accelerate convergence through parameter optimization. This structure also makes the mapping F(x) more sensitive to changes in the output. The specific formula is:
$$x_{l+1} = f\big(x_l + \mathcal{F}(x_l, W_l)\big)$$

wherein $\mathcal{F}$ represents the residual function of the sequence of residual units, $x_l$ represents the input of the residual unit, $W_l$ is the series of weights and biases associated with the residual unit, $l$ is the network-layer index of the residual unit, and $f$ represents the activation function, for which a ReLU is typically used.
In this task the machine can see only the patient's eye photograph, but the range of the region of interest in the picture is corrected and fused through the patient's related electronic-case data, so that the model comprehensively learns the picture together with the patient's related electronic-case features. Accordingly, when the feature extraction network model is trained, the forward attention of the feature coarse extraction network model and the feature fine extraction network model is calculated as:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

wherein $d_k$ represents the dimension of the matrix, i.e. the dimension of the information contained in one sample of the matrices Q and K, and $T$ represents the transpose.
Because the network obtains the data of the two different modalities through different encoding networks, the QKV matrices are obtained by applying different fully-connected layers to the data, and learning likewise proceeds by updating these fully-connected layers. Denoting the image data by G, the text data by H, and the linear-layer parameters of the three matrices by $W_Q$, $W_K$ and $W_V$, the actual propagation formula is:

$$Q = H W_Q, \quad K = G W_K, \quad V = G W_V$$

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{(H W_Q)(G W_K)^{T}}{\sqrt{d_k}}\right) G W_V$$

where G represents the image data, H the text data, $d_k$ the dimension of the matrix, $W_Q$, $W_K$ and $W_V$ the linear-layer parameters of the three matrices QKV, and $T$ the transpose.
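The propagation formula can be written out directly; a small NumPy sketch under assumed dimensions (G, H and the weight matrices are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, d_g, d_h, d_k = 49, 4, 256, 64, 32    # assumed sizes
G = rng.normal(size=(M, d_g))               # image data
H = rng.normal(size=(N, d_h))               # text data
W_q = rng.normal(size=(d_h, d_k))
W_k = rng.normal(size=(d_g, d_k))
W_v = rng.normal(size=(d_g, d_k))

Q, K, V = H @ W_q, G @ W_k, G @ W_v         # Q from text; K, V from image
scores = Q @ K.T / np.sqrt(d_k)             # (N, M) attention scores
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
out = weights @ V                           # (N, d_k): scale follows the text
```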
The multi-head attention mechanism employed in this embodiment splits data of dimension $d_{model}$ into h groups of QKV of dimension $d_{model}/h$ each, where h is the number of heads of the multi-head attention mechanism, rather than using a single set of QKV. Each QKV obtains part of the dimensional features of the data and learns its own attention over them, and finally all heads are concatenated together. This structural design lets each attention head optimize a different feature portion of the data, balancing the bias that a single attention mechanism might produce, giving the data richer representations and noticeably improving the model's performance.
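The head splitting itself is only a reshape; a sketch in the same NumPy notation, assuming the model dimension is divisible by the head count h:

```python
import numpy as np

def split_heads(x, h):
    """(seq, d_model) -> (h, seq, d_model // h): each head receives part
    of the feature dimensions and runs its own attention over them."""
    seq, d_model = x.shape
    return x.reshape(seq, h, d_model // h).transpose(1, 0, 2)

def merge_heads(x):
    """Concatenate all heads back together: (h, seq, d) -> (seq, h * d)."""
    h, seq, d = x.shape
    return x.transpose(1, 0, 2).reshape(seq, h * d)
```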
Step S4, strabismus real-time classification
Real-time eye image data and text data are acquired and input into the feature extraction network model trained in step S3, and the feature extraction network model outputs the classification result.
Example 2
The embodiment provides a classification system of an eye image strabismus type, which specifically includes:
and the sample data acquisition module is used for acquiring sample data, wherein the sample data comprises eye image sample data and text sample data.
The eye image sample data is an eye image, and the text sample data includes the patient's sex, duration of disease, and age of onset. Because the value ranges of different features differ greatly, instability and oscillation can occur during gradient updates, reducing the convergence speed and accuracy of the model; the text sample data therefore need to be normalized. Normalizing the text sample data to the same scale and range reduces the correlation between features, thereby improving their distinguishability and interpretability. This embodiment therefore scales the text sample data into [0, 1], making comparisons between different features more reasonable and accurate. Specifically, the sexes male and female are encoded as 1 and 0, respectively, and the duration of disease and age of onset each have the minimum value in the sample subtracted and are divided by the difference between the maximum and minimum values.
To address the small number of samples in the image training set, this embodiment uses simple, standardized data augmentation such as scaling, rotation, flipping and contrast enhancement to enlarge the data set, so that the classes in the data set are as balanced as possible, improving the model's ability to recognize minority classes while preventing overfitting and reducing the model's variance.
The feature extraction network model construction module is used for constructing a feature extraction network model, wherein the feature extraction network model comprises a feature pre-extraction network model, a feature coarse extraction network model, a feature fine extraction network model and a classification network, and the feature pre-extraction network model comprises a ResNet50V2 image extraction model and a text extraction model.
The feature pre-extraction network model is used for respectively extracting features of eye image sample data of an eye image and text sample data of basic information of a patient.
First, the ResNet50V2 image extraction model is used as the feature extractor for the image data, using the residual-block connection scheme; the final fully-connected layer used for classification is removed, and the result of the preceding layer is extracted as the feature vector of this modality, to facilitate the subsequent feature fusion. As shown in fig. 2, the ResNet50V2 image extraction model includes a zero-padding layer, a two-dimensional convolution layer, a zero-padding layer, a maximum pooling layer, a residual module, a batch normalization layer, a linear rectification unit, an average pooling layer, and a full connection layer, which are sequentially connected. The residual module comprises a plurality of residual blocks connected in sequence; the last residual block comprises two basic blocks, and the remaining residual blocks comprise three basic blocks. Each basic block comprises a first batch normalization layer and a first linear rectification unit which are sequentially connected; the output of the first linear rectification unit is divided into two parallel paths: one path passes sequentially through the first two-dimensional convolution layer, the second batch normalization layer, the second linear rectification unit, zero padding, the second two-dimensional convolution layer, the third batch normalization layer and the third linear rectification unit, and is input into the third two-dimensional convolution layer; the other path is input into the fourth two-dimensional convolution layer; and the outputs of the third and fourth two-dimensional convolution layers are fused as the output of the whole basic block.
The ResNet50V2 image extraction model uses residual-block connections, which can solve the problems of gradient vanishing and gradient explosion in deep neural networks, thereby realizing a deeper network structure and higher classification precision. The residual block is the core of the ResNet50V2 image extraction model; its structure is shown in fig. 3.
For the preprocessed low-dimensional text data, the text extraction model in this embodiment adopts two fully-connected layers to extract features from the text data, obtaining its feature vectors. These low-dimensional feature vectors better describe the semantic information and structural characteristics of the text data and provide stronger support for the subsequent feature fusion.
For the two modalities with relatively large differences, image data and text data, this embodiment adopts late fusion: features are extracted from the two modalities separately, and after the feature vectors of both modalities are obtained, feature fusion based on a joint multi-head attention mechanism is performed. To this end, the feature coarse extraction network model and the feature fine extraction network model are feature extraction networks with a joint attention mechanism; unlike the self-attention structure of the conventional Transformer, these networks take the image information as K and V in the attention mechanism and the text information as Q. The meanings of the Q, K and V matrices are as follows:
Q represents all the information of the text and is the content to be learned.

K is the keyword information, i.e. the prompt available to the model before it has seen the content.

V is the learning content, i.e. the information the model represents according to the keywords; for this reason V is typically initialized identically to Q.

$d_k$, the dimension of the matrix, refers to the dimension of the information contained in one sample of the matrices Q and K.
The feature coarse extraction network model and the feature fine extraction network model each comprise a plurality of self-attention network blocks arranged in sequence;

the image information is taken as K and V in the attention mechanism, and the text information as Q; throughout the attention mechanism, the model scores the degree of attention paid to the image information conditioned on the text information, obtaining the matrix $\mathrm{softmax}(QK^{T}/\sqrt{d_k})$, which acts as a weight on the result V obtained by analysing the image information, yielding the final attention result.
The feature coarse extraction network model and the feature fine extraction network model have the following characteristic: since K and V are obtained from images, their feature dimension M is at the pixel level and relatively large, whereas the text feature dimension N, obtained from the text information, is far smaller than M. When the Q and K matrices are fused as features, each dimension of Q is mapped onto K and corresponds to a region, namely the part on which the attention mechanism focuses. After the first attention layer, the scale of the vector sent into the subsequent network becomes N × d_v, which greatly reduces the subsequent computational complexity; the time complexity is

$$O(N \cdot d_v)$$

wherein N is the dimension of the text information matrix, $d_v$ is the dimension of the image region onto which one piece of text information is mapped, and $N \cdot d_v$ expresses that each dimension of the text matrix is mapped to a corresponding region of the image.
The general strabismus classification task can be divided into normal, horizontal strabismus and vertical strabismus. The major classes of horizontal strabismus and vertical strabismus can each be divided into several subclasses, and the final purpose of this embodiment is to subdivide into these subclasses. Because some horizontal strabismus subtypes and vertical strabismus subtypes are highly similar in their features, a directly performed classification task would have low accuracy on similar subclasses, while a neural network's learning accuracy is higher the smaller the number of classes. If the major classes are first separated by a three-way classification and the subclasses are then subdivided under the guidance of the major-class classification, the overall classification accuracy is relatively high. The classification network of this embodiment therefore adopts a multi-level classification based on a fully-connected layer and divides the whole classification task into two parts: parent-class classification and subclass classification. Both classifications perform their task by adding a linear layer after the feature extraction network. The parent class guides the subclass, but the training processes of the two are independent during model training: they have different loss functions and iterate separately. As for the gradient updates of the parameters, the parameters after the coarse feature extraction network in the classification network are determined only by the back-propagated loss of the subclass classification, while the network before the coarse feature extraction network is updated jointly by the losses of both classifications. Specifically:
For the parent-class classification: classifying the strabismus task into the three classes of normal, horizontal strabismus and vertical strabismus is a relatively simple and accurate task; the overall differences between the classes are large, and the complete features of the data need not be learned. This embodiment adopts a design that divides the feature extraction network into two parts, the front part being the coarse feature extraction network and the back part being the fine feature extraction network; when the coarse feature extraction network finishes and the coarse features enter the fine feature extraction network, a new network branch is created, and a linear layer is attached to the features to perform the three-way classification task, namely the parent-class classification task. Although the parent-class classification results are also updated continuously with the network, within the same training step the parent-class classification result is taken to be accurate and used to guide the subclass classification task.
For the subclass classification: after the whole feature extraction network finishes, a linear layer and a softmax function are added at the end to perform the final classification task. The result of the parent-class classification is obtained before this classification is performed, and at this point the parent-class result is fully trusted to be correct. All results in this classification that do not accord with the parent-class classification are therefore discarded, the result with the largest softmax value among the remainder is selected as the classification result, and finally one of the following categories is output: normal, comitant exotropia, A-pattern exotropia, V-pattern exotropia, other incomitant exotropia, comitant esotropia, A-pattern esotropia, V-pattern esotropia, other incomitant esotropia, left hypertropia, or right hypertropia.
The feature extraction network model training module is used for training the feature extraction network model constructed by the feature extraction network model construction module by adopting the sample data acquired by the sample data acquisition module;
the method comprises the steps of taking eye image sample data as input of a ResNet50V2 image extraction model, taking text sample data as input of a text extraction model, taking output of the ResNet50V2 image extraction model and the text extraction model as input of a characteristic coarse extraction network model, taking output of the characteristic coarse extraction network model as input of a characteristic fine extraction network model, taking output of the characteristic fine extraction network model as input of a classification network, and outputting a classification result by the classification network.
When the feature extraction network model is trained, the residual block of the feature pre-extraction network model is activated nonlinearly by the ReLU function, reducing the redundancy of the information in the data. The upper-layer features are used directly in the x part, and the cross-layer connections realize feature sharing and information transfer between the front and back convolution layers, which can speed up the model's learning and accelerate convergence through parameter optimization. This structure also makes the mapping F(x) more sensitive to changes in the output. The specific formula is:
$$x_{l+1} = f\big(x_l + \mathcal{F}(x_l, W_l)\big)$$

wherein $\mathcal{F}$ represents the residual function of the sequence of residual units, $x_l$ represents the input of the residual unit, $W_l$ is the series of weights and biases associated with the residual unit, $l$ is the network-layer index of the residual unit, and $f$ represents the activation function, for which a ReLU is typically used.
In this task the machine can see only the patient's eye photograph, but the range of the region of interest in the picture is corrected and fused through the patient's related electronic-case data, so that the model comprehensively learns the picture together with the patient's related electronic-case features. Accordingly, when the feature extraction network model is trained, the forward attention of the feature coarse extraction network model and the feature fine extraction network model is calculated as:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

wherein $d_k$ represents the dimension of the matrix, i.e. the dimension of the information contained in one sample of the matrices Q and K, and $T$ represents the transpose.
Because the network receives data of two different modalities through two different encoding networks, the Q, K and V matrices are obtained by applying separate fully connected layers to the data, and learning likewise comes from updating these fully connected layers. Denoting the image data by G and the text data by H, the linear layer parameters of the three matrices are $W^{Q}$, $W^{K}$ and $W^{V}$, and the actual propagation formula is:
$$\mathrm{Attention}(G, H) = \mathrm{softmax}\!\left(\frac{(H W^{Q})\,(G W^{K})^{T}}{\sqrt{d_k}}\right) G W^{V}$$

wherein G represents the image data, H represents the text data, $d_k$ represents the matrix dimension, $W^{Q}$, $W^{K}$ and $W^{V}$ represent the linear layer parameters of the three matrices Q, K and V, and $T$ represents the transpose.
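A sketch of this cross-modal propagation, with Q projected from the text data H and K, V projected from the image data G through separate linear layers; the shared feature dimension and the use of PyTorch are assumptions for illustration:

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Attention(G, H) = softmax((H W^Q)(G W^K)^T / sqrt(d_k)) (G W^V)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.W_q = nn.Linear(d_model, d_model)   # W^Q, applied to text data H
        self.W_k = nn.Linear(d_model, d_model)   # W^K, applied to image data G
        self.W_v = nn.Linear(d_model, d_model)   # W^V, applied to image data G

    def forward(self, G: torch.Tensor, H: torch.Tensor) -> torch.Tensor:
        Q, K, V = self.W_q(H), self.W_k(G), self.W_v(G)
        d_k = Q.size(-1)
        scores = Q @ K.transpose(-2, -1) / d_k ** 0.5   # (H W^Q)(G W^K)^T / sqrt(d_k)
        return torch.softmax(scores, dim=-1) @ V        # weights act on G W^V
```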
The multi-head attention mechanism employed in this embodiment splits the data dimension $d$ into $h$ groups of Q, K and V matrices of dimension $d/h$ each, where $h$ denotes the number of heads, instead of using a single group. Each QKV group obtains a portion of the dimensional features of the data and learns an attention mechanism over that portion, and finally all heads are concatenated together. This structural design lets each attention mechanism optimize a different feature part of the data, balancing the bias that a single shared attention mechanism might produce and giving the data a more diverse representation, which noticeably improves the model's performance. A multi-head sketch is given below.
And the strabismus real-time classification module is used for acquiring real-time eye image data and text data, inputting the eye image data and the text data into the feature extraction network model trained by the feature extraction network model training module, and outputting a classification result by the feature extraction network model.
Example 3
A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method for classifying the strabismus type of an eye image.
The computer device may be a computing device such as a desktop computer, a notebook computer, a handheld computer, or a cloud server. The computer device may perform human-computer interaction with a user through a keyboard, a mouse, a remote control, a touch pad, a voice control device, or the like.
The memory includes at least one type of readable storage medium, including flash memory, a hard disk, a multimedia card, card memory (e.g., an SD or DX memory card), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Programmable Read-Only Memory (PROM), magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory may be an internal storage unit of the computer device, such as the hard disk or internal memory of the computer device. In other embodiments, the memory may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card equipped on the computer device. Of course, the memory may also include both an internal storage unit and an external storage device of the computer device. In this embodiment, the memory is typically used to store the operating system and the various kinds of application software installed on the computer device, for example the program code of the method for classifying the strabismus type of an eye image. In addition, the memory may be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor is typically used to control the overall operation of the computer device. In this embodiment, the processor is configured to run the program code stored in the memory or to process data, for example the program code of the method for classifying the strabismus type of an eye image.
Example 4
A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method for classifying the strabismus type of an eye image.
Wherein the computer-readable storage medium stores a program executable by at least one processor, so as to cause the at least one processor to perform the steps of the method for classifying the strabismus type of an eye image as described above.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disk) and comprising several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the method for classifying the strabismus type of an eye image according to the embodiments of the present application.

Claims (10)

1. A method for classifying the strabismus type of an eye image, characterized by comprising the following steps:
step S1, obtaining sample data
Acquiring sample data, wherein the sample data comprises eye image sample data and text sample data;
s2, constructing a feature extraction network model
Constructing a feature extraction network model, wherein the feature extraction network model comprises a feature pre-extraction network model, a feature coarse extraction network model, a feature fine extraction network model and a classification network, and the feature pre-extraction network model comprises a ResNet50V2 image extraction model and a text extraction model;
step S3, training a feature extraction network model
Training the feature extraction network model constructed in the step S2 by adopting the sample data acquired in the step S1;
taking the eye image sample data as the input of the ResNet50V2 image extraction model, taking the text sample data as the input of the text extraction model, taking the outputs of the ResNet50V2 image extraction model and the text extraction model as the input of the feature coarse extraction network model, taking the output of the feature coarse extraction network model as the input of the feature fine extraction network model, taking the output of the feature fine extraction network model as the input of the classification network, and outputting a classification result by the classification network;
Step S4, strabismus real-time classification
And (3) acquiring real-time eye image data and text data, inputting the eye image data and the text data into the feature extraction network model trained in the step (S3), and outputting a classification result by the feature extraction network model.
2. The method for classifying the strabismus type of an eye image according to claim 1, wherein: in step S2, the ResNet50V2 image extraction model comprises a zero-padding layer, a two-dimensional convolution layer, a zero-padding layer, a maximum pooling layer, a residual module, a batch normalization layer, a linear rectification unit, an average pooling layer and a fully connected layer, connected in sequence.
3. The method for classifying the strabismus type of an eye image according to claim 2, wherein: the residual module comprises a plurality of residual blocks connected in sequence, wherein the last residual block comprises two basic blocks and each of the remaining residual blocks comprises three basic blocks.
4. The method for classifying the strabismus type of an eye image according to claim 3, wherein: each basic block comprises a first batch normalization layer and a first linear rectification unit connected in sequence; the output of the first linear rectification unit is divided into two parallel paths: one path passes in sequence through a first two-dimensional convolution layer, a second batch normalization layer, a second linear rectification unit, zero padding, a second two-dimensional convolution layer, a third batch normalization layer and a third linear rectification unit before entering a third two-dimensional convolution layer, and the other path enters a fourth two-dimensional convolution layer; the outputs of the third two-dimensional convolution layer and the fourth two-dimensional convolution layer are fused as the output of the whole basic block.
5. The method for classifying the strabismus type of an eye image according to claim 1, wherein: in step S3, when the feature extraction network model is trained, the residual blocks of the feature pre-extraction network model are activated nonlinearly by means of a ReLU function, according to the specific formula:
$$x_{l+1} = f\left(x_l + \mathcal{F}(x_l, W_l)\right)$$

wherein $\mathcal{F}$ represents the residual mapping computed by the sequence of layers in the residual unit, $x_l$ represents the input of the residual unit, $W_l$ is the series of weights and biases associated with the residual unit, $l$ is the network layer number of the residual unit, and $f$ represents the activation function.
6. The method for classifying the strabismus type of an eye image according to claim 1, wherein: in step S2, the feature coarse extraction network model and the feature fine extraction network model are both feature extraction networks with a joint attention mechanism and comprise a plurality of self-attention network blocks arranged in sequence;
the image information is taken as K and V in the attention mechanism and the text information is taken as Q; throughout the attention mechanism, the text information scores the model's degree of attention to the image information to obtain the weight matrix $\mathrm{softmax}\!\left(QK^{T}/\sqrt{d_k}\right)$, and this matrix acts as a weight on the result V obtained from analysing the image information to give the final attention calculation result.
7. The method for classifying the strabismus type of an eye image according to claim 6, wherein: in step S3, when the feature extraction network model is trained, the forward attention of the feature coarse extraction network model and the feature fine extraction network model is calculated as follows:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

wherein $d_k$ represents the matrix dimension, i.e. the dimensionality of the information contained in one sample in the matrices Q and K, and $T$ represents the transpose.
8. A system for classifying the strabismus type of an eye image, comprising:
the sample data acquisition module is used for acquiring sample data, wherein the sample data comprises eye image sample data and text sample data;
the feature extraction network model construction module is used for constructing a feature extraction network model, wherein the feature extraction network model comprises a feature pre-extraction network model, a feature coarse extraction network model, a feature fine extraction network model and a classification network, and the feature pre-extraction network model comprises a ResNet50V2 image extraction model and a text extraction model;
the feature extraction network model training module is used for training the feature extraction network model constructed by the feature extraction network model construction module by adopting the sample data acquired by the sample data acquisition module;
taking the eye image sample data as the input of the ResNet50V2 image extraction model, taking the text sample data as the input of the text extraction model, taking the outputs of the ResNet50V2 image extraction model and the text extraction model as the input of the feature coarse extraction network model, taking the output of the feature coarse extraction network model as the input of the feature fine extraction network model, taking the output of the feature fine extraction network model as the input of the classification network, and outputting a classification result by the classification network;
and the strabismus real-time classification module is used for acquiring real-time eye image data and text data, inputting the eye image data and the text data into the feature extraction network model trained by the feature extraction network model training module, and outputting a classification result by the feature extraction network model.
9. A computer device, characterized by: comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized by: a computer program is stored which, when executed by a processor, causes the processor to perform the steps of the method according to any one of claims 1 to 7.
CN202310613349.1A 2023-05-29 2023-05-29 Method, system, equipment and storage medium for classifying strabismus type of eye image Active CN116385806B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310613349.1A CN116385806B (en) 2023-05-29 2023-05-29 Method, system, equipment and storage medium for classifying strabismus type of eye image


Publications (2)

Publication Number Publication Date
CN116385806A true CN116385806A (en) 2023-07-04
CN116385806B CN116385806B (en) 2023-09-08

Family

ID=86969727

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310613349.1A Active CN116385806B (en) 2023-05-29 2023-05-29 Method, system, equipment and storage medium for classifying strabismus type of eye image

Country Status (1)

Country Link
CN (1) CN116385806B (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9462945B1 (en) * 2013-04-22 2016-10-11 VisionQuest Biomedical LLC System and methods for automatic processing of digital retinal images in conjunction with an imaging device
CN108229580A (en) * 2018-01-26 2018-06-29 浙江大学 Sugared net ranking of features device in a kind of eyeground figure based on attention mechanism and Fusion Features
WO2020048183A1 (en) * 2018-09-04 2020-03-12 上海海事大学 Vessel type identification method based on coarse-to-fine cascaded convolutional neural network
WO2020098257A1 (en) * 2018-11-14 2020-05-22 平安科技(深圳)有限公司 Image classification method and device and computer readable storage medium
CN111309919A (en) * 2020-03-23 2020-06-19 智者四海(北京)技术有限公司 System and training method of text classification model
CN112052845A (en) * 2020-10-14 2020-12-08 腾讯科技(深圳)有限公司 Image recognition method, device, equipment and storage medium
CN113065577A (en) * 2021-03-09 2021-07-02 北京工业大学 Multi-modal emotion classification method for targets
US20220058449A1 (en) * 2020-08-20 2022-02-24 Capital One Services, Llc Systems and methods for classifying data using hierarchical classification model
CN114462567A (en) * 2021-12-15 2022-05-10 西安邮电大学 Attention mechanism-based neural network model
CN114724231A (en) * 2022-04-13 2022-07-08 东北大学 Glaucoma multi-modal intelligent recognition system based on transfer learning
CN115019380A (en) * 2022-06-07 2022-09-06 广州医科大学 Strabismus intelligent identification method, device, terminal and medium based on eye image
CN115424319A (en) * 2022-08-16 2022-12-02 温州医科大学附属眼视光医院 Strabismus recognition system based on deep learning
CN115512153A (en) * 2022-09-21 2022-12-23 哈尔滨理工大学 Retina OCT image classification method, system, computer equipment and storage medium based on multi-scale residual error network
CN115937604A (en) * 2022-12-27 2023-04-07 西南大学 anti-NMDAR encephalitis prognosis classification method based on multi-modal feature fusion
CN116168403A (en) * 2023-01-17 2023-05-26 智慧眼科技股份有限公司 Medical data classification model training method, classification method, device and related medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
马超; 刘亚淑; 骆功宁; 王宽全: "3D MR image segmentation based on cascaded random forests and active contours" (基于级联随机森林与活动轮廓的3D MR图像分割), 自动化学报 (Acta Automatica Sinica), no. 05 *
黎彪; 丁雅?; 邵毅: "Research progress on the application of artificial intelligence in pediatric ophthalmology" (人工智能在小儿眼科领域的应用研究进展), 国际眼科杂志 (International Eye Science), no. 08 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant