CN111428072A - Ophthalmologic multimodal image retrieval method, apparatus, server and storage medium - Google Patents


Info

Publication number
CN111428072A
Authority
CN
China
Prior art keywords: image, eye, images, modal, ophthalmic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010242450.7A
Other languages
Chinese (zh)
Inventor
方建生 (Fang Jiansheng)
刘江 (Liu Jiang)
Current Assignee
Southwest University of Science and Technology
Southern University of Science and Technology
Original Assignee
Southwest University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Southwest University of Science and Technology
Priority to CN202010242450.7A
Publication of CN111428072A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50: Information retrieval of still image data
    • G06F16/58: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583: Retrieval using metadata automatically derived from the content
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50: Information retrieval of still image data
    • G06F16/55: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/25: Fusion techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18: Eye characteristics, e.g. of the iris
    • G06V40/193: Preprocessing; Feature extraction
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18: Eye characteristics, e.g. of the iris
    • G06V40/197: Matching; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03: Recognition of patterns in medical or anatomical images

Abstract

The embodiment of the invention discloses a method, an apparatus, a server and a storage medium for retrieving ophthalmic multi-modal images. The method comprises the following steps: acquiring a current eye image of a user, wherein the eye image is an ophthalmic digital image and comprises eye images of multiple different modalities; correspondingly inputting the eye images of the multiple different modalities into a plurality of pre-trained deep learning models, fusing and training the similarity and dissimilarity of the eye images across modalities, outputting a plurality of feature vectors, and generating a comparative analysis result from the plurality of feature vectors. By acquiring a single-modality eye image of the user, the system performs recognition in the deep learning model and outputs a multi-modality recognition result. This solves the prior-art problem that other potential eye diseases of the user cannot be discovered from a single-modality eye image, achieves the retrieval of other possible eye diseases from a single-modality eye image, and improves the user experience.

Description

Ophthalmologic multimodal image retrieval method, apparatus, server and storage medium
Technical Field
The present invention relates to retrieval technologies, and in particular, to a method, an apparatus, a server, and a storage medium for retrieving an ophthalmic multimodal image.
Background
With the development of imaging technology, digital ophthalmic images have become the primary data of ophthalmology, a trend that drives the construction of ophthalmic image retrieval functions to assist doctors' clinical decisions. Traditionally, the ophthalmic image retrieval function has used a text-based method: an image is first described in text (a correspondence between the text and the image is established), a keyword query is entered at retrieval time, and a ranked result is returned. This method of finding images by words suffers from the semantic gap between text descriptions and image content, which degrades retrieval quality. With the development of computer vision, Content-Based Image Retrieval (CBIR) methods have begun to be applied in ophthalmology. CBIR, which combines information retrieval, computer vision and related fields, retrieves the most similar images from the content of the image itself, searching by features such as color, shape and texture, and thus avoids the semantic gap between text description and image content. In recent years, in the field of medical imaging, deep learning algorithms represented by deep convolutional neural networks (CNNs) have achieved excellent performance in disease classification and lesion segmentation of ophthalmic images, and outperform traditional classifiers (such as the Support Vector Machine (SVM) and Random Forest (RF)) at extracting features such as texture, color and morphology, providing a technical basis for the construction of an image retrieval function.
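As a toy illustration of the content-based retrieval idea described above, the following sketch ranks images by a simple per-channel color histogram, a stand-in for the color, shape and texture features the text mentions; the feature choice, function names and data are illustrative assumptions, not from the patent:

```python
import numpy as np

def color_histogram(image, bins=8):
    """Flattened per-channel intensity histogram, L1-normalised; a very
    simple CBIR feature vector of length 3 * bins for an H x W x 3 image."""
    feats = [np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    v = np.concatenate(feats).astype(np.float64)
    return v / v.sum()

def retrieve(query, database, top_n=3):
    """Rank database images by L1 histogram distance (smallest first)."""
    q = color_histogram(query)
    dists = [np.abs(q - color_histogram(img)).sum() for img in database]
    return [int(i) for i in np.argsort(dists)[:top_n]]

# Toy demo: a dark query should rank the dark database image first.
rng = np.random.default_rng(0)
bright = rng.integers(192, 256, size=(32, 32, 3), dtype=np.uint8)
dark = rng.integers(0, 64, size=(32, 32, 3), dtype=np.uint8)
query = rng.integers(0, 64, size=(32, 32, 3), dtype=np.uint8)
ranking = retrieve(query, [bright, dark], top_n=2)
print(ranking)  # prints [1, 0]
```

Real CBIR systems replace the hand-crafted histogram with learned features, which is exactly the direction the rest of this document takes.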
Diseases of many parts of the human body manifest themselves through pathological changes of the eye, so academia and industry have widely devoted themselves to screening diseases automatically by analysing digital ophthalmic images with artificial-intelligence algorithms, and related results have been published. Examples include the White Eye Detector, free software developed at Baylor University (Texas, USA) that screens for eye cancer from photographs; BiliScreen, software developed at the University of Washington that screens for liver cancer from eye color; and the intelligent screening fundus cameras introduced by domestic health companies. However, automatic disease screening based on ophthalmic medical imaging and image-processing technology still has problems: the algorithms face challenges of interpretability and accuracy, and the samples used to train the models are hard to acquire and suffer from subjective, ambiguous labelling, so clinical application still has a long way to go. More importantly, although a computer screening result serves only as an auxiliary reference for the doctor, it more or less influences the doctor's judgment and may thus affect the final diagnosis.
Disclosure of Invention
The invention provides a method, an apparatus, a server and a storage medium for retrieving ophthalmic multi-modal images, so as to achieve the effect of retrieving other potential ophthalmic problems through a single-modality eye image.
In a first aspect, an embodiment of the present invention provides a method for retrieving ophthalmic multi-modal images, including: acquiring a current eye image of a user, wherein the eye image is an ophthalmic digital image and comprises eye images of multiple different modalities;
correspondingly inputting the eye images of the multiple different modalities into a plurality of pre-trained deep learning models, fusing and training the similarity and dissimilarity of the eye images across modalities, outputting a plurality of feature vectors, and generating a comparative analysis result from the plurality of feature vectors.
Optionally, the ophthalmic digital image includes: fundus, corneal nerve and OCT images.
Optionally, the deep learning model is a multi-modal convolutional neural network model.
Optionally, before the single-modality eye image is input into a pre-trained deep learning model and a multi-modality recognition result is output, the method further includes:
marking the sample image by using various labels and establishing a database;
and establishing a multi-modal convolutional neural network and training the multi-modal convolutional neural network model by using the sample image to obtain a trained multi-modal convolutional neural network model.
Optionally, the marking the sample image with a plurality of labels and establishing a database includes:
if the sample images belong to the same patient, marking them with a first label;
if the sample images belong to the same case, marking them with a second label;
if the sample images are unrelated, marking them with a third label;
and establishing a database of sample images marked with the first label, the second label and the third label.
Optionally, the multi-modal convolutional neural network model includes: a fundus image convolutional neural network model, a corneal nerve image convolutional neural network model and an OCT image convolutional neural network model.
Optionally, the comparative analysis result includes: the most similar sample images and their detailed case records.
In a second aspect, an embodiment of the present invention further provides an apparatus for retrieving ophthalmic multi-modal images, the apparatus including:
a data acquisition module, configured to acquire a current eye image of a user, wherein the eye image is an ophthalmic digital image and comprises eye images of multiple different modalities;
and a data recognition module, configured to correspondingly input the eye images of the multiple different modalities into a plurality of pre-trained deep learning models, fuse and train the similarity and dissimilarity of the eye images across modalities, output a plurality of feature vectors, and generate a comparative analysis result from the plurality of feature vectors.
In a third aspect, an embodiment of the present invention further provides a server, where the server includes:
one or more processors;
a storage device for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors implement the method for retrieving an ophthalmic multimodal image as described in any of the above.
In a fourth aspect, an embodiment of the present invention further provides a storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for retrieving an ophthalmic multimodal image as described in any one of the above.
The embodiment of the invention discloses a method, an apparatus, a server and a storage medium for retrieving ophthalmic multi-modal images, the method comprising the following steps:
acquiring a current eye image of a user, wherein the eye image is an ophthalmic digital image and comprises eye images of multiple different modalities; correspondingly inputting the eye images of the multiple different modalities into a plurality of pre-trained deep learning models, fusing and training the similarity and dissimilarity of the eye images across modalities, outputting a plurality of feature vectors, and generating a comparative analysis result from the plurality of feature vectors. With the method for retrieving multi-modal ophthalmic images provided by the embodiment of the invention, the system acquires a single-modality eye image of the user, performs recognition in the deep learning model and outputs a multi-modality recognition result. This solves the prior-art problem that other potential eye diseases of the user cannot be discovered from a single-modality eye image, achieves the retrieval of other possible eye diseases from a single-modality eye image, and improves the user experience.
Drawings
Fig. 1 is a flowchart of a method for retrieving multi-modal ophthalmic images according to an embodiment of the present invention;
fig. 2 is a flowchart of a method for retrieving multi-modal ophthalmic images according to a second embodiment of the present invention;
fig. 3 is a schematic structural diagram of a retrieval apparatus for multi-modal ophthalmic images according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of a server according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the steps as a sequential process, many of the steps can be performed in parallel, concurrently or simultaneously. In addition, the order of the steps may be rearranged. A process may be terminated when its operations are completed, but may have additional steps not included in the figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc.
Furthermore, the terms "first," "second," and the like may be used herein to describe various orientations, actions, steps, elements, or the like, but the orientations, actions, steps, or elements are not limited by these terms. These terms are only used to distinguish one direction, action, step or element from another direction, action, step or element. For example, a first label may be referred to as a second label, and similarly, a second label may be referred to as a first label, without departing from the scope of the present application. The first label and the second label are both labels, but they are not the same label. The terms "first", "second", etc. are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Example one
Fig. 1 is a flowchart of a method for retrieving ophthalmic multi-modal images according to the first embodiment of the present invention. The method is applicable where a user performs ophthalmic disease retrieval online, and specifically includes the following steps:
step 100, obtaining a current eye image of a user, wherein the eye image is an ophthalmic digital image and comprises eye images of multiple different modes.
In this embodiment, an eye image uploaded or collected by the user is acquired. The eye image is an ophthalmic digital image; ophthalmic digital imaging methods include ocular surface color photography, fundus color photography, Optical Coherence Tomography (OCT), anterior segment OCT, In Vivo Confocal Microscopy (IVCM), Fundus Fluorescein Angiography (FFA) and Indocyanine Green Angiography (ICGA). In this embodiment, the ophthalmic digital image includes: fundus, corneal nerve and OCT images. Imaging devices, among them the ophthalmoscope, the slit lamp and optical coherence tomography, output digital images by observing the morphology of tissue structures of the human eye such as blood vessels, nerves, the cornea, the crystalline lens and the iris. The ophthalmic digital images produced by different imaging devices differ in resolution and in the region imaged, and serve the diagnosis of different disease types; posterior segment optical coherence tomography, for example, is of great value in the clinical examination and diagnosis of retinal diseases, macular diseases, optic nerve diseases, glaucoma and the like. In terms of data form, each imaging method produces one modality, so multiple imaging methods produce multiple digital images, i.e. multiple modalities.
Step 110, correspondingly inputting the eye images of the multiple different modalities into a plurality of pre-trained deep learning models, fusing and training the similarity and dissimilarity of the eye images across modalities, outputting a plurality of feature vectors, and generating a comparative analysis result from the plurality of feature vectors.
In this embodiment, the deep learning model is a multi-modal convolutional neural network model. The eye images of the different modalities are input into the plurality of pre-trained deep learning models, in which the similarity and dissimilarity of images across modalities have been fused and trained; a plurality of feature vectors are output, and a comparative analysis result is generated from them. The comparative analysis result includes: the most similar sample images and their detailed case records. Returning multi-modal sample images together with their case descriptions provides the user or the doctor with a reference, making it easier to judge whether the user may have other eye diseases that the single-modality eye image cannot reveal.
Illustratively, certain ocular diseases do not always occur alone but appear together, and an elderly patient often has several age-related eye diseases at once, for example a fundus disease together with a cataract; the fundus is generally examined with an ophthalmoscope, while the cataract is examined with OCT, i.e. imaging methods of different modalities suit different diseases. When the patient in a case has multiple eye lesions, digital images of multiple modalities may be needed for diagnosis, so constructing similarity relations between digital images of different modalities belonging to the same case has application value. Through case-level retrieval, the likelihood of multiple lesions across multiple modalities can be retrieved; in such scenarios, multi-modal retrieval genuinely helps discover multiple lesions. If a patient is known to have disease A, the multi-modal retrieval function takes a digital image of one of the patient's modalities as the query and may return a digital image of another modality; if the returned image is related to disease B, the patient may have both diseases A and B. Such retrieval is valuable when the need is partly definite and partly vague, and the vague part becomes definite through the retrieval results. In text search, for example, a user who wants a book and knows only part of its title can query with the known part; the search engine returns many similar results, which may include the wanted book, so the user learns its full title. Retrieval helps users clarify their needs amid vast amounts of information and often yields unexpected findings.
For scenarios with multiple lesions, multi-modal image retrieval builds on such unexpected results and assists doctors in mining further information to judge the lesions.
For example, when making a diagnosis, a doctor pulls up an ophthalmic image of a patient; when the image makes a decision difficult, the doctor deliberately searches for similar cases to consult. If the search is restricted to the modality of the query image alone, it may face a shortage of cases in that modality, and the similar cases found may still not support a diagnostic conclusion; the search can then be extended to other modalities, whose results assist the diagnosis. If the patient in a case has taken a digital eye image in only one modality, multi-modal retrieval can assist the doctor. If the patient has taken digital images in multiple modalities, the images could be retrieved modality by modality, but that makes the diagnosis time-consuming for the doctor. When facing a difficult and complicated condition with only one modality of ophthalmic digital image available, multi-modal retrieval shows its functional value; even when the patient has taken several ophthalmic digital images, multi-modal retrieval spares the doctor multiple per-modality searches and improves diagnostic efficiency.
This embodiment discloses a method for retrieving multi-modal ophthalmic images, comprising: acquiring a current eye image of a user, wherein the eye image is an ophthalmic digital image and comprises eye images of multiple different modalities; correspondingly inputting the eye images of the multiple different modalities into a plurality of pre-trained deep learning models, fusing and training their similarity and dissimilarity, outputting a plurality of feature vectors, and generating a comparative analysis result from the feature vectors. With this method, the system recognizes a single-modality eye image of the user in the deep learning model and outputs a multi-modality recognition result, solving the prior-art problem that other potential eye diseases of the user cannot be discovered from a single-modality eye image, achieving the retrieval of other possible eye diseases from a single-modality eye image, and improving the user experience.
Example two
Fig. 2 is a flowchart of a method for retrieving ophthalmic multi-modal images according to the second embodiment of the present invention. The method is applicable where a user performs ophthalmic disease retrieval online, and specifically includes the following steps:
step 200, acquiring a current eye image of a user, wherein the eye image is an ophthalmic digital image and comprises eye images of multiple different modes.
Step 210, labeling the sample image with a plurality of labels and building a database.
Specifically, step 210 includes: if the sample images belong to the same patient, marking the sample images as a first label;
if the sample image belongs to the same case, marking the sample image as a second label;
if the sample image is not relevant, marking as a third label;
and establishing a database of sample images marked with the first label, the second label and the third label.
In this embodiment, the multi-modal ophthalmic digital images are labelled at three levels: the third label, 0, means unrelated; the second label, 1, means related at the case level; and the first label, 2, means related at the disease level, so relevance increases with the label. Assume three modalities A, B and C, namely a corneal nerve map, a fundus map and an anterior segment OCT; labelling is applied mainly to the training samples. Images of the three modalities are labelled 1 if they belong to the same case (the same patient), 2 if they belong to the same disease, and 0 if they are unrelated; if only two modalities exist, the digital images of the two modalities are labelled. Image-pair samples are generated mainly according to the correlation of images across modalities; in theory, the higher the correlation, the more similar the learned hash codes.
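A minimal sketch of this three-level pair labelling, assuming hypothetical `case_id` and `disease` fields on each sample record (the field names and data layout are assumptions, not specified in the patent):

```python
def pair_label(a, b):
    """Three-level relevance label for a cross-modal image pair, following
    the scheme above: 0 = unrelated, 1 = same case (same patient),
    2 = same disease (the highest relevance)."""
    if a["disease"] == b["disease"]:
        return 2
    if a["case_id"] == b["case_id"]:
        return 1
    return 0

# Toy records spanning three modalities (fundus, OCT, corneal nerve map).
samples = [
    {"id": "fundus_01", "case_id": "c1", "disease": "glaucoma"},
    {"id": "oct_01",    "case_id": "c1", "disease": "cataract"},
    {"id": "oct_02",    "case_id": "c2", "disease": "glaucoma"},
    {"id": "cornea_01", "case_id": "c3", "disease": "keratitis"},
]
pairs = [(a["id"], b["id"], pair_label(a, b))
         for i, a in enumerate(samples) for b in samples[i + 1:]]
print(pairs)
```

Here the disease-level check is applied before the case-level check, so the higher label takes precedence when both conditions hold.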
And step 220, establishing a multi-modal convolutional neural network and training the multi-modal convolutional neural network model by using the sample image to obtain a trained multi-modal convolutional neural network model.
In the present embodiment, the multi-modal convolutional neural network model includes: a fundus image convolutional neural network model, a corneal nerve image convolutional neural network model and an OCT image convolutional neural network model. The plurality of neural network models are trained by computing a loss function over the generated hash codes.
Illustratively, taking the two modalities of the fundus map and the OCT map as an example, the image pair of the two modalities (a fundus map and an OCT map) is input at the leftmost side; the intermediate network structure layers are identical, and the last layer is a hash layer that generates a hash code of length K. During training, a multi-modal loss is trained on the labels of image pairs across modalities, and the loss function is designed in a discriminative manner: the distance between the hash codes of images of different modalities sharing the same label should be smaller. This yields a model trained with a multi-modal discriminative loss function, i.e. a model-fusion method. The network structures of the different modalities are the same, but the weights are not shared (avoiding interference between modality-specific features); the loss function nevertheless fuses and trains the correlation of the two modalities, so that images of different modalities acquire a certain similarity. After training, each modality generates a discrete hash representation for its images with its own model, i.e. binarization.
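The arrangement described above (identical tower architectures per modality, independently initialised unshared weights, and one pairwise loss that ties the modalities together) can be sketched in plain NumPy; the layer sizes, the margin-based loss and all names are illustrative assumptions, not the patent's actual network:

```python
import numpy as np

rng = np.random.default_rng(42)
K = 16    # hash code length (the patent's K)
D = 128   # flattened input feature dimension (illustrative)

def make_tower():
    """One modality tower: the same two-layer architecture for every
    modality, but independently initialised (weights are NOT shared)."""
    return {"W1": rng.normal(0.0, 0.1, (D, 64)),
            "W2": rng.normal(0.0, 0.1, (64, K))}

def forward(tower, x):
    h = np.tanh(x @ tower["W1"])       # shared architecture, private weights
    return np.tanh(h @ tower["W2"])    # hash layer: values in (-1, 1)

def pairwise_loss(code_a, code_b, label, margin=float(K)):
    """Discriminative cross-modal loss: pull related pairs (label > 0)
    together, push unrelated pairs (label == 0) at least `margin` apart."""
    d = float(np.sum((code_a - code_b) ** 2))
    return d if label > 0 else max(0.0, margin - d)

fundus_tower, oct_tower = make_tower(), make_tower()
x_fundus, x_oct = rng.normal(size=D), rng.normal(size=D)
loss_related = pairwise_loss(forward(fundus_tower, x_fundus),
                             forward(oct_tower, x_oct), label=1)
print(loss_related >= 0.0)
```

Minimising this loss over many labelled pairs is what makes codes from different modalities comparable in one space; a real implementation would use an autograd framework rather than hand-rolled NumPy.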
During training, the output of the hash layer lies in [-1, 1], using a tanh activation function. When the discrete hash code representation is generated after training, the hash layer is symbolized: a sign function is applied on top of the tanh function, so that each of the K hash bits is 1 or -1, and after binarization distances can be computed in Hamming space. In brief, during training the K hash values are numbers in [-1, 1] used to train the multi-modal loss function; when hash codes are generated for images with the trained model, the K hash bits take values in {-1, 1}. Retrieval of the lesion position is crucial for case retrieval. In general, the discrimination of an image depends mainly on identifying and comparing key regions (i.e. lesion regions). If the lesion region occupies only a small fraction of two images, comparing the whole images would overlook the feature representation of the lesion region and introduce similarity errors; in short, the information of the lesion region should carry more weight than that of other regions of the image. In feature extraction, the model design therefore focuses on the feature representation of the lesion region, and the scheme introduces a spatial attention mechanism into the model to capture the features of the lesion region. Based on this CNN model with spatial attention, high-dimensional images are mapped to low-dimensional hash codes. The model-fusion method based on the multi-modal discriminative loss constructs similarity between images across modalities, and the distance between hash codes generated from images with the same label is smaller.
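A minimal sketch of the tanh-then-sign behaviour of the hash layer (the function name and inputs are illustrative):

```python
import numpy as np

def hash_layer(logits, training):
    """Hash layer output: tanh values in (-1, 1) during training, and
    sign(tanh(.)) at inference so that each of the K bits is -1 or +1."""
    relaxed = np.tanh(logits)
    if training:
        return relaxed                 # continuous and differentiable
    code = np.sign(relaxed)
    code[code == 0] = 1                # map the rare exact zero to +1
    return code.astype(int)

logits = np.array([2.3, -0.7, 0.05, -3.1])
print(hash_layer(logits, training=True))   # continuous values in (-1, 1)
print(hash_layer(logits, training=False))  # prints [ 1 -1  1 -1]
```

The continuous tanh output keeps the loss differentiable during training, while the sign step yields the discrete ±1 code that enables Hamming-distance comparison at retrieval time.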
The method based on multi-modal model fusion generates a hash code for each digital image and thus supports Hamming distance computation. The concrete scenario is as follows: a new digital image is input, a string of K hash bits is generated for it through the same network, and Hamming distances are then computed against the database samples whose hash codes have already been generated; the smaller the distance, the higher the similarity. Suppose models for the OCT and fundus modalities have been trained and the database samples have generated hash codes through these models. A doctor now inputs a fundus image for retrieval: the image generates K hash bits through the fundus-modality model, Hamming distances against the hash codes of the database samples are computed, and the n most similar results are returned. The n results may include images of both modalities, because after model fusion the digital images across modalities establish a certain similarity, which is reflected in the K hash bits.
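The Hamming-space search step described above can be sketched as follows, with toy ±1 codes standing in for model outputs:

```python
import numpy as np

def hamming(a, b):
    """Hamming distance between two ±1 hash codes of equal length."""
    return int(np.sum(a != b))

def search(query_code, db_codes, n=3):
    """Indices of the n database codes closest to the query in Hamming
    space; a smaller distance means higher similarity. The database may
    mix modalities, since fused training puts all codes in one space."""
    dists = [hamming(query_code, c) for c in db_codes]
    return sorted(range(len(db_codes)), key=lambda i: dists[i])[:n]

# Toy database of K = 8 codes; entry 2 matches the query exactly and
# entry 1 is the query's exact opposite.
db = [np.array([ 1,  1, -1, -1,  1, -1,  1, -1]),
      np.array([-1,  1,  1, -1, -1,  1,  1, -1]),
      np.array([ 1, -1, -1,  1,  1, -1, -1,  1])]
query = np.array([1, -1, -1, 1, 1, -1, -1, 1])
print(search(query, db, n=2))  # prints [2, 0]
```

In a production system the linear scan over `db_codes` would typically be replaced by bitwise XOR and popcount over packed codes, which is what makes hash-based retrieval fast at scale.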
Step 230, correspondingly inputting the eye images of the multiple different modalities into a plurality of pre-trained deep learning models, fusing and training the similarity and dissimilarity of the eye images across modalities, outputting a plurality of feature vectors, and generating a comparative analysis result from the plurality of feature vectors.
This embodiment discloses a method for retrieving multi-modal ophthalmic images, comprising: acquiring a current eye image of a user, wherein the eye image is an ophthalmic digital image and comprises eye images of multiple different modalities; marking sample images with a plurality of labels and establishing a database; establishing a multi-modal convolutional neural network and training it with the sample images to obtain a trained multi-modal convolutional neural network model; correspondingly inputting the eye images of the multiple different modalities into a plurality of pre-trained deep learning models, fusing and training their similarity and dissimilarity, outputting a plurality of feature vectors, and generating a comparative analysis result from the feature vectors. With this method, the system recognizes a single-modality eye image of the user in the deep learning model and outputs a multi-modality recognition result, solving the prior-art problem that other potential eye diseases of the user cannot be discovered from a single-modality eye image, achieving the retrieval of other possible eye diseases from a single-modality eye image, and improving the user experience.
Example Three
The apparatus for retrieving ophthalmic multi-modal images provided by this embodiment of the invention can execute the method for retrieving ophthalmic multi-modal images provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to the executed method. Fig. 3 is a schematic structural diagram of an apparatus 300 for retrieving ophthalmic multi-modal images according to an embodiment of the present invention. Referring to fig. 3, the apparatus 300 may include:
a data acquisition module 310, configured to acquire a current eye image of a user, wherein the eye image is an ophthalmic digital image and comprises eye images of a plurality of different modalities;
and a data identification module 320, configured to input the eye images of the plurality of different modalities into the corresponding pre-trained deep learning models, fuse and train on the similarity and dissimilarity of the eye images of the different modalities, output a plurality of feature vectors, and generate a comparative analysis result from the plurality of feature vectors.
Further, the ophthalmic digital image comprises: fundus images, corneal nerve images, and OCT images.
Further, the deep learning model is a multi-modal convolutional neural network model.
Further, before the acquiring of the current eye image of the user, wherein the eye image is an ophthalmic digital image and comprises eye images of a plurality of different modalities, the method further comprises:
marking sample images with a plurality of labels and establishing a database;
and establishing a multi-modal convolutional neural network and training the multi-modal convolutional neural network model with the sample images to obtain a trained multi-modal convolutional neural network model.
Further, the marking sample images with a plurality of labels and establishing a database comprises:
if sample images belong to the same patient, marking them with a first label;
if sample images belong to the same case, marking them with a second label;
if sample images are unrelated, marking them with a third label;
and establishing a database of the sample images marked with the first, second and third labels.
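The three-label scheme above (same patient / same case / unrelated) can be sketched with a small labeling function. The `patient_id` and `case_id` fields and the sample dictionaries are hypothetical, introduced only for illustration:

```python
def assign_label(img_a: dict, img_b: dict) -> int:
    """Label a pair of sample images for database construction:
    1 = same patient (first label), 2 = same case (second label),
    3 = unrelated (third label)."""
    if img_a["patient_id"] == img_b["patient_id"]:
        return 1  # first label: images of the same patient
    if img_a["case_id"] == img_b["case_id"]:
        return 2  # second label: images of the same case
    return 3      # third label: unrelated images

# Hypothetical sample records.
a = {"patient_id": "P001", "case_id": "glaucoma"}
b = {"patient_id": "P002", "case_id": "glaucoma"}
c = {"patient_id": "P003", "case_id": "cataract"}
print(assign_label(a, b), assign_label(a, c), assign_label(a, a))  # → 2 3 1
```

Pairs labeled 1 or 2 would serve as "similar" examples and pairs labeled 3 as "dissimilar" examples when training on similarity and dissimilarity.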
Further, the multi-modal convolutional neural network model comprises: a fundus image convolutional neural network model, a corneal nerve image convolutional neural network model, and an OCT image convolutional neural network model.
Further, the comparative analysis result comprises: the most similar sample images and their detailed case records.
This embodiment discloses an apparatus for retrieving ophthalmic multi-modal images, the apparatus comprising: a data acquisition module, configured to acquire a current eye image of a user, wherein the eye image is an ophthalmic digital image and comprises eye images of a plurality of different modalities; and a data identification module, configured to input the eye images of the plurality of different modalities into the corresponding pre-trained deep learning models, fuse and train on the similarity and dissimilarity of the eye images of the different modalities, output a plurality of feature vectors, and generate a comparative analysis result from the plurality of feature vectors. With the apparatus for retrieving ophthalmic multi-modal images provided by this embodiment of the invention, the system acquires a single-modality eye image of the user, performs recognition with the deep learning models, and outputs a multi-modality recognition result. This solves the problem in the prior art that other potential eye diseases of the user cannot be discovered from a single-modality eye image, achieves the effect of retrieving other possible eye diseases from a single-modality eye image, and improves the user experience.
Example Four
Fig. 4 is a schematic structural diagram of a computer server according to an embodiment of the present invention. As shown in fig. 4, the computer server includes a memory 410 and a processor 420; the number of processors 420 in the computer server may be one or more, and one processor 420 is taken as an example in fig. 4. The memory 410 and the processor 420 in the device may be connected by a bus or other means; fig. 4 takes connection by a bus as an example.
The memory 410, as a computer-readable storage medium, can be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the method for retrieving ophthalmic multi-modal images in the embodiments of the present invention (e.g., the data acquisition module 310 and the data identification module 320 in the retrieval apparatus 300). The processor 420 executes the various functional applications and data processing of the device/terminal/apparatus by running the software programs, instructions, and modules stored in the memory 410, thereby implementing the method for retrieving ophthalmic multi-modal images described above.
Wherein the processor 420 is configured to run the computer program stored in the memory 410 to implement the following steps:
acquiring a current eye image of a user, wherein the eye image is an ophthalmic digital image and comprises eye images of a plurality of different modalities;
inputting the eye images of the plurality of different modalities into the corresponding pre-trained deep learning models, fusing and training on the similarity and dissimilarity of the eye images of the different modalities, outputting a plurality of feature vectors, and generating a comparative analysis result from the plurality of feature vectors.
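The step of generating a comparative analysis result from the feature vectors could, for example, rank database samples by cosine similarity and return each match with its case record. This is a sketch of one plausible realization; the feature vectors and case records below are illustrative assumptions:

```python
import numpy as np

def comparison_result(query_vec: np.ndarray, db_vecs: np.ndarray,
                      case_records: list, top_n: int = 2):
    """Rank database feature vectors by cosine similarity to the query
    and return the most similar samples with their case records."""
    q = query_vec / np.linalg.norm(query_vec)
    m = db_vecs / np.linalg.norm(db_vecs, axis=1, keepdims=True)
    sims = m @ q                       # cosine similarity to each sample
    order = np.argsort(-sims)[:top_n]  # indices of the top-n matches
    return [(case_records[int(i)], float(sims[i])) for i in order]

# Hypothetical case records and 2-D feature vectors.
records = ["case A: diabetic retinopathy", "case B: glaucoma", "case C: healthy"]
db = np.array([[1.0, 0.1], [0.2, 1.0], [0.9, 0.2]])
query = np.array([1.0, 0.15])

result = comparison_result(query, db, records)
for rec, s in result:
    print(rec, round(s, 3))
```

Returning the case record alongside each similarity score matches the comparative analysis result described in the description, i.e. the most similar sample images together with their detailed case records.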
In one embodiment, the computer program of the computer device provided by the embodiments of the present invention is not limited to the method operations described above, and may also perform related operations in the method for retrieving ophthalmic multi-modal images provided by any embodiment of the present invention.
The memory 410 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created according to the use of the terminal, and the like. Further, the memory 410 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 410 may further include memory located remotely from the processor 420, which may be connected to the device/terminal/apparatus through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
This embodiment discloses a server for retrieving ophthalmic multi-modal images, configured to execute the following method: acquiring a current eye image of a user, wherein the eye image is an ophthalmic digital image and comprises eye images of a plurality of different modalities; inputting the eye images of the plurality of different modalities into the corresponding pre-trained deep learning models, fusing and training on the similarity and dissimilarity of the eye images of the different modalities, outputting a plurality of feature vectors, and generating a comparative analysis result from the plurality of feature vectors. In the method for retrieving ophthalmic multi-modal images provided by this embodiment of the invention, the system acquires a single-modality eye image of the user, performs recognition with the deep learning models, and outputs a multi-modality recognition result. This solves the problem in the prior art that other potential eye diseases of the user cannot be discovered from a single-modality eye image, achieves the effect of retrieving other possible eye diseases from a single-modality eye image, and improves the user experience.
Example Five
An embodiment of the present invention further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform a method for retrieving ophthalmic multi-modal images, the method comprising:
acquiring a current eye image of a user, wherein the eye image is an ophthalmic digital image and comprises eye images of a plurality of different modalities;
inputting the eye images of the plurality of different modalities into the corresponding pre-trained deep learning models, fusing and training on the similarity and dissimilarity of the eye images of the different modalities, outputting a plurality of feature vectors, and generating a comparative analysis result from the plurality of feature vectors.
Of course, the storage medium containing computer-executable instructions provided by the embodiments of the present invention is not limited to the method operations described above, and may also perform related operations in the method for retrieving ophthalmic multi-modal images provided by any embodiment of the present invention.
The computer-readable storage media of embodiments of the invention may take any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages.
This embodiment discloses a storage medium for retrieving ophthalmic multi-modal images, configured to execute the following method: acquiring a current eye image of a user, wherein the eye image is an ophthalmic digital image and comprises eye images of a plurality of different modalities; inputting the eye images of the plurality of different modalities into the corresponding pre-trained deep learning models, fusing and training on the similarity and dissimilarity of the eye images of the different modalities, outputting a plurality of feature vectors, and generating a comparative analysis result from the plurality of feature vectors. In the method for retrieving ophthalmic multi-modal images provided by this embodiment of the invention, the system acquires a single-modality eye image of the user, performs recognition with the deep learning models, and outputs a multi-modality recognition result. This solves the problem in the prior art that other potential eye diseases of the user cannot be discovered from a single-modality eye image, achieves the effect of retrieving other possible eye diseases from a single-modality eye image, and improves the user experience.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A method for retrieving ophthalmic multi-modal images, characterized by comprising:
acquiring a current eye image of a user, wherein the eye image is an ophthalmic digital image and comprises eye images of a plurality of different modalities;
inputting the eye images of the plurality of different modalities into the corresponding pre-trained deep learning models, fusing and training on the similarity and dissimilarity of the eye images of the different modalities, outputting a plurality of feature vectors, and generating a comparative analysis result from the plurality of feature vectors.
2. The method for retrieving ophthalmic multi-modal images according to claim 1, wherein the ophthalmic digital image comprises: fundus images, corneal nerve images, and OCT images.
3. The method for retrieving ophthalmic multi-modal images according to claim 1, wherein the deep learning model is a multi-modal convolutional neural network model.
4. The method for retrieving ophthalmic multi-modal images according to claim 3, wherein before the acquiring of the current eye image of the user, wherein the eye image is an ophthalmic digital image and comprises eye images of a plurality of different modalities, the method further comprises:
marking sample images with a plurality of labels and establishing a database;
and establishing a multi-modal convolutional neural network and training the multi-modal convolutional neural network model with the sample images to obtain a trained multi-modal convolutional neural network model.
5. The method for retrieving ophthalmic multi-modal images according to claim 4, wherein the marking sample images with a plurality of labels and establishing a database comprises:
if sample images belong to the same patient, marking them with a first label;
if sample images belong to the same case, marking them with a second label;
if sample images are unrelated, marking them with a third label;
and establishing a database of the sample images marked with the first, second and third labels.
6. The method for retrieving ophthalmic multi-modal images according to claim 4, wherein the multi-modal convolutional neural network model comprises: a fundus image convolutional neural network model, a corneal nerve image convolutional neural network model, and an OCT image convolutional neural network model.
7. The method for retrieving ophthalmic multi-modal images according to claim 1, wherein the comparative analysis result comprises: the most similar sample images and their detailed case records.
8. An apparatus for retrieving ophthalmic multi-modal images, characterized by comprising:
a data acquisition module, configured to acquire a current eye image of a user, wherein the eye image is an ophthalmic digital image and comprises eye images of a plurality of different modalities;
and a data identification module, configured to input the eye images of the plurality of different modalities into the corresponding pre-trained deep learning models, fuse and train on the similarity and dissimilarity of the eye images of the different modalities, output a plurality of feature vectors, and generate a comparative analysis result from the plurality of feature vectors.
9. A server, characterized in that the server comprises:
one or more processors;
a storage device for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method for retrieving ophthalmic multi-modal images according to any one of claims 1 to 7.
10. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method for retrieving ophthalmic multi-modal images according to any one of claims 1 to 7.
CN202010242450.7A 2020-03-31 2020-03-31 Ophthalmologic multimodal image retrieval method, apparatus, server and storage medium Pending CN111428072A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010242450.7A CN111428072A (en) 2020-03-31 2020-03-31 Ophthalmologic multimodal image retrieval method, apparatus, server and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010242450.7A CN111428072A (en) 2020-03-31 2020-03-31 Ophthalmologic multimodal image retrieval method, apparatus, server and storage medium

Publications (1)

Publication Number Publication Date
CN111428072A true CN111428072A (en) 2020-07-17

Family

ID=71549253

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010242450.7A Pending CN111428072A (en) 2020-03-31 2020-03-31 Ophthalmologic multimodal image retrieval method, apparatus, server and storage medium

Country Status (1)

Country Link
CN (1) CN111428072A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112102940A (en) * 2020-09-08 2020-12-18 南方科技大学 Refractive detection method, device, computer equipment and storage medium
CN112101438A (en) * 2020-09-08 2020-12-18 南方科技大学 Left and right eye classification method, device, server and storage medium
CN112884729A (en) * 2021-02-04 2021-06-01 北京邮电大学 Auxiliary diagnosis method and device for fundus diseases based on bimodal deep learning
CN113011485A (en) * 2021-03-12 2021-06-22 北京邮电大学 Multi-mode multi-disease long-tail distribution ophthalmic disease classification model training method and device
CN113658683A (en) * 2021-08-05 2021-11-16 重庆金山医疗技术研究院有限公司 Disease diagnosis system and data recommendation method
CN114661936A (en) * 2022-05-19 2022-06-24 中山大学深圳研究院 Image retrieval method applied to industrial vision and electronic equipment
WO2022205779A1 (en) * 2021-03-29 2022-10-06 中国科学院深圳先进技术研究院 Processing method and apparatus based on multi-modal eye detection data, and terminal device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107506797A (en) * 2017-08-25 2017-12-22 电子科技大学 One kind is based on deep neural network and multi-modal image alzheimer disease sorting technique
CN107562812A (en) * 2017-08-11 2018-01-09 北京大学 A kind of cross-module state similarity-based learning method based on the modeling of modality-specific semantic space
CN109902714A (en) * 2019-01-18 2019-06-18 重庆邮电大学 A kind of multi-modality medical image search method based on more figure regularization depth Hash
CN110009623A (en) * 2019-04-10 2019-07-12 腾讯科技(深圳)有限公司 A kind of image recognition model training and image-recognizing method, apparatus and system
WO2019207800A1 (en) * 2018-04-27 2019-10-31 株式会社ニデック Ophthalmic image processing device and ophthalmic image processing program
CN110765281A (en) * 2019-11-04 2020-02-07 山东浪潮人工智能研究院有限公司 Multi-semantic depth supervision cross-modal Hash retrieval method
Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112102940A (en) * 2020-09-08 2020-12-18 南方科技大学 Refractive detection method, device, computer equipment and storage medium
CN112101438A (en) * 2020-09-08 2020-12-18 南方科技大学 Left and right eye classification method, device, server and storage medium
CN112102940B (en) * 2020-09-08 2024-04-16 南方科技大学 Refraction detection method, refraction detection device, computer equipment and storage medium
CN112101438B (en) * 2020-09-08 2024-04-16 南方科技大学 Left-right eye classification method, device, server and storage medium
CN112884729A (en) * 2021-02-04 2021-06-01 北京邮电大学 Auxiliary diagnosis method and device for fundus diseases based on bimodal deep learning
CN113011485A (en) * 2021-03-12 2021-06-22 北京邮电大学 Multi-mode multi-disease long-tail distribution ophthalmic disease classification model training method and device
WO2022188489A1 (en) * 2021-03-12 2022-09-15 北京邮电大学 Training method and apparatus for multi-mode multi-disease long-tail distribution ophthalmic disease classification model
WO2022205779A1 (en) * 2021-03-29 2022-10-06 中国科学院深圳先进技术研究院 Processing method and apparatus based on multi-modal eye detection data, and terminal device
CN113658683A (en) * 2021-08-05 2021-11-16 重庆金山医疗技术研究院有限公司 Disease diagnosis system and data recommendation method
CN114661936A (en) * 2022-05-19 2022-06-24 中山大学深圳研究院 Image retrieval method applied to industrial vision and electronic equipment
CN114661936B (en) * 2022-05-19 2022-10-14 中山大学深圳研究院 Image retrieval method applied to industrial vision and electronic equipment

Similar Documents

Publication Publication Date Title
CN111428072A (en) Ophthalmologic multimodal image retrieval method, apparatus, server and storage medium
Islam et al. Applying supervised contrastive learning for the detection of diabetic retinopathy and its severity levels from fundus images
Ishtiaq et al. Diabetic retinopathy detection through artificial intelligent techniques: a review and open issues
Gayathri et al. Diabetic retinopathy classification based on multipath CNN and machine learning classifiers
Akbar et al. Automated techniques for blood vessels segmentation through fundus retinal images: A review
KR20210158853A (en) Cloud server and diagnostic assistant systems based on cloud server
JP7152513B2 (en) Image recognition method, device, terminal equipment and medical system, and computer program thereof
Zulkifley et al. Pterygium-Net: a deep learning approach to pterygium detection and localization
CN111428737B (en) Instance retrieval method, device, server and storage medium for ophthalmic image
CN111428070A (en) Ophthalmologic case retrieval method, ophthalmologic case retrieval device, ophthalmologic case retrieval server and storage medium
Goel et al. Deep learning approach for stages of severity classification in diabetic retinopathy using color fundus retinal images
KR102596534B1 (en) Diagnosis assistance method and apparatus
Shoukat et al. Artificial intelligence techniques for glaucoma detection through retinal images: State of the art
Alghamdi et al. A comparative study of deep learning models for diagnosing glaucoma from fundus images
WO2022166399A1 (en) Fundus oculi disease auxiliary diagnosis method and apparatus based on bimodal deep learning
CN113962311A (en) Knowledge data and artificial intelligence driven ophthalmic multi-disease identification system
Strzelecki et al. Artificial Intelligence in the detection of skin cancer: state of the art
CN117237711A (en) Bimodal fundus image classification method based on countermeasure learning
Biswas et al. DFU_XAI: a deep learning-based approach to diabetic foot ulcer detection using feature explainability
Adinehvand et al. An efficient multistage segmentation method for accurate hard exudates and lesion detection in digital retinal images
Ashtari-Majlan et al. Deep Learning and Computer Vision for Glaucoma Detection: A Review
Danao et al. Machine learning-based glaucoma detection through frontal eye features analysis
Ali et al. Classifying Three Stages of Cataract Disease using CNN
Soofi Exploring Deep Learning Techniques for Glaucoma Detection: A Comprehensive Review
Dhinakaran et al. Keratoviz-A multistage keratoconus severity analysis and visualization using deep learning and class activated maps

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination