CN111080630A - Fundus image detection apparatus, method, device, and storage medium - Google Patents

Fundus image detection apparatus, method, device, and storage medium

Info

Publication number
CN111080630A
Authority
CN
China
Prior art keywords
image
feature vector
fundus
feature
vector
Prior art date
Legal status
Granted
Application number
CN201911327024.7A
Other languages
Chinese (zh)
Other versions
CN111080630B (en)
Inventor
余双
马锴
郑冶枫
边成
龚丽君
初春燕
刘含若
王宁利
Current Assignee
Tencent Healthcare Shenzhen Co Ltd
Original Assignee
Tencent Healthcare Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Healthcare Shenzhen Co Ltd
Priority to CN201911327024.7A
Publication of CN111080630A
Application granted
Publication of CN111080630B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • G06T7/66Analysis of geometric attributes of image moments or centre of gravity
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20076Probabilistic image processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30041Eye; Retina; Ophthalmic

Abstract

The application discloses a fundus image detection apparatus, method, device, and storage medium, belonging to the field of image technologies. In the method, a fundus image detection apparatus acquires, in response to an image detection instruction, a group of fundus images to be detected, the group comprising a first image corresponding to a left eye and a second image corresponding to a right eye. The first image and the second image are input into an image classification model, which performs feature extraction on the two images to obtain a first feature vector and a second feature vector. Image classification is then performed based on the first feature vector, the second feature vector, and a third feature vector used to indicate the difference between the first feature vector and the second feature vector, and a label corresponding to the group of fundus images is output. Image detection can thus be carried out based on the image features of the left-eye and right-eye images and the difference between them, yielding a label corresponding to the images and improving the accuracy of fundus image detection results.

Description

Fundus image detection apparatus, method, device, and storage medium
Technical Field
The present application relates to the field of image technologies, and in particular, to a fundus image detection apparatus, a fundus image detection method, a fundus image detection device, and a storage medium.
Background
With the development of artificial intelligence, image detection technologies based on artificial intelligence are widely applied to various fields of people's lives. For example, in the field of clinical medicine, the detection of a fundus image can be realized based on an image detection technology: a computer device can segment the optic cup and optic disc regions in the fundus image, calculate a cup-to-disc ratio based on the segmented images, and further determine whether the fundus image is a fundus image of glaucoma.
In the above image detection process, the cup-to-disc ratio is generally calculated from the fundus image of a single eye. However, in some cases of glaucoma the cup-to-disc ratio of a single eye may still fall within the normal range; in such cases, performing glaucoma detection based only on the fundus image of a single eye may lead to inaccurate detection results.
Disclosure of Invention
The embodiment of the application provides a fundus image detection apparatus, method, device, and storage medium, which can improve the accuracy of fundus image detection results. The technical scheme is as follows:
in one aspect, there is provided a fundus image detecting apparatus for:
responding to an image detection instruction, and acquiring a group of fundus images to be detected, wherein the group of fundus images comprises a first image corresponding to a left eye and a second image corresponding to a right eye;
inputting the first image and the second image into an image classification model;
extracting the features of the first image and the second image through the image classification model to obtain a first feature vector and a second feature vector;
obtaining a third feature vector based on the first feature vector and the second feature vector, wherein the third feature vector is used for indicating the difference between the first feature vector and the second feature vector;
and performing image classification on the group of fundus images based on the first feature vector, the second feature vector and the third feature vector, and outputting labels corresponding to the group of fundus images.
In one aspect, a fundus image detection method is provided, and the method includes:
responding to an image detection instruction, and acquiring a group of fundus images to be detected, wherein the group of fundus images comprises a first image corresponding to a left eye and a second image corresponding to a right eye;
inputting the first image and the second image into an image classification model;
extracting the features of the first image and the second image through the image classification model to obtain a first feature vector and a second feature vector;
obtaining a third feature vector based on the first feature vector and the second feature vector, wherein the third feature vector is used for indicating the difference between the first feature vector and the second feature vector;
and performing image classification on the group of fundus images based on the first feature vector, the second feature vector and the third feature vector, and outputting labels corresponding to the group of fundus images.
In one possible implementation, the obtaining a third feature vector based on the first feature vector and the second feature vector includes:
obtaining a difference vector of the first feature vector and the second feature vector;
and taking an absolute value of each numerical value in the difference vector to obtain the third feature vector.
In one possible implementation, the image classifying the set of fundus images based on the first feature vector, the second feature vector, and the third feature vector, and outputting labels corresponding to the set of fundus images includes:
splicing the first feature vector, the second feature vector and the third feature vector to obtain a fourth feature vector corresponding to the group of fundus images;
inputting the fourth feature vector into a first fully-connected layer, and mapping the fourth feature vector into a two-dimensional vector by the first fully-connected layer;
the label indicated by the two-dimensional vector is taken as the label corresponding to the group of fundus images.
In a possible implementation manner, after the feature extraction is performed on the first image and the second image through the image classification model to obtain a first feature vector and a second feature vector, the method further includes:
inputting the first feature vector into a second full-connection layer, and determining a left-eye label corresponding to the first image based on an output result of the second full-connection layer;
and inputting the second feature vector into a third fully-connected layer, and determining a right-eye label corresponding to the second image based on an output result of the third fully-connected layer.
In one aspect, there is provided a fundus image detecting apparatus, the apparatus including:
the image acquisition module is used for responding to an image detection instruction and acquiring a group of fundus images to be detected, wherein the group of fundus images comprises a first image corresponding to a left eye and a second image corresponding to a right eye;
an input module for inputting the first image and the second image into an image classification model;
the first vector acquisition module is used for extracting the features of the first image and the second image through the image classification model to obtain a first feature vector and a second feature vector;
a second vector obtaining module, configured to obtain a third feature vector based on the first feature vector and the second feature vector, where the third feature vector is used to indicate a difference between the first feature vector and the second feature vector;
and the image classification module is used for carrying out image classification on the group of fundus images based on the first characteristic vector, the second characteristic vector and the third characteristic vector and outputting labels corresponding to the group of fundus images.
In one possible implementation, the image acquisition module is configured to:
acquiring a first fundus image of the left eye and a second fundus image of the right eye;
performing the following steps on any one of the first fundus image and the second fundus image:
inputting the fundus image into an image segmentation model, and calculating the probability that each pixel point in the fundus image is the optic disc by the image segmentation model to obtain a probability matrix, wherein the greater the numerical value of an element in the probability matrix, the greater the probability that the element is positioned on the optic disc;
based on the numerical values of the respective elements in the probability matrix, an image of a target region whose center coincides with the disc center is acquired from the fundus image.
In one possible implementation, the image acquisition module is configured to:
on the basis of a probability threshold value, carrying out binarization processing on the probability matrix to obtain a binarization matrix corresponding to the fundus image;
determining the center of the optic disc and the diameter of the optic disc based on the binarization matrix;
based on the disc center and the disc diameter, the target area is determined, the center of the target area coinciding with the disc center.
In one possible implementation, the first vector acquisition module is configured to:
respectively extracting the features of the first image and the second image through a first feature extractor and a second feature extractor in the image classification model to obtain a first feature matrix corresponding to the first image and a second feature matrix corresponding to the second image, and respectively performing global pooling on the first feature matrix and the second feature matrix to obtain the first feature vector and the second feature vector.
In one possible implementation, the parameters of the first feature extractor and the second feature extractor are the same.
In one possible implementation, the second vector acquisition module is configured to:
obtaining a difference vector of the first feature vector and the second feature vector;
and taking an absolute value of each numerical value in the difference vector to obtain the third feature vector.
In one possible implementation, the image classification module is configured to:
splicing the first feature vector, the second feature vector and the third feature vector to obtain a fourth feature vector corresponding to the group of fundus images;
inputting the fourth feature vector into a first fully-connected layer, and mapping the fourth feature vector into a two-dimensional vector by the first fully-connected layer;
the label indicated by the two-dimensional vector is taken as the label corresponding to the group of fundus images.
In one possible implementation, the apparatus further includes:
the left eye label determining module is used for inputting the first feature vector into a second full-connection layer and determining a left eye label corresponding to the first image based on an output result of the second full-connection layer;
and the right eye label determining module is used for inputting the second feature vector into a third full-connection layer and determining a right eye label corresponding to the second image based on an output result of the third full-connection layer.
In one aspect, there is provided a computer-readable storage medium having at least one program code stored therein, the at least one program code being loaded into and executed by a processor to implement operations performed by the fundus image detection method.
In the technical scheme provided by the embodiment of the application, the fundus image detection apparatus acquires, in response to an image detection instruction, a group of fundus images to be detected, the group including a first image corresponding to a left eye and a second image corresponding to a right eye; the first image and the second image are input into an image classification model; feature extraction is performed on the first image and the second image through the image classification model to obtain a first feature vector and a second feature vector; a third feature vector is obtained based on the first feature vector and the second feature vector and is used to indicate the difference between them; image classification is performed on the group of fundus images based on the first, second, and third feature vectors; and a label corresponding to the group of fundus images is output. Image detection can therefore be carried out based on the image features of the left-eye and right-eye images and the difference between them, yielding a label corresponding to the images and improving the accuracy of fundus image detection results.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic diagram of an image inspection system provided in an embodiment of the present application;
fig. 2 is a flowchart of a fundus image detection method according to an embodiment of the present application;
fig. 3 is a schematic diagram of a fundus image detection process provided by an embodiment of the present application;
FIG. 4 is a flowchart of an image classification model training method provided in an embodiment of the present application;
fig. 5 is a schematic structural diagram of an eye fundus image detecting apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a terminal according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Computer Vision (CV) technology is a science that studies how to make machines "see": it uses cameras and computer devices in place of human eyes to perform machine vision tasks such as recognition, tracking, and measurement of targets, and further performs graphics processing so that the processed image is more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR (Optical Character Recognition), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D (three-dimensional) technologies, virtual reality, augmented reality, simultaneous localization and mapping, and other technologies, and further include common biometric technologies such as face recognition and fingerprint recognition.
The scheme provided by the embodiment of the application mainly relates to image processing and image recognition technologies in computer vision, and the fundus images are detected through the image processing and image recognition technologies, so that whether the detected fundus images are fundus images of glaucoma or not is determined.
Fig. 1 is a schematic diagram of an image detection system provided in an embodiment of the present application, and referring to fig. 1, the image detection system 100 includes: a terminal 110 and an image detection platform 140.
The terminal 110 is connected to the image detection platform 140 through a wireless network or a wired network. The terminal 110 may be at least one of a smart phone, a game console, a desktop computer, a tablet computer, an e-book reader, an MP4 (Moving Picture Experts Group Audio Layer IV) player, and a laptop computer. An application program supporting image detection is installed and run on the terminal 110. The application may be a detection-type application or the like. Illustratively, the terminal 110 is a terminal used by a user, and the application running in the terminal 110 is logged in with a user account.
Terminal 110 is connected to image inspection platform 140 via a wireless or wired network.
The image detection platform 140 includes at least one of a server, multiple servers, a cloud computing platform, and a virtualization center. The image detection platform 140 is used to provide background services for applications that support image detection. Optionally, the image detection platform 140 undertakes the primary detection work and the terminal 110 undertakes the secondary detection work; or the image detection platform 140 undertakes the secondary detection work and the terminal 110 undertakes the primary detection work; or the image detection platform 140 or the terminal 110 can each undertake the detection work alone.
Optionally, the image detection platform 140 comprises: the system comprises an access server, an image detection server and a database. The access server is used to provide access services for the terminal 110. The image detection server is used for providing background services related to image detection. The image detection server can be one or more. When there are multiple image detection servers, there are at least two image detection servers for providing different services, and/or there are at least two image detection servers for providing the same service, for example, providing the same service in a load balancing manner, which is not limited in the embodiments of the present application. The image detection server can be provided with an image classification model.
The terminal 110 may be generally referred to as one of a plurality of terminals, and the embodiment is only illustrated by the terminal 110.
Those skilled in the art will appreciate that the number of terminals described above may be greater or fewer. For example, the number of the terminals may be only one, or several tens or hundreds, or more, and in this case, the image detection system further includes other terminals. The number of terminals and the type of the device are not limited in the embodiments of the present application.
Fig. 2 is a flowchart of a fundus image detection method according to an embodiment of the present application. The method may be applied to the terminal or the server, and both the terminal and the server may be regarded as computer devices. In this embodiment, the fundus image detection device may be any of the terminals or servers described above, so this embodiment is described with a computer device as the execution subject. Referring to fig. 2, this embodiment may specifically include the following steps:
201. The computer device acquires a first fundus image of the left eye and a second fundus image of the right eye.
In one possible implementation, the computer device may acquire the first fundus image and the second fundus image to be detected upon receiving the image detection instruction. The first fundus image and the second fundus image may be images stored in a computer device, images captured by the computer device in a video, or images acquired by the computer device with an image acquisition function in real time, and the embodiment of the present application does not limit which image is specifically adopted.
202. The computer device acquires probability matrices corresponding to the first fundus image and the second fundus image, respectively.
In one possible implementation, the computer device may obtain a probability matrix corresponding to each fundus image based on an image segmentation model. The computer device may perform the following steps on either of the first fundus image and the second fundus image: input the fundus image into the image segmentation model, and calculate, through the image segmentation model, the probability that each pixel point in the fundus image belongs to the optic disc, obtaining a probability matrix; the greater the value of an element in the probability matrix, the greater the probability that the position of the element belongs to the optic disc. The image segmentation model may be a model trained on multiple groups of sample images, where a group of sample images may include a left-eye fundus image and a right-eye fundus image, and each fundus image may carry annotation information indicating the optic disc region and the contour of that region. The computer device can train the image segmentation model based on the multiple groups of sample images, adjusting the parameters of the image segmentation model so that it can identify the optic disc region in a fundus image.
Taking the acquisition of the probability matrix corresponding to the first fundus image as an example, in one possible implementation the process may specifically include the following steps:
First, the computer device inputs the first fundus image into the image segmentation model.
After the computer device inputs the first fundus image into the image segmentation model, the image segmentation model can preprocess the first fundus image and convert it into a numerical matrix composed of its pixel values, so that the computer device can perform the subsequent operations.
Second, the computer device performs feature extraction on the first fundus image through the image segmentation model to obtain a feature map corresponding to the first fundus image.
In one possible implementation, the image segmentation model may include a plurality of convolution layers, the computer device may sequentially convolve a digital matrix corresponding to the first fundus image with each convolution layer to extract image features, and the computer device may generate an intermediate feature map based on a result of operation output by each convolution layer, and use the intermediate feature map obtained based on the last convolution layer as the feature map of the first fundus image. The specific number of convolution layers in the image segmentation model may be set by a developer, which is not limited in this embodiment of the present application.
Specifically, taking one convolution layer as an example to describe the above convolution operation: a convolution layer may include one or more convolution kernels, each convolution kernel corresponds to a scanning window whose size is the same as that of the convolution kernel, and during the convolution operation the scanning window slides over the intermediate feature map with a target step size, scanning each region of the intermediate feature map in turn; the target step size may be set by a developer. Taking one convolution kernel as an example, when its scanning window slides to any region of the intermediate feature map, the computer device reads the value of each feature point in that region, multiplies each value by the corresponding weight of the convolution kernel, accumulates the products, and takes the accumulated result as one output feature point. The scanning window of the convolution kernel then slides to the next region of the intermediate feature map according to the target step size and the convolution operation is performed again, outputting another feature point, until all regions of the feature map have been scanned; all output feature points are combined into a new intermediate feature map, which serves as the input of the next convolution layer.
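For illustration, the following is a minimal sketch of the sliding-window convolution just described, written in Python with NumPy; the feature-map size, kernel values, and step size are illustrative assumptions rather than values from the embodiment.

import numpy as np

def conv2d_single_kernel(feature_map: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
    """Slide the kernel's scanning window over the feature map, multiply the covered
    region element-wise by the kernel, and accumulate the products into one output
    feature point per window position."""
    kh, kw = kernel.shape
    h, w = feature_map.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    out = np.zeros((out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            region = feature_map[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(region * kernel)   # products accumulated into one feature point
    return out

# Example: a 6 x 6 intermediate feature map convolved with a 3 x 3 kernel at stride 1
# yields a 4 x 4 new intermediate feature map.
fmap = np.arange(36, dtype=np.float32).reshape(6, 6)
kernel = np.ones((3, 3), dtype=np.float32) / 9.0
print(conv2d_single_kernel(fmap, kernel).shape)  # (4, 4)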
Third, the computer device acquires a probability matrix corresponding to the first fundus image based on the feature map.
In one possible implementation, the computer device may upsample or deconvolve the feature map, resulting in a target matrix of the same size as the first fundus image. The image segmentation model may include a plurality of transposed convolution layers for upsampling; the computer device may perform a transposed convolution operation on the feature map with each transposed convolution layer in turn to enlarge the size of the feature map, and may obtain the target matrix based on the output of the last transposed convolution layer. The specific number of transposed convolution layers in the image segmentation model may be set by a developer, which is not limited in the embodiment of the present application.
It should be noted that the above description of the upsampling process is only an exemplary illustration of an upsampling method, and the embodiment of the present application does not limit which upsampling method is specifically adopted.
The process of acquiring the probability matrix corresponding to the second fundus image is the same as the process of acquiring the probability matrix corresponding to the first fundus image, and details are not repeated herein.
In this embodiment, after the computer device obtains the target matrix, each pixel point in the first fundus image may be classified based on the target matrix. In one possible implementation, the image segmentation model may include a sigmoid (S-shaped growth curve) function, and the computer device may convert each element in the target matrix into a value in (0, 1) based on the sigmoid function, that is, convert the target matrix into the probability matrix, where each value in the probability matrix may be used to indicate the probability that the corresponding position is located on the optic disc.
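As an illustration of the second and third steps above, the following Python sketch (assuming PyTorch is available) stacks convolution layers to extract a feature map, uses transposed convolution layers to upsample it back to the input size, and applies a sigmoid to convert the target matrix into a probability matrix; the layer counts and channel widths are illustrative assumptions, since the embodiment leaves them to the developer.

import torch
import torch.nn as nn

class TinyDiscSegmenter(nn.Module):
    def __init__(self):
        super().__init__()
        # Convolution layers: extract the intermediate feature maps.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        # Transposed convolution layers: upsample back to the input resolution.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, kernel_size=2, stride=2),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        target_matrix = self.decoder(self.encoder(x))   # same height and width as the input
        return torch.sigmoid(target_matrix)             # probability matrix with values in (0, 1)

# Usage: a fundus image tensor in, a per-pixel optic-disc probability matrix out.
probs = TinyDiscSegmenter()(torch.rand(1, 3, 256, 256))
print(probs.shape)  # torch.Size([1, 1, 256, 256])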
203. The computer device performs optic disc segmentation on the first fundus image and the second fundus image respectively to obtain a first image and a second image.
Wherein the first image and the second image include disc information for a left eye and disc information for a right eye, respectively.
In the embodiment of the present application, the computer device may acquire, from the fundus image, an image of a target region whose center coincides with the center of the optic disc, based on the values of the elements in the probability matrix. In one possible implementation, the computer device first performs binarization on the probability matrix based on a probability threshold to obtain a binarization matrix corresponding to the fundus image; for example, the computer device may compare the value of each element in the probability matrix with the probability threshold, assign a first value to elements greater than the probability threshold, and assign a second value to elements less than the probability threshold, where the probability threshold, the first value, and the second value may all be set by a developer. The computer device may then determine the optic disc center and the optic disc diameter based on the binarization matrix; for example, it may perform connected-domain analysis on the binarization matrix to obtain at least one candidate region formed by elements assigned the first value, take the candidate region with the largest area as the region where the optic disc is located, determine the center of that region as the optic disc center, and determine the optic disc diameter based on the size of that region. Finally, the computer device may determine the target region based on the optic disc center and the optic disc diameter, with the center of the target region coinciding with the optic disc center; for example, the computer device may take as the target region a square region centered on the optic disc center whose side length is N optic disc diameters, where N is greater than 0 and its specific value may be set by a developer. The computer device may then acquire the image of the target region from the fundus image; specifically, the image of the target region acquired from the first fundus image is the first image, and the image of the target region acquired from the second fundus image is the second image.
It should be noted that the above description of acquiring the image of the target area based on the probability matrix is only an exemplary illustration, and the embodiment of the present application does not limit which method is specifically used to acquire the image of the target area.
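As an illustration of the optic disc localization described in this step, the following Python sketch (assuming NumPy and SciPy) binarizes the probability matrix with a threshold, keeps the largest connected region as the optic disc, takes its center as the optic disc center, estimates the optic disc diameter from the region size, and crops a square of N optic disc diameters around the center; the threshold value and N shown are illustrative assumptions.

import numpy as np
from scipy import ndimage

def crop_disc_region(fundus: np.ndarray, probs: np.ndarray,
                     threshold: float = 0.5, n_diameters: float = 2.0) -> np.ndarray:
    binary = (probs > threshold).astype(np.uint8)            # binarization matrix
    labels, num = ndimage.label(binary)                      # connected-domain analysis
    if num == 0:
        return fundus                                        # no disc found: fall back to the full image
    sizes = ndimage.sum(binary, labels, index=np.arange(1, num + 1))
    disc = (labels == (int(np.argmax(sizes)) + 1))           # largest candidate region = optic disc
    ys, xs = np.nonzero(disc)
    cy, cx = ys.mean(), xs.mean()                            # optic disc center
    diameter = max(ys.max() - ys.min(), xs.max() - xs.min()) + 1
    half = int(n_diameters * diameter / 2)                   # half side of the target region
    y0, y1 = max(0, int(cy) - half), min(fundus.shape[0], int(cy) + half)
    x0, x1 = max(0, int(cx) - half), min(fundus.shape[1], int(cx) + half)
    return fundus[y0:y1, x0:x1]                              # image of the target region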
It should be noted that, in step 201, step 202, and step 203, a group of fundus images to be detected is acquired in response to the image detection instruction, and the group of fundus images includes a first image corresponding to the left eye and a second image corresponding to the right eye. The above description of acquiring an image of a region of interest from a fundus image based on an image segmentation model is merely an exemplary description, and the embodiment of the present application does not limit which image segmentation model is specifically applied or how an image of a region of interest is specifically acquired. In the embodiment of the application, by acquiring, from the fundus image, the partial image containing the optic disc information for the subsequent detection steps, the computer device does not need to perform operations on other areas of the image, which avoids interference from a large amount of irrelevant data and improves detection efficiency and accuracy. Of course, the first image and the second image may also be images of the optic disc regions of the left eye and the right eye acquired by a device such as a fundus camera, which is not limited in the embodiment of the present application.
204. The computer device inputs the first image and the second image into an image classification model.
In an embodiment of the present application, the image classification model may include a first feature extractor and a second feature extractor for extracting image features of the first image and the second image, respectively. The first feature extractor and the second feature extractor have the same parameters, i.e., the weight parameters of the two feature extractors are shared, so the computer device can extract features of the same dimension from the first image and the second image.
In this embodiment of the present application, each feature extractor may be constructed based on a deep Neural Network, for example, each feature extractor may be a VGG (Visual Geometry Group Network) model, a ResNet (Residual Neural Network) model, and the like.
After the computer device inputs the first image and the second image into the first feature extractor and the second feature extractor, respectively, each feature extractor may pre-process the input image, convert the input image into a digital matrix composed of a plurality of pixel values, so that the computer device performs a subsequent feature extraction step.
205. The computer device performs feature extraction on the first image and the second image through the image classification model to obtain a first feature vector and a second feature vector.
The computer device can respectively perform feature extraction on the first image and the second image through a first feature extractor and a second feature extractor in the image classification model to obtain a first feature matrix corresponding to the first image and a second feature matrix corresponding to the second image, and respectively perform global pooling processing on the first feature matrix and the second feature matrix to obtain the first feature vector and the second feature vector.
Taking the acquisition of the first feature vector corresponding to the first image as an example: in one possible implementation, the first feature extractor may include a plurality of convolution layers, and the computer device may perform the convolution operation on the first image with each convolution layer in sequence and generate a first feature matrix corresponding to the first image based on the output of the last convolution layer in the first feature extractor. The convolution process is similar to that in step 202 and is not described herein again. In one possible implementation, the image classification model may further include a global pooling layer whose scanning window has the same spatial size as the first feature matrix; for example, if the size of the first feature matrix is A × B × C, the size of the scanning window may be A × B, where A, B, and C are positive integers whose specific values are not limited in this embodiment. The computer device may perform global pooling on the first feature matrix through the global pooling layer; for example, when performing global average pooling on the first feature matrix, the computer device may take the average value of the elements in the scanning window as one element of the first feature vector, obtaining a 1 × C first feature vector. Of course, the computer device may also obtain the first feature vector in other pooling manners, which is not limited in this embodiment of the present application. The process of obtaining the second feature vector is the same as the process of obtaining the first feature vector and is not described herein again.
It should be noted that the above description of obtaining the first feature vector is only an exemplary illustration, and the embodiment of the present application does not limit which image feature extraction method is specifically adopted to obtain the first feature vector.
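As an illustration of step 205, the following Python sketch (assuming PyTorch) reuses a single feature extractor for both images, so that the weight parameters are shared, and applies global average pooling to turn each feature matrix into a 1 × C feature vector; the small backbone shown here is an illustrative assumption and not the VGG or ResNet variant mentioned above.

import torch
import torch.nn as nn

class SharedFeatureExtractor(nn.Module):
    def __init__(self, out_channels: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, out_channels, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.global_pool = nn.AdaptiveAvgPool2d(1)   # scanning window covers the whole feature matrix

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        feature_matrix = self.backbone(image)                # C-channel feature matrix
        return self.global_pool(feature_matrix).flatten(1)   # 1 x C feature vector

extractor = SharedFeatureExtractor()
first_image, second_image = torch.rand(1, 3, 224, 224), torch.rand(1, 3, 224, 224)
first_vec = extractor(first_image)    # first feature vector
second_vec = extractor(second_image)  # second feature vector (shared parameters)
print(first_vec.shape)  # torch.Size([1, 128])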
206. The computer device obtains a third feature vector based on the first feature vector and the second feature vector.
In a possible implementation manner, the computer device may obtain a difference vector between the first feature vector and the second feature vector, and take an absolute value of each value in the difference vector to obtain the third feature vector, where the third feature vector may be used to indicate a difference between the first feature vector and the second feature vector, that is, may be used to indicate a difference between the first image and the second image.
It should be noted that the above manner of representing binocular disparity based on the third feature vector is merely an exemplary illustration of a manner of representing binocular disparity, and the computer device may also determine disparity of binocular images based on other manners, and the embodiment of the present application is not limited.
207. The computer device performs image classification on the group of fundus images based on the first feature vector, the second feature vector and the third feature vector, and outputs labels corresponding to the group of fundus images.
In one possible implementation, the computer device may splice the first feature vector, the second feature vector, and the third feature vector to obtain a fourth feature vector corresponding to the group of fundus images, input the fourth feature vector into a first fully-connected layer, and map the fourth feature vector into a two-dimensional vector through the first fully-connected layer; for example, the computer device may perform a convolution operation on the fourth feature vector based on at least one weight parameter in the fully-connected layer to convert the fourth feature vector into a two-dimensional vector, and may then use the label indicated by the two-dimensional vector as the label corresponding to the group of fundus images. In the embodiment of the present application, one label may correspond to one two-dimensional vector, the label may indicate a fundus image of glaucoma or a fundus image of non-glaucoma, and the correspondence between labels and two-dimensional vectors may be set by a developer.
In the embodiment of the application, the characteristic vectors corresponding to the left eye and the right eye and the characteristic vectors used for indicating the difference between the left eye and the right eye are spliced, prediction is carried out based on the spliced vectors, the characteristics of the two eyes and the characteristic difference of the two eyes are fused in the fundus image detection process, and the accuracy of fundus image detection is improved. Of course, the above description of fusion of binocular features based on feature vector splicing is only an exemplary illustration of a binocular feature fusion method, and the embodiment of the present application does not limit which binocular feature fusion method is specifically adopted.
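As an illustration of steps 206 and 207, the following Python sketch (assuming PyTorch) takes the absolute value of the difference between the two feature vectors as the third feature vector, splices the three vectors into a fourth feature vector, and maps it through a first fully-connected layer into a two-dimensional vector whose larger component indicates the label; the feature dimension and the label convention are illustrative assumptions.

import torch
import torch.nn as nn

feature_dim = 128
first_fc = nn.Linear(3 * feature_dim, 2)   # first fully-connected layer

def classify_pair(first_vec: torch.Tensor, second_vec: torch.Tensor) -> int:
    third_vec = torch.abs(first_vec - second_vec)                       # difference vector, absolute value
    fourth_vec = torch.cat([first_vec, second_vec, third_vec], dim=1)   # splicing into the fourth feature vector
    two_dim = first_fc(fourth_vec)                                      # two-dimensional vector
    return int(two_dim.argmax(dim=1))                                   # 0 / 1, mapped to a label by the developer

label = classify_pair(torch.rand(1, feature_dim), torch.rand(1, feature_dim))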
All the above optional technical solutions may be combined arbitrarily to form optional embodiments of the present application, and are not described herein again.
According to the technical scheme provided by the embodiment of the application, a group of fundus images to be detected is obtained by responding to an image detection instruction, the group of fundus images comprises a first image corresponding to a left eye and a second image corresponding to a right eye, the first image and the second image are input into an image classification model, feature extraction is carried out on the first image and the second image through the image classification model to obtain a first feature vector and a second feature vector, a third feature vector is obtained based on the first feature vector and the second feature vector and is used for indicating the difference between the first feature vector and the second feature vector, the group of fundus images are subjected to image classification based on the first feature vector, the second feature vector and the third feature vector, and labels corresponding to the group of fundus images are output. In the fundus image detection process, image detection is carried out based on the image characteristics and the image difference of the left eye image and the right eye image, so that labels corresponding to the images are obtained, and the accuracy of fundus image detection results is improved.
In the glaucoma detection, by applying the fundus image detection method, medical workers can input the fundus images of both eyes of a patient into an image segmentation model and an image classification model, segment the fundus images by the image segmentation model, and the image classification model identifies and classifies the segmented images to judge whether the patient suffers from glaucoma. In the embodiment of the application, the important clinical feature of glaucoma, namely the difference of cup-to-disk ratios of two eyes, is integrated into the deep learning model, so that the image classification model can predict based on the images of two eyes, the condition that potential glaucoma patients with the cup-to-disk ratio of one eye in a normal range but with the cup-to-disk ratio difference between two eyes being large is avoided, and the sensitivity and the accuracy of the deep learning model on the aspect of case detection are improved. The fundus image detection method can be applied to various medical institutions, assists medical staff in diagnosing, and improves the diagnosis efficiency and accuracy.
In this embodiment of the application, after the computer device obtains the first feature vector and the second feature vector, it may further perform label prediction on the first image and the second image respectively based on the two feature vectors; that is, in addition to the binocular image prediction task, there are also a left-eye image prediction task and a right-eye image prediction task. In one possible implementation, the computer device may input the first feature vector into a second fully-connected layer and determine a left-eye label corresponding to the first image based on the output of the second fully-connected layer, and input the second feature vector into a third fully-connected layer and determine a right-eye label corresponding to the second image based on the output of the third fully-connected layer.
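A minimal sketch of these auxiliary single-eye prediction tasks, under the same assumptions as the previous sketch, is as follows.

import torch
import torch.nn as nn

feature_dim = 128
second_fc = nn.Linear(feature_dim, 2)   # second fully-connected layer (left-eye label)
third_fc = nn.Linear(feature_dim, 2)    # third fully-connected layer (right-eye label)

first_vec, second_vec = torch.rand(1, feature_dim), torch.rand(1, feature_dim)
left_eye_label = int(second_fc(first_vec).argmax(dim=1))
right_eye_label = int(third_fc(second_vec).argmax(dim=1))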
Referring to fig. 3, fig. 3 is a schematic diagram of a fundus image detection process according to an embodiment of the present application. The computer device may input a first fundus image 301 of the left eye and a second fundus image 302 of the right eye into the image segmentation model, which identifies the region of interest in each fundus image, obtaining a first image 303 containing left-eye optic disc information and a second image 304 containing right-eye optic disc information. The computer device may input the first image 303 and the second image 304 into an image classification model 305, obtain a first feature vector corresponding to the left eye and a second feature vector corresponding to the right eye based on the first feature extractor, the second feature extractor, and the other operation layers in the image classification model 305, take the absolute value of the difference between the feature vectors corresponding to the left eye and the right eye to obtain a third feature vector, splice the first feature vector, the second feature vector, and the third feature vector to obtain a fourth feature vector, and predict, through the first fully-connected layer and based on the fourth feature vector, whether the two eyes have glaucoma, obtaining a label. In addition, the computer device can also input the features extracted for the left eye and the right eye into separate fully-connected layers to predict whether each single eye has glaucoma, that is, the first feature vector and the second feature vector are input into the second fully-connected layer and the third fully-connected layer respectively to obtain a left-eye label and a right-eye label.
In the embodiment of the application, the labels corresponding to the left eye and the right eye are predicted independently as an auxiliary judgment of which specific fundus image is the fundus image of glaucoma, which can improve the accuracy and comprehensiveness of the detection result. Of course, when performing fundus image detection, the left-eye image prediction task, the right-eye image prediction task, and the binocular image prediction task may be combined, and the computer device may execute at least one of these tasks to detect the fundus images; the embodiment of the present application does not limit the combination of the prediction tasks.
The above embodiments mainly describe the process in which the computer device performs fundus image detection based on an image classification model; the image classification model needs to be trained before it is used for fundus image detection. Fig. 4 is a flowchart of an image classification model training method provided in an embodiment of the present application. Referring to fig. 4, the method may specifically include the following steps:
401. The computer device initializes the various parameters in the image classification model.
The computer device randomly assigns values to the parameters in each convolution layer, the global pooling layer, and the fully-connected layers of the image classification model to initialize the parameters. In one possible implementation, the computer device may initialize the parameters of the image classification model using a Gaussian distribution with a variance of 0.01 and a mean of 0; the embodiment of the present application does not limit the specific method for initializing the model parameters.
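A minimal sketch of this initialization, assuming PyTorch, is as follows; note that a variance of 0.01 corresponds to a standard deviation of 0.1.

import math
import torch.nn as nn

def init_params(model: nn.Module) -> None:
    # Randomly assign the weights of convolution and fully-connected layers from a
    # Gaussian distribution with mean 0 and variance 0.01.
    for module in model.modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            nn.init.normal_(module.weight, mean=0.0, std=math.sqrt(0.01))
            if module.bias is not None:
                nn.init.zeros_(module.bias)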
402. The computer device inputs a training data set into the image classification model.
Wherein the training data set may comprise a plurality of sets of disc images, a set of disc images may comprise a left eye disc image and a right eye disc image from the same patient.
In one possible implementation, the left eye optic disc image and the right eye optic disc image may be obtained based on an image segmentation model, the computer device may input the left eye fundus image and the right eye fundus image from the same patient into the image segmentation model, identify the region of interest, i.e., the region where the optic disc is located, and the computer device may transform, e.g., crop, translate, rotate, etc., the region of interest to obtain a plurality of sets of optic disc images based on a set of eye fundus images, thereby improving the diversity of data.
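As an illustration of this augmentation, the following Python sketch assumes torchvision is used; the crop size, rotation range, and translation range are illustrative assumptions.

from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),        # crop
    transforms.RandomAffine(degrees=15, translate=(0.1, 0.1)),  # rotate and translate
    transforms.ToTensor(),
])
# disc_tensor = augment(disc_image)  # disc_image: a PIL image of the optic disc region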
Of course, each optic disc image may be added with annotation information, for example, whether the optic disc image is an image of glaucoma or not may be marked, and the embodiment of the present application does not limit the specific content of the annotation information.
403. The computer device acquires the output result of the image classification model and calculates the error between the output result and the correct detection result.
In an embodiment of the application, the computer device may calculate the error between the output result and the correct detection result by means of one or more loss functions.
In the embodiment of the present application, the binocular image prediction task, the left eye image prediction task and the right eye image prediction task may correspond to different weights, wherein a weight value corresponding to the binocular image prediction task is the largest to highlight the importance of the binocular image prediction task, and each weight value may be set by a developer, for example, the weight of the binocular image prediction task may be set to 0.5, the weight of the left eye image prediction task may be set to 0.25, and the weight of the right eye image prediction task may be set to 0.25.
In one possible implementation, the computer device may calculate the error by the following loss function equation (1).
L = \sum_{i} u_i \, L_{ce}(y_i, \hat{y}_i) + \lambda \lVert \theta \rVert^2    (1)

L_{ce}(y_i, \hat{y}_i) = -\left[\, y_i \log \hat{y}_i + (1 - y_i) \log\left(1 - \hat{y}_i\right) \right]    (2)

wherein L_{ce}(y_i, \hat{y}_i) represents the binary cross-entropy loss, as shown in equation (2) above, and \lambda \lVert \theta \rVert^2 represents the regularization loss of the network parameters; i represents the number of each detection task, where in the embodiment of the application the binocular image prediction task, the left-eye image prediction task, and the right-eye image prediction task are each assigned a number; y_i represents the correct detection result of task i; \hat{y}_i represents the output result of task i; u_i represents the weight corresponding to task i; \theta represents the parameters in the image classification model; and \lambda represents the regularization coefficient of the image classification model, whose specific value may be set by developers.
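As an illustration of equations (1) and (2), the following Python sketch (assuming PyTorch) computes the weighted sum of the binary cross-entropy losses of the three tasks plus an L2 regularization term over the model parameters; the regularization coefficient value shown is an illustrative assumption.

import torch
import torch.nn.functional as F

def multi_task_loss(outputs, targets, model, weights=(0.5, 0.25, 0.25), lam=1e-4):
    """outputs/targets: lists of tensors for the binocular, left-eye, and right-eye
    prediction tasks (predicted probabilities in (0, 1) and 0./1. ground-truth labels
    as float tensors of the same shape)."""
    loss = sum(u * F.binary_cross_entropy(o, t)
               for u, o, t in zip(weights, outputs, targets))
    loss = loss + lam * sum((p ** 2).sum() for p in model.parameters())  # lambda * ||theta||^2
    return loss

# During training the error is back-propagated to adjust the model parameters:
# loss = multi_task_loss(outputs, targets, model); loss.backward(); optimizer.step()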
404. The computer device adjusts the parameters in the image classification model based on the error between the output result of the image classification model and the correct detection result, until the image classification model meets the target condition, thereby obtaining the trained image classification model.
In one possible implementation, the computer device may compare the obtained errors with a plurality of error thresholds. When any error is greater than its error threshold, the computer device propagates the errors back through the image classification model and updates, based on the respective errors, the parameters in the image classification model, including the parameters of the convolution kernels, the parameters of the global pooling layer, the parameters of the fully-connected layers, and the like. For example, the computer device may propagate the error of the binocular image prediction task back to the first fully-connected layer and both feature extractors, the error of the left-eye image prediction task back to the second fully-connected layer and the first feature extractor, and the error of the right-eye image prediction task back to the third fully-connected layer and the second feature extractor. The error thresholds may all be set by a developer, and the number of error thresholds is the same as the number of errors obtained.
In this embodiment of the present application, the target condition may be set by a developer. In one possible implementation, the target condition may be that the number of correct output results obtained reaches a target number, where the target number may be set by the developer. When the errors are all smaller than the corresponding error thresholds, it is determined that the detection result obtained by the computer device is correct, and the computer device continues to read the next group of optic disc images and performs step 403; when the number of correct output results obtained by the computer device reaches the target number, i.e., the target condition is met, the training of the image classification model is determined to be complete.
All the above optional technical solutions may be combined arbitrarily to form optional embodiments of the present application, and are not described herein again.
Fig. 5 is a schematic structural diagram of an eye fundus image detecting apparatus provided in an embodiment of the present application, and referring to fig. 5, the apparatus includes:
an image acquisition module 501, configured to acquire, in response to an image detection instruction, a group of fundus images to be detected, where the group of fundus images includes a first image corresponding to a left eye and a second image corresponding to a right eye;
an input module 502 for inputting the first image and the second image into an image classification model;
a first vector obtaining module 503, configured to perform feature extraction on the first image and the second image through the image classification model to obtain a first feature vector and a second feature vector;
a second vector obtaining module 504, configured to obtain a third feature vector based on the first feature vector and the second feature vector, where the third feature vector is used to indicate a difference between the first feature vector and the second feature vector;
an image classification module 505, configured to perform image classification on the set of fundus images based on the first feature vector, the second feature vector, and the third feature vector, and output a label corresponding to the set of fundus images.
In one possible implementation, the image acquisition module 501 is configured to:
acquiring a first fundus image of the left eye and a second fundus image of the right eye;
performing the following steps on any one of the first fundus image and the second fundus image:
inputting the fundus image into an image segmentation model, and calculating the probability that each pixel point in the fundus image is the optic disc by the image segmentation model to obtain a probability matrix, wherein the greater the numerical value of an element in the probability matrix, the greater the probability that the element is positioned on the optic disc;
based on the numerical values of the respective elements in the probability matrix, an image of a target region whose center coincides with the disc center is acquired from the fundus image.
In one possible implementation, the image acquisition module 501 is configured to:
on the basis of a probability threshold value, carrying out binarization processing on the probability matrix to obtain a binarization matrix corresponding to the fundus image;
determining the center of the optic disc and the diameter of the optic disc based on the binarization matrix;
based on the disc center and the disc diameter, the target area is determined, the center of the target area coinciding with the disc center.
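A rough sketch of this optic-disc localization and cropping step is given below, assuming NumPy arrays; the 0.5 threshold, the centroid-based center estimate and the crop window of twice the disc diameter are illustrative assumptions, not values specified by this description.

```python
import numpy as np

# Rough sketch: binarize the probability matrix, estimate the optic disc
# center and diameter, and crop a target region centered on the disc.
def crop_disc_region(fundus_img: np.ndarray, prob_map: np.ndarray,
                     threshold: float = 0.5) -> np.ndarray:
    mask = (prob_map >= threshold).astype(np.uint8)        # binarization of the probability matrix
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return fundus_img                                  # no disc found: fall back to the full image
    cy, cx = int(ys.mean()), int(xs.mean())                # optic disc center (foreground centroid)
    diameter = int(max(ys.max() - ys.min(), xs.max() - xs.min()))  # optic disc diameter from mask extent
    half = diameter                                        # half of a window spanning twice the diameter
    y0, y1 = max(cy - half, 0), min(cy + half, fundus_img.shape[0])
    x0, x1 = max(cx - half, 0), min(cx + half, fundus_img.shape[1])
    return fundus_img[y0:y1, x0:x1]                        # target region centered on the disc center
```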
In one possible implementation, the first vector obtaining module 503 is configured to:
respectively extracting the features of the first image and the second image through a first feature extractor and a second feature extractor in the image classification model to obtain a first feature matrix corresponding to the first image and a second feature matrix corresponding to the second image, and respectively performing global pooling on the first feature matrix and the second feature matrix to obtain the first feature vector and the second feature vector.
In one possible implementation, the parameters of the first feature extractor and the second feature extractor are the same.
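The following minimal sketch, assuming PyTorch, illustrates this weight-shared feature extraction followed by global pooling; the small convolutional backbone and the 224x224 input size are placeholders rather than the network actually used.

```python
import torch
import torch.nn as nn

# Weight-shared ("siamese") feature extraction with global pooling (illustrative backbone).
extractor = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
)
pool = nn.AdaptiveAvgPool2d(1)  # global pooling over the spatial dimensions

def extract(img: torch.Tensor) -> torch.Tensor:
    feat_matrix = extractor(img)          # feature matrix, shape (N, C, H, W)
    return pool(feat_matrix).flatten(1)   # feature vector, shape (N, C)

left_img = torch.randn(1, 3, 224, 224)    # first image (left eye)
right_img = torch.randn(1, 3, 224, 224)   # second image (right eye)
f1 = extract(left_img)                    # first feature vector
f2 = extract(right_img)                   # second feature vector, produced with the same parameters
```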
In one possible implementation, the second vector acquisition module 504 is configured to:
obtaining a difference vector of the first feature vector and the second feature vector;
and taking an absolute value of each numerical value in the difference vector to obtain the third feature vector.
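A minimal sketch of this step, assuming PyTorch tensors (the 512-dimensional size is illustrative):

```python
import torch

f1 = torch.randn(1, 512)   # first feature vector (512 dimensions is an illustrative size)
f2 = torch.randn(1, 512)   # second feature vector
f3 = torch.abs(f1 - f2)    # third feature vector: element-wise |f1 - f2|
# Taking the absolute value makes the difference feature independent of which eye is subtracted from which.
```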
In one possible implementation, the image classification module 505 is configured to:
splicing the first feature vector, the second feature vector and the third feature vector to obtain a fourth feature vector corresponding to the group of fundus images;
inputting the fourth feature vector into a first fully-connected layer, and mapping the fourth feature vector into a two-dimensional vector by the first fully-connected layer;
the label indicated by the two-dimensional vector is taken as the label corresponding to the group of fundus images.
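The concatenation and the first fully-connected layer can be sketched as follows, again assuming PyTorch; the 512-dimensional feature size and the meaning of the two output classes are assumptions for illustration.

```python
import torch
import torch.nn as nn

f1, f2, f3 = torch.randn(1, 512), torch.randn(1, 512), torch.randn(1, 512)  # illustrative sizes
f4 = torch.cat([f1, f2, f3], dim=1)   # fourth feature vector for the group of fundus images
fc1 = nn.Linear(f4.shape[1], 2)       # first fully-connected layer
two_dim = fc1(f4)                     # two-dimensional vector
label = int(two_dim.argmax(dim=1))    # index of the label indicated by the two-dimensional vector
```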
In one possible implementation, the apparatus further comprises:
the left eye label determining module is used for inputting the first feature vector into a second full-connection layer and determining a left eye label corresponding to the first image based on an output result of the second full-connection layer;
and the right eye label determining module is used for inputting the second feature vector into a third full-connection layer and determining a right eye label corresponding to the second image based on an output result of the third full-connection layer.
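A corresponding sketch of these per-eye heads, under the same illustrative assumptions, is shown below; the second and third fully-connected layers are modeled as independent linear layers applied to each eye's own feature vector.

```python
import torch
import torch.nn as nn

f1, f2 = torch.randn(1, 512), torch.randn(1, 512)  # per-eye feature vectors (illustrative size)
fc2 = nn.Linear(512, 2)                            # second fully-connected layer (left eye)
fc3 = nn.Linear(512, 2)                            # third fully-connected layer (right eye)
left_label = int(fc2(f1).argmax(dim=1))            # left-eye label from the second fully-connected layer
right_label = int(fc3(f2).argmax(dim=1))           # right-eye label from the third fully-connected layer
```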
The apparatus provided by the embodiment of the application acquires, in response to an image detection instruction, a group of fundus images to be detected, the group of fundus images including a first image corresponding to the left eye and a second image corresponding to the right eye; inputs the first image and the second image into an image classification model; performs feature extraction on the first image and the second image through the image classification model to obtain a first feature vector and a second feature vector; obtains a third feature vector based on the first feature vector and the second feature vector, the third feature vector being used to indicate the difference between the first feature vector and the second feature vector; and performs image classification on the group of fundus images based on the first feature vector, the second feature vector and the third feature vector, outputting a label corresponding to the group of fundus images. With this fundus image detection apparatus, image detection is performed based on the image features of the left-eye and right-eye images and the difference between them, so that the label corresponding to the images is obtained and the accuracy of the fundus image detection result is improved.
It should be noted that the fundus image detection apparatus provided in the above embodiment is described, when performing fundus image detection, only by taking the division into the above functional modules as an example. In practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the fundus image detection apparatus provided in the above embodiment and the fundus image detection method embodiment belong to the same concept; the specific implementation process is described in detail in the method embodiment and is not repeated here.
The computer device provided by the above technical solution may be implemented as a terminal or a server. For example, fig. 6 is a schematic structural diagram of a terminal provided in an embodiment of the present application. The terminal 600 may be a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 600 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.
In general, the terminal 600 includes: one or more processors 601 and one or more memories 602.
The processor 601 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 601 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 601 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 601 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. In some embodiments, processor 601 may also include an AI (Artificial Intelligence) processor for processing computational operations related to machine learning.
The memory 602 may include one or more computer-readable storage media, which may be non-transitory. The memory 602 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in the memory 602 is used to store at least one program code for execution by the processor 601 to implement the fundus image detection methods provided by the method embodiments herein.
In some embodiments, the terminal 600 may further optionally include: a peripheral interface 603 and at least one peripheral. The processor 601, memory 602, and peripheral interface 603 may be connected by buses or signal lines. Various peripheral devices may be connected to the peripheral interface 603 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 604, a display 605, a camera assembly 606, an audio circuit 607, a positioning component 608, and a power supply 609.
The peripheral interface 603 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 601 and the memory 602. In some embodiments, the processor 601, memory 602, and peripheral interface 603 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 601, the memory 602, and the peripheral interface 603 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The Radio Frequency circuit 604 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 604 communicates with communication networks and other communication devices via electromagnetic signals. The rf circuit 604 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 604 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuitry 604 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, various generation mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the rf circuit 604 may further include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display 605 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 605 is a touch display screen, the display screen 605 also has the ability to capture touch signals on or over the surface of the display screen 605. The touch signal may be input to the processor 601 as a control signal for processing. At this point, the display 605 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, the display 605 may be one, providing the front panel of the terminal 600; in other embodiments, the display 605 may be at least two, respectively disposed on different surfaces of the terminal 600 or in a folded design; in some embodiments, the display 605 may be a flexible display disposed on a curved surface or a folded surface of the terminal 600. Even more, the display 605 may be arranged in a non-rectangular irregular pattern, i.e., a shaped screen. The Display 605 may be made of LCD (liquid crystal Display), OLED (Organic Light-Emitting Diode), and the like.
The camera assembly 606 is used to capture images or video. Optionally, camera assembly 606 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 606 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
Audio circuitry 607 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 601 for processing or inputting the electric signals to the radio frequency circuit 604 to realize voice communication. For the purpose of stereo sound collection or noise reduction, a plurality of microphones may be provided at different portions of the terminal 600. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 601 or the radio frequency circuit 604 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, audio circuitry 607 may also include a headphone jack.
The positioning component 608 is used to locate the current geographic location of the terminal 600 to implement navigation or LBS (Location Based Service). The positioning component 608 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
Power supply 609 is used to provide power to the various components in terminal 600. The power supply 609 may be ac, dc, disposable or rechargeable. When the power supply 609 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, the terminal 600 also includes one or more sensors 610. The one or more sensors 610 include, but are not limited to: acceleration sensor 611, gyro sensor 612, pressure sensor 613, fingerprint sensor 614, optical sensor 615, and proximity sensor 616.
The acceleration sensor 611 may detect the magnitude of acceleration in three coordinate axes of the coordinate system established with the terminal 600. For example, the acceleration sensor 611 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 601 may control the display screen 605 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 611. The acceleration sensor 611 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 612 may detect a body direction and a rotation angle of the terminal 600, and the gyro sensor 612 and the acceleration sensor 611 may cooperate to acquire a 3D motion of the user on the terminal 600. The processor 601 may implement the following functions according to the data collected by the gyro sensor 612: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
Pressure sensors 613 may be disposed on the side bezel of terminal 600 and/or underneath display screen 605. When the pressure sensor 613 is disposed on the side frame of the terminal 600, a user's holding signal of the terminal 600 can be detected, and the processor 601 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 613. When the pressure sensor 613 is disposed at the lower layer of the display screen 605, the processor 601 controls the operability control on the UI interface according to the pressure operation of the user on the display screen 605. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 614 is used for collecting a fingerprint of a user, and the processor 601 identifies the identity of the user according to the fingerprint collected by the fingerprint sensor 614, or the fingerprint sensor 614 identifies the identity of the user according to the collected fingerprint. Upon identifying that the user's identity is a trusted identity, the processor 601 authorizes the user to perform relevant sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying, and changing settings, etc. The fingerprint sensor 614 may be disposed on the front, back, or side of the terminal 600. When a physical button or vendor Logo is provided on the terminal 600, the fingerprint sensor 614 may be integrated with the physical button or vendor Logo.
The optical sensor 615 is used to collect the ambient light intensity. In one embodiment, processor 601 may control the display brightness of display screen 605 based on the ambient light intensity collected by optical sensor 615. Specifically, when the ambient light intensity is high, the display brightness of the display screen 605 is increased; when the ambient light intensity is low, the display brightness of the display screen 605 is adjusted down. In another embodiment, the processor 601 may also dynamically adjust the shooting parameters of the camera assembly 606 according to the ambient light intensity collected by the optical sensor 615.
A proximity sensor 616, also known as a distance sensor, is typically disposed on the front panel of the terminal 600. The proximity sensor 616 is used to collect the distance between the user and the front surface of the terminal 600. In one embodiment, when the proximity sensor 616 detects that the distance between the user and the front face of the terminal 600 gradually decreases, the processor 601 controls the display 605 to switch from the screen-on state to the screen-off state; when the proximity sensor 616 detects that the distance between the user and the front face of the terminal 600 gradually increases, the processor 601 controls the display 605 to switch from the screen-off state to the screen-on state.
Those skilled in the art will appreciate that the configuration shown in fig. 6 is not intended to be limiting of terminal 600 and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be used.
Fig. 7 is a schematic structural diagram of a server 700 according to an embodiment of the present application. The server 700 may vary considerably in configuration or performance, and may include one or more processors (CPUs) 701 and one or more memories 702, where at least one program code is stored in the one or more memories 702 and is loaded and executed by the one or more processors 701 to implement the methods provided by the foregoing method embodiments. Of course, the server 700 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for performing input and output, and the server 700 may also include other components for implementing functions of the device, which are not described herein again.
In an exemplary embodiment, there is also provided a computer-readable storage medium, such as a memory, including at least one program code executable by a processor to perform the fundus image detection method in the above-described embodiments. For example, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
It will be understood by those skilled in the art that all or part of the steps of implementing the above embodiments may be implemented by hardware, or implemented by at least one program code associated with hardware, where the program code is stored in a computer readable storage medium, such as a read only memory, a magnetic or optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (15)

1. A fundus image detection apparatus, characterized in that the fundus image detection apparatus is configured to:
responding to an image detection instruction, and acquiring a group of fundus images to be detected, wherein the group of fundus images comprises a first image corresponding to a left eye and a second image corresponding to a right eye;
inputting the first image and the second image into an image classification model;
performing feature extraction on the first image and the second image through the image classification model to obtain a first feature vector and a second feature vector;
obtaining a third feature vector based on the first feature vector and the second feature vector, wherein the third feature vector is used for indicating the difference between the first feature vector and the second feature vector;
and performing image classification on the group of fundus images based on the first feature vector, the second feature vector and the third feature vector, and outputting labels corresponding to the group of fundus images.
2. The apparatus according to claim 1, characterized in that the fundus image detection apparatus is configured to:
acquiring a first fundus image of the left eye and a second fundus image of the right eye;
performing the following steps on any one of the first fundus image and the second fundus image:
inputting the fundus image into an image segmentation model, and calculating the probability that each pixel point in the fundus image is the optic disc by the image segmentation model to obtain a probability matrix, wherein the greater the numerical value of an element in the probability matrix, the greater the probability that the position of the element is the optic disc;
and acquiring an image of a target area from the fundus image based on the numerical value of each element in the probability matrix, wherein the center of the image of the target area is coincident with the center of the optic disc.
3. The apparatus according to claim 2, characterized in that the fundus image detection apparatus is configured to:
on the basis of a probability threshold value, carrying out binarization processing on the probability matrix to obtain a binarization matrix corresponding to the fundus image;
determining the optic disc center and the optic disc diameter based on the binarization matrix;
determining the target area based on the disc center and the disc diameter, the center of the target area coinciding with the disc center.
4. The apparatus according to claim 1, characterized in that the fundus image detection apparatus is configured to:
respectively extracting the features of the first image and the second image through a first feature extractor and a second feature extractor in the image classification model to obtain a first feature matrix corresponding to the first image and a second feature matrix corresponding to the second image, and respectively performing global pooling on the first feature matrix and the second feature matrix to obtain a first feature vector and a second feature vector.
5. The apparatus of claim 4, wherein the parameters of the first feature extractor and the second feature extractor are the same.
6. The apparatus according to claim 1, characterized in that the fundus image detection apparatus is configured to:
obtaining a difference vector of the first feature vector and the second feature vector;
and taking an absolute value of each numerical value in the difference vector to obtain the third feature vector.
7. The apparatus according to claim 1, characterized in that the fundus image detection apparatus is configured to:
splicing the first feature vector, the second feature vector and the third feature vector to obtain a fourth feature vector corresponding to the group of fundus images;
inputting the fourth feature vector into a first fully-connected layer, and mapping the fourth feature vector into a two-dimensional vector by the first fully-connected layer;
and taking the label indicated by the two-dimensional vector as a label corresponding to the group of fundus images.
8. The apparatus according to claim 1, wherein the fundus image detection apparatus is further configured to:
inputting the first feature vector into a second full-connection layer, and determining a left-eye label corresponding to the first image based on an output result of the second full-connection layer;
and inputting the second feature vector into a third full-connection layer, and determining a right-eye label corresponding to the second image based on an output result of the third full-connection layer.
9. A method for fundus image detection, the method comprising:
responding to an image detection instruction, and acquiring a group of fundus images to be detected, wherein the group of fundus images comprises a first image corresponding to a left eye and a second image corresponding to a right eye;
inputting the first image and the second image into an image classification model;
performing feature extraction on the first image and the second image through the image classification model to obtain a first feature vector and a second feature vector;
obtaining a third feature vector based on the first feature vector and the second feature vector, wherein the third feature vector is used for indicating the difference between the first feature vector and the second feature vector;
and performing image classification on the group of fundus images based on the first feature vector, the second feature vector and the third feature vector, and outputting labels corresponding to the group of fundus images.
10. The method according to claim 9, wherein said acquiring a set of fundus images to be detected in response to an image detection instruction, the set of fundus images including a first image corresponding to a left eye and a second image corresponding to a right eye, comprises:
acquiring a first fundus image of the left eye and a second fundus image of the right eye;
performing the following steps on any one of the first fundus image and the second fundus image:
inputting the fundus image into an image segmentation model, and calculating the probability that each pixel point in the fundus image is the optic disc by the image segmentation model to obtain a probability matrix, wherein the greater the numerical value of an element in the probability matrix, the greater the probability that the position of the element is the optic disc;
and acquiring an image of a target area from the fundus image based on the numerical value of each element in the probability matrix, wherein the center of the image of the target area is coincident with the center of the optic disc.
11. The method according to claim 10, wherein the acquiring an image of a target region from the fundus image based on the numerical size of each element in the probability matrix comprises:
on the basis of a probability threshold value, carrying out binarization processing on the probability matrix to obtain a binarization matrix corresponding to the fundus image;
determining the optic disc center and the optic disc diameter based on the binarization matrix;
determining the target area based on the disc center and the disc diameter, the center of the target area coinciding with the disc center.
12. The method of claim 9, wherein the performing feature extraction on the first image and the second image through the image classification model to obtain a first feature vector and a second feature vector comprises:
respectively extracting the features of the first image and the second image through a first feature extractor and a second feature extractor in the image classification model to obtain a first feature matrix corresponding to the first image and a second feature matrix corresponding to the second image, and respectively performing global pooling on the first feature matrix and the second feature matrix to obtain a first feature vector and a second feature vector.
13. The method of claim 12, wherein the parameters of the first feature extractor and the second feature extractor are the same.
14. A fundus image detection apparatus, comprising:
the image acquisition module is used for responding to an image detection instruction and acquiring a group of fundus images to be detected, wherein the group of fundus images comprises a first image corresponding to a left eye and a second image corresponding to a right eye;
an input module for inputting the first image and the second image into an image classification model;
the first vector acquisition module is used for extracting the features of the first image and the second image through the image classification model to obtain a first feature vector and a second feature vector;
a second vector obtaining module, configured to obtain a third feature vector based on the first feature vector and the second feature vector, where the third feature vector is used to indicate a difference between the first feature vector and the second feature vector;
and the image classification module is used for carrying out image classification on the group of fundus images based on the first feature vector, the second feature vector and the third feature vector and outputting labels corresponding to the group of fundus images.
15. A computer-readable storage medium having stored therein at least one program code, the at least one program code being loaded into and executed by a processor to perform operations performed by a fundus image detection method according to any one of claims 9 to 13.
CN201911327024.7A 2019-12-20 2019-12-20 Fundus image detection device, fundus image detection method, fundus image detection device, and fundus image storage medium Active CN111080630B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911327024.7A CN111080630B (en) 2019-12-20 2019-12-20 Fundus image detection device, fundus image detection method, fundus image detection device, and fundus image storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911327024.7A CN111080630B (en) 2019-12-20 2019-12-20 Fundus image detection device, fundus image detection method, fundus image detection device, and fundus image storage medium

Publications (2)

Publication Number Publication Date
CN111080630A true CN111080630A (en) 2020-04-28
CN111080630B CN111080630B (en) 2024-03-08

Family

ID=70316320

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911327024.7A Active CN111080630B (en) 2019-12-20 2019-12-20 Fundus image detection device, fundus image detection method, fundus image detection device, and fundus image storage medium

Country Status (1)

Country Link
CN (1) CN111080630B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220207732A1 (en) * 2020-12-28 2022-06-30 Seyed Ehsan Vaghefi Rezaei Systems and methods for processing of fundus images

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106214120A (en) * 2016-08-19 2016-12-14 靳晓亮 A kind of methods for screening of glaucoma
US20170112372A1 (en) * 2015-10-23 2017-04-27 International Business Machines Corporation Automatically detecting eye type in retinal fundus images
CN107045720A (en) * 2017-05-04 2017-08-15 深圳硅基智能科技有限公司 Artificial neural network and system for recognizing eye fundus image lesion
CN107256410A (en) * 2017-05-26 2017-10-17 北京郁金香伙伴科技有限公司 To the method and device of class mirror image image classification
CN107423571A (en) * 2017-05-04 2017-12-01 深圳硅基仿生科技有限公司 Diabetic retinopathy identifying system based on eye fundus image
CN108665447A (en) * 2018-04-20 2018-10-16 浙江大学 A kind of glaucoma image detecting method based on eye-ground photography deep learning
CN108717868A (en) * 2018-04-26 2018-10-30 博众精工科技股份有限公司 Glaucoma eye fundus image screening method based on deep learning and system
CN110210571A (en) * 2019-06-10 2019-09-06 腾讯科技(深圳)有限公司 Image-recognizing method, device, computer equipment and computer readable storage medium

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170112372A1 (en) * 2015-10-23 2017-04-27 International Business Machines Corporation Automatically detecting eye type in retinal fundus images
CN106214120A (en) * 2016-08-19 2016-12-14 靳晓亮 A kind of methods for screening of glaucoma
CN107045720A (en) * 2017-05-04 2017-08-15 深圳硅基智能科技有限公司 Artificial neural network and system for recognizing eye fundus image lesion
CN107423571A (en) * 2017-05-04 2017-12-01 深圳硅基仿生科技有限公司 Diabetic retinopathy identifying system based on eye fundus image
WO2018201633A1 (en) * 2017-05-04 2018-11-08 深圳硅基仿生科技有限公司 Fundus image-based diabetic retinopathy identification system
CN107256410A (en) * 2017-05-26 2017-10-17 北京郁金香伙伴科技有限公司 To the method and device of class mirror image image classification
CN108665447A (en) * 2018-04-20 2018-10-16 浙江大学 A kind of glaucoma image detecting method based on eye-ground photography deep learning
CN108717868A (en) * 2018-04-26 2018-10-30 博众精工科技股份有限公司 Glaucoma eye fundus image screening method based on deep learning and system
CN110210571A (en) * 2019-06-10 2019-09-06 腾讯科技(深圳)有限公司 Image-recognizing method, device, computer equipment and computer readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Huang Jinli: "Research on glaucoma cup and disc detection technology based on deep learning", China National Knowledge Infrastructure (CNKI) Master's Electronic Journals, no. 1, 15 January 2019 (2019-01-15) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220207732A1 (en) * 2020-12-28 2022-06-30 Seyed Ehsan Vaghefi Rezaei Systems and methods for processing of fundus images

Also Published As

Publication number Publication date
CN111080630B (en) 2024-03-08

Similar Documents

Publication Publication Date Title
CN110348543B (en) Fundus image recognition method and device, computer equipment and storage medium
CN111091576B (en) Image segmentation method, device, equipment and storage medium
CN111325726A (en) Model training method, image processing method, device, equipment and storage medium
WO2020224479A1 (en) Method and apparatus for acquiring positions of target, and computer device and storage medium
CN111079576B (en) Living body detection method, living body detection device, living body detection equipment and storage medium
CN110555839A (en) Defect detection and identification method and device, computer equipment and storage medium
CN107833219B (en) Image recognition method and device
CN111476306A (en) Object detection method, device, equipment and storage medium based on artificial intelligence
CN112036331B (en) Living body detection model training method, device, equipment and storage medium
CN110570460B (en) Target tracking method, device, computer equipment and computer readable storage medium
CN110807361A (en) Human body recognition method and device, computer equipment and storage medium
CN112749613B (en) Video data processing method, device, computer equipment and storage medium
CN111931877B (en) Target detection method, device, equipment and storage medium
CN112884770B (en) Image segmentation processing method and device and computer equipment
CN111914812A (en) Image processing model training method, device, equipment and storage medium
CN111192262A (en) Product defect classification method, device, equipment and medium based on artificial intelligence
CN114332554A (en) Training method of image segmentation model, image segmentation method, device and equipment
CN111738365B (en) Image classification model training method and device, computer equipment and storage medium
CN111597922A (en) Cell image recognition method, system, device, equipment and medium
CN110647881A (en) Method, device, equipment and storage medium for determining card type corresponding to image
CN113918767A (en) Video clip positioning method, device, equipment and storage medium
CN111598896A (en) Image detection method, device, equipment and storage medium
CN110705438A (en) Gait recognition method, device, equipment and storage medium
CN113724189A (en) Image processing method, device, equipment and storage medium
CN113570645A (en) Image registration method, image registration device, computer equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40023022

Country of ref document: HK

SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant