CN116934730A - Method and device for training detection model for detecting abnormal region in fundus image - Google Patents
- Publication number
- CN116934730A (application CN202310961124.5A)
- Authority
- CN
- China
- Prior art keywords
- fundus image
- wide
- ultra
- detection
- abnormal region
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0012—Biomedical image inspection
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/0895—Weakly supervised learning, e.g. semi-supervised or self-supervised learning
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30041—Eye; Retina; Ophthalmic
Abstract
The application discloses a method and device for training a detection model that detects abnormal regions in fundus images. The method comprises the following steps: acquiring ultra-wide-angle fundus images for training the detection model; cropping both the ultra-wide-angle fundus images without annotated detection frames and those with annotated detection frames to obtain a plurality of image blocks; performing feature operations related to abnormal-region detection on the plurality of image blocks using a trained feature extractor to obtain a corresponding plurality of intermediate features; and inputting the plurality of intermediate features into a first classifier to perform a global classification operation and into a second classifier to perform a local classification operation, and correspondingly calculating a fully supervised loss function and a semi-supervised loss function based on whether detection frames are annotated, so as to train the detection model for detecting abnormal regions in fundus images. With this scheme, abnormal regions in ultra-wide-angle fundus images can be accurately identified and located even in the absence of abnormal-region annotations.
Description
Technical Field
The present application relates generally to the field of artificial intelligence. More particularly, the present application relates to a method, apparatus, and computer-readable storage medium for training a detection model that detects an abnormal region in a fundus image. Further, the present application also relates to a method, an apparatus, and a computer-readable storage medium for detecting an abnormal region in a fundus image.
Background
As disclosed by the World Health Organization ("WHO"), 2.2 billion people worldwide suffer from vision impairment or blindness, and in roughly 1 billion of these cases the worsening of the condition could have been prevented through early prevention or effective treatment; early discovery and correct treatment of eye diseases is therefore of great importance in preventing vision impairment. The continued development of artificial intelligence, especially machine learning and deep learning, is expected to provide better solutions for such prevention and treatment.
In recent years, ultra-wide-angle fundus imaging (shown, for example, in the left diagram of fig. 1) has been increasingly applied to the diagnosis and treatment of ophthalmic diseases. It can take fundus photographs without mydriasis and obtains a wider retinal field of view (up to 200°) than the color fundus images taken by a conventional color fundus camera (shown, for example, in the right diagram of fig. 1). The ultra-wide-angle fundus imaging technology therefore markedly reduces the complexity of the imaging process for fundus disease identification, enlarges the visual range of fundus photographs, and provides more valuable information and data for clinical diagnosis and treatment planning. At present, much research on ultra-wide-angle fundus photographs effectively applies deep learning techniques and explores the feasibility of automated diagnosis of certain lesions. However, existing methods usually process the whole image at once; because an ultra-wide-angle image contains a very large number of pixels, this overlooks some tiny abnormal details and makes the detection results inaccurate. In addition, most existing research focuses on classifying abnormalities in images, while the specific positions of abnormal regions are rarely studied; compared with a traditional color fundus image, an ultra-wide-angle image carries a large amount of information, so the cost of annotating abnormal-region detection frames is high, and annotated, high-quality ultra-wide-angle fundus image data are scarce, which hinders the training of models.
In view of the foregoing, it is desirable to provide a scheme for training a detection model for detecting an abnormal region in a fundus image, so as to accurately identify and locate abnormal regions in ultra-wide-angle fundus images in the absence of, or with few, abnormal-region annotations.
Disclosure of Invention
In order to solve at least one or more of the technical problems mentioned above, the present application proposes, in various aspects, a solution for training a detection model for detecting an abnormal region in a fundus image.
In a first aspect, the present application provides a method for training a detection model for detecting an abnormal region in a fundus image, wherein the detection model comprises at least a feature extractor and a classifier, and the method comprises: acquiring ultra-wide-angle fundus images for training the detection model, wherein the ultra-wide-angle fundus images comprise ultra-wide-angle fundus images without an annotated detection frame and ultra-wide-angle fundus images with an annotated detection frame; cropping the ultra-wide-angle fundus images without the annotated detection frame and the ultra-wide-angle fundus images with the annotated detection frame to obtain a plurality of image blocks; performing feature operations related to abnormal-region detection on the plurality of image blocks using the trained feature extractor to obtain a corresponding plurality of intermediate features; inputting the plurality of intermediate features into a first classifier to perform a global classification operation and into a second classifier to perform a local classification operation, and correspondingly calculating a fully supervised loss function and a semi-supervised loss function based on whether a detection frame is annotated; and training the detection model for detecting an abnormal region in the fundus image using the fully supervised loss function and the semi-supervised loss function.
In one embodiment, the ultra-wide-angle fundus images annotated with a detection frame are those annotated with both a detection frame and an abnormality classification, while the ultra-wide-angle fundus images without an annotated detection frame include images annotated with an abnormality classification but no detection frame and images annotated with neither a detection frame nor an abnormality classification.
In another embodiment, the trained feature extractor is obtained by: performing data enhancement operations on the plurality of image blocks to obtain data-enhanced image blocks; inputting the data-enhanced image blocks and the plurality of image blocks into the feature extractor for feature operations and calculating a contrastive loss function; and training the feature extractor with the contrastive loss function.
In yet another embodiment, wherein the trained feature extractor is further coupled to a fully-connected module, wherein performing feature operations related to outlier detection on the plurality of image blocks using the trained feature extractor to obtain a respective plurality of intermediate features comprises: and performing feature operation related to abnormal region detection on the plurality of image blocks by using the trained feature extractor, and performing full-connection operation on feature operation results by using the full-connection module so as to obtain a plurality of intermediate features.
In yet another embodiment, inputting the plurality of intermediate features into the first classifier to perform a global classification operation and into the second classifier to perform a local classification operation, and calculating the fully supervised and semi-supervised loss functions based on whether detection frames are annotated, comprises: inputting the plurality of intermediate features into the first classifier to perform the global classification operation, and correspondingly calculating a first fully supervised loss function based on annotated detection frames and a first semi-supervised loss function based on unannotated detection frames; and inputting the plurality of intermediate features into the second classifier to perform the local classification operation, and correspondingly calculating a second fully supervised loss function based on annotated detection frames and a second semi-supervised loss function based on unannotated detection frames.
In yet another embodiment, wherein the first classifier and the second classifier comprise fully connected layers, and the first classifier further comprises an attention module, inputting the plurality of intermediate features into the first classifier to perform a global classification operation and inputting the plurality of intermediate features into the second classifier to perform a local classification operation comprises: inputting the plurality of intermediate features into an attention module in the first classifier to perform an attention operation, and performing a fully connected operation on an attention operation result via a fully connected layer in the first classifier to perform a global classification operation; and inputting the plurality of intermediate features into a full connection layer in the second classifier to perform a full connection operation to perform a local classification operation.
In yet another embodiment, training the detection model that detects an abnormal region in the fundus image using the fully supervised loss function and the semi-supervised loss function includes: calculating a first loss sum of the first fully supervised loss function and the first semi-supervised loss function; calculating a second loss sum of the second fully supervised loss function and the second semi-supervised loss function; and training the detection model for detecting an abnormal region in the fundus image based on the first loss sum and the second loss sum.
In yet another embodiment, the method further comprises: and visualizing the weights in the attention module to determine the weight corresponding to each image block.
In a second aspect, the present application provides an apparatus for training a detection model for detecting an abnormal region in a fundus image, comprising: a processor; a memory storing program instructions for training a detection model for detecting an abnormal region in a fundus image, which when executed by the processor, cause the apparatus to implement the plurality of embodiments in the foregoing first aspect.
In a third aspect, the present application also provides a method for detecting an abnormal region in a fundus image, comprising: acquiring an ultra-wide-angle fundus image to be detected; the ultra-wide angle fundus image is input into a detection model trained according to the embodiments in the foregoing first aspect for detection to output the detection result of the abnormal region in the fundus image.
In a fourth aspect, the present application also provides an apparatus for detecting an abnormal region in a fundus image, comprising: a processor; a memory storing program instructions for detecting an abnormal region in a fundus image, which when executed by the processor, cause the apparatus to implement the embodiment in the foregoing third aspect.
In a fifth aspect, the present application also provides a computer-readable storage medium having stored thereon computer-readable instructions for training a detection model for detecting an abnormal region in a fundus image and for detecting an abnormal region in a fundus image, which when executed by one or more processors, implement the various embodiments of the foregoing first aspect and the embodiments of the foregoing third aspect.
By the scheme for training a detection model for detecting an abnormal region in a fundus image, the embodiment of the application acquires ultra-wide-angle fundus images with and without annotated detection frames, crops each image into a plurality of image blocks, and extracts a plurality of intermediate features through the trained feature extractor. A global classification operation and a local classification operation are then performed by inputting the intermediate features into a first classifier and a second classifier respectively, and the corresponding fully supervised and semi-supervised loss functions are calculated to train the detection model. In this way, the embodiment of the application reduces the loss of detail information by processing the ultra-wide-angle fundus image block by block, so that the extracted features carry richer information.

Furthermore, the embodiment of the application trains the detection model by combining global and local classification and by calculating fully supervised and semi-supervised loss functions for images with and without annotated detection frames, so that the classification results are more accurate and the reliability of the detection model can be ensured even in the absence of abnormal-region annotations. Abnormal regions in ultra-wide-angle fundus images can then be accurately identified and located with the trained detection model. In addition, the embodiment of the application visualizes the weights in the attention module so that the image block containing the abnormal region can be viewed, which brings great convenience to the clinical detection of abnormal regions.
Drawings
The above, as well as additional purposes, features, and advantages of exemplary embodiments of the present application will become readily apparent from the following detailed description when read in conjunction with the accompanying drawings. In the drawings, embodiments of the application are illustrated by way of example and not by way of limitation, and like reference numerals refer to similar or corresponding parts and in which:
Fig. 1 is an exemplary schematic diagram showing an ultra-wide-angle fundus image and a normal color fundus image;
Fig. 2 is an exemplary flow chart illustrating a method for training a detection model for detecting an abnormal region in a fundus image according to an embodiment of the present application;
Fig. 3 is an exemplary diagram showing the overall structure of a detection model for detecting an abnormal region in a fundus image according to an embodiment of the present application;
Fig. 4 is an exemplary schematic diagram illustrating an attention operation according to an embodiment of the present application;
Fig. 5 is an exemplary diagram illustrating the visualization of weights in an attention module according to an embodiment of the present application;
Fig. 6 is an exemplary flowchart illustrating a method for detecting an abnormal region in a fundus image according to an embodiment of the present application; and
Fig. 7 is an exemplary block diagram showing the configuration of an apparatus for training a detection model for detecting an abnormal region in a fundus image and for detecting an abnormal region in a fundus image according to an embodiment of the present application.
Detailed Description
The technical solutions of the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. It should be understood that the embodiments described in this specification are only some, not all, embodiments of the application, provided to facilitate a clear understanding of the solutions and to meet legal requirements. All other embodiments obtained by those skilled in the art without inventive effort, based on the embodiments disclosed herein, are intended to fall within the scope of the present application.
Fig. 1 is an exemplary schematic diagram showing an ultra-wide-angle fundus image and a normal color fundus image. The left diagram in fig. 1 shows an ultra-wide-angle fundus image, and the right diagram shows a normal color fundus image. As described in the background, compared with a common color fundus image, the imaging of an ultra-wide-angle fundus image is simpler, a wider retinal field of view can be obtained, and more valuable information and data can be provided for clinical diagnosis and treatment planning. However, existing approaches generally process the whole image as a single block, which causes more detail loss and inaccurate detection results.
In addition, existing methods rarely study the specific position of the abnormal region, and abnormal-region detection frames must often be annotated in ultra-wide-angle fundus images to serve as training labels. However, because an ultra-wide-angle fundus image has a wide field of view and carries a large amount of information, annotating abnormal-region detection frames consumes considerable human resources and time, and publicly available, large-scale, high-quality annotated data sets of ultra-wide-angle fundus images are sparse. Moreover, a fine lesion can become imperceptible after the image is scaled down and fed into a neural network model, which presents a further difficulty for model training.
To address this, the present application provides a scheme for training a detection model for detecting an abnormal region in a fundus image: ultra-wide-angle fundus images with and without annotated detection frames are processed block by block, which reduces detail loss and lowers the training difficulty of the detection model. Further, the plurality of intermediate features extracted via the feature extractor are input into first and second classifiers to perform a global classification operation and a local classification operation respectively, and the corresponding fully supervised and semi-supervised loss functions are calculated. The combination of global and local classification makes the classification results more accurate, ensures the reliability of the detection model even in the absence of abnormal-region annotations, and enables the trained detection model to accurately identify and locate abnormal regions in ultra-wide-angle fundus images.
A scheme for training a detection model for detecting an abnormal region in a fundus image according to an embodiment of the present application will be described in detail with reference to fig. 2 to 5.
Fig. 2 is an exemplary flow chart illustrating a method 200 for training a detection model for detecting an abnormal region in a fundus image according to an embodiment of the present application. As shown in fig. 2, at step 201, ultra-wide-angle fundus images for training the detection model are acquired. In one implementation scenario, the aforementioned detection model may be, for example, a ResNet-50-based model, and the detection model may include at least a feature extractor and a classifier. In some embodiments, the feature extractor may be composed of multiple convolution layers, dropout layers, activation functions, and batch normalization layers, and the classifier may include, for example, a fully connected layer. In one embodiment, the aforementioned ultra-wide-angle fundus images may be captured by, for example, an ultra-wide-angle fundus camera, and they include ultra-wide-angle fundus images without an annotated detection frame and ultra-wide-angle fundus images with an annotated detection frame. The images annotated with a detection frame are also annotated with an abnormality classification, while the images without an annotated detection frame include images annotated with an abnormality classification but no detection frame and images annotated with neither a detection frame nor an abnormality classification.
In other words, an ultra-wide-angle fundus image annotated with a detection frame in the present application is annotated with both the detection frame and the abnormality classification, and an ultra-wide-angle fundus image without an annotated detection frame is either annotated only with the abnormality classification or entirely unannotated (i.e., with neither a detection frame nor an abnormality classification). That is, the ultra-wide-angle fundus images of the present application fall into three categories: annotated with both the detection frame and the abnormality classification, annotated only with the abnormality classification, and unannotated. It will be understood that annotating a detection frame means marking the specific position of an abnormal region on the ultra-wide-angle fundus image with, for example, a rectangular frame, and annotating an abnormality classification means marking whether an abnormal region is present on the image, for example with "1" when an abnormal region is present and "0" when it is not. In a practical application scenario, the abnormal region in the embodiments of the present application may be, for example, a lesion region in an ultra-wide-angle fundus image.
Based on the acquired ultra-wide-angle fundus images, at step 202, the ultra-wide-angle fundus images without an annotated detection frame and those with an annotated detection frame are cropped to obtain a corresponding plurality of image blocks. That is, each of the aforementioned three categories of ultra-wide-angle fundus images is cropped from a whole image into several image blocks (for example, each image is cropped into the four image blocks shown in fig. 3). In some embodiments, the ultra-wide-angle fundus images may be data-cleaned, i.e., poor-quality images removed, before cropping them into image blocks.
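As a concrete illustration of this cropping step, the sketch below computes coordinate boxes that tile an image into a grid of equal, non-overlapping blocks (a 2×2 grid yields four blocks, as in the fig. 3 example). The grid layout and function name are illustrative assumptions; the patent does not fix a particular cropping scheme.

```python
def patch_boxes(width, height, rows=2, cols=2):
    """Return (left, top, right, bottom) boxes that tile an image
    into rows x cols non-overlapping image blocks."""
    boxes = []
    ph, pw = height // rows, width // cols
    for r in range(rows):
        for c in range(cols):
            boxes.append((c * pw, r * ph, (c + 1) * pw, (r + 1) * ph))
    return boxes

# A 400x400 fundus image cropped into four 200x200 blocks:
blocks = patch_boxes(400, 400)
```

Each box can then be passed to any image library's crop routine to produce the actual image blocks.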
Next, at step 203, feature operations related to abnormal-region detection are performed on the plurality of image blocks using the trained feature extractor to obtain a corresponding plurality of intermediate features. In one embodiment, the trained feature extractor is obtained by performing data enhancement operations on the plurality of image blocks to obtain data-enhanced image blocks, inputting the data-enhanced image blocks and the original image blocks into the feature extractor for feature operations, and calculating a contrastive loss function with which the feature extractor is trained. In some implementations, the data enhancement operations may include, but are not limited to, horizontally flipping the image, scaling the image, and/or changing the colors of the image.
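A minimal, framework-free sketch of one such data enhancement (the horizontal flip), treating the image as a nested list of pixel rows; a real pipeline would typically use an image-augmentation library, which is an assumption beyond the text.

```python
def hflip(image):
    """Horizontally flip an image given as a list of pixel rows."""
    return [list(reversed(row)) for row in image]

img = [[1, 2, 3],
       [4, 5, 6]]
aug = hflip(img)   # [[3, 2, 1], [6, 5, 4]]
```

Applying the flip twice returns the original image, which is a convenient sanity check for any augmentation intended to preserve content.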
Based on the data-enhanced image blocks and the original (non-enhanced) image blocks, both are used as training samples for the feature extractor. Specifically, the feature extractor extracts the corresponding feature results, a contrastive loss function is calculated from them, and the feature extractor is then trained via forward and backward passes based on this loss to obtain the trained feature extractor. In one implementation scenario, the contrastive loss may take the standard InfoNCE form (reconstructed here from the surrounding description):

$$\mathcal{L}_{contrast} = -\log \frac{\exp(v_i \cdot \tilde{v}_i / T)}{\exp(v_i \cdot \tilde{v}_i / T) + \sum_{j \neq i} \exp(v_i \cdot v_j / T)}$$

where $v_i$ denotes the feature result of the original ultra-wide-angle fundus image extracted via the feature extractor, $\tilde{v}_i$ denotes the feature result of the same image after one data enhancement, and $v_j$ denotes the features of the other samples in the same training batch. For example, if $v_i$ denotes the features of an image annotated with both a detection frame and an abnormality classification, then $\tilde{v}_i$ denotes the features of that image after one data enhancement, and $v_j$ denotes the features of the images in the same batch that are only annotated with an abnormality classification and/or unannotated. Further, $T$ denotes a temperature hyper-parameter (whose value is, for example, 2 to 5) and $\exp$ denotes the exponential function. In this setting, the contrastive loss pulls each sample close to its differently-augmented version in feature space while pushing it away from the other samples, so the features of ultra-wide-angle fundus images can be learned without annotations.
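Assuming the loss is the standard InfoNCE form just described, a dependency-free numeric sketch with dot-product similarity and temperature T (the function name and vector inputs are illustrative):

```python
import math

def contrastive_loss(v_i, v_pos, negatives, T=2.0):
    """InfoNCE-style contrastive loss: pull v_i toward its augmented
    view v_pos, push it away from other batch samples (negatives)."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    pos = math.exp(dot(v_i, v_pos) / T)
    neg = sum(math.exp(dot(v_i, v_j) / T) for v_j in negatives)
    return -math.log(pos / (pos + neg))
```

The loss shrinks as the similarity between a sample and its augmented view grows relative to its similarity with the negatives, which is exactly the pull-in/push-away behavior described above.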
Having obtained the trained feature extractor described above, feature operations related to abnormal-region detection may be performed on the plurality of image blocks to obtain a corresponding plurality of intermediate features. In one embodiment, the trained feature extractor is further connected to a fully-connected module: the trained feature extractor performs the feature operations related to abnormal-region detection on the plurality of image blocks, and the fully-connected module then performs a fully-connected operation on the feature results so as to obtain the corresponding intermediate features. That is, in the embodiments of the application, the image blocks corresponding to the various ultra-wide-angle fundus images are first input to the feature extractor to extract features, and the extracted features are then input to the fully-connected module for a fully-connected operation, yielding the plurality of intermediate features corresponding to the various ultra-wide-angle fundus images.
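The fully-connected projection that follows the feature extractor can be sketched as a simple affine map from an extracted feature vector to an intermediate feature; the dimensions and weight values below are illustrative assumptions, not values from the patent.

```python
def fully_connected(features, weights, bias):
    """Project an extracted feature vector into an intermediate feature:
    out[k] = sum_d weights[k][d] * features[d] + bias[k]."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]

# A 3-d extractor output projected to a 2-d intermediate feature:
feat = fully_connected([1.0, 2.0, 3.0],
                       [[1.0, 0.0, 0.0],
                        [0.0, 1.0, 1.0]],
                       [0.5, -0.5])
```

In practice the weights and bias are learned jointly with the rest of the model during training.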
Based on the obtained plurality of intermediate features, at step 204, the plurality of intermediate features are respectively input into the first classifier to perform a global classification operation and into the second classifier to perform a local classification operation, and the full supervision loss function and the semi-supervised loss function are correspondingly calculated based on whether a detection frame is labeled. In one embodiment, the plurality of intermediate features are input into the first classifier to perform the global classification operation, and a first full supervision loss function is correspondingly calculated for the labeled detection frames while a first semi-supervised loss function is correspondingly calculated for the unlabeled detection frames; the plurality of intermediate features are likewise input into the second classifier to perform the local classification operation, and a second full supervision loss function is correspondingly calculated for the labeled detection frames while a second semi-supervised loss function is correspondingly calculated for the unlabeled detection frames.
In one embodiment, the first classifier and the second classifier each comprise a fully connected layer, and the first classifier further comprises an attention module. For the first classifier, the plurality of intermediate features are input into the attention module of the first classifier to perform an attention operation, and a fully connected operation is performed on the attention operation result via the fully connected layer in the first classifier, so as to perform the global classification operation. For the second classifier, the plurality of intermediate features are input into the fully connected layer of the second classifier to perform a fully connected operation, so as to perform the local classification operation.
That is, embodiments of the present application employ two classifiers to form two classification branches, one of which performs a global (i.e., whole-image) classification operation via the attention module and the fully connected layer, and the other of which performs a local (i.e., image-block-level) classification operation via the fully connected layer. The plurality of intermediate features corresponding to the ultra-wide-angle fundus images are respectively input into the two classifiers, and the corresponding loss functions are calculated in each classification branch based on whether the detection frames are labeled. Specifically, for the labeled detection frames, the first and second full supervision loss functions can be calculated based on the labeled detection frames (i.e., the real labels) and the classification results of the first and second classifiers; for the unlabeled detection frames, the first and second semi-supervised loss functions may be calculated based on the pseudo labels and the classification results of the first and second classifiers.
In one implementation scenario, for the labeled detection frames, the first and second full supervision loss functions may be calculated based on the following formula:
where L_Xi represents the first (or second) full supervision loss function, p_i represents the real label, the other term in the formula represents the classification result obtained by passing the plurality of intermediate features of the ultra-wide-angle fundus image labeled with a detection frame through the first (or second) classifier, and i represents the classification category.
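Formula (2) is not reproduced in this rendering. Given the description — a real label p_i compared against the classifier's output over categories i — a standard cross-entropy form may be assumed, as in this illustrative sketch:

```python
import numpy as np

def full_supervised_loss(logits, label):
    """Cross-entropy between the classifier output and the real label.

    logits : (C,) raw classifier scores for one ultra-wide-angle fundus
             image labeled with a detection frame
    label  : int, index of the annotated abnormality category
    The exact form of formula (2) is assumed here, not quoted.
    """
    z = logits - logits.max()                  # numerical stability
    log_softmax = z - np.log(np.exp(z).sum())
    return float(-log_softmax[label])
```

The loss is small when the classifier assigns high probability to the annotated category and grows as the prediction diverges from the real label.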
In another implementation scenario, for an unlabeled detection box, the first and second semi-supervised loss functions may be calculated based on the following formulas:
where L_Ui represents the first (or second) semi-supervised loss function, one term represents the pseudo label and the other represents the classification result obtained by passing the plurality of intermediate features of the ultra-wide-angle fundus images without labeled detection frames (including those labeled only with an abnormality classification and those unlabeled) through the first (or second) classifier, and i represents the classification category. In some implementation scenarios, when the classification result for category i has the maximum value among all categories and exceeds a preset threshold (e.g., 0.5), it may be taken as the pseudo label for the loss calculation. That is, the pseudo label may be selected as the classification result whose value is the maximum among all categories and exceeds the predetermined threshold.
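The pseudo-label selection described above — keep the highest-probability category only when it exceeds the preset threshold — can be sketched as follows (an illustrative sketch; formula (3) itself is not reproduced in this rendering):

```python
import numpy as np

def semi_supervised_loss(logits, threshold=0.5):
    """Pseudo-label loss for an image without a labeled detection frame.

    logits : (C,) classifier scores for one unlabeled (or classification-
             only) ultra-wide-angle fundus image
    Returns (loss, pseudo_label); low-confidence samples are skipped and
    contribute zero loss, mirroring the thresholding described in the text.
    """
    z = logits - logits.max()
    probs = np.exp(z) / np.exp(z).sum()
    k = int(np.argmax(probs))
    if probs[k] <= threshold:                # below the preset threshold:
        return 0.0, None                     # no pseudo label, no loss
    return float(-np.log(probs[k])), k       # cross-entropy vs. pseudo label
```

Only confident predictions generate training signal, which keeps noisy pseudo labels from dominating the semi-supervised branch.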
Further, at step 205, a detection model for detecting an abnormal region in the fundus image is trained using the full supervision loss functions and the semi-supervised loss functions. In one embodiment, a first loss sum of the first full supervision loss function and the first semi-supervised loss function is calculated, a second loss sum of the second full supervision loss function and the second semi-supervised loss function is calculated, and the detection model for detecting an abnormal region in the fundus image is trained based on the first loss sum and the second loss sum. Specifically, the first full supervision loss function and the first semi-supervised loss function are added to obtain the first loss sum, the second full supervision loss function and the second semi-supervised loss function are added to obtain the second loss sum, and an addition or weighted-sum operation is then performed on the first loss sum and the second loss sum to obtain the final total loss, so that the detection model is trained through forward and backward propagation based on the final total loss, thereby achieving the training of the detection model for detecting abnormal regions in fundus images.
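The combination of the four losses described above can be sketched as follows; the branch weights w1 and w2 are illustrative assumptions, since the text only states that an addition or a weighted-sum operation is performed:

```python
def total_loss(l_x1, l_u1, l_x2, l_u2, w1=1.0, w2=1.0):
    """Final training loss from the two classification branches.

    l_x1, l_u1 : full and semi-supervised losses of the first (global) branch
    l_x2, l_u2 : full and semi-supervised losses of the second (local) branch
    w1, w2     : assumed branch weights (w1 = w2 = 1 reduces to plain addition)
    """
    first_loss_sum = l_x1 + l_u1       # first branch: full + semi-supervised
    second_loss_sum = l_x2 + l_u2      # second branch: full + semi-supervised
    return w1 * first_loss_sum + w2 * second_loss_sum
```

With equal unit weights this is the plain addition case; unequal weights give the weighted-sum case.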
As can be seen from the above description, in the embodiment of the present application, ultra-wide-angle fundus images with and without labeled detection frames are acquired and cut into a plurality of image blocks, and a trained feature extractor is used to extract the plurality of intermediate features of the various ultra-wide-angle fundus images. Then, the plurality of intermediate features of all types of ultra-wide-angle fundus images are input into the two classification branches formed by the first classifier and the second classifier to respectively perform the global classification operation and the local classification operation, and the first and second full supervision loss functions and the first and second semi-supervised loss functions are calculated based on whether a detection frame is labeled, using the real labels or the pseudo labels together with the classification results of the plurality of intermediate features through the corresponding classifiers. Further, an addition or weighted-sum operation is performed on the loss sum of the first full supervision and first semi-supervised loss functions and the loss sum of the second full supervision and second semi-supervised loss functions to obtain the final total loss, thereby achieving the training of the detection model for detecting an abnormal region in the fundus image.
On the basis of the above, the embodiment of the present application cuts the whole image into a plurality of image blocks and performs the feature extraction operation on each image block, so that the loss of detail information can be reduced and the information in the feature results is more complete and rich. Further, the embodiment of the present application also sets up two classifiers to form two classification branches that perform the global classification operation and the local classification operation, and calculates the corresponding full supervision loss function and semi-supervised loss function. Thus, by combining global and local classification, the classification results are more accurate, and the reliability of the detection model can be ensured even in the absence of abnormal region labeling. Based on the trained detection model, abnormal regions in ultra-wide-angle fundus images can be accurately identified and located.
Fig. 3 is an exemplary diagram showing the entirety of a detection model for detecting an abnormal region in a fundus image according to an embodiment of the present application. It should be appreciated that the training process shown in fig. 3 is one specific embodiment of the training method 100 of fig. 1 described above, and thus the description of fig. 1 described above applies equally to fig. 3.
The upper left side of fig. 3 shows the acquired ultra-wide-angle fundus images, including an ultra-wide-angle fundus image 301 labeled with a detection frame and ultra-wide-angle fundus images not labeled with a detection frame (including an ultra-wide-angle fundus image 302 labeled with an abnormality classification but not with a detection frame, and an ultra-wide-angle fundus image 303 labeled with neither a detection frame nor an abnormality classification). That is, the acquired ultra-wide-angle fundus images include three types: ultra-wide-angle fundus images 301 labeled with both a detection frame and an abnormality classification, ultra-wide-angle fundus images 302 labeled only with an abnormality classification, and unlabeled ultra-wide-angle fundus images 303. In one embodiment, the foregoing three types of ultra-wide-angle fundus images may first be subjected to data cleaning to remove ultra-wide-angle fundus images of poor quality. Then, the three types of ultra-wide-angle fundus images are cut to obtain a plurality of image blocks. In one exemplary scenario, assume that one of the aforementioned ultra-wide-angle fundus images is denoted as s and is cut equally into a plurality of image blocks ("patches"), e.g., denoted as s_1, s_2, ..., s_n, where n represents the number of image blocks.
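The equal cutting of an image s into patches s_1, ..., s_n can be sketched as below (illustrative only; the text does not specify the patch grid, so a rows-by-cols grid with evenly divisible image dimensions is assumed):

```python
import numpy as np

def cut_into_patches(image, rows, cols):
    """Equally cut an ultra-wide-angle fundus image into rows * cols blocks.

    image : (H, W, C) array, with H divisible by rows and W by cols
    Returns the list [s_1, s_2, ..., s_n] with n = rows * cols.
    """
    h, w = image.shape[0] // rows, image.shape[1] // cols
    return [image[r * h:(r + 1) * h, c * w:(c + 1) * w]
            for r in range(rows) for c in range(cols)]
```

A 2x2 grid reproduces the four-block example used in the later figures.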
As can be seen from the foregoing description, data enhancement operations (e.g., horizontal flipping, image scaling and/or changing image colors, etc.) may be performed on the plurality of image blocks of each of the aforementioned three types of ultra-wide-angle fundus images to obtain data-enhanced image blocks, which are input as training samples 304 to the feature extractor 305 for feature extraction to obtain the corresponding feature extraction results. For example, the feature result v_i extracted from the original ultra-wide-angle fundus image via the feature extractor and the feature result v_i' extracted via the feature extractor after the corresponding image has undergone data enhancement are obtained. Next, the contrast loss function is calculated based on the above formula (1), and the trained feature extractor 305 can be obtained by training the feature extractor in the forward and backward directions. In one implementation scenario, the aforementioned feature extractor 305 may be composed of, for example, multiple convolution layers, dropout layers, activation functions and batch normalization layers, and a fully connected module 306 may be connected after the feature extractor 305.
In one implementation scenario, after the feature operation related to abnormal region detection is performed on the plurality of image blocks s_1, s_2, ..., s_n via the aforementioned feature extractor, a plurality of initial features F may be obtained, denoted as F = {f_1, f_2, ..., f_n}. It will be appreciated that, after feature extraction, each image block yields one corresponding feature: image block s_1 corresponds to initial feature f_1; similarly, image block s_2 corresponds to initial feature f_2, and image block s_n corresponds to initial feature f_n. Next, the plurality of initial features are input into the full connection module 306 for the fully connected operation, so as to obtain the corresponding plurality of intermediate features L ("Logits") = {l_1, l_2, ..., l_n}. For images labeled with detection frames, the correspondingly obtained plurality of intermediate features is denoted as L_X = {l_X1, l_X2, ..., l_Xn}; for images without labeled detection frames, the correspondingly obtained plurality of intermediate features is denoted as L_U = {l_U1, l_U2, ..., l_Un}. The following description takes the case in which each type of ultra-wide-angle fundus image is cut into four image blocks as an example.
As shown in the lower part of fig. 3, the ultra-wide-angle fundus image 301 labeled with both the detection frame and the abnormality classification, the ultra-wide-angle fundus image 302 labeled only with the abnormality classification, and the unlabeled ultra-wide-angle fundus image 303 are each input to the trained feature extractor 305 to perform the feature operation related to abnormal region detection, and a fully connected operation is performed on the feature operation results via the fully connected module 306 to obtain a plurality of intermediate features 301-2, a plurality of intermediate features 302-2 and a plurality of intermediate features 303-2, respectively.
Based on the obtained plurality of intermediate features 301-2 to 303-2, these intermediate features are input into a first classifier and a second classifier, respectively, wherein the first classifier comprises an attention module 307 and a fully connected layer 308, and the second classifier comprises a fully connected layer 309. For the first classifier, the plurality of intermediate features 301-2 to 303-2 are input into the attention module 307 of the first classifier to perform an attention operation, and a fully connected operation is performed on the attention operation results via the fully connected layer 308 in the first classifier to obtain the respective corresponding classification results, so as to perform the global classification operation. For the second classifier, the plurality of intermediate features 301-2 to 303-2 are input into the fully connected layer 309 of the second classifier to perform a fully connected operation to obtain the respective corresponding classification results, so as to perform the local classification operation.
In one implementation scenario, the attention module 307 may also include a fully connected layer. Specifically, when performing the attention operation, an attention mechanism is added to each intermediate feature of the plurality of intermediate features by using the attention module to obtain a new intermediate feature corresponding to each intermediate feature, and then a corresponding attention operation result is obtained by performing a product operation on the new intermediate feature and the corresponding intermediate feature. The aforementioned attention operation will be described in detail later with reference to fig. 4.
Further, for the ultra-wide-angle fundus image 301 labeled with both the detection frame and the abnormality classification, a first full supervision loss function 310 and a second full supervision loss function 311 are calculated according to the above formula (2), based on the real labels and the classification results of the corresponding plurality of intermediate features 301-2 via the first and second classifiers. For the ultra-wide-angle fundus image 302 labeled only with the abnormality classification and the unlabeled ultra-wide-angle fundus image 303, a first semi-supervised loss function 312 and a second semi-supervised loss function 313 may be calculated according to the above formula (3), based on the pseudo labels and the classification results of the respective corresponding pluralities of intermediate features 302-2 and 303-2 via the first and second classifiers. After the first full supervision loss function 310, the second full supervision loss function 311, the first semi-supervised loss function 312 and the second semi-supervised loss function 313 are obtained, an addition or weighted-sum operation is performed on the loss sum of the first full supervision and first semi-supervised loss functions and the loss sum of the second full supervision and second semi-supervised loss functions to obtain the final total loss, thereby achieving the training of the detection model for detecting an abnormal region in the fundus image. In some embodiments, the detection model may be, for example, a ResNet-50 model.
Fig. 4 is an exemplary schematic diagram illustrating an attention operation according to an embodiment of the present application. As shown in fig. 4, taking one of the above-described plurality of intermediate features 301-2 as an example, the intermediate feature 301-2 is input to the fully connected layer in the attention module 307 to output a new intermediate feature 401. Next, the new intermediate feature 401 is multiplied by the intermediate feature 301-2 (e.g., by the product operation shown in the figure) to obtain a corresponding attention operation result 402. Similarly, by performing the foregoing operations on the others of the plurality of intermediate features 301-2, the respective corresponding attention operation results may be obtained, and by performing the foregoing operations on each of the pluralities of intermediate features 302-2 and 303-2, the respective corresponding attention operation results may also be obtained.
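The attention operation of fig. 4 can be sketched as follows (illustrative; the shapes of the fully connected layer's parameters W and b are assumptions, as the text does not specify them):

```python
import numpy as np

def attention_operation(l, W, b):
    """Attention step of the first classifier's attention module.

    l    : (n, d) intermediate features l_1..l_n of one image's blocks
    W, b : assumed parameters of the fully connected layer in the module
           (W of shape (d, d), b of shape (d,))
    """
    new_l = l @ W + b     # new intermediate feature (401 in fig. 4)
    return new_l * l      # element-wise product with the original (402)
```

The fully connected layer produces a per-feature weighting, and the element-wise product re-scales each original intermediate feature accordingly.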
In one embodiment, the weights in the attention module may also be visualized to determine the weight corresponding to each image block. On this basis, the probability value that an abnormal region exists in each image block can be checked to determine the image block in which an abnormal region exists and the position of the abnormal region within that block, which brings great convenience to the detection of abnormal regions in clinical practice, for example as shown in fig. 5.
Fig. 5 is an exemplary diagram illustrating the visualization of the weights in the attention module according to an embodiment of the application. Fig. 5 exemplarily shows four image blocks, and below each image block is shown its corresponding weight, i.e., the probability value that an abnormal region exists in that image block. For example, the weight of the image block shown at the upper left of fig. 5 is w_1 = 0.9, the weight of the image block at the upper right is w_2 = 0.04, the weight of the image block at the lower left is w_3 = 0.05, and the weight of the image block at the lower right is w_4 = 0.01. As can be seen from the corresponding weights, the probability that an abnormal region exists in the image block shown at the upper left of fig. 5 is the largest.
Fig. 6 is an exemplary flow chart illustrating a method 600 for detecting an abnormal region in a fundus image according to an embodiment of the present application. As shown in fig. 6, at step 601, an ultra-wide-angle fundus image to be detected is acquired. In one embodiment, the ultra-wide-angle fundus image may be acquired by, for example, an ultra-wide-angle fundus camera. Based on the acquired ultra-wide-angle fundus image, at step 602, the ultra-wide-angle fundus image is input into the trained detection model for detection, so as to output the detection result of the abnormal region in the fundus image. By performing detection with the detection model trained by the above method, the abnormal region in the ultra-wide-angle fundus image can be accurately identified and located. That is, the output detection result may include the classification result of the abnormal region and may also include the specific position of the abnormal region.
Fig. 7 is an exemplary block diagram showing an apparatus 700 for training a detection model for detecting an abnormal region in a fundus image and for detecting an abnormal region in a fundus image according to an embodiment of the present application. It is to be appreciated that the device implementing aspects of the present application may be a single device (e.g., a computing device) or a multi-function device including various peripheral devices.
As shown in fig. 7, the apparatus of the present application may include a central processing unit ("CPU") 711, which may be a general-purpose CPU, a special-purpose CPU, or another information-processing and program-execution unit. Further, the device 700 may also include a mass memory 712 and a read-only memory ("ROM") 713, wherein the mass memory 712 may be configured to store various types of data, including various ultra-wide-angle fundus images, algorithm data, intermediate results, and the various programs required to operate the device 700. The ROM 713 may be configured to store the data and instructions necessary for the power-on self-test of the device 700, the initialization of the functional modules in the system, drivers for the basic input/output of the system, and booting the operating system.
Optionally, the device 700 may also include other hardware platforms or components, such as a tensor processing unit ("TPU") 714, a graphics processing unit ("GPU") 715, a field programmable gate array ("FPGA") 716, and a machine learning unit ("MLU") 717, as shown. It will be appreciated that while various hardware platforms or components are shown in device 700, this is by way of example only and not limitation, and that one of skill in the art may add or remove corresponding hardware as desired. For example, the device 700 may include only a CPU, a related storage device, and an interface device to implement the method for training the detection model for detecting an abnormal region in a fundus image and the method for detecting an abnormal region in a fundus image of the present application.
In some embodiments, to facilitate data transmission and interaction with external networks, the device 700 of the present application further comprises a communication interface 718, whereby the device can be connected to a local area network/wireless local area network ("LAN/WLAN") 705 via the communication interface 718, and further to a local server 706 or to the Internet 707 via the LAN/WLAN. Alternatively or additionally, the device 700 of the present application may also be directly connected to the Internet or a cellular network via the communication interface 718 based on wireless communication technology, such as third-generation ("3G"), fourth-generation ("4G") or fifth-generation ("5G") wireless communication technology. In some application scenarios, the device 700 of the present application may also access the server 708 and the database 709 of an external network as needed to obtain various known algorithms, data and modules, and may store various data remotely, such as ultra-wide-angle fundus images and other types of data or instructions.
The peripheral devices of the device 700 may include a display device 702, an input device 703 and a data transmission interface 704. In one embodiment, the display device 702 may, for example, include one or more speakers and/or one or more visual displays configured to provide voice prompts and/or a visual display of the process of training the detection model for detecting an abnormal region in a fundus image and of detecting an abnormal region in a fundus image according to the present application. The input device 703 may include, for example, a keyboard, a mouse, a microphone, a gesture-capture camera or other input buttons or controls configured to receive audio data and/or user instructions. The data transmission interface 704 may include, for example, a serial interface, a parallel interface, a universal serial bus ("USB") interface, a small computer system interface ("SCSI"), serial ATA, FireWire, PCI Express and a high-definition multimedia interface ("HDMI"), etc., configured for data transmission and interaction with other devices or systems. According to aspects of the present application, the data transmission interface 704 may receive an ultra-wide-angle fundus image acquired by an ultra-wide-angle fundus camera and transmit the ultra-wide-angle fundus image or various other types of data or results to the device 700.
The above-described CPU 711, mass memory 712, ROM 713, TPU 714, GPU 715, FPGA 716, MLU 717, and communication interface 718 of the device 700 of the present application may be interconnected by a bus 719 and data interaction with peripheral devices may be achieved by the bus. In one embodiment, the CPU 711 may control other hardware components in the device 700 and their peripherals via the bus 719.
An apparatus for training a detection model for detecting an abnormal region in a fundus image and for detecting an abnormal region in a fundus image, which can be used to perform the present application, is described above in connection with fig. 7. It is to be understood that the device structure or architecture herein is merely exemplary and that the implementation and implementation entities of the present application are not limited thereto, but that changes may be made without departing from the spirit of the present application.
Those skilled in the art will also appreciate from the foregoing description, taken in conjunction with the accompanying drawings, that embodiments of the present application may also be implemented in software programs. The present application thus also provides a computer readable storage medium. The computer readable storage medium may be used to implement the method for training the detection model for detecting an abnormal region in a fundus image and the method for detecting an abnormal region in a fundus image described in connection with fig. 2 and 6 of the present application.
It should be noted that although the operations of the method of the present application are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in that particular order or that all of the illustrated operations be performed in order to achieve desirable results. Rather, the steps depicted in the flowcharts may change the order of execution. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform.
It should be understood that when the terms "first," "second," "third," and "fourth," etc. are used in the claims, the specification and the drawings of the present application, they are used merely to distinguish between different objects, and not to describe a particular order. The terms "comprises" and "comprising" when used in the specification and claims of the present application are taken to specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification and claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the present specification and claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Although the embodiments of the present application are described above, the descriptions are merely examples for facilitating understanding of the present application, and are not intended to limit the scope and application of the present application. Any person skilled in the art can make any modification and variation in form and detail without departing from the spirit and scope of the present disclosure, but the scope of the present disclosure is defined by the appended claims.
Claims (12)
1. A method for training a detection model for detecting an abnormal region in a fundus image, wherein the detection model includes at least a feature extractor and a classifier, and the method comprises:
acquiring a super-wide-angle fundus image for training the detection model, wherein the super-wide-angle fundus image comprises a super-wide-angle fundus image without a detection frame and a super-wide-angle fundus image with a detection frame;
cutting the ultra-wide-angle fundus image without the marked detection frame and the ultra-wide-angle fundus image with the marked detection frame to obtain a plurality of image blocks;
performing feature operations related to abnormal region detection on the plurality of image blocks using the trained feature extractor to obtain a corresponding plurality of intermediate features;
inputting the plurality of intermediate features into a first classifier to perform a global classification operation and into a second classifier to perform a local classification operation, respectively, and correspondingly calculating a full supervision loss function and a semi-supervised loss function based on whether a detection frame is labeled; and
training a detection model for detecting an abnormal region in the fundus image by using the full supervision loss function and the semi-supervised loss function.
2. The method of claim 1, wherein the ultra-wide-angle fundus image labeled with a detection frame comprises an ultra-wide-angle fundus image labeled with a detection frame and labeled with an anomaly classification, and the ultra-wide-angle fundus image unlabeled with a detection frame comprises an ultra-wide-angle fundus image labeled with an anomaly classification and unlabeled with a detection frame and an ultra-wide-angle fundus image unlabeled with an anomaly classification.
3. The method of claim 1, wherein the trained feature extractor is obtained by:
performing data enhancement operation on the plurality of image blocks to obtain data-enhanced image blocks;
inputting the image blocks with the enhanced data and the plurality of image blocks into the feature extractor for feature operation, and calculating a contrast loss function; and
And training the feature extractor by using the contrast loss function.
4. The method of claim 3, wherein the trained feature extractor is further connected to a fully connected module, and wherein performing feature operations related to abnormal region detection on the plurality of image blocks using the trained feature extractor to obtain a corresponding plurality of intermediate features comprises:
and performing feature operation related to abnormal region detection on the plurality of image blocks by using the trained feature extractor, and performing full-connection operation on feature operation results by using the full-connection module so as to obtain a plurality of intermediate features.
5. The method of claim 1, wherein inputting the plurality of intermediate features into a first classifier to perform a global classification operation and into a second classifier to perform a local classification operation, respectively, and correspondingly calculating a full supervision loss function and a semi-supervised loss function based on whether a detection frame is labeled comprises:
inputting the plurality of intermediate features into the first classifier to perform the global classification operation, and correspondingly calculating a first full supervision loss function based on the labeled detection frames and a first semi-supervised loss function based on the unlabeled detection frames; and
inputting the plurality of intermediate features into the second classifier to perform the local classification operation, and correspondingly calculating a second full supervision loss function based on the labeled detection frames and a second semi-supervised loss function based on the unlabeled detection frames.
6. The method of claim 5, wherein the first classifier and the second classifier each comprise a fully-connected layer, the first classifier further comprises an attention module, and inputting the plurality of intermediate features into the first classifier to perform the global classification operation and into the second classifier to perform the local classification operation comprises:
inputting the plurality of intermediate features into the attention module in the first classifier to perform an attention operation, and performing a fully-connected operation on the attention operation result via the fully-connected layer in the first classifier, so as to perform the global classification operation; and
inputting the plurality of intermediate features into the fully-connected layer in the second classifier to perform a fully-connected operation, so as to perform the local classification operation.
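The two classifier heads of claim 6 can be sketched as below: the first pools block features with an attention module before its fully-connected layer, the second applies its fully-connected layer to every block directly. All dimensions and the dot-product attention form are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def global_classify(features, attn_w, fc_w):
    # First classifier: attention over the image blocks, then a
    # fully-connected layer on the attention-pooled feature.
    scores = features @ attn_w          # one attention score per block
    weights = softmax(scores)
    pooled = weights @ features         # attention-weighted global feature
    return pooled @ fc_w, weights

def local_classify(features, fc_w):
    # Second classifier: fully-connected layer applied to every block.
    return features @ fc_w

rng = np.random.default_rng(0)
features = rng.standard_normal((9, 32))   # 9 blocks x 32-dim intermediate features
attn_w = rng.standard_normal(32)
fc_w1 = rng.standard_normal((32, 2))      # 2 classes: normal / abnormal
fc_w2 = rng.standard_normal((32, 2))
global_logits, attn_weights = global_classify(features, attn_w, fc_w1)
local_logits = local_classify(features, fc_w2)
```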
7. The method of claim 5, wherein training the detection model for detecting an abnormal region in a fundus image using the fully-supervised and semi-supervised loss functions comprises:
calculating a first loss sum of the first fully-supervised loss function and the first semi-supervised loss function;
calculating a second loss sum of the second fully-supervised loss function and the second semi-supervised loss function; and
training the detection model for detecting an abnormal region in the fundus image based on the first loss sum and the second loss sum.
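The loss combination in claim 7 reduces to simple arithmetic; the sketch below uses placeholder loss values, and combining the two sums by plain addition is an assumption, since the claim only says training is "based on" both sums.

```python
# Placeholder loss values for illustration only.
loss1_full, loss1_semi = 0.8, 0.3   # first classifier (global)
loss2_full, loss2_semi = 0.6, 0.2   # second classifier (local)

first_sum = loss1_full + loss1_semi     # first loss sum, per claim 7
second_sum = loss2_full + loss2_semi    # second loss sum, per claim 7
total = first_sum + second_sum          # combined objective (assumed form)
```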
8. The method of claim 6, further comprising:
visualizing the weights in the attention module to determine the weight corresponding to each image block.
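The visualization step of claim 8 amounts to mapping each block's attention weight back onto the block grid. A minimal sketch, assuming a 3x3 block layout and a simple text rendering (the patent does not specify the visualization form):

```python
import numpy as np

def visualize_weights(weights, grid=(3, 3)):
    # Map each block's attention weight back onto the block grid so the
    # contribution of each image block can be inspected.
    heatmap = np.asarray(weights).reshape(grid)
    for row in heatmap:
        print(" ".join(f"{w:.2f}" for w in row))
    return heatmap

rng = np.random.default_rng(0)
raw = rng.random(9)
weights = raw / raw.sum()   # normalized attention weights, one per block
heatmap = visualize_weights(weights)
```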
9. An apparatus for training a detection model for detecting an abnormal region in a fundus image, comprising:
a processor;
a memory storing program instructions for training a detection model for detecting an abnormal region in a fundus image, which when executed by the processor, cause the apparatus to implement the method according to any one of claims 1-8.
10. A method for detecting an abnormal region in a fundus image, comprising:
acquiring an ultra-wide-angle fundus image to be detected;
inputting the ultra-wide-angle fundus image into a detection model trained according to the method of any one of claims 1-8 for detection, so as to output a detection result of an abnormal region in the fundus image.
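The inference flow of claim 10 can be sketched as below: split the ultra-wide-angle fundus image into blocks and score each with the trained model. The `model` callable returning a per-block abnormality probability, the block size, and the 0.5 decision threshold are all illustrative assumptions.

```python
import numpy as np

def detect_abnormal_regions(fundus_image, model, block=16):
    # Split the ultra-wide-angle fundus image into blocks and score each
    # one with the trained detection model.
    h, w = fundus_image.shape
    results = []
    for i in range(0, h, block):
        for j in range(0, w, block):
            patch = fundus_image[i:i + block, j:j + block]
            results.append(((i, j), model(patch)))
    # Keep block positions whose abnormality score exceeds the threshold.
    return [(pos, p) for pos, p in results if p > 0.5]

rng = np.random.default_rng(0)
image = rng.random((48, 48))                 # stand-in fundus image
model = lambda patch: float(patch.mean())    # dummy per-block scorer
regions = detect_abnormal_regions(image, model)
```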
11. An apparatus for detecting an abnormal region in a fundus image, comprising:
a processor;
a memory storing program instructions for detecting an abnormal region in a fundus image, which when executed by the processor, cause the apparatus to implement the method of claim 10.
12. A computer readable storage medium having stored thereon computer readable instructions for training a detection model for detecting an abnormal region in a fundus image and for detecting an abnormal region in a fundus image, which when executed by one or more processors, implement the method of any of claims 1-8 and the method of claim 10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310961124.5A CN116934730A (en) | 2023-08-01 | 2023-08-01 | Method and device for training detection model for detecting abnormal region in fundus image |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116934730A true CN116934730A (en) | 2023-10-24 |
Family
ID=88384219
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310961124.5A Pending CN116934730A (en) | 2023-08-01 | 2023-08-01 | Method and device for training detection model for detecting abnormal region in fundus image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116934730A (en) |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11612311B2 (en) | System and method of otoscopy image analysis to diagnose ear pathology | |
Zhu et al. | Deeplung: Deep 3d dual path nets for automated pulmonary nodule detection and classification | |
US10482603B1 (en) | Medical image segmentation using an integrated edge guidance module and object segmentation network | |
WO2019200747A1 (en) | Method and device for segmenting proximal femur, computer apparatus, and storage medium | |
Song et al. | Region-based quality estimation network for large-scale person re-identification | |
Yu et al. | Image quality classification for DR screening using deep learning | |
US20240062369A1 (en) | Detection model training method and apparatus, computer device and storage medium | |
CN109815826B (en) | Method and device for generating face attribute model | |
Yan et al. | Modeling annotator expertise: Learning when everybody knows a bit of something | |
Qiao et al. | Application of SVM based on genetic algorithm in classification of cataract fundus images | |
US20220189142A1 (en) | Ai-based object classification method and apparatus, and medical imaging device and storage medium | |
Wu et al. | U-GAN: Generative adversarial networks with U-Net for retinal vessel segmentation | |
CN112766376A (en) | Multi-label eye fundus image identification method based on GACNN | |
Liu et al. | A framework for automatic burn image segmentation and burn depth diagnosis using deep learning | |
CN113012093B (en) | Training method and training system for glaucoma image feature extraction | |
CN113240655A (en) | Method, storage medium and device for automatically detecting type of fundus image | |
Zhou et al. | Automatic microaneurysms detection based on multifeature fusion dictionary learning | |
Chai et al. | A multi-label classification with an adversarial-based denoising autoencoder for medical image annotation | |
Nandy et al. | An incremental feature extraction framework for referable diabetic retinopathy detection | |
Tian et al. | Learning discriminative representations for fine-grained diabetic retinopathy grading | |
Maheshwari et al. | Performance Analysis of Mango Leaf Disease using Machine Learning Technique | |
Zou et al. | Deep learning and its application in diabetic retinopathy screening | |
Wang et al. | Optic disc detection based on fully convolutional neural network and structured matrix decomposition | |
CN116934730A (en) | Method and device for training detection model for detecting abnormal region in fundus image | |
Santos et al. | Deep neural network model based on one-stage detector for identifying fundus lesions |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||