CN114155190A - Retinal image synthesis method based on a lesion-attention conditional generative adversarial network - Google Patents
Retinal image synthesis method based on a lesion-attention conditional generative adversarial network
- Publication number
- CN114155190A (application number CN202111243429.XA)
- Authority
- CN
- China
- Prior art keywords
- retinal
- image
- blood vessel
- lesion
- network based
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0012—Biomedical image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/73—Deblurring; Sharpening
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30041—Eye; Retina; Ophthalmic
Abstract
The invention discloses a retinal image synthesis method based on a lesion-attention conditional generative adversarial network (GAN), comprising the following steps: obtaining a retinal blood vessel mask map; and inputting the retinal blood vessel mask map into a trained lesion-attention conditional GAN to obtain a synthetic retinal image. Because the synthetic retinal image is produced by a GAN conditioned on lesion attention, the invention enhances the lesion details of the synthetic image, improves the diversity of synthetic samples, and strengthens disease recognition on high-resolution images.
Description
Technical Field
The invention relates to the technical field of image processing, and in particular to a retinal image synthesis method based on a lesion-attention conditional generative adversarial network.
Background
Color fundus photography is currently the most economical, non-invasive imaging modality for detecting retinopathy, and its wide availability makes it an ideal choice for evaluating a variety of ophthalmic diseases. Conventional fundus cameras are widely used to detect retinopathy, but they have limitations: a conventional fundus image covers only a 30-60 degree region at the center of the retina. In contrast, ultra-wide-field (UWF) retinal images captured by Optos cameras span 200 degrees, covering about 80% of the retinal area. UWF imaging therefore allows more clinically relevant lesions to be detected in the peripheral retina, which is important for conditions that manifest there, such as retinal degeneration, detachment, hemorrhage, and exudation. Deep learning has been successfully applied to screening conventional fundus images and has achieved good results in detecting various retinal diseases; in recent years it has also been applied to UWF fundus images. However, automatic diagnosis based on UWF images still faces challenges. First, some lesions are very small relative to the global image, making the lesion region inconspicuous. Second, limited samples tend to cause overfitting, degrading the model on new data sets. To alleviate these problems, many researchers have developed solutions based on generative adversarial networks (GANs): synthesizing training samples with a GAN to compensate for the shortage of original data can effectively augment the image samples. However, GANs typically work well only on low-resolution images and remain deficient on high-resolution images.
Thus, there is still a need for improvement and development of the prior art.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a retinal image synthesis method based on a lesion-attention conditional generative adversarial network, aiming to solve the prior-art problems of blurred lesion details in synthesized retinal images and the limited diversity and quantity of synthetic retinal image samples.
The technical solutions adopted by the invention to solve these problems are as follows:
in a first aspect, an embodiment of the present invention provides a retinal image synthesis method based on a lesion-attention conditional generative adversarial network, where the method includes:
obtaining a retinal blood vessel mask map;
and inputting the retinal blood vessel mask map into a trained lesion-attention conditional generative adversarial network to obtain a synthetic retinal image.
In one implementation, the lesion-attention conditional generative adversarial network includes a generator, a weight-shared multi-output discriminator, a random forest classifier, a reverse activation network, and an attention module, wherein the weight-shared multi-output discriminator outputs a disease discrimination result and a disease classification result; the random forest classifier is used for identifying lesion features from the disease classification result output by the weight-shared multi-output discriminator; the reverse activation network is used for activating the lesion features; and the attention module is used for fusing the lesion features into the decoder of the generator.
In one implementation, the random forest classifier identifies the lesion features of the disease classification result output by the weight-shared multi-output discriminator as follows:
the random forest classifier obtains lesion features by counting the frequencies of classification features in the disease classification results output by the weight-shared multi-output discriminator.
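As an illustration of this frequency-counting step, the sketch below uses scikit-learn's random forest, with impurity-based feature importances serving as a practical stand-in for counting how often each discriminator feature contributes to classification; the input arrays, the number of trees, and the top-k cutoff are all assumptions, not taken from the patent.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical inputs: `disc_features` is an (N, F) array of last-layer
# discriminator features for N training images; `labels` holds their
# disease classes.
def select_lesion_features(disc_features, labels, top_k=32):
    forest = RandomForestClassifier(n_estimators=200, random_state=0)
    forest.fit(disc_features, labels)
    # feature_importances_ aggregates how frequently (and how usefully)
    # each feature is chosen for splits across the ensemble, approximating
    # "counting classification feature frequencies".
    importance = forest.feature_importances_
    key_idx = np.argsort(importance)[::-1][:top_k]
    return key_idx, importance[key_idx]
```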
In one implementation, the training process of the lesion-attention conditional generative adversarial network includes:
acquiring a random Gaussian noise vector;
obtaining a training sample, wherein the training sample comprises a real retinal image, a retinal blood vessel mask, and a disease classification label, and the retinal blood vessel mask is obtained by transformation of the real retinal image;
inputting the random Gaussian noise vector, the retinal blood vessel mask, and the real retinal image into a first network model, and outputting through the first network model a predicted disease classification result corresponding to the real retinal image;
obtaining a loss function according to the disease classification label and the predicted disease classification result;
and training the first network model based on the loss function to obtain the lesion-attention conditional generative adversarial network.
In one implementation, the retinal blood vessel mask is derived from the real retinal image as follows:
performing retinal vessel segmentation on the real retinal image to obtain a retinal vessel segmentation image;
and filtering the retinal blood vessel segmentation image to obtain a retinal blood vessel mask.
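One way the filtering step might look in practice is sketched below; it assumes the segmentation output is a single-channel probability map in [0, 1], uses a white top-hat to keep fine, stripe-like bright structures while suppressing large bright areas, and all parameter values are illustrative.

```python
import numpy as np
from skimage import morphology

# Hypothetical helper: turn a vessel segmentation map into a binary mask.
def vessel_mask(seg_map, disk_radius=15, thresh=0.5, min_speckle=16):
    # White top-hat keeps structures thinner than the structuring element
    # (fine vessel "stripes") and suppresses large-area bright regions.
    tophat = morphology.white_tophat(seg_map, morphology.disk(disk_radius))
    binary = tophat > thresh
    # Remove isolated speckles left over after thresholding.
    return morphology.remove_small_objects(binary, min_size=min_speckle)
```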
In one implementation, the loss function is obtained by adding an adversarial loss function, a classification loss function, an activation feature matching loss function, and a perceptual loss.
In one implementation, the activation feature matching loss function is calculated using the lesion features of the reverse activation network.
In one implementation, the classification loss function is used to learn lesion features of the real retinal image.
In a second aspect, an embodiment of the present invention further provides a retinal image synthesis apparatus based on a lesion-attention conditional generative adversarial network, where the apparatus includes:
a retinal blood vessel mask map acquisition module, configured to acquire a retinal blood vessel mask map;
and a synthetic retinal image acquisition module, configured to input the retinal blood vessel mask map into a trained lesion-attention conditional generative adversarial network to obtain a synthetic retinal image.
In a third aspect, an embodiment of the present invention further provides an intelligent terminal, including a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for executing the retinal image synthesis method based on a lesion-attention conditional generative adversarial network described above.
In a fourth aspect, embodiments of the present invention further provide a non-transitory computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform any of the retinal image synthesis methods based on a lesion-attention conditional generative adversarial network described above.
The invention has the following beneficial effects: the embodiment of the invention first obtains a retinal blood vessel mask map, and then inputs the retinal blood vessel mask map into a trained lesion-attention conditional generative adversarial network to obtain a synthetic retinal image. Because the synthetic retinal image is produced by a GAN conditioned on lesion attention, the lesion details of the synthetic image are enhanced, the diversity of synthetic samples is improved, and disease recognition on high-resolution images is strengthened.
Drawings
In order to more clearly illustrate the embodiments of the present invention and the technical solutions in the prior art, the drawings used in the description of the embodiments and the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and that those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a retinal image synthesis method based on a lesion-attention conditional generative adversarial network according to an embodiment of the present invention.
Fig. 2 is an overall technical flowchart of the lesion-attention conditional generative adversarial network according to an embodiment of the present invention.
Fig. 3 shows examples of lesion details in synthesized DR images according to an embodiment of the present invention.
Fig. 4 shows examples of lesion details in synthesized RP images according to an embodiment of the present invention.
Fig. 5 shows experimental results of augmenting the DR training set with synthetic data according to an embodiment of the present invention.
Fig. 6 shows experimental results of augmenting the RP training set with synthetic data according to an embodiment of the present invention.
Fig. 7 is a schematic block diagram of a retinal image synthesis apparatus based on a lesion-attention conditional generative adversarial network according to an embodiment of the present invention.
Fig. 8 is a schematic block diagram of an internal structure of an intelligent terminal according to an embodiment of the present invention.
Detailed Description
The invention discloses a retinal image synthesis method based on a lesion-attention conditional generative adversarial network. In order to make the purpose, technical solution, and effect of the invention clearer, the invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Conventional retinal image synthesis methods in the prior art enhance the basic GAN framework with side information to improve the quality of the synthesized images. One strategy is to provide class labels to the generator and the discriminator in order to generate class-conditional samples; adding an auxiliary classifier to the GAN framework gives the synthesized images higher global consistency and yields higher classification performance. However, this approach requires a large amount of unlabeled data to train the GAN, is not suitable for UWF data sets with few samples, and achieves good results only on low-resolution images while performing poorly on high-resolution ones. Another strategy is to condition on the retinal vascular mask to synthesize realistic retinal images. However, current methods do not effectively retain microscopic lesion details, and the synthesized image may be blurred in many critical details. These deficiencies can prevent a model from learning effective lesion features.
In order to solve the problems in the prior art, the present embodiment provides a retinal image synthesis method based on a lesion-attention conditional generative adversarial network. Because the synthetic retinal image is obtained from a GAN conditioned on lesion attention, the lesion details of the synthetic image are enhanced, the diversity of synthetic samples is improved, and disease recognition on high-resolution images is strengthened. In a specific implementation, a retinal blood vessel mask map is first obtained, and the retinal blood vessel mask map is then input into a trained lesion-attention conditional generative adversarial network to obtain a synthetic retinal image.
Exemplary method
This embodiment provides a retinal image synthesis method based on a lesion-attention conditional generative adversarial network, and the method can be applied to an intelligent terminal for image processing. As shown in detail in fig. 1, the method includes:
step S100, obtaining a retinal blood vessel mask map;
in the embodiment of the invention, the retina image is a super wide angle retina image (UWF), and the super wide angle retina image is an important tool for screening retina diseases because the super wide angle retina image can simultaneously detect the peripheral retinopathy and obtain more complete retina information. Automatic detection of retinal diseases based on deep learning plays an important role in clinical practice. However, training a model with a strong generalization capability requires a large amount of training data. Therefore, the invention firstly obtains the retinal vessel mask image by the trained segmentation network on the retinal vessel segmentation data set, and prepares for the subsequent generation of the diversified retinal composite image.
After the retinal blood vessel mask map is obtained, the following step shown in fig. 1 can be performed: step S200, inputting the retinal blood vessel mask map into a trained lesion-attention conditional generative adversarial network to obtain a synthetic retinal image.
in practice, typical data enhancement methods have limited effects and cannot generate diversified data, so that the acquired retinal vessel mask image is input into a trained antagonistic network generated based on the focus attention condition, and the network can obtain more focus information of the retinal image based on the focus attention condition, so that the synthesized image has better effect on high resolution.
In one implementation, as shown in fig. 2, the lesion-attention conditional generative adversarial network includes a generator, a weight-shared multi-output discriminator, a random forest classifier, a reverse activation network, and an attention module, wherein the weight-shared multi-output discriminator outputs a disease discrimination result and a disease classification result; the random forest classifier is used for identifying lesion features from the disease classification result output by the weight-shared multi-output discriminator; the reverse activation network is used for activating the lesion features; and the attention module is used for fusing the lesion features into the decoder of the generator. In the present embodiment:
the generator employs a common encoder and decoder architecture to synthesize 512 x 512 images. First, the encoder encodes the retinal blood vessel mask, then introduces a noise code before decoding, the noise code being random noise with a positive distribution, and finally the composite retinal image is generated step by the decoder. To preserve the vessel structure during the synthesis of the retinal image, jump connections are used in the encoding and decoding paths. The entire encoder and decoder thus performs 6 downsampling and upsampling, respectively. To increase the model performance, a series of residual blocks are used in the generator, each block comprising two BatchNorm-ReLU-contribution layers. Each convolutional layer is a convolution kernel of 3x3 and uses residual concatenation to reduce overfitting. The down-sampling module sets stride 2 in the first convolution layer, the up-sampling module increases the feature size by using nearest neighbor interpolation, and the up-sampling residual module fuses the features of the reverse activation network of the discriminator through the attention module in the up-sampling process as shown in fig. 2 (b). Finally, the generator outputs the composite retinal image through a 3x3 convolution layer and tanh function with the pixel values held at [ -1,1]In the meantime. The discriminator does not use residual concatenation. The discriminator also performs downsampling 6 times by using the maximum pooling layer. And the module of the discriminator is composed of two groups of Convolition-LeakyReLU layers, and a batch normalization layer is not used, because research shows that the discriminator network of the GAN uses the batch normalization layer to reduce the diversity of generated images. The discriminator simultaneously outputs the discrimination result and the classification result. The output results and classification results share a feature extraction network, but each construct an output layer. Further, an auxiliary classifier C is constructed in the discriminator so as to learn the lesion feature of the image by classifying the loss function. The present invention proposes a lesion feature attention mechanism that uses these lesion feature enhancement generators to generate a realistic retinal image. Specifically, first pass through random forestsThe classifier learns the disease classification, and according to the research of Gu and the like, the focus characteristics are obtained by counting the classification characteristic frequency in the disease classification result output by the weight-sharing multi-output discriminator, that is, the key characteristics, that is, the focus characteristics can be identified by calculating each frequency which is helpful for the classification characteristics. These lesion features are then activated through a reverse activation network and finally fused by an attention module to provide information to the generator. In practice, only a few features extracted by the discriminators are important for the prediction of lesions. After these key features are identified by the random forest, they may be output to an activation network to obtain an activation projection of the key features. As shown in fig. 2, the activation network is the inverse of the arbiter. For each convolutional layer in the discriminator, there is a corresponding deconvolution layer, where the step size and the convolutional kernel size are the same. 
The deconvolution layers share the same weight parameters, except that the convolution kernel is flipped vertically and horizontally. For each max pooling layer, there is an unpooling layer that performs a partial inverse operation, where max elements are located by skip-join and non-max elements are filled with zeros. For an LeakyReLU function, there is a corresponding LeakyReLU function in the activation network, which is used to ensure that the output activation value for each layer is positive. Examples of some lesion feature activations are shown in fig. 3 and 4. It can be seen from the figure that although the classification training only uses image-level labels for training, the reverse activation map of the key features accurately locates the lesion, indicating that the activation network contains lesion information of the retina. The attention module uses the lesion features obtained by the activation network to provide lesion information to the generator so that the resulting retinal composite image can generate better lesion details. At level I, the generator module is characterized by F, as shown in FIG. 2(c)lThe feature of the active network is denoted Al. The two features are normalized and multiplied, respectively, and the attention feature is generated using the ReLU activation function and 1 × 1 convolution. Finally, the attention features are fused into the generator. This attention mechanism can be expressed as:
F_{\mathrm{attn}} = \mathrm{Conv}\left(\mathrm{ReLU}\left(\mathrm{Norm}(F_{l}) \times \mathrm{Norm}(A_{l})\right)\right),
where $A_{l}$ is obtained by reverse activation of the key features and contains lesion information. Normalizing the input features with the Norm function ensures that feature learning remains controllable. Based on this attention model, each generator module incorporates lesion features, which enhances the lesion details of the generated synthetic retinal image.
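As a concrete illustration of the formula above, a minimal PyTorch sketch follows. Instance normalization stands in for the unspecified Norm function, and the additive fusion back into the generator path is an assumption; all class and variable names are illustrative.

```python
import torch
import torch.nn as nn

class LesionAttention(nn.Module):
    """Fuse a generator feature F_l with a reverse-activation feature A_l:
    F_attn = Conv1x1(ReLU(Norm(F_l) * Norm(A_l))), then merge into F_l."""
    def __init__(self, channels):
        super().__init__()
        self.norm_f = nn.InstanceNorm2d(channels)  # stand-in for Norm
        self.norm_a = nn.InstanceNorm2d(channels)
        self.conv = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, f_l, a_l):
        attn = self.conv(torch.relu(self.norm_f(f_l) * self.norm_a(a_l)))
        return f_l + attn  # assumed residual-style fusion into the generator
```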
In one implementation, the training process of the lesion-attention conditional generative adversarial network includes the following steps: acquiring a random Gaussian noise vector; obtaining a training sample, where the training sample comprises a real retinal image, a retinal blood vessel mask, and a disease classification label, the retinal blood vessel mask being obtained by transformation of the real retinal image; inputting the random Gaussian noise vector, the retinal blood vessel mask, and the real retinal image into a first network model, and outputting through the first network model a predicted disease classification result corresponding to the real retinal image; obtaining a loss function from the disease classification label and the predicted disease classification result; and training the first network model based on the loss function to obtain the lesion-attention conditional generative adversarial network.
Specifically, in actual training, each training sample is a real retinal image paired with its classification label, i.e., a sample contains (real retinal image, disease classification label, retinal blood vessel mask). The real retinal image is captured by hospital equipment, the disease classification label is annotated by expert physicians, and the retinal blood vessel mask is produced by a segmentation network trained on a retinal vessel segmentation data set and derived by transformation of the real retinal image. Specifically, retinal vessel segmentation is performed on the real retinal image to obtain a retinal vessel segmentation image; the segmentation image is then filtered to keep fine bright stripes and remove large-area bright spots, and several filters are combined to obtain the retinal blood vessel mask. In this embodiment, the training set is $\{(x_i, v_i, y_i)\}_{i=1}^{N}$, where $x_i \in \mathbb{R}^{W \times H \times 3}$ is an RGB real retinal image of width W and height H, $v_i \in \{0,1\}^{W \times H}$ is the retinal vascular mask corresponding to $x_i$, and $y_i$ is the diagnostic classification label corresponding to $x_i$. The random Gaussian noise vector z and the retinal vessel mask v are input into the generator of the first network model, z being concatenated with the class label vector, and the synthetic image is formally written as $\hat{x} = G(z, v)$. The random noise z introduces appearance diversity, and the generator G learns a one-to-many mapping. After the generator produces the synthetic image, it is input, together with the real retinal image, via several affine transformations, into the weight-shared multi-output discriminator D of the first network model, through which the loss function is calculated. The objective of the discriminator D, expressed as $D(X, v) \rightarrow d \in [0, 1]$, is to distinguish the synthetic retinal image $\hat{x}$ from the real retinal image x: when X is the real retinal image x, d should tend to 1, whereas when X is the synthetic retinal image $\hat{x}$, d should tend to 0. For this minimax two-player game setup, the generative adversarial network (GAN) method is employed, and the following optimization problem characterizing the interaction between G and D is considered:
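In the published text this formula survives only as an image placeholder; a plausible reconstruction of the standard conditional GAN objective, consistent with the notation above (and referred to as (1) below), is:

```latex
\min_G \max_D \; \mathcal{L}_{adv}
  = \mathbb{E}_{x,v}\!\left[\log D(x, v)\right]
  + \mathbb{E}_{z,v}\!\left[\log\left(1 - D(G(z, v), v)\right)\right]
\tag{1}
```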
the random forest in the first network model learns the disease classification using the last layer of features extracted by the discriminator D, and identifies the lesion features by calculating each frequency contributing to the classification. The activation network is the inverse operation of the arbiter, with weights introduced by the arbiter. And inputting the lesion features identified by the random forest into an activation network of the first network model to obtain the lesion features of the corresponding layer, and fusing the lesion features into a generator through an attention module in the decoding process of the generator of the first network model.
Further, combining the above formula with a global L1 loss yields more consistent results, ensuring that the synthetic retinal image does not deviate significantly from the real retinal image. The global L1 loss is as follows:
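The formula is again left to a figure; under the notation above, the standard form would be:

```latex
\mathcal{L}_{L1} = \mathbb{E}_{x,z,v}\!\left[\, \lVert x - G(z, v) \rVert_{1} \,\right]
```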
further, the performance of the GAN can be enhanced by providing side information. For example, the auxiliary classifier GAN (AC-GAN) constructs a conditional model that extends the goal function of GAN by the auxiliary classifier to improve performance. Therefore, in addition to the classifier D, the auxiliary classifier C can also be constructed by using the classification label of the data set, and the classification loss function is as follows:
the training of GAN can significantly improve the data volume, and data enhancement and regularization methods have been applied to GAN training to improve the performance of GAN. According to the research of Tran et al, data enhancement can improve the learning of the distribution of the original data and further improve the quality of the generated image by adding the transformed data. For the discriminator, a multi-output discriminator based on weight sharing is designed to improve the learning of the original data by the generator by utilizing the transformed data. As shown on the right side of fig. 2(a), except that the generated retinal composite image and the real retinal image are input toIn the discriminator, a series of transformation functions T are constructedkAnd K is { 1., K }, and the retina composite image and the real retina image are subjected to K-1 transformations respectively and then input into the discriminator. The composite retinal image and the real retinal image undergo the same transformation and therefore the distribution of the original data is not altered when augmented with data. Here, T1 is indicated as an operation without transformation, thenThe penalty function in (1) will become:
Likewise, the classifiers are also expanded to K, each performing classification of the correspondingly transformed images; thus the classification loss becomes:
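Its transformed counterpart, under the same assumptions, would be:

```latex
\mathcal{L}_{cls}^{T}
  = \sum_{k=1}^{K}\Big(
      \mathbb{E}_{x,y}\!\left[-\log C_k(y \mid T_k(x))\right]
    + \mathbb{E}_{z,v,y}\!\left[-\log C_k(y \mid T_k(G(z, v)))\right]
    \Big)
```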
in order to further improve the quality of the synthesized image, the invention is based on a reverse activation network and proposes the loss of the activation characteristic matching. The feature matching loss minimizes the statistical difference between the real retinal image and the retinal composite image at multiple scales, resulting in better image distribution information. The fundus image has appearance diversity due to variations in color, texture, illumination, and the like. The focus distribution information of the synthetic image can be constrained by learning with focus features of different scales. The activation feature matching loss is calculated using the features of the reverse activation network, which are focused on lesion features and ignore the background. P denotes the p-th layer of the activation network, and the activation characteristic matching loss function is expressed as:
the data is enhanced and regularized, the feature learning is improved, and the method is particularly important for generating the diversity learning of the image. Since the focus feature is lost to the activation feature matching, the background feature is not too much constrained, and thus randomness is introduced into the generator, which may enhance diversity. Furthermore, to ensure the physiological details of the retinal image, a perceptual loss is applied, which is based on pre-trained VGG-19 and denoted by q for a certain layer of the VGG19 network, at which the difference between the real image and the synthetic image is calculated, the perceptual loss function being expressed as:
finally, the loss function is obtained by adding the confrontation loss function, the classification loss function, the activation feature matching loss function and the perception loss.
Experimental effects of the invention
1. Data set
To verify the efficacy of the model, three data sets with diagnostic labels were used, covering diabetic retinopathy (DR) and retinitis pigmentosa (RP). The first data set was collected from the Shenzhen Eye Hospital, acquired with an Optos camera, and includes 398 DR lesion images, 473 RP lesion images, and 948 normal retinal images. The raw images were collected from different patients enrolled at the hospital from 2016 to 2019. Three experts participated in labeling; an image label was kept only if at least two ophthalmologists agreed on the disease label. In the experiments, normal retinal images were randomly assigned to the DR and RP studies, with 70% used for training and 30% for testing. The image sets used for the DR and RP experiments are referred to as DR-1 and RP-1, respectively.
The second data set is the public DeepDR data set from ISBI 2020. It contains 2000 conventional fundus images and 256 ultra-wide-field images; however, only the 150 ultra-wide-field training-set images with public labels were used in the experiment, comprising 106 lesion images and 44 normal images. This image set was used only to test the proposed method and verify its generalization performance; it is referred to as DR-2.
The third data set is the public Masumoto data set from Tsukazaki Hospital in Japan, containing 150 RP lesion images and 223 normal retinal images collected between 2011 and 2017. This image set was likewise used only to test the proposed method and verify its generalization performance; it is referred to as RP-2.
2. Effects of synthetic images
Figs. 3 and 4 show retinal images and lesion details synthesized by models trained on the DR and RP data. As the figures show, the method of the present invention synthesizes retinal images and the associated lesion details with good fidelity.
3. Data enhancement for lesion classification using synthetic images
To evaluate whether the synthesized data can enhance the training of classification models, three classical classification models, VGG-16, ResNet-50, and Inception-V3, were trained as baselines, and three experimental settings were used. In the first setting, the classification model is trained on the real training set and tested on the synthetic images; in the second, it is trained on the real training set and tested on the real test set; in the third, a mixture of real and synthetic images is used as the training set and testing is done on the real test set. In all three settings, the number of samples in the synthetic image set matches the number of samples in the real training set. The results of the experiments on the DR data and the RP data are shown in tables 1 and 2, respectively.
These tables show that the images synthesized by the proposed method enhance model training. In the first setting, models trained on real images achieve the best accuracy on the synthesized images; since the synthetic images are generated from the real training set, their distribution is similar to that of the training set. Comparing the second and third settings shows that using synthetic images improves model performance, especially on DR-2 and RP-2, indicating that synthetic images increase the feature diversity of the data and thereby improve the generalization performance of the model.
Table 1 presents the results of the experiments using DR data to evaluate the enhancement provided by the synthetic images; the three comparative experimental settings show that synthetic images improve the accuracy and generalization performance of the model. In the table, Real denotes the training set of real images and Fake denotes the synthetic images.
Table 2 presents the results of the experiments using RP data.
To verify the extent to which synthetic images enhance model training, the training set was multiplied using synthetic images, and the experimental results are shown in figs. 5 and 6. In the figures, "1x" denotes a training set using only real retinal images, "2x" denotes a hybrid training set in which synthetic images double the data, and so on. It can be seen that as more synthetic retinal images are added to the extended hybrid training set, the baseline classification performance improves, reaching its best after approximately 7-8 fold expansion.
The innovations of the invention are as follows:
(1) A conditional GAN for UWF image synthesis with lesion feature attention is proposed. The proposed lesion feature attention mechanism enhances the lesion details of the synthetic image.
(2) The feature matching loss is improved, which improves GAN training. Because this loss function focuses on lesion features, it improves the diversity of the synthesized images while enhancing lesion details.
(3) A weight-shared multi-output discriminator is designed to exploit affine transformations and enhance model performance.
In summary, the present invention proposes a lesion feature attention mechanism that enhances the lesion details of retinal images synthesized by a conditional GAN, improves the feature matching loss to increase the diversity of the synthesized retinal images, and designs a weight-shared multi-output discriminator whose training is enhanced by affine transformations.
Exemplary device
As shown in fig. 7, an embodiment of the present invention provides a retinal image synthesis apparatus based on a lesion-attention conditional generative adversarial network, the apparatus including a retinal blood vessel mask map acquisition module 301 and a synthetic retinal image acquisition module 302, wherein:
a retinal blood vessel mask map acquisition module 301, configured to acquire a retinal blood vessel mask map;
a synthetic retinal image acquisition module 302, configured to input the retinal blood vessel mask map into a trained lesion-attention conditional generative adversarial network to obtain a synthetic retinal image.
Based on the above embodiments, the present invention further provides an intelligent terminal, a schematic block diagram of which is shown in fig. 8. The intelligent terminal comprises a processor, a memory, a network interface, a display screen, and a temperature sensor connected through a system bus. The processor of the intelligent terminal provides computation and control capabilities. The memory of the intelligent terminal comprises a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running the operating system and the computer program. The network interface of the intelligent terminal is used to connect and communicate with external terminals through a network. The computer program, when executed by the processor, implements the retinal image synthesis method based on a lesion-attention conditional generative adversarial network. The display screen of the intelligent terminal may be a liquid crystal display or an electronic ink display, and the temperature sensor is arranged inside the intelligent terminal in advance to detect the operating temperature of the internal equipment.
Those skilled in the art will appreciate that the schematic diagram of fig. 8 is only a block diagram of a part of the structure related to the solution of the present invention, and does not constitute a limitation to the intelligent terminal to which the solution of the present invention is applied, and a specific intelligent terminal may include more or less components than those shown in the figure, or combine some components, or have different arrangements of components.
In one embodiment, an intelligent terminal is provided that includes a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for:
obtaining a retinal blood vessel mask map;
and inputting the retinal blood vessel mask map into a trained lesion-attention conditional generative adversarial network to obtain a synthetic retinal image.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, databases, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
In summary, the present invention discloses a retinal image synthesis method based on a lesion-attention conditional generative adversarial network, the method comprising: obtaining a retinal blood vessel mask map; and inputting the retinal blood vessel mask map into a trained lesion-attention conditional generative adversarial network to obtain a synthetic retinal image. Because the synthetic retinal image is produced by a GAN conditioned on lesion attention, the lesion details of the synthetic image are enhanced, the diversity of synthetic samples is improved, and disease recognition on high-resolution images is strengthened.
Based on the above embodiments, the present invention discloses a retinal image synthesis method based on a lesion-attention conditional generative adversarial network. It should be understood that the application of the present invention is not limited to the above examples; those skilled in the art can make modifications and changes based on the above description, and all such modifications and changes are intended to fall within the scope of the appended claims.
Claims (10)
1. A retinal image synthesis method based on a lesion-attention conditional generative adversarial network, the method comprising:
obtaining a retinal blood vessel mask map;
and inputting the retinal blood vessel mask map into a trained lesion-attention conditional generative adversarial network to obtain a synthetic retinal image.
2. The retinal image synthesis method based on a lesion-attention conditional generative adversarial network according to claim 1, wherein the lesion-attention conditional generative adversarial network comprises a generator, a weight-shared multi-output discriminator, a random forest classifier, a reverse activation network, and an attention module, wherein the weight-shared multi-output discriminator outputs a disease discrimination result and a disease classification result; the random forest classifier is used for identifying lesion features of the disease classification result output by the weight-shared multi-output discriminator; the reverse activation network is used for activating the lesion features; and the attention module is used for fusing the lesion features into a decoder of the generator.
3. The retinal image synthesis method based on a lesion-attention conditional generative adversarial network according to claim 2, wherein the random forest classifier identifies the lesion features of the disease classification result output by the weight-shared multi-output discriminator as follows:
the random forest classifier obtains lesion features by counting the frequencies of classification features in the disease classification results output by the weight-shared multi-output discriminator.
4. The retinal image synthesis method based on a lesion-attention conditional generative adversarial network according to claim 2, wherein the training process of the lesion-attention conditional generative adversarial network comprises:
acquiring a random Gaussian noise vector;
obtaining a training sample, wherein the training sample comprises a real retinal image, a retinal blood vessel mask, and a disease classification label, and the retinal blood vessel mask is obtained by transformation of the real retinal image;
inputting the random Gaussian noise vector, the retinal blood vessel mask, and the real retinal image into a first network model, and outputting through the first network model a predicted disease classification result corresponding to the real retinal image;
obtaining a loss function according to the disease classification label and the predicted disease classification result;
training the first network model based on the loss function to obtain the lesion-attention conditional generative adversarial network.
5. The retinal image synthesis method based on a lesion-attention conditional generative adversarial network according to claim 4, wherein the retinal blood vessel mask is transformed from the real retinal image as follows:
performing retinal vessel segmentation on the real retinal image to obtain a retinal vessel segmentation image;
and filtering the retinal blood vessel segmentation image to obtain a retinal blood vessel mask.
6. The retinal image synthesis method based on a lesion-attention conditional generative adversarial network according to claim 4, wherein the loss function is obtained by adding an adversarial loss function, a classification loss function, an activation feature matching loss function, and a perceptual loss.
7. The method of claim 6, wherein the activation feature matching loss function is calculated using the lesion features of the reverse activation network.
8. The retinal image synthesis method based on a lesion-attention conditional generative adversarial network according to claim 6, wherein the classification loss function is used for learning lesion features of the real retinal image.
9. An intelligent terminal comprising a memory, one or more processors, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing the method of any one of claims 1-8.
10. A non-transitory computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the method of any of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111243429.XA CN114155190A (en) | 2021-10-25 | 2021-10-25 | Retinal image synthesis method for generating confrontation network based on focus attention condition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111243429.XA CN114155190A (en) | 2021-10-25 | 2021-10-25 | Retinal image synthesis method for generating confrontation network based on focus attention condition |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114155190A true CN114155190A (en) | 2022-03-08 |
Family
ID=80458207
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111243429.XA Pending CN114155190A (en) | 2021-10-25 | 2021-10-25 | Retinal image synthesis method for generating confrontation network based on focus attention condition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114155190A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116894805A (en) * | 2022-03-29 | 2023-10-17 | 山东第一医科大学附属省立医院(山东省立医院) | Lesion characteristic identification system based on wide-angle fundus image |
CN116894805B (en) * | 2022-03-29 | 2024-03-19 | 山东第一医科大学附属省立医院(山东省立医院) | Lesion characteristic identification system based on wide-angle fundus image |
CN116152250A (en) * | 2023-04-20 | 2023-05-23 | 广州思德医疗科技有限公司 | Focus mask image generating method and device |
CN116152250B (en) * | 2023-04-20 | 2023-09-08 | 广州思德医疗科技有限公司 | Focus mask image generating method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |