CN110503636B - Parameter adjustment method, lesion prediction method, parameter adjustment apparatus and electronic device


Info

Publication number: CN110503636B (grant of application CN201910723272.7A)
Authority: CN (China)
Prior art keywords: image, distribution function, image feature, loss function
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN110503636A
Inventors: 边成 (Cheng Bian), 郑冶枫 (Yefeng Zheng), 马锴 (Kai Ma)
Original and current assignee: Tencent Healthcare Shenzhen Co Ltd
Application CN201910723272.7A filed by Tencent Healthcare Shenzhen Co Ltd; priority to CN201910723272.7A
Publication of CN110503636A (application publication) and CN110503636B (grant publication)

Classifications

    • G06T 7/0012 Biomedical image inspection (under G06T 7/00 Image analysis and G06T 7/0002 Inspection of images, e.g. flaw detection)
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/20221 Image fusion; Image merging (under G06T 2207/20212 Image combination)
    • G06T 2207/30041 Eye; Retina; Ophthalmic (under G06T 2207/30004 Biomedical image processing)

    All of the above fall under G (Physics), G06 (Computing; Calculating or Counting), G06T (Image Data Processing or Generation, in General); the G06T 2207/00 entries belong to the indexing scheme for image analysis or image enhancement, within G06T 2207/20 (special algorithmic details) and G06T 2207/30 (subject of image).

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a parameter adjustment method for a single-mode detection network, a lesion prediction method for eye images, a parameter adjustment apparatus for a single-mode detection network, and an electronic device, and relates to the field of artificial intelligence. The method comprises the following steps: performing feature encoding on first image features to obtain a first distribution function, and performing feature encoding on second image features to obtain a second distribution function; determining a first loss function value from the first distribution function and the second distribution function; determining image features to be compared by fusing a sampling result of the second distribution function with the first image; and determining a second loss function value by comparing the image features to be compared with the label to be compared, and adjusting network parameters of the single-mode detection network according to the first loss function value and the second loss function value until the loss function value converges. The method can, to a certain extent, overcome the problem of poor parameter tuning and thereby improve the network model's processing of input images.

Description

Parameter adjustment method, lesion prediction method, parameter adjustment apparatus and electronic device
Technical Field
The disclosure relates to the field of artificial intelligence and to machine learning technology, and in particular to a parameter adjustment method for a single-mode detection network, a lesion prediction method for eye images, a parameter adjustment apparatus for a single-mode detection network, and an electronic device.
Background
With the continuous development of artificial intelligence, more and more network models are being applied to image processing tasks such as feature extraction, recognition and classification.
Before an image processing network model can process images, it must be trained, and a common training mode is supervised training. For example, a sample image is input into the network model, a loss function value is determined by comparing the classification result output by the network model with the target classification result, and the network parameters of the model are then adjusted according to that loss, completing the training of the network model.
However, in this training method the input is usually a single sample image, which may lead to poor parameter tuning and in turn degrade the trained model's processing of input images.
It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The disclosure aims to provide a parameter adjustment method for a single-mode detection network, a lesion prediction method for eye images, a parameter adjustment apparatus for a single-mode detection network, and an electronic device, which to a certain extent overcome the problem of poor parameter tuning and thereby improve the network model's processing of input images.
Other features and advantages of the present disclosure will be apparent from the following detailed description, or may be learned in part by the practice of the disclosure.
According to a first aspect of the present disclosure, there is provided a parameter adjustment method of a single-mode detection network, including:
extracting features from a first image, performing feature encoding on the extracted first image features to obtain a first distribution function, and performing feature encoding on extracted second image features to obtain a second distribution function; the first image features correspond to the first image, and the second image features are obtained by feature fusion of the first image and a second image;
determining a first loss function value from the first distribution function and the second distribution function;
determining image features to be compared by fusing a sampling result of the second distribution function with the first image;
determining a second loss function value by comparing the image features to be compared with the label to be compared, and adjusting network parameters of the single-mode detection network according to the first loss function value and the second loss function value until the loss function value converges; wherein the loss function value includes the first loss function value and the second loss function value.
In an exemplary embodiment of the present disclosure, feature encoding the extracted first image feature to obtain a first distribution function includes:
extracting first image features corresponding to the first image through a single-mode detection network;
and performing feature coding on the first image features to determine a first distribution function corresponding to the first image.
In an exemplary embodiment of the present disclosure, feature encoding the extracted second image feature to obtain a second distribution function includes:
fusing the first image and the second image through a multi-mode detection network, and generating a second image feature according to a fusion result;
and performing feature coding on the second image features to determine a second distribution function which corresponds to the first image and the second image together.
In an exemplary embodiment of the present disclosure, determining an image feature to be compared according to fusion of a sampling result of a second distribution function and a first image includes:
fusing the sampling result of the second distribution function with the first image feature corresponding to the first image to obtain a third image feature, and fusing the sampling result of the second distribution function with the second image feature to obtain a fourth image feature;
determining a third distribution function corresponding to the third image feature, and determining a fourth distribution function corresponding to the fourth image feature;
determining a third loss function value from the third distribution function and the fourth distribution function;
fusing the sampling result of the fourth distribution function with the third image feature to obtain a fifth image feature, and fusing the sampling result of the fourth distribution function with the fourth image feature to obtain a sixth image feature;
determining a fifth distribution function corresponding to the fifth image feature, and determining a sixth distribution function corresponding to the sixth image feature;
determining a fourth loss function value from the fifth distribution function and the sixth distribution function;
and fusing the sampling result of the sixth distribution function with the fifth image feature to obtain the image feature to be compared.
In an exemplary embodiment of the present disclosure, determining the second loss function value from the comparison of the image feature to be compared to the tag to be compared includes:
performing feature processing on the image features to be compared, comparing the processed image features with the label to be compared, and determining the second loss function value from the comparison result; the feature processing includes convolution processing, pooling processing and nonlinear activation processing.
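Purely as an illustration, this feature processing and label comparison could look like the PyTorch sketch below; the layer sizes, the binary label, and the use of binary cross-entropy are assumptions, since the patent does not fix them.

```python
# A hedged sketch of the feature processing (convolution, pooling, nonlinear
# activation) and the comparison with the label; all sizes are illustrative.
import torch
from torch import nn

head = nn.Sequential(
    nn.Conv2d(32, 16, kernel_size=3, padding=1),  # convolution processing
    nn.MaxPool2d(2),                              # pooling processing
    nn.ReLU(),                                    # nonlinear activation processing
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 1),
)

features_to_compare = torch.randn(4, 32, 62, 62)        # image features to be compared
label_to_compare = torch.randint(0, 2, (4, 1)).float()  # e.g. lesion / no lesion
logits = head(features_to_compare)
second_loss = nn.functional.binary_cross_entropy_with_logits(logits, label_to_compare)
```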
In an exemplary embodiment of the present disclosure, the parameter adjustment method of the single-mode detection network further includes:
the sum of the first, second, third, and fourth loss function values is determined as the loss function value.
In one exemplary embodiment of the present disclosure, adjusting network parameters of a single-mode detection network according to a first loss function value and a second loss function value until the loss function values converge comprises:
and adjusting the network parameters of the single-mode detection network according to the loss function value until the loss function value converges.
According to a second aspect of the present disclosure, there is provided a lesion prediction method in an eye image, comprising:
inputting the eye image into a single-mode detection network, and fitting a first distribution function corresponding to the eye image according to the single-mode detection network;
fusing the sampling result of the first distribution function with the first image feature corresponding to the eye image to obtain a second image feature;
fitting a second distribution function corresponding to the second image feature according to the single-mode detection network, and fusing the sampling result of the second distribution function with the second image feature to obtain a third image feature;
fitting a third distribution function corresponding to the third image feature according to the single-mode detection network, fusing the sampling result of the third distribution function with the third image feature to obtain a fourth image feature, and predicting a focus in the eye image according to the fourth image feature;
the single-mode detection network is obtained by adjustment with the parameter adjustment method of the single-mode detection network according to the first aspect.
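The following is a minimal PyTorch sketch of this inference flow, assuming Gaussian distribution functions and reparameterized sampling; the module names, layer sizes and three-stage structure are illustrative assumptions, not the patent's actual network.

```python
# A minimal sketch of the lesion prediction flow: fit a distribution function
# to the current features, sample it, fuse the sample back into the features,
# and repeat three times before classifying; all modules are illustrative.
import torch
from torch import nn

class Stage(nn.Module):
    """Fits a Gaussian to the current features and fuses a sample back in."""
    def __init__(self, ch=32):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.to_mu = nn.Linear(ch, ch)
        self.to_logvar = nn.Linear(ch, ch)

    def forward(self, feat):
        h = self.conv(feat)
        g = self.pool(h)
        mu, logvar = self.to_mu(g), self.to_logvar(g)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # sample the fit
        return h + z[:, :, None, None]                        # fuse sample + features

stem = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU())
stages = nn.Sequential(Stage(), Stage(), Stage())   # first/second/third fits
classifier = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2))

eye_image = torch.randn(1, 1, 496, 496)
fourth_features = stages(stem(eye_image))    # fourth image features
lesion_logits = classifier(fourth_features)  # predict the lesion from them
```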
According to a third aspect of the present disclosure, there is provided a parameter adjustment apparatus of a single-mode detection network, including a distribution function determination unit, a loss function value determination unit, a feature fusion unit, and a parameter adjustment unit, wherein:
the distribution function determining unit is used for extracting the characteristics of the first image, carrying out characteristic coding on the extracted characteristics of the first image to obtain a first distribution function, and carrying out characteristic coding on the extracted characteristics of the second image to obtain a second distribution function; the first image features correspond to the first image, and the second image features are obtained by carrying out feature fusion on the first image and the second image;
the loss function value determining unit is configured to determine a first loss function value from the first distribution function and the second distribution function;
the feature fusion unit is used for determining the features of the images to be compared according to the fusion of the sampling result of the second distribution function and the first image;
the parameter adjusting unit is used for determining a second loss function value by comparing the image features to be compared with the label to be compared, and adjusting the network parameters of the single-mode detection network according to the first loss function value and the second loss function value until the loss function value converges; wherein the loss function value includes the first loss function value and the second loss function value.
In an exemplary embodiment of the present disclosure, the manner in which the distribution function determining unit performs feature encoding on the extracted first image feature to obtain the first distribution function is specifically:
the distribution function determining unit extracts first image features corresponding to the first image through a single-mode detection network;
the distribution function determining unit performs feature encoding on the first image features to determine a first distribution function corresponding to the first image.
In an exemplary embodiment of the present disclosure, the manner in which the distribution function determining unit performs feature encoding on the extracted second image feature to obtain the second distribution function is specifically:
the distribution function determining unit fuses the first image and the second image through the multi-mode detection network, and generates a second image feature according to a fusion result;
the distribution function determining unit performs feature encoding on the second image features to determine a second distribution function which corresponds to the first image and the second image in common.
In an exemplary embodiment of the present disclosure, a method for determining, by a feature fusion unit, features of an image to be compared according to fusion of a sampling result of a second distribution function and a first image is specifically:
the feature fusion unit fuses the sampling result of the second distribution function with the first image feature corresponding to the first image to obtain a third image feature, and fuses the sampling result of the second distribution function with the second image feature to obtain a fourth image feature;
the feature fusion unit determines a third distribution function corresponding to the third image feature and determines a fourth distribution function corresponding to the fourth image feature;
the feature fusion unit determines a third loss function value according to the third distribution function and the fourth distribution function;
the feature fusion unit fuses the sampling result of the fourth distribution function with the third image feature to obtain a fifth image feature, and fuses the sampling result of the fourth distribution function with the fourth image feature to obtain a sixth image feature;
the feature fusion unit determines a fifth distribution function corresponding to the fifth image feature and determines a sixth distribution function corresponding to the sixth image feature;
the feature fusion unit determines a fourth loss function value according to the fifth distribution function and the sixth distribution function;
and the feature fusion unit fuses the fifth image feature with the sampling result of the sixth distribution function to obtain the image feature to be compared.
In an exemplary embodiment of the present disclosure, the manner in which the parameter adjustment unit determines the second loss function value according to the comparison of the image feature to be compared and the label to be compared is specifically:
the parameter adjusting unit performs feature processing on the image features to be compared, compares the feature-processed image features to be compared with the tags to be compared, and determines a second loss function value according to the comparison result; the characteristic processing comprises convolution processing, pooling processing and nonlinear activation processing.
In an exemplary embodiment of the present disclosure, the loss function value determining unit is further configured to determine a sum of the first loss function value, the second loss function value, the third loss function value, and the fourth loss function value as the loss function value.
In an exemplary embodiment of the present disclosure, the loss function value determining unit adjusts the network parameter of the single-mode detection network according to the first loss function value and the second loss function value until the loss function value converges, specifically:
the loss function value determining unit adjusts the network parameters of the single-mode detection network according to the loss function value until the loss function value converges.
According to a fourth aspect of the present disclosure, there is provided a lesion prediction device in an eye image, including a function fitting unit, an image feature fusion unit, and an image feature acquisition unit, wherein:
the function fitting unit is used for inputting the eye image into the single-mode detection network, and fitting a first distribution function corresponding to the eye image according to the single-mode detection network;
the image feature fusion unit is used for fusing the sampling result of the first distribution function with the first image feature corresponding to the eye image to obtain a second image feature;
the image feature acquisition unit is used for fitting a second distribution function corresponding to the second image feature according to the single-mode detection network, and fusing the sampling result of the second distribution function with the second image feature to obtain a third image feature;
the image feature acquisition unit is also used for fitting a third distribution function corresponding to the third image feature according to the single-mode detection network, fusing the sampling result of the third distribution function with the third image feature to obtain a fourth image feature, and predicting a focus in the eye image according to the fourth image feature;
The single-mode detection network is obtained by adjustment with the parameter adjustment method of the single-mode detection network according to the first aspect.
According to a fifth aspect of the present disclosure, there is provided an electronic device comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the method of any of the above via execution of the executable instructions.
According to a sixth aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any one of the above.
Exemplary embodiments of the present disclosure may have some or all of the following advantages:
in the parameter adjustment method for a single-mode detection network according to an exemplary embodiment of the present disclosure, feature extraction may be performed on a first image (e.g., an OCT image), feature encoding may be performed on the extracted first image features to obtain a first distribution function, and feature encoding may be performed on extracted second image features to obtain a second distribution function, where the first image features correspond to the first image and the second image features are obtained by feature fusion of the first image and a second image (e.g., a fundus image). A first loss function value is determined from the first distribution function and the second distribution function; the image features to be compared are then determined by fusing the sampling result of the second distribution function with the first image; and a second loss function value is determined by comparing the image features to be compared with the label to be compared, with the network parameters of the single-mode detection network adjusted according to the first loss function value and the second loss function value until the loss function value converges, where the loss function value includes the first loss function value and the second loss function value. On the one hand, this scheme can overcome to a certain extent the problem of poor parameter tuning and thereby improve the network model's processing of input images; on the other hand, the network parameters of a single-mode detection network (e.g., a single-mode lesion detection network) can be adjusted through multi-modal fusion (a modality can be understood as an image type), realizing training of the single-mode detection network, improving the training effect, and in turn improving the network's classification and recognition of input images. When the embodiments of the present disclosure are applied to lesion recognition, the accuracy of lesion recognition in fundus images can be improved.
It should be noted that the single-mode detection network trained in the embodiments of the present disclosure may be applied to an eye disease recognition system. In a traditional eye disease recognition system, a patient's fundus image and OCT image must be input in pairs before the system can combine the features of both to determine a lesion; such a system is equivalent to a multi-mode detection network, where a modality can be understood as an image type. The single-mode detection network trained in the embodiments of the present disclosure, however, can determine a lesion from the fundus image or the OCT image alone. Specifically, when a fundus image (or OCT image) is input into the single-mode detection network, the image features it extracts may carry not only the features of the fundus image (or OCT image) but also features of the corresponding OCT image (or fundus image), so the requirements on the input can be relaxed while recognition accuracy is maintained: whereas lesion recognition previously required paired input images, the single-mode detection network of the embodiments of the present disclosure needs only a single input image, which improves the convenience and efficiency of lesion recognition to a certain extent.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort.
FIG. 1 illustrates a schematic diagram of an exemplary system architecture to which a parameter adjustment method of a single-mode detection network and a parameter adjustment apparatus of a single-mode detection network of embodiments of the present disclosure may be applied;
FIG. 2 illustrates a schematic diagram of a computer system suitable for use in implementing embodiments of the present disclosure;
FIG. 3 schematically illustrates a flow chart of a method of parameter adjustment of a single-mode detection network according to one embodiment of the disclosure;
fig. 4 schematically illustrates a flowchart of a method of lesion prediction in an ocular image according to one embodiment of the present disclosure;
Fig. 5 schematically illustrates an eye image according to one embodiment of the present disclosure;
FIG. 6 schematically illustrates an architecture diagram of a parameter adjustment method of a single-mode detection network according to one embodiment of the present disclosure;
fig. 7 schematically illustrates an architecture diagram of a lesion prediction method in an eye image according to one embodiment of the present disclosure;
FIG. 8 schematically illustrates a block diagram of a parameter adjustment apparatus of a single-mode detection network in one embodiment according to the present disclosure;
fig. 9 schematically illustrates a block diagram of a lesion prediction device in an eye image according to one embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the present disclosure. One skilled in the relevant art will recognize, however, that the aspects of the disclosure may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.
Fig. 1 is a schematic diagram of a system architecture of an exemplary application environment to which a parameter adjustment method of a single-mode detection network and a parameter adjustment device of the single-mode detection network according to an embodiment of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include one or more of the terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others. The terminal devices 101, 102, 103 may be various electronic devices with display screens including, but not limited to, desktop computers, portable computers, smart phones, tablet computers, and the like. It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, the server 105 may be a server cluster formed by a plurality of servers.
The parameter adjustment method of the single-mode detection network and the lesion prediction method in eye images provided by the embodiments of the present disclosure are generally executed by the server 105; accordingly, the parameter adjustment apparatus of the single-mode detection network and the lesion prediction device in eye images are generally disposed in the server 105. However, as those skilled in the art will readily understand, both methods may also be performed by the terminal devices 101, 102, 103, and the corresponding apparatuses may likewise be provided in the terminal devices 101, 102, 103; the present exemplary embodiment is not particularly limited in this respect.
For example, in an exemplary embodiment, the server 105 may perform feature extraction on the first image, perform feature encoding on the extracted first image features to obtain a first distribution function, and perform feature encoding on the extracted second image features to obtain a second distribution function; determine a first loss function value from the first distribution function and the second distribution function; determine the image features to be compared by fusing the sampling result of the second distribution function with the first image; and determine a second loss function value by comparing the image features to be compared with the label to be compared, adjusting the network parameters of the single-mode detection network according to the first and second loss function values until the loss function value converges.
The server 105 may further input an eye image into the single-mode detection network and fit a first distribution function corresponding to the eye image; fuse the sampling result of the first distribution function with the first image features corresponding to the eye image to obtain second image features; fit a second distribution function corresponding to the second image features and fuse its sampling result with the second image features to obtain third image features; and fit a third distribution function corresponding to the third image features, fuse its sampling result with the third image features to obtain fourth image features, and predict a lesion in the eye image from the fourth image features.
Fig. 2 shows a schematic diagram of a computer system suitable for use in implementing embodiments of the present disclosure.
It should be noted that the computer system 200 of the electronic device shown in fig. 2 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present disclosure.
As shown in fig. 2, the computer system 200 includes a Central Processing Unit (CPU) 201, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 202 or a program loaded from a storage section 208 into a Random Access Memory (RAM) 203. In the RAM 203, various programs and data required for the system operation are also stored. The CPU 201, ROM 202, and RAM 203 are connected to each other through a bus 204. An input/output (I/O) interface 205 is also connected to bus 204.
The following components are connected to the I/O interface 205: an input section 206 including a keyboard, a mouse, and the like; an output portion 207 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker, and the like; a storage section 208 including a hard disk or the like; and a communication section 209 including a network interface card such as a LAN card, a modem, and the like. The communication section 209 performs communication processing via a network such as the internet. The drive 210 is also connected to the I/O interface 205 as needed. A removable medium 211 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed on the drive 210 as needed, so that a computer program read out therefrom is installed into the storage section 208 as needed.
In particular, according to embodiments of the present disclosure, the processes described below with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 209, and/or installed from the removable medium 211. When executed by the Central Processing Unit (CPU) 201, the computer program performs the various functions defined in the methods and apparatus of the present application. In some embodiments, the computer system 200 may also include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
Artificial intelligence (AI) is the theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in ways similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, giving machines the capacity to perceive, reason and decide.
Artificial intelligence is a comprehensive discipline covering a wide range of fields, at both the hardware level and the software level. Basic artificial intelligence technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems and mechatronics. AI software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Computer vision (CV) is the science of how to make machines "see"; more specifically, it replaces human eyes with cameras and computers to recognize, track and measure targets, and further processes the results into images better suited for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies the theories and technologies needed to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric techniques such as face recognition and fingerprint recognition.
Machine learning (ML) is a multi-field interdiscipline involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It studies how computers can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied across all areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and learning from instruction.
In the early days of machine learning, network parameters had to be carefully designed to reduce the difference between a neural network's predictions and the actual results. In the current era of machine learning, a neural network can optimize its network parameters automatically according to the loss function, and careful manual design of network parameters is no longer required in many scenarios.
The following describes the technical scheme of the embodiments of the present disclosure in detail:
with the continued growth and aging of China's population, the eye health situation is becoming increasingly serious. Statistics indicate that more than 50% of people do not receive routine ophthalmic examinations, and more than 90% of patients receive treatment only after the disease has already set in. For example, China has roughly 110 million diabetic patients, more than 40 million of whom suffer from retinopathy, and without early intervention such lesions easily lead to blindness at later stages. If regular ophthalmic examinations are performed at an early stage of the disease, the risk of blindness can be reduced by 94.4%.
Optical coherence tomography (OCT) is a new imaging technique capable of imaging various properties of biological tissue, such as structural information, blood flow and elastic parameters. OCT generally renders the structure of the fundus more clearly than other examination methods: under OCT, ocular tissues such as the retinal nerve fiber layer, the photoreceptor inner and outer segments, the nuclear layers, the rod and cone cell layer, and the pigment epithelium layer can be clearly distinguished, so OCT gives better results in diagnosing macular holes, central serous chorioretinopathy, cystoid macular edema and other eye diseases. Furthermore, since an OCT apparatus can capture a fundus image and an OCT image at the same time, lesion diagnosis can be performed on both modalities simultaneously, greatly reducing the risk of missed lesions.
At present, network models capable of classifying natural images are trained so that eye diseases can be identified from eye images. A conventional network has only one fixed branch, which makes problems with inconsistent target sizes relatively difficult to handle (for example, a dog may occupy most of the area in one photograph and only a small portion in another); typically, a convolution kernel of fixed size cannot handle such asymmetry of information. InceptionV4 widens the network so that it can sample information from targets of different sizes, while DenseNet improves performance by increasing network depth. Because gradients vanish when training a traditional network, no update can be propagated back once the gradient reaches 0 during backpropagation, and training fails; DenseNet therefore uses dense connections from all earlier layers to later layers to strengthen the backpropagation of gradients during training.
In general, both InceptionV4 and DenseNet improve performance along different dimensions, but both take only one modality (i.e., one image) as input. A network with a single-branch structure cannot effectively process the OCT image and fundus image modalities at the same time, and the missing information of one modality may lead to poor parameter tuning during training and to incorrect classification of some data by the network during testing.
Based on one or more of the above problems, the present exemplary embodiment provides a parameter adjustment method of a single-mode detection network. The parameter adjustment method of the single-mode detection network may be applied to the server 105, or may be applied to one or more of the terminal devices 101, 102, 103, which is not particularly limited in the present exemplary embodiment. Referring to fig. 3, the parameter adjustment method of the single-mode detection network may include the following steps S310 to S340:
step S310: extracting features of the first image, performing feature coding on the extracted features of the first image to obtain a first distribution function, and performing feature coding on the extracted features of the second image to obtain a second distribution function; the first image features correspond to the first image, and the second image features are obtained by feature fusion of the first image and the second image.
Step S320: a first loss function value is determined from the first distribution function and the second distribution function.
Step S330: determining image features to be compared by fusing a sampling result of the second distribution function with the first image.
Step S340: determining a second loss function value by comparing the image features to be compared with the label to be compared, and adjusting network parameters of the single-mode detection network according to the first loss function value and the second loss function value until the loss function value converges; wherein the loss function value includes the first loss function value and the second loss function value.
In the parameter adjustment method for a single-mode detection network according to an exemplary embodiment of the present disclosure, feature extraction may be performed on the first image (e.g., an OCT image), feature encoding may be performed on the extracted first image features to obtain a first distribution function, and feature encoding may be performed on the extracted second image features to obtain a second distribution function, where the first image features correspond to the first image and the second image features are obtained by feature fusion of the first image and the second image (e.g., a fundus image). A first loss function value is determined from the first distribution function and the second distribution function; the image features to be compared are then determined by fusing the sampling result of the second distribution function with the first image; and a second loss function value is determined by comparing the image features to be compared with the label to be compared, with the network parameters of the single-mode detection network adjusted according to the first and second loss function values until the loss function value (which includes the first and second loss function values) converges. On the one hand, this scheme can overcome to a certain extent the problem of poor parameter tuning and thereby improve the network model's processing of input images; on the other hand, the network parameters of the single-mode detection network can be adjusted through multi-modal fusion (a modality can be understood as an image type), realizing training of the single-mode detection network, improving the training effect, and improving the network's classification and recognition of input images. When the embodiments of the present disclosure are applied to lesion recognition, the accuracy of lesion recognition in fundus images can be improved.
In addition, the embodiments of the present disclosure can be applied alongside OCT and fundus disease detection algorithms that diagnose ocular disease by taking a patient's OCT image and fundus image as a pair and combining the two. Because the test stage requires no paired input, the trained network can be flexibly embedded into an OCT or fundus screening system to supply supplementary information of the other modality, synchronously improving the diagnostic capability of the fundus and OCT networks.
Note that the fundus images mentioned in the embodiments of the present disclosure may be captured by a fundus camera and the OCT images by an OCT imaging apparatus; alternatively, the fundus image and the OCT image may be acquired simultaneously by the OCT imaging apparatus, which the embodiments of the present disclosure do not limit. The OCT image is a coherence tomographic image of a specific location in the fundus image.
Further, the fundus camera described above is a medical camera for capturing images of the human retina (i.e., fundus images). The OCT imaging apparatus applies the basic principle of weak-coherence-light interferometry to detect the signals back-reflected or scattered from different depth layers of biological tissue under incident weak coherent light, and a two-dimensional or three-dimensional structural image of the biological tissue (i.e., an OCT image) can be obtained by scanning.
Next, the above steps of the present exemplary embodiment will be described in more detail.
In step S310, feature extraction is performed on the first image, feature encoding is performed on the extracted first image features to obtain a first distribution function, and feature encoding is performed on the extracted second image features to obtain a second distribution function; the first image features correspond to the first image, and the second image features are obtained by feature fusion of the first image and the second image.
In this example embodiment, both the first image and the second image may be eye images, where eye images include fundus images and OCT images: if the first image is a fundus image, the second image is an OCT image; if the first image is an OCT image, the second image is a fundus image. The fundus image may have a data size of 496×496, and the OCT image a data size of 496×X, where X may be 496, 768 or 1024.
In this exemplary embodiment, when the data size of the fundus image is 496×496 and the data size of the OCT image is 496×496, the parameter adjustment method of the single-mode detection network may further include the following steps:
standardizing the first image and the second image using their respective image means and variances;
performing operations such as random rotation, random horizontal flipping, random elastic deformation, or addition of random speckle noise on the standardized first image and second image to obtain multiple groups of images of different forms, each group containing a first image and a second image for use in step S310; a sketch of this preprocessing is shown below.
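A minimal preprocessing sketch follows, assuming NumPy arrays of size 496×496; the rotation granularity, noise level and function names are illustrative choices, not taken from the patent.

```python
# A minimal preprocessing sketch, assuming NumPy arrays of shape (496, 496);
# the 90-degree rotation, noise amplitude and function names are illustrative.
import numpy as np

def standardize(img: np.ndarray) -> np.ndarray:
    """Standardize an image using its own mean and variance."""
    return (img - img.mean()) / (img.var() + 1e-8)

def augment_pair(fundus, oct_img, rng):
    """Apply one random geometric transform to a (fundus, OCT) pair and add
    random speckle noise, yielding one augmented group of images."""
    k = int(rng.integers(0, 4))                       # random rotation
    fundus, oct_img = np.rot90(fundus, k), np.rot90(oct_img, k)
    if rng.random() < 0.5:                            # random horizontal flip
        fundus, oct_img = fundus[:, ::-1], oct_img[:, ::-1]
    speckle = 1.0 + 0.05 * rng.standard_normal(oct_img.shape)
    return fundus.copy(), (oct_img * speckle).copy()  # speckle noise on the OCT

rng = np.random.default_rng(0)
fundus = standardize(rng.random((496, 496)))
oct_img = standardize(rng.random((496, 496)))
groups = [augment_pair(fundus, oct_img, rng) for _ in range(4)]  # image groups
```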
In this exemplary embodiment, optionally, the manner of feature encoding the extracted first image feature to obtain the first distribution function is specifically:
extracting first image features corresponding to the first image through a single-mode detection network;
and performing feature coding on the first image features to determine a first distribution function corresponding to the first image.
In this example embodiment, the first image features are a subset of the first distribution function and may take the form of a set (of features).
In this example embodiment, the method for extracting the first image features corresponding to the first image through the single-mode detection network is specifically: performing N convolution operations on the first image through the single-mode detection network, performing nonlinear activation (i.e., ReLU layer processing) on the convolution result, and taking the nonlinear activation result as the first image features corresponding to the first image, where N is a positive integer.
In this example embodiment, the manner of feature-encoding the first image features to determine the first distribution function corresponding to the first image is specifically: inputting the first image features into an encoder so that the encoder feature-encodes them and thereby determines the first distribution function corresponding to the first image; the encoder (Encoder) consists of a series of convolution layers, activation layers and batch normalization layers. A sketch of such an extractor and encoder follows.
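As a sketch only, the feature extractor and encoder described above might be written as follows in PyTorch; the channel counts, latent dimension and Gaussian (mean, log-variance) parameterization are assumptions for illustration.

```python
# A minimal sketch of the feature extractor (N convolutions + ReLU) and the
# encoder (convolution, activation, batch normalization layers) that outputs
# the parameters of a Gaussian "distribution function"; sizes are assumptions.
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """N convolutions followed by a nonlinear activation (ReLU)."""
    def __init__(self, in_ch=1, ch=32, n_convs=3):
        super().__init__()
        layers = []
        for i in range(n_convs):
            layers += [nn.Conv2d(in_ch if i == 0 else ch, ch, 3, padding=1),
                       nn.ReLU(inplace=True)]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return self.body(x)

class GaussianEncoder(nn.Module):
    """Feature-encodes image features into the mean and log-variance of a
    Gaussian, i.e. the 'distribution function' of the text."""
    def __init__(self, ch=32, latent_dim=6):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(ch, ch, 3, stride=2, padding=1),
            nn.BatchNorm2d(ch),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.mu = nn.Linear(ch, latent_dim)
        self.logvar = nn.Linear(ch, latent_dim)

    def forward(self, feat):
        h = self.encode(feat).flatten(1)
        return self.mu(h), self.logvar(h)

x = torch.randn(1, 1, 496, 496)       # e.g. an OCT image as the first image
feat = FeatureExtractor()(x)          # first image features
mu, logvar = GaussianEncoder()(feat)  # first distribution function
```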
Therefore, by implementing this alternative implementation, the first image can be feature-encoded by the encoder to determine the first distribution function, which can then be used to drive the first and second distribution functions toward each other, improving the image feature generation of the single-mode detection network.
In this example embodiment, optionally, the manner of feature encoding the extracted second image feature to obtain the second distribution function is specifically:
fusing the first image and the second image through a multi-mode detection network, and generating a second image feature according to a fusion result;
and performing feature coding on the second image features to determine a second distribution function which corresponds to the first image and the second image together.
In this example embodiment, the second image feature is a subset of the second distribution function, which may be in the form of a collection (of features).
In this example embodiment, the method for fusing the first image and the second image through the multi-mode detection network and generating the second image feature according to the fusion result specifically includes: the first image and the second image are fused into an image to be processed through a multi-mode detection network, and then second image features corresponding to the image to be processed are extracted; the second image features comprise image features corresponding to the first image and image features corresponding to the second image.
Further, the method for extracting the second image features corresponding to the image to be processed specifically includes: carrying out N times of convolution processing on the image to be processed through a multi-mode detection network, carrying out nonlinear activation processing on a convolution result, and determining the nonlinear activation processing result as a second image characteristic corresponding to the image to be processed; wherein N is a positive integer.
In this example embodiment, the manner of feature encoding the second image feature to determine the second distribution function that corresponds to the first image and the second image together is specifically: and inputting the second image features into an encoder so that the encoder performs feature encoding on the second image features, and further determining a second distribution function which corresponds to the first image and the second image together.
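A corresponding sketch of the multi-mode branch is given below; the patent does not specify the fusion operator, so channel-wise concatenation of the two images is assumed here, and all layer sizes are illustrative.

```python
# A sketch of the multi-mode fusion branch: the two modalities are fused into
# one image to be processed, features are extracted, and the encoder fits the
# second distribution function; concatenation is an assumed fusion operator.
import torch
from torch import nn

extractor = nn.Sequential(                 # N convolutions + ReLU, as in the text
    nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
)
encode_mu = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 6))
encode_logvar = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 6))

fundus = torch.randn(1, 1, 496, 496)
oct_img = torch.randn(1, 1, 496, 496)
to_process = torch.cat([oct_img, fundus], dim=1)  # fuse into image to be processed
feat2 = extractor(to_process)                     # second image features
mu2, logvar2 = encode_mu(feat2), encode_logvar(feat2)  # second distribution function
```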
Therefore, by implementing this alternative implementation, the second image can be feature-encoded by the encoder to determine the second distribution function, which can then be used to drive the second and first distribution functions toward each other, improving the image feature generation of the single-mode detection network.
In step S320, a first loss function value is determined from the first distribution function and the second distribution function.
In this example embodiment, the first loss function value is used to characterize the difference between the first distribution function and the second distribution function.
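The patent does not name the measure used for this difference; if both distribution functions are fitted as Gaussians, one natural choice is the Kullback-Leibler divergence, as in the following sketch.

```python
# A sketch of one plausible first loss: the KL divergence between the prior
# Gaussian (first distribution function) and the posterior Gaussian (second
# distribution function); the choice of divergence is an assumption.
import torch
from torch.distributions import Normal, kl_divergence

mu1, logvar1 = torch.zeros(6), torch.zeros(6)  # first distribution function
mu2, logvar2 = torch.randn(6), torch.randn(6)  # second distribution function

p = Normal(mu1, (0.5 * logvar1).exp())         # sigma = exp(logvar / 2)
q = Normal(mu2, (0.5 * logvar2).exp())
first_loss = kl_divergence(q, p).sum()         # first loss function value
```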
In step S330, the image features to be compared are determined according to the fusion of the sampling result of the second distribution function and the first image.
In this example embodiment, the sampling result of the second distribution function may be determined by assigning values to the unknowns of the second distribution function and evaluating the resulting expression to obtain a second distribution function value, which serves as the sampling result; the sampling result may include at least one second distribution function value. A sketch of one common way to draw such samples follows.
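One common, differentiable way to draw such a sampling result from a fitted Gaussian is the reparameterization trick, sketched below; its use here is an assumption, since the patent does not spell out the sampling procedure.

```python
# A sketch of sampling a fitted Gaussian distribution function via the
# reparameterization trick, which keeps sampling differentiable in training;
# this trick is an assumption, not stated in the patent.
import torch

def sample(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    eps = torch.randn_like(mu)               # unit Gaussian noise
    return mu + eps * (0.5 * logvar).exp()   # z = mu + eps * sigma

z = sample(torch.randn(6), torch.zeros(6))   # hidden variable / sampling result
```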
In this example embodiment, optionally, a method for determining the feature of the image to be compared according to the fusion of the sampling result of the second distribution function and the first image specifically includes:
fusing the sampling result of the second distribution function with the first image feature corresponding to the first image to obtain a third image feature, and fusing the sampling result of the second distribution function with the second image feature to obtain a fourth image feature;
determining a third distribution function corresponding to the third image feature, and determining a fourth distribution function corresponding to the fourth image feature;
determining a third loss function value from the third distribution function and the fourth distribution function;
fusing the sampling result of the fourth distribution function with the third image feature to obtain a fifth image feature, and fusing the sampling result of the fourth distribution function with the fourth image feature to obtain a sixth image feature;
determining a fifth distribution function corresponding to the fifth image feature, and determining a sixth distribution function corresponding to the sixth image feature;
determining a fourth loss function value from the fifth distribution function and the sixth distribution function;
and fusing the sampling result of the sixth distribution function with the fifth image feature to obtain the image feature to be compared.
In this example embodiment, the manner of fusing the sampling result of the second distribution function with the first image feature corresponding to the first image to obtain the third image feature may specifically be: performing N convolution operations on the first image feature, where N is a positive integer; fusing the hidden variables sampled from the Gaussian distribution fitted by the posterior encoder with the convolved first image feature; and applying maximum pooling to the fusion result to obtain the third image feature. The fourth image feature, the fifth image feature, and the sixth image feature are obtained in the same manner as the third image feature, which is not repeated here.
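A hedged sketch of this fusion step, assuming the hidden variable is a vector broadcast over the spatial grid and fused by concatenation (the patent does not fix the fusion operator):

```python
def fuse_latent_with_feature(z, feat, conv_stack):
    """Convolves the image feature N times, fuses in the sampled hidden
    variable, and max-pools the result (e.g. to form the third image feature)."""
    for conv in conv_stack:                     # N convolutions (ReLU-activated)
        feat = F.relu(conv(feat))
    b, d = z.shape
    # Broadcast the latent vector over the spatial grid before concatenating
    z_map = z.view(b, d, 1, 1).expand(b, d, feat.size(2), feat.size(3))
    fused = torch.cat([feat, z_map], dim=1)
    return F.max_pool2d(fused, kernel_size=2)   # maximum pooling of the fusion
```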
In this example embodiment, the sampling result of the second distribution function may be understood as the hidden variables sampled from the Gaussian distribution fitted by the first posterior encoder. The sampling result of the fourth distribution function may be understood as the hidden variables sampled from the Gaussian distribution fitted by the second posterior encoder. The sampling result of the sixth distribution function may be understood as the hidden variables sampled from the Gaussian distribution fitted by the third posterior encoder.
In addition, it should be noted that the single-mode detection network may include three prior networks, and the multi-mode detection network may include three posterior networks, each network including a corresponding encoder; optionally, the number of prior networks included in the single-mode detection network may be at least one, and the number of posterior networks included in the multi-mode detection network may be at least one, which is not limited by the embodiments of the present disclosure.
Wherein the first posterior network comprises a first posterior encoder, the second posterior network comprises a second posterior encoder, and the third posterior network comprises a third posterior encoder. Similarly, the first prior network includes a first prior encoder, the second prior network includes a second prior encoder, and the third prior network includes a third prior encoder. Both the a priori encoder and the a posteriori encoder are used to feature encode the image features to determine the corresponding distribution function.
In this example embodiment, the first distribution function is determined through the first prior network and the second distribution function through the first posterior network; the third distribution function is determined through the second prior network and the fourth distribution function through the second posterior network; the fifth distribution function is determined through the third prior network and the sixth distribution function through the third posterior network.
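Given that mapping, the first, third, and fourth loss function values each reduce to one KL term between a posterior/prior pair; a sketch implementing the closed-form Gaussian KL given earlier:

```python
def kl_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """Closed-form KL(q || p) between the diagonal Gaussians fitted by a
    posterior (q) and a prior (p) encoder; evaluated once per network pair."""
    var_q, var_p = logvar_q.exp(), logvar_p.exp()
    kl = 0.5 * (logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)
    return kl.sum(dim=1).mean()

# One KL term per prior/posterior pair, matching the mapping above:
# loss_1 = kl_gaussians(*posterior_1, *prior_1)   # first loss function value
# loss_3 = kl_gaussians(*posterior_2, *prior_2)   # third loss function value
# loss_4 = kl_gaussians(*posterior_3, *prior_3)   # fourth loss function value
```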
It can be seen that, implementing this alternative embodiment, three loss function values can be determined, so as to adjust the network parameters of the single-mode detection network according to the three loss function values, so as to improve the classification effect and the recognition effect of the single-mode detection network on the input image.
In step S340, determining a second loss function value according to the comparison between the image feature to be compared and the tag to be compared, and adjusting the network parameters of the single-mode detection network according to the first loss function value and the second loss function value until the loss function value converges; wherein the loss function value includes a first loss function value and a second loss function value.
In this example embodiment, the network parameters of the single-mode detection network are used to extract image features. The label to be compared may be a preset label corresponding to the input image; if the classification of the image feature to be compared can be determined to match the label to be compared, this indicates that the classification effect of the single-mode detection network is good and the label corresponding to the input image can be determined. In addition, the network parameters in the single-mode detection network may be initialized from a DenseNet-121 network (i.e., a deep learning network) pre-trained on the ImageNet dataset, while the newly added convolution layers/fusion layers/encoder-decoder sub-networks/fully connected layers may be initialized from a Gaussian distribution with mean 0 and variance 0.01, after which the corresponding convolution, fusion, feature-encoding, or classification processing is performed on the image features.
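A short initialization sketch under these assumptions (torchvision's densenet121 weights standing in for the pre-trained backbone, and one hypothetical new fully connected layer standing in for all newly added layers):

```python
import torchvision

def build_networks(num_classes=2):
    """Initialization sketch: a DenseNet-121 backbone pre-trained on ImageNet,
    plus a newly added layer drawn from N(0, 0.01), i.e. standard deviation 0.1.
    Which layers count as "newly added" is an assumption for illustration."""
    backbone = torchvision.models.densenet121(pretrained=True)
    new_fc = nn.Linear(1024, num_classes)               # DenseNet-121 feature width
    nn.init.normal_(new_fc.weight, mean=0.0, std=0.1)   # variance 0.01
    nn.init.zeros_(new_fc.bias)
    return backbone, new_fc
```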
It should be noted that the weight and bias parameters of the convolution layers may be solved for using the Adam gradient-descent method. The most basic optimization scheme for a neural network is the back-propagation algorithm combined with gradient descent. Gradient descent can drive the network parameters to converge to a global (or local) minimum; because a neural network has many layers, the error usually has to be propagated from the output back to the input layer by layer through back-propagation, with the network parameters updated layer by layer. Since the gradient direction is the direction in which the function value increases fastest, the negative gradient direction is the direction in which it decreases fastest, and iterating step by step along the negative gradient direction converges quickly to a minimum of the function. Adam is an optimizer combining the advantages of the AdaGrad and RMSProp optimization algorithms; it jointly considers the first-moment estimate of the gradient (i.e., its mean) and the second-moment estimate (i.e., its uncentered variance) to compute the update step.
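A minimal Adam training step consistent with this description (the learning rate and the loss_fn callable are assumptions for illustration):

```python
model = FusedFeatureExtractor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # learning rate assumed

def train_step(first_image, second_image, loss_fn):
    optimizer.zero_grad()
    feats = model(first_image, second_image)
    loss = loss_fn(feats)     # loss_fn is an assumed callable producing a scalar
    loss.backward()           # back-propagation: errors flow output -> input
    optimizer.step()          # Adam update along the negative gradient direction
    return loss.item()
```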
In this example embodiment, optionally, the determining the second loss function value according to the comparison between the image feature to be compared and the tag to be compared specifically includes:
Carrying out feature processing on the image features to be compared, comparing the feature-processed image features to be compared with the tags to be compared, and determining a second loss function value according to the comparison result; the characteristic processing comprises convolution processing, pooling processing and nonlinear activation processing.
In this example embodiment, the manner of comparing the feature-processed image feature to be compared with the label to be compared and determining the second loss function value according to the comparison result is specifically: inputting the feature-processed image feature to be compared into a fully connected layer to determine the classification of the image feature to be compared, and determining the second loss function value by comparing the label corresponding to that classification with the label to be compared.
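A sketch of this classification branch, assuming three convolution + pooling + activation blocks followed by global pooling and a fully connected layer (channel widths are illustrative):

```python
class ClassificationHead(nn.Module):
    """Three convolution + pooling + activation blocks, global pooling, then a
    fully-connected layer whose output is compared with the label."""
    def __init__(self, in_channels=64, num_classes=2):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv2d(in_channels, in_channels, 3, padding=1) for _ in range(3)]
        )
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, feat, label_to_compare):
        for conv in self.convs:
            feat = F.relu(F.max_pool2d(conv(feat), 2))   # conv + pool + activation
        logits = self.fc(feat.mean(dim=(2, 3)))          # global pooling + FC layer
        # Cross-entropy compares the predicted class against the label to be compared
        return F.cross_entropy(logits, label_to_compare)
```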
It can be seen that by implementing this alternative embodiment, the loss function for adjusting the network parameter can be determined by comparing with the tag to be compared, and further, the image recognition effect of the network model is improved by the loss function.
In this example embodiment, optionally, the method for adjusting parameters of the single-mode detection network may further include the following steps:
the sum of the first, second, third, and fourth loss function values is determined as the loss function value.
Further, the method for adjusting the network parameters of the single-mode detection network according to the first loss function value and the second loss function value until the loss function value converges specifically comprises the following steps:
and adjusting network parameters of the single-mode detection network according to the loss function value until the loss function value is converged.
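A small sketch of this stopping rule (the tolerance, the epoch cap, and the run_epoch callable are assumptions; the patent only requires iterating until the summed loss function value converges):

```python
def loss_value(l1, l2, l3, l4):
    # The loss function value is the sum of the four loss function values
    return l1 + l2 + l3 + l4

def train_until_convergence(run_epoch, max_epochs=200, tol=1e-5):
    """Adjusts the network until the summed loss function value converges.
    `run_epoch` is an assumed callable running one epoch and returning that sum."""
    previous = float("inf")
    for _ in range(max_epochs):
        current = run_epoch()
        if abs(previous - current) < tol:   # simple, assumed convergence test
            break
        previous = current
    return previous
```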
In this example embodiment, the first, third, and fourth loss function values may each correspond to a KL loss function, which is used to pull the distributions fitted by the three pairs of prior and posterior networks closer together. In addition, the second loss function corresponding to the second loss function value may be a cross-entropy loss function, which guides the prior networks to learn the focus information present in the fundus image or the OCT image, or to classify and judge the focus; the focus information includes at least the area of the focus region, the center coordinates of the focus region, and the like, and the embodiments of the present disclosure are not limited in this respect.
Therefore, by implementing the optional implementation manner, the network parameters can be adjusted through the loss function values, so that the network training effect can be improved, and the recognition effect of the single-mode detection network on the input image is improved.
Therefore, by implementing the parameter adjustment method of the single-mode detection network shown in fig. 3, the problem of poor parameter adjustment effect can be overcome to a certain extent, and the processing effect of the network model on the input image is further improved; and the network parameters of the single-mode detection network can be adjusted in a multi-mode fusion mode so as to realize the training of the single-mode detection network, improve the network training effect, further improve the classification effect and the identification effect of the single-mode detection network on the input image, and improve the focus identification accuracy rate on the fundus image when the embodiment of the disclosure is applied to focus identification.
In addition, the present exemplary embodiment also provides a lesion prediction method in an eye image. The lesion prediction method in the eye image may be applied to the server 105, or may be applied to one or more of the terminal devices 101, 102, 103, which is not particularly limited in the present exemplary embodiment. Referring to fig. 4, the lesion prediction method in the eye image may include the following steps S410 to S440:
step S410: and inputting the eye image into a single-mode detection network, and fitting a first distribution function corresponding to the eye image according to the single-mode detection network.
Step S420: and fusing the sampling result of the first distribution function with the first image feature corresponding to the eye image to obtain a second image feature.
Step S430: fitting a second distribution function corresponding to the second image feature according to the single-mode detection network, and fusing the sampling result of the second distribution function with the second image feature to obtain a third image feature.
Step S440: fitting a third distribution function corresponding to the third image feature according to the single-mode detection network, fusing the sampling result of the third distribution function with the third image feature to obtain a fourth image feature, and predicting a focus in the eye image according to the fourth image feature.
The single-mode detection network is obtained through adjustment by the parameter adjustment method of the single-mode detection network according to the embodiments of the present disclosure.
Next, the above steps of the present exemplary embodiment will be described in more detail.
In step S410, the eye image is input into a single-mode detection network, and a first distribution function corresponding to the eye image is fitted according to the single-mode detection network.
In the present exemplary embodiment, the eye image may be an OCT image or a fundus image, and the embodiment of the present disclosure is not limited.
In this example embodiment, the manner of fitting the first distribution function corresponding to the eye image according to the single-mode detection network is specifically: performing N convolution operations on the eye image through the single-mode detection network, performing nonlinear activation on the convolution result, and determining the nonlinear activation result as the image feature corresponding to the eye image, where N is a positive integer; and inputting the image feature corresponding to the eye image into an encoder so that the encoder performs feature encoding on it, thereby determining the first distribution function corresponding to the eye image.
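An inference-time sketch of this fitting step, reusing the GaussianEncoder shape from above (again an assumption about the form of the distribution function):

```python
def fit_first_distribution(eye_image, conv_stack, prior_encoder):
    """Inference sketch: N ReLU-activated convolutions on the eye image, then a
    prior encoder (e.g. a GaussianEncoder as above) fits the first distribution."""
    feat = eye_image
    for conv in conv_stack:
        feat = F.relu(conv(feat))
    mu, logvar = prior_encoder(feat)
    return feat, mu, logvar
```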
In step S420, the sampling result of the first distribution function is fused with the first image feature corresponding to the eye image, so as to obtain the second image feature.
In this exemplary embodiment, the manner of obtaining the second image feature in step S420 is the same as the manner of obtaining the third image feature in fig. 3, please refer to the example shown in fig. 3, and the description thereof is omitted here.
In step S430, a second distribution function corresponding to the second image feature is fitted according to the single-mode detection network, and the sampling result of the second distribution function is fused with the second image feature to obtain a third image feature.
In this exemplary embodiment, the manner of obtaining the third image feature in step S430 is the same as that of obtaining the third image feature in fig. 3, please refer to the example shown in fig. 3, and the description thereof is omitted here.
In step S440, a third distribution function corresponding to the third image feature is fitted according to the single-mode detection network, and the sampling result of the third distribution function is fused with the third image feature to obtain a fourth image feature, and a focus in the eye image is predicted according to the fourth image feature.
In this example embodiment, if the eye image is an OCT image, the first image feature, the second image feature, the third image feature, and the fourth image feature corresponding to the OCT image implicitly contain the image features of the corresponding fundus image; that is, the OCT image corresponds to the fundus image.
In this exemplary embodiment, the manner of predicting the focus in the eye image according to the fourth image feature is specifically: performing feature recognition on the fourth image feature to determine the feature part corresponding to the focus position, and marking the focus position in the corresponding eye image according to that feature part.
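Purely as an illustrative sketch (the patent does not specify the recognition mechanism), one could take the strongest activation of the fourth image feature as the focus center and map it back to image coordinates:

```python
def mark_focus_center(fourth_feature, eye_image):
    """Toy sketch: treat the strongest activation in the fourth image feature
    as the focus location and rescale it to the input image resolution."""
    heat = fourth_feature.mean(dim=1)                 # [B, h, w] saliency map
    b, h, w = heat.shape
    flat_idx = heat.view(b, -1).argmax(dim=1)
    ys = torch.div(flat_idx, w, rounding_mode="floor")
    xs = flat_idx % w
    scale_y = eye_image.size(2) / h
    scale_x = eye_image.size(3) / w
    return torch.stack([ys * scale_y, xs * scale_x], dim=1)   # [B, 2] centers
```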
Therefore, by implementing the lesion prediction method in the eye image shown in fig. 4, the features of the fundus image corresponding to the OCT image can be generated through the upper branch of the unsupervised multi-modal fusion network (i.e., the above-mentioned single-mode detection network), thereby improving the efficiency and accuracy of determining the focus position. In addition, the method can be applied to an intelligent fundus diagnosis system or an intelligent OCT diagnosis system, improving the diagnosis capability of intelligent screening systems of various modalities; and since the method is highly extensible, eye images of at least one modality can be supported, improving the effect of screening for eye diseases.
Referring to fig. 5, fig. 5 schematically illustrates an eye image according to one embodiment of the present disclosure. As shown in fig. 5, the eye image 500 includes a fundus image 501 and an OCT image 502. The position of the black arrow in the fundus image 501 (which may be understood as the position of the eye focus) corresponds to the OCT image 502; the eye image 500 is the eye image involved in the embodiment of fig. 4. In addition, the fundus image 501 and the OCT image 502 may serve as the first image and the second image for training the single-mode detection network.
Referring to fig. 6 in conjunction with the eye image shown in fig. 5, fig. 6 schematically illustrates a framework diagram of a parameter adjustment method of a single-mode detection network according to one embodiment of the present disclosure. As shown in fig. 6, an architecture diagram of a parameter adjustment method of a single-mode detection network includes: the first image 601, the second image 602, N stacked convolutional and nonlinear activation layers 603, a maximum pooling layer 604, a fusion layer 605, a maximum pooling layer 606, 3 convolutional layers + pooling layers + nonlinear activation layers 607, a full connection layer 608, a first prior network 609, a second prior network 610, a third prior network 611, a first posterior network 612, a second posterior network 613, and a third posterior network 614.
Specifically, a first distribution function corresponding to the first image 601 may be determined according to the N stacked convolution and nonlinear activation layers 603 (where N is a positive integer), the maximum pooling layer 604, and the first prior network 609, and a second distribution function corresponding jointly to the first image 601 and the second image 602 may be determined according to the N stacked convolution and nonlinear activation layers 603, the maximum pooling layer 604, and the first posterior network 612. Further, the second distribution function may be sampled according to the first posterior network 612, and the sampling result fused with the first image feature corresponding to the first image 601 to obtain a third image feature, which is input to the next N stacked convolution and nonlinear activation layers 603; the first image feature is obtained by processing the first image 601 through N stacked convolution and nonlinear activation layers 603 and a maximum pooling layer 604. Likewise, the sampling result of the second distribution function may be fused with the second image feature to obtain a fourth image feature, which is input to the next N stacked convolution and nonlinear activation layers 603; the second image feature is obtained by processing the first image 601 and the second image 602 through N stacked convolution and nonlinear activation layers 603 and a maximum pooling layer 604. A third distribution function corresponding to the third image feature may then be determined through the second prior network 610, and a fourth distribution function corresponding to the fourth image feature through the second posterior network 613, from which a third loss function value may be determined. Further, the fourth distribution function may be sampled according to the second posterior network 613 and the sampling result fused with the third image feature to obtain a fifth image feature, which is input to the next N stacked convolution and nonlinear activation layers 603; the sampling result of the fourth distribution function may also be fused with the fourth image feature to obtain a sixth image feature. A fifth distribution function corresponding to the fifth image feature may be determined through the third prior network 611, and a sixth distribution function corresponding to the sixth image feature through the third posterior network 614, so that a fourth loss function value may be determined from the fifth and sixth distribution functions. The sixth distribution function may then be sampled according to the third posterior network 614 and the sampling result fused with the fifth image feature to obtain the image feature to be compared. Finally, the image feature to be compared may be feature-processed through the 3 convolution + pooling + nonlinear activation layers 607, the feature-processing result globally pooled and input to the fully connected layer 608 to compare the image feature to be compared with the label to be compared, and the comparison result trained through the cross-entropy loss function.
Wherein the first, third, and fourth loss function values are determined from the KL loss function.
In addition, any of the above inputs to "the next N stacked convolution and nonlinear activation layers 603" first passes through the fusion layer 605 and the maximum pooling layer 606.
Therefore, the parameter adjustment method of the single-mode detection network shown in fig. 3 is executed in combination with the architecture diagram shown in fig. 6, so that the problem of poor parameter adjustment effect can be overcome to a certain extent, and the processing effect of the network model on the input image is further improved; and the network parameters of the single-mode detection network can be adjusted in a multi-mode fusion mode so as to train the single-mode detection network, improve the network training effect and further improve the classification effect and the recognition effect of the single-mode detection network on the input image.
Referring to fig. 7 in conjunction with the eye image shown in fig. 5, fig. 7 schematically illustrates a block diagram of a lesion prediction method in the eye image according to one embodiment of the present disclosure. As shown in fig. 7, the architecture diagram of the lesion prediction method in the eye image includes: a first image 701, N stacked convolutional and nonlinear activation layers 702, a max-pooling layer 703, a fusion layer 704, a max-pooling layer 705, 3 convolutional layers + pooling layers + nonlinear activation layers 706, a full connection layer 707, a first prior network 708, a second prior network 709, and a third prior network 710.
Specifically, a first distribution function corresponding to the first image 701 may be determined according to the N stacked convolution and nonlinear activation layers 702, the maximum pooling layer 703, and the first prior network 708. The first distribution function may then be sampled through the first prior network 708, and the sampling result fused with the first image feature corresponding to the eye image to obtain a second image feature, which is input to the next N stacked convolution and nonlinear activation layers 702. Further, a second distribution function corresponding to the second image feature may be determined according to the second prior network 709; the second distribution function is sampled through the second prior network 709, and the sampling result fused with the second image feature to obtain a third image feature. A third distribution function corresponding to the third image feature may then be determined according to the third prior network 710; the third distribution function is sampled through the third prior network 710, and the sampling result fused with the third image feature to obtain a fourth image feature. Finally, the fourth image feature may be feature-processed through the 3 convolution + pooling + nonlinear activation layers 706, the feature-processing result globally pooled and input to the fully connected layer 707 to compare the fourth image feature with the label to be compared, yielding the label corresponding to the first image 701 and the predicted focus position.
In addition, any of the above inputs to "the next N stacked convolution and nonlinear activation layers 702" first passes through the fusion layer 704 and the maximum pooling layer 705.
It can be seen that, by executing the lesion prediction method in the eye image shown in fig. 4 in combination with the architecture diagram shown in fig. 7, the features of the fundus image corresponding to the OCT image can be generated through the upper branch of the unsupervised multi-modal fusion network (i.e., the above-mentioned single-mode detection network), thereby improving the efficiency and accuracy of determining the focus position. In addition, the method can be applied to an intelligent fundus diagnosis system or an intelligent OCT diagnosis system, improving the diagnosis capability of intelligent screening systems of various modalities; and since the method is highly extensible, eye images of at least one modality can be supported, improving the effect of screening for eye diseases.
Further, in the present exemplary embodiment, a parameter adjustment apparatus 800 of a single-mode detection network is also provided. The parameter adjusting apparatus 800 of the single-mode detection network can be applied to a server or a terminal device. Referring to fig. 8, the parameter adjusting apparatus 800 of the single-mode detection network may include: a distribution function determination unit 801, a loss function value determination unit 802, a feature fusion unit 803, and a parameter adjustment unit 804, wherein:
A distribution function determining unit 801, configured to perform feature extraction on a first image, perform feature encoding on extracted features of the first image to obtain a first distribution function, and perform feature encoding on extracted features of a second image to obtain a second distribution function; the first image features correspond to the first image, and the second image features are obtained by carrying out feature fusion on the first image and the second image;
a loss function value determining unit 802 for determining a first loss function value according to the first distribution function and the second distribution function;
the feature fusion unit 803 is configured to determine the feature of the image to be compared according to the fusion of the sampling result of the second distribution function and the first image;
a parameter adjustment unit 804, configured to determine a second loss function value according to the comparison between the image feature to be compared and the tag to be compared, and adjust the network parameter of the single-mode detection network according to the first loss function value and the second loss function value until the loss function value converges; wherein the loss function value includes a first loss function value and a second loss function value.
Therefore, implementing the parameter adjusting device 800 of the single-mode detection network shown in fig. 8 can overcome the problem of poor parameter adjustment effect to a certain extent, thereby improving the processing effect of the network model on the input image; and the network parameters of the single-mode detection network can be adjusted in a multi-mode fusion mode so as to realize the training of the single-mode detection network, improve the network training effect, further improve the classification effect and the identification effect of the single-mode detection network on the input image, and improve the focus identification accuracy rate on the fundus image when the embodiment of the disclosure is applied to focus identification.
In an exemplary embodiment of the present disclosure, the manner in which the distribution function determining unit 801 performs feature encoding on the extracted first image feature to obtain the first distribution function is specifically:
the distribution function determining unit 801 extracts first image features corresponding to the first image through a single-mode detection network;
the distribution function determining unit 801 performs feature encoding on the first image feature to determine a first distribution function corresponding to the first image.
It can be seen that, by implementing the exemplary embodiment, the encoder can perform feature encoding on the first image to determine the first distribution function, so that the first distribution function can be used for approximating the similarity degree between the first distribution function and the second distribution function, and the image feature generation effect of the single-mode detection network is improved.
In an exemplary embodiment of the present disclosure, the manner in which the distribution function determining unit 801 performs feature encoding on the extracted second image feature to obtain the second distribution function is specifically:
the distribution function determining unit 801 fuses the first image and the second image through the multi-mode detection network, and generates a second image feature according to the fusion result;
the distribution function determining unit 801 performs feature encoding on the second image feature to determine a second distribution function to which the first image and the second image correspond in common.
It can be seen that, by implementing the exemplary embodiment, the second image can be subjected to feature encoding by the encoder so as to determine the second distribution function, and thus, the second distribution function can be used for approximating the similarity degree of the second distribution function and the first distribution function, so that the image feature generation effect of the single-mode detection network is improved.
In an exemplary embodiment of the present disclosure, the manner in which the feature fusion unit 803 determines the feature of the image to be compared according to the fusion of the sampling result of the second distribution function and the first image is specifically:
the feature fusion unit 803 fuses the sampling result of the second distribution function with the first image feature corresponding to the first image to obtain a third image feature, and fuses the sampling result of the second distribution function with the second image feature to obtain a fourth image feature;
the feature fusion unit 803 determines a third distribution function corresponding to the third image feature, and determines a fourth distribution function corresponding to the fourth image feature;
the feature fusion unit 803 determines a third loss function value from the third distribution function and the fourth distribution function;
the feature fusion unit 803 fuses the sampling result of the fourth distribution function with the third image feature to obtain a fifth image feature, and fuses the sampling result of the fourth distribution function with the fourth image feature to obtain a sixth image feature;
The feature fusion unit 803 determines a fifth distribution function corresponding to the fifth image feature, and determines a sixth distribution function corresponding to the sixth image feature;
the feature fusion unit 803 determines a fourth loss function value from the fifth distribution function and the sixth distribution function;
the feature fusion unit 803 fuses the fifth image feature with the sampling result of the sixth distribution function, so as to obtain the image feature to be compared.
It can be seen that implementing this exemplary embodiment, three loss function values can be determined to adjust network parameters of the single-mode detection network according to the three loss function values, so as to improve the classification effect and the recognition effect of the single-mode detection network on the input image.
In an exemplary embodiment of the present disclosure, the manner in which the parameter adjustment unit 804 determines the second loss function value according to the comparison of the image feature to be compared and the label to be compared is specifically:
the parameter adjusting unit 804 performs feature processing on the image features to be compared, compares the feature-processed image features to be compared with the tags to be compared, and determines a second loss function value according to the comparison result; the characteristic processing comprises convolution processing, pooling processing and nonlinear activation processing.
It can be seen that, by implementing the exemplary embodiment, the loss function for adjusting the network parameters can be determined by comparing with the label to be compared, and then the image recognition effect of the network model is improved through the loss function.
In an exemplary embodiment of the present disclosure, the loss function value determining unit 802 is further configured to determine a sum of the first loss function value, the second loss function value, the third loss function value, and the fourth loss function value as the loss function value.
In an exemplary embodiment of the present disclosure, the loss function value determining unit 802 adjusts the network parameters of the single-mode detection network according to the first loss function value and the second loss function value until the loss function value converges in the following manner:
the loss function value determination unit 802 adjusts the network parameters of the single-mode detection network according to the loss function value until the loss function value converges.
It can be seen that, by implementing this exemplary embodiment, the network parameters can be adjusted through the loss function values described above, so that the network training effect can be improved, and the recognition effect of the single-mode detection network on the input image can be further improved.
Further, in the present exemplary embodiment, a lesion prediction device 900 in an eye image is also provided. The lesion prediction device 900 in the eye image may be applied to a server or terminal device. Referring to fig. 9, the lesion prediction device 900 in the eye image may include: a function fitting unit 901, an image feature fusion unit 902, and an image feature acquisition unit 903, wherein:
The function fitting unit 901 is configured to input an eye image into a single-mode detection network, and fit a first distribution function corresponding to the eye image according to the single-mode detection network;
the image feature fusion unit 902 is configured to fuse a sampling result of the first distribution function with a first image feature corresponding to the eye image to obtain a second image feature;
the image feature obtaining unit 903 is configured to fit a second distribution function corresponding to the second image feature according to the single-mode detection network, and fuse a sampling result of the second distribution function with the second image feature to obtain a third image feature;
the image feature obtaining unit 903 is further configured to fit a third distribution function corresponding to the third image feature according to the single-mode detection network, fuse a sampling result of the third distribution function with the third image feature, obtain a fourth image feature, and predict a focus in the eye image according to the fourth image feature;
the single-mode detection network is obtained through adjustment by the parameter adjustment method of the single-mode detection network according to the embodiments of the present disclosure.
As can be seen, implementing the lesion prediction device 900 in the eye image shown in fig. 9, the features of the fundus image corresponding to the OCT image can be generated through the upper branch of the unsupervised multi-modal fusion network (i.e., the above-mentioned single-mode detection network), thereby improving the efficiency and accuracy of determining the focus position. In addition, the device can be applied to an intelligent fundus diagnosis system or an intelligent OCT diagnosis system, improving the diagnosis capability of intelligent screening systems of various modalities; and since the approach is highly extensible, eye images of at least one modality can be supported, improving the effect of screening for eye diseases.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
Since each functional module of the parameter adjustment device of the single-mode detection network according to the exemplary embodiment of the present disclosure corresponds to a step of the foregoing exemplary embodiment of the parameter adjustment method of the single-mode detection network, for details not disclosed in the embodiment of the apparatus of the present disclosure, please refer to the foregoing embodiment of the parameter adjustment method of the single-mode detection network of the present disclosure.
In addition, since each functional module of the lesion prediction device in the eye image according to the exemplary embodiment of the present disclosure corresponds to a step of the exemplary embodiment of the lesion prediction method in the eye image described above, for details not disclosed in the embodiment of the device of the present disclosure, please refer to the embodiment of the lesion prediction method in the eye image described above according to the present disclosure.
It should be noted that the computer readable medium shown in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware, and the described units may also be provided in a processor. Wherein the names of the units do not constitute a limitation of the units themselves in some cases.
As another aspect, the present application also provides a computer-readable medium that may be contained in the electronic device described in the above embodiment; or may exist alone without being incorporated into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the methods described in the above embodiments.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following the general principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method for adjusting parameters of a single-mode detection network, comprising:
extracting features of the first image, performing feature coding on the extracted features of the first image to obtain a first distribution function, and performing feature coding on the extracted features of the second image to obtain a second distribution function; the first image features correspond to the first image, and the second image features are obtained by feature fusion of the first image and the second image;
determining a first loss function value from the first distribution function and the second distribution function;
determining the image characteristics to be compared according to the fusion of the sampling result of the second distribution function and the first image;
determining a second loss function value according to the comparison of the image features to be compared and the tags to be compared, and adjusting network parameters of the single-mode detection network according to the first loss function value and the second loss function value until the loss function value is converged; wherein the loss function value includes the first loss function value and the second loss function value;
the determining the image feature to be compared according to the fusion of the sampling result of the second distribution function and the first image comprises the following steps:
Fusing the sampling result of the second distribution function with the first image feature corresponding to the first image to obtain a third image feature, and fusing the sampling result of the second distribution function with the second image feature to obtain a fourth image feature;
determining a third distribution function corresponding to the third image feature, and determining a fourth distribution function corresponding to the fourth image feature;
determining a third loss function value from the third distribution function and the fourth distribution function;
fusing the sampling result of the fourth distribution function with the third image feature to obtain a fifth image feature, and fusing the sampling result of the fourth distribution function with the fourth image feature to obtain a sixth image feature;
determining a fifth distribution function corresponding to the fifth image feature, and determining a sixth distribution function corresponding to the sixth image feature;
determining a fourth loss function value from the fifth distribution function and the sixth distribution function;
and fusing the sampling result of the sixth distribution function with the fifth image feature to obtain the image feature to be compared.
2. The method of claim 1, wherein feature encoding the extracted first image feature to obtain a first distribution function comprises:
Extracting first image features corresponding to the first image through a single-mode detection network;
and carrying out feature coding on the first image features to determine a first distribution function corresponding to the first image.
3. The method of claim 1, wherein feature encoding the extracted second image features to obtain a second distribution function comprises:
fusing the first image and the second image through a multi-mode detection network, and generating a second image feature according to a fusion result;
and carrying out feature coding on the second image features to determine a second distribution function which corresponds to the first image and the second image together.
4. The method of claim 1, wherein determining a second loss function value based on the comparison of the image feature to be compared to the tag to be compared comprises:
carrying out feature processing on the image features to be compared, comparing the feature-processed image features to be compared with the tags to be compared, and determining a second loss function value according to the comparison result; wherein the feature processing includes convolution processing, pooling processing, and nonlinear activation processing.
5. The method as recited in claim 1, further comprising:
Determining a sum of the first, second, third, and fourth loss function values as a loss function value.
6. The method of claim 5, wherein adjusting network parameters of a single-mode detection network based on the first and second loss function values until the loss function values converge comprises:
and adjusting network parameters of the single-mode detection network according to the loss function value until the loss function value is converged.
7. A method of lesion prediction in an ocular image, comprising:
inputting an eye image into a single-mode detection network, and fitting a first distribution function corresponding to the eye image according to the single-mode detection network;
fusing the sampling result of the first distribution function with the first image feature corresponding to the eye image to obtain a second image feature;
fitting a second distribution function corresponding to the second image feature according to the single-mode detection network, and fusing a sampling result of the second distribution function with the second image feature to obtain a third image feature;
Fitting a third distribution function corresponding to the third image feature according to the single-mode detection network, fusing a sampling result of the third distribution function with the third image feature to obtain a fourth image feature, and predicting a focus in the eye image according to the fourth image feature;
wherein the single-mode detection network is adapted according to the method of any one of claims 1-6.
8. A parameter adjustment apparatus for a single-mode detection network, comprising:
the distribution function determining unit is used for extracting the characteristics of the first image, carrying out characteristic coding on the extracted characteristics of the first image to obtain a first distribution function, and carrying out characteristic coding on the extracted characteristics of the second image to obtain a second distribution function; the first image features correspond to the first image, and the second image features are obtained by feature fusion of the first image and the second image;
a loss function value determination unit configured to determine a first loss function value from the first distribution function and the second distribution function;
the feature fusion unit is used for determining the features of the images to be compared according to the fusion of the sampling result of the second distribution function and the first image;
The parameter adjusting unit is used for determining a second loss function value according to the comparison of the image features to be compared and the tags to be compared, and adjusting network parameters of the single-mode detection network according to the first loss function value and the second loss function value until the loss function value is converged; wherein the loss function value includes the first loss function value and the second loss function value;
wherein, the feature fusion unit is further used for:
fusing the sampling result of the second distribution function with the first image feature corresponding to the first image to obtain a third image feature, and fusing the sampling result of the second distribution function with the second image feature to obtain a fourth image feature;
determining a third distribution function corresponding to the third image feature, and determining a fourth distribution function corresponding to the fourth image feature;
determining a third loss function value from the third distribution function and the fourth distribution function;
fusing the sampling result of the fourth distribution function with the third image feature to obtain a fifth image feature, and fusing the sampling result of the fourth distribution function with the fourth image feature to obtain a sixth image feature;
Determining a fifth distribution function corresponding to the fifth image feature, and determining a sixth distribution function corresponding to the sixth image feature;
determining a fourth loss function value from the fifth distribution function and the sixth distribution function;
and fusing the sampling result of the sixth distribution function with the fifth image feature to obtain the image feature to be compared.
9. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of any of claims 1-7 via execution of the executable instructions.
10. A computer readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, implements the method according to any of claims 1-7.
CN201910723272.7A 2019-08-06 2019-08-06 Parameter adjustment method, focus prediction method, parameter adjustment device and electronic equipment Active CN110503636B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910723272.7A CN110503636B (en) 2019-08-06 2019-08-06 Parameter adjustment method, focus prediction method, parameter adjustment device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910723272.7A CN110503636B (en) 2019-08-06 2019-08-06 Parameter adjustment method, focus prediction method, parameter adjustment device and electronic equipment

Publications (2)

Publication Number Publication Date
CN110503636A CN110503636A (en) 2019-11-26
CN110503636B true CN110503636B (en) 2024-01-26

Family

ID=68587972

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910723272.7A Active CN110503636B (en) 2019-08-06 2019-08-06 Parameter adjustment method, focus prediction method, parameter adjustment device and electronic equipment

Country Status (1)

Country Link
CN (1) CN110503636B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111178343A (en) * 2020-04-13 2020-05-19 腾讯科技(深圳)有限公司 Multimedia resource detection method, device, equipment and medium based on artificial intelligence
CN112016554B (en) * 2020-08-04 2022-09-02 杰创智能科技股份有限公司 Semantic segmentation method and device, electronic equipment and storage medium
CN114119588A (en) * 2021-12-02 2022-03-01 北京大恒普信医疗技术有限公司 Method, device and system for training fundus macular lesion region detection model

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0549470A1 (en) * 1991-12-23 1993-06-30 Lg Electronics Inc. Variable length-adaptive image data compression method and apparatus
DE19501551A1 (en) * 1994-01-21 1995-08-03 Mitsubishi Electric Corp Movement vector determiner for various prediction methods
EP0997839A2 (en) * 1998-10-29 2000-05-03 Fujitsu Limited Word recognizing apparatus and method for dynamically generating feature amount of word
CA2868448A1 (en) * 2012-03-26 2013-10-03 Euclid Discoveries, Llc Context based video encoding and decoding
WO2018124309A1 (en) * 2016-12-30 2018-07-05 Mitsubishi Electric Corporation Method and system for multi-modal fusion model
CN108345860A (en) * 2018-02-24 2018-07-31 江苏测联空间大数据应用研究中心有限公司 Personnel based on deep learning and learning distance metric recognition methods again
CN109117831A (en) * 2018-09-30 2019-01-01 北京字节跳动网络技术有限公司 The training method and device of object detection network
CN109711241A (en) * 2018-10-30 2019-05-03 百度在线网络技术(北京)有限公司 Object detecting method, device and electronic equipment
CN109543745A (en) * 2018-11-20 2019-03-29 江南大学 Feature learning method and image-recognizing method based on condition confrontation autoencoder network
CN109815965A (en) * 2019-02-13 2019-05-28 腾讯科技(深圳)有限公司 A kind of image filtering method, device and storage medium
CN109919915A (en) * 2019-02-18 2019-06-21 广州视源电子科技股份有限公司 Retinal fundus images abnormal area detection method and equipment based on deep learning
CN109902767A (en) * 2019-04-11 2019-06-18 网易(杭州)网络有限公司 Model training method, image processing method and device, equipment and medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A computer-aided diagnostic system for detecting diabetic retinopathy in optical coherence tomography images; Ahmed ElTanboly et al.; Medical Physics; Vol. 44, No. 3; pp. 914-923 *
Research on a diagnosis method for solitary pulmonary nodules based on deep autoencoding; Ge Lei; China Masters' Theses Full-text Database (Medicine & Health Sciences); No. 02; p. E072-342 *
Deep-learning-based anatomical layer segmentation and lesion detection in whole-breast ultrasound; Bian Cheng; China Masters' Theses Full-text Database (Medicine & Health Sciences); No. 07; p. E060-17 *

Also Published As

Publication number Publication date
CN110503636A (en) 2019-11-26

Similar Documents

Publication Publication Date Title
WO2020215984A1 (en) Medical image detection method based on deep learning, and related device
Chen et al. Source-free domain adaptive fundus image segmentation with denoised pseudo-labeling
CN110689025B (en) Image recognition method, device and system and endoscope image recognition method and device
CN110490242B (en) Training method of image classification network, fundus image classification method and related equipment
CN110490239B (en) Training method, quality classification method, device and equipment of image quality control network
CN110503636B (en) Parameter adjustment method, focus prediction method, parameter adjustment device and electronic equipment
CN113096137B (en) Domain-adaptive segmentation method and system for OCT (optical coherence tomography) retinal images
CN111862020B (en) Method and device for predicting physiological age of anterior ocular segment, server and storage medium
Liu et al. A classification model for the prostate cancer based on deep learning
CN110458217A (en) Image recognition method and device, fundus image recognition method and electronic equipment
Rehman et al. Microscopic retinal blood vessels detection and segmentation using support vector machine and K‐nearest neighbors
CN111178420B (en) Coronary artery segment marking method and system on two-dimensional contrast image
Thomas et al. Intelligent prediction approach for diabetic retinopathy using deep learning based convolutional neural networks algorithm by means of retina photographs
Ai et al. DR-IIXRN: detection algorithm of diabetic retinopathy based on deep ensemble learning and attention mechanism
Xie et al. Optic disc and cup image segmentation utilizing contour-based transformation and sequence labeling networks
CN110472673B (en) Parameter adjustment method, fundus image processing device, medium and apparatus
Kadan et al. Optimized hybrid classifier for diagnosing diabetic retinopathy: iterative blood vessel segmentation process
Sundarasekar et al. Automatic brain tumor detection and classification based on IoT and machine learning techniques
Leopold et al. Segmentation and feature extraction of retinal vascular morphology
CN117338234A (en) Joint detection method for diopter and visual acuity
Erwin et al. The augmentation data of retina image for blood vessel segmentation using U-Net convolutional neural network method
CN110428405A (en) Method for detecting lumps in biological tissue images, related device and medium
CN115116117A (en) Learning input data acquisition method based on a multi-modal fusion network
CN114359657A (en) Method for constructing brain atlas and detecting nerve loop and related product
Yakno et al. Dorsal hand vein segmentation using vein-generative adversarial network (V-GAN) model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant