CN114972278A - Training method based on complementary attention - Google Patents


Info

Publication number
CN114972278A
CN114972278A
Authority
CN
China
Prior art keywords
neural network
artificial neural
examples
loss function
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210631057.6A
Other languages
Chinese (zh)
Inventor
彭璨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Siji Intelligent Control Technology Co ltd
Original Assignee
Shenzhen Siji Intelligent Control Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Siji Intelligent Control Technology Co ltd
Publication of CN114972278A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/7753 Incorporation of unlabelled data, e.g. multiple instance learning [MIL]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30041 Eye; Retina; Ophthalmic
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30096 Tumor; Lesion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03 Recognition of patterns in medical or anatomical images

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Radiology & Medical Imaging (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Quality & Reliability (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Image Analysis (AREA)
  • Eye Examination Apparatus (AREA)

Abstract

The present disclosure describes a training method based on complementary attention, comprising: preparing a training data set, the training data set comprising a plurality of examination images with lesions and annotation images that are associated with the examination images and carry lesion annotation results; performing feature extraction on the examination images to obtain a feature map, and processing the examination images based on an attention mechanism to obtain an attention heat map; classifying the examination images with a first artificial neural network and combining the results with the annotation images to obtain a first loss function; classifying the examination images with a second artificial neural network module based on the feature map and the attention heat map and combining the results with the annotation results to obtain a second loss function; and performing lesion-free discrimination on the examination images with a third artificial neural network to obtain a third loss function. By combining the three loss functions, the accuracy of tissue lesion identification can be effectively improved.

Description

Training method based on complementary attention
This application is a divisional application of the application filed on November 27, 2020, with application number 2020113599702, entitled "Training method and training system for tissue lesion recognition based on an artificial neural network".
Technical Field
The present disclosure relates generally to complementary attention-based training methods.
Background
With the development and maturation of artificial intelligence technology, it has gradually spread to many areas of the medical field. Medical imaging in particular is a popular field of application for artificial intelligence. Medical imaging is a useful tool for diagnosing many diseases, but it produces a large amount of image data whose processing and interpretation require a great deal of a physician's time, and the accuracy of manual interpretation is difficult to guarantee. In medical imaging, artificial intelligence technology is therefore mainly used to perform tissue lesion recognition on the tissue shown in the image so as to improve the accuracy of tissue lesion recognition.
At present, convolutional neural networks (CNN) are generally used when applying artificial intelligence technology to the recognition of medical images. The convolutional structure of a convolutional neural network reduces the memory occupied by a deep network, and its three key operations, local receptive fields, weight sharing, and pooling layers, effectively reduce the number of network parameters and alleviate the overfitting problem. The structure of a convolutional neural network is also well suited to the structure of medical images, from which features can be extracted and identified.
However, for some lesion sites, such as fundus lesions, the lesion regions are relatively small and irregularly distributed. A general convolutional neural network that applies an attention mechanism easily ignores lesion regions that receive low attention in the attention heat map, which leads to misjudgment; as a result, the accuracy of tissue lesion identification for such lesion regions is low.
Disclosure of Invention
The present disclosure has been made in view of the above-described state of the art, and an object of the present disclosure is to provide a training method and a training system for tissue lesion recognition based on an artificial neural network, which can effectively improve the accuracy of tissue lesion recognition.
To this end, the first aspect of the present disclosure provides a training method for tissue lesion recognition based on an artificial neural network, including: preparing a training data set, wherein the training data set comprises a plurality of examination images and annotation images related to the examination images, each annotation image carrying an annotation result indicating a lesion or no lesion; and inputting the training data set into an artificial neural network module, which performs feature extraction on the examination image to obtain a feature map, processes the feature map based on an attention mechanism to obtain an attention heat map, and processes the attention heat map based on a complementary attention mechanism to obtain a complementary attention heat map. The artificial neural network module comprises a first artificial neural network, a second artificial neural network, and a third artificial neural network. The first artificial neural network performs feature extraction on the examination image to obtain the feature map; the second artificial neural network obtains the attention heat map indicating a lesion region and the complementary attention heat map indicating a non-lesion region, the examination image being composed of the lesion region and the non-lesion region; the third artificial neural network recognizes the examination image based on the feature map to obtain a first recognition result, based on the feature map and the attention heat map to obtain a second recognition result, and based on the feature map and the complementary attention heat map to obtain a third recognition result. The first recognition result is combined with the annotation image to obtain a first loss function when the attention mechanism is not used; the second recognition result is combined with the annotation image to obtain a second loss function when the attention mechanism is used; and the third recognition result is combined with the annotation image carrying a lesion-free annotation result to obtain a third loss function when the complementary attention mechanism is used. Using the first, second, and third loss functions, a total loss function is obtained that includes a first loss term based on the first loss function, a second loss term based on the difference between the second loss function and the first loss function, and a third loss term based on the third loss function, and the artificial neural network module is optimized using the total loss function. In this case, the first, second, and third recognition results can be obtained, the total loss function can be obtained based on them, and the artificial neural network module can be optimized using the total loss function, so that the accuracy of tissue lesion recognition of the artificial neural network module can be improved.
In addition, in the training method for tissue lesion recognition based on an artificial neural network according to the first aspect of the present disclosure, optionally, the total loss function further includes a total area term of the attention heat map, and the total area term is used for evaluating the area of the lesion region. In this case, the total area term can estimate the area of the lesion region within the attention heat map and control the number of pixels in the attention heat map that strongly influence the recognition result, thereby limiting the attention of the network to those pixels.
In addition, in the training method for tissue lesion recognition based on an artificial neural network according to the first aspect of the present disclosure, optionally, the total loss function further includes a regularization term for the attention heat map. In this case, overfitting of the artificial neural network module can be suppressed.
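As a purely illustrative sketch, not the reference implementation of this disclosure, the total loss described above could be assembled in PyTorch roughly as follows. The use of cross-entropy, the tensor and parameter names (logits_plain, logits_att, logits_comp, area_weight, tv_weight), and the total-variation form of the regularization term are assumptions made for this example; the disclosure only fixes the three loss terms plus the optional area and regularization terms.

```python
# Hedged sketch of the total loss: three loss terms plus optional area and
# regularization terms. All names and the cross-entropy choice are assumptions.
import torch
import torch.nn.functional as F

def total_loss(logits_plain, logits_att, logits_comp, labels, no_lesion_label,
               att_map, area_weight=0.0, tv_weight=0.0):
    # First loss: recognition without the attention mechanism vs. annotation.
    l1 = F.cross_entropy(logits_plain, labels)
    # Second loss: recognition with the attention heat map vs. annotation.
    l2 = F.cross_entropy(logits_att, labels)
    # Third loss: recognition with the complementary attention heat map,
    # compared against the lesion-free annotation result.
    no_lesion = torch.full_like(labels, no_lesion_label)
    l3 = F.cross_entropy(logits_comp, no_lesion)

    # Total loss: first term, difference between second and first, third term.
    # Written out term by term to mirror the description above.
    loss = l1 + (l2 - l1) + l3
    # Optional total-area term: penalizes the amount of high-attention pixels.
    loss = loss + area_weight * att_map.mean()
    # Optional regularization term on the attention heat map (total variation here).
    tv = (att_map[..., 1:, :] - att_map[..., :-1, :]).abs().mean() + \
         (att_map[..., :, 1:] - att_map[..., :, :-1]).abs().mean()
    loss = loss + tv_weight * tv
    return loss
```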
In addition, in the training method for tissue lesion recognition based on an artificial neural network according to the first aspect of the present disclosure, optionally, the first artificial neural network, the second artificial neural network, and the third artificial neural network are trained simultaneously. In this case, the training speed can be increased.
In addition, in the training method for tissue lesion recognition based on an artificial neural network according to the first aspect of the present disclosure, optionally, the third artificial neural network includes an input layer, an intermediate layer, and an output layer connected in sequence, and the output layer is configured to output a recognition result reflecting the inspection image. In this case, the recognition result reflecting the tissue image can be output using the third artificial neural network.
In addition, in the training method for tissue lesion recognition based on an artificial neural network according to the first aspect of the present disclosure, optionally, the artificial neural network module is trained in a weakly supervised manner. In this case, the artificial neural network module can obtain recognition results carrying a large amount of information from annotation results carrying only a small amount of information.
In addition, in the training method for tissue lesion recognition based on an artificial neural network according to the first aspect of the present disclosure, optionally, the first loss function is used to evaluate a degree of inconsistency between a recognition result and the annotation result of the examination image when the attention mechanism is not used. In this case, the accuracy of tissue lesion identification by the artificial neural network module when the attention mechanism is not used can be improved.
In addition, in the training method for tissue lesion recognition based on an artificial neural network according to the first aspect of the present disclosure, optionally, the second loss function is used to evaluate a degree of inconsistency between a recognition result and the labeling result of the examination image when the attention mechanism is used. In this case, the accuracy of tissue lesion identification by the artificial neural network module when using the attention mechanism can be improved.
In addition, in the training method for tissue lesion recognition based on an artificial neural network according to the first aspect of the present disclosure, optionally, the third loss function is used to evaluate a degree of inconsistency between a recognition result of the examination image when the complementary attention mechanism is used and a labeling result without a lesion. In this case, the accuracy of tissue lesion identification by the artificial neural network module when using the complementary attention mechanism can be improved.
In addition, in the training method for tissue lesion recognition based on an artificial neural network according to the first aspect of the present disclosure, optionally, the artificial neural network module is optimized by using the total loss function to minimize the total loss function. In this case, the total loss function can be minimized to improve the accuracy of tissue lesion identification by the artificial neural network module.
In addition, in the training method for artificial neural network-based tissue lesion recognition according to the first aspect of the present disclosure, optionally, the tissue lesion is a fundus lesion. In this case, the recognition result of the fundus image with respect to the fundus lesion can be obtained by the artificial neural network module.
The second aspect of the present disclosure provides a training system for tissue lesion recognition based on an artificial neural network, which performs training using the training method provided by the first aspect of the present disclosure. In this case, the artificial neural network module can be trained by the training system.
According to the present disclosure, a training method and a training system for tissue lesion recognition based on an artificial neural network can be provided, which can effectively improve the accuracy of tissue lesion recognition.
Drawings
Embodiments of the present disclosure will now be explained in further detail, by way of example only, with reference to the accompanying drawings, in which:
fig. 1 is a schematic diagram illustrating an electronic device to which examples of the present disclosure relate.
Fig. 2 is a diagram illustrating a tissue image according to an example of the present disclosure.
Fig. 3 is a block diagram illustrating an identification system for tissue lesion identification based on an artificial neural network according to an example of the present disclosure.
Figure 4 is a block diagram illustrating one example of an artificial neural network module to which examples of the present disclosure relate.
Fig. 5 is a block diagram illustrating a variation of an artificial neural network module to which examples of the present disclosure relate.
Fig. 6 is a schematic diagram illustrating a structure of a first artificial neural network according to an example of the present disclosure.
Fig. 7 is a block diagram illustrating a training system for tissue lesion recognition based on an artificial neural network according to an example of the present disclosure.
Fig. 8 is a flow chart illustrating a training method for tissue lesion recognition based on an artificial neural network according to an example of the present disclosure.
Fig. 9(a) is a schematic diagram showing an example of a fundus image obtained by training without the attention mechanism according to an example of the present disclosure.
Fig. 9(b) is a schematic diagram showing an example of a lesion region of a fundus image obtained by training with the complementary attention mechanism according to an example of the present disclosure.
The main reference numbers: 1 … electronic device, 10 … processor, 20 … memory, 30 … computer program, 40 … recognition system, 410 … acquisition module, 4200 … backbone neural network, 420 … artificial neural network module, 421 … first artificial neural network, 422 … second artificial neural network, 423 … third artificial neural network, 424 … feature combination module, 430 … training system, 431 … storage module, 432 … processing module, 433 … optimization module, C1 … first convolutional layer, C2 … second convolutional layer, C3 … third convolutional layer, S1 … first pooling layer, S2 … second pooling layer, S3 … third pooling layer
Detailed Description
Hereinafter, preferred embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. In the following description, the same components are denoted by the same reference numerals, and redundant description thereof is omitted. The drawings are schematic and the ratio of the dimensions of the components and the shapes of the components may be different from the actual ones.
Fig. 1 is a schematic diagram illustrating an electronic device according to an embodiment of the present disclosure.
As shown in fig. 1, the identification system 40 for tissue lesion identification based on artificial neural network according to the present disclosure may be carried by an electronic device 1 (e.g., a computer). In some examples, the electronic device 1 may include one or more processors 10, a memory 20, and a computer program 30 disposed in the memory 20. The one or more processors 10 may include, among other things, a central processing unit, a graphics processing unit, and any other electronic components capable of processing data. For example, the processor 10 may execute instructions stored on the memory 20.
In some examples, memory 20 may be a computer-readable medium that can be used to carry or store data. In some examples, the Memory 20 may include, but is not limited to, a non-volatile Memory or a Flash Memory (Flash Memory), or the like. In some examples, the memory 20 may also be, for example, a ferroelectric random access memory (FeRAM), a Magnetic Random Access Memory (MRAM), a phase change random access memory (PRAM), or a Resistive Random Access Memory (RRAM). This can reduce the possibility of data loss due to sudden power outage.
In other examples, the Memory 20 may be other types of readable storage media, such as Read-Only Memory (ROM), Random Access Memory (RAM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), One-time Programmable Read-Only Memory (OTPROM), Electrically Erasable rewritable Read-Only Memory (EEPROM), and compact disc Read-Only Memory (CD-ROM).
In some examples, memory 20 may be an optical disk memory, a magnetic disk memory, or a tape memory. Thus, the appropriate memory 20 can be selected in accordance with different situations.
In some examples, computer program 30 may include instructions for execution by one or more processors 10, which may cause recognition system 40 to perform tissue lesion recognition on a tissue image. In some examples, computer program 30 may be deployed within a local computer or may be deployed on a cloud-based server.
In some examples, computer program 30 may be stored in a computer readable medium. The computer-readable storage medium may include one or more of a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM) or flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, and a magnetic storage device.
Fig. 2 is a diagram illustrating a tissue image according to an example of the present disclosure. Fig. 3 is a block diagram illustrating an identification system 40 for tissue lesion identification based on an artificial neural network according to an example of the present disclosure.
In some examples, tissue lesion recognition may be performed on the tissue image using the artificial neural network-based tissue lesion recognition system 40 to obtain the recognition result. In some examples, the recognition system 40 for tissue lesion recognition may also be referred to simply as the recognition system 40.
In some examples, as shown in fig. 3, recognition system 40 may include an acquisition module 410, an artificial neural network module 420, and a training system 430 for artificial neural network-based tissue lesion recognition. In some examples, the acquisition module 410 may be used to acquire tissue images. In some examples, the artificial neural network module 420 may be configured to perform feature extraction, tissue lesion recognition, and the like on the tissue image, and obtain a recognition result of the tissue lesion recognition. In some examples, a training system 430 for artificial neural network-based tissue lesion recognition may be used to train the artificial neural network module 420. In some examples, the training system 430 may utilize the first recognition result, the second recognition result, and the third recognition result obtained by the artificial neural network module 420 and obtain an overall loss function based on the first recognition result, the second recognition result, and the third recognition result to optimize the artificial neural network module 420. In this case, the first recognition result, the second recognition result, and the third recognition result can be obtained, and the total loss function can be obtained based on the first recognition result, the second recognition result, and the third recognition result, so that the artificial neural network module 420 can be optimized using the total loss function, and the accuracy of tissue lesion recognition of the artificial neural network module 420 can be improved.
In some examples, training system 430 for tissue lesion recognition based on an artificial neural network may also be referred to as training system 430.
In some examples, the recognition system 40 may also include a preprocessing module and a determination module (not shown).
In some examples, the tissue image may be an image from a tissue cavity taken by a CT scan, PET-CT scan, SPECT scan, MRI, ultrasound, X-ray, mammogram, angiogram, fluoroscope, capsule endoscope, or a combination thereof. In some examples, the tissue image may be acquired by acquisition module 410.
In some examples, the acquisition module 410 may be configured to acquire tissue images, which may be tissue images acquired by an acquisition device such as a camera, an ultrasound imager, or an X-ray scanner.
In some examples, the tissue image may be, for example, a fundus image, an esophagus image, a stomach image, a large intestine image, a colon image, or a small intestine image. As shown in fig. 2, the tissue image may be a fundus image. In this case, fundus lesion recognition can be performed on the fundus image by the recognition system 40.
In some examples, the tissue lesion identification may be to identify a tissue lesion of the tissue image to obtain an identification result.
In some examples, where the tissue image is a fundus image, the tissue lesion may be a fundus lesion. In this case, the recognition result of the fundus image with respect to the fundus lesion can be obtained by the artificial neural network module 420.
In some examples, the tissue image may be comprised of a lesion region and a non-lesion region.
In some examples, tissue images (color images) with tissue lesions generally contain salient features such as erythema and redness, and these features can be automatically extracted and identified by a trained artificial neural network to help identify possible lesions. This can improve both the accuracy and the speed of recognition, and alleviate problems such as large errors and long reading times that arise when physicians review images one by one based on personal experience.
In some examples, where the tissue image is a fundus image, the tissue image may be classified by function. For example, in the training step, the tissue image may be an examination image or an annotation image (described later).
In some examples, the image input to the artificial neural network module 420 may be a tissue image. In this case, tissue lesion recognition can be performed on the tissue image through the artificial neural network module 420.
In some examples, identification system 40 may be used for tissue lesion identification on tissue images. In some examples, the tissue image may be pre-processed, feature extracted, and tissue lesion identified after entering the identification system 40.
In some examples, the recognition system 40 may also include a preprocessing module and a determination module. The pre-processing module may be used to pre-process the tissue image and input the pre-processed tissue image into the artificial neural network module 420.
In some examples, the pre-processing module may pre-process the tissue image. In some examples, the pre-processing may include at least one of region of interest detection, image cropping, resizing, and normalization. In this case, the tissue lesion recognition and judgment of the tissue image by the subsequent artificial neural network module 420 can be facilitated. In some examples, the tissue image may be, for example, a fundus image, an esophagus image, a stomach image, a large intestine image, a colon image, or a small intestine image.
In some examples, the pre-processing module may include an area detection unit, an adjustment unit, and a normalization unit.
In some examples, the region detection unit may detect a region of interest from the tissue image. For example, if the tissue image is a fundus image, a fundus region centered on the optic disc, a fundus region centered on the macula lutea and including the optic disc, or the like can be detected from the fundus image. In some examples, the region detection unit may detect the region of interest in the tissue image by, for example, a sampling threshold method or a Hough transform.
In some examples, the adjustment unit may be used to crop and resize the tissue image. Due to different apparatuses for acquiring tissue images or different shooting conditions, the obtained tissue images may differ in resolution, size, and the like. In this case, the tissue images may be cropped and resized to reduce the variance. In some examples, the tissue image may be cropped to a particular shape. In some examples, the particular shape may include, but is not limited to, a square, rectangle, circle, or oval, among others.
In other examples, the size of the tissue image may be adjusted to a prescribed size by the adjusting unit. For example, the prescribed size may be 256 × 256, 512 × 512, 1024 × 1024, or the like. Examples of the disclosure are not limited thereto; in other examples, the size of the tissue image may be any other specified size, for example 128 × 128, 768 × 768, or 2048 × 2048.
In some examples, the pre-processing module may include a normalization unit. The normalization unit may be configured to perform normalization processing on a plurality of tissue images.
In some examples, the normalization method of the normalization unit is not particularly limited, and may use, for example, zero mean and unit standard deviation. Additionally, in some examples, normalization may scale values to the range [0, 1]. In this case, normalization can overcome the differences between different tissue images.
In some examples, normalization includes normalization of image format, image slice spacing, image intensity, image contrast, and image orientation. In some examples, the tissue images may be normalized to a DICOM format, NIfTI format, or raw binary format.
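A minimal preprocessing sketch, assuming OpenCV and NumPy and an illustrative 512 × 512 target size, could look like the following. It covers only the resizing and intensity-normalization steps described above, not region-of-interest detection or format conversion.

```python
# Hedged preprocessing sketch: resize to a prescribed size, scale to [0, 1],
# then normalize to zero mean and unit standard deviation. The 512x512 target
# size is an assumption for illustration.
import cv2
import numpy as np

def preprocess(image: np.ndarray, size: int = 512) -> np.ndarray:
    # Resize the (already cropped) region of interest to the prescribed size.
    image = cv2.resize(image, (size, size), interpolation=cv2.INTER_AREA)
    image = image.astype(np.float32)
    # Scale to [0, 1], then apply zero-mean / unit-standard-deviation normalization.
    image /= 255.0
    image = (image - image.mean()) / (image.std() + 1e-8)
    return image
```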
Figure 4 is a block diagram illustrating one example of an artificial neural network module to which examples of the present disclosure relate.
As described above, the recognition system 40 may include an artificial neural network module 420. In some examples, the artificial neural network module 420 may be used to perform tissue lesion identification on the tissue image. In some examples, the artificial neural network module 420 may include a plurality of artificial neural networks. In some examples, the artificial neural network may be trained using one or more processors 10. In general, an artificial neural network may include artificial neurons or nodes that may be used to receive tissue images and perform operations on the tissue images based on weights, then selectively pass the results of the operations on to other neurons or nodes. Where weights may be associated with artificial neurons or nodes while constraining the output of other artificial neurons. The weights (i.e., network parameters) may be determined by iteratively training the artificial neural network through a training data set (described later).
In some examples, as shown in fig. 4, artificial neural network module 420 may include a backbone neural network 4200 and a second artificial neural network 422.
In some examples, the backbone neural network 4200 may include a first artificial neural network 421, a third artificial neural network 423, and a feature combination module 424.
In some examples, the first artificial neural network 421 may receive the tissue image and perform feature extraction on the tissue image to obtain a feature map.
In some examples, the second artificial neural network 422 may receive the feature map and the recognition result from the third artificial neural network 423 and obtain an attention heat map indicating a diseased region and a complementary attention heat map indicating a non-diseased region. It is noted that in other examples, the above-described attention heat map or complementary attention heat map may also be considered a feature map.
In some examples, feature combination module 424 may receive the feature map, the attention heat map, and the complementary attention heat map and output a set of feature combinations. In some examples, the feature combination module 424 may also output the feature map directly.
In some examples, the third artificial neural network 423 may receive the feature map or feature combination set and output an identification result of tissue lesion identification of the tissue image.
In some examples, the tissue image (e.g., the pre-processed tissue image) input to the artificial neural network module 420 may enter the first artificial neural network 421, and the recognition result may be finally output by the third artificial neural network 423.
Fig. 5 is a block diagram illustrating a variation of an artificial neural network module to which examples of the present disclosure relate.
Additionally, in some examples, as shown in fig. 5, the artificial neural network module 420 may include a backbone neural network 4200 and a second artificial neural network 422.
In some examples, as shown in fig. 5, the backbone neural network 4200 may include a first artificial neural network 421 and a third artificial neural network 423.
In some examples, the third artificial neural network 423 may have a feature combination function. For details, see the description associated with feature combination module 424.
In some examples, the first artificial neural network 421 may receive the tissue image and perform feature extraction on the tissue image to obtain a feature map.
In some examples, the second artificial neural network 422 may also obtain a complementary attention heat map indicating non-diseased regions from the attention heat map.
In some examples, the third artificial neural network 423 may receive the feature map, the attention heat map, and the complementary attention heat map and output a recognition result of tissue lesion recognition of the tissue image. In some examples, the attention heat map may be a heat map indicative of the lesion area obtained based on an attention mechanism. In some examples, the attention heat map may show the importance of various pixel points in the tissue image when forming the feature map.
In some examples, the complementary attention heat map may be a heat map indicative of non-diseased regions obtained based on a complementary attention mechanism.
In some examples, the complementary attention heat map may be a complementary image of the attention heat map. In some examples, the size and format of the complementary attention heat map may be the same as the size and format of the attention heat map.
As described above, the artificial neural network module 420 may include the first artificial neural network 421 (see fig. 5).
In some examples, the first artificial neural network 421 may use one or more deep neural networks to automatically identify features in the tissue image.
In some examples, the first artificial neural network 421 may be used to receive the tissue image pre-processed by the pre-processing module and generate one or more feature maps. In some examples, the first artificial neural network 421 may be constructed by, for example, combining multiple layers of low-level features (pixel-level features). In this case, an abstract description of the tissue image can be realized.
In some examples, the first artificial neural network 421 may include an input layer, an intermediate layer, and an output layer connected in sequence. The input layer may be configured to receive an image of the tissue pre-processed by the pre-processing module. The intermediate layer is configured to be used for extracting a feature map based on the tissue image, and the output layer is configured to be used for outputting the feature map.
In some examples, the tissue image input to the artificial neural network module 420 may be a pixel matrix, for example a three-dimensional pixel matrix. The length and width of the three-dimensional matrix represent the size of the image, and its depth represents the color channels of the image. In some examples, the depth may be 1 (i.e., the tissue image is a grayscale image), and in some examples, the depth may be 3 (i.e., the tissue image is a color image in RGB color mode).
In some examples, the first artificial neural network 421 may employ a convolutional neural network. Because the convolutional neural network has the advantages of local receptive field, weight sharing and the like, the training of parameters can be greatly reduced, the processing speed can be improved, and the hardware overhead can be saved. In addition, the convolutional neural network can more effectively identify the tissue image.
Fig. 6 is a schematic diagram illustrating a structure of a first artificial neural network 421 according to an example of the present disclosure.
In some examples, the first artificial neural network 421 may have a plurality of intermediate layers, each of which may include a plurality of neurons or nodes, and each neuron or node in an intermediate layer may apply an excitation function (e.g., a ReLU (rectified linear unit) function, a sigmoid function, or a tanh function) to its output. The excitation function applied by one neuron affects the excitation functions applied by other neurons.
In some examples, as shown in fig. 6, the middle layer of the first artificial neural network 421 may include a plurality of convolutional layers and a plurality of pooling layers. In some examples, the convolutional layers and the pooling layers may be combined alternately. In some examples, the tissue images may pass through the first convolutional layer C1, the first pooling layer S1, the second convolutional layer C2, the second pooling layer S2, the third convolutional layer C3, the third pooling layer S3 in sequence. In this case, the convolution processing and the pooling processing can be alternately performed on the tissue image.
In other examples, the first artificial neural network 421 may not include a pooling layer, thereby being able to avoid losing data during pooling and being able to simplify the network structure.
In some examples, in a convolutional neural network, the convolutional layer may use a convolution kernel to convolve the tissue image. In this case, more abstract features can be obtained while the matrix depth increases.
In some examples, the convolution kernel size may be 3 x 3. In other examples, the convolution kernel size may be 5 x 5. In some examples, a 5 × 5 convolution kernel may be used at the first convolution layer C1, with the other convolution layers using a 3 × 3 convolution kernel. In this case, training efficiency can be improved. In some examples, the size of the convolution kernel may be set to any size. In this case, the size of the convolution kernel can be selected according to the size of the image and the calculation cost.
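For illustration, a feature-extraction backbone of the kind described above, with alternating convolution and pooling layers, a 5 × 5 kernel at the first convolutional layer C1 and 3 × 3 kernels elsewhere, might be sketched in PyTorch as below. The channel counts are assumptions, since the disclosure does not specify them.

```python
# Hedged sketch of the C1-S1-C2-S2-C3-S3 structure described above.
# Channel counts (32, 64, 128) are illustrative assumptions.
import torch.nn as nn

feature_extractor = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=5, padding=2), nn.ReLU(),    # C1: 5x5 kernel
    nn.MaxPool2d(2),                                           # S1: pooling layer
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),   # C2: 3x3 kernel
    nn.MaxPool2d(2),                                           # S2: pooling layer
    nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),  # C3: 3x3 kernel
    nn.MaxPool2d(2),                                           # S3: pooling layer
)
```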
In some examples, the pooling layer may also be referred to as a downsampling layer. In some examples, the input tissue image may be processed using pooling approaches such as max-pooling, mean-pooling, or random-pooling. In this case, the pooling operation reduces the feature dimensionality and improves computational efficiency on the one hand, and on the other hand allows the convolutional neural network to extract more abstract high-level features, thereby improving the accuracy of tissue lesion identification.
In addition, in some examples, in the convolutional neural network, the number of convolutional layers and pooling layers may be increased according to circumstances. In this case, the convolutional neural network can also be made to extract higher-level features more abstract, so as to further improve the accuracy of tissue lesion identification.
In some examples, after the pre-processed tissue image passes through the first artificial neural network 421, a feature map corresponding to the tissue image may be output. In some examples, the feature map may have multiple depths. In some examples, after the pre-processed tissue image passes through the first artificial neural network 421, a plurality of feature maps may be output. In some examples, multiple feature maps may each correspond to a feature. In some examples, tissue lesion recognition may be performed on the tissue image based on features corresponding to the feature map.
In some examples, the feature map may be sequentially deconvolved and upsampled before the first artificial neural network 421 outputs it. In some examples, the feature map may undergo multiple deconvolution and upsampling processes. For example, the feature map may sequentially pass through a first deconvolution layer, a first upsampling layer, a second deconvolution layer, a second upsampling layer, a third deconvolution layer, and a third upsampling layer. In this case, the size of the feature map can be changed while retaining part of the data information of the tissue image.
In some examples, the number of deconvolution layers may be the same as the number of convolution layers, and the number of pooling layers (downsampling layers) may be the same as the number of upsampling layers. This makes it possible to make the size of the feature map the same as that of the tissue image.
In some examples, before the feature map passes through a deconvolution layer (upsampling layer), it may be convolved with the output of a selected earlier convolutional layer (pooling layer). For example, the feature map may be convolved with the output image of the second convolutional layer C2 (second pooling layer S2) before entering the second deconvolution layer (second upsampling layer), and convolved with the output image of the first convolutional layer C1 (first pooling layer S1) before entering the third deconvolution layer (third upsampling layer). In this case, data information lost when passing through the pooling layers or convolutional layers can be supplemented.
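One step of the decoder path described above (deconvolution, upsampling, and re-injection of an earlier convolutional output) could be sketched as follows. The layer sizes, the bilinear upsampling, and the concatenation-based fusion are assumptions for illustration, and the skip tensor is assumed to match the upsampled feature map in channel count and spatial size.

```python
# Hedged sketch of one deconvolution + upsampling step with an earlier layer's
# output mixed back in. Shapes and fusion strategy are illustrative assumptions.
import torch
import torch.nn as nn

class DecoderStep(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.fuse = nn.Conv2d(out_ch * 2, out_ch, kernel_size=3, padding=1)

    def forward(self, feat: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        x = self.up(self.deconv(feat))              # deconvolution, then upsampling
        # Supplement information lost in pooling using the earlier layer's output
        # (skip must have out_ch channels and the same spatial size as x).
        x = self.fuse(torch.cat([x, skip], dim=1))
        return x
```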
In some examples, after the feature map is generated by the first artificial neural network 421, an attention heat map matching the feature map may be generated by the second artificial neural network 422.
In this embodiment, the second artificial neural network 422 is an artificial neural network with attention mechanism. In some examples, the output image of the second artificial neural network 422 may include an attention heat map and a complementary attention heat map.
In some examples, the second artificial neural network 422 may include an input layer, an intermediate layer, and an output layer connected in series. The input layer is configured to receive the feature map together with partial weights of the third artificial neural network 423 or the tissue lesion recognition result of the third artificial neural network 423. The intermediate layer may be configured to obtain feature weights based on the partial weights of the third artificial neural network 423 or on the tissue lesion recognition result, and to generate an attention heat map and/or a complementary attention heat map based on the feature map and the feature weights. The output layer is configured to output the attention heat map and/or the complementary attention heat map. In some examples, the feature map may be generated by the first artificial neural network 421.
In some examples, the attention mechanism may be to selectively screen out and focus on a small amount of important information from a large amount of information of the input feature map.
In some examples, the attention heat map may be an image representing attention in the form of a heat map. In general, pixels at positions that appear red or white in the attention heat map have a large influence on the tissue lesion recognition of the tissue image, while pixels at positions that appear blue or black have a smaller influence.
In some examples, the individual feature maps may be weighted with feature weights and the attention heat map is derived. In some examples, the feature weights may be obtained through an attention mechanism. In some examples, the attention mechanism may include, but is not limited to, a Channel Attention Mechanism (CAM), a gradient-based channel attention mechanism (Grad-CAM), a gradient-based enhanced channel attention mechanism (Grad-CAM + +), or a Spatial Attention Mechanism (SAM), among others.
In some examples, the third artificial neural network 423 may have a global pooling layer and a fully connected layer. In this case, the feature weights may be the weights in the fully connected layer that lead to the output layer of the third artificial neural network 423. For example, in the case where the tissue image is a fundus image, the third artificial neural network 423 may receive a feature map of the fundus image and obtain a first recognition result (described later). If the first recognition result is "macula lutea", the weights in the fully connected layer from each neuron or node of the global pooling layer to the "macula lutea" output are extracted as the feature weights.
In some examples, the feature weights may instead be calculated based on the tissue lesion recognition result of the third artificial neural network 423. In some examples, the partial derivatives of the first recognition result (e.g., the probability of a tissue lesion) of the third artificial neural network 423 with respect to all pixels of a feature map may be calculated, and these partial derivatives may be globally pooled to obtain the feature weight corresponding to that feature map.
In some examples, a heat of attention map that matches the feature map may be generated by the second artificial neural network 422. In some examples, a complementary attention heat map may be generated by the second artificial neural network 422. In some examples, two pixel values corresponding to pixels having the same location in the attention heat map and the complementary attention heat map are inversely related. In some examples, the attention heat map and/or the complementary attention heat map may be normalized. In some examples, the sum or product of two pixel values corresponding to pixels with the same location in the attention heat map and the complementary attention heat map is a constant value.
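As one hedged example of how the attention heat map and its complement might be computed, the CAM-style sketch below weights the feature map channels with the fully connected weights of the predicted class and normalizes the result to [0, 1]. The function and tensor names are illustrative assumptions; gradient-based weighting (Grad-CAM and its variants) would follow the same pattern with differently obtained class_weights.

```python
# Hedged CAM-style sketch: weight feature channels by class weights, normalize,
# and take the complement so the two maps sum to 1 at every pixel.
import torch

def attention_maps(feature_map: torch.Tensor, class_weights: torch.Tensor):
    # feature_map: (C, H, W); class_weights: (C,) weights toward the predicted class.
    att = torch.einsum("c,chw->hw", class_weights, feature_map)
    att = torch.relu(att)
    att = (att - att.min()) / (att.max() - att.min() + 1e-8)  # normalize to [0, 1]
    comp = 1.0 - att                                          # complementary attention heat map
    return att, comp
```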
In some examples, the attention heat map and/or the complementary attention heat map may be regularized with a total variation.
In some examples, a feature combination module 424 may be connected at the output layer of the first artificial neural network 421 and the second artificial neural network 422.
In some examples, feature combination module 424 may have an input layer and an output layer, and in some examples, the output layer of feature combination module 424 may be a feature map or a set of feature combinations. In some examples, an input layer of feature combination module 424 may receive a feature map, attention heat map, or complementary attention heat map.
In some examples, the feature combination module 424 may feature combine the feature map output by the first artificial neural network 421 and the attention heat map or complementary attention heat map output by the second artificial neural network 422 to form a feature combination set.
In some examples, the set of feature combinations may include at least one of a first set of feature combinations and a second set of feature combinations.
In some examples, the feature combination module 424 may feature combine the feature map output by the first artificial neural network 421 and the attention heat map output by the second artificial neural network 422 to form a first set of feature combinations.
In some examples, the feature combination module 424 may feature combine the feature map output by the first artificial neural network 421 and the complementary attention heat map output by the second artificial neural network 422 to form a second set of feature combinations.
In some examples, feature combining module 424 may output the feature map directly.
In some examples, the feature combination module 424 may also calculate differences between the feature map and the attention heat map to obtain a first set of feature combinations.
In some examples, feature combination module 424 may also calculate differences between the feature map and the complementary attention heat map to obtain a second set of feature combinations.
In some examples, the feature combination module 424 may also compute a convolution of the feature map and the attention heat map to obtain a first set of feature combinations.
In some examples, the feature combination module 424 may also compute a convolution of the feature map and the complementary attention heat map to obtain a second set of feature combinations.
In some examples, the feature combination module 424 may also calculate a mean of the feature map and the attention heat map to obtain a first set of feature combinations.
In some examples, the feature combination module 424 may also calculate a mean of the feature map and the complementary attention heat map to obtain a second set of feature combinations.
Further, in other examples, feature combination module 424 may transform the feature map and the attention heat map linearly or non-linearly to obtain a first set of feature combinations.
Further, in other examples, feature combination module 424 may transform the feature map and the complementary attention heat map linearly or non-linearly to obtain a second set of feature combinations.
In some examples, an output layer of feature combination module 424 may output a feature map, a first set of feature combinations, and a second set of feature combinations. In some examples, the feature map, the first feature combination set, and the second feature combination set output by the feature combination module 424 may be input to a third artificial neural network 423 and tissue lesion recognition performed by the third artificial neural network 423.
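A minimal sketch of the feature combination step is shown below, using element-wise weighting of the feature map by the attention heat map and by the complementary attention heat map. As noted above, difference, convolution, mean, or other linear or non-linear transforms are equally valid combination choices; element-wise weighting is only one assumed option.

```python
# Hedged sketch of the feature combination module: form the first feature
# combination set from the attention heat map and the second from its
# complement. Inputs are assumed to be torch tensors.
def combine_features(feature_map, att, comp):
    # feature_map: (C, H, W); att, comp: (H, W), broadcast over the channel axis.
    first_set = feature_map * att.unsqueeze(0)     # features guided by attention
    second_set = feature_map * comp.unsqueeze(0)   # features guided by complementary attention
    return first_set, second_set
```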
In some examples, the feature combination module 424 may be incorporated into the third artificial neural network 423 as part of the third artificial neural network 423. In this case, the artificial neural network module 420 may include a first artificial neural network 421, a second artificial neural network 422, and a third artificial neural network 423.
In some examples, where the feature combination module 424 is incorporated into the third artificial neural network 423, the input layer of the third artificial neural network 423 may receive a feature map, attention heat map, or complementary attention heat map.
In some examples, the third artificial neural network 423 may include an input layer, an intermediate layer, and an output layer connected in sequence. In some examples, the output layer is configured to be operable to output a recognition result reflecting the tissue image. In this case, the recognition result reflecting the tissue image can be output using the third artificial neural network 423. In some examples, the output layer of the third artificial neural network 423 may include a Softmax layer. In some examples, the middle layer of the third artificial neural network 423 may be a fully connected layer.
In some examples, the final classification may be performed by the fully connected layer and the probability that the tissue image belongs to the category of the respective tissue lesion may be finally obtained through the Softmax layer. In this case, the recognition result of the tissue lesion recognition of the tissue image can be obtained based on the probability.
In some examples, the third artificial neural network 423 may include various linear classifiers, such as a single layer of fully connected layers.
In some examples, the third artificial neural network 423 may include various non-linear classifiers. Such as Logistic Regression (Logistic Regression), Random Forest (Random Forest), Support Vector Machines (Support Vector Machines), etc.
In some examples, the third artificial neural network 423 may include a plurality of classifiers. In some examples, the classifier may give an identification of tissue lesion identification of the tissue image. For example, in the case where the tissue image is a fundus image, a recognition result of fundus lesion recognition of the fundus image may be given. In this case, fundus lesion recognition can be performed on the fundus image.
In some examples, the output of the third artificial neural network 423 may be values between 0 and 1, which may represent the probability that the tissue image belongs to the category of the respective tissue lesion.
In some examples, when the probability that the tissue image belongs to a certain tissue lesion category is the highest, that category is taken as the recognition result of tissue lesion recognition of the tissue image. For example, if the probability that a tissue image belongs to the lesion-free category is the highest, the tissue lesion recognition result of the tissue image may be lesion-free. Likewise, in the process of identifying a fundus lesion in a fundus image, if the prediction probabilities of macular lesion and no lesion output from the third artificial neural network 423 are 0.8 and 0.2, respectively, it can be considered that the fundus image has a macular lesion.
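As a small illustrative sketch (the class list is an assumption made for this example), the output probabilities can be turned into a recognition result as follows.

```python
# Hedged sketch: softmax over the output values and selection of the category
# with the highest probability, matching the 0.8 vs. 0.2 example above.
import torch

classes = ["no lesion", "hypertensive retinopathy", "diabetic retinopathy"]  # assumed

def recognize(logits: torch.Tensor) -> str:
    probs = torch.softmax(logits, dim=-1)      # values between 0 and 1
    return classes[int(probs.argmax())]         # category with the highest probability
```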
In some examples, the third artificial neural network 423 may output recognition results that match the tissue image. In some examples, the recognition results may include a first recognition result when the attention mechanism is not used, a second recognition result when the attention mechanism is used, and a third recognition result when both the attention mechanism and the complementary attention mechanism are used.
In some examples, the third artificial neural network 423 may perform tissue lesion recognition on the feature map output by the feature combination module 424 and obtain the first recognition result.
In some examples, the third artificial neural network 423 may perform tissue lesion recognition on the first feature combination set output by the feature combination module 424 and obtain a second recognition result.
In some examples, the third artificial neural network 423 may perform tissue lesion recognition on the second feature combination set output by the feature combination module 424 and obtain a third recognition result.
In some examples, the recognition results may include both lesion and non-lesion results. In some examples, the recognition result may also indicate no lesion or a specific type of lesion. For example, in the case where the tissue image is a fundus image, the recognition result may include, but is not limited to, one of no lesion, hypertensive retinopathy, or diabetic retinopathy. In this case, a recognition result of fundus lesion recognition of the fundus image can be obtained. In some examples, one tissue image may have multiple recognition results. For example, the recognition result may indicate both hypertensive retinopathy and diabetic retinopathy.
In some examples, the recognition system 40 may also include a determination module.
In some examples, the determination module may receive an output of the artificial neural network module 420. In this case, the output results of the artificial neural network module 420 can be integrated by the determination module and the final recognition result can be output, so that an integrated report can be generated.
In some examples, the first recognition result may be used as a final recognition result of the tissue image. In this case, when the tissue lesion recognition is performed on the tissue image by using the artificial neural network module 420, the tissue lesion recognition may be performed on the tissue image through the main neural network 4200 including the first artificial neural network 421 and the third artificial neural network 423, thereby increasing the recognition speed.
In some examples, the second recognition result may be used as a final recognition result of the tissue image.
As described above, the third recognition result may be acquired based on the complementary attention mechanism. In some examples, a final recognition result of the tissue image may be obtained based on the first recognition result, the second recognition result, and the third recognition result. For example, in some examples, the final recognition result may include the second recognition result and the third recognition result. In some examples, the final recognition result may include the first recognition result and the third recognition result.
In some examples, the summary report generated by the determination module may include at least one of the first recognition result, the second recognition result, the third recognition result, and the final recognition result. In some examples, the determination module may color code the tissue image based on the attention heat map to generate a lesion indication map that indicates the lesion region. The summary report generated by the determination module may include the lesion indication map.
In some examples, the summary report generated by the determination module may include the location of each lesion and mark that location with a marking box.
In some examples, the summary report generated by the determination module may display the lesion region of the tissue image as a heat map. Specifically, in the heat map, regions with a high probability of being diseased may appear red or white, and regions with a low probability of being diseased may appear blue or black. In this case, the lesion region can be indicated in an intuitive manner.
In some examples, the determination module may also be used for framing of the lesion region. In some examples, the lesion area may be boxed by a fixed shape (e.g., regular shapes such as triangles, circles, quadrilaterals, etc.). In some examples, the lesion area may also be delineated. In this case, the lesion region can be visually displayed.
In some examples, the determination module may also be used to delineate the diseased region. For example, the values corresponding to the pixels in the attention heat map may be analyzed; pixels with values larger than a first preset value may be classified as the lesion region, and pixels with values smaller than the first preset value may be classified as the non-lesion region.
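A minimal sketch of the thresholding described above, assuming the attention heat map is a 2-D array of values in [0, 1] and that the first preset value is a tunable threshold (both assumptions for illustration).

```python
import numpy as np

def delineate_lesion(attention_heat_map: np.ndarray, first_preset_value: float = 0.5):
    """Split the heat map into lesion / non-lesion pixels by a preset threshold."""
    lesion_mask = attention_heat_map > first_preset_value  # candidate lesion region
    non_lesion_mask = ~lesion_mask                          # non-lesion region
    return lesion_mask, non_lesion_mask

# Illustrative heat map: values near 1 indicate a high probability of lesion.
heat = np.random.rand(256, 256)
lesion, background = delineate_lesion(heat, first_preset_value=0.7)
print("lesion pixels:", int(lesion.sum()))
```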
In some examples, the recognition method for tissue lesion recognition based on an artificial neural network may be implemented by the recognition system 40.
In some examples, the recognition method includes: acquiring a tissue image, and obtaining a recognition result of tissue lesion recognition using the artificial neural network module 420. In some examples, the tissue image may be a tissue image acquired by an acquisition device. In some examples, the artificial neural network module 420 is trained by a training system 430. In this case, the recognition result of the tissue lesion recognition can be obtained by the artificial neural network module 420, and the artificial neural network module 420 can be optimized by the total loss function, so that the accuracy of the tissue lesion recognition can be improved.
Hereinafter, a training method for tissue lesion recognition based on an artificial neural network (which may be simply referred to as the training method) and a training system according to the present embodiment will be described in detail with reference to the drawings.
In some examples, the training method may be implemented with a training system 430 for tissue lesion recognition based on an artificial neural network. In this case, the artificial neural network module 420 can be trained using the training system 430.
Fig. 7 is a block diagram illustrating a training system 430 for tissue lesion recognition based on an artificial neural network according to an example of the present disclosure.
In some examples, as shown in fig. 7, training system 430 may include a storage module 431, a processing module 432, and an optimization module 433. In some examples, the storage module 431 may be configured to store a training data set. In some examples, the processing module 432 may utilize the artificial neural network module 420 for feature extraction, generating attention and complementary attention heat maps, and tissue lesion recognition. In some examples, the optimization module 433 may obtain a total loss function based on the recognition results (including the first recognition result, the second recognition result, and the third recognition result) of the tissue lesion recognition to optimize the artificial neural network module 420. In this case, the recognition result of the tissue lesion recognition can be obtained by using the attention mechanism and the complementary attention mechanism, and the total loss function can be obtained based on the recognition result of the tissue lesion recognition, so that the artificial neural network module 420 can be optimized by using the total loss function, thereby improving the accuracy of the tissue lesion recognition of the artificial neural network module 420.
In some examples, the training of the artificial neural network module 420 may be weakly supervised. In this case, the artificial neural network module 420 can produce recognition results carrying a large amount of information from annotation results carrying only a small amount of information. For example, where the annotation result is a text annotation, the recognition result may still include the location and size of the lesion region. In some examples, the training mode of the artificial neural network module 420 may also be unsupervised, semi-supervised, reinforcement learning, or the like.
In some examples, the artificial neural network module 420 may be trained using the first loss function, the second loss function, and the third loss function. It should be noted that, since the training model and the loss functions involved are generally complex, the model generally has no analytical solution; in some examples, the value of the loss function may instead be reduced as much as possible by iterating the model parameters a finite number of times with an optimization algorithm (e.g., batch gradient descent (BGD), stochastic gradient descent (SGD), etc.), that is, by finding an approximate numerical solution of the model. In some examples, the artificial neural network module 420 may be trained using a back-propagation algorithm, in which case network parameters with minimal error can be obtained, thereby improving the recognition accuracy.
Fig. 8 is a flow chart illustrating a training method for tissue lesion recognition based on an artificial neural network according to an example of the present disclosure.
In some examples, as shown in fig. 8, the training method may include preparing a training data set (step S100); inputting the training data set into the artificial neural network module 420, and obtaining a first recognition result, a second recognition result and a third recognition result which are matched with each inspection image (step S200); calculating a total loss function based on the first recognition result, the second recognition result and the third recognition result (step S300) and optimizing the artificial neural network module 420 using the total loss function (step S400). In this case, the first recognition result, the second recognition result, and the third recognition result can be obtained, and the total loss function can be obtained based on the first recognition result, the second recognition result, and the third recognition result, so that the artificial neural network module 420 can be optimized using the total loss function, and the accuracy of tissue lesion recognition of the artificial neural network module 420 can be improved.
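Read together, steps S100 to S400 can be sketched as the following toy training loop. The network architecture, the dummy data, and the simplified total loss are all illustrative assumptions and are not the artificial neural network module 420 or the total loss function of this disclosure.

```python
import torch
import torch.nn as nn

# Toy stand-in for the artificial neural network module; the three identical
# outputs are placeholders for the first/second/third recognition results.
class ToyModule(nn.Module):
    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.classifier = nn.Linear(8, num_classes)

    def forward(self, x):
        logits = self.classifier(self.backbone(x))
        return logits, logits, logits

module = ToyModule()
optimizer = torch.optim.SGD(module.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

# Step S100: prepare a (dummy) training data set of examination images and labels.
images = torch.randn(8, 3, 64, 64)
labels = torch.randint(0, 3, (8,))

for _ in range(2):
    # Step S200: input the training data and obtain the three recognition results.
    first, second, third = module(images)
    # Step S300: calculate a (simplified) total loss from the three results.
    total_loss = criterion(first, labels) + criterion(second, labels) + criterion(third, labels)
    # Step S400: optimize the module using the total loss.
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
print("final total loss:", round(total_loss.item(), 4))
```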
In step S100, a training data set may be prepared. In some examples, the training data set may include a plurality of examination images and either lesion-bearing or lesion-free annotation results associated with the examination images.
In some examples, the training data set may include a plurality of inspection images and annotation images associated with the inspection images.
In some examples, the examination images may be 50,000 to 200,000 tissue images from cooperating hospitals with patient information removed. In some examples, the examination image may be a tissue image from a CT scan, a PET-CT scan, a SPECT scan, MRI, ultrasound, X-ray, mammography, angiography, fluorography, capsule endoscopy, or a combination thereof. In some examples, the examination image may be a fundus image. In some examples, the examination image may be composed of a lesion region and a non-lesion region. In some examples, the examination images may be used for training of the artificial neural network module 420.
In some examples, the inspection image may be acquired by the acquisition module 410.
In some examples, the annotation image can include an annotation result with a lesion or an annotation result without a lesion. In some examples, the annotation result may be a true value to measure the size of the loss function.
In some examples, the annotation result can be an image annotation or a text annotation. In some examples, the image annotation may be an annotation box for framing the lesion region by manual annotation.
In some examples, the callout box can be a fixed shape, such as a regular shape like a triangle, circle, or quadrilateral. In some examples, the annotation box may also be an irregular shape based on the delineation of the lesion region.
In some examples, the text annotation may be a determination to check whether a lesion is present in the image. Such as "diseased" or "non-diseased". In some examples, the text annotation may also be a type of lesion. For example, in the case where the inspection image is a fundus image, the text label may be "macular degeneration", "hypertensive retinopathy", or "diabetic retinopathy", or the like.
In some examples, the training data set may be stored in storage module 431. In some examples, the storage module 431 may be configured to store the training data set.
In some examples, the training data set may include 30% -60% of exam images without lesion outcome annotation results. In some examples, the training data set may include 10%, 20%, 30%, 40%, 50%, or 60% of the exam images without lesion outcome annotation results.
In some examples, the training data set may be stored using the storage module 431. In some examples, the storage module 431 may include the memory 20.
In some examples, the storage module 431 may be configured to store the inspection image and the annotation image associated with the inspection image.
In some examples, artificial neural network module 420 may receive a training data set stored by storage module 431.
In some examples, the training data set may be pre-processed.
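The disclosure does not specify the pre-processing; the sketch below shows one typical possibility, where center cropping, resizing, and intensity normalization are assumptions rather than requirements of this method.

```python
import numpy as np

def preprocess(examination_image: np.ndarray, size: int = 512) -> np.ndarray:
    """Assumed pre-processing: center crop to a square and scale intensities to [0, 1]."""
    h, w = examination_image.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    image = examination_image[top:top + side, left:left + side]
    # Nearest-neighbour resize without external dependencies.
    idx = np.linspace(0, side - 1, size).astype(int)
    image = image[idx][:, idx]
    return image.astype(np.float32) / 255.0

fundus = (np.random.rand(600, 800, 3) * 255).astype(np.uint8)
print(preprocess(fundus).shape)  # (512, 512, 3)
```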
In step S200, the training data set may be input to the artificial neural network module 420, and the first recognition result, the second recognition result, and the third recognition result that match the respective inspection images are obtained. In some examples, the training data set may be input to the artificial neural network module 420 to obtain a feature map, a heat of attention map, and a complementary heat of attention map. In some examples, feature extraction may be performed on the inspection image to obtain a feature map. In some examples, the feature map may be processed based on an attention mechanism to obtain an attention heat map. In some examples, the attention heat map may be processed based on a complementary attention mechanism to obtain a complementary attention heat map.
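A sketch of how the feature map, attention heat map, and complementary attention heat map might be combined, assuming (as in claim 5 below) that the attention and complementary attention heat maps sum to a constant value of 1 at every position, and that combination means an element-wise product of the feature map with the heat map; both are assumptions where the description leaves the exact operation open.

```python
import numpy as np

def attention_combinations(feature_map: np.ndarray, attention_heat_map: np.ndarray):
    """Build the two feature combinations used for the second and third results."""
    complementary_heat_map = 1.0 - attention_heat_map            # non-lesion focus
    first_combination = feature_map * attention_heat_map         # lesion focus
    second_combination = feature_map * complementary_heat_map
    return first_combination, second_combination

features = np.random.rand(64, 64, 32)   # feature map with 32 channels
attention = np.random.rand(64, 64, 1)   # attention heat map in [0, 1]
a, b = attention_combinations(features, attention)
print(a.shape, b.shape)  # (64, 64, 32) (64, 64, 32)
```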
In some examples, step S200 may be implemented with processing module 432. In some examples, the processing module 432 may include at least one processor 10.
In some examples, as described above, the artificial neural network module 420 may include a first artificial neural network 421, a second artificial neural network 422, and a third artificial neural network 423.
In some examples, the processing module 432 may be configured to perform feature extraction on the inspection image using the first artificial neural network 421 to obtain a feature map. In some examples, the processing module 432 may be configured for obtaining an attention heat map indicative of a diseased region and a complementary attention heat map indicative of a non-diseased region using the second artificial neural network 422.
In some examples, the processing module 432 may be configured for obtaining an identification result including tissue lesion identification using the third artificial neural network 423. As described above, the third artificial neural network 423 may include an output layer. In some examples, the output layer may be configured to output a recognition result reflecting the inspection image. In this case, the third artificial neural network 423 can output a recognition result reflecting the inspection image.
In some examples, the processing module 432 may identify the inspection image based on the feature map using the third artificial neural network 423 to obtain a first identification result.
In some examples, the processing module 432 may identify the inspection image based on the feature map and the attention heat map using the third artificial neural network 423 to obtain a second identification result.
In some examples, the processing module 432 may utilize the third artificial neural network 423 to identify the inspection image based on the feature map and the complementary attention heat map to obtain a third identification result.
In some examples, the tissue lesion may be a fundus lesion. In this case, the artificial neural network module 420 can be used for fundus lesion recognition of the fundus image.
In step S300, a total loss function may be calculated based on the first recognition result, the second recognition result, and the third recognition result.
In some examples, step S300 may be implemented with the optimization module 433.
In some examples, the optimization module 433 may obtain an overall loss function of the artificial neural network module 420 based on the first, second, and third loss functions. In this case, the artificial neural network module 420 can be optimized with the total loss function.
In some examples, the optimization module 433 may combine the first recognition result with the annotation image to obtain a first loss function when the attention mechanism is not used. In some examples, the first loss function may be used to evaluate the degree of inconsistency between the recognition result and the annotation result of the inspection image when the attention mechanism is not used. In this case, the accuracy of tissue lesion identification by the artificial neural network module 420 when the attention mechanism is not used can be improved.
In some examples, the optimization module 433 may combine the second recognition result with the annotation image to obtain a second loss function when the attention mechanism is used. In some examples, the second loss function may be used to evaluate the degree of inconsistency between the recognition result and the annotation result of the inspection image when the attention mechanism is used. In this case, the accuracy of tissue lesion identification by the artificial neural network module 420 when the attention mechanism is used can be improved.
In some examples, the optimization module 433 may combine the third recognition result with the annotation image with an annotation result without a lesion to obtain a third loss function when using the complementary attention mechanism. In some examples, a third loss function may be used to evaluate a degree of inconsistency between the recognition result of the examination image when the complementary attention mechanism is used and the lesion-free recognition. In this case, the accuracy of tissue lesion identification by the artificial neural network module 420 when using the complementary attention mechanism can be improved.
In some examples, the first loss function, the second loss function, and the third loss function may be obtained by an error loss function. In some examples, the error loss function may be a correlation function, an L1 loss function, an L2 loss function, a Huber loss function, or the like, used to evaluate the correlation between the true value (i.e., the annotated result) and the predicted value (i.e., the identified result).
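Minimal reference implementations of the L1, L2, and Huber candidates for the error loss function named above; the delta parameter of the Huber loss and the sample values are assumptions for illustration.

```python
import numpy as np

def l1_loss(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def l2_loss(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def huber_loss(y_true, y_pred, delta: float = 1.0):
    """Quadratic near zero, linear for large errors."""
    err = np.abs(y_true - y_pred)
    quad = 0.5 * err ** 2
    lin = delta * err - 0.5 * delta ** 2
    return np.mean(np.where(err <= delta, quad, lin))

truth = np.array([1.0, 0.0, 1.0])       # annotation result (true values)
prediction = np.array([0.8, 0.3, 0.9])  # recognition result (predicted values)
print(l1_loss(truth, prediction), l2_loss(truth, prediction), huber_loss(truth, prediction))
```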
In some examples, the overall loss function may include a first loss term, a second loss term, and a third loss term.
In some examples, the first loss term may be positively correlated with the first loss function. In this case, the degree of inconsistency between the recognition result and the labeling result of the examination image when the attention mechanism is not used can be evaluated using the first loss term, so that the accuracy of tissue lesion recognition can be improved.
In some examples, the second loss term may be positively correlated with a difference of the second loss function and the first loss function. In some examples, the second loss term may be a constant value when the second loss function is less than the first loss function. In this case, the degree of inconsistency between the recognition result of the inspection image when the attention mechanism is used and the recognition result when the attention mechanism is not used can be evaluated using the second loss term.
Specifically, when the difference between the second loss function and the first loss function is greater than zero, that difference may be taken as the second loss term, and when the difference is less than zero, the second loss term may be set to zero. In this case, the degree of inconsistency between the first recognition result and the second recognition result can be evaluated using the second loss term, so that the second recognition result is driven closer to the annotation result than the first recognition result.
In some examples, the third loss term may be positively correlated with the third loss function. In this case, the degree of inconsistency between the third recognition result and the lesion-free labeling result of the examination image when the complementary attention mechanism is used can be evaluated using the third loss term, so that the occurrence of erroneous judgment or missing judgment can be reduced.
In some examples, the total loss function may also include a fourth loss term. In some examples, the fourth loss term may be a regularization term. In some examples, the fourth loss term may be a regularization term for the attention heat map. In some examples, the regularization term may be obtained based on a total variation. In this case, the artificial neural network module 420 may be inhibited from overfitting.
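The regularization term is only said to be based on total variation; the anisotropic form below is one common choice and is an assumption about the exact definition.

```python
import numpy as np

def total_variation(attention_heat_map: np.ndarray) -> float:
    """Anisotropic total variation: sum of absolute differences of neighbouring pixels.

    Penalizing this term encourages smooth attention heat maps, which is one way
    to realize the regularization term for the attention heat map.
    """
    dh = np.abs(np.diff(attention_heat_map, axis=0)).sum()  # vertical differences
    dw = np.abs(np.diff(attention_heat_map, axis=1)).sum()  # horizontal differences
    return float(dh + dw)

heat = np.random.rand(64, 64)
print(total_variation(heat))
```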
In some examples, the overall loss function may include loss term weight coefficients that match the individual loss terms. In some examples, the overall loss function may further include a first loss term weight coefficient matching the first loss term, a second loss term weight coefficient matching the second loss term, a third loss term weight coefficient matching the third loss term, a fourth loss term weight coefficient matching the fourth loss term, and so on.
In some examples, the first loss term may be multiplied by a first loss term weight coefficient, the second loss term may be multiplied by a second loss term weight coefficient, the third loss term may be multiplied by a third loss term weight coefficient, the fourth loss term may be multiplied by a fourth loss term weight coefficient, and the fifth loss term may be multiplied by a fifth loss term weight coefficient. Thus, the influence degree of each loss term on the total loss function can be adjusted through the loss term weight coefficient.
In some examples, the loss term weight coefficient may be set to 0. In some examples, the loss term weight coefficient may be set to a positive number. In this case, since each loss term is a non-negative number, the value of the total loss function can be made not less than zero.
In some examples, the functional formula of the total loss function may be:

L = λ1·f(C(F(X)), l(X)) + λ2·max(f(C(F(X)·M(X)), l(X)) − f(C(F(X)), l(X)) + margin, 0) + λ3·f(C(F(X)·M̄(X)), l0) + λ4·Regularize(M(X))

where L is the total loss function, λ1 is the first loss term weight coefficient, λ2 is the second loss term weight coefficient, λ3 is the third loss term weight coefficient, λ4 is the fourth loss term weight coefficient, f is the error loss function, X is the examination image, F(X) is the feature map generated after the examination image X passes through the first artificial neural network 421, l(X) is the annotation result of the examination image X, l0 is the lesion-free annotation result, max is the maximum function, C is the classifier function that outputs a recognition result based on the input feature map or feature combination set, margin is a preset parameter, M(X) is the attention heat map matched with the examination image X, M̄(X) is the complementary attention heat map matched with the examination image X, "·" in the functional expression of the total loss function denotes the element-wise (dot) product of matrices, and Regularize(M) is the regularization term for the attention heat map M. In some examples, the classifier function may be implemented by the third artificial neural network 423.
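Taking the formula above literally, the total loss function might be sketched as follows. The placement of margin inside the hinge of the second loss term, the mean-pooling stand-in for the classifier C, and the squared-error choice for f are assumptions where the description leaves the exact form open.

```python
import numpy as np

def total_loss(f, C, F_X, M_X, l_X, l_0,
               lambdas=(1.0, 1.0, 1.0, 1.0), margin: float = 0.0,
               regularize=lambda M: 0.0):
    """Sketch of L = λ1·f(C(F(X)), l(X))
                   + λ2·max(f(C(F(X)·M(X)), l(X)) − f(C(F(X)), l(X)) + margin, 0)
                   + λ3·f(C(F(X)·M̄(X)), l0) + λ4·Regularize(M(X))."""
    lam1, lam2, lam3, lam4 = lambdas
    complementary_M = 1.0 - M_X                        # complementary attention heat map
    first = f(C(F_X), l_X)                             # first loss function (no attention)
    second = f(C(F_X * M_X), l_X)                      # second loss function (attention)
    third = f(C(F_X * complementary_M), l_0)           # third loss function (complementary)
    return (lam1 * first
            + lam2 * max(second - first + margin, 0.0) # hinge form of the second loss term
            + lam3 * third
            + lam4 * regularize(M_X))

# Tiny illustration with stand-in pieces (all assumptions): a mean-pooling
# "classifier" and a squared-error loss over a scalar label.
f = lambda pred, label: float((pred - label) ** 2)
C = lambda feats: float(feats.mean())
F_X = np.random.rand(8, 8)
M_X = np.random.rand(8, 8)
print(total_loss(f, C, F_X, M_X, l_X=1.0, l_0=0.0, margin=0.05))
```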
In some examples, the optimization module 433 may obtain a total loss function including a first loss term based on the first loss function, a second loss term based on a difference between the second loss function and the first loss function, and a third loss term based on the third loss function using the first loss function, the second loss function, and the third loss function and optimize the artificial neural network module 420 using the total loss function.
In some examples, the total loss function may further include a fifth loss term. In some examples, the fifth loss term may be a total area term of the attention heat map. Specifically, the total area term of the attention heat map may be the area determined as the lesion region in the attention heat map. In some examples, the total area term of the attention heat map M(X) may be represented by the formula SUM(M(X)). In some examples, the artificial neural network module 420 may be trained with the fifth loss term to make the lesion area within the attention heat map smaller. In this case, the area of the lesion region within the attention heat map can be estimated using the fifth loss term, and the number of pixels in the attention heat map that have a greater influence on the recognition result can be controlled, thereby limiting the attention of the network to the pixels that have the greatest influence on the recognition result. Thus, the accuracy of lesion region identification can be increased.
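A short sketch of the fifth loss term, reading SUM(M(X)) literally; adding it to the total loss with a positive weight is one way to push the network toward smaller, more focused lesion regions.

```python
import numpy as np

def attention_area_term(M_X: np.ndarray) -> float:
    """Fifth loss term: total area of the attention heat map, i.e. SUM(M(X))."""
    return float(M_X.sum())

M_X = np.random.rand(64, 64)
print(attention_area_term(M_X))
```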
In some examples, the total loss function may further include a sixth loss term. In some examples, the sixth loss term may be used to evaluate a degree of inconsistency between a framed region of the lesion region in the recognition result and an annotation frame of the lesion region annotated by a human in the annotation image.
In step S400, the artificial neural network module 420 may be optimized using the total loss function.
In some examples, step S400 may be implemented with the optimization module 433.
In some examples, the optimization module 433 may optimize the artificial neural network module 420 with a total loss function to minimize the total loss function. In this case, the total loss function can be minimized to improve the accuracy of tissue lesion identification by the artificial neural network module 420.
In some examples, the optimization module 433 may obtain a total loss function based on the first loss term, the second loss term, the third loss term, and the total area term of the attention heat map, and optimize the artificial neural network module 420 with the total loss function to obtain the artificial neural network module 420 that may be used for tissue lesion recognition. This can further improve the accuracy of tissue lesion recognition by the artificial neural network module 420.
In some examples, the optimization module 433 may adjust the total loss function by changing weights of the first, second, third, and fourth loss terms.
In some examples, the optimization module 433 may optimize the artificial neural network module 420 based on the first and sixth loss terms as a total loss function (i.e., setting the loss term weight coefficients of the other loss terms to zero). Thus, the accuracy of the attention heat map and the complementary attention heat map generated by the second artificial neural network 422 can be improved.
In some examples, the loss term weighting coefficients in the overall loss function may be modified during the optimization process.
In some examples, the optimization module 433 may iterate the parameters of the total loss function multiple times with an optimization algorithm to reduce the value of the total loss function. For example, in the present embodiment, the value of the loss function may be reduced by a mini-batch stochastic gradient descent algorithm: a set of initial model parameters is selected at random, and the parameters are then iterated multiple times.
In some examples, the training is stopped when the total loss function is less than a second preset value or the number of iterations exceeds a third preset value.
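A sketch combining the mini-batch stochastic gradient descent iteration and the stopping criteria described above; the toy model, the batch size, and the preset values are assumptions for illustration only.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins: random "examination images", binary labels, a tiny classifier.
images = torch.randn(64, 3, 32, 32)
labels = torch.randint(0, 2, (64,))
loader = DataLoader(TensorDataset(images, labels), batch_size=8, shuffle=True)

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()  # stand-in for the total loss function

second_preset_value = 0.05   # stop when the loss falls below this value
third_preset_value = 1000    # or when the number of iterations exceeds this value

iteration, done = 0, False
while not done:
    for batch_images, batch_labels in loader:   # mini-batch sampling
        loss = criterion(model(batch_images), batch_labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        iteration += 1
        if loss.item() < second_preset_value or iteration > third_preset_value:
            done = True
            break
print("stopped after", iteration, "iterations, loss =", round(loss.item(), 4))
```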
In some examples, the optimization module 433 may pre-train the artificial neural network module 420 without applying the attention mechanism and then train the artificial neural network module 420 with applying the attention mechanism. In this case, the training speed can be increased.
In some examples, the optimization module 433 may train the first artificial neural network 421, the second artificial neural network 422, and the third artificial neural network 423 simultaneously. In this case, the training speed can be increased.
In some examples, after training is complete, the optimization module 433 may employ, for example, up to 20,000 tissue images (e.g., fundus images) as test tissue images to compose a test set.
In some examples, the test tissue image may be used for post-training testing of the artificial neural network module 420.
Fig. 9(a) is a schematic diagram showing an example of a lesion region of a fundus image obtained without using attention mechanism training according to an example of the present disclosure. Fig. 9(b) is a schematic diagram showing an example of a lesion region of a fundus image obtained using a complementary attention mechanism training according to an example of the present disclosure.
In some examples, tissue lesion recognition of fundus images is more accurate when training uses the complementary attention mechanism. As an example without the attention mechanism, Fig. 9(a) shows a lesion region A of a fundus image obtained by training without using the attention mechanism. As an example with the complementary attention mechanism, Fig. 9(b) shows a lesion region B of a fundus image obtained by training using the complementary attention mechanism.
While the present disclosure has been described in detail in connection with the drawings and examples, it should be understood that the above description is not intended to limit the disclosure in any way. Those skilled in the art can make modifications and variations to the present disclosure as needed without departing from the true spirit and scope of the disclosure, and such modifications and variations are intended to be within the scope of the disclosure.

Claims (10)

1. A complementary attention-based training method, comprising:
preparing a training data set, wherein the training data set comprises a plurality of examination images formed by lesion areas and non-lesion areas and labeling images related to the examination images;
inputting the training data set into an artificial neural network module to perform feature extraction on the inspection image so as to obtain a feature map;
processing the feature map based on an attention mechanism to obtain an attention heat map indicative of the lesion region;
processing the attention heat map based on a complementary attention mechanism to obtain a complementary attention heat map indicative of the non-diseased region;
obtaining a first recognition result based on the feature map, obtaining a second recognition result based on the feature map and the attention heat map, and obtaining a third recognition result based on the feature map and the complementary attention heat map;
obtaining a total loss function based on the first recognition result, the second recognition result, the third recognition result, and the annotation image,
and optimizing the artificial neural network module by using the total loss function.
2. The training method of claim 1,
the labeling image comprises a labeling result with a pathological change or a labeling result without a pathological change.
3. The training method of claim 2,
the labeling result is an image label or a text label, and the image label is a manually labeled labeling frame for framing the lesion area.
4. The training method of claim 1,
the artificial neural network module comprises a first artificial neural network, a second artificial neural network and a third artificial neural network;
performing feature extraction on the inspection image by using the first artificial neural network to obtain the feature map,
obtaining the attention heat map indicative of a diseased region and the complementary attention heat map indicative of a non-diseased region using the second artificial neural network,
and obtaining a first recognition result based on the feature map, obtaining a second recognition result based on the feature map and the attention heat map, and obtaining a third recognition result based on the feature map and the complementary attention heat map by using the third artificial neural network.
5. The training method of claim 4,
in the attention heat map and the complementary attention heat map, a sum of pixel values of pixels at the same position is a constant value.
6. The training method of claim 4,
and after obtaining the attention heat map and the complementary attention heat map, carrying out normalization processing on the attention heat map and/or the complementary attention heat map.
7. The training method of claim 1,
combining the first recognition result with the annotation image to obtain a first loss function when the attention mechanism is not used, combining the second recognition result with the annotation image to obtain a second loss function when the attention mechanism is used, combining the third recognition result with the annotation image with an annotation result without a lesion to obtain a third loss function when the complementary attention mechanism is used,
obtaining a total loss function including a first loss term based on the first loss function, a second loss term based on a difference between the second loss function and the first loss function, and a third loss term based on the third loss function using the first loss function, the second loss function, and the third loss function.
8. The training method of claim 7,
the first, second, and third loss functions are obtained by an error loss function that is at least one of a correlation function, an L1 loss function, an L2 loss function, or a Huber loss function.
9. The training method of claim 7,
the first loss term is positively correlated with the first loss function.
10. The training method of claim 7,
and when the second loss function is smaller than the first loss function, the second loss term is zero.
CN202210631057.6A 2019-11-28 2020-11-27 Training method based on complementary attention Pending CN114972278A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN2019111941764 2019-11-28
CN201911194176 2019-11-28
CN202011359970.2A CN112862745B (en) 2019-11-28 2020-11-27 Training method and training system for tissue lesion recognition based on artificial neural network

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN202011359970.2A Division CN112862745B (en) 2019-11-28 2020-11-27 Training method and training system for tissue lesion recognition based on artificial neural network

Publications (1)

Publication Number Publication Date
CN114972278A true CN114972278A (en) 2022-08-30

Family

ID=75996687

Family Applications (6)

Application Number Title Priority Date Filing Date
CN202210631057.6A Pending CN114972278A (en) 2019-11-28 2020-11-27 Training method based on complementary attention
CN202011364685.XA Active CN112862746B (en) 2019-11-28 2020-11-27 Tissue lesion identification method and system based on artificial neural network
CN202210631673.1A Pending CN115049602A (en) 2019-11-28 2020-11-27 Optimization method of artificial neural network module
CN202211242486.0A Pending CN115511860A (en) 2019-11-28 2020-11-27 Tissue lesion identification method based on complementary attention mechanism
CN202211242487.5A Pending CN115511861A (en) 2019-11-28 2020-11-27 Identification method based on artificial neural network
CN202011359970.2A Active CN112862745B (en) 2019-11-28 2020-11-27 Training method and training system for tissue lesion recognition based on artificial neural network

Family Applications After (5)

Application Number Title Priority Date Filing Date
CN202011364685.XA Active CN112862746B (en) 2019-11-28 2020-11-27 Tissue lesion identification method and system based on artificial neural network
CN202210631673.1A Pending CN115049602A (en) 2019-11-28 2020-11-27 Optimization method of artificial neural network module
CN202211242486.0A Pending CN115511860A (en) 2019-11-28 2020-11-27 Tissue lesion identification method based on complementary attention mechanism
CN202211242487.5A Pending CN115511861A (en) 2019-11-28 2020-11-27 Identification method based on artificial neural network
CN202011359970.2A Active CN112862745B (en) 2019-11-28 2020-11-27 Training method and training system for tissue lesion recognition based on artificial neural network

Country Status (1)

Country Link
CN (6) CN114972278A (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022226949A1 (en) * 2021-04-29 2022-11-03 深圳硅基智控科技有限公司 Artificial neural network-based identification method and system for tissue lesion identification
CN114581715B (en) * 2022-03-08 2024-09-24 赛维森(广州)医疗科技服务有限公司 Semi-supervised screening method and screening system for pathology slide digital image dataset
CN117292310B (en) * 2023-08-22 2024-09-27 空介(丽水)数字科技有限公司 Virtual digital person application method

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108771530B (en) * 2017-05-04 2021-03-30 深圳硅基智能科技有限公司 Fundus lesion screening system based on deep neural network
US11501154B2 (en) * 2017-05-17 2022-11-15 Samsung Electronics Co., Ltd. Sensor transformation attention network (STAN) model
CN108021916B (en) * 2017-12-31 2018-11-06 南京航空航天大学 Deep learning diabetic retinopathy sorting technique based on attention mechanism
CN108830157B (en) * 2018-05-15 2021-01-22 华北电力大学(保定) Human behavior identification method based on attention mechanism and 3D convolutional neural network
CN108846829B (en) * 2018-05-23 2021-03-23 平安科技(深圳)有限公司 Lesion site recognition device, computer device, and readable storage medium
CN110674664A (en) * 2018-06-15 2020-01-10 阿里巴巴集团控股有限公司 Visual attention recognition method and system, storage medium and processor
CN109766936B (en) * 2018-12-28 2021-05-18 西安电子科技大学 Image change detection method based on information transfer and attention mechanism
US10430946B1 (en) * 2019-03-14 2019-10-01 Inception Institute of Artificial Intelligence, Ltd. Medical image segmentation and severity grading using neural network architectures with semi-supervised learning techniques
CN110084794B (en) * 2019-04-22 2020-12-22 华南理工大学 Skin cancer image identification method based on attention convolution neural network
CN110097559B (en) * 2019-04-29 2024-02-23 李洪刚 Fundus image focus region labeling method based on deep learning
CN110335261B (en) * 2019-06-28 2020-04-17 山东科技大学 CT lymph node detection system based on space-time circulation attention mechanism
CN110349147B (en) * 2019-07-11 2024-02-02 腾讯医疗健康(深圳)有限公司 Model training method, fundus macular region lesion recognition method, device and equipment

Also Published As

Publication number Publication date
CN112862745A (en) 2021-05-28
CN112862746B (en) 2022-09-02
CN112862746A (en) 2021-05-28
CN115049602A (en) 2022-09-13
CN115511861A (en) 2022-12-23
CN112862745B (en) 2022-06-14
CN115511860A (en) 2022-12-23

Similar Documents

Publication Publication Date Title
Mahapatra et al. Image super-resolution using progressive generative adversarial networks for medical image analysis
CN111784671B (en) Pathological image focus region detection method based on multi-scale deep learning
US10430946B1 (en) Medical image segmentation and severity grading using neural network architectures with semi-supervised learning techniques
CN112862745B (en) Training method and training system for tissue lesion recognition based on artificial neural network
CN111325739B (en) Method and device for detecting lung focus and training method of image detection model
US20190057488A1 (en) Image processing method and device
CN113496489B (en) Training method of endoscope image classification model, image classification method and device
JP2020518915A (en) System and method for automated fundus image analysis
CN110110808B (en) Method and device for performing target labeling on image and computer recording medium
CN115136189A (en) Automated detection of tumors based on image processing
CN112466466B (en) Digestive tract auxiliary detection method and device based on deep learning and computing equipment
CN113781488A (en) Tongue picture image segmentation method, apparatus and medium
CN114693719A (en) Spine image segmentation method and system based on 3D-SE-Vnet
KR20200110111A (en) Method and devices for diagnosing dynamic multidimensional disease based on deep learning in medical image information
Rele et al. Machine Learning based Brain Tumor Detection using Transfer Learning
CN117523350A (en) Oral cavity image recognition method and system based on multi-mode characteristics and electronic equipment
WO2022226949A1 (en) Artificial neural network-based identification method and system for tissue lesion identification
CN116958679A (en) Target detection method based on weak supervision and related equipment
CN112862786B (en) CTA image data processing method, device and storage medium
CN112862787B (en) CTA image data processing method, device and storage medium
US20240087115A1 (en) Machine learning enabled system for skin abnormality interventions
CN112862785B (en) CTA image data identification method, device and storage medium
CN112734707A (en) Auxiliary detection method, system and device for 3D endoscope and storage medium
CN117392468B (en) Cancer pathology image classification system, medium and equipment based on multi-example learning
Kodumuru et al. Diabetic Retinopathy Screening Using CNN (ResNet 18)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination