CN112085718A - NAFLD ultrasonic video diagnosis system based on twin attention network - Google Patents

NAFLD ultrasonic video diagnosis system based on twin attention network Download PDF

Info

Publication number
CN112085718A
CN112085718A
Authority
CN
China
Prior art keywords
attention
module
twin
loss
nafld
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010924390.7A
Other languages
Chinese (zh)
Other versions
CN112085718B (en)
Inventor
王连生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University filed Critical Xiamen University
Priority to CN202010924390.7A priority Critical patent/CN112085718B/en
Publication of CN112085718A publication Critical patent/CN112085718A/en
Application granted granted Critical
Publication of CN112085718B publication Critical patent/CN112085718B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/245 Classification techniques relating to the decision surface
    • G06F 18/2451 Classification techniques relating to the decision surface linear, e.g. hyperplane
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10132 Ultrasound image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30004 Biomedical image processing
    • G06T 2207/30056 Liver; Hepatic

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Ultrasonic Diagnosis Equipment (AREA)

Abstract

The invention discloses a NAFLD ultrasound video diagnosis system based on a twin attention network. The system consists of two structurally identical, weight-sharing twin attention sub-networks and a loss function. Each twin attention sub-network comprises a dual-stream feature extraction module, a linear classification module, and a contextual attention module, and the loss function combines a binary cross-entropy loss (BCE), a contrastive similarity loss (CSL), and a contrastive difference loss (CDL). By adding the dual-stream feature extraction module to the twin attention network and introducing the proposed loss function, the NAFLD ultrasound video diagnosis system achieves an accuracy of 90.56%, a specificity of 88.26%, and a sensitivity of 93.58%, providing an efficient and feasible method for NAFLD ultrasound video diagnosis.

Description

NAFLD ultrasonic video diagnosis system based on twin attention network
Technical Field
The invention relates to the technical field of short-video processing, and in particular to a NAFLD ultrasound video diagnosis system based on a twin attention network.
Background
Early screening for non-alcoholic fatty liver disease (NAFLD) helps patients avoid irreversible advanced liver disease, but manual diagnosis of NAFLD from ultrasound video requires physicians to review lengthy videos, which is cumbersome and time-consuming in clinical practice. Deep learning can therefore be used to automate the diagnosis of NAFLD from ultrasound video and improve diagnostic efficiency.
The main difficulties in diagnosing NAFLD from ultrasound video are interference from irrelevant information and the poor feature representation caused by the low quality of ultrasound imaging itself.
Disclosure of Invention
To solve these problems, the invention provides a NAFLD ultrasound video diagnosis system based on a twin attention network, realizing efficient automatic diagnosis of NAFLD.
The invention adopts the following technical solution:
A NAFLD ultrasound video diagnosis system based on a twin attention network consists of two structurally identical, weight-sharing twin attention sub-networks and a loss function. Each twin attention sub-network comprises a dual-stream feature extraction module, a linear classification module, and a contextual attention module, and the loss function combines a binary cross-entropy loss, a contrastive similarity loss, and a contrastive difference loss.
Further, the dual-stream feature extraction module comprises a shared module, a classification module, and an attention module, and extracts separate features for classification and for attention.
Further, the shared module extracts low-level features shared by the classification module and the attention module; the classification module extracts high-level features used to generate the classification; the attention module extracts high-level features used to generate the attention.
Further, for a given video V = {I_t | t = 1, 2, ..., T}, the dual-stream feature extraction module provides two feature representations for each frame I_t, namely f_cls(I_t; θ_cls, θ) ∈ R^D and f_att(I_t; θ_att, θ) ∈ R^D, where θ denotes the shared parameters, θ_cls and θ_att denote the independent parameters of the classification module and the attention module respectively, I_t is the t-th frame of the video, and T is the number of frames of the video.
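By way of illustration only, a minimal PyTorch sketch of such a dual-stream feature extraction module is given below; it is not the patented implementation, and the ResNet50 backbone, the split point between the shared module and the two branches, and the name DualStreamExtractor are assumptions made for the example.

```python
import torch.nn as nn
import torchvision.models as models

class DualStreamExtractor(nn.Module):
    """Shared low-level backbone with separate classification and attention branches."""
    def __init__(self):
        super().__init__()
        backbone = models.resnet50(weights=None)
        # Shared module: low-level layers reused by both branches (the split point is an assumption).
        self.shared = nn.Sequential(*list(backbone.children())[:6])      # conv1 ... layer2
        # Classification branch: high-level features used to generate the classification.
        self.cls_branch = nn.Sequential(backbone.layer3, backbone.layer4,
                                        nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # Attention branch: a second set of high-level layers with independent weights.
        backbone_att = models.resnet50(weights=None)
        self.att_branch = nn.Sequential(backbone_att.layer3, backbone_att.layer4,
                                        nn.AdaptiveAvgPool2d(1), nn.Flatten())

    def forward(self, frames):                 # frames: (T, 3, H, W) video frames
        low = self.shared(frames)              # shared low-level features
        f_cls = self.cls_branch(low)           # (T, D) classification features
        f_att = self.att_branch(low)           # (T, D) attention features
        return f_cls, f_att
```

Splitting into separate branches only after the shared low-level layers keeps the computational cost close to that of a single backbone while still giving the classification and attention tasks their own high-level features.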
Further, the linear classification module uses a linear classifier to predict the probability that each frame belongs to NAFLD, providing a fine-grained reference for the final diagnosis.
Further, based on the feature f_cls extracted by the dual-stream feature extraction module, the linear classification module learns a linear mapping W ∈ R^{1×D} that converts the feature f_cls into a one-dimensional scalar W·f_cls; a sigmoid function then normalizes this scalar to the range (0, 1), giving the final probability value, as follows:
p_t = σ(W·f_cls(I_t) + b), where b is a constant term and σ denotes the sigmoid function.
Further, the contextual attention module scores the importance of each frame in conjunction with its context, highlighting discriminative information in key frames and suppressing irrelevant information from uninformative frames.
Further, starting from the feature vector f_att of each frame, the contextual attention module uses a Bi-LSTM to further extract hidden-layer features that contain temporal information, which can be expressed as follows:

hf_t = LSTMf(f_att(I_t); θf), t = 1, ..., T
hb_t = LSTMb(f_att(I_t); θb), t = T, ..., 1
h_t = [hf_t; hb_t]

where LSTMf with parameters θf denotes the forward LSTM (t from 1 to T) and LSTMb with parameters θb denotes the backward LSTM (t from T to 1). A fully connected layer then learns a linear mapping W_a ∈ R^{1×D/2} from the hidden feature to an importance score, and the importance scores of all frames are normalized by the softmax function, as follows:

a_t = exp(W_a · h_t) / Σ_{j=1}^{T} exp(W_a · h_j)
Further, at the end of the system, the classification probabilities of all frames are weighted and summed according to the attention distribution, and the resulting final probability value represents the diagnosis result of the whole video:

p̂ = Σ_{t=1}^{T} a_t · p_t
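Continuing the sketch, a minimal PyTorch version of the linear classification module, the Bi-LSTM contextual attention module, and the attention-weighted aggregation could look as follows; the hidden size, the exact Bi-LSTM formulation, and the name TwinAttentionSubnet are assumptions rather than the patented design.

```python
import torch
import torch.nn as nn

class TwinAttentionSubnet(nn.Module):
    """Frame-wise linear classifier + Bi-LSTM contextual attention, aggregated per video."""
    def __init__(self, feat_dim=2048, lstm_dim=512):
        super().__init__()
        self.classifier = nn.Linear(feat_dim, 1)                  # p_t = sigma(W f_cls + b)
        self.bilstm = nn.LSTM(feat_dim, lstm_dim, batch_first=True,
                              bidirectional=True)                 # forward + backward LSTM
        self.att_fc = nn.Linear(2 * lstm_dim, 1)                  # W_a: hidden state -> importance score

    def forward(self, f_cls, f_att):
        # f_cls, f_att: (T, D) per-frame features from the dual-stream extractor
        p_frame = torch.sigmoid(self.classifier(f_cls)).squeeze(-1)   # (T,) per-frame NAFLD probability
        h, _ = self.bilstm(f_att.unsqueeze(0))                        # (1, T, 2*lstm_dim) temporal context
        scores = self.att_fc(h).squeeze(0).squeeze(-1)                # (T,) unnormalized importance
        a = torch.softmax(scores, dim=0)                              # attention distribution a_t
        p_video = (a * p_frame).sum()                                 # attention-weighted video probability
        return p_video, p_frame, a
```

In training, two such sub-networks share all parameters (the twin arrangement), each processing one video of a positive-negative pair.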
Further, the mathematical expression of the loss function L is as follows:

L = L_BCE + λ(L_CSL + L_CDL)

where λ is a scaling factor that controls the relative importance of the binary cross-entropy loss (BCE), the contrastive similarity loss (CSL), and the contrastive difference loss (CDL).

The binary cross-entropy loss is computed from the predicted probability p̂ of each video and its true label y, as follows:

L_BCE = -(1/N) Σ_{i=1}^{N} [ y_i · log p̂_i + (1 - y_i) · log(1 - p̂_i) ]

where N is the number of videos in the training set.

The contrastive similarity loss represents the similarity of the key-frame portions between positive and negative sample pairs. The key-frame feature used for attention generation of each video can be represented as:

F_att = Σ_{t=1}^{T} a_t · f_att(I_t)

Cosine similarity is used to measure the similarity between two feature vectors:

cos(u, v) = (u · v) / (‖u‖ · ‖v‖)

The contrastive similarity loss is therefore calculated as:

L_CSL = (1/P) Σ_{i=1}^{P} [ 1 - cos(F_att^{+,i}, F_att^{-,i}) ]

where P is the number of positive-negative sample pairs in a batch.

The contrastive difference loss represents the difference of the key-frame portions between positive and negative sample pairs. The key-frame feature used for classification of each video can be represented as:

F_cls = Σ_{t=1}^{T} a_t · f_cls(I_t)

The contrastive difference loss is therefore calculated as:

L_CDL = (1/P) Σ_{i=1}^{P} [ 1 + cos(F_cls^{+,i}, F_cls^{-,i}) ]
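Under the reconstruction above, the combined loss could be sketched as follows; the exact cosine-based forms of the CSL and CDL terms, the use of attention-weighted key-frame features, and the name twin_attention_loss are assumptions, not the verbatim formulas of the patent.

```python
import torch
import torch.nn.functional as F

def twin_attention_loss(p_pos, p_neg, feat_att_pos, feat_att_neg,
                        feat_cls_pos, feat_cls_neg, lam=0.4):
    """BCE on the two video-level predictions + contrastive similarity / difference terms.

    p_pos / p_neg: video-level probabilities for the NAFLD (label 1) / normal (label 0) video.
    feat_*: attention-weighted key-frame feature vectors of each video (dimension D).
    """
    # Binary cross-entropy on the final video-level predictions of the pair.
    bce = F.binary_cross_entropy(p_pos, torch.ones_like(p_pos)) + \
          F.binary_cross_entropy(p_neg, torch.zeros_like(p_neg))
    # CSL: pull the attention-branch key-frame features of the pair together.
    csl = (1.0 - F.cosine_similarity(feat_att_pos, feat_att_neg, dim=-1)).mean()
    # CDL: push the classification-branch key-frame features of the pair apart.
    cdl = (1.0 + F.cosine_similarity(feat_cls_pos, feat_cls_neg, dim=-1)).mean()
    return bce + lam * (csl + cdl)
```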
after adopting the technical scheme, compared with the background technology, the invention has the following advantages:
the context attention network effectively solves the problem of irrelevant information interference by introducing an attention mechanism; the negative influence of low quality of ultrasound is relieved to a certain extent by combining time sequence information; the characteristics used for the classification module and the attention module are respectively extracted by adopting different branches in the double-flow characteristic extraction module, so that the expressiveness of the extracted characteristics is effectively improved, and the performance of the system is further improved, and the expressiveness of the system is further improved by combining the double-flow characteristic extraction module with a loss function (namely binary cross entropy loss, contrast similarity loss and contrast difference loss), so that the accuracy of 90.56%, the specificity of 88.26% and the sensitivity of 93.58% are finally obtained.
Drawings
Fig. 1 is a schematic diagram of the twin attention sub-network structure of the NAFLD ultrasound video diagnosis system of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example one
The invention discloses a NAFLD ultrasound video diagnosis system based on a twin attention network. The twin attention network consists of two structurally identical, weight-sharing twin attention sub-networks and a loss function. As shown in Fig. 1, each twin attention sub-network consists of a dual-stream feature extraction module a, a linear classification module b, and a contextual attention module c, and the loss function consists of a binary cross-entropy loss, a contrastive similarity loss, and a contrastive difference loss.
The dual-stream feature extraction module a comprises a shared module, a classification module, and an attention module, and extracts separate features for classification and for attention.
The shared module extracts the low-level features shared by the classification module and the attention module, which establishes the correlation between the low levels of the two tasks while greatly reducing the computational cost. The classification module then extracts high-level features to generate the classification, and the attention module extracts high-level features to generate the attention; the features of the two branches are thus well adapted to the requirements of their respective tasks, further improving the effect of the subsequent modules.
For a given video V = {I_t | t = 1, 2, ..., T}, the dual-stream feature extraction module a provides two feature representations for each frame I_t, namely f_cls(I_t; θ_cls, θ) ∈ R^D and f_att(I_t; θ_att, θ) ∈ R^D, where θ denotes the shared parameters, θ_cls and θ_att denote the independent parameters of the classification module and the attention module respectively, I_t is the t-th frame of the video, and T is the number of frames of the video.
The linear classification module b uses a linear classifier to predict the probability that each frame belongs to NAFLD, providing a fine-grained reference for the final diagnosis.
Based on the feature f_cls extracted by the dual-stream feature extraction module a, the linear classification module b learns a linear mapping W ∈ R^{1×D} that converts the feature f_cls into a one-dimensional scalar W·f_cls; a sigmoid function then normalizes this scalar to the range (0, 1), giving the final probability value, as follows:
p_t = σ(W·f_cls(I_t) + b), where b is a constant term and σ denotes the sigmoid function.
The contextual attention module c scores the importance of each frame in conjunction with its context, highlighting discriminative information in key frames and suppressing irrelevant information from uninformative frames.
Starting from the feature vector f_att of each frame, the contextual attention module c uses a Bi-LSTM to further extract hidden-layer features that contain temporal information, which can be expressed as follows:

hf_t = LSTMf(f_att(I_t); θf), t = 1, ..., T
hb_t = LSTMb(f_att(I_t); θb), t = T, ..., 1
h_t = [hf_t; hb_t]

where LSTMf with parameters θf denotes the forward LSTM (t from 1 to T) and LSTMb with parameters θb denotes the backward LSTM (t from T to 1). A fully connected layer then learns a linear mapping W_a ∈ R^{1×D/2} from the hidden feature to an importance score, and the importance scores of all frames are normalized by the softmax function, as follows:

a_t = exp(W_a · h_t) / Σ_{j=1}^{T} exp(W_a · h_j)
at the end of the system, the classification probability of each frame is weighted and summed according to the attention distribution, and the obtained final probability value is used for representing the diagnosis result of the whole video, wherein the diagnosis result is represented as:
Figure BDA0002667814460000062
after the NAFLD ultrasonic video diagnosis system is constructed, a training process is started to the model, and loss functions used in the training process are divided into the following three parts: binary Cross Entropy Loss (BCEL), Contrast Similarity Loss (CSL), Contrast Dissimilarity Loss (CDL). The binary cross entropy loss acts on the final diagnosis result, the difference between the final diagnosis result and the actual value is measured, and each module is optimized; the contrast similarity loss measures the feature similarity of the key frame part between the positive sample pair and the negative sample pair, and the contrast difference loss measures the feature difference of the key frame part between the positive sample pair and the negative sample pair, so that the selection capability of the model on the key frame is promoted, and the expressiveness of the features is enhanced.
The mathematical expression of the loss function L is as follows:

L = L_BCE + λ(L_CSL + L_CDL)

where λ is a scaling factor that controls the relative importance of the binary cross-entropy loss (BCE), the contrastive similarity loss (CSL), and the contrastive difference loss (CDL).

The binary cross-entropy loss is computed from the predicted probability p̂ of each video and its true label y, as follows:

L_BCE = -(1/N) Σ_{i=1}^{N} [ y_i · log p̂_i + (1 - y_i) · log(1 - p̂_i) ]

where N is the number of videos in the training set.

The contrastive similarity loss represents the similarity of the key-frame portions between positive and negative sample pairs. The key-frame feature used for attention generation of each video can be represented as:

F_att = Σ_{t=1}^{T} a_t · f_att(I_t)

Cosine similarity is used to measure the similarity between two feature vectors:

cos(u, v) = (u · v) / (‖u‖ · ‖v‖)

The contrastive similarity loss is therefore calculated as:

L_CSL = (1/P) Σ_{i=1}^{P} [ 1 - cos(F_att^{+,i}, F_att^{-,i}) ]

where P is the number of positive-negative sample pairs in a batch.

The contrastive difference loss represents the difference of the key-frame portions between positive and negative sample pairs. The key-frame feature used for classification of each video can be represented as:

F_cls = Σ_{t=1}^{T} a_t · f_cls(I_t)

The contrastive difference loss is therefore calculated as:

L_CDL = (1/P) Σ_{i=1}^{P} [ 1 + cos(F_cls^{+,i}, F_cls^{-,i}) ]
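A minimal sketch of one training step, tying together the DualStreamExtractor, TwinAttentionSubnet, and twin_attention_loss sketches given earlier, is shown below; the optimizer handling and the variable names are illustrative assumptions.

```python
def training_step(extractor, subnet, pos_frames, neg_frames, optimizer, lam=0.4):
    """One optimization step on a positive (NAFLD) / negative (normal) video pair."""
    optimizer.zero_grad()
    # The same modules process both videos, so all weights are shared (the twin configuration).
    f_cls_p, f_att_p = extractor(pos_frames)            # (T, D) features of the NAFLD video
    f_cls_n, f_att_n = extractor(neg_frames)            # (T, D) features of the normal video
    p_pos, _, a_pos = subnet(f_cls_p, f_att_p)          # video-level probability and attention
    p_neg, _, a_neg = subnet(f_cls_n, f_att_n)
    # Attention-weighted key-frame features fed to the contrastive terms.
    feat_att_pos, feat_att_neg = a_pos @ f_att_p, a_neg @ f_att_n
    feat_cls_pos, feat_cls_neg = a_pos @ f_cls_p, a_neg @ f_cls_n
    loss = twin_attention_loss(p_pos, p_neg, feat_att_pos, feat_att_neg,
                               feat_cls_pos, feat_cls_neg, lam=lam)
    loss.backward()
    optimizer.step()
    return loss.item()
```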
the data used for training consisted of 520 subjects' liver ultrasound videos, with 260 videos from NAFLD patients and an additional 260 videos belonging to normal samples. Since the input of the training phase is a pair of positive and negative samples, we need to ensure that the positive and negative samples have the same length, so we sample 20 frames of images at equal intervals for all videos. The original resolution of the video is 800 × 600, and the sampling frequency is 31 Hz. After video acquisition, 3 doctors with abundant experience carry out manual annotation at the same time, and the voting results of the 3 doctors are synthesized to finally judge whether the subject suffers from NAFLD.
The evaluation metrics used in this embodiment are accuracy, specificity, sensitivity, and the AUC value. The following results were obtained:
(1) Effectiveness of the dual-stream feature extraction module
ResNet50 is used as the base network, and the improvement is made on the basis of the twin attention network: the original feature extraction module of the twin attention network is replaced by the dual-stream feature extraction module, and the model performance before and after the replacement is compared to verify the superiority of the dual-stream feature extraction module. The results are as follows:
TABLE 1  Validation results of the dual-stream feature extraction module

Method                                        Accuracy   Specificity   Sensitivity   AUC value
CAN (ResNet50)                                0.8736     0.8322        0.9358        0.9415
CAN (dual-stream feature extraction module)   0.8868     0.8622        0.9207        0.9459
As shown in Table 1, compared with the original ResNet50 base network, the dual-stream feature extraction module improves most of the metrics of the twin attention network. Specifically, compared with the original twin attention network, the version using the dual-stream feature extraction module as the base network improves the accuracy, specificity, and AUC value by 0.72%, 3.00%, and 0.44%, respectively. These results confirm that the classification and attention modules need different features, and that the two-branch structure of the dual-stream feature extraction module effectively provides task-specific features and enhances the performance of the model.
(2) Effectiveness of the contrastive difference loss and the contrastive similarity loss
The contrastive difference loss and the contrastive similarity loss measure the difference and the similarity, respectively, of the key-frame portions between positive and negative sample pairs. Given a positive-negative sample pair on top of the dual-stream feature extraction network, the contrastive difference loss pushes the key-frame portions of the features required by the classification branch as far apart as possible, so that the model can distinguish them better; the contrastive similarity loss pulls the key-frame portions of the features required by the attention branch as close together as possible, so that the model can select the key frames better.
In this embodiment, the contrastive difference loss and the contrastive similarity loss are introduced, with different weights, into the twin attention network that already contains the dual-stream feature extraction module, and the result is compared against the network without these losses. The results are as follows:
table 2 CDL and CSL validity verification results
Method (lambda) Rate of accuracy Specificity of Sensitivity of the composition AUC value
0(CAN+BFEM) 0.8868 0.8622 0.9207 0.9459
0.2 0.8942 0.8700 0.9269 0.9473
0.4 0.9056 0.8826 0.9358 0.9521
0.6 0.8903 0.8690 0.9192 0.9402
As shown in Table 2, with lower weights all metrics improve after adding both CDL and CSL, and the best results are obtained around λ = 0.4, where the accuracy, specificity, sensitivity, and AUC value improve by 1.88%, 2.04%, 1.51%, and 0.62%, respectively, compared with the configuration without CDL and CSL.
(3) Effectiveness of the NAFLD ultrasound video diagnosis system of the invention
Compared with the plain contextual attention network (CAN), the NAFLD ultrasound video diagnosis system (SAN) provided by the invention adds the dual-stream feature extraction module (BFEM) and introduces the newly designed loss function (binary cross-entropy loss, contrastive difference loss, and contrastive similarity loss). Table 3 shows the comparison with CAN.
TABLE 3  Verification of the superiority of SAN

Method   Accuracy   Specificity   Sensitivity   AUC
CAN      0.8736     0.8322        0.9358        0.9415
SAN      0.9056     0.8826        0.9358        0.9521
As shown in Table 3, compared with CAN, SAN improves the accuracy, specificity, and AUC by 2.2%, 5.04%, and 1.06%, respectively, with the same sensitivity.
From the above results, the BFEM in the NAFLD ultrasound video diagnosis system SAN provided by the invention effectively extracts the different features required for classification and attention, while the newly designed CSL and CDL further constrain the distribution of these features and enhance their expressiveness. Combining the two finally gives an accuracy of 90.56%, a specificity of 88.26%, and a sensitivity of 93.58%, demonstrating the feasibility and effectiveness of SAN.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A NAFLD ultrasound video diagnosis system based on a twin attention network, characterized in that: the system comprises two structurally identical, weight-sharing twin attention sub-networks and a loss function, wherein each twin attention sub-network comprises a dual-stream feature extraction module, a linear classification module, and a contextual attention module, and the loss function comprises a binary cross-entropy loss (BCE), a contrastive similarity loss (CSL), and a contrastive difference loss (CDL).
2. The NAFLD ultrasound video diagnosis system based on a twin attention network of claim 1, wherein: the dual-stream feature extraction module comprises a shared module, a classification module, and an attention module; the dual-stream feature extraction module extracts separate features for classification and for attention.
3. The NAFLD ultrasound video diagnosis system based on a twin attention network of claim 2, wherein: the shared module extracts low-level features shared by the classification module and the attention module; the classification module extracts high-level features to generate the classification; and the attention module extracts high-level features to generate the attention.
4. The NAFLD ultrasound video diagnosis system based on a twin attention network of claim 2 or 3, wherein:
for a given video V = {I_t | t = 1, 2, ..., T}, the dual-stream feature extraction module provides two feature representations for each frame I_t, namely f_cls(I_t; θ_cls, θ) ∈ R^D and f_att(I_t; θ_att, θ) ∈ R^D, where θ denotes the shared parameters, θ_cls and θ_att denote the independent parameters of the classification module and the attention module respectively, I_t is the t-th frame of the video, and T is the number of frames of the video.
5. The NAFLD ultrasound video diagnostic system based on a twin attention network of claim 1, wherein: the linear classification module uses a linear classifier to predict the probability that each frame belongs to NAFLD, providing a fine-grained reference for the final diagnosis.
6. The NAFLD ultrasound video diagnostic system based on a twin attention network of claim 5, wherein: based on the feature f_cls extracted by the dual-stream feature extraction module, the linear classification module learns a linear mapping W ∈ R^{1×D} that converts the feature f_cls into a one-dimensional scalar W·f_cls, and a sigmoid function normalizes this scalar to the range (0, 1), giving the final probability value, as follows:
p_t = σ(W·f_cls(I_t) + b), where b is a constant term and σ denotes the sigmoid function.
7. The NAFLD ultrasound video diagnostic system based on a twin attention network of claim 1, wherein: the contextual attention module scores the importance of each frame in conjunction with the context for highlighting discriminative information on key frames and suppressing irrelevant information of useless frames.
8. The NAFLD ultrasound video diagnostic system based on a twin attention network of claim 7, wherein: starting from the feature vector f_att of each frame, the contextual attention module uses a Bi-LSTM to further extract hidden-layer features containing temporal information, expressed as follows:
hf_t = LSTMf(f_att(I_t); θf), t = 1, ..., T
hb_t = LSTMb(f_att(I_t); θb), t = T, ..., 1
h_t = [hf_t; hb_t]
where LSTMf with parameters θf denotes the forward LSTM (t from 1 to T) and LSTMb with parameters θb denotes the backward LSTM (t from T to 1); a fully connected layer then learns a linear mapping W_a ∈ R^{1×D/2} from the hidden feature to an importance score, and the importance scores of all frames are normalized by the softmax function, as follows:
a_t = exp(W_a · h_t) / Σ_{j=1}^{T} exp(W_a · h_j)
9. The NAFLD ultrasound video diagnostic system based on a twin attention network of claim 1, wherein: at the end of the system, the classification probabilities of all frames are weighted and summed according to the attention distribution, and the resulting final probability value represents the diagnosis result of the whole video:
p̂ = Σ_{t=1}^{T} a_t · p_t
10. The NAFLD ultrasound video diagnostic system based on a twin attention network of claim 1, wherein: the mathematical expression of the loss function L is as follows:
L = L_BCE + λ(L_CSL + L_CDL);
where λ is a scaling factor that controls the relative importance of the binary cross-entropy loss (BCE), the contrastive similarity loss (CSL), and the contrastive difference loss (CDL);
the binary cross-entropy loss is computed from the predicted probability p̂ of each video and its true label y, as follows:
L_BCE = -(1/N) Σ_{i=1}^{N} [ y_i · log p̂_i + (1 - y_i) · log(1 - p̂_i) ]
where N is the number of videos in the training set;
the contrastive similarity loss represents the similarity of the key-frame portions between positive and negative sample pairs, the key-frame feature used for attention generation of each video being represented as:
F_att = Σ_{t=1}^{T} a_t · f_att(I_t)
cosine similarity being used to measure the similarity between two feature vectors:
cos(u, v) = (u · v) / (‖u‖ · ‖v‖)
the contrastive similarity loss being calculated as:
L_CSL = (1/P) Σ_{i=1}^{P} [ 1 - cos(F_att^{+,i}, F_att^{-,i}) ]
where P is the number of positive-negative sample pairs in a batch;
the contrastive difference loss represents the difference of the key-frame portions between positive and negative sample pairs, the key-frame feature used for classification of each video being represented as:
F_cls = Σ_{t=1}^{T} a_t · f_cls(I_t)
the contrastive difference loss being calculated as:
L_CDL = (1/P) Σ_{i=1}^{P} [ 1 + cos(F_cls^{+,i}, F_cls^{-,i}) ]
CN202010924390.7A 2020-09-04 2020-09-04 NAFLD ultrasonic video diagnosis system based on twin attention network Active CN112085718B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010924390.7A CN112085718B (en) 2020-09-04 2020-09-04 NAFLD ultrasonic video diagnosis system based on twin attention network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010924390.7A CN112085718B (en) 2020-09-04 2020-09-04 NAFLD ultrasonic video diagnosis system based on twin attention network

Publications (2)

Publication Number Publication Date
CN112085718A true CN112085718A (en) 2020-12-15
CN112085718B CN112085718B (en) 2022-05-10

Family

ID=73732599

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010924390.7A Active CN112085718B (en) 2020-09-04 2020-09-04 NAFLD ultrasonic video diagnosis system based on twin attention network

Country Status (1)

Country Link
CN (1) CN112085718B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110335290A (en) * 2019-06-04 2019-10-15 大连理工大学 Twin candidate region based on attention mechanism generates network target tracking method
CN111291679A (en) * 2020-02-06 2020-06-16 厦门大学 Target specific response attention target tracking method based on twin network
CN111354017A (en) * 2020-03-04 2020-06-30 江南大学 Target tracking method based on twin neural network and parallel attention module
CN111539316A (en) * 2020-04-22 2020-08-14 中南大学 High-resolution remote sensing image change detection method based on double attention twin network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110335290A (en) * 2019-06-04 2019-10-15 大连理工大学 Twin candidate region based on attention mechanism generates network target tracking method
CN111291679A (en) * 2020-02-06 2020-06-16 厦门大学 Target specific response attention target tracking method based on twin network
CN111354017A (en) * 2020-03-04 2020-06-30 江南大学 Target tracking method based on twin neural network and parallel attention module
CN111539316A (en) * 2020-04-22 2020-08-14 中南大学 High-resolution remote sensing image change detection method based on double attention twin network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KELVIN K.L. WONG ET AL.: "Recent developments in machine learning for medical imaging applications", Computerized Medical Imaging & Graphics *
SHUXIN WANG ET AL.: "Conquering Data Variations in Resolution: A Slice-Aware Multi-Branch Decoder Network", IEEE Transactions on Medical Imaging *

Also Published As

Publication number Publication date
CN112085718B (en) 2022-05-10

Similar Documents

Publication Publication Date Title
CN113191215B (en) Rolling bearing fault diagnosis method integrating attention mechanism and twin network structure
CN114926746B (en) SAR image change detection method based on multiscale differential feature attention mechanism
CN108765383B (en) Video description method based on deep migration learning
CN111700608B (en) Electrocardiosignal multi-classification method and device
CN111191660A (en) Rectal cancer pathology image classification method based on multi-channel collaborative capsule network
CN114565761B (en) Deep learning-based method for segmenting tumor region of renal clear cell carcinoma pathological image
Yang et al. TTL-IQA: Transitive transfer learning based no-reference image quality assessment
CN111985538A (en) Small sample picture classification model and method based on semantic auxiliary attention mechanism
CN111539491B (en) System and method for classifying multiple nodules based on deep learning and attention mechanism
CN108305253A (en) A kind of pathology full slice diagnostic method based on more multiplying power deep learnings
CN108830301A (en) The semi-supervised data classification method of double Laplace regularizations based on anchor graph structure
CN114783034A (en) Facial expression recognition method based on fusion of local sensitive features and global features
CN116864103A (en) Myopenia diagnosis method based on multi-modal contrast learning
CN115496720A (en) Gastrointestinal cancer pathological image segmentation method based on ViT mechanism model and related equipment
CN117408946A (en) Training method of image processing model and image processing method
CN114093507B (en) Intelligent dermatological classification method based on contrast learning in edge computing network
Anaam et al. Studying the applicability of generative adversarial networks on HEp-2 cell image augmentation
CN112085742B (en) NAFLD ultrasonic video diagnosis method based on context attention
Zhou et al. Multi-objective evolutionary generative adversarial network compression for image translation
CN117516937A (en) Rolling bearing unknown fault detection method based on multi-mode feature fusion enhancement
CN112085718B (en) NAFLD ultrasonic video diagnosis system based on twin attention network
CN107633259A (en) A kind of cross-module state learning method represented based on sparse dictionary
CN116596836A (en) Pneumonia CT image attribute reduction method based on multi-view neighborhood evidence entropy
CN113486969A (en) X-ray image classification method based on improved Resnet network
CN117496126B (en) Automatic image positioning system and method based on keywords

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant