CN112085718B - NAFLD ultrasonic video diagnosis system based on twin attention network - Google Patents

NAFLD ultrasonic video diagnosis system based on twin attention network

Info

Publication number
CN112085718B
CN112085718B (application CN202010924390.7A)
Authority
CN
China
Prior art keywords
attention
module
loss
twin
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010924390.7A
Other languages
Chinese (zh)
Other versions
CN112085718A (en)
Inventor
王连生 (Wang Liansheng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University
Priority to CN202010924390.7A
Publication of CN112085718A
Application granted
Publication of CN112085718B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/0002 - Inspection of images, e.g. flaw detection
    • G06T 7/0012 - Biomedical image inspection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/245 - Classification techniques relating to the decision surface
    • G06F 18/2451 - Classification techniques relating to the decision surface; linear, e.g. hyperplane
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/044 - Recurrent networks, e.g. Hopfield networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/048 - Activation functions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/049 - Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10016 - Video; Image sequence
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10132 - Ultrasound image
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20084 - Artificial neural networks [ANN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 - Subject of image; Context of image processing
    • G06T 2207/30004 - Biomedical image processing
    • G06T 2207/30056 - Liver; Hepatic

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Ultrasonic Diagnosis Equipment (AREA)

Abstract

The invention discloses a NAFLD ultrasonic video diagnosis system based on a twin attention network. The system consists of two twin attention subnetworks with identical structure and shared weights, together with a loss function. Each twin attention subnetwork consists of a dual-stream feature extraction module, a linear classification module and a context attention module, and the loss function consists of a binary cross entropy loss (BCE), a contrast similarity loss (CSL) and a contrast difference loss (CDL). By adding the dual-stream feature extraction module to the twin attention network and introducing the proposed loss function, the NAFLD ultrasonic video diagnosis system achieves an accuracy of 90.56%, a specificity of 88.26% and a sensitivity of 93.58%, providing an efficient and feasible method for NAFLD ultrasonic video diagnosis.

Description

NAFLD ultrasonic video diagnosis system based on twin attention network
Technical Field
The invention relates to the technical field of short video processing, in particular to a NAFLD ultrasonic video diagnosis system based on a twin attention network.
Background
Early screening of non-alcoholic fatty liver disease (NAFLD) helps patients avoid irreversible advanced liver disease, but manual diagnosis of NAFLD from ultrasound video requires physicians to review lengthy videos, which is both cumbersome and time consuming in clinical practice. Deep learning can therefore be used to automate the diagnosis of NAFLD in ultrasound video and improve diagnostic efficiency.
The main problems facing NAFLD diagnosis in ultrasound video are interference from irrelevant information and the weak feature expressiveness caused by the low quality of ultrasound imaging itself.
Disclosure of Invention
In order to solve the problems, the invention provides a NAFLD ultrasonic video diagnosis system based on a twin attention network, so as to realize efficient automatic diagnosis of NAFLD.
The invention adopts the following technical scheme:
a NAFLD ultrasonic video diagnosis system based on a twin attention network is composed of two twin attention subnetworks which are identical in structure and share weight, and a loss function, wherein the twin attention subnetworks are composed of a double-current feature extraction module, a linear classification module and a context attention module, and the loss function is composed of binary cross entropy loss, contrast similarity loss and contrast difference loss.
Further, the double-flow feature extraction module comprises a sharing module, a classification module and an attention module; the double-flow feature extraction module is used for extracting different features of classification and attention.
Further, the sharing module is to extract low-level features shared by the classification module and the attention module; the classification module is used for extracting features of a high level to generate a classification; the attention module is used to extract features of a high level to generate attention.
Further, for a given video $V=\{I_t \mid t=1,2,\dots,T\}$, the dual-stream feature extraction module provides two feature representations for each frame $I_t$, namely $f_{cls}(I_t;\theta_{cls},\theta)\in\mathbb{R}^{D}$ and $f_{att}(I_t;\theta_{att},\theta)\in\mathbb{R}^{D}$, where $\theta$ denotes the shared parameters, $\theta_{cls}$ and $\theta_{att}$ denote the independent parameters of the classification module and the attention module respectively, $I_t$ is the t-th frame of the video, and $T$ is the number of frames of the video.
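As an illustration of this structure, the following PyTorch sketch splits a backbone into a shared low-level stage and two independent high-level branches; the ResNet-50 split point, module names and feature dimension are assumptions made for the example rather than details taken from the patent.

```python
import torch.nn as nn
from torchvision.models import resnet50

class DualStreamExtractor(nn.Module):
    """Sketch of a dual-stream feature extractor: shared low-level layers (theta)
    feed two independent high-level branches, one producing the classification
    features f_cls (theta_cls) and one producing the attention features f_att (theta_att)."""
    def __init__(self):
        super().__init__()
        base_cls, base_att = resnet50(weights=None), resnet50(weights=None)
        # Shared module: low-level layers used by both tasks.
        self.shared = nn.Sequential(base_cls.conv1, base_cls.bn1, base_cls.relu,
                                    base_cls.maxpool, base_cls.layer1, base_cls.layer2)
        # Classification branch: independent high-level layers.
        self.cls_branch = nn.Sequential(base_cls.layer3, base_cls.layer4,
                                        nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # Attention branch: a second, independent copy of the high-level layers.
        self.att_branch = nn.Sequential(base_att.layer3, base_att.layer4,
                                        nn.AdaptiveAvgPool2d(1), nn.Flatten())

    def forward(self, frames):            # frames: (T, 3, H, W), one video
        shared = self.shared(frames)      # shared low-level features
        f_cls = self.cls_branch(shared)   # (T, D) features for classification
        f_att = self.att_branch(shared)   # (T, D) features for attention
        return f_cls, f_att
```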
Further, the linear classification module uses a linear classifier to predict the probability that each frame belongs to NAFLD, providing a fine-grained reference for the final diagnosis.
Further, based on the feature $f_{cls}$ extracted by the dual-stream feature extraction module, the linear classification module learns a linear mapping $W\in\mathbb{R}^{1\times D}$ that converts the feature $f_{cls}$ into a one-dimensional scalar $Wf_{cls}$, and a sigmoid function normalises the scalar to between 0 and 1 to represent the final probability value, as follows:
$$p_t=\sigma\big(Wf_{cls}(I_t;\theta_{cls},\theta)+b\big)$$
where $b$ is a constant term and $\sigma$ denotes the sigmoid function.
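A minimal sketch of this per-frame linear classifier, assuming the D = 2048 features of the extractor sketch above; the class name is illustrative.

```python
import torch
import torch.nn as nn

class FrameClassifier(nn.Module):
    """Per-frame linear classifier: p_t = sigmoid(W * f_cls + b)."""
    def __init__(self, feat_dim=2048):
        super().__init__()
        self.linear = nn.Linear(feat_dim, 1)      # W in R^{1 x D} plus bias b

    def forward(self, f_cls):                     # f_cls: (T, D)
        return torch.sigmoid(self.linear(f_cls)).squeeze(-1)  # (T,) per-frame NAFLD probabilities
```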
Further, the contextual attention module scores the importance of each frame in conjunction with the context, in order to highlight the discriminative information of key frames and suppress the irrelevant information of useless frames.
Further, based on the feature vector $f_{att}$ of each frame, the contextual attention module uses a Bi-LSTM to further extract hidden-layer features containing temporal information, which can be expressed as follows:
$$\overrightarrow{h_t}=\overrightarrow{\mathrm{LSTM}}\big(f_{att}(I_t;\theta_{att},\theta),\ \overrightarrow{h_{t-1}};\ \overrightarrow{\theta}\big)$$
$$\overleftarrow{h_t}=\overleftarrow{\mathrm{LSTM}}\big(f_{att}(I_t;\theta_{att},\theta),\ \overleftarrow{h_{t+1}};\ \overleftarrow{\theta}\big)$$
$$h_t=\big[\overrightarrow{h_t},\ \overleftarrow{h_t}\big]$$
where $\overrightarrow{\mathrm{LSTM}}$ denotes the forward LSTM (t from 1 to T) with parameters $\overrightarrow{\theta}$ and $\overleftarrow{\mathrm{LSTM}}$ denotes the backward LSTM (t from T to 1) with parameters $\overleftarrow{\theta}$. A fully connected layer then learns a linear mapping $W_a\in\mathbb{R}^{1\times D/2}$ from feature to importance, and the importance of all frames is normalised by the softmax function, as follows:
$$a_t=\frac{\exp(W_a h_t)}{\sum_{k=1}^{T}\exp(W_a h_k)}$$
further, at the end of the system, the classification probability of each frame is weighted and summed according to the attention distribution, and the obtained final probability value is used for representing the diagnosis result of the whole video, wherein the diagnosis result is represented as:
Figure GDA0003562399040000029
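The contextual attention module and the attention-weighted aggregation described above could look like the sketch below; the Bi-LSTM hidden size and the single-video interface are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ContextAttention(nn.Module):
    """Sketch of the contextual attention module: a Bi-LSTM adds temporal
    context to the per-frame attention features, a linear layer (W_a) scores
    each frame, and softmax normalises the scores over the whole video."""
    def __init__(self, feat_dim=2048, hidden=512):
        super().__init__()
        self.bilstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.score = nn.Linear(2 * hidden, 1)

    def forward(self, f_att, frame_probs):        # f_att: (T, D), frame_probs: (T,)
        h, _ = self.bilstm(f_att.unsqueeze(0))    # (1, T, 2*hidden): forward and backward states
        a = torch.softmax(self.score(h).squeeze(-1), dim=-1).squeeze(0)  # (T,) attention weights
        video_prob = (a * frame_probs).sum()      # attention-weighted video-level diagnosis
        return video_prob, a
```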
further, the mathematical expression of the loss function L is as follows:
L=LBCE+λ(LSSL+LCDL)
wherein λ is a scaling factor that controls the relative importance of binary cross-entropy loss (BCE), Contrast Similarity Loss (CSL), and Contrast Difference Loss (CDL);
the binary cross entropy loss is based on the prediction probability of each video
Figure GDA0003562399040000036
With the true value y, the final loss function can be calculated as follows:
Figure GDA0003562399040000031
wherein N represents the video frequency in the training set;
the contrast similarity loss is used to represent the similarity of key frame portions between positive and negative sample pairs, and the feature of the key frame portion used for attention generation of each video can be represented as follows:
Figure GDA0003562399040000032
in addition, cosine similarity is used to measure the similarity between two feature vectors, which can be expressed as:
Figure GDA0003562399040000033
thus, the contrast similarity loss is calculated as follows:
Figure GDA0003562399040000034
where P represents the positive and negative sample pair logarithm in a batch.
The contrast difference loss is used to represent the difference of the key frame portions between the positive and negative sample pairs, and the feature of the key frame portion for classifying each video can be represented as follows:
Figure GDA0003562399040000035
thus, the loss of contrast variability is calculated as follows:
Figure GDA0003562399040000041
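A sketch of the combined objective, assuming the contrast similarity loss and contrast difference loss are cosine-based terms computed on the attention-weighted key-frame features of a positive-negative pair; the exact formulations and the weighting of the two subnetworks' BCE terms are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def twin_attention_loss(p_pos, p_neg, F_att_pos, F_att_neg,
                        F_cls_pos, F_cls_neg, lam=0.4):
    """Sketch of L = L_BCE + lambda * (L_CSL + L_CDL) for one positive-negative pair."""
    y_pos, y_neg = torch.tensor(1.0), torch.tensor(0.0)
    # Binary cross entropy on the video-level predictions of both subnetworks.
    l_bce = F.binary_cross_entropy(p_pos, y_pos) + F.binary_cross_entropy(p_neg, y_neg)
    # Contrast similarity loss: pull the attention features of the pair together.
    l_csl = 1.0 - F.cosine_similarity(F_att_pos, F_att_neg, dim=0)
    # Contrast difference loss: push the classification features of the pair apart.
    l_cdl = 1.0 + F.cosine_similarity(F_cls_pos, F_cls_neg, dim=0)
    return l_bce + lam * (l_csl + l_cdl)
```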
After adopting the above technical scheme, compared with the background art, the invention has the following advantages:
The context attention network effectively solves the problem of interference from irrelevant information by introducing an attention mechanism, and the negative influence of the low quality of ultrasound is relieved to a certain extent by incorporating temporal information. In the dual-stream feature extraction module, different branches extract the features used by the classification module and the attention module respectively, which effectively improves the expressiveness of the extracted features and thus the performance of the system. Combining the dual-stream feature extraction module with the proposed loss function (binary cross entropy loss, contrast similarity loss and contrast difference loss) further improves the expressiveness of the system, finally achieving an accuracy of 90.56%, a specificity of 88.26% and a sensitivity of 93.58%.
Drawings
Fig. 1 is a schematic diagram of the twin attention subnetwork structure of the NAFLD ultrasonic video diagnosis system of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example one
The invention discloses a NAFLD ultrasonic video diagnosis system based on a twin attention network. The twin attention network consists of two twin attention subnetworks with identical structure and shared weights, together with a loss function. As shown in Fig. 1, each twin attention subnetwork consists of a dual-stream feature extraction module a, a linear classification module b and a context attention module c, and the loss function consists of a binary cross entropy loss, a contrast similarity loss and a contrast difference loss.
The dual-stream feature extraction module a comprises a sharing module, a classification module and an attention module, and is used to extract different features for classification and attention.
The sharing module extracts the low-level features shared by the classification module and the attention module, which establishes the correlation between the two tasks at the low level while greatly reducing the computational cost. The classification module then extracts high-level features to generate the classification, and the attention module extracts high-level features to generate the attention; the features of the different branches are thus well adapted to the requirements of the different tasks, further improving the effect of the subsequent modules.
For a given video $V=\{I_t \mid t=1,2,\dots,T\}$, the dual-stream feature extraction module a provides two feature representations for each frame $I_t$, namely $f_{cls}(I_t;\theta_{cls},\theta)\in\mathbb{R}^{D}$ and $f_{att}(I_t;\theta_{att},\theta)\in\mathbb{R}^{D}$, where $\theta$ denotes the shared parameters, $\theta_{cls}$ and $\theta_{att}$ denote the independent parameters of the classification module and the attention module respectively, $I_t$ is the t-th frame of the video, and $T$ is the number of frames of the video.
The linear classification module b uses a linear classifier to predict the probability that each frame belongs to NAFLD, providing a fine-grained reference for the final diagnosis.
Based on the feature $f_{cls}$ extracted by the dual-stream feature extraction module a, the linear classification module b learns a linear mapping $W\in\mathbb{R}^{1\times D}$ that converts the feature $f_{cls}$ into a one-dimensional scalar $Wf_{cls}$, and a sigmoid function normalises the scalar to between 0 and 1 to represent the final probability value, as follows:
$$p_t=\sigma\big(Wf_{cls}(I_t;\theta_{cls},\theta)+b\big)$$
where $b$ is a constant term and $\sigma$ denotes the sigmoid function.
The context attention module c scores the importance of each frame in conjunction with the context, in order to highlight the discriminative information of key frames and suppress the irrelevant information of useless frames.
Based on the feature vector $f_{att}$ of each frame, the contextual attention module c uses a Bi-LSTM to further extract hidden-layer features containing temporal information, which can be expressed as follows:
$$\overrightarrow{h_t}=\overrightarrow{\mathrm{LSTM}}\big(f_{att}(I_t;\theta_{att},\theta),\ \overrightarrow{h_{t-1}};\ \overrightarrow{\theta}\big)$$
$$\overleftarrow{h_t}=\overleftarrow{\mathrm{LSTM}}\big(f_{att}(I_t;\theta_{att},\theta),\ \overleftarrow{h_{t+1}};\ \overleftarrow{\theta}\big)$$
$$h_t=\big[\overrightarrow{h_t},\ \overleftarrow{h_t}\big]$$
where $\overrightarrow{\mathrm{LSTM}}$ denotes the forward LSTM (t from 1 to T) with parameters $\overrightarrow{\theta}$ and $\overleftarrow{\mathrm{LSTM}}$ denotes the backward LSTM (t from T to 1) with parameters $\overleftarrow{\theta}$. A fully connected layer then learns a linear mapping $W_a\in\mathbb{R}^{1\times D/2}$ from feature to importance, and the importance of all frames is normalised by the softmax function, as follows:
$$a_t=\frac{\exp(W_a h_t)}{\sum_{k=1}^{T}\exp(W_a h_k)}$$
at the end of the system, the classification probability of each frame is weighted and summed according to the attention distribution, and the obtained final probability value is used for representing the diagnosis result of the whole video, wherein the diagnosis result is represented as:
Figure GDA0003562399040000062
after the NAFLD ultrasonic video diagnosis system is constructed, a training process is started to the model, and loss functions used in the training process are divided into the following three parts: binary Cross Entropy Loss (BCEL), Contrast Similarity Loss (CSL), Contrast Dissimilarity Loss (CDL). The binary cross entropy loss acts on the final diagnosis result, the difference between the final diagnosis result and the actual value is measured, and each module is optimized; the contrast similarity loss measures the feature similarity of the key frame part between the positive sample pair and the negative sample pair, and the contrast difference loss measures the feature difference of the key frame part between the positive sample pair and the negative sample pair, so that the selection capability of the model on the key frame is promoted, and the expressiveness of the features is enhanced.
The mathematical expression of the loss function L is as follows:
$$L=L_{BCE}+\lambda\,(L_{CSL}+L_{CDL})$$
where $\lambda$ is a scale factor that controls the relative importance of the binary cross entropy loss (BCE), the contrast similarity loss (CSL) and the contrast difference loss (CDL).
The binary cross entropy loss is computed from the predicted probability $\hat{p}$ of each video and the true value $y$, as follows:
$$L_{BCE}=-\frac{1}{N}\sum_{i=1}^{N}\big[y_i\log\hat{p}_i+(1-y_i)\log(1-\hat{p}_i)\big]$$
where $N$ denotes the number of videos in the training set.
The contrast similarity loss represents the similarity of the key-frame portions between positive and negative sample pairs. The attention-weighted key-frame feature of each video used for attention generation can be expressed as follows:
$$F_{att}=\sum_{t=1}^{T}a_t\,f_{att}(I_t;\theta_{att},\theta)$$
In addition, cosine similarity is used to measure the similarity between two feature vectors, which can be expressed as:
$$\mathrm{sim}(F_1,F_2)=\frac{F_1\cdot F_2}{\lVert F_1\rVert\,\lVert F_2\rVert}$$
Thus, the contrast similarity loss is calculated as follows:
$$L_{CSL}=\frac{1}{P}\sum_{j=1}^{P}\Big(1-\mathrm{sim}\big(F_{att}^{+,j},F_{att}^{-,j}\big)\Big)$$
where $P$ denotes the number of positive-negative sample pairs in a batch.
The contrast difference loss represents the difference of the key-frame portions between positive and negative sample pairs. The attention-weighted key-frame feature of each video used for classification can be expressed as follows:
$$F_{cls}=\sum_{t=1}^{T}a_t\,f_{cls}(I_t;\theta_{cls},\theta)$$
Thus, the contrast difference loss is calculated as follows:
$$L_{CDL}=\frac{1}{P}\sum_{j=1}^{P}\Big(1+\mathrm{sim}\big(F_{cls}^{+,j},F_{cls}^{-,j}\big)\Big)$$
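One possible training step for the twin arrangement is sketched below: the same weight-sharing subnetwork processes the positive and the negative video of a pair, the attention-weighted key-frame features are formed, and the combined loss sketched earlier is minimised. The `model` interface (returning the video probability, attention weights and both frame-level feature streams) is a hypothetical wrapper around the modules sketched above.

```python
import torch

def training_step(model, pos_frames, neg_frames, optimizer, lam=0.4):
    """Sketch of one step of twin training on a positive-negative video pair."""
    p_pos, a_pos, f_cls_pos, f_att_pos = model(pos_frames)   # shared weights
    p_neg, a_neg, f_cls_neg, f_att_neg = model(neg_frames)   # same parameters
    # Attention-weighted key-frame features used by the contrastive losses.
    F_att_pos, F_att_neg = a_pos @ f_att_pos, a_neg @ f_att_neg
    F_cls_pos, F_cls_neg = a_pos @ f_cls_pos, a_neg @ f_cls_neg
    loss = twin_attention_loss(p_pos, p_neg, F_att_pos, F_att_neg,
                               F_cls_pos, F_cls_neg, lam)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```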
the data used for training consisted of 520 subjects' liver ultrasound videos, with 260 videos from NAFLD patients and an additional 260 videos belonging to normal samples. Since the input of the training phase is a pair of positive and negative samples, we need to ensure that the positive and negative samples have the same length, so we sample 20 frames of images at equal intervals for all videos. The original resolution of the video is 800 × 600, and the sampling frequency is 31 Hz. After video acquisition, 3 doctors with abundant experience carry out manual annotation at the same time, and the voting results of the 3 doctors are synthesized to finally judge whether the subject suffers from NAFLD.
The evaluation indexes adopted in the embodiment include accuracy, specificity, sensitivity and AUC values. The following results were obtained:
(1) Effectiveness of the dual-stream feature extraction module
ResNet50 is used as the base network and improved on the basis of the twin attention network: the original feature extraction module of the twin attention network is replaced by the dual-stream feature extraction module, and the model performance before and after the replacement is compared to verify the superiority of the dual-stream feature extraction module. The results are as follows:
TABLE 1 Validation results of the dual-stream feature extraction module
Method | Accuracy | Specificity | Sensitivity | AUC value
CAN (ResNet50) | 0.8736 | 0.8322 | 0.9358 | 0.9415
CAN (dual-stream feature extraction module) | 0.8868 | 0.8622 | 0.9207 | 0.9459
As can be seen from the table above, compared with the original ResNet50 base network, the dual-stream feature extraction module effectively improves most indicators of the twin attention network. Specifically, compared with the original twin attention network, the twin attention network using the dual-stream feature extraction module as the base network improves the accuracy, specificity and AUC value by 1.32%, 3.00% and 0.44% respectively. These results show that classification and attention require different features, and that the two-branch structure of the dual-stream feature extraction module effectively provides task-specific features and enhances the performance of the model.
(2) Effectiveness of the contrast difference loss and the contrast similarity loss
The contrast difference loss and the contrast similarity loss respectively measure the difference and the similarity of the key-frame portions between positive and negative sample pairs. Given a pair of positive and negative samples on the basis of the dual-stream feature extraction network, for the features required by the classification branch the contrast difference loss keeps the key-frame portions as far apart as possible, so that the model can distinguish them better; for the features required by the attention branch, the contrast similarity loss keeps the key-frame portions as close as possible, so that the model can select the key frames better.
In this embodiment, the contrast difference loss and the contrast similarity loss with different weights are introduced into the twin attention network equipped with the dual-stream feature extraction module, and compared with the network without the introduced loss functions; the results are as follows:
table 2 CDL and CSL validity verification results
Method (λ) | Accuracy | Specificity | Sensitivity | AUC value
0 (CAN + BFEM) | 0.8868 | 0.8622 | 0.9207 | 0.9459
0.2 | 0.8942 | 0.8700 | 0.9269 | 0.9473
0.4 | 0.9056 | 0.8826 | 0.9358 | 0.9521
0.6 | 0.8903 | 0.8690 | 0.9192 | 0.9402
As shown in Table 2, at lower weights all indicators improve after adding both CDL and CSL, with the best results achieved around λ = 0.4, where the accuracy, specificity, sensitivity and AUC value improve by 1.88%, 2.04%, 1.51% and 0.62% respectively compared with the group without CDL and CSL.
(3) Effectiveness of the NAFLD ultrasonic video diagnosis system of the invention
Compared with the ordinary twin attention network (CAN), the NAFLD ultrasonic video diagnosis system (SAN) provided by the invention additionally adopts the dual-stream feature extraction module (BFEM) and introduces the newly designed loss function (binary cross entropy loss, contrast similarity loss and contrast difference loss). Table 3 shows the comparison with the ordinary twin attention network CAN.
TABLE 3 Verification results of the superiority of SAN
Method | Accuracy | Specificity | Sensitivity | AUC value
CAN | 0.8736 | 0.8322 | 0.9358 | 0.9415
SAN | 0.9056 | 0.8826 | 0.9358 | 0.9521
As shown in Table 3, SAN improves the accuracy, specificity and AUC value by 3.20%, 5.04% and 1.06% respectively compared with CAN, while the sensitivity remains the same.
From the above results, it can be seen that the BFEM in the NAFLD ultrasonic video diagnosis system SAN provided by the invention effectively extracts the different features required for classification and attention, while the newly designed CSL and CDL further constrain the distribution of the features and enhance their expressiveness; combining the two finally yields an accuracy of 90.56%, a specificity of 88.26% and a sensitivity of 93.58%, which proves the feasibility and effectiveness of SAN.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims (6)

1. A NAFLD ultrasonic video diagnosis system based on a twin attention network, characterized in that: the system comprises two twin attention subnetworks with identical structure and shared weights, and a loss function, wherein each twin attention subnetwork comprises a dual-stream feature extraction module, a linear classification module and a context attention module, and the loss function comprises a binary cross entropy loss BCE, a contrast similarity loss CSL and a contrast difference loss CDL;
the dual-stream feature extraction module comprises a sharing module, a classification module and an attention module, and is used to extract different features for classification and attention; the sharing module is used to extract the low-level features shared by the classification module and the attention module; the classification module is used to extract high-level features to generate the classification; the attention module is used to extract high-level features to generate the attention;
the linear classification module predicts the probability that each frame belongs to the NAFLD by using a linear classifier, and provides fine-grained reference for final diagnosis;
the contextual attention module scores the importance of each frame in conjunction with the context for highlighting discriminative information on key frames and suppressing irrelevant information of useless frames.
2. The NAFLD ultrasound video diagnostic system based on a twin attention network of claim 1, wherein:
for a given video V ═ It1, 2.. T }, the dual-stream feature extraction module providing two feature representations for each frame of the video, each frame ItAre respectively fcls(It;θcls,θ)∈RDAnd fatt(It;θatt,θ)∈RDWhere θ denotes a sharing parameter, θcls,θattIndependent parameters, I, representing the classification module and the attention module, respectivelytIs the T-th frame of the video, and T represents the frame number of the video.
3. The NAFLD ultrasound video diagnostic system based on a twin attention network of claim 1, wherein: based on the feature $f_{cls}$ extracted by the dual-stream feature extraction module, the linear classification module learns a linear mapping $W\in\mathbb{R}^{1\times D}$ that converts the feature $f_{cls}$ into a one-dimensional scalar $Wf_{cls}$, and a sigmoid function normalises the scalar to between 0 and 1 to represent the final probability value, as follows:
$$p_t=\sigma\big(Wf_{cls}(I_t;\theta_{cls},\theta)+b\big)$$
where $b$ is a constant term and $\sigma$ denotes the sigmoid function.
4. The NAFLD ultrasound video diagnostic system based on a twin attention network of claim 1, wherein: based on the feature vector $f_{att}$ of each frame, the contextual attention module uses a Bi-LSTM to further extract hidden-layer features containing temporal information, which can be expressed as follows:
$$\overrightarrow{h_t}=\overrightarrow{\mathrm{LSTM}}\big(f_{att}(I_t;\theta_{att},\theta),\ \overrightarrow{h_{t-1}};\ \overrightarrow{\theta}\big)$$
$$\overleftarrow{h_t}=\overleftarrow{\mathrm{LSTM}}\big(f_{att}(I_t;\theta_{att},\theta),\ \overleftarrow{h_{t+1}};\ \overleftarrow{\theta}\big)$$
$$h_t=\big[\overrightarrow{h_t},\ \overleftarrow{h_t}\big]$$
where $\overrightarrow{\mathrm{LSTM}}$ denotes the forward LSTM (t from 1 to T) with parameters $\overrightarrow{\theta}$ and $\overleftarrow{\mathrm{LSTM}}$ denotes the backward LSTM (t from T to 1) with parameters $\overleftarrow{\theta}$; a fully connected layer then learns a linear mapping $W_a\in\mathbb{R}^{1\times D/2}$ from feature to importance, and the importance of all frames is normalised by the softmax function, as follows:
$$a_t=\frac{\exp(W_a h_t)}{\sum_{k=1}^{T}\exp(W_a h_k)}$$
5. the NAFLD ultrasound video diagnostic system based on a twin attention network of claim 1, wherein: at the end of the system, the classification probability of each frame is weighted and summed according to the attention distribution, and the final probability value obtained is used for representing the whole videoThe diagnostic result of (a), said diagnostic result being represented as:
Figure FDA0003562399030000029
6. the NAFLD ultrasound video diagnostic system based on a twin attention network of claim 1, wherein: the mathematical expression of the loss function L is as follows:
L=LBCE+λ(LCSL+LCDL);
wherein, λ is a scale factor controlling the relative importance of binary cross entropy loss BCE, contrast similarity loss CSL and contrast difference loss CDL;
the binary cross entropy loss is based on the prediction probability of each video
Figure FDA00035623990300000211
With the true value y, the final loss function can be calculated as follows:
Figure FDA00035623990300000210
wherein N represents the video frequency in the training set;
the contrast similarity loss is used to represent the similarity of key frame portions between positive and negative sample pairs, and the feature of the key frame portion used for attention generation of each video can be represented as follows:
Figure FDA0003562399030000031
in addition, cosine similarity is used to measure the similarity between two feature vectors, which can be expressed as:
Figure FDA0003562399030000032
the loss of contrast similarity is calculated as follows:
Figure FDA0003562399030000033
wherein, P represents the positive and negative sample pair logarithm in a batch;
the contrast difference loss is used to represent the difference of the key frame portions between the positive and negative sample pairs, and the feature of the key frame portion for classifying each video can be represented as follows:
Figure FDA0003562399030000034
the loss of contrast variability was calculated as follows:
Figure FDA0003562399030000035
CN202010924390.7A 2020-09-04 2020-09-04 NAFLD ultrasonic video diagnosis system based on twin attention network Active CN112085718B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202010924390.7A | 2020-09-04 | 2020-09-04 | NAFLD ultrasonic video diagnosis system based on twin attention network

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202010924390.7A | 2020-09-04 | 2020-09-04 | NAFLD ultrasonic video diagnosis system based on twin attention network

Publications (2)

Publication Number | Publication Date
CN112085718A (en) | 2020-12-15
CN112085718B (en) | 2022-05-10

Family

ID=73732599

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202010924390.7A (granted as CN112085718B, Active) | NAFLD ultrasonic video diagnosis system based on twin attention network | 2020-09-04 | 2020-09-04

Country Status (1)

Country Link
CN (1) CN112085718B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN110335290A * | 2019-06-04 | 2019-10-15 | Dalian University of Technology | Target tracking method based on an attention-mechanism twin candidate region generation network
CN111291679A * | 2020-02-06 | 2020-06-16 | Xiamen University | Target-specific response attention target tracking method based on a twin network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN111354017B * | 2020-03-04 | 2023-05-05 | Jiangnan University | Target tracking method based on a twin neural network and a parallel attention module
CN111539316B * | 2020-04-22 | 2023-05-05 | Central South University | High-resolution remote sensing image change detection method based on a dual-attention twin network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN110335290A * | 2019-06-04 | 2019-10-15 | Dalian University of Technology | Target tracking method based on an attention-mechanism twin candidate region generation network
CN111291679A * | 2020-02-06 | 2020-06-16 | Xiamen University | Target-specific response attention target tracking method based on a twin network

Also Published As

Publication number Publication date
CN112085718A (en) 2020-12-15

Similar Documents

Publication Publication Date Title
CN109493308B (en) Medical image synthesis and classification method for generating confrontation network based on condition multi-discrimination
CN108765383B (en) Video description method based on deep migration learning
Chen et al. Multi-label chest X-ray image classification via semantic similarity graph embedding
CN111985538A (en) Small sample picture classification model and method based on semantic auxiliary attention mechanism
CN111325750B (en) Medical image segmentation method based on multi-scale fusion U-shaped chain neural network
Yang et al. TTL-IQA: Transitive transfer learning based no-reference image quality assessment
CN111539491B (en) System and method for classifying multiple nodules based on deep learning and attention mechanism
CN114169442B (en) Remote sensing image small sample scene classification method based on double prototype network
CN107301409B (en) System and method for selecting Bagging learning to process electrocardiogram based on Wrapper characteristics
CN108305253A Pathology whole-slide diagnosis method based on multi-magnification deep learning
CN115496720A (en) Gastrointestinal cancer pathological image segmentation method based on ViT mechanism model and related equipment
CN116864103A (en) Myopenia diagnosis method based on multi-modal contrast learning
CN117408946A (en) Training method of image processing model and image processing method
CN114093507A (en) Skin disease intelligent classification method based on contrast learning in edge computing network
Kang et al. Label-assemble: Leveraging multiple datasets with partial labels
CN112085742B (en) NAFLD ultrasonic video diagnosis method based on context attention
Zhou et al. Multi-objective evolutionary generative adversarial network compression for image translation
CN117371511A (en) Training method, device, equipment and storage medium for image classification model
CN113344028A (en) Breast ultrasound sequence image classification method and device
CN117516937A (en) Rolling bearing unknown fault detection method based on multi-mode feature fusion enhancement
CN112085718B (en) NAFLD ultrasonic video diagnosis system based on twin attention network
CN115834161A (en) Power grid false data injection attack detection method of artificial intelligence four-layer architecture
CN113486969A (en) X-ray image classification method based on improved Resnet network
Amalia et al. The Application of Modified K-Nearest Neighbor Algorithm for Classification of Groundwater Quality Based on Image Processing and pH, TDS, and Temperature Sensors
CN117496126B (en) Automatic image positioning system and method based on keywords

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant