CN112085742B - NAFLD ultrasonic video diagnosis method based on context attention

NAFLD ultrasonic video diagnosis method based on context attention

Info

Publication number
CN112085742B
CN112085742B
Authority
CN
China
Prior art keywords
image
nafld
layer
features
contextual
Prior art date
Legal status
Active
Application number
CN202010923741.2A
Other languages
Chinese (zh)
Other versions
CN112085742A (en)
Inventor
王连生
Current Assignee
Xiamen University
Original Assignee
Xiamen University
Priority date
Filing date
Publication date
Application filed by Xiamen University filed Critical Xiamen University
Priority to CN202010923741.2A priority Critical patent/CN112085742B/en
Publication of CN112085742A publication Critical patent/CN112085742A/en
Application granted granted Critical
Publication of CN112085742B publication Critical patent/CN112085742B/en

Classifications

    • G PHYSICS; G06 COMPUTING, CALCULATING OR COUNTING (G06N Computing arrangements based on specific computational models; G06T Image data processing or generation, in general; G06V Image or video recognition or understanding)
    • G06T 7/11 Region-based segmentation (G06T 7/00 Image analysis; G06T 7/10 Segmentation; Edge detection)
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 Combinations of networks
    • G06N 3/047 Probabilistic or stochastic networks
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/084 Backpropagation, e.g. using gradient descent (G06N 3/08 Learning methods)
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06T 2207/10016 Video; Image sequence
    • G06T 2207/10132 Ultrasound image
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/20132 Image cropping (G06T 2207/20112 Image segmentation details)
    • G06T 2207/30056 Liver; Hepatic (G06T 2207/30004 Biomedical image processing)

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Ultrasonic Diagnosis Equipment (AREA)

Abstract

The invention discloses a contextual attention-based NAFLD ultrasound video diagnosis method comprising the following steps: S1, acquiring liver ultrasound videos and dividing them into a training set and a test set; S2, preprocessing each video by sparse sampling at 5-frame intervals, then cropping and scaling the sampled frames to 224×224 resolution; S3, constructing a contextual attention model, feeding the images from step S2 into the model, applying group normalization, extracting features and key-frame images, and combining them with contextual information to obtain the NAFLD ultrasound video diagnosis result. By extracting the key-frame images of the ultrasound video through self-learning, the invention avoids the influence of subjective factors, makes full use of the contextual information in the video, and improves the accuracy of NAFLD ultrasound video diagnosis.

Description

NAFLD ultrasonic video diagnosis method based on context attention
Technical Field
The invention relates to the technical field of ultrasound video diagnosis, and in particular to a contextual attention-based NAFLD ultrasound video diagnosis method.
Background
Non-alcoholic fatty liver disease (NAFLD) is one of the most common liver diseases. Recent studies show that the prevalence of NAFLD is as high as 29.62% in Asia and 29.81% in the general population of China. Early NAFLD is benign and reversible, but without timely intervention or treatment it readily progresses to irreversible liver diseases such as fibrosis, cirrhosis, and liver cancer.
Ultrasound is the preferred modality for NAFLD screening, being non-invasive and low-cost. Advances in computer-aided diagnosis offer a new approach to ultrasound-based NAFLD diagnosis, and a series of techniques built around deep learning has greatly improved the efficiency and objectivity of NAFLD diagnosis while reducing the burden on physicians. Existing methods, however, generally rely on manually extracted key-frame images rather than diagnosing directly from the acquired video data. The low imaging quality of ultrasound and the lack of uniform operating protocols make manual extraction difficult, and differences in physicians' skill and experience introduce unavoidable subjective factors, lowering the accuracy of the final diagnosis.
Disclosure of Invention
The invention aims to provide a contextual attention-based NAFLD ultrasound video diagnosis method that extracts the key-frame images of an ultrasound video through self-learning, avoids the influence of subjective factors, makes full use of the video's contextual information, and improves the accuracy of NAFLD ultrasound video diagnosis.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
a NAFLD ultrasonic video diagnosis method based on context attention comprises the following steps:
s1, acquiring liver ultrasonic videos, and dividing the liver ultrasonic videos into a training set and a testing set;
s2, preprocessing a liver ultrasonic video, performing sparse sampling with 5 frames as intervals, and cutting and scaling an acquired image to obtain an image with 224 multiplied by 224 resolution;
s3, constructing a context attention model, inputting the images in the step S2 into the context attention model, carrying out group normalization processing, extracting features and key frame images, and obtaining NAFLD ultrasonic video diagnosis results by matching with context information.
Further, the preprocessing in step S2 specifically comprises: acquiring 800×600 images from the liver ultrasound video, applying data augmentation by rotating or translating the images, center-cropping each image to 512×512, and scaling the crop to 224×224.
Further, the contextual attention model in step S3 comprises a feature extraction module, a linear classification module, and a contextual attention module. The feature extraction module encodes the images of the liver ultrasound video and extracts their features; the linear classification module classifies each image at the image level to obtain a classification result; the contextual attention module extracts contextual information from the images and, combined with the features, scores each image to obtain a scoring result. The scoring result is combined with the classification result of the linear classification module to obtain the NAFLD ultrasound video diagnosis result.
Further, the feature extraction module is built from bottleneck units and has 5 stages, with the feature-map resolution halved at each stage after the image is input. Stage 1 uses a 7×7 convolution as the first layer to obtain a large receptive field, followed by a max-pooling layer with stride 2; stages 2-5 use 3, 4, 6, and 3 bottleneck units respectively. The bottleneck units extract high-level image features, and global average pooling yields the corresponding 2048-dimensional feature vector, which serves as the image's code.
Further, the bottleneck unit consists of a stack of convolution layers, each paired with a group normalization layer, plus an identity-mapping branch; each convolution output is followed by a group normalization layer and a ReLU activation layer. The input first passes through a convolution layer that reduces its dimension, followed by the first normalization and an activation that extracts high-level features; a second convolution is followed by the second normalization and an activation while restoring the image's initial dimension; after the third normalization, the result is added to the features output by the identity-mapping branch and activated to give the final bottleneck output.
Further, the bottleneck unit has 3 convolution layers, with kernels of 1×1, 3×3, and 1×1 respectively.
Further, the group normalization formula is specifically as follows:

$$\hat{x}_i = \frac{1}{\sigma_i}\left(x_i - \mu_i\right), \qquad \mu_i = \frac{1}{m}\sum_{k \in S_i} x_k, \qquad \sigma_i = \sqrt{\frac{1}{m}\sum_{k \in S_i}\left(x_k - \mu_i\right)^2 + \epsilon}$$

$$S_i = \left\{\, k \;\middle|\; k_N = i_N,\ \left\lfloor \frac{k_C}{C/G} \right\rfloor = \left\lfloor \frac{i_C}{C/G} \right\rfloor \right\}$$

where $x_i$ is an image feature and $\hat{x}_i$ the normalized feature; $i$ is the dimension index of the feature, written for a two-dimensional image feature map as $i = (i_N, i_C, i_H, i_W)$, with $N$ the batch size, $C$ the channel number, $H$ the height, and $W$ the width of the feature map; $\mu_i$ is the mean and $\sigma_i$ the standard deviation; $S_i$ is the subset of pixels used to compute the mean and variance and $m$ is its size; $\epsilon$ is a small positive constant close to 0; $G$ is the number of groups and $C/G$ the number of channels normalized per group.
Further, the linear classification module classifies, through a linear classifier, the probability of each image being NAFLD. It comprises a fully connected layer and a Sigmoid activation layer: the fully connected layer performs a weighted fusion of the extracted features across all dimensions to obtain a one-dimensional scalar, the linear classifier separates key features from useless features, and the Sigmoid activation layer normalizes the output to a classification result in [0, 1], the probability that the image belongs to NAFLD.
Further, the contextual attention module comprises a Bi-LSTM layer, a fully connected layer, and a softmax activation layer. The Bi-LSTM layer extracts contextual information between images, the fully connected layer maps its hidden states to one-dimensional scalars representing the importance of each image, and the softmax activation layer normalizes the importance over all images to produce the scoring result.
Further, the Bi-LSTM layer comprises a forward LSTM and a backward LSTM; the hidden state mapped to the one-dimensional scalar is obtained by concatenating the hidden state output by the forward LSTM with the hidden state output by the backward LSTM.
Compared with the prior art, the invention has the following advantages:
1. The method samples the liver ultrasound video to obtain images, preprocesses them into scaled images suitable for the contextual attention model, inputs them into the model, extracts and classifies their features to obtain key-frame images, and combines the analysis with contextual information to obtain the NAFLD ultrasound video diagnosis result. Extracting the key-frame images of the ultrasound video through self-learning avoids the influence of subjective factors, makes full use of the video's contextual information, and improves the accuracy of NAFLD ultrasound video diagnosis.
2. The bottleneck units of the feature extraction module encode the images of the liver ultrasound video and extract their features. A group normalization layer is added after the convolution layers to keep the data distribution of each convolution layer consistent, and reducing the feature dimension before extraction cuts the number of parameters required, effectively avoiding the risk of overfitting. The linear classification module classifies each image at the image level into key-frame images and useless-frame images, so the contextual attention model focuses on the key frames and ignores the useless ones.
3. The contextual attention module extracts the contextual information of the liver ultrasound images and, combined with the features, scores each image's importance; the scoring result is combined with the classification result of the linear classification module to obtain the NAFLD ultrasound video diagnosis. The Bi-LSTM layer takes both preceding and following units into account, and concatenating the hidden state output by the forward LSTM with that output by the backward LSTM yields more reliable temporal features.
Drawings
FIG. 1 is a schematic flow chart of the present invention;
FIG. 2 is a schematic diagram of a specific flow chart of the present invention;
FIG. 3 is a schematic diagram of the bottleneck unit structure of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Examples
The invention discloses a contextual attention-based NAFLD ultrasound video diagnosis method which, as shown in FIGS. 1 to 3, comprises the following steps:
s1, acquiring liver ultrasonic videos, and dividing the liver ultrasonic videos into a training set and a testing set.
S2, preprocessing the liver ultrasonic video, performing sparse sampling with 5 frames as intervals, and cutting and scaling the acquired image to obtain an image with 224 multiplied by 224 resolution.
S3, constructing a context attention model, inputting the images in the step S2 into the context attention model, carrying out group normalization processing, extracting features and key frame images, and obtaining NAFLD ultrasonic video diagnosis results by matching with context information.
The preprocessing in step S2 specifically comprises: acquiring 800×600 images from the liver ultrasound video, applying data augmentation by rotating or translating the images, center-cropping each image to 512×512, and scaling the crop to 224×224.
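By way of illustration, the following minimal sketch (not part of the patent's disclosure) implements the sampling, cropping, and scaling just described using OpenCV; the function name and parameters are assumptions for illustration only.

```python
# Illustrative sketch: sample every 5th frame of a liver ultrasound video,
# center-crop the 800x600 frame to 512x512, then resize to 224x224.
import cv2
import numpy as np

def preprocess_video(path, interval=5, crop=512, out_size=224):
    cap = cv2.VideoCapture(path)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()              # frame expected to be 600x800x3 (HxWx3)
        if not ok:
            break
        if idx % interval == 0:             # sparse sampling at 5-frame intervals
            h, w = frame.shape[:2]
            top, left = (h - crop) // 2, (w - crop) // 2
            patch = frame[top:top + crop, left:left + crop]   # central 512x512 region
            patch = cv2.resize(patch, (out_size, out_size))   # scale to 224x224
            frames.append(patch)
        idx += 1
    cap.release()
    return np.stack(frames)                 # (T, 224, 224, 3) sampled images
```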
With reference to FIGS. 1 and 2, the contextual attention model in step S3 comprises a feature extraction module, a linear classification module, and a contextual attention module. The feature extraction module encodes the images of the liver ultrasound video and extracts their features; the linear classification module classifies each image at the image level to obtain a classification result; the contextual attention module extracts contextual information from the images and, combined with the features, scores each image to obtain a scoring result, which is combined with the classification result of the linear classification module to obtain the NAFLD ultrasound video diagnosis result.
The feature extraction module is built from bottleneck units and is based on ResNet50, with a group normalization layer added after each convolution layer of the bottleneck units to keep the input data distribution of each layer stable. The backbone has 5 stages, with the feature-map resolution halved at each stage after the image is input. Stage 1 uses a 7×7 convolution as the first layer to obtain a large receptive field, followed by a max-pooling layer with stride 2; stages 2-5 use 3, 4, 6, and 3 bottleneck units respectively. The bottleneck units extract high-level image features, and global average pooling yields the corresponding 2048-dimensional feature vector, which serves as the image's code.
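This backbone can be sketched in PyTorch as follows; swapping ResNet50's BatchNorm layers for GroupNorm via torchvision's `norm_layer` hook is one plausible realization, and the group count of 32 is an assumption not stated in the patent.

```python
# Illustrative sketch: ResNet50 backbone with GroupNorm in place of BatchNorm,
# truncated before the classification head so global average pooling yields a
# 2048-d code per frame.
import torch
import torch.nn as nn
from functools import partial
from torchvision.models import resnet50

norm = partial(nn.GroupNorm, 32)        # 32 groups per normalized layer (assumed)
backbone = resnet50(norm_layer=norm)    # stages 2-5 use 3, 4, 6, 3 bottlenecks
backbone.fc = nn.Identity()             # keep the 2048-d globally pooled feature

frames = torch.randn(16, 3, 224, 224)   # T = 16 sampled frames
features = backbone(frames)             # (16, 2048) per-frame image codes
```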
Referring to FIG. 3, the bottleneck unit consists of a stack of convolution layers, each paired with a group normalization layer, plus an identity-mapping branch; each convolution output is followed by a group normalization layer and a ReLU activation layer. The input first passes through a convolution layer that reduces its dimension, followed by the first normalization and an activation that extracts high-level features; a second convolution is followed by the second normalization and an activation while restoring the image's initial dimension; after the third normalization, the result is added to the features output by the identity-mapping branch and activated to give the final bottleneck output.
The bottleneck unit has 3 convolution layers, with kernels (Conv) of 1×1, 3×3, and 1×1 respectively.
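A from-scratch sketch of such a bottleneck unit, under the same assumptions (PyTorch, 32 groups, stride-1 case with no projection branch), might look like this:

```python
# Illustrative sketch: 1x1 -> 3x3 -> 1x1 convolution stack, each conv followed
# by GroupNorm, with an identity branch added before the final ReLU.
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    def __init__(self, channels, mid, groups=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, mid, 1, bias=False),        # 1x1: reduce dimension
            nn.GroupNorm(groups, mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, padding=1, bias=False),  # 3x3: extract features
            nn.GroupNorm(groups, mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1, bias=False),        # 1x1: restore dimension
            nn.GroupNorm(groups, channels),                 # third normalization
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + x)  # add identity branch, then activate
```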
The group normalization formula is specifically as follows:

$$\hat{x}_i = \frac{1}{\sigma_i}\left(x_i - \mu_i\right), \qquad \mu_i = \frac{1}{m}\sum_{k \in S_i} x_k, \qquad \sigma_i = \sqrt{\frac{1}{m}\sum_{k \in S_i}\left(x_k - \mu_i\right)^2 + \epsilon}$$

$$S_i = \left\{\, k \;\middle|\; k_N = i_N,\ \left\lfloor \frac{k_C}{C/G} \right\rfloor = \left\lfloor \frac{i_C}{C/G} \right\rfloor \right\}$$

where $x_i$ is an image feature and $\hat{x}_i$ the normalized feature; $i$ is the dimension index of the feature, written for a two-dimensional image feature map as $i = (i_N, i_C, i_H, i_W)$, with $N$ the batch size, $C$ the channel number, $H$ the height, and $W$ the width of the feature map; $\mu_i$ is the mean and $\sigma_i$ the standard deviation; $S_i$ is the subset of pixels used to compute the mean and variance and $m$ is its size; $\epsilon$ is a small positive constant close to 0; $G$ is the number of groups and $C/G$ the number of channels normalized per group.
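The formula can be checked numerically against PyTorch's built-in implementation; the shapes below are illustrative only (the affine scale and shift are omitted, as in the formula).

```python
# Illustrative sketch: group normalization computed directly from the formula
# and verified against torch.nn.functional.group_norm.
import torch
import torch.nn.functional as F

N, C, H, W, G, eps = 2, 8, 4, 4, 4, 1e-5
x = torch.randn(N, C, H, W)

g = x.view(N, G, C // G, H, W)                      # split channels into G groups
mu = g.mean(dim=(2, 3, 4), keepdim=True)            # per-group mean over m pixels
var = g.var(dim=(2, 3, 4), unbiased=False, keepdim=True)
x_hat = ((g - mu) / torch.sqrt(var + eps)).view(N, C, H, W)

assert torch.allclose(x_hat, F.group_norm(x, G, eps=eps), atol=1e-5)
```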
The feature extraction module has two advantages. First, with liver ultrasound video annotations scarce and difficult to obtain, using ResNet50 as the base network (backbone) for feature extraction allows transfer learning from weights pre-trained on large datasets, extracting features with strong expressive power. Second, video data occupies substantial memory and video lengths vary, which limits the batch size; moreover, frames within a video are highly similar, so the in-batch mean and variance deviate considerably. Group normalization effectively addresses these problems and maintains high performance and robustness even with small batch sizes.
The linear classification module classifies, through a linear classifier, the probability of each image being NAFLD. It comprises a fully connected layer and a Sigmoid activation layer. The fully connected layer performs a weighted fusion of the features extracted by the feature extraction module across all dimensions, learning a linear mapping $W \in \mathbb{R}^{1 \times D}$ to obtain the one-dimensional scalar $WF_t$; the linear classifier separates key features from useless features; and the Sigmoid activation layer normalizes the result to a classification result in [0, 1], the probability that the image belongs to NAFLD, representing the final probability value:

$$p_t = \sigma\left(WF_t + b\right)$$

where $b$ is a constant term and $\sigma$ is the Sigmoid function.
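A minimal sketch of this classification head, assuming the 2048-dimensional codes produced by the backbone, follows.

```python
# Illustrative sketch: fully connected layer maps each 2048-d frame feature
# F_t to the scalar WF_t + b; Sigmoid squashes it to p_t, the frame-level
# probability of NAFLD.
import torch
import torch.nn as nn

classifier = nn.Sequential(
    nn.Linear(2048, 1),       # learned W in R^{1 x D} plus constant term b
    nn.Sigmoid(),             # normalize to a probability in [0, 1]
)

features = torch.randn(16, 2048)       # per-frame codes from the backbone
p = classifier(features).squeeze(-1)   # (16,) frame-level NAFLD probabilities
```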
The contextual attention module comprises a Bi-LSTM layer, a fully connected layer, and a softmax activation layer. The Bi-LSTM layer extracts the contextual information between images, i.e. their contextual temporal information. The fully connected layer learns a linear mapping $W_a \in \mathbb{R}^{1 \times D/2}$ that maps the hidden states to one-dimensional scalars representing each image's importance, and the softmax activation layer normalizes the importance over all images to produce the scoring result:

$$a_t = \frac{\exp\left(W_a h_t\right)}{\sum_{k=1}^{T} \exp\left(W_a h_k\right)}$$

The Bi-LSTM layer comprises a forward LSTM and a backward LSTM; the hidden state mapped to the one-dimensional scalar is obtained by concatenating (Concat) the hidden state output by the forward LSTM with that output by the backward LSTM. Based on the feature vector $F_t$ of each frame image, the hidden features containing contextual information further extracted by the Bi-LSTM layer are:

$$\overrightarrow{h_t} = \overrightarrow{\mathrm{LSTM}}\left(F_t, \overrightarrow{h_{t-1}}; \overrightarrow{\theta}\right), \qquad \overleftarrow{h_t} = \overleftarrow{\mathrm{LSTM}}\left(F_t, \overleftarrow{h_{t+1}}; \overleftarrow{\theta}\right), \qquad h_t = \left[\overrightarrow{h_t}; \overleftarrow{h_t}\right]$$

where $\overrightarrow{\mathrm{LSTM}}$ with parameters $\overrightarrow{\theta}$ is the forward LSTM ($t$ running from 1 to $T$) and $\overleftarrow{\mathrm{LSTM}}$ with parameters $\overleftarrow{\theta}$ is the backward LSTM ($t$ running from $T$ to 1); $\overrightarrow{h_t}$ is the hidden state output by the forward LSTM, $\overleftarrow{h_t}$ the hidden state output by the backward LSTM, $h_t$ the concatenated hidden state, and $h_{t-1}$ the hidden state at the previous time step.
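A plausible PyTorch sketch of this module follows; the per-direction hidden size of 512 is chosen so that the concatenated state has dimension $D/2 = 1024$, matching $W_a \in \mathbb{R}^{1 \times D/2}$, but it is an assumption rather than a value given in the patent.

```python
# Illustrative sketch: Bi-LSTM extracts context between frames, a fully
# connected layer maps each concatenated hidden state h_t to a scalar
# importance, and softmax normalizes the scores over all frames.
import torch
import torch.nn as nn

class ContextAttention(nn.Module):
    def __init__(self, feat_dim=2048, hidden=512):
        super().__init__()
        self.bilstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                              bidirectional=True)  # forward + backward LSTM
        self.score = nn.Linear(2 * hidden, 1)      # W_a in R^{1 x D/2}

    def forward(self, feats):                      # feats: (B, T, 2048)
        h, _ = self.bilstm(feats)                  # h_t = [h_fwd ; h_bwd], (B, T, 1024)
        a = self.score(h).squeeze(-1)              # (B, T) importance logits
        return torch.softmax(a, dim=1)             # scores a_t sum to 1 per video
```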
The advantages of the contextual attention module are twofold. First, it assists the linear classification module: not every image contains the key information needed to judge NAFLD, and the classification results of some images are meaningless, so the attention module scores each frame's importance; frames with higher scores are treated as key-frame images for the contextual attention model to focus on, while low-scoring useless frames are ignored, yielding a reliable and accurate video-level classification result. Second, the extraction of contextual information plays an important role in scoring: combining context helps judge whether each frame belongs to the key-frame images, mitigating the impact of low ultrasound image quality.
The classification result $p_t$ of each frame output by the linear classification module and the importance score $a_t$ of each frame output by the contextual attention module are combined, as the attention-weighted sum of the frame-level probabilities, to obtain the NAFLD ultrasound video diagnosis result $\hat{y}$:

$$\hat{y} = \sum_{t=1}^{T} a_t\, p_t$$

The modules of the contextual attention model are jointly optimized through the cross-entropy loss function, computed from the diagnosis result $\hat{y}$ of each video and the ground-truth label $y$:

$$L = -\frac{1}{n}\sum_{j=1}^{n}\left[y_j \log \hat{y}_j + \left(1 - y_j\right)\log\left(1 - \hat{y}_j\right)\right]$$

where $n$ is the number of liver ultrasound videos in the training set. Only video-level annotations are needed to compute the cross-entropy loss; backpropagation then distributes the gradient simultaneously to every frame image of the same video, so the contextual attention model does not depend on key-frame annotations.
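A sketch of this fusion and training objective is given below; the explicit weighted-sum form is inferred from the description above rather than quoted from the patent.

```python
# Illustrative sketch: video-level diagnosis as the attention-weighted sum of
# frame probabilities, trained with binary cross-entropy against the
# video-level label only.
import torch
import torch.nn.functional as F

def video_loss(p, a, y):
    """p: (B, T) frame NAFLD probabilities; a: (B, T) softmax-normalized
    attention scores; y: (B,) video-level labels in {0, 1}."""
    y_hat = (a * p).sum(dim=1)                       # attention-weighted diagnosis
    return F.binary_cross_entropy(y_hat, y.float())  # gradient flows to every frame
```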
Experimental evaluation
Samples are extracted from the acquired liver ultrasound videos to evaluate the NAFLD video diagnosis performance, measured by accuracy, specificity, sensitivity, and AUC. The NAFLD diagnosis confusion matrix is shown in Table 1:
                           NAFLD patient (true value)   Normal (true value)
NAFLD patient (predicted)  true positives (TP)          false positives (FP)
Normal (predicted)         false negatives (FN)         true negatives (TN)

Table 1. NAFLD diagnosis confusion matrix

Accuracy represents the proportion of correctly predicted samples among all samples:

$$\mathrm{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Specificity represents the proportion of true negatives among the truly negative samples (the higher the specificity, the less prone the model is to false detection):

$$\mathrm{specificity} = \frac{TN}{TN + FP}$$

Sensitivity represents the proportion of true positives among the truly positive samples (the higher the sensitivity, the less prone the model is to missed detection):

$$\mathrm{sensitivity} = \frac{TP}{TP + FN}$$

The AUC is the area under the ROC curve, whose abscissa and ordinate are the false positive rate (FPR) and the true positive rate (TPR) respectively:

$$\mathrm{FPR} = \frac{FP}{FP + TN}, \qquad \mathrm{TPR} = \frac{TP}{TP + FN}$$

The ROC curve depicts how the TPR varies with the FPR at different thresholds; a larger area under the curve (AUC) indicates stronger discrimination ability of the model.
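These metrics can be computed directly from the confusion-matrix counts, for example as follows (labels and scores are illustrative).

```python
# Illustrative sketch: accuracy, specificity, and sensitivity from confusion-
# matrix counts; AUC from per-video scores via scikit-learn.
from sklearn.metrics import roc_auc_score

def metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    specificity = tn / (tn + fp)      # higher -> fewer false detections
    sensitivity = tp / (tp + fn)      # higher -> fewer missed detections
    return accuracy, specificity, sensitivity

# AUC is computed from per-video scores rather than hard decisions:
y_true = [1, 0, 1, 1, 0]              # illustrative video-level labels
y_score = [0.9, 0.2, 0.7, 0.4, 0.3]   # illustrative model outputs
auc = roc_auc_score(y_true, y_score)
```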
Two commonly used feature extraction networks, VGG16 and InceptionV3, were chosen for comparison to demonstrate the effectiveness of ResNet50; the results are shown in Table 2 below:
Methods Accuracy Specificity Sensitivity AUC
VGG16 0.7838 0.7763 0.7973 0.8510
InceptionV3 0.8108 0.7949 0.8378 0.8721
ResNet50 0.8243 0.8077 0.8513 0.9035
Table 2. ResNet50 effectiveness comparison
ResNet50 is superior to feature extraction networks VGG16 and InceptionV3 in accuracy, specificity, sensitivity, and AUC values.
With ResNet50 as the base network, the performance of the contextual attention model using group normalization was compared with that using batch normalization; the results are shown in Table 3 below:
Table 3. Group normalization vs. batch normalization comparison
Group normalization is superior to batch normalization in accuracy, specificity, sensitivity, and AUC values.
The effectiveness of introducing contextual information and the Bi-LSTM layer was evaluated through ablation of the contextual attention model; the results are shown in Table 4 below:
Table 4. Contextual attention module effectiveness comparison
The contextual attention model with both the Bi-LSTM layer and the contextual attention module achieves the best accuracy, specificity, sensitivity, and AUC.
The contextual attention model (CAN) was compared with a CNN model and a CNN+SVM model to verify its effectiveness and feasibility; the results are shown in Table 5 below:
Methods Accuracy Specificity Sensitivity AUC
CNN+SVM 0.8243 0.8333 0.8108 0.8824
CNN 0.8311 0.8181 0.8513 0.9018
CAN 0.8243 0.8077 0.8513 0.9035
Table 5. Comparison of the contextual attention model (CAN) with the CNN and CNN+SVM models
Compared with the CNN+SVM model, the contextual attention model (CAN) has the same accuracy, improves sensitivity and AUC by 4.05 and 2.11 percentage points respectively, and reduces specificity by 2.56 percentage points. Compared with the CNN model, its accuracy and specificity are slightly lower by 0.68 and 1.04 percentage points, its sensitivity is the same, and its AUC is 0.17 percentage points higher. Diagnosing NAFLD directly from liver ultrasound video is more difficult, and a CNN is easily disturbed by the many useless frames; the contextual attention model (CAN) avoids this interference by extracting key-frame images. Moreover, the video contains rich contextual information that a single key-frame image lacks, so the contextual attention model (CAN) performs better when applied to liver ultrasound video.
The present invention is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present invention are intended to be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims (9)

1. A contextual attention-based NAFLD ultrasound video diagnosis method, comprising the following steps:
S1, acquiring liver ultrasound videos and dividing them into a training set and a test set;
S2, preprocessing the liver ultrasound videos by sparse sampling at 5-frame intervals, then cropping and scaling the sampled images to 224×224 resolution;
S3, constructing a contextual attention model, inputting the images from step S2 into the contextual attention model, applying group normalization, extracting features and key-frame images, and combining them with contextual information to obtain the NAFLD ultrasound video diagnosis result;
wherein the contextual attention model in step S3 comprises a feature extraction module, a linear classification module, and a contextual attention module; the feature extraction module encodes the images of the liver ultrasound video and extracts their features; the linear classification module classifies each image at the image level to obtain a classification result; the contextual attention module extracts contextual information from the images and, combined with the features, scores each image to obtain a scoring result; and the scoring result is combined with the classification result of the linear classification module to obtain the NAFLD ultrasound video diagnosis result.
2. The contextual attention-based NAFLD ultrasound video diagnosis method of claim 1, wherein the preprocessing in step S2 specifically comprises: acquiring 800×600 images from the liver ultrasound video, applying data augmentation by rotating or translating the images, center-cropping each image to 512×512, and scaling the crop to 224×224.
3. The contextual attention-based NAFLD ultrasound video diagnosis method of claim 1, wherein the feature extraction module is built from bottleneck units and has 5 stages, with the feature-map resolution halved at each stage after the image is input; stage 1 uses a 7×7 convolution as the first layer to obtain a large receptive field, followed by a max-pooling layer with stride 2; stages 2-5 use 3, 4, 6, and 3 bottleneck units respectively; the bottleneck units extract high-level image features, and global average pooling yields the corresponding 2048-dimensional feature vector, which serves as the image's code.
4. The contextual attention-based NAFLD ultrasound video diagnosis method of claim 3, wherein the bottleneck unit consists of a stack of convolution layers, each paired with a group normalization layer, plus an identity-mapping branch; each convolution output is followed by a group normalization layer and a ReLU activation layer; the input first passes through a convolution layer that reduces its dimension, followed by the first normalization and an activation that extracts high-level features; a second convolution is followed by the second normalization and an activation while restoring the image's initial dimension; and after the third normalization, the result is added to the features output by the identity-mapping branch and activated to give the final bottleneck output.
5. The contextual attention-based NAFLD ultrasound video diagnosis method of claim 4, wherein the bottleneck unit has 3 convolution layers, with kernels of 1×1, 3×3, and 1×1 respectively.
6. The contextual attention-based NAFLD ultrasound video diagnosis method of claim 4, wherein the group normalization formula is specifically:

$$\hat{x}_i = \frac{1}{\sigma_i}\left(x_i - \mu_i\right), \qquad \mu_i = \frac{1}{m}\sum_{k \in S_i} x_k, \qquad \sigma_i = \sqrt{\frac{1}{m}\sum_{k \in S_i}\left(x_k - \mu_i\right)^2 + \epsilon}$$

$$S_i = \left\{\, k \;\middle|\; k_N = i_N,\ \left\lfloor \frac{k_C}{C/G} \right\rfloor = \left\lfloor \frac{i_C}{C/G} \right\rfloor \right\}$$

where $x_i$ is an image feature and $\hat{x}_i$ the normalized feature; $i$ is the dimension index of the feature, written for a two-dimensional image feature map as $i = (i_N, i_C, i_H, i_W)$, with $N$ the batch size, $C$ the channel number, $H$ the height, and $W$ the width of the feature map; $\mu_i$ is the mean and $\sigma_i$ the standard deviation; $S_i$ is the subset of pixels used to compute the mean and variance and $m$ is its size; $\epsilon$ is a small positive constant close to 0; $G$ is the number of groups and $C/G$ the number of channels normalized per group.
7. The contextual attention-based NAFLD ultrasound video diagnosis method of claim 1, wherein the linear classification module classifies, through a linear classifier, the probability of each image being NAFLD; the linear classification module comprises a fully connected layer and a Sigmoid activation layer; the fully connected layer performs a weighted fusion of the features extracted by the feature extraction module across all dimensions to obtain a one-dimensional scalar; the linear classifier separates key features from useless features; and the Sigmoid activation layer normalizes the result to a classification result in [0, 1], the probability that the image belongs to NAFLD.
8. The contextual attention-based NAFLD ultrasound video diagnosis method of claim 1, wherein the contextual attention module comprises a Bi-LSTM layer, a fully connected layer, and a softmax activation layer; the Bi-LSTM layer extracts contextual information between images; the fully connected layer maps its hidden states to one-dimensional scalars representing the importance of each image; and the softmax activation layer normalizes the importance over all images, yielding the scoring result.
9. The contextual attention-based NAFLD ultrasound video diagnosis method of claim 8, wherein the Bi-LSTM layer comprises a forward LSTM and a backward LSTM, and the hidden state mapped to the one-dimensional scalar is obtained by concatenating the hidden state output by the forward LSTM with the hidden state output by the backward LSTM.
CN202010923741.2A 2020-09-04 2020-09-04 NAFLD ultrasonic video diagnosis method based on context attention Active CN112085742B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010923741.2A CN112085742B (en) 2020-09-04 2020-09-04 NAFLD ultrasonic video diagnosis method based on context attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010923741.2A CN112085742B (en) 2020-09-04 2020-09-04 NAFLD ultrasonic video diagnosis method based on context attention

Publications (2)

Publication Number Publication Date
CN112085742A (en) 2020-12-15
CN112085742B (en) 2024-04-16

Family

ID=73731455

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010923741.2A Active CN112085742B (en) 2020-09-04 2020-09-04 NAFLD ultrasonic video diagnosis method based on context attention

Country Status (1)

Country Link
CN (1) CN112085742B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114091507B (en) * 2021-09-02 2022-07-29 北京医准智能科技有限公司 Ultrasonic focus region detection method, device, electronic equipment and storage medium
CN117197107A (en) * 2023-09-21 2023-12-08 脉得智能科技(无锡)有限公司 Mammary gland ultrasonic video diagnosis system based on double-branch network

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109101896A (en) * 2018-07-19 2018-12-28 电子科技大学 A kind of video behavior recognition methods based on temporal-spatial fusion feature and attention mechanism
CN110020682A (en) * 2019-03-29 2019-07-16 北京工商大学 A kind of attention mechanism relationship comparison net model methodology based on small-sample learning
CN111191078A (en) * 2020-01-08 2020-05-22 腾讯科技(深圳)有限公司 Video information processing method and device based on video information processing model
CN111310676A (en) * 2020-02-21 2020-06-19 重庆邮电大学 Video motion recognition method based on CNN-LSTM and attention
CN111340794A (en) * 2020-03-09 2020-06-26 中山大学 Method and device for quantifying coronary artery stenosis

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109101896A (en) * 2018-07-19 2018-12-28 电子科技大学 A kind of video behavior recognition methods based on temporal-spatial fusion feature and attention mechanism
CN110020682A (en) * 2019-03-29 2019-07-16 北京工商大学 A kind of attention mechanism relationship comparison net model methodology based on small-sample learning
CN111191078A (en) * 2020-01-08 2020-05-22 腾讯科技(深圳)有限公司 Video information processing method and device based on video information processing model
CN111310676A (en) * 2020-02-21 2020-06-19 重庆邮电大学 Video motion recognition method based on CNN-LSTM and attention
CN111340794A (en) * 2020-03-09 2020-06-26 中山大学 Method and device for quantifying coronary artery stenosis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Context-dependent question-answer pairing method based on attention mechanism; Wang Lu et al.; Journal of Chinese Information Processing (中文信息学报); 125-132 *

Also Published As

Publication number Publication date
CN112085742A (en) 2020-12-15

Similar Documents

Publication Publication Date Title
CN111985536B (en) Based on weak supervised learning gastroscopic pathology image Classification method
CN112101451B (en) Breast cancer tissue pathological type classification method based on generation of antagonism network screening image block
CN111951288B (en) Skin cancer lesion segmentation method based on deep learning
CN109614869B (en) Pathological image classification method based on multi-scale compression reward and punishment network
CN114565761B (en) Deep learning-based method for segmenting tumor region of renal clear cell carcinoma pathological image
Al-Areqi et al. Effectiveness evaluation of different feature extraction methods for classification of covid-19 from computed tomography images: A high accuracy classification study
CN112085742B (en) NAFLD ultrasonic video diagnosis method based on context attention
US11538577B2 (en) System and method for automated diagnosis of skin cancer types from dermoscopic images
CN114399634B (en) Three-dimensional image classification method, system, equipment and medium based on weak supervision learning
CN111968124A (en) Shoulder musculoskeletal ultrasonic structure segmentation method based on semi-supervised semantic segmentation
Chethan et al. An Efficient Medical Image Retrieval and Classification using Deep Neural Network
CN114140437A (en) Fundus hard exudate segmentation method based on deep learning
CN110992309B (en) Fundus image segmentation method based on deep information transfer network
CN116228759A (en) Computer-aided diagnosis system and apparatus for renal cell carcinoma type
CN116129200A (en) Bronchoscope image benign and malignant focus classification device based on deep learning
CN113255718B (en) Cervical cell auxiliary diagnosis method based on deep learning cascade network method
CN113011514B (en) Intracranial hemorrhage sub-type classification algorithm applied to CT image based on bilinear pooling
Zhu et al. Segmentation network with compound loss function for hydatidiform mole hydrops lesion recognition
CN115049603B (en) Intestinal polyp segmentation method and system based on small sample learning
CN112085718B (en) NAFLD ultrasonic video diagnosis system based on twin attention network
CN113658151B (en) Mammary gland lesion magnetic resonance image classification method, device and readable storage medium
CN117636064B (en) Intelligent neuroblastoma classification system based on pathological sections of children
CN114463277A (en) Method for automatically acquiring qualified four-cavity heart section image in fetal heart ultrasonic video
Dutta et al. Abnormality Detection and Segmentation in Breast Digital Mammography Images Using Neural Network
Chitra et al. Prediction Models Applying Convolutional Neural Network based Deep Learning to Cervical Cancer Outcomes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant