CN114998647B - Breast cancer full-size pathological image classification method based on attention multi-instance learning - Google Patents
- Publication number: CN114998647B (Application CN202210526657.6A)
- Authority
- CN
- China
- Prior art keywords
- network
- stage
- full
- instances
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS; G06N3/00—Computing arrangements based on biological models; G06N3/02—Neural networks
  - G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
  - G06N3/08—Learning methods
Abstract
The breast cancer full-size pathological image classification method based on attention multi-instance learning comprises the following steps: step 1: acquiring a data set and labels; step 2: preprocessing the data set; step 3: constructing a two-stage full-size pathological image (WSI) classification network; step 4: saving the optimal weights of the two-stage network; step 5: calculating the accuracy of the network on the test set. The SAMIL of the present invention introduces a lightweight and efficient SA module that fuses spatial attention and channel attention, which are used to capture pixel-level pairwise relationships and channel dependencies, respectively. SAMIL stacks MHA with LSTM to adaptively highlight the most distinctive instance features and better model the correlations between selected instances, improving classification accuracy.
Description
Technical Field
The invention relates to the technical field of image classification methods, in particular to a breast cancer full-size pathological image classification method based on attention multi-instance learning.
Background
According to recent global cancer statistics, about 2.3 million new breast cancer cases were diagnosed in women in 2020, and breast cancer surpassed lung cancer to become the most common cancer worldwide. At the same time, the digitization of full-size images (WSI) of hematoxylin-eosin (H&E) stained biopsy specimens provides an exact reference for breast cancer diagnosis.
In recent years, with the breakthrough success of deep learning in various computer vision tasks, computer-aided WSI classification methods for cancer diagnosis have received increasing attention. In particular, some researchers cast WSI classification as a weakly supervised task and introduce multi-instance learning (MIL) to cope with the massive scale of WSIs and the difficulty of pixel-level labeling in fully supervised learning. A MIL solution mainly involves two key steps: first, an instance-level selection module computes the positive probability of each slice-level image from the extracted deep features and takes the top-K slices with the highest probability as candidate instances; second, an aggregation operator generates bag embeddings used to compute the score of each bag.
Although multi-instance learning has made great progress in full-slide pathology image classification, existing methods have the following drawbacks: feature correlations of sub-features are rarely described in the spatial or channel dimensions, which hinders the discovery of cancer cells in micro-metastases of breast cancer lymph nodes; and there are limitations in capturing the dependencies between different instances that help classify a WSI.
Disclosure of Invention
The invention aims to provide a full-size breast cancer pathological image classification method based on attention multi-instance learning, which can acquire a more discriminative patch-level representation and improve the accuracy of classifying pathological images of breast cancer lymph node metastases.
A breast cancer full-size pathological image classification method based on attention multi-instance learning comprises the following steps:
Step 1: acquiring a data set and a label: acquiring a data set and a label of a breast cancer histopathological image, and randomly dividing the breast cancer histopathological image into a training set, a verification set and a test set according to a proportion;
Step 2: preprocessing a data set: preprocessing the divided data set based on inverse binarization thresholding operation, generating a mask of background/tissue area for each WSI picture, cutting the tissue area into slices with a size of a×a, and storing the coordinate set of the slices. In order to further reduce the calculated amount, a probability p is added, when the part of the tissue region in the slice is larger than the probability p, the coordinates of the slice are saved, and the processed WSI image X 'i can be expressed as X' i={xi,1,xi,2…,xi,m }, wherein m is the number of the slices in each full-size breast cancer pathological image;
Step 3: a two-stage full-scale pathology image (WSI) classification network is constructed: the method comprises the steps of selecting an instance in a first stage, extracting features of slices by using an SA-ResNet network, selecting the first K instances with the highest probability in each WSI (wireless sensor array) by using a multi-instance learning method, predicting the full-size level in a second stage, and reliably predicting the whole WSI image by using an aggregator constructed by superposing a multi-head attention (MHA) network and a long short-term memory (LSTM) network;
Step 31: at one stage, the SA-ResNet network performs feature extraction on the slice: taking a slice X ' ∈R C×H×W as the input of a pre-trained SA-ResNet network, obtaining a feature matrix X ε R c×h×w after the residual structure of ResNet, dividing X into G groups along the channel dimension by replacement attention, namely X= [ X 1,…,XG],Xk∈Rc/G×h×w,Xk ] is continuously divided into two branches, namely X k1,Xk2∈Rc/2G×h×w, one branch utilizes the inter-channel correlation, outputting a channel attention map, the other branch utilizes the inter-feature spatial relationship, generating a space attention map, connecting the results of the two branches, enabling the number of channels X ' k to be the same as the number of channels of X k, and then carrying out polymerization operation on all feature matrices X ' k, wherein the final output of the SA module is X out∈Rc×h×w.Xout, and generating the feature vector X gap of the slice through global average pooling.
Step 32: acquiring a small training SA-ResNet network: after the feature vector of each slice is obtained, the probability of each slice is obtained through a Softmax function, the probabilities of the slices in each full-size image are ordered from small to large, and the T small blocks with the top probability rank in each full-size image are taken to train the SA-ResNet network.
Step 33: input V to obtain full-size level prediction: and predicting the slices in each WSI by using a one-stage pre-trained optimal weight file, sequencing the predicted probabilities, and taking the first K instances with the highest probability in each full-size image as the input V= [ V 1,…,vK]∈RK×C ] of full-size level prediction.
Step 34: the first K instances with highest aggregate probability: with MHA and LSTM, for the i-th head attention unit (H i) in MHA, the calculation formula is as follows:
Wherein v= [ V 1,…,vK]∈RK×C, V represents the number of instances of the first K selected instance features, K represents the number of instances, V 1,…,vK represents a single instance feature, V j,vk e V, C is the instance feature embedding dimension, the convolution kernels are W e R D×1 and Z e R D×C, D is the feature embedding dimension. The hyperbolic tangent tanh is the activation function. After element multiplication o, for MHA, another convolution is performed to project back to the original dimension for all outputs of the connector unit:
where Ṽ represents the top-K instances after feature enhancement, V = [v_1, …, v_K] ∈ R^{K×C} denotes the selected top-K instance features, K is the number of instances, v_1, …, v_K are single instance features, W_pro ∈ R^{(H×D)×C} is a convolution kernel, T denotes the transpose of a matrix, H_1, …, H_h are the attention head units, h is the number of heads, and C and D are feature embedding dimensions.
Step 35: further modeling the dependencies between the selected Top-K instances: LSTM is further used to construct interactions and fuse interaction instances to obtain differentiated image level representations. LSTM can capture short-term and long-term dependencies, given an input feature sequence (v 1,…,vK), and the hidden layer of LSTM is recursively calculated from t=1 to t=k using the following formula:
where f_t, i_t, o_t denote the forget gate, input gate, and output gate, respectively; W_{f,i,o,c} and U_{f,i,o,c} are the weight matrices to be learned; b_{f,i,o,c} are bias vectors; h_{t-1} is the hidden vector; c_t is the memory cell; sigmoid and hyperbolic tangent tanh are activation functions. The output of the last LSTM step is used as the final bag-level representation vector for prediction.
Step 4: saving the optimal weight of the two-stage network: inputting the data set into a two-stage classification network, training the one-stage network by adopting a training set, updating network parameters in each iteration, verifying the verification set once every three iterations, storing the optimal weight of the one-stage network according to the accuracy of the optimal verification set, processing the data set by using the optimal weight of the one-stage, selecting K instances with the highest probability rank in each WSI as the input of the two stages, initializing the two-stage network by using the optimal weight of the one-stage, verifying once after finishing one iteration in each training, and storing the optimal weight of the two-stage network according to the accuracy of the optimal verification set;
Step 5: calculating the accuracy of the network on the test set: and initializing a network by using two-stage optimal weights, inputting a test set into the network to obtain a prediction result of each WSI, comparing the prediction result with real tag data, counting the number of WSIs which are correctly predicted and incorrectly predicted, and calculating the accuracy of the network on the test set.
Compared with the prior art, the invention has the following beneficial effects:
(1) SAMIL introduces a lightweight and efficient SA module that fuses spatial attention and channel attention, which are used to capture pixel-level pairwise relationships and channel dependencies, respectively.
(2) SAMIL stacks MHA with LSTM to adaptively highlight the most distinctive instance features and better model the correlations between selected instances, improving classification accuracy.
Drawings
Fig. 1 is an overall frame diagram of the SAMIL model.
Detailed Description
The experimental data used in the present invention come from the lymph node metastasis data set of the 2016 Camelyon Grand Challenge. The data set contains 399 complete full-size images, including both normal and metastatic cases, for the detection of metastases in H&E-stained tissue sections of sentinel lymph nodes of breast cancer patients.
As shown in the schematic diagram of the invention, the two-stage breast cancer full-size pathological image classification method based on attention multi-instance learning comprises the following steps:
Step 1: acquiring the data set and labels: the lymph node metastasis data set is randomly divided into training, verification, and test sets at a ratio of 2:1:1, giving 204 training, 95 verification, and 100 test images.
Step 2: preprocessing a data set: the method is used for preprocessing the divided data set based on inverse binarization thresholding operation, generating a mask of a background/tissue area for each WSI picture, dividing the tissue area into sections with the size of 512 multiplied by 512, and storing coordinate sets of the sections. In order to further reduce the calculated amount, a probability value of 0.4 is added, coordinates of the slice are saved when the part of the tissue area in the slice is larger than 0.4, and the processed WSI image X 'i can be expressed as X' i={xi,1,xi,2…,xi,m, wherein m is the number of the slices in each full-size breast cancer pathological image;
Step 3: a two-stage full-scale pathology image (WSI) classification network is constructed: the method comprises the steps of selecting a first stage for example, extracting features of slices by using an SA-ResNet network, selecting 10 examples with the highest probability in each WSI (wireless sensor array) by using a multi-example learning method, predicting the whole WSI image by using a full-size level prediction model, and reliably predicting the whole WSI image by using an aggregator constructed by superposing a multi-head attention (MHA) network and a long short-term memory (LSTM) network;
Step 31: at one stage, the SA-ResNet network performs feature extraction on the slice: slice x i,j∈R3×512×512 is scaled to 224 x 3 pixels as input to the pre-training SA-ResNet network. The SA module is inserted into each residual stage (e.g., conv2_x) in ResNet-50. The input to SA is the feature matrix X ε R 256×56×56. The SA module first divides X into 64 groups along the channel dimension, i.e., x= [ X 1,…,Xk,…,X64],Xk∈R4×56×56,Xk ] is further divided into two branches, X k1,Xk2∈R2×56×56 respectively, one branch uses the inter-channel relationship, outputs a channel attention pattern X 'k1∈R2×56×56, the other branch uses the inter-feature spatial relationship, generates a spatial attention pattern X' k2∈R2×56×56, connects the two branches to obtain X 'k∈R4×56×56, and then performs an aggregation operation on all feature matrices X' k, and the final output of the SA module is X out∈R256×56×56. The SA modules in conv3_x, conv4_x, conv5_x residual blocks are the same, and the feature vector generated by global average pooling of X out is X gap∈R2048×1×1.
Step 32: acquiring a small training SA-ResNet network: after the feature vector of each slice is obtained, the probability of each slice is obtained through a Softmax function, the probabilities of the slices in each full-size image are ordered from small to large, and 2 small blocks with the highest probability rank in each full-size image are taken to train the SA-ResNet network.
Step 33: input V to obtain full-size level prediction: and predicting the slices in each WSI by using a one-stage pre-trained optimal weight file, sequencing the predicted probabilities, and taking the first 10 instances with the highest probability in each full-size image as input V= [ V 1,…,v10]∈R2048×1 ] of two-stage full-size level prediction.
Step 34: the first K instances with highest aggregate probability: with MHA and LSTM, for the i-th head attention unit in the multi-head attention, the calculation formula is as follows:
Where v= [ V 1,…,v10]∈R10×2048, V denotes the first 10 example features selected, V 1,…,v10 denotes the single example feature, V j,vk e V, and the convolution kernels are W e R 512×1 and Z e R 512×2048. The hyperbolic tangent tanh is the activation function. In element multiplication Thereafter, the key instances are highlighted according to the relationship between them. For MHA, all outputs of the connector unit of the invention, another convolution is performed to project back to the original dimension:
where Ṽ represents the top-10 instances after feature enhancement, V = [v_1, …, v_10] ∈ R^{10×2048} denotes the selected top-10 instance features, v_1, …, v_10 are single instance features, W_pro ∈ R^{(3×512)×2048} is a convolution kernel, T denotes the transpose of a matrix, H_1, …, H_h are the attention head units, and h is the number of heads; in this study h = 3. The multi-head attention recalibrates all instance features from different representation subspaces, enriching the originally selected instances V.
Step 35: further modeling the dependencies between the first 10 selected instances: LSTM is further used to construct interactions and fuse interaction instances to obtain differentiated image level representations. LSTM can capture short-term and long-term dependencies, given an input feature sequence (v 1,…,v10), the hidden layer of LSTM is recursively calculated from t=1 to t=10 using the following formula: :
where f_t, i_t, o_t denote the forget gate, input gate, and output gate, respectively; W_{f,i,o,c} and U_{f,i,o,c} are the weight matrices to be learned; b_{f,i,o,c} are bias vectors; h_t is the hidden vector; c_t is the memory cell; sigmoid and hyperbolic tangent tanh are activation functions. In the feature fusion module, the invention stacks two LSTM layers so that the enhanced instances can interact more fully. The output of the last LSTM step is used as the final bag-level representation vector for prediction.
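The LSTM aggregation above can be illustrated with a single NumPy step implementing the gate structure this section names (forget, input, and output gates plus the memory cell). The dimensions and random weights below are illustrative only, not the trained parameters of the invention.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(v_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b hold the parameters for the forget (f),
    input (i), output (o) gates and the cell candidate (c)."""
    f = sigmoid(W["f"] @ v_t + U["f"] @ h_prev + b["f"])   # forget gate
    i = sigmoid(W["i"] @ v_t + U["i"] @ h_prev + b["i"])   # input gate
    o = sigmoid(W["o"] @ v_t + U["o"] @ h_prev + b["o"])   # output gate
    c_hat = np.tanh(W["c"] @ v_t + U["c"] @ h_prev + b["c"])
    c_t = f * c_prev + i * c_hat                           # memory cell update
    h_t = o * np.tanh(c_t)                                 # hidden vector
    return h_t, c_t

rng = np.random.default_rng(0)
D_in, D_h, K = 8, 4, 10                                    # toy dimensions
W = {k: rng.standard_normal((D_h, D_in)) for k in "fioc"}
U = {k: rng.standard_normal((D_h, D_h)) for k in "fioc"}
b = {k: np.zeros(D_h) for k in "fioc"}
h, c = np.zeros(D_h), np.zeros(D_h)
for t in range(K):                # fold the K enhanced instances in sequence
    h, c = lstm_step(rng.standard_normal(D_in), h, c, W, U, b)
# The final h plays the role of the bag-level representation vector.
assert h.shape == (D_h,)
```

Feeding the enhanced instances through the recurrence one by one is what lets the aggregator model order-dependent interactions among the selected top instances.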
Step 4: saving the optimal weight of the two-stage network: inputting the data set into a two-stage classification network, training the one-stage network by adopting a training set, updating network parameters in each iteration, verifying the verification set once every three iterations, storing the optimal weight of the one-stage network according to the accuracy of the optimal verification set, and during the training process, using an Adam optimizer to relieve the gradient vibration problem, wherein the learning rate is set to be 1e-4, and the weight attenuation is set to be 1e-5. Processing the data set by using one-stage optimal weights, selecting the 10 instances with the highest probability ranking in each WSI as two-stage inputs, initializing a two-stage network by using the one-stage optimal weights, setting the learning rate to be 1e-4 and the weight attenuation to be 1e-4 by using an Adam optimizer in the two-stage training process, performing 1 verification after each training is completed for 1 iteration, and storing the optimal weights of the two-stage network according to the accuracy of the optimal verification set;
Step 5: calculating the accuracy of the network on the test set: and initializing a network by using two-stage optimal weights, inputting a test set into the network to obtain a prediction result of each WSI, comparing the prediction result with 100 real label data of the test set, and counting the number of WSIs which are correctly predicted and incorrectly predicted so as to calculate SAMIL accuracy rate on the test set.
Following the above steps, the invention provides a novel SAMIL model for the breast cancer WSI classification task. SAMIL uses a shuffle attention (SA) module to select discriminative instances and uses multi-head attention (MHA) stacked with LSTM to implement bag-level prediction, thus exploring the benefits of attention mechanisms for solving the MIL problem. In addition, the experimental results show that, compared with the most advanced MIL methods, the method performs excellently on the Camelyon data set, with an accuracy of up to 96.56%.
Claims (1)
1. The breast cancer full-size pathological image classification method based on attention multi-instance learning is characterized by comprising the following steps: step 1: acquiring a data set and labels: acquire the data set and labels of breast cancer histopathological images, and randomly divide them into a training set, a verification set, and a test set according to a proportion; step 2: preprocessing the data set: preprocess the divided data set with an inverse binarization thresholding operation, generate a background/tissue mask for each WSI image, cut the tissue region into slices of size a×a, and store the slice coordinate set; to further reduce computation, a threshold probability p is introduced, and the coordinates of a slice are saved when the proportion of tissue region in the slice exceeds p; the processed WSI image X'_i can be expressed as X'_i = {x_{i,1}, x_{i,2}, …, x_{i,m}}, where m is the number of slices in each full-size breast cancer pathological image; step 3: constructing a two-stage full-size pathological image (WSI) classification network: the first stage performs instance selection, using an SA-ResNet network to extract slice features and a multi-instance learning method to select the top-K instances with the highest probability in each WSI; the second stage performs full-size-level prediction, using an aggregator built by stacking a multi-head attention (MHA) network and a long short-term memory (LSTM) network to reliably predict the whole WSI image; step 4: saving the optimal weights of the two-stage network: input the data set into the two-stage classification network; train the first-stage network on the training set, updating the network parameters at each iteration and validating on the verification set once every three iterations; save the optimal first-stage weights according to the best verification accuracy; process the data set with the optimal first-stage weights, select the K instances with the highest probability rank in each WSI as the second-stage input, initialize the second-stage network with the optimal first-stage weights, validate once after each training iteration, and save the optimal second-stage weights according to the best verification accuracy; step 5: calculating the accuracy of the classification network on the test set: initialize the network with the optimal two-stage weights, input the test set into the classification network to obtain the prediction result of each WSI, compare the predictions with the real label data, count the numbers of correctly and incorrectly predicted WSIs, and calculate the accuracy of the classification network on the test set; in step 3, step 31: in the first stage, the SA-ResNet network extracts slice features: a slice x' ∈ R^{C×H×W} is taken as the input of the pre-trained SA-ResNet network; after the residual structure of ResNet, a feature matrix X ∈ R^{c×h×w} is obtained; shuffle attention divides X into G groups along the channel dimension, i.e., X = [X_1, …, X_G], X_k ∈ R^{c/G×h×w}; each X_k is further split into two branches X_{k1}, X_{k2} ∈ R^{c/2G×h×w}: one branch exploits the inter-channel correlation and outputs a channel attention map, and the other exploits the spatial relationship between features and generates a spatial attention map; the results of the two branches are concatenated so that X'_k has the same number of channels as X_k; all feature matrices X'_k are then aggregated, and the final output of the SA module is X_out ∈ R^{c×h×w}; X_out is passed through global average pooling to generate the slice feature vector X_gap; step 32: selecting top patches to train the SA-ResNet network: after the feature vector of each slice is obtained, the probability of each slice is computed with a Softmax function; the slice probabilities within each full-size image are sorted, and the T patches with the highest probability in each full-size image are taken to train the SA-ResNet network; step 33: obtaining the input V for full-size-level prediction: predict the slices in each WSI with the optimal weight file pre-trained in the first stage, sort the predicted probabilities, and take the top-K instances with the highest probability in each full-size image as the input V = [v_1, …, v_K] ∈ R^{K×C} of full-size-level prediction; step 34: aggregating the top-K instances with the highest probability: using MHA and LSTM, for the i-th attention head unit (H_i) in MHA, the calculation formula is as follows:
Where v= [ V 1,…,vK]∈RK×C, V denotes the number of instances of the first K selected instance features, K denotes the number of instances, V 1,…,vK denotes the single instance feature, V j,vk e V, C is the instance feature embedding dimension, the convolution kernels are W e R D×1 and Z e R D×C, D is the feature embedding dimension, hyperbolic tangent tanh is the activation function, and after element multiplication, another convolution is performed for all outputs of the connector unit to project back to the original dimension:
where Ṽ represents the top-K instances after feature enhancement, V = [v_1, …, v_K] ∈ R^{K×C} denotes the selected top-K instance features, K is the number of instances, v_1, …, v_K are single instance features, W_pro ∈ R^{(H×D)×C} is a convolution kernel, T denotes the transpose of a matrix, H_1, …, H_h are the attention head units, h is the number of heads, and C and D are feature embedding dimensions; step 35: further modeling the dependencies between the selected top-K instances: LSTM is further used to construct interactions and fuse the interacting instances to obtain a discriminative image-level representation; LSTM can capture short-term and long-term dependencies; given the input feature sequence (v_1, …, v_K), the hidden layer of the LSTM is computed recursively from t = 1 to t = K using the following formulas:
where f_t, i_t, o_t denote the forget gate, input gate, and output gate, respectively; W_{f,i,o,c} and U_{f,i,o,c} are the weight matrices to be learned; b_{f,i,o,c} are bias vectors; h_{t-1} is the hidden vector; c_t is the memory cell; sigmoid and hyperbolic tangent tanh are activation functions; the output of the last LSTM step is used as the final bag-level representation vector for prediction.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210526657.6A CN114998647B (en) | 2022-05-16 | 2022-05-16 | Breast cancer full-size pathological image classification method based on attention multi-instance learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114998647A CN114998647A (en) | 2022-09-02 |
CN114998647B true CN114998647B (en) | 2024-05-07 |
Family
ID=83027208
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210526657.6A Active CN114998647B (en) | 2022-05-16 | 2022-05-16 | Breast cancer full-size pathological image classification method based on attention multi-instance learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114998647B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117237781B (en) * | 2023-11-16 | 2024-03-19 | 哈尔滨工业大学(威海) | Attention mechanism-based double-element fusion space-time prediction method |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110415212A (en) * | 2019-06-18 | 2019-11-05 | 平安科技(深圳)有限公司 | Abnormal cell detection method, device and computer readable storage medium |
CN114238577A (en) * | 2021-12-17 | 2022-03-25 | 中国计量大学上虞高等研究院有限公司 | Multi-task learning emotion classification method integrated with multi-head attention mechanism |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110083705B (en) * | 2019-05-06 | 2021-11-02 | 电子科技大学 | Multi-hop attention depth model, method, storage medium and terminal for target emotion classification |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108804530B (en) | Subtitling areas of an image | |
CN107229757B (en) | Video retrieval method based on deep learning and Hash coding | |
CN109033978B (en) | Error correction strategy-based CNN-SVM hybrid model gesture recognition method | |
CN108764019A (en) | A kind of Video Events detection method based on multi-source deep learning | |
CN104077742B (en) | Human face sketch synthetic method and system based on Gabor characteristic | |
CN111325237B (en) | Image recognition method based on attention interaction mechanism | |
CN111276240A (en) | Multi-label multi-mode holographic pulse condition identification method based on graph convolution network | |
CN112597324A (en) | Image hash index construction method, system and equipment based on correlation filtering | |
CN112163114B (en) | Image retrieval method based on feature fusion | |
Salazar | On Statistical Pattern Recognition in Independent Component Analysis Mixture Modelling | |
CN113868448A (en) | Fine-grained scene level sketch-based image retrieval method and system | |
CN114998647B (en) | Breast cancer full-size pathological image classification method based on attention multi-instance learning | |
CN116091946A (en) | Yolov 5-based unmanned aerial vehicle aerial image target detection method | |
CN116363750A (en) | Human body posture prediction method, device, equipment and readable storage medium | |
Ma et al. | Dirichlet process mixture of generalized inverted dirichlet distributions for positive vector data with extended variational inference | |
CN113240033B (en) | Visual relation detection method and device based on scene graph high-order semantic structure | |
CN113516019B (en) | Hyperspectral image unmixing method and device and electronic equipment | |
Afzal et al. | Discriminative feature abstraction by deep L2 hypersphere embedding for 3D mesh CNNs | |
Wei et al. | A multiobjective group sparse hyperspectral unmixing method with high correlation library | |
CN113822134A (en) | Instance tracking method, device, equipment and storage medium based on video | |
Deffo et al. | CNNSFR: A convolutional neural network system for face detection and recognition | |
Termritthikun et al. | Evolutionary neural architecture search based on efficient CNN models population for image classification | |
CN116188428A (en) | Bridging multi-source domain self-adaptive cross-domain histopathological image recognition method | |
CN113887509B (en) | Rapid multi-modal video face recognition method based on image set | |
CN114821631A (en) | Pedestrian feature extraction method based on attention mechanism and multi-scale feature fusion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||