CN114820481A - Lung cancer histopathology full-section EGFR state prediction method based on converter - Google Patents

Lung cancer histopathology full-section EGFR state prediction method based on converter

Info

Publication number
CN114820481A
Authority
CN
China
Prior art keywords
image block
image blocks
egfr
formula
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210385274.1A
Other languages
Chinese (zh)
Other versions
CN114820481B (en)
Inventor
祝新宇
史骏
束童
唐昆铭
孙宇
杨志鹏
张元�
王垚
郑利平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology filed Critical Hefei University of Technology
Priority to CN202210385274.1A priority Critical patent/CN114820481B/en
Publication of CN114820481A publication Critical patent/CN114820481A/en
Application granted granted Critical
Publication of CN114820481B publication Critical patent/CN114820481B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30061Lung
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30096Tumor; Lesion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a Transformer-based method for predicting the EGFR status of lung cancer histopathology full-sections, comprising the following steps: 1. acquiring a lung cancer histopathology full-section data set and preprocessing it; 2. in a first stage, establishing and training a Vision Transformer network model that predicts whether an image block is positive or negative; 3. using the trained model to predict the positive/negative category of the image blocks in the data set, screening out the negative image blocks, and generating an EGFR mutation-type data set from the positive image blocks; 4. establishing and training a Vision Transformer network model that predicts the EGFR mutation type of an image block, using the generated EGFR mutation-type data set; 5. completing the prediction of full-section EGFR status with the models trained in the first and second stages. The invention uses two Vision Transformer network models to form a backbone network, which effectively reduces the error rate of the pseudo labels and improves prediction accuracy.

Description

Lung cancer histopathology full-section EGFR state prediction method based on converter
Technical Field
The invention relates to the technical field of computer vision, and in particular to a Transformer-based method for predicting the EGFR status of lung cancer histopathology full-sections.
Background
EGFR (epidermal growth factor receptor) is a transmembrane protein with cytoplasmic kinase activity that transduces important growth-factor signals from the extracellular environment to the cell. Lung adenocarcinoma is a common histological type of lung cancer, and the discovery of EGFR mutations revolutionized its treatment. Positive lung adenocarcinoma EGFR states can be broadly classified into mutant and wild types; to ensure the accuracy of state classification, positive EGFR states other than mutant and wild type are classified as "other" in this patent. In first-line therapy, detection of EGFR mutations is crucial, as medication and treatment differ significantly across EGFR status types. Accurately determining EGFR status therefore plays an important role in patient treatment and physician prescribing.
Sequencing of mutations from biopsies has become the gold standard for detecting EGFR mutations. In actual diagnosis and treatment, a pathologist must visually inspect tens of thousands of cells under a microscope. Each pathologist processes a large number of patient specimens every day, so reading fatigue is common and misdiagnosis sometimes occurs. An efficient, quantitative method for predicting the EGFR status of lung cancer histopathology full-sections is therefore needed, both to reduce the pathologist's reading burden and to improve prediction accuracy. At present, algorithms for predicting full-section EGFR status are mainly supervised classification algorithms based on deep learning.
In recent years, deep learning models have achieved remarkable results across computer vision, and some researchers have applied convolutional neural networks, such as the residual network (ResNet) and the dense convolutional network (DenseNet), to the full-section EGFR status prediction task. However, these models rely on inductive bias, cannot model dynamically and adaptively, and cannot capture features between EGFR receptors at a spatial scale, making accurate prediction of lung cancer EGFR status difficult.
Disclosure of Invention
The invention aims to make up for the deficiencies of the prior art by providing a Transformer-based lung cancer histopathology full-section EGFR status prediction method. It addresses the difficulty of predicting full-section EGFR status caused by the complex structure, variable types, and rich feature information of pathology images: by constructing a two-stage network based on the Vision Transformer, it captures long-range dependencies within full-section images, obtains corresponding representations for the different types of EGFR receptors, and thereby completes accurate and efficient prediction of full-section EGFR status.
The invention is realized by the following technical scheme:
A Transformer-based lung cancer histopathology full-section EGFR status prediction method specifically comprises the following steps:
(1) acquiring a lung cancer histopathology full-section data set from full-section images and preprocessing it;
(2) establishing a Vision Transformer network model that predicts whether an image block is positive or negative, and training it with the data set from step (1);
(3) using the model established in step (2) to predict the positive/negative category of the image blocks in the data set, screening out negative image blocks, and generating an EGFR mutation-type data set from the retained positive image blocks;
(4) establishing and training a Vision Transformer network model that predicts the EGFR mutation type of an image block, using the data set generated in step (3);
(5) completing the prediction of full-section EGFR status with the two models established in steps (2) and (4).
Acquiring and preprocessing the lung cancer histopathology full-section data set in step (1) specifically comprises the following steps:
sorting the full-section images by their negative/positive labels, removing blank background areas, partitioning each image into blocks, and randomly sampling a number of image blocks, denoted
{(X_i, y_i)}_{i=1}^N,
where X_i ∈ R^{C×P×P} is the i-th image block, C the number of channels, and P×P the width and height of each block; y_i is the category of image block X_i: the negative/positive label of the full section is assigned to each of its image blocks as a pseudo label, giving every image block a negative/positive classification; i = 1, 2, …, N, where N is the number of image blocks.
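As an illustration of this preprocessing step, the following Python sketch tiles a slide image into blocks, drops near-blank background tiles, and assigns the slide-level label to every retained block as its pseudo label. The brightness threshold, tile size, and per-slide sample count are assumptions chosen for illustration, not values fixed by the patent.

```python
import numpy as np

def tile_slide(slide: np.ndarray, label: int, P: int = 256,
               blank_thresh: float = 220.0, n_samples: int = 500,
               rng: np.random.Generator | None = None):
    """Cut an HxWxC slide array into PxP blocks, drop near-blank tiles,
    and pseudo-label every kept tile with the slide-level label."""
    rng = rng or np.random.default_rng(0)
    H, W, _ = slide.shape
    tiles, labels = [], []
    for y in range(0, H - P + 1, P):
        for x in range(0, W - P + 1, P):
            tile = slide[y:y + P, x:x + P]
            # Blank-background removal: skip tiles that are mostly white.
            if tile.mean() < blank_thresh:
                tiles.append(tile)
                labels.append(label)  # slide label as the tile pseudo label
    # Random sampling of at most n_samples blocks per slide.
    idx = rng.permutation(len(tiles))[:n_samples]
    return [tiles[i] for i in idx], [labels[i] for i in idx]
```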
Establishing the Vision Transformer network model that predicts whether an image block is positive or negative and training it with the data set from step (1), as in step (2), specifically comprises the following steps:
constructing a Vision Transformer (ViT) composed of L encoders as the first-stage network, where each encoder contains two normalization layers, a multi-head self-attention layer, and a multi-layer perceptron;
step 2.1: partition image block X_i into a sequence of m sub-blocks
(x_p^1, x_p^2, …, x_p^m),
where x_p^j ∈ R^{C×p×p} is the j-th sub-block of image block X_i, p×p is the width and height of each sub-block after partitioning, and m = P²/p²;
Step 2.2, setting a learnable classification mark x class And obtaining m image blocks and a classification mark x by using the formula (1) class D-dimensional embedded representation z l0 As input to the 1 st encoder;
Figure BDA0003593395680000034
in the formula (1), E pos Representing m image blocks and a class mark x class In image block X i (iii) a spatial position of; e represents the set embedding matrix;
step 2.3: obtain the output z'_l of the multi-head self-attention layer of the l-th encoder for the m sub-blocks and the classification token x_class by formula (2):

z'_l = MSA(LN(z_{l-1})) + z_{l-1},  l = 1, …, L    (2)

In formula (2), MSA(·) denotes the multi-head self-attention layer, LN(·) the normalization layer, and z_{l-1} the output of the (l-1)-th encoder;
step 2.4: obtain the output z_l of the multi-layer perceptron of the l-th encoder by formula (3):

z_l = MLP(LN(z'_l)) + z'_l,  l = 1, …, L    (3)

In formula (3), MLP(·) denotes the multi-layer perceptron and LN(·) the normalization layer;
step 2.5: feed the output z_l of the multi-layer perceptron of the l-th encoder into the multi-head self-attention layer of the (l+1)-th encoder to obtain z'_{l+1}, then feed z'_{l+1} into the multi-layer perceptron of the (l+1)-th encoder to obtain z_{l+1}; repeat step 2.5 up to the L-th encoder, obtaining its output z_L;
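Steps 2.3 to 2.5 describe a standard pre-norm Transformer encoder stack. The PyTorch sketch below implements one such encoder block; the head count and MLP expansion ratio are assumptions (the patent does not specify them), while the two-layer MLP with GELU activation follows the embodiment described later.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One ViT encoder: formulas (2) and (3) with pre-norm residuals."""
    def __init__(self, dim: int = 768, heads: int = 12, mlp_ratio: int = 4):
        super().__init__()
        self.ln1 = nn.LayerNorm(dim)                       # LN before MSA
        self.msa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ln2 = nn.LayerNorm(dim)                       # LN before MLP
        self.mlp = nn.Sequential(                          # two layers + GELU
            nn.Linear(dim, mlp_ratio * dim), nn.GELU(),
            nn.Linear(mlp_ratio * dim, dim))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        h = self.ln1(z)
        z = self.msa(h, h, h, need_weights=False)[0] + z   # z'_l, formula (2)
        z = self.mlp(self.ln2(z)) + z                      # z_l,  formula (3)
        return z
```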
Step 2.6, obtaining the output z 'after normalization treatment by utilizing the formula (4)' L And extracting the classification mark x class Corresponding D-dimensional features
Figure BDA0003593395680000041
z′ L =LN(z L ) (4)
In formula (4), LN (·) represents the processing of the normalization layer;
step 2.7: linearly transform the feature f by formula (5) to obtain the output pos_pred of the linear classifier:

pos_pred = Linear(f)    (5)

In formula (5), Linear(·) is the linear classification function and pos_pred ∈ R^c, where the c dimensions correspond to the negative/positive classes;
step 2.8: construct the cross-entropy loss function L by formula (6) and train the first-stage network, formed by the Vision Transformer and the linear classifier, with a gradient descent algorithm until L converges, giving the trained Vision Transformer network model that predicts whether an image block is positive or negative:

L = -(1/N) Σ_{i=1}^{N} y_label log(pos_pred)    (6)

In formula (6), y_label is the negative/positive pseudo label of an image block and N is the total number of image blocks.
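A minimal training sketch for this first stage follows; the optimizer choice, learning rate, and epoch count are illustrative assumptions, since the patent only requires "a gradient descent algorithm" run until the loss converges.

```python
import torch
import torch.nn as nn

def train_stage1(model: nn.Module, loader, epochs: int = 10, lr: float = 3e-4):
    """Minimize the cross-entropy of formula (6) over pseudo-labeled tiles."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()              # formula (6)
    model.train()
    for _ in range(epochs):
        for tiles, pseudo_labels in loader:      # tiles: (B,C,P,P), labels: (B,)
            opt.zero_grad()
            loss = loss_fn(model(tiles), pseudo_labels)  # pos_pred vs y_label
            loss.backward()
            opt.step()
    return model
```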
Predicting the positive/negative category of the image blocks with the model established in step (2), screening out the negative blocks, and generating the EGFR mutation-type data set from the retained positive blocks, as in step (3), specifically comprises the following steps:
sorting the lung cancer histopathology full-section images by their EGFR mutation-state class labels, removing blank background areas, partitioning into blocks, and randomly sampling a number of image blocks; feeding these into the trained model that predicts whether an image block is positive or negative, predicting each block's category, and screening out the negative blocks to obtain n positive image blocks, which form the EGFR mutation-type data set, denoted
{(X'_i, y'_i)}_{i=1}^n,
where X'_i ∈ R^{C×P×P} is the i-th image block, C the number of channels, and P×P the width and height of each block; y'_i is the category of image block X'_i, i.e., the EGFR mutation-class label in the data set; i = 1, 2, …, n, where n is the number of image blocks.
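A sketch of this screening step in Python, assuming a trained stage-1 model; that class index 1 stands for "positive" is an assumption of the sketch, not something the patent fixes.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def screen_positive(model: nn.Module, tiles: torch.Tensor,
                    slide_egfr_label: int) -> list[tuple[torch.Tensor, int]]:
    """Keep only tiles the stage-1 model predicts positive, and relabel
    them with the slide's EGFR mutation class."""
    model.eval()
    preds = model(tiles).argmax(dim=1)          # negative/positive per tile
    kept = tiles[preds == 1]                    # discard negative tiles
    return [(t, slide_egfr_label) for t in kept]  # (X'_i, y'_i) pairs
```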
Establishing and training the Vision Transformer network model that predicts the EGFR mutation type of an image block, using the data set generated in step (3), as in step (4), specifically comprises the following steps:
constructing a Vision Transformer (ViT) composed of S encoders as the second-stage network, where each encoder contains two normalization layers, a multi-head self-attention layer, and a multi-layer perceptron;
step 4.1: partition image block X'_i into a sequence of m' sub-blocks
(x'^1_p, x'^2_p, …, x'^{m'}_p),
where x'^j_p ∈ R^{C×p×p} is the j-th sub-block of image block X'_i, p×p is the width and height of each sub-block after partitioning, and m' = P²/p²;
Step 4.2, setting a learnable classification mark x' class M image blocks and a classification mark x 'are obtained by using the formula (7)' class D-dimensional embedded representation z s0 As input to the 1 st encoder;
Figure BDA0003593395680000054
in formula (7), E' pos Representing m 'image blocks and class labels x' class In image block X' i (iii) a spatial position of; e' represents the set embedding matrix;
step 4.3: obtain the output z'_s of the multi-head self-attention layer of the s-th encoder for the m' sub-blocks and x'_class by formula (8):

z'_s = MSA(LN(z_{s-1})) + z_{s-1},  s = 1, …, S    (8)

In formula (8), MSA(·) denotes the multi-head self-attention layer, LN(·) the normalization layer, and z_{s-1} the output of the (s-1)-th encoder;
step 4.4: obtain the output z_s of the multi-layer perceptron of the s-th encoder by formula (9):

z_s = MLP(LN(z'_s)) + z'_s,  s = 1, …, S    (9)

In formula (9), MLP(·) denotes the multi-layer perceptron and LN(·) the normalization layer;
step 4.5: feed the output z_s of the multi-layer perceptron of the s-th encoder into the multi-head self-attention layer of the (s+1)-th encoder to obtain z'_{s+1}, then feed z'_{s+1} into the multi-layer perceptron of the (s+1)-th encoder to obtain z_{s+1}; repeat step 4.5 up to the S-th encoder, obtaining its output z_S;
Step 4.6, obtaining the normalized output z 'by utilizing the formula (10)' S And extracting a classification mark x' class Corresponding D-dimensional features
Figure BDA0003593395680000055
z′ S =LN(z S ) (10)
In formula (10), LN (·) represents the processing of the normalization layer;
step 4.7: linearly transform the feature f' by formula (11) to obtain the output egfr_pred of the linear classifier:

egfr_pred = Linear(f')    (11)

In formula (11), Linear(·) is the linear classification function and egfr_pred ∈ R^C, where C is the number of EGFR state classes;
step 4.8: construct the cross-entropy loss function L by formula (12) and train the second-stage network, formed by the Vision Transformer and the linear classifier, with a gradient descent algorithm until L converges, giving the trained Vision Transformer network model that predicts the EGFR mutation type of an image block:

L = -(1/n) Σ_{i=1}^{n} y_label log(egfr_pred)    (12)

In formula (12), y_label is the EGFR-state pseudo label of an image block and n is the total number of image blocks.
Completing the prediction of full-section EGFR status with the two models established in steps (2) and (4), as in step (5), specifically comprises the following steps:
step 5.1: remove blank background areas from the lung cancer histopathology full-section image and partition it into a number of image blocks, recorded as the sequence (x_1, x_2, …, x_j, …, x_m);
Step 5.2, image block (x) 1 ,x 2 ,…,x j ,…,x m ) Sending the image blocks into the visual converter network model capable of predicting the negative and positive of the image blocks, predicting the negative and positive types of the image blocks, and screening out the negative image blocks to obtain a positive image block sequence (x) 1 ,x 2 ,…,x j ,…,x n ) (ii) a Setting a positive and negative classification threshold t, and calculating the proportion t of the positive image block according to the formula (13) pos Comparing the classification threshold t with the positive image block ratio t pos Determining the positive and negative classification of the whole section;
Figure BDA0003593395680000064
step 5.3: for a full section predicted positive in step 5.2, perform the next prediction: input its positive image blocks (x_1, x_2, …, x_j, …, x_n) into the Vision Transformer network model that predicts the EGFR mutation type, predict the EGFR mutation type of each block, compute the proportion egfr_i of each EGFR state among the n blocks by formula (14), and take the class with the highest proportion as the EGFR state of the lung cancer histopathology full section, where n_i is the number of image blocks of the i-th EGFR mutation state and K is the number of EGFR mutation-state classes:

egfr_i = n_i / n,  i = 1, …, K    (14)
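Formulas (13) and (14) amount to threshold voting over tile-level predictions. A minimal sketch follows, assuming the slide is called positive when t_pos reaches the threshold t; the comparison direction is not spelled out in the patent, so that is an assumption of the sketch.

```python
def aggregate_slide(tile_is_positive: list[bool],
                    tile_egfr_class: list[int],
                    t: float = 0.5, K: int = 3) -> str | int:
    """Apply formulas (13)-(14): threshold the positive-tile ratio, then
    majority-vote the EGFR class over the positive tiles."""
    m = len(tile_is_positive)
    n = sum(tile_is_positive)
    t_pos = n / m                                  # formula (13)
    if t_pos < t:                                  # assumed comparison direction
        return "negative"
    counts = [0] * K
    for c in tile_egfr_class:                      # classes of the n positive tiles
        counts[c] += 1
    ratios = [n_i / n for n_i in counts]           # formula (14)
    return max(range(K), key=lambda i: ratios[i])  # highest-ratio EGFR state
```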
the invention has the advantages that:
1. according to the lung cancer histopathology full-section image feature learning method, the vision converter is used for carrying out feature learning on the lung cancer histopathology full-section image, the vision converter can carry out dynamic self-adaptive modeling, local and global features of the image are captured based on an attention mechanism, and feature representation capability of the lung cancer histopathology full-section image is improved;
2. the invention utilizes the vision converter to learn the remote dependence relationship in the image, thereby establishing the dependence relationship among all parts of the lung cancer histopathology full-section image and further improving the EGFR state prediction accuracy.
3. According to the invention, two vision converters ViT are used to form a backbone network, the first vision converter completes classification of positive and negative of lung cancer histopathology full-section images and extracts positive image blocks, and the second vision converter only performs EGFR type classification on the positive image blocks, so that the error rate of pseudo labels is effectively reduced, and the prediction accuracy is improved.
Drawings
FIG. 1 is a block diagram of a network in accordance with the present invention;
fig. 2 is a general flow diagram of the present invention.
Detailed Description
In this embodiment, a Transformer-based method for predicting the EGFR status of lung cancer histopathology full-section images takes into account the difficulty of the EGFR status classification task: images are first fed into a first Vision Transformer network to predict the positive/negative classification of the full section, and the positive image blocks are then fed into a second Vision Transformer network to predict their EGFR status, completing the EGFR status classification of the full-section image. As shown in FIG. 1 and FIG. 2, the method specifically includes the following steps:
the method comprises the following steps of (1) acquiring a lung cancer histopathology full-section data set according to a lung cancer histopathology full-section image and preprocessing the lung cancer histopathology full-section data set, wherein the lung cancer histopathology full-section data set specifically comprises the following steps:
the lung cancer histopathology full-slice images are sorted according to negative and positive labels, the full-slice images are subjected to blank background area removal and blocking treatment, a plurality of image blocks are obtained through random sampling and are marked as
Figure BDA0003593395680000072
Wherein,
Figure BDA0003593395680000081
is shown asThe method comprises the following steps that i image blocks, C represents the number of channels of the image blocks, and P multiplied by P represents the width and the height of each image block; y is i Representing the ith image block X i Assigning the positive and negative labels of the full-slice to the image blocks as the pseudo labels thereof according to the corresponding categories, thereby obtaining each image block and the positive and negative classification thereof; 1,2, …, N; n represents the number of image blocks; the data EGFR status used in this example contains 2 categories of negative and positive; the data set comprises 100 full slices, and 500 image blocks are randomly sampled on each full slice, so that N is 500, and each image block size is 256 × 256, so that C is 3, and P is 256; 80% of each class in the dataset was used for training and the remaining 20% was used for testing.
Step (2), establishing the Vision Transformer network model that predicts whether an image block is positive or negative and training it with the data set from step (1), specifically comprises:
establishing the Vision-Transformer-based deep learning network model shown in FIG. 1, which comprises 2 Vision Transformer (ViT) networks. A ViT composed of L encoders is constructed as the first-stage network, where each encoder contains two normalization layers, a multi-head self-attention layer, and a multi-layer perceptron;
step 2.1: partition image block X_i into a sequence of m sub-blocks
(x_p^1, x_p^2, …, x_p^m),
where x_p^j ∈ R^{C×p×p} is the j-th sub-block of image block X_i, p×p is the width and height of each sub-block after partitioning, and m = P²/p²; in this embodiment each sub-block is 16 × 16, so p = 16 and m = 196.
Step 2.2, setting a learnable classification mark x class And obtaining m image block sums by using the formula (1)Classification tag x class D-dimensional embedded representation z l0 As input to the 1 st encoder;
Figure BDA0003593395680000085
in the formula (1), E pos Representing m image blocks and a class mark x class In image block X i (iii) a spatial position of; e represents the set embedding matrix; in this example, D is 768, x class Is a 768-dimensional vector formed by 768 random numbers, E is a matrix formed by 768 × 768 random numbers, the number of rows of the matrix is 768, the number of columns is 768, and E is pos The random number matrix is a matrix formed by 197 x 768 random numbers, the number of rows of the matrix is 197, and the number of columns of the matrix is 768.
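Under the embodiment's dimensions (p = 16, C = 3, D = 768, m = 196), the embedding of formula (1) can be sketched as follows; initializing the classification token and position table from a normal distribution is an assumption standing in for the "random numbers" of the text.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Formula (1): flatten 16x16x3 sub-blocks, project them with E,
    prepend x_class, and add E_pos (197 x 768 for m = 196 sub-blocks)."""
    def __init__(self, p: int = 16, C: int = 3, D: int = 768, m: int = 196):
        super().__init__()
        self.E = nn.Linear(p * p * C, D)                    # embedding matrix E
        self.x_class = nn.Parameter(torch.randn(1, 1, D))   # classification token
        self.E_pos = nn.Parameter(torch.randn(1, m + 1, D)) # position embedding

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (B, m, p*p*C), the flattened sub-block sequence
        z = self.E(patches)                                  # (B, m, D)
        cls = self.x_class.expand(z.shape[0], -1, -1)        # (B, 1, D)
        return torch.cat([cls, z], dim=1) + self.E_pos       # z_0: (B, m+1, D)
```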
Step 2.3, obtaining m image blocks and classification marks x by using the formula (2) class Output z 'of multi-head attention device layer at l-th encoder' l
z' l =MSA(LN(z l-1 ))+z l-1 ,l=1,…,L (2)
In formula (2), MSA (-) indicates the processing of a multi-headed autofrettage layer; LN (-) represents the processing of the normalization layer; z is a radical of l-1 Represents the output of the l-1 st encoder;
step 2.4: obtain the output z_l of the multi-layer perceptron of the l-th encoder by formula (3):

z_l = MLP(LN(z'_l)) + z'_l,  l = 1, …, L    (3)

In formula (3), MLP(·) denotes the multi-layer perceptron, which in this embodiment comprises two linear layers and a GELU nonlinear activation; LN(·) denotes the normalization layer;
step 2.5: feed the output z_l of the multi-layer perceptron of the l-th encoder into the multi-head self-attention layer of the (l+1)-th encoder to obtain z'_{l+1}, then feed z'_{l+1} into the multi-layer perceptron of the (l+1)-th encoder to obtain z_{l+1}; repeat step 2.5 up to the L-th encoder, obtaining its output z_L;
Step 2.6, obtaining the output z 'after normalization treatment by utilizing the formula (4)' L And extracting the classification mark x class Corresponding D-dimensional features
Figure BDA0003593395680000091
z′ L =LN(z L ) (4)
In formula (4), LN (·) represents the processing of the normalization layer;
step 2.7: linearly transform the feature f by formula (5) to obtain the output pos_pred of the linear classifier:

pos_pred = Linear(f)    (5)

In formula (5), Linear(·) is the linear classification function and pos_pred ∈ R^c, where the c dimensions correspond to the negative/positive classes;
step 2.8: construct the cross-entropy loss function L by formula (6) and train the first-stage network, formed by the Vision Transformer and the linear classifier, with a gradient descent algorithm until L converges, giving the trained Vision Transformer network model that predicts whether an image block is positive or negative:

L = -(1/N) Σ_{i=1}^{N} y_label log(pos_pred)    (6)

In formula (6), y_label is the negative/positive pseudo label of an image block and N is the total number of image blocks.
Step (3), predicting the positive/negative category of the image blocks with the model established in step (2), screening out the negative blocks, and generating the EGFR mutation-type data set from the retained positive blocks, specifically comprises:
sorting the lung cancer histopathology full-section images by their EGFR mutation-state class labels, removing blank background areas, partitioning into blocks, and randomly sampling a number of image blocks; feeding these into the model trained in step (2) to predict each block's negative/positive category; and screening out the negative blocks to obtain n positive image blocks, which form the EGFR mutation-type data set, denoted
{(X'_i, y'_i)}_{i=1}^n,
where X'_i ∈ R^{C×P×P} is the i-th image block, C the number of channels, and P×P the width and height of each block; y'_i is the category of image block X'_i, i.e., the EGFR mutation-class label in the data set; i = 1, 2, …, n, where n is the number of image blocks. The EGFR mutation statuses used in this embodiment contain 3 categories: mutant, wild, and other; the data set comprises 100 full sections with 500 image blocks randomly sampled from each; each image block is 256 × 256, so C = 3 and P = 256; 80% of each class is used for training and the remaining 20% for testing.
Step (4), establishing and training the Vision Transformer network model that predicts the EGFR mutation type of an image block, using the data set generated in step (3), specifically comprises:
constructing a Vision Transformer (ViT) composed of S encoders as the second-stage network, where each encoder contains two normalization layers, a multi-head self-attention layer, and a multi-layer perceptron;
step 4.1: partition image block X'_i into a sequence of m' sub-blocks
(x'^1_p, x'^2_p, …, x'^{m'}_p),
where x'^j_p ∈ R^{C×p×p} is the j-th sub-block of image block X'_i, p×p is the width and height of each sub-block after partitioning, and m' = P²/p²; in this embodiment each sub-block is 16 × 16, so p = 16 and m' = 196.
Step 4.2, setting a learnable classification mark x' class M image blocks and a classification mark x 'are obtained by using the formula (7)' class D-dimensional embedded representation z s0 As input to the 1 st encoder;
Figure BDA0003593395680000106
in formula (7), E' pos Representing m 'image blocks and class labels x' class In image block X' i (iii) a spatial position of; e' represents the set embedding matrix; in this example, D-768, x' class Is a 768-dimensional vector formed by 768 random numbers, E 'is a matrix formed by 768 × 768 random numbers, the number of rows of the matrix is 768, the number of columns is 768, E' pos The random number generator is a matrix formed by 197 × 768 random numbers, the number of rows of the matrix is 197, and the number of columns of the matrix is 768.
Step 4.3, m ' image blocks and classification marks x ' are obtained by utilizing the formula (8) ' class Output z 'of multi-head attention device layer at s-th encoder' l
z′ s =MSA(LN(z s-1 ))+z s-1 ,s=1,…,S (8)
In formula (8), MSA (-) indicates the processing of a multi-headed autofrettage layer; LN (-) represents the processing of the normalization layer; z is a radical of s-1 Represents the output of the s-1 th encoder;
step 4.4: obtain the output z_s of the multi-layer perceptron of the s-th encoder by formula (9):

z_s = MLP(LN(z'_s)) + z'_s,  s = 1, …, S    (9)

In formula (9), MLP(·) denotes the multi-layer perceptron, which in this embodiment comprises two linear layers and a GELU nonlinear activation; LN(·) denotes the normalization layer;
step 4.5: feed the output z_s of the multi-layer perceptron of the s-th encoder into the multi-head self-attention layer of the (s+1)-th encoder to obtain z'_{s+1}, then feed z'_{s+1} into the multi-layer perceptron of the (s+1)-th encoder to obtain z_{s+1}; repeat step 4.5 up to the S-th encoder, obtaining its output z_S;
Step 4.6, obtaining the normalized output z 'by utilizing the formula (10)' S And extracting a classification mark x' class Corresponding D-dimensional features
Figure BDA0003593395680000111
z′ S =LN(z S ) (10)
In formula (10), LN (·) represents the processing of the normalization layer;
step 4.7: linearly transform the feature f' by formula (11) to obtain the output egfr_pred of the linear classifier:

egfr_pred = Linear(f')    (11)

In formula (11), Linear(·) is the linear classification function and egfr_pred ∈ R^C, where C is the number of EGFR state classes;
step 4.8: construct the cross-entropy loss function L by formula (12) and train the second-stage network, formed by the Vision Transformer and the linear classifier, with a gradient descent algorithm until L converges, giving the trained Vision Transformer network model that predicts the EGFR mutation type of an image block:

L = -(1/n) Σ_{i=1}^{n} y_label log(egfr_pred)    (12)

In formula (12), y_label is the EGFR-state pseudo label of an image block and n is the total number of image blocks.
Step (5), completing the prediction of full-section EGFR status with the two models established in steps (2) and (4), specifically comprises:
step 5.1: remove blank background areas from the lung cancer histopathology full-section image and partition it into a number of image blocks, recorded as the sequence (x_1, x_2, …, x_j, …, x_m);
Step 5.2, image block (x) 1 ,x 2 ,…,x j ,…,x m ) Sending into a visual converter network model capable of predicting the positive and negative of the image block, predicting the positive and negative categories of the image block, and screening out the negative image blocks to obtain a positive image block sequence (x) 1 ,x 2 ,…,x j ,…,x n ) (ii) a Setting a positive and negative classification threshold t, and calculating the positive image block ratio t according to the formula (13) pos Comparing the classification threshold t with the positive image block ratio t pos Determining the positive and negative classification of the whole section;
Figure BDA0003593395680000122
step 5.3: for a full section predicted positive in step 5.2, perform the next prediction: input its positive image blocks (x_1, x_2, …, x_j, …, x_n) into the Vision Transformer network model that predicts the EGFR mutation type, predict the EGFR mutation type of each block, compute the proportion egfr_i of each EGFR state among the n blocks by formula (14), and take the class with the highest proportion as the EGFR state of the lung cancer histopathology full section, where n_i is the number of image blocks of the i-th EGFR mutation state and K is the number of EGFR mutation-state classes; in this embodiment K = 3:

egfr_i = n_i / n,  i = 1, …, K    (14)
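Putting the two stages together, whole-section inference in this embodiment can be sketched as below. Here `stage1` and `stage2` are hypothetical names for the two trained ViT classifiers, and the threshold value and comparison direction are assumptions of the sketch.

```python
import torch

@torch.no_grad()
def predict_slide(stage1, stage2, tiles: torch.Tensor,
                  t: float = 0.5, K: int = 3):
    """Two-stage full-section prediction: stage 1 filters positive tiles
    (formula (13)); stage 2 votes the EGFR state over them (formula (14))."""
    pos_mask = stage1(tiles).argmax(dim=1) == 1   # per-tile negative/positive
    t_pos = pos_mask.float().mean().item()        # formula (13): n / m
    if t_pos < t:                                 # assumed comparison direction
        return "negative"
    pos_tiles = tiles[pos_mask]
    votes = stage2(pos_tiles).argmax(dim=1)       # per-tile EGFR mutation type
    counts = torch.bincount(votes, minlength=K)   # n_i for i = 1..K
    return int(counts.argmax())                   # highest egfr_i = n_i / n
```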

Claims (6)

1. A Transformer-based lung cancer histopathology full-section EGFR status prediction method, characterized in that it specifically comprises the following steps:
(1) acquiring a lung cancer histopathology full-section data set from full-section images and preprocessing it;
(2) establishing a Vision Transformer network model that predicts whether an image block is positive or negative, and training it with the data set from step (1);
(3) using the model established in step (2) to predict the positive/negative category of the image blocks in the data set, screening out negative image blocks, and generating an EGFR mutation-type data set from the retained positive image blocks;
(4) establishing and training a Vision Transformer network model that predicts the EGFR mutation type of an image block, using the data set generated in step (3);
(5) completing the prediction of full-section EGFR status with the two models established in steps (2) and (4).
2. The Transformer-based lung cancer histopathology full-section EGFR status prediction method of claim 1, characterized in that acquiring and preprocessing the lung cancer histopathology full-section data set in step (1) specifically comprises:
sorting the full-section images by their negative/positive labels, removing blank background areas, partitioning each image into blocks, and randomly sampling a number of image blocks, denoted
{(X_i, y_i)}_{i=1}^N,
where X_i ∈ R^{C×P×P} is the i-th image block, C the number of channels, and P×P the width and height of each block; y_i is the category of image block X_i: the negative/positive label of the full section is assigned to each of its image blocks as a pseudo label, giving every image block a negative/positive classification; i = 1, 2, …, N, where N is the number of image blocks.
3. The Transformer-based lung cancer histopathology full-section EGFR status prediction method of claim 2, characterized in that establishing the Vision Transformer network model that predicts whether an image block is positive or negative and training it with the data set from step (1) specifically comprises:
constructing a Vision Transformer (ViT) composed of L encoders as the first-stage network, where each encoder contains two normalization layers, a multi-head self-attention layer, and a multi-layer perceptron;
step 2.1: partition image block X_i into a sequence of m sub-blocks (x_p^1, x_p^2, …, x_p^m), where x_p^j ∈ R^{C×p×p} is the j-th sub-block of X_i, p×p is the width and height of each sub-block, and m = P²/p²;
step 2.2: set a learnable classification token x_class and obtain the D-dimensional embedded representation z_0 of the m sub-blocks and x_class by formula (1), as input to the 1st encoder:
z_0 = [x_class; x_p^1 E; x_p^2 E; …; x_p^m E] + E_pos    (1)
in formula (1), E_pos encodes the spatial position of the m sub-blocks and of x_class within image block X_i, and E is the embedding matrix that is set;
step 2.3: obtain the output z'_l of the multi-head self-attention layer of the l-th encoder by formula (2):
z'_l = MSA(LN(z_{l-1})) + z_{l-1},  l = 1, …, L    (2)
in formula (2), MSA(·) denotes the multi-head self-attention layer, LN(·) the normalization layer, and z_{l-1} the output of the (l-1)-th encoder;
step 2.4: obtain the output z_l of the multi-layer perceptron of the l-th encoder by formula (3):
z_l = MLP(LN(z'_l)) + z'_l,  l = 1, …, L    (3)
in formula (3), MLP(·) denotes the multi-layer perceptron and LN(·) the normalization layer;
step 2.5: feed the output z_l of the multi-layer perceptron of the l-th encoder into the multi-head self-attention layer of the (l+1)-th encoder to obtain z'_{l+1}, then feed z'_{l+1} into the multi-layer perceptron of the (l+1)-th encoder to obtain z_{l+1}; repeat step 2.5 up to the L-th encoder, obtaining its output z_L;
step 2.6: obtain the normalized output z'_L by formula (4) and extract from it the D-dimensional feature f corresponding to the classification token x_class:
z'_L = LN(z_L)    (4)
in formula (4), LN(·) denotes the normalization layer;
step 2.7: linearly transform the feature f by formula (5) to obtain the output pos_pred of the linear classifier:
pos_pred = Linear(f)    (5)
in formula (5), Linear(·) is the linear classification function and pos_pred ∈ R^c, where the c dimensions correspond to the negative/positive classes;
step 2.8: construct the cross-entropy loss function L by formula (6) and train the first-stage network, formed by the Vision Transformer and the linear classifier, with a gradient descent algorithm until L converges, giving the trained model that predicts whether an image block is positive or negative:
L = -(1/N) Σ_{i=1}^{N} y_label log(pos_pred)    (6)
in formula (6), y_label is the negative/positive pseudo label of an image block and N is the total number of image blocks.
4. The Transformer-based lung cancer histopathology full-section EGFR status prediction method of claim 3, characterized in that predicting the positive/negative category of the image blocks with the model established in step (2), screening out the negative blocks, and generating the EGFR mutation-type data set from the retained positive blocks specifically comprises:
sorting the lung cancer histopathology full-section images by their EGFR mutation-state class labels, removing blank background areas, partitioning into blocks, and randomly sampling a number of image blocks; feeding these into the trained model that predicts whether an image block is positive or negative, predicting each block's category, and screening out the negative blocks to obtain n positive image blocks, which form the EGFR mutation-type data set, denoted
{(X'_i, y'_i)}_{i=1}^n,
where X'_i ∈ R^{C×P×P} is the i-th image block, C the number of channels, and P×P the width and height of each block; y'_i is the category of image block X'_i, i.e., the EGFR mutation-class label in the data set; i = 1, 2, …, n, where n is the number of image blocks.
5. The Transformer-based lung cancer histopathology full-section EGFR status prediction method of claim 4, characterized in that establishing and training the Vision Transformer network model that predicts the EGFR mutation type of an image block, using the data set generated in step (3), specifically comprises:
constructing a Vision Transformer (ViT) composed of S encoders as the second-stage network, where each encoder contains two normalization layers, a multi-head self-attention layer, and a multi-layer perceptron;
step 4.1: partition image block X'_i into a sequence of m' sub-blocks (x'^1_p, x'^2_p, …, x'^{m'}_p), where x'^j_p ∈ R^{C×p×p} is the j-th sub-block of X'_i, p×p is the width and height of each sub-block, and m' = P²/p²;
step 4.2: set a learnable classification token x'_class and obtain the D-dimensional embedded representation z_0 of the m' sub-blocks and x'_class by formula (7), as input to the 1st encoder:
z_0 = [x'_class; x'^1_p E'; x'^2_p E'; …; x'^{m'}_p E'] + E'_pos    (7)
in formula (7), E'_pos encodes the spatial position of the m' sub-blocks and of x'_class within image block X'_i, and E' is the embedding matrix that is set;
step 4.3: obtain the output z'_s of the multi-head self-attention layer of the s-th encoder by formula (8):
z'_s = MSA(LN(z_{s-1})) + z_{s-1},  s = 1, …, S    (8)
in formula (8), MSA(·) denotes the multi-head self-attention layer, LN(·) the normalization layer, and z_{s-1} the output of the (s-1)-th encoder;
step 4.4: obtain the output z_s of the multi-layer perceptron of the s-th encoder by formula (9):
z_s = MLP(LN(z'_s)) + z'_s,  s = 1, …, S    (9)
in formula (9), MLP(·) denotes the multi-layer perceptron and LN(·) the normalization layer;
step 4.5: feed the output z_s of the multi-layer perceptron of the s-th encoder into the multi-head self-attention layer of the (s+1)-th encoder to obtain z'_{s+1}, then feed z'_{s+1} into the multi-layer perceptron of the (s+1)-th encoder to obtain z_{s+1}; repeat step 4.5 up to the S-th encoder, obtaining its output z_S;
step 4.6: obtain the normalized output z'_S by formula (10) and extract from it the D-dimensional feature f' corresponding to the classification token x'_class:
z'_S = LN(z_S)    (10)
in formula (10), LN(·) denotes the normalization layer;
step 4.7: linearly transform the feature f' by formula (11) to obtain the output egfr_pred of the linear classifier:
egfr_pred = Linear(f')    (11)
in formula (11), Linear(·) is the linear classification function and egfr_pred ∈ R^C, where C is the number of EGFR state classes;
step 4.8: construct the cross-entropy loss function L by formula (12) and train the second-stage network, formed by the Vision Transformer and the linear classifier, with a gradient descent algorithm until L converges, giving the trained model that predicts the EGFR mutation type of an image block:
L = -(1/n) Σ_{i=1}^{n} y_label log(egfr_pred)    (12)
in formula (12), y_label is the EGFR-state pseudo label of an image block and n is the total number of image blocks.
6. The Transformer-based lung cancer histopathology full-section EGFR status prediction method of claim 5, characterized in that completing the prediction of full-section EGFR status with the two models established in steps (2) and (4) specifically comprises:
step 5.1: remove blank background areas from the lung cancer histopathology full-section image and partition it into a number of image blocks, recorded as the sequence (x_1, x_2, …, x_j, …, x_m);
step 5.2: feed the image blocks (x_1, x_2, …, x_j, …, x_m) into the Vision Transformer network model that predicts whether an image block is positive or negative, predict each block's negative/positive category, and screen out the negative blocks to obtain the positive block sequence (x_1, x_2, …, x_j, …, x_n); set a positive/negative classification threshold t, compute the positive-block proportion t_pos by formula (13), and determine the negative/positive classification of the whole section by comparing t_pos with t:
t_pos = n/m    (13)
step 5.3: for a full section predicted positive in step 5.2, perform the next prediction: input its positive image blocks (x_1, x_2, …, x_j, …, x_n) into the Vision Transformer network model that predicts the EGFR mutation type, predict the EGFR mutation type of each block, compute the proportion egfr_i of each EGFR state among the n blocks by formula (14), and take the class with the highest proportion as the EGFR state of the lung cancer histopathology full section, where n_i is the number of image blocks of the i-th EGFR mutation state and K is the number of EGFR mutation-state classes:
egfr_i = n_i / n,  i = 1, …, K    (14)
CN202210385274.1A 2022-04-13 2022-04-13 Converter-based lung cancer histopathological full-section EGFR state prediction method Active CN114820481B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210385274.1A CN114820481B (en) 2022-04-13 2022-04-13 Converter-based lung cancer histopathological full-section EGFR state prediction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210385274.1A CN114820481B (en) 2022-04-13 2022-04-13 Converter-based lung cancer histopathological full-section EGFR state prediction method

Publications (2)

Publication Number Publication Date
CN114820481A true CN114820481A (en) 2022-07-29
CN114820481B CN114820481B (en) 2024-09-06

Family

ID=82536201

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210385274.1A Active CN114820481B (en) 2022-04-13 2022-04-13 Converter-based lung cancer histopathological full-section EGFR state prediction method

Country Status (1)

Country Link
CN (1) CN114820481B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116071318A (en) * 2023-01-10 2023-05-05 四川文理学院 Image screening method and system
CN117408997A (en) * 2023-12-13 2024-01-16 安徽省立医院(中国科学技术大学附属第一医院) Auxiliary detection system for EGFR gene mutation in non-small cell lung cancer histological image

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111699510A (en) * 2018-02-12 2020-09-22 豪夫迈·罗氏有限公司 Transformation of digital pathology images
CN113469119A (en) * 2021-07-20 2021-10-01 合肥工业大学 Cervical cell image classification method based on visual converter and graph convolution network
US20210390686A1 (en) * 2020-06-15 2021-12-16 Dalian University Of Technology Unsupervised content-preserved domain adaptation method for multiple ct lung texture recognition

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111699510A (en) * 2018-02-12 2020-09-22 豪夫迈·罗氏有限公司 Transformation of digital pathology images
US20210390686A1 (en) * 2020-06-15 2021-12-16 Dalian University Of Technology Unsupervised content-preserved domain adaptation method for multiple ct lung texture recognition
CN113469119A (en) * 2021-07-20 2021-10-01 合肥工业大学 Cervical cell image classification method based on visual converter and graph convolution network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
孟婷 (Meng Ting); 刘宇航 (Liu Yuhang); 张凯昱 (Zhang Kaiyu): "一种基于增强卷积神经网络的病理图像诊断算法" (A pathological image diagnosis algorithm based on an enhanced convolutional neural network), 激光与光电子学进展 (Laser & Optoelectronics Progress), no. 08, 30 November 2018 (2018-11-30) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116071318A (en) * 2023-01-10 2023-05-05 四川文理学院 Image screening method and system
CN116071318B (en) * 2023-01-10 2024-01-16 四川文理学院 Image screening method and system
CN117408997A (en) * 2023-12-13 2024-01-16 安徽省立医院(中国科学技术大学附属第一医院) Auxiliary detection system for EGFR gene mutation in non-small cell lung cancer histological image
CN117408997B (en) * 2023-12-13 2024-03-08 安徽省立医院(中国科学技术大学附属第一医院) Auxiliary detection system for EGFR gene mutation in non-small cell lung cancer histological image

Also Published As

Publication number Publication date
CN114820481B (en) 2024-09-06

Similar Documents

Publication Publication Date Title
CN105825511B (en) A kind of picture background clarity detection method based on deep learning
CN113469119B (en) Cervical cell image classification method based on visual converter and image convolution network
US8965116B2 (en) Computer-aided assignment of ratings to digital samples of a manufactured web product
Colak et al. Automated McIntosh-based classification of sunspot groups using MDI images
CN114820481B (en) Converter-based lung cancer histopathological full-section EGFR state prediction method
CN110135459B (en) Zero sample classification method based on double-triple depth measurement learning network
CN113378792B (en) Weak supervision cervical cell image analysis method fusing global and local information
Chakraborty et al. Detection of skin disease using metaheuristic supported artificial neural networks
CN113256636B (en) Bottom-up parasite species development stage and image pixel classification method
CN114782753B (en) Lung cancer tissue pathology full-section classification method based on weak supervision learning and converter
CN111145145B (en) Image surface defect detection method based on MobileNet
CN112687374B (en) Psychological crisis early warning method based on text and image information joint calculation
CN112529234A (en) Surface water quality prediction method based on deep learning
CN109978870A (en) Method and apparatus for output information
CN110728666A (en) Typing method and system for chronic nasosinusitis based on digital pathological slide
CN114863179B (en) Endoscope image classification method based on multi-scale feature embedding and cross attention
Zhang Application of artificial intelligence recognition technology in digital image processing
CN114445356A (en) Multi-resolution-based full-field pathological section image tumor rapid positioning method
CN115757919A (en) Symmetric deep network and dynamic multi-interaction human resource post recommendation method
Chen et al. A MULTI-EXPERT ANNOTATED FUNDUS COMPUTER VISION IMAGE SEGMENTATION MODEL USING MULTI-VIEW INFORMATION BOTTLENECK THEORY
CN115880277A (en) Stomach cancer pathology total section T stage classification prediction method based on Swin transducer and weak supervision
CN108846327B (en) Intelligent system and method for distinguishing pigmented nevus and melanoma
US20210383525A1 (en) Image and data analystics model compatibility regulation methods
Weckman et al. KNOWLEDGE EXTRACTION FROM THE NEURAL ‘BLACK BOX'IN ECOLOGICAL MONITORING
CN115223001A (en) Medical image identification method based on transfer learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant