CN113888466A - Pulmonary nodule image detection method and system based on CT image - Google Patents

Info

Publication number
CN113888466A
Authority
CN
China
Prior art keywords
image
patch
cnn
transformer
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111030746.3A
Other languages
Chinese (zh)
Inventor
李波
徐麒皓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Science and Engineering WUSE
Wuhan University of Science and Technology WHUST
Original Assignee
Wuhan University of Science and Engineering WUSE
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Science and Engineering WUSE
Priority to CN202111030746.3A
Publication of CN113888466A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/0002: Inspection of images, e.g. flaw detection
    • G06T7/0012: Biomedical image inspection
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10072: Tomographic images
    • G06T2207/10081: Computed x-ray tomography [CT]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20081: Training; Learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20084: Artificial neural networks [ANN]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/30: Subject of image; Context of image processing
    • G06T2207/30004: Biomedical image processing
    • G06T2207/30061: Lung
    • G06T2207/30064: Lung nodule

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Apparatus For Radiation Diagnosis (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The invention discloses a lung nodule image detection method and system based on CT images. The detection method comprises the following steps: S1, image serialization: tokenization is performed by reshaping slices of the input lung CT image into a sequence of patches; S2, patch embedding: the vectorized patch sequence is mapped to a latent two-dimensional embedding space using a trainable linear mapping; S3, establishing a hybrid CNN and Transformer encoder: the tokenized image patches from the CNN feature map are encoded by the Transformer into an input sequence for extracting global context; S4, cascaded decoder: the encoded features obtained in step S3 are first upsampled by the decoder, the upsampled features are then combined with high-resolution CNN feature maps to achieve accurate localization, and finally U-Net is used to recover local spatial information and thereby enhance fine detail in the detection result. The method can effectively improve the accuracy of pulmonary nodule detection.

Description

Pulmonary nodule image detection method and system based on CT image
Technical Field
The invention relates to the technical field of image processing, in particular to a lung nodule image detection method and system based on a CT image.
Background
Lung cancer is the leading cause of cancer death worldwide. Lung nodules, an early manifestation of lung cancer, appear on CT images as roughly circular lung shadows no more than 3 cm in diameter, and accurately detecting nodule contours helps doctors distinguish benign from malignant nodules. Because lung nodules are small and resemble tissues such as blood vessels in the lung parenchyma in morphology and brightness, they are difficult to separate by visual inspection alone and can seriously interfere with a doctor's judgment. To reduce doctors' workload and improve the efficiency of nodule diagnosis, computer-aided diagnosis techniques have been adopted in clinical work.
Deep learning currently achieves excellent results in the field of computer vision. The U-Net architecture has become the de facto standard for many medical image segmentation tasks and has been highly successful. However, because convolution operations are inherently local, U-Net is generally limited in explicitly modeling long-range dependencies. The Transformer, designed for sequence-to-sequence prediction, has emerged as an alternative architecture with an innate global self-attention mechanism, but its localization ability can be limited because its low-level features lack detail.
Disclosure of Invention
Existing methods that encode tokenized image patches using only a Transformer and then directly upsample the hidden feature representation into a full-resolution dense output cannot produce satisfactory results, particularly because nodules vary widely in texture, shape and size among patients. To address this problem, the invention provides a lung nodule image detection method and system based on CT images. The detection method uses a combined TransformerUnet architecture and, building on existing research, proposes a self-attention mechanism based on CNN features. Unlike existing CNN-based methods, TransformerUnet establishes the self-attention mechanism from the perspective of sequence-to-sequence prediction. To compensate for the loss of feature resolution caused by the Transformer, the network adopts a hybrid CNN and Transformer structure that exploits both the detailed high-resolution spatial information from CNN features and the global context encoded by the Transformer. Inspired by the U-shaped architecture design, the self-attention features encoded by the Transformer are then upsampled and combined with the different high-resolution CNN features skipped from the encoding path to achieve accurate localization.
The pulmonary nodule image detection method and system of the invention adopt a combined TransformerUnet architecture, which combines a neural network architecture based on attention encoding in deep learning (Transformer) with a biomedical semantic segmentation network architecture based on fully convolutional networks (U-Net). The Transformer, designed for sequence-to-sequence prediction, serves as an alternative architecture with an innate global self-attention mechanism, and adding a medical image segmentation model addresses the low localization accuracy caused by insufficient low-level features. Unlike existing manually designed pulmonary nodule detection models, the detection framework of the invention consists of two parts: a Transformer part and a U-Net part.
Interpretation of terms:
1. Transformer: a neural network architecture based on attention encoding.
2. CNN: Convolutional Neural Network.
3. U-Net: a biomedical semantic segmentation network architecture; a type of fully convolutional neural network.
4. Patch: the feature detector in a convolutional neural network divides the input image into a number of small blocks, each of which is called a patch.
5. CUP: Cascaded Upsampler, a cascaded decoder that upsamples to larger feature maps using a computationally lighter decoder in order to increase decoding speed.
6. MSA: Multi-Head Self-Attention, which computes several attention functions in parallel so that the input sequence is interpreted from different perspectives.
7. MLP: Multi-Layer Perceptron, a type of artificial neural network with one or more hidden layers between the input and output layers.
The technical solution adopted by the invention to overcome the above technical problems is as follows:
The invention discloses a lung nodule image detection method based on CT images, which detects lung nodule images using a combined TransformerUnet architecture comprising a Transformer part and a U-Net part. The detection method comprises the following steps:
S1, image serialization: tokenization is performed by reshaping slices of the input lung CT image into a sequence of patches;
S2, patch embedding: the vectorized patch sequence is mapped to a latent two-dimensional embedding space using a trainable linear mapping;
S3, establishing a hybrid CNN and Transformer encoder: the tokenized image patches from the CNN feature map are encoded by the Transformer into an input sequence for extracting global context;
S4, cascaded decoder: the encoded features obtained in step S3 are first upsampled by the decoder, the upsampled features are then combined with high-resolution CNN feature maps to achieve accurate localization, and finally U-Net is used to recover local spatial information and thereby enhance fine detail in the detection result.
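To make the data flow of steps S1 to S4 concrete, the following minimal sketch traces the tensor shape after each stage. It follows the plain image-patch description given in the steps; the helper name `trace_shapes`, the channel count C = 3 and the embedding width D = 768 are illustrative assumptions, not values fixed by the text at this point.

```python
def trace_shapes(H, W, C, p, D):
    """Return the tensor shape after each stage S1-S4 (hypothetical helper)."""
    if H % p or W % p:
        raise ValueError("H and W must be divisible by the patch size p")
    N = (H // p) * (W // p)                  # S1: number of patches = sequence length
    return {
        "S1_patches": (N, p * p * C),        # flattened p x p x C patches
        "S2_embedded": (N, D),               # after the trainable linear mapping
        "S3_encoded": (N, D),                # Transformer layers keep the sequence shape
        "S4_reshaped": (H // p, W // p, D),  # tokens placed back on a 2-D grid
        "S4_output": (H, W),                 # after cascaded upsampling to full resolution
    }


# Assumed example values: 224 x 224 input, 16-pixel patches, C = 3, D = 768.
shapes = trace_shapes(H=224, W=224, C=3, p=16, D=768)
for stage, shape in shapes.items():
    print(stage, shape)
```

The sequence length and grid size depend only on H, W and p, so changing C or D in the call does not affect how many upsampling stages the decoder needs.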
Further, in step S1, let the lung CT image be $x \in \mathbb{R}^{H \times W \times C}$, where H × W is the spatial resolution and C is the number of channels.
Further, step S1 specifically includes:
tokenization is performed by reshaping the input lung CT image x into a sequence of flattened patches $\{x_p^i \in \mathbb{R}^{p^2 \cdot C} \mid i = 1, \dots, N\}$, where p is the patch size, so that each patch is p × p, and the number of image patches $N = HW / p^2$ is the input sequence length.
Further, step S2 specifically includes:
S21, in order to encode the spatial information of the patch sequence, a position embedding is learned and added to the patch embeddings to retain positional information, as shown in the following equation:
$$z_0 = [x_p^1 E;\ x_p^2 E;\ \dots;\ x_p^N E] + E_{pos} \tag{1}$$
where $E \in \mathbb{R}^{(p^2 \cdot C) \times D}$ is the patch embedding projection, $E_{pos} \in \mathbb{R}^{N \times D}$ represents the position embedding, and D is the dimension of the patch embedding;
S22, in order to recover the spatial order of the embedded patches, the encoded features are first reshaped from $\mathbb{R}^{\frac{HW}{p^2} \times D}$ to $\mathbb{R}^{\frac{H}{p} \times \frac{W}{p} \times D}$. A 1 × 1 convolution then reduces the channel size of the features to the number of classes, and the feature map is directly upsampled to the full resolution H × W to predict the final segmentation result.
Further, step S3 specifically includes:
the hybrid CNN and Transformer encoder consists of L layers of multi-head self-attention and multi-layer perceptron blocks, as expressed in equations (2) and (3), so the output of the l-th layer can be written as follows:
$$z'_l = \mathrm{MSA}(\mathrm{LN}(z_{l-1})) + z_{l-1} \tag{2}$$
$$z_l = \mathrm{MLP}(\mathrm{LN}(z'_l)) + z'_l \tag{3}$$
where MSA denotes multi-head self-attention, MLP denotes the multi-layer perceptron, LN(·) denotes the layer normalization operator, $z'_l$ denotes the multi-head attention output of the l-th layer, and $z_l$ denotes the encoded image representation of the l-th layer.
Further, the method also compensates for the information loss of the hybrid CNN and Transformer encoder, specifically:
similar to U-Net, skip connections are used to fuse the multi-scale features from the hybrid encoder with the upsampled features. A CNN is used as a feature extractor to generate a feature map, and 1 × 1 patches taken from that feature map are used as input instead of patches extracted from the original image, thereby preserving more deep and shallow features to compensate for information loss.
Further, step S4 specifically includes:
a plurality of upsampling steps are used to decode the hidden features and output the final segmentation mask, specifically:
after the hidden feature sequence $z_L \in \mathbb{R}^{\frac{HW}{p^2} \times D}$ is reshaped into $\frac{H}{p} \times \frac{W}{p} \times D$, a cascaded decoder is implemented by cascading a plurality of upsampling blocks to go from the resolution $\frac{H}{p} \times \frac{W}{p}$ to the full resolution H × W, where each upsampling block consists, in sequence, of a 2× upsampling operator, a 3 × 3 convolutional layer and a ReLU layer;
finally, the cascaded decoder and the hybrid encoder together form a U-shaped structure, in which feature maps at different resolution levels are upsampled and fused through skip connections.
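As a consistency check on the depth of the cascade, the number of upsampling blocks can be derived from the patch size. The derivation below assumes that each block doubles the spatial resolution (our reading of the block description) and uses the 224 × 224 input and patch size 16 reported in the embodiment; k is a symbol introduced here for the block count.

```latex
\frac{H}{p}\cdot 2^{k} = H
\;\Longrightarrow\; k = \log_2 p ,
\qquad
p = 16 \;\Rightarrow\; k = 4 :\quad
14 \to 28 \to 56 \to 112 \to 224 \quad (H = 224).
```

Under that assumption the result agrees with the four cascaded upsampling blocks that the embodiment states are needed in the CUP.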
The invention also discloses a lung nodule image detection system based on CT images, which uses the combined TransformerUnet architecture and specifically comprises:
an image serialization module for reshaping slices of the input lung CT image into a sequence of patches to perform tokenization;
a patch embedding module for mapping the vectorized patch sequence to a latent two-dimensional embedding space using a trainable linear mapping;
a hybrid CNN and Transformer encoder module for encoding the tokenized image patches from the CNN feature map, through the Transformer, into an input sequence for extracting global context;
and a cascaded decoder module for first upsampling, through the decoder, the encoded features obtained by the hybrid CNN and Transformer encoder module, then combining the upsampled features with high-resolution CNN feature maps to achieve accurate localization, and finally using U-Net to recover local spatial information and thereby enhance fine detail in the detection result.
The invention has the following beneficial effects:
1. Existing pulmonary nodule detection algorithms spend a great deal of time on feature extraction. Traditional feature extraction algorithms require extensive manual labeling, and designing the features requires substantial prior knowledge. Detecting and classifying pulmonary nodules with deep learning can effectively avoid the subjective uncertainty of doctors' judgment, reduce doctors' workload and improve the accuracy of pulmonary nodule detection, because a deep learning model automatically learns and extracts features suited to the current task.
2. Using a Transformer directly for lung nodule detection performs worse than U-Net or Attention. A Transformer extracts high-level semantic features well, which benefits classification, but lacks the low-level features needed to segment lung nodule images. A TransformerUnet network, formed by combining the Transformer with the U-Net structure through skip connections, therefore learns both high-level semantic features and low-level detail features well, can effectively improve the accuracy of pulmonary nodule detection, and assists doctors in their judgment.
Drawings
Fig. 1 shows serial slices of a lung CT image according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of the principle of the combined TransformerUnet architecture according to an embodiment of the present invention.
Detailed Description
In order to facilitate a better understanding of the invention for those skilled in the art, the invention will be described in further detail with reference to the accompanying drawings and specific examples, which are given by way of illustration only and do not limit the scope of the invention.
Example 1
As shown in Fig. 1 and Fig. 2, this embodiment discloses a lung nodule image detection method based on CT images, which detects lung nodule images using a combined TransformerUnet architecture comprising a Transformer part and a U-Net part.
The lung nodule image detection method based on CT images comprises the following steps:
Step S1, image preprocessing: image serialization is performed by reshaping slices of the input lung CT image into a patch sequence to perform tokenization.
Let the lung CT image be $x \in \mathbb{R}^{H \times W \times C}$, where H × W is the spatial resolution and C is the number of channels. The goal is to predict a pixel-wise label map of the corresponding size H × W. Prior methods train a CNN (e.g., U-Net) to encode the image into a high-level feature representation and then decode it back to full spatial resolution. In contrast, this method introduces the self-attention mechanism into the encoder design by using a Transformer: the image is first encoded into a high-level feature representation and then decoded to the original resolution.
Pixel sizes and slice thicknesses differ between scans, which hinders model training; image serialization effectively avoids this problem. The image serialization described in this embodiment specifically includes:
tokenization is performed by reshaping the input lung CT image x into a sequence of flattened patches $\{x_p^i \in \mathbb{R}^{p^2 \cdot C} \mid i = 1, \dots, N\}$, where p is the patch size in pixels, so that each patch is p × p, and the number of image patches $N = HW / p^2$ is the input sequence length.
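The serialization step can be sketched in a few lines of NumPy. This is a minimal illustration under the assumption that H and W are exact multiples of p; the function name `patchify` and the toy 8 × 8 single-channel slice are hypothetical and not part of the patent.

```python
import numpy as np


def patchify(x, p):
    """Reshape an H x W x C image into N flattened patches of length p*p*C."""
    H, W, C = x.shape
    if H % p or W % p:
        raise ValueError("H and W must be divisible by p")
    # Split rows and columns into (H/p, p) and (W/p, p) blocks.
    blocks = x.reshape(H // p, p, W // p, p, C)
    # Bring the two grid axes to the front, then flatten each patch.
    blocks = blocks.transpose(0, 2, 1, 3, 4)
    return blocks.reshape(-1, p * p * C)


# Toy 8 x 8 single-channel slice split into 4 x 4 patches.
x = np.arange(8 * 8 * 1, dtype=np.float32).reshape(8, 8, 1)
patches = patchify(x, p=4)
print(patches.shape)  # N = HW / p^2 patches, each of length p*p*C
```

Patches are emitted in row-major order over the patch grid, which is the ordering that a later reshape back to an (H/p) × (W/p) grid relies on.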
Step S2, patch embedding: a trainable linear mapping is used to map the vectorized patch sequence to a latent two-dimensional embedding space.
In this embodiment, step S2 specifically includes:
S21, in order to encode the spatial information of the patch sequence, a position embedding is learned and added to the patch embeddings to retain positional information, as shown in the following equation:
$$z_0 = [x_p^1 E;\ x_p^2 E;\ \dots;\ x_p^N E] + E_{pos} \tag{1}$$
where $E \in \mathbb{R}^{(p^2 \cdot C) \times D}$ is the patch embedding projection, $E_{pos} \in \mathbb{R}^{N \times D}$ represents the position embedding, and D is the dimension of the patch embedding;
S22, in order to recover the spatial order of the embedded patches, the encoded features are first reshaped from $\mathbb{R}^{\frac{HW}{p^2} \times D}$ to $\mathbb{R}^{\frac{H}{p} \times \frac{W}{p} \times D}$. A 1 × 1 convolution then reduces the channel size of the features to the number of classes, and the feature map is directly upsampled to the full resolution H × W to predict the final segmentation result.
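Steps S21 and S22 can be sketched as follows: a linear projection plus an added position embedding, followed by the reshape of the token sequence back to a spatial grid. The function names, the toy sizes and the randomly initialized matrices are assumptions for illustration only; in practice E and E_pos would be learned.

```python
import numpy as np


def embed_patches(patches, E, E_pos):
    """Step S21: project each flattened patch with E and add position embeddings."""
    return patches @ E + E_pos        # (N, p*p*C) @ (p*p*C, D) + (N, D)


def tokens_to_grid(z, H, W, p):
    """Step S22: reshape an (N, D) token sequence to an (H/p, W/p, D) feature map."""
    return z.reshape(H // p, W // p, z.shape[-1])


rng = np.random.default_rng(0)
# Toy sizes chosen only for illustration.
H = W = 32
p, C, D = 8, 1, 16
N = (H // p) * (W // p)

patches = rng.standard_normal((N, p * p * C))
E = rng.standard_normal((p * p * C, D)) * 0.02   # trainable in practice; random here
E_pos = rng.standard_normal((N, D)) * 0.02       # learned in practice; random here

z0 = embed_patches(patches, E, E_pos)
grid = tokens_to_grid(z0, H, W, p)
print(z0.shape, grid.shape)
```

The reshape only restores spatial order if the tokens were produced in row-major order over the patch grid.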
Step S3, establishing a hybrid CNN and Transformer encoder: the tokenized image patches from the CNN feature map are encoded by the Transformer into an input sequence for extracting global context.
In the hybrid CNN and Transformer encoder, different candidate sets of suspected lung nodules are obtained after patch embedding. Because of the intrinsic limitation of convolution operations, which remain weak at modeling long-range relationships, purely convolutional architectures often perform poorly, especially when structures vary widely in texture, shape and size among patients. To overcome this limitation, a self-attention mechanism is established on top of the CNN features: the tokenized image patches from the CNN feature map are encoded into an input sequence from which global context is extracted. In addition, unlike previous CNN-based methods, the Transformer is not only powerful at global feature extraction but also transfers well to downstream tasks after large-scale pre-training; as an alternative architecture, it dispenses with convolution operations entirely and relies only on the attention mechanism.
Specifically, step S3 includes:
the hybrid CNN and Transformer encoder consists of L layers of multi-head self-attention and multi-layer perceptron blocks, as expressed in equations (2) and (3), so the output of the l-th layer can be written as follows:
$$z'_l = \mathrm{MSA}(\mathrm{LN}(z_{l-1})) + z_{l-1} \tag{2}$$
$$z_l = \mathrm{MLP}(\mathrm{LN}(z'_l)) + z'_l \tag{3}$$
where MSA denotes multi-head self-attention, MLP denotes the multi-layer perceptron, LN(·) denotes the layer normalization operator, $z'_l$ denotes the multi-head attention output of the l-th layer, and $z_l$ denotes the encoded image representation of the l-th layer.
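One encoder layer of the form described by equations (2) and (3), that is a normalized multi-head self-attention step with a residual connection followed by a normalized MLP step with a residual connection, might be sketched in NumPy as below. All function names, weight shapes and the ReLU activation are assumptions made for brevity; the layer normalization here omits the learnable scale and shift, and the weights are random rather than trained.

```python
import numpy as np


def layer_norm(z, eps=1e-6):
    """Normalize each token over its channel axis (no learnable scale/shift)."""
    mu = z.mean(axis=-1, keepdims=True)
    var = z.var(axis=-1, keepdims=True)
    return (z - mu) / np.sqrt(var + eps)


def softmax(a):
    a = a - a.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=-1, keepdims=True)


def msa(z, Wq, Wk, Wv, Wo, heads):
    """Multi-head self-attention over an (N, D) token sequence."""
    N, D = z.shape
    d = D // heads
    # Project, then split the channel axis into heads: (heads, N, d).
    q = (z @ Wq).reshape(N, heads, d).transpose(1, 0, 2)
    k = (z @ Wk).reshape(N, heads, d).transpose(1, 0, 2)
    v = (z @ Wv).reshape(N, heads, d).transpose(1, 0, 2)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))   # (heads, N, N)
    out = (attn @ v).transpose(1, 0, 2).reshape(N, D)       # merge heads
    return out @ Wo


def mlp(z, W1, W2):
    return np.maximum(z @ W1, 0.0) @ W2     # ReLU for brevity (an assumption)


def transformer_layer(z_prev, params, heads):
    z_mid = msa(layer_norm(z_prev), *params["attn"], heads) + z_prev   # equation (2)
    return mlp(layer_norm(z_mid), *params["mlp"]) + z_mid              # equation (3)


rng = np.random.default_rng(0)
N, D, heads = 16, 32, 4
params = {
    "attn": [rng.standard_normal((D, D)) * 0.02 for _ in range(4)],
    "mlp": [rng.standard_normal((D, 4 * D)) * 0.02,
            rng.standard_normal((4 * D, D)) * 0.02],
}
z = rng.standard_normal((N, D))
z_out = transformer_layer(z, params, heads)
print(z_out.shape)
```

Because both sub-steps are residual, a layer whose weights are all zero returns its input unchanged, which is a convenient sanity check when stacking L such layers.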
Because the hybrid CNN and Transformer encoder loses information, this embodiment also compensates for that loss: a hybrid CNN-Transformer architecture is used as the encoder, and cascaded upsampling is performed to achieve accurate localization. Specifically:
similar to U-Net, skip connections are used to fuse the multi-scale features from the hybrid encoder with the upsampled features. A CNN is used as a feature extractor to generate a feature map, and 1 × 1 patches taken from that feature map are used as input instead of patches extracted from the original image, thereby preserving more deep and shallow features to compensate for information loss.
Here shallow and deep features are concatenated to reduce the loss of spatial information caused by downsampling. A linear layer follows, so that the size of the concatenated features stays the same as that of the upsampled features.
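The fusion just described can be illustrated with a short sketch. It reflects one reading of the passage, namely that the linear layer acts per pixel and maps the concatenated channels back to the channel count of the upsampled features; the function name `fuse_skip`, the toy shapes and the random weights are assumptions.

```python
import numpy as np


def fuse_skip(upsampled, skip, W_lin):
    """Concatenate decoder and encoder features along channels, then project back."""
    if upsampled.shape[:2] != skip.shape[:2]:
        raise ValueError("spatial sizes must match before concatenation")
    fused = np.concatenate([upsampled, skip], axis=-1)   # (h, w, c_up + c_skip)
    return fused @ W_lin                                 # (h, w, c_up): per-pixel linear layer


rng = np.random.default_rng(0)
upsampled = rng.standard_normal((28, 28, 8))   # decoder features after upsampling
skip = rng.standard_normal((28, 28, 4))        # CNN encoder features at the same resolution
W_lin = rng.standard_normal((12, 8)) * 0.1     # trainable in practice; random here
fused = fuse_skip(upsampled, skip, W_lin)
print(fused.shape)
```

Keeping the output channel count equal to that of the upsampled features means the next decoder stage does not need to know how many skip channels were merged in.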
Step S4, cascaded decoder: the encoded features obtained in step S3 are first upsampled by the decoder; the upsampled features are then combined with high-resolution CNN feature maps to achieve accurate localization; finally, U-Net recovers local spatial information to enhance fine detail in the detection result, which effectively reduces false positives in lung nodule detection and provides accurate images for a computer-aided diagnosis system.
In this embodiment, step S4 specifically includes:
a plurality of upsampling steps are used to decode the hidden features and output the final segmentation mask, specifically:
after the hidden feature sequence $z_L \in \mathbb{R}^{\frac{HW}{p^2} \times D}$ is reshaped into $\frac{H}{p} \times \frac{W}{p} \times D$, a cascaded decoder is implemented by cascading a plurality of upsampling blocks to go from the resolution $\frac{H}{p} \times \frac{W}{p}$ to the full resolution H × W, where each upsampling block consists, in sequence, of a 2× upsampling operator, a 3 × 3 convolutional layer and a ReLU layer;
finally, the cascaded decoder and the hybrid encoder together form a U-shaped structure, in which feature maps at different resolution levels are upsampled and fused through skip connections.
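A cascaded upsampler of this kind can be sketched without skip connections as a chain of blocks, each doing 2× upsampling, a 3 × 3 convolution and a ReLU. The nearest-neighbour interpolation, zero padding, channel widths and random kernels below are all assumptions for illustration; the text does not specify them.

```python
import numpy as np


def upsample2x(f):
    """Nearest-neighbour 2x upsampling of an (h, w, c) feature map."""
    return f.repeat(2, axis=0).repeat(2, axis=1)


def conv3x3_relu(f, kernel):
    """3x3 convolution (zero padding, stride 1) followed by ReLU.

    kernel has shape (3, 3, c_in, c_out).
    """
    h, w, _ = f.shape
    padded = np.pad(f, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, w, kernel.shape[-1]))
    for dy in range(3):
        for dx in range(3):
            # Each kernel tap contributes a shifted, channel-mixed copy of the input.
            out += padded[dy:dy + h, dx:dx + w] @ kernel[dy, dx]
    return np.maximum(out, 0.0)


def cascaded_upsampler(f, kernels):
    """Apply one (upsample, conv, ReLU) block per kernel in the list."""
    for kernel in kernels:
        f = conv3x3_relu(upsample2x(f), kernel)
    return f


rng = np.random.default_rng(0)
f = rng.standard_normal((14, 14, 8))             # reshaped hidden features, toy channel count
channels = [8, 8, 4, 2, 1]                       # assumed channel schedule
kernels = [rng.standard_normal((3, 3, cin, cout)) * 0.1
           for cin, cout in zip(channels[:-1], channels[1:])]
out = cascaded_upsampler(f, kernels)
print(out.shape)
```

With a 14 × 14 starting grid, four doubling blocks reach 224 × 224, which is why the number of blocks is tied to the patch size.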
The combined TransformerUnet architecture provided by the invention is shown in Fig. 2; it establishes the self-attention mechanism from the perspective of sequence-to-sequence prediction. To compensate for the loss of feature resolution caused by the Transformer, TransformerUnet adopts a hybrid CNN-Transformer structure that exploits both the high-resolution spatial information from CNN features and the global context encoded by the Transformer. Inspired by the U-Net design, the self-attention features encoded by the Transformer are then upsampled and combined, through skip connections from the encoding path, with the different high-resolution CNN features to achieve accurate localization. This design allows the overall network to retain the advantages of the Transformer while benefiting lung nodule image detection. Fig. 1 shows slices of an acquired lung CT image.
The network is built with a deep learning framework in a Python environment under the Ubuntu 16 operating system on an Nvidia RTX 2080 Ti GPU hardware platform, and is trained on the LUNA16 and LIDC datasets. Extensive experiments demonstrate the feasibility of training and testing the TransformerUnet model.
Data augmentation, such as random rotation and flipping, was used in all experiments. For the Transformer encoder, ViT with 12 Transformer layers is used. The hybrid encoder design combines ResNet-50 and ViT. All Transformer backbones (i.e., ViT) and ResNet-50 are pre-trained on ImageNet. The input resolution and patch size are set to 224 × 224 and 16, respectively, so four cascaded upsampling blocks are needed in the CUP to reach the original image resolution. The model is trained using an SGD optimizer with a learning rate of 0.01, a momentum of 0.9 and a weight decay of 1e-4. The default batch size is 24; the default number of training iterations is 20k for the LUNA16 dataset and 14k for the LIDC dataset.
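The training settings can be illustrated with a small NumPy sketch of one optimizer update and one augmentation call. The hyperparameter values come from the paragraph above; the update convention (weight decay added to the gradient before the momentum buffer, as in common deep learning frameworks), the restriction of rotations to multiples of 90 degrees, and the function names are assumptions.

```python
import numpy as np

LR, MOMENTUM, WEIGHT_DECAY = 0.01, 0.9, 1e-4   # values stated in the text


def sgd_step(w, grad, velocity):
    """One SGD update with momentum and L2 weight decay (assumed convention)."""
    g = grad + WEIGHT_DECAY * w               # weight decay folded into the gradient
    velocity = MOMENTUM * velocity + g        # momentum buffer
    return w - LR * velocity, velocity


def augment(image, mask, rng):
    """Apply the same random rotation and flip to an image and its label mask."""
    k = int(rng.integers(0, 4))               # rotate by k * 90 degrees (assumed rotation set)
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    if rng.random() < 0.5:                    # horizontal flip with probability 0.5
        image, mask = np.fliplr(image), np.fliplr(mask)
    return image, mask


w, v = np.array([1.0]), np.zeros(1)
w, v = sgd_step(w, grad=np.array([0.5]), velocity=v)

rng = np.random.default_rng(0)
img = np.arange(16.0).reshape(4, 4)
aug_img, aug_mask = augment(img, img > 7, rng)
print(w, aug_img.shape)
```

Applying the identical transform to the image and its mask keeps the pixel-wise labels aligned with the augmented input.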
The invention is characterized in that, on the one hand, a CNN architecture (U-Net) provides a way to extract low-level feature cues, which supplement fine spatial details well. On the other hand, within the U-Net framework, a Transformer network encodes the tokenized image patches from the convolutional neural network (CNN) feature map into an input sequence for extracting global context. Finally, the decoder upsamples the encoded features, which are combined with high-resolution CNN feature maps to achieve accurate localization. Combined with U-Net, which recovers local spatial information, the Transformer can serve as a powerful encoder for the lung nodule detection task.
Example 2
This embodiment discloses a system implementing the lung nodule image detection method based on CT images described in Example 1. The system uses the combined TransformerUnet architecture and specifically comprises:
an image serialization module for reshaping slices of the input lung CT image into a sequence of patches to perform tokenization;
a patch embedding module for mapping the vectorized patch sequence to a latent two-dimensional embedding space using a trainable linear mapping;
a hybrid CNN and Transformer encoder module for encoding the tokenized image patches from the CNN feature map, through the Transformer, into an input sequence for extracting global context;
and a cascaded decoder module for first upsampling, through the decoder, the encoded features obtained by the hybrid CNN and Transformer encoder module, then combining the upsampled features with high-resolution CNN feature maps to achieve accurate localization, and finally using U-Net to recover local spatial information and thereby enhance fine detail in the detection result.
The functions of the above modules correspond to those of embodiment 1, and are not described herein again.
The foregoing merely illustrates the principles and preferred embodiments of the invention and many variations and modifications may be made by those skilled in the art in light of the foregoing description, which are within the scope of the invention.

Claims (8)

1. A lung nodule image detection method based on CT images, characterized in that lung nodule image detection is carried out using a combined TransformerUnet architecture, the detection method comprising the following steps:
S1, image serialization: tokenization is performed by reshaping slices of the input lung CT image into a sequence of patches;
S2, patch embedding: the vectorized patch sequence is mapped to a latent two-dimensional embedding space using a trainable linear mapping;
S3, establishing a hybrid CNN and Transformer encoder: the tokenized image patches from the CNN feature map are encoded by the Transformer into an input sequence for extracting global context;
S4, cascaded decoder: the encoded features obtained in step S3 are first upsampled by the decoder, the upsampled features are then combined with high-resolution CNN feature maps to achieve accurate localization, and finally U-Net is used to recover local spatial information and thereby enhance fine detail in the detection result.
2. The method according to claim 1, wherein in step S1 the lung CT image is $x \in \mathbb{R}^{H \times W \times C}$, where H × W is the spatial resolution and C is the number of channels.
3. The method according to claim 2, wherein step S1 specifically includes:
tokenization is performed by reshaping the input lung CT image x into a sequence of flattened patches $\{x_p^i \in \mathbb{R}^{p^2 \cdot C} \mid i = 1, \dots, N\}$, where p is the patch size, so that each patch is p × p, and the number of image patches $N = HW / p^2$ is the input sequence length.
4. The method according to claim 3, wherein step S2 specifically comprises:
S21, in order to encode the spatial information of the patch sequence, a position embedding is learned and added to the patch embeddings to retain positional information, as shown in the following equation:
$$z_0 = [x_p^1 E;\ x_p^2 E;\ \dots;\ x_p^N E] + E_{pos} \tag{1}$$
where $E \in \mathbb{R}^{(p^2 \cdot C) \times D}$ is the patch embedding projection, $E_{pos} \in \mathbb{R}^{N \times D}$ represents the position embedding, and D is the dimension of the patch embedding;
S22, in order to recover the spatial order of the embedded patches, the encoded features are first reshaped from $\mathbb{R}^{\frac{HW}{p^2} \times D}$ to $\mathbb{R}^{\frac{H}{p} \times \frac{W}{p} \times D}$. A 1 × 1 convolution then reduces the channel size of the features to the number of classes, and the feature map is directly upsampled to the full resolution H × W to predict the final segmentation result.
5. The method according to claim 1, wherein step S3 specifically comprises:
the hybrid CNN-Transformer encoder is built from L layers of multi-head self-attention (MSA) and multi-layer perceptron (MLP) blocks, as expressed in equations (2) and (3), so that the output of the l-th layer can be written as:
$z'_l = \mathrm{MSA}(\mathrm{LN}(z_{l-1})) + z_{l-1}$ (2)
$z_l = \mathrm{MLP}(\mathrm{LN}(z'_l)) + z'_l$ (3)
where MSA denotes multi-head self-attention, MLP denotes the multi-layer perceptron, LN(·) denotes the layer normalization operator, $z'_l$ denotes the multi-head attention output of the l-th layer, and $z_l$ denotes the encoded image representation of the l-th layer.
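Equations (2) and (3) of claim 5 describe a pre-normalization Transformer layer with two residual connections. The NumPy sketch below shows one such layer under explicit assumptions: the helper names, the ReLU activation inside the MLP, the omission of learnable scale and shift in the layer normalization, and the random weights are illustrative choices that the claim does not specify.

```python
import numpy as np

def layer_norm(z, eps=1e-6):
    """LN over the feature axis (learnable scale/shift omitted for brevity)."""
    mu = z.mean(axis=-1, keepdims=True)
    var = z.var(axis=-1, keepdims=True)
    return (z - mu) / np.sqrt(var + eps)

def softmax(a):
    a = a - a.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=-1, keepdims=True)

def msa(z, Wq, Wk, Wv, Wo, heads):
    """Multi-head self-attention on a token sequence z of shape (N, D)."""
    n, d = z.shape
    dh = d // heads
    def split(t):  # (N, D) -> (heads, N, D/heads)
        return t.reshape(n, heads, dh).transpose(1, 0, 2)
    q, k, v = split(z @ Wq), split(z @ Wk), split(z @ Wv)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))  # (heads, N, N)
    out = (attn @ v).transpose(1, 0, 2).reshape(n, d)       # merge the heads
    return out @ Wo

def mlp(z, W1, W2):
    """Two-layer perceptron; ReLU is an assumption, the claim names no activation."""
    return np.maximum(z @ W1, 0.0) @ W2

def encoder_layer(z_prev, params, heads):
    z_mid = msa(layer_norm(z_prev), *params["attn"], heads=heads) + z_prev  # eq. (2)
    return mlp(layer_norm(z_mid), *params["mlp"]) + z_mid                   # eq. (3)

# Toy sizes (assumptions): N = 4 tokens, D = 8, 2 heads, 3 layers.
rng = np.random.default_rng(0)
n, d, heads, n_layers = 4, 8, 2, 3
def init_params():
    return {"attn": [rng.normal(scale=0.1, size=(d, d)) for _ in range(4)],
            "mlp": [rng.normal(scale=0.1, size=(d, 4 * d)),
                    rng.normal(scale=0.1, size=(4 * d, d))]}

z = rng.normal(size=(n, d))
for layer_params in [init_params() for _ in range(n_layers)]:
    z = encoder_layer(z, layer_params, heads)
print(z.shape)  # (4, 8)
```

Because both sub-blocks are residual, the sequence length and embedding dimension are unchanged from layer to layer.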
6. The method according to any one of claims 1-5, further comprising compensating for a loss of information of a hybrid encoder of CNN and Transformer, specifically comprising:
similar to U-Net, skip connections are used to fuse the multi-scale features from the hybrid encoder with the upsampled features; a CNN is used as the feature extractor to generate the feature map, rather than feeding in 1 × 1 patches extracted from the original image, thereby preserving more deep and shallow features to compensate for the information loss.
7. The method according to claim 5, wherein step S4 specifically comprises:
multiple upsampling steps are used to decode the hidden features and output the final segmentation mask, specifically:
the hidden features $z_L \in \mathbb{R}^{\frac{HW}{P^2} \times D}$ are first reshaped to $\frac{H}{P} \times \frac{W}{P} \times D$; a cascaded decoder is then implemented by cascading multiple upsampling blocks to go from a resolution of $\frac{H}{P} \times \frac{W}{P}$ to the full resolution H × W, wherein each upsampling block comprises, in sequence, a 2× upsampling operator, a 3 × 3 convolutional layer and a ReLU layer;
finally, the cascaded decoder and the hybrid encoder together form a U-shaped structure, in which skip connections fuse features with the upsampled feature maps at different resolution levels.
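The cascaded decoder of claim 7 can be sketched as a chain of upsampling blocks, each doubling the resolution. The NumPy code below is a minimal sketch under assumptions: nearest-neighbour upsampling, channel-wise concatenation as the skip-connection fusion, random weights, and the toy sizes are all illustrative and not taken from the source.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of an (H, W, C) map (interpolation is an assumption)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def conv3x3(x, w):
    """Zero-padded 'same' 3x3 convolution; x: (H, W, Cin), w: (3, 3, Cin, Cout)."""
    h, wd, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, wd, w.shape[-1]))
    for i in range(3):
        for j in range(3):
            # Each kernel tap contributes a shifted copy of the input times a (Cin, Cout) matrix.
            out += xp[i:i + h, j:j + wd, :] @ w[i, j]
    return out

def decoder_block(x, w, skip=None):
    """One upsampling block: 2x upsample, optional skip fusion, 3x3 conv, ReLU."""
    x = upsample2x(x)
    if skip is not None:
        # Fuse with a CNN feature map of the same resolution (concatenation assumed).
        x = np.concatenate([x, skip], axis=-1)
    return np.maximum(conv3x3(x, w), 0.0)

# Toy sizes (assumptions): H = W = 8, P = 4, D = 8, so two blocks reach full resolution.
rng = np.random.default_rng(0)
H = W = 8
P, D = 4, 8
z_L = rng.normal(size=((H * W) // (P * P), D))   # hidden features, shape (HW/P^2, D)
fmap = z_L.reshape(H // P, W // P, D)            # reshaped to (H/P, W/P, D)
skip = rng.normal(size=(4, 4, 3))                # hypothetical CNN feature map at 4x4
w1 = rng.normal(scale=0.1, size=(3, 3, D + 3, 4))
w2 = rng.normal(scale=0.1, size=(3, 3, 4, 2))

out = decoder_block(fmap, w1, skip=skip)         # (4, 4, 4)
out = decoder_block(out, w2)                     # (8, 8, 2)
print(out.shape)  # (8, 8, 2)
```

With a patch size of P, log2(P) such blocks are needed to return from H/P × W/P to H × W.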
8. A pulmonary nodule image detection system based on CT images, characterized in that a combined Transformer and U-Net framework is adopted, the system specifically comprising:
an image serialization module for reshaping slices of the input lung CT image into a sequence of patches to perform tokenization;
a patch embedding module for mapping the vectorized patch sequence to a latent two-dimensional embedding space using a trainable linear mapping;
a hybrid CNN-Transformer encoder module for encoding the tokenized image patches from the CNN feature map into an input sequence from which the Transformer extracts global context;
and a cascaded decoder module for first upsampling, by a decoder, the encoded features obtained from the hybrid CNN-Transformer encoder module, then combining the upsampled features with high-resolution CNN feature maps to achieve precise localization, and finally using U-Net to recover local spatial information and thereby enhance fine-detail detection.
CN202111030746.3A 2021-09-03 2021-09-03 Pulmonary nodule image detection method and system based on CT image Pending CN113888466A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111030746.3A CN113888466A (en) 2021-09-03 2021-09-03 Pulmonary nodule image detection method and system based on CT image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111030746.3A CN113888466A (en) 2021-09-03 2021-09-03 Pulmonary nodule image detection method and system based on CT image

Publications (1)

Publication Number Publication Date
CN113888466A true CN113888466A (en) 2022-01-04

Family

ID=79012272

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111030746.3A Pending CN113888466A (en) 2021-09-03 2021-09-03 Pulmonary nodule image detection method and system based on CT image

Country Status (1)

Country Link
CN (1) CN113888466A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114638842A (en) * 2022-03-15 2022-06-17 桂林电子科技大学 Medical image segmentation method based on MLP
CN114638842B (en) * 2022-03-15 2024-03-22 桂林电子科技大学 Medical image segmentation method based on MLP
CN114757942A (en) * 2022-05-27 2022-07-15 南通大学 Method for detecting active tuberculosis by multilayer spiral CT (computed tomography) based on deep learning
WO2024000161A1 (en) * 2022-06-28 2024-01-04 中国科学院深圳先进技术研究院 Ct pancreatic tumor automatic segmentation method and system, terminal and storage medium
CN115713661A (en) * 2022-11-29 2023-02-24 湘南学院 Spinal column lateral bending Lenke parting system
CN115713661B (en) * 2022-11-29 2023-06-23 湘南学院 Scoliosis Lenke parting system
CN116779170A (en) * 2023-08-24 2023-09-19 济南市人民医院 Pulmonary function attenuation prediction system and device based on self-adaptive deep learning
CN117636064A (en) * 2023-12-21 2024-03-01 浙江大学 Intelligent neuroblastoma classification system based on pathological sections of children
CN117636064B (en) * 2023-12-21 2024-05-28 浙江大学 Intelligent neuroblastoma classification system based on pathological sections of children

Similar Documents

Publication Publication Date Title
CN113888466A (en) Pulmonary nodule image detection method and system based on CT image
Chen et al. Recent advances and clinical applications of deep learning in medical image analysis
CN113870258B (en) Counterwork learning-based label-free pancreas image automatic segmentation system
WO2020108362A1 (en) Body posture detection method, apparatus and device, and storage medium
CN109903292A (en) A kind of three-dimensional image segmentation method and system based on full convolutional neural networks
CN113012172B (en) AS-UNet-based medical image segmentation method and system
CN116309650B (en) Medical image segmentation method and system based on double-branch embedded attention mechanism
CN112734755A (en) Lung lobe segmentation method based on 3D full convolution neural network and multitask learning
CN114494296A (en) Brain glioma segmentation method and system based on fusion of Unet and Transformer
CN116309648A (en) Medical image segmentation model construction method based on multi-attention fusion
Azad et al. Enhancing medical image segmentation with TransCeption: a multi-scale feature fusion approach
CN115471470A (en) Esophageal cancer CT image segmentation method
CN115861616A (en) Semantic segmentation system for medical image sequence
CN117132595B (en) Intelligent light-weight processing method and system for DWI (discrete wavelet transform) images of rectal cancer and cervical cancer
CN113205094A (en) Tumor image segmentation method and system based on ORSU-Net
CN114972378A (en) Brain tumor MRI image segmentation method based on mask attention mechanism
CN116596949A (en) Medical image segmentation method based on conditional diffusion model
CN117455906B (en) Digital pathological pancreatic cancer nerve segmentation method based on multi-scale cross fusion and boundary guidance
CN115526829A (en) Honeycomb lung focus segmentation method and network based on ViT and context feature fusion
Zheng et al. Self-supervised monocular depth estimation based on combining convolution and multilayer perceptron
Wu et al. Continuous Refinement-based Digital Pathology Image Assistance Scheme in Medical Decision-Making Systems
Wen et al. Short‐term and long‐term memory self‐attention network for segmentation of tumours in 3D medical images
CN115375712B (en) Lung lesion segmentation method for realizing practicality based on bilateral learning branch
CN116468887A (en) Method for segmenting colon polyp with universality
CN116342877A (en) Semantic segmentation method based on improved ASPP and fusion module in complex scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination