CN115984578A - Tandem fusion DenseNet and Transformer skin image feature extraction method - Google Patents

Info

Publication number
CN115984578A
Authority
CN
China
Prior art keywords
transformer
densenet
layer
feature extraction
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211570369.7A
Other languages
Chinese (zh)
Inventor
白雪梅
王帅
张晨洁
史新瑞
赵荟圆
侯聪聪
王澳
师宏锦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changchun University of Science and Technology
Original Assignee
Changchun University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changchun University of Science and Technology filed Critical Changchun University of Science and Technology
Priority to CN202211570369.7A
Publication of CN115984578A
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a skin image feature extraction method that fuses DenseNet and a Transformer in series, and belongs to the field of deep learning image classification. An input picture is preprocessed, converted into a tensor and sent to the DenseNet part for feature extraction to obtain local features of the face; the feature map produced by DenseNet is sent into the Transformer to obtain global features of the face; the global features and the local features are fused to obtain fusion features, and skin image identification is carried out with these fusion features; the feature map output by the Transformer passes through a Layer Normalization layer, an average pooling layer and a fully connected layer, and finally the predicted category and the disease probability are output. The invention makes full use of the skin information contained in the global and local features, thereby improving the accuracy of skin diagnosis and reliably judging the type and probability of skin disease.

Description

Tandem fusion DenseNet and Transformer skin image feature extraction method
Technical Field
The invention relates to the field of deep learning image classification, and in particular to a method for extracting skin disease image features more fully by fusing DenseNet and a Transformer in series.
Background
Skin diseases are common and frequently occurring diseases in medicine, and skin detection technology is receiving more and more attention. Traditional manual diagnosis has a certain subjectivity and cannot meet the detection requirements of complex and varied skin diseases. In recent years, deep learning techniques have been applied in an increasing number of fields, and in many tasks the features obtained by deep learning have proved to be more representative than features constructed by traditional methods.
Research on deep learning has become an application trend. The Convolutional Neural Network (CNN) model has long been the mainstream model in the CV field with the best application prospects; it has gradually become the most widely used model in machine learning and computer vision and has achieved good results. The convolution operation in DenseNet is good at extracting local features but lacks the ability to capture global characteristics; to perceive the global information of an image, convolution layers must be stacked and pooling operations used to enlarge the receptive field. The Transformer, by contrast, has a global and dynamic receptive field, which has broken the monopoly of the CNN in visual representation and achieved better results on image recognition tasks. Feature extraction with deep networks is widely applied to images, speech, video and other domains.
One difficulty facing the field of medical image analysis, including skin diagnosis, is the insufficient amount of high-quality medical image data. When the amount of data is limited, the information contained in each image needs to be extracted more fully. For the auxiliary diagnosis of skin disease images, fully fusing the CNN algorithm with the Transformer improves image processing performance and diagnostic accuracy.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a method that fuses DenseNet and a Transformer in series to extract depth features from the eight skin disease classes of the ISIC2019 data set: melanoma, melanocytic nevus, basal cell carcinoma, actinic keratosis, benign keratosis, dermatofibroma, hemangioma and squamous cell carcinoma. DenseNet is good at extracting the local features of an image, the Transformer structure is good at extracting its global features, and a skin image requires attention to local features of the lesion area such as its edges and texture. Key information is therefore first extracted with DenseNet and then analysed globally with the Transformer, so that lesion features can be extracted from the skin image more effectively and the accuracy of auxiliary diagnosis is improved.
In order to achieve this purpose, the invention is realized by adopting the following technical scheme:
Step one, downloading the open-source data set ISIC2019 and compressing all pictures to 448 × 448;
Step two, retaining the first convolution layer, the pooling layer, the first Transition Layer and the first two Dense Block layers of the DenseNet as a local feature extraction module; converting the input picture into a tensor and sending the tensor to the feature extraction module for local feature extraction;
Step three, passing the feature map output by the DenseNet local feature extraction module through a convolution layer with a kernel size of 1 × 1 and 96 convolution kernels to reduce its number of channels, so that it matches the number of input channels required by the first Stage of the Transformer;
Step four, sending the dimension-reduced feature map into the Transformer algorithm for further feature extraction, wherein Swin Transformer-Tiny is selected as the Transformer; the algorithm is divided into 4 Stages, and the number of Swin Transformer Blocks in each Stage is 2, 2, 6 and 2 respectively;
Step five, passing the feature vector extracted by the Transformer through a Layer Normalization layer, a pooling layer and a fully connected layer to output the prediction result of the image classification.
Drawings
FIG. 1 is a flow chart of the use of the algorithm of the present invention.
Fig. 2 is a block diagram of the algorithm proposed by the present invention.
FIG. 3 is an internal structure diagram of the DenseNet Block of the present invention.
FIG. 4 is an internal structure diagram of the Swin Transformer Blocks in the present invention.
Detailed Description
For a further understanding of the invention, reference is made to the following detailed description taken in conjunction with the accompanying drawings. The specific application of the invention is realized through the following steps:
Step one, downloading the open-source data set ISIC2019 of the International Skin Imaging Collaboration, wherein the data set comprises 25331 pictures covering eight skin disease types: melanoma, melanocytic nevus, basal cell carcinoma, actinic keratosis, benign keratosis, dermatofibroma, hemangioma and squamous cell carcinoma. The pictures of each category in the data set are divided into a training set and a testing set in a ratio of 8:2. In order to alleviate the problem of data imbalance and avoid overfitting during training, data enhancement is applied to the skin disease pictures, specifically geometric transformations such as rotation and translation. Since the improved algorithm requires an input picture size of 448 × 448 × 3, the data set is downscaled so that the resolution of each picture becomes 448 × 448 × 3.
Step two, for the detection of skin diseases, more attention needs to be paid to the local characteristics of the skin lesion, such as the edge shape of the lesion and the texture inside the lesion area, while the characteristics of the skin surface outside the lesion area are not considered. The DenseNet algorithm is mainly responsible for extracting these local features. The complete DenseNet is not required as the front end of the Swin Transformer; only the first convolution layer, the pooling layer, the first two Dense Blocks and the first Transition Layer of the DenseNet are needed to form the local feature extraction module. This module extracts a large number of image features and produces a feature map of the image. The core of the convolution layer, whose function is to perform feature extraction on the data, generally consists of multiple convolution kernels. Each convolution kernel is connected to a local area of the previous layer's feature map; this local area is the receptive field of the convolution kernel on the previous layer, and the kernel produces a new feature map through the convolution operation. The computation of a feature map is generally divided into two steps: first, a convolution operation is carried out on the previous layer's data with a convolution kernel, and then a nonlinear function is applied to each result. The typical form of a convolution layer is:
x_j^(l) = f( Σ_{i=1}^{k} x_i^(l-1) * w_ij^(l) + b_j^(l) )    (1)
where x_j^(l) is the j-th feature vector output by layer l, f is the excitation function, k is the number of feature maps of layer l-1, x_i^(l-1) is the i-th feature map output by layer l-1, * is the convolution operation, w_ij^(l) is the weight matrix of the j-th convolution kernel of layer l, and b_j^(l) is the bias of the j-th convolution kernel of layer l.
First, the 448 × 448 × 3 image is passed through the first convolution layer of the DenseNet, which has a kernel size of 7 × 7, 64 convolution kernels and a stride of 2, producing a 224 × 224 × 64 feature map. This output is passed through a max pooling layer of size 3 × 3 with stride 2, giving a 112 × 112 × 64 feature map. It then enters the first Dense Block module, which contains the optional Bottleneck layers; the input of each layer is the concatenation of the outputs of all previous layers along the channel dimension. The first Dense Block contains six groups of 1 × 1 and 3 × 3 convolution layers. The 112 × 112 × 64 feature map entering the first Dense Block passes through the Batch Normalization layer and the ReLU layer of the block, which leave its dimensions unchanged at 112 × 112 × 64. It then enters the optional Bottleneck layer, which uses 1 × 1 convolution kernels to counteract the excessive depth caused by concatenating feature maps along the channel dimension, and outputs 112 × 112 × 128. After a convolution layer with a kernel size of 3 × 3 and 32 convolution kernels, the output after the first Dense Block is 112 × 112 × 32.
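A minimal sketch of a single layer inside a Dense Block as described above (Batch Normalization, ReLU, 1 × 1 Bottleneck with 128 kernels, Batch Normalization, ReLU, 3 × 3 convolution with 32 kernels), together with the channel concatenation that makes each layer's input the outputs of all previous layers. The class name and the example input are illustrative, not taken from the patent.

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    def __init__(self, in_channels, growth_rate=32, bottleneck_width=128):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_channels), nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, bottleneck_width, kernel_size=1, bias=False),   # Bottleneck
            nn.BatchNorm2d(bottleneck_width), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck_width, growth_rate, kernel_size=3, padding=1, bias=False),
        )

    def forward(self, x):
        # Concatenate the new feature maps with the input on the channel axis, so the next
        # layer's input is the concatenation of the outputs of all previous layers.
        return torch.cat([x, self.body(x)], dim=1)

x = torch.randn(1, 64, 112, 112)        # feature map after the first convolution and max pooling
print(DenseLayer(64)(x).shape)          # torch.Size([1, 96, 112, 112]): 64 + growth rate 32
```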
The feature map then passes through the Transition Block module. This module sits between two Dense Blocks, serves as the connection between them and consists of a convolution layer and a pooling layer. Its input is the 112 × 112 × 32 feature map output by the previous Dense Block. The convolution in the Transition Block is an optional layer that uses multiple convolution kernels of size 1 × 1; the number of parameters can be reduced by compressing the channels with a preset compression coefficient θ (between 0 and 1), giving an output of 112 × 112 × (32 × θ). The feature map then passes through the second Dense Block, which contains twelve groups of 1 × 1 and 3 × 3 convolutions, and a 56 × 56 × 32 feature map is obtained. The feature tensor obtained after DenseNet is calculated as follows:
M = D2(T(D1(P3(Conv7(Z)))))    (2)
where Z is the input picture tensor, Conv7 is the first 7 × 7 convolution layer, P3 is the 3 × 3 pooling layer, D1 is the first Dense Block layer, T is the Transition Layer, and D2 is the second Dense Block layer.
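The Transition Layer T in equation (2) can be sketched as follows: a 1 × 1 convolution compresses the number of channels by the coefficient θ and a 2 × 2 average pooling halves the height and width. This is an assumed, standard DenseNet-style implementation; the function name and the choice θ = 0.5 are illustrative.

```python
import torch
import torch.nn as nn

def transition_layer(in_channels, theta=0.5):
    out_channels = int(in_channels * theta)            # compression coefficient theta in (0, 1)
    return nn.Sequential(
        nn.BatchNorm2d(in_channels), nn.ReLU(inplace=True),
        nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),  # 1 x 1 compression
        nn.AvgPool2d(kernel_size=2, stride=2),                            # halves height and width
    )

x = torch.randn(1, 32, 112, 112)       # channel count taken from the description above
print(transition_layer(32)(x).shape)   # torch.Size([1, 16, 56, 56]) with theta = 0.5
```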
And step three, the feature map obtained from the local feature extraction module has a height and width of 56, but its depth does not meet the requirement of the Transformer. It is therefore passed through a convolution layer with a kernel size of 1 × 1 and 96 convolution kernels, so that the number of channels of the feature map becomes 96. To counteract the change in the data distribution of the intermediate layers during training, thereby preventing vanishing or exploding gradients and speeding up training, the feature map then passes through a Batch Normalization layer. Finally, the feature vector is sent into the Transformer for further feature extraction.
The feature vector fed into the Transformer is:
C = BN(Conv1(M))    (3)
where M is the feature tensor obtained after DenseNet, Conv1 is the convolution layer containing 96 convolution kernels of size 1 × 1, and BN is the Batch Normalization layer.
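Equation (3) corresponds to a small adapter module, sketched below under the assumption that the DenseNet prefix outputs 512 channels (the value for a DenseNet-121 truncated after its second Dense Block; the patent's own channel count may differ).

```python
import torch
import torch.nn as nn

in_channels = 512    # depth of M; depends on the DenseNet prefix actually used
adapter = nn.Sequential(
    nn.Conv2d(in_channels, 96, kernel_size=1),   # Conv1: 96 kernels of size 1 x 1
    nn.BatchNorm2d(96),                          # BN
)

M = torch.randn(1, in_channels, 56, 56)
C = adapter(M)
print(C.shape)                                   # torch.Size([1, 96, 56, 56])
```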
And step four, further feature extraction in the Transformer. The Swin Transformer algorithm is selected as the Transformer. The algorithm is divided into 4 Stages, and the number of Transformer Blocks in each Stage is 2, 2, 6 and 2 respectively. The depths of the input vectors of the four Stages are 96, 192, 384 and 768 respectively. The first Stage passes the 56 × 56 × 96 feature map through its Transformer Blocks, which contain paired W-MSA and SW-MSA modules performing window self-attention so that the weights focus more on the skin lesion. The Blocks do not change the dimensions of the feature map; after Patch Merging, the height and width of the feature map are halved and its depth is doubled, giving a feature map of size 28 × 28 × 192. After Stage 3 and Stage 4 of the Transformer, the final feature map has a size of 7 × 7 × 768.
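The Patch Merging operation that produces the 56 → 28 → 14 → 7 and 96 → 192 → 384 → 768 progression described above can be sketched as follows (an assumed implementation following the Swin Transformer design, not code from the patent): 2 × 2 neighbouring positions are concatenated on the channel axis and linearly projected from 4C to 2C channels.

```python
import torch
import torch.nn as nn

class PatchMerging(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(4 * dim)
        self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)

    def forward(self, x):                       # x: (B, H, W, C)
        x0 = x[:, 0::2, 0::2, :]                # the four 2 x 2 neighbours
        x1 = x[:, 1::2, 0::2, :]
        x2 = x[:, 0::2, 1::2, :]
        x3 = x[:, 1::2, 1::2, :]
        x = torch.cat([x0, x1, x2, x3], dim=-1)  # (B, H/2, W/2, 4C)
        return self.reduction(self.norm(x))      # (B, H/2, W/2, 2C)

x = torch.randn(1, 56, 56, 96)                   # output of the first Swin Stage
print(PatchMerging(96)(x).shape)                 # torch.Size([1, 28, 28, 192])
```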
And step five, for the classification task, the feature map output by the Transformer further passes through a Layer Normalization layer, an average pooling layer and a fully connected layer, and finally the predicted category is output. The fully connected layer is similar to a conventional neural network in that each neuron is connected to all neurons of the previous layer, so the fully connected layer contains the global information of the data. Each neuron of the fully connected layer is connected to a Softmax function, which is usually used in the output layer of classification problems and expresses the prediction result in the form of a probability. Its formula is:
S_m = e^(Z_m) / Σ_{c=1}^{C} e^(Z_c)    (4)
where S_m is the probability obtained by converting the output value Z_m of the m-th neuron through the Softmax function, C is the number of neurons, and Z_c is the output value of the c-th neuron.
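A short sketch of the step-five head and of equation (4), assuming the 7 × 7 × 768 Transformer output has been flattened into 49 tokens; the layer sizes follow the description above and the Softmax line reproduces equation (4).

```python
import torch
import torch.nn as nn

norm = nn.LayerNorm(768)
fc = nn.Linear(768, 8)                     # 8 skin disease classes

feat = torch.randn(1, 7 * 7, 768)          # Transformer output as 49 tokens of depth 768
z = fc(norm(feat).mean(dim=1))             # LayerNorm, average pooling over tokens, then FC layer
probs = torch.softmax(z, dim=-1)           # S_m = exp(Z_m) / sum_c exp(Z_c)
print(probs.sum())                         # approximately 1: the outputs form a probability distribution
```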

Claims (6)

1. A tandem fusion DenseNet and Transformer skin image feature extraction method, characterized by comprising the following steps:
step one, downloading the open-source data set ISIC2019, and compressing all pictures to 448 × 448;
step two, retaining the first convolution layer, the pooling layer, the first Transition Layer and the first two Dense Block layers of the DenseNet as a local feature extraction module, converting the input picture into a tensor, and sending the tensor to the feature extraction module for local feature extraction;
step three, passing the feature map output by the DenseNet local feature extraction module through a convolution layer with a kernel size of 1 × 1 and 96 convolution kernels to reduce its number of channels, so that it matches the number of input channels required by the first Stage of the Transformer;
step four, sending the dimension-reduced feature map into the Transformer algorithm for further feature extraction, wherein Swin Transformer-Tiny is selected as the Transformer, the algorithm is divided into 4 Stages, and the number of Blocks in each Stage is 2, 2, 6 and 2 respectively;
step five, passing the feature vector extracted by the Transformer through a Layer Normalization (LN) layer, a pooling layer and a fully connected layer to output the prediction result of the image classification.
2. The method for extracting skin image features through tandem fusion of DenseNet and Transformer as claimed in claim 1, wherein: first, the input picture is converted into a tensor and sent to the DenseNet part for feature extraction, and the DenseNet part is mainly responsible for extracting local features.
3. The method for extracting skin image features by fusing DenseNet and Transformer in series according to claim 1, wherein: the feature map obtained after DenseNet has a height and width of 56, but its depth does not meet the requirement of the Transformer; therefore, the feature map is passed through a convolution layer with a kernel size of 1 × 1 and 96 convolution kernels so that its number of channels becomes 96, and the feature vector is then sent into the Transformer for further feature extraction.
4. The method for extracting skin image features by fusing DenseNet and Transformer in series according to claim 1, wherein: the feature tensor sent into the Transformer after DenseNet is M = δ(F1(O)), where F1 is the 1 × 1 convolution transformation, O is the feature map output by the CNN end, and δ is an activation function.
5. The method for extracting skin image features by fusing DenseNet and Transformer in series according to claim 1, wherein: for the classification task, the feature map output by the Transformer passes through a Layer Normalization layer, an average pooling layer and a fully connected layer, and finally the prediction category is output.
6. The method for extracting skin image features by fusing DenseNet and Transformer in series according to claim 1, wherein: during prediction, the input picture is processed to remove the non-face parts; the processed picture is uniformly cut to a size of 448 × 448, converted into tensors and sent into the network model in turn for prediction; finally, the type and the disease probability of the skin disease contained in the picture are obtained.
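A hedged sketch of the prediction procedure in this claim: the face-cropped picture is brought to 448 × 448 (a resize is used here where the claim speaks of uniformly cutting the picture), converted into a tensor and sent through the trained network, and the predicted skin disease type is reported together with its probability. The model object, the file path and the class-name list are assumptions for illustration.

```python
import torch
import torchvision.transforms as T
from PIL import Image

classes = ["melanoma", "melanocytic nevus", "basal cell carcinoma", "actinic keratosis",
           "benign keratosis", "dermatofibroma", "hemangioma", "squamous cell carcinoma"]
preprocess = T.Compose([T.Resize((448, 448)), T.ToTensor()])

def predict(model, image_path):
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(x), dim=-1)[0]
    idx = int(probs.argmax())
    return classes[idx], float(probs[idx])   # skin disease type and its disease probability
```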
CN202211570369.7A 2022-12-12 2022-12-12 Tandem fusion DenseNet and Transformer skin image feature extraction method Pending CN115984578A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211570369.7A CN115984578A (en) 2022-12-12 2022-12-12 Tandem fusion DenseNet and Transformer skin image feature extraction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211570369.7A CN115984578A (en) 2022-12-12 2022-12-12 Tandem fusion DenseNet and Transformer skin image feature extraction method

Publications (1)

Publication Number Publication Date
CN115984578A true CN115984578A (en) 2023-04-18

Family

ID=85965665

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211570369.7A Pending CN115984578A (en) 2022-12-12 2022-12-12 Tandem fusion DenseNet and Transformer skin image feature extraction method

Country Status (1)

Country Link
CN (1) CN115984578A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117636064A (en) * 2023-12-21 2024-03-01 浙江大学 Intelligent neuroblastoma classification system based on pathological sections of children
CN117636064B (en) * 2023-12-21 2024-05-28 浙江大学 Intelligent neuroblastoma classification system based on pathological sections of children

Similar Documents

Publication Publication Date Title
CN108491849B (en) Hyperspectral image classification method based on three-dimensional dense connection convolution neural network
CN111191660B (en) Colon cancer pathology image classification method based on multi-channel collaborative capsule network
CN112819910B (en) Hyperspectral image reconstruction method based on double-ghost attention machine mechanism network
CN112070158B (en) Facial flaw detection method based on convolutional neural network and bilateral filtering
CN113034505B (en) Glandular cell image segmentation method and glandular cell image segmentation device based on edge perception network
CN111738363B (en) Alzheimer disease classification method based on improved 3D CNN network
CN109190511B (en) Hyperspectral classification method based on local and structural constraint low-rank representation
CN113112416B (en) Semantic-guided face image restoration method
Karthiga et al. Transfer learning based breast cancer classification using one-hot encoding technique
WO2024040828A1 (en) Method and device for fusion and classification of remote sensing hyperspectral image and laser radar image
CN115984578A (en) Tandem fusion DenseNet and Transformer skin image feature extraction method
CN116310329A (en) Skin lesion image segmentation method based on lightweight multi-scale UNet
Qian et al. Classification of rice seed variety using point cloud data combined with deep learning
Yang et al. An image super-resolution deep learning network based on multi-level feature extraction module
CN117576467B (en) Crop disease image identification method integrating frequency domain and spatial domain information
CN110991554A (en) Improved PCA (principal component analysis) -based deep network image classification method
Patil et al. Expression invariant face recognition using semidecimated DWT, Patch-LDSMT, feature and score level fusion
Li et al. The effectiveness of image augmentation in breast cancer type classification using deep learning
CN117315481A (en) Hyperspectral image classification method based on spectrum-space self-attention and transducer network
CN116823868A (en) Melanin tumor image segmentation method
CN115100509B (en) Image identification method and system based on multi-branch block-level attention enhancement network
CN113837263B (en) Gesture image classification method based on feature fusion attention module and feature selection
CN113269684B (en) Hyperspectral image restoration method based on single RGB image and unsupervised learning
CN112991194A (en) Infrared thermal wave image deblurring method based on depth residual error network
Zeng et al. Aircraft segmentation from remote sensing image by transferring natual image trained forground extraction CNN model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination