CN116310335A - Method for segmenting pterygium focus area based on Vision Transformer - Google Patents

Method for segmenting pterygium focus area based on Vision Transformer

Info

Publication number
CN116310335A
CN116310335A (application CN202310254245.6A)
Authority
CN
China
Prior art keywords
image
feature map
image feature
pterygium
multiplied
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310254245.6A
Other languages
Chinese (zh)
Inventor
朱绍军
方新闻
郑博
吴茂念
杨卫华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huzhou University
Original Assignee
Huzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huzhou University filed Critical Huzhou University
Priority to CN202310254245.6A priority Critical patent/CN116310335A/en
Publication of CN116310335A publication Critical patent/CN116310335A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for segmenting a pterygium focus area based on Vision Transformer, belongs to the technical field of image processing and application, and aims to solve the problem of inaccurate localization and segmentation of the pterygium focus area in the prior art. The method comprises the following steps: anterior ocular segment images provided by a cooperating hospital are selected as original samples, divided into a training set, a validation set and a test set, and subjected to a series of preprocessing operations; a semantic segmentation model that fuses Vision Transformer, a convolutional network and a pyramid pooling module is provided to perform semantic segmentation of the pterygium focus area in the anterior segment image. The Vision Transformer-based pterygium focus area segmentation method provided by the invention can extract more target information, so that the pterygium in the anterior segment image can be segmented efficiently and accurately.

Description

Method for segmenting pterygium focus area based on Vision Transformer
Technical Field
The invention belongs to the technical field of medical image processing and application, and particularly relates to a segmentation method of pterygium focus areas based on deep learning.
Background
The prevalence of pterygium is about 12% worldwide. Before 2015, researchers mostly achieved segmentation of target objects through traditional machine learning. Traditional segmentation methods include thresholding, region growing, edge detection and the like, but the segmentation accuracy and efficiency of traditional machine learning on medical images are difficult to meet practical application requirements.
In recent years, many studies have realized classification diagnosis of diseases through convolution techniques in deep learning and achieved an accuracy of about 95%. However, a classification result alone cannot provide accurate localization of the lesion area for the surgical treatment of pterygium. At present, convolution techniques are widely applied in the field of medical image segmentation, and their segmentation accuracy is superior to that of traditional machine learning. Although target information can be extracted through deep convolution, much edge detail information is lost in the convolution process, so the edge segmentation effect is not ideal.
Disclosure of Invention
The invention aims to: the invention provides a method for segmenting a pterygium focus area based on Vision Transformer, aiming at the problems of scarce pterygium data, low segmentation accuracy, difficult boundary segmentation and the like. The method uses Vision Transformer as the main branch and a convolutional neural network as the auxiliary branch, fuses an attention mechanism, trains the model with an expert-labeled pterygium focus region data set, takes extracting the complete information of the pterygium region as the goal, and provides a new segmentation method according to the structural characteristics of the model network and the requirements of medical image segmentation tasks, so as to realize accurate segmentation of the pterygium.
The technical scheme is as follows:
1. the segmentation method based on Vision Transformer pterygium focus area comprises an acquisition module, a semantic segmentation network module and a training module, and the segmentation processing is carried out on the diseased anterior segment image by utilizing the data acquisition module, the semantic segmentation network module and the training module, and is characterized by comprising the following steps:
(1) The anterior segment images of diseased eyes form a group of pterygium segmentation data sets which are used as the original data samples, and the data acquisition module performs a preprocessing operation on the images in the original data samples to ensure that the width and the height of the images are the same, so as to form a group of training set images;
(2) The training set images are segmented by the semantic segmentation network module, which comprises a Vision Transformer network and a convolutional network; the Vision Transformer network processes the training set images with an image blocking method and obtains the association relationships among image blocks by stacking multiple layers of the multi-head attention mechanism, so as to obtain an image attention map; the convolutional network obtains an image feature map through multi-layer convolution operations; the image attention map and the image feature map are combined through a matrix addition operation, and a pterygium segmentation map is obtained through a pyramid pooling method;
(3) The segmentation model is trained by the training module: the pterygium segmentation data set is input into the semantic segmentation network module for training, and the model parameters are adjusted during training by setting the learning rate, the loss function and the learning iteration period, so as to finally form the pterygium focus region segmentation model;
the preprocessing operation is as follows:
the method of the invention requires an input image size of M×N×3, where M and N are positive integers, and the original image size is H×W, where H and W are positive integers; first, the image is scaled to M×((N/H)×W), and then gray borders are added evenly on both sides of the shorter side to convert the size to M×N;
the image blocking method comprises the following steps:
an image of size M×N×3 is up-sampled to M′×N′×3 through an up-sampling operation, the image of size M′×N′×3 is input into the Vision Transformer, the input picture is divided into (M′/Patch)×(N′/Patch) image blocks, and a trainable position information parameter of size 1×((M′/Patch)×(N′/Patch))×(3×Patch×Patch) is added to the image block sequence;
the multi-head attention mechanism is as follows:
the image blocks are input into the multi-head attention mechanism as units, the relationships among the image blocks are calculated through matrix operations to generate new image features of size ((M′/Patch)×(N′/Patch))×(3×Patch×Patch), and the multi-head attention mechanism is cycled 12 to 16 times;
the image features generated by the multi-head attention mechanism are transformed to obtain the image attention map, and a convolution module with a convolution kernel size of 3×3 is connected to obtain an image attention map of size 30×30×2048;
the convolution network is as follows:
the parameters obtained by pre-training a ResNet50 model on the public data set ImageNet are used as the initialization parameters of the convolutional network, and the image feature map is extracted through 4 layers of convolution modules with different sizes and structures;
the pyramid pooling method comprises the following steps:
the image attention map and the image feature map are combined through a matrix addition operation to obtain a new image feature map, which is input into the pyramid pooling module, and the channel dimension of the image feature map is converted to 1/4 of its input dimension through a convolution operation; pooling operations are then performed with 4 pooling blocks of different sizes to obtain image feature map a, image feature map b, image feature map c and image feature map d; finally, image feature maps a, b, c and d are up-sampled and stacked with the new image feature map to obtain image feature map (e);
image feature maps a, b, c and d are also input into the stage up-sampling module, where an image feature map is obtained through up-sampling and feature fusion operations and is stacked with image feature map (e) to obtain a brand new image feature map f; finally, a convolution operation is performed on image feature map f to obtain the semantic segmentation image.
2. The segmentation method according to claim 1, wherein
the Loss function comprises a cross entropy Loss function and a Dice Loss function; the cross entropy Loss function and the Dice Loss function are fused as the Loss function of the semantic segmentation network model, and the following objective function is minimized:
Loss=Cross Entropy Loss+Dice Loss
where Cross Entropy Loss denotes the cross entropy Loss function and Dice Loss denotes the Dice Loss function.
3. The segmentation method according to claim 1, wherein
the learning iteration period is 80 epochs; freeze training is adopted in epochs 0-40, normal training is adopted in epochs 40-80, and the learning rate is 1e-5.
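As an illustration of this schedule, the following is a minimal PyTorch-style training-loop sketch; the Adam optimizer, the freezing granularity (the pretrained ResNet50 backbone only) and the names model, train_loader and criterion are assumptions and placeholders, not specifics of the invention.

```python
# Hypothetical sketch of the 80-epoch schedule: freeze training for epochs
# 0-40, normal training for epochs 40-80, learning rate 1e-5. The optimizer
# choice and the "backbone" attribute are assumptions.
import torch

def train(model, train_loader, criterion, epochs=80, freeze_epochs=40, lr=1e-5):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(epochs):
        freeze = epoch < freeze_epochs
        for p in model.backbone.parameters():   # freeze/unfreeze the pretrained backbone
            p.requires_grad = not freeze
        model.train()
        for images, masks in train_loader:
            optimizer.zero_grad()
            logits = model(images)               # (B, num_classes, H, W)
            loss = criterion(logits, masks)      # fused cross entropy + Dice loss
            loss.backward()
            optimizer.step()
```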
The beneficial effects are that:
the pterygium segmentation dataset marked by the expert is used for training, so that the authority of training is ensured.
ResNet50 is used as the feature extraction network and is pre-trained on the public data set ImageNet through a transfer learning method, so that the deep convolutional network ensures that the model can extract sufficiently complete focus region features.
The multi-head attention mechanism in the Vision Transformer can largely preserve the detailed information of the lesion area outline by relating the internal regions of the image to one another.
A stage up-sampling module is added to the pyramid pooling module; while context information is extracted with pooling blocks of different sizes, the stage up-sampling module retains target detail information through feature map fusion.
The segmentation effect on the pterygium is improved by fusing the cross entropy Loss and the Dice Loss as the Loss function of the network model.
Drawings
FIG. 1 is a schematic diagram of a semantic segmentation network structure
FIG. 2 is a schematic diagram of a pyramid pooling module structure
FIG. 3 is a schematic diagram of a Vision Transformer structure
FIG. 4 is a schematic diagram of the stage up-sampling structure
FIG. 5 is a comparison of the data before and after preprocessing
FIG. 6 is a schematic diagram of a segmentation result
Detailed Description
Examples: the method for segmenting the pterygium focus area based on Vision Transformer provided by the invention is used to segment the pterygium focus area, and comprises the following steps:
1. Pterygium dataset
The pterygium segmentation data set contains 517 anterior ocular segment images with pterygium (covering pterygium symptoms of varying severity), of which 367 are used for training and validation and 150 for testing; the focus area of each pterygium is manually labeled by an ophthalmologist.
2. Data preprocessing
The input image size required by the method is 473×473×3. For an original image of size H×W (H > W), the image is first scaled to 473×((473/H)×W)×3, and gray borders are then added evenly on both sides of the shorter side to convert the size to 473×473×3.
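A minimal sketch of this resize-and-pad step, assuming Pillow; the mid-gray value of 128 for the padded borders is an assumption, since the text only says "gray".

```python
# Scale the height to 473 while keeping the aspect ratio, then pad the
# shorter (width) side evenly with gray to reach 473x473x3.
from PIL import Image

def preprocess(path, target=473, gray=128):
    img = Image.open(path).convert("RGB")
    w, h = img.size                              # PIL reports (width, height)
    new_w = max(1, round(w * target / h))        # width becomes (473/H) * W
    img = img.resize((new_w, target), Image.BILINEAR)
    canvas = Image.new("RGB", (target, target), (gray, gray, gray))
    canvas.paste(img, ((target - new_w) // 2, 0))  # center horizontally, gray on both sides
    return canvas                                # 473x473 RGB image
```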
3. Training semantic segmentation networks
In the network training process, data are input into the network in batches. First, an image of size 473×473×3 is up-sampled to 480×480×3 through an up-sampling operation, and the 480×480×3 image is input into the Vision Transformer. The input picture is divided into a 30×30 grid of image blocks by a convolution with kernel size 16×16 and stride 16, and the features are flattened to obtain serialized picture features of size 900×768×1; a trainable position information parameter of size 900×768×1 is added to the sequence as the position encoding of the image blocks. The sequence features are fed into the multi-head attention mechanism, transformed into q, k and v matrices, and the relationships among the image blocks are calculated through matrix operations to generate new image features of size 900×768. The multi-head attention operation is cycled 12 times, and the image features generated by the multi-head attention mechanism are normalized and linearly transformed to obtain an image attention map of size 30×30×768. A convolution module with a convolution kernel size of 3×3 is then applied to obtain an image attention map (A) of size 30×30×2048;
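The sketch below illustrates this branch in PyTorch, following the sizes given above (16×16 stride-16 patch convolution, a 900×768 trainable position parameter, 12 encoder layers, and a final 3×3 convolution to 2048 channels); the number of attention heads and the feed-forward width are assumptions, so this is a sketch rather than the exact network of the invention.

```python
# Sketch of the Vision Transformer branch producing the image attention map (A).
import torch
import torch.nn as nn

class ViTBranch(nn.Module):
    def __init__(self, embed_dim=768, depth=12, heads=12, out_ch=2048):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, embed_dim, kernel_size=16, stride=16)
        self.pos_embed = nn.Parameter(torch.zeros(1, 900, embed_dim))  # 30*30 tokens
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=heads,
                                           dim_feedforward=embed_dim * 4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)  # 12 attention blocks
        self.norm = nn.LayerNorm(embed_dim)
        self.proj = nn.Conv2d(embed_dim, out_ch, kernel_size=3, padding=1)

    def forward(self, x):                        # x: (B, 3, 480, 480)
        x = self.patch_embed(x)                  # (B, 768, 30, 30) patch features
        x = x.flatten(2).transpose(1, 2)         # (B, 900, 768) serialized features
        x = x + self.pos_embed                   # add trainable position information
        x = self.norm(self.encoder(x))           # multi-head attention, then normalization
        x = x.transpose(1, 2).reshape(-1, 768, 30, 30)
        return self.proj(x)                      # image attention map (A): (B, 2048, 30, 30)
```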
Second, the convolutional network adopts ResNet50 with a transfer learning method, and the parameters obtained by training the ResNet50 model on ImageNet are used as the initialization parameters of the convolutional network. An image of size 473×473×3 is input into the ResNet50 network and passes through four layers of different convolution modules, whose convolution blocks are repeated 3, 4, 6 and 3 times respectively, so as to obtain an image feature map (B) of size 30×30×2048;
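A sketch of this convolutional branch using torchvision's ResNet50 with ImageNet weights is shown below; dilating the last stage so that a 473×473 input yields a roughly 30×30×2048 map is an assumption made to match the stated sizes, not a detail given in the text.

```python
# Sketch of the ResNet50 branch producing the image feature map (B).
import torch.nn as nn
import torchvision

class ResNetBranch(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = torchvision.models.resnet50(
            weights=torchvision.models.ResNet50_Weights.IMAGENET1K_V1,
            replace_stride_with_dilation=[False, False, True])  # keep an overall stride of 16
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)
        self.layer1, self.layer2 = backbone.layer1, backbone.layer2
        self.layer3, self.layer4 = backbone.layer3, backbone.layer4

    def forward(self, x):             # x: (B, 3, 473, 473)
        x = self.stem(x)
        x = self.layer1(x)            # 3 bottleneck blocks
        x = self.layer2(x)            # 4 bottleneck blocks
        c3 = self.layer3(x)           # 6 bottleneck blocks (third-layer features, reused later)
        c4 = self.layer4(c3)          # 3 bottleneck blocks -> (B, 2048, ~30, ~30)
        return c3, c4
```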
A new image feature map (C) is obtained by element-wise addition of the image attention map (A) and the image feature map (B), and the new image feature map (C) is input into the pyramid pooling module, where a convolution with kernel_size 1×1 converts the channel dimension of the image feature map (C) to 1/4 of the input dimension; pooling operations are then performed with pooling blocks of sizes 1×1, 2×2, 3×3 and 6×6 to obtain image feature map a, image feature map b, image feature map c and image feature map d; finally, image feature maps a, b, c and d are up-sampled to 30×30×512 and stacked together with feature map (C) by channel concatenation to obtain image feature map (D);
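The following sketch illustrates this pyramid pooling step, assuming average pooling with bin sizes 1, 2, 3 and 6, a 1×1 reduction from 2048 to 512 channels applied before the pooling, and BatchNorm/ReLU after the reduction; these choices are assumptions where the text is ambiguous.

```python
# Sketch of the pyramid pooling module: reduce (C) to 512 channels, pool with
# four bin sizes to get feature maps a-d, up-sample them back to 30x30 and
# channel-stack them with (C) to form feature map (D).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    def __init__(self, in_ch=2048, bins=(1, 2, 3, 6)):
        super().__init__()
        self.reduce = nn.Sequential(nn.Conv2d(in_ch, in_ch // 4, kernel_size=1),
                                    nn.BatchNorm2d(in_ch // 4), nn.ReLU(inplace=True))
        self.pools = nn.ModuleList([nn.AdaptiveAvgPool2d(b) for b in bins])

    def forward(self, fused):                     # fused = (A) + (B): (B, 2048, 30, 30)
        size = fused.shape[-2:]
        reduced = self.reduce(fused)              # (B, 512, 30, 30)
        maps = [pool(reduced) for pool in self.pools]        # feature maps a, b, c, d
        upsampled = [F.interpolate(m, size=size, mode="bilinear",
                                   align_corners=False) for m in maps]
        stacked = torch.cat([fused, *upsampled], dim=1)      # feature map (D)
        return maps, stacked
```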
Image feature maps a, b, c and d are also input into the stage up-sampling module, where they are fused through up-sampling and feature fusion operations to obtain an image feature map of size 30×30×512, which is channel-stacked with image feature map (D) to obtain a brand new feature map (e1);
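The text does not spell out the fusion rule inside the stage up-sampling module; the sketch below reads it as bilinear up-sampling of feature maps a-d to 30×30 followed by element-wise summation, one plausible interpretation rather than the definitive design.

```python
# Hypothetical stage up-sampling: fuse the pooled maps a-d (1x1, 2x2, 3x3, 6x6,
# each with 512 channels) from coarse to fine into a single 30x30x512 map.
import torch.nn.functional as F

def stage_upsample(maps, size=(30, 30)):
    fused = None
    for m in sorted(maps, key=lambda t: t.shape[-1]):   # coarse to fine
        m = F.interpolate(m, size=size, mode="bilinear", align_corners=False)
        fused = m if fused is None else fused + m       # feature fusion by summation
    return fused                                        # (B, 512, 30, 30)
```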
Finally, the operations of the pyramid pooling module and the stage up-sampling module are repeated on the image feature map of the third layer of ResNet50 to obtain an image feature map (e2), which is fused with the brand new feature map (e1) through a feature fusion operation to obtain the pterygium semantic segmentation map;
The loss between the pterygium semantic segmentation map and the ground-truth pterygium segmentation map is then calculated, and the network parameters are updated. The invention fuses the cross entropy Loss function and the Dice Loss function as the Loss function of the semantic segmentation network model and minimizes the following objective function:
Loss=Cross Entropy Loss+Dice Loss
where Cross Entropy Loss denotes the cross entropy Loss function and Dice Loss denotes the Dice Loss function.
Denote the ground-truth label as y = y_truth and the predicted value as y′ = y_pred; the following Cross Entropy Loss objective function is defined:
Cross Entropy Loss = −y·log(y′) − (1−y)·log(1−y′)
the more accurate the pixel classification, the smaller the Loss.
A and B respectively denote the predicted contour area point set and the real contour area point set; the following Dice Loss objective function is defined:
Dice Loss = 1 − 2|A ∩ B| / (|A| + |B|)
the greater the overlapping ratio of the predicted lesion area to the real lesion area, the smaller the Loss.
4. Analysis of processing results
The method uses the following four performance metrics to quantify the processing results: single-class intersection over union (IoU), mean intersection over union (MIoU), single-class pixel accuracy (PA) and mean pixel accuracy (MPA). The calculation formulas are as follows:
IoU_i = |p_i ∩ g_i| / |p_i ∪ g_i|
MIoU = (1/(k+1)) · Σ_{i=0..k} IoU_i
where p_i denotes the predicted (segmented) region of class i and g_i denotes the corresponding ground-truth region. The intersection over union (IoU) is the ratio of the intersection to the union of the prediction and the ground truth; the mean intersection over union (MIoU) is obtained by computing the IoU of each class (including the background class) and averaging over all classes.
PA = Σ_i p_ii / Σ_i Σ_j p_ij
MPA = (1/(k+1)) · Σ_{i=0..k} (p_ii / Σ_j p_ij)
where p_ii denotes the number of pixels of class i that are correctly predicted, and p_ij denotes the number of pixels of class i that are predicted as class j. Pixel accuracy (PA) represents the proportion of correctly labeled pixels among the total pixels; mean pixel accuracy (MPA) first computes the proportion of correctly classified pixels for each class and then averages over all classes.
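A sketch of how these four metrics can be computed from a confusion matrix is shown below; reporting the pterygium-class values as the single-class IoU and PA follows the reading of the text above and is an assumption.

```python
# Metrics from a 2x2 confusion matrix where entry [i, j] counts pixels of
# ground-truth class i predicted as class j (class 0 = background, 1 = pterygium).
import numpy as np

def metrics(conf):
    conf = conf.astype(float)
    tp = np.diag(conf)
    iou = tp / (conf.sum(axis=1) + conf.sum(axis=0) - tp)    # per-class IoU
    acc = tp / conf.sum(axis=1)                              # per-class pixel accuracy
    return {"IoU": iou[1], "MIoU": iou.mean(),               # single-class and mean IoU
            "PA": acc[1], "MPA": acc.mean()}                 # single-class and mean PA

# Building the confusion matrix from flattened label maps with values in {0, 1}:
# conf = np.bincount(2 * gt.ravel() + pred.ravel(), minlength=4).reshape(2, 2)
```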
The results of the invention on the pterygium test set are MIoU: 87.43%, MPA: 92.57%, IoU: 79.44%, PA: 87.16%. Extensive application shows that the Vision Transformer-based pterygium focus area segmentation method provided by the invention has high segmentation performance, which is of great significance in the medical field.
As described above, while the present invention has been shown and described with reference to certain preferred embodiments, this is not to be construed as limiting the invention itself. Various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (3)

1. The Vision Transformer-based pterygium focus area segmentation method comprises a data acquisition module, a semantic segmentation network module and a training module, and performs segmentation processing on the diseased anterior segment image by utilizing the data acquisition module, the semantic segmentation network module and the training module, and is characterized by comprising the following steps:
(1) The anterior segment images of diseased eyes form a group of pterygium segmentation data sets which are used as the original data samples, and the data acquisition module performs a preprocessing operation on the images in the original data samples to ensure that the width and the height of the images are the same, so as to form a group of training set images;
(2) The training set images are segmented by the semantic segmentation network module, which comprises a Vision Transformer network and a convolutional network; the Vision Transformer network processes the training set images with an image blocking method and obtains the association relationships among image blocks by stacking multiple layers of the multi-head attention mechanism, so as to obtain an image attention map; the convolutional network obtains an image feature map through multi-layer convolution operations; the image attention map and the image feature map are combined through a matrix addition operation, and a pterygium segmentation map is obtained through a pyramid pooling method;
(3) The segmentation model is trained by the training module: the pterygium segmentation data set is input into the semantic segmentation network module for training, and the model parameters are adjusted during training by setting the learning rate, the loss function and the learning iteration period, so as to finally form the pterygium focus region segmentation model;
the preprocessing operation is as follows:
the method of the invention requires an input image size of M×N×3, where M and N are positive integers, and the original image size is H×W, where H and W are positive integers; first, the image is scaled to M×((N/H)×W), and then gray borders are added evenly on both sides of the shorter side to convert the size to M×N;
the image blocking method comprises the following steps:
an image of size M×N×3 is up-sampled to M′×N′×3 through an up-sampling operation, the image of size M′×N′×3 is input into the Vision Transformer, the input picture is divided into (M′/Patch)×(N′/Patch) image blocks, and a trainable position information parameter of size 1×((M′/Patch)×(N′/Patch))×(3×Patch×Patch) is added to the image block sequence;
the multi-head attention mechanism is as follows:
the image blocks are input into the multi-head attention mechanism as units, the relationships among the image blocks are calculated through matrix operations to generate new image features of size ((M′/Patch)×(N′/Patch))×(3×Patch×Patch), and the multi-head attention mechanism is cycled 12 to 16 times;
the image features generated by the multi-head attention mechanism are transformed to obtain the image attention map, and a convolution module with a convolution kernel size of 3×3 is connected to obtain an image attention map of size 30×30×2048;
the convolution network is as follows:
the parameters obtained by pre-training a ResNet50 model on the public data set ImageNet are used as the initialization parameters of the convolutional network, and the image feature map is extracted through 4 layers of convolution modules with different sizes and structures;
the pyramid pooling method comprises the following steps:
the image attention map and the image feature map are combined through a matrix addition operation to obtain a new image feature map, which is input into the pyramid pooling module, and the channel dimension of the image feature map is converted to 1/4 of its input dimension through a convolution operation; pooling operations are then performed with 4 pooling blocks of different sizes to obtain image feature map a, image feature map b, image feature map c and image feature map d; finally, image feature maps a, b, c and d are up-sampled and stacked with the new image feature map to obtain image feature map (e);
image feature maps a, b, c and d are also input into the stage up-sampling module, where an image feature map is obtained through up-sampling and feature fusion operations and is stacked with image feature map (e) to obtain a brand new image feature map f; finally, a convolution operation is performed on image feature map f to obtain the semantic segmentation image.
2. The segmentation method according to claim 1, wherein
the Loss function comprises a cross entropy Loss function and a Dice Loss function; the cross entropy Loss function and the Dice Loss function are fused as the Loss function of the semantic segmentation network model, and the following objective function is minimized:
Loss=Cross Entropy Loss+Dice Loss
where Cross Entropy Loss denotes the cross entropy Loss function and Dice Loss denotes the Dice Loss function.
3. The segmentation method according to claim 1, wherein
the learning iteration period is 80 epochs; freeze training is adopted in epochs 0-40, normal training is adopted in epochs 40-80, and the learning rate is 1e-5.
CN202310254245.6A 2023-03-11 2023-03-11 Method for segmenting pterygium focus area based on Vision Transformer Pending CN116310335A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310254245.6A CN116310335A (en) 2023-03-11 2023-03-11 Method for segmenting pterygium focus area based on Vision Transformer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310254245.6A CN116310335A (en) 2023-03-11 2023-03-11 Method for segmenting pterygium focus area based on Vision Transformer

Publications (1)

Publication Number Publication Date
CN116310335A true CN116310335A (en) 2023-06-23

Family

ID=86812807

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310254245.6A Pending CN116310335A (en) 2023-03-11 2023-03-11 Method for segmenting pterygium focus area based on Vision Transformer

Country Status (1)

Country Link
CN (1) CN116310335A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116993756A (en) * 2023-07-05 2023-11-03 石河子大学 Method for dividing verticillium wilt disease spots of field cotton
CN116993756B (en) * 2023-07-05 2024-09-27 石河子大学 Method for dividing verticillium wilt disease spots of field cotton

Similar Documents

Publication Publication Date Title
CN111242288B (en) Multi-scale parallel deep neural network model construction method for lesion image segmentation
CN112308158A (en) Multi-source field self-adaptive model and method based on partial feature alignment
CN108335303B (en) Multi-scale palm skeleton segmentation method applied to palm X-ray film
CN113902761B (en) Knowledge distillation-based unsupervised segmentation method for lung disease focus
WO2023045231A1 (en) Method and apparatus for facial nerve segmentation by decoupling and divide-and-conquer
CN112734764A (en) Unsupervised medical image segmentation method based on countermeasure network
CN113192076B (en) MRI brain tumor image segmentation method combining classification prediction and multi-scale feature extraction
CN104077742B (en) Human face sketch synthetic method and system based on Gabor characteristic
CN111881743B (en) Facial feature point positioning method based on semantic segmentation
CN111008650B (en) Metallographic structure automatic grading method based on deep convolution antagonistic neural network
CN109767459A (en) Novel ocular base map method for registering
CN111724401A (en) Image segmentation method and system based on boundary constraint cascade U-Net
CN112488963A (en) Method for enhancing crop disease data
CN113205509A (en) Blood vessel plaque CT image segmentation method based on position convolution attention network
CN115375711A (en) Image segmentation method of global context attention network based on multi-scale fusion
CN114511502A (en) Gastrointestinal endoscope image polyp detection system based on artificial intelligence, terminal and storage medium
CN114565628B (en) Image segmentation method and system based on boundary perception attention
CN116310335A (en) Method for segmenting pterygium focus area based on Vision Transformer
CN112927237A (en) Honeycomb lung focus segmentation method based on improved SCB-Unet network
CN112597919A (en) Real-time medicine box detection method based on YOLOv3 pruning network and embedded development board
CN115458174A (en) Method for constructing intelligent diagnosis model of diabetic retinopathy
CN118430790A (en) Mammary tumor BI-RADS grading method based on multi-modal-diagram neural network
CN116739949B (en) Blastomere edge enhancement processing method of embryo image
CN116188435B (en) Medical image depth segmentation method based on fuzzy logic
CN114627123B (en) Leucocyte detection method integrating double-current weighting network and spatial attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination