CN117253035A - Single-target medical image segmentation method based on attention under polar coordinates

Single-target medical image segmentation method based on attention under polar coordinates

Info

Publication number
CN117253035A
CN117253035A
Authority
CN
China
Prior art keywords
attention
network
medical image
pole
prediction
Prior art date
Legal status
Pending
Application number
CN202311045884.8A
Other languages
Chinese (zh)
Inventor
胡凯
罗安
张园
高协平
Current Assignee
Xiangtan University
Original Assignee
Xiangtan University
Priority date
Filing date
Publication date
Application filed by Xiangtan University filed Critical Xiangtan University
Priority to CN202311045884.8A
Publication of CN117253035A
Legal status: Pending (current)


Classifications

    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06N 3/045 Combinations of networks
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06T 7/0012 Biomedical image inspection
    • G06T 7/10 Segmentation; Edge detection
    • G06V 10/40 Extraction of image or video features
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30004 Biomedical image processing
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a single-target medical image segmentation method based on attention under polar coordinates, which comprises the following steps: performing data preprocessing on the single-target medical image dataset; extracting features from the preprocessed sample set; applying the extracted feature map to a pole prediction network to obtain the pole, transforming the intermediate feature map into polar coordinates around the predicted pole, and segmenting the transformed feature map with a polar coordinate attention segmentation network; after the whole network model is trained through this process, the preprocessed test sample set is input into the network and the output is transformed back to the Cartesian system to obtain the final segmentation result. The invention creatively provides a single-target medical image segmentation method based on attention under polar coordinates, which fully exploits the prior knowledge that a single-target image is a single connected region and converts the complex, hard-to-learn pixel classification problem into a single-pole prediction and strip-wise attention computation problem.

Description

Single-target medical image segmentation method based on attention under polar coordinates
Technical Field
The invention relates to the technical field of image processing, in particular to a single-target medical image segmentation method based on attention under polar coordinates.
Background
Medical image segmentation is the task of delineating anatomical structures of diagnostic value on medical images, providing assistance for clinical diagnosis and treatment. One common use case of medical segmentation is to identify individual structures having a generally elliptical shape or distribution, such as most organs, skin lesions, polyps, cardiac adipose tissue, and similar structures and abnormalities.
In traditional image segmentation, rule-based methods rely on hand-designed features and struggle with complex image conditions, while classical machine learning methods require the prior knowledge of domain experts and suffer from problems such as unreasonable feature selection. Deep learning methods, which learn image features automatically, have been widely applied to medical image segmentation tasks. However, almost all existing medical image segmentation methods treat segmentation as a pixel-level classification problem, i.e., each pixel in the image is classified as belonging to the target region or the background. This formulation does not match the characteristics of single-target medical image segmentation, in which the target is a single, complete connected region. Classifying each pixel independently breaks this integrity and singleness, so the final segmentation result often contains multiple spurious blob-like mispredictions.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a single-target medical image segmentation method based on attention in polar coordinates.
The technical scheme for solving the technical problems is as follows: a single-target medical image segmentation method based on attention in polar coordinates comprises the following steps:
S1, performing image data preprocessing on an acquired single-target medical image data sample set;
S2, carrying out multi-level feature extraction on the preprocessed sample set obtained in step S1 by using a feature extraction network;
S3, applying the feature map extracted in step S2 to a pole prediction network to obtain the predicted pole, transforming the feature map of step S2 into polar coordinates around the predicted pole, and calculating the area loss under polar coordinates;
S4, segmenting the feature map transformed in step S3 by using a polar coordinate attention segmentation network, and calculating the attention loss under polar coordinates;
S5, after training the whole network model through the above process, inputting the preprocessed image to be segmented into the network and transforming the output back to the Cartesian system to obtain the final segmentation result.
Preferably, the multi-scale feature extraction network in step S2 is a U-shaped feature extraction network, specifically:
in the encoding part of the feature extraction network, multi-scale features are extracted using multi-layer convolution and pooling operations, where each layer's convolution applies 3 depthwise separable convolutions in parallel to obtain the feature query vector Q, key vector K, and value vector V; multi-head self-attention is applied to these three feature expressions to compute global attention; and deformable convolution performs targeted convolution on the RoI to obtain target information;
in the decoder part of the feature extraction network, skip connections and transposed convolutions fuse the multi-scale features extracted by the encoder, yielding a comprehensive expression of the high-level and low-level features of the original image.
Preferably, in step S3 the pole prediction network consists of two hourglass network modules and a prediction module; downsampling is performed by convolutions with stride 2, and the prediction module starts with a modified residual block in which the first convolution layer is replaced by cross pooling; the modified residual block is followed by one convolution layer that generates the pole prediction heat map.
Preferably, the cross pooling in S3 is specifically: two inputs are received; the first input is max-pooled row-wise, the second input is max-pooled column-wise, and the two results are added to obtain the final output;
the Apc penalty function is:wherein s is Electrode The size of the segmented label transformed by the prediction pole is represented, and w×h represents the resolution of the segmented label image.
Preferably, the polar coordinate calculation in step S3 is as follows. For a Cartesian image I(x, y) of resolution H×W with pole (c_x, c_y), the angle φ and distance (magnitude) ρ of each pixel are

ρ = sqrt((x − c_x)² + (y − c_y)²)
φ = atan2(y − c_y, x − c_x)

where atan2 is the 2-parameter arctangent function; this gives the polar coordinate expression (ρ, φ) of each point.
preferentially, the polar at) loss function at polar coordinates in step S4 is:
wherein w represents a preset strip width, h is the strip length of the label, and h' is the predicted strip length.
The beneficial effects of the invention are as follows: based on the characteristics of the single-target medical image segmentation task, the invention creatively constructs an attention segmentation method under polar coordinates. It views single-target medical image segmentation from a brand-new perspective, converting the complex, hard-to-learn pixel classification problem into the prediction of a single pole and the computation of strip-wise attention. This not only simplifies the segmentation problem of the single-target medical image, but also, by construction, preserves the integrity and singleness of the single connected region. Meanwhile, the introduced attention strengthens the connections among the parts of the target region, distinguishing it from other regions and enabling accurate prediction of the target area. With a suitable pole, the polar coordinate transformation also amplifies the information relevant to the target segmentation region while weakening noise, further improving segmentation accuracy.
Drawings
FIG. 1 is a flow chart of the present invention.
FIG. 2 is a single layer encoding schematic diagram of a feature extraction network according to the present invention.
Fig. 3 is a schematic representation of a pole prediction network in accordance with the present invention.
FIG. 4 is a schematic diagram of a prediction module in a pole prediction network according to the present invention.
FIG. 5 is a schematic diagram of cross pooling of prediction modules in the present invention.
FIG. 6 is a diagram of an example of polar coordinate conversion of ISIC2017 in accordance with the present invention.
Fig. 7 is a schematic diagram of the polar attention implementation of the present invention.
Detailed Description
The present invention will now be described in further detail with reference to the drawings and the accompanying examples.
Referring to fig. 1, which is a flowchart of the present invention, the single-target medical image segmentation method based on attention under polar coordinates of the present invention comprises the following steps:
s1: image data preprocessing operations including resolution adjustment and normalization are adopted for the acquired single-target medical image data sample set. The method comprises the following steps:
s1-1: the present example selects as the experimental data set a challenge data set (ISIC 2017) of the international biomedical Imaging Seminar (ISBI) in 2017 sponsored by the international organization for collaborative skin imaging (ISIC).
S1-2: adjusting the resolution of the acquired single-target image data sample set to w×h, where W is the length and width of the image, taking the ISIC2017 data set as an example, where w=h=512;
s1-3: and (3) performing data preprocessing operation on the sample set adjusted in the step (S1). The method comprises the following steps:
s1-4: and (3) carrying out mean variance normalization operation on the original image of each sample, namely subtracting the mean value according to the channel and dividing the mean value by the variance. Specifically, for one dataset x= { X 1 ,x 2 ,...,x n We calculate their mean σ and standard deviation μ, respectively:
then, we use the following formula for each data point x i And (5) carrying out mean variance normalization:
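By way of illustration only, the preprocessing of S1-2 to S1-4 could be sketched as below. This is a minimal sketch, assuming images are loaded as H×W×3 arrays and that OpenCV is used for resizing; the function name preprocess is ours, not the patent's.

```python
import numpy as np
import cv2  # assumed here only for resizing; any resize routine would do

def preprocess(image: np.ndarray, size: int = 512) -> np.ndarray:
    """Resize to size x size, then apply per-channel mean-variance normalization."""
    resized = cv2.resize(image, (size, size)).astype(np.float32)
    mean = resized.mean(axis=(0, 1))        # per-channel mean
    std = resized.std(axis=(0, 1)) + 1e-8   # per-channel std; epsilon guards division
    return (resized - mean) / std
```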
s2: and (2) carrying out multi-level feature extraction on the preprocessed data sample set obtained in the step (S1) by using a feature extraction network, wherein the multi-level feature extraction is specifically as follows:
s2-1: referring to fig. 2, in the coding layer portion of the feature extraction network, multi-scale features are extracted using a multi-layer convolution operation and a pooling operation, wherein the convolution operation of each layer first uses 3 depth separable convolutions in parallel to obtain a feature query vector Q, a key vector K, and a value vector V; then, calculating global attention by using multi-head self-attention to be applied to the three feature expressions; and finally, performing targeted convolution on the RoI by using variable convolution to obtain target information. The process is specifically as follows:
s2-1-1: the depth separable convolution operation is performed on the features extracted from the previous layer using the following formula:
wherein y represents an output feature map, x represents an input feature map, w represents a convolution kernel, R represents a receptive field of the convolution kernel, and p 0 Representing the position on the output profile, p n Represents the position on the convolution kernel, K represents the channel index, and K represents the number of channels.
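As a sketch of how S2-1-1 could be realized (assuming PyTorch and a channels-first layout; the class names are illustrative, not the patent's), three parallel depthwise separable convolutions produce Q, K, and V:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise convolution (one filter per channel) + 1x1 pointwise mixing."""
    def __init__(self, channels: int):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, kernel_size=3,
                                   padding=1, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))

class QKVBranches(nn.Module):
    """Three parallel depthwise separable convolutions yielding Q, K, V."""
    def __init__(self, channels: int):
        super().__init__()
        self.q = DepthwiseSeparableConv(channels)
        self.k = DepthwiseSeparableConv(channels)
        self.v = DepthwiseSeparableConv(channels)

    def forward(self, x: torch.Tensor):
        return self.q(x), self.k(x), self.v(x)
```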
S2-1-2: for learning long-range semantic context information, the feature Q, K, V found in S2-1 is multi-headed self-Attention calculated using the following formula:
MultiHead(Q,K,V)=Concat(head 1 ,...,head h )W o
wherein head is i =Attention(QW i Q ,KW i K ,VW i V )
D in the above formula k =d v =d model The/h represents the dimension of each feature divided by the number of heads in the multi-head attention, softmax represents the normalized exponential function, concat represents the number of attention heads connected together, W o For outputting weight, W i Q ,W i K ,W i V Meaning that each attention header uses different weights to calculate the query, key and value vector.
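A sketch of S2-1-2 under the assumption that each spatial position of the feature maps is treated as one token (class and parameter names are ours; embed_dim must be divisible by num_heads):

```python
import torch
import torch.nn as nn

class SpatialSelfAttention(nn.Module):
    """Multi-head self-attention over the H*W spatial positions of Q, K, V maps."""
    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=channels,
                                          num_heads=num_heads, batch_first=True)

    def forward(self, q, k, v):
        b, c, h, w = q.shape
        seq = lambda t: t.flatten(2).transpose(1, 2)   # (B, H*W, C) token sequence
        out, _ = self.attn(seq(q), seq(k), seq(v))     # global attention over positions
        return out.transpose(1, 2).reshape(b, c, h, w)
```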
S2-1-3: the convolution kernel of the variable convolution is performed on the long Cheng Yuyi information obtained in S2-1-2 by using the following formula to realize bias learning on the region of interest:
wherein y represents an output feature map, x represents an input feature map, w represents a convolution kernel, R represents a receptive field of the convolution kernel, and p 0 Representing the position on the output profile, p n Representing the position on the convolution kernel Δp n Representing the learned offset.
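The deformable sampling of S2-1-3 could be sketched with torchvision's deformable convolution op, with a small convolution predicting the offsets Δp_n (a sketch under the assumption of a 3×3 kernel; names are illustrative):

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableBlock(nn.Module):
    """Predict per-position sampling offsets, then apply deformable convolution."""
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        # Two offsets (dy, dx) for each of the kernel_size**2 sampling points.
        self.offset = nn.Conv2d(channels, 2 * kernel_size * kernel_size,
                                kernel_size, padding=kernel_size // 2)
        self.deform = DeformConv2d(channels, channels, kernel_size,
                                   padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.deform(x, self.offset(x))
```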
S2-2: using a jump join and transpose convolution at the decoder portion of the feature extraction network: and fusing the multi-scale features extracted by the coding layer to obtain comprehensive expression of the high-level and low-level features of the original image. Where transposed convolution is used to implement a learnable upsampling process. The specific implementation process is as follows:
let the input feature map beThe output characteristic diagram is->Convolution kernel->Wherein Kl w =2,Kl h =2; step length (stride) is s h =2 and s w =2;H in ×W in ,H out ×W out Representing the size of the input and output feature maps, respectively, the operation of transpose convolution can be expressed as:
wherein i=0, 1,.. out -1 and j=0, 1,.. out -1. The size of the output feature map can be calculated using the following formula:
H out =(H in -1)·s h +Kl h
W out =(W in -1)·s w +Kl w
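One decoder stage of S2-2 might be sketched as follows (a sketch; fusing the skip feature by concatenation plus a 3×3 convolution is our assumption, the patent only specifies skip connections and transposed convolution). With kernel 2 and stride 2 the output size is (H_in − 1)·2 + 2 = 2·H_in, matching the formulas above:

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """Learnable 2x upsampling via transposed convolution, then skip fusion."""
    def __init__(self, in_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.fuse = nn.Conv2d(out_ch + skip_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        x = self.up(x)                      # (H, W) -> (2H, 2W)
        x = torch.cat([x, skip], dim=1)     # skip connection from the encoder
        return torch.relu(self.fuse(x))
```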
s3: the pole prediction network is used for acting on the final characteristic output of the step S2 to obtain a prediction pole, the middle characteristic diagram of the step S2 is converted into a polar coordinate system by combining the prediction pole, and Apc loss is calculated, wherein the method specifically comprises the following steps:
s3-1: referring to fig. 3, the pole prediction network consists of two modified hourglass network modules and one prediction module. In the hourglass network module, the improvement is that the present invention uses a convolution with a step of 2 for downsampling. For the whole pole prediction network, after the input image is applied to the hourglass network module for multi-level joint feature extraction, the invention only uses the output features of the last layer as the input of the prediction module to predict the pole.
S3-2: referring to fig. 4, the prediction module starts with a modified residual block, where the first convolutional layer is replaced with cross-pooling. The modified residual block is followed by a convolutional layer. For generating pole prediction heat maps. The cross pooling implementation principle is shown in fig. 4 and 5, and the cross pooling implementation principle accepts two input feature graphs X 1
And X 2 Wherein X is 1 、X 2 The output characteristic diagram of the hourglass module is obtained by two parallel convolutions, and X 1 、X 2 The sizes are w×h. The process of cross-pooling can be described as:
Y=Y 1 +Y 2
wherein Y is 1 Is to X 1 Maximum pooling in units of rows, Y 2 Is to X 2 Maximum pooling in columns is performed, and Y isAnd adding the two to obtain a final output.
S3-3: after obtaining the predicted pole, the polar transformation is performed on the split label graph in combination with this pole, and for a specific transformation example, see fig. 6, apc loss is calculated using the following formula:
wherein s is Electrode The size of the segmented label transformed by the prediction pole is represented, and w×h represents the resolution of the segmented label image.
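The polar transformation around the predicted pole can be sketched as an inverse-mapping warp using the relations ρ = sqrt((x − c_x)² + (y − c_y)²) and φ = atan2(y − c_y, x − c_x) given earlier (a NumPy sketch with nearest-neighbour sampling; the grid resolution n_rho × n_phi is an assumption):

```python
import numpy as np

def to_polar(img: np.ndarray, cx: float, cy: float,
             n_rho: int = 256, n_phi: int = 256) -> np.ndarray:
    """Resample an (H, W[, C]) image onto an (n_rho, n_phi) polar grid about (cx, cy)."""
    h, w = img.shape[:2]
    rho_max = np.hypot(max(cx, w - 1 - cx), max(cy, h - 1 - cy))
    rho = np.linspace(0.0, rho_max, n_rho)[:, None]   # radii, shape (n_rho, 1)
    phi = np.linspace(-np.pi, np.pi, n_phi)[None, :]  # angles, shape (1, n_phi)
    # Inverse mapping: locate the Cartesian source pixel of each (rho, phi) cell.
    x = np.clip(np.round(cx + rho * np.cos(phi)), 0, w - 1).astype(int)
    y = np.clip(np.round(cy + rho * np.sin(phi)), 0, h - 1).astype(int)
    return img[y, x]
```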
S4: the transformed feature map is segmented using a polar attention segmentation network, wherein the polar attention segmentation principle is described in fig. 7: the feature map is converted from the form (cartesian system) shown in the first map in fig. 7 to the second map (i.e., feature map in polar coordinates) by step S3; at this time, the target segmentation area is concentrated at the lower part of the image, and the final target segmentation area can be obtained by precisely calculating the lengths of the strips shown in the third and fourth images and combining the lengths.
S4-1: the specific method for calculating the length of each strip comprises the following steps: and obtaining an attention relation matrix among the strips by using multi-head self-attention taking a single strip as a characteristic unit, and then using the relation matrix as input to apply a multi-layer perceptron to calculate and obtain the predicted length of each strip.
S4-2: after obtaining the predicted length of each strip, the polar att loss is calculated using the following formula:
wherein w represents a preset strip width, h is the strip length of the label, and h' is the predicted strip length.
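The strip-length regressor of S4-1 might be sketched as follows (each of the n_phi angular strips is one token of dimension n_rho; here the attended strip features stand in for the relation matrix fed to the MLP, which is our simplification; n_rho must be divisible by num_heads):

```python
import torch
import torch.nn as nn

class StripLengthHead(nn.Module):
    """Self-attention among angular strips, then an MLP regressing strip lengths."""
    def __init__(self, n_rho: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=n_rho, num_heads=num_heads,
                                          batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(n_rho, n_rho), nn.ReLU(),
                                 nn.Linear(n_rho, 1))

    def forward(self, polar_feat: torch.Tensor) -> torch.Tensor:
        # polar_feat: (B, n_rho, n_phi) feature map under polar coordinates.
        tokens = polar_feat.transpose(1, 2)          # (B, n_phi, n_rho) strip tokens
        rel, _ = self.attn(tokens, tokens, tokens)   # attention relations among strips
        return self.mlp(rel).squeeze(-1)             # (B, n_phi) predicted lengths
```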
S5: through the training of the whole network model in the process, the preprocessed image to be segmented is input into the network, converted and output back to the Cartesian system, and a final segmentation result is obtained.

Claims (6)

1. The single-target medical image segmentation method based on the attention under the polar coordinates is characterized by comprising the following steps of:
S1: performing image data preprocessing on the acquired single-target medical image data sample set;
S2: performing multi-level feature extraction on the preprocessed sample set obtained in step S1 by using a feature extraction network;
S3: applying the feature map extracted in step S2 to a pole prediction network to obtain the predicted pole, transforming the feature map of step S2 into polar coordinates around the predicted pole, and calculating the area loss under polar coordinates;
S4: segmenting the feature map transformed in step S3 using a polar coordinate attention segmentation network, and calculating the attention loss under polar coordinates;
S5: after training the whole network model through the above process, inputting the preprocessed image to be segmented into the network and transforming the output back to the Cartesian system to obtain the final segmentation result.
2. The method for segmenting a single-target medical image based on attention under polar coordinates according to claim 1, wherein the multi-scale feature extraction network in step S2 is a U-shaped feature extraction network, specifically:
in the encoding part of the feature extraction network, multi-scale features are extracted using multi-layer convolution and pooling operations, where each layer's convolution applies 3 depthwise separable convolutions in parallel to obtain the feature query vector Q, key vector K, and value vector V; multi-head self-attention is applied to these three feature expressions to compute global attention; and deformable convolution performs targeted convolution on the RoI to obtain target information;
in the decoder part of the feature extraction network, skip connections and transposed convolutions fuse the multi-scale features extracted by the encoder, yielding a comprehensive expression of the high-level and low-level features of the original image.
3. The method for segmenting a single-target medical image based on attention under polar coordinates according to claim 1, wherein the pole prediction network consists of two hourglass network modules, downsampled using convolutions of stride 2, and one prediction module; the prediction module starts with a modified residual block in which the first convolution layer is replaced by cross pooling, and the modified residual block is followed by one convolution layer that generates the pole prediction heat map.
4. The method for segmenting a single-target medical image based on attention under polar coordinates according to claim 3, wherein the cross pooling is specifically: two inputs are received; the first input is max-pooled row-wise, the second input is max-pooled column-wise, and the two results are added to obtain the final output;
the Apc loss function is a function of s_pole, the area of the segmentation label after transformation by the predicted pole, and w×h, the resolution of the segmentation label image.
5. The method for segmenting a single-object medical image based on attention in polar coordinates according to, wherein the polar coordinate calculation formula in the step S3 is:
angle and distance magnitide of each pixel in the image:
wherein atan2 is a 2-parameter arctangent function;
pole (c) of Cartesian image I (x, y) for a given resolution H W x ,c y ) The following formula is usedThe polar coordinate expression (ρ, φ) for each point is calculated:
6. The method for segmenting a single-target medical image based on attention under polar coordinates according to claim 1, wherein the polar att loss function in step S4 is a function of w, the preset strip width; h, the strip length of the label; and h', the predicted strip length.
CN202311045884.8A 2023-08-18 2023-08-18 Single-target medical image segmentation method based on attention under polar coordinates Pending CN117253035A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311045884.8A CN117253035A (en) 2023-08-18 2023-08-18 Single-target medical image segmentation method based on attention under polar coordinates

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311045884.8A CN117253035A (en) 2023-08-18 2023-08-18 Single-target medical image segmentation method based on attention under polar coordinates

Publications (1)

Publication Number Publication Date
CN117253035A 2023-12-19

Family

ID=89135880

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311045884.8A Pending CN117253035A (en) 2023-08-18 2023-08-18 Single-target medical image segmentation method based on attention under polar coordinates

Country Status (1)

Country Link
CN (1) CN117253035A (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115273002A (en) * 2022-07-01 2022-11-01 华为技术有限公司 Image processing method, device, storage medium and computer program product
CN115760874A (en) * 2022-11-14 2023-03-07 电子科技大学长三角研究院(湖州) Multi-scale U-Net medical image segmentation method based on joint spatial domain
CN115830317A (en) * 2022-11-24 2023-03-21 河南大学 Skin cancer image segmentation method and device based on U-Net attention enhancement module of polar coordinate conversion
CN115830041A (en) * 2022-12-01 2023-03-21 湖南中医药大学 3D medical image segmentation method based on cross fusion convolution and deformable attention transducer
CN115830054A (en) * 2022-12-14 2023-03-21 长沙理工大学 Crack image segmentation method based on multi-window high-low frequency visual converter
CN116310681A (en) * 2023-03-10 2023-06-23 山东大学 Unmanned vehicle passable area prediction method and system based on multi-frame point cloud fusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HOU Xiangdan; ZHAO Yihao; LIU Hongpu; GUO Hongyong; YU Xixin; DING Mengyuan: "Optic disc segmentation by UNet fusing a residual attention mechanism", Journal of Image and Graphics (中国图象图形学报), no. 09, 16 September 2020 (2020-09-16) *

Similar Documents

Publication Publication Date Title
WO2022041307A1 (en) Method and system for constructing semi-supervised image segmentation framework
Yang et al. Delving into deep imbalanced regression
CN115018824B (en) Colonoscope polyp image segmentation method based on CNN and Transformer fusion
CN111325750B (en) Medical image segmentation method based on multi-scale fusion U-shaped chain neural network
CN103714536A (en) Sparse-representation-based multi-mode magnetic resonance image segmentation method and device
CN112070685B (en) Method for predicting dynamic soft tissue movement of HIFU treatment system
Luo et al. Retinal image classification by self-supervised fuzzy clustering network
CN104077742B (en) Human face sketch synthetic method and system based on Gabor characteristic
CN112288645B (en) Skull face restoration model construction method and restoration method and system
CN117036386A (en) Cervical MRI image self-supervision segmentation method for generating data by using diffusion model
CN115496953A (en) Brain network classification method based on space-time graph convolution
CN105426836A (en) Single-sample face recognition method based on segmented model and sparse component analysis
CN110781828A (en) Fatigue state detection method based on micro-expression
CN113538363A (en) Lung medical image segmentation method and device based on improved U-Net
CN117036288A (en) Tumor subtype diagnosis method for full-slice pathological image
CN110263620B (en) Based on L2,1Age estimation method for bias label learning
CN115813409A (en) Ultra-low-delay moving image electroencephalogram decoding method
CN117253035A (en) Single-target medical image segmentation method based on attention under polar coordinates
Fontanella et al. The offset normal shape distribution for dynamic shape analysis
CN114565762A (en) Weakly supervised liver tumor segmentation based on ROI and split fusion strategy
Marulkar et al. Nail Disease Prediction using a Deep Learning Integrated Framework
CN117636099B (en) Medical image and medical report pairing training model
CN116993694B (en) Non-supervision hysteroscope image anomaly detection method based on depth feature filling
Hu et al. Learning from Incorrectness: Active Learning with Negative Pre-training and Curriculum Querying for Histological Tissue Classification
Li et al. Nasolabial Folds Extraction based on Neural Network for the Quantitative Analysis of Facial Paralysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination