CN117745745B - CT image segmentation method based on context fusion perception - Google Patents


Info

Publication number
CN117745745B
Authority
CN
China
Prior art keywords
representing
output
stage
convolution
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410180218.3A
Other languages
Chinese (zh)
Other versions
CN117745745A (en)
Inventor
刘敏
汪嘉正
申文婷
张哲
王耀南
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University filed Critical Hunan University
Priority to CN202410180218.3A priority Critical patent/CN117745745B/en
Publication of CN117745745A publication Critical patent/CN117745745A/en
Application granted granted Critical
Publication of CN117745745B publication Critical patent/CN117745745B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Image Processing (AREA)

Abstract

The invention discloses a CT image segmentation method based on context fusion perception. The method constructs a CT image segmentation model comprising a backbone structure with an encoder and a decoder, a parallel cavity convolution module PDCM, a pyramid fusion module PFM and a position attention module PAM, and jointly optimizes the model with a mixed loss of cross entropy and dice loss. The encoder encodes the input image and outputs encoding results at different stages. The PFM modules cascade the encoding results of the different stages, perform context feature fusion through separable cavity convolutions with different rates, and skip-connect their outputs to the decoder stage of the same level. The PDCM module enhances and fuses the final encoder feature map through six different branches and feeds the refined high-order feature map to the decoder. The PAM module applies multi-layer position attention to the feature map output by each decoder stage to locate and segment the target. The accuracy of target segmentation is thereby improved.

Description

CT image segmentation method based on context fusion perception
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a CT image segmentation method based on context fusion perception.
Background
Early detection and accurate diagnosis are key to improving the cure and survival rates of many diseases. Even in high-income countries, the survival rate of liver cancer remains unsatisfactory and has improved little in recent decades. Moreover, because the liver is largely shielded by the right ribs, conventional physical examination often fails to reveal inconspicuous liver tumors, so early detection of liver cancer is challenging.
The advent of computed tomography (CT) imaging has revolutionized the diagnosis of liver tumors. This scanning technique uses X-rays to create detailed images of the body that make tumors inside or adjacent to the liver visually observable. However, owing to the statistical uncertainty of CT physical measurements, various kinds of noise (such as quantum noise and electronic noise) are introduced into the CT image during imaging, so the contrast of the CT image is low and the boundaries of lesion areas are difficult to distinguish. Meanwhile, because most early tumors are small and their lesion features are not obvious, the human eye may be unable to identify them accurately in the CT image. These limitations reduce the accuracy and efficiency of diagnosis and make it difficult for physicians to analyze the disease and formulate a treatment plan. It is therefore necessary to develop an accurate CT image segmentation method that addresses the difficult problems of blurred boundaries and tiny-target segmentation, and helps doctors complete early disease diagnosis and clinical planning.
In recent years, many researchers have attempted to alleviate these problems from multiple angles. On the one hand, for the blurred-boundary problem caused by low target contrast and noise, CPFNet adds two pyramid modules to the encoder-decoder structure, which effectively enlarges the receptive field of the network, improves its ability to integrate global information, and to some extent relieves the influence of CT image background noise; MCI-Net adds a multi-scale context extraction module that combines four cascaded hybrid dilated convolution branches to deeply capture partial detail features, realizing effective identification of some low-contrast features in CT images. However, these methods lack a means of effectively fusing global and local information, and it is difficult to achieve effective segmentation of blurred boundaries with one-sided information alone, which limits their application in clinical decision-making. On the other hand, introducing attention mechanisms can focus the network on important areas of the image, which offers an approach to the problem of locating tiny objects. Some researchers have introduced three attention modules into a CNN, acting respectively on the spatial position, channel number and scale of the feature map, to achieve accurate medical image segmentation; however, this work is mainly oriented toward general medical image segmentation and provides no dedicated solution for small targets. To effectively address the above problems, a CT image segmentation method based on context fusion awareness is therefore proposed.
Disclosure of Invention
Aiming at the technical problems, the invention provides a CT image segmentation method based on context fusion perception.
The technical scheme adopted for solving the technical problems is as follows:
A CT image segmentation method based on context fusion awareness, the method comprising the steps of:
S100: constructing a CT image segmentation model comprising a backbone structure with an encoder and a decoder, a parallel cavity convolution module PDCM, a pyramid fusion module PFM and a position attention module PAM, and utilizing cross entropy and dice loss as mixing loss to jointly optimize the model;
S200: acquiring an input image, encoding the input image by using an improved ResNet encoder, and outputting encoding results of different stages, wherein the encoding results output by the encoder in each stage have different scales;
S300: the PFM module is utilized to respectively cascade the encoding results of different stages of the encoder, context feature fusion is carried out through separable cavity convolution of different rates, and the output is connected with the decoder of the same stage in a jumping manner;
S400: the PDCM module is used to enhance and fuse the final output feature map of the encoder through six different branches, and the refined high-order feature map is fed to the decoder;
S500: the PAM module is used to locate and segment the target through multi-layer position attention applied to the feature maps output by each stage of the decoder.
Preferably, cross entropy and dice loss are used as a mixed loss in S100 to jointly optimize the model, in particular:
L_{ce} = -\frac{1}{N}\sum_{i=1}^{N}\big[g_i\log p_i + (1-g_i)\log(1-p_i)\big]  (1)
L_{dice} = 1 - \frac{2\sum_{i=1}^{N} g_i p_i}{\sum_{i=1}^{N} g_i + \sum_{i=1}^{N} p_i}  (2)
L_{mix} = L_{ce} + L_{dice}  (3)
Wherein, L_{ce} represents the cross entropy loss, L_{dice} represents the dice loss, L_{mix} represents the mixed loss, g_i represents the true (ground-truth) value of the i-th pixel, p_i represents the predicted value of the i-th pixel, and N represents the number of pixels of the sample; the real label and the prediction result are composed of the g_i and the p_i respectively.
Preferably, the improved ResNet encoder in S200 is specifically:
the residual network ResNet pre-trained on ImageNet is used as the backbone structure of the encoder, the last pooling layer and the fully connected layer of ResNet are removed, and the primary features extracted by the residual module of each stage are output before the down-sampling operation;
to achieve effective extraction of the features of each stage, a convolution layer and a ReLU nonlinear activation layer are added after the primary features output by the residual module of each stage, yielding the output features of that stage, where the index runs over the different encoding stages.
Preferably, S300 includes:
The PFM module first integrates the output features of different scales from each stage of the backbone encoder using a continuous convolution layer; the feature maps of the stages are then brought to a common scale by bilinear interpolation up-sampling and concatenated, wherein the output features of each stage are context-fused only with deeper features; deep features are then extracted from the different levels through separable cavity convolutions with different rates, the outputs are integrated through a series of convolution and down-sampling operations, and the final result is skip-connected to the decoder of the same stage; wherein the continuous convolution layer is formed by successive convolutions alternating with batch normalization layers and ReLU nonlinear activation layers.
Preferably, to fuse multiple layers of context information, the model uses a total of 4 PFM modules, each expressed mathematically as:
(4)
(5)
(6)
Wherein, the quantities appearing in equations (4)-(6) are: the index of the different encoding stages of the encoder; the output features of each encoder stage; the integrated output feature map; the feature map of each PFM module after preliminary integration and cascading; the output of each PFM module after processing; the cascading (concatenation) operation; the up-sampling operation and its up-sampling multiple; the separable hole convolution operation at a given rate; and the convolution with batch normalization and ReLU nonlinear activation.
Preferably, S400 includes:
After the final high-order output features of the encoder are obtained, they are enhanced and fused through six different branches: five branches contain hole convolutions of different numbers and rates, and the last branch is a residual branch used to prevent vanishing gradients. At the end of each hole convolution branch, a sequential operation of convolution, batch normalization and ReLU nonlinear activation is applied as a correction. After the five hole convolution branches reshape the high-order features, the results are spliced by cascading, the channels are integrated through a series of convolution operations, and the result is finally added element by element to the residual branch to obtain the output features, which serve as the input of the decoder.
Preferably, PDCM branches and outputs can be expressed mathematically as:
(7)
(8)
(9)
(10)
(11)
(12)
(13)
Wherein, the quantities appearing in equations (7)-(13) are: the output feature map of the fifth stage of the encoder; the output of each of the six branches; the hole convolution operation at a given rate; the cascading (concatenation) operation; the continuous operation of convolution, batch normalization and ReLU nonlinear activation; and the point-by-point addition of matrices.
Preferably, S500 includes:
S510: extracting an output feature map from each stage of a decoder, acquiring a first type feature and a second type feature in the feature map of each stage by using average pooling and maximum pooling operation, and fusing context information of different stages into a multi-layer mixed feature map in a point-by-point adding and cascading mode;
S520: the multi-layer mixed feature map is used for adaptively adjusting the importance weight of each position point through a position attention module, so that the key target is effectively perceived and positioned; and finally, integrating the restored two-dimensional feature map into the size of the output image to obtain a final segmentation result.
Preferably, S510 is specifically:
The output feature maps of the individual stages are extracted from the decoder. Each stage's output feature map is first expanded to the same scale as the final decoder output through the same continuous convolution and up-sampling operations as in the PFM module; a first-type feature and a second-type feature are then obtained from each stage's feature map through average pooling and maximum pooling, the two types of features of each stage are fused by point-by-point addition, and the features extracted at the different stages are combined by cascading to obtain a multi-layer hybrid feature map;
S520 specifically comprises: the global first-type feature and the global second-type feature are obtained from the multi-layer hybrid feature map through global average pooling and global maximum pooling along the channel dimension, and the two types of features are then reshaped into one-dimensional vectors; a multi-layer perceptron MLP adaptively adjusts the importance weight of each position point in the two types of features, the weights are restored into two-dimensional feature maps, and the position attention weights of the two feature maps are fused by point-by-point addition; the resulting attention weights are fused with the multi-layer hybrid feature map by point-by-point multiplication, so that the position information of the region of interest is weighted through multi-layer fusion to obtain a position-weighted feature map, realizing effective perception and positioning of key targets; finally, the position-weighted feature map is up-sampled to the size of the output image and the channels are integrated through a series of convolution operations to obtain the final output result.
Preferably, the PAM module overall flow is expressed mathematically as:
(14)
(15)
(16)
(17)
(18)
Wherein, the quantities appearing in equations (14)-(18) are: the multi-layer hybrid feature map obtained after cascading the features extracted at each stage; the average pooling operation along the channel dimension; the maximum pooling operation along the channel dimension; the feature maps extracted from the stages of the decoder; the operation of reshaping a two-dimensional feature map into a one-dimensional feature vector; the reshaped one-dimensional global first-type and global second-type feature vectors; the operation of restoring a one-dimensional feature vector to a two-dimensional feature map; the multi-layer perceptron; the point-wise multiplication of matrices; the position-weighted feature map obtained after the multi-layer position attention; and the final output result.
In the CT image segmentation method based on context fusion perception, the edge information of the target is accurately perceived through the reconstructed skip connections, the context information is deeply fused using cavity convolutions with different rates and a multi-dimensional attention mechanism, accurate positioning and segmentation of tiny targets are achieved, and the segmentation accuracy for blurred-boundary targets and tiny targets in CT images is effectively improved.
Drawings
FIG. 1 is a flow chart of a CT image segmentation method based on context fusion awareness according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating an overall network structure of a CT image segmentation model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a fourth stage PFM module according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating a PDCM module structure according to an embodiment of the present invention;
FIG. 5 is a schematic diagram showing a PAM module according to an embodiment of the present invention;
Fig. 6 is a schematic diagram illustrating an effect of CT image segmentation according to an embodiment of the present invention.
Detailed Description
In order to make the technical scheme of the present invention better understood by those skilled in the art, the present invention will be further described in detail with reference to the accompanying drawings.
In one embodiment, as shown in Fig. 1 and Fig. 2, a CT image segmentation method based on context fusion awareness comprises the following steps:
S100: a CT image segmentation model is constructed comprising a backbone structure with encoder and decoder, a parallel hole convolution module PDCM, a pyramid fusion module PFM and a position attention module PAM, the model is jointly optimized using cross entropy and dice loss as mixing loss.
In one embodiment, the model is jointly optimized in S100 using cross entropy and dice loss as a mixed loss, specifically:
L_{ce} = -\frac{1}{N}\sum_{i=1}^{N}\big[g_i\log p_i + (1-g_i)\log(1-p_i)\big]  (1)
L_{dice} = 1 - \frac{2\sum_{i=1}^{N} g_i p_i}{\sum_{i=1}^{N} g_i + \sum_{i=1}^{N} p_i}  (2)
L_{mix} = L_{ce} + L_{dice}  (3)
Wherein, L_{ce} represents the cross entropy loss, L_{dice} represents the dice loss, L_{mix} represents the mixed loss, g_i represents the true (ground-truth) value of the i-th pixel, p_i represents the predicted value of the i-th pixel, and N represents the number of pixels of the sample; the real label and the prediction result are composed of the g_i and the p_i respectively.
In particular, the cross entropy loss L_{ce} and the dice loss L_{dice} together constitute the mixed loss L_{mix}, which jointly optimizes the learning of the model. When the segmentation content of a sample is unbalanced, the dice loss tends to favor clear large targets while the cross entropy loss increases the learning weight of blurred small targets, so combining the two effectively improves the network's ability to learn different targets and improves segmentation precision.
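A minimal sketch of this mixed loss, assuming a PyTorch-style implementation with a single-channel sigmoid output and an unweighted sum of the two terms (neither detail is spelled out above), could be:

```python
import torch
import torch.nn.functional as F

def mixed_loss(pred_logits: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Hypothetical sketch of the cross-entropy + dice mixed loss.

    pred_logits: raw network outputs, shape (B, 1, H, W)
    target:      binary ground-truth masks (float, values in {0, 1}), same shape
    """
    # Pixel-wise binary cross entropy, averaged over all pixels (equation (1))
    ce = F.binary_cross_entropy_with_logits(pred_logits, target)

    # Soft dice loss on the predicted probabilities (equation (2))
    probs = torch.sigmoid(pred_logits)
    inter = (probs * target).sum(dim=(1, 2, 3))
    union = probs.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    dice = 1.0 - (2.0 * inter + eps) / (union + eps)

    # Mixed loss (equation (3)): unweighted sum of the two terms
    return ce + dice.mean()
```

In training, the per-batch loss would then simply be mixed_loss(model(images), masks), letting the dice term dominate for clear large targets while the cross-entropy term keeps gradients on blurred small targets.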
S200: an input image is obtained, the input image is encoded by using a modified ResNet encoder, the encoding results of different stages are output, and the encoding results output by the encoder in each stage are different in scale.
In one embodiment, the improved ResNet encoder in S200 is specifically:
the residual network ResNet pre-trained on ImageNet is used as the backbone structure of the encoder, the last pooling layer and the fully connected layer of ResNet are removed, and the primary features extracted by the residual module of each stage are output before the down-sampling operation;
to achieve effective extraction of the features of each stage, a convolution layer and a ReLU nonlinear activation layer are added after the primary features output by the residual module of each stage, yielding the output features of that stage, where the index runs over the different encoding stages.
Specifically, the last pooling layer and the fully connected layer of ResNet are removed so that the backbone can be better fused with the subsequent modules; in addition, the nonlinear features are further activated while the scale of the feature map is kept unchanged, which facilitates effective extraction and fusion of the features in the subsequent modules.
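A minimal sketch of such a stage-wise encoder, assuming a torchvision ResNet-34 backbone and a 3x3 kernel for the added per-stage convolution (both illustrative choices, not details taken from the text), could look like:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet34

class StagewiseEncoder(nn.Module):
    """Sketch of the modified ResNet encoder: the final average pooling and
    fully connected layers are dropped, and the features of every stage are
    refined by an extra convolution + ReLU before being passed on."""

    def __init__(self):
        super().__init__()
        backbone = resnet34(weights="IMAGENET1K_V1")  # ImageNet-pretrained (torchvision >= 0.13)
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu)
        self.stages = nn.ModuleList([
            nn.Sequential(backbone.maxpool, backbone.layer1),   # stage 2
            backbone.layer2,                                    # stage 3
            backbone.layer3,                                    # stage 4
            backbone.layer4,                                    # stage 5
        ])
        channels = (64, 64, 128, 256, 512)
        # extra per-stage convolution + ReLU on the primary features (kernel assumed)
        self.refine = nn.ModuleList([
            nn.Sequential(nn.Conv2d(c, c, 3, padding=1), nn.ReLU(inplace=True))
            for c in channels
        ])

    def forward(self, x):
        feats = []
        x = self.stem(x)
        feats.append(self.refine[0](x))                 # stage 1
        for i, stage in enumerate(self.stages, start=1):
            x = stage(x)
            feats.append(self.refine[i](x))             # stages 2..5
        return feats                                     # one feature map per encoding stage
```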
S300: and cascading the coding results of different stages of the encoder by using the PFM module, performing context feature fusion by separable cavity convolution of different rates, and jumping-connecting the output with the decoder of the same stage.
In one embodiment, S300 includes:
The PFM module first integrates the output features of different scales from each stage of the backbone encoder using a continuous convolution layer; the feature maps of the stages are then brought to a common scale by bilinear interpolation up-sampling and concatenated, wherein the output features of each stage are context-fused only with deeper features; deep features are then extracted from the different levels through separable cavity convolutions with different rates, the outputs are integrated through a series of convolution and down-sampling operations, and the final result is skip-connected to the decoder of the same stage; wherein the continuous convolution layer is formed by successive convolutions alternating with batch normalization layers and ReLU nonlinear activation layers.
Specifically, the continuous convolution layer stabilizes the distribution of the output features and integrates the output features of the different stages into the same number of channels; the feature maps of different scales are then brought to the same scale by bilinear interpolation up-sampling and cascaded into a new feature map.
Further, separable hole convolutions with different rates are designed according to the output of each stage (the rates are chosen to ensure continuity of the feature regions within the receptive field); they probe the generated feature map at different levels to obtain context information, and the results are then skip-connected to the decoder. The number of separable hole convolutions equals the number of input levels, and compared with ordinary convolutions, separable hole convolutions enlarge the receptive field while reducing the number of parameters. The PFM therefore alleviates the network's insufficient acquisition of global information from the input feature map.
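A rough sketch of one pyramid fusion module is given below; the channel width, the dilation rates and the choice of up-sampling the deeper maps to the stage resolution are assumptions, and the final convolution-and-down-sampling step is simplified to a single continuous convolution:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def continuous_conv(in_ch: int, out_ch: int) -> nn.Sequential:
    """'Continuous convolution layer': three convolutions alternating with
    batch normalization and ReLU (1x1/3x3/1x1 kernel sizes are assumed)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class SeparableHoleConv(nn.Module):
    """Depthwise 3x3 dilated (hole) convolution followed by a 1x1 pointwise conv."""
    def __init__(self, in_ch: int, out_ch: int, rate: int):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=rate, dilation=rate, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class PFM(nn.Module):
    """Sketch of one pyramid fusion module: a stage's features are fused only
    with deeper stages, each fused map is probed by a separable hole convolution
    of a different rate, and the result is skip-connected to the same-stage decoder."""
    def __init__(self, in_channels, width=64, rates=(1, 2, 4, 8, 16)):
        super().__init__()
        n = len(in_channels)                       # this stage plus all deeper stages
        self.align = nn.ModuleList(continuous_conv(c, width) for c in in_channels)
        self.branches = nn.ModuleList(
            SeparableHoleConv(width * n, width, rates[i]) for i in range(n))
        self.integrate = continuous_conv(width * n, width)

    def forward(self, feats):
        # feats[0] is this stage's map; deeper maps are upsampled to its size
        base = self.align[0](feats[0])
        size = base.shape[-2:]
        aligned = [base] + [
            F.interpolate(self.align[i](f), size=size, mode="bilinear", align_corners=False)
            for i, f in enumerate(feats[1:], start=1)]
        cascaded = torch.cat(aligned, dim=1)
        out = torch.cat([branch(cascaded) for branch in self.branches], dim=1)
        return self.integrate(out)                 # skip-connected to the decoder
```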
In one embodiment, as shown in FIG. 3, to fuse multiple layers of context information, the model uses a total of 4 PFM modules, each expressed mathematically as:
(4)
(5)
(6)
Wherein, the quantities appearing in equations (4)-(6) are: the index of the different encoding stages of the encoder; the output features of each encoder stage; the integrated output feature map; the feature map of each PFM module after preliminary integration and cascading; the output of each PFM module after processing; the cascading (concatenation) operation; the up-sampling operation and its up-sampling multiple; the separable hole convolution operation at a given rate; and the convolution with batch normalization and ReLU nonlinear activation.
S400: and the PDCM module is utilized to enhance and fuse the characteristic diagram finally output by the encoder through six different branches, and the characteristic diagram of the higher order is transformed and then sent to the decoder.
In one embodiment, S400 includes:
After the final high-order output features of the encoder are obtained, they are enhanced and fused through six different branches: five branches contain hole convolutions of different numbers and rates, and the last branch is a residual branch used to prevent vanishing gradients. At the end of each hole convolution branch, a sequential operation of convolution, batch normalization and ReLU nonlinear activation is applied as a correction. After the five hole convolution branches reshape the high-order features, the results are spliced by cascading, the channels are integrated through a series of convolution operations, and the result is finally added element by element to the residual branch to obtain the output features, which serve as the input of the decoder.
Specifically, as shown in FIG. 4, the hole convolution branches of the PDCM contain hole convolutions of different numbers and different rates (the rates are chosen to maximize the receptive field without exceeding the size of the high-order features); the hole convolutions provide receptive fields of different sizes for comprehensively extracting high-order feature information at different scales. At the end of each branch, a sequential operation of convolution, batch normalization and ReLU nonlinear activation is applied as a correction. After the five hole convolution branches finely reshape the high-order features, the features are spliced by cascading, and the feature channels are then integrated using the same continuous convolution operation as in the PFM module.
Further, among the PDCM branches, the last branch serves as a residual branch: the high-order input features are fused with the integrated features extracted by the other branches through element-by-element addition to prevent vanishing gradients.
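A compact sketch of the parallel dilated-convolution module follows; the per-branch depths and the rates 1/2/4/8/16 are assumptions, while the five-branch-plus-residual layout, the correction at the end of each branch and the cascaded channel integration follow the description above:

```python
import torch
import torch.nn as nn

class PDCM(nn.Module):
    """Sketch of the parallel dilated (hole) convolution module: five hole
    convolution branches with different depths and rates, a correction at the
    end of each branch, cascaded channel integration, and a residual branch
    added element-wise."""
    def __init__(self, channels: int, rates=(1, 2, 4, 8, 16)):
        super().__init__()
        self.branches = nn.ModuleList()
        for depth, rate in enumerate(rates, start=1):
            layers = []
            for _ in range(depth):   # branch i stacks i hole convolutions (assumed)
                layers += [nn.Conv2d(channels, channels, 3, padding=rate, dilation=rate),
                           nn.BatchNorm2d(channels), nn.ReLU(inplace=True)]
            # correction at the end of the branch: 1x1 conv + BN + ReLU (kernel assumed)
            layers += [nn.Conv2d(channels, channels, 1),
                       nn.BatchNorm2d(channels), nn.ReLU(inplace=True)]
            self.branches.append(nn.Sequential(*layers))
        # channel integration after cascading the five branch outputs
        self.fuse = nn.Sequential(
            nn.Conv2d(channels * len(rates), channels, 1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))

    def forward(self, x):
        cascaded = torch.cat([branch(x) for branch in self.branches], dim=1)
        # sixth (residual) branch: element-wise addition with the input
        return self.fuse(cascaded) + x
```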
In one embodiment, PDCM branches and outputs can be expressed mathematically as:
(7)
(8)
(9)
(10)
(11)
(12)
(13)
Wherein, the quantities appearing in equations (7)-(13) are: the output feature map of the fifth stage of the encoder; the output of each of the six branches; the hole convolution operation at a given rate; the cascading (concatenation) operation; the continuous operation of convolution, batch normalization and ReLU nonlinear activation; and the point-by-point addition of matrices.
Further, the decoder has the same number of decoding modules as the encoder. Each decoding module consists of the continuous operation of convolution, batch normalization and bilinear interpolation up-sampling, which reshapes the number of channels and restores the feature map stage by stage. Before the output feature map of the current stage is sent to the next decoding stage, it is fused with the output of the corresponding PFM module by point-by-point addition; the fused feature map is then sent to the decoder of the next stage for the same operation, finally yielding the per-stage decoder output feature maps required by the PAM module.
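A minimal sketch of one such decoding stage, assuming a 3x3 kernel and a fixed x2 up-sampling factor, with the point-wise fusion of the same-stage PFM output shown explicitly:

```python
import torch.nn as nn
import torch.nn.functional as F

class DecoderBlock(nn.Module):
    """Sketch of one decoding stage: convolution, batch normalization and
    bilinear up-sampling; the same-stage PFM output is then fused by
    point-by-point addition before being handed to the next stage."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch))

    def forward(self, x, pfm_skip=None):
        x = self.conv(x)
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        if pfm_skip is not None:
            x = x + pfm_skip   # point-wise fusion with the same-stage PFM output
        return x
```

Stacking such blocks from the PDCM output back toward the input resolution would yield the per-stage decoder feature maps consumed by the PAM module.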
S500: the PAM module is utilized to locate and segment the target through multiple layers of position attention for each stage of characteristic diagram output by the decoder.
In one embodiment, S500 includes:
S510: extracting an output feature map from each stage of a decoder, acquiring a first type feature and a second type feature in the feature map of each stage by using average pooling and maximum pooling operation, and fusing context information of different stages into a multi-layer mixed feature map in a point-by-point adding and cascading mode;
S520: the multi-layer mixed feature map is used for adaptively adjusting the importance weight of each position point through a position attention module, so that the key target is effectively perceived and positioned; and finally, integrating the restored two-dimensional feature map into the size of the output image to obtain a final segmentation result.
Specifically, the average pooling operation averages, for every spatial point of a feature map, the values over all channels to obtain a pooled map with a single channel; the features appearing in this map are the first type of features. The maximum pooling operation takes, for every spatial point, the maximum value over the channels to obtain a single-channel pooled map; these strongest-responding features are the second type of features.
In one embodiment, as shown in fig. 5, S510 is specifically:
The output feature maps of the individual stages are extracted from the decoder. Each stage's output feature map is first expanded to the same scale as the final decoder output through the same continuous convolution and up-sampling operations as in the PFM module; a first-type feature and a second-type feature are then obtained from each stage's feature map through average pooling and maximum pooling, the two types of features of each stage are fused by point-by-point addition, and the features extracted at the different stages are combined by cascading to obtain a multi-layer hybrid feature map;
S520 specifically comprises: the global first-type feature and the global second-type feature are obtained from the multi-layer hybrid feature map through global average pooling and global maximum pooling along the channel dimension, and the two types of features are then reshaped into one-dimensional vectors; a multi-layer perceptron MLP adaptively adjusts the importance weight of each position point in the two types of features, the weights are restored into two-dimensional feature maps, and the position attention weights of the two feature maps are fused by point-by-point addition; the resulting attention weights are fused with the multi-layer hybrid feature map by point-by-point multiplication, so that the position information of the region of interest is weighted through multi-layer fusion to obtain a position-weighted feature map, realizing effective perception and positioning of key targets; finally, the position-weighted feature map is up-sampled to the size of the output image and the channels are integrated through a series of convolution operations to obtain the final output result.
Specifically, the position attention module adaptively adjusts the importance weight of each position point of the multi-layer hybrid feature map, realizing effective perception and positioning of key targets.
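An illustrative sketch of the position-attention step applied to the multi-layer hybrid feature map built in S510 is shown below; the MLP width, the sigmoid normalization of the fused weights and the output head are assumptions not stated in the text:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PAM(nn.Module):
    """Sketch of the position attention module acting on the multi-layer
    hybrid feature map (spatial_size x spatial_size assumed fixed)."""
    def __init__(self, in_channels: int, spatial_size: int, num_classes: int = 1, hidden: int = 256):
        super().__init__()
        n_pos = spatial_size * spatial_size
        # shared MLP that adaptively re-weights every spatial position
        self.mlp = nn.Sequential(
            nn.Linear(n_pos, hidden), nn.ReLU(inplace=True), nn.Linear(hidden, n_pos))
        # channel integration after up-sampling to the output image size
        self.head = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1),
            nn.BatchNorm2d(in_channels), nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, num_classes, 1))

    def forward(self, mixed, out_size):
        b, _, h, w = mixed.shape
        # global first-type / second-type features along the channel dimension
        avg_map = mixed.mean(dim=1, keepdim=True)           # (B, 1, H, W)
        max_map = mixed.max(dim=1, keepdim=True).values     # (B, 1, H, W)
        # reshape to one-dimensional position vectors and re-weight with the MLP
        w_avg = self.mlp(avg_map.flatten(1)).view(b, 1, h, w)
        w_max = self.mlp(max_map.flatten(1)).view(b, 1, h, w)
        attention = torch.sigmoid(w_avg + w_max)            # fused position weights
        weighted = mixed * attention                        # position-weighted features
        # up-sample to the output image size and integrate the channels
        weighted = F.interpolate(weighted, size=out_size, mode="bilinear", align_corners=False)
        return self.head(weighted)
```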
In one embodiment, the PAM module overall flow is expressed mathematically as:
(14)
(15)
(16)
(17)
(18)
Wherein, the quantities appearing in equations (14)-(18) are: the multi-layer hybrid feature map obtained after cascading the features extracted at each stage; the average pooling operation along the channel dimension; the maximum pooling operation along the channel dimension; the feature maps extracted from the stages of the decoder; the operation of reshaping a two-dimensional feature map into a one-dimensional feature vector; the reshaped one-dimensional global first-type and global second-type feature vectors; the operation of restoring a one-dimensional feature vector to a two-dimensional feature map; the multi-layer perceptron; the point-wise multiplication of matrices; the position-weighted feature map obtained after the multi-layer position attention; and the final output result.
In an embodiment of the invention, the CT image segmentation effect on liver tumor regions is shown in Fig. 6, where white lines outline the segmentation result of the invention and black lines mark the real tumor regions annotated by several doctors with years of clinical experience. The comparison of the two contours shows that the segmentation result overlaps the real label to a high degree overall and remains accurate for tumor regions with blurred boundaries and tiny features, demonstrating the effectiveness of the invention for segmenting blurred-boundary and tiny targets in CT images.
In the CT image segmentation method based on context fusion perception, the edge information of the target is accurately perceived through the reconstructed skip connections, the context information is deeply fused using cavity convolutions with different rates and a multi-dimensional attention mechanism, and accurate positioning and segmentation of tiny targets are achieved; the segmentation accuracy for blurred-boundary targets and tiny targets in CT images is effectively improved, assisting doctors in early disease diagnosis, formulation of follow-up treatment plans and other clinical applications.
The CT image segmentation method based on context fusion awareness provided by the invention is described in detail above. The principles and embodiments of the present invention have been described herein with reference to specific examples, the description of which is intended only to facilitate an understanding of the core concepts of the invention. It should be noted that it will be apparent to those skilled in the art that various modifications and adaptations of the invention can be made without departing from the principles of the invention and these modifications and adaptations are intended to be within the scope of the invention as defined in the following claims.

Claims (7)

1. A CT image segmentation method based on context fusion awareness, the method comprising the steps of:
S100: constructing a CT image segmentation model comprising a backbone structure with an encoder and a decoder, a parallel cavity convolution module PDCM, a pyramid fusion module PFM and a position attention module PAM, and utilizing cross entropy and dice loss as mixing loss to jointly optimize the model;
S200: acquiring an input image, encoding the input image by using an improved ResNet encoder, and outputting encoding results of different stages, wherein the encoding results output by the encoder in each stage have different scales;
S300: the PFM module is utilized to respectively cascade the encoding results of different stages of the encoder, context feature fusion is carried out through separable cavity convolution of different rates, and the output is connected with the decoder of the same stage in a jumping manner; s300 includes:
The PFM module first integrates the output features of different scales from each stage of the backbone encoder using a continuous convolution layer; the feature maps of the stages are then brought to a common scale by bilinear interpolation up-sampling and concatenated, wherein the output features of each stage are context-fused only with deeper features; deep features are then extracted from the different levels through separable cavity convolutions with different rates, the outputs are integrated through a series of convolution and down-sampling operations, and the final result is skip-connected to the decoder of the same stage; wherein the continuous convolution layer is formed by successive convolutions alternating with batch normalization layers and ReLU nonlinear activation layers;
S400: the PDCM module is used to enhance and fuse the final output feature map of the encoder through six different branches, and the refined high-order feature map is fed to the decoder; S400 includes:
After the final high-order output features of the encoder are obtained, they are enhanced and fused through six different branches: five branches contain hole convolutions of different numbers and rates, and the last branch is a residual branch used to prevent vanishing gradients. At the end of each hole convolution branch, a sequential operation of convolution, batch normalization and ReLU nonlinear activation is applied as a correction. After the five hole convolution branches reshape the high-order features, the results are spliced by cascading, the channels are integrated through a series of convolution operations, and the result is finally added element by element to the residual branch to obtain the output features, which serve as the input of the decoder;
S500: the PAM module is used to locate and segment the target through multi-layer position attention applied to the feature maps output by each stage of the decoder; S500 includes:
S510: extracting an output feature map from each stage of a decoder, acquiring a first type feature and a second type feature in the feature map of each stage by using average pooling and maximum pooling operation, and fusing context information of different stages into a multi-layer mixed feature map in a point-by-point adding and cascading mode;
S520: the multi-layer mixed feature map is used for adaptively adjusting the importance weight of each position point through a position attention module, so that the key target is effectively perceived and positioned; and finally, integrating the restored two-dimensional feature map into the size of the output image to obtain a final segmentation result.
2. The method according to claim 1, characterized in that the model is jointly optimized in S100 using cross entropy and dice loss as a mixed loss, in particular:
L_{ce} = -\frac{1}{N}\sum_{i=1}^{N}\big[g_i\log p_i + (1-g_i)\log(1-p_i)\big]  (1)
L_{dice} = 1 - \frac{2\sum_{i=1}^{N} g_i p_i}{\sum_{i=1}^{N} g_i + \sum_{i=1}^{N} p_i}  (2)
L_{mix} = L_{ce} + L_{dice}  (3)
Wherein, L_{ce} represents the cross entropy loss, L_{dice} represents the dice loss, L_{mix} represents the mixed loss, g_i represents the true (ground-truth) value of the i-th pixel, p_i represents the predicted value of the i-th pixel, and N represents the number of pixels of the sample; the real label and the prediction result are composed of the g_i and the p_i respectively.
3. The method of claim 1, wherein the improved ResNet encoder in S200 is specifically:
the residual network ResNet pre-trained on ImageNet is used as the backbone structure of the encoder, the last pooling layer and the fully connected layer of ResNet are removed, and the primary features extracted by the residual module of each stage are output before the down-sampling operation;
to achieve effective extraction of the features of each stage, a convolution layer and a ReLU nonlinear activation layer are added after the primary features output by the residual module of each stage, yielding the output features of that stage, where the index runs over the different encoding stages.
4. A method according to claim 3, characterized in that to fuse multiple layers of context information, the model uses a total of 4 PFM modules, each mathematically represented as:
(4)
(5)
(6)
Wherein, the quantities appearing in equations (4)-(6) are: the index of the different encoding stages of the encoder; the output features of each encoder stage; the integrated output feature map; the feature map of each PFM module after preliminary integration and cascading; the output of each PFM module after processing; the cascading (concatenation) operation; the up-sampling operation and its up-sampling multiple; the separable hole convolution operation at a given rate; and the convolution with batch normalization and ReLU nonlinear activation.
5. The method of claim 4, wherein PDCM branches and outputs are represented mathematically as:
(7)
(8)
(9)
(10)
(11)
(12)
(13)
Wherein, the quantities appearing in equations (7)-(13) are: the output feature map of the fifth stage of the encoder; the output of each of the six branches; the hole convolution operation at a given rate; the cascading (concatenation) operation; the continuous operation of convolution, batch normalization and ReLU nonlinear activation; and the point-by-point addition of matrices.
6. The method according to claim 5, wherein S510 is specifically:
The output feature maps of the individual stages are extracted from the decoder. Each stage's output feature map is first expanded to the same scale as the final decoder output through the same continuous convolution and up-sampling operations as in the PFM module; a first-type feature and a second-type feature are then obtained from each stage's feature map through average pooling and maximum pooling, the two types of features of each stage are fused by point-by-point addition, and the features extracted at the different stages are combined by cascading to obtain a multi-layer hybrid feature map;
S520 specifically comprises: the global first-type feature and the global second-type feature are obtained from the multi-layer hybrid feature map through global average pooling and global maximum pooling along the channel dimension, and the two types of features are then reshaped into one-dimensional vectors; a multi-layer perceptron MLP adaptively adjusts the importance weight of each position point in the two types of features, the weights are restored into two-dimensional feature maps, and the position attention weights of the two feature maps are fused by point-by-point addition; the resulting attention weights are fused with the multi-layer hybrid feature map by point-by-point multiplication, so that the position information of the region of interest is weighted through multi-layer fusion to obtain a position-weighted feature map, realizing effective perception and positioning of key targets; finally, the position-weighted feature map is up-sampled to the size of the output image and the channels are integrated through a series of convolution operations to obtain the final output result.
7. The method of claim 6, wherein the PAM module overall flow is expressed mathematically as:
(14)
(15)
(16)
(17)
(18)
Wherein, the quantities appearing in equations (14)-(18) are: the multi-layer hybrid feature map obtained after cascading the features extracted at each stage; the average pooling operation along the channel dimension; the maximum pooling operation along the channel dimension; the feature maps extracted from the stages of the decoder; the operation of reshaping a two-dimensional feature map into a one-dimensional feature vector; the reshaped one-dimensional global first-type and global second-type feature vectors; the operation of restoring a one-dimensional feature vector to a two-dimensional feature map; the multi-layer perceptron; the point-wise multiplication of matrices; the position-weighted feature map obtained after the multi-layer position attention; and the final output result.
CN202410180218.3A 2024-02-18 2024-02-18 CT image segmentation method based on context fusion perception Active CN117745745B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410180218.3A CN117745745B (en) 2024-02-18 2024-02-18 CT image segmentation method based on context fusion perception

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410180218.3A CN117745745B (en) 2024-02-18 2024-02-18 CT image segmentation method based on context fusion perception

Publications (2)

Publication Number Publication Date
CN117745745A CN117745745A (en) 2024-03-22
CN117745745B true CN117745745B (en) 2024-05-10

Family

ID=90279616

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410180218.3A Active CN117745745B (en) 2024-02-18 2024-02-18 CT image segmentation method based on context fusion perception

Country Status (1)

Country Link
CN (1) CN117745745B (en)

Citations (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598714A (en) * 2019-08-19 2019-12-20 中国科学院深圳先进技术研究院 Cartilage image segmentation method and device, readable storage medium and terminal equipment
CN111444924A (en) * 2020-04-20 2020-07-24 中国科学院声学研究所南海研究站 Method and system for detecting plant diseases and insect pests and analyzing disaster grades
AU2020103715A4 (en) * 2020-11-27 2021-02-11 Beijing University Of Posts And Telecommunications Method of monocular depth estimation based on joint self-attention mechanism
WO2021031066A1 (en) * 2019-08-19 2021-02-25 中国科学院深圳先进技术研究院 Cartilage image segmentation method and apparatus, readable storage medium, and terminal device
WO2021104056A1 (en) * 2019-11-27 2021-06-03 中国科学院深圳先进技术研究院 Automatic tumor segmentation system and method, and electronic device
CN113850825A (en) * 2021-09-27 2021-12-28 太原理工大学 Remote sensing image road segmentation method based on context information and multi-scale feature fusion
US11222217B1 (en) * 2020-08-14 2022-01-11 Tsinghua University Detection method using fusion network based on attention mechanism, and terminal device
CN114037833A (en) * 2021-11-18 2022-02-11 桂林电子科技大学 Semantic segmentation method for Miao-nationality clothing image
CN114219968A (en) * 2021-11-29 2022-03-22 太原理工大学 MA-Xnet-based pavement crack segmentation method
CN114677514A (en) * 2022-04-19 2022-06-28 苑永起 Underwater image semantic segmentation model based on deep learning
CN114897094A (en) * 2022-06-01 2022-08-12 西南科技大学 Esophagus early cancer focus segmentation method based on attention double-branch feature fusion
CN114972756A (en) * 2022-05-30 2022-08-30 湖南大学 Semantic segmentation method and device for medical image
CN115170582A (en) * 2022-06-13 2022-10-11 武汉科技大学 Liver image segmentation method based on multi-scale feature fusion and grid attention mechanism
WO2022227913A1 (en) * 2021-04-25 2022-11-03 浙江师范大学 Double-feature fusion semantic segmentation system and method based on internet of things perception
CN115457021A (en) * 2022-09-30 2022-12-09 云南大学 Skin disease image segmentation method and system based on joint attention convolution neural network
CN115546570A (en) * 2022-08-25 2022-12-30 西安交通大学医学院第二附属医院 Blood vessel image segmentation method and system based on three-dimensional depth network
CN115713624A (en) * 2022-09-02 2023-02-24 郑州大学 Self-adaptive fusion semantic segmentation method for enhancing multi-scale features of remote sensing image
CN115797931A (en) * 2023-02-13 2023-03-14 山东锋士信息技术有限公司 Remote sensing image semantic segmentation method based on double-branch feature fusion
CN116580192A (en) * 2023-04-18 2023-08-11 湖北工业大学 RGB-D semantic segmentation method and system based on self-adaptive context awareness network
CN116681888A (en) * 2023-04-28 2023-09-01 中科超精(南京)科技有限公司 Intelligent image segmentation method and system
CN116912503A (en) * 2023-09-14 2023-10-20 湖南大学 Multi-mode MRI brain tumor semantic segmentation method based on hierarchical fusion strategy
CN117496144A (en) * 2023-11-02 2024-02-02 四川大学 Multi-attention codec network and system applied to skin-loss segmentation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11270447B2 (en) * 2020-02-10 2022-03-08 Hong Kong Applied Science And Technology Institute Company Limited Method for image segmentation using CNN

Patent Citations (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021031066A1 (en) * 2019-08-19 2021-02-25 中国科学院深圳先进技术研究院 Cartilage image segmentation method and apparatus, readable storage medium, and terminal device
CN110598714A (en) * 2019-08-19 2019-12-20 中国科学院深圳先进技术研究院 Cartilage image segmentation method and device, readable storage medium and terminal equipment
WO2021104056A1 (en) * 2019-11-27 2021-06-03 中国科学院深圳先进技术研究院 Automatic tumor segmentation system and method, and electronic device
CN111444924A (en) * 2020-04-20 2020-07-24 中国科学院声学研究所南海研究站 Method and system for detecting plant diseases and insect pests and analyzing disaster grades
US11222217B1 (en) * 2020-08-14 2022-01-11 Tsinghua University Detection method using fusion network based on attention mechanism, and terminal device
AU2020103715A4 (en) * 2020-11-27 2021-02-11 Beijing University Of Posts And Telecommunications Method of monocular depth estimation based on joint self-attention mechanism
WO2022227913A1 (en) * 2021-04-25 2022-11-03 浙江师范大学 Double-feature fusion semantic segmentation system and method based on internet of things perception
CN113850825A (en) * 2021-09-27 2021-12-28 太原理工大学 Remote sensing image road segmentation method based on context information and multi-scale feature fusion
CN114037833A (en) * 2021-11-18 2022-02-11 桂林电子科技大学 Semantic segmentation method for Miao-nationality clothing image
CN114219968A (en) * 2021-11-29 2022-03-22 太原理工大学 MA-Xnet-based pavement crack segmentation method
CN114677514A (en) * 2022-04-19 2022-06-28 苑永起 Underwater image semantic segmentation model based on deep learning
CN114972756A (en) * 2022-05-30 2022-08-30 湖南大学 Semantic segmentation method and device for medical image
CN114897094A (en) * 2022-06-01 2022-08-12 西南科技大学 Esophagus early cancer focus segmentation method based on attention double-branch feature fusion
CN115170582A (en) * 2022-06-13 2022-10-11 武汉科技大学 Liver image segmentation method based on multi-scale feature fusion and grid attention mechanism
CN115546570A (en) * 2022-08-25 2022-12-30 西安交通大学医学院第二附属医院 Blood vessel image segmentation method and system based on three-dimensional depth network
CN115713624A (en) * 2022-09-02 2023-02-24 郑州大学 Self-adaptive fusion semantic segmentation method for enhancing multi-scale features of remote sensing image
CN115457021A (en) * 2022-09-30 2022-12-09 云南大学 Skin disease image segmentation method and system based on joint attention convolution neural network
CN115797931A (en) * 2023-02-13 2023-03-14 山东锋士信息技术有限公司 Remote sensing image semantic segmentation method based on double-branch feature fusion
CN116580192A (en) * 2023-04-18 2023-08-11 湖北工业大学 RGB-D semantic segmentation method and system based on self-adaptive context awareness network
CN116681888A (en) * 2023-04-28 2023-09-01 中科超精(南京)科技有限公司 Intelligent image segmentation method and system
CN116912503A (en) * 2023-09-14 2023-10-20 湖南大学 Multi-mode MRI brain tumor semantic segmentation method based on hierarchical fusion strategy
CN117496144A (en) * 2023-11-02 2024-02-02 四川大学 Multi-attention codec network and system applied to skin-loss segmentation

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Branch Aggregation Attention Network for Robotic Surgical Instrument Segmentation; Wenting Shen et al.; IEEE Transactions on Medical Imaging; 2023-06-21; Vol. 42, No. 11; 3408-3419 *
LSKANet: Long Strip Kernel Attention Network for Robotic Surgical Scene Segmentation; Min Liu et al.; IEEE Transactions on Medical Imaging; 2023-11-28; 1-15 *
Image semantic segmentation technology based on convolutional neural networks; 田启川, 孟颖; Journal of Chinese Computer Systems (小型微型计算机系统); 2020-05-29 (No. 06); 184-195 *
Real-time semantic segmentation algorithm based on feature fusion; 蔡雨, 黄学功, 张志安, 朱新年, 马祥; Laser & Optoelectronics Progress (激光与光电子学进展); 2020-12-31 (No. 02); 137-144 *
Segmentation of lung images using an improved convolutional neural network; 钱宝鑫, 肖志勇, 宋威; Journal of Frontiers of Computer Science and Technology (计算机科学与探索); 2020-12-31 (No. 08); 102-111 *

Also Published As

Publication number Publication date
CN117745745A (en) 2024-03-22

Similar Documents

Publication Publication Date Title
Wang et al. Hybrid dilation and attention residual U-Net for medical image segmentation
CN111951288B (en) Skin cancer lesion segmentation method based on deep learning
CN112102321A (en) Focal image segmentation method and system based on deep convolutional neural network
CN113506310B (en) Medical image processing method and device, electronic equipment and storage medium
CN113436173B (en) Abdominal multi-organ segmentation modeling and segmentation method and system based on edge perception
Ding et al. FTransCNN: Fusing Transformer and a CNN based on fuzzy logic for uncertain medical image segmentation
CN110648331B (en) Detection method for medical image segmentation, medical image segmentation method and device
CN112288041B (en) Feature fusion method of multi-mode deep neural network
Yamanakkanavar et al. MF2-Net: A multipath feature fusion network for medical image segmentation
CN115375711A (en) Image segmentation method of global context attention network based on multi-scale fusion
CN116309651B (en) Endoscopic image segmentation method based on single-image deep learning
Shan et al. SCA-Net: A spatial and channel attention network for medical image segmentation
CN117078930A (en) Medical image segmentation method based on boundary sensing and attention mechanism
CN114399510B (en) Skin focus segmentation and classification method and system combining image and clinical metadata
CN116051589A (en) Method and device for segmenting lung parenchyma and pulmonary blood vessels in CT image
CN117392153B (en) Pancreas segmentation method based on local compensation and multi-scale adaptive deformation
Dai et al. CAN3D: Fast 3D medical image segmentation via compact context aggregation
Ghaleb Al-Mekhlafi et al. Hybrid techniques for diagnosing endoscopy images for early detection of gastrointestinal disease based on fusion features
Ma et al. Segmenting lung lesions of COVID-19 from CT images via pyramid pooling improved Unet
CN116935044B (en) Endoscopic polyp segmentation method with multi-scale guidance and multi-level supervision
Li et al. MFA-Net: Multiple Feature Association Network for medical image segmentation
CN117745745B (en) CT image segmentation method based on context fusion perception
Zhao et al. Multi-to-binary network (MTBNet) for automated multi-organ segmentation on multi-sequence abdominal MRI images
CN116542988A (en) Nodule segmentation method, nodule segmentation device, electronic equipment and storage medium
CN116309679A (en) MLP-like medical image segmentation method suitable for multiple modes

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant