CN116542924A - Prostate focus area detection method, device and storage medium - Google Patents

Prostate focus area detection method, device and storage medium

Info

Publication number
CN116542924A
CN116542924A (Application CN202310486436.5A)
Authority
CN
China
Prior art keywords
prostate
feature
attention
information
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310486436.5A
Other languages
Chinese (zh)
Inventor
朴永日
李智玮
张淼
吴岚虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology filed Critical Dalian University of Technology
Priority to CN202310486436.5A priority Critical patent/CN116542924A/en
Publication of CN116542924A publication Critical patent/CN116542924A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/42Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30081Prostate
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Radiology & Medical Imaging (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method, a device and a storage medium for detecting a prostate focus area, relating to the field of computer vision. Accurate automatic segmentation of prostate lesion areas is an important requirement for computer-aided diagnosis and treatment of prostate diseases. However, the lack of a clear boundary and of high contrast between the prostate and surrounding tissue makes it difficult to accurately extract the prostate from the background. Because medical images carry scarce information, multi-dimensional information must be extracted with a variety of methods, yet current methods do not adequately learn and mine the hidden information. To address these challenges, the invention proposes a new method, device and storage medium for detecting prostate lesion areas, suitable for the field of computer vision. To improve the quality of the information extracted at different scales, the proposed network uses different convolution combinations and several attention mechanisms, so that the method performs well across different imaging modalities.

Description

Prostate focus area detection method, device and storage medium
Technical Field
The invention relates to the field of computer vision, in particular to a prostate lesion area detection method and device based on a dynamic multi-scale perception self-adaptive integration network, and a storage medium.
Background
Prostate segmentation refers to detecting the prostate region in an input prostate MR image; prostate lesion segmentation likewise extracts the lesion region from the original image. Since medical images differ greatly from conventional natural images, directly applying conventional segmentation algorithms to medical image segmentation inevitably degrades performance, and this is especially true for segmentation of prostate lesion areas. Accurate segmentation of the prostate and lesion areas from Magnetic Resonance (MR) images is critical for the diagnosis and treatment planning of prostate diseases, especially prostate cancer, prostatitis and prostatic hyperplasia. In the routine diagnosis of such common prostate diseases, medical images are usually segmented manually by several specialists, which is a time-consuming process. With the development of deep learning, convolutional neural networks have made great progress in the field of prostate and lesion segmentation, and a growing body of work addresses segmentation of the prostate and its lesion regions. These studies play a very important role both in theoretical research on the prostate and its diseases and in front-line medical practice.
Depending on the form of the input, prostate and prostate lesion segmentation can be divided into two major categories: 2D and 3D detection of the prostate and its lesion areas. 2D detection takes a single MR image as input, a gray-scale image with one channel. 3D detection takes consecutive MR images as input: for one patient, the imaging of the prostate and lesion area consists of several consecutive slices along the axial direction, and because the slices of the same patient are correlated, they can be processed jointly as 3D detection. With advances in technology and hardware, detection with video input has recently emerged as well, using even more input information to achieve more accurate segmentation results.
However, current state-of-the-art automatic MR image segmentation methods face several challenges. The lack of a clear boundary and of high contrast between the prostate and surrounding tissue makes it difficult to accurately extract the prostate from the background. Furthermore, the complexity of the background texture and the large variation in the size, shape and intensity distribution of the prostate itself make segmentation more difficult. In addition, the lesion region of the prostate is visually even harder to distinguish than the prostate region, and even a professional physician needs to examine data from multiple modalities to determine where a lesion lies. Meanwhile, because medical images carry scarce information, multi-dimensional information must be extracted with several methods, so that the rich information a network extracts across multiple modes compensates for the weak image priors; however, existing methods cannot fully learn and mine the information hidden in the images.
Disclosure of Invention
Existing prostate lesion area detection methods still use conventional fixed-parameter layers to process the input prostate image and therefore struggle to adapt to changes such as prostate size variation and boundary blurring in the input. To address this, the invention provides a prostate lesion area detection method based on a dynamic multi-scale perception self-adaptive integration network, which detects lesion areas from the input prostate MR image and is optimized and updated through dynamic local pooling and a global efficient attention network, thereby achieving high-quality lesion area detection for a given prostate image.
For this purpose, the invention provides the following technical scheme:
a prostate focus area detection method based on a dynamic multi-scale perception self-adaptive integration network comprises the following steps:
A. acquiring a prostate input image according to a data set of the prostate and obtaining tensors;
B. inputting the tensor into a feature encoder, and obtaining multi-scale coding features based on each image through the feature encoder;
C. For the coding features, richer feature representations are obtained through corresponding feature enhancement layers;
D. Performing feature decoding on the richer feature representation through a decoder to obtain a final prostate segmentation prediction result, wherein the feature decoding comprises the following steps:
D1. A multi-level feature pyramid is established through a multi-level integration module with convolution, using the complementary characteristics of the levels, and the multi-scale information of the image is adaptively encoded into the current pyramid feature vector, yielding integrated feature information that contains both global and local levels;
The mechanism of hierarchical complementation is as follows: the features extracted by the network comprise local features and global features. Local features come from the shallow convolutional layers and mainly contain information about details and textures of the image, without distinguishing whether it belongs to the foreground or the background; global features are mainly extracted by the deep convolution modules and the attention mechanism and mainly cover high-level semantic information, such as position and the difference between foreground and background. Because the two kinds of features focus on local and global information respectively, and both position and detail information matter for prostate lesion segmentation, the complementarity of global and local features across levels can be exploited during integration to obtain a better prediction result.
The integration module works as follows: the integrated feature map of two consecutive levels is obtained through convolution and fusion operations; channel transformation and normalization are then applied to the integrated feature map to produce the output of that level's integration module. The convolutions may use several kernels of different sizes to fuse neighborhood information or transform the channel number, preparing for the next fusion.
D2. The feature layers of different levels are dynamically fused in a progressive manner over several stages, yielding a more accurate and richer fused feature representation. The fusion weights are learned automatically by convolution. The final prostate segmentation prediction map is obtained by self-adaptive dynamic weighted fusion of the features across levels; a sketch of this fusion follows below.
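As an illustration, the following PyTorch sketch shows one way the self-adaptive dynamic weighted fusion of step D2 could be realized. The module name, the softmax normalization and the 3×3 weight-prediction kernel are assumptions; the patent only states that the fusion weights are learned by convolution.

```python
# A minimal sketch of the adaptive weighted fusion of step D2 (assumptions:
# module name, softmax normalization, 3x3 weight-prediction kernel).
import torch
import torch.nn as nn

class AdaptiveLevelFusion(nn.Module):
    def __init__(self, channels: int, num_levels: int):
        super().__init__()
        # A single convolution learns one per-pixel fusion weight per level.
        self.weight_pred = nn.Conv2d(channels * num_levels, num_levels, 3, padding=1)

    def forward(self, feats):
        # feats: list of (B, C, H, W) maps already resized to a common resolution
        weights = torch.softmax(self.weight_pred(torch.cat(feats, dim=1)), dim=1)
        return sum(weights[:, i:i + 1] * f for i, f in enumerate(feats))
```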
Further, step A includes:
dividing the prostate dataset into a training set and a test set of fixed sizes, each containing two subsets: the input images and the ground-truth values;
first, performing data enhancement on the prostate training set, including but not limited to resizing the input prostate image to H×W, where the target size can be chosen in use to best match the network model; second, applying random flipping and random cropping with random probability; then converting the enhanced image into tensors the network can process, obtaining tensors of the batch size. Note that the same operations are also performed on the ground truth of the prostate training set.
The data in the prostate test set is handled differently: the input image is only resized, then converted directly to a tensor, and the resulting tensor of the batch size is sent straight to the network for testing.
Further, the batch size is 8 and H×W is 224×224.
Further, the feature encoder is a ResNet architecture, and discards the last two layers to preserve the spatial structure, and then adds a global-local complementary module after the output of each layer to extract multi-scale context information; and features of all layers except the first layer are stored in the feature pyramid to facilitate operation of subsequent modules. That is, the feature encoder generates 1 feature pyramid for each image, which includes 4 feature maps with different spatial resolutions and channel numbers.
Further, the ResNet architecture is a ResNet-50 architecture in which the number of input channels of the first convolution module is modified to 1 to match the single channel of the input image. The global-local complementary module comprises two branches, namely a dynamic local pooling branch and a global high-efficiency attention branch.
Dynamic local pooling dynamically assigns to each layer of the network a combination of pooling layers of appropriate sizes, according to the position of that layer in the whole network, so as to better extract local information. That is, many convolution kernel sizes are combined at the lower layers of the network to fit the large low-level feature maps, while at the higher layers a small number of kernel sizes fit the small high-level feature maps and prevent information loss.
Global high-efficiency attention directly compares the similarity of the input feature maps to obtain a similarity weight map; if the dimensionality of that layer's feature maps is high, it can first be reduced to save computational resources. The weight map is then combined with the input image to compute which regions the network should attend to at the global scale.
Further, the dynamic local pooling convolution kernels are, in turn, 1, 3, 5, 7 and 9. Where the size is greater than 3, depthwise separable convolution replaces ordinary convolution to reduce the number of parameters.
Further, in step D1, the dynamic kernel K_t has size 3×3 when used for neighborhood fusion, and size 1×1 when used for channel-number integration.
Further, step C includes:
in the corresponding feature fusion layer, the features at each scale output by the feature encoder are respectively taken as input;
for the features of each scale, a dual-stream attention mechanism and a residual-attention mechanism are respectively used to further optimize the feature expression after the feature encoder;
for the feature expressions of the same spatial resolution obtained after the different attention transforms, pixel-level summation is used to obtain the richer fused feature expression.
Further, for the features of each scale, different attention mechanisms make the network focus on different information, improving the richness and certainty of the feature representation, including:
for dual stream attention, attention is used to emphasize the characteristics of the different regions. The foreground region which is important to be focused by the network under the conventional condition is calculated by using the spatial attention and the channel attention, and then the weighting graph of the foreground attention is normalized and inverted to focus on the background, so that the foreground information hidden in the background feature is better found, and then the fused feature representation is obtained by using the pixel-level addition and convolution operation.
The residual-attention mechanism combines residual modules with self-attention and mutual attention: a multi-path residual convolution operation is applied to the input feature representation, and information flow between the branches is realized by combining self-attention weights with mutual-attention weights, yielding a richer feature expression.
Further, the number of branches in the residual-attention mechanism is set to 4, and the weights of the self-attention weight map and the mutual-attention weight map are both 1.
The technical scheme provided by the invention has the following beneficial effects:
The invention provides a prostate lesion area detection method based on a dynamic multi-scale perception self-adaptive integration network that accounts for the coherence of multi-scale information in the input image. First, multi-scale coding features are obtained for each image through a feature encoder: dynamic pooling convolution attends to the local information of the image, a global efficient attention mechanism attends to its global information, and the two are integrated by pixel-wise addition and convolution, so that the effective information of the input image is extracted at different scales to the greatest extent. Second, to keep noise arising during information extraction from degrading the prediction, information enhancement is performed on the feature map at every scale with several combinations of attention. On the one hand, a dual-stream attention mechanism makes the network attend more to foreground information and suppresses noise; on the other hand, residual attention strengthens the connections among the branches while extracting integrated information, and the complementarity among branches supports more accurate prediction. Experimental results show that the proposed method yields accurate predictions for both prostate lesion area segmentation and prostate segmentation.
For these reasons, the invention can be widely applied in the field of prostate and lesion area detection.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic representation of an input prostate MR in an embodiment of the present invention;
FIG. 2 is a flow chart of a method for detecting a prostate lesion area based on a dynamic multiscale-aware adaptive integration network according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a dynamic adaptive pooling module according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a dual stream attention module in accordance with an embodiment of the present invention;
fig. 5 is a schematic diagram of the structure of an internal attention module of the residual-attention module in an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 2, a flowchart of a method for detecting a prostate lesion area based on a dynamic multiscale-aware adaptive integration network according to an embodiment of the present invention is shown, and the method includes the following steps:
A. acquiring a prostate input image according to a data set of the prostate and obtaining tensors;
in a specific implementation, the step a specifically includes:
a1, acquiring a prostate image:
An input prostate MR image is shown in FIG. 1. The prostate dataset is divided into a training set and a test set in a certain proportion, each containing two sub-datasets: the input images and the ground-truth values.
A2. For the input prostate images, obtain a tensor T whose first dimension equals the batch size:
Data enhancement is performed on the input images and corresponding ground truth in the prostate training set. First, random cropping with scale s and ratio r is applied to the input MR original image (a single-channel gray image) and the GT image, and the size is adjusted to H×W (this method uses a resolution of 224×224); random flipping with random probability is then applied. The enhanced gray image is converted into a tensor the network can process, and a data loader groups the output tensors into batches of the batch size so the network converges faster; the batch size is set to 8.
For the input prostate images and corresponding ground truth in the prostate test set, the input image is resized to H×W (224×224), the resized prostate gray image is converted into a tensor the network can process, and the data loader batches the output tensors; here the batch size is set to 1. A preprocessing sketch follows below.
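For concreteness, a minimal preprocessing sketch using torchvision follows, matching the stated pipeline (random crop with scale and ratio, resize to 224×224, random flip, conversion to tensor, batch sizes 8 and 1); the transform composition and loader names are illustrative assumptions.

```python
# A minimal preprocessing sketch for step A (torchvision transforms assumed).
from torchvision import transforms

train_tf = transforms.Compose([
    transforms.RandomResizedCrop((224, 224)),  # random crop with scale s / ratio r, resized to H x W
    transforms.RandomHorizontalFlip(p=0.5),    # random flip with random probability
    transforms.ToTensor(),                     # single-channel gray image -> (1, 224, 224) tensor
])
test_tf = transforms.Compose([
    transforms.Resize((224, 224)),             # the test set is only resized
    transforms.ToTensor(),
])
# The geometric transforms must also be applied to the ground-truth masks; in
# practice a paired transform is used so crops and flips stay aligned.
# train_loader = DataLoader(train_set, batch_size=8, shuffle=True)
# test_loader  = DataLoader(test_set,  batch_size=1, shuffle=False)
```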
B. Inputting tensors into a feature encoder, and obtaining multi-scale coding features based on each image through the feature encoder
In a specific implementation, the step B specifically includes:
B1. Input the obtained tensor I_t to the feature encoder:
The feature encoder employed is a ResNet-50 architecture in which the number of input channels of the first convolution module is modified to 1 to match the number of channels of the input image, while the final pooling and fully connected layers are removed to fit the subsequent modules.
B2. Obtain multi-scale coding features:
The feature encoder generates 5 multi-scale feature maps with different spatial resolutions and channel numbers for each image; with a 224×224 input, their resolutions and channel numbers (W×H×C) are 112×112×64, 56×56×256, 28×28×512, 14×14×1024 and 7×7×2048 respectively. Since the first layer is too noisy and carries little useful information, only the features of layers 2-5 are used in the subsequent information enhancement. A sketch of the truncated encoder follows below.
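A minimal sketch of the modified ResNet-50 encoder described in step B1 is given below, using the torchvision backbone; collecting the stage outputs into a list is an implementation assumption.

```python
# A minimal sketch of the truncated, single-channel ResNet-50 encoder of
# step B1 (the residual/global-local complementary modules of each layer are
# described separately and omitted here).
import torch
import torch.nn as nn
from torchvision.models import resnet50

class ProstateEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        net = resnet50(weights=None)
        # First convolution accepts a 1-channel gray image instead of RGB.
        net.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu)
        self.pool = net.maxpool
        # The average-pooling and fully connected head are discarded.
        self.layers = nn.ModuleList([net.layer1, net.layer2, net.layer3, net.layer4])

    def forward(self, x):                 # x: (B, 1, 224, 224)
        feats = [self.stem(x)]            # (B, 64, 112, 112)
        x = self.pool(feats[0])
        for layer in self.layers:
            x = layer(x)
            feats.append(x)               # 256 / 512 / 1024 / 2048 channels
        return feats[1:]                  # only layers 2-5 feed the feature pyramid
```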
The step B2 specifically includes:
Each layer comprises a residual convolution module and a dual-branch global-local complementary module, combined in series.
The residual convolution module is a convolution module with a residual connection added. The dual-branch global-local complementary module contains one path of dynamic self-adaptive local pooling and one path of global high-efficiency attention. As shown in FIG. 3, dynamic self-adaptive local pooling convolves with kernels of different sizes: for low-level features, the dynamic self-adaptive pooling module contains more convolutions covering a larger range, to make up for the lack of information in the low layers; for high-level features, the information is already sufficient and mostly needs integration and transformation, so the kernels are smaller and fewer. The whole flow can be expressed as:
layers(i) ∈ {RL(j) | j = 0, 1} ∪ {OL(k) | 1 ≤ k ≤ 5−i, k ∈ ℕ⁺}    (1)
where layers(i) denotes the feature map of the i-th layer. RL is the part shared by all feature layers of the dynamic self-adaptive local pooling module; it comprises one path of global average pooling with up-sampling and one path of convolution with kernel size 1. OL differs across feature layers in the number of its convolutions and their kernel sizes (i.e., the range of k), and the kernel size equals 2k−1. A sketch of this pooling module follows below.
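The following sketch illustrates one possible reading of equation (1): RL as a global-average-pooling path plus a 1×1 convolution, and OL as 5−i extra convolutions with kernel sizes 2k−1, where sizes above 3 are replaced by depthwise separable convolutions as stated earlier. The exact layer composition and the level indexing (i = 1..4 for the four pyramid levels) are assumptions.

```python
# A minimal sketch of the dynamic self-adaptive local pooling branch,
# reconstructed from equation (1); composition details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def _conv(channels: int, k: int) -> nn.Module:
    if k <= 3:
        return nn.Conv2d(channels, channels, k, padding=k // 2)
    # Depthwise + pointwise convolution cuts parameters for large kernels.
    return nn.Sequential(
        nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels),
        nn.Conv2d(channels, channels, 1),
    )

class DynamicLocalPooling(nn.Module):
    def __init__(self, channels: int, level: int):   # level i = 1..4
        super().__init__()
        self.point = nn.Conv2d(channels, channels, 1)  # RL: 1x1 convolution path
        self.ol = nn.ModuleList(
            _conv(channels, 2 * k - 1) for k in range(1, max(5 - level, 1) + 1)
        )

    def forward(self, x):
        # RL: global average pooling, up-sampled back to the input size.
        gap = F.interpolate(x.mean((2, 3), keepdim=True), size=x.shape[2:])
        out = self.point(x) + gap
        for conv in self.ol:               # OL: more kernels at lower levels
            out = out + conv(x)
        return out
```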
The global high-efficiency attention module uses the notion of similarity combined with a self-attention mechanism to strengthen the network's ability to attend to global regions, letting it learn more information from a global view and make more accurate predictions. A residual connection is used to prevent information loss during network operation. The whole flow can be expressed as:
where x denotes the input feature map and normal(·) denotes the normalization operation.
At the end of each layer, the feature maps of the two branches are integrated by pixel-level addition and convolution operations, as sketched below together with the attention branch.
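Since the formula itself is not reproduced above, the following sketch reconstructs the global high-efficiency attention branch from the description alone (channel reduction, similarity weight map, residual connection); the query/key projections and the softmax normalization are assumptions, not the exact formula of the filing.

```python
# A minimal sketch of the global high-efficiency attention branch and the
# end-of-layer fusion; the similarity computation is an assumption.
import torch
import torch.nn as nn

class GlobalEfficientAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        inner = max(channels // reduction, 1)   # channel reduction saves compute
        self.query = nn.Conv2d(channels, inner, 1)
        self.key = nn.Conv2d(channels, inner, 1)

    def forward(self, x):                       # x: (B, C, H, W)
        b, c, h, w = x.shape
        q = self.query(x).flatten(2)            # (B, C', HW)
        k = self.key(x).flatten(2)
        sim = torch.softmax(q.transpose(1, 2) @ k, dim=-1)  # (B, HW, HW) similarity weights
        v = x.flatten(2)                        # (B, C, HW)
        out = (v @ sim.transpose(1, 2)).view(b, c, h, w)
        return x + out                          # residual connection against information loss

# End-of-layer fusion of the two branches (pixel-level addition + convolution):
# fused = fuse_conv(local_branch(x) + global_branch(x))
```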
C. For coding features, richer feature representations are obtained through corresponding feature enhancement layers
In a specific implementation, the step C specifically includes:
C1. The features of each scale are fed to a dual-stream attention module to enhance the foreground and suppress background noise:
Conventional attention can only focus on the foreground region and ignores foreground information hidden in the background region. The dual-stream attention module of the embodiment of the invention therefore combines a spatial attention mechanism and a channel attention mechanism so that the network attends to the foreground region while simultaneously attending to foreground information hidden in the background region, as shown in FIG. 4. At the end of the dual-stream attention module, the features of the foreground branch and the background branch are integrated by convolution, and the information within them is learned to enable more accurate segmentation. A sketch of this module follows below.
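A minimal sketch of the dual-stream attention module, as read from the description of FIG. 4, follows; the specific spatial/channel attention blocks and the per-stream convolutions are assumptions (the per-stream convolutions keep the pixel-level addition of foreground and background from collapsing back to the input).

```python
# A minimal sketch of dual-stream attention: a foreground weight map is
# computed from channel + spatial attention, its inversion attends to the
# background, and the streams are fused by addition and convolution.
import torch
import torch.nn as nn

class DualStreamAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.channel_fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid(),
        )
        self.spatial = nn.Sequential(nn.Conv2d(channels, 1, 7, padding=3), nn.Sigmoid())
        self.fg_conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.bg_conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.fuse = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        attn = self.spatial(x * self.channel_fc(x))   # normalized foreground weight map
        foreground = self.fg_conv(x * attn)
        background = self.bg_conv(x * (1.0 - attn))   # inverted map mines cues hidden in the background
        return self.fuse(foreground + background)     # pixel-level addition, then convolution
```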
C2. For the features of each scale, the information is further integrated in the form of multiple branches, and an attention mechanism strengthens the correlation among the branches in residual form so as to jointly highlight the foreground object.
As shown in FIG. 5, the residual-attention module has 4 branches in total, each containing 2 residual convolution modules. Between branches, an attention module strengthens the association. The attention module has two parts: one centers on self-attention and generates a weight matrix for the branch itself; the other is mutual attention, which uses the similarity of two branches to generate the weight matrix for their information exchange. The output of each branch is the weighted sum, via the computed matrices, of the feature map of the adjacent branch and the branch's own feature map. Finally, the results of the 4 branches are integrated to produce the output of the residual-attention module. A sketch follows below.
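The following sketch illustrates the residual-attention module under the stated configuration (4 branches, 2 residual convolution modules each, self- and mutual-attention weights both weighted 1); the sigmoid-based attention forms are assumptions.

```python
# A minimal sketch of the residual-attention module of step C2.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
    def forward(self, x):
        return x + self.body(x)

class ResidualAttention(nn.Module):
    def __init__(self, channels: int, num_branches: int = 4):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(ResidualBlock(channels), ResidualBlock(channels))
            for _ in range(num_branches)
        )
        self.self_attn = nn.Conv2d(channels, 1, 1)   # per-branch self-attention weight map
        self.merge = nn.Conv2d(channels * num_branches, channels, 1)

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]
        outs = []
        for i, f in enumerate(feats):
            out = f * torch.sigmoid(self.self_attn(f))       # own-branch weighting
            for j in (i - 1, i + 1):                         # adjacent branches only
                if 0 <= j < len(feats):
                    mutual = torch.sigmoid((f * feats[j]).mean(1, keepdim=True))
                    out = out + feats[j] * mutual            # mutual-attention exchange
            outs.append(out)
        return self.merge(torch.cat(outs, dim=1))            # integrate the 4 branches
```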
D. Performing feature decoding on the richer feature representation through the decoder to obtain the final prostate lesion region segmentation prediction result:
The decoder comprises an integration module and a prediction module and receives a feature pyramid consisting of 4 feature maps of different scales from the previous module. The decoder's integration module starts from the feature map with the smallest size and the deepest level (i.e. 7×7×2048). The final integrated feature map is obtained by stepwise integration of the feature maps of consecutive scale pairs. The integration operation consists mainly of two steps, feature map stacking and feature transformation along the channel direction; the final integrated feature map is sent to the prediction module, which predicts according to the number of classes the task requires.
The method comprises the following specific steps:
and D1, establishing a multi-level feature pyramid by using the characteristics of level complementation through a multi-level integration module with convolution, and adaptively encoding multi-scale information of an image into a current pyramid feature vector to obtain feature information which contains global and local different levels after integration. The mechanisms of hierarchical complementation include: the features extracted by the network comprise local features and global features. The local features are extracted from a shallow convolutional network, and mainly comprise information about details, textures and the like of the image, and the information does not distinguish whether the information belongs to the foreground or the background; the global features are mainly extracted through a deep convolution module and an attention mechanism, and mainly comprise parts related to high-level semantic information, such as positions, differences of foreground and background and the like. Because the focus of the two features is local and global respectively, and the position and detail information are important for the prostate focus segmentation, the complementary characteristics of global and local features among different levels can be utilized during integration to obtain a better prediction result.
The integration module works as follows: the integrated feature map of two consecutive levels is obtained through convolution and fusion operations; channel transformation and normalization are then applied to produce the output of that level's integration module. The convolutions may use several kernels of different sizes to fuse neighborhood information or transform the channel number, preparing for the next fusion. A sketch of one integration step follows below.
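A sketch of one integration step follows; the channel numbers, bilinear upsampling and BatchNorm choice are assumptions consistent with the 3×3 and 1×1 dynamic kernels K_t described above.

```python
# A minimal sketch of one decoder integration step: the deeper map is
# upsampled, stacked with the shallower map, fused (3x3, neighborhood fusion),
# then channel-transformed and normalized (1x1).
import torch
import torch.nn as nn
import torch.nn.functional as F

class IntegrationModule(nn.Module):
    def __init__(self, ch_deep: int, ch_shallow: int, ch_out: int):
        super().__init__()
        self.fuse = nn.Conv2d(ch_deep + ch_shallow, ch_out, 3, padding=1)  # K_t = 3x3
        self.transform = nn.Sequential(
            nn.Conv2d(ch_out, ch_out, 1),   # channel transformation, K_t = 1x1
            nn.BatchNorm2d(ch_out),         # normalization operation
            nn.ReLU(inplace=True),
        )

    def forward(self, deep, shallow):
        deep = F.interpolate(deep, size=shallow.shape[2:],
                             mode="bilinear", align_corners=False)
        return self.transform(self.fuse(torch.cat([deep, shallow], dim=1)))
```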
D2. The feature layers of different levels are dynamically fused in a progressive manner over several stages, yielding a more accurate and richer fused feature representation; the fusion weights are learned automatically by convolution. The final prostate segmentation prediction map is obtained by self-adaptive dynamic weighted fusion of the features across levels.
E. Training and optimization of the dynamic multi-scale perception self-adaptive integration network:
The whole method is divided into a training stage and an inference stage. During training, tensors from the training set serve as input to obtain the trained network parameters; during inference, testing uses the parameters saved in the training stage to obtain the final segmentation prediction result.
The embodiment of the invention is implemented under the PyTorch framework. An AdamW optimizer is used in the training stage, with a learning rate of 1e-3, a weight decay factor of 5e-2 and a batch size of 8. During training, the spatial resolution of the images is 224×224, but since the model is fully convolutional it may be applied to any resolution at test time. A minimal training sketch follows below.
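For completeness, a training-loop sketch matching the stated setup follows; the loss function, epoch count and checkpoint path are placeholders, not choices stated in the filing.

```python
# A minimal training sketch for stage E (AdamW, lr 1e-3, weight decay 5e-2,
# batch size 8); model, loader and loss are placeholders.
import torch

def train(model, train_loader, epochs: int = 100, device: str = "cuda"):
    model.to(device).train()
    optim = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=5e-2)
    criterion = torch.nn.BCEWithLogitsLoss()       # assumed binary segmentation loss
    for _ in range(epochs):
        for image, mask in train_loader:           # image: (8, 1, 224, 224)
            optim.zero_grad()
            pred = model(image.to(device))
            loss = criterion(pred, mask.float().to(device))
            loss.backward()
            optim.step()
    # Parameters saved here are reloaded for the inference stage.
    torch.save(model.state_dict(), "checkpoint.pth")
```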
The prostate lesion area detection method based on the dynamic multi-scale perception self-adaptive integration network provided by the embodiment of the invention adopts dynamic local pooling matched with a global attention mechanism and encodes the context information of the input prostate MR image into the current feature matrix, obtaining feature vectors that contain both global information and local detail information and adapting to scale changes of the target. Second, to keep noise from misleading the final result, the invention adopts multiple complementary attention-based fusion modes: the feature map generated at each scale uses several kinds of attention to enhance the foreground and suppress background noise. Experimental results show that the method obtains accurate predictions across large prostate size variations and scenes where the input image quality is low.
Corresponding to the method for detecting a prostate focus area in the above embodiment, the embodiment of the present invention further provides a device for detecting a prostate focus area based on a dynamic multiscale sensing adaptive integration network, including:
a tensor unit for acquiring a prostate input image from a dataset of prostate cancer and obtaining a tensor;
the coding unit is used for inputting the tensor obtained by the tensor unit into the feature encoder, using dynamic pooling convolution to attend to the local information of the image, combining a global efficient attention mechanism to attend to its global information, and integrating the local and global information through pixel addition and convolution operations to obtain the multi-scale coding features of each image;
the enhancement unit is used for obtaining a feature representation through the corresponding feature enhancement layer for the coding features obtained by the coding unit;
and the prediction unit is used for performing feature decoding on the feature representation obtained by the enhancement unit through a decoder to obtain a final prostate focus region segmentation prediction result.
Since the prostate lesion area detection device of the embodiment of the present invention corresponds to the prostate lesion area detection method of the above embodiment, its description is relatively simple; for the relevant parts, refer to the description of the method above, which is not repeated in detail here.
The embodiment of the invention also provides a computer readable storage medium, which stores computer instructions for causing a computer to execute the method for detecting the prostate focus area based on the dynamic multiscale sensing adaptive integration network.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (10)

1. The prostate focus area detection method based on the dynamic multiscale sensing self-adaptive integration network is characterized by comprising the following steps:
acquiring a prostate input image according to a data set of the prostate cancer and obtaining tensors;
inputting the tensor into a feature encoder, using dynamic pooling convolution to attend to the local information of the image, combining a global efficient attention mechanism to attend to its global information, and integrating the local and global information through pixel addition and convolution operations to obtain a multi-scale coding feature for each image;
aiming at the coding features, obtaining feature representation through corresponding feature enhancement layers;
and performing feature decoding on the feature representation through a decoder to obtain a final prostate focus region segmentation prediction result.
2. The method for detecting a prostate lesion area based on a dynamic multiscale perceptual adaptive integration network according to claim 1, wherein acquiring a prostate input image and obtaining a tensor from a dataset of prostate cancer comprises:
dividing a set of data according to the prostate into a fixed number of training and test sets, wherein both the training and test sets have two subsets, namely an input image and a true value;
data enhancement is carried out on the data in the prostate training set;
converting the format of the enhanced image into tensors which can be processed by a network to obtain tensors with the size of the batch size;
and (3) carrying out size adjustment on the input image on the data in the prostate test set, and directly carrying out tensor processing on the image to obtain the tensor of the batch size.
3. The method for detecting the focal region of the prostate based on the dynamic multi-scale perception adaptive integration network according to claim 1, wherein the method comprises the following steps: the feature encoder is of a ResNet architecture, discards the last two layers to reserve a space structure, and adds a global-local complementary module after the output of each layer to extract multi-scale context information; and features of all layers except the first layer will be stored in the feature pyramid.
4. The prostate focus area detection method based on the dynamic multiscale sensing self-adaptive integration network according to claim 3, wherein: the global-local complementary module comprises a dynamic local pooling branch and a global high-efficiency attention branch;
the dynamic local pooling comprises dynamically distributing pooling layer combinations with proper sizes to each layer of a network according to the position of the layer in the whole network so as to better extract local information;
the global high-efficiency attention comprises the operation of directly comparing the similarity of the input feature images, so as to obtain a similarity weight image; by combining the weight map with the input image, a region that needs to be focused on in the global range is calculated.
5. The method for detecting the focal region of the prostate based on the dynamic multi-scale perception adaptive integration network according to claim 1, wherein the method comprises the following steps: performing feature decoding on the feature representation through a decoder to obtain a final prostate focus region segmentation prediction result, wherein the feature decoding comprises the following steps:
establishing a multi-level feature pyramid by using the characteristics of level complementation through a multi-level integration module with convolution, and adaptively encoding multi-scale information of an image into a current pyramid feature vector to obtain feature information which contains global and local and is integrated in different levels;
and carrying out self-adaptive dynamic weighted fusion on the feature layers among different levels in a progressive manner in a plurality of stages to obtain a final prostate segmentation prediction result graph, wherein the fusion weight is obtained by automatic learning through convolution operation.
6. The method for detecting a prostate lesion area based on a dynamic multiscale perceptual adaptive integration network according to claim 1, wherein, for the coding feature, a feature representation is obtained through a corresponding feature enhancement layer, comprising:
in the corresponding feature fusion layer, the features at each scale output by the feature encoder are respectively taken as input;
for the features of each scale, a dual-stream attention mechanism and a residual-attention mechanism are respectively used to further optimize the feature expression after the feature encoder;
for the feature expressions of the same spatial resolution obtained after the different attention transforms, pixel-level summation is used to obtain the richer fused feature expression.
7. The method for detecting prostate focus area based on dynamic multi-scale perception adaptive integration network according to claim 6, wherein the dual-flow attention mechanism uses spatial attention and channel attention to calculate the foreground area which should be focused on by the network under normal conditions, then normalizes and inverts the weighted graph of foreground attention to focus on the background, and then obtains the fused characteristic representation by using pixel-level addition and convolution operation;
the residual-attention mechanism carries out multipath residual convolution operation on the input characteristic representation by utilizing a residual module and self-attention and mutual attention, and information flow among the branches is realized by combining self-attention weight and mutual attention weight among the branches.
8. The method for detecting the focal region of the prostate based on the dynamic multi-scale perception adaptive integration network according to claim 7, wherein the method comprises the following steps: the number of multi-branches in the residual-attention mechanism is set to 4, and the weights between the self-attention weight graph and the mutual-attention weight graph are both 1.
9. A prostate focus area detection device based on a dynamic multiscale perception self-adaptive integration network, which is characterized by comprising:
a tensor unit for acquiring a prostate input image from a dataset of prostate cancer and obtaining a tensor;
the coding unit is used for inputting the tensor obtained by the tensor unit into a feature encoder, using dynamic pooling convolution to attend to the local information of the image, combining a global efficient attention mechanism to attend to its global information, and integrating the local and global information through pixel addition and convolution operations to obtain the multi-scale coding features of each image;
the enhancement unit is used for obtaining feature representation through a corresponding feature enhancement layer aiming at the coding features obtained by the coding unit;
and the prediction unit is used for performing feature decoding on the feature representation obtained by the enhancement unit through a decoder to obtain a final prostate focus region segmentation prediction result.
10. A computer readable storage medium, wherein the computer readable storage medium stores computer instructions for causing a computer to perform a method for detecting a prostate lesion area based on a dynamic multiscale perceptual adaptive integration network according to any one of claims 1-8.
CN202310486436.5A 2023-04-28 2023-04-28 Prostate focus area detection method, device and storage medium Pending CN116542924A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310486436.5A CN116542924A (en) 2023-04-28 2023-04-28 Prostate focus area detection method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310486436.5A CN116542924A (en) 2023-04-28 2023-04-28 Prostate focus area detection method, device and storage medium

Publications (1)

Publication Number Publication Date
CN116542924A true CN116542924A (en) 2023-08-04

Family

ID=87446403

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310486436.5A Pending CN116542924A (en) 2023-04-28 2023-04-28 Prostate focus area detection method, device and storage medium

Country Status (1)

Country Link
CN (1) CN116542924A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117593517A (en) * 2024-01-19 2024-02-23 南京信息工程大学 Camouflage target detection method based on complementary perception cross-view fusion network
CN117593517B (en) * 2024-01-19 2024-04-16 南京信息工程大学 Camouflage target detection method based on complementary perception cross-view fusion network


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination