WO2024011835A1 - Image processing method, apparatus, device, and readable storage medium - Google Patents

Image processing method, apparatus, device, and readable storage medium

Info

Publication number
WO2024011835A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
decoding
feature
layer
features
Application number
PCT/CN2022/138163
Other languages
English (en)
French (fr)
Inventor
司伟鑫
李才子
Original Assignee
深圳先进技术研究院 (Shenzhen Institute of Advanced Technology)
Application filed by 深圳先进技术研究院
Publication of WO2024011835A1

Classifications

    • G06T 7/11: Region-based segmentation (G06T 7/10 Segmentation; edge detection)
    • G06N 3/08: Learning methods (G06N 3/02 Neural networks)
    • G06T 7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06T 2207/10088: Magnetic resonance imaging [MRI] (G06T 2207/10072 Tomographic images)
    • G06T 2207/20081: Training; learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • G06T 2207/20112: Image segmentation details
    • G06T 2207/30016: Brain (G06T 2207/30004 Biomedical image processing)
    • G06T 2207/30204: Marker

Definitions

  • The present application belongs to the field of image processing technology and in particular relates to an image processing method, apparatus, device, and readable storage medium.
  • Image segmentation technology can segment the image to be processed into several specific areas with unique properties, and extract target areas from these areas.
  • Image segmentation technology is widely used in fields such as medicine, military, remote sensing, and meteorology.
  • For example, image segmentation can be used to segment the subthalamic nucleus and the red nucleus in brain magnetic resonance images, and then determine the implantation position of stimulation electrodes for subthalamic-nucleus deep brain stimulation (DBS).
  • To this end, image segmentation models based on deep-learning segmentation networks are usually used: the image to be processed undergoes multi-level convolution operations such as downsampling and upsampling through an "encoder-bottleneck layer-decoder" structure to extract its low-level features and high-level semantic features, and the segmentation result is output based on the extracted features.
  • However, the low-level features and high-level semantic features extracted by the encoder usually lose information, which biases the extraction of semantic information from the image to be processed and leaves the correlation between image parts insufficient. On the one hand, the decoder continues to amplify this bias during decoding; on the other hand, the insufficient correlation between image parts has a larger impact on blurred targets, which limits the segmentation performance of the model. In particular, when segmenting small targets with variable shapes and blurred boundaries, false-positive regions commonly appear, leading to inaccurate segmentation results.
  • embodiments of the present application provide an image processing method, device, equipment and readable storage medium to solve the problem of inaccurate image segmentation results in existing image processing methods.
  • the first aspect of the embodiment of the present application provides an image processing method.
  • The method includes: acquiring an image to be processed; and processing it through a trained image segmentation model to obtain a segmented image. The image segmentation model includes, connected in sequence, M first encoding feature layers, N second encoding feature layers, N second decoding feature layers, and M first decoding feature layers, with M ≥ 1 and N ≥ 1. The M first encoding feature layers correspond one-to-one with the M first decoding feature layers.
  • An attention mechanism module is provided between each first encoding feature layer and its corresponding first decoding feature layer. The attention mechanism module performs feature enhancement on the low-level features output by the first encoding feature layer to obtain target-region features, and inputs the target-region features into the corresponding first decoding feature layer. The N second encoding feature layers correspond one-to-one with the N second decoding feature layers.
  • A self-attention mechanism module is provided between each second encoding feature layer and its corresponding second decoding feature layer. The self-attention mechanism module extracts global context information from the high-level semantic features output by the second encoding feature layer, and inputs the global context information into the corresponding second decoding feature layer.
  • the attention mechanism module is an attention gate structure module; the self-attention mechanism module is a Transformer structure module.
  • Inputting the target-region features into the corresponding first decoding feature layer includes: performing element-wise (dot) multiplication of the target-region features with the input information of that layer and feeding the product into the layer, where the input information is the output of the previous layer of the first decoding feature layer.
  • Inputting the global context information into the corresponding second decoding feature layer includes: adding the global context information to the input information of that layer and feeding the sum into the layer, where the input information is the output of the previous layer of the second decoding feature layer.
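  • The two merge operations just described can be sketched in NumPy (a minimal illustration; the shapes and variable names below are assumptions, not from the patent):

```python
import numpy as np

def merge_shallow(target_area_feat, decoder_input):
    # First skip connection: element-wise (dot) multiplication of the
    # AG-enhanced target-region features with the previous decoder
    # layer's output, before the layer's convolution.
    return target_area_feat * decoder_input

def merge_deep(global_context, decoder_input):
    # Second skip connection: element-wise addition of the Transformer's
    # global-context features and the previous decoder layer's output.
    return global_context + decoder_input

# Hypothetical C x H x W feature maps.
feat = np.full((2, 4, 4), 0.5)
dec = np.full((2, 4, 4), 2.0)
mul_out = merge_shallow(feat, dec)   # every element 0.5 * 2.0 = 1.0
add_out = merge_deep(feat, dec)      # every element 0.5 + 2.0 = 2.5
```

Multiplication suppresses decoder activations wherever the gated features are near zero, while addition injects context without masking anything, which matches the different roles of the two skip connections.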
  • the image to be processed includes a brain magnetic resonance image
  • the segmented image is an image including a segmentation result of the subthalamic nucleus and the red nucleus.
  • the method further includes: determining the target position coordinates based on the segmented image.
  • The image segmentation model is trained as follows: obtain a training-set image, which is an image annotated with the target region; input the training-set image into the image segmentation model to be trained; and train the model based on a loss function.
  • the loss function is determined based on the sum of cross-entropy loss and Dice loss.
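  • A minimal NumPy sketch of such a combined loss follows (the smoothing term and equal weighting are assumptions; the patent only states that the loss is the sum of cross-entropy and Dice loss):

```python
import numpy as np

def combined_loss(probs, target, eps=1e-6):
    # probs: predicted class probabilities (C x N), target: one-hot labels (C x N).
    ce = -np.mean(np.sum(target * np.log(probs + eps), axis=0))  # cross-entropy
    inter = np.sum(probs * target)
    dice = (2.0 * inter + eps) / (np.sum(probs) + np.sum(target) + eps)
    return ce + (1.0 - dice)  # sum of cross-entropy and Dice loss

target = np.array([[1.0, 0.0], [0.0, 1.0]])  # 2 classes, 2 pixels
good = np.array([[0.9, 0.1], [0.1, 0.9]])    # confident, correct prediction
bad = np.array([[0.5, 0.5], [0.5, 0.5]])     # uninformative prediction
```

The Dice term directly rewards region overlap, which helps with the small, imbalanced targets (subthalamic nucleus, red nucleus) the method addresses, while cross-entropy supplies dense per-pixel gradients.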
  • a second aspect of the embodiment of the present application provides an image processing device.
  • The device includes: an acquisition unit for acquiring an image to be processed; and a processing unit for processing the image through a trained image segmentation model to obtain a segmented image. The image segmentation model includes, connected in sequence, M first encoding feature layers, N second encoding feature layers, N second decoding feature layers, and M first decoding feature layers, with M ≥ 1 and N ≥ 1; the M first encoding feature layers correspond one-to-one with the M first decoding feature layers.
  • An attention mechanism module is provided between the first encoding feature layer and the corresponding first decoding feature layer.
  • The attention mechanism module performs feature enhancement on the low-level features output by the corresponding first encoding feature layer to obtain target-region features, and inputs them into the corresponding first decoding feature layer. The N second encoding feature layers correspond one-to-one with the N second decoding feature layers, and a self-attention mechanism module is provided between each second encoding feature layer and its corresponding second decoding feature layer; it extracts global context information from the high-level semantic features output by the second encoding feature layer and inputs that information into the corresponding second decoding feature layer.
  • A third aspect of the embodiments of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the steps of the method described in any one of the first aspects are implemented.
  • the fourth aspect of the embodiments of the present application provides a computer-readable storage medium.
  • The computer-readable storage medium stores a computer program; when the computer program is executed by a processor, the steps of the method described in any one of the first aspects are implemented.
  • The beneficial effects of the embodiments of the present application are as follows. The method segments the image to be processed with the image segmentation model to obtain the segmented image.
  • the image segmentation model has an encoder-decoder structure.
  • The M first encoding feature layers in the encoder connect one-to-one to the M first decoding feature layers in the decoder through attention mechanism modules, and the N second encoding feature layers in the encoder connect to the N second decoding feature layers in the decoder through self-attention mechanism modules.
  • The attention mechanism module performs feature enhancement on the low-level features output by the corresponding first encoding feature layer to obtain target-region features and inputs them into the corresponding first decoding feature layer, so that the decoder generates a first decoding feature map from the target-region features and the corresponding input information. The self-attention mechanism module extracts global context information from the high-level semantic features output by the corresponding second encoding feature layer and inputs it into the corresponding second decoding feature layer, so that the decoder generates a second decoding feature map from the global context information and the corresponding input information.
  • In this way, features at different levels receive targeted processing guided by a hierarchical attention mechanism, improving the segmentation accuracy of the image.
  • Figure 1 is a schematic diagram of the segmentation results of the subthalamic nucleus and the red nucleus in the brain MRI image provided by the embodiment of the present application;
  • Figure 2 is a schematic diagram of the traditional U-Net-based image segmentation model provided by the embodiment of the present application.
  • Figure 3 is a schematic diagram of an image segmentation model provided by an embodiment of the present application.
  • Figure 4 is a schematic diagram of the processing process of the attention gate structure provided by the embodiment of the present application.
  • FIG. 5 is a schematic diagram of the processing process of the Transformer structure provided by the embodiment of the present application.
  • Figure 6 is a schematic flow chart of an image segmentation method provided by an embodiment of the present application.
  • Figure 7 is a schematic diagram of the process of obtaining segmented images through the image segmentation model provided by the embodiment of the present application.
  • Figure 8 is a partial segmentation result display diagram provided by the embodiment of the present application.
  • Figure 9 is a schematic flow chart of the target positioning method provided by the embodiment of the present application.
  • Figure 10 is a schematic diagram of the positioning process of the target positioning method provided by the embodiment of the present application.
  • Figure 11 is a schematic diagram of an image segmentation device provided by an embodiment of the present application.
  • Figure 12 is a schematic diagram of an electronic device provided by an embodiment of the present application.
  • the image segmentation model based on U-Net has been widely used in the field of medical image segmentation, but it has the problem of inaccurate segmentation results.
  • As shown in Figure 1, processing the brain MRI image with the U-Net-based image segmentation model yields the annotation result shown in Figure 1(c). Compared with the manual annotation in Figure 1(b), the result obtained through the U-Net-based model contains false-positive regions, i.e., regions the model detects as target regions that are actually non-target regions.
  • The false-positive regions arise because, during segmentation, the U-Net-based model passes the image to be processed through an "encoder-bottleneck layer-decoder" structure, performing multi-level convolution operations of downsampling and upsampling to extract low-level features and high-level semantic features, and outputs segmentation results based on those features.
  • To this end, embodiments of the present application provide an image processing method based on an image segmentation model: the image to be processed is passed through an image segmentation model equipped with a hierarchical attention mechanism (HAU-Net) to obtain the segmented image.
  • In this way, different levels of features (low-level features and high-level semantic features) receive targeted processing guided by a hierarchical attention mechanism, improving the segmentation accuracy for the image to be processed.
  • Here, the hierarchical attention mechanism means that, within the image segmentation model, the low-level features and high-level semantic features of the image to be processed are processed hierarchically according to their respective characteristics.
  • Figure 3 is a schematic diagram of an image segmentation model provided by an embodiment of the present application. As shown in Figure 3, according to the image processing flow, the image segmentation model includes an input end, an encoder, a bottleneck layer, a decoder, and an output end.
  • the input end is used to input the image to be processed to the encoder.
  • the image to be processed is a brain magnetic resonance image.
  • The encoder includes, close to the input end of the image segmentation model and connected in sequence, M first encoding feature layers (also called shallow encoding feature layers) and N second encoding feature layers (also called deep encoding feature layers).
  • As the input information (e.g., the input image) is processed sequentially through the encoding feature layers, the size of the resulting encoding feature maps gradually decreases, so encoding feature maps of different sizes are output (including the first encoding feature maps and the second encoding feature maps).
  • Each encoding feature layer uses a convolution kernel of the same size; for example, the kernel size is 3×3.
  • The image to be processed first undergoes multi-level convolution operations of dimensionality reduction and downsampling through the M first encoding feature layers to extract its low-level features, generating first encoding feature maps of corresponding sizes. For example, if the image to be processed is a 512×512 brain MRI image, the first encoding feature map output after it passes through the first encoding feature layers may be 128×128.
  • The resulting first encoding feature map is then further downsampled for dimensionality reduction through the N second encoding feature layers, which extract its high-level semantic features to obtain high-level semantic feature maps of corresponding sizes, i.e., the second encoding feature maps output by each second encoding feature layer.
  • the low-level features of the image to be processed include features with substantial significance, such as color, contour, and specific location of the target area in the image to be processed.
  • High-level semantic features include the meaning of the target area in the image to be processed, which is the semantic abstraction of each target area in the image to be processed, reflecting the neural network's semantic understanding of each target area in the image to be processed.
  • the bottleneck layer is the connection layer between the encoder and the decoder, as shown in Figure 3. It is the convolution layer with the smallest output feature map in the image segmentation model.
  • the bottleneck layer is used to perform a convolution operation on the second coding feature map obtained from the second coding feature layer in the encoder, extract the high-level semantic features of the second coding feature map, generate a bottleneck layer feature map, and then convert the bottleneck layer feature map input into the decoder.
  • The decoder includes, close to the output end of the image segmentation model and connected in sequence, M first decoding feature layers (also called shallow decoding feature layers) and N second decoding feature layers (also called deep decoding feature layers); the N second decoding feature layers connect to the N second encoding feature layers through the bottleneck layer.
  • As the input information is processed through the decoding feature layers, the size of the resulting decoding feature maps gradually increases, so decoding feature maps of different sizes are output (including the first decoding feature maps and the second decoding feature maps).
  • Each decoding feature layer uses a convolution kernel of the same size; for example, the kernel size is 3×3.
  • the decoding feature map output by each of the M first decoding feature layers is the same size as the coding feature map output by each of the M first coding feature layers that are connected correspondingly.
  • the decoding feature map output by each of the N second decoding feature layers has the same size as the encoding feature map output by each of the corresponding N second encoding feature layers.
  • The image segmentation model in this embodiment follows the overall structure of U-Net: the encoder performs five downsampling operations, forming encoding feature maps at six different scales, and the decoder likewise performs five upsampling operations, forming decoding feature maps at six different scales.
  • The features of the image to be processed handled by the convolution layers at the three scales nearest the model's input and output ends are regarded as low-level features, while the features handled by the convolution layers at the other three scales are regarded as high-level semantic features; both are used to build a hierarchical attention mechanism that processes the different types of features hierarchically.
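  • Assuming each downsampling step halves the spatial size (a common convention; the patent does not state the downsampling factor explicitly), the six encoder scales can be enumerated:

```python
def encoder_scales(input_size, num_downsamples=5):
    # Five downsampling operations yield feature maps at six scales,
    # counting the initial resolution.
    sizes = [input_size]
    for _ in range(num_downsamples):
        sizes.append(sizes[-1] // 2)
    return sizes

scales = encoder_scales(512)
```

Under this assumption, the three largest scales would carry the low-level features routed through attention gates, and the three smallest would carry the high-level semantic features routed through the Transformer modules.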
  • The skip connections include first skip connections, one-to-one between the M first encoding feature layers and the M first decoding feature layers, and second skip connections, one-to-one between the N second encoding feature layers and the N second decoding feature layers.
  • an attention mechanism module is provided in the first jump connection.
  • the attention mechanism module is used to perform feature enhancement processing on the low-level features output by the corresponding first encoding feature layer to obtain target area features, and input the target area features into the corresponding first decoding feature layer.
  • Regarding the feature enhancement provided in this embodiment: when the image segmentation model extracts low-level features, other regions of the image to be processed (i.e., regions unrelated to or outside the target region) may have feature contours similar to those of the target region. Therefore, when extracting the target region, strengthening the target features in the image reduces the model's segmentation error for the target region. Equivalently, the weights of non-target regions are reduced within the model, lessening their influence on the segmentation result and thereby reducing the segmentation error for the target region.
  • A self-attention mechanism module is provided in each second skip connection to extract global context information from the high-level semantic features output by the corresponding second encoding feature layer and to input that information into the corresponding second decoding feature layer.
  • the attention mechanism includes an attention gate structure (Attention gate, AG); the self-attention mechanism includes a Transformer structure.
  • AG is embedded in the skip connection between the first encoding feature layer and the first decoding feature layer to enhance the target features in the image to be processed.
  • The Transformer structure is embedded in the skip connection between the second encoding feature layer and the second decoding feature layer to extract global context information from the high-level semantic features of the image to be processed.
  • The hierarchical-attention image segmentation model provided in this embodiment exploits the difference between AG's pixel-level attention mechanism and the Transformer's self-attention mechanism for building global context associations, effectively mining the valuable information corresponding to features with different characteristics.
  • The purpose of the attention gate structure AG is to apply product weighting to each pixel-level feature of the input feature map, thereby strengthening the effective features: the AG module's input x is multiplied pixel by pixel with the weight α to obtain the weighted output.
  • the core of AG lies in generating attention weights.
  • Denoting the features from the adjacent, smaller-scale decoder layer as g, a 1×1×1 convolution is applied to x and g respectively; the convolution results are added and passed through a ReLU activation and then a Sigmoid function, and the resulting weight matrix is resampled by an interpolation algorithm (Resampler) to obtain an attention weight α matching the scale of the input x.
  • In essence, the ReLU activation outputs features with values greater than 0 unchanged and sets features with values less than 0 to zero, thereby filtering out features with smaller values. The Sigmoid function is a normalization function that maps the features to the range 0-1, yielding a weight matrix of probability values.
  • the attention gate structure AG can focus attention on target areas of various shapes and sizes through automatic learning.
  • the image segmentation model adding this attention gate structure can highlight specific image feature areas.
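  • A one-dimensional NumPy sketch of the gating computation above (the scalar projection weights `wx`, `wg` and the 1-D setting are illustrative assumptions; a real AG uses 1×1×1 convolutions and resamples g to the scale of x):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_gate(x, g, wx, wg):
    # Project x (skip features) and g (coarser decoder features), add,
    # then ReLU and Sigmoid to get attention weights alpha in (0, 1).
    a = np.maximum(wx * x + wg * g, 0.0)  # ReLU zeroes negative responses
    alpha = sigmoid(a)                    # normalise weights to (0, 1)
    return alpha * x                      # pixel-wise product weighting

x = np.array([0.2, 1.5, -0.3, 2.0])  # hypothetical skip-connection features
g = np.array([0.1, 1.0, 0.0, 1.8])   # hypothetical decoder gating features
out = attention_gate(x, g, wx=1.0, wg=1.0)
```

Because α lies in (0, 1), the gate can only attenuate features, never amplify them, which is how non-target regions are down-weighted.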
  • The Transformer structure, being based on the self-attention mechanism, can extract global context-relationship information among the features of each image to be processed.
  • In the Transformer structure, a learnable parameter matrix with the same shape as E is added to characterize the positional relationships among the elements of the sequence; this parameter matrix is called the position embedding (PE).
  • Global context information is then extracted from the two-dimensional sequence T through the multi-head self-attention module (MSA) and the multi-layer perceptron (MLP).
  • The MSA module first performs linear projection, using three linear mapping layers to obtain Q, K, and V, as shown in formula (1), where W_Q, W_K, W_V ∈ R^{c×d} are the learnable parameters of the three linear layers.
  • The self-attention (SA) module can be expressed as shown in formula (2), where Z_i, Q_i ∈ R^{1×d} are the i-th rows of Z and Q respectively, and S represents the attention map, which indicates the similarity between each spatial voxel and the other voxels; the higher the similarity, the stronger the connection between the two points. Matrix multiplication of S and V then yields the attention-enhanced features.
  • MSA is an extension of SA containing multiple SA operations; their results are concatenated and a linear mapping produces the MSA output, as shown in formula (3).
  • Within the Transformer structure, features are normalized by layer normalization (LayerNorm) before being input to the MSA and MLP, finally yielding the high-level semantic features that the Transformer structure outputs for feature weighting.
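  • A single-head NumPy sketch of the projection and attention steps (the dimensions and random weights are illustrative; the multi-head concatenation of formula (3) and the LayerNorm placement are omitted for brevity):

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(T, Wq, Wk, Wv):
    # Linear projections to queries, keys, values (formula (1)).
    Q, K, V = T @ Wq, T @ Wk, T @ Wv
    d = Q.shape[-1]
    # Attention map S holds pairwise similarities; each row sums to 1,
    # and Z = S V gives the attention-enhanced features (formula (2)).
    S = softmax(Q @ K.T / np.sqrt(d))
    return S @ V, S

rng = np.random.default_rng(0)
T = rng.standard_normal((6, 8))  # hypothetical sequence: 6 tokens, c = 8
Wq, Wk, Wv = (rng.standard_normal((8, 4)) for _ in range(3))
Z, S = self_attention(T, Wq, Wk, Wv)
```

Each row of S is a probability distribution over all positions, which is exactly the "global context" property: every output token is a weighted mixture of every input token.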
  • the attention mechanism module inputs the target area features into the corresponding first decoding feature layer, including: dot-multiplying the target area features and the input information of the corresponding first decoding feature layer and then inputting them into the first decoding feature layer.
  • the input information is the output information of the previous layer of the first decoding feature layer.
  • the input information of the first decoding feature layer e is the output information of the first decoding feature layer d
  • the input information of the first decoding feature layer f is the output information of the first decoding feature layer e.
  • The self-attention mechanism module inputs the global context information into the corresponding second decoding feature layer by adding the global context information to the input information of that layer and then feeding the sum into the second decoding feature layer.
  • the input information is the output information of the previous layer of the second decoding feature layer.
  • the input information of the second decoding feature layer b is the output information of the first decoding feature layer a
  • the input information of the second decoding feature layer c is the output information of the first decoding feature layer b.
  • When a second decoding feature layer generates its second decoding feature map, the map is produced by merging the global context information extracted by the self-attention mechanism module in the corresponding second skip connection with the input information of that second decoding feature layer.
  • the global context relationship of the target area is constructed in the second encoding feature map through the self-attention mechanism module, which enables the decoder to accurately obtain the target features of the target area with blurred boundaries when merging features.
  • Similarly, when a first decoding feature layer generates its first decoding feature map, the map is produced by merging the target-region features obtained by the attention mechanism module in the corresponding first skip connection with the input information of that first decoding feature layer.
  • When the decoder performs feature merging through the "upsampling-feature merging-convolution" operation, the operation is repeated until the size of the output first decoding feature map matches the size of the input image to be processed.
  • The first decoding feature map output by the decoder is thus the segmented image generated from the high-level semantic features of the target region determined via the self-attention mechanism module (the meaning of the target region, e.g., whether it is the subthalamic nucleus or the red nucleus) and the low-level features of the target region determined via the attention mechanism module (the specific position of the target region in the image to be processed).
  • the output end classifies the features in the first decoding feature map through a convolution layer with a 1×1 kernel followed by a Softmax operation, obtains the segmentation result, and outputs the segmented image.
  • the segmented image is an image including subthalamic nucleus and red nucleus segmentation results.
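The output head described above (a 1×1 convolution followed by Softmax) reduces to a per-pixel linear classifier over the decoded feature map. A minimal NumPy sketch, with a hypothetical three-class layout (e.g., background / subthalamic nucleus / red nucleus) that is an illustrative assumption, not the patent's exact configuration:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the class axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def segmentation_head(feature_map, weights, bias):
    """1x1 convolution followed by softmax, as in the model's output end.

    feature_map: (H, W, C) decoded features; weights: (C, num_classes);
    bias: (num_classes,). A 1x1 convolution over an (H, W, C) map is
    equivalent to a per-pixel matrix multiplication.
    """
    logits = feature_map @ weights + bias   # (H, W, num_classes)
    probs = softmax(logits, axis=-1)        # per-pixel class probabilities
    return probs.argmax(axis=-1)            # (H, W) label map
```

The returned label map is the discrete segmentation result; in the patented model this is produced at the same spatial size as the input image.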
  • Figure 6 is a schematic flow chart of an image segmentation method provided by an embodiment of the present application, which is applied to electronic devices. See Figure 6. The method includes the following steps S601-S602.
  • the electronic device obtains the image to be processed.
  • the images to be processed include all images used for segmentation operations in various fields (such as medicine, military, remote sensing, meteorology, etc.).
  • MRI images of various parts of the human body such as brain MRI images obtained by an MRI machine.
  • when the electronic device acquires the image to be processed, it may acquire it through a second device used to collect images that need segmentation.
  • the second device may be a brain magnetic resonance imager for acquiring MRI images of the brain.
  • the electronic device may be the same device as the second device, or may be a different device.
  • the electronic device processes the image to be processed through the image segmentation model to obtain the segmented image.
  • the electronic device processes the image to be processed through the image segmentation model to obtain a segmented image based on the target area.
  • FIG. 7 is a schematic diagram of the process of obtaining segmented images by an electronic device through an image segmentation model.
  • the image to be processed input by the electronic device is a brain MRI image in the medical field.
  • the image segmentation model segments the subthalamic nucleus and red nucleus in the brain MRI image, and the output segmented image is obtained.
  • the location, shape and size of the subthalamic nucleus and red nucleus can be clearly highlighted in this segmented image.
  • the image segmentation model provided by this application can be applied to various fields such as medical image segmentation, and can also be applied to any technology that needs to achieve segmentation of target areas in the image to be processed.
  • the following takes the segmentation task of the subthalamic nucleus and red nucleus in brain MRI images in the field of medical image segmentation as an example, and illustrates the training process and effect of the image segmentation model provided in this application in three parts, including feasibility verification.
  • the brain MRI images of all subjects diagnosed with Parkinson's disease are used as training samples.
  • All images in the training samples are T2 mode images acquired by a 3T MRI scanner.
  • the thickness is 2mm
  • the resolution is 0.6875 ⁇ 0.6875 ⁇ 2
  • the data size is 320 ⁇ 320 ⁇ 70.
  • the subthalamic nucleus and red nucleus in each training sample image were manually outlined by two radiologists with more than 6 years of experience in neuroradiology.
  • a total of 99 MRI image samples and corresponding labels are selected, of which 80 are used as the training sample set and the remaining 19 as the test sample set. Five-fold cross-validation is performed on the training sample set; the image segmentation model from each fold is used to obtain segmentation results on the test sample set, and the average result on the test sample set is used to evaluate the performance of the model.
  • all training sample images are resampled to the same spatial resolution and cropped to [192, 192, 48] as input images of the image segmentation model.
  • data augmentation can be used to expand the data in the training sample set.
  • data augmentation methods include random rotation, elastic deformation, Gaussian noise, mirror transformation, and scaling. The random rotation angle is (-π/12, π/12), and the scaling range is (0.85, 1.25).
  • the sum of the cross-entropy loss and the Dice loss is used as the loss function, and a stochastic gradient descent (SGD) optimizer is used.
  • the learning rate is set to 0.01
  • the momentum is set to 0.99
  • the weight decay is set to 3e-5.
  • the overall training process of the image segmentation model can be implemented using Python, and is trained and tested on the NVIDIA GeForce GTX 3090 GPU based on the PyTorch 1.8.0 framework.
  • the training batch size is set to 2, and all models are trained on the nnU-Net framework for 150 epochs, with 250 batches per epoch.
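The training objective above is stated only as "cross-entropy loss + Dice loss". A minimal sketch of this combined loss for a binary foreground mask; the exact weighting and smoothing constants are assumptions, since the filing does not specify them:

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    # Soft Dice loss: 1 - 2|X∩Y| / (|X| + |Y|), computed on probabilities.
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def cross_entropy_loss(pred, target, eps=1e-7):
    # Binary cross-entropy averaged over voxels.
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-(target * np.log(pred) + (1 - target) * np.log(1 - pred)).mean())

def combined_loss(pred, target):
    # Training objective named in the text: cross-entropy + Dice.
    return cross_entropy_loss(pred, target) + dice_loss(pred, target)
```

In practice this would be computed per class on the network's softmax outputs; the single-channel form here is only for illustration.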
  • the method of image segmentation using the above-mentioned trained image segmentation model and the above-mentioned method of training the image segmentation model may be executed by the same electronic device, or may be executed by different electronic devices.
  • the electronic device may be, but is not limited to, various smartphones, portable notebooks, tablet computers, smart wearable devices, computers, robots, etc.
  • the segmented image results obtained by the image segmentation method provided in this embodiment are compared with those of the traditional U-Net, Attention U-Net, R2U-Net, CS2-Net, and Fully Convolutional Networks (FCN) on the 19 test samples.
  • the compared metrics include the Dice similarity coefficient (DSC), the Jaccard coefficient (JA), sensitivity (SEN), and the 95% Hausdorff distance (HD95). These metrics evaluate the similarity between the network's segmentation results and the reference segmentation: the larger the DSC, JA, and SEN values, and the smaller the HD95 value, the higher the similarity and the better the fitting performance.
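The overlap metrics above can be computed directly from binary masks. A minimal sketch of DSC, JA, and SEN (HD95 requires surface-distance computations and is omitted here):

```python
import numpy as np

def overlap_metrics(pred, gt):
    """Dice (DSC), Jaccard (JA), and sensitivity (SEN) for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()       # true positives
    fp = np.logical_and(pred, ~gt).sum()      # false positives
    fn = np.logical_and(~pred, gt).sum()      # false negatives
    dsc = 2 * tp / (2 * tp + fp + fn)
    ja = tp / (tp + fp + fn)
    sen = tp / (tp + fn)
    return dsc, ja, sen
```

A perfect prediction gives 1.0 for all three values, matching the "larger is better" direction stated above.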
  • the comparison results are shown in Table 1.
  • Table 1 shows that the image segmentation method provided by this embodiment outperforms the other methods on all metrics. Specifically, for the subthalamic nucleus and red nucleus, the Dice coefficients reach 88.20% and 92.36%, respectively, improvements of 2.94% and 3.20% over the baseline method U-Net.
  • the image segmentation method provided by this embodiment has a greater advantage in the Jaccard coefficient: compared with the baseline method, it improves by 4.9% and 5.55% on the subthalamic nucleus and red nucleus, respectively. Compared with Attention U-Net, the method also achieves performance gains of 3.57% and 4.75% on the two targets. These improvements show that the HAU-Net proposed in this embodiment has better learning and generalization ability for the subthalamic nucleus segmentation task.
  • the key and difficult areas of segmentation are marked by borders.
  • the method in this embodiment agrees more closely with the manual segmentation annotations and is more effective.
  • the image segmentation model (HAU-Net) provided in this embodiment can be applied in the field of medical image segmentation, for example, target positioning during electrode implantation for deep brain stimulation (DBS). Based on this, this embodiment also provides a target positioning method, as shown in FIG. 9 , including the following steps S901-S902.
  • the electronic device acquires the segmented image.
  • the electronic device performs image segmentation by inputting the image to be processed (such as a brain MRI image) into the image segmentation model provided in the above embodiment, and obtains the segmented image.
  • the electronic device determines the target position coordinates based on the segmented image.
  • the electronic device measures the position coordinates of the target point in the segmented image, and then marks the position coordinates of the target point in the original image (that is, the brain MRI image).
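The filing does not specify how the target-point coordinates are measured from the segmented region. One plausible sketch (an assumption for illustration, not the patented procedure) takes the centroid of the segmented foreground as a candidate target coordinate:

```python
def region_centroid(mask):
    """Centroid (row, col) of the foreground pixels of a binary mask.

    `mask` is a list of lists of 0/1 values; the centroid is one simple
    way to derive a candidate target coordinate from a segmented region.
    """
    coords = [(r, c) for r, row in enumerate(mask)
              for c, v in enumerate(row) if v]
    if not coords:
        raise ValueError("empty mask")
    n = len(coords)
    return (sum(r for r, _ in coords) / n, sum(c for _, c in coords) / n)
```

The resulting coordinate can then be marked back in the original image, as the step above describes.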
  • FIG. 10 is a schematic diagram of the positioning process of the target positioning method provided in this embodiment.
  • the electronic device that performs the target positioning method and the electronic device that performs the image segmentation model (HAU-Net) training process and the image segmentation method may be the same electronic device, or they may be different electronic devices.
  • the image processing method provided in the embodiments of this application processes features of different levels (low-level features and high-level semantic features) in a targeted manner, guided by a hierarchical attention mechanism.
  • the attention gating mechanism and the self-attention-based Transformer structure improve the extraction efficiency of low-level and high-level features in the neural network model and mine the model's local features and global context information more efficiently, thereby improving the segmentation accuracy of the image.
  • through the image segmentation model provided in this method, automatic feature extraction and accurate segmentation of the subthalamic nucleus and red nucleus in brain MRI images can be achieved, and the position coordinates of the target point in subthalamic deep brain stimulation (DBS) can be located from the segmented image.
  • the located coordinates can be used to determine the implantation position of the stimulation electrode, which can improve surgical efficiency.
  • sequence number of each step in the above embodiment does not mean the order of execution.
  • the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiment of the present application.
  • Figure 11 is a schematic diagram of an image segmentation device provided by an embodiment of the present application.
  • the device includes: an acquisition unit, used to acquire an image to be processed; and a processing unit, used to process the image to be processed through a trained image segmentation model to obtain a segmented image.
  • the image segmentation model includes an encoder and a decoder connected correspondingly through skip connections; the encoder encodes the image to be processed, sequentially generating the first encoding feature map and the second encoding feature map; the skip connections are configured with an attention mechanism module and a self-attention mechanism module.
  • the attention mechanism module performs feature enhancement on the low-level features of the first encoding feature map and sends the processed low-level features to the decoder, where feature enhancement includes strengthening the target-area features in the low-level features; the self-attention mechanism module extracts the global context information of the high-level semantic features in the second encoding feature map and sends the high-level semantic features and global context information to the decoder; the decoder determines the segmented image based on the processed low-level features, high-level semantic features, and global context information.
  • Figure 12 is a schematic diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device 12 of this embodiment includes: a processor 120, a memory 121, and a computer program 122 stored in the memory 121 and executable on the processor 120, such as an image segmentation program.
  • when the processor 120 executes the computer program 122, the steps in each of the above image segmentation method embodiments are implemented.
  • alternatively, when the processor 120 executes the computer program 122, the functions of each module/unit in each of the above device embodiments are implemented.
  • the computer program 122 may be divided into one or more modules/units, the one or more modules/units are stored in the memory 121 and executed by the processor 120 to complete this application.
  • the one or more modules/units may be a series of computer program instruction segments capable of completing specific functions. The instruction segments are used to describe the execution process of the computer program 122 in the electronic device 12 .
  • the electronic device 12 may be a computing device such as a tablet computer, a desktop computer, a notebook, a handheld computer, or a cloud server.
  • the electronic device may include, but is not limited to, a processor 120 and a memory 121 .
  • FIG. 12 is only an example of the electronic device 12 and does not constitute a limitation of it; the device may include more or fewer components than shown, combine certain components, or use different components. For example, the electronic device may also include input/output devices, network access devices, buses, etc.
  • the so-called processor 120 can be a central processing unit (Central Processing Unit, CPU), or other general-purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
  • the memory 121 may be an internal storage unit of the electronic device 12 , such as a hard disk or memory of the electronic device 12 .
  • the memory 121 may also be an external storage device of the electronic device 12, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the electronic device 12. Further, the memory 121 may include both an internal storage unit of the electronic device 12 and an external storage device.
  • the memory 121 is used to store the computer program and other programs and data required by the electronic device.
  • the memory 121 can also be used to temporarily store data that has been output or is to be output.
  • module division means dividing the internal structure of the device into different functional units or modules to complete all or part of the functions described above.
  • Each functional unit and module in the embodiment can be integrated into one processing unit, or each unit can exist physically alone, or two or more units can be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or in the form of software functional units.
  • the specific names of each functional unit and module are only for the convenience of distinguishing each other and are not used to limit the scope of protection of the present application.
  • For the specific working processes of the units and modules in the above system please refer to the corresponding processes in the foregoing method embodiments, and will not be described again here.
  • the disclosed apparatus/terminal equipment and methods can be implemented in other ways.
  • the device/terminal equipment embodiments described above are only illustrative.
  • the division of modules or units is only a logical function division; in actual implementation there may be other division methods, for example multiple units or components can be combined or integrated into another system, or some features can be omitted or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in various embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above integrated units can be implemented in the form of hardware or software functional units.
  • the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the present application can implement all or part of the processes in the methods of the above embodiments, which can also be completed by a computer program instructing the relevant hardware.
  • the computer program can be stored in a computer-readable storage medium.
  • the computer program When executed by a processor, the steps of each of the above method embodiments may be implemented.
  • the computer program includes computer program code, which may be in the form of source code, object code, executable file or some intermediate form.
  • the computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), electrical carrier signals, telecommunications signals, software distribution media, etc.


Abstract

The present application provides an image processing method, apparatus, device, and readable storage medium, relating to the field of image processing technology, which to some extent solves the problem of inaccurate image segmentation results in existing image processing methods. The method includes: acquiring an image to be processed; and processing the image to be processed with a trained image segmentation model to obtain a segmented image. In the image segmentation model, M first encoding feature layers are connected one-to-one to M first decoding feature layers via attention mechanism modules, and N second encoding feature layers are connected one-to-one to N second decoding feature layers via self-attention mechanism modules. The attention mechanism module performs feature enhancement on the low-level features output by the corresponding first encoding feature layer to obtain target-region features; the self-attention mechanism module extracts global context information from the high-level semantic features output by the corresponding second encoding feature layer.

Description

Image Processing Method, Apparatus, Device, and Readable Storage Medium — Technical Field
The present application belongs to the field of image processing technology, and in particular relates to an image processing method, apparatus, device, and readable storage medium.
Background Art
Image segmentation technology divides an image to be processed into a number of specific regions with distinctive properties and extracts target regions from them. Image segmentation is widely used in fields such as medicine, the military, remote sensing, and meteorology. For example, in medicine, image segmentation can be used to segment the subthalamic nucleus and the red nucleus in brain magnetic resonance images, and thereby determine the implantation position of the stimulation electrode in subthalamic deep brain stimulation (DBS).
At present, image segmentation models based on the deep-learning segmentation network U-Net are typically used: the image to be processed passes through an "encoder-bottleneck-decoder" structure, where multi-level convolution operations such as downsampling and upsampling extract the low-level features and high-level semantic features of the image, and the segmentation result is output from the extracted features. In existing image segmentation models, however, the low-level features and high-level semantic features extracted by the encoder often lose information, which biases the extraction of semantic information from the image to be processed, and the parts of the image are not sufficiently correlated. On this basis, the decoder further amplifies this bias during decoding, and the insufficient correlation among image parts has a large impact on blurred targets, limiting the segmentation performance of the model. In particular, when segmenting images containing small targets with variable shapes and blurred boundaries, false-positive regions commonly appear, leading to inaccurate segmentation results.
Summary of the Invention
In view of this, embodiments of the present application provide an image processing method, apparatus, device, and readable storage medium to solve the problem of inaccurate image segmentation results in existing image processing methods.
A first aspect of the embodiments of the present application provides an image processing method, including: acquiring an image to be processed; and processing the image to be processed with a trained image segmentation model to obtain a segmented image. The image segmentation model includes, connected in sequence, M first encoding feature layers, N second encoding feature layers, N second decoding feature layers, and M first decoding feature layers, where M ≥ 1 and N ≥ 1. The M first encoding feature layers correspond one-to-one to the M first decoding feature layers, and an attention mechanism module is arranged between each first encoding feature layer and the corresponding first decoding feature layer; the attention mechanism module performs feature enhancement on the low-level features output by the corresponding first encoding feature layer to obtain target-region features, and inputs the target-region features into the corresponding first decoding feature layer. The N second encoding feature layers correspond one-to-one to the N second decoding feature layers, and a self-attention mechanism module is arranged between each second encoding feature layer and the corresponding second decoding feature layer; the self-attention mechanism module extracts global context information from the high-level semantic features output by the corresponding second encoding feature layer, and inputs the global context information into the corresponding second decoding feature layer.
With reference to the first aspect, in a first possible implementation of the first aspect, the attention mechanism module is an attention gate module, and the self-attention mechanism module is a Transformer module.
With reference to the first aspect, in a second possible implementation of the first aspect, inputting the target-region features into the corresponding first decoding feature layer includes: performing element-wise multiplication of the target-region features with the input information of the corresponding first decoding feature layer, and inputting the result into the first decoding feature layer, where the input information is the output information of the layer preceding the first decoding feature layer.
With reference to the first aspect, in a third possible implementation of the first aspect, inputting the global context information into the corresponding second decoding feature layer includes: adding the global context information to the input information of the corresponding second decoding feature layer, and inputting the result into the second decoding feature layer, where the input information is the output information of the layer preceding the second decoding feature layer.
With reference to the first aspect, in a fourth possible implementation of the first aspect, the image to be processed includes a brain magnetic resonance image, and the segmented image is an image containing subthalamic nucleus and red nucleus segmentation results.
With reference to the first aspect, in a fifth possible implementation of the first aspect, the method further includes: determining target position coordinates based on the segmented image.
With reference to the first aspect, in a sixth possible implementation of the first aspect, the image segmentation model is trained as follows: acquiring training set images, which are images annotated with target regions; inputting the training set images into the image segmentation model to be trained, and training the image segmentation model based on a loss function, the loss function being determined as the sum of the cross-entropy loss and the Dice loss.
A second aspect of the embodiments of the present application provides an image processing apparatus, including: an acquisition unit configured to acquire an image to be processed; and a processing unit configured to process the image to be processed with a trained image segmentation model to obtain a segmented image. The image segmentation model includes, connected in sequence, M first encoding feature layers, N second encoding feature layers, N second decoding feature layers, and M first decoding feature layers, where M ≥ 1 and N ≥ 1. The M first encoding feature layers correspond one-to-one to the M first decoding feature layers, and an attention mechanism module is arranged between each first encoding feature layer and the corresponding first decoding feature layer; the attention mechanism module performs feature enhancement on the low-level features output by the corresponding first encoding feature layer to obtain target-region features, and inputs the target-region features into the corresponding first decoding feature layer. The N second encoding feature layers correspond one-to-one to the N second decoding feature layers, and a self-attention mechanism module is arranged between each second encoding feature layer and the corresponding second decoding feature layer; the self-attention mechanism module extracts global context information from the high-level semantic features output by the corresponding second encoding feature layer, and inputs the global context information into the corresponding second decoding feature layer.
A third aspect of the embodiments of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of the method of any one of the implementations of the first aspect.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the method of any one of the implementations of the first aspect.
Compared with the prior art, the embodiments of the present application have the following beneficial effects. Based on the image processing method, apparatus, device, and readable storage medium provided by this application, the method segments the image to be processed with an image segmentation model to obtain a segmented image. The image segmentation model has an encoder-decoder structure: the M first encoding feature layers in the encoder are connected one-to-one to the M first decoding feature layers in the decoder via attention mechanism modules, and the N second encoding feature layers in the encoder are connected to the N second decoding feature layers in the decoder via self-attention mechanism modules. The attention mechanism module performs feature enhancement on the low-level features output by the corresponding first encoding feature layer to obtain target-region features, and inputs the target-region features into the corresponding first decoding feature layer, so that the decoder generates a first decoding feature map from the target-region features and the corresponding input information. The self-attention mechanism module extracts global context information from the high-level semantic features output by the corresponding second encoding feature layer, and inputs the global context information into the corresponding second decoding feature layer, so that the decoder generates a second decoding feature map from the global context information and the corresponding input information. The method processes features of different levels in a targeted manner, guided by a hierarchical attention mechanism, thereby improving the segmentation accuracy of the image.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present application more clearly, the drawings needed for the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Figure 1 is a schematic diagram of subthalamic nucleus and red nucleus segmentation results in a brain MRI image provided by an embodiment of the present application;
Figure 2 is a schematic diagram of a traditional U-Net-based image segmentation model provided by an embodiment of the present application;
Figure 3 is a schematic diagram of an image segmentation model provided by an embodiment of the present application;
Figure 4 is a schematic diagram of the processing flow of the attention gate structure provided by an embodiment of the present application;
Figure 5 is a schematic diagram of the processing flow of the Transformer structure provided by an embodiment of the present application;
Figure 6 is a schematic flow chart of an image segmentation method provided by an embodiment of the present application;
Figure 7 is a schematic diagram of the process of obtaining a segmented image through the image segmentation model provided by an embodiment of the present application;
Figure 8 shows some of the segmentation results provided by an embodiment of the present application;
Figure 9 is a schematic flow chart of the target localization method provided by an embodiment of the present application;
Figure 10 is a schematic diagram of the localization process of the target localization method provided by an embodiment of the present application;
Figure 11 is a schematic diagram of the image segmentation apparatus provided by an embodiment of the present application;
Figure 12 is a schematic diagram of the electronic device provided by an embodiment of the present application.
Detailed Description of the Embodiments
In the following description, for illustration rather than limitation, specific details such as particular system structures and techniques are set forth so that the embodiments of the present application may be thoroughly understood. However, it should be clear to those skilled in the art that the present application can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, apparatuses, circuits, and methods are omitted so that unnecessary detail does not obscure the description of the present application.
The technical solutions provided by the present application are explained in detail below with reference to specific embodiments.
At present, U-Net-based image segmentation models are widely used in the field of medical image segmentation, but they suffer from inaccurate segmentation results. In one example, taking the annotation of the red nucleus and subthalamic nucleus in the brain magnetic resonance imaging (MRI) image shown in Figure 1(a), the annotation result shown in Figure 1(b) can be obtained by manual delineation by clinical experts, while processing the brain MRI image with a U-Net-based image segmentation model yields the annotation result shown in Figure 1(c). It can be seen that, relative to manual annotation, the result obtained by the U-Net-based model contains false-positive regions, i.e., regions the model detects as target regions that are actually non-target regions.
Referring to Figure 2, false-positive regions arise because, when performing image segmentation, a U-Net-based model typically passes the image to be processed through an "encoder-bottleneck-decoder" structure, where multi-level convolution operations such as downsampling and upsampling extract the low-level features and high-level semantic features of the image, and the segmentation result is output from the extracted features. In this model, however, the low-level features and high-level semantic features extracted by the encoder often lose information, which biases the extraction of semantic information from the image to be processed, and the parts of the image are not sufficiently correlated. On this basis, the decoder further amplifies this bias during decoding, and the insufficient correlation among image parts has a large impact on blurred targets, limiting the segmentation performance of the model. In particular, when segmenting images containing small targets with variable shapes and blurred boundaries, false-positive regions commonly appear, leading to inaccurate segmentation results.
In view of this, embodiments of the present application provide an image processing method based on an image segmentation model: after acquiring the image to be processed, the image is processed by an image segmentation model equipped with a hierarchical attention mechanism (HAU-Net) to obtain a segmented image. The method processes features of different levels (low-level features and high-level semantic features) in a targeted manner, guided by the hierarchical attention mechanism, thereby improving the segmentation accuracy of the image to be processed. The hierarchical attention mechanism means that, within the image segmentation model, the low-level features and the high-level semantic features of the image to be processed are processed in separate layers according to their respective characteristics.
Figure 3 is a schematic diagram of the image segmentation model provided by one embodiment of the present application. As shown in Figure 3, following the image processing flow, the model includes, in sequence, an input end, an encoder, a bottleneck layer, a decoder, and an output end.
The input end is used to feed the image to be processed into the encoder. In one example, the image to be processed is a brain magnetic resonance image.
The encoder includes M first encoding feature layers (which may also be called M shallow encoding feature layers) and N second encoding feature layers (which may also be called N deep encoding feature layers) connected in sequence near the input end of the model. In the direction from the encoder input to the bottleneck layer, each of the M first encoding feature layers and N second encoding feature layers in turn applies a downsampling convolution operation to its input information (e.g., an input image), so that the resulting encoding feature maps gradually shrink in size, and encoding feature maps of different sizes (including the first and second encoding feature maps) are output. In one example of the present application, every encoding feature layer uses the same convolution kernel size, e.g., 3×3.
Exemplarily, in the encoder, multi-level dimension-reducing downsampling convolution operations are first applied to the image to be processed through the M first encoding feature layers to extract its low-level features, generating first encoding feature maps of corresponding sizes. Taking a 512×512 brain MRI image as an example, the first encoding feature map output after the first encoding feature layers might be 128×128. The resulting first encoding feature map then continues through the N second encoding feature layers for further dimension-reducing downsampling, extracting the high-level semantic features of the feature map and obtaining high-level semantic feature maps of corresponding sizes, i.e., the second encoding feature map output by each second encoding feature layer.
It should be noted that the low-level features of the image to be processed include features with concrete meaning, such as the color, contour, and specific position of the target region in the image. The high-level semantic features include the meaning of the target region within the image to be processed; they are a semantic abstraction of each target region in the image, reflecting the neural network's semantic understanding of each target region.
The bottleneck layer is the connection layer between the encoder and the decoder; as shown in Figure 3, it is the convolutional layer with the smallest output feature map in the model. The bottleneck layer applies a convolution operation to the second encoding feature map obtained by the second encoding feature layers of the encoder, extracts the high-level semantic features of the second encoding feature map to generate a bottleneck feature map, and then feeds the bottleneck feature map into the decoder.
The decoder includes M first decoding feature layers (which may also be called M shallow decoding feature layers) and N second decoding feature layers (which may also be called N deep decoding feature layers) connected in sequence near the output end of the model; the N second decoding feature layers are connected to the N second encoding feature layers through the bottleneck layer. In the direction from the bottleneck layer to the decoder output, each of the N second decoding feature layers and M first decoding feature layers in turn applies an upsampling convolution operation to its input information, so that the resulting decoding feature maps gradually grow in size, and decoding feature maps of different sizes (including the first and second decoding feature maps) are output. In one example of the present application, every decoding feature layer uses the same convolution kernel size, e.g., 3×3. It should be noted that, in the decoder, the decoding feature map output by each of the M first decoding feature layers has the same size as the encoding feature map output by the correspondingly connected first encoding feature layer, and the decoding feature map output by each of the N second decoding feature layers has the same size as the encoding feature map output by the corresponding second encoding feature layer.
Exemplarily, as shown in Figure 3, the image segmentation model in this embodiment follows the overall structure of U-Net. The encoder contains five downsampling operations, forming encoding feature maps at six different scales; correspondingly, the decoder contains five upsampling operations, forming decoding feature maps at six different scales. In this model, the features of the image to be processed handled by the three scales of convolutional layers near the input and output ends are regarded as low-level features, while the features handled by the other three scales of convolutional layers are regarded as high-level semantic features; the two are used to build the hierarchical attention mechanism, so that different types of features are processed in separate layers.
Skip connections include first skip connections in one-to-one correspondence between the M first encoding feature layers and the M first decoding feature layers, and second skip connections in one-to-one correspondence between the N second encoding feature layers and the N second decoding feature layers.
In this embodiment, an attention mechanism module is arranged in each first skip connection. The attention mechanism module performs feature enhancement on the low-level features output by the corresponding first encoding feature layer to obtain target-region features, and inputs the target-region features into the corresponding first decoding feature layer. In the feature enhancement provided by this embodiment, when the model extracts low-level features, other regions of the image to be processed (i.e., regions unrelated to the target region, or regions outside the target region) may contain feature contours similar to those of the target region. Therefore, when extracting the target region, the segmentation error of the model for the target region can be reduced by strengthening the target features in the image to be processed. For example, during model training, based on the annotation results of the target region, the weight of non-target regions outside the target region is reduced in the model so that their influence on the segmentation result decreases, thereby reducing the segmentation error for the target region.
In this embodiment, a self-attention mechanism module is arranged in each second skip connection; it extracts global context information from the high-level semantic features output by the corresponding second encoding feature layer, and inputs the global context information into the corresponding second decoding feature layer.
In some embodiments, the attention mechanism includes an attention gate (AG) structure, and the self-attention mechanism includes a Transformer structure. In this embodiment, an AG is embedded in the skip connection between each first encoding feature layer and first decoding feature layer to strengthen the target features in the image to be processed, and a Transformer structure is embedded in the skip connection between each second encoding feature layer and second decoding feature layer to extract the global context information of the high-level semantic features in the image to be processed. The hierarchical-attention image segmentation model provided in this embodiment exploits the difference between the AG, a pixel-level attention mechanism, and the Transformer, a self-attention mechanism that builds global context associations, so that valuable information can be effectively mined from features with different characteristics.
In this embodiment, the purpose of the attention gate AG is to apply multiplicative weighting to every pixel-level feature of the input image features, so as to strengthen effective features. As shown in Figure 4, the input x of the AG module is multiplied pixel-by-pixel with the weight α to obtain the weighted output. The core of the AG lies in generating the attention weights. As shown in Figure 4, the feature from the adjacent smaller-scale decoder layer of the input x is denoted g; 1×1×1 convolutions are applied to x and g separately, the convolution results are summed and passed through a ReLU activation and then a Sigmoid function, and the resulting weight matrix is resampled by interpolation (Resampler) to obtain an attention weight α with the same scale as the input x. The ReLU activation passes features with values greater than 0 unchanged and zeroes out features with values less than 0, thereby filtering out features with small values; the Sigmoid function normalizes the product features to 0-1, yielding a probability-valued weighting matrix.
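The attention-gate computation just described (1×1 convolutions on x and g, summation, ReLU, Sigmoid, then pixel-wise weighting of x) can be sketched in NumPy. The channel counts are illustrative assumptions, and the interpolation/resampling step from Figure 4 is omitted by assuming g is already at x's spatial size:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_gate(x, g, wx, wg, psi):
    """Additive attention gate over (H, W, C) features.

    x: skip-connection features; g: gating features from the adjacent
    decoder layer (assumed already resampled to x's spatial size, so the
    Resampler step from the figure is omitted). wx, wg: (C, Cint) weights
    of the two 1x1 convolutions; psi: (Cint, 1) projection producing a
    single attention channel.
    """
    a = relu(x @ wx + g @ wg)      # 1x1 convs on x and g, summed, ReLU
    alpha = sigmoid(a @ psi)       # (H, W, 1) attention weights in (0, 1)
    return x * alpha               # pixel-wise reweighting of x
```

Because every weight α lies in (0, 1), the gated output never amplifies a feature; it only attenuates features the gate considers irrelevant, matching the "strengthen effective features by suppressing others" behavior described above.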
Exemplarily, in the field of medical image processing, the attention gate AG can automatically learn to focus attention on target regions of various shapes and sizes. An image segmentation model incorporating the attention gate can highlight specific image feature regions.
In this embodiment, as shown in Figure 5, the Transformer structure, as a structure based on the self-attention mechanism, can extract the global contextual relationship information among the features of each image to be processed. A specific implementation is as follows. The high-level semantic features generated by the encoder, denoted {f_l} with shape (D, H, W, C), are first reshaped into a two-dimensional sequence E ∈ R^{N×C}, where N = D×H×W, C is the number of feature channels, and D, H, W are the depth, height, and width of the input. To encode the spatial positions of the image to be processed, a learnable parameter matrix with the same shape as E is added to the Transformer structure to represent the positional relationships among the elements of the sequence; this parameter matrix is called the position embedding (PE). In the Transformer structure, the position embedding PE is added directly to the two-dimensional sequence E to obtain the final sequence T: T = E + PE. The sequence T is then passed through a multi-head self-attention (MSA) module and a multi-layer perceptron (MLP) to extract global context information. For the sequence T, the MSA module first applies a linear projection, using three linear mapping layers to obtain Q, K, V, as shown in Equation (1).
Q = TW_Q,  K = TW_K,  V = TW_V    (1)
In Equation (1), W_Q, W_K, W_V ∈ R^{C×d} are the learnable parameters of the three linear layers, and the self-attention module can be expressed as:
Z_i = S_i V = softmax(Q_i K^T / √d) V    (2)
In Equation (2), Z_i, Q_i ∈ R^{1×d} are the i-th rows of Z and Q, respectively, and
S = softmax(Q K^T / √d)
represents the attention map, which expresses the similarity between each spatial voxel and the other voxels; the higher the similarity, the stronger the connection between the two points. Matrix multiplication of V with S then yields the attention-enhanced features. MSA is an extension of SA that contains multiple SA operations; their results are concatenated and combined by a linear mapping to obtain the MSA output, as shown in Equation (3).
MSA(Z) = [SA_1(Z); SA_2(Z); ...; SA_m(Z)] W_O    (3)
In Equation (3), W_O ∈ R^{mh×d} and h = C/m, where m is the number of heads in the MSA. The output of the MSA is fed into the MLP, and the whole process can be expressed as
Z = MSA(T) + MLP(MSA(T)) ∈ R^{n×d}    (4)
It should be noted that, in the Transformer structure, the features are normalized by layer normalization (Layer Norm) before being fed into the MSA and the MLP, and the Transformer structure finally outputs the feature-weighted high-level semantic features.
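Equations (1)-(2) above can be sketched directly in NumPy for a single head (i.e., m = 1 in Equation (3)); the layer normalization, MLP, and learnable parameters are simplified away as stated assumptions:

```python
import numpy as np

def softmax_rows(z):
    # Row-wise numerically stable softmax.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(T, Wq, Wk, Wv):
    """Single-head self-attention following Eqs. (1)-(2).

    T: (N, C) token sequence (the flattened D*H*W voxels plus the
    position embedding); Wq, Wk, Wv: (C, d) projection matrices.
    """
    Q, K, V = T @ Wq, T @ Wk, T @ Wv              # Eq. (1)
    d = Q.shape[-1]
    S = softmax_rows(Q @ K.T / np.sqrt(d))        # attention map S
    return S @ V                                  # Eq. (2)
```

Each row of S sums to 1 and measures the similarity between one voxel and all others, which is exactly the global context relationship the text describes; a full MSA would run several such heads and concatenate their outputs per Equation (3).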
In this embodiment, the attention mechanism module inputs the target-region features into the corresponding first decoding feature layer as follows: the target-region features are element-wise multiplied with the input information of the corresponding first decoding feature layer, and the result is input into the first decoding feature layer, where the input information is the output information of the layer preceding the first decoding feature layer. Exemplarily, as shown in Figure 3, the input information of first decoding feature layer e is the output information of first decoding feature layer d, and the input information of first decoding feature layer f is the output information of first decoding feature layer e.
In this embodiment, the self-attention mechanism module inputs the global context information into the corresponding second decoding feature layer as follows: the global context information is added to the input information of the corresponding second decoding feature layer, and the result is input into the second decoding feature layer, where the input information is the output information of the layer preceding the second decoding feature layer. Exemplarily, as shown in Figure 3, the input information of second decoding feature layer b is the output information of decoding feature layer a, and the input information of second decoding feature layer c is the output information of decoding feature layer b.
In this embodiment, when a second decoding feature layer generates the corresponding second decoding feature map, the map is generated by merging the global context information extracted by the self-attention mechanism module in the corresponding second skip connection with the input information of that second decoding feature layer. That is, by building the global contextual relationship of the target region in the second encoding feature map through the self-attention mechanism module, the decoder can accurately obtain the target features of target regions with blurred boundaries when merging features. When a first decoding feature layer generates the corresponding first decoding feature map, the map is generated by merging the target-region features obtained by the attention mechanism module in the corresponding first skip connection with the input information of that first decoding feature layer.
In this embodiment, when the decoder performs feature merging through "upsampling - feature merging - convolution" operations, the operations are repeated multiple times until the size of the output first decoding feature map matches the size of the input image to be processed.
In this embodiment, the first decoding feature map output by the decoder is the segmented image generated from the high-level semantic features of the target region determined by the self-attention mechanism module (the meaning of the target region, e.g., the target region is the subthalamic nucleus or the red nucleus) and the low-level features of the target region determined by the attention mechanism module (the specific position of the target region in the image to be processed).
The output end applies, to the first decoding feature map, a convolutional layer with a 1×1 kernel followed by a Softmax operation, classifies the features in the first decoding feature map to obtain the segmentation result, and outputs the segmented image. Exemplarily, the segmented image is an image containing subthalamic nucleus and red nucleus segmentation results.
Figure 6 is a schematic flow chart of the image segmentation method provided by one embodiment of the present application, applied to an electronic device. As shown in Figure 6, the method includes the following steps S601-S602.
S601: The electronic device acquires the image to be processed.
In this embodiment, the image to be processed includes any image used for segmentation operations in various fields (e.g., medicine, the military, remote sensing, and meteorology). Exemplarily, in the medical field, this includes MRI images of various parts of the human body obtained by a magnetic resonance imaging scanner (e.g., brain MRI images).
In some embodiments, when acquiring the image to be processed, the electronic device may acquire it through a second device used to collect images that need segmentation. For example, the second device may be a brain magnetic resonance imager used to collect brain MRI images.
In some embodiments, the electronic device may be the same device as the second device, or a different device.
S602: The electronic device processes the image to be processed through the image segmentation model to obtain the segmented image.
In this embodiment, the electronic device processes the image to be processed through the image segmentation model to obtain a segmented image based on the target region.
Exemplarily, Figure 7 is a schematic diagram of the process by which the electronic device obtains the segmented image through the image segmentation model. As shown in Figure 7, the image to be processed input by the electronic device is a brain MRI image from the medical field; after the image segmentation model segments the subthalamic nucleus and the red nucleus in the brain MRI image, the output segmented image is obtained. The segmented image can clearly highlight the position, shape, and size of the subthalamic nucleus and the red nucleus.
The image segmentation model provided by this application can be applied in various fields such as medical image segmentation, and in any technology that needs to segment a target region in an image to be processed.
Taking the task of segmenting the subthalamic nucleus and the red nucleus in brain MRI images in the field of medical image segmentation as an example, the training process and effect of the image segmentation model provided by this application are illustrated in three parts: (1) selection of the training sample set, (2) the training process of the image segmentation model, and (3) feasibility verification of the image segmentation model.
(1) Selection of the training sample set
In this embodiment, brain MRI images of subjects diagnosed with Parkinson's disease are used as training samples. All images in the training samples are T2-mode images acquired with a 3T MRI scanner, with a slice thickness of 2 mm, a resolution of 0.6875×0.6875×2, and a data size of 320×320×70. The subthalamic nucleus and red nucleus in each training sample image were manually delineated by two radiologists with more than 6 years of experience in neuroradiology. In this embodiment, 99 MRI image samples and corresponding labels were selected in total, of which 80 were used as the training sample set and the remaining 19 as the test sample set. Five-fold cross-validation was performed on the training sample set; the image segmentation model from each validation was used to obtain segmentation results on the test sample set, and the average result on the test sample set was used to evaluate the performance of the model.
(2) Training process of the image segmentation model
In this embodiment, before training, all training sample images are resampled to the same spatial resolution and cropped to [192, 192, 48] as input images of the image segmentation model. During training, data augmentation can be used to expand the data in the training sample set; the augmentation methods include random rotation, elastic deformation, Gaussian noise, mirror transformation, and scaling, with a random rotation angle of (-π/12, π/12) and a scaling range of (0.85, 1.25).
In the training stage, the sum of the cross-entropy loss and the Dice loss is used as the loss function, a stochastic gradient descent (SGD) optimizer is used, the learning rate is set to 0.01, the momentum to 0.99, and the weight decay to 3e-5. Exemplarily, the overall training process of the model can be implemented in Python and trained and tested on an NVIDIA GeForce GTX 3090 GPU based on the PyTorch 1.8.0 framework. The training batch size is set to 2, and all models are trained on the nnU-Net framework for 150 epochs with 250 batches per epoch.
It should be noted that the method of performing image segmentation with the trained image segmentation model and the method of training the image segmentation model may be executed by the same electronic device or by different electronic devices. The electronic device may be, but is not limited to, various smartphones, portable notebooks, tablet computers, smart wearable devices, computers, robots, and the like.
(3) Feasibility verification of the image segmentation model
In this embodiment, the segmented image results obtained by the image segmentation method provided in this embodiment are compared with those of the traditional U-Net, Attention U-Net, R2U-Net, CS2-Net, and fully convolutional networks (FCN), on the results obtained on the 19 test samples. The compared metrics include the Dice similarity coefficient (DSC), the Jaccard coefficient (JA), sensitivity (SEN), and the 95% Hausdorff distance (HD95). These metrics evaluate the similarity between the network's segmentation results and the reference segmentation: the larger the DSC, JA, and SEN values, and the smaller the HD95 value, the higher the similarity and the better the fitting performance.
The comparison results are shown in Table 1. As shown in Table 1, the image segmentation method provided in this embodiment outperforms the other methods on all metrics. Specifically, for the subthalamic nucleus and the red nucleus, the Dice coefficients reach 88.20% and 92.36%, respectively, improvements of 2.94% and 3.20% over the baseline method U-Net. The method has an even larger advantage in the Jaccard coefficient, with improvements of 4.9% and 5.55% over the baseline on the subthalamic nucleus and red nucleus, respectively. Compared with Attention U-Net, the method also achieves performance gains of 3.57% and 4.75% on the two targets. These improvements show that the HAU-Net proposed in this embodiment has better learning and generalization ability for the subthalamic nucleus segmentation task.
Table 1. Experimental results of different methods
[Table 1 is provided as an image in the original filing and is not reproduced here.]
As shown in the partial segmentation results in Figure 8, the key and difficult segmentation regions are marked by boxes; the method in this embodiment agrees more closely with the manual segmentation annotations and is more effective.
In addition, in this embodiment, ablation experiments were also performed on the Transformer structure and the attention gate structure added to the image segmentation model (HAU-Net) provided in this embodiment, to investigate their influence on the experimental results. The results are shown in Table 2. Removing either the Transformer structure or the attention gate structure from the image segmentation model (HAU-Net) degrades model performance: as shown in Table 2, for the subthalamic nucleus and the red nucleus, the Dice coefficients drop by 2.11%/1.03% and 2.17%/0.89%, respectively.
Table 2. Ablation experiment results
[Table 2 is provided as an image in the original filing and is not reproduced here.]
It can be seen that adding the Transformer structure and the attention gate structure to the image segmentation model (HAU-Net) provided in this embodiment can effectively improve the segmentation performance of the model.
The image segmentation model (HAU-Net) provided in this embodiment can be applied in the field of medical image segmentation, for example, target localization during electrode implantation for deep brain stimulation (DBS). On this basis, this embodiment also provides a target localization method, shown in Figure 9, including the following steps S901-S902.
S901: The electronic device acquires the segmented image.
In this embodiment, the electronic device obtains the segmented image by inputting the image to be processed (e.g., a brain MRI image) into the image segmentation model provided in the above embodiment for image segmentation.
S902: The electronic device determines the target position coordinates based on the segmented image.
The electronic device measures the position coordinates of the target point in the segmented image, and then marks the position coordinates of the target point in the original image (i.e., the brain MRI image). Figure 10 is a schematic diagram of the localization process of the target localization method provided by this embodiment.
It should be noted that the electronic device executing the target localization method and the electronic device executing the training process of the image segmentation model (HAU-Net) and the image segmentation method may be the same electronic device or different electronic devices.
The image processing method provided in the embodiments of this application processes features of different levels (low-level features and high-level semantic features) in a targeted manner, guided by a hierarchical attention mechanism. It uses the attention gating mechanism and the self-attention-based Transformer structure to improve the extraction efficiency of low-level and high-level features in the neural network model, and to mine the local features and global context information of the model more efficiently, thereby improving the segmentation accuracy of the image. The image segmentation model provided in this method enables automatic feature extraction and accurate segmentation of the subthalamic nucleus and red nucleus in brain MRI images; the position coordinates of the target point for subthalamic deep brain stimulation (DBS) can be located from the segmented image, thereby determining the implantation position of the stimulation electrode and improving surgical efficiency.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
Figure 11 is a schematic diagram of an image segmentation apparatus provided by an embodiment of the present application. As shown in Figure 11, the apparatus includes: an acquisition unit configured to acquire an image to be processed; and a processing unit configured to process the image to be processed through a trained image segmentation model to obtain a segmented image. The image segmentation model includes an encoder and a decoder connected correspondingly through skip connections. The encoder encodes the image to be processed, generating in turn a first encoding feature map and a second encoding feature map. The skip connections are configured with an attention mechanism module and a self-attention mechanism module: the attention mechanism module performs feature enhancement on the low-level features of the first encoding feature map and sends the processed low-level features to the decoder, where the feature enhancement includes strengthening the target-region features in the low-level features; the self-attention mechanism module extracts the global context information of the high-level semantic features in the second encoding feature map and sends the high-level semantic features and the global context information to the decoder. The decoder determines the segmented image from the processed low-level features, the high-level semantic features, and the global context information.
Figure 12 is a schematic diagram of an electronic device provided by an embodiment of the present application. As shown in Figure 12, the electronic device 12 of this embodiment includes a processor 120, a memory 121, and a computer program 122 (e.g., an image segmentation program) stored in the memory 121 and executable on the processor 120. When the processor 120 executes the computer program 122, the steps in each of the above image segmentation method embodiments are implemented; alternatively, when the processor 120 executes the computer program 122, the functions of each module/unit in each of the above apparatus embodiments are implemented.
Exemplarily, the computer program 122 may be divided into one or more modules/units, which are stored in the memory 121 and executed by the processor 120 to complete the present application. The one or more modules/units may be a series of computer program instruction segments capable of completing specific functions; the instruction segments are used to describe the execution process of the computer program 122 in the electronic device 12.
The electronic device 12 may be a computing device such as a tablet computer, a desktop computer, a notebook, a handheld computer, or a cloud server. The electronic device may include, but is not limited to, the processor 120 and the memory 121. Those skilled in the art will understand that Figure 12 is only an example of the electronic device 12 and does not constitute a limitation of it; the device may include more or fewer components than shown, combine certain components, or use different components. For example, the electronic device may also include input/output devices, network access devices, buses, and the like.
The processor 120 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 121 may be an internal storage unit of the electronic device 12, such as a hard disk or memory of the electronic device 12. The memory 121 may also be an external storage device of the electronic device 12, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the electronic device 12. Further, the memory 121 may include both an internal storage unit and an external storage device of the electronic device 12. The memory 121 is used to store the computer program and other programs and data required by the electronic device, and may also be used to temporarily store data that has been output or is to be output.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the division into the above functional units and modules is used only as an example. In practical applications, the above functions may be assigned to different functional units and modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for ease of mutual distinction and are not used to limit the protection scope of this application. For the specific working processes of the units and modules in the above system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the above embodiments, each embodiment is described with its own emphasis. For parts not detailed or recorded in a certain embodiment, reference may be made to the relevant descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.
In the embodiments provided in this application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the apparatus/terminal device embodiments described above are merely illustrative; for example, the division into modules or units is merely a logical functional division, and there may be other division methods in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purposes of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of this application may also be completed by instructing the relevant hardware through a computer program. The computer program may be stored in a computer-readable storage medium, and when executed by a processor, the computer program can implement the steps of each of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content contained in the computer-readable medium may be appropriately added or removed according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunication signals.
The above embodiments are only used to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments, or make equivalent substitutions for some of the technical features therein; and such modifications or substitutions, which do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, shall all fall within the protection scope of this application.

Claims (10)

  1. An image processing method, characterized in that the method comprises:
    obtaining an image to be processed;
    processing the image to be processed with a trained image segmentation model to obtain a segmented image;
    wherein the image segmentation model comprises, connected in sequence, M first encoding feature layers, N second encoding feature layers, N second decoding feature layers, and M first decoding feature layers, where M≥1 and N≥1;
    the M first encoding feature layers correspond one-to-one to the M first decoding feature layers; an attention mechanism module is arranged between each first encoding feature layer and the corresponding first decoding feature layer, the attention mechanism module being configured to perform feature enhancement processing on low-level features output by the corresponding first encoding feature layer to obtain target region features, and to input the target region features into the corresponding first decoding feature layer;
    the N second encoding feature layers correspond one-to-one to the N second decoding feature layers; a self-attention mechanism module is arranged between each second encoding feature layer and the corresponding second decoding feature layer, the self-attention mechanism module being configured to extract global context information from high-level semantic features output by the corresponding second encoding feature layer, and to input the global context information into the corresponding second decoding feature layer.
  2. The method according to claim 1, characterized in that the attention mechanism module is an attention gate structure, and the self-attention mechanism module is a Transformer structure.
  3. The method according to claim 1, characterized in that inputting the target region features into the corresponding first decoding feature layer comprises:
    performing a point-wise multiplication of the target region features with input information of the corresponding first decoding feature layer and inputting the result into the first decoding feature layer, the input information being output information of the layer preceding the first decoding feature layer.
  4. The method according to claim 3, characterized in that inputting the global context information into the corresponding second decoding feature layer comprises:
    adding the global context information to input information of the corresponding second decoding feature layer and inputting the result into the second decoding feature layer, the input information being output information of the layer preceding the second decoding feature layer.
  5. The method according to claim 1, characterized in that the image to be processed comprises a brain magnetic resonance image, and the segmented image is an image marked with segmentation results of the subthalamic nucleus and the red nucleus.
  6. The method according to claim 5, characterized in that the method further comprises:
    determining target position coordinates based on the segmented image.
  7. The method according to any one of claims 1 to 6, characterized in that the image segmentation model is trained in the following manner:
    obtaining training set images, the training set images being images annotated with target regions;
    inputting the training set images into the image segmentation model to be trained, and training the image segmentation model based on a loss function, the loss function being determined from the sum of a cross-entropy loss and a Dice loss.
  8. An image processing apparatus, characterized in that the apparatus comprises:
    an obtaining unit, configured to obtain an image to be processed; and
    a processing unit, configured to process the image to be processed with a trained image segmentation model to obtain a segmented image;
    wherein the image segmentation model comprises, connected in sequence, M first encoding feature layers, N second encoding feature layers, N second decoding feature layers, and M first decoding feature layers, where M≥1 and N≥1;
    the M first encoding feature layers correspond one-to-one to the M first decoding feature layers; an attention mechanism module is arranged between each first encoding feature layer and the corresponding first decoding feature layer, the attention mechanism module being configured to perform feature enhancement processing on low-level features output by the corresponding first encoding feature layer to obtain target region features, and to input the target region features into the corresponding first decoding feature layer;
    the N second encoding feature layers correspond one-to-one to the N second decoding feature layers; a self-attention mechanism module is arranged between each second encoding feature layer and the corresponding second decoding feature layer, the self-attention mechanism module being configured to extract global context information from high-level semantic features output by the corresponding second encoding feature layer, and to input the global context information into the corresponding second decoding feature layer.
  9. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.
  10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
PCT/CN2022/138163 2022-07-15 2022-12-09 Image processing method, apparatus, device, and readable storage medium WO2024011835A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210831407.3 2022-07-15
CN202210831407.3A CN115330813A (zh) 2022-07-15 2022-07-15 Image processing method, apparatus, device, and readable storage medium

Publications (1)

Publication Number Publication Date
WO2024011835A1 true WO2024011835A1 (zh) 2024-01-18

Family

ID=83916807

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/138163 WO2024011835A1 (zh) 2022-07-15 2022-12-09 Image processing method, apparatus, device, and readable storage medium

Country Status (2)

Country Link
CN (1) CN115330813A (zh)
WO (1) WO2024011835A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115330813A (zh) * 2022-07-15 2022-11-11 深圳先进技术研究院 一种图像处理方法、装置、设备及可读存储介质
CN116402996A (zh) * 2023-03-20 2023-07-07 哈尔滨工业大学(威海) 图像分割方法、装置、存储介质及电子装置

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111260663A (zh) * 2020-01-15 2020-06-09 华南理工大学 鼻咽癌的病灶图像分割装置、设备及计算机可读存储介质
CN112465828A (zh) * 2020-12-15 2021-03-09 首都师范大学 一种图像语义分割方法、装置、电子设备及存储介质
CN113744284A (zh) * 2021-09-06 2021-12-03 浙大城市学院 脑肿瘤图像区域分割方法、装置、神经网络及电子设备
US20210390700A1 (en) * 2020-06-12 2021-12-16 Adobe Inc. Referring image segmentation
WO2022032823A1 (zh) * 2020-08-10 2022-02-17 中国科学院深圳先进技术研究院 图像分割方法、装置、设备及存储介质
CN114581662A (zh) * 2022-02-17 2022-06-03 华南理工大学 一种脑肿瘤图像的分割方法、系统、装置及存储介质
CN115330813A (zh) * 2022-07-15 2022-11-11 深圳先进技术研究院 一种图像处理方法、装置、设备及可读存储介质


Also Published As

Publication number Publication date
CN115330813A (zh) 2022-11-11


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22950942

Country of ref document: EP

Kind code of ref document: A1