CN113421276B

CN113421276B - Image processing method, device and storage medium

Info

Publication number: CN113421276B
Application number: CN202110753487.0A
Authority: CN
Inventors: 黄惠; 谢铭锐
Original assignee: Shenzhen University
Current assignee: Shenzhen University
Priority date: 2021-07-02
Filing date: 2021-07-02
Publication date: 2023-07-21
Anticipated expiration: 2041-07-02
Also published as: CN113421276A

Abstract

The invention discloses an image processing method, an image processing device and a storage medium, wherein the method comprises the following steps: acquiring a medical image to be segmented, inputting the medical image into a pre-trained image segmentation model for segmentation processing, and obtaining a segmentation result diagram corresponding to the medical image; wherein the model employs an encoder-decoder architecture, the encoder comprising an attention model; and obtaining channel values corresponding to each pixel point in the segmentation result diagram, and classifying each pixel point according to the channel values corresponding to each pixel point. According to the invention, the attention model is introduced into the encoder, so that the relation among pixels in the image to be segmented is effectively reserved, and the problem that the segmented lesion edges are discontinuous when the medical image is processed by the existing depth convolution network of the encoder-decoder structure of the U-net, so that the image segmentation result is inaccurate is solved.

Description

Image processing method, device and storage medium

Technical Field

The present invention relates to the field of image processing, and in particular, to an image processing method, apparatus, and storage medium.

Background

In recent years, the image processing gets rid of the limitations of the original equipment and technology, and becomes a new and promising subject. A large number of scholars and researchers at home and abroad are searching and researching image understanding and machine vision, and have achieved a plurality of important achievements. Image segmentation is one of key technologies of image processing, and with the development of deep learning in recent years, the application of a deep learning method to image segmentation has achieved the most efficient result at present.

Most records of the image processing neighborhood are refreshed after adding the deep learning method, which proves the superiority of the deep learning in image processing. The first popular deep learning method for segmentation tasks was image block classification, i.e. each pixel is classified independently using the image blocks surrounding the pixel. The main reason for using image block classification is that the classification network is typically a fully connected layer and requires a fixed size image. In 2014, long et al, university of california, berkeley division, proposed a full convolution network with which image segmentation maps of arbitrary size can be generated, and which is much faster than image block classification. After that, almost all advanced methods in the segmentation field employ this model. But this method loses much detail information in the up-sampling, which is not ideal for a smaller segmentation of the data set of the medical image. Until 2015, olaf et al proposed a deep convolutional network of the encoder-decoder structure of U-net, with great success in medical image segmentation. However, when the existing depth convolution network of the encoder-decoder structure of the U-net processes the medical image, the segmented lesion edges are discontinuous, i.e. the boundaries of the lesions cannot be completely detected, so that the accuracy of the image segmentation result is reduced.

Accordingly, there is a need for improvement and development in the art.

Disclosure of Invention

The invention aims to solve the technical problems of the prior art, and provides an image processing method, an image processing device and a storage medium, which aim to solve the problem that when a depth convolution network of an existing U-net encoder-decoder structure processes medical images, the segmented lesion edges are discontinuous, so that the image segmentation result is inaccurate.

The technical scheme adopted by the invention for solving the problems is as follows:

in a first aspect, an embodiment of the present invention provides an image processing method, where the method includes:

acquiring a medical image to be segmented, and inputting the medical image into a pre-trained image segmentation model;

extracting features of the medical image through an encoder module in the image segmentation model to obtain a plurality of feature images and enhanced feature images; wherein the encoder module comprises: a plurality of feature extraction layers and attention models which are sequentially cascaded; the medical image is input to the first layer of the feature extraction layers, and the feature extraction layers respectively output a feature image; the attention model is used for modeling the relation among pixels according to the feature image output by the last layer in the feature extraction layers to obtain an enhanced feature image containing context information;

Inputting the feature images and the enhancement feature images into a decoder module in the image segmentation model to obtain a segmentation result image corresponding to the medical image;

and obtaining channel values corresponding to each pixel point in the segmentation result diagram, and classifying each pixel point according to the channel values corresponding to each pixel point.

In one embodiment, the training process of the image segmentation model specifically includes:

acquiring a training sample set, wherein the training sample set comprises a plurality of training images, and each pixel point in the plurality of training images comprises a true corresponding standard channel value;

inputting training images in the training sample set into a preset network model, and outputting predicted channel values corresponding to each pixel point respectively through the preset network model;

the model parameters of the preset network model are adjusted through the prediction channel values corresponding to the pixel points and the standard channel values corresponding to the pixel points, and the step of inputting the training images in the training sample set into the preset network model is continuously executed until preset training conditions are met, so that the image segmentation model is obtained; the preset training condition is that the difference value between the predicted channel value corresponding to each pixel point and the standard channel value corresponding to each pixel point is smaller than a preset error threshold value.

In one embodiment, the acquiring a training sample set includes:

acquiring an original training sample set;

and carrying out data augmentation processing on each original training image in the original training sample set to obtain the training sample set.

In one embodiment, the data augmentation process includes one or more of image contrast adjustment, image sharpness adjustment, and image illuminance uniformity adjustment.

In one embodiment, the image segmentation model includes an encoder module that includes: a plurality of feature extraction layers and attention models which are sequentially cascaded;

the feature extraction layers are used for extracting features of the medical image;

and the attention model is used for modeling the relation among pixels according to the feature image output by the last layer in the feature extraction layers to obtain an enhanced feature image containing context information.

In one embodiment, the image segmentation model includes a decoder module including a convolutional layer, an upsampling layer, a fusion layer;

the convolution layer is used for unifying the channel number of the feature images and the channel number of the enhancement feature images output by the feature extraction layers to the target channel number to obtain a plurality of standard feature images and standard enhancement feature images;

The up-sampling layer is used for up-sampling a plurality of feature images to be amplified respectively to obtain a plurality of target feature images; the feature images to be amplified are standard feature images corresponding to feature extraction layers other than the first layer in the feature extraction layers and the standard enhancement feature images;

the fusion layer is used for fusing the target feature images and the standard feature images corresponding to the first layer in the feature extraction layers to obtain the segmentation result images.

In one embodiment, the classifying each pixel according to the channel value corresponding to each pixel includes:

for each pixel point, a preset threshold value is obtained, and a channel value corresponding to the pixel point is compared with the preset threshold value;

and when the channel value corresponding to the pixel point is larger than the preset threshold value, determining that the pixel point is classified as an abnormal pixel point.

In one embodiment, the image segmentation model includes an error optimization module for optimizing network parameters in the encoder and the decoder.

In one embodiment, the number of channels of each pixel in the segmentation result map is 1.

In a second aspect, an embodiment of the present invention further provides an image processing apparatus, where the image processing apparatus includes:

the input module is used for acquiring a medical image to be segmented and inputting the medical image into a pre-trained image segmentation model;

the image segmentation model comprises an encoder module 03 and a decoder module 04;

the encoder module includes: a plurality of feature extraction layers and attention models which are sequentially cascaded; the medical image is input to the first layer of the feature extraction layers, and the feature extraction layers respectively output a feature image; the attention model is used for modeling the relation among pixels according to the feature image output by the last layer in the feature extraction layers to obtain an enhanced feature image containing context information;

the decoder module is used for generating a segmentation result diagram corresponding to the medical image according to the feature diagrams and the enhancement feature diagrams;

and the classification module is used for acquiring channel values corresponding to each pixel point in the segmentation result diagram, and classifying each pixel point according to the channel values corresponding to each pixel point.

In a third aspect, an embodiment of the present invention further provides a terminal, where the terminal includes a memory and one or more processors; the memory stores one or more programs; the program contains instructions for performing the image segmentation method as set forth in any one of the above; the processor is configured to execute the program.

In a fourth aspect, embodiments of the present invention further provide a computer readable storage medium having stored thereon a plurality of instructions, wherein the instructions are loaded and executed by a processor to implement the steps of any of the above-described image segmentation methods.

The invention has the beneficial effects that: according to the embodiment of the invention, the medical image to be segmented is acquired and is input into the pre-trained image segmentation model; extracting features of the medical image through an encoder module in the image segmentation model to obtain a plurality of feature images and enhanced feature images; wherein the encoder module comprises: a plurality of feature extraction layers and attention models which are sequentially cascaded; the medical image is input to the first layer of the feature extraction layers, and the feature extraction layers respectively output a feature image; the attention model is used for modeling the relation among pixels according to the feature image output by the last layer in the feature extraction layers to obtain an enhanced feature image containing context information; inputting the feature images and the enhancement feature images into a decoder module in the image segmentation model to obtain a segmentation result image corresponding to the medical image; and obtaining channel values corresponding to each pixel point in the segmentation result diagram, and classifying each pixel point according to the channel values corresponding to each pixel point. According to the invention, the attention model is introduced into the encoder, so that the relation among pixels in the image to be segmented is effectively reserved, and the problem that the segmented lesion edges are discontinuous when the medical image is processed by the existing depth convolution network of the encoder-decoder structure of the U-net, so that the image segmentation result is inaccurate is solved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present invention, and other drawings may be obtained according to the drawings without inventive effort to those skilled in the art.

Fig. 1 is a schematic step diagram of an image processing method according to an embodiment of the present invention.

Fig. 2 is a schematic diagram of image contrast adjustment according to an embodiment of the present invention.

Fig. 3 is a schematic diagram of image sharpness adjustment provided by an embodiment of the present invention.

Fig. 4 is a schematic diagram of image illumination uniformity adjustment according to an embodiment of the present invention.

Fig. 5 is a schematic diagram of an overall structure of an image segmentation model according to an embodiment of the present invention.

Fig. 6 is a schematic structural diagram of an attention module according to an embodiment of the present invention.

Fig. 7 is a schematic structural diagram of a decoder according to an embodiment of the present invention.

Fig. 8 is a schematic diagram of a segmentation effect of an image segmentation model according to an embodiment of the present invention.

Fig. 9 is a schematic diagram of a training process of an image segmentation model according to an embodiment of the present invention.

Fig. 10 is a schematic block diagram of a terminal according to an embodiment of the present invention.

Fig. 11 is a schematic diagram illustrating connection of internal modules of an image processing apparatus according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more clear and clear, the present invention will be further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

It should be noted that, if directional indications (such as up, down, left, right, front, and rear … …) are included in the embodiments of the present invention, the directional indications are merely used to explain the relative positional relationship, movement conditions, etc. between the components in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indications are correspondingly changed.

In view of the above-mentioned drawbacks of the prior art, the present invention provides an image processing method, which includes acquiring a medical image to be segmented, and inputting the medical image to a pre-trained image segmentation model; extracting features of the medical image through an encoder module in the image segmentation model to obtain a plurality of feature images and enhanced feature images; wherein the encoder module comprises: a plurality of feature extraction layers and attention models which are sequentially cascaded; the medical image is input to the first layer of the feature extraction layers, and the feature extraction layers respectively output a feature image; the attention model is used for modeling the relation among pixels according to the feature image output by the last layer in the feature extraction layers to obtain an enhanced feature image containing context information; inputting the feature images and the enhancement feature images into a decoder module in the image segmentation model to obtain a segmentation result image corresponding to the medical image; and obtaining channel values corresponding to each pixel point in the segmentation result diagram, and classifying each pixel point according to the channel values corresponding to each pixel point. According to the invention, the attention model is introduced into the encoder, so that the relation among pixels in the image to be segmented is effectively reserved, and the problem that the segmented lesion edges are discontinuous when the medical image is processed by the existing depth convolution network of the encoder-decoder structure of the U-net, so that the image segmentation result is inaccurate is solved.

As shown in fig. 1, the present embodiment provides an image processing method, which includes the steps of:

step S100, acquiring a medical image to be segmented, and inputting the medical image into a pre-trained image segmentation model.

The medical images to be segmented in the present embodiment may be different types of dermatoscopic medical images, and the corresponding image segmentation models thereof are determined according to the types of the medical images, such as a pigment network type medical image, a negative pigment network type medical image, a stripe structure type medical image, a papulotumor type medical image, and a spherical structure type medical image, which are different in type, so that the 5 types of medical images need to be respectively subjected to image segmentation by the 5 image segmentation models.

As shown in fig. 1, the method further comprises the steps of:

step 200, extracting features of the medical image through an encoder module in the image segmentation model to obtain a plurality of feature images and enhanced feature images; wherein the encoder module comprises: a plurality of feature extraction layers and attention models which are sequentially cascaded; the medical image is input to the first layer of the feature extraction layers, and the feature extraction layers respectively output a feature image; and the attention model is used for modeling the relation among pixels according to the feature image output by the last layer in the feature extraction layers to obtain an enhanced feature image containing context information.

Because the encoder in the image segmentation model adopted in the embodiment has the attention model, the relationship between each pixel point in the medical image to be segmented can be extracted through the attention model, so that more detail information in the medical image to be segmented is reserved, and the segmentation accuracy of the image segmentation model is improved. The encoder module in this embodiment is mainly responsible for feature extraction of medical images of the input image segmentation model. After the medical image is input into the encoder module, a first layer of a plurality of feature extraction layers is firstly input, the first feature extraction layer is used for extracting features of the medical image to generate a first feature image, the first feature image is input into a second feature extraction layer, and the like until the feature image output by the last feature extraction layer is obtained, namely, the number of feature extraction layers is more, and the number of feature images can be generated. In addition, the embodiment is further provided with an attention model after the last feature extraction layer, the feature map output by the last feature extraction layer is input into the attention model, modeling is performed according to the attention model according to the relation among pixels in the feature map, and an enhanced feature map containing rich context information is output.

In one implementation, as shown in FIG. 6, the feature extraction layers may be 4 layers in the backbone network ResNet-101. For the feature map output by the last layer of the feature extraction layers, the embodiment adopts the attention module to model the relation among pixel points in the feature map, so as to generate an enhanced feature map containing rich context information, thereby providing rich features for the subsequent decoder and reducing the loss of detail information after passing through the 4-layer feature extraction layers. As shown in FIG. 6, FIG. 6 shows the structure of the attention module used in the present embodiment, and the feature map input to the attention module, i.e. the feature map output by the last feature extraction layer is X ε R ^H×W×C Attention toThe enhanced feature map output by the module is Y epsilon R ^H×W×C . The specific working principle of the attention module is as follows:

A＝softmax(δ(X)*θ(X)) (5)

wherein, delta, theta, eta,four volumes, i.e. convolution layers with a kernel size of 1 x 1, with a step size of 1 are shown. "x" denotes the multiplication between matrices. "+" indicates a sum operation between matrix elements. To calculate the relationship between each pixel and all other pixels, delta (X) E R is needed ^H×W×C Deformed into delta (X) epsilon R ^HW×C And θ (X) ∈R ^H×W×C Deformation into θ (X) ∈R ^C×HW . After multiplication operation is carried out on the two matrices after deformation, an attention matrix A epsilon R is obtained ^HW×HW Wherein the h·w elements of the i-th row of the attention matrix represent the correlation between the i-th pixel and all the remaining pixels in the original input feature map. +.>The weighted sum of all other pixels is calculated for each pixel, and then the weighted sum is fused with the original input feature image X to obtain the final output enhancement feature image Y epsilon R of the attention module ^H×W×C 。

As shown in fig. 1, the method further comprises the steps of:

and step S300, inputting the feature images and the enhancement feature images into a decoder module in the image segmentation model to obtain a segmentation result image corresponding to the medical image.

The decoder module of the embodiment is mainly responsible for decoding a plurality of feature maps output by the encoder module and an enhanced feature map output by the attention model, so that detailed information in the image is recovered, thereby generating a high-resolution segmented image.

In one implementation, as shown in fig. 7, the image segmentation model includes a decoder module including a convolutional layer, an upsampling layer, and a fusion layer.

Specifically, since the resolution of the feature map output by each of the feature extraction layers and the resolution of the enhanced feature map are different, a map with a smaller resolution contains more global feature information, but lacks sufficient local feature information. Conversely, a map with greater resolution contains more local feature information, but tends to have smaller receptive fields and thus less global feature information. The global feature and the global feature have a crucial effect on generating an accurate segmentation result diagram finally, so in order to balance the two feature information, the embodiment adopts a step-by-step up-sampling and fusion mode to fuse the feature diagrams containing different feature information. Firstly, in this embodiment, dimension reduction is required according to the feature map output by each of the plurality of feature extraction layers and the enhanced feature map of the convolutional layer in the decoder, so that the number of channels before each feature map is the same number of layers. Then, since the size differences among the feature maps are the same multiple, in order to enable the feature maps to be fused in sequence, that is, to sequentially perform the summation operation among elements, the feature maps need to be up-sampled from the smallest feature map to obtain a first enlarged feature map, and then the first enlarged feature map is fused with the feature maps with the same size to obtain a first fused map. And then up-sampling the first fusion map to obtain a second expansion feature map, then fusing the second expansion feature map with the same size to obtain a second fusion map, and the like until all the feature maps are fused, and finally generating a segmentation result map according to the final fusion to obtain a target fusion map.

In one implementation, the plurality of feature extraction layers are 3 feature extraction layers, the 3 feature extraction layers sequentially output a first feature map, a second feature map, a third feature map, and the attention model outputs an enhanced feature map. The first feature map, the second feature map, the third feature map, and the enhanced feature map output by the attention model are all inputs to the decoder. And after the first feature map, the second feature map, the third feature map and the enhancement feature map are input into the decoder, respectively inputting a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer to obtain a first convolution map, a second convolution map, a third convolution map and a fourth convolution map. And then the fourth convolution map is input into a first up-sampling layer to obtain a first expansion feature map, and the first expansion feature map and the third convolution map are input into a first fusion layer to obtain a first fusion map. And then inputting the first fusion map into a second upsampling layer to obtain a second enlarged feature map, and inputting the second enlarged feature map and the second convolution map into a second fusion layer to obtain a second fusion map. And then inputting the second fusion map into a third upsampling layer to obtain a third enlarged feature map, and inputting the third enlarged feature map and the first convolution map into a third fusion layer to obtain a third fusion map. And then fusing the first fusion map, the second fusion map, the third fusion map and the enhanced feature map through a series of convolution layers, an activation layer and a fusion layer to obtain a segmentation result map with the same size as the input medical image.

In one implementation, as shown in fig. 7, fig. 7 illustrates a specific structure of the decoder module, where X1 is a feature map output by a first layer of feature extraction layers in the encoder module, X2 is a feature map output by a second layer of feature extraction layers in the encoder module, X3 is a feature map output by a third layer of feature extraction layers in the encoder module, X4 is an enhancement feature map output by the attention model, and Y is a segmentation result map finally output.

In one implementation, the first, second, third, and fourth convolution layers are convolution layers having 256 convolution kernels and a size of 1X 1.

In one implementation, the first upsampling layer, the second upsampling layer, and the third upsampling layer are upsampling layers having a scaling factor of 2.

It should be noted that, the image segmentation model adopted in this embodiment is trained in advance, and the training process specifically includes:

step S1, acquiring a training sample set, wherein the training sample set comprises a plurality of training images, and each pixel point in the plurality of training images comprises a real corresponding standard channel value;

S2, inputting training images in the training sample set into a preset network model, and outputting predicted channel values corresponding to each pixel point respectively through the preset network model;

step S3, model parameters of the preset network model are adjusted through the prediction channel values corresponding to the pixel points and the standard channel values corresponding to the pixel points, and the step of inputting training images in the training sample set into the preset network model is continuously executed until preset training conditions are met, so that the image segmentation model is obtained; the preset training condition is that the difference value between the predicted channel value corresponding to each pixel point and the standard channel value corresponding to each pixel point is smaller than a preset error threshold value.

In short, the embodiment predetermines the real standard channel value of each pixel point in the training image, then determines the error feedback between the standard channel value and the predicted channel value of each pixel point of the preset network model through the standard channel value and the predicted channel value of each pixel point, and continuously adjusts the model parameters of the preset network model according to the error feedback, so that the predicted channel value output by the preset network model is continuously close to the standard channel value until the error of the two is acceptable, namely, the training is completed, and the model obtained after the training is used as the image segmentation model. It can be understood that different image segmentation models can be trained by using different types of training samples, for example, a medical image of a pigment network type, a medical image of a negative pigment network type, a medical image of a strip structure type, a medical image of a papulous tumor type and a medical image of a spherical structure type can be respectively trained to obtain 5 image segmentation models.

In order to avoid the overfitting of the network model to a certain extent, the network model learns the characteristic with better generalization performance. The training sample set in this embodiment requires a sufficient number of training images in order to obtain a large number of training images. In one implementation, after the original training sample set is obtained, the present implementation needs to perform data augmentation processing on each original training image in the original training sample set to obtain the training sample set. The data augmentation process includes one or more of image contrast adjustment, image sharpness adjustment, and image illumination uniformity adjustment.

Specifically, as shown in fig. 2, the method for adjusting the image contrast is as follows:

p＝max(X)-λ·max(X) (2)

q＝min(X)+μ·max(X) (3)

wherein X and Y are the input image and the processed output image, respectively. (p, q) and (m, n) are the upper and lower limits of the pixel values of the input image and the output image, respectively. λ and μ are random scalar values within a preset range, which may be set to 0.1 to 0.2, for example. The present embodiment controls the contrast of the output image by adjusting p and q. In one implementation, to obtain an augmented picture with low contrast, p and q may be set to the minimum and maximum values in the pixels of the input image X, while m and n are calculated according to equations (2) (3), and the values of the pixels of the output picture are calculated by substituting equation (1). In another implementation, to obtain an enhanced picture of high contrast, p and q may be set to 5% and 95% of the pixel value, respectively, and m and n may be set to 0 and 255.

Specifically, as shown in fig. 3, the adjustment method of the image sharpness is as follows:

in one implementation, to reduce the sharpness of the original training image, the original training image may be blurred by using a Gaussian filter. In one implementation, a two-dimensional gaussian kernel function may be used and the standard deviation σ of the two-dimensional gaussian kernel function is set to a random number within a preset range, for example, the standard deviation may be a random number between 0.5 and 1.1. And determining the size of the window based on the standard deviation, e.g., the window size may be set toFinally, to enhance the sharpness of an image, the following formula can be used for calculation:

Y＝(1+α)·X-α(X*g(σ)) (4)

wherein X and Y are the input image and the processed output image, respectively. g (σ) is a gaussian kernel function with standard deviation σ. * Representing a convolution operation. The sharpness of the generated image is determined by both σ and α. In one implementation, σ may be set to 2 and α to 1.

Specifically, the adjustment method of the image illuminance uniformity is as follows: for an input image of size h·w, a scalar α is randomly selected within a first range of values, which may be set to between 0.6 and 0.7, for example. And randomly selecting a scalar beta within a second range of values that is greater than the first range of values, e.g., the second range of values may be set to 1.1 to 1.2. The H values are then sampled at equal intervals between α and β, thereby yielding a vector of length H. And then, transversely splicing the W identical vectors to obtain an illuminance gradient map with the same size of the input image. In practice, two illumination gradient maps in the vertical direction and in the horizontal direction may be generated as desired. In summary, the background portion of the medical image may appear as a non-uniform bright-dark image under different illumination environments, so in order to obtain more training images, the present embodiment may simulate the medical image background under different illumination environments by multiplying the original training image with a randomly generated illumination gradient map, so as to derive more training images from one training image.

In particular, when training a network model, in order to prevent problems such as gradient explosion and gradient dispersion, the training process of the network model may be divided into a preheating stage and a formal training stage. As shown in fig. 9, during the initial 5000 iterations, the network is easily affected by the parameter initialization value, and thus a gradient explosion is generated. Thus, the initialization parameters of the network can be fine-tuned first with a lower learning rate. The learning rate of the warm-up phase increases linearly. As the number of iterations increases, the learning rate also gradually increases. The increase in learning rate is shown in the equation of fig. 9, and in one implementation, the initial learning rate lr_init may be set to 0.001 and the target learning rate to 0.01. When the number of iterations is greater than a preset number of thresholds (e.g., 5000), the network enters a formal training phase. In one implementation, the learning rate may be adjusted using a poly learning strategy, as shown in the formula in fig. 9, and in one implementation, power may be set to 0.9 and max_iter to 40000, with an initial learning rate lr_init equal to the target learning rate of 0.01 for the warm-up phase. In the formal training stage, the learning rate of the network gradually decreases with the increase of the iteration number, so that the network can better converge at an optimal value. After the network model is trained, an image segmentation model which can be directly put into use is obtained.

As shown in fig. 1, the method further comprises the steps of:

step 400, obtaining channel values corresponding to each pixel point in the segmentation result diagram, and classifying each pixel point according to the channel values corresponding to each pixel point.

Since each pixel has a corresponding channel value, the channel value may reflect some characteristics of the pixel, such as color shade, etc., the embodiment will perform two classifications for each pixel according to the channel value corresponding to each pixel. In one implementation manner, in order to accurately classify the pixels, the number of channels of each pixel in the segmentation result graph is 1, that is, each pixel has only one channel value, and in this embodiment, each pixel is classified into two categories according to the channel value corresponding to each pixel.

In one implementation, the step S400 specifically includes the following steps:

step S401, for each pixel point, a preset threshold value is obtained, and a channel value corresponding to the pixel point is compared with the preset threshold value;

step S402, when the channel value corresponding to the pixel point is larger than the preset threshold value, determining that the classification of the pixel point is an abnormal pixel point.

Specifically, the present implementation may consider the channel value corresponding to each pixel point as a probability value in two classifications, where the larger the channel value corresponding to a certain pixel point, the larger the probability value corresponding to one of the classification types. In this embodiment, all pixels are classified into two types, i.e., normal pixels and abnormal pixels, and a preset threshold is preset as a classification criterion. If the channel corresponding to a pixel point is larger than the preset threshold value, classifying the pixel point as an abnormal pixel point; and if the channel value corresponding to a certain pixel point is smaller than or equal to the preset threshold value, classifying the pixel point as a normal pixel point.

In one implementation, as shown in fig. 5, the image segmentation model further includes an error optimization module for optimizing network parameters in the encoder and the decoder.

Specifically, the error optimization module is used for calculating the error of the whole neural network and performing back propagation calculation gradient to optimize parameters in the network. The loss function of the error optimization module is calculated as follows:

wherein y is _p Is a segmentation result graph of image segmentation model prediction. y is _t The segmentation result graph is manually marked. Alpha and beta are smoothing parameters set specifically according to the lesion class predicted by the image segmentation model. The loss function can enable the image segmentation model to be converged more quickly and effectively, and enable the neural network to have certain robustness.

In order to prove the technical effect of the invention, the inventor builds the image segmentation model by adopting a PyTorch frame, and carries out model training by using four Quadro P6000 video cards with 24G video memory. The initial values of the parameters in the feature extraction layer of the encoder, resNet-101, are pre-trained from the ImageNet dataset. The migration learning mode enables the network to converge more effectively and achieve better segmentation effect when the network trains other scene pictures in the follow-up mode. The specific software and hardware development environments of the image segmentation model are shown in tables 1 and 2.

TABLE 1 hardware development Environment

Operating system	Linux64bit
		Processor and method for controlling the same	Intel(R)Xeon(R)CPUE5-2670v3@
Memory	128GB
		Display card	4*QuadroP6000
Video memory	24G

TABLE 2 software development Environment

The inventor adopts random gradient descent to carry out iterative optimization on parameters of the image segmentation model. The momentum coefficient of the optimizer is 0.9. The whole training process is iterated for 45000 times to complete convergence. When the image is segmented and predicted, the inventor adopts a multi-scale prediction method, and sets 6 scaling scales [0.75,1.0,1.25,1.5,1.75,2.0] so as to finally generate a more accurate segmentation result graph.

The inventors tested the image segmentation model on the ISIC2018 dataset. The image segmentation model is used for medical image segmentation of skin lesions. In particular, for skin lesion attribute detection, the dataset contains 2594 dermoscopic images. Each image corresponds to a truth-value diagram marked by five manual experts, and corresponds to five types of skin mirror characteristic structures, namely a pigment network structure, a negative pigment network structure, a strip-shaped structure, a papular tumor and a spherical structure. In addition, the data set includes a verification set and a test set with a size of 100 and a size of 1000, which are used for testing the performance of the image segmentation model. As a result of the test, as shown in table 3, the inventors used three different feature extraction layers to perform feature extraction on a lesion image, and used cumulative jacobian coefficients (C-JSI) and cumulative dess coefficients (C-DSC) to verify the segmentation effect of an image segmentation model.

TABLE 3 validation results on ISIC2018 dataset

As shown in table 3, the inventors used three different feature extraction layers to perform feature extraction on lesion images and used cumulative jacal coefficients (C-JSI) and cumulative dess coefficients (C-DSC) to verify the algorithm effect.

As shown in fig. 8, fig. 8 illustrates the recognition result of the skin mirror image by the image segmentation model. Each row in fig. 8 represents a different type of dermatological feature. The first column is an original input image, the second column is a segmentation result image marked by an artificial expert, and the third column is a segmentation result image output by an image segmentation model. It can be seen that the image segmentation model has a certain accuracy for the identification of the type of dermatological feature.

Based on the above embodiments, the present invention also provides an image processing apparatus, as shown in fig. 11, including:

the input module 01 acquires a medical image to be segmented, and inputs the medical image into a pre-trained image segmentation model;

the image segmentation model 02 comprises an encoder module 03 and a decoder module 04;

the encoder module 03 includes: a plurality of feature extraction layers and attention models which are sequentially cascaded; the medical image is input to the first layer of the feature extraction layers, and the feature extraction layers respectively output a feature image; the attention model is used for modeling the relation among pixels according to the feature image output by the last layer in the feature extraction layers to obtain an enhanced feature image containing context information;

The decoder module 04 is used for generating a segmentation result diagram corresponding to the medical image according to the feature diagrams and the enhancement feature diagrams;

the classification module 05 is configured to obtain a channel value corresponding to each pixel point in the segmentation result graph, and classify each pixel point according to the channel value corresponding to each pixel point.

Based on the above embodiment, the present invention also provides a terminal, and a functional block diagram thereof may be shown in fig. 10. The terminal comprises a processor, a memory, a network interface and a display screen which are connected through a system bus. Wherein the processor of the terminal is adapted to provide computing and control capabilities. The memory of the terminal includes a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the terminal is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement an image processing method. The display screen of the terminal may be a liquid crystal display screen or an electronic ink display screen.

It will be appreciated by those skilled in the art that the functional block diagram shown in fig. 10 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the terminal to which the present inventive arrangements may be applied, and that a particular terminal may include more or less components than those shown, or may combine some of the components, or have a different arrangement of components.

In one implementation, one or more programs are stored in a memory of the terminal and configured to be executed by one or more processors, the one or more programs including instructions for performing an image processing method.

Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in embodiments provided herein may include non-volatile and/or volatile memory. The nonvolatile memory can include Read Only Memory (ROM), programmable ROM (PROM), electrically Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double Data Rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous Link DRAM (SLDRAM), memory bus direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), among others.

In summary, the invention discloses an image processing method, an image processing device and a storage medium, wherein the method comprises the following steps: acquiring a medical image to be segmented, and inputting the medical image into a pre-trained image segmentation model; extracting features of the medical image through an encoder module in the image segmentation model to obtain a plurality of feature images and enhanced feature images; wherein the encoder module comprises: a plurality of feature extraction layers and attention models which are sequentially cascaded; the medical image is input to the first layer of the feature extraction layers, and the feature extraction layers respectively output a feature image; the attention model is used for modeling the relation among pixels according to the feature image output by the last layer in the feature extraction layers to obtain an enhanced feature image containing context information; inputting the feature images and the enhancement feature images into a decoder module in the image segmentation model to obtain a segmentation result image corresponding to the medical image; and obtaining channel values corresponding to each pixel point in the segmentation result diagram, and classifying each pixel point according to the channel values corresponding to each pixel point. According to the invention, the attention model is introduced into the encoder, so that the relation among pixels in the image to be segmented is effectively reserved, and the problem that the segmented lesion edges are discontinuous when the medical image is processed by the existing depth convolution network of the encoder-decoder structure of the U-net, so that the image segmentation result is inaccurate is solved.

It is to be understood that the invention is not limited in its application to the examples described above, but is capable of modification and variation in light of the above teachings by those skilled in the art, and that all such modifications and variations are intended to be included within the scope of the appended claims.

Claims

1. An image processing method, the method comprising:

extracting features of the medical image through an encoder module in the image segmentation model to obtain a plurality of feature images and enhanced feature images; wherein the encoder module comprises: a plurality of feature extraction layers and attention models which are sequentially cascaded; the medical image is input of a first layer in a plurality of feature extraction layers, and the feature extraction layers respectively output a feature image; the attention model is used for modeling the relation among pixels according to the feature image output by the last layer in the feature extraction layers to obtain an enhanced feature image containing context information;

inputting a plurality of the feature images and the enhancement feature images into a decoder module in the image segmentation model to obtain a segmentation result image corresponding to the medical image;

Obtaining channel values corresponding to each pixel point in the segmentation result diagram, and classifying each pixel point according to the channel values corresponding to each pixel point;

the method for acquiring the training sample set of the image segmentation model comprises the following steps: acquiring an original training sample set; performing data augmentation processing on each original training image in the original training sample set to obtain the training sample set; the data augmentation process includes one or more of image contrast adjustment, image sharpness adjustment, and image illumination uniformity adjustment; the adjustment method of the image illumination uniformity comprises the following steps: for an input image of size H.W, randomly selecting a scalar alpha 'in a first numerical range and randomly selecting a scalar beta' in a second numerical range, wherein the second numerical range is larger than the first numerical range; equally spaced H values between alpha 'and beta', thereby obtaining a vector of length H; after the W identical vectors are transversely spliced, an illuminance gradient diagram with the same size as the input image is obtained, wherein the type of the illuminance gradient diagram comprises two illumination gradients in the vertical direction and the horizontal direction;

The working principle of the decoder is as follows: the number of the feature extraction layers is 3, the 3 feature extraction layers sequentially output a first feature image, a second feature image and a third feature image, and the attention model outputs the enhanced feature image; after the first feature map, the second feature map, the third feature map and the enhancement feature map are input into the decoder, respectively inputting a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer to obtain a first convolution map, a second convolution map, a third convolution map and a fourth convolution map; inputting the fourth convolution map into a first up-sampling layer to obtain a first enlarged feature map, and inputting the first enlarged feature map and the third convolution map into a first fusion layer to obtain a first fusion map; inputting the first fusion map into a second upsampling layer to obtain a second enlarged feature map, and inputting the second enlarged feature map and the second convolution map into a second fusion layer to obtain a second fusion map; inputting the second fusion map into a third upsampling layer to obtain a third enlarged feature map, and inputting the third enlarged feature map and the first convolution map into a third fusion layer to obtain a third fusion map; fusing the first fusion map, the second fusion map, the third fusion map and the enhancement feature map through a series of convolution layers, an activation layer and a fusion layer to obtain a segmentation result map with the same size as an input medical image;

The image segmentation model comprises an error optimization module, wherein the error optimization module is used for calculating the error of the whole neural network and performing counter propagation to calculate gradients so as to optimize parameters in the network; the loss function of the error optimization module is calculated as follows:the method comprises the steps of carrying out a first treatment on the surface of the Wherein (1)>Is a segmentation result graph of image segmentation model prediction; />The segmentation result graph is marked manually; alpha and beta are smoothing parameters set specifically according to the lesion class predicted by the image segmentation model.

2. The image processing method according to claim 1, wherein the training process of the image segmentation model specifically includes:

acquiring a training sample set, wherein the training sample set comprises a plurality of training images, and each pixel point in the training images comprises a true corresponding standard channel value;

3. The image processing method according to claim 1, wherein the image segmentation model comprises a decoder module, the decoder comprising a convolutional layer, an upsampling layer, a fusion layer;

the convolution layer is used for unifying the channel numbers of the feature images and the channel numbers of the enhancement feature images to the target channel numbers to obtain a plurality of standard feature images and standard enhancement feature images;

the up-sampling layer is used for up-sampling a plurality of feature images to be amplified respectively to obtain a plurality of target feature images; the feature images to be amplified are standard feature images corresponding to feature extraction layers which are not the first layer in the feature extraction layers respectively, and the standard enhancement feature images;

and the fusion layer is used for fusing the target feature images and the standard feature images corresponding to the first layer in the feature extraction layers to obtain the segmentation result images.

4. The image processing method according to claim 1, wherein the classifying each pixel according to the channel value corresponding to each pixel includes:

acquiring a preset threshold value corresponding to each pixel point, and comparing a channel value corresponding to the pixel point with the preset threshold value;

5. The image processing method according to claim 1, wherein the number of channels of each pixel in the segmentation result map is 1.

6. An image processing apparatus, characterized in that the image processing apparatus comprises:

the image segmentation model comprises an encoder module and a decoder module;

the encoder module includes: a plurality of feature extraction layers and attention models which are sequentially cascaded; the medical image is input of a first layer in a plurality of feature extraction layers, and the feature extraction layers respectively output a feature image; the attention model is used for modeling the relation among pixels according to the feature image output by the last layer in the feature extraction layers to obtain an enhanced feature image containing context information;

the decoder module is used for generating a segmentation result diagram corresponding to the medical image according to a plurality of the feature diagrams and the enhancement feature diagrams;

The classification module is used for acquiring channel values corresponding to each pixel point in the segmentation result diagram, and classifying each pixel point according to the channel values corresponding to each pixel point;

7. A computer readable storage medium having stored thereon a plurality of instructions loaded and executed by a processor for carrying out the steps of the image processing method according to any of the preceding claims 1-5.