CN115311219A - Image processing method, image processing device, terminal device and storage medium - Google Patents

Image processing method, image processing device, terminal device and storage medium

Info

Publication number: CN115311219A
Application number: CN202210885602.4A
Authority: CN (China)
Prior art keywords: image, processing, segmentation, post, connected domain
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 刘振东, 马骏, 郑凌霄, 兰宏志
Current Assignee: Shenzhen Raysight Intelligent Medical Technology Co Ltd
Original Assignee: Shenzhen Raysight Intelligent Medical Technology Co Ltd
Application filed by Shenzhen Raysight Intelligent Medical Technology Co Ltd
Priority to CN202210885602.4A
Publication of CN115311219A


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/187Segmentation; Edge detection involving region growing; involving region merging; involving connected component labelling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • G06T7/62Analysis of geometric attributes of area, perimeter, diameter or volume
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30101Blood vessel; Artery; Vein; Vascular

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image processing method, an image processing device, a terminal device and a storage medium. The method comprises: acquiring an image to be processed; processing the image to be processed based on a pre-trained multi-head attention mechanism model to obtain image block segmentation results; learning the image block segmentation results based on a pre-trained deep bidirectional learning model to obtain a primary segmentation image; and processing the primary segmentation image through a post-processing algorithm to obtain a target segmentation image. Because the image block segmentation results produced by the multi-head attention mechanism model are further learned by the deep bidirectional learning model, relationships can be established between adjacent image blocks, improving the plausibility and completeness of the overall segmentation result; the primary segmentation image is then processed through the post-processing algorithm to obtain the target segmentation image, improving the accuracy of image segmentation.

Description

Image processing method, image processing device, terminal device and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to an image processing method and apparatus, a terminal device, and a storage medium.
Background
With the rapid development of medical imaging equipment, physicians can use CTA and MRA to image blood vessels in various parts of a patient's body. However, manually analyzing vascular lesions in massive amounts of image data is a very time-consuming and labor-intensive task for the imaging physician. In recent years, with the development of deep learning, automatic vessel segmentation methods based on convolutional neural networks (CNNs) have achieved remarkable results in analyzing blood vessel images. However, due to the locality of the convolution operation, it is difficult for CNN-based methods to learn global context information and long-distance spatial dependencies. Furthermore, since directly processing 3D medical data is computationally very expensive, each image block is usually processed separately and the results are then integrated into the final segmentation. This approach, however, does not take into account the interdependencies between adjacent image blocks, and therefore cannot accurately segment the complete blood vessel.
Therefore, it is necessary to provide a solution for improving the accuracy of image segmentation.
The above is only for the purpose of assisting understanding of the technical solution of the present invention, and does not represent an admission that the above is the prior art.
Disclosure of Invention
The invention mainly aims to provide an image processing method, an image processing device, a terminal device and a storage medium, aiming at improving the accuracy of image segmentation.
In order to achieve the above object, the present invention provides an image processing method comprising:
acquiring an image to be processed;
processing the image to be processed based on a pre-trained multi-head attention mechanism model to obtain an image block segmentation result;
learning the image block segmentation result based on a pre-trained deep bidirectional learning model to obtain a primary segmentation image;
and processing the primary segmentation image through a post-processing algorithm to obtain a target segmentation image.
Optionally, the step of acquiring the image to be processed includes:
acquiring an original image;
normalizing the original image to obtain a normalized image;
and performing gray scale cutting on the standardized image to obtain the image to be processed.
Optionally, the multi-head attention mechanism model includes an encoder and a decoder, with feature fusion performed between the encoder and the decoder through skip connections and an attention mechanism, and the step of processing the image to be processed based on the pre-trained multi-head attention mechanism model to obtain the image block segmentation result is preceded by:
acquiring a sample image and a corresponding real label;
inputting the sample image into the encoder for abstract feature extraction, so as to obtain fused abstract features;
the fused abstract features pass through the decoder layer by layer to obtain a corresponding probability map;
calculating the loss of the probability map output by each layer of the decoder relative to the corresponding real label to obtain the total loss;
and iterating the parameters in a loop until the total loss converges, so as to obtain the multi-head attention mechanism model.
Optionally, the deep bidirectional learning model includes a sequence learning layer, a convolution layer and a logistic regression layer, and the step of learning the image block segmentation result based on the pre-trained deep bidirectional learning model to obtain the primary segmentation image is preceded by:
acquiring the sample image and a corresponding real label;
sequentially inputting the feature sequences in the sample image into the sequence learning layer for splicing and fusion to obtain first learning information;
obtaining a segmentation probability map according to the first learning information through the convolution layer and the logistic regression layer;
calculating the loss of the segmentation probability map relative to the corresponding real label to obtain the prediction loss;
and iterating the parameters in a loop until the prediction loss converges, so as to obtain the deep bidirectional learning model.
Optionally, the post-processing algorithm includes a connected domain volume post-processing algorithm and/or a connected domain distance post-processing algorithm, and the step of processing the primary segmentation image through the post-processing algorithm to obtain the target segmentation image includes:
carrying out volume post-processing on the primary segmentation image through the connected domain volume post-processing algorithm to obtain a volume post-processing image;
and performing distance post-processing on the volume post-processing image through the connected domain distance post-processing algorithm to obtain the target segmentation image.
Optionally, the step of performing volume post-processing on the primary segmentation image through the connected domain volume post-processing algorithm to obtain the volume post-processing image includes:
acquiring each connected domain in the primary segmentation image, and calculating the volume of each connected domain in the primary segmentation image;
calculating a first volume sum of the connected domains according to the volume of each connected domain in the primary segmentation image;
calculating a rejection rate of each connected domain in the primary segmentation image according to the volume of each connected domain in the primary segmentation image and the first volume sum;
and clearing the connected domains whose rejection rates are smaller than a preset rejection rate threshold from the connected domains in the primary segmentation image to obtain the volume post-processing image.
Optionally, the step of performing distance post-processing on the volume post-processed image through the connected component distance post-processing algorithm to obtain the target segmentation image includes:
acquiring each connected domain of the volume post-processing image, and calculating the volume of each connected domain in the volume post-processing image;
selecting an initial connected domain from each connected domain in the volume post-processing image based on a preset rule;
calculating the representative coordinates of the initial connected domain, and sequentially calculating the distance value from each connected domain to the initial connected domain in the volume post-processing image according to the representative coordinates of the initial connected domain;
and removing the connected domains of which the distance values are greater than a preset distance threshold value in each connected domain in the volume post-processing image to obtain the target segmentation image.
Further, to achieve the above object, the present invention also provides an image processing apparatus comprising:
the acquisition module is used for acquiring an image to be processed;
the image block segmentation module is used for processing the image to be processed based on a pre-trained multi-head attention mechanism model to obtain an image block segmentation result;
the sequence learning module is used for learning the image block segmentation result based on a pre-trained deep bidirectional learning model to obtain a primary segmentation image;
and the post-processing module is used for processing the primary segmentation image through a post-processing algorithm to obtain a target segmentation image.
Furthermore, to achieve the above object, the present invention further provides a terminal device, which includes a memory, a processor, and an image processing program stored on the memory and executable on the processor, wherein the image processing program, when executed by the processor, implements the steps of the image processing method as described above.
Further, to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon an image processing program which, when executed by a processor, implements the steps of the image processing method as described above.
The embodiment of the invention provides an image processing method, an image processing device, a terminal device and a storage medium. An image to be processed is acquired; the image to be processed is processed based on a pre-trained multi-head attention mechanism model to obtain image block segmentation results; the image block segmentation results are learned based on a pre-trained deep bidirectional learning model to obtain a primary segmentation image; and the primary segmentation image is processed through a post-processing algorithm to obtain a target segmentation image. Because the image block segmentation results produced by the multi-head attention mechanism model are further learned by the deep bidirectional learning model, relationships can be established between adjacent image blocks, improving the plausibility and completeness of the overall segmentation result; the primary segmentation image is then processed through the post-processing algorithm to obtain the target segmentation image, improving the accuracy of image segmentation.
Drawings
FIG. 1 is a functional block diagram of a terminal device to which an image processing apparatus of the present invention belongs;
FIG. 2 is a flowchart illustrating an exemplary embodiment of an image processing method according to the present invention;
FIG. 3 is a diagram illustrating pre-processing an image according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart diagram of another exemplary embodiment of an image processing method of the present invention;
FIG. 5 is a schematic structural diagram of a multi-head attention mechanism model in an embodiment of the present invention;
FIG. 6 is a schematic flow chart diagram of yet another exemplary embodiment of an image processing method of the present invention;
FIG. 7 is a diagram illustrating a first structure of a deep bidirectional learning model according to an embodiment of the present invention;
FIG. 8 is a second structural diagram of the deep bidirectional learning model according to the embodiment of the present invention;
FIG. 9 is a detailed flowchart of step S40 in the embodiment of FIG. 2;
FIG. 10 is an overall schematic diagram of an embodiment of the invention;
FIG. 11 is a schematic diagram of a real tag in an embodiment of the invention;
FIG. 12 is a diagram illustrating the initial segmentation result mask0 according to an embodiment of the present invention;
FIG. 13 is a diagram illustrating the connected component volume post-processing result mask1 in an embodiment of the present invention;
FIG. 14 is a diagram illustrating the connected component distance post-processing result mask2 in an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
The main solution of the embodiment of the invention is as follows: an image to be processed is acquired; the image to be processed is processed based on a pre-trained multi-head attention mechanism model to obtain image block segmentation results; the image block segmentation results are learned based on a pre-trained deep bidirectional learning model to obtain a primary segmentation image; and the primary segmentation image is processed through a post-processing algorithm to obtain a target segmentation image. Because the image block segmentation results produced by the multi-head attention mechanism model are further learned by the deep bidirectional learning model, relationships can be established between adjacent image blocks, improving the plausibility and completeness of the overall segmentation result; the primary segmentation image is then processed through the post-processing algorithm to obtain the target segmentation image, improving the accuracy of image segmentation.
The technical terms related to the embodiment of the invention are as follows:
CTA: computed tomography angiography;
MRA: magnetic resonance angiography;
Encoder/Decoder: an encoder/decoder;
U-Net: a deep learning network structure, consisting mainly of an Encoder part and a Decoder part;
CNN: convolutional neural network, a deep learning network structure;
C-RNN/C-LSTM/C-GRU: convolutional recurrent neural network / convolutional long short-term memory network / convolutional gated recurrent unit network;
Transformer: a deep learning network architecture;
DICOM: i.e., medical digital imaging and communications, is an international standard for medical images and related information.
At present, vascular diseases have become a major cause of death and seriously threaten human life and health. With the rapid development of medical imaging equipment, physicians can use CTA and MRA to image blood vessels in various parts of a patient's body. However, manually analyzing vascular lesions in massive amounts of image data is a very time-consuming and labor-intensive task for the imaging physician. In recent years, with the development of deep learning, automatic vessel segmentation methods based on convolutional neural networks (CNNs) have achieved remarkable results in analyzing blood vessel images. However, due to the locality of the convolution operation, it is difficult for CNN-based methods to learn global context information and long-distance spatial dependencies. Furthermore, since directly processing 3D medical data is computationally very expensive, each image block is usually processed separately and the results are then integrated into the final segmentation. This approach, however, does not take into account the interdependencies between adjacent image blocks, so the complete blood vessel cannot be accurately segmented. In particular, current blood vessel segmentation techniques have the following disadvantages:
1. Traditional blood vessel segmentation techniques have low efficiency, high labor cost and poor segmentation quality;
2. Most existing automatic vessel segmentation techniques extract image features with CNN-based models, which cannot capture the global information and long-distance spatial dependencies of the input data well;
3. Some existing automatic vessel segmentation techniques learn the background and all vessels with the same weight, and segment poorly vessels whose form and size differ greatly; for example, in head and neck CTA the form and size of the vessels differ greatly from the aorta to the intracranial vessels;
4. Some existing automatic vessel segmentation techniques process each image block separately, ignoring the connection between adjacent image blocks; vessels in head and neck CTA, for example, form a spatial structure running from the aorta to the carotid arteries and then to the intracranial vessels;
5. Some existing automatic vessel segmentation techniques lack comprehensive and general post-processing operations.
The invention provides a solution. First, original medical image data is acquired, data preprocessing (including gray-level clipping, data normalization and the like) is performed, and each volume is cut into several image blocks along the depth direction. Second, each image block is used to train a Transformer- and CNN-based model that contains attention modules. The output features of the penultimate convolutional layer of the decoder part are obtained from the trained model, and a deep bidirectional C-LSTM (or C-RNN or C-GRU) method is used to perform sequence learning on these features, thereby obtaining the initial segmentation result of the whole blood vessel. Finally, connected domain post-processing (including connected domain volume post-processing and connected domain distance post-processing) is performed on the initial segmentation result of the whole blood vessel to obtain the final segmentation result. Compared with traditional blood vessel segmentation methods, this scheme realizes automatic vessel segmentation without manual intervention and, combined with deep learning techniques, achieves higher segmentation accuracy; compared with existing automatic vessel segmentation methods, this scheme provides more effective vessel segmentation and a comprehensive image post-processing technique.
Specifically, referring to fig. 1, fig. 1 is a schematic diagram of the functional modules of a terminal device to which the image processing apparatus of the present invention belongs. The image processing apparatus may be an apparatus capable of image processing independently of the terminal device, and may be carried on the terminal device in the form of hardware or software. The terminal device can be an intelligent mobile terminal with a data processing function, such as a mobile phone or a tablet computer, or a fixed terminal device or server with a data processing function.
In the present embodiment, the terminal device to which the image processing apparatus belongs includes at least an output module 110, a processor 120, a memory 130, and a communication module 140.
The memory 130 stores an operating system and an image processing program. The image processing apparatus may process the acquired image to be processed based on a pre-trained multi-head attention mechanism model to obtain image block segmentation results, learn the image block segmentation results based on a pre-trained deep bidirectional learning model to obtain a primary segmentation image, and process the primary segmentation image through a post-processing algorithm; information such as the obtained target segmentation image may be stored in the memory 130. The output module 110 may be a display screen or the like. The communication module 140 may include a WIFI module, a mobile communication module, a Bluetooth module, and the like, and communicates with an external device or a server through the communication module 140.
Wherein the image processing program in the memory 130 when executed by the processor implements the steps of:
acquiring an image to be processed;
processing the image to be processed based on a pre-trained multi-head attention mechanism model to obtain an image block segmentation result;
learning the image block segmentation result based on a pre-trained deep bidirectional learning model to obtain a primary segmentation image;
and processing the primary segmentation image through a post-processing algorithm to obtain a target segmentation image.
Further, the image processing program in the memory 130 when executed by the processor further realizes the steps of:
acquiring an original image;
normalizing the original image to obtain a normalized image;
and performing gray scale cutting on the standardized image to obtain the image to be processed.
Further, the image processing program in the memory 130 when executed by the processor further implements the steps of:
acquiring a sample image and a corresponding real label;
inputting the sample image into the encoder for abstract feature extraction, so as to obtain fused abstract features;
the fused abstract features pass through the decoder layer by layer to obtain a corresponding probability map;
calculating the loss of the probability map output by each layer of the decoder relative to the corresponding real label to obtain the total loss;
and iterating the parameters in a loop until the total loss converges, so as to obtain the multi-head attention mechanism model.
Further, the image processing program in the memory 130 when executed by the processor further implements the steps of:
acquiring the sample image and a corresponding real label;
sequentially inputting the feature sequences in the sample image into the sequence learning layer for splicing and fusion to obtain first learning information;
obtaining a segmentation probability map according to the first learning information through the convolution layer and the logistic regression layer;
calculating the loss of the segmentation probability map relative to the corresponding real label to obtain the prediction loss;
and iterating the parameters in a loop until the prediction loss converges, so as to obtain the deep bidirectional learning model.
Further, the image processing program in the memory 130 when executed by the processor further realizes the steps of:
carrying out volume post-processing on the primary segmentation image through the connected domain volume post-processing algorithm to obtain a volume post-processing image;
and performing distance post-processing on the volume post-processing image through the connected domain distance post-processing algorithm to obtain the target segmentation image.
Further, the image processing program in the memory 130 when executed by the processor further implements the steps of:
acquiring each connected domain in the primary segmentation image, and calculating the volume of each connected domain in the primary segmentation image;
calculating a first volume sum of the connected domains according to the volume of each connected domain in the primary segmentation image;
calculating a rejection rate of each connected domain in the primary segmentation image according to the volume of each connected domain in the primary segmentation image and the first volume sum;
and clearing the connected domains whose rejection rates are smaller than a preset rejection rate threshold from the connected domains in the primary segmentation image to obtain the volume post-processing image.
Further, the image processing program in the memory 130 when executed by the processor further realizes the steps of:
acquiring each connected domain of the volume post-processing image, and calculating the volume of each connected domain in the volume post-processing image;
selecting an initial connected domain from each connected domain in the volume post-processing image based on a preset rule;
calculating the representative coordinates of the initial connected domain, and sequentially calculating the distance value from each connected domain to the initial connected domain in the volume post-processing image according to the representative coordinates of the initial connected domain;
and removing the connected domains of which the distance values are greater than a preset distance threshold value in each connected domain in the volume post-processing image to obtain the target segmentation image.
According to this scheme, an image to be processed is acquired; the image to be processed is processed based on a pre-trained multi-head attention mechanism model to obtain image block segmentation results; the image block segmentation results are learned based on a pre-trained deep bidirectional learning model to obtain a primary segmentation image; and the primary segmentation image is processed through a post-processing algorithm to obtain a target segmentation image. Because the image block segmentation results produced by the multi-head attention mechanism model are further learned by the deep bidirectional learning model, relationships can be established between adjacent image blocks, improving the plausibility and completeness of the overall segmentation result; the primary segmentation image is then processed through the post-processing algorithm to obtain the target segmentation image, improving the accuracy of image segmentation.
Based on the above terminal device architecture but not limited to the above architecture, embodiments of the method of the present invention are presented.
The execution subject of the method of the embodiment may be an image processing apparatus or a terminal device, and the image processing apparatus is exemplified in the embodiment.
Referring to fig. 2, fig. 2 is a flowchart illustrating an exemplary embodiment of an image processing method according to the present invention. The image processing method comprises the following steps:
s10, acquiring an image to be processed;
specifically, in the embodiment of the present application, an image to be processed is obtained by performing image preprocessing on original medical image data, and the specific process includes:
acquiring an original image;
normalizing the original image to obtain a normalized image;
and performing gray scale cutting on the standardized image to obtain the image to be processed.
The raw medical image data used in the embodiments of the present application includes blood vessel image data such as CTA (computed tomography angiography) and MRA (magnetic resonance angiography); such data is generally in DICOM format, and a contrast agent is injected to make the blood vessels clearer.
Optionally, after the original image is acquired, a window width and window level are set to clip the gray value range of the medical image, the image is normalized to [0, 1] using the maximum and minimum values to obtain a normalized image, and the preprocessed three-dimensional data is randomly cropped into image blocks of the same size along the depth direction. Referring to fig. 3, fig. 3 is a schematic diagram of a preprocessed image in an embodiment of the present invention; as shown in fig. 3, taking head and neck CTA data as an example, the image becomes clearer after preprocessing.
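As an illustration of this preprocessing, the following Python sketch clips the gray range by a window, min-max normalizes to [0, 1], and cuts the volume into equal-depth blocks. The window values and block depth are placeholders, not values fixed by the embodiment, and a deterministic tiling stands in for the random cropping:

import numpy as np

def preprocess(volume: np.ndarray,
               window_level: float = 300.0,   # hypothetical window level
               window_width: float = 600.0,   # hypothetical window width
               patch_depth: int = 32):        # hypothetical block depth
    """Clip the gray range by window width/level, min-max normalize to [0, 1],
    and cut the 3D volume into same-size blocks along the depth (z) axis."""
    lo, hi = window_level - window_width / 2, window_level + window_width / 2
    clipped = np.clip(volume.astype(np.float32), lo, hi)
    normalized = (clipped - clipped.min()) / (clipped.max() - clipped.min() + 1e-8)
    # The embodiment crops blocks randomly; deterministic tiling is shown here
    # for simplicity.
    depth = normalized.shape[0]
    return [normalized[z:z + patch_depth]
            for z in range(0, depth - patch_depth + 1, patch_depth)]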
Step S20, processing the image to be processed based on a pre-trained multi-head attention mechanism model to obtain an image block segmentation result;
further, in the embodiment of the application, a multi-head attention mechanism model is obtained by adopting pre-training, wherein a frame of the multi-head attention mechanism model is composed of a transformer serving as an encoder and a decoder based on CNN, a jump connection and an attention mechanism are adopted between the encoder and the decoder for feature fusion, an image to be processed is input into the transformer module to extract abstract features, the skip connection and the features of a layer corresponding to the decoder are adopted for abstract feature fusion based on the attention mechanism, the fused features are subjected to prediction of a corresponding probability map layer by layer through the decoder based on CNN, and an image block segmentation result can be obtained.
Step S30, learning the image block segmentation result based on a pre-trained deep bidirectional learning model to obtain a primary segmentation image;
furthermore, in the embodiment of the application, the deep bidirectional learning model is obtained by adopting pre-training, and is not limited to the sequence model of the LSTM, the deep bidirectional C-LSTM model internally comprises a plurality of sub-modules BDC-LSTM, the BDC-LSTM is formed by stacking two layers of C-LSTM, and respectively learns z + And z - Contextual information of direction (z is the depth direction along the three-dimensional data, z + And z - Meaning two opposite directions). And then splicing and fusing the information of the current sequence z and the learned context information, inputting the information into the next BDC-LSTM to learn other sequences, and finally obtaining a 3D segmentation probability map through the convolution layer and the softmax layer to obtain a primary segmentation image.
And S40, processing the primary segmentation image through a post-processing algorithm to obtain a target segmentation image.
Furthermore, in the embodiment of the present application, a post-processing algorithm is used to post-process the primary segmentation image. The post-processing algorithm includes a connected domain volume post-processing algorithm and/or a connected domain distance post-processing algorithm; the two algorithms may be used as a cascaded processing operation, that is, the output of the connected domain volume post-processing may be used as the input of the connected domain distance post-processing.
It should be noted that performing the connected domain volume post-processing first and then the connected domain distance post-processing does not limit the post-processing method of the embodiment of the present application; in other embodiments, the connected domain distance post-processing may be performed first and the connected domain volume post-processing second, or the image may undergo only the connected domain volume post-processing or only the connected domain distance post-processing.
In this embodiment, an image to be processed is acquired; the image to be processed is processed based on a pre-trained multi-head attention mechanism model to obtain image block segmentation results; the image block segmentation results are learned based on a pre-trained deep bidirectional learning model to obtain a primary segmentation image; and the primary segmentation image is processed through a post-processing algorithm to obtain a target segmentation image. Because the image block segmentation results produced by the multi-head attention mechanism model are further learned by the deep bidirectional learning model, relationships can be established between adjacent image blocks, improving the plausibility and completeness of the overall segmentation result; the primary segmentation image is then processed through the post-processing algorithm to obtain the target segmentation image, improving the accuracy of image segmentation.
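Putting steps S10 to S40 together, the overall flow can be sketched as follows. segment_vessels is a hypothetical wrapper, and every callable it receives stands for a component described in this specification (illustrative sketches of the preprocessing and post-processing functions appear alongside the corresponding steps):

def segment_vessels(raw_volume, preprocess, attention_model, bdc_lstm_fuse,
                    volume_postprocess, distance_postprocess):
    blocks = preprocess(raw_volume)                       # S10: window, normalize, cut
    block_results = [attention_model(b) for b in blocks]  # S20: per-block segmentation
    mask0 = bdc_lstm_fuse(block_results)                  # S30: sequence learning, primary image
    mask1 = volume_postprocess(mask0)                     # S40: connected domain volume pass
    return distance_postprocess(mask1)                    # S40: connected domain distance pass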
Referring to fig. 4, fig. 4 is a schematic flowchart of another exemplary embodiment of the image processing method of the present invention. Based on the embodiment shown in fig. 2, in the present embodiment, before step S20, the image processing method further includes:
and S01, training to obtain the multi-head attention mechanism model. In this embodiment, step S01 may be implemented before step S10, and in other embodiments, step S01 may also be implemented between step S10 and step S20.
Compared with the embodiment shown in fig. 2, this embodiment further includes a scheme for training to obtain the multi-head attention mechanism model.
Specifically, the step of training to obtain the multi-head attention mechanism model may include:
step S011, acquiring a sample image and a corresponding real label;
referring to fig. 5, fig. 5 is a schematic structural diagram of a multi-head attention mechanism model in an embodiment of the present invention, and as shown in fig. 5, as an embodiment of the present invention, a transform is used as an encoder and a decoder based on CNN to form a model frame, a large amount of acquired original image data is preprocessed to obtain a sample image, and the sample image and a corresponding real tag are obtained, which can be used for model training.
Step S012, inputting the sample image into the coder to extract abstract features, and obtaining fused abstract features;
The sample image is input into the Transformer module to extract abstract features, which are fused through skip connections with the features of the corresponding decoder layer based on the attention mechanism. The mathematical formula of the attention mechanism is as follows, where f_e and f_d are the features output by the encoder and decoder at corresponding positions, respectively, and + and * denote pixel-by-pixel addition and multiplication:
Atten(f_e, f_d) = f_d * (Sigmoid(ReLU(f_e + f_d)))
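In code, the formula reduces to a few element-wise tensor operations. The following PyTorch sketch is one possible reading of it (the module name is ours, and it assumes f_e and f_d have already been brought to the same shape):

import torch
import torch.nn as nn

class SkipAttention(nn.Module):
    """Atten(f_e, f_d) = f_d * Sigmoid(ReLU(f_e + f_d)), applied pixel by pixel."""
    def forward(self, f_e: torch.Tensor, f_d: torch.Tensor) -> torch.Tensor:
        gate = torch.sigmoid(torch.relu(f_e + f_d))  # attention weights in (0, 1)
        return f_d * gate                            # reweight the decoder features

For example, fused = SkipAttention()(encoder_feat, decoder_feat) for two feature maps of identical shape.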
step S013, making the fused abstract features pass through the decoder layer by layer to obtain corresponding probability maps;
and predicting the corresponding probability graph by the fused abstract features layer by layer through a decoder based on the CNN.
Step S014, calculating the loss of the probability map output by each layer of the decoder relative to the corresponding real label to obtain the total loss;
In the loss calculation, a deep supervision approach is adopted: for each layer of the decoder, the loss of the predicted probability map relative to the real label is calculated. The losses of all layers are weighted to obtain the total loss, and the learning parameters of the depth model are updated through back propagation.
And step S015, iterating the parameters in a loop until the total loss converges to obtain the multi-head attention mechanism model.
Through the learning and iteration of a large amount of labeled data, the prediction loss gradually tends to zero, and therefore the final multi-head attention mechanism model is obtained.
According to this scheme, a sample image and its corresponding real label are acquired; the sample image is input into the encoder for abstract feature extraction to obtain fused abstract features; the fused abstract features are passed layer by layer through the decoder to obtain the corresponding probability maps; the loss of the probability map output by each layer of the decoder relative to the corresponding real label is calculated to obtain the total loss; and the parameters are iterated in a loop until the total loss converges, obtaining the multi-head attention mechanism model. By learning and iterating over a large amount of labeled data with a model framework comprising an encoder and a decoder, the predictions of the model become more accurate, so that processing the image to be processed with the model yields a more accurate image block segmentation result.
Referring to fig. 6, fig. 6 is a flowchart illustrating an image processing method according to another exemplary embodiment of the present invention. Based on the embodiment shown in fig. 2, in this embodiment, before step S30, the image processing method further includes:
and S02, training to obtain the deep bidirectional learning model. In this embodiment, step S01 may be implemented before step S10, and in other embodiments, step S00 may also be implemented between step S10 and step S20, or between step S20 and step S30.
Compared with the embodiment shown in fig. 2, the embodiment further includes a scheme for obtaining the deep bidirectional learning model through training.
Specifically, the step of training to obtain the deep bidirectional learning model may include:
step S021, acquiring the sample image and a corresponding real label;
referring to fig. 7 and 8, fig. 7 is a first structural schematic diagram of a depth bidirectional learning model in an embodiment of the present invention, and fig. 8 is a second structural schematic diagram of the depth bidirectional learning model in an embodiment of the present invention, as shown in fig. 7, a depth bidirectional C-LSTM model (not limited to a sequence model of LSTM) is used in the embodiment of the present invention, and as shown in fig. 8, a sub-module BDC-LSTM inside the depth bidirectional C-LSTM model is formed by stacking two layers of C-LSTMs. Preprocessing a large amount of acquired original image data to obtain a sample image, and acquiring the sample image and a corresponding real label, namely using the sample image and the corresponding real label for model training.
Step S022, sequentially inputting the characteristic sequences in the sample images into a sequence learning layer for splicing and fusion to obtain first learning information;
specifically, BDC-LSTM is stacked of two layers of C-LSTM, learning z separately + And z - Contextual information of orientation (z is the depth direction along the 3-dimensional data, z + And z - Meaning two opposite directions). And then splicing and fusing the information of the current sequence z and the learned context information to obtain first learning information.
Step S023, obtaining a segmentation probability map according to the first learning information through the convolution layer and the logistic regression layer;
further, the first learning information obtained by splicing and fusion is input to the next BDC-LSTM for learning other sequences. And finally, obtaining a 3D segmentation probability map through the convolution layer and the softmax layer.
S024, calculating the loss of the segmentation probability map relative to the corresponding real label to obtain the prediction loss;
The loss of the probability map relative to the real label is calculated to obtain the prediction loss used for parameter iteration.
And step S025, iterating the parameters in a loop until the prediction loss converges to obtain the deep bidirectional learning model.
The network parameters are updated through back propagation until the prediction loss gradually approaches zero, thereby obtaining the final deep bidirectional learning model.
According to this scheme, the sample image and its corresponding real label are acquired; the feature sequences in the sample image are sequentially input into the sequence learning layer for splicing and fusion to obtain the first learning information; a segmentation probability map is obtained from the first learning information through the convolution layer and the logistic regression layer; the loss of the segmentation probability map relative to the corresponding real label is calculated to obtain the prediction loss; and the parameters are iterated in a loop until the prediction loss converges, obtaining the deep bidirectional learning model. By training the deep bidirectional C-LSTM model, a deep bidirectional learning model is finally obtained that can accurately learn the image block segmentation results to produce the primary segmentation image, so that relationships can be established between adjacent image blocks, improving the plausibility and completeness of the overall segmentation result and the accuracy of image segmentation.
Referring to fig. 9, fig. 9 is a detailed flowchart of step S40 in the embodiment of fig. 2. Based on the embodiment shown in fig. 2, in this embodiment, step S40 includes:
step S401, carrying out volume post-processing on the primary segmentation image through the connected domain volume post-processing algorithm to obtain a volume post-processing image;
specifically, the step of performing volume postprocessing on the initially segmented image by using a connected domain volume postprocessing algorithm to obtain a volume postprocessing image includes:
acquiring each connected domain in the primary segmentation image, and calculating the volume of each connected domain in the primary segmentation image;
calculating a first volume sum of the connected domains according to the volume of each connected domain in the primary segmentation image;
calculating rejection rate of each connected domain in the initial segmentation image according to the volume of each connected domain in the initial segmentation image and the first volume of each connected domain;
and clearing connected domains with rejection rates smaller than a preset rejection rate threshold value in all connected domains in the initially segmented image to obtain the volume post-processing image.
In the embodiment of the application, all connected domains of the initial segmentation result mask0 are acquired, and the volume sum V_t of all connected domains is calculated.
The volume v_i of the i-th connected domain is obtained, and the rejection rate r_reject is calculated by the following formula:
r_reject = v_i / V_t
A threshold is set, e.g., 0.1; other reasonable thresholds may also be set. If r_reject < 0.1, the connected domain is cleared from the initial segmentation result mask0;
and the above operation is repeated until every connected domain is traversed, obtaining the connected domain volume post-processing segmentation result mask1.
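Under the assumption that connected domains are extracted with 26-connectivity in 3D (the embodiment does not specify the connectivity), this volume post-processing can be sketched in Python with scikit-image as follows:

import numpy as np
from skimage import measure

def volume_postprocess(mask0: np.ndarray, reject_thresh: float = 0.1) -> np.ndarray:
    """Remove every connected domain whose volume fraction r_reject = v_i / V_t
    falls below the threshold (0.1 here, as in the embodiment)."""
    labels = measure.label(mask0 > 0, connectivity=3)  # assumed 26-connectivity
    volumes = np.bincount(labels.ravel())              # voxel count per label
    v_total = volumes[1:].sum()                        # V_t; label 0 is background
    mask1 = mask0.copy()
    if v_total == 0:
        return mask1
    for i, v_i in enumerate(volumes[1:], start=1):
        if v_i / v_total < reject_thresh:              # r_reject < threshold
            mask1[labels == i] = 0
    return mask1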
And S402, performing distance post-processing on the volume post-processing image through the connected domain distance post-processing algorithm to obtain the target segmentation image.
Specifically, the step of performing distance post-processing on the volume post-processed image through the connected domain distance post-processing algorithm to obtain the target segmentation image includes:
acquiring each connected domain of the volume post-processing image, and calculating the volume of each connected domain in the volume post-processing image;
selecting an initial connected domain from each connected domain in the volume post-processing image based on a preset rule;
calculating the representative coordinates of the initial connected domain, and sequentially calculating the distance value from each connected domain to the initial connected domain in the volume post-processing image according to the representative coordinates of the initial connected domain;
and removing the connected domains of which the distance values are greater than a preset distance threshold value in each connected domain in the volume post-processing image to obtain the target segmentation image.
In the embodiment of the application, all connected domains c_i of the segmentation result mask1 are acquired, the volume v_i of each connected domain is calculated, and the connected domain c_j with the largest volume is selected as the initial connected domain. The endpoints of c_j in the three directions, [[x_js, x_je], [y_js, y_je], [z_js, z_je]] (subscript s denotes the start point, e the end point), are calculated, and the representative coordinate coord_j of connected domain c_j is given by the following equation:
coord_j = ((x_js + x_je) / 2, (y_js + y_je) / 2, (z_js + z_je) / 2)
The representative coordinate coord_i of the i-th connected domain c_i is obtained in the same way, and, with coord_i = (x_i, y_i, z_i) and coord_j = (x_j, y_j, z_j), the distance value d_i from c_i to the largest-volume connected domain c_j is calculated by the following formula:
d_i = (x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2
A distance threshold is set, e.g., 1e4; other reasonable thresholds may also be set. If d_i > 1e4, the connected domain is cleared from the segmentation result mask1;
and the above steps are repeated until all connected domains are traversed, obtaining the connected domain distance post-processing segmentation result mask2, namely the target segmentation image in the embodiment of the application.
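A companion sketch for the distance post-processing is given below. It takes the representative coordinate as the bounding-box center and the distance as the squared Euclidean distance, matching the formulas as reconstructed above; both readings are assumptions where the original equation images were lost:

import numpy as np
from skimage import measure

def distance_postprocess(mask1: np.ndarray, dist_thresh: float = 1e4) -> np.ndarray:
    """Keep only connected domains whose representative coordinate lies within
    dist_thresh (squared voxel distance, assumed) of the largest domain's."""
    labels = measure.label(mask1 > 0, connectivity=3)
    regions = measure.regionprops(labels)
    if not regions:
        return mask1.copy()
    largest = max(regions, key=lambda r: r.area)       # c_j, the largest volume

    def center(r):                                     # bounding-box midpoint
        zs, ys, xs, ze, ye, xe = r.bbox
        return np.array([(zs + ze) / 2.0, (ys + ye) / 2.0, (xs + xe) / 2.0])

    coord_j = center(largest)
    mask2 = mask1.copy()
    for r in regions:
        d_i = float(np.sum((center(r) - coord_j) ** 2))  # squared distance
        if d_i > dist_thresh:
            mask2[labels == r.label] = 0
    return mask2

The two sketches cascade exactly as the embodiment describes: mask2 = distance_postprocess(volume_postprocess(mask0)).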
According to this scheme, the primary segmentation image is subjected to volume post-processing through the connected domain volume post-processing algorithm to obtain the volume post-processing image, and the volume post-processing image is subjected to distance post-processing through the connected domain distance post-processing algorithm to obtain the target segmentation image. Processing the primary segmentation image with the post-processing algorithm to obtain the target segmentation image improves the accuracy of image segmentation.
Furthermore, an embodiment of the present invention further provides an image processing apparatus, including:
the acquisition module is used for acquiring an image to be processed;
the image block segmentation module is used for processing the image to be processed based on a pre-trained multi-head attention mechanism model to obtain an image block segmentation result;
the sequence learning module is used for learning the image block segmentation result based on a pre-trained deep bidirectional learning model to obtain a primary segmentation image;
and the post-processing module is used for processing the primary segmentation image through a post-processing algorithm to obtain a target segmentation image.
Referring to fig. 10, fig. 10 is a schematic overall principle diagram in an embodiment of the present invention, and as shown in fig. 10, the main implementation steps include:
A. Obtaining raw medical image data
CTA and MRA vessel image data are typically in DICOM format, and a contrast agent is injected to make the vessels clearer.
B. Data pre-processing
The window width and window level are set to clip the gray value range of the medical image; the image is normalized to [0, 1] using the maximum and minimum values to obtain a normalized image; and the preprocessed three-dimensional data is randomly cropped into image blocks of the same size along the depth direction.
C. Obtaining the image block segmentation result using the Transformer-CNN multi-head attention mechanism model
The model framework consists of a Transformer serving as the encoder and a CNN-based decoder, with feature fusion between the encoder and decoder through skip connections and an attention mechanism. The specific steps are as follows:
(1) The image block is input into the Transformer module to extract abstract features, which are fused through skip connections with the features of the corresponding decoder layer based on the attention mechanism. The mathematical formula of the attention mechanism is as follows, where f_e and f_d are the features output by the encoder and decoder at corresponding positions, respectively, and + and * denote pixel-by-pixel addition and multiplication:
Atten(f_e, f_d) = f_d * (Sigmoid(ReLU(f_e + f_d)))
(2) The fused abstract features are passed layer by layer through the CNN-based decoder to predict the corresponding probability maps;
(3) In the loss calculation, a deep supervision approach is adopted: for each layer of the decoder, the loss of the predicted probability map relative to the real label is calculated; the losses of all layers are weighted to obtain the total loss, and the learning parameters of the depth model are updated through back propagation;
(4) Through learning and iteration over a large amount of labeled data, the prediction loss gradually approaches zero, thereby obtaining the final depth model.
D. Learning the feature sequences of the image blocks with the deep bidirectional C-LSTM to obtain the initial segmentation result of the whole image
The deep bidirectional C-LSTM model adopted (the sequence model is not limited to LSTM) contains the sub-module BDC-LSTM. BDC-LSTM is formed by stacking two layers of C-LSTM, which learn contextual information in the z+ and z- directions respectively (z is the depth direction of the three-dimensional data; z+ and z- denote two opposite directions). The information of the current sequence z is then spliced and fused with the learned context information and input into the next BDC-LSTM to learn the other sequences. Finally, a 3D segmentation probability map is obtained through the convolution layer and the softmax layer, and the loss relative to the real label is calculated. The network parameters are updated through back propagation until the prediction loss gradually approaches zero, thereby obtaining the final model.
E. Applying an image post-processing algorithm to the initial segmentation result to obtain the final segmentation result
The post-processing algorithm provided by the invention comprises connected domain volume post-processing and connected domain distance post-processing algorithms. The two algorithms are a cascaded processing operation, i.e. the output of the connected domain volume post-processing is the input of the connected domain distance post-processing.
The specific implementation steps of the connected domain volume post-processing are as follows:
1) Acquire all connected domains of the initial segmentation result mask0, and calculate the volume sum V_t of all connected domains.
2) Obtain the volume v_i of the i-th connected domain, and calculate the rejection rate r_reject by the following formula:
r_reject = v_i / V_t
3) Set a threshold, e.g., 0.1; other reasonable thresholds may also be set. If r_reject < 0.1, clear the connected domain from the initial segmentation result mask0.
4) Repeat operations 2) to 3) until every connected domain has been processed, obtaining the connected domain volume post-processing segmentation result mask1.
The specific implementation steps of the connected domain distance post-processing are as follows:
1) Acquire all connected domains c_i of the segmentation result mask1 and calculate the volume v_i of each connected domain; select the connected domain c_j with the largest volume as the initial connected domain. Calculate the endpoints of c_j in the three directions, [[x_js, x_je], [y_js, y_je], [z_js, z_je]] (subscript s denotes the start point, e the end point); the representative coordinate coord_j of connected domain c_j is given by the following equation:
coord_j = ((x_js + x_je) / 2, (y_js + y_je) / 2, (z_js + z_je) / 2)
2) Obtain the representative coordinate coord_i of the i-th connected domain c_i and, with coord_i = (x_i, y_i, z_i) and coord_j = (x_j, y_j, z_j), calculate its distance value d_i to the largest-volume connected domain c_j by the following formula:
d_i = (x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2
3) Set a distance threshold, e.g., 1e4; other reasonable thresholds may also be set. If d_i > 1e4, clear the connected domain from the segmentation result mask1.
4) Repeat operations 2) to 3) until every connected domain has been processed, obtaining the connected domain distance post-processing segmentation result mask2.
Referring to fig. 11, 12, 13 and 14: fig. 11 is a schematic diagram of a real label in an embodiment of the present invention, fig. 12 is a schematic diagram of the initial segmentation result mask0, fig. 13 is a schematic diagram of the connected domain volume post-processing result mask1, and fig. 14 is a schematic diagram of the connected domain distance post-processing result mask2; the number below each figure is the corresponding segmentation accuracy. Comparison shows that the connected domain distance post-processing result mask2 achieves the highest segmentation accuracy.
Compared with traditional blood vessel segmentation techniques, which require manual feature extraction and are therefore costly, inefficient and inaccurate, this scheme adopts an automatic segmentation method based on a deep learning network, so no manual intervention is needed and the segmentation accuracy is high. Most existing automatic vessel segmentation techniques adopt CNN-based depth models, which perform poorly at retaining the global and spatial information of features; this scheme adopts a Transformer-based encoder, which better captures the global context information and long-distance spatial dependencies of the input features. Some existing techniques neglect the local details of images and segment poorly when vessels differ greatly in shape and size; this scheme retains detailed feature information through an attention mechanism and deep supervised learning, so that such vessels are segmented more accurately. Some existing techniques process each image block independently and neglect the connection between adjacent image blocks; this scheme learns the image block sequence with a deep bidirectional C-LSTM, making the overall segmentation result more reasonable and complete. Finally, some existing techniques lack comprehensive and general post-processing operations; this scheme provides two cascaded post-processing operations, making the final segmentation result more accurate.
In addition, an embodiment of the present invention further provides a terminal device, where the terminal device includes a memory, a processor, and an image processing program stored in the memory and executable on the processor; when the image processing program is executed by the processor, the steps of the image processing method described above are implemented.
Since the image processing program is executed by the processor, it adopts all the technical solutions of the foregoing embodiments and therefore achieves at least all of their beneficial effects; details are not repeated herein.
Furthermore, an embodiment of the present invention further provides a computer-readable storage medium, on which an image processing program is stored, and the image processing program, when executed by a processor, implements the steps of the image processing method as described above.
Since the image processing program is executed by the processor, it adopts all the technical solutions of the foregoing embodiments and therefore achieves at least all of their beneficial effects; details are not repeated herein.
Compared with the prior art, the image processing method, image processing device, terminal device and storage medium provided by the embodiments of the present invention acquire an image to be processed; process the image to be processed based on a pre-trained multi-head attention mechanism model to obtain an image block segmentation result; learn the image block segmentation result based on a pre-trained deep bidirectional learning model to obtain a primary segmentation image; and process the primary segmentation image through a post-processing algorithm to obtain a target segmentation image. By learning the image block segmentation result with the deep bidirectional learning model, connections are established between adjacent image blocks, which improves the rationality and integrity of the overall segmentation result; and by processing the primary segmentation image with the post-processing algorithm to obtain the target segmentation image, the accuracy of image segmentation is improved.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, a controlled terminal, or a network device) to execute the method of each embodiment of the present application.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. An image processing method, characterized by comprising the steps of:
acquiring an image to be processed;
processing the image to be processed based on a pre-trained multi-head attention mechanism model to obtain an image block segmentation result;
learning the image block segmentation result based on a pre-trained deep bidirectional learning model to obtain a primary segmentation image;
and processing the primary segmentation image through a post-processing algorithm to obtain a target segmentation image.
2. The image processing method according to claim 1, wherein the step of acquiring the image to be processed comprises:
acquiring an original image;
normalizing the original image to obtain a normalized image;
and performing gray scale cutting on the standardized image to obtain the image to be processed.
3. The image processing method as claimed in claim 1, wherein the multi-head attention mechanism model comprises an encoder and a decoder, feature fusion is performed between the encoder and the decoder by using skip connections and an attention mechanism, and the step of processing the image to be processed based on the pre-trained multi-head attention mechanism model to obtain the image block segmentation result is preceded by:
acquiring a sample image and a corresponding real label;
inputting the sample image into the encoder to perform abstract feature extraction to obtain a fused abstract feature;
passing the fused abstract features layer by layer through the decoder to obtain a corresponding probability map;
calculating the loss of the probability map output by each layer of the decoder relative to the corresponding real label to obtain a total loss;
and iterating the parameters in a loop until the total loss converges, so as to obtain the multi-head attention mechanism model.
4. The image processing method according to claim 3, wherein the deep bidirectional learning model comprises a sequence learning layer, a convolution layer and a logistic regression layer, and the step of learning the image block segmentation result based on the pre-trained deep bidirectional learning model to obtain the primary segmentation image is preceded by:
acquiring the sample image and a corresponding real label;
sequentially inputting the feature sequences of the sample image into the sequence learning layer for splicing and fusion to obtain first learning information;
obtaining a segmentation probability map according to the first learning information through the convolution layer and the logistic regression layer;
calculating the loss of the segmentation probability map relative to the corresponding real label to obtain a prediction loss;
and iterating the parameters in a loop until the prediction loss converges, so as to obtain the deep bidirectional learning model.
5. The image processing method according to claim 1, wherein the post-processing algorithm comprises a connected domain volume post-processing algorithm and/or a connected domain distance post-processing algorithm, and the step of processing the primary segmentation image by the post-processing algorithm to obtain the target segmentation image comprises:
carrying out volume post-processing on the primary segmentation image through the connected domain volume post-processing algorithm to obtain a volume post-processing image;
and performing distance post-processing on the volume post-processing image through the connected domain distance post-processing algorithm to obtain the target segmentation image.
6. The image processing method as claimed in claim 5, wherein the step of performing volume post-processing on the primary segmentation image by the connected domain volume post-processing algorithm to obtain a volume post-processing image comprises:
acquiring each connected domain in the primary segmentation image, and calculating the volume of each connected domain in the primary segmentation image;
calculating a first volume sum of the connected domains according to the volume of each connected domain in the primary segmentation image;
calculating a rejection rate of each connected domain in the primary segmentation image according to the volume of each connected domain in the primary segmentation image and the first volume sum;
and clearing, from the connected domains in the primary segmentation image, those whose rejection rate is smaller than a preset rejection rate threshold, to obtain the volume post-processing image.
7. The image processing method of claim 5, wherein the step of performing distance post-processing on the volume post-processed image by the connected component distance post-processing algorithm to obtain the target segmentation image comprises:
acquiring each connected domain of the volume post-processing image, and calculating the volume of each connected domain in the volume post-processing image;
selecting an initial connected domain from each connected domain in the volume post-processing image based on a preset rule;
calculating the representative coordinates of the initial connected domain, and sequentially calculating the distance value from each connected domain to the initial connected domain in the volume post-processing image according to the representative coordinates of the initial connected domain;
and removing, from the connected domains in the volume post-processing image, those whose distance values are greater than a preset distance threshold, to obtain the target segmentation image.
8. An image processing apparatus characterized by comprising:
the acquisition module is used for acquiring an image to be processed;
the image block segmentation module is used for processing the image to be processed based on a pre-trained multi-head attention mechanism model to obtain an image block segmentation result;
the sequence learning module is used for learning the image block segmentation result based on a pre-trained deep bidirectional learning model to obtain a primary segmentation image;
and the post-processing module is used for processing the primary segmentation image through a post-processing algorithm to obtain a target segmentation image.
9. A terminal device, characterized in that the terminal device comprises a memory, a processor and an image processing program stored on the memory and executable on the processor, the image processing program, when executed by the processor, implementing the steps of the image processing method according to any one of claims 1-7.
10. A computer-readable storage medium, characterized in that an image processing program is stored thereon, which when executed by a processor implements the steps of the image processing method according to any one of claims 1 to 7.
CN202210885602.4A 2022-07-26 2022-07-26 Image processing method, image processing device, terminal device and storage medium Pending CN115311219A (en)

Priority Applications (1)

Application Number: CN202210885602.4A; Priority Date: 2022-07-26; Filing Date: 2022-07-26; Title: Image processing method, image processing device, terminal device and storage medium

Applications Claiming Priority (1)

Application Number: CN202210885602.4A; Priority Date: 2022-07-26; Filing Date: 2022-07-26; Title: Image processing method, image processing device, terminal device and storage medium

Publications (1)

Publication Number: CN115311219A; Publication Date: 2022-11-08

Family

ID=83858979

Family Applications (1)

Application Number: CN202210885602.4A; Title: Image processing method, image processing device, terminal device and storage medium; Priority Date: 2022-07-26; Filing Date: 2022-07-26

Country Status (1)

Country Link
CN (1) CN115311219A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication Number: CN116012374A *; Priority Date: 2023-03-15; Publication Date: 2023-04-25; Assignee: 译企科技(成都)有限公司; Title: Three-dimensional PET-CT head and neck tumor segmentation system and method


Similar Documents

Publication Publication Date Title
Ma et al. GANMcC: A generative adversarial network with multiclassification constraints for infrared and visible image fusion
CN109741347B (en) Iterative learning image segmentation method based on convolutional neural network
CN112037263B (en) Surgical tool tracking system based on convolutional neural network and long-term and short-term memory network
CN113012172A (en) AS-UNet-based medical image segmentation method and system
CN111242959B (en) Target area extraction method of multi-mode medical image based on convolutional neural network
Agarwal et al. Review on Deep Learning based Medical Image Processing
CN111784706A (en) Automatic identification method and system for primary tumor image of nasopharyngeal carcinoma
CN116228792A (en) Medical image segmentation method, system and electronic device
CN114708465B (en) Image classification method and device, electronic equipment and storage medium
CN115861616A (en) Semantic segmentation system for medical image sequence
Gsaxner et al. PET-train: Automatic ground truth generation from PET acquisitions for urinary bladder segmentation in CT images using deep learning
EP3973508A1 (en) Sampling latent variables to generate multiple segmentations of an image
CN112330701A (en) Tissue pathology image cell nucleus segmentation method and system based on polar coordinate representation
Wang et al. Context-aware spatio-recurrent curvilinear structure segmentation
CN114170212A (en) False positive detection method and system based on small lung nodule in CT image
CN115311219A (en) Image processing method, image processing device, terminal device and storage medium
CN115457057A (en) Multi-scale feature fusion gland segmentation method adopting deep supervision strategy
Zhao et al. MPSHT: multiple progressive sampling hybrid model multi-organ segmentation
CN112488996A (en) Inhomogeneous three-dimensional esophageal cancer energy spectrum CT (computed tomography) weak supervision automatic labeling method and system
CN117036715A (en) Deformation region boundary automatic extraction method based on convolutional neural network
CN115760874A (en) Multi-scale U-Net medical image segmentation method based on joint spatial domain
WO2022227193A1 (en) Liver region segmentation method and apparatus, and electronic device and storage medium
CN115731240A (en) Segmentation method, segmentation device, electronic equipment and storage medium
CN113920311A (en) Remote sensing image segmentation method and system based on edge auxiliary information
CN112419283A (en) Neural network for estimating thickness and method thereof

Legal Events

Code: PB01; Event: Publication
Code: SE01; Event: Entry into force of request for substantive examination