CN116188396A - Image segmentation method, device, equipment and medium - Google Patents

Image segmentation method, device, equipment and medium Download PDF

Info

Publication number
CN116188396A
CN116188396A (application CN202310004030.9A)
Authority
CN
China
Prior art keywords
feature data
output
data
feature
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310004030.9A
Other languages
Chinese (zh)
Inventor
凌雅婷
刘倩
蒋佳欣
孔德兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Haiyan Nanbei Lake Medical Artificial Intelligence Research Institute
Puyang Institute Of Big Data And Artificial Intelligence
Original Assignee
Haiyan Nanbei Lake Medical Artificial Intelligence Research Institute
Puyang Institute Of Big Data And Artificial Intelligence
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Haiyan Nanbei Lake Medical Artificial Intelligence Research Institute and Puyang Institute Of Big Data And Artificial Intelligence
Priority to CN202310004030.9A
Publication of CN116188396A
Legal status: Pending

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Image Analysis (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)
  • Ultra Sonic Diagnosis Equipment (AREA)

Abstract

The invention provides an image segmentation method, device, equipment and medium, relating to the field of medical image processing, and comprising the following steps: establishing an image segmentation model and training it with a plurality of sample images; extracting features from the sample images to obtain feature data of different levels, the feature data comprising a plurality of high-level feature data and a plurality of low-level feature data; sequentially fusing the feature data and outputting a plurality of fused feature data; performing aggregate feature extraction on each high-level feature data to obtain reference feature data; performing attention mechanism decoding on each fused feature data based on the reference feature data to generate a plurality of output results; establishing a loss function based on the output results and performing multi-output learning during training to obtain a target model; and processing the image to be processed with the target model, iterating the multiple outputs to generate a target result. This solves the problem that existing image segmentation models produce only single-scale output during processing, leading to insufficient segmentation accuracy.

Description

Image segmentation method, device, equipment and medium
Technical Field
The present invention relates to the field of medical image processing technologies, and in particular, to an image segmentation method, apparatus, device, and medium.
Background
The medical imaging field is evolving rapidly. Medical imaging technology presents the internal lesion condition of a patient's body directly to doctors through imaging of specific body parts, and is an indispensable auxiliary means for clinical diagnosis and treatment. Medical image segmentation provides doctors with important information such as the detailed size, shape and position of a target object, and therefore plays a critical role in clinical disease diagnosis and screening.
However, manual segmentation by doctors is slow, and the accuracy of the result depends heavily on the doctor's professional level, which greatly limits the probability that potential patients in remote areas or small hospitals and health centers are discovered and treated. Automatic medical image segmentation is therefore of great significance.
High-precision, high-speed recognition and segmentation of medical images has attracted increasing attention from researchers, with scholars from fields such as mathematics, statistics, imaging science and computer science joining in cooperation. As a result, medical image processing has developed rapidly with fruitful results, evolving from traditional fully manual segmentation to today's semi-automatic and fully automatic segmentation.
For over a decade, neural-network-based deep learning algorithms have performed excellently in numerous medical image segmentation tasks. However, most of these algorithms have only one scale of output, which is unsatisfactory for segmenting small objects. In medical image segmentation, high sensitivity to small objects is very important: it is directly related to early detection and diagnosis of disease and is a crucial step in building intelligent medical systems.
Disclosure of Invention
In order to overcome the above technical defects, the invention aims to provide an image segmentation method, device, equipment and medium, which solve the problem that existing image segmentation models produce only single-scale output during processing, leading to insufficient segmentation accuracy.
The invention discloses an image segmentation method, which comprises the following steps:
establishing an image segmentation model, and training by adopting a plurality of sample images;
in the image segmentation model, extracting features of a sample image to obtain feature data of different levels, wherein the feature data comprises a plurality of high-level feature data and a plurality of low-level feature data;
sequentially fusing the characteristic data and outputting a plurality of fused characteristic data;
performing aggregate feature extraction on each high-level feature data to obtain reference feature data;
performing attention mechanism decoding on each fused feature data based on the reference feature data to generate a plurality of output results;
establishing a loss function based on the output result, and performing multi-output learning in training to obtain a target model after training;
and processing the image to be processed by adopting the target model, generating a target result by multi-output iteration, and carrying out image segmentation.
Preferably, the extracting features of the sample image to obtain feature data of different levels includes:
performing feature extraction on the sample image with a plurality of convolutional networks of different levels, each convolutional network correspondingly outputting one level of the high-level or low-level feature data, with the resolution of the outputs decreasing in turn.
Preferably, the feature data are sequentially fused, and a plurality of fused feature data are output, including:
labeling the n feature data as the i-th feature data in order of decreasing level, where i = 1, 2, 3, …, n and n ≥ 5;
for the j-th feature data, where j = 1, 2, 3, …, n-1: up-sampling it, splicing it with the (j+1)-th feature data, and then performing feature fusion through a deconvolution layer, an activation layer and a BN layer to obtain the corresponding fused feature data;
iterating to output the plurality of fused feature data.
Preferably, the performing aggregate feature extraction on each high-level feature data to obtain reference feature data includes:
up-sampling each high-level feature data separately and passing it sequentially through convolution layers with multi-size kernels, an activation function layer and a BN layer to obtain the reference feature data.
Preferably, the performing attention mechanism decoding on each fused feature data based on the reference feature data to generate a plurality of output results includes:
labeling the n-1 fused feature data as the j-th fused feature data in order of decreasing level, where j = 1, 2, 3, …, n-1 and n ≥ 5;
for the (n-1)-th fused feature data: processing the reference feature data with a Sigmoid function, performing feature flipping, multiplying by the (n-1)-th fused feature data, adding the reference feature data, and outputting the (n-1)-th output result after processing by a convolution layer and an activation function;
for the m-th fused feature data, m = 1, 2, 3, …, n-2: processing the (m+1)-th output result with a Sigmoid function, performing feature flipping, multiplying by the m-th feature data, adding the (m+1)-th output result, and outputting the m-th output result after processing by a convolution layer and an activation function;
and iterating to generate a plurality of output results.
Preferably, the building a loss function based on the output result, performing multi-output learning in training, includes:
building IoU loss functions and cross entropy loss functions based on each output result;
and adding the loss functions corresponding to the output results to generate a loss function for the image segmentation model, and performing multi-output learning in training.
Preferably, the processing the image to be processed with the target model and generating the target result through multi-output iteration includes:
extracting features of an image to be processed to obtain target feature data of different levels, wherein the target feature data comprises a plurality of high-level target feature data and a plurality of low-level target feature data;
sampling and fusing the target feature data in sequence, and outputting a plurality of target fusion feature data;
performing aggregate feature extraction on each high-level target feature data to obtain target reference feature data;
and performing attention mechanism decoding on each target fused feature data one by one based on the target reference feature data, and iteratively obtaining the target result based on the plurality of output results.
The present invention also provides an image segmentation apparatus including:
the preprocessing module is used for establishing an image segmentation model and training by adopting a plurality of sample images;
the feature extraction module is used for carrying out feature extraction on the sample image to obtain feature data of different levels, wherein the feature data comprises at least three high-level feature data and at least two low-level feature data;
the feature fusion module is used for sequentially sampling and fusing the feature data and outputting at least four fused feature data;
the parallel decoding module is used for carrying out aggregation feature extraction on each high-level feature data to obtain reference feature data;
the attention mechanism decoding module is used for performing attention mechanism decoding on each fused feature data one by one based on the reference feature data, so as to iteratively generate at least four output results;
the adjusting module is used for establishing a loss function based on the output result, and performing multi-output learning in training to obtain a target model after training;
and the execution module is used for processing the image to be processed by adopting the target model, generating a target result by multiple output iterations and carrying out image segmentation.
The invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the image segmentation method when executing the computer program.
The invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the image segmentation method.
After the technical scheme is adopted, compared with the prior art, the method has the following beneficial effects:
the utility model provides a multi-output medical image segmentation model based on attention mechanism, use the characteristic processing and the characteristic fusion of current convolution god with the network for medical image processing, set up parallel partial decoder and attention mechanism decoder and carry out dual decoding to the network feature, obtain the characteristic of a plurality of levels, correspond and generate a plurality of outputs, finally iterate through the multi-output study and generate unique prediction result, improve the accuracy of the prediction result that is used for image segmentation, only have single scale output in the current image segmentation model processing procedure, lead to the problem that segmentation accuracy is insufficient.
Drawings
FIG. 1 is a flow chart of an embodiment of an image segmentation method according to the present invention;
FIGS. 2, 3 and 4 are reference diagrams of processing a skin cancer image and two polyp images, respectively, using the target model and outputting target results in an embodiment of the image segmentation method according to the present invention;
FIG. 5 is a schematic block diagram illustrating a second embodiment of an image segmentation apparatus according to the present invention;
FIG. 6 is a schematic block diagram of a computer device according to the present invention.
Reference numerals:
8 - image segmentation apparatus; 81 - preprocessing module; 82 - feature extraction module; 83 - feature fusion module; 84 - parallel decoding module; 85 - attention mechanism decoding module; 86 - adjustment module; 87 - execution module.
Detailed Description
Advantages of the invention are further illustrated in the following description, taken in conjunction with the accompanying drawings and detailed description.
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure, as detailed in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in this disclosure to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure. The word "if" as used herein may be interpreted as "when", "upon" or "in response to determining", depending on the context.
In the description of the present invention, it should be understood that the terms "longitudinal," "transverse," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like indicate orientations or positional relationships based on the orientation or positional relationships shown in the drawings, merely to facilitate describing the present invention and simplify the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and therefore should not be construed as limiting the present invention.
In the description of the present invention, unless otherwise specified and defined, it should be noted that the terms "mounted", "connected" and "coupled" are to be construed broadly: a connection may, for example, be mechanical or electrical, a direct connection between two elements, or an indirect connection through an intermediary. Those skilled in the art will understand the specific meaning of these terms in context.
In the following description, suffixes such as "module", "component" or "unit" used to denote elements are adopted only to facilitate the description of the present invention and have no specific meaning in themselves. Thus, "module" and "component" may be used interchangeably.
Embodiment one: this embodiment discloses an image segmentation method for segmenting medical images. It adopts techniques such as transfer learning, feature fusion, a parallel partial decoder, an attention mechanism decoder and multi-output learning, can realize high-precision, fully automatic segmentation of multi-scale medical images, performs better than existing algorithms on public datasets, and improves sensitivity to small objects. Specifically, referring to figs. 1-4, the method comprises the following steps:
s100: establishing an image segmentation model, and training by adopting a plurality of sample images;
in this embodiment, the image segmentation model includes modules for feature extraction (migrated from a classical convolutional neural network), feature fusion, parallel decoding, attention mechanism decoding and multi-output learning. Specifically, feature data of different depths (levels) are extracted and fused; the high-level feature data are then aggregated again; each fused feature is decoded; and multiple results are output iteratively for multi-output learning. The model thus learns features of different levels (deep features and surface features), covering both large and small objects, so as to improve the accuracy of the segmentation results.
S200: in the image segmentation model, extracting features of a sample image to obtain feature data of different levels, wherein the feature data comprises a plurality of high-level feature data and a plurality of low-level feature data;
in the above step, feature data of different levels, i.e., a plurality of deep features and surface features, are extracted. Specifically, extracting features from the sample image to obtain feature data of different levels includes: performing feature extraction on the sample image with a plurality of convolutional networks of different levels, each convolutional network correspondingly outputting feature data of successively increasing level, i.e., the resolution of the output feature data decreases in turn.
Specifically, the feature extraction and the feature fusion described below may be built on the structure of an existing classical convolutional network, including but not limited to ResNet50, ResNet101, DenseNet121, Res2Net50, Res2Net101, etc., for example a classical convolutional neural network trained on the huge natural-image dataset ImageNet. The feature extraction part of the classical convolutional neural network is used as the backbone network of the image segmentation model of this embodiment (handling feature extraction and feature fusion), and feature extraction learned on natural images is transferred to feature extraction on medical images, which improves the training efficiency of the model and prevents overfitting. As an example, the backbone network provided in this embodiment extracts five features of different levels (initially extracted surface features), denoted f_1, f_2, f_3, f_4 and f_5, whose corresponding resolutions decrease in turn. The five extracted surface features can further be divided into two types: the low-level features f_1 and f_2, and the high-level features f_3, f_4 and f_5. The low-level features correspond to fewer convolution layers and higher resolution, preserving more features and small-target detail; the high-level features correspond to more convolution layers and lower resolution, preserving fewer features and mainly large-target information.
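As a non-limiting illustration of such a backbone, the following PyTorch sketch taps an ImageNet-pretrained torchvision ResNet50 at five stages to obtain f_1-f_5 with successively decreasing resolution; the stage boundaries and the class name Backbone are assumptions made for illustration, not the patented implementation.

```python
# Hypothetical sketch: tapping an ImageNet-pretrained ResNet50 at five stages
# to obtain features f1..f5 whose resolutions decrease in turn.
import torch.nn as nn
from torchvision.models import resnet50

class Backbone(nn.Module):
    def __init__(self):
        super().__init__()
        net = resnet50(weights="IMAGENET1K_V1")  # transfer learning from ImageNet
        self.stage1 = nn.Sequential(net.conv1, net.bn1, net.relu)  # 1/2 resolution
        self.stage2 = nn.Sequential(net.maxpool, net.layer1)       # 1/4
        self.stage3 = net.layer2                                   # 1/8
        self.stage4 = net.layer3                                   # 1/16
        self.stage5 = net.layer4                                   # 1/32

    def forward(self, x):
        f1 = self.stage1(x)   # low-level: high resolution, small-target detail
        f2 = self.stage2(f1)  # low-level
        f3 = self.stage3(f2)  # high-level: lower resolution, large-target semantics
        f4 = self.stage4(f3)
        f5 = self.stage5(f4)
        return f1, f2, f3, f4, f5
```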
S300: sequentially fusing the characteristic data and outputting a plurality of fused characteristic data;
in this embodiment, it should be noted that the extracted features are not fused into a single image; feature fusion produces multiple outputs. Since the resolutions corresponding to the five levels of features decrease in turn, feature fusion can proceed level by level, up-sampling each feature and then splicing it with the feature of the next level. For example, f_5 is up-sampled and then spliced and fused with f_4 into d_4. On this basis, the feature fusion network uses up-sampling to sequentially generate the high-level abstract features corresponding to the four lower-level surface features: d_4, d_3, d_2 and d_1.
Specifically, the above-mentioned sequential fusion of the feature data to output a plurality of fused feature data includes:
S310: labeling the n feature data as the i-th feature data in order of decreasing level, where i = 1, 2, 3, …, n and n ≥ 5;
In the above step, for convenience of description, the feature data are ordered and numbered in order of decreasing level, i.e., from the high-level feature data down to the low-level feature data.
S320: for the j-th feature data, where j = 1, 2, 3, …, n-1: up-sampling it, splicing it with the (j+1)-th feature data, and then performing feature fusion through a deconvolution layer, an activation layer and a BN layer to obtain the corresponding fused feature data;
In the above step, up-sampling converts the low-resolution map containing high-level abstract features (i.e., the j-th feature data) to high resolution while retaining those abstract features; a splicing (concatenation) operation with the corresponding low-level surface feature (i.e., the (j+1)-th feature data) is then performed. The feature fusion network further comprises a deconvolution layer, an activation layer and a BN layer for further fusing and learning the spliced features.
S330: iterating to output a plurality of fusion feature data.
In the above step, the four high-level abstract features (i.e., fused feature data) obtained through splicing, deconvolution and activation in the above example are denoted o_1, o_2, o_3 and o_4. Unlike existing feature-fusion approaches, this embodiment does not simply splice d_1 and f_1 and pass the result through an activation function to obtain a single output; instead, four fused features are output.
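A minimal sketch of one such fusion step might look as follows; the channel counts, the bilinear up-sampling mode and the deconvolution kernel size are assumptions for illustration.

```python
# Illustrative fusion step (S310-S330): the deeper (lower-resolution) feature is
# up-sampled, spliced with the shallower feature, then fused through a
# deconvolution layer, an activation layer and a BN layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FuseBlock(nn.Module):
    def __init__(self, deep_ch, shallow_ch, out_ch):
        super().__init__()
        # kernel size and stride of the deconvolution are assumptions
        self.deconv = nn.ConvTranspose2d(deep_ch + shallow_ch, out_ch,
                                         kernel_size=3, padding=1)
        self.act = nn.ReLU(inplace=True)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, f_deep, f_shallow):
        # up-sample the deep feature to the shallow feature's resolution
        up = F.interpolate(f_deep, size=f_shallow.shape[2:],
                           mode="bilinear", align_corners=False)
        x = torch.cat([up, f_shallow], dim=1)      # splice (concatenate) channels
        return self.bn(self.act(self.deconv(x)))   # fused feature data
```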
S400: performing aggregate feature extraction on each high-level feature data to obtain reference feature data;
in this step, features of different levels are aggregated for medical image segmentation. The feature fusion described in step S300 aggregates the low-level surface features of all levels and splices the low-level surface features (feature data) with the high-level abstract features (fused feature data) to realize image segmentation. Since the high-level features bring little computation but strongly influence the segmentation result, a parallel feature decoder is set up to re-aggregate f_3, f_4 and f_5 (the low-resolution deep features) and extract their information.
Specifically, the performing aggregate feature extraction on each high-level feature data to obtain reference feature data includes: up-sampling each high-level feature data separately and passing it sequentially through convolution layers with multi-size kernels, an activation function layer and a BN layer to obtain the reference feature data.
In the above steps, a parallel feature decoder is set up to aggregate the high-level feature data. It comprises an up-sampling layer, convolution layers with multi-size kernels, an activation function layer and a BN layer, and finally yields a feature p_5 of uniform size. This embodiment is described with the extraction of five features (3 high-level and 2 low-level) and four fused feature data as an example; that is, the parallel feature decoder processes f_3, f_4 and f_5.
S500: performing attention mechanism decoding on each fused feature data based on the reference feature data to generate a plurality of output results;
In the above step, attention mechanism decoding is performed with an attention mechanism network and a decoder, so that the model focuses on the feature distribution and outputs prediction results for image segmentation. In this embodiment, the multiple output results are not independent: the output result corresponding to a lower-level feature must be generated based on the output result of the next-higher-level feature.
Specifically, the above-mentioned attention mechanism decoding of each fused feature data based on the reference feature data to generate a plurality of output results includes:
S510: labeling the n-1 fused feature data as the j-th fused feature data in order of decreasing level, where j = 1, 2, 3, …, n-1 and n ≥ 5;
In the above step, for convenience of description, the fused feature data are labeled in order; the j-th fused feature data corresponds to the j-th feature data of step S310 above.
S520: for the (n-1)-th fused feature data: processing the reference feature data with a Sigmoid function, performing feature flipping, multiplying by the (n-1)-th fused feature data, adding the reference feature data, and outputting the (n-1)-th output result through a convolution layer and an activation function;
S530: for the m-th fused feature data, m = 1, 2, 3, …, n-2: processing the (m+1)-th output result with a Sigmoid function, performing feature flipping, multiplying by the m-th feature data, adding the (m+1)-th output result, and outputting the m-th output result through a convolution layer and an activation function;
it should be noted that, as described above, the output result corresponding to a lower-level feature must be generated from the output result of the next-higher-level feature. Taking the above example: the output result corresponding to o_1 is based on the output result corresponding to o_2; that of o_2 is based on that of o_3; that of o_3 is based on that of o_4; and the output result corresponding to o_4 is based on the reference feature data output in the above steps. The iteration finally takes the output result corresponding to o_1 as the final result, but multi-output learning still requires each intermediate result to be output.
Illustratively, the Sigmoid function is f(x) = 1/(1+e^(-x)), whose range is limited to (0, 1). For the output P_4 of this step: P_5 is first limited to (0, 1) by the Sigmoid function and then subtracted from 1 so that the feature is flipped; the flipped map is multiplied by f_4 obtained in step S100, and the product is added to P_5 obtained in step S400 to obtain the final result P_4.
S540: and iterating to generate a plurality of output results.
By way of example, p_4 = (1 - f(p_5)) * f_4 + p_5.
Similarly, the other features are expressed as:
p_3 = (1 - f(p_4)) * f_3 + p_4;  p_2 = (1 - f(p_3)) * f_2 + p_3;  p_1 = (1 - f(p_2)) * f_1 + p_2.
P_4, P_3, P_2 and P_1 are each passed through a convolution layer with a 1×1 kernel and activated by an activation function to obtain the final output feature maps output_4, output_3, output_2 and output_1. In this way, features of different levels are obtained, covering multiple output results for both large and small targets.
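Under the equations above, the whole decoding chain can be sketched as follows, assuming single-channel maps for the p's and for feature maps f_m already reduced to one channel; whether the chain passes on p_m itself or the activated output_m is left open by the text, so passing on p_m is one possible reading.

```python
# Illustrative decoding chain: p_m = (1 - sigmoid(p_{m+1})) * f_m + p_{m+1}
# (feature flipping + residual add), then a 1x1 convolution and activation
# emit each output_m.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionDecoder(nn.Module):
    def __init__(self, n_levels=4):
        super().__init__()
        self.heads = nn.ModuleList(nn.Conv2d(1, 1, kernel_size=1)
                                   for _ in range(n_levels))

    def forward(self, feats, p_ref):
        # feats: [f4, f3, f2, f1] as single-channel maps, deepest first
        outputs, p = [], p_ref
        for f_m, head in zip(feats, self.heads):
            p = F.interpolate(p, size=f_m.shape[2:],
                              mode="bilinear", align_corners=False)
            p = (1.0 - torch.sigmoid(p)) * f_m + p   # flip, multiply, add
            outputs.append(torch.sigmoid(head(p)))   # output_m: 1x1 conv + activation
        return outputs  # [output_4, output_3, output_2, output_1]
```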
S600: establishing a loss function based on the output result, and performing multi-output learning in training to obtain a target model after training;
in the above embodiment, each output is learned, i.e., a corresponding loss function is established for each output result and adjusted during training. Specifically, establishing a loss function based on the output results and performing multi-output learning during training includes: establishing an IoU loss function and a cross-entropy loss function for each output result; and summing the loss functions corresponding to the output results to obtain the loss function of the image segmentation model, with which multi-output learning is performed during training.
As an example, the loss function is L(x) = L_I(x) + L_B(x), where L_I is the IoU loss function and L_B is the cross-entropy loss function; the proportions of the two loss functions can be adjusted according to the distribution of different datasets (i.e., of the sample images). The final loss function of the image segmentation model is:
L(output) = L(output_1) + L(output_2) + L(output_3) + L(output_4). The sample image set can be divided into a training set, a validation set and a test set at a ratio of 8:1:1. During the model's iterative learning, evaluation metrics monitor its performance on the validation set, and the model parameters achieving the best validation metrics are saved. The final model takes output_1 as its unique output prediction map and is tested on the test set; on completion, the target model is obtained. Each output result is an image containing a predicted mask; the predicted mask region is adjusted continuously during iteration, and the real mask labels associated with the sample images are used to train and adjust the model.
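A sketch of this summed loss under stated assumptions follows; the soft-IoU formulation and the resizing of the ground-truth mask to each output's resolution are assumptions, not the patented definition.

```python
# Illustrative multi-output loss: L(output) = sum_m [ L_I(output_m) + L_B(output_m) ].
import torch.nn.functional as F

def iou_loss(pred, mask, eps=1e-6):
    # soft IoU loss on probability maps (one common formulation)
    inter = (pred * mask).sum(dim=(2, 3))
    union = (pred + mask - pred * mask).sum(dim=(2, 3))
    return (1.0 - (inter + eps) / (union + eps)).mean()

def total_loss(outputs, mask):
    loss = 0.0
    for out in outputs:  # output_4 .. output_1
        m = F.interpolate(mask, size=out.shape[2:], mode="nearest")
        loss = loss + iou_loss(out, m) + F.binary_cross_entropy(out, m)
    return loss
```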
S700: processing the image to be processed with the target model and generating a target result through multi-output iteration, so as to segment the image.
In this embodiment, as an explanation: the target model is generated by adjusting the loss function, model parameters, etc. of the image segmentation model during training so that its outputs match the real masks, and by fixing the parameters once training is complete. The processing performed by the target model is therefore similar to the training process, and the output target result is an image containing a predicted mask. Specifically, processing the image to be processed with the target model and generating the target result through multi-output iteration includes the following steps:
s710: extracting features of an image to be processed to obtain target feature data of different levels, wherein the target feature data comprises a plurality of high-level target feature data and a plurality of low-level target feature data;
in the above step, feature extraction is performed on the image to be processed with a plurality of convolutional networks of different levels; the number of target feature data is consistent with the number of feature extraction levels set in the model. To facilitate the subsequent multi-output, the number of target feature data is preferably at least 5.
S720: sampling and fusing the target feature data in sequence, and outputting a plurality of target fusion feature data;
similar to step S300 above, the resolutions corresponding to the five levels of features decrease in turn; after each feature is up-sampled and spliced with the feature of the next level, the feature fusion network further processes the spliced features through a deconvolution layer, an activation layer and a BN layer to obtain the target fused feature data.
S730: performing aggregate feature extraction on each high-level target feature data to obtain target reference feature data;
specifically, since the amount of computation by the advanced features is small and the influence on the segmentation result is large, the parallel feature decoder is used to re-aggregate the high-level target feature data to extract information. The parallel feature decoder includes an upsampling layer, a convolution layer of a multi-sized convolution kernel, an activation function layer, and a BN layer.
S740: performing attention mechanism decoding on each target fused feature data one by one based on the target reference feature data, and iteratively obtaining the target result based on the plurality of output results.
Specifically, it should be noted that, unlike in S600 above, only the final result of the multi-output iteration, i.e., output_1, is output here as the target result. output_1 is generated by iterating on the other outputs, and after multi-output learning the accuracy of the output prediction map used for image segmentation is effectively improved.
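For illustration only, inference under these assumptions would run the multi-output forward pass once and keep output_1; model and image below are assumed to be a trained model instance and a preprocessed input tensor.

```python
import torch

model.eval()
with torch.no_grad():
    outputs = model(image)           # [output_4, output_3, output_2, output_1]
    pred = outputs[-1]               # output_1: the unique prediction map
    seg_mask = (pred > 0.5).float()  # threshold into a binary segmentation mask
```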
Referring to figs. 2, 3 and 4, reference diagrams are shown for processing a skin cancer image and two polyp images with the target model and outputting target results: fig. 2(a) is the skin cancer image and fig. 2(b) its real mask; fig. 2(c) is the target image (prediction) output for image segmentation by the target model of this embodiment. Fig. 3(a) is a polyp image and fig. 3(b) its real mask; fig. 3(c) is the corresponding target image (prediction). Fig. 4(a) is another polyp image and fig. 4(b) its real mask; fig. 4(c) is the corresponding target image (prediction). The predictions output by the target model show high consistency with the real masks, demonstrating that the image segmentation method provided in this embodiment achieves high accuracy.
This embodiment provides a multi-output medical image segmentation model based on an attention mechanism, which can well relieve the pressure of manual segmentation and lays a foundation for automatic medical evaluation systems. Through transfer learning, the feature processing and feature fusion of an existing convolutional neural network are applied to medical image processing, which improves training efficiency and prevents overfitting. A parallel partial decoder (for decoding the high-level feature data) and an attention mechanism decoder (for decoding all feature data) are set up to doubly decode the network features, improving the accuracy of feature extraction. Through multi-output learning, the method performs excellently on multiple public medical image segmentation datasets, including the CVC-ClinicDB dataset and the ISIC2018 lesion boundary segmentation dataset, adapts well to small-object and multi-object segmentation tasks, saves a large amount of manpower and material resources, and has high feasibility.
Embodiment two: the present embodiment also provides an image segmentation apparatus 8, referring to fig. 5, including:
the preprocessing module 81 is used for establishing an image segmentation model and training by adopting a plurality of sample images;
specifically, the established image segmentation model performs steps of feature extraction, feature fusion, parallel decoding, attention mechanism decoding, multi-output learning and the like, and the sample image can be divided into a training set, a verification set and a test set according to a preset proportion so as to train and improve model accuracy.
The feature extraction module 82 is configured to perform feature extraction on the sample image to obtain feature data of different levels, where the feature data includes at least three high-level feature data and at least two low-level feature data;
the feature extraction module uses a feature extraction part of a classical convolutional neural network as a backbone network (for processing feature extraction and feature fusion) of the image segmentation model of the embodiment, and performs migration learning on feature extraction of a natural image to feature extraction of a medical image so as to improve training efficiency of the model, and specifically, a plurality of convolutional networks with different levels are adopted to perform feature extraction on the sample image.
The feature fusion module 83 is configured to sequentially perform sampling and fusion on the feature data, and output at least four fused feature data;
specifically, the feature fusion module up-samples each feature data one by one and then splices and fuses it with the corresponding feature of the next level. Up-sampling converts a low-resolution map containing high-level abstract features to high resolution while retaining those abstract features; a splicing (concatenation) operation with the corresponding low-level surface feature is then performed. The feature fusion network also comprises a deconvolution layer, an activation layer and a BN layer.
The parallel decoding module 84 is configured to perform aggregate feature extraction on each high-level feature data, so as to obtain reference feature data;
specifically, considering that the high-level features bring little computation but strongly influence the segmentation result, the parallel decoding module performs feature aggregation on each high-level feature data. It comprises an up-sampling layer, convolution layers with multi-size kernels, an activation function layer and a BN layer.
An attention mechanism decoding module 85, configured to perform attention mechanism decoding on each fused feature data one by one based on the reference feature data, so as to iteratively generate at least four output results;
specifically, the attention mechanism decoding module makes the model focus on the feature distribution so as to output prediction results for image segmentation. The output result corresponding to a lower-level feature is generated from the output result of the next-higher-level feature, so features of different levels are learned across the multiple output results, improving the accuracy of the final output.
An adjustment module 86, configured to establish a loss function based on the output result, and perform multi-output learning in training to obtain a target model after training;
specifically, in the adjustment module, each output is learned: a corresponding loss function is established for each output result, and the total loss function of the model is the sum of the losses corresponding to the outputs. The model is thereby adjusted during training, with each output optimized during the model's iterative learning.
And the execution module 87 is used for processing the image to be processed with the target model and generating a target result through multi-output iteration, so as to segment the image.
Specifically, after the above modules complete the training of the image segmentation model and the target model is obtained, the image segmentation result is predicted from the image to be processed. The above modules are used to extract target feature data of different levels; feature fusion is then performed on the target feature data one by one to obtain the corresponding target fused features; the parallel partial decoder and the attention mechanism decoder doubly decode the network features, aggregating the high-level target feature data and performing attention mechanism decoding on each target fused feature data; a plurality of output results is finally iterated, and the final iterated result is output as the target result. Multi-output medical image segmentation based on the attention mechanism is thus performed, effectively improving the accuracy of the result.
In this embodiment, an image segmentation model is pre-established by the preprocessing module; model training is then performed based on the feature extraction module, the feature fusion module, the parallel decoding module, the attention mechanism decoding module and the adjustment module; and after training is completed and the target model obtained, the execution module and the above modules process the image to be processed. Specifically, the convolutional networks of different levels in the feature extraction module extract features from the image; in the feature fusion module, each obtained feature data is up-sampled one by one and spliced and fused with the corresponding feature of the next level to obtain a plurality of fused feature data; the parallel decoding module then aggregates the high-level feature data; the attention mechanism decoding module decodes each fused feature data and learns the attention mechanism, generating a plurality of output results; and a unique target result is finally generated by iteration. A multi-output medical image segmentation model based on an attention mechanism is thus provided, which relieves the pressure of manual segmentation well. The feature processing and feature fusion of existing convolutional neural networks are applied to medical image processing, and a parallel partial decoder (for decoding the high-level feature data) and an attention mechanism decoder (for decoding all feature data) doubly decode the network features, obtaining features of different levels, attending to both large and small targets, and improving the accuracy of the prediction result of the medical image segmentation model.
Embodiment three: to achieve the above object, the present invention further provides a computer device 9, which may comprise a plurality of computer devices; the components of the image segmentation apparatus 8 of the second embodiment may be distributed across different computer devices 9. The computer device 9 may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a rack-mounted server or the like that executes programs. The computer device of this embodiment includes at least, but is not limited to: a memory 91 and a processor 92, which can be communicatively connected to each other via a system bus, and the image segmentation apparatus 8. Referring to fig. 6, it should be noted that fig. 6 only shows a computer device with some components; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
In this embodiment, the memory 91 may include a storage program area and a storage data area, where the storage program area may store an application program required for operating a system and at least one function; the storage data area may store data of a user at the computer device. Further, the memory 91 may include high speed random access memory and may also include non-volatile memory, and in some embodiments, the memory 91 may optionally include memory 91 located remotely from the processor, such remote memory being connectable over a network. Examples of such networks include, but are not limited to, the internet, local area networks, and the like.
The processor 92 may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 92 is typically used to control the overall operation of the computer device. In the present embodiment, the processor 92 is configured to execute the program code stored in the memory 91 or process data, for example, execute the image segmentation apparatus 8, to implement the image segmentation method of the first embodiment.
It is noted that only computer device 9 having components 91-92 is shown, but it should be understood that not all of the illustrated components are required to be implemented and that more or fewer components may be implemented instead.
Embodiment four:
to achieve the above object, the present invention also provides a computer-readable storage medium including a plurality of storage media such as a flash memory, a hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a magnetic disk, an optical disk, a server, etc., on which a computer program is stored which when executed by the processor 92 performs the corresponding functions. The computer-readable storage medium of the present embodiment is for storing the image segmentation apparatus 8, and when executed by the processor 92, implements the image segmentation method of the first embodiment.
It should be noted that the embodiments of the present invention are preferred embodiments and are not limiting in any way. Any person skilled in the art may use the technical content disclosed above to make changes or modifications into equivalent effective embodiments without departing from the technical scope of the present invention; any modification or equivalent change to the above embodiments made according to the technical substance of the present invention still falls within the scope of the present invention.

Claims (10)

1. An image segmentation method, comprising:
establishing an image segmentation model, and training by adopting a plurality of sample images;
in the image segmentation model, extracting features of a sample image to obtain feature data of different levels, wherein the feature data comprises a plurality of high-level feature data and a plurality of low-level feature data;
sequentially fusing the characteristic data and outputting a plurality of fused characteristic data;
performing aggregate feature extraction on each high-level feature data to obtain reference feature data;
performing attention mechanism decoding on each fused feature data based on the reference feature data to generate a plurality of output results;
establishing a loss function based on the output result, and performing multi-output learning in training to obtain a target model after training; and processing the image to be processed by adopting the target model, generating a target result by multi-output iteration, and carrying out image segmentation.
2. The image segmentation method according to claim 1, wherein the feature extraction of the sample image to obtain feature data of different levels includes:
performing feature extraction on the sample image with a plurality of convolutional networks of different levels, each convolutional network correspondingly outputting one level of the high-level or low-level feature data, with the resolution of the outputs decreasing in turn.
3. The image segmentation method according to claim 1, wherein the feature data are sequentially fused to output a plurality of fused feature data, comprising:
labeling the n feature data as the i-th feature data in order of decreasing level, where i = 1, 2, 3, …, n;
n ≥ 5;
for the j-th feature data, where j = 1, 2, 3, …, n-1: up-sampling it, splicing it with the (j+1)-th feature data, and then performing feature fusion through a deconvolution layer, an activation layer and a BN layer to obtain the corresponding fused feature data;
iterating to output the plurality of fused feature data.
4. The image segmentation method according to claim 1, wherein the performing aggregate feature extraction on each high-level feature data to obtain reference feature data includes:
up-sampling each high-level feature data separately and passing it sequentially through convolution layers with multi-size kernels, an activation function layer and a BN layer to obtain the reference feature data.
5. The image segmentation method according to claim 1, wherein the performing attention mechanism decoding on each of the fused feature data based on the reference feature data to generate a plurality of output results includes:
labeling the n-1 fused feature data as the j-th fused feature data in order of decreasing level, where j = 1, 2, 3, …, n-1; n ≥ 5;
for the (n-1)-th fused feature data: processing the reference feature data with a Sigmoid function, performing feature flipping, multiplying by the (n-1)-th fused feature data, adding the reference feature data, and outputting the (n-1)-th output result after processing by a convolution layer and an activation function;
for the m-th fused feature data, m = 1, 2, 3, …, n-2: processing the (m+1)-th output result with a Sigmoid function, performing feature flipping, multiplying by the m-th feature data, adding the (m+1)-th output result, and outputting the m-th output result after processing by a convolution layer and an activation function;
and iterating to generate a plurality of output results.
6. The image segmentation method according to claim 1, wherein the creating a loss function based on the output result, performing multi-output learning in training, comprises:
building IoU loss functions and cross entropy loss functions based on each output result;
and adding the loss functions corresponding to the output results to generate a loss function for the image segmentation model, and performing multi-output learning in training.
7. The image segmentation method according to claim 1, wherein the processing the image to be processed with the target model and generating the target result through multi-output iteration includes:
extracting features of an image to be processed to obtain target feature data of different levels, wherein the target feature data comprises a plurality of high-level target feature data and a plurality of low-level target feature data;
sampling and fusing the target feature data in sequence, and outputting a plurality of target fusion feature data;
performing aggregate feature extraction on each high-level target feature data to obtain target reference feature data;
and performing attention mechanism decoding on each target fused feature data one by one based on the target reference feature data, and iteratively obtaining the target result based on the plurality of output results.
8. An image dividing apparatus, comprising:
the preprocessing module is used for establishing an image segmentation model and training by adopting a plurality of sample images;
the feature extraction module is used for carrying out feature extraction on the sample image to obtain feature data of different levels, wherein the feature data comprises at least three high-level feature data and at least two low-level feature data;
the feature fusion module is used for sequentially sampling and fusing the feature data and outputting at least four fused feature data;
the parallel decoding module is used for performing aggregate feature extraction on each high-level feature data to obtain reference feature data;
the attention mechanism decoding module is used for performing attention mechanism decoding on each fused feature data one by one based on the reference feature data, so as to iteratively generate at least four output results;
the adjusting module is used for establishing a loss function based on the output result, and performing multi-output learning in training to obtain a target model after training;
and the execution module is used for processing the image to be processed by adopting the target model, generating a target result by multiple output iterations and carrying out image segmentation.
9. A computer device, characterized in that it comprises a memory, a processor and a computer program stored on the memory and executable on the processor, which processor, when executing the computer program, implements the steps of the image segmentation method according to any one of claims 1 to 7.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, carries out the steps of the image segmentation method as claimed in any one of claims 1 to 7.
CN202310004030.9A 2023-01-03 2023-01-03 Image segmentation method, device, equipment and medium Pending CN116188396A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310004030.9A CN116188396A (en) 2023-01-03 2023-01-03 Image segmentation method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310004030.9A CN116188396A (en) 2023-01-03 2023-01-03 Image segmentation method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN116188396A 2023-05-30

Family

ID=86441644

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310004030.9A Pending CN116188396A (en) 2023-01-03 2023-01-03 Image segmentation method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN116188396A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116797794A (en) * 2023-07-10 2023-09-22 北京透彻未来科技有限公司 Intestinal cancer pathology parting system based on deep learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination