CN109872306B - Medical image segmentation method, device and storage medium

Medical image segmentation method, device and storage medium

Info

Publication number
CN109872306B
CN109872306B
Authority
CN
China
Prior art keywords
medical image
segmented
feature map
level
segmentation
Prior art date
Legal status
Active
Application number
CN201910082092.5A
Other languages
Chinese (zh)
Other versions
CN109872306A (en)
Inventor
陈雄涛
马锴
郑冶枫
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910082092.5A
Publication of CN109872306A
Application granted
Publication of CN109872306B


Landscapes

  • Image Analysis (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The embodiments of the invention disclose a medical image segmentation method, a medical image segmentation device, and a storage medium. After a medical image to be segmented is obtained, feature extraction can be performed on it through a residual network to obtain a high-level feature map and a plurality of low-level feature maps. On one hand, the high-level feature map is identified with different receptive fields to obtain local feature information; on the other hand, the correlation between each pixel point in the high-level feature map and the high-level feature map is calculated to obtain global feature information. The local feature information and the global feature information are then fused, and a target region and a target object in the medical image to be segmented are segmented based on the fused high-level feature map and the plurality of low-level feature maps. This scheme can improve the accuracy of detail segmentation and the overall segmentation effect of the medical image.

Description

Medical image segmentation method, device and storage medium
Technical Field
The invention relates to the technical field of communication, in particular to a medical image segmentation method, a medical image segmentation device and a storage medium.
Background
With the development of Artificial Intelligence (AI), AI is being used ever more widely in the medical field. Taking medical image segmentation as an example, a convolutional neural network is generally adopted to extract high-level features of the image to be segmented; then networks such as the Pyramid Scene Parsing Network (PSPNet) and the DeepLab segmentation model respectively apply pooling operations and atrous (dilated) convolution operations of different scales to the extracted high-level features so as to fuse high-level and low-level features, and accordingly segment a target object from the image to be segmented, for example, segmenting a liver cancer region from a magnetic resonance image of the liver, and the like.
However, in the course of research and practice on the prior art, the inventors of the present invention found that the high-level features extracted by existing schemes have very limited feature expression capability, which results in low accuracy of detail segmentation and greatly affects the overall segmentation effect of the medical image.
Disclosure of Invention
The embodiments of the invention provide a medical image segmentation method, a medical image segmentation device, and a storage medium, which can improve the accuracy of detail segmentation and the overall segmentation effect of the medical image.
A medical image segmentation method, comprising:
acquiring a medical image to be segmented;
extracting features of the medical image to be segmented through a residual network to obtain a high-level feature map and a plurality of low-level feature maps;
identifying the high-level feature map by adopting different receptive fields to obtain local feature information;
calculating the correlation between each pixel point in the high-level feature map and the high-level feature map to obtain global feature information;
fusing the local feature information and the global feature information to obtain a fused high-level feature map;
and respectively segmenting a target region and a target object in the medical image to be segmented based on the plurality of low-level feature maps and the fused high-level feature map to obtain a segmentation result.
Correspondingly, an embodiment of the present invention provides a medical image segmentation apparatus, including:
an acquisition unit for acquiring a medical image to be segmented;
the extraction unit is used for extracting features of the medical image to be segmented through a residual network to obtain a high-level feature map and a plurality of low-level feature maps;
the identification unit is used for identifying the high-level feature map by adopting different receptive fields to obtain local feature information;
the calculation unit is used for calculating the correlation between each pixel point in the high-level feature map and the high-level feature map to obtain global feature information;
the fusion unit is used for fusing the local feature information and the global feature information to obtain a fused high-level feature map;
and the segmentation unit is used for segmenting the target region and the target object in the medical image to be segmented respectively based on the plurality of low-level feature maps and the fused high-level feature map to obtain a segmentation result.
Optionally, in some embodiments, the segmentation unit is specifically configured to predict the type of each pixel point in the medical image to be segmented according to the plurality of low-level feature maps and the fused high-level feature map by using a trained image segmentation model, and segment the target region and the target object in the medical image to be segmented based on the predicted types to obtain a segmentation result.
Optionally, in some embodiments, the trained image segmentation model includes a region segmentation network and an object segmentation network, then:
the segmentation unit may be specifically configured to predict the type of each pixel point in the medical image to be segmented according to the plurality of low-level feature maps and the fused high-level feature map by using the region segmentation network to obtain a first pixel type, and segment the target region in the medical image to be segmented based on the predicted first pixel type; and to predict the type of each pixel point in the medical image to be segmented according to the plurality of low-level feature maps and the fused high-level feature map by using the object segmentation network to obtain a second pixel type, and segment the target object in the medical image to be segmented based on the predicted second pixel type.
Optionally, in some embodiments, the segmentation unit may be specifically configured to, through a preset attention skip layer, respectively assign corresponding weights to the plurality of low-level feature maps according to the position of each low-level feature map in the medical image to be segmented to obtain a plurality of first weighted low-level feature maps, and predict the type of each pixel point in the medical image to be segmented according to the plurality of first weighted low-level feature maps and the fused high-level feature map by using the region segmentation network to obtain the first pixel type.
Optionally, in some embodiments, the segmentation unit may be specifically configured to, based on the predicted first pixel type, screen pixel points that meet a pixel type of the target region from the medical image to be segmented to obtain a candidate pixel set corresponding to the target region, determine boundary points of the target region in the medical image to be segmented according to the candidate pixel set corresponding to the target region, and segment the medical image to be segmented based on the boundary points of the target region in the medical image to be segmented to obtain the target region.
Optionally, in some embodiments, the segmentation unit may be specifically configured to, through the preset attention skip layer, assign corresponding weights to the plurality of low-level feature maps according to the position of each low-level feature map in the medical image to be segmented and its position in the target region to obtain a plurality of second weighted low-level feature maps, and predict the type of each pixel point in the medical image to be segmented according to the plurality of second weighted low-level feature maps and the fused high-level feature map by using the object segmentation network, so as to obtain the second pixel type.
Optionally, in some embodiments, the segmentation unit may be specifically configured to, based on the predicted second pixel type, screen pixel points that meet a pixel type of the target object from the medical image to be segmented to obtain a candidate pixel set corresponding to the target object, determine boundary points of the target object in the medical image to be segmented according to the candidate pixel set corresponding to the target object, and segment the medical image to be segmented based on the boundary points of the target object in the medical image to be segmented to obtain the target object.
Optionally, in some embodiments, the trained image segmentation model further includes a local pyramid attention network, where the local pyramid attention network includes a plurality of convolution layers with different receptive fields, and the identifying unit may be specifically configured to identify the high-level feature map through the local pyramid attention network to obtain local feature information.
Optionally, in some embodiments, the identification unit may be specifically configured to perform feature screening on the high-level feature map through the local pyramid attention network to obtain local feature maps of multiple scales, and perform weighted summation on the local feature maps of multiple scales to obtain local feature information.
Optionally, in some embodiments, the trained image segmentation model further includes a global attention network, where the global attention network includes a plurality of parallel convolutional layers with different channel sizes, and the computing unit is specifically configured to compute, through the global attention network, the correlation between each pixel point in the high-level feature map and the high-level feature map to obtain global feature information.
Optionally, in some embodiments, the convolutional layers in the global attention network include a first convolutional layer, a second convolutional layer and a third convolutional layer, then:
the calculation unit may be specifically configured to calculate, through the first convolution layer and the second convolution layer, a correlation between each pixel point in the high-level feature map, and calculate, by using a third convolution layer, a correlation between each pixel point in the high-level feature map and the high-level feature map according to the correlation between each pixel point, so as to obtain global feature information.
The calculation unit may be specifically configured to normalize the correlation between the pixel points to obtain an attention map corresponding to the high-level feature map, perform convolution processing on the high-level feature map through a third convolution layer to obtain an initial feature map, and perform pixel-by-pixel multiplication on the attention map and the initial feature map to obtain global feature information.
Optionally, in some embodiments, the medical image segmentation apparatus may further include an acquisition unit and a training unit, as follows:
an acquisition unit for acquiring a medical image sample, the medical image sample being labeled with a target region and a target object;
and the training unit is used for utilizing an image segmentation model to segment a target area and a target object in the medical image sample respectively to obtain a predicted segmentation result, and converging the image segmentation model according to the label of the medical image sample and the predicted segmentation result through an interpolation loss function to obtain a trained image segmentation model.
Optionally, in some embodiments, the training unit may be specifically configured to adjust, by using a cross entropy loss function, a parameter for pixel classification in the image segmentation model according to the label of the medical image sample and the predicted segmentation result, and adjust, by using an interpolation loss function, a parameter for segmentation in the image segmentation model according to the label of the medical image sample and the predicted segmentation result, to obtain the trained image segmentation model.
In addition, the embodiment of the present invention further provides a storage medium, where the storage medium stores a plurality of instructions, and the instructions are suitable for being loaded by a processor to perform the steps in any one of the medical image segmentation methods provided by the embodiments of the present invention.
After a medical image to be segmented is obtained, feature extraction can be performed on it through a residual network to obtain a high-level feature map and a plurality of low-level feature maps. On one hand, the high-level feature map is identified with different receptive fields to obtain local feature information; on the other hand, the correlation between each pixel point in the high-level feature map and the high-level feature map is calculated to obtain global feature information. The local feature information and the global feature information are then fused, and a target region and a target object in the medical image to be segmented are segmented based on the fused high-level feature map and the plurality of low-level feature maps. Because this scheme takes global feature information (global context information) into account alongside local feature information when processing the high-level features, it can better model the dependency relationships from local regions to the whole image, improving the feature expression capability of the high-level features; therefore, the accuracy of detail segmentation can be improved, and the overall segmentation effect of the medical image is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a scene schematic diagram of a medical image segmentation method provided by an embodiment of the invention;
FIG. 2 is a flow chart of a medical image segmentation method provided by an embodiment of the invention;
FIG. 3 is an exemplary diagram of a trained image segmentation model provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of a receptive field in a medical image segmentation method provided by an embodiment of the invention;
FIG. 5 is a block diagram of a global-local attention module according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a local pyramid attention network according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a global attention network according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of an attention jump layer provided by an embodiment of the present invention;
FIG. 9 is a diagram illustrating an exemplary structure of an image segmentation model according to an embodiment of the present invention;
FIG. 10 is a scene diagram illustrating enhancing continuity of predicted frames in a medical image segmentation method provided by an embodiment of the present invention;
FIG. 11 is a schematic flow chart of a medical image segmentation method provided by an embodiment of the invention;
FIG. 12 is a schematic diagram of pancreatic cancer (tumor) segmentation provided by an embodiment of the present invention;
FIG. 13 is a schematic diagram of brain tumor segmentation provided by an embodiment of the present invention;
FIG. 14 is a schematic structural diagram of a medical image segmentation apparatus provided by an embodiment of the present invention;
FIG. 15 is a schematic diagram of another structure of a medical image segmentation apparatus provided by an embodiment of the present invention;
FIG. 16 is a schematic structural diagram of a network device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a medical image segmentation method, a medical image segmentation device and a storage medium. The medical image segmentation apparatus may be integrated in a network device, and the network device may be a server or a terminal.
Image segmentation refers to the technique and process of dividing an image into regions with specific, unique properties and presenting the objects of interest. In the embodiments of the present invention, segmentation is mainly performed on medical images to find a desired target region and target object, for example, segmenting a liver region and a liver cancer region from a medical image; the segmented target region and target object may subsequently be analyzed by medical staff or other medical experts for further operations.
For example, referring to fig. 1, taking the case where the medical image segmentation apparatus is integrated in a network device: after acquiring a medical image captured by a medical image acquisition device, the network device may determine a medical image to be segmented from the received medical image and perform feature extraction on it through a residual network to obtain a high-level feature map and a plurality of low-level feature maps. On one hand, the high-level feature map is identified with different receptive fields to obtain local feature information; on the other hand, the correlation between each pixel point in the high-level feature map and the high-level feature map is calculated to obtain global feature information. The local feature information and the global feature information are then fused to obtain a fused high-level feature map, and a target region and a target object in the medical image to be segmented are segmented respectively based on the plurality of low-level feature maps and the fused high-level feature map to obtain a segmentation result.
The following are detailed below. It should be noted that the following description of the embodiments is not intended to limit the preferred order of the embodiments.
In this embodiment, the description will be made from the perspective of a medical image segmentation apparatus, which may be specifically integrated in a network device; the network device may be a server or a terminal, and the terminal may include a tablet computer, a notebook computer, a personal computer (PC), and other devices.
A medical image segmentation method includes: acquiring a medical image to be segmented; performing feature extraction on the medical image to be segmented through a residual network to obtain a high-level feature map and a plurality of low-level feature maps; on one hand, identifying the high-level feature map with different receptive fields to obtain local feature information, and on the other hand, calculating the correlation between each pixel point in the high-level feature map and the high-level feature map to obtain global feature information; then fusing the local feature information and the global feature information to obtain a fused high-level feature map; and segmenting a target region and a target object in the medical image to be segmented respectively based on the plurality of low-level feature maps and the fused high-level feature map to obtain a segmentation result.
As shown in fig. 2, the specific flow of the medical image segmentation method is as follows:
101. and acquiring a medical image to be segmented.
For example, the medical image segmentation apparatus may specifically receive a medical image transmitted by a medical image acquisition device and determine the medical image to be segmented from the received image, where the medical image is captured by a medical image acquisition device such as a magnetic resonance imaging (MRI) scanner, a computed tomography (CT) scanner, a colposcope, or an endoscope, and then provided to the medical image segmentation apparatus.
For example, if the medical image is a two-dimensional image, the two-dimensional image may be determined as a medical image to be segmented, and if the medical image is a three-dimensional image, the three-dimensional image may be divided into a plurality of single-frame slices (slices for short), where each slice is the medical image to be segmented.
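As a concrete illustration of this step, the following is a minimal sketch of splitting a three-dimensional medical image into single-frame slices; it assumes the volume has already been loaded as a NumPy array, and all names are illustrative rather than taken from the patent.

```python
import numpy as np

def volume_to_slices(volume: np.ndarray):
    """Split a 3D medical image (H x W x D) into single-frame 2D slices;
    each slice along the depth (Z) axis is one medical image to be segmented."""
    return [volume[:, :, z] for z in range(volume.shape[2])]

# Example: a volume of 128 x 128 pixels with 32 frames yields 32 slices
volume = np.zeros((128, 128, 32), dtype=np.float32)
slices = volume_to_slices(volume)
```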
The medical image to be segmented refers to a medical image that needs to be segmented, and the medical image refers to an image obtained by image-capturing a certain component of a living body, such as the brain, heart, spleen, stomach, or vagina of a human body, by a medical image capturing device. The living body is an independent individual which has a life form and can correspondingly reflect external stimuli, such as a human, a cat or a dog.
102. And performing feature extraction on the medical image to be segmented through a residual network to obtain a high-level feature map and a plurality of low-level feature maps.
The high-level feature map is the feature map finally output by the residual network, and "high-level features" may generally include information related to categories, high-level abstractions, and the like. A low-level feature map refers to a feature map obtained by the residual network in the process of extracting features of the medical image to be segmented, and may generally include image details such as edges and textures. For example, taking a residual network that includes a plurality of serially connected residual modules, the high-level feature map is the feature map output by the last residual module, and the low-level feature maps are the feature maps output by the residual modules other than the first and the last.
For example, as shown in fig. 3, if the residual network includes residual block 1 (Block1), residual block 2 (Block2), residual block 3 (Block3), residual block 4 (Block4), and residual block 5 (Block5), the feature map output by Block5 is the high-level feature map, and the feature maps output by Block2, Block3, and Block4 are low-level feature maps.
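As a rough illustration, the following PyTorch-style sketch chains five residual modules and returns the outputs as described above; the module design, strides, and channel widths are assumptions, not the patent's actual configuration.

```python
import torch
import torch.nn as nn

class ResidualModule(nn.Module):
    """A simplified residual module; kernel sizes, strides and channel
    widths here are assumed, not taken from the patent."""
    def __init__(self, in_ch, out_ch, stride=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch))
        self.skip = nn.Conv2d(in_ch, out_ch, 1, stride=stride)  # shortcut branch

    def forward(self, x):
        return torch.relu(self.body(x) + self.skip(x))

class ResidualBackbone(nn.Module):
    """Five chained residual modules (Block1..Block5): the output of Block5
    is the high-level feature map, and the outputs of Block2..Block4 are
    the low-level feature maps, as in fig. 3."""
    def __init__(self, in_ch=1):
        super().__init__()
        chs = [in_ch, 64, 128, 256, 512, 1024]   # assumed channel widths
        self.blocks = nn.ModuleList(
            [ResidualModule(chs[i], chs[i + 1]) for i in range(5)])

    def forward(self, x):
        low_level = []
        for i, block in enumerate(self.blocks):
            x = block(x)
            if 1 <= i <= 3:              # Block2, Block3, Block4
                low_level.append(x)
        return x, low_level              # high-level map, low-level maps
```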
103. And identifying the high-level feature map by adopting different receptive fields to obtain local feature information.
In a convolutional neural network, the receptive field determines the size of the input-layer region corresponding to one element in the output of a certain layer. That is, the receptive field is the size of the mapping, on the input image, of an element of the feature map output by a certain layer of the convolutional neural network; see fig. 4 for an example. Typically, the receptive field of a pixel in the feature map output by the first convolutional layer (e.g., C1) equals the size of the convolution kernel, while the receptive field of a higher convolutional layer (e.g., C4) depends on the kernel sizes and strides of all previous layers; therefore, information at different levels can be captured with different receptive fields, achieving the purpose of extracting feature information at different scales.
Optionally, a local pyramid attention network may be established, and different receptive fields are respectively set for the multilayer convolution layers of the local pyramid attention network, so that the local pyramid attention network may be used to process the high-level feature map to obtain local feature information; that is, the step of "identifying the high-level feature map by using different receptive fields to obtain local feature information" may include:
and identifying the high-level feature map through the local pyramid attention network to obtain local feature information. Wherein, the local pyramid attention network comprises a plurality of layers of convolution layers with different receptive fields.
For example, the high-level feature map may be specifically subjected to feature screening through the local pyramid attention network to obtain local feature maps of multiple scales, and then the local feature maps of the multiple scales are subjected to weighted summation to obtain local feature information.
For example, referring to fig. 5 and fig. 6, take the case where the convolutional layers of the local pyramid attention network are Conv1, Conv2, and Conv3. The high-level feature map is convolved by Conv1, Conv2, and Conv3 in sequence, where the convolution kernels of Conv1, Conv2, and Conv3 may all be determined according to the requirements of the practical application, for example, all set to 3 × 3, and the receptive fields of Conv1, Conv2, and Conv3 expand progressively (that is, receptive field of Conv3 > receptive field of Conv2 > receptive field of Conv1). Each convolutional layer can thus be used to generate an attention map (Attention Map) of a specific scale and to screen features of the corresponding scale from the high-level feature map according to the generated attention map, yielding local feature maps of a plurality of scales: Conv1 generates local feature map 1, Conv2 generates local feature map 2, Conv3 generates local feature map 3, and so on. The local feature maps of the multiple scales can then be weighted and summed to obtain the local feature information.
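The following sketch shows one plausible reading of this local pyramid attention network; the use of chained 3 × 3 convolutions to grow the receptive field, a sigmoid to form each attention map, and learnable summation weights are all assumptions.

```python
import torch
import torch.nn as nn

class LocalPyramidAttention(nn.Module):
    """Sketch of the local pyramid attention network: Conv1 -> Conv2 -> Conv3
    are chained 3x3 convolutions, so the receptive field grows at each stage
    (Conv3 > Conv2 > Conv1); each stage yields an attention map that screens
    features of the corresponding scale from the high-level feature map."""
    def __init__(self, channels, num_scales=3):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1)
             for _ in range(num_scales)])
        # Learnable weights for the final weighted summation (assumed learnable)
        self.scale_weights = nn.Parameter(torch.ones(num_scales))

    def forward(self, high_level):
        local_maps, x = [], high_level
        for conv in self.convs:
            x = conv(x)                            # receptive field expands
            attn = torch.sigmoid(x)                # attention map of this scale
            local_maps.append(attn * high_level)   # local feature map k
        w = torch.softmax(self.scale_weights, dim=0)
        # Weighted summation of the multi-scale local feature maps
        return sum(wi * m for wi, m in zip(w, local_maps))
```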
104. And calculating the correlation between each pixel point in the high-level feature map and the high-level feature map to obtain global feature information.
For example, a global attention network may be established, and convolutional layers with different channel sizes are set in the global attention network, so that at this time, the correlation between each pixel point in the high-level feature map and the high-level feature map may be calculated through the global attention network to obtain global feature information, for example, see fig. 5.
The correlation refers to a degree of association between two variables, that is, the degree of association between each pixel point in the high-level feature map and the high-level feature map can be calculated through the global attention network to obtain global feature information.
For example, taking an example that the convolutional layers in the global attention network include a first convolutional layer, a second convolutional layer and a third convolutional layer, in this case, the step "calculating the correlation between each pixel point in the high-level feature map and the high-level feature map through the global attention network to obtain global feature information" may include:
calculating the correlation between each pixel point in the high-level feature map through the first convolution layer and the second convolution layer; and calculating the correlation between each pixel point in the high-level feature map and the high-level feature map according to the correlation between the pixel points by using the third convolution layer to obtain global feature information.
Wherein, the step of calculating the correlation between each pixel point in the high-level feature map and the high-level feature map according to the correlation between the pixel points by using the third convolution layer to obtain the global feature information may include:
normalizing the correlation among the pixel points to obtain an attention map corresponding to the high-level feature map; performing convolution processing on the high-level feature map through a third convolution layer to obtain an initial feature map; and multiplying the attention map and the initial feature map pixel by pixel to obtain global feature information.
For example, as shown in fig. 7, suppose the global attention network includes three convolutional layers: f, g, and h. The correlations between the pixel points in the high-level feature map are calculated through f and g, and the correlations are normalized to obtain the attention map corresponding to the high-level feature map; the attention map is then applied to the high-level feature map to obtain features attended on the basis of global context, that is, the global feature information corresponding to the high-level feature map.
The attention weight att_{j,i} between any two pixel points (i.e., positions) j and i in the attention map corresponding to the high-level feature map can be calculated as follows:

att_{j,i} = exp(s_{i,j}) / Σ_{i=1}^{N} exp(s_{i,j}),

where s_{i,j} = f(x_i)^T g(x_j), and N is the number of pixel points in the high-level feature map.
Optionally, there are multiple ways of applying the attention map to the high-level feature map. For example, the high-level feature map may be convolved by h to obtain an initial feature map, and the attention map and the initial feature map may then be multiplied pixel by pixel to obtain the global feature information corresponding to the high-level feature map; expressed as a formula:

O_j = Σ_{i=1}^{N} att_{j,i} h(x_i),

where O_j is the global feature information corresponding to pixel point j, and f(x), g(x), and h(x) are the functions corresponding to the network structures of f, g, and h respectively, with:
f(x) = W_f x;
g(x) = W_g x;
h(x) = W_h x;
where W_f, W_g, and W_h are the network parameters of f, g, and h, which may be determined according to the requirements of the actual application; for example, the convolution kernels of their convolutional layers may each be set to 1 × 1.
By computing O_j with the above formula for every pixel point, the global feature information corresponding to each pixel point in the high-level feature map is obtained; collecting the global feature information corresponding to all pixel points of the high-level feature map yields the global feature information corresponding to the high-level feature map.
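The following is a minimal PyTorch sketch implementing the formulas above, with f, g, and h as 1 × 1 convolutions; the channel reduction in f and g is an assumption made for efficiency, not something the text specifies.

```python
import torch
import torch.nn as nn

class GlobalAttention(nn.Module):
    """Sketch of the global attention network following the formulas above:
    s_{i,j} = f(x_i)^T g(x_j), att = softmax over i, O_j = sum_i att_{j,i} h(x_i)."""
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Conv2d(channels, channels // 8, 1)   # W_f
        self.g = nn.Conv2d(channels, channels // 8, 1)   # W_g
        self.h = nn.Conv2d(channels, channels, 1)        # W_h

    def forward(self, x):
        b, c, height, width = x.shape
        n = height * width                               # N pixel points
        fx = self.f(x).view(b, -1, n)                    # B x C' x N
        gx = self.g(x).view(b, -1, n)                    # B x C' x N
        s = torch.bmm(fx.transpose(1, 2), gx)            # s[i, j] = f(x_i)^T g(x_j)
        att = torch.softmax(s, dim=1)                    # att[i, j] = att_{j,i}
        hx = self.h(x).view(b, c, n)                     # initial feature map
        o = torch.bmm(hx, att)                           # O_j = sum_i att_{j,i} h(x_i)
        return o.view(b, c, height, width)               # global feature information
```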
It should be noted that, for convenience of description, in the embodiment of the present invention, the local pyramid attention network and the global attention network are collectively referred to as "global-local attention module", see fig. 3; i.e. the structure of the global-local attention module, can be seen in particular in fig. 5.
It should be noted that steps 103 and 104 may be executed in any order.
105. And fusing the local feature information and the global feature information to obtain a fused high-level feature map.
For example, as shown in fig. 5, the high-level feature map, the local feature information corresponding to the high-level feature map, and the global feature information corresponding to the high-level feature map may be subjected to weighted summation to obtain the fused high-level feature map.
The weight of the high-level feature map, the weight of the local feature information corresponding to the high-level feature map, and the weight of the global feature information corresponding to the high-level feature map may be flexibly set and adjusted according to the requirements of practical applications, which is not described herein again.
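A minimal sketch of this fusion step, assuming simple scalar weights (which, as noted above, may be flexibly set or adjusted per application):

```python
def fuse_features(high_level, local_info, global_info, w=(1.0, 1.0, 1.0)):
    """Weighted summation of the high-level feature map with its local and
    global feature information (all tensors of the same shape); the weights
    here are placeholders to be set or learned per application."""
    return w[0] * high_level + w[1] * local_info + w[2] * global_info
```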
106. And segmenting a target region and a target object in the medical image to be segmented respectively based on the plurality of low-level feature maps and the fused high-level feature map to obtain a segmentation result.
For example, a trained image segmentation model may be specifically used, the type of each pixel point in the medical image to be segmented is predicted according to the plurality of low-level feature maps and the fused high-level feature map, and the target region and the target object in the medical image to be segmented are segmented based on the predicted type to obtain a segmentation result.
The target region includes the target object, and both the target region and the target object may be set according to the requirements of practical applications. For example, if the target object is the left atrium, the target region may be set as the heart; if the target object is the duodenum, the target region may be set as the stomach; if the target object is a liver cancer, the target region is the liver; if the target object is a brain cancer, the target region is the brain; and so on. Optionally, there may be a plurality of target objects, where each target object corresponds to a region of a certain area within the target region and an inclusion relationship exists between the regions of the target objects; for example, if the target region is the chest region, the first target object may be the heart (located in the chest region), while the second target object may be the left atrium (part of the heart), and so on. For convenience of description, the embodiments of the present invention use a single target object as an example.
The structure of the trained image segmentation model may be determined according to the requirements of practical applications. For example, it may include a region segmentation network and an object segmentation network arranged in parallel, where each region segmentation network corresponds to one target region and each object segmentation network corresponds to one target object; if a plurality of target objects need to be segmented, a plurality of parallel object segmentation networks may be set, and the plurality of target objects must then be in a hierarchically descending relationship, for example, target object 1 includes target object 2, target object 2 includes target object 3, and so on.
Taking the example that the trained image segmentation model includes a region segmentation network and an object segmentation network, the step of "using the trained image segmentation model to predict the type of each pixel point in the medical image to be segmented according to the plurality of low-level feature maps and the fused high-level feature map, and segmenting the target region and the target object in the medical image to be segmented based on the predicted types" may include:
(1) Predicting the type of each pixel point in the medical image to be segmented according to the plurality of low-level feature maps and the fused high-level feature map by using the region segmentation network to obtain a first pixel type, and segmenting the target region in the medical image to be segmented based on the predicted first pixel type.
For example, each low-level feature map may be directly fused with the output of the corresponding convolutional layer in the region segmentation network (for example, by simple feature addition or channel concatenation), and the fused result used as the input of the next convolutional layer after the corresponding one, so as to predict the type of each pixel point in the medical image to be segmented and segment the target region based on the types.
For example, referring to fig. 3 (ignoring the attention skip layers therein), the region segmentation network may include convolutional layers H1_5, H1_4, H1_3, H1_2, and H1_1. If the low-level feature map output by residual module 2 is "low-level feature map 2", that output by residual module 3 is "low-level feature map 3", and that output by residual module 4 is "low-level feature map 4", then after the high-level feature map is processed by H1_5, low-level feature map 4 and the output of H1_5 may be directly fused to serve as the input of H1_4; low-level feature map 3 is then fused with the output of H1_4 to serve as the input of H1_3; similarly, low-level feature map 2 may be fused with the output of H1_3 to serve as the input of H1_2, and so on, with the target region finally output by H1_1.
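The following sketch illustrates this progressive fusion for the region segmentation network, ignoring the attention skip layers as in the description above; fusion by addition, bilinear upsampling between stages, and the channel widths are assumptions consistent with the text's "simple feature addition or channel stitching".

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegionSegmentationNetwork(nn.Module):
    """Sketch of the region segmentation network of fig. 3 without the
    attention skip layers; channel widths match the backbone sketch above."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.h1_5 = nn.Conv2d(1024, 512, 3, padding=1)   # H1_5
        self.h1_4 = nn.Conv2d(512, 256, 3, padding=1)    # H1_4
        self.h1_3 = nn.Conv2d(256, 128, 3, padding=1)    # H1_3
        self.h1_2 = nn.Conv2d(128, 128, 3, padding=1)    # H1_2
        self.h1_1 = nn.Conv2d(128, num_classes, 1)       # H1_1 -> pixel types

    @staticmethod
    def up_to(x, ref):
        # Bring x to the spatial size of the low-level map it is fused with
        return F.interpolate(x, size=ref.shape[2:], mode='bilinear',
                             align_corners=False)

    def forward(self, high, low2, low3, low4):
        x = self.h1_5(high)
        x = self.h1_4(self.up_to(x, low4) + low4)   # fuse low-level feature map 4
        x = self.h1_3(self.up_to(x, low3) + low3)   # fuse low-level feature map 3
        x = self.h1_2(self.up_to(x, low2) + low2)   # fuse low-level feature map 2
        return self.h1_1(x)                         # per-pixel class scores
```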
Optionally, not all low-level features contribute to a specific segmentation task; that is, different features act with different weights in a specific task. To effectively assign different features their due importance, so that the features can be better utilized to improve the accuracy of image segmentation, an attention mechanism can be used to let the network automatically assign different weights to low-level feature maps at different positions, so that the network can selectively fuse the low-level features. In the embodiments of the present invention, the module that uses an attention mechanism to assign corresponding weights to the low-level feature maps according to their positions in the medical image to be segmented, and then fuses them with the outputs of the corresponding convolutional layers in the region segmentation network, is referred to as an "attention skip layer". That is to say, the step of "using the region segmentation network to predict the type of each pixel point in the medical image to be segmented according to the plurality of low-level feature maps and the fused high-level feature map to obtain the first pixel type" may include:
respectively assigning, through a preset attention skip layer, corresponding weights to the plurality of low-level feature maps according to the position of each low-level feature map in the medical image to be segmented to obtain a plurality of first weighted low-level feature maps; and predicting the type of each pixel point in the medical image to be segmented according to the plurality of first weighted low-level feature maps and the fused high-level feature map by using the region segmentation network to obtain the first pixel type.
The target region in the medical image to be segmented may then be segmented based on the predicted first pixel type. For example, pixel points matching the pixel type of the target region may be screened from the medical image to be segmented to obtain a candidate pixel set corresponding to the target region; the boundary points of the target region in the medical image to be segmented are then determined from this candidate pixel set, and the medical image to be segmented is segmented based on these boundary points to obtain the target region, and so on.
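A minimal sketch of this screening-and-boundary step; the integer label convention is assumed, and a bounding box is used as a simple stand-in for the boundary points.

```python
import numpy as np

def segment_target_region(pred_types: np.ndarray, region_label: int):
    """Screen pixel points matching the target region's pixel type and derive
    the region from their boundary. pred_types is an H x W array of predicted
    first pixel types; region_label is the (assumed) label of the region."""
    mask = pred_types == region_label            # candidate pixel set
    ys, xs = np.nonzero(mask)                    # candidate pixel coordinates
    if ys.size == 0:
        return mask, None                        # no target region found
    boundary = (ys.min(), ys.max(), xs.min(), xs.max())
    return mask, boundary
```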
For example, as shown in fig. 3, after the high-level feature map is processed by H1_5, an attention skip layer may be used to assign corresponding weights to low-level feature map 4 according to its position in the medical image to be segmented to obtain first weighted low-level feature map 4, which is then fused (e.g., by feature addition or channel concatenation) with the output of H1_5 to serve as the input of H1_4. Similarly, an attention skip layer may be used to assign corresponding weights to low-level feature map 3 according to its position in the medical image to be segmented to obtain first weighted low-level feature map 3, which is then fused with the output of H1_4 to serve as the input of H1_3; an attention skip layer assigns corresponding weights to low-level feature map 2 according to its position in the medical image to be segmented to obtain first weighted low-level feature map 2, which is fused with the output of H1_3 to serve as the input of H1_2; and so on, with the target region finally output by H1_1.
(2) Predicting the type of each pixel point in the medical image to be segmented according to the plurality of low-level feature maps and the fused high-level feature map by using the object segmentation network to obtain a second pixel type, and segmenting the target object in the medical image to be segmented based on the predicted second pixel type.
For example, each low-level feature map may be directly fused with the output of the corresponding convolutional layer in the object segmentation network and then used as the input of the next convolutional layer after the corresponding one, so as to predict the type of each pixel point in the medical image to be segmented and segment the target object based on the types.
For example, referring to fig. 3 (ignoring the attention skip layers therein), the object segmentation network may include convolutional layers H2_5, H2_4, H2_3, H2_2, and H2_1. With low-level feature maps 2, 3, and 4 defined as above, after the high-level feature map is processed by H2_5, low-level feature map 4 and the output of H2_5 may be directly fused to serve as the input of H2_4; low-level feature map 3 is then fused with the output of H2_4 to serve as the input of H2_3; similarly, low-level feature map 2 may be fused with the output of H2_3 to serve as the input of H2_2, and so on, with the target object finally output by H2_1.
Optionally, not all low-level features are effective for a specific segmentation task; that is, different features carry different weights in a specific task, and some feature classes are not simply a binary matter but have certain correlations with one another. Taking the segmentation of the liver and liver cancer as an example, liver cancer is located inside the liver organ, so accurately identifying the liver is undoubtedly crucial to further improving the localization of the liver cancer. Therefore, to effectively assign different features their due importance, so that the features can be better utilized and the accuracy of image segmentation improved, corresponding weights may further be assigned to the plurality of low-level feature maps according to the position of each low-level feature map in the target region, on top of the weights assigned according to their positions in the medical image to be segmented, before fusing them with the outputs of the corresponding convolutional layers in the object segmentation network. That is, the step of "using the object segmentation network to predict the type of each pixel point in the medical image to be segmented according to the plurality of low-level feature maps and the fused high-level feature map to obtain the second pixel type" may include:
respectively assigning, through a preset attention skip layer, corresponding weights to the plurality of low-level feature maps according to the position of each low-level feature map in the medical image to be segmented and its position in the target region to obtain a plurality of second weighted low-level feature maps; and predicting the type of each pixel point in the medical image to be segmented according to the plurality of second weighted low-level feature maps and the fused high-level feature map by using the object segmentation network to obtain the second pixel type.
The target object in the medical image to be segmented may then be segmented based on the predicted second pixel type. For example, pixel points matching the pixel type of the target object are screened from the medical image to be segmented to obtain a candidate pixel set corresponding to the target object; the boundary points of the target object in the medical image to be segmented are determined from this candidate pixel set, and the medical image to be segmented is segmented based on these boundary points to obtain the target object.
For example, as shown in fig. 3, after the high-level feature map is processed by H2_5, an attention skip layer may be used to assign corresponding weights to low-level feature map 4 according to its position in the medical image to be segmented to obtain first weighted low-level feature map 4; another attention skip layer then assigns corresponding weights to first weighted low-level feature map 4 according to the position of low-level feature map 4 in the target region to obtain second weighted low-level feature map 4, which is fused (e.g., by feature addition or channel concatenation) with the output of H2_5 to serve as the input of H2_4. Similarly, an attention skip layer may assign corresponding weights to low-level feature map 3 according to its position in the medical image to be segmented to obtain first weighted low-level feature map 3, and another attention skip layer assigns corresponding weights to first weighted low-level feature map 3 according to the position of low-level feature map 3 in the target region to obtain second weighted low-level feature map 3, which is fused with the output of H2_4 to serve as the input of H2_3. In the same way, second weighted low-level feature map 2 is obtained and fused with the output of H2_3 to serve as the input of H2_2, and so on, with the target object finally output by H2_1.
Optionally, if there are multiple target objects, the position of each low-level feature map in the previous target object (i.e., the target object preceding the current one, which includes the current target object) may also be used as one of the weighting factors when assigning weights to the plurality of low-level feature maps. For example, suppose there are two target objects, one being the "whole tumor region" and the other the "tumor core"; corresponding object segmentation networks, namely a "whole tumor region" segmentation network and a "tumor core" segmentation network, can be set for the two target objects. Through attention skip layers, corresponding weights are assigned to the plurality of low-level feature maps according to the position of each low-level feature map in the medical image to be segmented and its position in the target region to obtain a plurality of second weighted low-level feature maps 1; the whole-tumor-region segmentation network then predicts the type of each pixel point in the medical image to be segmented according to the plurality of second weighted low-level feature maps 1 and the fused high-level feature map to obtain second pixel type 1, and the whole tumor region in the medical image to be segmented is segmented based on the predicted second pixel type 1. Likewise, through attention skip layers, corresponding weights are assigned to the plurality of low-level feature maps according to the position of each low-level feature map in the medical image to be segmented, its position in the target region, and its position in the first target object (the whole tumor region) to obtain a plurality of second weighted low-level feature maps 2; the tumor-core segmentation network then predicts the type of each pixel point in the medical image to be segmented according to the plurality of second weighted low-level feature maps 2 and the fused high-level feature map to obtain second pixel type 2, and the tumor core in the medical image to be segmented is segmented based on the predicted second pixel type 2; and so on.
It should be noted that the structure of the attention skip layer provided in the embodiments of the present invention may be set according to the requirements of practical applications; for example, it may be implemented with attention mechanisms such as spatial attention (Spatial Attention) and/or channel attention (Channel Attention), where spatial attention emphasizes the importance of different regions on the image plane, and channel attention emphasizes the importance of the feature maps of different channels.
For example, as shown in fig. 8, after a convolutional layer of the segmentation network extracts high-level features, the extracted high-level features may be sent, as guidance information, to a 3 × 3 convolutional layer and a nonlinear unit (e.g., a sigmoid activation function) to obtain a planar weight matrix A_s, and, after global pooling (global pooling) and a 1 × 1 convolution of the high-level features, a channel weight vector A_c. The weights A_s and A_c are then applied (e.g., by matrix dot multiplication) to the low-level feature map, so as to screen its features in the planar space and/or across channels; after this feature screening, a first weighted low-level feature map or a second weighted low-level feature map is obtained.
Afterwards, the screened features may be fused with the extracted high-level features, for example by addition, and sent to a 3 × 3 convolutional layer for convolution; the convolved result is upsampled by bilinear interpolation to obtain a feature matrix at twice the resolution, which is then used as the input of the next 3 × 3 convolutional layer and processed in the same way. This process continues until a segmentation prediction map of the same size as the original image is obtained, from which the required segmentation result, that is, the predicted target region or target object, can be derived.
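Putting these pieces together, the following PyTorch sketch assembles one attention skip layer as described around fig. 8; it assumes the high-level and low-level feature maps have already been brought to the same shape, and the channel counts are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionSkipLayer(nn.Module):
    """Sketch of the attention skip layer of fig. 8: a 3x3 convolution plus
    sigmoid on the high-level features yields the planar weight matrix A_s,
    and global pooling plus a 1x1 convolution yields the channel weight
    vector A_c; both are applied to the low-level feature map."""
    def __init__(self, channels):
        super().__init__()
        self.spatial = nn.Conv2d(channels, 1, 3, padding=1)    # -> A_s
        self.channel = nn.Conv2d(channels, channels, 1)        # -> A_c
        self.fuse = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, high, low):
        a_s = torch.sigmoid(self.spatial(high))                # B x 1 x H x W
        pooled = F.adaptive_avg_pool2d(high, 1)                # global pooling
        a_c = torch.sigmoid(self.channel(pooled))              # B x C x 1 x 1
        screened = low * a_s * a_c                             # feature screening
        x = self.fuse(screened + high)                         # fuse, then 3x3 conv
        # Bilinear upsampling to a feature matrix at twice the resolution
        return F.interpolate(x, scale_factor=2, mode='bilinear',
                             align_corners=False)
```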
Because the attention mechanism enhances the activation values of the features important to a specific segmentation task while ignoring irrelevant features as much as possible, different features can effectively be given their due importance, so that the features can be better utilized and the accuracy of image segmentation improved.
Optionally, after the predicted target area and target object are obtained, the target area and target object may be output for a user, such as a medical staff, to browse for further judgment.
It should be noted that, if the medical image to be segmented is a slice, the target region and target object obtained at this point are those in the slice. Therefore, the target regions and target objects of slices belonging to the same three-dimensional medical image may also be integrated to obtain the target region and target object corresponding to the three-dimensional medical image, that is, a target region and target object in three-dimensional format.
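A one-line sketch of this integration step, assuming the per-slice masks share a common shape and ordering:

```python
import numpy as np

def slices_to_volume(slice_masks):
    """Stack per-slice segmentation results back along the Z axis to obtain
    the target region/object in three-dimensional format (H x W x D)."""
    return np.stack(slice_masks, axis=-1)
```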
In addition, it should be noted that, besides the residual network, the region segmentation network, and the object segmentation network, the trained image segmentation model in the embodiments of the present invention may also include the global-local attention module (i.e., the local pyramid attention network and the global attention network), the attention skip layers, and so on; that is, modules such as the local pyramid attention network, the global attention network, and the attention skip layers may also be regarded as part of the trained image segmentation model. The residual network, local pyramid attention network, global attention network, attention skip layers, and similar modules may be regarded as the encoder part of the trained image segmentation model, used for feature extraction, while the region segmentation network and the object segmentation network may be regarded as the decoder part, used for classifying pixel types and segmenting according to the extracted features; that is, the region segmentation network and the object segmentation network share the same encoder, as shown in fig. 3.
Optionally, the trained image segmentation model may be set by an operation and maintenance person in advance, or may be obtained by self-training of the image segmentation apparatus. Before the step of predicting the type of each pixel point in the medical image to be segmented according to the plurality of low-level characteristic graphs and the fused high-level characteristic graph by using the trained image segmentation model, the image segmentation method may further include:
s1, acquiring a medical image sample, the medical image sample labeling the target region and the target object.
For example, a plurality of medical images may be collected as an original data set, for example, the original data set is obtained from a database or a network, then the medical images in the original data set are preprocessed to obtain images meeting the input standard of a preset segmentation network, and then the preprocessed images are labeled with a target region and a target object, so as to obtain a plurality of medical image samples labeled with the target region and the target object.
The preprocessing may include operations such as deduplication, cropping, rotation, and/or flipping. For example, taking the input size of the preset segmentation network as 128 × 128 × 32 (width × height × depth) as an example, the images in the original data set may be cropped to 128 × 128 × 32; of course, other preprocessing operations may further be performed on these images.
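For instance, the cropping step could be sketched as a simple center crop to the assumed 128 × 128 × 32 input size; the function name, the (width, height, depth) axis order, and the center-crop policy are assumptions for illustration.

```python
# A hedged sketch of cropping a volume to an assumed 128 x 128 x 32 input size.
import numpy as np

def center_crop(volume, target=(128, 128, 32)):
    """Center-crop a (W, H, D) volume to the target size."""
    starts = [(s - t) // 2 for s, t in zip(volume.shape, target)]
    w0, h0, d0 = starts
    tw, th, td = target
    return volume[w0:w0 + tw, h0:h0 + th, d0:d0 + td]
```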
And S2, segmenting the target region and the target object in the medical image sample by using the image segmentation model respectively to obtain a predicted segmentation result.
For example, feature extraction may be performed on a medical image sample through a residual error network to obtain a high-level feature map and a plurality of low-level feature maps corresponding to the medical image sample, then, on one hand, the high-level feature map is identified by using different receptive fields to obtain local feature information, on the other hand, correlation between each pixel point in the high-level feature map and the high-level feature map is calculated to obtain global feature information, then, the local feature information and the global feature information are fused to obtain a fused high-level feature map, and a target region and a target object in the medical image sample are respectively segmented based on the plurality of low-level feature maps and the fused high-level feature map to obtain a predicted segmentation result.
The method for acquiring the local feature information, the global feature information, and the fused high-level feature map, and the method for segmenting the target region and the target object in the medical image sample are the same as the method for processing the image to be segmented, which may be specifically referred to in the foregoing embodiments and will not be described herein again.
And S3, converging the image segmentation model according to the label of the medical image sample and the predicted segmentation result to obtain the trained image segmentation model.
Since a medical image is generally in a three-dimensional data format such as CT or MRI, and may include a plurality of single-frame slices (slices for short), and a plurality of consecutive slices of the same organ tissue often have relatively strong continuity, the image segmentation method provided in the embodiment of the present invention mainly performs independent segmentation on the slices and integrates the slices into a final three-dimensional result, so that in order to avoid discontinuity of the segmentation result in the Z-axis direction, an interpolation loss function may be used to perform convergence to improve the continuity of the segmentation result. That is, the step of "converging the image segmentation model according to the label of the medical image sample and the predicted segmentation result to obtain the trained image segmentation model" may include:
and converging the image segmentation model according to the label of the medical image sample and the predicted segmentation result through an interpolation loss function to obtain the trained image segmentation model. For example, the following may be specifically mentioned:
and adjusting the parameters used for pixel classification in the segmentation network according to the label of the medical image sample and the predicted segmentation result by adopting a Dice loss function, and adjusting the parameters used for segmentation in the segmentation network according to the label of the medical image sample and the predicted segmentation result through the interpolation loss function, to obtain the trained image segmentation model.
Optionally, in order to improve the pixel classification accuracy, in addition to using the Dice function, other loss functions, such as a cross entropy loss function, may also be used for convergence, that is, the step "converging the image segmentation model according to the label of the medical image sample and the predicted segmentation result by using an interpolation loss function, so as to obtain the trained image segmentation model" may also include:
and adjusting parameters for pixel classification in the image segmentation model according to the label of the medical image sample and the predicted segmentation result by adopting a cross entropy loss function, and adjusting the parameters for segmentation in the image segmentation model according to the label of the medical image sample and the predicted segmentation result by adopting an interpolation loss function to obtain the trained image segmentation model.
As can be seen from the above, in this embodiment, after the medical image to be segmented is obtained, feature extraction may be performed on the medical image to be segmented through a residual network to obtain a high-level feature map and a plurality of low-level feature maps, then, on one hand, the high-level feature map is identified by using different receptive fields to obtain local feature information, on the other hand, correlation between each pixel point in the high-level feature map and the high-level feature map is calculated to obtain global feature information, and then, the local feature information and the global feature information are fused, and a target region and a target object in the medical image to be segmented are segmented based on the fused high-level feature map and the plurality of low-level feature maps; according to the scheme, when the high-level features are processed, the global feature information (global context information) is also taken as one of the consideration factors while the local feature information is concerned, so that the dependency relationship from the local region to the global region can be better processed, the feature expression capability of the high-level features is improved, the accuracy of detail segmentation can be improved, and the overall segmentation effect of the medical image is improved.
The method described in the above examples is further illustrated in detail below by way of example.
In this embodiment, the description takes as an example a medical image segmentation apparatus integrated in a network device, with the target region specifically being a liver region and the target object specifically being a liver cancer region.
And (I) training an image segmentation model.
First, the network device may collect a plurality of medical images related to liver and liver cancer, for example, acquire a plurality of medical images related to liver and liver cancer from a database or a network, and then preprocess the medical images, for example, perform operations such as deduplication, cropping, rotation and/or inversion, to obtain images that meet input criteria of a preset segmentation network, and then mark liver and liver cancer portions in the preprocessed images, so as to obtain a plurality of medical image samples marked with liver and liver cancer portions.
Secondly, the network device may input the medical image sample into a preset image segmentation model and perform feature extraction on it through the residual error network, to obtain a high-level feature map and a plurality of low-level feature maps corresponding to the medical image sample. Then, on one hand, the high-level feature map is identified through the local pyramid attention network (comprising multiple convolutional layers with different receptive fields) to obtain the local feature information corresponding to the high-level feature map; on the other hand, the correlation between each pixel point in the high-level feature map and the high-level feature map is calculated through the global attention network to obtain the corresponding global feature information. The local feature information and the global feature information are then fused to obtain a fused high-level feature map, and the target region and the target object in the medical image sample are respectively segmented based on the plurality of low-level feature maps and the fused high-level feature map, to obtain a predicted segmentation result.
Furthermore, the network device converges the image segmentation model according to the label of the medical image sample and the predicted segmentation result by using an interpolation loss function to obtain a trained image segmentation model, for example, a cross entropy loss function may be specifically used to adjust parameters for pixel classification in the image segmentation model according to the label of the medical image sample and the predicted segmentation result, and adjust parameters for segmentation in the image segmentation model according to the label of the medical image sample and the predicted segmentation result by using the interpolation loss function to obtain the trained image segmentation model.
The cross entropy loss function may be set according to the condition of the decoder portion (i.e., the region segmentation network and the object segmentation network) in the image segmentation model, for example, the cross entropy loss of each decoder may be calculated, and then the cross entropy loss function is determined according to the cross entropy loss of each decoder and the number of decoders, and is expressed by a formula:
$$L_{cls} = \sum_{i=1}^{K} \lambda_i L_{ce}^{(i)}$$

wherein $L_{cls}$ is the cross entropy loss function, $K$ represents the number of decoder branches, $L_{ce}^{(i)}$ represents the cross-entropy loss of the i-th decoder, and $\lambda_i$ is the weight corresponding to the loss of each decoder.
For example, as shown in FIG. 9, in the case of liver and liver cancer segmentation there are two decoder branches, namely the region segmentation network and the object segmentation network, so K = 2 at this time; in addition, $\lambda_1 = 0.5$ and $\lambda_2 = 1.0$ may be set. The cross-entropy loss term of the region segmentation network in the image segmentation model shown in FIG. 9 can then be obtained as:

$$0.5\, L_{ce}^{(1)}$$

and the cross-entropy loss term of the object segmentation network as:

$$1.0\, L_{ce}^{(2)}$$

and so on.
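Expressed in code, a minimal PyTorch sketch of this weighted multi-branch cross-entropy loss might look as follows; the function name, the two-branch interface, and the tensor layouts are assumptions for illustration.

```python
# Weighted multi-branch cross-entropy: L_cls = sum_i lambda_i * L_ce^(i),
# with lambda_1 = 0.5 (region branch) and lambda_2 = 1.0 (object branch).
import torch
import torch.nn.functional as F

def cls_loss(region_logits, object_logits, region_gt, object_gt,
             weights=(0.5, 1.0)):
    losses = [F.cross_entropy(region_logits, region_gt),
              F.cross_entropy(object_logits, object_gt)]
    return sum(w * l for w, l in zip(weights, losses))
```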
Similarly, the interpolation loss function may also be set according to the requirements of practical applications, such as the number of predicted segmentation results (prediction maps for short). Take as an example an image segmentation model that receives 5 consecutive slices as input and outputs the prediction maps of the middle 3 frames, denoted $\hat{y}_{t-1}$, $\hat{y}_t$ and $\hat{y}_{t+1}$. To enhance the continuity of the 3-frame prediction maps, a pre-trained interpolation network may be used, as shown in FIG. 10, to take $\hat{y}_{t-1}$ and $\hat{y}_{t+1}$ as inputs and interpolate the intermediate prediction map, obtaining $I(\hat{y}_{t-1}, \hat{y}_{t+1})$; a Dice loss function is then calculated between the interpolated map $I(\hat{y}_{t-1}, \hat{y}_{t+1})$ and the original prediction map $\hat{y}_t$.
The Dice loss function $L_{dice}$ is defined as follows:

$$L_{dice}(x, y) = 1 - \frac{2\sum_{i=1}^{N} x_i y_i}{\sum_{i=1}^{N} x_i + \sum_{i=1}^{N} y_i}$$

where $N$ is the total number of pixel values of the prediction maps, and $x_i$ and $y_i$ are the pixel values of the prediction maps $x$ and $y$ at the i-th position, respectively.
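A short PyTorch sketch of this Dice loss follows; the smoothing constant eps is an added assumption for numerical stability and is not part of the definition above.

```python
# Dice loss between two prediction maps x and y, per the definition above.
import torch

def dice_loss(x, y, eps=1e-6):
    x, y = x.reshape(-1), y.reshape(-1)  # flatten to N pixel values
    inter = (x * y).sum()
    return 1.0 - (2.0 * inter + eps) / (x.sum() + y.sum() + eps)
```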
After the Dice loss function is obtained, the directly predicted intermediate-frame map and the interpolated prediction map are made to approach each other as closely as possible, which enforces a certain continuity between adjacent prediction maps; the interpolation loss function obtained in this way may be defined as follows:

$$L_{inter} = L_{dice}\big(\hat{y}_t,\ I(\hat{y}_{t-1}, \hat{y}_{t+1})\big)$$

where $I$ is the interpolation network and $L_{dice}$ is the Dice loss function.
From this, the total loss function $L_{total}$ used to optimize the network training can be derived as:

$$L_{total} = L_{cls} + L_{inter}$$
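The interpolation and total losses can be sketched as follows, reusing the dice_loss function above; interp_net stands in for the pre-trained interpolation network I, and its two-argument interface is an assumption.

```python
# Hedged sketch of L_inter and L_total; `interp_net` is a stand-in for the
# pre-trained interpolation network I (its interface is an assumption).
def interpolation_loss(interp_net, y_prev, y_mid, y_next):
    y_interp = interp_net(y_prev, y_next)  # I(y_{t-1}, y_{t+1})
    return dice_loss(y_mid, y_interp)      # dice_loss as sketched above

def total_loss(l_cls, l_inter):
    return l_cls + l_inter                 # L_total = L_cls + L_inter
```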
it should be noted that, in the embodiment of the present invention, the model may be optimized by using a self-Learning rate optimization algorithm based on Adam, and the specific parameters may be determined according to the requirements of the practical application, for example, the initial Learning rate (Learning rate is finer when Learning rate is smaller, but Learning speed is also reduced and convergence is slower) may be set to 1e-3, the batch size may be set to 16, 50 epochs are trained (1 epoch is equal to one time of training using all samples in a training set), and the like, which are not described herein again.
And secondly, segmenting the medical image to be segmented by the trained image segmentation model.
The trained image segmentation model comprises a residual network, a global-local attention module, an attention jump layer and a segmentation network, wherein the global-local attention module comprises a local pyramid attention network and a global attention network, and the segmentation network comprises a region segmentation network H1 for liver region segmentation and an object segmentation network H2 for liver cancer region segmentation.
As shown in fig. 11, a medical image segmentation method may specifically include the following steps:
201. the network device acquires a medical image to be segmented about the liver.
For example, the network device may receive medical images transmitted after image acquisition of the liver of the human body by various medical image acquisition devices, such as MRI or CT, and then determine the medical image to be segmented according to the received medical images.
For example, if the medical image is a two-dimensional image, the two-dimensional image may be determined as a medical image to be segmented, and if the medical image is a three-dimensional image, the three-dimensional image may be divided into a plurality of slices, where each slice is a medical image to be segmented, and so on.
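For instance, splitting a three-dimensional volume into per-slice two-dimensional inputs along the depth (Z) axis could be sketched as follows; the (W, H, D) axis order is an assumption.

```python
# Split a 3D medical volume into per-slice 2D inputs along Z.
import numpy as np

def volume_to_slices(volume):
    """Split a (W, H, D) volume into D separate (W, H) slices."""
    return [volume[:, :, d] for d in range(volume.shape[2])]
```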
202. The network equipment acquires the trained image segmentation model, and performs feature extraction on the medical image to be segmented through a residual error network of the image segmentation model to obtain a high-level feature map and a plurality of low-level feature maps.
For example, as shown in fig. 9, if the residual network includes a residual Block 1(Block1), a residual Block 2(Block2), a residual Block 3(Block3), a residual Block 4(Block4), and a residual Block 5(Block5), the feature map output by the residual Block5 is a high-level feature map, and the feature maps output by the residual Block2, the residual Block3, and the residual Block4 are low-level feature maps.
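As a rough analogue, a torchvision ResNet can stand in for the five-block residual network; the mapping of layer1–layer4 to the low-level and high-level feature maps below is an assumption for illustration, not the exact backbone of this embodiment.

```python
# Hedged sketch: multi-level feature extraction with a torchvision ResNet
# standing in for the five residual blocks described above.
import torch
import torchvision

backbone = torchvision.models.resnet50()

def extract_features(x):
    x = backbone.conv1(x); x = backbone.bn1(x)
    x = backbone.relu(x);  x = backbone.maxpool(x)  # Block1 analogue
    low2 = backbone.layer1(x)     # low-level feature map 2
    low3 = backbone.layer2(low2)  # low-level feature map 3
    low4 = backbone.layer3(low3)  # low-level feature map 4
    high = backbone.layer4(low4)  # high-level feature map (Block5 analogue)
    return [low2, low3, low4], high
```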
203. And the network equipment identifies the high-level feature map through the local pyramid attention network to obtain local feature information.
For example, the network device may perform feature screening on the high-level feature map through the local pyramid attention network to obtain local feature maps of multiple scales, and then perform weighted summation on the local feature maps of multiple scales to obtain local feature information.
The local pyramid attention network comprises multiple convolutional layers with different receptive fields. For example, taking the case where the convolutional layers of the local pyramid attention network include Conv1, Conv2 and Conv3, with the receptive field of Conv3 > the receptive field of Conv2 > the receptive field of Conv1: Conv1, Conv2 and Conv3 may each generate an attention map of the corresponding scale and perform feature selection of the corresponding scale on the high-level feature map according to the generated attention map, so as to obtain local feature maps of multiple scales, such as local feature map 1, local feature map 2 and local feature map 3; the local feature maps of the multiple scales are then weighted and summed to obtain the local feature information.
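A minimal PyTorch sketch of this idea follows; realizing the growing receptive fields through dilated 3x3 convolutions, the sigmoid gating, and the learnable branch weights are all illustrative assumptions.

```python
# Sketch of local pyramid attention: three conv branches with increasing
# receptive fields each produce an attention map that screens the high-level
# features; the screened maps are summed with learnable weights.
import torch
import torch.nn as nn

class LocalPyramidAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Conv1/Conv2/Conv3 analogues with growing receptive fields (via dilation)
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 3, padding=2, dilation=2),
            nn.Conv2d(channels, channels, 3, padding=4, dilation=4)])
        self.weights = nn.Parameter(torch.ones(3))

    def forward(self, high_feat):
        locals_ = [torch.sigmoid(b(high_feat)) * high_feat  # per-scale screening
                   for b in self.branches]
        return sum(w * f for w, f in zip(self.weights, locals_))
```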
204. And the network equipment calculates the correlation between each pixel point in the high-level feature map and the high-level feature map through the global attention network to obtain global feature information.
For example, taking the case where the convolutional layers in the global attention network include convolutional layer f, convolutional layer g and convolutional layer h: the correlation between each pair of pixel points in the high-level feature map may be calculated through convolutional layers f and g and normalized to obtain the attention map corresponding to the high-level feature map; the high-level feature map is then convolved through convolutional layer h to obtain an initial feature map, and the attention map and the initial feature map are multiplied to obtain the global feature information. Expressed by formula:

$$O_j = \sum_{i=1}^{N} att_{j,i}\, h(x_i)$$

wherein $O_j$ is the global feature information corresponding to pixel point j, and $att_{j,i}$ is the relationship between each two pixel points (i.e. positions) j and i in the attention map, given by:

$$att_{j,i} = \frac{\exp(s_{i,j})}{\sum_{i=1}^{N} \exp(s_{i,j})}, \qquad s_{i,j} = f(x_i)^{T} g(x_j)$$

Further, f(x), g(x) and h(x) are the functions corresponding to the network structures of f, g and h, respectively, wherein:

$$f(x) = W_f x; \quad g(x) = W_g x; \quad h(x) = W_h x$$

where $W_f$, $W_g$ and $W_h$ are the network structure parameters of f, g and h, which may be determined according to the requirements of the actual application; for example, the convolution kernels of the convolutional layers may each be set to 1 × 1, and so on.

Using the above formula, $O_j$ is calculated for each pixel point to obtain the global feature information corresponding to that pixel point in the high-level feature map; collecting the global feature information corresponding to all pixel points in the high-level feature map yields the global feature information corresponding to the high-level feature map.
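The following PyTorch sketch implements the formulas above with f, g and h as 1x1 convolutions; keeping the full channel count in f and g (rather than a reduced one) is a simplifying assumption.

```python
# Sketch of the global attention computation O_j = sum_i att_{j,i} h(x_i).
import torch
import torch.nn as nn

class GlobalAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Conv2d(channels, channels, 1)
        self.g = nn.Conv2d(channels, channels, 1)
        self.h = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        n, c, hgt, wid = x.shape
        fx = self.f(x).flatten(2)              # (N, C, HW)
        gx = self.g(x).flatten(2)              # (N, C, HW)
        hx = self.h(x).flatten(2)              # (N, C, HW), initial feature map
        s = torch.bmm(fx.transpose(1, 2), gx)  # s[i, j] = f(x_i)^T g(x_j)
        att = torch.softmax(s, dim=1)          # normalize over positions i
        out = torch.bmm(hx, att)               # O_j = sum_i att_{j,i} h(x_i)
        return out.view(n, c, hgt, wid)
```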
It should be noted that, the execution order of steps 203 and 204 may not be sequential.
205. And the network equipment performs weighted summation on the high-level feature map, the local feature information and the global feature information to obtain the fused high-level feature map.
The weight of the high-level feature map, the weight of the local feature information, and the weight of the global feature information may be flexibly set and adjusted according to the requirements of practical applications, which are not described herein again.
206. The network device predicts the type of each pixel point in the medical image to be segmented according to the plurality of low-level feature maps and the fused high-level feature map by adopting the region segmentation network, to obtain a predicted first pixel type, and segments the liver region in the medical image to be segmented based on the predicted first pixel type. For example, referring to fig. 9, the following may be specified:
The network device respectively gives corresponding weights to the plurality of low-level feature maps according to their positions in the medical image to be segmented through the attention jump layer, to obtain a plurality of first weighted low-level feature maps; then, the region segmentation network predicts the type of each pixel point in the medical image to be segmented according to the plurality of first weighted low-level feature maps and the fused high-level feature map. For example, it can predict which pixel points in the medical image to be segmented belong to the liver region and which do not, to obtain the first pixel type.
Then, the network device may screen, based on the predicted first pixel type, pixel points that meet the pixel type of the liver region (i.e., the target region) from the medical image to be segmented to obtain a candidate pixel set corresponding to the liver region, determine boundary points of the liver region in the medical image to be segmented according to the candidate pixel set corresponding to the liver region, and segment the medical image to be segmented according to the boundary points to obtain the predicted liver region.
For example, as shown in fig. 9, after the high-level feature map is processed in H1_5, an attention jump layer may be used to give corresponding weights to the low-level feature map 4 according to its position in the medical image to be segmented, so as to obtain a first weighted low-level feature map 4, and the first weighted low-level feature map 4 and the output of H1_5 are then fused and used as the input of H1_4; similarly, the attention jump layer may be adopted to give corresponding weights to the low-level feature map 3 according to its position in the medical image to be segmented to obtain a first weighted low-level feature map 3, which is then fused with the output of H1_4 and used as the input of H1_3; and an attention jump layer gives corresponding weights to the low-level feature map 2 according to its position in the medical image to be segmented to obtain a first weighted low-level feature map 2, which is then fused with the output of H1_3 and used as the input of H1_2; and so on, with H1_1 finally outputting the predicted liver region image.
207. The network device predicts the type of each pixel point in the medical image to be segmented according to the plurality of low-level feature maps and the fused high-level feature map by adopting the object segmentation network, to obtain a predicted second pixel type, and segments the liver cancer region in the medical image to be segmented based on the predicted second pixel type. For example, as shown in fig. 9, the following may be specified:
The network device gives corresponding weights to the plurality of low-level feature maps according to their positions in the medical image to be segmented and their positions in the liver region respectively through the attention jump layers, to obtain a plurality of second weighted low-level feature maps; then, the object segmentation network predicts the type of each pixel point in the medical image to be segmented according to the plurality of second weighted low-level feature maps and the fused high-level feature map. For example, it can predict which pixel points in the medical image to be segmented belong to the liver cancer region and which do not, to obtain the second pixel type.
Then, the network device can screen pixel points which accord with the pixel type of the liver cancer region from the medical image to be segmented based on the predicted second pixel type to obtain a candidate pixel set corresponding to the liver cancer region, then determine boundary points of the liver cancer region in the medical image to be segmented according to the candidate pixel set corresponding to the liver cancer region, and segment the medical image to be segmented according to the boundary points to obtain the predicted liver cancer region.
For example, as shown in fig. 9, after the high-level feature map is processed in H2_5, an attention jump layer may be used to assign corresponding weights to the low-level feature map 4 according to its position in the medical image to be segmented to obtain a first weighted low-level feature map 4, and another attention jump layer then assigns corresponding weights to the first weighted low-level feature map 4 according to the position of the low-level feature map 4 in the target region to obtain a second weighted low-level feature map 4; the second weighted low-level feature map 4 and the output of H2_5 are fused and used as the input of H2_4. Similarly, an attention jump layer may be adopted to give corresponding weights to the low-level feature map 3 according to its position in the medical image to be segmented to obtain a first weighted low-level feature map 3; another attention jump layer then gives corresponding weights to the first weighted low-level feature map 3 according to the position of the low-level feature map 3 in the target region to obtain a second weighted low-level feature map 3, which is fused with the output of H2_4 and used as the input of H2_3. In the same way, an attention jump layer gives the low-level feature map 2 corresponding weights according to its position in the medical image to be segmented to obtain a first weighted low-level feature map 2, another attention jump layer gives the first weighted low-level feature map 2 corresponding weights according to the position of the low-level feature map 2 in the target region to obtain a second weighted low-level feature map 2, and the second weighted low-level feature map 2 is fused with the output of H2_3 and used as the input of H2_2; and so on, with H2_1 finally outputting the predicted liver cancer region image.
The execution order of steps 206 and 207 may not be sequential.
Optionally, after obtaining the predicted liver region and liver cancer region, the liver region and liver cancer region may be output for a user, such as a medical staff, to browse for further judgment.
It should be noted that, if the medical image to be segmented is a slice, the liver region and the liver cancer region obtained at this time are the liver region and the liver cancer region in the slice, and therefore, the liver region and the liver cancer region of the slice belonging to the same medical image can also be integrated to obtain the liver region and the liver cancer region corresponding to the three-dimensional medical image, that is, the liver region and the liver cancer region in a three-dimensional format.
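Conversely to the slicing step earlier, the per-slice results can be re-stacked into the three-dimensional result; a minimal numpy sketch, assuming the slices were taken along the last axis:

```python
# Re-stack per-slice segmentation masks into a 3D result.
import numpy as np

def slices_to_volume(slice_masks):
    """Re-stack per-slice masks into a (W, H, D) volume."""
    return np.stack(slice_masks, axis=2)
```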
It should be noted that this embodiment is only described by taking liver cancer as an example; it should be understood that the method is also applicable to the segmentation of medical images of other living body tissues, such as the cervix, the pancreas, the brain, and the like. For example, see fig. 12 and 13: fig. 12 is an image segmentation of the pancreas, in which the target region is the "pancreas" region and the target object is a "tumor" region of the pancreas; fig. 13 is an image segmentation of a brain cancer, which includes two target regions, one being the whole tumor region and the other the tumor nucleus (the tumor nucleus region is located in the whole tumor region), with the target object being the tumor enhancement site region (which is located in the tumor nucleus region).
It should be noted that, when the medical image segmentation model is used for medical image segmentation of other living body tissues, the training samples adopted by the image segmentation model need to be adjusted to the medical image samples of the corresponding living body tissues. In addition, the decoder portion (i.e. the region segmentation network and the object segmentation network) in the image segmentation model can also be adaptively adjusted according to specific requirements, for example, a plurality of region segmentation networks can be set.
As can be seen from the above, in this embodiment, after the medical image to be segmented is obtained, feature extraction may be performed on the medical image to be segmented through a residual network to obtain a high-level feature map and a plurality of low-level feature maps, then, on one hand, the local pyramid attention network is used to identify the high-level feature map to obtain local feature information, on the other hand, correlation between each pixel point in the high-level feature map and the high-level feature map is calculated through a global attention network to obtain global feature information, and then, the local feature information and the global feature information are fused, and a target region and a target object in the medical image to be segmented, such as a liver region and a liver cancer region, are segmented based on the fused high-level feature map and the plurality of low-level feature maps obtained by the segmentation network; according to the scheme, when the high-level features are processed, the global feature information (global context information) is also taken as one of the consideration factors while the local feature information is concerned, so that the dependency relationship between different positions from a local region to a global region can be better processed, and the feature expression capability of the high-level features is improved.
In addition, since the scheme is provided with a plurality of parallel segmentation network branches that select and independently predict the low-level features from a large target to a small target, it can not only better handle the dependency relationship between an organ and its lesion region, but also reduce the influence of organ identification errors on the identification of the organ lesion region, and the model structure is highly extensible and adaptable. Moreover, because the scheme can also give different weights to the low-level features at different positions through the attention jump layers, the features can be further better utilized, so the accuracy of detail segmentation can be greatly improved. In short, compared with the prior art, this scheme can greatly improve the accuracy of image detail segmentation and improve the overall segmentation effect of the medical image.
In order to better implement the above method, an embodiment of the present invention further provides a medical image segmentation apparatus, which may be integrated in a network device, such as a server or a terminal, and the terminal may include a tablet computer, a notebook computer, and/or a personal computer.
For example, as shown in fig. 14, the medical image segmentation apparatus may include an acquisition unit 301, an extraction unit 302, a recognition unit 303, a calculation unit 304, a fusion unit 305, and a segmentation unit 306, as follows:
(1) an acquisition unit 301;
an acquisition unit 301 for acquiring a medical image to be segmented.
For example, the living tissue is specifically subjected to image acquisition by various medical image acquisition devices, such as MRI, CT, colposcope, endoscope, or the like, and then provided to the acquisition unit 301, that is:
the obtaining unit 301 may be specifically configured to receive a medical image sent by a medical image acquisition device, and determine a medical image to be segmented according to the received medical image. For example, if the medical image is a two-dimensional image, the two-dimensional image may be determined as a medical image to be segmented, and if the medical image is a three-dimensional image, the three-dimensional image may be divided into a plurality of slices, where each slice is a medical image to be segmented.
(2) An extraction unit 302;
an extracting unit 302, configured to perform feature extraction on the medical image to be segmented through a residual error network, so as to obtain a high-level feature map and multiple low-level feature maps.
(3) An identification unit 303;
an identifying unit 303, configured to identify the high-level feature map by using different receptive fields to obtain local feature information;
optionally, a local pyramid attention network may be established, and different receptive fields are respectively set for the multilayer convolution layers of the local pyramid attention network, so that the local pyramid attention network may be used to process the high-level feature map to obtain local feature information; namely:
the identifying unit 303 may be specifically configured to identify the high-level feature map through the local pyramid attention network to obtain local feature information.
For example, the identifying unit 303 may be specifically configured to perform feature screening on the high-level feature map through the local pyramid attention network to obtain local feature maps of multiple scales, and perform weighted summation on the local feature maps of multiple scales to obtain local feature information.
The structure of the local pyramid attention network may be set according to the requirements of practical applications, which may be referred to in the foregoing method embodiments and will not be described herein again.
(4) A calculation unit 304;
a calculating unit 304, configured to calculate a correlation between each pixel point in the high-level feature map and the high-level feature map to obtain global feature information;
for example, a global attention network may be established, and convolutional layers with different channel sizes are set in the global attention network, so that the correlation between each pixel point in the high-level feature map and the high-level feature map may be calculated through the global attention network to obtain global feature information; namely:
the calculating unit 304 may be specifically configured to calculate, through the global attention network, a correlation between each pixel point in the high-level feature map and the high-level feature map, to obtain global feature information.
For example, taking the example that the convolutional layers in the global attention network include a first convolutional layer, a second convolutional layer and a third convolutional layer, then:
the calculating unit 304 may be specifically configured to calculate a correlation between each pixel point in the high-level feature map through the first convolution layer and the second convolution layer, and calculate a correlation between each pixel point in the high-level feature map and the high-level feature map according to the correlation between each pixel point by using the third convolution layer, so as to obtain the global feature information.
For example, the calculation unit 304 may normalize the correlation between the pixel points to obtain an attention map corresponding to the high-level feature map, perform convolution processing on the high-level feature map through a third convolution layer to obtain an initial feature map, and then perform pixel-by-pixel multiplication on the attention map and the initial feature map to obtain global feature information, which is specifically referred to the foregoing method embodiment and is not described herein again.
(5) A fusion unit 305;
and a fusion unit 305, configured to fuse the local feature information and the global feature information to obtain a fused high-level feature map.
For example, the fusion unit 305 may be specifically configured to perform weighted summation on the high-level feature map, the local feature information corresponding to the high-level feature map, and the global feature information corresponding to the high-level feature map to obtain the fused high-level feature map.
The weight of the high-level feature map, the weight of the local feature information, and the weight of the global feature information may be flexibly set and adjusted according to the requirements of practical applications, which are not described herein again.
(6) A dividing unit 306;
a segmentation unit 306, configured to segment the target region and the target object in the medical image to be segmented respectively based on the multiple low-level feature maps and the fused high-level feature map, so as to obtain a segmentation result.
For example, the segmentation unit 306 is specifically configured to predict a type of each pixel point in the medical image to be segmented according to the multiple low-level feature maps and the fused high-level feature map by using a trained image segmentation model, and segment a target region and a target object in the medical image to be segmented based on the predicted type to obtain a segmentation result.
The target region includes a target object, and both the target region and the target object may be set according to requirements of practical applications, for example, if the target object is a left atrium, the target region may be set as a heart, if the target object is a liver cancer, the target region is a liver, if the target object is a brain cancer, the target region is a brain, and the like. Optionally, the target area may also be set to be multiple, where one target area corresponds to an area including a certain area of the target object, for example, if the target object is a tumor enhancement region of a brain, a first target area may be set to be a whole tumor area of the brain, a second target area may be set to be a tumor nucleus of the brain, and so on, which may be specifically referred to the foregoing embodiments and will not be described herein again.
The structure of the trained image segmentation model may be determined according to the requirements of practical applications; for example, the trained image segmentation model may include a region segmentation network and an object segmentation network, where each region segmentation network corresponds to one target region and each object segmentation network corresponds to one target object. If a plurality of target regions need to be segmented, a plurality of parallel region segmentation networks may be provided; and if a plurality of target objects exist, the target objects need to be in a hierarchical inclusion relationship, for example, target object 1 includes target object 2, target object 2 includes target object 3, and so on.
Taking the example that the trained image segmentation model comprises a region segmentation network and an object segmentation network, then:
the segmentation unit 306 may be specifically configured to predict a type of each pixel point in the medical image to be segmented according to the plurality of low-level characteristic maps and the fused high-level characteristic map by using a region segmentation mesh to obtain a first pixel type, and segment a target region in the medical image to be segmented based on the predicted first pixel type; and predicting the type of each pixel point in the medical image to be segmented according to the plurality of low-layer characteristic graphs and the fused high-layer characteristic graph by adopting an object segmentation grid to obtain a second pixel type, and segmenting the target object in the medical image to be segmented based on the predicted second pixel type.
For example, the segmentation unit 306 may specifically assign corresponding weights to the plurality of low-level feature maps according to their positions in the medical image to be segmented through the preset attention jump layer, to obtain a plurality of first weighted low-level feature maps, and predict the type of each pixel point in the medical image to be segmented according to the plurality of first weighted low-level feature maps and the fused high-level feature map by using the region segmentation network, to obtain the first pixel type; thereafter, the target region in the medical image to be segmented may be segmented based on the predicted first pixel type. Such as:
the segmentation unit 306 may filter, based on the predicted first pixel type, pixel points that meet the pixel type of the target region from the medical image to be segmented to obtain a candidate pixel set corresponding to the target region, determine boundary points of the target region in the medical image to be segmented according to the candidate pixel set corresponding to the target region, and segment, based on the boundary points of the target region in the medical image to be segmented, the medical image to be segmented to obtain the target region.
Similarly, the segmentation unit 306 may also assign corresponding weights to the plurality of low-level feature maps according to their positions in the medical image to be segmented and their positions in the target region through the preset attention jump layers, to obtain a plurality of second weighted low-level feature maps, and predict the type of each pixel point in the medical image to be segmented according to the plurality of second weighted low-level feature maps and the fused high-level feature map by using the object segmentation network, to obtain the second pixel type. Thereafter, the target object in the medical image to be segmented may be segmented based on the predicted second pixel type. Such as:
the segmentation unit 306 may filter, based on the predicted second pixel type, pixel points that meet the pixel type of the target object from the medical image to be segmented to obtain a candidate pixel set corresponding to the target object, determine boundary points of the target object in the medical image to be segmented according to the candidate pixel set corresponding to the target object, and segment, based on the boundary points of the target object in the medical image to be segmented, the medical image to be segmented to obtain the target object.
The attention jump layer structure may be set according to the requirements of practical applications, which may specifically refer to the foregoing method embodiments and will not be described herein.
It should be noted that, if the medical image to be segmented is a slice, the target region and the target object obtained by the segmentation unit 306 are the target region and the target object in the slice, and therefore, optionally, the segmentation unit 306 may further integrate the target region and the target object of the slice belonging to the same three-dimensional medical image to obtain the target region and the target object corresponding to the three-dimensional medical image, that is, obtain the target region and the target object in a three-dimensional format.
In addition, it should be further noted that, in the embodiment of the present invention, the trained image segmentation model may include, in addition to the residual network and the segmentation network (e.g., the region segmentation network and the object segmentation network), a global-local attention module (e.g., the local pyramid attention network and the global attention network), an attention jump layer, and the like provided in the embodiment of the present invention, that is, the trained image segmentation model may include the residual network, the local pyramid attention network, the global attention network, the attention jump layer, and the segmentation network.
Optionally, the trained image segmentation model may be set by an operation and maintenance person in advance, or may be obtained by self-training of the image segmentation apparatus. That is, as shown in fig. 15, the medical image segmentation apparatus may further include an acquisition unit 307 and a training unit 308, as follows:
the acquisition unit 307 may be configured to acquire a medical image sample, the medical image sample being labeled with the target region and the target object.
For example, the acquiring unit 307 may acquire a plurality of medical images as an original data set, for example, acquire the original data set from a database or a network, pre-process the medical images in the original data set to obtain images meeting the input standard of a preset segmentation network, and label the target areas and the target objects on the pre-processed images to obtain a plurality of medical image samples labeled with the target areas and the target objects.
The preprocessing may include operations such as deduplication, cropping, rotation, and/or flipping.
A training unit 308, configured to separately segment the target region and the target object in the medical image sample by using an image segmentation model to obtain a predicted segmentation result, and converge the image segmentation model according to the label of the medical image sample and the predicted segmentation result by using an interpolation loss function to obtain a trained image segmentation model.
For example, the training unit 308 may be specifically configured to perform feature extraction on a medical image sample through a residual error network to obtain a high-level feature map and a plurality of low-level feature maps corresponding to the medical image sample, on one hand, identify the high-level feature map by using different receptive fields to obtain local feature information, on the other hand, calculate a correlation between each pixel point in the high-level feature map and the high-level feature map to obtain global feature information, and then fuse the local feature information and the global feature information to obtain a fused high-level feature map, and segment a target region and a target object in the medical image sample based on the plurality of low-level feature maps and the fused high-level feature map to obtain a predicted segmentation result; then, the image segmentation model can be converged according to the label of the medical image sample and the predicted segmentation result, so as to obtain the trained image segmentation model.
For example, a preset interpolation loss function may be adopted, and the image segmentation model is converged according to the label of the medical image sample and the predicted segmentation result to obtain a trained image segmentation model, which is specifically as follows:
the training unit 308 may be specifically configured to adjust parameters for pixel classification in the image segmentation model according to the label of the medical image sample and the predicted segmentation result by using a cross entropy loss function, and adjust parameters for segmentation in the image segmentation model according to the label of the medical image sample and the predicted segmentation result by using an interpolation loss function, so as to obtain a trained image segmentation model.
In a specific implementation, the above units may be implemented as independent entities, or may be combined arbitrarily to be implemented as the same or several entities, and the specific implementation of the above units may refer to the foregoing method embodiments, which are not described herein again.
As can be seen from the above, in this embodiment, after the medical image to be segmented is obtained, the extracting unit 302 may perform feature extraction on the medical image to be segmented through a residual error network to obtain a high-level feature map and a plurality of low-level feature maps, then, on one hand, the identifying unit 303 identifies the high-level feature map by using different receptive fields to obtain local feature information, on the other hand, the calculating unit 304 calculates the correlation between each pixel point in the high-level feature map and the high-level feature map to obtain global feature information, then, the fusing unit 305 fuses the local feature information and the global feature information, and the segmenting unit 306 segments the target region and the target object in the medical image to be segmented based on the fused high-level feature map and the plurality of low-level feature maps; according to the scheme, when the high-level features are processed, the global feature information (global context information) is also taken as one of the consideration factors while the local feature information is concerned, so that the dependency relationship between different positions from a local region to a global region can be better processed, and the feature expression capability of the high-level features is improved.
Optionally, the segmentation unit 306 in the medical image segmentation apparatus may set a plurality of parallel segmentation network branches to select and independently predict the low-level features from a large target to a small target, so that not only can the dependency relationship between an organ and its lesion region be better handled, but the influence of organ identification errors on the identification of the organ lesion region can also be reduced, and the expansibility and adaptability of the model structure can be greatly improved. In addition, because the scheme can also give different weights to the low-level features at different positions through the attention jump layers, the features can be further better utilized, so the accuracy of detail segmentation can be greatly improved. In short, compared with the prior art, this scheme can greatly improve the accuracy of detail segmentation and improve the overall segmentation effect of the medical image.
An embodiment of the present invention further provides a server, as shown in fig. 16, which shows a schematic structural diagram of the server according to the embodiment of the present invention, specifically:
the server may include components such as a processor 401 of one or more processing cores, memory 402 of one or more computer-readable storage media, a power supply 403, and an input unit 404. Those skilled in the art will appreciate that the server architecture shown in FIG. 16 is not meant to be limiting, and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components. Wherein:
the processor 401 is a control center of the server, connects various parts of the entire server using various interfaces and lines, and performs various functions of the server and processes data by running or executing software programs and/or modules stored in the memory 402 and calling data stored in the memory 402, thereby performing overall monitoring of the server. Optionally, processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by operating the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to the use of the server, and the like. Further, the memory 402 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 access to the memory 402.
The server further includes a power supply 403 for supplying power to each component, and preferably, the power supply 403 may be logically connected to the processor 401 through a power management system, so as to implement functions of managing charging, discharging, and power consumption through the power management system. The power supply 403 may also include any component of one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.
The server may also include an input unit 404, the input unit 404 being operable to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the server may further include a display unit and the like, which will not be described in detail herein. Specifically, in this embodiment, the processor 401 in the server loads the executable file corresponding to the process of one or more application programs into the memory 402 according to the following instructions, and the processor 401 runs the application program stored in the memory 402, thereby implementing various functions as follows:
acquiring a medical image to be segmented, performing feature extraction on the medical image to be segmented through a residual error network to obtain a high-level feature map and a plurality of low-level feature maps, on one hand, identifying the high-level feature map by adopting different receptive fields to obtain local feature information, on the other hand, calculating the correlation between each pixel point in the high-level feature map and the high-level feature map to obtain global feature information, then, fusing the local feature information and the global feature information to obtain a fused high-level feature map, and segmenting a target area and a target object in the medical image to be segmented respectively based on the plurality of low-level feature maps and the fused high-level feature map to obtain segmentation results.
For example, feature screening may be specifically performed on the high-level feature map through the local pyramid attention network to obtain local feature maps of multiple scales, and then the local feature maps of the multiple scales are subjected to weighted summation to obtain local feature information; calculating the correlation between each pixel point in the high-level feature map and the high-level feature map through the global attention network to obtain global feature information; then, carrying out weighted summation on the high-level feature map, local feature information corresponding to the high-level feature map and global feature information corresponding to the high-level feature map to obtain a fused high-level feature map, predicting the type of each pixel point in the medical image to be segmented according to the plurality of low-level feature maps and the fused high-level feature map by adopting a trained image segmentation model, and segmenting a target region and a target object in the medical image to be segmented based on the predicted type to obtain a segmentation result; wherein the target area contains the target object.
Optionally, the trained image segmentation model may be set by an operation and maintenance person in advance, or may be obtained by self-training of the image segmentation apparatus, that is, the processor 401 may also run an application program stored in the memory 402, so as to implement the following functions:
the method comprises the steps of collecting a medical image sample, marking a target area and a target object in the medical image sample, utilizing an image segmentation model to segment the target area and the target object in the medical image sample respectively to obtain a predicted segmentation result, and converging the image segmentation model according to the marking of the medical image sample and the predicted segmentation result to obtain a trained image segmentation model.
For details of the above operations, refer to the foregoing embodiments; they are not repeated here.
As can be seen from the above, after acquiring the medical image to be segmented, the network device of this embodiment may perform feature extraction on it through a residual network to obtain a high-level feature map and a plurality of low-level feature maps; identify the high-level feature map with different receptive fields to obtain local feature information, while calculating the correlation between each pixel point in the high-level feature map and the map as a whole to obtain global feature information; fuse the local feature information and the global feature information; and segment the target region and the target object in the medical image based on the fused high-level feature map and the plurality of low-level feature maps. Because this scheme treats global feature information (global context) as a consideration alongside local feature information when processing high-level features, it can better capture the dependencies from local regions to the whole image, which improves the expressive power of the high-level features, the accuracy of detail segmentation, and the overall segmentation effect on medical images.
It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be completed by instructions, or by related hardware controlled by instructions, and the instructions may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, the embodiment of the present invention provides a storage medium, in which a plurality of instructions are stored, and the instructions can be loaded by a processor to execute the steps in any one of the medical image segmentation methods provided by the embodiment of the present invention. For example, the instructions may perform the steps of:
acquiring a medical image to be segmented; performing feature extraction on the medical image to be segmented through a residual network to obtain a high-level feature map and a plurality of low-level feature maps; on the one hand, identifying the high-level feature map with different receptive fields to obtain local feature information, and on the other hand, calculating the correlation between each pixel point in the high-level feature map and the high-level feature map as a whole to obtain global feature information; fusing the local feature information and the global feature information to obtain a fused high-level feature map; and respectively segmenting a target region and a target object in the medical image to be segmented based on the plurality of low-level feature maps and the fused high-level feature map to obtain a segmentation result.
For example, feature screening may be performed on the high-level feature map through the local pyramid attention network to obtain local feature maps of multiple scales, and the local feature maps of the multiple scales may then be weighted and summed to obtain the local feature information. The correlation between each pixel point in the high-level feature map and the high-level feature map may be calculated through the global attention network to obtain the global feature information. The high-level feature map, its corresponding local feature information, and its corresponding global feature information may then be weighted and summed to obtain the fused high-level feature map. A trained image segmentation model predicts the type of each pixel point in the medical image to be segmented according to the plurality of low-level feature maps and the fused high-level feature map, and the target region and the target object are segmented based on the predicted types to obtain the segmentation result, the target region containing the target object.
Optionally, the trained image segmentation model may be preset by operation and maintenance personnel, or may be obtained through training by the image segmentation apparatus itself; that is, the instructions may further perform the following steps:
collecting a medical image sample in which a target region and a target object are annotated; respectively segmenting the target region and the target object in the medical image sample with an image segmentation model to obtain a predicted segmentation result; and converging the image segmentation model according to the annotation of the medical image sample and the predicted segmentation result to obtain the trained image segmentation model.
For details of the above operations, refer to the foregoing embodiments; they are not repeated here.
Wherein the storage medium may include: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
Since the instructions stored in the storage medium can execute the steps in any medical image segmentation method provided by the embodiment of the present invention, the beneficial effects that can be achieved by any medical image segmentation method provided by the embodiment of the present invention can be achieved, which are detailed in the foregoing embodiments and will not be described herein again.
The medical image segmentation method, apparatus, and storage medium provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the invention, and the description of the embodiments is intended only to aid understanding of the method and its core idea. Meanwhile, those skilled in the art may make changes to the specific implementations and the application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (14)

1. A medical image segmentation method, comprising:
acquiring a medical image to be segmented;
extracting the features of the medical image to be segmented through a residual error network to obtain a high-level feature map and a plurality of low-level feature maps;
identifying the high-level feature map by adopting different receptive fields to obtain local feature information;
calculating the correlation between each pixel point in the high-level feature map and the high-level feature map to obtain global feature information;
fusing the local feature information and the global feature information to obtain a fused high-level feature map;
respectively segmenting a target region and a target object in the medical image to be segmented based on the plurality of low-level feature maps and the fused high-level feature map to obtain a segmentation result;
wherein the segmenting of the target region and the target object in the medical image to be segmented based on the plurality of low-level feature maps and the fused high-level feature map to obtain the segmentation result comprises:
predicting a type of each pixel point in the medical image to be segmented according to the plurality of low-level feature maps and the fused high-level feature map by using a trained image segmentation model;
segmenting the target region and the target object in the medical image to be segmented based on the predicted types to obtain the segmentation result;
wherein
the trained image segmentation model comprises a region segmentation network and an object segmentation network, and the predicting of the type of each pixel point in the medical image to be segmented according to the plurality of low-level feature maps and the fused high-level feature map by using the trained image segmentation model, and the segmenting of the target region and the target object in the medical image to be segmented based on the predicted types, comprise:
predicting the type of each pixel point in the medical image to be segmented according to the plurality of low-level feature maps and the fused high-level feature map by using the region segmentation network to obtain a first pixel type, and segmenting the target region in the medical image to be segmented based on the predicted first pixel type; and
predicting the type of each pixel point in the medical image to be segmented according to the plurality of low-level feature maps and the fused high-level feature map by using the object segmentation network to obtain a second pixel type, and segmenting the target object in the medical image to be segmented based on the predicted second pixel type.
2. The method according to claim 1, wherein the predicting, by using the region segmentation network, the type of each pixel point in the medical image to be segmented according to the plurality of low-level feature maps and the fused high-level feature map to obtain the first pixel type comprises:
assigning corresponding weights to the plurality of low-level feature maps according to their positions in the medical image to be segmented through a preset attention skip layer, to obtain a plurality of first weighted low-level feature maps;
and predicting the type of each pixel point in the medical image to be segmented according to the plurality of first weighted low-level feature maps and the fused high-level feature map by using the region segmentation network, to obtain the first pixel type.
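A minimal sketch of such an attention skip layer follows, assuming a per-position sigmoid gate; the claim states only that the weights depend on position, so the gating design below is a guess, not the patent's construction:

import torch
import torch.nn as nn

class AttentionSkipLayer(nn.Module):
    """Gate a low-level feature map with position-dependent weights."""

    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1),   # one score per position
            nn.Sigmoid())                            # weight in (0, 1)

    def forward(self, low_level):
        return low_level * self.gate(low_level)      # first weighted feature map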
3. The method according to claim 1, wherein segmenting the target region in the medical image to be segmented based on the predicted first pixel type comprises:
screening, based on the predicted first pixel type, pixel points that match the pixel type of the target region from the medical image to be segmented, to obtain a candidate pixel set corresponding to the target region;
determining boundary points of the target region in the medical image to be segmented according to the candidate pixel set corresponding to the target region;
and segmenting the medical image to be segmented based on the boundary points of the target region to obtain the target region.
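The post-processing this claim describes might look like the sketch below, where `pixel_types` is assumed to be the per-pixel class map predicted by the region segmentation network and `region_type` the class id of the target region; taking the candidate set's extremes as the boundary points is an assumption, since the claim leaves the boundary definition open:

import numpy as np

def segment_region(image, pixel_types, region_type):
    """Crop the target region out of the image from per-pixel type predictions."""
    candidates = pixel_types == region_type           # candidate pixel set
    ys, xs = np.nonzero(candidates)
    if ys.size == 0:
        return None                                   # no pixel matched the type
    top, bottom = ys.min(), ys.max()                  # boundary points taken as the
    left, right = xs.min(), xs.max()                  # extremes of the candidate set
    return image[top:bottom + 1, left:right + 1]      # segmented target region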
4. The method according to claim 1, wherein the predicting, by using the object segmentation network, the type of each pixel point in the medical image to be segmented according to the plurality of low-level feature maps and the fused high-level feature map to obtain the second pixel type comprises:
assigning corresponding weights to the plurality of low-level feature maps according to their positions in the medical image to be segmented and their positions within the target region through a preset attention skip layer, to obtain a plurality of second weighted low-level feature maps;
and predicting the type of each pixel point in the medical image to be segmented according to the plurality of second weighted low-level feature maps and the fused high-level feature map by using the object segmentation network, to obtain the second pixel type.
5. The method according to claim 1, wherein segmenting the target object in the medical image to be segmented based on the predicted second pixel type comprises:
screening, based on the predicted second pixel type, pixel points that match the pixel type of the target object from the medical image to be segmented, to obtain a candidate pixel set corresponding to the target object;
determining boundary points of the target object in the medical image to be segmented according to the candidate pixel set corresponding to the target object;
and segmenting the medical image to be segmented based on the boundary points of the target object to obtain the target object.
6. The method according to any one of claims 1 to 5, wherein the trained image segmentation model further comprises a local pyramid attention network, the local pyramid attention network comprises a plurality of convolutional layers with different receptive fields, and the identifying of the high-level feature map with different receptive fields to obtain the local feature information comprises:
identifying the high-level feature map through the local pyramid attention network to obtain the local feature information.
7. The method according to claim 6, wherein identifying the high-level feature map through the local pyramid attention network to obtain the local feature information comprises:
performing feature screening on the high-level feature map through the local pyramid attention network to obtain local feature maps of multiple scales;
and performing weighted summation on the local feature maps of the multiple scales to obtain the local feature information.
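One possible reading of claims 6 and 7 as code is sketched below: parallel convolutions with different kernel sizes play the role of different receptive fields, and a learned weighted sum combines the multi-scale local feature maps. The kernel sizes and scalar weights are illustrative assumptions:

import torch
import torch.nn as nn

class LocalPyramidAttention(nn.Module):
    """Multi-scale feature screening followed by a learned weighted sum."""

    def __init__(self, channels, kernel_sizes=(1, 3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2)
            for k in kernel_sizes)                     # different receptive fields
        self.weights = nn.Parameter(torch.ones(len(kernel_sizes)))

    def forward(self, high_level):
        scales = [branch(high_level) for branch in self.branches]  # multi-scale maps
        return sum(w * s for w, s in zip(self.weights, scales))    # weighted summation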
8. The method according to any one of claims 1 to 5, wherein the trained image segmentation model further comprises a global attention network, the global attention network comprises a plurality of parallel convolutional layers with different channel sizes, and the calculating of the correlation between each pixel point in the high-level feature map and the high-level feature map to obtain the global feature information comprises:
calculating the correlation between each pixel point in the high-level feature map and the high-level feature map through the global attention network to obtain the global feature information.
9. The method according to claim 8, wherein the convolutional layers in the global attention network comprise a first convolutional layer, a second convolutional layer, and a third convolutional layer, and the calculating of the correlation between each pixel point in the high-level feature map and the high-level feature map through the global attention network to obtain the global feature information comprises:
calculating the correlation among the pixel points in the high-level feature map through the first convolutional layer and the second convolutional layer;
and calculating, through the third convolutional layer, the correlation between each pixel point in the high-level feature map and the high-level feature map according to the correlation among the pixel points, to obtain the global feature information.
10. The method according to claim 9, wherein the calculating, through the third convolutional layer, of the correlation between each pixel point in the high-level feature map and the high-level feature map according to the correlation among the pixel points to obtain the global feature information comprises:
normalizing the correlation among the pixel points to obtain an attention map corresponding to the high-level feature map;
performing convolution processing on the high-level feature map through the third convolutional layer to obtain an initial feature map;
and multiplying the attention map and the initial feature map pixel by pixel to obtain the global feature information.
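Claims 9 and 10 can be sketched as follows: dot products of the first two convolutions' embeddings give the pixel-to-pixel correlations, which are normalized into an attention map and multiplied pixel by pixel with the third convolution's output. How the pairwise correlations are reduced to one score per pixel is not stated, so the mean used below is an assumption:

import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalAttention(nn.Module):
    """Correlate every pixel with the whole map, then reweight a conv output."""

    def __init__(self, channels, embed_channels=256):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, embed_channels, 1)  # first convolutional layer
        self.conv2 = nn.Conv2d(channels, embed_channels, 1)  # second convolutional layer
        self.conv3 = nn.Conv2d(channels, channels, 1)        # third convolutional layer

    def forward(self, high_level):
        b, c, h, w = high_level.shape
        q = self.conv1(high_level).flatten(2)                # (b, embed, h*w)
        k = self.conv2(high_level).flatten(2)
        pairwise = torch.bmm(q.transpose(1, 2), k)           # pixel-to-pixel correlation
        per_pixel = pairwise.mean(dim=2)                     # pixel-to-map score (assumed)
        attn = F.softmax(per_pixel, dim=1).view(b, 1, h, w)  # normalized attention map
        initial = self.conv3(high_level)                     # initial feature map
        return initial * attn                                # global feature information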
11. The method according to any one of claims 1 to 5, wherein before the predicting of the type of each pixel point in the medical image to be segmented according to the plurality of low-level feature maps and the fused high-level feature map by using the trained image segmentation model, the method further comprises:
acquiring a medical image sample, the medical image sample being annotated with the target region and the target object;
respectively segmenting the target region and the target object in the medical image sample by using an image segmentation model to obtain a predicted segmentation result;
and converging the image segmentation model according to the annotation of the medical image sample and the predicted segmentation result through an interpolation loss function to obtain the trained image segmentation model.
12. The method according to claim 11, wherein converging the image segmentation model according to the annotation of the medical image sample and the predicted segmentation result through the interpolation loss function to obtain the trained image segmentation model comprises:
adjusting parameters for pixel classification in the image segmentation model according to the annotation of the medical image sample and the predicted segmentation result through a cross-entropy loss function; and
adjusting parameters for segmentation in the image segmentation model according to the annotation of the medical image sample and the predicted segmentation result through the interpolation loss function, to obtain the trained image segmentation model.
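The two-part objective might be combined as in the sketch below; the cross-entropy term follows the claim, while the "interpolation loss" is not defined in this text, so a Dice-style overlap term stands in purely as a placeholder:

import torch
import torch.nn.functional as F

def segmentation_loss(logits, target):
    """Cross-entropy for pixel classification plus a placeholder overlap term."""
    ce = F.cross_entropy(logits, target)               # pixel-classification term
    probs = F.softmax(logits, dim=1)[:, 1]             # foreground probability
    fg = (target == 1).float()
    inter = (probs * fg).sum()
    dice = 1 - (2 * inter + 1) / (probs.sum() + fg.sum() + 1)  # stand-in term
    return ce + dice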
13. A medical image segmentation apparatus, characterized by comprising:
an acquisition unit for acquiring a medical image to be segmented;
the extraction unit is used for extracting the features of the medical image to be segmented through a residual error network to obtain a high-level feature map and a plurality of low-level feature maps;
the identification unit is used for identifying the high-level feature map by adopting different receptive fields to obtain local feature information;
the calculation unit is used for calculating the correlation between each pixel point in the high-level feature map and the high-level feature map to obtain global feature information;
the fusion unit is used for fusing the local feature information and the global feature information to obtain a fused high-level feature map;
the segmentation unit is used for predicting the type of each pixel point in the medical image to be segmented according to the plurality of low-level feature maps and the fused high-level feature map by using a region segmentation network to obtain a first pixel type, and segmenting a target region in the medical image to be segmented based on the predicted first pixel type; and predicting the type of each pixel point in the medical image to be segmented according to the plurality of low-level feature maps and the fused high-level feature map by using an object segmentation network to obtain a second pixel type, and segmenting a target object in the medical image to be segmented based on the predicted second pixel type, to obtain a segmentation result.
14. A storage medium storing a plurality of instructions adapted to be loaded by a processor for performing the steps of the medical image segmentation method according to any one of claims 1 to 12.
CN201910082092.5A 2019-01-28 2019-01-28 Medical image segmentation method, device and storage medium Active CN109872306B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910082092.5A CN109872306B (en) 2019-01-28 2019-01-28 Medical image segmentation method, device and storage medium

Publications (2)

Publication Number Publication Date
CN109872306A CN109872306A (en) 2019-06-11
CN109872306B true CN109872306B (en) 2021-01-08

Family

ID=66918143

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910082092.5A Active CN109872306B (en) 2019-01-28 2019-01-28 Medical image segmentation method, device and storage medium

Country Status (1)

Country Link
CN (1) CN109872306B (en)

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant