CN114782686A - Image segmentation method and device, terminal equipment and storage medium - Google Patents


Info

Publication number
CN114782686A
Authority
CN
China
Prior art keywords: module, convolution, image, image segmentation, decoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210295676.2A
Other languages
Chinese (zh)
Inventor
郭菲
马世强
唐继军
郗文辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Technology University
Original Assignee
Shenzhen Technology University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Technology University filed Critical Shenzhen Technology University
Priority to CN202210295676.2A priority Critical patent/CN114782686A/en
Publication of CN114782686A publication Critical patent/CN114782686A/en
Priority to PCT/CN2022/137369 priority patent/WO2023179095A1/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30016Brain
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30096Tumor; Lesion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application provides an image segmentation method, an image segmentation device, a terminal device and a storage medium, and relates to the technical field of image processing. The image segmentation method comprises the following steps: acquiring an image to be processed; inputting the image to be processed into a trained image segmentation model for processing, and outputting a segmented image. The image segmentation model has an encoder-decoder structure, and a group integration module is arranged between each pair of mutually corresponding encoder and decoder; the group integration module is used for extracting m pieces of compensation information at different scales from the input features of the encoder and inputting them into the decoder, where m is greater than or equal to 2. The image segmentation method, device, terminal device and storage medium can, to a certain extent, alleviate the problem that the segmentation results of existing image segmentation methods are inaccurate.

Description

Image segmentation method and device, terminal equipment and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image segmentation method and apparatus, a terminal device, and a storage medium.
Background
Image segmentation is a technique for dividing an image into a number of specific regions so as to extract objects of a specific type. It has been widely applied in fields such as remote sensing and medicine. Taking the medical field as an example, the region where an organ or a lesion is located in a medical image can be extracted by image segmentation to assist medical experts in making a corresponding diagnosis.
With the rapid development of deep learning, image segmentation methods based on deep learning have become an important segmentation technology. In the prior art, the U-Net network model is commonly used to segment images, but this network model fuses low-level feature information and high-level semantic information only through simple skip connections, which dilutes the high-level semantic information and makes the segmentation result inaccurate.
Disclosure of Invention
The embodiment of the application provides an image segmentation method, an image segmentation device, terminal equipment and a storage medium, which can solve the problem that the image segmentation result of the existing image segmentation method is inaccurate.
In a first aspect, an embodiment of the present application provides an image segmentation method, where the method includes: acquiring an image to be processed; inputting the image to be processed into a trained image segmentation model for processing, and outputting a segmented image; the image segmentation model has an encoder-decoder structure, a group integration module is arranged between each pair of mutually corresponding encoder and decoder, and the group integration module is used for extracting m pieces of compensation information at different scales from a first input feature of the encoder and inputting them into the decoder, where m is greater than or equal to 2.
In one possible implementation, the group integration module comprises a first convolution layer and m-1 semantic compensation modules; the first convolution layer is used for extracting compensation information of a first scale from the first input feature of the encoder; the m-1 semantic compensation modules are used for extracting, from the compensation information of the first scale, m-1 pieces of compensation information at scales different from the first scale, wherein the output of the i-th semantic compensation module among the m-1 semantic compensation modules is the input of the (i+1)-th semantic compensation module, and 1 ≤ i ≤ m-1.
In one possible implementation, the semantic compensation module includes a second convolutional layer, a plurality of third convolutional layers, and an atrous (dilated) convolutional layer, which are connected in sequence.
In one possible implementation, the decoder includes m groups of convolution modules;
inputting compensation information of m different scales into a decoder, comprising:
dividing the second input features of the decoder into m groups;
decoding a group of second input features and compensation information of one scale through each group convolution module in m groups of convolution modules to obtain m groups of decoding features;
and obtaining the output characteristic of the decoder according to the m groups of decoding characteristics.
In one possible implementation, the group convolution module includes an upsampling module and a first convolution module;
decoding a set of second input features and compensation information of a scale by each of m sets of convolution modules, comprising:
and performing upsampling processing on a group of second input features through an upsampling module in each group of convolution modules, and splicing the output of the upsampling module and compensation information of one scale to be used as the input of a first convolution module in each group of convolution modules.
In one possible implementation, the encoder includes a second convolution module, a downsampling module, and a compression and excitation module connected in sequence, and the second convolution module includes a batch normalization layer, an excitation layer, and a convolution layer connected in sequence.
In one possible implementation, the image to be processed is a medical image.
In a second aspect, an embodiment of the present application provides an image segmentation apparatus, including: an acquisition unit that acquires an image to be processed; the segmentation unit is used for inputting an image to be processed into a trained image segmentation model for processing and outputting a segmented image, the image segmentation model is of an encoder-decoder structure, a group integration module is arranged between an encoder and a decoder which correspond to each other, and the group integration module is used for extracting m compensation information with different scales from input characteristics of the encoder and inputting the m compensation information with different scales into the decoder, wherein m is more than or equal to 2.
In a third aspect, an embodiment of the present application provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the method according to any one of the first aspect is implemented.
In a fourth aspect, the present application provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the method according to any one of the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product, which, when run on a terminal device, causes the terminal device to perform the method of any one of the above first aspects.
Based on the image segmentation method, the device, the terminal equipment and the storage medium provided by the application, a group integration module is arranged between an encoder and a decoder which correspond to each other, a plurality of compensation information with different scales are extracted from a first input feature of the encoder through the group integration module, and the plurality of compensation information with different scales are input into the corresponding decoder to be processed, the group integration module can increase the receptive field of a network layer, the difference between low-layer edge detail information in the encoder and high-level semantic feature information in the corresponding decoder is reduced through the plurality of compensation information with different scales, the high-level semantic feature information is prevented from being diluted due to the direct fusion of the low-layer edge detail information and the high-level semantic feature information, and therefore the segmentation precision of the model to be processed images is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings required to be used in the embodiments or the prior art description will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings may be obtained according to these drawings without inventive labor.
Fig. 1 is a schematic structural diagram of an image segmentation model according to an embodiment of the present application;
fig. 2 is another structural diagram of an image segmentation model according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an encoder according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a compression and excitation module according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a group integration module according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a semantic compensation module according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a group convolution module in a decoder according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an image segmentation apparatus according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The terminology used in the description of the present application is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. The terms "first," "second," "third," and the like are used solely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.
In the image segmentation method provided by the embodiments of the present application, after an image to be processed is acquired, it is input into an image segmentation model based on an encoder-decoder structure for processing, so as to obtain a segmented image of the image to be processed.
In the image segmentation model based on the encoder-decoder structure, a group integration module is arranged between each pair of mutually corresponding encoder and decoder. The group integration module extracts compensation information at several different scales from the first input feature of the encoder and inputs it into the corresponding decoder, which enlarges the receptive field of the network layer. The compensation information at several different scales reduces the gap between the low-level edge detail information in the encoder and the high-level semantic feature information in the corresponding decoder, avoiding the dilution of high-level semantic feature information caused by directly concatenating the two, as in the existing U-Net network model, and thereby improving the segmentation accuracy of the model on the image to be processed.
Fig. 1 is a schematic structural diagram of an image segmentation model according to an embodiment of the present application. The image segmentation model includes 2 convolutional layers, 3 encoders, and 3 decoders corresponding one to one with the 3 encoders; specifically, it includes convolutional layer 1, encoder 1, encoder 2, encoder 3, decoder 3, decoder 2, decoder 1, and convolutional layer 2, connected in sequence.
As shown in fig. 2, convolutional layer 1 includes 32 convolution kernels; it processes the input image to be processed and increases the number of channels of the feature map, yielding a feature map with 32 channels and a size of 176 × 144. Convolutional layer 2 includes 1 convolution kernel; it processes the 32-channel, 176 × 144 feature map output by decoder 1 to obtain a segmented image with a size of 176 × 144.
As shown in fig. 3, each encoder includes a second convolution module, a down-sampling module, and a compression and excitation (Squeeze-and-Excitation, SE) module connected in sequence. The second convolution module is used to increase the number of channels of the feature map, the down-sampling module is used to reduce the scale of the feature map, and the compression and excitation module is used to recalibrate the feature map of each channel.
The second convolution module includes a batch normalization (BN) layer, an excitation layer, and a convolution layer connected in sequence. Illustratively, the excitation layer may be a ReLU (rectified linear unit) layer.
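As an illustration only, a pre-activation block of this kind might be sketched in PyTorch as follows; the framework, kernel size, and channel counts are assumptions, since the application does not specify an implementation.

```python
import torch
import torch.nn as nn

class PreActConvBlock(nn.Module):
    """Second convolution module as described: batch normalization -> ReLU -> convolution."""
    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 3):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.act = nn.ReLU(inplace=True)
        self.conv = nn.Conv2d(in_channels, out_channels,
                              kernel_size=kernel_size,
                              padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Pre-activation ordering: normalize and activate before convolving.
        return self.conv(self.act(self.bn(x)))
```

For example, PreActConvBlock(32, 64) applied to a 32-channel, 176 × 144 feature map returns a 64-channel map of the same spatial size.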
As shown in fig. 4, the compression and excitation module includes a convolution operation F_tr(·, θ), a Squeeze operation F_sq(·), an Excitation operation F_ex(·, w), and a weighting operation F_scale(·, ·). Specifically, the convolution operation converts the feature map with c_1 input channels into a feature map with c_2 channels. The Squeeze operation compresses the feature map with c_2 channels, converting the feature map of each channel into a single real number, so that the feature map with c_2 feature channels becomes a feature vector of dimension c_2. The Excitation operation then generates a per-channel weight w from this feature vector. Finally, the weighting operation multiplies the weights w with the feature map of c_2 feature channels, thereby recalibrating the original features in the channel dimension.
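A minimal sketch of this squeeze-and-excitation recalibration, again in PyTorch, is given below. The reduction ratio and the two fully connected layers used for the Excitation operation are common SE-block choices and are assumptions here; the convolution operation F_tr that converts c_1 channels into c_2 channels would be an ordinary convolution applied before this block.

```python
import torch
import torch.nn as nn

class SqueezeExcitation(nn.Module):
    """Squeeze (global average pool), Excitation (per-channel weights w), and channel-wise rescaling."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)   # squeeze: one real number per channel
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                      # excitation: per-channel weights w in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                           # recalibrate the original features in the channel dimension
```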
In the embodiment of the application, a group integration module is arranged between the encoder and the decoder which correspond to each other, and the group integration module is used for extracting m compensation information with different scales from the input characteristics of the encoder and inputting the m compensation information with different scales into the decoder, wherein m is larger than or equal to 2. That is, as shown in fig. 1 and fig. 2, the group integration module 1 is configured to extract m compensation information of different scales from the input features of the encoder 1 and input the compensation information into the decoder 1, the group integration module 2 is configured to extract m compensation information of different scales from the input features of the encoder 2 and input the compensation information into the decoder 2, and the group integration module 3 is configured to extract m compensation information of different scales from the input features of the encoder 3 and input the compensation information into the decoder 3.
As shown in fig. 5, a group integration module (GE block) includes a first convolution layer and m-1 semantic compensation modules (SCMs). The first convolution layer is used to extract compensation information of a first scale from the input features of the encoder and uses a 1 × 1 convolution kernel. The m-1 semantic compensation modules are used to extract, from the compensation information of the first scale, m-1 pieces of compensation information at scales different from the first scale, where the output of the i-th semantic compensation module among the m-1 semantic compensation modules is the input of the (i+1)-th semantic compensation module, and 1 ≤ i ≤ m-1.
As shown in fig. 6, each semantic compensation module includes one second convolution layer, at least two third convolution layers, and an atrous (dilated) convolution layer, which are connected in sequence. The convolution kernel of the second convolution layer is 1 × 1, and the convolution kernel of each third convolution layer is 3 × 3. The atrous convolution layer enlarges the receptive field in order to eliminate the semantic gap between the low-level edge information of the encoder and the high-level semantic information of the decoder.
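One possible reading of this module is sketched below; the number of 3 × 3 layers, the dilation rate, and the channel-preserving widths are assumptions not fixed by the text.

```python
import torch
import torch.nn as nn

class SemanticCompensationModule(nn.Module):
    """SCM: a 1x1 convolution, several 3x3 convolutions, then an atrous (dilated) convolution."""
    def __init__(self, channels: int, num_3x3: int = 2, dilation: int = 2):
        super().__init__()
        layers = [nn.Conv2d(channels, channels, kernel_size=1)]
        layers += [nn.Conv2d(channels, channels, kernel_size=3, padding=1)
                   for _ in range(num_3x3)]
        layers += [nn.Conv2d(channels, channels, kernel_size=3,
                             padding=dilation, dilation=dilation)]  # atrous convolution enlarges the receptive field
        self.body = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)
```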
Illustratively, as shown in fig. 5, the m-1 semantic compensation modules in a group integration module are denoted SCM_1 through SCM_m-1. The first convolution layer in the group integration module extracts compensation information y1 of the first scale from the input features of the encoder. SCM_1 processes the compensation information y1 of the first scale to obtain y2. SCM_2 processes the compensation information y2 of the second scale to obtain compensation information y3 of the third scale, and so on, until SCM_m-1 processes the compensation information y(m-1) of the (m-1)-th scale to obtain compensation information ym of the m-th scale.
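Chaining the first convolution layer with the semantic compensation modules, a GE block could be sketched as follows, reusing the SemanticCompensationModule sketch above. Here "scale" is read as receptive-field scale, consistent with the worked example below in which all of y1 ... ym share the spatial size of the encoder input; the output width comp_channels is an assumption.

```python
from typing import List

import torch
import torch.nn as nn

class GroupIntegrationModule(nn.Module):
    """GE block: a 1x1 convolution extracts y1, then m-1 chained SCMs produce y2 ... ym."""
    def __init__(self, in_channels: int, comp_channels: int, m: int = 4):
        super().__init__()
        self.first_conv = nn.Conv2d(in_channels, comp_channels, kernel_size=1)
        self.scms = nn.ModuleList(
            [SemanticCompensationModule(comp_channels) for _ in range(m - 1)]  # SCM_1 ... SCM_(m-1)
        )

    def forward(self, x: torch.Tensor) -> List[torch.Tensor]:
        ys = [self.first_conv(x)]      # y1: compensation information of the first scale
        for scm in self.scms:
            ys.append(scm(ys[-1]))     # y(i+1) = SCM_i(y_i)
        return ys                      # [y1, ..., ym], fed to the corresponding decoder
```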
As shown in fig. 7, each decoder includes m group convolution modules. Each group convolution module includes an upsampling module and a first convolution module. The second input features input into the decoder are divided into m groups, and the m groups of second input features correspond one to one with the m group convolution modules. Each of the m group convolution modules decodes one group of second input features together with compensation information of one scale, yielding m groups of decoded features, and the output features of the decoder are obtained from the m groups of decoded features. The second input feature of a decoder is the output of the network structure in the layer preceding that decoder; this preceding structure may be the last encoder in the image segmentation model or another decoder located in the layer preceding the decoder.
Specifically, for each group convolution module, the upsampling module in that group convolution module upsamples one group of second input features, and the output of the upsampling module is concatenated with compensation information yi of one scale to serve as the input of the first convolution module in that group convolution module, where 1 ≤ i ≤ m. For each decoder, the outputs of the m group convolution modules are concatenated to serve as the output of the decoder.
The upsampling module includes a convolution layer and a deconvolution (transposed convolution) layer, where the convolution kernel of the convolution layer is 1 × 1. The first convolution module includes a batch normalization (BN) layer, an excitation layer, and a convolution layer connected in sequence. Illustratively, the excitation layer may be a ReLU (rectified linear unit) layer.
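A sketch of one group convolution module, and of a decoder assembled from m of them, is given below; the 2 × 2 stride-2 transposed convolution, the 3 × 3 kernel of the first convolution module, and the channel widths are assumptions, and the compensation maps are assumed to already match the upsampled spatial size.

```python
import torch
import torch.nn as nn

class GroupConvModule(nn.Module):
    """Upsampling module (1x1 convolution + transposed convolution), concatenation with one
    scale of compensation information, then first convolution module (BN -> ReLU -> convolution)."""
    def __init__(self, in_channels: int, comp_channels: int, out_channels: int):
        super().__init__()
        self.upsample = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, kernel_size=1),
            nn.ConvTranspose2d(in_channels, in_channels, kernel_size=2, stride=2),
        )
        self.conv = nn.Sequential(
            nn.BatchNorm2d(in_channels + comp_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels + comp_channels, out_channels, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor, y_i: torch.Tensor) -> torch.Tensor:
        up = self.upsample(x)
        return self.conv(torch.cat([up, y_i], dim=1))   # concatenate on the channel dimension


class GroupDecoder(nn.Module):
    """Decoder built from m group convolution modules: split the second input feature into m
    channel groups, decode each with one scale of compensation information, concatenate the results."""
    def __init__(self, in_channels: int, comp_channels: int, out_channels: int, m: int = 4):
        super().__init__()
        assert in_channels % m == 0 and out_channels % m == 0
        self.m = m
        self.groups = nn.ModuleList(
            [GroupConvModule(in_channels // m, comp_channels, out_channels // m) for _ in range(m)]
        )

    def forward(self, x: torch.Tensor, ys) -> torch.Tensor:
        chunks = torch.chunk(x, self.m, dim=1)          # divide the second input features into m groups
        decoded = [g(c, y) for g, c, y in zip(self.groups, chunks, ys)]
        return torch.cat(decoded, dim=1)                # decoder output
```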
In one example, taking encoder 3, group integration module 3, and decoder 3 as an example, assume that m is 4, the first input feature of encoder 3 is a feature map with 128 channels and a size of 44 × 36, and the decoder includes 4 group convolution modules, i.e., group convolution module 1 to group convolution module 4. The first input feature of encoder 3 is input into group integration module 3 for processing, yielding compensation information at 4 different scales, namely y1, y2, y3 and y4. Meanwhile, the second convolution module, the down-sampling module and the compression and excitation module in encoder 3 process the first input feature in sequence, and the resulting output features serve as the second input feature of decoder 3, namely a feature map with 256 channels and a size of 22 × 18. The second input features are divided into 4 groups along the channel dimension, and each group of second input features includes a feature map with 128/4 = 32 channels. Each of the 4 group convolution modules decodes one group of second input features together with compensation information of one scale, yielding 4 groups of decoded features, and the 4 groups of decoded features are concatenated to obtain the output of decoder 3.
For example, for the first group of second input features, the upsampling module in group convolution module 1 decodes features of the latent space from the first group of second input features; the output of the upsampling module is concatenated with the compensation information y1 of the first scale along the channel dimension and then input into the first convolution module in group convolution module 1, where it is processed in sequence by the batch normalization layer, the excitation layer and the convolution layer, producing a feature map with 128/4 = 32 output channels and a size of 44 × 36. Similarly, for the second group of second input features, the upsampling module in group convolution module 2 processes the second group of second input features; the output of the upsampling module is concatenated with the compensation information y2 of the second scale along the channel dimension and then input into the first convolution module in group convolution module 2, again producing a feature map with 128/4 = 32 output channels and a size of 44 × 36. By analogy, each of the 4 group convolution modules in decoder 3 finally outputs a feature map with 128/4 = 32 channels and a size of 44 × 36. The output features of the 4 group convolution modules in decoder 3 are concatenated along the channel dimension to obtain a feature map with 128 channels and a size of 44 × 36, which serves as the second input feature of decoder 2.
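Using the GroupDecoder sketch above, the shapes in this example can be checked as follows; the 32 channels assumed for each compensation map y1 ... y4 are hypothetical, since the application does not state the GE block's output width.

```python
import torch

# Hypothetical shape check for decoder 3 with m = 4 (channel widths are assumptions).
decoder3 = GroupDecoder(in_channels=256, comp_channels=32, out_channels=128, m=4)
second_input = torch.randn(1, 256, 22, 18)               # second input feature of decoder 3
ys = [torch.randn(1, 32, 44, 36) for _ in range(4)]      # y1 ... y4 from group integration module 3
print(decoder3(second_input, ys).shape)                  # torch.Size([1, 128, 44, 36])
```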
It should be noted that the decoder in the present application replaces the convolution blocks of a conventional decoder with group convolution. The m group convolution modules in each decoder are sub-networks with the same structure, and each sub-network is equivalent to a classifier. Using group convolution effectively reduces the number of parameters, and the prediction diversity of multiple classifiers can improve the segmentation accuracy of the model.
The image segmentation model provided by the present application can be applied in the field of medical image segmentation to extract regions such as lesions and organs, for example in tasks such as brain tissue segmentation, brain tumor segmentation, and lung nodule segmentation. It can also be applied to any task that requires segmenting or extracting a target region from an image to be processed.
Illustratively, taking a brain tumor segmentation task as an example, the training process and the effect of the image segmentation model provided by the present application are exemplarily described.
First, an initial image segmentation model is constructed. For the brain tumor segmentation task, a corresponding training set is acquired; the training set includes a plurality of image sample pairs, and each image sample pair includes a brain tumor image sample and its corresponding segmented image sample. The initial image segmentation model is iteratively trained on the training set to minimize a loss function, yielding the trained image segmentation model.
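As an illustration only, such an iterative training loop might be sketched as follows; the soft Dice loss, the Adam optimizer, and the hyper-parameters are assumptions, since the application only states that a loss function is minimized over the training set.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def dice_loss(logits: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Soft Dice loss, a common choice for tumor segmentation."""
    probs = torch.sigmoid(logits)
    inter = (probs * target).sum(dim=(2, 3))
    union = probs.sum(dim=(2, 3)) + target.sum(dim=(2, 3))
    return 1.0 - ((2.0 * inter + eps) / (union + eps)).mean()

def train(model: nn.Module, loader: DataLoader, epochs: int = 50, lr: float = 1e-3) -> None:
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for image, mask in loader:          # a brain tumor image sample and its segmented image sample
            optimizer.zero_grad()
            loss = dice_loss(model(image), mask)
            loss.backward()
            optimizer.step()
```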
It can be understood that the method of performing image segmentation using the trained image segmentation model and the method of training the initial image segmentation model may be executed by the same terminal device or by different terminal devices. The terminal device may be, but is not limited to, a smart phone, a portable notebook, a tablet computer, a smart wearable device, a computer, a robot, or the like.
To verify the feasibility of the image segmentation model provided by the embodiments of the present application, the model is applied to brain tumor segmentation. First, the influence of the number of group convolution modules in the decoder of the image segmentation model GEU-Net on the brain tumor segmentation result is verified experimentally, and the results are compared with the segmentation results of a conventional small U-Net network on several different evaluation metrics.
TABLE 1
(The contents of Table 1 are provided as an image in the original publication.)
As shown in Table 1, the evaluation metrics include the Dice similarity coefficient, Sensitivity, Specificity, Hausdorff 95, and Parameter. The Dice similarity coefficient measures the similarity between the network's segmentation result and the reference segmentation. Sensitivity measures the model's ability to identify positive cases. Specificity measures the model's ability to identify negative cases. Hausdorff 95 measures shape similarity. Larger values of the Dice similarity coefficient, Sensitivity, and Specificity indicate better model performance, and a smaller Parameter value indicates fewer parameters in the model.
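For reference, the standard definitions of the first three metrics can be written as follows, where A is the predicted segmentation, B is the reference segmentation, and TP, FP, TN, FN denote true positives, false positives, true negatives, and false negatives; these formulas are supplied here for clarity only.

```latex
\mathrm{DSC} = \frac{2\,|A \cap B|}{|A| + |B|}, \qquad
\mathrm{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\mathrm{Specificity} = \frac{TN}{TN + FP}
```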
As can be seen from Table 1, the number of group convolution modules in the decoder has a large influence on the brain tumor segmentation result. When the number of group convolution modules is 4, the GEU-Net model provided by the present application performs best on the Dice similarity coefficient. In practical applications, the number of group convolution modules can be chosen according to the segmentation task.
In addition, the number of parameters of the conventional U-Net model is reduced by different amounts to obtain a small U-Net model and a medium U-Net model. Based on the above evaluation metrics, the segmentation results of the image segmentation model of the present application are compared with the brain tumor segmentation results of the conventional U-Net model, the small U-Net model, the medium U-Net model, the DeepLabV3+ model, the PSP-Net (Pyramid Scene Parsing Network) model, and the Attention U-Net model; the experimental results are shown in Table 2. As can be seen from Table 2, compared with the other models, the GEU-Net model provided by the present application has clear advantages in the Dice similarity coefficient, Sensitivity, and parameter count.
TABLE 2
(The contents of Table 2 are provided as an image in the original publication.)
In summary, in the image segmentation model provided by the embodiments of the present application, a group integration module is arranged between each pair of mutually corresponding encoder and decoder. The group integration module extracts compensation information at several different scales from the first input feature of the encoder and inputs it into the corresponding decoder, which enlarges the receptive field of the network layer, and the compensation information at several different scales reduces the gap between the low-level edge detail information in the encoder and the high-level semantic feature information in the corresponding decoder. Secondly, the single conventional convolution block in the decoder is replaced with several group convolution modules, so that the decoder resembles an ensemble of classifiers; this greatly reduces the number of parameters, and combining the outputs of the classifiers improves the segmentation accuracy of the model. The image segmentation model provided by the embodiments of the present application avoids the dilution of high-level semantic feature information caused by directly concatenating low-level edge detail information and high-level semantic feature information in the existing U-Net network model, thereby improving the segmentation accuracy of the model on the image to be processed and, to a certain extent, addressing the low accuracy of the conventional U-Net model's segmentation results.
Fig. 8 is a block diagram of an embodiment of an image segmentation apparatus according to an embodiment of the present application, which corresponds to an image segmentation method according to the foregoing embodiment. Referring to fig. 8, the image segmentation apparatus may include:
the acquisition unit 801 acquires an image to be processed.
The segmentation unit 802 is configured to input an image to be processed into a trained image segmentation model for processing, and output a segmented image, where the image segmentation model is of an encoder-decoder structure, and a group integration module is arranged between an encoder and a decoder that correspond to each other, and the group integration module is configured to extract m compensation information of different scales from input features of the encoder and input the m compensation information of different scales into the decoder, where m is greater than or equal to 2.
Optionally, the group integration module includes a first convolution layer and m-1 semantic compensation modules; the first convolution layer is used for extracting compensation information of a first scale from the first input feature of the encoder; the m-1 semantic compensation modules are used for extracting, from the compensation information of the first scale, m-1 pieces of compensation information at scales different from the first scale, wherein the output of the i-th semantic compensation module among the m-1 semantic compensation modules is the input of the (i+1)-th semantic compensation module, and 1 ≤ i ≤ m-1.
Optionally, the semantic compensation module includes a second convolution layer, a plurality of third convolution layers, and an atrous (dilated) convolution layer, which are connected in sequence.
Optionally, the decoder comprises m groups of convolution modules;
inputting compensation information of m different scales into a decoder, comprising:
dividing the second input features of the decoder into m groups;
decoding a group of second input features and compensation information of one scale through each group convolution module in m groups of convolution modules to obtain m groups of decoding features; and obtaining the output characteristic of the decoder according to the m groups of decoding characteristics.
Optionally, the group convolution module includes an up-sampling module and a first convolution module;
decoding a set of second input features and compensation information of a scale by each of m sets of convolution modules, comprising:
and performing upsampling processing on a group of second input features through an upsampling module in each group of convolution modules, and splicing the output of the upsampling module with compensation information of one scale to be used as the input of a first convolution module in each group of convolution modules.
Optionally, the encoder includes a second convolution module, a down-sampling module, and a compression and excitation module, which are connected in sequence, and the second convolution module includes a batch normalization layer, an excitation layer, and a convolution layer, which are connected in sequence.
Optionally, the image to be processed is a medical image.
It should be clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional units and modules is only used for illustration, and in practical applications, the above function distribution may be performed by different functional units and modules as needed, that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the above described functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only used for distinguishing one functional unit from another, and are not used for limiting the protection scope of the present application. For the specific working processes of the units and modules in the system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not described herein again.
Based on the same inventive concept, an embodiment of the present application further provides a terminal device. As shown in fig. 9, the terminal device 900 of this embodiment includes: a processor 901, a memory 902, and a computer program 904 stored in the memory 902 and executable on the processor 901. The computer program 904 may be executed by the processor 901 to generate the instructions 903, and the processor 901 may implement the steps in the above embodiments of the image segmentation method according to the instructions 903. Alternatively, the processor 901, when executing the computer program 904, implements the functions of each module/unit in the apparatus embodiments described above, for example the functions of the unit 801 and the unit 802 shown in fig. 8.
Illustratively, the computer program 904 may be divided into one or more modules/units, which are stored in the memory 902 and executed by the processor 901 to accomplish the present application. One or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of computer program 904 in terminal device 900.
Those skilled in the art will appreciate that fig. 9 is merely an example of a terminal device 900 and does not constitute a limitation of terminal device 900 and may include more or fewer components than shown, or some of the components may be combined, or different components, e.g., terminal device 900 may also include input-output devices, network access devices, buses, etc.
The Processor 901 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The storage 902 may be an internal storage unit of the terminal device 900, such as a hard disk or a memory of the terminal device 900. The memory 902 may also be an external storage device of the terminal device 900, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), etc., provided on the terminal device 900. Further, the memory 902 may also include both internal storage units and external storage devices of the terminal device 900. The memory 902 is used for storing computer programs and other programs and data required by the terminal device 900. The memory 902 may also be used to temporarily store data that has been output or is to be output.
The terminal device provided in this embodiment may execute the method embodiment, and the implementation principle and the technical effect are similar, which are not described herein again.
Embodiments of the present application further provide a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the computer program implements the method described in the above method embodiments.
The embodiment of the present application further provides a computer program product, which when running on a terminal device, enables the terminal device to implement the method described in the above method embodiment when executed.
The integrated unit may be stored in a computer-readable storage medium if it is implemented in the form of a software functional unit and sold or used as a separate product. Based on this understanding, all or part of the processes in the methods of the embodiments described above may be implemented by instructing the relevant hardware through a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments described above may be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable storage medium may include at least: any entity or device capable of carrying computer program code to a photographing apparatus or terminal device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium, such as a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk.
Reference throughout this application to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless otherwise specifically stated.
In the description of the present application, it is to be understood that the terms "first", "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature.
In addition, in the present application, unless explicitly stated or limited otherwise, the terms "connected" and the like are to be interpreted broadly, and may be, for example, a mechanical connection or an electrical connection; the terms may be directly connected or indirectly connected through an intermediate medium, and may be used for communicating the inside of two elements or for interacting the two elements, unless otherwise specifically defined, and the specific meaning of the terms in the present application may be understood by those skilled in the art according to the specific circumstances.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. An image segmentation method, characterized in that the method comprises:
acquiring an image to be processed;
inputting the image to be processed into a trained image segmentation model for processing, and outputting a segmented image;
the image segmentation model is of an encoder-decoder structure, a group integration module is arranged between the encoder and the decoder which correspond to each other, the group integration module is used for extracting m compensation information with different scales from first input features of the encoder and inputting the m compensation information with different scales into the decoder, and m is larger than or equal to 2.
2. The image segmentation method according to claim 1, wherein the set of integration modules includes a first convolution layer and m-1 semantic compensation modules;
the first convolution layer is used for extracting compensation information of a first scale from first input features of the encoder;
the m-1 semantic compensation modules are used for extracting m-1 compensation information with different scales from the first scale compensation information, wherein the output of the ith semantic compensation module in the m-1 semantic compensation modules is the input of the (i + 1) th semantic compensation module, and i is more than or equal to 1 and less than or equal to m-1.
3. The image segmentation method according to claim 2, wherein the semantic compensation module includes a second convolutional layer, a plurality of third convolutional layers, and an atrous (dilated) convolutional layer, which are connected in sequence.
4. The image segmentation method according to claim 1, characterized in that the decoder comprises m groups of convolution modules;
the inputting the compensation information of the m different scales into the decoder comprises:
dividing second input features of the decoder into m groups;
decoding a group of second input features and the compensation information of one scale through each group of convolution modules in the m groups of convolution modules to obtain m groups of decoding features;
and obtaining the output characteristics of the decoder according to the m groups of decoding characteristics.
5. The image segmentation method of claim 4, wherein the set of convolution modules includes an upsampling module and a first convolution module;
said decoding a set of said second input features and said compensation information for a scale by each of said m sets of convolution modules comprises:
and performing upsampling processing on a group of second input features through an upsampling module in each group of convolution modules, and splicing the output of the upsampling module and the compensation information of one scale to be used as the input of a first convolution module in the group of convolution modules.
6. The image segmentation method according to any one of claims 1 to 5, wherein the encoder comprises a second convolution module, a downsampling module, and a compression and excitation module connected in sequence, and the second convolution module comprises a batch normalization layer, an excitation layer, and a convolution layer connected in sequence.
7. The image segmentation method according to claim 6, characterized in that the image to be processed is a medical image.
8. An image segmentation apparatus, comprising:
an acquisition unit that acquires an image to be processed;
and the segmentation unit is used for inputting the image to be processed into a trained image segmentation model for processing and outputting a segmented image, the image segmentation model is of an encoder-decoder structure, a group integration module is arranged between an encoder and a decoder which correspond to each other, the group integration module is used for extracting m compensation information with different scales from the input characteristics of the encoder and inputting the m compensation information with different scales into the decoder, and m is more than or equal to 2.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
CN202210295676.2A 2022-03-24 2022-03-24 Image segmentation method and device, terminal equipment and storage medium Pending CN114782686A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210295676.2A CN114782686A (en) 2022-03-24 2022-03-24 Image segmentation method and device, terminal equipment and storage medium
PCT/CN2022/137369 WO2023179095A1 (en) 2022-03-24 2022-12-07 Image segmentation method and apparatus, terminal device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210295676.2A CN114782686A (en) 2022-03-24 2022-03-24 Image segmentation method and device, terminal equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114782686A true CN114782686A (en) 2022-07-22

Family

ID=82424911

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210295676.2A Pending CN114782686A (en) 2022-03-24 2022-03-24 Image segmentation method and device, terminal equipment and storage medium

Country Status (2)

Country Link
CN (1) CN114782686A (en)
WO (1) WO2023179095A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023179095A1 (en) * 2022-03-24 2023-09-28 中国科学院深圳理工大学(筹) Image segmentation method and apparatus, terminal device, and storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117409100A (en) * 2023-12-15 2024-01-16 山东师范大学 CBCT image artifact correction system and method based on convolutional neural network
CN117523645B (en) * 2024-01-08 2024-03-22 深圳市宗匠科技有限公司 Face key point detection method and device, electronic equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102018217091A1 (en) * 2018-10-05 2020-04-09 Robert Bosch Gmbh Process, artificial neural network, device, computer program and machine-readable storage medium for the semantic segmentation of image data
CN111080660B (en) * 2019-11-14 2023-08-08 中国科学院深圳先进技术研究院 Image segmentation method, device, terminal equipment and storage medium
CN113284088B (en) * 2021-04-02 2024-03-29 中国科学院深圳先进技术研究院 CSM image segmentation method and device, terminal equipment and storage medium
CN114782686A (en) * 2022-03-24 2022-07-22 中国科学院深圳理工大学(筹) Image segmentation method and device, terminal equipment and storage medium

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023179095A1 (en) * 2022-03-24 2023-09-28 中国科学院深圳理工大学(筹) Image segmentation method and apparatus, terminal device, and storage medium

Also Published As

Publication number Publication date
WO2023179095A1 (en) 2023-09-28

Similar Documents

Publication Publication Date Title
US11373305B2 (en) Image processing method and device, computer apparatus, and storage medium
CN114782686A (en) Image segmentation method and device, terminal equipment and storage medium
CN110598714B (en) Cartilage image segmentation method and device, readable storage medium and terminal equipment
WO2022017025A1 (en) Image processing method and apparatus, storage medium, and electronic device
CN112862681A (en) Super-resolution method, device, terminal equipment and storage medium
WO2022205657A1 (en) Csm image segmentation method and apparatus, terminal device, and storage medium
CN112308866A (en) Image processing method, image processing device, electronic equipment and storage medium
CN113052868A (en) Cutout model training and image cutout method and device
WO2021139351A1 (en) Image segmentation method, apparatus, medium, and electronic device
US20180114109A1 (en) Deep convolutional neural networks with squashed filters
CN115115724A (en) Image processing method, image processing device, computer equipment and storage medium
CN111325709A (en) Wireless capsule endoscope image detection system and detection method
CN114241585A (en) Cross-age face recognition model training method, recognition method and device
US20230410465A1 (en) Real time salient object detection in images and videos
CN116912924B (en) Target image recognition method and device
CN114037699A (en) Pathological image classification method, equipment, system and storage medium
CN112861934A (en) Image classification method and device of embedded terminal and embedded terminal
WO2019109410A1 (en) Fully convolutional network model training method for splitting abnormal signal region in mri image
CN112884702A (en) Polyp identification system and method based on endoscope image
CN113139617B (en) Power transmission line autonomous positioning method and device and terminal equipment
CN115147434A (en) Image processing method, device, terminal equipment and computer readable storage medium
CN114758128B (en) Scene panorama segmentation method and system based on controlled pixel embedding characterization explicit interaction
CN117808857B (en) Self-supervision 360-degree depth estimation method, device, equipment and medium
CN115222833A (en) Training method of classification network, image classification method and related equipment
CN116362301A (en) Model quantization method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination