CN111028237B - Image segmentation method and device and terminal equipment - Google Patents

Image segmentation method and device and terminal equipment

Info

Publication number
CN111028237B
CN111028237B
Authority
CN
China
Prior art keywords
layer
convolution
segmented
image
layers
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911172254.0A
Other languages
Chinese (zh)
Other versions
CN111028237A (en)
Inventor
司伟鑫
李才子
王琼
王平安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN201911172254.0A
Publication of CN111028237A
Priority to PCT/CN2020/128848 (WO2021104060A1)
Application granted
Publication of CN111028237B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The application is applicable to the technical field of image processing, and provides an image segmentation method, an image segmentation device and terminal equipment. The image segmentation method comprises the following steps: acquiring a target image to be segmented, and generating a preset number of first images to be segmented based on the target image to be segmented, wherein the resolution of each first image to be segmented is different; performing convolution processing on each first image to be segmented through multiple layers of convolution layers, wherein each layer of the multiple layers of convolution layers comprises a plurality of serially connected convolution layers, and each convolution layer of the current layer of convolution layers is respectively connected with convolution layers of the previous layer of convolution layers; and performing image segmentation according to the output result of the convolution processing. The method and the device can improve the accuracy of image segmentation.

Description

Image segmentation method and device and terminal equipment
Technical Field
The application belongs to the technical field of image processing, and particularly relates to an image segmentation method, an image segmentation device and terminal equipment.
Background
Considerable research has been devoted to neural network systems for image segmentation. In most cases, however, humans can easily extract different information from an image over a range of spatial scales, yielding image details and features that span small to large areas, and this remains a relatively challenging task for computer equipment. In addition, neural network training requires a large number of parameters to participate in the operation and the process is complicated, so that image segmentation with a neural network is costly and its accuracy is poor.
Disclosure of Invention
In order to overcome at least one problem in the related art, embodiments of the present application provide an image segmentation method, an image segmentation device, and a terminal device.
The application is realized by the following technical scheme:
in a first aspect, an embodiment of the present application provides an image segmentation method, including:
acquiring a target to-be-segmented image, and generating a preset number of first to-be-segmented images based on the target to-be-segmented image; wherein the resolution of each first image to be segmented is different;
performing convolution processing on each first image to be segmented through multiple layers of convolution layers, wherein each layer of the multiple layers of convolution layers comprises a plurality of serially connected convolution layers, and each convolution layer of the current layer of convolution layers is respectively connected with a convolution layer of the previous layer of convolution layers;
and carrying out image segmentation according to the output result of the convolution processing.
In a possible implementation manner of the first aspect, the convolving each of the first images to be segmented with multiple convolution layers includes:
correspondingly inputting each first image to be segmented into each convolution layer in the first layer of convolution layers;
for each convolution layer in the first layer of convolution layers, respectively performing convolution processing on the corresponding first image to be segmented, and splicing and fusing the current feature map, after upsampling, with the previous feature map;
and for each convolution layer in the layers other than the first layer of convolution layers, respectively performing convolution processing on the output results of the adjacent convolution layers in the previous layer of convolution layers, and splicing and fusing the current feature map, after upsampling, with the previous feature map.
In a possible implementation manner of the first aspect, the multi-layer convolution layer includes a plurality of scale convolution layers, each scale convolution layer includes a plurality of convolution layers, and the number of convolution layers of each scale convolution layer is different;
the inputting each image to be segmented into each convolution layer in the first layer of convolution layers correspondingly comprises the following steps:
and inputting each image to be segmented into the corresponding convolution layer according to the correspondence between the resolutions from high to low and the numbers of convolution layers contained in the respective scales from large to small.
In one possible implementation manner of the first aspect, the number of convolution layers in each of the first layer to the i-th layer of the multi-layer convolution layers is the same, and the number of convolution layers decreases successively from the (i+1)-th layer to the n-th layer, where the number of convolution layers in the i-th layer is greater than the number of convolution layers in the (i+1)-th layer, and n is the number of layers of the multi-layer convolution layers.
In a possible implementation manner of the first aspect, each convolution layer of the current layer of convolution layers being respectively connected with a convolution layer in the previous layer of convolution layers specifically means:
the convolution layer of the current scale in the current layer of convolution layers is respectively connected with the convolution layer of the current scale and the convolution layers of the scales adjacent to the current scale in the previous layer of convolution layers.
In a possible implementation manner of the first aspect, the generating a preset number of first images to be segmented based on the target images to be segmented includes:
compressing the target image to be segmented to generate a plurality of second images to be segmented with different resolutions;
and taking the plurality of second images to be segmented and the target images to be segmented as the first images to be segmented of the preset number.
In a possible implementation manner of the first aspect, the compressing the target image to be segmented to generate a plurality of second images to be segmented with different resolutions includes:
and compressing the target image to be segmented by a bilinear interpolation method to generate a plurality of second images to be segmented with different resolutions.
In a second aspect, an embodiment of the present application provides an image segmentation apparatus, including:
The image conversion module is used for acquiring target images to be segmented and generating a preset number of first images to be segmented based on the target images to be segmented; wherein the resolution of each first image to be segmented is different;
the convolution processing module is used for performing convolution processing on each first image to be segmented through multiple layers of convolution layers, wherein each layer of the multiple layers of convolution layers comprises a plurality of serially connected convolution layers, and each convolution layer of the current layer of convolution layers is respectively connected with a convolution layer of the previous layer of convolution layers;
and the segmentation module is used for carrying out image segmentation according to the output result of the convolution processing.
In a third aspect, an embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the image segmentation method according to any one of the first aspects when the processor executes the computer program.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program which, when executed by a processor, implements the image segmentation method according to any one of the first aspects.
In a fifth aspect, embodiments of the present application provide a computer program product, which, when run on a terminal device, causes the terminal device to perform the image segmentation method according to any one of the first aspects above.
It will be appreciated that the advantages of the second to fifth aspects may be found in the relevant description of the first aspect, and are not described here again.
Compared with the prior art, the embodiment of the application has the beneficial effects that:
according to the method, the device and the system, a target image to be segmented is acquired, a plurality of first images to be segmented with different resolutions are generated based on the target image to be segmented, convolution processing is performed on each first image to be segmented through the multi-layer convolution layers, and image segmentation is performed according to the output result of the convolution processing. In this way, feature maps of different scales are connected in series, feature information of different scales can be continuously and fully fused, and the accuracy of image segmentation is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required for the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic view of an application environment of an image segmentation method according to an embodiment of the present application;
FIG. 2 is a flow chart of an image segmentation method according to an embodiment of the present disclosure;
FIG. 3 is a flow chart of an image segmentation method according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a parallel mechanism between convolutional layers of a neural network according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a serial mechanism between convolutional layers of a neural network according to one embodiment of the present application;
FIG. 6 is a schematic diagram of a multi-layer convolution layer provided by an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of an image segmentation apparatus according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of a terminal device provided in an embodiment of the present application;
fig. 9 is a schematic structural diagram of a computer to which the image segmentation method provided in the embodiment of the present application is applied.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as meaning "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
In addition, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
Improving the multi-scale expression capability of a neural network is an important way of improving multi-tissue segmentation of cardiac MRI images. Currently, in the field of computer vision, image pyramids are widely used in computer vision tasks in a variety of forms and methods. Although many studies have been made on neural network systems for image segmentation, in many cases humans can easily extract different information from an image over a series of spatial scales, yielding image details and features that span small to large areas, and this remains a relatively challenging task for computer equipment. In addition, neural network training requires a large number of parameters to participate in the operation and the process is complicated, so that image segmentation with a neural network is costly and its accuracy is poor.
Based on the above problems, the embodiments of the present application provide an image segmentation method, an image segmentation device and terminal equipment, which connect feature maps of different scales in a neural network in series, so that feature information of different scales can be continuously and fully fused. Fusing feature information of different resolutions in the pyramid enhances the semantic information of the convolution layers with smaller receptive fields in the pyramid, which is more conducive to prediction on high-resolution features and improves the fineness of the segmentation result.
Specifically, a target image to be segmented is acquired, and a preset number of first images to be segmented with different resolutions are generated based on the target image to be segmented. Convolution processing is then performed on each first image to be segmented through multiple layers of convolution layers, where each layer of the multiple layers of convolution layers comprises a plurality of serially connected convolution layers and each convolution layer of the current layer of convolution layers is respectively connected with a convolution layer in the previous layer of convolution layers, and image segmentation is performed according to the output result of the convolution processing. In this way, feature maps of different scales are connected in series, and feature information of different scales can be continuously and fully fused.
For example, the embodiment of the present application may be applied to the exemplary scenario shown in fig. 1. In this scenario, a magnetic resonance scanning apparatus 10 scans a certain part of a human body, obtains a scanned image of that part (for example, a cardiac image), and sends the scanned image to an image segmentation apparatus 20. After acquiring the scanned image, the image segmentation apparatus 20 takes the scanned image as the target image to be segmented, generates a plurality of first images to be segmented with different resolutions based on it, and performs convolution processing on each first image to be segmented through the multiple layers of convolution layers of a neural network, where each layer of the multiple layers of convolution layers comprises a plurality of serially connected convolution layers and each convolution layer of the current layer of convolution layers is respectively connected with a convolution layer of the previous layer of convolution layers; finally, image segmentation is performed according to the output result of the convolution processing.
It should be noted that, the application scenario is not limited to the application scenario when the embodiments of the present application are implemented, and in fact, the embodiments of the present application may be applied to other application scenarios. For example, in other exemplary application scenarios, the medical staff may pick the target image to be segmented and send the target image to the image segmentation device.
In order that those skilled in the art may better understand the solution of the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. It is obvious that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without inventive effort shall fall within the protection scope of the present application.
Fig. 2 is a schematic flowchart of an image segmentation method provided by an embodiment of the present application. Referring to fig. 2, the image segmentation method is described in detail as follows:
in step 101, a target image to be segmented is acquired, and a preset number of first images to be segmented are generated based on the target image to be segmented.
Wherein the resolution of each first image to be segmented is different.
Referring to fig. 3, in some embodiments, the generating a preset number of first images to be segmented based on the target image to be segmented may include the following steps:
in step 1011, the target image to be segmented is compressed to generate a plurality of second images to be segmented with different resolutions.
The target image to be segmented can be compressed into images at corresponding proportions according to a plurality of preset proportions, so as to obtain a plurality of second images to be segmented with different resolutions.
For example, the number of second images to be segmented may be determined according to the number of scales of the neural network convolution layers. For instance, if there are 5 scales of convolution layers, the target image to be segmented may be compressed into 4 second images to be segmented, where each compression ratio is a value in (0, 1).
In step 1012, the plurality of second images to be segmented and the target image to be segmented are used as the first images to be segmented of the preset number.
For example, the target image to be segmented may be compressed by bilinear interpolation to generate a plurality of second images to be segmented with different resolutions. For example, when there are 5 scales of neural network convolution layers, the input target image to be segmented can be scaled to 1/2, 1/4, 1/8 and 1/16 of its size by bilinear interpolation, and the 4 images generated by scaling, together with the target image to be segmented, form the 5 first images to be segmented.
Of course, in other embodiments, instead of using the target image to be segmented itself, second images to be segmented generated with compression ratios in (0, 1) may be used throughout, which is not limited in the embodiments of the present application.
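As an illustration only, the pyramid construction described above could be sketched as follows; the use of PyTorch, the tensor layout (batch, channel, height, width) and the function name build_input_pyramid are assumptions made for the example and are not prescribed by the embodiment.

```python
import torch
import torch.nn.functional as F

def build_input_pyramid(target_image: torch.Tensor, num_scales: int = 5) -> list:
    """Return the first images to be segmented: the target image plus bilinearly
    downsampled copies at 1/2, 1/4, ... of its resolution (one per extra scale)."""
    pyramid = [target_image]                          # highest resolution first
    for k in range(1, num_scales):
        scaled = F.interpolate(target_image,
                               scale_factor=1.0 / (2 ** k),
                               mode='bilinear',
                               align_corners=False)   # bilinear interpolation compression
        pyramid.append(scaled)                        # the second images to be segmented
    return pyramid

# Example: a 1x1x256x256 input yields images of side 256, 128, 64, 32 and 16.
images = build_input_pyramid(torch.randn(1, 1, 256, 256))
```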
In step 102, convolution processing is performed on each of the first images to be segmented through a plurality of convolution layers.
Each of the multiple convolution layers comprises a plurality of serially connected convolution layers, and each convolution layer of the current convolution layer is connected with the convolution layer of the previous convolution layer. For example, a convolution layer of a current scale of a current layer convolution layer may be connected to both the convolution layer of the current scale and a convolution layer of a scale adjacent to the current scale in a previous layer convolution layer, respectively.
In order to facilitate understanding of the above image segmentation method, a network structure used in the embodiments of the present application will be described.
1. Parallel multi-scale cross fusion pyramid
The traditional U-Net structure extracts features of an image at different levels by successively performing convolution and pooling on the input image, and then restores the semantically strong deep feature maps to the original image size step by step through successive deconvolution operations. During size restoration, the cross-connection operations play a very large role in enhancing the feature expression capability of the convolution layers on the restoration path: the convolution layers with good contour features on the contraction path and the semantically strong convolution layers on the expansion path are fused with each other, and pixel-by-pixel classification and identification are finally completed. However, the U-Net structure has disadvantages in segmenting cardiac MRI images. The cross-connection mechanism of U-Net only operates between convolution layers of the same scale, so its feature fusion capability is insufficient; moreover, the left ventricle, right ventricle and myocardium targets are usually small at the basal and apical ends of the segmentation target, especially at the apex, so the segmentation capability of U-Net at these positions is obviously deficient. In addition, in going from the high-resolution convolution layers to the low-resolution convolution layers, the number of feature maps is doubled, so the model has a large number of parameters, which places a certain burden on computing resources.
In order to avoid the shortcomings of the contraction-expansion structural design of U-Net in multi-scale information fusion, to reduce the number of parameters of the segmentation model, and to further improve the multi-tissue segmentation capability for images such as cardiac MRI images, a parallel cross-scale neural network structure is provided in the embodiments of the present application.
The core of the parallel cross-scale neural network structure is the mutual fusion of the characteristics of the convolution layers on each scale, and the multi-scale information exchange among the convolution layers of the neural network is enhanced. A parallel multiscale fusion unit is shown in equation (1):
$$C_{i+1}^{n} = F\!\left(\left[\, \mathcal{D}\!\left(C_{i}^{n-1}\right),\ C_{i}^{n},\ \mathcal{U}\!\left(C_{i}^{n+1}\right) \,\right]\right) \tag{1}$$
In a parallel multiscale fusion unit, the output convolution layer $C_{i+1}^{n}$ denotes the convolution layer of the n-th scale in the (i+1)-th layer of convolution layers, and the input convolution layers $C_{i}^{n-1}$, $C_{i}^{n}$ and $C_{i}^{n+1}$ denote the n-th scale convolution layer of the i-th layer and its two adjacent-scale convolution layers. $[\,\cdot\,]$ denotes the splicing (concatenation) operation, $\mathcal{D}$ and $\mathcal{U}$ denote down-sampling and up-sampling, and the function $F(\cdot)$ denotes a set of operations, which may include, for example, convolution, batch normalization and activation. The three input convolution layers contain features of different scales: the larger-scale feature map is down-sampled to the resolution of the output convolution layer, and the smaller-scale feature map is up-sampled to the resolution of the output convolution layer. The two resampled convolution layers are spliced with the input convolution layer of the same scale as the output, taking the channel as the axis, and a 3*3 convolution, a batch normalization layer and a ReLU activation are then applied to the spliced result; nearest-neighbour interpolation and max pooling are adopted for the up-sampling and down-sampling operations, respectively. After this multi-scale feature fusion, the output convolution layer is equivalent to having extracted feature information of different scales. It should be noted that when $C_{i}^{n}$ has no larger-scale adjacent convolution layer or no smaller-scale adjacent convolution layer, there are only two input convolution layers.
A parallel multi-scale cross pyramid unit comprises an input feature pyramid $P_i$ and an output feature pyramid $P_{i+1}$. FIG. 4 shows the connection relation between $P_i$ and $P_{i+1}$: there is a parallel cross-scale fusion unit on each scale, which is equivalent to saying that, for the output pyramid, the convolution layer on each scale can accept information from the corresponding-scale convolution layer of the input pyramid and from all of its adjacent layers.
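The parallel cross-scale fusion unit of equation (1) and fig. 4 can be sketched, under assumptions, roughly as the PyTorch module below; the class name, channel arguments and tensor layout are illustrative, while the fusion pattern (max pooling for down-sampling, nearest-neighbour interpolation for up-sampling, channel-wise splicing, then 3*3 convolution, batch normalization and ReLU) follows the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParallelFusionUnit(nn.Module):
    """Computes C_{i+1}^n from C_i^{n-1} (finer), C_i^n and C_i^{n+1} (coarser).
    At the top or bottom scale the missing neighbour is simply passed as None."""

    def __init__(self, in_channels_total: int, out_channels: int):
        super().__init__()
        # in_channels_total must equal the summed channel count of the inputs provided
        self.conv = nn.Conv2d(in_channels_total, out_channels, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, same, finer=None, coarser=None):
        feats = []
        if finer is not None:        # larger-scale (higher-resolution) input: max-pool down
            feats.append(F.max_pool2d(finer, kernel_size=2))
        feats.append(same)           # input convolution layer at the output scale
        if coarser is not None:      # smaller-scale (lower-resolution) input: upsample
            feats.append(F.interpolate(coarser, scale_factor=2, mode='nearest'))
        x = torch.cat(feats, dim=1)  # splice along the channel axis
        return self.act(self.bn(self.conv(x)))
```

A unit like this would be instantiated once per scale of the output pyramid, matching the grid of connections shown in fig. 4.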
In this embodiment, the advantages of the parallel multi-scale fusion pyramid for semantic segmentation include three points:
1) Compared with the traditional coding and decoding structure, the parallel cross-scale fusion pyramid can greatly enhance the multi-scale feature fusion capability from the shallow feature layer to the deep feature layer;
2) The continuous application of the feature pyramid can exponentially increase the receptive field of the neural network, and is more beneficial to classification at the pixel level;
3) Convolution layers with the same resolution can interact directly, which reduces the information loss caused by pooling and resampling.
2. Serial multi-scale fusion pyramid
The parallel cross-scale fusion pyramid can integrate semantic features of different scales owing to the effective communication among feature maps of different scales. However, for the multi-scale fusion shown in fig. 4, although there is multi-scale fusion between different pyramids, there is not enough multi-scale fusion within a pyramid. In order to further improve the capability of multi-scale information exchange, the embodiment of the present application further constructs, on the basis of fig. 4, a multi-scale fusion pyramid with a serial mechanism; the thick solid arrows in fig. 5 illustrate this serial mechanism. Inside one pyramid, the low-resolution feature maps are upsampled one by one and spliced and fused with the high-resolution feature maps, which is equivalent to transferring the semantic features contained in the low-resolution feature maps to the high-resolution features, thereby further enriching the semantic information of the high-resolution features.
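A minimal sketch of this serial mechanism follows, again assuming PyTorch tensors and nearest-neighbour up-sampling (the interpolation method for this step is not fixed by the text):

```python
import torch
import torch.nn.functional as F

def serial_fusion(feature_maps):
    """feature_maps: list ordered from the highest-resolution scale to the lowest.
    Starting from the coarsest map, each map is upsampled one step and spliced
    onto the next finer map, passing low-resolution semantics up the pyramid."""
    fused = list(feature_maps)
    for n in range(len(fused) - 1, 0, -1):
        up = F.interpolate(fused[n], scale_factor=2, mode='nearest')
        fused[n - 1] = torch.cat([fused[n - 1], up], dim=1)
    return fused
```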
Based on the ideas of the parallel multi-scale cross fusion pyramid and the serial multi-scale fusion pyramid, a series-parallel multi-scale fusion pyramid network is constructed in the embodiment of the application, and the network structure is shown in fig. 6.
It should be noted that only the convolution layers are shown in fig. 6; each convolution layer is followed by a batch normalization layer and a ReLU activation layer, which, for the sake of brevity, are regarded as implicit in the convolution operation.
As shown in fig. 6, the multi-layer convolution layers may include m layers of convolution layers, specifically the layer 1 convolution layers, the layer 2 convolution layers, ..., the layer i convolution layers, the layer i+1 convolution layers, ..., and the layer m convolution layers.
The number of convolution layers in each of the 1st layer to the i-th layer of the multi-layer convolution layers is the same, and the number of convolution layers decreases successively from the (i+1)-th layer to the m-th layer, where the number of convolution layers in the i-th layer is greater than the number of convolution layers in the (i+1)-th layer, and m is the number of layers of the multi-layer convolution layers.
The multi-layer convolution layers may comprise convolution layers of a plurality of scales, each scale comprising a plurality of convolution layers, with the number of convolution layers differing from scale to scale. For example, in fig. 6 the multi-layer convolution layers are formed from convolution layers of 5 scales, and each row of convolution layers corresponds to the same scale, which may be, from top to bottom, scale 1, scale 2, scale 3, scale 4 and scale 5; of course, the structure is not limited to 5 scales, which are merely illustrative. The number of convolution layers corresponding to scale 1 is the largest, and the numbers of convolution layers corresponding to scale 2, scale 3, scale 4 and scale 5 decrease accordingly.
In particular, the multi-layer convolution layers may be divided into cascade-pyramid-type convolution layers and shrink-pyramid-type convolution layers. The cascade-pyramid-type convolution layers are formed by the 1st layer to the i-th layer of convolution layers, and all the convolution layers of each layer are connected in series. The convolution layer corresponding to scale n in the k-th layer of convolution layers is connected with the convolution layers corresponding to scale n, scale n+1 and scale n-1 in the (k-1)-th layer of convolution layers, where k ≥ 2 and 2 ≤ n ≤ 4; the convolution layer corresponding to scale 1 in the k-th layer of convolution layers is connected with the convolution layers corresponding to scale 1 and scale 2 in the (k-1)-th layer; and the convolution layer corresponding to scale 5 in the k-th layer of convolution layers is connected with the convolution layers corresponding to scale 4 and scale 5 in the (k-1)-th layer.
For the shrink-pyramid-type convolution layers, the numbers of convolution layers in the (i+1)-th layer to the m-th layer of convolution layers decrease successively, and are all less than the number (5) of convolution layers in each layer of the cascade-pyramid-type convolution layers. For example, the (i+1)-th layer of convolution layers includes 4 convolution layers, and the m-th layer of convolution layers includes 1 convolution layer. It should be noted that the 4 layers of shrink-pyramid-type convolution layers shown in fig. 6 are only an example; the present application is not limited thereto, and the number may be set according to actual needs.
For the shrink-pyramid-type convolution layers, the convolution layers of each layer are connected in series. For the (i+1)-th layer of convolution layers, the convolution layer of scale n is connected with the convolution layers corresponding to scale n, scale n+1 and scale n-1 in the i-th layer of convolution layers, where 2 ≤ n ≤ 3; the convolution layer corresponding to scale 1 is connected with the convolution layers corresponding to scale 1 and scale 2 in the i-th layer of convolution layers; and the convolution layer corresponding to scale 4 in the (i+1)-th layer of convolution layers is connected with the convolution layers corresponding to scale 3 and scale 4 in the i-th layer of convolution layers.
The structure of the other layers of the shrink-pyramid-type convolution layers can be seen in fig. 6 and will not be described in detail here.
Each layer of convolution layers then discards its one convolution layer with the lowest resolution, until the last output layer (the m-th layer of convolution layers) contains only the one convolution layer with the highest resolution. By gradually removing the low-resolution convolution layers, the shrink pyramid discards redundant convolution layers while ensuring the completeness of the feature information of the output layer.
The series-parallel multi-scale fusion pyramid network is a grid-type semantic segmentation network composed of convolution layers with different resolutions. The parallel and serial connections of the pyramids form the horizontal and vertical paths along which information is transmitted in the grid-type segmentation network, and all semantic features are fused into the high-resolution convolution layers. The continuous high-resolution convolution layers ensure that low-level contour features and high-level semantic features are not lost due to pooling in the forward computation of the convolutions, and the whole network forms a relatively complete mapping from the input layer to the output layer.
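To make the cascade-then-shrink layout of fig. 6 easier to follow, the helper below lists which scales are present in each layer of convolution layers; the function name and the particular values of i and m in the example are illustrative only.

```python
def scale_schedule(num_scales: int, i: int, m: int):
    """Scales present in layers 1..m: layers 1..i (cascade pyramid) keep all scales;
    from layer i+1 onward (shrink pyramid) the lowest-resolution scale is dropped
    one layer at a time until only scale 1 remains in layer m."""
    schedule = []
    for layer in range(1, m + 1):
        dropped = max(0, layer - i)               # coarse scales discarded so far
        schedule.append(list(range(1, max(1, num_scales - dropped) + 1)))
    return schedule

# With num_scales=5, i=3 and m=7:
# [[1,2,3,4,5], [1,2,3,4,5], [1,2,3,4,5], [1,2,3,4], [1,2,3], [1,2], [1]]
print(scale_schedule(5, 3, 7))
```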
Referring to fig. 3, in some embodiments, step 102 may be implemented by:
in step 1021, each first image to be segmented is correspondingly input into a respective convolution layer in the first layer of convolution layers.
The correspondingly inputting each image to be segmented into each convolution layer in the first layer of convolution layers includes:
inputting each image to be segmented into the corresponding convolution layer according to the correspondence between the resolutions from high to low and the numbers of convolution layers contained in the respective scales from large to small.
For example, referring to fig. 6, for the first images to be segmented, the input image may be scaled to 1/2, 1/4, 1/8 and 1/16 of its size by bilinear interpolation, and these, together with the input image, form the first images to be segmented at five resolutions, i.e., the input pyramid. As shown in fig. 6, from top to bottom these are the first image to be segmented 1, the first image to be segmented 2, the first image to be segmented 3, the first image to be segmented 4 and the first image to be segmented 5, with the resolutions decreasing accordingly.
In step 1022, for each convolution layer in the first layer of convolution layers, convolution processing is performed on the first image to be segmented, and the current feature map is up-sampled and then is merged with the previous feature map.
The first to-be-segmented image 1 may be input to a convolution layer corresponding to a scale 1 in a 1 st layer convolution layer, the first to-be-segmented image 2 may be input to a convolution layer corresponding to a scale 2 in the 1 st layer convolution layer, the first to-be-segmented image 3 may be input to a convolution layer corresponding to a scale 3 in the 1 st layer convolution layer, the first to-be-segmented image 4 may be input to a convolution layer corresponding to a scale 4 in the 1 st layer convolution layer, and the first to-be-segmented image 5 may be input to a convolution layer corresponding to a scale 5 in the 1 st layer convolution layer.
Specifically, the convolution layer corresponding to the scale 1 in the 1 st layer convolution layer performs convolution processing on the first image to be segmented 1, the convolution layer corresponding to the scale 2 in the 1 st layer convolution layer performs convolution processing on the first image to be segmented 2, the convolution layer corresponding to the scale 3 in the 1 st layer convolution layer performs convolution processing on the first image to be segmented 3, the convolution layer corresponding to the scale 4 in the 1 st layer convolution layer performs convolution processing on the first image to be segmented 4, and the convolution layer corresponding to the scale 5 in the 1 st layer convolution layer performs convolution processing on the first image to be segmented 5.
In addition, the low-resolution feature map in the scale 5 convolution layer is upsampled and then sent to the scale 4 convolution layer, where it is spliced and fused with the high-resolution feature map in the scale 4 convolution layer; the low-resolution feature map in the scale 4 convolution layer is upsampled and sent to the scale 3 convolution layer, where it is spliced and fused with the high-resolution feature map in the scale 3 convolution layer; the low-resolution feature map in the scale 3 convolution layer is upsampled and sent to the scale 2 convolution layer, where it is spliced and fused with the high-resolution feature map in the scale 2 convolution layer; and the low-resolution feature map in the scale 2 convolution layer is upsampled and sent to the scale 1 convolution layer, where it is spliced and fused with the high-resolution feature map in the scale 1 convolution layer. Here, "low-resolution" and "high-resolution" are relative terms, and the specific values of the resolutions are not limited.
In this embodiment, the above splicing and fusing of the current feature map, after upsampling, with the previous feature map may specifically be: the feature information extracted by the convolution layer of the current scale is upsampled and then spliced and fused with the feature map of the feature information extracted by the convolution layer of the previous scale.
In step 1023, for each convolution layer in the other convolution layers except the first convolution layer, convolution processing is performed on the output results of the adjacent convolution layers in the previous convolution layer, and the current feature map is spliced and fused with the previous feature map after upsampling.
For example, for the i-th layer of convolution layers, the convolution layer corresponding to scale 1 performs convolution processing on the output results of the scale 1 and scale 2 convolution layers of the (i-1)-th layer; the convolution layer corresponding to scale 2 performs convolution processing on the output results of the scale 1, scale 2 and scale 3 convolution layers of the (i-1)-th layer; the convolution layer corresponding to scale 3 performs convolution processing on the output results of the scale 2, scale 3 and scale 4 convolution layers of the (i-1)-th layer; the convolution layer corresponding to scale 4 performs convolution processing on the output results of the scale 3, scale 4 and scale 5 convolution layers of the (i-1)-th layer; and the convolution layer corresponding to scale 5 performs convolution processing on the output results of the scale 4 and scale 5 convolution layers of the (i-1)-th layer.
In addition, the low-resolution feature map in the scale 5 convolution layer is upsampled and then sent to the scale 4 convolution layer, where it is spliced and fused with the high-resolution feature map in the scale 4 convolution layer; the low-resolution feature map in the scale 4 convolution layer is upsampled and sent to the scale 3 convolution layer, where it is spliced and fused with the high-resolution feature map in the scale 3 convolution layer; the low-resolution feature map in the scale 3 convolution layer is upsampled and sent to the scale 2 convolution layer, where it is spliced and fused with the high-resolution feature map in the scale 2 convolution layer; and the low-resolution feature map in the scale 2 convolution layer is upsampled and sent to the scale 1 convolution layer, where it is spliced and fused with the high-resolution feature map in the scale 1 convolution layer. Here, "low-resolution" and "high-resolution" are relative terms, and the specific values of the resolutions are not limited.
In step 103, image segmentation is performed according to the output result of the convolution processing.
After the target image to be segmented is subjected to convolution processing through the multi-layer convolution layer, image segmentation is performed according to the output result of the convolution processing, and an image segmentation result is obtained.
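The embodiment does not spell out how the output result is converted into a segmentation map beyond pixel-by-pixel classification; one common realization, assumed here, is a 1x1 convolution producing per-class scores followed by a per-pixel argmax. The channel count and class list below are illustrative only.

```python
import torch
import torch.nn as nn

num_classes = 4   # assumed classes: background, left ventricle, right ventricle, myocardium
head = nn.Conv2d(in_channels=64, out_channels=num_classes, kernel_size=1)  # 64 channels assumed

def segment(output_feature_map: torch.Tensor) -> torch.Tensor:
    """output_feature_map: highest-resolution output of the multi-layer convolution
    layers, shape (B, 64, H, W); returns a (B, H, W) per-pixel label map."""
    logits = head(output_feature_map)
    return logits.argmax(dim=1)
```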
According to the above image segmentation method, a target image to be segmented is acquired, a plurality of first images to be segmented with different resolutions are generated based on the target image to be segmented, convolution processing is performed on each first image to be segmented through the multi-layer convolution layers, and image segmentation is performed according to the output result of the convolution processing. In this way, feature maps of different scales are connected in series, feature information of different scales can be continuously and fully fused, and the accuracy of image segmentation is improved.
The above image segmentation method was experimentally verified as follows.
Two groups of experiments are set up, each group containing 5 comparison sub-items, to verify the contributions of the serial feature pyramid and the parallel feature pyramid to the segmentation result. In the 5 comparison sub-items of each group, the number of layers N of the cascade pyramid is set to 1, 4, 7, 10 and 13 respectively, so as to highlight the influence of pyramids with different numbers of layers on the segmentation result.
First, the performance of the parallel multi-scale cross fusion pyramid (PP-Net) is verified and the number of pyramid layers is determined. The experimental networks of the first group contain only the parallel multi-scale fusion mechanism, and according to the number of cascade pyramid layers the five experiments are PP-Net-1, PP-Net-4, PP-Net-7, PP-Net-10 and PP-Net-13. Table 1 shows the Dice similarity of these networks for the left ventricle, right ventricle and myocardium in diastole and systole. It can be seen that in most cases the Dice similarity coefficient increases with the number of pyramid layers, and the parallel feature fusion of the pyramid brings a certain gain in accuracy in both diastole and systole. The difference in parameter count among the network structures of this experiment is only about 90,000, i.e., the parallel multi-scale fusion pyramid can improve segmentation accuracy while the increase in the number of parameters remains limited.
TABLE 1 parallel Multi-scale feature fusion pyramid experiment results
[Table 1 is reproduced as an image in the original publication and is not rendered here.]
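For reference, the Dice similarity coefficient used in Tables 1 and 2 measures the overlap between a predicted region A and the ground-truth region B as 2|A∩B| / (|A| + |B|); a small NumPy sketch is given below, with the integer label encoding being an assumption.

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, truth: np.ndarray, label: int) -> float:
    """Dice similarity for one structure, given integer label maps of identical shape."""
    a = pred == label
    b = truth == label
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom > 0 else 1.0
```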
The second group of experiments is carried out on the basis of the first group by adding the serial mechanism to the parallel multi-scale feature fusion pyramid, so as to verify the effectiveness of transmitting the information of the low-resolution feature maps to the high-resolution feature maps within a single feature pyramid. The experimental results are shown in Table 2: regardless of the number of pyramid layers, increasing the number of cascades further improves the overall segmentation effect on top of the parallel feature fusion. The improvement is more obvious for the segmentation of the right ventricle; the right ventricle occupies a smaller proportion of pixels in the image and fewer of its related features are extracted, and the continuous serial feature fusion mechanism can greatly promote the feature information interaction of the whole network, which also proves the effectiveness of the network structure of this section.
TABLE 2 series-parallel Multi-scale feature fusion pyramid experiment results
[Table 2 is reproduced as an image in the original publication and is not rendered here.]
Finally, an online test is adopted to verify the validity of the series-parallel multi-scale feature fusion pyramid. The online test does not disclose the labels of the data set, so it is more convincing as a test of network performance; it compares the Dice similarity coefficient and the Hausdorff distance with those of other algorithms. SPP-Net-13 is adopted as the online test network of the embodiment of the present application, and Table 3 compares the experimental result of the embodiment of the present application with the top five entries of the competition list.
Table 3 comparison of results of on-line tests
[Table 3 is reproduced as an image in the original publication and is not rendered here.]
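The Hausdorff distance reported in Table 3 measures the largest boundary deviation between the predicted and ground-truth regions; a sketch using SciPy's directed Hausdorff distance is given below, where extracting point sets directly from binary masks (rather than from contours) is a simplifying assumption.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def hausdorff_distance(pred_mask: np.ndarray, truth_mask: np.ndarray) -> float:
    """Symmetric Hausdorff distance between two binary masks of identical shape."""
    u = np.argwhere(pred_mask)   # pixel coordinates of the predicted region
    v = np.argwhere(truth_mask)  # pixel coordinates of the ground-truth region
    return max(directed_hausdorff(u, v)[0], directed_hausdorff(v, u)[0])
```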
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
Corresponding to the image segmentation method described in the above embodiments, fig. 7 shows a block diagram of the image segmentation apparatus provided in the embodiment of the present application, and for convenience of explanation, only the portions related to the embodiment of the present application are shown.
Referring to fig. 7, the image segmentation apparatus in the embodiment of the present application may include an image conversion module 201, a convolution processing module 202, and a segmentation module 203.
The image conversion module 201 is configured to acquire a target image to be segmented and generate a preset number of first images to be segmented based on the target image to be segmented; wherein the resolution of each first image to be segmented is different;
a convolution processing module 202, configured to perform convolution processing on each of the first images to be segmented through multiple convolution layers; each of the multiple convolution layers comprises a plurality of convolution layers which are connected in series, and each convolution layer of the current convolution layer is respectively connected with the convolution layer of the previous convolution layer;
And the segmentation module 203 is used for carrying out image segmentation according to the output result of the convolution processing.
Alternatively, the convolution processing module 202 may specifically be configured to:
the convolving each first image to be segmented by a plurality of convolving layers comprises:
correspondingly inputting each first image to be segmented into each convolution layer in the first layer of convolution layers;
for each convolution layer in the first layer of convolution layers, respectively performing convolution processing on the corresponding first image to be segmented, and splicing and fusing the current feature map, after upsampling, with the previous feature map;
and for each convolution layer in the layers other than the first layer of convolution layers, respectively performing convolution processing on the output results of the adjacent convolution layers in the previous layer of convolution layers, and splicing and fusing the current feature map, after upsampling, with the previous feature map.
Optionally, the multi-layer convolution layer comprises a plurality of scale convolution layers, each scale convolution layer comprises a plurality of convolution layers, and the number of the convolution layers of each scale convolution layer is different;
the inputting each image to be segmented into each convolution layer in the first layer of convolution layers correspondingly comprises the following steps:
and inputting each image to be segmented into the corresponding convolution layer according to the correspondence between the resolutions from high to low and the numbers of convolution layers contained in the respective scales from large to small.
As an implementation manner, the number of convolution layers in each of the first layer to the i-th layer of the multi-layer convolution layers is the same, and the number of convolution layers decreases successively from the (i+1)-th layer to the n-th layer, where the number of convolution layers in the i-th layer is greater than the number of convolution layers in the (i+1)-th layer, and n is the number of layers of the multi-layer convolution layers.
Optionally, each convolution layer of the current layer of convolution layers being respectively connected with a convolution layer in the previous layer of convolution layers specifically means:
the convolution layer of the current scale in the current layer of convolution layers is respectively connected with the convolution layer of the current scale and the convolution layers of the scales adjacent to the current scale in the previous layer of convolution layers.
Alternatively, the image conversion module 201 may specifically be configured to:
compressing the target image to be segmented to generate a plurality of second images to be segmented with different resolutions;
and taking the plurality of second images to be segmented and the target images to be segmented as the first images to be segmented of the preset number.
Illustratively, the compressing the target image to be segmented to generate a plurality of second images to be segmented with different resolutions includes:
And compressing the target image to be segmented by a bilinear interpolation method to generate a plurality of second images to be segmented with different resolutions.
It should be noted that, because the content of information interaction and execution process between the above devices/units is based on the same concept as the method embodiment of the present application, specific functions and technical effects thereof may be referred to in the method embodiment section, and will not be described herein again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
The embodiment of the present application further provides a terminal device. Referring to fig. 8, the terminal device 300 may include: at least one processor 310, a memory 320 and a computer program stored in the memory 320 and executable on the at least one processor 310; when executing the computer program, the processor 310 implements the steps in any of the above method embodiments, e.g. steps 101 to 103 in the embodiment shown in fig. 2. Alternatively, when executing the computer program, the processor 310 may implement the functions of the modules/units in the above apparatus embodiments, such as the functions of the modules 201 to 203 shown in fig. 7.
By way of example, the computer program may be partitioned into one or more modules/units that are stored in the memory 320 and executed by the processor 310 to complete the present application. The one or more modules/units may be a series of computer program segments capable of performing specific functions, and the segments are used to describe the execution process of the computer program in the terminal device 300.
It will be appreciated by those skilled in the art that fig. 8 is merely an example of the terminal device and does not constitute a limitation of the terminal device, which may include more or fewer components than shown, combine certain components, or use different components, such as input-output devices, network access devices, buses, and the like.
The processor 310 may be a central processing unit (Central Processing Unit, CPU), but may also be another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 320 may be an internal storage unit of the terminal device, or may be an external storage device of the terminal device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card), or the like. The memory 320 is used for storing the computer program and other programs and data required by the terminal device. The memory 320 may also be used to temporarily store data that has been output or is to be output.
The bus may be an industry standard architecture (Industry Standard Architecture, ISA) bus, a peripheral component interconnect (Peripheral Component Interconnect, PCI) bus, an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the buses in the drawings of the present application are not limited to only one bus or one type of bus.
The image segmentation method provided by the embodiment of the application can be applied to terminal equipment such as computers, tablet computers, notebook computers, netbooks, personal digital assistants (personal digital assistant, PDA) and the like, and the specific type of the terminal equipment is not limited.
Taking the terminal device as a computer as an example, fig. 9 is a block diagram showing a part of the structure of a computer provided by an embodiment of the present application. Referring to fig. 9, the computer includes: a communication circuit 410, a memory 420, an input unit 430, a display unit 440, an audio circuit 450, a wireless fidelity (wireless fidelity, WiFi) module 460, a processor 470, and a power supply 480. Those skilled in the art will appreciate that the computer structure shown in fig. 9 does not constitute a limitation of the computer, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
The following describes the components of the computer in detail with reference to fig. 9:
The communication circuit 410 may be used for receiving and transmitting signals during information transmission and reception or during a call; in particular, after an image sample transmitted by an image acquisition device is received, it is passed to the processor 470 for processing; in addition, an image acquisition instruction is sent to the image acquisition device. Typically, the communication circuitry includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (Low Noise Amplifier, LNA), a duplexer, and the like. In addition, the communication circuit 410 may also communicate with networks and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including, but not limited to, global system for mobile communications (Global System of Mobile communication, GSM), general packet radio service (General Packet Radio Service, GPRS), code division multiple access (Code Division Multiple Access, CDMA), wideband code division multiple access (Wideband Code Division Multiple Access, WCDMA), long term evolution (Long Term Evolution, LTE), email, short message service (Short Messaging Service, SMS), and the like.
The memory 420 may be used to store software programs and modules, and the processor 470 performs the various functional applications and data processing of the computer by running the software programs and modules stored in the memory 420. The memory 420 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to the use of the computer (such as audio data, phonebooks, etc.), and the like. In addition, the memory 420 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
The input unit 430 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the computer. In particular, the input unit 430 may include a touch panel 431 and other input devices 432. The touch panel 431, also referred to as a touch screen, may collect touch operations performed on or near it by a user (e.g., operations performed by the user on or near the touch panel 431 using a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program. Optionally, the touch panel 431 may include two parts: a touch detection device and a touch controller. The touch detection device detects the touch orientation of the user, detects the signal generated by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, and sends the coordinates to the processor 470; it can also receive commands from the processor 470 and execute them. In addition, the touch panel 431 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch panel 431, the input unit 430 may include other input devices 432. In particular, other input devices 432 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.
The display unit 440 may be used to display information input by the user or information provided to the user, as well as various menus of the computer. The display unit 440 may include a display panel 441; optionally, the display panel 441 may be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD), an organic light-emitting diode (Organic Light-Emitting Diode, OLED), or the like. Further, the touch panel 431 may cover the display panel 441; when the touch panel 431 detects a touch operation on or near it, the touch operation is transmitted to the processor 470 to determine the type of the touch event, and the processor 470 then provides a corresponding visual output on the display panel 441 according to the type of the touch event. Although in fig. 9 the touch panel 431 and the display panel 441 are two separate components used to implement the input and output functions of the computer, in some embodiments the touch panel 431 and the display panel 441 may be integrated to implement the input and output functions of the computer.
The audio circuit 450 may provide an audio interface between the user and the computer. The audio circuit 450 may convert received audio data into an electrical signal and transmit it to a speaker, which converts it into a sound signal for output; on the other hand, a microphone converts collected sound signals into electrical signals, which are received by the audio circuit 450 and converted into audio data; the audio data is then processed by the processor 470 and sent, for example, to another computer via the communication circuit 410, or output to the memory 420 for further processing.
WiFi is a short-distance wireless transmission technology. Through the WiFi module 460, the computer can help the user send and receive emails, browse webpages, access streaming media, and the like, providing the user with wireless broadband Internet access. Although fig. 9 shows the WiFi module 460, it is understood that it is not an essential component of the computer and may be omitted as required without changing the essence of the invention.
The processor 470 is the control center of the computer. It connects the various parts of the entire computer using various interfaces and lines, and performs the various functions of the computer and processes data by running or executing the software programs and/or modules stored in the memory 420 and invoking the data stored in the memory 420, thereby monitoring the computer as a whole. Optionally, the processor 470 may include one or more processing units; preferably, the processor 470 may integrate an application processor, which mainly handles the operating system, user interfaces, applications, and the like, and a modem processor, which mainly handles wireless communication. It will be appreciated that the modem processor may not be integrated into the processor 470.
The computer also includes a power supply 480 (e.g., a battery) for powering the various components, and preferably the power supply 480 can be logically connected to the processor 470 via a power management system so as to perform functions such as managing charge, discharge, and power consumption via the power management system.
Embodiments of the present application also provide a computer readable storage medium storing a computer program which, when executed by a processor, implements steps for implementing the embodiments of the image segmentation method described above.
Embodiments of the present application further provide a computer program product which, when run on a mobile terminal, causes the mobile terminal to perform steps that implement the various embodiments of the image segmentation method described above.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present application implements all or part of the flow of the methods of the above embodiments by means of a computer program instructing related hardware; the computer program may be stored in a computer readable storage medium, and when executed by a processor, may implement the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include at least: any entity or device capable of carrying the computer program code to a photographing device/terminal apparatus, a recording medium, a computer memory, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, such as a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In some jurisdictions, in accordance with legislation and patent practice, computer readable media may not include electrical carrier signals and telecommunications signals.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts that are not described or detailed in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other manners. For example, the apparatus/network device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (7)

1. An image segmentation method, comprising:
acquiring a target to-be-segmented image, and generating a preset number of first to-be-segmented images based on the target to-be-segmented image; wherein the resolution of each first image to be segmented is different;
Performing convolution processing on each first image to be segmented through a plurality of convolution layers, wherein the convolution processing comprises the following steps:
the multi-layer convolution layer comprises a plurality of scale convolution layers, each scale convolution layer comprises a plurality of convolution layers, and the number of the convolution layers of each scale convolution layer is different;
inputting each first image to be segmented to each of the first layer convolution layers, comprising:
inputting each first image to be segmented into the corresponding convolution layer according to the correspondence between the resolutions from high to low and the numbers of convolution layers contained in the respective scales from large to small;
for each convolution layer in the first-layer convolution layer, respectively carrying out convolution processing on the first image to be segmented, and splicing and fusing the current feature map with the previous feature map after upsampling;
for each convolution layer in the layers other than the first-layer convolution layer, respectively carrying out convolution processing on the output results of the adjacent convolution layers in the previous-layer convolution layer, and splicing and fusing the current feature map after upsampling with the previous feature map; each of the multi-layer convolution layers comprises a plurality of convolution layers which are connected in series, and each convolution layer of the current-layer convolution layer is respectively connected with a convolution layer of the previous-layer convolution layer; the first layer through the i-th layer of the multi-layer convolution layers contain the same number of convolution layers, and from the (i+1)-th layer to the n-th layer the number of convolution layers decreases layer by layer, wherein the i-th layer contains more convolution layers than the (i+1)-th layer, and n is the number of layers of the multi-layer convolution layers;
And carrying out image segmentation according to the output result of the convolution processing.
2. The image segmentation method as set forth in claim 1, wherein the connection in which each convolution layer of the current-layer convolution layer is respectively connected with a convolution layer of the previous-layer convolution layer is as follows:
the convolution layer at the current scale of the current-layer convolution layer is respectively connected with the convolution layer at the current scale and the convolution layers at the scales adjacent to the current scale in the previous-layer convolution layer.
3. The image segmentation method as set forth in claim 1, wherein the generating a preset number of first images to be segmented based on the target images to be segmented includes:
compressing the target image to be segmented to generate a plurality of second images to be segmented with different resolutions;
and taking the plurality of second images to be segmented together with the target image to be segmented as the preset number of first images to be segmented.
4. The image segmentation method as set forth in claim 3, wherein compressing the target image to be segmented to generate a plurality of second images to be segmented of different resolutions comprises:
and compressing the target image to be segmented by a bilinear interpolation method to generate a plurality of second images to be segmented with different resolutions.
5. An image dividing apparatus, comprising:
the image conversion module is used for acquiring target images to be segmented and generating a preset number of first images to be segmented based on the target images to be segmented; wherein the resolution of each first image to be segmented is different;
the convolution processing module is configured to perform convolution processing on each of the first images to be segmented through multiple convolution layers, and includes:
the multi-layer convolution layer comprises a plurality of scale convolution layers, each scale convolution layer comprises a plurality of convolution layers, and the number of the convolution layers of each scale convolution layer is different;
inputting each first image to be segmented to each of the first layer convolution layers, comprising:
inputting each first image to be segmented into the corresponding convolution layer according to the correspondence between the resolutions from high to low and the numbers of convolution layers contained in the respective scales from large to small;
for each convolution layer in the first-layer convolution layer, respectively carrying out convolution processing on the first image to be segmented, and splicing and fusing the current feature map with the previous feature map after upsampling;
for each convolution layer in the layers other than the first-layer convolution layer, respectively carrying out convolution processing on the output results of the adjacent convolution layers in the previous-layer convolution layer, and splicing and fusing the current feature map after upsampling with the previous feature map; each of the multi-layer convolution layers comprises a plurality of convolution layers which are connected in series, and each convolution layer of the current-layer convolution layer is respectively connected with a convolution layer of the previous-layer convolution layer; the first layer through the i-th layer of the multi-layer convolution layers contain the same number of convolution layers, and from the (i+1)-th layer to the n-th layer the number of convolution layers decreases layer by layer, wherein the i-th layer contains more convolution layers than the (i+1)-th layer, and n is the number of layers of the multi-layer convolution layers;
And the segmentation module is used for carrying out image segmentation according to the output result of the convolution processing.
6. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 4 when executing the computer program.
7. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the method according to any one of claims 1 to 4.
CN201911172254.0A 2019-11-26 2019-11-26 Image segmentation method and device and terminal equipment Active CN111028237B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911172254.0A CN111028237B (en) 2019-11-26 2019-11-26 Image segmentation method and device and terminal equipment
PCT/CN2020/128848 WO2021104060A1 (en) 2019-11-26 2020-11-13 Image segmentation method and apparatus, and terminal device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911172254.0A CN111028237B (en) 2019-11-26 2019-11-26 Image segmentation method and device and terminal equipment

Publications (2)

Publication Number Publication Date
CN111028237A CN111028237A (en) 2020-04-17
CN111028237B true CN111028237B (en) 2023-06-06

Family

ID=70202133

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911172254.0A Active CN111028237B (en) 2019-11-26 2019-11-26 Image segmentation method and device and terminal equipment

Country Status (2)

Country Link
CN (1) CN111028237B (en)
WO (1) WO2021104060A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111028237B (en) * 2019-11-26 2023-06-06 中国科学院深圳先进技术研究院 Image segmentation method and device and terminal equipment
WO2022050902A1 (en) * 2020-09-02 2022-03-10 Singapore Health Services Pte Ltd Image segmentation system and method
CN114612374A (en) * 2020-12-09 2022-06-10 中国科学院深圳先进技术研究院 Training method, medium, and apparatus for image detection model based on feature pyramid
CN113591859A (en) * 2021-06-23 2021-11-02 北京旷视科技有限公司 Image segmentation method, apparatus, device and medium
CN113505834A (en) * 2021-07-13 2021-10-15 阿波罗智能技术(北京)有限公司 Method for training detection model, determining image updating information and updating high-precision map
CN115439938B (en) * 2022-09-09 2023-09-19 湖南智警公共安全技术研究院有限公司 Anti-splitting face archive data merging processing method and system
CN116703729B (en) * 2023-08-09 2023-12-19 荣耀终端有限公司 Image processing method, terminal, storage medium and program product
CN117011550B (en) * 2023-10-08 2024-01-30 超创数能科技有限公司 Impurity identification method and device in electron microscope photo

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106127725B (en) * 2016-05-16 2019-01-22 北京工业大学 A kind of millimetre-wave radar cloud atlas dividing method based on multiresolution CNN
CN107784654B (en) * 2016-08-26 2020-09-25 杭州海康威视数字技术股份有限公司 Image segmentation method and device and full convolution network system
CN109118459B (en) * 2017-06-23 2022-07-19 南开大学 Image salient object detection method and device
CN111028237B (en) * 2019-11-26 2023-06-06 中国科学院深圳先进技术研究院 Image segmentation method and device and terminal equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Caizi Li et al. APCP-NET: Aggregated Parallel Cross-Scale Pyramid Network for CMR Segmentation. 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), 2019, pp. 784-788. *

Also Published As

Publication number Publication date
WO2021104060A1 (en) 2021-06-03
CN111028237A (en) 2020-04-17

Similar Documents

Publication Publication Date Title
CN111028237B (en) Image segmentation method and device and terminal equipment
WO2021104058A1 (en) Image segmentation method and apparatus, and terminal device
WO2021036695A1 (en) Method and apparatus for determining image to be marked, and method and apparatus for training model
CN108416327B (en) Target detection method and device, computer equipment and readable storage medium
CN110321910B (en) Point cloud-oriented feature extraction method, device and equipment
CN110807495A (en) Multi-label classification method and device, electronic equipment and storage medium
CN110598714B (en) Cartilage image segmentation method and device, readable storage medium and terminal equipment
CN110930443B (en) Image registration method and device and terminal equipment
CN111832666B (en) Medical image data amplification method, device, medium, and electronic apparatus
WO2024041479A1 (en) Data processing method and apparatus
CN111104967B (en) Image recognition network training method, image recognition device and terminal equipment
CN112288843B (en) Three-dimensional construction method and device for focus, terminal equipment and storage medium
CN111553215A (en) Personnel association method and device, and graph convolution network training method and device
CN111291825A (en) Focus classification model training method and device, computer equipment and storage medium
CN110674824A (en) Finger vein segmentation method and device based on R2U-Net and storage medium
CN111680755A (en) Medical image recognition model construction method, medical image recognition device, medical image recognition medium and medical image recognition terminal
CN111340213B (en) Neural network training method, electronic device, and storage medium
CN115410717A (en) Model training method, data retrieval method, image data retrieval method and device
CN111414910A (en) Small target enhancement detection method and device based on double convolutional neural network
CN110910388A (en) Cancer cell image segmentation method based on U-Net and density estimation
WO2024051655A1 (en) Method and apparatus for processing histopathological whole-slide image, and medium and electronic device
CN112136140A (en) Method and apparatus for image recognition
CN112270259A (en) SAR image ship target rapid detection method based on lightweight convolutional neural network
CN115620321B (en) Table identification method and device, electronic equipment and storage medium
CN114419375B (en) Image classification method, training device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant