CN110176012B - Object segmentation method in image, pooling method, device and storage medium - Google Patents

Info

Publication number
CN110176012B
CN110176012B
Authority
CN
China
Prior art keywords
feature map
image
pooling
attention
map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910452561.8A
Other languages
Chinese (zh)
Other versions
CN110176012A (en)
Inventor
张逸鹤
伍健荣
钱天翼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201910452561.8A priority Critical patent/CN110176012B/en
Publication of CN110176012A publication Critical patent/CN110176012A/en
Application granted granted Critical
Publication of CN110176012B publication Critical patent/CN110176012B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10072 Tomographic images
    • G06T 2207/10081 Computed x-ray tomography [CT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30004 Biomedical image processing
    • G06T 2207/30061 Lung
    • G06T 2207/30064 Lung nodule

Abstract

The embodiment of the application discloses a target segmentation method in an image, a pooling method, a device and a storage medium. The method comprises the following steps: acquiring an image to be segmented, wherein the image to be segmented comprises a target object to be segmented and extracted; preprocessing the image to be segmented to obtain an input image with a standard size; and processing the input image through a trained image segmentation model to obtain a segmentation result of the target object in the image to be segmented. The image segmentation model comprises an attention-guided pooling module, which determines a region of interest in a feature map of the input image based on an attention mechanism and pools the feature map with the region of interest as the center. The method and device can effectively enhance target information and suppress non-target information, improving the accuracy of the final segmentation result.

Description

Object segmentation method in image, pooling method, device and storage medium
Technical Field
The embodiment of the application relates to the technical field of image processing, in particular to a target segmentation method, a pooling method, a device and a storage medium in an image.
Background
Target segmentation in an image refers to classifying each pixel in the image and marking the area where a target object is located. The image segmentation can be applied to the fields of medical image analysis, unmanned vehicle driving, geographic information systems, underwater object detection and the like. For example, in the field of medical image analysis, image segmentation may be used to perform tasks such as localization of tumors and other lesions, measurement of tissue volumes, study of anatomical structures, and the like.
In the related art, a neural network is trained through machine learning to construct an image segmentation model, and the model is then used to generate a segmentation result for an image. Image segmentation models constructed on a CNN (Convolutional Neural Network) generally perform well. A convolutional neural network typically includes convolutional layers, pooling layers, upsampling layers, fully connected layers, and the like. The pooling layer downsamples the feature map output by a convolutional layer, realizing feature selection and reducing learning complexity. The pooling layer may use center pooling (central pooling) to downsample the feature map non-uniformly, where the densely sampled region is at the center of the feature map and the sparsely sampled region is at its outer edge.
However, when the target object to be segmented and recognized deviates from the exact center of the image, the above center pooling method may suppress the feature information of the target object, making the final segmentation result inaccurate.
Disclosure of Invention
The embodiment of the application provides a target segmentation method in an image, a pooling method, a device and a storage medium, which can be used to solve the problem that the segmentation result finally obtained by a model is inaccurate because the center pooling method provided in the related art always takes the exact center position of the feature map as the pooling center. The technical scheme is as follows:
in one aspect, an embodiment of the present application provides a method for segmenting a target in an image, where the method includes:
acquiring an image to be segmented, wherein the image to be segmented comprises a target object to be segmented and extracted;
preprocessing the image to be segmented to obtain an input image with a standard size;
processing the input image through the trained image segmentation model to obtain a segmentation result of the target object in the image to be segmented;
the image segmentation model comprises an attention-guided pooling module, and the attention-guided pooling module is used for determining a region of interest in a feature map of the input image based on an attention mechanism and pooling the feature map by taking the region of interest as a center.
In another aspect, an embodiment of the present application provides a pooling method, including:
acquiring a feature map of a target image, wherein the feature map is obtained by processing the target image by a feature extraction module of a deep neural network;
processing the feature map based on an attention mechanism to generate an attention-enhanced feature map;
determining a region of interest in the feature map according to the feature map after attention enhancement;
and performing pooling processing on the feature map with the region of interest as the center.
In another aspect, an embodiment of the present application provides an apparatus for segmenting an object in an image, where the apparatus includes:
the image acquisition module is used for acquiring an image to be segmented, wherein the image to be segmented comprises a target object to be segmented and extracted;
the image preprocessing module is used for preprocessing the image to be segmented to obtain an input image with a standard size;
the target segmentation module is used for processing the input image through the trained image segmentation model to obtain a segmentation result of the target object in the image to be segmented;
the image segmentation model comprises an attention-guided pooling module, and the attention-guided pooling module is used for determining a region of interest in a feature map of the input image based on an attention mechanism and pooling the feature map by taking the region of interest as a center.
In yet another aspect, embodiments of the present application provide a pooling device, the device comprising:
the map acquisition module is used for acquiring a characteristic map of a target image, wherein the characteristic map is obtained by processing the target image through a characteristic extraction module of a deep neural network;
the map generation module is used for processing the feature map based on an attention mechanism to generate the feature map with enhanced attention;
the region determining module is used for determining a region of interest in the feature map according to the feature map after attention enhancement;
and the map pooling module is used for pooling the feature map with the region of interest as the center.
In yet another aspect, an embodiment of the present application provides a computer device, which includes a processor and a memory, where at least one instruction, at least one program, a code set, or a set of instructions is stored in the memory, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by the processor to implement the target segmentation method in the image.
In yet another aspect, an embodiment of the present application provides a computer device, which includes a processor and a memory, where at least one instruction, at least one program, a set of codes, or a set of instructions is stored in the memory, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by the processor to implement the above pooling method.
In yet another aspect, an embodiment of the present application provides a computer-readable storage medium, in which at least one instruction, at least one program, a code set, or a set of instructions is stored, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by a processor to implement the target segmentation method in the image.
In yet another aspect, an embodiment of the present application provides a computer-readable storage medium having at least one instruction, at least one program, a set of codes, or a set of instructions stored therein, which is loaded and executed by a processor to implement the above-mentioned pooling method.
In yet another aspect, the present application provides a computer program product for performing the object segmentation method in the image when the computer program product is executed.
In yet another aspect, the present application provides a computer program product for performing the pooling method when the computer program product is executed.
The beneficial effects brought by the technical scheme provided by the embodiment of the application at least comprise:
when a target object that needs to be segmented and identified deviates from the center of the image, the feature map is processed with an attention-guided pooling method: target information in the feature map is first enhanced based on an attention mechanism to obtain an attention-enhanced feature map, the region of interest in the feature map is then determined, and the feature map is finally pooled with the region of interest as the center. In this way, target information can be effectively enhanced and non-target information suppressed, improving the accuracy of the final segmentation result.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings required to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
FIG. 1 is a diagram illustrating target recognition results obtained in different manners;
FIG. 2 is a flow chart of a pooling method provided by one embodiment of the present application;
FIG. 3 illustrates an architectural diagram of a pooling method;
FIG. 4 is a schematic diagram of a lookup table provided by one embodiment of the present application;
FIG. 5 illustrates a schematic diagram of a pooled core;
fig. 6 illustrates a schematic structural diagram of the pooling method of the present application applied to a Unet network;
FIG. 7 is a diagram illustrating lung nodule segmentation results obtained in different manners;
FIG. 8 is a flowchart of a method for segmenting an object in an image according to an embodiment of the present application;
FIG. 9 is a schematic diagram illustrating a lung nodule segmentation process flow;
FIG. 10 is a schematic diagram of an image segmentation model provided by an embodiment of the present application;
FIG. 11 is a block diagram of an apparatus for object segmentation in an image according to an embodiment of the present application;
FIG. 12 is a block diagram of an apparatus for object segmentation in an image according to another embodiment of the present application;
FIG. 13 is a block diagram of a pooling device provided by one embodiment of the present application;
FIG. 14 is a block diagram of a pooling device provided by another embodiment of the present application;
fig. 15 is a block diagram of a computer device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, the following detailed description of the embodiments of the present application will be made with reference to the accompanying drawings.
Before the technical solution of the present application is described, the related terms related to the embodiments of the present application are described.
Deep learning: a machine learning technique using a deep neural network model.
Feature map: an intermediate result produced by a specific module (e.g., a convolutional layer) in a deep neural network.
Attention mechanism: a pixel-level information enhancement mechanism for a specific target in a feature map, used in deep learning and modeled on the attention of the human eye.
Central pooling: in deep learning, a non-uniform down-sampling operation applied to the feature map to reduce learning complexity, where the densely sampled part is at the center of the feature map and the sparsely sampled part is at the outer edge of the feature map.
Maximum pooling: a dimensionality-reduction operation on the feature map using uniform sampling, where the spatial dimension of the sampled feature map is reduced to half of that before sampling.
Segmentation model: a mathematical model obtained by learning labeled samples with machine learning techniques; the model parameters are acquired during learning and training, and during recognition and prediction the parameters are loaded and used to compute the segmentation result corresponding to the input image.
CT (Computed Tomography) image: a set of CT images consists of several two-dimensional slices. A CT image is composed of a certain number of pixels with different gray scales from black to white arranged in a matrix. These pixels reflect the X-ray absorption coefficient of the corresponding voxel. CT images are represented in different gray scales, reflecting the degree to which organs and tissues absorb X-rays. Therefore, as in the black-and-white appearance of an X-ray image, black areas represent low-absorption, i.e., low-density, regions, and white areas represent high-absorption, i.e., high-density, regions. CT has high density resolution. Therefore, a CT image can better display organs composed of soft tissues, such as the brain, spinal cord, mediastinum, lung, liver, gallbladder, pancreas, and pelvic organs, and can show lesions against a good anatomical image background. A lung CT image is a CT scan image of the human lung.
According to the embodiment of the application, the target information of the feature map is enhanced based on an attention mechanism to obtain an attention-enhanced feature map, the region of interest in the feature map is then determined, and the feature map is finally pooled with the region of interest as the center. The technical scheme provided by the embodiment of the application can enhance the target information (i.e., the region of interest) even when it is not at the center of the feature map, realizing enhancement of target information and suppression of irrelevant information more flexibly. The scheme combines the advantages of an attention mechanism and a pooling module, and can be flexibly applied to various image-processing deep learning networks that need to enhance target information for segmentation or recognition, such as segmentation and recognition of cancerous regions in CT images, tracking of single targets in complex images, and various deep learning applications for image detection, recognition and segmentation, enhancing model expressiveness.
Fig. 1 is a schematic diagram illustrating target recognition results obtained in different manners. In fig. 1, (a) is an original lung nodule CT image, (b) is the colorized original image, (c) is the output after maximum pooling, (d) is the output after center pooling, and (e) is the output after the pooling method of the embodiment of the present application. As is clear from fig. 1, when the target information is not at the exact center, the pooling method of the embodiment of the present application can still effectively capture the nodule position and effectively suppress other, non-target image information.
According to the method provided by the embodiment of the application, the execution main body of each step is computer equipment. The computer device may be any electronic device having computing, processing, and storage capabilities. For example, the Computer device may be a PC (Personal Computer) or a server, may also be a terminal device such as a mobile phone, a tablet Computer, a multimedia player, a wearable device, a smart television, and may also be other devices such as a medical device, an unmanned aerial vehicle, and a vehicle-mounted terminal, which is not limited in this embodiment of the present application.
For convenience of description, in the following method embodiments, only the execution subject of each step is described as a computer device, but the method is not limited to this.
Referring to fig. 2, a flow chart of a pooling method provided by an embodiment of the present application is shown. The method may include the following steps.
Step 201, obtaining a feature map of a target image.
In the embodiment of the application, the feature map is obtained by processing the target image by a feature extraction module of the deep neural network. For example, the feature map is obtained by convolving a target image by a convolution layer in a deep neural network.
Illustratively, the feature map is a multi-dimensional feature map, e.g., comprising a channel dimension and spatial dimensions. The channel dimension indicates the number of feature maps and may be denoted by the letter C. The spatial dimensions indicate the amount of space occupied by the feature map. If the feature map is two-dimensional, its spatial dimensions can be represented by L × W, where L is the length of the feature map and W is its width; if the feature map is three-dimensional, its spatial dimensions can be represented by H × L × W, where H is the height of the feature map. C, H, L and W are positive integers. In practical applications, the feature map may also be of higher dimension (e.g., four-dimensional or more).
Illustratively, as shown in fig. 3, the feature map may be represented by cubes: the number of cubes represents the number of channels C, each cube contains H × L × W pixels, and each pixel corresponds to a gray value (or intensity value). As shown in fig. 3, the channel dimension of the feature map is 3, i.e., C is 3; spatially the feature map is a cube, i.e., H = L = W.
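As an illustration of these dimensions, such a feature map can be stored as an array of shape (C, H, L, W); a minimal sketch (the concrete sizes are assumptions chosen for illustration, not taken from the patent):

    import numpy as np

    # Hypothetical sizes chosen for illustration only.
    C, H, L, W = 3, 16, 16, 16
    feature_map = np.random.rand(C, H, L, W).astype(np.float32)

    print(feature_map.shape)  # (3, 16, 16, 16): 3 channels, each an H x L x W cube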
And 202, processing the feature map based on the attention mechanism to generate the feature map with enhanced attention.
The attention mechanism is a mechanism that can enhance the target information in the feature map. After the feature map is processed based on the attention mechanism, the target information in the feature map is enhanced. The attention-enhanced feature map enables enhancement of object-based voxel-level information.
Illustratively, step 202 may include the following two substeps:
1. Acquiring a channel dimension attention map and a spatial dimension attention map corresponding to the feature map.
The channel dimension attention map is obtained by compressing the feature map along the spatial dimensions and is used to emphasize the target information of the feature map (i.e., the region of interest described below); the spatial dimension attention map is obtained by compressing the feature map along the channel dimension and is used to emphasize the spatial position of the target information.
Illustratively, the channel dimension attention map and the spatial dimension attention map may also be represented as cubes. The spatial dimension attention map comprises a channel dimension and spatial dimensions, where its channel dimension is 1 (i.e., C corresponds to the value 1) and its spatial dimensions can be represented by H × L × W. The channel dimension attention map likewise comprises a channel dimension and spatial dimensions, where its spatial dimensions are 1 (i.e., H, L and W each correspond to the value 1) and its channel dimension can be denoted by C.
2. Multiplying the feature map by the channel dimension attention map and the spatial dimension attention map in sequence to obtain the attention-enhanced feature map.
The channel dimensional attention map and the feature map may be pixel-based multiplied, as may the spatial dimensional attention map and the feature map. Alternatively, pixel-based multiplication refers to multiplication based on the gray value (or intensity value) of a pixel. The feature map can be multiplied by the channel dimension attention map to obtain a first map, and then the first map is multiplied by the space dimension attention map to obtain the feature map with enhanced attention; or, the feature map may be multiplied by the spatial dimension attention map to obtain a second map, and then the second map is multiplied by the channel dimension attention map to obtain the feature map after attention enhancement.
Multiplying the feature map by the channel dimension attention map and the spatial dimension attention map in sequence to obtain the attention-enhanced feature map can be characterized by the following formulas:

F′ = M_C ⊗ F

F″ = M_S ⊗ F′

where F represents the feature map; F′ represents the channel dimension enhancement map; F″ represents the attention-enhanced feature map; M_C represents the channel dimension attention map; M_S represents the spatial dimension attention map; and ⊗ represents pixel-based multiplication.
Taking a certain channel as an example: if the gray value of a target pixel point in the feature map is 1 and the gray values of that pixel point in the channel dimension attention maps of the individual channels are 2, 3 and 4 respectively, then the gray value of the pixel point in the channel dimension enhancement map is (1×2 + 1×3 + 1×4 = 9); if the gray value of the pixel point in the spatial dimension attention map is 2, the gray value of the pixel point in the attention-enhanced feature map is (9×2 = 18). The spatial positions of the target pixel point in each channel's feature map, channel dimension attention map, channel dimension enhancement map, spatial dimension attention map and attention-enhanced feature map are consistent. The gray values of the other pixel points in the channel dimension enhancement map or the attention-enhanced feature map are calculated in a similar way.
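The two multiplications can be sketched as broadcast element-wise products, assuming the channel dimension attention map has shape (C, 1, 1, 1) and the spatial dimension attention map has shape (1, H, L, W), matching the compressed dimensions described above:

    import numpy as np

    def attention_enhance(F, M_c, M_s):
        """Attention enhancement per the formulas above: F' = M_C (x) F and
        F'' = M_S (x) F', where (x) is pixel-based (element-wise, broadcast)
        multiplication. Shapes (assumed): F (C, H, L, W), M_c (C, 1, 1, 1),
        M_s (1, H, L, W)."""
        F_prime = M_c * F                 # channel dimension enhancement map F'
        F_double_prime = M_s * F_prime    # attention-enhanced feature map F''
        return F_double_prime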
And step 203, determining the attention area in the feature map according to the feature map after the attention enhancement.
Because the target information in the feature map after attention enhancement is enhanced, the region of interest in the feature map determined according to the target information is more accurate. The region of interest refers to a region including target information, which may refer to information to be recognized, to be segmented, or to be tracked.
Illustratively, step 203 may include several sub-steps:
1. averaging all channels of the feature map after attention enhancement to obtain an average feature map after attention enhancement;
taking fig. 3 as an example, the attention-enhanced feature map has 3 channels, and averaging over the 3 channels yields the attention-enhanced average feature map, whose number of channels is 1. If the gray values of a target pixel point in the attention-enhanced feature map of each channel are 2, 3 and 4 respectively, the gray value of that pixel point in the attention-enhanced average feature map is (2+3+4)/3 = 3, where the spatial positions of the target pixel point in each channel's attention-enhanced feature map and in the attention-enhanced average feature map are consistent.
2. Carrying out binarization segmentation on the average feature map after attention enhancement to obtain a binarization feature map;
optionally, the attention-enhanced average feature map is binarized and segmented using the Otsu method to obtain a binarized feature map. The Otsu method is also called the maximum between-class variance method. Its principle is to divide the image into a foreground part and a background part with a threshold chosen, by analysis of variance, so that the intra-class variance is minimized and the inter-class variance is maximized.
Exemplarily, smoothing the average feature map after attention enhancement to obtain a smoothed average feature map; and carrying out binarization segmentation on the average feature map after the smoothing treatment to obtain a binarization feature map. The smoothing processing is an image processing method for highlighting a wide area, a low-frequency component and a main part of an image or suppressing image noise and interfering high-frequency components, so that the brightness of the image is gradually changed, abrupt change gradient is reduced, and the image quality is improved. For example, the smoothing process may be performed by an interpolation method, a linear smoothing method, a convolution method, a gaussian method, or the like. And the average characteristic map after the smoothing processing is subjected to binary segmentation, so that the result of the binary segmentation is more accurate.
3. Selecting a pixel region that meets preset conditions from the binarized feature map as the region of interest.
Illustratively, the pixel region containing the largest cluster is selected from the binarized feature map as the region of interest (a combined sketch of these sub-steps is given below).
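A combined sketch of the three sub-steps, assuming scikit-image for the Otsu threshold and connected-component analysis and interpreting the "largest cluster" as the largest connected foreground region (the patent's exact selection criterion may differ):

    import numpy as np
    from skimage.filters import gaussian, threshold_otsu
    from skimage.measure import label, regionprops

    def region_of_interest(F_enhanced):
        """F_enhanced: attention-enhanced feature map of shape (C, H, L, W).
        Returns the center coordinate of the region of interest.
        Assumes at least one foreground region survives binarization."""
        avg = F_enhanced.mean(axis=0)                 # 1. average over channels
        smoothed = gaussian(avg, sigma=1.0)           # optional smoothing step
        binary = smoothed > threshold_otsu(smoothed)  # 2. Otsu binarization
        regions = regionprops(label(binary))          # connected pixel regions
        largest = max(regions, key=lambda r: r.area)  # 3. largest region as ROI
        return largest.centroid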
And step 204, performing pooling treatment on the feature map by taking the attention area as a center.
Pooling may also be referred to as sampling. Pooling generally occurs at a pooling layer of the neural network for feature dimension reduction, compressing the number of data and parameters, reducing overfitting, and improving the fault tolerance of the neural network model. Pooling can be divided into uniform sampling and non-uniform sampling: uniform sampling includes maximum pooling and average pooling; non-uniform sampling includes central pooling.
With the pooling method provided by the embodiment of the application, the spatial dimension of the pooled feature map is the same as with traditional maximum pooling, and distortion of the feature map is avoided. Moreover, the pooling module provided by the embodiment of the application can be used multiple times within a neural network.
To sum up, in the technical scheme provided in the embodiment of the present application, the target information is enhanced on the feature map based on the attention mechanism to obtain the feature map with enhanced attention, then the attention area in the feature map is determined, and finally the feature map is pooled with the attention area as the center. According to the technical scheme provided by the embodiment of the application, the target information can be enhanced under the condition that the attention area is not in the right center of the feature map, and the enhancement of the target information and the suppression of irrelevant information are more flexibly realized.
In addition, the pooling method provided by the embodiment of the application has diversified selections in the application mode of the product side, and can be used as a sub-module to be embedded into various deep learning networks with image segmentation, pattern recognition and target detection as targets, so that the aims of segmentation, recognition and detection result enhancement are fulfilled.
Hereinafter, pooling centered on the region of interest is described by way of example.
Illustratively, the step 204 includes the following sub-steps:
1. acquiring a central coordinate of a region of interest;
Optionally, the center coordinate of the region of interest refers to the geometric center of the region of interest and may be obtained, for example, by a graphical procedure. The center coordinate is shown as a dot in fig. 3.
2. Performing pooling processing on the feature map with the center coordinate of the region of interest as the center.
The densely sampled region is near the center coordinate of the region of interest, and the sparsely sampled region is far from the center coordinate of the region of interest.
2.1. Determining the final distribution of the pooling kernels in the feature map according to the center coordinate of the region of interest;
In the embodiment of the present application, in the final distribution state, the size of a pooling kernel is positively correlated with its distance from the center coordinate of the region of interest. Optionally, the pooling kernels have sizes 1, 2 and 3: a pooling kernel of size 1 corresponds to a sampling rate of 1, i.e., high-precision sampling; a pooling kernel of size 2 corresponds to a sampling rate of 2, i.e., low-precision sampling; and a pooling kernel of size 3 corresponds to a sampling rate of 3, i.e., lower-precision sampling. Pooling kernels of size 1 are closest to the center coordinate of the region of interest, pooling kernels of size 2 are next, and pooling kernels of size 3 are farthest from the center coordinate of the region of interest.
Optionally, step 2.1 above comprises the following substeps:
2.1.1. Determining the initial distribution of the pooling kernels in the feature map according to the center coordinate of the feature map;
In the embodiment of the present application, in the initial distribution state, the size of a pooling kernel is positively correlated with its distance from the center coordinate of the feature map. Optionally, the pooling kernels have sizes 1, 2 and 3: a pooling kernel of size 1 corresponds to a sampling rate of 1, i.e., high-precision sampling; a pooling kernel of size 2 corresponds to a sampling rate of 2, i.e., low-precision sampling; and a pooling kernel of size 3 corresponds to a sampling rate of 3, i.e., lower-precision sampling. Pooling kernels of size 1 are closest to the center coordinate of the feature map, pooling kernels of size 2 are next, and pooling kernels of size 3 are farthest from the center coordinate of the feature map.
Assuming that the spatial dimension of the feature map is O × O, that the pooling kernels have sizes 1, 2 and 3, and that the corresponding pooling sampling functions have sampling rates of 1, 2 and 3 respectively, the numbers of pooling kernels of sizes 1, 2 and 3 are respectively:

n1 = floor[O/8] + L[1, r]

n2 = floor[O/4] + L[2, r]

n3 = floor[O/8] + L[3, r]

where, in the initial distribution state, n1 represents the number of pooling kernels of size 1 (i.e., the number of pooling sampling functions with a sampling rate of 1); n2 represents the number of pooling kernels of size 2 (i.e., the number of pooling sampling functions with a sampling rate of 2); n3 represents the number of pooling kernels of size 3 (i.e., the number of pooling sampling functions with a sampling rate of 3); floor[·] represents a rounding-down operation; L[i, r] is a predefined lookup table for handling the case where the input size O cannot be divided exactly by 8; O represents the side length of the feature map; i represents the sampling rate; and r represents the remainder.
The lookup table is shown in FIG. 4. Assuming O is 16, then n1 = floor[16/8] = 2, n2 = floor[16/4] = 4, n3 = floor[16/8] = 2; assuming O is 12, then n1 = floor[12/8] + L[1, 4] = 1 + 1 = 2, n2 = floor[12/4] = 3, n3 = floor[12/8] + L[3, 4] = 1 + 0 = 1. Generally, the number of pooling sampling functions with a sampling rate of 2 equals the number with a sampling rate of 1 plus the number with a sampling rate of 3. Therefore, O is generally greater than (1×1 + 2×2 + 3×1) = 8.
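The counting rule can be sketched as follows; the lookup-table entries are assumptions chosen only to reproduce the two worked examples above (FIG. 4 defines the actual table):

    # Hypothetical lookup table L[i, r]; only the entries needed for the
    # O = 12 example are filled in, and all other entries default to 0.
    LOOKUP = {(1, 4): 1, (2, 4): 0, (3, 4): 0}

    def kernel_counts(O):
        """Numbers of pooling kernels of sizes 1, 2 and 3 for an O x O map."""
        r = O % 8                            # remainder handled by L[i, r]
        n1 = O // 8 + LOOKUP.get((1, r), 0)  # size-1 kernels (sampling rate 1)
        n2 = O // 4 + LOOKUP.get((2, r), 0)  # size-2 kernels (sampling rate 2)
        n3 = O // 8 + LOOKUP.get((3, r), 0)  # size-3 kernels (sampling rate 3)
        return n1, n2, n3

    print(kernel_counts(16))  # (2, 4, 2)
    print(kernel_counts(12))  # (2, 3, 1)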
Taking the center coordinate of the feature map as a boundary, the numbers of pooling kernels of sizes 1, 2 and 3 on the two sides of the center coordinate of the feature map are respectively:

n_{i,1} = ceil[n_i / 2]

n_{i,2} = n_i − n_{i,1}

where n_i represents the number of pooling sampling functions with a sampling rate of i; ceil[·] represents a rounding-up operation; n_{i,1} represents the number of pooling sampling functions with a sampling rate of i on the left, upper or front side of the center coordinate of the feature map; and n_{i,2} represents the number of pooling sampling functions with a sampling rate of i on the right, lower or rear side of the center coordinate of the feature map.
Still taking O as 16 for illustration:

n_{1,1} = ceil[2/2] = 1, n_{1,2} = n_1 − n_{1,1} = 2 − 1 = 1;

n_{2,1} = ceil[4/2] = 2, n_{2,2} = n_2 − n_{2,1} = 4 − 2 = 2;

n_{3,1} = ceil[2/2] = 1, n_{3,2} = n_3 − n_{3,1} = 2 − 1 = 1.
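The per-side split can be sketched directly from the two formulas above:

    from math import ceil

    def split_counts(n_i):
        """Split the n_i kernels of a given size across the two sides of the
        feature-map center: n_{i,1} on the left/upper/front side, n_{i,2} on
        the right/lower/rear side."""
        n_i1 = ceil(n_i / 2)
        return n_i1, n_i - n_i1

    # O = 16: n1, n2, n3 = 2, 4, 2
    print([split_counts(n) for n in (2, 4, 2)])  # [(1, 1), (2, 2), (1, 1)]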
The initial distribution of pooling kernels of sizes 1, 2 and 3 at this point is shown in fig. 5, which shows the initial distribution of the pooling kernels for a two-dimensional feature map (the feature map is a square). The pooling kernels of size 1 are closest to the center coordinate of the feature map, the pooling kernels of size 2 are next, and the pooling kernels of size 3 are farthest from the center coordinate of the feature map.
2.1.2. Calculating the offset between the center coordinate of the region of interest and the center coordinate of the feature map;
offset = center coordinate of the region of interest − center coordinate of the feature map;
Still taking a planar square feature map as an example: if the side length of the square is 16, the center coordinate of the feature map is (8, 8); if the center coordinate of the region of interest is (12, 10), the offset in the X direction is 12 − 8 = 4 and the offset in the Y direction is 10 − 8 = 2.
2.1.3. Adjusting the initial distribution of the pooling kernels in the feature map according to the offset to obtain the final distribution of the pooling kernels in the feature map.
Generally, when the offset is greater than 0, pooling kernels are moved from the right side to the left side, or from the lower side to the upper side; when the offset is less than 0, pooling kernels are moved from the left side to the right side, or from the upper side to the lower side. During adjustment, larger pooling kernels are moved first, and the number of kernels moved cannot exceed the number originally on that side; when the offset divided by the current kernel size rounds down to 0, the next smaller kernel is tried. The number of kernels of each size that are moved, multiplied by the size of each kernel, sums to the offset.
Still taking the above example: since the offset in the X direction is greater than 0, pooling kernels need to be moved from the lower side to the upper side; since the offset in the Y direction is greater than 0, pooling kernels need to be moved from the right side to the left side. Larger kernels are adjusted first, i.e., kernels of size 3, then size 2, then size 1. Adjusting in the X direction, as shown in fig. 5: 4/3 rounds down to 1, so, since there is a pooling kernel of size 3 on the lower side, one size-3 kernel is moved upward, and the offset becomes 4 − 3 = 1; 1/2 rounds down to 0, so no size-2 kernel can be moved; 1/1 = 1, so, since there is a size-1 kernel on the lower side, one size-1 kernel is moved upward, and the offset becomes 0. The final distribution of the pooling kernels in the X direction is as shown in fig. 5. The kernels in the Y direction are then adjusted: 2/3 rounds down to 0, so no size-3 kernel can be moved; 2/2 = 1, so, since there is a size-2 kernel on the right side, one size-2 kernel is moved to the left; the offset then becomes 0, and the final distribution of the pooling kernels in the Y direction is as shown in fig. 5.
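The adjustment rule can be sketched as a greedy pass over kernel sizes, largest first; a sketch under the rules just described (the patent's exact per-direction bookkeeping may differ):

    def shift_kernels(available, offset):
        """Decide how many kernels of each size to move from the donor side to
        the other side so that the moved sizes sum to |offset|.
        available: dict mapping kernel size -> count on the donor side."""
        moves, remaining = {}, abs(offset)
        for size in sorted(available, reverse=True):     # larger kernels first
            n = min(remaining // size, available[size])  # 0 if size > remaining
            if n:
                moves[size] = n
                remaining -= n * size
        return moves, remaining

    # X direction of the example: offset 4; the donor (lower) side holds one
    # size-3, two size-2 and one size-1 kernel.
    print(shift_kernels({3: 1, 2: 2, 1: 1}, 4))  # ({3: 1, 1: 1}, 0)
    # Y direction: offset 2 -> one size-2 kernel moves.
    print(shift_kernels({3: 1, 2: 2, 1: 1}, 2))  # ({2: 1}, 0)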
2.2. Performing pooling processing on the feature map according to the final distribution of the pooling kernels in the feature map.
Illustratively, a pooling kernel of size 1 selects the original value in the feature map as output; a pooling kernel of size 2 selects the maximum or average value within its window as output; and a pooling kernel of size 3 selects the maximum or average value within its window as output.
As shown in fig. 3, for a three-dimensional feature map, the feature map may first be pooled in the row direction (i.e., the X direction or W dimension) to obtain a row-pooled feature map; the row-pooled feature map is then pooled in the column direction (i.e., the Y direction or L dimension) to obtain a column-pooled feature map; finally, the column-pooled feature map is pooled along the H dimension to obtain the output feature map, which may also be called the ARP (Attention-constrained Pooling) output map.
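Pooling along a single axis according to the final kernel distribution can be sketched as follows; applying it successively along the W, L and H axes yields the ARP output map (max is used here for size-2 and size-3 windows; average would work equally):

    import numpy as np

    def pool_axis(x, kernel_sizes, axis):
        """Pool feature map x along `axis` using consecutive windows whose
        widths are given by kernel_sizes (they must sum to the axis length).
        A size-1 window copies the original value; larger windows take the max."""
        x = np.moveaxis(x, axis, 0)
        out, pos = [], 0
        for k in kernel_sizes:              # e.g. [3, 2, 2, 1, 1, 2, 2, 3] for O = 16
            window = x[pos:pos + k]
            out.append(window if k == 1 else window.max(axis=0, keepdims=True))
            pos += k
        return np.moveaxis(np.concatenate(out, axis=0), 0, axis)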
Referring to fig. 6, fig. 6 is a schematic structural diagram illustrating the pooling method of the present application applied to a Unet network according to an exemplary embodiment.
The Unet network model is an image segmentation network model applied in the medical field. When processing a target with a larger receptive field, the network structure can be freely deepened according to the selected data set, and shallow feature fusion can be performed by stacking.
As shown in fig. 6, a first map (i.e., the input layer) with 3 channels passes through a skip-connected double convolution layer to obtain a second map with 32 channels; the second map passes through a first downsampling layer (i.e., the ARP process, the pooling method of the present application) to obtain a third map with 32 channels and reduced spatial dimension; the third map passes through a skip-connected double convolution layer to obtain a fourth map with 64 channels (the spatial dimensions of the third and fourth maps are consistent); the fourth map passes through a second downsampling layer (ARP) to obtain a fifth map with 64 channels and reduced spatial dimension; the fifth map passes through a skip-connected double convolution layer to obtain a sixth map with 128 channels (the spatial dimensions of the fifth and sixth maps are consistent); the sixth map passes through a third downsampling layer (ARP) to obtain a seventh map with 128 channels but reduced spatial dimension; the seventh map passes through a skip-connected double convolution layer to obtain an eighth map with 256 channels (the spatial dimensions of the seventh and eighth maps are consistent); the eighth map passes through a fourth downsampling layer (i.e., maximum pooling) to obtain a ninth map with 256 channels and reduced spatial dimension; the ninth map passes through a skip-connected double convolution layer to obtain a tenth map with 256 channels (the spatial dimensions of the ninth and tenth maps are consistent); the tenth map passes through a first attention-based upsampling layer and is stacked with the eighth map to obtain an eleventh map with 256+512 channels; the eleventh map passes through a skip-connected double convolution layer to obtain a twelfth map with 256 channels; the twelfth map passes through a second attention-based upsampling layer and is stacked with the sixth map to obtain a thirteenth map with 128+256 channels; the thirteenth map passes through a skip-connected double convolution layer to obtain a fourteenth map with 128 channels; the fourteenth map passes through a third attention-based upsampling layer and is stacked with the fourth map to obtain a fifteenth map with 64+128 channels; the fifteenth map passes through a skip-connected double convolution layer to obtain a sixteenth map with 64 channels; the sixteenth map passes through a fourth attention-based upsampling layer and is stacked with the second map to obtain a seventeenth map; and the seventeenth map is processed by a multi-scale convolution block to obtain an eighteenth map (i.e., the output layer). The skip-connected double convolution layers include convolution processing, batch normalization, an excitation function, and the like. A skeletal sketch of the downsampling path follows.
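A skeletal sketch of the downsampling path of fig. 6, assuming PyTorch; the kernel sizes, activation and 1×1 projection shortcut are assumptions, since the figure only fixes the channel widths and the placement of the pooling layers:

    import torch.nn as nn

    class DoubleConv(nn.Module):
        """Stand-in for the 'skip-connected double convolution layer': two
        conv/batch-norm/activation stages plus a 1x1 projection shortcut."""
        def __init__(self, c_in, c_out):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv3d(c_in, c_out, 3, padding=1), nn.BatchNorm3d(c_out), nn.ReLU(),
                nn.Conv3d(c_out, c_out, 3, padding=1), nn.BatchNorm3d(c_out), nn.ReLU())
            self.skip = nn.Conv3d(c_in, c_out, 1)

        def forward(self, x):
            return self.body(x) + self.skip(x)

    class Encoder(nn.Module):
        """Downsampling path: channels 3 -> 32 -> 64 -> 128 -> 256, with a
        downsampling step after each stage; `arp` is a callable standing in
        for the attention-guided pooling described above. (In fig. 6 the
        fourth downsampling layer is maximum pooling rather than ARP; a full
        implementation would special-case it.)"""
        def __init__(self, arp):
            super().__init__()
            self.stages = nn.ModuleList(
                [DoubleConv(3, 32), DoubleConv(32, 64),
                 DoubleConv(64, 128), DoubleConv(128, 256)])
            self.arp = arp

        def forward(self, x):
            skips = []
            for stage in self.stages:
                x = stage(x)
                skips.append(x)  # kept for stacking (concatenation) on the way up
                x = self.arp(x)  # downsampling layer
            return x, skips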
A CT lung nodule image from the LIDC (The Lung Image Database Consortium) dataset was segmented at the pixel level using the Unet network shown in fig. 6. Fig. 7 compares, from left to right, the reference (standard) segmentation result, the result of a conventional Unet network, the result of an attention-based Unet network, the result of a conventional center-pooling Unet network, and the result of the Unet network of the present application. As is clear from fig. 7, the pooling method of the present application performs well in the lung nodule segmentation task, with no over-segmentation or under-segmentation even when the surrounding tissue is complex, which indicates that it can effectively enhance target information and suppress non-target information.
Referring to fig. 8, a flowchart of a method for segmenting an object in an image according to an embodiment of the present application is shown, where the method includes the following steps:
step 801, acquiring an image to be segmented.
In the embodiment of the application, the image to be segmented comprises the target object to be segmented and extracted. Segmentation extraction may refer to extraction by segmentation, recognition, tracking, detection, and the like; the embodiment of the present application does not limit the type of extraction.
The image to be segmented may be a CT image, for example a lung nodule CT image as shown in fig. 9.
Step 802, preprocessing an image to be segmented to obtain an input image with a standard size.
Illustratively, if the size of the image to be segmented is smaller than the standard size, the pixel supplement processing is carried out on the outer side of the image to be segmented, and an input image with the standard size is generated. At this time, the size of the target object in the image to be segmented is the same as that in the input image.
And if the size of the image to be segmented is larger than the standard size, performing down-sampling processing on the image to be segmented to generate an input image with the standard size.
In a lung image, nodule sizes vary over a relatively wide range, while a segmentation model usually requires input of a standard size. In the related art, small nodules are enlarged, for example by manual interpolation, which ultimately distorts the image.
Optionally, in the embodiment of the present application, zero-value pixels may be padded on the outside of a small-nodule image, so that preprocessing of small nodules is achieved without losing image information.
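A sketch of this zero-padding preprocessing (the standard size of 64 is an assumed value for illustration):

    import numpy as np

    def pad_to_standard(image, standard=64):
        """Zero-pad a small image symmetrically on the outside so it reaches
        the standard input size without interpolation, keeping the target
        object at its original scale."""
        pads = []
        for dim in image.shape:
            total = max(standard - dim, 0)
            pads.append((total // 2, total - total // 2))
        return np.pad(image, pads, mode="constant", constant_values=0)

    small_nodule = np.random.rand(40, 40, 40)
    print(pad_to_standard(small_nodule).shape)  # (64, 64, 64)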
And 803, processing the input image through the trained image segmentation model to obtain a segmentation result of the target object in the image to be segmented.
In this embodiment of the present application, the image segmentation model includes an attention-guided pooling module, and the attention-guided pooling module is configured to determine a region of interest in a feature map of the input image based on an attention mechanism, and perform pooling on the feature map with the region of interest as a center. Reference may be made to the above embodiments with respect to the attention-directed pooling module, which is not described in detail herein. Illustratively, a robust image segmentation model can be obtained by using a large amount of sample data in combination with a preprocessing method.
Optionally, step 803 includes several substeps:
1. processing an input image through the trained image segmentation model, and outputting an initial segmentation result of an image to be segmented;
image segmentation model referring to fig. 9 and 6, the image segmentation model includes an input layer, four downsampling layers, four upsampling layers, and an output layer. The down-sampling layer comprises a pooling module guided based on an attention mechanism, and the up-sampling layer is an up-sampling layer based on the attention mechanism. The four upsampling layers enable segmentation of lung nodules using multi-resolution information.
2. Screening out the segmentation results that do not meet the conditions from the initial segmentation results to obtain the segmentation result of the target object.
Optionally, this step comprises the following two substeps:
2.1, screening out the segmentation results with diameters not meeting the conditions from the initial segmentation results to obtain the segmentation results after primary screening;
still taking nodules as an example: a nodule has a plausible diameter range. The size of each initial segmentation result can be restored based on the original size of the nodule, and the segmentation results whose diameters do not meet the condition are then screened out, yielding the initially screened segmentation results. For example, if the plausible diameter range of a nodule is 0-6 mm and the initial segmentation results contain a nodule with a diameter of 7 mm, that nodule is screened out.
2.2. Screening out the segmentation results whose positions do not meet the conditions from the initially screened segmentation results using a non-maximum suppression method to obtain the segmentation result of the target object.
The non-maximum suppression (NMS) method may also be referred to as an edge-thinning technique; its principle is to suppress elements that are not maxima. Non-maximum suppression can be applied to "thin" edges by suppressing all gradient values outside the local maxima, which indicate the positions with the most intense changes in intensity value. Segmentation results whose positions do not match the nodule position are screened out of the initially screened results using non-maximum suppression, yielding the lung nodule segmentation results.
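A sketch of the two screening sub-steps; the candidate record format (diameter in mm, confidence score, bounding box) and the IoU threshold are assumptions for illustration, while the 0-6 mm range follows the example above:

    def screen_results(candidates, d_min=0.0, d_max=6.0, iou_thresh=0.5):
        """2.1: drop candidates whose diameter (mm) is implausible.
        2.2: non-maximum suppression on the survivors."""
        kept = [c for c in candidates if d_min < c["diameter"] <= d_max]
        kept.sort(key=lambda c: c["score"], reverse=True)
        final = []
        for c in kept:
            if all(iou_3d(c["box"], f["box"]) < iou_thresh for f in final):
                final.append(c)  # suppress lower-scored overlapping candidates
        return final

    def iou_3d(a, b):
        """Intersection over union of boxes given as (z1, y1, x1, z2, y2, x2)."""
        inter = 1.0
        for i in range(3):
            inter *= max(min(a[i + 3], b[i + 3]) - max(a[i], b[i]), 0.0)
        vol = lambda t: max(t[3] - t[0], 0) * max(t[4] - t[1], 0) * max(t[5] - t[2], 0)
        union = vol(a) + vol(b) - inter
        return inter / union if union else 0.0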
The above embodiment is described taking the case where the target object is not at the center of the image to be segmented. In a possible implementation, the image to be segmented carries a label for the target object. In that case the image to be segmented can be preprocessed in combination with the label information, so that the target object in the standard-size input image is generally located at the center of the input image, and the input image can then be processed in the image segmentation model by combining the attention mechanism with conventional center pooling, as shown in fig. 10. Each downsampling layer comprises an attention mechanism plus center pooling module, and each upsampling layer is still an attention-based upsampling layer. The description of the image segmentation model in fig. 10 is similar to that of fig. 6: the first to third downsampling layers are attention mechanism plus center pooling modules, the fourth downsampling layer is an attention mechanism plus maximum pooling module, and the rest of the structure is similar. Taking a nodule as the target object, the nodule annotation adopts the latest annotation standard and is corrected jointly by several professional physicians, better matching actual clinical requirements. The training data uses multi-center pathological data for network training, covering nodules under multiple conditions such as calcified, ground-glass, solid and near-chest-wall nodules, so nodule characteristics can be learned more comprehensively during training, greatly improving the accuracy of lung nodule segmentation and recognition.
The above embodiment only takes a lung nodule as the target object by way of example; other types of medical image recognition can also be implemented with the present solution, for example recognition of diabetic retinopathy in fundus images, recognition of breast cancer, and the like, which only require different types of training data.
In summary, in the technical solution provided by the embodiment of the present application, when a target object that needs to be segmented and identified deviates from the center of the image, the feature map is processed with an attention-guided pooling method: target information in the feature map is enhanced based on an attention mechanism to obtain an attention-enhanced feature map, the region of interest in the feature map is then determined, and the feature map is finally pooled with the region of interest as the center. In this way, target information can be effectively enhanced and non-target information suppressed, improving the accuracy of the final segmentation result.
In addition, the image segmentation model provided by the embodiment of the application features quick response, high accuracy, strong robustness, low labor cost, and the like. The image segmentation model can serve hospital or personal auxiliary medical systems and, as a main link of such a system, can help patients quickly and efficiently detect the position and size of lung nodules.
In addition, the technical scheme provided by the embodiment of the application greatly frees up medical resources. The image segmentation model is a fully automatic recognition model; no manual intervention is needed during recognition, it can process and assist recognition in large batches for medical staff, and the misdiagnosis rate can be greatly reduced. Its recognition capability also keeps evolving, bringing a benign closed loop: by continuously collecting sample data and enlarging the training data set of the offline model, the recognition capability of the model can be further improved on an already high-precision basis.
In addition, because lung nodules are surrounded by tissues such as blood vessels and the chest wall, the texture of a lung nodule image is very complicated, and simply using maximum-pooling downsampling cannot effectively acquire nodule-related information, so the trained model cannot adapt to nodules under various conditions, and over-segmentation or under-segmentation can occur when a complex texture image is input. With attention-guided pooling, nodule information can be effectively enhanced even under complex textures, greatly improving the accuracy of lung nodule segmentation and recognition.
The following are apparatus embodiments of the present application, which may be used to perform the method embodiments of the present application. For details not disclosed in the apparatus embodiments, reference is made to the method embodiments of the present application.
Referring to fig. 11, a block diagram of a target segmentation apparatus in an image according to an embodiment of the present application is shown. The apparatus has functions for implementing the above method embodiments; the functions may be implemented by hardware, or by hardware executing corresponding software. The apparatus may be the computer device described above, or may be provided on a computer device. The apparatus 1100 may include: an image acquisition module 1110, an image preprocessing module 1120, and a target segmentation module 1130.
The image obtaining module 1110 is configured to obtain an image to be segmented, where the image to be segmented includes a target object that needs to be segmented and extracted.
The image preprocessing module 1120 is configured to preprocess the image to be segmented to obtain an input image with a standard size.
The target segmentation module 1130 is configured to process the input image through the trained image segmentation model to obtain a segmentation result of the target object in the image to be segmented;
the image segmentation model comprises an attention-guided pooling module, and the attention-guided pooling module is used for determining a region of interest in a feature map of the input image based on an attention mechanism and pooling the feature map by taking the region of interest as a center.
In summary, in the technical solution provided by the embodiments of the present application, when the target object to be segmented and identified deviates from the center of the image, the feature map is processed with an attention-based pooling method: the target information of the feature map is first enhanced based on an attention mechanism to obtain an attention-enhanced feature map, the attention region in the feature map is then determined, and the feature map is finally pooled around that region. Target information can thus be effectively enhanced and non-target information suppressed, improving the accuracy of the final segmentation result.
In an exemplary embodiment, the image pre-processing module 1120 is configured to:
if the size of the image to be segmented is smaller than the standard size, performing pixel supplement processing on the outer side of the image to be segmented to generate the input image with the standard size;
wherein the size of the target object in the image to be segmented is the same as the size in the input image.
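A minimal sketch of this pixel-supplement preprocessing follows; the zero padding value, the centering of the original image, and the standard size of 512 are assumptions for illustration, not values taken from the embodiments:

```python
import numpy as np

def pad_to_standard(img: np.ndarray, std: int = 512) -> np.ndarray:
    """Pad a smaller-than-standard image to std x std without rescaling it,
    so the target object keeps the same size as in the original image."""
    h, w = img.shape[:2]
    assert h <= std and w <= std, "only the smaller-than-standard case is handled"
    top, left = (std - h) // 2, (std - w) // 2
    pad = ((top, std - h - top), (left, std - w - left)) + ((0, 0),) * (img.ndim - 2)
    return np.pad(img, pad, mode="constant")  # supplement pixels outside the image
```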
In an exemplary embodiment, as shown in fig. 12, the object segmentation module 1130 includes: a target primary segmentation unit 1131 and a target segmentation unit 1132.
The target primary segmentation unit 1131 is configured to process the input image through the trained image segmentation model, and output an initial segmentation result of the image to be segmented.
The target segmentation unit 1132 is configured to screen out, from the initial segmentation results, the segmentation results that do not meet the conditions, to obtain the segmentation result of the target object.
In an exemplary embodiment, the target segmentation unit 1132 is configured to:
screening out, from the initial segmentation results, the segmentation results whose diameters do not meet the conditions, to obtain preliminarily screened segmentation results;
and screening out, from the preliminarily screened segmentation results, the segmentation results whose positions do not meet the conditions by using a non-maximum suppression method, to obtain the segmentation result of the target object.
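This two-stage screening can be sketched as follows. The diameter range (3–30 mm) and IoU threshold (0.5) are assumed values for illustration, and bounding boxes with precomputed diameters stand in for whatever representation the initial segmentation results actually use:

```python
import numpy as np

def screen_results(boxes, scores, diameters, d_min=3.0, d_max=30.0, iou_thr=0.5):
    """boxes: (N, 4) [x1, y1, x2, y2]; scores: (N,); diameters: (N,)."""
    keep = (diameters >= d_min) & (diameters <= d_max)   # stage 1: diameter screen
    boxes, scores = boxes[keep], scores[keep]
    order = scores.argsort()[::-1]                       # stage 2: greedy NMS
    kept = []
    while order.size:
        i, order = order[0], order[1:]
        kept.append(i)
        xx1 = np.maximum(boxes[i, 0], boxes[order, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_o = (boxes[order, 2] - boxes[order, 0]) * (boxes[order, 3] - boxes[order, 1])
        iou = inter / (area_i + area_o - inter + 1e-9)
        order = order[iou <= iou_thr]                    # drop overlapping positions
    return boxes[kept], scores[kept]
```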
Referring to fig. 13, a block diagram of a pooling apparatus provided by an embodiment of the present application is shown. The apparatus has functions for implementing the above method embodiments; the functions may be implemented by hardware, or by hardware executing corresponding software. The apparatus may be the computer device described above, or may be provided on a computer device. The apparatus 1300 may include: a map obtaining module 1310, a map generation module 1320, a region determination module 1330, and a map pooling module 1340.
The map obtaining module 1310 is configured to obtain a feature map of a target image, where the feature map is obtained by processing the target image by a feature extracting module of a deep neural network.
The map generation module 1320 is configured to process the feature map based on an attention mechanism, and generate a feature map with enhanced attention.
The region determining module 1330 is configured to determine, according to the feature map after attention enhancement, a region of interest in the feature map.
The map pooling module 1340 is configured to pool the feature map with the region of interest as a center.
In summary, in the technical solution provided by the embodiments of the present application, the target information of the feature map is enhanced based on an attention mechanism to obtain an attention-enhanced feature map, the attention region in the feature map is then determined, and the feature map is finally pooled around that region. The technical solution can enhance target information even when the attention region is not at the center of the feature map, realizing the enhancement of target information and the suppression of irrelevant information more flexibly.
In an exemplary embodiment, as shown in fig. 14, the region determination module 1330 includes: a channel averaging unit 1331, a map segmentation unit 1332, and a region determination unit 1333.
The channel averaging unit 1331 is configured to perform an averaging process on each channel of the feature map after attention enhancement to obtain an average feature map after attention enhancement.
The map segmentation unit 1332 is configured to perform binarization segmentation on the average feature map after attention enhancement to obtain a binarized feature map.
The region determining unit 1333 is configured to select, from the binarized feature map, a pixel region meeting a preset condition as the region of interest.
In an exemplary embodiment, the region determination module 1330 further comprises: a map smoothing unit 1334.
The map smoothing unit 1334 is configured to smooth the average feature map after attention enhancement to obtain a smoothed average feature map;
the map segmentation unit 1332 is further configured to:
perform binarization segmentation on the smoothed average feature map to obtain the binarized feature map.
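These steps can be sketched as follows, assuming SciPy's Gaussian filter for the smoothing and the mean value of the smoothed map as the binarization threshold (both are illustrative choices, not specified here); the centroid of the qualifying pixels serves as the center of the region of interest:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def find_roi_center(enhanced: np.ndarray, sigma: float = 1.0):
    """enhanced: (C, H, W) attention-enhanced feature map -> (cy, cx)."""
    avg = enhanced.mean(axis=0)                  # average over channels
    smooth = gaussian_filter(avg, sigma=sigma)   # smooth before binarization
    binary = smooth > smooth.mean()              # binarization (assumed threshold)
    ys, xs = np.nonzero(binary)
    if ys.size == 0:                             # degenerate map: fall back to center
        return avg.shape[0] // 2, avg.shape[1] // 2
    return int(ys.mean()), int(xs.mean())        # centroid of the attention region
```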
In an exemplary embodiment, the map pooling module 1340 includes: a coordinate acquisition unit 1341 and a map pooling unit 1342.
The coordinate acquisition unit 1341 is configured to acquire the center coordinates of the region of interest.
The map pooling unit 1342 is configured to pool the feature map with the center coordinate of the region of interest as a center.
In an exemplary embodiment, the map pooling unit 1342 includes: a distribution determination subunit 1343 and a map pooling subunit 1344.
The distribution determining subunit 1343 is configured to determine a final distribution of the pooling kernel in the feature map according to the center coordinate of the region of interest; wherein, in the final distribution state, the size of the pooling kernel has a positive correlation with the distance between the pooling kernel and the center coordinate of the region of interest.
The map pooling subunit 1344 is configured to perform pooling on the feature map according to the final distribution of the pooling kernel in the feature map.
In an exemplary embodiment, the distribution determination subunit 1343 is configured to:
determine the initial distribution of the pooling kernels in the feature map according to the center coordinates of the feature map, wherein, in the initial distribution state, the size of a pooling kernel is positively correlated with the distance between the pooling kernel and the center coordinates of the feature map;
calculate the offset between the center coordinates of the region of interest and the center coordinates of the feature map;
and adjust the initial distribution of the pooling kernels in the feature map according to the offset to obtain the final distribution of the pooling kernels in the feature map.
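A sketch of this distribution-and-offset idea follows. The size schedule (kernel size grows by one for every few pixels of Chebyshev distance from the center) is an assumed rule for illustration; the point is that shifting the center-based initial distribution by the offset is equivalent to measuring distance from the center of the region of interest:

```python
import numpy as np

def kernel_size_at(y, x, center, step=8):
    """Pooling-kernel size at (y, x): small near the center, larger far away."""
    d = max(abs(y - center[0]), abs(x - center[1]))   # Chebyshev distance
    return 1 + d // step                              # grows with distance

def final_kernel_map(h, w, roi_center):
    """Kernel sizes for the final distribution, shifted toward the ROI center."""
    map_center = (h // 2, w // 2)                     # initial distribution center
    dy, dx = roi_center[0] - map_center[0], roi_center[1] - map_center[1]
    sizes = np.empty((h, w), dtype=int)
    for y in range(h):
        for x in range(w):
            # Shifting the distribution by the offset (dy, dx) is the same as
            # measuring distance from the ROI center instead of the map center.
            sizes[y, x] = kernel_size_at(y - dy, x - dx, map_center)
    return sizes
```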
Optionally, the map generation module 1320 is configured to:
acquiring a channel dimension attention map and a spatial dimension attention map corresponding to the feature map;
and multiplying the feature map by the channel dimension attention map and the spatial dimension attention map in sequence to obtain the attention-enhanced feature map.
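A sketch of this sequential enhancement in PyTorch, in the spirit of a CBAM-style block: a channel attention map is applied first, then a spatial attention map. The specific layers (a shared MLP over pooled channel statistics, a 7×7 convolution over pooled spatial statistics) are assumptions for illustration, not necessarily the layers used by the model:

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(                       # channel-attention MLP
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (N, C, H, W)
        # Channel dimension attention map, shape (N, C, 1, 1).
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx)[..., None, None]
        # Spatial dimension attention map, shape (N, 1, H, W).
        stat = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(stat))
```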
It should be noted that, when the apparatus provided in the foregoing embodiments implements its functions, the division into the above functional modules is only used as an example. In practical applications, the functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus embodiments and the method embodiments provided above belong to the same concept; their specific implementation processes are described in detail in the method embodiments and are not repeated here.
Referring to fig. 15, a schematic structural diagram of a computer device 1500 according to an embodiment of the present application is shown. The computer device 1500 may be used to implement the object segmentation method or pooling method in the images provided in the above embodiments. Specifically, the method comprises the following steps:
the computer device 1500 includes a Central Processing Unit (CPU) 1501, a system memory 1504 including a Random Access Memory (RAM) 1502 and a Read Only Memory (ROM) 1503, and a system bus 1505 connecting the system memory 1504 and the central processing unit 1501. The computer device 1500 also includes a basic input/output system (I/O system) 1506 for facilitating information transfer between devices within the computer, and a mass storage device 1507 for storing an operating system 1513, application programs 1514, and other program modules 1515.
The basic input/output system 1506 includes a display 1508 for displaying information and an input device 1509, such as a mouse, keyboard, etc., for a user to input information. Wherein the display 1508 and input device 1509 are connected to the central processing unit 1501 via an input-output controller 1510 connected to the system bus 1505. The basic input/output system 1506 may also include an input/output controller 1510 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, the input-output controller 1510 also provides output to a display screen, a printer, or other type of output device.
The mass storage device 1507 is connected to the central processing unit 1501 through a mass storage controller (not shown) connected to the system bus 1505. The mass storage device 1507 and its associated computer-readable media provide non-volatile storage for the computer device 1500. That is, the mass storage device 1507 may include a computer-readable medium (not shown) such as a hard disk or CD-ROM drive.
Without loss of generality, the computer readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will appreciate that the computer storage media is not limited to the foregoing. The system memory 1504 and mass storage device 1507 described above may be collectively referred to as memory.
According to various embodiments of the present application, the computer device 1500 may also run by connecting to a remote computer on a network through a network such as the Internet. That is, the computer device 1500 may be connected to the network 1512 through the network interface unit 1511 connected to the system bus 1505, or the network interface unit 1511 may be used to connect to other types of networks or remote computer systems (not shown).
The memory also stores one or more programs, which are configured to be executed by one or more processors. The one or more programs contain instructions for implementing the above target segmentation method in an image or the above pooling method.
In an exemplary embodiment, a computer device is also provided that includes a processor and a memory having at least one instruction, at least one program, set of codes, or set of instructions stored therein. The at least one instruction, at least one program, set of codes, or set of instructions is configured to be executed by the processor to implement a target segmentation method or pooling method in the image described above.
In an exemplary embodiment, there is also provided a computer readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions which, when executed by a processor of a terminal, implements a target segmentation method or pooling method in an image as described above. Alternatively, the computer-readable storage medium may be a ROM (Read-Only Memory), a RAM (Random Access Memory), a CD-ROM (Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product is also provided, which, when executed, is adapted to implement the above-mentioned target segmentation method or pooling method in an image.
It should be understood that reference to "a plurality" herein means two or more. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. In addition, the step numbers described herein only exemplarily show one possible execution sequence among the steps, and in some other embodiments, the steps may also be executed out of the numbering sequence, for example, two steps with different numbers are executed simultaneously, or two steps with different numbers are executed in a reverse order to the order shown in the figure, which is not limited by the embodiment of the present application.
The above description is only exemplary of the present application and should not be taken as limiting the present application, and any modifications, equivalents, improvements and the like that are made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (15)

1. A method of segmenting an object in an image, the method comprising:
acquiring an image to be segmented, wherein the image to be segmented comprises a target object to be segmented and extracted;
preprocessing the image to be segmented to obtain an input image with a standard size;
processing the input image through the trained image segmentation model to obtain a segmentation result of the target object in the image to be segmented;
the image segmentation model comprises an attention-guided pooling module, and the attention-guided pooling module is used for determining a region of interest in a feature map of the input image based on an attention mechanism and performing center pooling on the feature map with the region of interest as a center.
2. The method according to claim 1, wherein the preprocessing the image to be segmented to obtain an input image with a standard size comprises:
if the size of the image to be segmented is smaller than the standard size, performing pixel supplement processing on the outer side of the image to be segmented to generate the input image with the standard size;
wherein the size of the target object in the image to be segmented is the same as the size in the input image.
3. The method according to claim 1, wherein the processing the input image through the trained image segmentation model to obtain the segmentation result of the target object in the image to be segmented comprises:
processing the input image through the trained image segmentation model, and outputting an initial segmentation result of the image to be segmented;
and screening out the segmentation results that do not meet the conditions from the initial segmentation results to obtain the segmentation result of the target object.
4. The method according to claim 3, wherein the screening out the segmentation results that do not meet the conditions from the initial segmentation results to obtain the segmentation result of the target object comprises:
screening out, from the initial segmentation results, the segmentation results whose diameters do not meet the conditions, to obtain preliminarily screened segmentation results;
and screening out, from the preliminarily screened segmentation results, the segmentation results whose positions do not meet the conditions by using a non-maximum suppression method, to obtain the segmentation result of the target object.
5. A pooling method, characterized in that the method comprises:
acquiring a feature map of a target image, wherein the feature map is obtained by processing the target image by a feature extraction module of a deep neural network;
processing the feature map based on an attention mechanism to generate an attention-enhanced feature map;
determining a region of interest in the feature map according to the feature map after attention enhancement;
and performing center pooling processing on the feature map with the region of interest as a center.
6. The method according to claim 5, wherein the determining the region of interest in the feature map from the attention-enhanced feature map comprises:
averaging all channels of the feature map after attention enhancement to obtain an average feature map after attention enhancement;
carrying out binarization segmentation on the average feature map after attention enhancement to obtain a binarized feature map;
and selecting, from the binarized feature map, a pixel region meeting a preset condition as the region of interest.
7. The method according to claim 6, characterized in that, after the averaging process is performed on each channel of the feature map after attention enhancement to obtain the average feature map after attention enhancement, the method further comprises:
smoothing the average feature map after attention enhancement to obtain a smoothed average feature map;
and the carrying out binarization segmentation on the average feature map after attention enhancement to obtain a binarized feature map comprises:
carrying out binarization segmentation on the smoothed average feature map to obtain the binarized feature map.
8. The method according to claim 5, wherein the performing center pooling processing on the feature map with the region of interest as a center comprises:
acquiring the center coordinates of the region of interest;
and performing pooling processing on the feature map with the center coordinates of the region of interest as a center.
9. The method according to claim 8, wherein the performing pooling processing on the feature map with the center coordinates of the region of interest as a center comprises:
determining the final distribution of the pooling kernels in the feature map according to the center coordinates of the region of interest; wherein, in the final distribution state, the size of a pooling kernel is positively correlated with the distance between the pooling kernel and the center coordinates of the region of interest;
and performing pooling processing on the feature map according to the final distribution of the pooling kernels in the feature map.
10. The method according to claim 9, wherein the determining the final distribution of the pooling kernels in the feature map according to the center coordinates of the region of interest comprises:
determining the initial distribution of the pooling kernels in the feature map according to the center coordinates of the feature map; wherein, in the initial distribution state, the size of a pooling kernel is positively correlated with the distance between the pooling kernel and the center coordinates of the feature map;
calculating the offset between the center coordinates of the region of interest and the center coordinates of the feature map;
and adjusting the initial distribution of the pooling kernels in the feature map according to the offset to obtain the final distribution of the pooling kernels in the feature map.
11. The method according to any one of claims 5 to 10, wherein the processing the feature map based on the attention mechanism to generate an attention-enhanced feature map comprises:
acquiring a channel dimension attention map and a spatial dimension attention map corresponding to the feature map;
and multiplying the feature map by the channel dimension attention map and the spatial dimension attention map in sequence to obtain the attention-enhanced feature map.
12. An apparatus for segmenting an object in an image, the apparatus comprising:
the image acquisition module is used for acquiring an image to be segmented, wherein the image to be segmented comprises a target object to be segmented and extracted;
the image preprocessing module is used for preprocessing the image to be segmented to obtain an input image with a standard size;
the target segmentation module is used for processing the input image through the trained image segmentation model to obtain a segmentation result of the target object in the image to be segmented;
the image segmentation model comprises an attention-guided pooling module, and the attention-guided pooling module is used for determining a region of interest in a feature map of the input image based on an attention mechanism and performing center pooling on the feature map with the region of interest as a center.
13. A pooling device, comprising:
the map acquisition module is used for acquiring a feature map of a target image, wherein the feature map is obtained by processing the target image through a feature extraction module of a deep neural network;
the map generation module is used for processing the feature map based on an attention mechanism to generate the feature map with enhanced attention;
the region determining module is used for determining a region of interest in the feature map according to the feature map after attention enhancement;
and the map pooling module is used for performing center pooling processing on the feature map with the region of interest as a center.
14. A computer device comprising a processor and a memory, the memory having stored therein at least one program which is loaded and executed by the processor to implement a method of object segmentation in an image according to any one of claims 1 to 4 or to implement a pooling method according to any one of claims 5 to 11.
15. A computer-readable storage medium, in which at least one program is stored, which is loaded and executed by a processor to implement a method of object segmentation in an image according to any one of claims 1 to 4, or to implement a pooling method according to any one of claims 5 to 11.
CN201910452561.8A 2019-05-28 2019-05-28 Object segmentation method in image, pooling method, device and storage medium Active CN110176012B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910452561.8A CN110176012B (en) 2019-05-28 2019-05-28 Object segmentation method in image, pooling method, device and storage medium

Publications (2)

Publication Number Publication Date
CN110176012A CN110176012A (en) 2019-08-27
CN110176012B true CN110176012B (en) 2022-12-13

Family

ID=67695808

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910452561.8A Active CN110176012B (en) 2019-05-28 2019-05-28 Object segmentation method in image, pooling method, device and storage medium

Country Status (1)

Country Link
CN (1) CN110176012B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110517272B (en) * 2019-08-29 2022-03-25 电子科技大学 Deep learning-based blood cell segmentation method
CN110706793A (en) * 2019-09-25 2020-01-17 天津大学 Attention mechanism-based thyroid nodule semi-supervised segmentation method
CN111079565B (en) * 2019-11-27 2023-07-07 深圳市华汉伟业科技有限公司 Construction method and identification method of view two-dimensional attitude template and positioning grabbing system
CN110969632B (en) * 2019-11-28 2020-09-08 北京推想科技有限公司 Deep learning model training method, image processing method and device
CN111179231A (en) * 2019-12-20 2020-05-19 上海联影智能医疗科技有限公司 Image processing method, device, equipment and storage medium
CN111310764B (en) * 2020-01-20 2024-03-26 上海商汤智能科技有限公司 Network training method, image processing device, electronic equipment and storage medium
CN111369506B (en) * 2020-02-26 2022-08-02 四川大学 Lens turbidity grading method based on eye B-ultrasonic image
CN111507213A (en) * 2020-04-03 2020-08-07 北京三快在线科技有限公司 Image recognition method, image recognition device, storage medium and electronic equipment
JP7363675B2 (en) * 2020-06-15 2023-10-18 株式会社島津製作所 Imaging mass spectrometry device and imaging mass spectrometry method
CN111881846B (en) * 2020-07-30 2024-04-02 北京市商汤科技开发有限公司 Image processing method, image processing apparatus, image processing device, image processing apparatus, storage medium, and computer program
CN112132223B (en) * 2020-09-27 2024-02-27 腾讯科技(深圳)有限公司 Method, device, equipment and storage medium for pooling
CN113033448B (en) * 2021-04-02 2022-07-08 东北林业大学 Remote sensing image cloud-removing residual error neural network system, method and equipment based on multi-scale convolution and attention and storage medium
CN113516640B (en) * 2021-07-05 2022-03-18 首都师范大学 CT image fine crack segmentation device and method based on classification branches
CN113838067B (en) * 2021-09-26 2023-10-20 中南民族大学 Method and device for segmenting lung nodules, computing device and storable medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8208731B2 (en) * 2008-04-07 2012-06-26 Microsoft Corporation Image descriptor quantization
US10631811B2 (en) * 2015-03-04 2020-04-28 Dmytro Volkov Method and system for processing of medical images for generating a prognosis of cardiac function

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106373109A (en) * 2016-08-31 2017-02-01 南方医科大学 Medical image modal synthesis method
CN108229268A (en) * 2016-12-31 2018-06-29 商汤集团有限公司 Expression Recognition and convolutional neural networks model training method, device and electronic equipment
CN109754067A (en) * 2018-11-30 2019-05-14 华南师范大学 Matrix disassembling method, device and electronic equipment based on convolution attention

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Central focused convolutional neural networks: Developing a data-driven model for lung nodule segmentation; Shuo Wang et al.; Medical Image Analysis; 2017-08-31; vol. 40; 172-183 *
Face keypoint detection and face attribute analysis based on deep learning; Zeng Jiajian; China Master's Theses Full-text Database, Information Science and Technology; 2018-12-15; I138-1347 *

Also Published As

Publication number Publication date
CN110176012A (en) 2019-08-27

Similar Documents

Publication Publication Date Title
CN110176012B (en) Object segmentation method in image, pooling method, device and storage medium
CN108537784B (en) CT image pulmonary nodule detection method based on deep learning
CN108010021B (en) Medical image processing system and method
Banerjee et al. Automated 3D segmentation of brain tumor using visual saliency
CN109363699B (en) Method and device for identifying focus of breast image
JP6539303B2 (en) Transforming 3D objects to segment objects in 3D medical images
EP1789920A1 (en) Feature weighted medical object contouring using distance coordinates
CN112862824A (en) Novel coronavirus pneumonia focus detection method, system, device and storage medium
CN110838125A (en) Target detection method, device, equipment and storage medium of medical image
CN112348818B (en) Image segmentation method, device, equipment and storage medium
CN110570394A (en) medical image segmentation method, device, equipment and storage medium
KR102349515B1 (en) Tumor automatic segmentation based on deep learning in a medical image
US20090310883A1 (en) Image processing apparatus, method, and program
CN110992310A (en) Method and device for determining partition where mediastinal lymph node is located
CN112991365B (en) Coronary artery segmentation method, system and storage medium
CN108597589B (en) Model generation method, target detection method and medical imaging system
CN113538363A (en) Lung medical image segmentation method and device based on improved U-Net
CN113012164A (en) U-Net kidney tumor image segmentation method and device based on inter-polymeric layer information and storage medium
CN112116989A (en) Multi-organ sketching method and device
CN112561877A (en) Multi-scale double-channel convolution model training method, image processing method and device
CN111127404A (en) Medical image contour rapid extraction method
KR20200099633A (en) Method and computer program for analyzing texture of an image
CN115359140A (en) Automatic multi-feature region-of-interest delineation system and method based on neural network
CN110570417B (en) Pulmonary nodule classification device and image processing equipment
CN111613300B (en) Tumor and blood vessel Ai processing method and product based on VRDS 4D medical image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant