WO2021115061A1

WO2021115061A1 - Image segmentation method and apparatus, and server

Info

Publication number: WO2021115061A1
Application number: PCT/CN2020/129521
Authority: WO
Inventors: 廖祥云; 孙寅紫; 王琼; 王平安
Original assignee: 中国科学院深圳先进技术研究院
Priority date: 2019-12-11
Filing date: 2020-11-17
Publication date: 2021-06-17
Also published as: CN111145196A

Abstract

An image segmentation method and apparatus, and a server. The method comprises: inputting an image to be segmented into an image segmentation model, performing feature extraction on the image to be segmented so as to generate a feature graph, and calculating a spatial position relation between pixel points in the feature graph so as to obtain spatial position information (S110); fusing the feature graph and the spatial position information to obtain a feature graph comprising the spatial position information (S120); and according to the feature graph comprising the spatial position information, segmenting the image to be segmented, and outputting a target image (S130). The method solves the problem that the boundary between different targets to be segmented cannot be accurately segmented.

Description

Image segmentation method, device and server

Technical field

The present invention relates to the technical field of image segmentation, in particular to an image segmentation method, device and server.

Background technique

Image segmentation is one of the research hotspots of computer graphics, and it has important applications in the fields of medical disease diagnosis and unmanned driving. At present, there are many methods for image segmentation algorithms, among which (U-Net) U-shaped neural network algorithm is one of the most commonly used algorithms. The U-shaped neural network algorithm is composed of an encoder and a decoder, and the encoder and the decoder are connected by splicing in the image channel dimension. Specifically, the image to be segmented is first extracted through an encoder for image feature extraction. The encoder is composed of multiple convolutional layers, and the convolutional layers are connected by a pooling layer, thereby reducing the dimension of the original image to a certain size. After that, the image output from the encoder is restored to the original image size by the decoder. The decoder is composed of multiple convolutional layers, and the convolutional layers are connected by transposed convolutional layers. Finally, the output image is converted into a probability map using the softmax activation function. Compared with traditional image segmentation algorithms, such as threshold segmentation, region segmentation, and edge segmentation, the UNet algorithm has a simple network structure and high accuracy of image segmentation. However, the current UNet image segmentation algorithm questions will exaggerate the difference between similar objects (inter-class distinction) or the similarity of objects of different categories (intra-class consistency), and it is impossible to separate "different types of similar features" and "same characteristics". The boundary between the “species difference characteristics”. As a result, the boundary between different objects to be segmented cannot be accurately segmented, and the segmentation accuracy of the image is low.

Summary of the invention

In view of this, the embodiments of the present invention provide an image segmentation method, device, and server to solve the problem that the boundary between different objects to be segmented cannot be accurately segmented.

The first aspect of the embodiments of the present invention provides an image segmentation method, including:

Input the image to be divided into an image segmentation model, perform feature extraction on the image to be divided to generate a feature map, and calculate the spatial position relationship between pixels in the feature map to obtain spatial position information;

Fusing the feature map and the spatial location information to obtain a feature map containing the spatial location information;

The image to be divided is segmented according to the feature map containing the spatial position information, and the target image is output.

In an implementation example, the image to be segmented is input into an image segmentation model, feature extraction is performed on the image to be segmented to generate a feature map, and the spatial position relationship between pixels in the feature map is calculated to obtain spatial position information, include:

Perform feature extraction on the image to be segmented by N information extraction modules connected in series in the encoder to generate a feature map; the N information extraction modules are set according to preset scale information, N≥1;

For each of the information extraction modules, the spatial position relationship between pixels in the feature map generated by the information extraction module is calculated to obtain spatial position information.

When the image to be segmented is input to the first information extraction module, feature extraction is performed on the image to be segmented to generate a feature map, and the spatial position relationship between pixels in the feature map is calculated to obtain spatial position information;

Fusion of the feature map and the spatial position information to generate a new feature map and output to the next information extraction module, so that the next information extraction module performs feature extraction and spatial position relationship calculation on the new feature map .

In an implementation example, the fusing the feature map and the spatial location information to obtain a feature map containing spatial location information includes:

Context information is generated by fusing the feature map and spatial position information output by the Nth information extraction module through the encoder.

In an implementation example, for each of the information extraction modules, calculating the spatial position relationship between pixels in the feature map generated by the information extraction module to obtain spatial position information includes:

For each of the information extraction modules, convolve the feature map in a direction perpendicular to the feature map generated by the information extraction module through a convolutional neural network, and calculate the spatial position relationship between pixels in the feature map Get spatial location information.

If the feature map generated by the information extraction module is a two-dimensional feature map, the formula for calculating the spatial position relationship between pixels in the feature map generated by the information extraction module is:

Where a is the spatial position information; δ is the activation function; l is the number of convolutional layers of the convolutional neural network; w _{(i, j)} is the pixel with the coordinate (i, j) in the feature map The weight coefficient of the point; k is the number of channels of the feature map; b is the offset; ⊙ is the Hadaman product.

If the feature map generated by the information extraction module is a three-dimensional feature map, the formula for calculating the spatial position relationship between pixels in the feature map generated by the information extraction module is:

Wherein, a is the spatial position information; δ is the activation function; l is the number of convolutional layers of the convolutional neural network; w _{(i, j, k)} is the coordinates in the feature map as (i, j, k) the weight coefficient of the pixel; m is the number of channels of the feature map; b is the offset; ⊙ is the Hadaman product.

In an implementation example, the segmenting the image to be segmented according to the feature map containing the spatial position information and outputting the target image includes:

The decoder divides the image to be divided according to the context information, and outputs the target image.

A second aspect of the embodiments of the present invention provides an image segmentation device, including:

The image feature and location information extraction module is used to input the image to be segmented into the image segmentation model, perform feature extraction on the image to be segmented to generate a feature map, and calculate the spatial position relationship between pixels in the feature map to obtain the spatial position information;

The feature fusion module is used to fuse the feature map and the spatial location information to obtain a feature map containing the spatial location information;

The image segmentation module is configured to segment the image to be segmented according to the feature map containing the spatial position information, and output a target image.

The third aspect of the embodiment provides a server, including: a memory, a processor, and a computer program stored in the memory and running on the processor, and the processor implements the first computer program when the computer program is executed. On the one hand, the image segmentation method.

According to an image segmentation method, device and server provided by embodiments of the present invention, an image to be segmented is input into an image segmentation model, feature extraction is performed on the image to be segmented to generate a feature map, and the pixel points in the feature map To obtain spatial position information; fuse the feature map and the spatial position information to obtain a feature map containing spatial position information; segment the image to be divided according to the feature map containing spatial position information, and output the target image. The spatial position information is obtained by calculating the spatial position relationship between pixels in the feature map, and the relative position relationship of pixels in different spatial positions in the feature map can be extracted. After fusing the feature map containing the image information with the calculated spatial location information to obtain the feature map containing the spatial location information, the segmented image is segmented according to the feature map containing the spatial location information, so that the image segmentation model can be based on the pixel points in the feature map The spatial position relationship between the feature maps obtains the feature relationship between the pixels of the feature map, thereby segmenting the boundary between "different types of similar features" and "same type of difference features" to achieve accurate segmentation of the boundaries between different objects to be segmented, Improve the accuracy of image segmentation.

Description of the drawings

In order to explain the technical solutions in the embodiments of the present invention more clearly, the following will briefly introduce the drawings that need to be used in the description of the embodiments or the prior art. Obviously, the drawings in the following description are only of the present invention. For some embodiments, those of ordinary skill in the art can obtain other drawings based on these drawings without creative work.

FIG. 1 is a schematic flowchart of an image segmentation method provided by Embodiment 1 of the present invention;

2 is a schematic structural diagram of an image segmentation model provided by Embodiment 1 of the present invention;

3 is a schematic flowchart of an image segmentation method provided by Embodiment 2 of the present invention;

4 is a schematic diagram of convolution calculation of the feature map depth convolution layer of the second branch in the information extraction module provided by the second embodiment of the present invention;

FIG. 5 is a schematic structural diagram of an image segmentation device provided by Embodiment 3 of the present invention;

Fig. 6 is a schematic structural diagram of a server provided in the fourth embodiment of the present invention.

Detailed ways

In order to enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention will be clearly described below in conjunction with the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are the present invention. Part of the embodiment, but not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.

The term "comprising" in the specification and claims of the present invention and the above-mentioned drawings and any variations thereof are intended to cover non-exclusive inclusions. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally includes unlisted steps or units, or optionally includes Other steps or units inherent in these processes, methods, products or equipment. In addition, the terms "first", "second", "third", etc. are used to distinguish different objects, rather than describing a specific order.

Example one

As shown in FIG. 1, it is a schematic flowchart of an image segmentation method provided by Embodiment 1 of the present invention. This embodiment can be applied to the application scenario of multi-target segmentation of an image. The method can be executed by an image segmentation device, which can be a server, a smart terminal, a tablet or a PC, etc.; in this embodiment of the application, the image segmentation device is used as The executive subject will explain that the method specifically includes the following steps:

S110: Input the image to be segmented into an image segmentation model, perform feature extraction on the image to be segmented to generate a feature map, and calculate the spatial position relationship between pixels in the feature map to obtain spatial position information;

In the existing image segmentation methods, image segmentation can be performed by constructing an image segmentation model including a neural network through deep learning target images. However, the image features extracted after the convolution calculation of the multi-layer convolutional layer in the trained image segmentation model of the image to be segmented often exaggerate the difference between similar objects (inter-class distinction) or the similarity of objects of different categories The issue of intra-class consistency. When the image segmentation model performs target image segmentation on the image to be segmented based on the extracted features, it cannot segment the boundary between "different types of similar features" and "same type difference features", resulting in over-segmentation and under-segmentation in the segmentation process. As a result, it is difficult to accurately segment the boundaries between different target images to be segmented. In order to solve this technical problem, the feature relationship between feature image pixels can be extracted from the different levels of the convolutional neural network of the image segmentation model to overcome the inability to segment the difference between "different types of similar features" and "same types of different features". The problem of borders.

Specifically, the image to be segmented can be segmented through an image segmentation model trained based on multiple target images. After the image to be segmented is input into the image segmentation model, the image feature to be segmented is extracted to generate a feature map, and the spatial position relationship between the pixels in the feature map is calculated to obtain the spatial position information, so as to obtain the pixel points in the feature map in different spaces The relative positional relationship of the position.

In an implementation example, the image segmentation model may adopt a U-shaped neural network (Feature Depth UNet) framework, and an encoder and a decoder form a symmetric structure; the encoder and the decoder are spliced by image channel dimensions. Figure 2 shows the structure diagram of the image segmentation model. The specific process of performing feature extraction on the image to be segmented to generate a feature map, and calculating the spatial position relationship between pixels in the feature map to obtain spatial position information may be: pairing N information extraction modules connected in series in the encoder The image to be segmented is subjected to feature extraction to generate a feature map; the N information extraction modules are set according to preset scale information, N≥1; for each of the information extraction modules, the feature map generated by the information extraction module is calculated The spatial position relationship between the pixels obtains the spatial position information.

Specifically, the encoder includes N information extraction modules connected in series to perform image feature extraction on the input image to be divided to generate a feature map. Among them, the N information extraction modules are set according to the preset size information, so that each information extraction module has different size information. After the image to be segmented is input to the image segmentation model, feature extraction of the image to be segmented through N information extraction modules can generate a feature map containing multi-scale information; and after each information extraction module performs feature extraction of the image to be segmented, the information is calculated The spatial position relationship between pixels in the feature map generated by the extraction module is used to obtain spatial position information. After N information extraction modules corresponding to different scale information calculate the spatial position relationship between the pixels in the feature map to obtain the spatial position information, the spatial position information containing multi-scale information can be obtained.

In an implementation example, feature extraction is performed on the image to be divided by N information extraction modules connected in series in the encoder to generate a feature map, and the spatial position relationship between pixels in the feature map generated by each information extraction module is calculated The specific process of obtaining the spatial position information may be: when the image to be segmented is input to the first information extraction module, feature extraction is performed on the image to be segmented to generate a feature map, and the space between pixels in the feature map is calculated Position relationship to obtain spatial location information; fusion of the feature map and the spatial location information to generate a new feature map and output it to the next information extraction module, so that the next information extraction module performs processing on the new feature map Feature extraction and spatial position relationship calculation.

Specifically, each information extraction module can include two branches. The first branch is used to extract features from the input image to generate a feature map to extract the pixel value information of the image; the second branch is the same as the first branch. After the feature extraction is performed on the input image to generate the feature map, the spatial position relationship between the pixels in the feature map is also calculated to obtain the spatial position information, so as to realize the extraction of the spatial position relationship information between the pixels. Optionally, the first branch used to extract features from the input image to generate a feature map can be composed of several convolutional layers; the second branch can be composed of several convolutional layers that are the same as the first branch and one feature map depth convolutional layer , In order to realize the feature extraction of the input image to generate a feature map, and then calculate the spatial position relationship between the pixels in the feature map to obtain the spatial position information. And the N information extraction modules in the encoder can be connected in series through the maximum pooling layer.

By superimposing several convolutional layers in the second branch before the feature map depth convolutional layer, the field of view is expanded; the feature map obtained by extracting features through several convolutional layers is input into the feature map depth convolutional layer for pixel points in the feature map When calculating the spatial position relationship between the feature maps, each pixel in the depth convolutional layer of the feature map can map a different field of view on the original image. Optionally, in order to reduce the over-fitting phenomenon, a Batch Normalization (batch normalization) layer and L2 regularization can be added after the loss function between the convolutional layers in each information extraction module.

In detail, when the image to be segmented is input to the first information extraction module, feature extraction is performed on the image to be segmented through several convolutional layers in the first branch of the first information extraction module to generate a feature map; at the same time, the first information is extracted Several convolution layers in the second branch of the module perform feature extraction on the image to be segmented to generate a feature map, and calculate the spatial position relationship between pixels in the feature map through the feature map depth convolution layer in the second branch to obtain the space location information. A new feature map is generated by fusing the feature map output by the first branch of the first information extraction module and the spatial position information output by the second branch through the pooling layer, and the newly generated feature map is input to the next information extraction module to The first branch of the next information extraction module performs feature extraction on the feature map of the input module, and the second branch of the next information extraction module performs feature extraction on the feature map of the input module and calculate the spatial position relationship. Until the first branch of the Nth information extraction module performs feature extraction on the feature map of the input module to generate a feature map containing multi-scale information, the second branch of the Nth information extraction module performs feature extraction on the feature map of the input module and calculates the spatial position The relationship obtains spatial location information containing multi-scale information.

S120. Fusion of the feature map and the spatial location information to obtain a feature map containing the spatial location information;

After N information extraction modules connected in series in the encoder perform feature extraction on the image to be segmented to generate a feature map, and calculate the spatial position relationship between pixels in the feature map generated by each information extraction module to obtain the spatial position information, The Nth information extraction module outputs the finally obtained feature map and spatial position information. The feature map output by the Nth information extraction module and the spatial position information are fused to obtain a feature map to complete the feature fusion.

In an implementation example, since the image segmentation model can be composed of an encoder and a decoder, the decoder needs to perform image segmentation on the image to be segmented according to the context information sent by the encoder. Context information can be generated by fusing the feature map and spatial position information output by the Nth information extraction module through the pooling layer in the encoder.

S130. Segment the image to be segmented according to the feature map containing the spatial position information, and output a target image.

Because the image segmentation model can include an encoder and a decoder, and the encoder and the decoder have a symmetrical structure. The decoder corresponds to the convolutional layer structure in the encoder with a corresponding transposed convolutional layer. And in order to make the neural network retain the shallower information, the encoder and the decoder are connected by skipping. In an implementation example, the image to be segmented is segmented by the decoder according to the context information encoded by the encoder, and the target image is output. Since the context information is generated based on the feature map containing the spatial position information, the decoder can obtain the feature relationship between the pixels of the feature map according to the spatial position relationship between the pixels in the context information, thereby segmenting "different types of similar features" and "same features". The boundary between the "type difference feature" realizes the precise segmentation of the boundary between different objects to be segmented.

An image segmentation method provided by an embodiment of the present invention inputs an image to be segmented into an image segmentation model, extracts features of the image to be segmented to generate a feature map, and calculates the spatial position relationship between pixels in the feature map Obtain spatial position information; fuse the feature map and the spatial position information to obtain a feature map containing spatial position information; segment the image to be divided according to the feature map containing spatial position information, and output a target image. The spatial position information is obtained by calculating the spatial position relationship between the pixels in the feature map, and the relative position relationship of the pixels in different spatial positions in the feature map is extracted. After fusing the feature map containing the image information with the calculated spatial location information to obtain the feature map containing the spatial location information, the segmented image is segmented according to the feature map containing the spatial location information, so that the image segmentation model can be based on the pixel points in the feature map The spatial position relationship between the feature maps obtains the feature relationship between the pixels of the feature map, thereby segmenting the boundary between "different types of similar features" and "same type of difference features" to achieve accurate segmentation of the boundaries between different objects to be segmented, Improve the accuracy of image segmentation.

Example two

FIG. 3 is a schematic flowchart of the image segmentation method provided in the second embodiment of the present invention. On the basis of Embodiment 1, this embodiment also provides a process of calculating the spatial position relationship between pixels in the feature map to obtain spatial position information, thereby further improving the accuracy of image segmentation. The method specifically includes:

S210. Input the image to be segmented into the image segmentation model, and perform feature extraction on the image to be segmented to generate a feature map through N information extraction modules connected in series in the encoder; the N information extraction modules are set according to preset scale information, N ≥1;

The N information extraction modules are set according to the preset size information, so that each information extraction module has different size information. After the image to be segmented is input to the image segmentation model, feature extraction of the image to be segmented through N information extraction modules can generate a feature map containing multi-scale information; and after each information extraction module performs feature extraction of the image to be segmented, the information is calculated The spatial position relationship between pixels in the feature map generated by the extraction module is used to obtain spatial position information. After N information extraction modules corresponding to different scale information calculate the spatial position relationship between the pixels in the feature map to obtain the spatial position information, the spatial position information containing multi-scale information can be obtained.

Specifically, each information extraction module can include two branches. The first branch is used to extract features from the input image to generate a feature map; the second branch is to perform feature extraction on the input image in the same way as the first branch. After the feature map is generated, the spatial position relationship between the pixels in the feature map is also calculated to obtain the spatial position information. Optionally, the first branch used to extract features from the input image to generate a feature map can be composed of several convolutional layers; the second branch can be composed of several convolutional layers that are the same as the first branch and one feature map depth convolutional layer , In order to realize the feature extraction of the input image to generate a feature map, and then calculate the spatial position relationship between the pixels in the feature map to obtain the spatial position information. And the N information extraction modules in the encoder can be connected in series through the maximum pooling layer.

When the image to be segmented is input to the first information extraction module, feature extraction is performed on the image to be segmented through a number of convolutional layers in the first branch of the first information extraction module to generate a feature map; at the same time, a feature map is generated by the first information extraction module. Several convolutional layers in the second branch perform feature extraction on the image to be segmented to generate a feature map, and use the feature map depth convolution layer in the second branch to calculate the spatial position relationship between pixels in the feature map to obtain spatial position information. A new feature map is generated by fusing the feature map output by the first branch of the first information extraction module and the spatial position information output by the second branch through the pooling layer, and the newly generated feature map is input to the next information extraction module to The first branch of the next information extraction module performs feature extraction on the feature map of the input module, and the second branch of the next information extraction module performs feature extraction on the feature map of the input module and calculate the spatial position relationship. Until the first branch of the Nth information extraction module performs feature extraction on the feature map of the input module to generate a feature map containing multi-scale information, the second branch of the Nth information extraction module performs feature extraction on the feature map of the input module and calculates the spatial position The relationship obtains spatial location information containing multi-scale information.

S220. For each information extraction module, convolve the feature map in a direction perpendicular to the feature map generated by the information extraction module through a convolutional neural network, and calculate the space between pixels in the feature map The position relationship obtains the spatial position information;

Specifically, since each information extraction module can include two branches, the second branch can be composed of several convolutional layers that are the same as the first branch and a feature map depth convolutional layer to realize feature extraction and generation of the input image After the feature map, the spatial position relationship between the pixels in the feature map is calculated to obtain the spatial position information. Therefore, for each of the information extraction modules, convolving the feature map in a direction perpendicular to the feature map generated by the information extraction module through a convolutional neural network can be: using the features of the second branch in the information extraction module The map depth convolution layer convolves the feature map in the direction of the feature map obtained by convolution calculation of several convolution layers perpendicular to the second branch, and calculates the spatial position relationship between pixels in the feature map to obtain spatial position information.

In an implementation example, if the feature map generated by the information extraction module, that is, the feature map calculated by convolution of several convolutional layers of the second branch, is a two-dimensional feature map, then the feature map depth volume of the second branch in the information extraction module The formula for multi-layer calculation of the spatial position relationship between pixels in the feature map is:

Where a is the spatial position information; δ is the activation function; l is the number of convolutional layers of the convolutional neural network; w _{(i, j)} is the pixel with the coordinate (i, j) in the feature map The weight coefficient of the point; k is the number of channels of the feature map; b is the offset; ⊙ is the Hadamand product.

Specifically, the feature map depth convolution layer of the second branch in the information extraction module can use a H×W×C convolution kernel, where H×W represents the size of the convolution kernel, C represents the number of convolution kernels, and its value Equal to the number of pixels in the XY plane of the output feature map. Optional. As shown in FIG. 4, it is a schematic diagram of the convolution calculation of the feature map depth convolution layer of the second branch in the information extraction module. In order to calculate the output of the depth convolution of the two-dimensional feature map, first put the H×W convolution kernel at the upper left corner of the feature map, and perform the first convolution operation. Then, slide the convolution kernel along the Z axis, and perform the same convolution operation in the direction perpendicular to the feature map in turn. Finally, the calculation results of the convolution operation of the C convolution kernels are arranged on the XY plane according to the position of the feature map to obtain the spatial position information.

In an implementation example, the feature map generated by the information extraction module, that is, the feature map obtained by convolution calculation of several convolutional layers of the second branch, is a three-dimensional feature map, and then the feature map of the second branch in the information extraction module is used to deep convolutional layer The formula for calculating the spatial position relationship between pixels in the feature map is:

Wherein, a is the spatial position information; δ is the activation function; l is the number of convolutional layers of the convolutional neural network; w _{(i, j, k)} is the coordinates in the feature map as (i, j, k) the weight coefficient of the pixel; m is the number of channels of the feature map; b is the offset; ⊙ is the Hadamand product.

Specifically, the feature map depth convolution layer of the second branch in the information extraction module can use a convolution kernel of H×W×P×C, where H×W×P represents the size of the convolution kernel, and C represents the size of the convolution kernel. The number, whose value is equal to the number of pixels in the XY plane of the output feature map. In order to calculate the output of the depth convolution layer of the 3D feature map, first put the H×W×P convolution kernel at the upper left corner of the feature map, and perform the first 3D convolution operation. Then, slide the convolution kernel along the Z axis, and perform the same three-dimensional convolution operation in the direction perpendicular to the feature map in turn. Finally, the calculation results of the C convolution kernels are arranged on the XY plane according to the position of the feature map to obtain the spatial position information.

S230. Fusion of the feature map and the spatial location information to obtain a feature map containing spatial location information;

Context information is generated by fusing the feature map and spatial position information output by the Nth information extraction module through the pooling layer in the encoder.

S240. Segment the image to be segmented according to the feature map containing the spatial position information, and output a target image.

The image to be divided is segmented by the decoder according to the context information encoded by the encoder, and the target image is output. Since the context information is generated based on the feature map containing the spatial position information, the decoder can obtain the feature relationship between the pixels of the feature map according to the spatial position relationship between the pixels in the context information, thereby segmenting "different types of similar features" and "same features". The boundary between the "type difference feature" realizes the precise segmentation of the boundary between different objects to be segmented.

Example three

As shown in FIG. 5, the image segmentation device provided in the third embodiment of the present invention is shown. On the basis of Embodiment 1 or 2, an embodiment of the present invention also provides an image segmentation 5, and the device includes:

The image feature and location information extraction module 501 is used to input the image to be segmented into the image segmentation model, perform feature extraction on the image to be segmented to generate a feature map, and calculate the spatial position relationship between pixels in the feature map to obtain the space location information;

In an implementation example, the image to be segmented is input into the image segmentation model, feature extraction is performed on the image to be segmented to generate a feature map, and the spatial position relationship between pixels in the feature map is calculated to obtain the spatial position information. The feature and location information extraction module 501 includes:

The image feature extraction unit is configured to perform feature extraction on the image to be segmented to generate a feature map through N information extraction modules connected in series in the encoder; the N information extraction modules are set according to preset scale information, N≥1;

The location information extraction unit is configured to calculate the spatial location relationship between pixels in the feature map generated by the information extraction module for each of the information extraction modules to obtain spatial location information.

In an implementation example, for each of the information extraction modules, when the spatial position relationship between pixels in the feature map generated by the information extraction module is calculated to obtain spatial position information, the position information extraction unit includes:

The position information extraction subunit is used to convolve the feature map in a direction perpendicular to the feature map generated by the information extraction module through a convolutional neural network for each of the information extraction modules to calculate the feature map The spatial position relationship between the pixels obtains the spatial position information.

The feature fusion module 502 is configured to fuse the feature map and the spatial location information to obtain a feature map containing the spatial location information;

In an implementation example, when fusing the feature map and the spatial location information to obtain a feature map containing spatial location information, the feature fusion module 502 includes:

The feature fusion unit is used for fusing the feature map and the spatial position information output by the Nth information extraction module through the encoder to generate context information.

The image segmentation module 503 is configured to segment the image to be segmented according to the feature map containing spatial position information, and output a target image.

In an implementation example, the image to be segmented is segmented according to the feature map containing spatial position information, and when the target image is output, the image segmentation module 503 includes:

The image segmentation unit is configured to segment the image to be segmented according to the context information by the decoder, and output a target image.

An image segmentation device provided by an embodiment of the present invention inputs an image to be segmented into an image segmentation model, extracts features of the image to be segmented to generate a feature map, and calculates the spatial position relationship between pixels in the feature map Obtain spatial position information; fuse the feature map and the spatial position information to obtain a feature map containing spatial position information; segment the image to be divided according to the feature map containing spatial position information, and output a target image. The spatial position information is obtained by calculating the spatial position relationship between the pixels in the feature map, and the relative position relationship of the pixels in different spatial positions in the feature map is extracted. After fusing the feature map containing the image information with the calculated spatial location information to obtain the feature map containing the spatial location information, the segmented image is segmented according to the feature map containing the spatial location information, so that the image segmentation model can be based on the pixel points in the feature map The spatial position relationship between the feature maps obtains the feature relationship between the pixels of the feature map, thereby segmenting the boundary between "different types of similar features" and "same type of difference features" to achieve accurate segmentation of the boundaries between different objects to be segmented, Improve the accuracy of image segmentation.

Example four

Fig. 6 is a schematic structural diagram of a server provided in the fourth embodiment of the present invention. The server includes a processor 61, a memory 62, and a computer program 63 stored in the memory 62 and running on the processor 61, such as a program for an image segmentation method. The processor 61 implements the steps in the embodiment of the image segmentation method when the computer program 63 is executed, for example, steps S110 to S130 shown in FIG. 1.

Exemplarily, the computer program 63 may be divided into one or more modules, and the one or more modules are stored in the memory 62 and executed by the processor 61 to complete the application. The one or more modules may be a series of computer program instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer program 63 in the server. For example, the computer program 63 can be divided into an image feature and location information extraction module, a feature fusion module, and an image segmentation module, and the specific functions of each module are as follows:

The server may include, but is not limited to, a processor 61, a memory 62, and a computer program 63 stored in the memory 62. Those skilled in the art can understand that FIG. 6 is only an example of a server, and does not constitute a limitation on the server. It may include more or less components than those shown in the figure, or a combination of certain components, or different components, such as the The server may also include input and output devices, network access devices, buses, and so on.

The processor 61 may be a central processing unit (Central Processing Unit, CPU), other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (ASIC), Ready-made programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, etc. The general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.

The memory 62 may be an internal storage unit of the server, such as a hard disk or memory of the server. The memory 62 may also be an external storage device, such as a plug-in hard disk equipped on a server, a smart memory card (Smart Media Card, SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), and so on. Further, the storage 62 may also include both an internal storage unit of the server and an external storage device. The memory 62 is used to store the computer program and other programs and data required by the image segmentation method. The memory 62 can also be used to temporarily store data that has been output or will be output.

Those skilled in the art can clearly understand that, for the convenience and conciseness of description, only the division of the above functional units and modules is used as an example. In practical applications, the above functions can be allocated to different functional units and modules as needed. Module completion, that is, the internal structure of the device is divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments can be integrated into one processing unit, or each unit can exist alone physically, or two or more units can be integrated into one unit. The above-mentioned integrated units can be hardware-based Formal realization can also be realized in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing each other, and are not used to limit the protection scope of the present application. For the specific working process of the units and modules in the foregoing system, reference may be made to the corresponding process in the foregoing method embodiment, which will not be repeated here.

In the above-mentioned embodiments, the description of each embodiment has its own focus. For parts that are not described in detail or recorded in an embodiment, reference may be made to related descriptions of other embodiments.

A person of ordinary skill in the art may realize that the units and algorithm steps of the examples described in combination with the embodiments disclosed herein can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraint conditions of the technical solution. Professionals and technicians can use different methods for each specific application to implement the described functions, but such implementation should not be considered as going beyond the scope of the present invention.

In the embodiments provided by the present invention, it should be understood that the disclosed device/terminal device and method may be implemented in other ways. For example, the device/terminal device embodiments described above are only illustrative. For example, the division of the modules or units is only a logical function division, and there may be other divisions in actual implementation, such as multiple units. Or components can be combined or integrated into another system, or some features can be omitted or not implemented. In addition, the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.

If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable storage medium. Based on this understanding, the present invention implements all or part of the processes in the above-mentioned embodiments and methods, and can also be completed by instructing relevant hardware through a computer program. The computer program can be stored in a computer-readable storage medium. When the program is executed by the processor, it can implement the steps of the foregoing method embodiments. Wherein, the computer program includes computer program code, and the computer program code may be in the form of source code, object code, executable file, or some intermediate forms. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, U disk, mobile hard disk, magnetic disk, optical disk, computer memory, read-only memory (ROM, Read-Only Memory) , Random Access Memory (RAM, Random Access Memory), electrical carrier signal, telecommunications signal, and software distribution media, etc. It should be noted that the content contained in the computer-readable medium can be appropriately added or deleted according to the requirements of the legislation and patent practice in the jurisdiction. For example, in some jurisdictions, according to the legislation and patent practice, the computer-readable medium Does not include electrical carrier signals and telecommunication signals.

The above-mentioned embodiments are only used to illustrate the technical solutions of the present invention, but not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that it can still implement the foregoing various embodiments. The technical solutions recorded in the examples are modified, or some of the technical features are equivalently replaced; and these modifications or replacements do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention, and should be included in Within the protection scope of the present invention.

Claims

An image segmentation method, characterized in that it includes:

Input the image to be divided into an image segmentation model, perform feature extraction on the image to be divided to generate a feature map, and calculate the spatial position relationship between pixels in the feature map to obtain spatial position information;

Fusing the feature map and the spatial location information to obtain a feature map containing the spatial location information;

The image to be divided is segmented according to the feature map containing the spatial position information, and the target image is output.
The image segmentation method according to claim 1, wherein the image to be segmented is input into an image segmentation model, feature extraction is performed on the image to be segmented to generate a feature map, and the pixel points in the feature map are calculated. The spatial position relationship of to obtain the spatial position information, including:

Perform feature extraction on the image to be segmented by N information extraction modules connected in series in the encoder to generate a feature map; the N information extraction modules are set according to preset scale information, N≥1;

For each of the information extraction modules, the spatial position relationship between pixels in the feature map generated by the information extraction module is calculated to obtain spatial position information.
The image segmentation method of claim 2, wherein the image to be segmented is input into an image segmentation model, feature extraction is performed on the image to be segmented to generate a feature map, and the distance between pixels in the feature map is calculated The spatial position relationship of to obtain the spatial position information, including:

When the image to be segmented is input to the first information extraction module, feature extraction is performed on the image to be segmented to generate a feature map, and the spatial position relationship between pixels in the feature map is calculated to obtain spatial position information;

Fusion of the feature map and the spatial position information to generate a new feature map and output to the next information extraction module, so that the next information extraction module performs feature extraction and spatial position relationship calculation on the new feature map .
5. The image segmentation method according to claim 3, wherein said fusing the feature map and the spatial position information to obtain a feature map containing spatial position information comprises:

Context information is generated by fusing the feature map and spatial position information output by the Nth information extraction module through the encoder.
8. The image segmentation method according to claim 3, wherein for each of the information extraction modules, calculating the spatial position relationship between pixels in the feature map generated by the information extraction module to obtain spatial position information, include:

For each of the information extraction modules, convolve the feature map in a direction perpendicular to the feature map generated by the information extraction module through a convolutional neural network, and calculate the spatial position relationship between pixels in the feature map Get spatial location information.
5. The image segmentation method of claim 5, wherein for each of the information extraction modules, calculating the spatial position relationship between pixels in the feature map generated by the information extraction module to obtain spatial position information, include:

If the feature map generated by the information extraction module is a two-dimensional feature map, the formula for calculating the spatial position relationship between pixels in the feature map generated by the information extraction module is:

Where a is the spatial position information; δ is the activation function; l is the number of convolutional layers of the convolutional neural network; w (i, j) is the pixel with the coordinate (i, j) in the feature map The weight coefficient of the point; k is the number of channels of the feature map; b is the offset; ⊙ is the Hadaman product.
5. The image segmentation method of claim 5, wherein for each of the information extraction modules, calculating the spatial position relationship between pixels in the feature map generated by the information extraction module to obtain spatial position information, include:

If the feature map generated by the information extraction module is a three-dimensional feature map, the formula for calculating the spatial position relationship between pixels in the feature map generated by the information extraction module is:

Wherein, a is the spatial position information; δ is the activation function; l is the number of convolutional layers of the convolutional neural network; w (i, j, k) is the coordinates in the feature map as (i, j, k) the weight coefficient of the pixel; m is the number of channels of the feature map; b is the offset; ⊙ is the Hadaman product.
5. The image segmentation method according to claim 4, wherein said segmenting said image to be segmented according to said feature map containing spatial position information and outputting a target image comprises:

The decoder divides the image to be divided according to the context information, and outputs the target image.
An image segmentation device, characterized in that it comprises:

The image feature and location information extraction module is used to input the image to be segmented into the image segmentation model, perform feature extraction on the image to be segmented to generate a feature map, and calculate the spatial position relationship between pixels in the feature map to obtain the spatial position information;

The feature fusion module is used to fuse the feature map and the spatial location information to obtain a feature map containing the spatial location information;

The image segmentation module is configured to segment the image to be segmented according to the feature map containing the spatial position information, and output a target image.
A server, characterized in that it comprises a memory, a processor, and a computer program stored in the memory and capable of running on the processor, wherein the processor executes the computer program as described in the right The steps of the image segmentation method described in any one of 1 to 8 are required.