WO2021115061A1 - Image segmentation method, device and server - Google Patents

Image segmentation method, device and server Download PDF

Info

Publication number
WO2021115061A1
WO2021115061A1 (PCT/CN2020/129521, CN2020129521W)
Authority
WO
WIPO (PCT)
Prior art keywords
feature map
spatial position
image
information
feature
Prior art date
Application number
PCT/CN2020/129521
Other languages
English (en)
French (fr)
Inventor
廖祥云
孙寅紫
王琼
王平安
Original Assignee
中国科学院深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院 (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences)
Publication of WO2021115061A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/12 Edge-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Definitions

  • the present invention relates to the technical field of image segmentation, in particular to an image segmentation method, device and server.
  • Image segmentation is one of the research hotspots of computer graphics, and it has important applications in the fields of medical disease diagnosis and unmanned driving.
  • At present, there are many image segmentation algorithms, among which the U-shaped neural network (U-Net) algorithm is one of the most commonly used.
  • The U-shaped neural network algorithm is composed of an encoder and a decoder, which are connected by concatenation along the image channel dimension.
  • Specifically, the image to be segmented first passes through the encoder for image feature extraction.
  • the encoder is composed of multiple convolutional layers, and the convolutional layers are connected by a pooling layer, thereby reducing the dimension of the original image to a certain size.
  • the image output from the encoder is restored to the original image size by the decoder.
  • the decoder is composed of multiple convolutional layers, and the convolutional layers are connected by transposed convolutional layers. Finally, the output image is converted into a probability map using the softmax activation function.
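  • As a concrete illustration of this layout, below is a minimal sketch of a U-shaped encoder-decoder in Python with PyTorch. It is a sketch under assumptions: the channel sizes, the two encoder levels, and the 3x3 kernels are illustrative choices made here, not values from the patent; only the overall pattern (convolutional blocks joined by pooling, transposed-convolution upsampling, channel-dimension concatenation, and a softmax probability map) follows the description above.

```python
import torch
import torch.nn as nn

class MiniUNet(nn.Module):
    """Minimal U-shaped encoder-decoder: conv blocks joined by pooling,
    transposed-convolution upsampling, skip connection by channel-wise
    concatenation, softmax output. All sizes are illustrative assumptions."""
    def __init__(self, in_ch=3, n_classes=2):
        super().__init__()
        def block(ci, co):
            return nn.Sequential(nn.Conv2d(ci, co, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(co, co, 3, padding=1), nn.ReLU())
        self.enc1, self.enc2 = block(in_ch, 64), block(64, 128)
        self.pool = nn.MaxPool2d(2)                          # connects encoder levels
        self.up = nn.ConvTranspose2d(128, 64, 2, stride=2)   # restores spatial size
        self.dec1 = block(128, 64)                           # 128 = 64 skip + 64 upsampled
        self.head = nn.Conv2d(64, n_classes, 1)

    def forward(self, x):
        e1 = self.enc1(x)                                    # encoder level 1
        e2 = self.enc2(self.pool(e1))                        # encoder level 2, reduced dims
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))  # channel-dim concatenation
        return torch.softmax(self.head(d1), dim=1)           # probability map

# usage: probs = MiniUNet()(torch.randn(1, 3, 64, 64))  ->  shape (1, 2, 64, 64)
```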
  • Compared with traditional image segmentation algorithms such as threshold segmentation, region segmentation, and edge segmentation, the UNet algorithm has a simple network structure and high image segmentation accuracy.
  • However, the current UNet image segmentation algorithm tends to exaggerate the differences between objects of the same class (inter-class distinction) or the similarities between objects of different classes (intra-class consistency), and cannot segment the boundary between "similar features of different classes" and "differing features of the same class". As a result, the boundaries between different objects to be segmented cannot be accurately segmented, and the segmentation accuracy of the image is low.
  • the embodiments of the present invention provide an image segmentation method, device, and server to solve the problem that the boundary between different objects to be segmented cannot be accurately segmented.
  • A first aspect of the embodiments of the present invention provides an image segmentation method, including: inputting an image to be segmented into an image segmentation model, performing feature extraction on the image to be segmented to generate a feature map, and calculating the spatial position relationship between pixels in the feature map to obtain spatial position information; fusing the feature map and the spatial position information to obtain a feature map containing spatial position information; and segmenting the image to be segmented according to the feature map containing spatial position information, and outputting a target image.
  • In an implementation example, the inputting of the image to be segmented into an image segmentation model, the performing of feature extraction on the image to be segmented to generate a feature map, and the calculating of the spatial position relationship between pixels in the feature map to obtain spatial position information include: performing feature extraction on the image to be segmented through N information extraction modules connected in series in the encoder to generate a feature map, the N information extraction modules being set according to preset scale information, N ≥ 1; and, for each of the information extraction modules, calculating the spatial position relationship between pixels in the feature map generated by the information extraction module to obtain spatial position information.
  • In an implementation example, the inputting of the image to be segmented into an image segmentation model, the performing of feature extraction to generate a feature map, and the calculating of the spatial position relationship between pixels in the feature map to obtain spatial position information further include: when the image to be segmented is input into the first information extraction module, performing feature extraction on the image to be segmented to generate a feature map, and calculating the spatial position relationship between pixels in the feature map to obtain spatial position information; and fusing the feature map and the spatial position information to generate a new feature map, which is output to the next information extraction module so that it performs feature extraction and spatial position relationship calculation on the new feature map.
  • The fusing of the feature map and the spatial location information to obtain a feature map containing spatial location information includes:
  • Context information is generated by fusing the feature map and spatial position information output by the Nth information extraction module through the encoder.
  • In an implementation example, the calculating, for each of the information extraction modules, of the spatial position relationship between pixels in the feature map generated by the information extraction module to obtain spatial position information includes: for each of the information extraction modules, convolving the feature map through a convolutional neural network along the direction perpendicular to the feature map generated by the information extraction module, and calculating the spatial position relationship between pixels in the feature map to obtain spatial position information.
  • If the feature map generated by the information extraction module is a two-dimensional feature map, the formula for calculating the spatial position relationship between pixels in the feature map generated by the information extraction module (reproduced in the source only as an embedded image) is defined in terms of the following quantities:
  • where a is the spatial position information; δ is the activation function; l is the number of convolutional layers of the convolutional neural network; w(i,j) is the weight coefficient of the pixel with coordinates (i,j) in the feature map; k is the number of channels of the feature map; b is the offset; and ⊙ is the Hadamard product.
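  • The formula itself appears in the source only as an embedded image. As a rough reconstruction from the variable definitions above, and it is an assumption rather than the verbatim patent formula, with x introduced here to denote the values of the input feature map, the two-dimensional case plausibly reads:

```latex
a^{l}_{(i,j)} = \delta\Bigl(\sum_{n=1}^{k} w^{l}_{(i,j)} \odot x^{l}_{(i,j,n)} + b\Bigr)
```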
  • If the feature map generated by the information extraction module is a three-dimensional feature map, the formula for calculating the spatial position relationship between pixels in the feature map generated by the information extraction module (likewise only an embedded image in the source) is defined in terms of the following quantities:
  • where a is the spatial position information; δ is the activation function; l is the number of convolutional layers of the convolutional neural network; w(i,j,k) is the weight coefficient of the pixel with coordinates (i,j,k) in the feature map; m is the number of channels of the feature map; b is the offset; and ⊙ is the Hadamard product.
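  • As above, the three-dimensional formula appears in the source only as an embedded image. A hedged reconstruction from the variable definitions, again an assumption, with x denoting the values of the input feature map, is:

```latex
a^{l}_{(i,j,k)} = \delta\Bigl(\sum_{n=1}^{m} w^{l}_{(i,j,k)} \odot x^{l}_{(i,j,k,n)} + b\Bigr)
```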
  • In an implementation example, the segmenting of the image to be segmented according to the feature map containing the spatial position information and the outputting of the target image include: segmenting, by the decoder, the image to be segmented according to the context information, and outputting the target image.
  • a second aspect of the embodiments of the present invention provides an image segmentation device, including:
  • the image feature and location information extraction module is used to input the image to be segmented into the image segmentation model, perform feature extraction on the image to be segmented to generate a feature map, and calculate the spatial position relationship between pixels in the feature map to obtain the spatial position information;
  • the feature fusion module is used to fuse the feature map and the spatial location information to obtain a feature map containing the spatial location information;
  • the image segmentation module is configured to segment the image to be segmented according to the feature map containing the spatial position information, and output a target image.
  • A third aspect of the embodiments provides a server, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor implements the image segmentation method of the first aspect when executing the computer program.
  • In the image segmentation method, device, and server provided by the embodiments of the present invention, an image to be segmented is input into an image segmentation model, feature extraction is performed on the image to be segmented to generate a feature map, and the spatial position relationship between pixels in the feature map is calculated to obtain spatial position information; the feature map and the spatial position information are fused to obtain a feature map containing spatial position information; and the image to be segmented is segmented according to the feature map containing spatial position information to output the target image.
  • By calculating the spatial position relationship between pixels in the feature map to obtain the spatial position information, the relative positional relationship of pixels at different spatial positions in the feature map can be extracted.
  • The image is then segmented according to the feature map containing the spatial location information, so that the image segmentation model can derive the feature relationships between feature-map pixels from the spatial position relationships between pixels in the feature map, thereby segmenting the boundary between "similar features of different classes" and "differing features of the same class", achieving accurate segmentation of the boundaries between different objects to be segmented, and improving the accuracy of image segmentation.
  • FIG. 1 is a schematic flowchart of an image segmentation method provided by Embodiment 1 of the present invention
  • FIG. 2 is a schematic structural diagram of an image segmentation model provided by Embodiment 1 of the present invention;
  • FIG. 3 is a schematic flowchart of an image segmentation method provided by Embodiment 2 of the present invention;
  • FIG. 4 is a schematic diagram of convolution calculation of the feature map depth convolution layer of the second branch in the information extraction module provided by the second embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of an image segmentation device provided by Embodiment 3 of the present invention.
  • FIG. 6 is a schematic structural diagram of a server provided by Embodiment 4 of the present invention.
  • FIG. 1 is a schematic flowchart of an image segmentation method provided by Embodiment 1 of the present invention.
  • This embodiment can be applied to the application scenario of multi-target segmentation of an image.
  • The method can be executed by an image segmentation device, which can be a server, a smart terminal, a tablet, a PC, or the like; in the embodiments of this application, the image segmentation device is taken as the execution subject for description. The method specifically includes the following steps:
  • S110 Input the image to be segmented into an image segmentation model, perform feature extraction on the image to be segmented to generate a feature map, and calculate the spatial position relationship between pixels in the feature map to obtain spatial position information;
  • In existing image segmentation methods, image segmentation can be performed by constructing an image segmentation model containing a neural network through deep learning on target images.
  • However, the image features extracted after the convolution calculations of the multiple convolutional layers in a trained image segmentation model often exaggerate the differences between objects of the same class (inter-class distinction) or the similarities between objects of different classes (intra-class consistency).
  • As a result, when the image segmentation model segments the target image on the basis of the extracted features, it cannot segment the boundary between "similar features of different classes" and "differing features of the same class", causing over-segmentation and under-segmentation during the segmentation process, so that the boundaries between different target images to be segmented are difficult to segment precisely.
  • To solve this technical problem, the feature relationships between feature-map pixels can be extracted from different levels of the convolutional neural network of the image segmentation model, overcoming the inability to segment the boundary between "similar features of different classes" and "differing features of the same class".
  • the image to be segmented can be segmented through an image segmentation model trained based on multiple target images.
  • After the image to be segmented is input into the image segmentation model, feature extraction is performed on it to generate a feature map, and the spatial position relationship between pixels in the feature map is calculated to obtain the spatial position information, so as to obtain the relative positional relationship of pixels at different spatial positions in the feature map.
  • The image segmentation model may adopt a U-shaped neural network (Feature Depth UNet) framework, in which an encoder and a decoder form a symmetric structure; the encoder and the decoder are connected by concatenation along the image channel dimension.
  • Figure 2 shows the structure diagram of the image segmentation model.
  • The specific process of performing feature extraction on the image to be segmented to generate a feature map and calculating the spatial position relationship between pixels in the feature map to obtain spatial position information may be: performing feature extraction on the image to be segmented through the N information extraction modules connected in series in the encoder to generate a feature map, the N information extraction modules being set according to preset scale information, N ≥ 1; and, for each of the information extraction modules, calculating the spatial position relationship between pixels in the feature map generated by the information extraction module to obtain the spatial position information.
  • Specifically, the encoder includes N information extraction modules connected in series, which perform image feature extraction on the input image to be segmented to generate a feature map.
  • The N information extraction modules are set according to preset scale information, so that each information extraction module has different scale information.
  • After the image to be segmented is input into the image segmentation model, feature extraction through the N information extraction modules can generate a feature map containing multi-scale information; and after each information extraction module performs feature extraction on the image to be segmented, the spatial position relationship between pixels in the feature map generated by that module is calculated to obtain spatial position information.
  • By calculating the spatial position relationship between pixels in the feature map through N information extraction modules corresponding to different scale information, spatial position information containing multi-scale information can be obtained.
  • In an implementation example, the specific process of performing feature extraction on the image to be segmented through the N information extraction modules connected in series in the encoder to generate a feature map, and of calculating the spatial position relationship between pixels in the feature map generated by each information extraction module to obtain the spatial position information, may be: when the image to be segmented is input into the first information extraction module, feature extraction is performed on the image to be segmented to generate a feature map, and the spatial position relationship between pixels in the feature map is calculated to obtain spatial position information; the feature map and the spatial position information are then fused to generate a new feature map, which is output to the next information extraction module, so that the next information extraction module performs feature extraction and spatial position relationship calculation on the new feature map.
  • Specifically, each information extraction module can include two branches.
  • The first branch is used to perform feature extraction on the input image to generate a feature map, extracting the pixel-value information of the image; the second branch performs feature extraction on the input image in the same way as the first branch to generate a feature map, and then also calculates the spatial position relationship between pixels in that feature map to obtain the spatial position information, extracting the spatial position relationship information between pixels.
  • Optionally, the first branch, used to perform feature extraction on the input image to generate a feature map, can be composed of several convolutional layers; the second branch can be composed of the same several convolutional layers as the first branch plus one feature map depth convolution layer, so that after feature extraction on the input image generates a feature map, the spatial position relationship between pixels in the feature map is calculated to obtain the spatial position information.
  • The N information extraction modules in the encoder can be connected in series through max pooling layers.
  • By stacking the several convolutional layers of the second branch before the feature map depth convolution layer, the receptive field is enlarged, so that each pixel in the feature map depth convolution layer can map a different field of view on the original image.
  • Optionally, to reduce overfitting, a Batch Normalization layer can be added between the convolutional layers in each information extraction module, and L2 regularization can be added after the loss function. A sketch of this module layout follows below.
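  • To make the two-branch layout concrete, below is a PyTorch sketch of one possible reading of the information extraction module and the serial, max-pooled encoder. Everything specific in it is an assumption introduced here: the channel sizes, the two 3x3 convolutions per branch, the 1x1 convolution standing in for the feature map depth convolution layer, and the 1x1 fusion of the concatenated branches (the text fuses the branches through a pooling layer). It shows the data flow, not the patented operator itself; a NumPy sketch of the depth convolution follows the FIG. 4 discussion further below.

```python
import torch
import torch.nn as nn

class InfoExtractionModule(nn.Module):
    """Two branches over the same input: branch 1 extracts pixel-value
    features; branch 2 applies the same conv stack followed by a stand-in
    for the feature map depth convolution layer. Their outputs are fused
    into a new feature map. Layer choices are assumptions, not the patent's."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        def convs(ci, co):  # conv + BatchNorm, per the text's overfitting note
            return nn.Sequential(
                nn.Conv2d(ci, co, 3, padding=1), nn.BatchNorm2d(co), nn.ReLU(),
                nn.Conv2d(co, co, 3, padding=1), nn.BatchNorm2d(co), nn.ReLU())
        self.branch1 = convs(in_ch, out_ch)             # feature map (pixel values)
        self.branch2 = convs(in_ch, out_ch)             # same conv layers as branch 1
        self.depth_conv = nn.Conv2d(out_ch, out_ch, 1)  # placeholder depth conv
        self.fuse = nn.Conv2d(2 * out_ch, out_ch, 1)    # fuse the two branches

    def forward(self, x):
        feat = self.branch1(x)                          # feature map
        pos = self.depth_conv(self.branch2(x))          # spatial position information
        return self.fuse(torch.cat([feat, pos], dim=1)) # new feature map

class Encoder(nn.Module):
    """N information extraction modules connected in series by max pooling."""
    def __init__(self, chs=(3, 64, 128, 256)):          # N = 3 modules here
        super().__init__()
        self.stages = nn.ModuleList(
            InfoExtractionModule(ci, co) for ci, co in zip(chs, chs[1:]))
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        for i, stage in enumerate(self.stages):
            x = stage(x)
            if i < len(self.stages) - 1:                # pool between modules
                x = self.pool(x)
        return x   # basis for the context information sent to the decoder
```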
  • In detail, when the image to be segmented is input into the first information extraction module, feature extraction is performed on the image to be segmented through the several convolutional layers in the first branch of the first information extraction module to generate a feature map; at the same time, feature extraction is performed on the image to be segmented through the several convolutional layers in the second branch of the first information extraction module to generate a feature map, and the spatial position relationship between pixels in that feature map is calculated through the feature map depth convolution layer in the second branch to obtain the spatial position information.
  • The feature map output by the first branch of the first information extraction module and the spatial position information output by the second branch are fused through the pooling layer to generate a new feature map, and the newly generated feature map is input into the next information extraction module, so that the first branch of the next information extraction module performs feature extraction on the feature map input into the module, and the second branch of the next information extraction module performs feature extraction on the feature map input into the module and calculates the spatial position relationship.
  • This continues until the first branch of the Nth information extraction module performs feature extraction on the feature map input into the module to generate a feature map containing multi-scale information, and the second branch of the Nth information extraction module performs feature extraction on the feature map input into the module and calculates the spatial position relationship to obtain spatial position information containing multi-scale information.
  • After feature extraction is performed on the image to be segmented by the N information extraction modules connected in series in the encoder to generate a feature map, and the spatial position relationship between pixels in the feature map generated by each information extraction module is calculated to obtain the spatial position information, the Nth information extraction module outputs the finally obtained feature map and spatial position information.
  • The feature map and the spatial position information output by the Nth information extraction module are fused to obtain a feature map, completing the feature fusion.
  • the image segmentation model can be composed of an encoder and a decoder
  • the decoder needs to perform image segmentation on the image to be segmented according to the context information sent by the encoder.
  • Context information can be generated by fusing the feature map and spatial position information output by the Nth information extraction module through the pooling layer in the encoder.
  • the image segmentation model can include an encoder and a decoder, and the encoder and the decoder have a symmetrical structure.
  • The decoder is provided with transposed convolutional layers corresponding to the convolutional layer structure in the encoder. In addition, in order to let the neural network retain shallower information, the encoder and the decoder are connected by skip connections.
  • The decoder segments the image to be segmented according to the context information encoded by the encoder, and outputs the target image.
  • Since the context information is generated from the feature map containing spatial position information, the decoder can derive the feature relationships between feature-map pixels from the spatial position relationships between pixels in the context information, thereby segmenting the boundary between "similar features of different classes" and "differing features of the same class" and achieving precise segmentation of the boundaries between different objects to be segmented.
  • In the image segmentation method provided by this embodiment of the present invention, an image to be segmented is input into an image segmentation model, feature extraction is performed on the image to be segmented to generate a feature map, and the spatial position relationship between pixels in the feature map is calculated to obtain spatial position information; the feature map and the spatial position information are fused to obtain a feature map containing spatial position information; and the image to be segmented is segmented according to the feature map containing spatial position information to output a target image.
  • By calculating the spatial position relationship between pixels in the feature map to obtain the spatial position information, the relative positional relationship of pixels at different spatial positions in the feature map is extracted.
  • After the feature map containing image information is fused with the calculated spatial position information to obtain a feature map containing spatial position information, the image to be segmented is segmented according to that feature map, so that the image segmentation model can derive the feature relationships between feature-map pixels from the spatial position relationships between pixels in the feature map, thereby segmenting the boundary between "similar features of different classes" and "differing features of the same class", achieving accurate segmentation of the boundaries between different objects to be segmented, and improving the accuracy of image segmentation.
  • FIG. 3 is a schematic flowchart of the image segmentation method provided in the second embodiment of the present invention.
  • this embodiment also provides a process of calculating the spatial position relationship between pixels in the feature map to obtain spatial position information, thereby further improving the accuracy of image segmentation.
  • the method specifically includes:
  • The N information extraction modules are set according to preset scale information, so that each information extraction module has different scale information.
  • After the image to be segmented is input into the image segmentation model, feature extraction through the N information extraction modules can generate a feature map containing multi-scale information; and after each information extraction module performs feature extraction on the image to be segmented, the spatial position relationship between pixels in the feature map generated by that module is calculated to obtain spatial position information.
  • By calculating the spatial position relationship between pixels in the feature map through N information extraction modules corresponding to different scale information, spatial position information containing multi-scale information can be obtained.
  • Each information extraction module can include two branches.
  • The first branch is used to perform feature extraction on the input image to generate a feature map; the second branch performs feature extraction on the input image in the same way as the first branch and, after the feature map is generated, also calculates the spatial position relationship between pixels in the feature map to obtain the spatial position information.
  • Optionally, the first branch, used to perform feature extraction on the input image to generate a feature map, can be composed of several convolutional layers; the second branch can be composed of the same several convolutional layers as the first branch plus one feature map depth convolution layer, so that after feature extraction on the input image generates a feature map, the spatial position relationship between pixels in the feature map is calculated to obtain the spatial position information.
  • the N information extraction modules in the encoder can be connected in series through the maximum pooling layer.
  • When the image to be segmented is input into the first information extraction module, feature extraction is performed on the image to be segmented through the several convolutional layers in the first branch of the first information extraction module to generate a feature map; at the same time, feature extraction is performed on the image to be segmented through the several convolutional layers in the second branch of the first information extraction module to generate a feature map, and the feature map depth convolution layer in the second branch is used to calculate the spatial position relationship between pixels in the feature map to obtain spatial position information.
  • The feature map output by the first branch of the first information extraction module and the spatial position information output by the second branch are fused through the pooling layer to generate a new feature map, and the newly generated feature map is input into the next information extraction module, so that the first branch of the next information extraction module performs feature extraction on the feature map input into the module, and the second branch of the next information extraction module performs feature extraction on the feature map input into the module and calculates the spatial position relationship.
  • This continues until the first branch of the Nth information extraction module performs feature extraction on the feature map input into the module to generate a feature map containing multi-scale information, and the second branch of the Nth information extraction module performs feature extraction on the feature map input into the module and calculates the spatial position relationship to obtain spatial position information containing multi-scale information.
  • Specifically, since each information extraction module can include two branches, and the second branch can be composed of the same several convolutional layers as the first branch plus one feature map depth convolution layer, the module can perform feature extraction on the input image to generate a feature map and then calculate the spatial position relationship between pixels in that feature map to obtain the spatial position information.
  • Therefore, for each of the information extraction modules, convolving the feature map through a convolutional neural network along the direction perpendicular to the feature map generated by the information extraction module can be: using the feature map depth convolution layer of the second branch in the information extraction module to convolve the feature map along the direction perpendicular to the feature map obtained by the convolution calculations of the several convolutional layers of the second branch, and calculating the spatial position relationship between pixels in the feature map to obtain the spatial position information.
  • In an implementation example, if the feature map generated by the information extraction module, that is, the feature map obtained by the convolution calculations of the several convolutional layers of the second branch, is a two-dimensional feature map, the formula used by the feature map depth convolution layer of the second branch in the information extraction module to calculate the spatial position relationship between pixels in the feature map is:
  • where a is the spatial position information; δ is the activation function; l is the number of convolutional layers of the convolutional neural network; w(i,j) is the weight coefficient of the pixel with coordinates (i,j) in the feature map; k is the number of channels of the feature map; b is the offset; and ⊙ is the Hadamard product.
  • Specifically, the feature map depth convolution layer of the second branch in the information extraction module can use an H × W × C convolution kernel, where H × W represents the size of the convolution kernel and C represents the number of convolution kernels, whose value is equal to the number of pixels in the XY plane of the output feature map.
  • As shown in FIG. 4, which is a schematic diagram of the convolution calculation of the feature map depth convolution layer of the second branch in the information extraction module: to calculate the output of the depth convolution of a two-dimensional feature map, the H × W convolution kernel is first placed at the upper-left corner of the feature map and the first convolution operation is performed. The kernel is then slid along the Z axis, performing the same convolution operation in turn in the direction perpendicular to the feature map. Finally, the results of the C convolution kernels are arranged on the XY plane according to their positions in the feature map to obtain the spatial position information.
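  • A minimal NumPy sketch of this two-dimensional depth convolution follows, under one possible reading of the text: each of the C kernels is anchored at its own output position (i, j) and slides only along the Z (channel) axis. The shapes, the per-position anchoring, and the ReLU activation are assumptions introduced here, not details given by the patent.

```python
import numpy as np

def feature_map_depth_conv2d(x, w, b, delta=lambda t: np.maximum(t, 0.0)):
    """Sketch of the feature map depth convolution for a 2D feature map.

    Assumed shapes (the patent text leaves them implicit):
      x: input feature map, shape (H_in, W_in, K), K = number of channels
      w: H x W x C kernel bank, C == number of XY pixels of the output,
         i.e. one H x W kernel per output position
      b: scalar offset; delta: activation function (ReLU as a stand-in)
    Each kernel is applied to its own (i, j) window in every channel slice;
    the Hadamard products are summed over the window and over all K slices,
    shifted by b, then activated.
    """
    H_in, W_in, K = x.shape
    H, W, C = w.shape
    H_out, W_out = H_in - H + 1, W_in - W + 1
    assert C == H_out * W_out, "C must equal the number of XY output pixels"
    a = np.empty((H_out, W_out))
    for c in range(C):
        i, j = divmod(c, W_out)              # XY position this kernel fills in
        window = x[i:i + H, j:j + W, :]      # same XY window in every channel slice
        a[i, j] = (w[:, :, c][..., None] * window).sum() + b
    return delta(a)

# toy check: a 6x6 feature map with 4 channels and 3x3 kernels -> 4x4 output
x = np.random.rand(6, 6, 4)
w = np.random.rand(3, 3, 16)                 # 16 kernels, one per output pixel
print(feature_map_depth_conv2d(x, w, b=0.1).shape)   # (4, 4)
```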
  • In an implementation example, if the feature map generated by the information extraction module, that is, the feature map obtained by the convolution calculations of the several convolutional layers of the second branch, is a three-dimensional feature map, the formula used by the feature map depth convolution layer of the second branch in the information extraction module to calculate the spatial position relationship between pixels in the feature map is:
  • where a is the spatial position information; δ is the activation function; l is the number of convolutional layers of the convolutional neural network; w(i,j,k) is the weight coefficient of the pixel with coordinates (i,j,k) in the feature map; m is the number of channels of the feature map; b is the offset; and ⊙ is the Hadamard product.
  • Specifically, the feature map depth convolution layer of the second branch in the information extraction module can use an H × W × P × C convolution kernel, where H × W × P represents the size of the convolution kernel and C represents the number of convolution kernels, whose value is equal to the number of pixels in the XY plane of the output feature map.
  • To calculate the output of the depth convolution layer for a three-dimensional feature map, the H × W × P convolution kernel is first placed at the upper-left corner of the feature map and the first three-dimensional convolution operation is performed.
  • The kernel is then slid along the Z axis, performing the same three-dimensional convolution operation in turn in the direction perpendicular to the feature map.
  • Finally, the calculation results of the C convolution kernels are arranged on the XY plane according to their positions in the feature map to obtain the spatial position information.
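  • The three-dimensional case extends the same sketch with an H × W × P kernel over a volumetric feature map; the shapes and the per-position anchoring are again assumptions, not details given by the patent:

```python
import numpy as np

def feature_map_depth_conv3d(x, w, b, delta=lambda t: np.maximum(t, 0.0)):
    """3D analogue of the 2D sketch above: an H x W x P kernel applied to a
    volumetric feature map x of shape (H_in, W_in, P, M) with M channels.
    One kernel per XY output pixel, anchored at its own (i, j) window."""
    H_in, W_in, P_in, M = x.shape
    H, W, P, C = w.shape
    H_out, W_out = H_in - H + 1, W_in - W + 1
    assert C == H_out * W_out and P == P_in, "one kernel per XY output pixel"
    a = np.empty((H_out, W_out))
    for c in range(C):
        i, j = divmod(c, W_out)                 # XY position of this kernel
        window = x[i:i + H, j:j + W, :, :]      # H x W x P window, all M channels
        # Hadamard product with the kernel, summed over window and channels
        a[i, j] = (w[..., c][..., None] * window).sum() + b
    return delta(a)
```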
  • Context information is generated by fusing the feature map and spatial position information output by the Nth information extraction module through the pooling layer in the encoder.
  • The decoder segments the image to be segmented according to the context information encoded by the encoder, and outputs the target image. Since the context information is generated from the feature map containing spatial position information, the decoder can derive the feature relationships between feature-map pixels from the spatial position relationships between pixels in the context information, thereby segmenting the boundary between "similar features of different classes" and "differing features of the same class" and achieving precise segmentation of the boundaries between different objects to be segmented.
  • FIG. 5 shows the image segmentation device provided by Embodiment 3 of the present invention.
  • On the basis of Embodiment 1 or 2, an embodiment of the present invention further provides an image segmentation device 5, which includes:
  • the image feature and location information extraction module 501, used to input the image to be segmented into the image segmentation model, perform feature extraction on the image to be segmented to generate a feature map, and calculate the spatial position relationship between pixels in the feature map to obtain the spatial position information;
  • In an implementation example, when the image to be segmented is input into the image segmentation model, feature extraction is performed on the image to be segmented to generate a feature map, and the spatial position relationship between pixels in the feature map is calculated to obtain the spatial position information, the image feature and location information extraction module 501 includes:
  • the image feature extraction unit is configured to perform feature extraction on the image to be segmented to generate a feature map through N information extraction modules connected in series in the encoder; the N information extraction modules are set according to preset scale information, N ⁇ 1;
  • the location information extraction unit is configured to calculate the spatial location relationship between pixels in the feature map generated by the information extraction module for each of the information extraction modules to obtain spatial location information.
  • the position information extraction unit includes:
  • the position information extraction subunit, used, for each of the information extraction modules, to convolve the feature map through a convolutional neural network along the direction perpendicular to the feature map generated by the information extraction module, and to calculate the spatial position relationship between pixels in the feature map to obtain the spatial position information.
  • the feature fusion module 502 is configured to fuse the feature map and the spatial location information to obtain a feature map containing the spatial location information;
  • In an implementation example, when fusing the feature map and the spatial location information to obtain a feature map containing spatial location information, the feature fusion module 502 includes:
  • the feature fusion unit is used for fusing the feature map and the spatial position information output by the Nth information extraction module through the encoder to generate context information.
  • the image segmentation module 503 is configured to segment the image to be segmented according to the feature map containing spatial position information, and output a target image.
  • In an implementation example, when the image to be segmented is segmented according to the feature map containing spatial position information and the target image is output, the image segmentation module 503 includes:
  • the image segmentation unit is configured to segment the image to be segmented according to the context information by the decoder, and output a target image.
  • In the image segmentation device provided by this embodiment of the present invention, an image to be segmented is input into an image segmentation model, feature extraction is performed on the image to be segmented to generate a feature map, and the spatial position relationship between pixels in the feature map is calculated to obtain spatial position information; the feature map and the spatial position information are fused to obtain a feature map containing spatial position information; and the image to be segmented is segmented according to the feature map containing spatial position information to output a target image.
  • By calculating the spatial position relationship between pixels in the feature map to obtain the spatial position information, the relative positional relationship of pixels at different spatial positions in the feature map is extracted.
  • After the feature map containing image information is fused with the calculated spatial position information to obtain a feature map containing spatial position information, the image to be segmented is segmented according to that feature map, so that the image segmentation model can derive the feature relationships between feature-map pixels from the spatial position relationships between pixels in the feature map, thereby segmenting the boundary between "similar features of different classes" and "differing features of the same class", achieving accurate segmentation of the boundaries between different objects to be segmented, and improving the accuracy of image segmentation.
  • Fig. 6 is a schematic structural diagram of a server provided in the fourth embodiment of the present invention.
  • the server includes a processor 61, a memory 62, and a computer program 63 stored in the memory 62 and running on the processor 61, such as a program for an image segmentation method.
  • the processor 61 implements the steps in the embodiment of the image segmentation method when the computer program 63 is executed, for example, steps S110 to S130 shown in FIG. 1.
  • the computer program 63 may be divided into one or more modules, and the one or more modules are stored in the memory 62 and executed by the processor 61 to complete the application.
  • the one or more modules may be a series of computer program instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer program 63 in the server.
  • the computer program 63 can be divided into an image feature and location information extraction module, a feature fusion module, and an image segmentation module, and the specific functions of each module are as follows:
  • the image feature and location information extraction module is used to input the image to be segmented into the image segmentation model, perform feature extraction on the image to be segmented to generate a feature map, and calculate the spatial position relationship between pixels in the feature map to obtain the spatial position information;
  • the feature fusion module is used to fuse the feature map and the spatial location information to obtain a feature map containing the spatial location information
  • the image segmentation module is configured to segment the image to be segmented according to the feature map containing the spatial position information, and output a target image.
  • the server may include, but is not limited to, a processor 61, a memory 62, and a computer program 63 stored in the memory 62.
  • Those skilled in the art can understand that FIG. 6 is only an example of a server and does not constitute a limitation on the server; the server may include more or fewer components than shown in the figure, or combine certain components, or have different components; for example, the server may also include input and output devices, network access devices, buses, and so on.
  • The processor 61 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the memory 62 may be an internal storage unit of the server, such as a hard disk or memory of the server.
  • the memory 62 may also be an external storage device, such as a plug-in hard disk equipped on a server, a smart memory card (Smart Media Card, SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), and so on.
  • Further, the memory 62 may also include both an internal storage unit of the server and an external storage device.
  • the memory 62 is used to store the computer program and other programs and data required by the image segmentation method.
  • the memory 62 can also be used to temporarily store data that has been output or will be output.
  • the disclosed device/terminal device and method may be implemented in other ways.
  • the device/terminal device embodiments described above are only illustrative.
  • The division of the modules or units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be omitted or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • Based on this understanding, the present invention can implement all or part of the processes in the methods of the above embodiments by instructing relevant hardware through a computer program.
  • the computer program can be stored in a computer-readable storage medium. When the program is executed by the processor, it can implement the steps of the foregoing method embodiments.
  • the computer program includes computer program code, and the computer program code may be in the form of source code, object code, executable file, or some intermediate forms.
  • The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), electrical carrier signals, telecommunication signals, software distribution media, and so on.
  • It should be noted that the content contained in the computer-readable medium can be appropriately added or deleted according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.

Abstract

An image segmentation method, device, and server. The method includes: inputting an image to be segmented into an image segmentation model, performing feature extraction on the image to be segmented to generate a feature map, and calculating the spatial position relationship between pixels in the feature map to obtain spatial position information (S110); fusing the feature map and the spatial position information to obtain a feature map containing spatial position information (S120); and segmenting the image to be segmented according to the feature map containing spatial position information, and outputting a target image (S130). The method solves the problem that the boundaries between different objects to be segmented cannot be accurately segmented.

Description

Image segmentation method, device and server. Technical field
The present invention relates to the technical field of image segmentation, and in particular to an image segmentation method, device, and server.
Background
Image segmentation is one of the research hotspots of computer graphics, and it has important applications in fields such as medical disease diagnosis and autonomous driving. At present there are many image segmentation algorithms, among which the U-shaped neural network (U-Net) algorithm is one of the most commonly used. The U-shaped neural network algorithm is composed of an encoder and a decoder, which are connected by concatenation along the image channel dimension. Specifically, the image to be segmented first passes through the encoder for image feature extraction; the encoder is composed of multiple convolutional layers connected by pooling layers, which reduce the dimensions of the original image to a certain size. The image output by the encoder is then restored to the original image size by the decoder, which is composed of multiple convolutional layers connected by transposed convolutional layers. Finally, the output image is converted into a probability map using the softmax activation function. Compared with traditional image segmentation algorithms such as threshold segmentation, region segmentation, and edge segmentation, the UNet algorithm has a simple network structure and high segmentation accuracy. However, the current UNet image segmentation algorithm tends to exaggerate the differences between objects of the same class (inter-class distinction) or the similarities between objects of different classes (intra-class consistency), and cannot segment the boundary between "similar features of different classes" and "differing features of the same class". As a result, the boundaries between different objects to be segmented cannot be accurately segmented, and the segmentation accuracy of the image is low.
Summary of the invention
In view of this, the embodiments of the present invention provide an image segmentation method, device, and server to solve the problem that the boundaries between different objects to be segmented cannot be accurately segmented.
A first aspect of the embodiments of the present invention provides an image segmentation method, including:
inputting an image to be segmented into an image segmentation model, performing feature extraction on the image to be segmented to generate a feature map, and calculating the spatial position relationship between pixels in the feature map to obtain spatial position information;
fusing the feature map and the spatial position information to obtain a feature map containing spatial position information;
segmenting the image to be segmented according to the feature map containing spatial position information, and outputting a target image.
In an implementation example, the inputting of the image to be segmented into the image segmentation model, the performing of feature extraction on the image to be segmented to generate a feature map, and the calculating of the spatial position relationship between pixels in the feature map to obtain spatial position information include:
performing feature extraction on the image to be segmented through N information extraction modules connected in series in an encoder to generate a feature map, the N information extraction modules being set according to preset scale information, N ≥ 1;
for each of the information extraction modules, calculating the spatial position relationship between pixels in the feature map generated by the information extraction module to obtain spatial position information.
In an implementation example, the inputting of the image to be segmented into the image segmentation model, the performing of feature extraction on the image to be segmented to generate a feature map, and the calculating of the spatial position relationship between pixels in the feature map to obtain spatial position information include:
when the image to be segmented is input into the first information extraction module, performing feature extraction on the image to be segmented to generate a feature map, and calculating the spatial position relationship between pixels in the feature map to obtain spatial position information;
fusing the feature map and the spatial position information to generate a new feature map and outputting it to the next information extraction module, so that the next information extraction module performs feature extraction and spatial position relationship calculation on the new feature map.
In an implementation example, the fusing of the feature map and the spatial position information to obtain a feature map containing spatial position information includes:
fusing, through the encoder, the feature map and the spatial position information output by the Nth information extraction module to generate context information.
In an implementation example, the calculating, for each of the information extraction modules, of the spatial position relationship between pixels in the feature map generated by the information extraction module to obtain spatial position information includes:
for each of the information extraction modules, convolving the feature map through a convolutional neural network along the direction perpendicular to the feature map generated by the information extraction module, and calculating the spatial position relationship between pixels in the feature map to obtain spatial position information.
In an implementation example, the calculating, for each of the information extraction modules, of the spatial position relationship between pixels in the feature map generated by the information extraction module to obtain spatial position information includes:
if the feature map generated by the information extraction module is a two-dimensional feature map, the formula for calculating the spatial position relationship between pixels in the feature map generated by the information extraction module is:
Figure PCTCN2020129521-appb-000001
where a is the spatial position information; δ is the activation function; l is the number of convolutional layers of the convolutional neural network; w(i,j) is the weight coefficient of the pixel with coordinates (i,j) in the feature map; k is the number of channels of the feature map; b is the offset; and ⊙ is the Hadamard product.
In an implementation example, the calculating, for each of the information extraction modules, of the spatial position relationship between pixels in the feature map generated by the information extraction module to obtain spatial position information includes:
if the feature map generated by the information extraction module is a three-dimensional feature map, the formula for calculating the spatial position relationship between pixels in the feature map generated by the information extraction module is:
Figure PCTCN2020129521-appb-000002
where a is the spatial position information; δ is the activation function; l is the number of convolutional layers of the convolutional neural network; w(i,j,k) is the weight coefficient of the pixel with coordinates (i,j,k) in the feature map; m is the number of channels of the feature map; b is the offset; and ⊙ is the Hadamard product.
In an implementation example, the segmenting of the image to be segmented according to the feature map containing spatial position information and the outputting of a target image include:
segmenting, by a decoder, the image to be segmented according to the context information, and outputting the target image.
A second aspect of the embodiments of the present invention provides an image segmentation device, including:
an image feature and position information extraction module, configured to input an image to be segmented into an image segmentation model, perform feature extraction on the image to be segmented to generate a feature map, and calculate the spatial position relationship between pixels in the feature map to obtain spatial position information;
a feature fusion module, configured to fuse the feature map and the spatial position information to obtain a feature map containing spatial position information;
an image segmentation module, configured to segment the image to be segmented according to the feature map containing spatial position information and output a target image.
A third aspect of the embodiments provides a server, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor implements the image segmentation method of the first aspect when executing the computer program.
In the image segmentation method, device, and server provided by the embodiments of the present invention, an image to be segmented is input into an image segmentation model, feature extraction is performed on the image to be segmented to generate a feature map, and the spatial position relationship between pixels in the feature map is calculated to obtain spatial position information; the feature map and the spatial position information are fused to obtain a feature map containing spatial position information; and the image to be segmented is segmented according to the feature map containing spatial position information to output a target image. By calculating the spatial position relationship between pixels in the feature map to obtain the spatial position information, the relative positional relationship of pixels at different spatial positions in the feature map can be extracted. After the feature map containing image information is fused with the calculated spatial position information to obtain a feature map containing spatial position information, the image to be segmented is segmented according to that feature map, so that the image segmentation model can derive the feature relationships between feature-map pixels from the spatial position relationships between pixels in the feature map, thereby segmenting the boundary between "similar features of different classes" and "differing features of the same class", achieving precise segmentation of the boundaries between different objects to be segmented, and improving the segmentation accuracy of the image.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative work.
FIG. 1 is a schematic flowchart of the image segmentation method provided by Embodiment 1 of the present invention;
FIG. 2 is a schematic structural diagram of the image segmentation model provided by Embodiment 1 of the present invention;
FIG. 3 is a schematic flowchart of the image segmentation method provided by Embodiment 2 of the present invention;
FIG. 4 is a schematic diagram of the convolution calculation of the feature map depth convolution layer of the second branch in the information extraction module provided by Embodiment 2 of the present invention;
FIG. 5 is a schematic structural diagram of the image segmentation device provided by Embodiment 3 of the present invention;
FIG. 6 is a schematic structural diagram of the server provided by Embodiment 4 of the present invention.
Detailed description of the embodiments
In order to enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.
The term "comprising" and any variations thereof in the description and claims of the present invention and the above drawings are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to the process, method, product, or device. In addition, the terms "first", "second", and "third" are used to distinguish different objects, not to describe a specific order.
Embodiment 1
As shown in FIG. 1, which is a schematic flowchart of the image segmentation method provided by Embodiment 1 of the present invention, this embodiment is applicable to application scenarios of multi-target segmentation of an image. The method can be executed by an image segmentation device, which may be a server, a smart terminal, a tablet, a PC, or the like; in the embodiments of this application, the image segmentation device is taken as the execution subject for description. The method specifically includes the following steps:
S110. Input an image to be segmented into an image segmentation model, perform feature extraction on the image to be segmented to generate a feature map, and calculate the spatial position relationship between pixels in the feature map to obtain spatial position information.
In existing image segmentation methods, an image segmentation model containing a neural network can be constructed through deep learning on target images to perform image segmentation. However, the image features extracted after the convolution calculations of the multiple convolutional layers of a trained image segmentation model often exaggerate the differences between objects of the same class (inter-class distinction) or the similarities between objects of different classes (intra-class consistency). As a result, when the image segmentation model segments the target image on the basis of the extracted features, it cannot segment the boundary between "similar features of different classes" and "differing features of the same class", causing over-segmentation and under-segmentation during segmentation, so that the boundaries between different target images to be segmented are difficult to segment precisely. To solve this technical problem, the feature relationships between feature-map pixels can be extracted from different levels of the convolutional neural network of the image segmentation model, overcoming the inability to segment the boundary between "similar features of different classes" and "differing features of the same class".
Specifically, the image to be segmented can be segmented through an image segmentation model trained on multiple target images. After the image to be segmented is input into the image segmentation model, image feature extraction is performed on it to generate a feature map, and the spatial position relationship between pixels in the feature map is calculated to obtain spatial position information, so as to obtain the relative positional relationship of pixels at different spatial positions in the feature map.
In an implementation example, the image segmentation model may adopt a U-shaped neural network (Feature Depth UNet) framework, in which an encoder and a decoder form a symmetric structure and are connected by concatenation along the image channel dimension. FIG. 2 shows a schematic structural diagram of the image segmentation model. The specific process of performing feature extraction on the image to be segmented to generate a feature map and calculating the spatial position relationship between pixels in the feature map to obtain spatial position information may be: performing feature extraction on the image to be segmented through N information extraction modules connected in series in the encoder to generate a feature map, the N information extraction modules being set according to preset scale information, N ≥ 1; and, for each of the information extraction modules, calculating the spatial position relationship between pixels in the feature map generated by the information extraction module to obtain spatial position information.
Specifically, the encoder includes N information extraction modules connected in series, which perform image feature extraction on the input image to be segmented to generate a feature map. The N information extraction modules are set according to preset scale information, so that each information extraction module has different scale information. After the image to be segmented is input into the image segmentation model, feature extraction through the N information extraction modules can generate a feature map containing multi-scale information; and after each information extraction module performs feature extraction, the spatial position relationship between pixels in the feature map generated by that module is calculated to obtain spatial position information. By calculating the spatial position relationship between pixels in the feature map through N information extraction modules corresponding to different scale information, spatial position information containing multi-scale information can be obtained.
In an implementation example, the specific process of performing feature extraction on the image to be segmented through the N information extraction modules connected in series in the encoder to generate a feature map, and of calculating the spatial position relationship between pixels in the feature map generated by each information extraction module to obtain spatial position information, may be: when the image to be segmented is input into the first information extraction module, performing feature extraction on the image to be segmented to generate a feature map, and calculating the spatial position relationship between pixels in the feature map to obtain spatial position information; and fusing the feature map and the spatial position information to generate a new feature map and outputting it to the next information extraction module, so that the next information extraction module performs feature extraction and spatial position relationship calculation on the new feature map.
Specifically, each information extraction module may include two branches. The first branch is used to perform feature extraction on the input image to generate a feature map, extracting the pixel-value information of the image; the second branch performs feature extraction on the input image in the same way as the first branch to generate a feature map, and then also calculates the spatial position relationship between pixels in the feature map to obtain spatial position information, extracting the spatial position relationship information between pixels. Optionally, the first branch, used to perform feature extraction on the input image to generate a feature map, may be composed of several convolutional layers; the second branch may be composed of the same several convolutional layers as the first branch plus one feature map depth convolution layer, so that after feature extraction on the input image generates a feature map, the spatial position relationship between pixels in the feature map is calculated to obtain spatial position information. The N information extraction modules in the encoder may be connected in series through max pooling layers.
By stacking the several convolutional layers of the second branch before the feature map depth convolution layer, the receptive field is enlarged, so that when the feature map obtained by those convolutional layers is input into the feature map depth convolution layer for calculating the spatial position relationship between pixels, each pixel in the feature map depth convolution layer can map a different field of view on the original image. Optionally, to reduce overfitting, a Batch Normalization layer can be added between the convolutional layers in each information extraction module, and L2 regularization can be added after the loss function.
In detail, when the image to be segmented is input into the first information extraction module, feature extraction is performed on it through the several convolutional layers in the first branch of the first information extraction module to generate a feature map; at the same time, feature extraction is performed on it through the several convolutional layers in the second branch of the first information extraction module to generate a feature map, and the spatial position relationship between pixels in that feature map is calculated through the feature map depth convolution layer in the second branch to obtain spatial position information. The feature map output by the first branch of the first information extraction module and the spatial position information output by the second branch are fused through the pooling layer to generate a new feature map, which is input into the next information extraction module, so that the first branch of the next information extraction module performs feature extraction on the feature map input into the module, and the second branch of the next information extraction module performs feature extraction on the feature map input into the module and calculates the spatial position relationship. This continues until the first branch of the Nth information extraction module performs feature extraction on the feature map input into the module to generate a feature map containing multi-scale information, and the second branch of the Nth information extraction module performs feature extraction on the feature map input into the module and calculates the spatial position relationship to obtain spatial position information containing multi-scale information.
S120. Fuse the feature map and the spatial position information to obtain a feature map containing spatial position information.
After feature extraction is performed on the image to be segmented through the N information extraction modules connected in series in the encoder to generate a feature map, and the spatial position relationship between pixels in the feature map generated by each information extraction module is calculated to obtain spatial position information, the Nth information extraction module outputs the finally obtained feature map and spatial position information. The feature map and spatial position information output by the Nth information extraction module are fused to obtain a feature map, completing the feature fusion.
In an implementation example, since the image segmentation model may be composed of an encoder and a decoder, the decoder needs to perform image segmentation on the image to be segmented according to context information sent by the encoder. Context information can be generated by fusing the feature map and spatial position information output by the Nth information extraction module through the pooling layer in the encoder.
S130. Segment the image to be segmented according to the feature map containing spatial position information, and output a target image.
Since the image segmentation model may include an encoder and a decoder with a symmetric structure, the decoder is provided with transposed convolutional layers corresponding to the convolutional layer structure in the encoder. In addition, to allow the neural network to retain shallower information, the encoder and the decoder are connected by skip connections. In an implementation example, the decoder segments the image to be segmented according to the context information encoded by the encoder and outputs the target image. Since the context information is generated from the feature map containing spatial position information, the decoder can derive the feature relationships between feature-map pixels from the spatial position relationships between pixels in the context information, thereby segmenting the boundary between "similar features of different classes" and "differing features of the same class" and achieving precise segmentation of the boundaries between different objects to be segmented.
In the image segmentation method provided by the embodiments of the present invention, an image to be segmented is input into an image segmentation model, feature extraction is performed on the image to be segmented to generate a feature map, and the spatial position relationship between pixels in the feature map is calculated to obtain spatial position information; the feature map and the spatial position information are fused to obtain a feature map containing spatial position information; and the image to be segmented is segmented according to the feature map containing spatial position information to output a target image. By calculating the spatial position relationship between pixels in the feature map to obtain the spatial position information, the relative positional relationship of pixels at different spatial positions in the feature map is extracted. After the feature map containing image information is fused with the calculated spatial position information to obtain a feature map containing spatial position information, the image to be segmented is segmented according to that feature map, so that the image segmentation model can derive the feature relationships between feature-map pixels from the spatial position relationships between pixels in the feature map, thereby segmenting the boundary between "similar features of different classes" and "differing features of the same class", achieving precise segmentation of the boundaries between different objects to be segmented, and improving the segmentation accuracy of the image.
Embodiment 2
FIG. 3 is a schematic flowchart of the image segmentation method provided by Embodiment 2 of the present invention. On the basis of Embodiment 1, this embodiment further provides the process of calculating the spatial position relationship between pixels in the feature map to obtain spatial position information, thereby further improving the accuracy of image segmentation. The method specifically includes:
S210. Input an image to be segmented into an image segmentation model, and perform feature extraction on the image to be segmented through N information extraction modules connected in series in an encoder to generate a feature map; the N information extraction modules are set according to preset scale information, N ≥ 1.
The N information extraction modules are set according to preset scale information, so that each information extraction module has different scale information. After the image to be segmented is input into the image segmentation model, feature extraction through the N information extraction modules can generate a feature map containing multi-scale information; and after each information extraction module performs feature extraction, the spatial position relationship between pixels in the feature map generated by that module is calculated to obtain spatial position information. By calculating the spatial position relationship between pixels in the feature map through N information extraction modules corresponding to different scale information, spatial position information containing multi-scale information can be obtained.
Specifically, each information extraction module may include two branches. The first branch is used to perform feature extraction on the input image to generate a feature map; the second branch performs feature extraction on the input image in the same way as the first branch and, after the feature map is generated, also calculates the spatial position relationship between pixels in the feature map to obtain spatial position information. Optionally, the first branch may be composed of several convolutional layers; the second branch may be composed of the same several convolutional layers as the first branch plus one feature map depth convolution layer, so that after feature extraction on the input image generates a feature map, the spatial position relationship between pixels in the feature map is calculated to obtain spatial position information. The N information extraction modules in the encoder may be connected in series through max pooling layers.
When the image to be segmented is input into the first information extraction module, feature extraction is performed on it through the several convolutional layers in the first branch of the first information extraction module to generate a feature map; at the same time, feature extraction is performed on it through the several convolutional layers in the second branch of the first information extraction module to generate a feature map, and the spatial position relationship between pixels in that feature map is calculated through the feature map depth convolution layer in the second branch to obtain spatial position information. The feature map output by the first branch of the first information extraction module and the spatial position information output by the second branch are fused through the pooling layer to generate a new feature map, which is input into the next information extraction module, so that the first branch of the next information extraction module performs feature extraction on the feature map input into the module, and the second branch of the next information extraction module performs feature extraction on the feature map input into the module and calculates the spatial position relationship. This continues until the first branch of the Nth information extraction module performs feature extraction on the feature map input into the module to generate a feature map containing multi-scale information, and the second branch of the Nth information extraction module performs feature extraction on the feature map input into the module and calculates the spatial position relationship to obtain spatial position information containing multi-scale information.
S220. For each of the information extraction modules, convolve the feature map through a convolutional neural network along the direction perpendicular to the feature map generated by the information extraction module, and calculate the spatial position relationship between pixels in the feature map to obtain spatial position information.
Specifically, since each information extraction module may include two branches, and the second branch may be composed of the same several convolutional layers as the first branch plus one feature map depth convolution layer, the module can perform feature extraction on the input image to generate a feature map and then calculate the spatial position relationship between pixels in that feature map to obtain spatial position information. Therefore, for each of the information extraction modules, convolving the feature map through a convolutional neural network along the direction perpendicular to the feature map generated by the module may be: using the feature map depth convolution layer of the second branch in the information extraction module to convolve the feature map along the direction perpendicular to the feature map obtained by the convolution calculations of the several convolutional layers of the second branch, and calculating the spatial position relationship between pixels in the feature map to obtain spatial position information.
In an implementation example, if the feature map generated by the information extraction module, that is, the feature map obtained by the convolution calculations of the several convolutional layers of the second branch, is a two-dimensional feature map, the formula used by the feature map depth convolution layer of the second branch in the information extraction module to calculate the spatial position relationship between pixels in the feature map is:
Figure PCTCN2020129521-appb-000003
where a is the spatial position information; δ is the activation function; l is the number of convolutional layers of the convolutional neural network; w(i,j) is the weight coefficient of the pixel with coordinates (i,j) in the feature map; k is the number of channels of the feature map; b is the offset; and ⊙ is the Hadamard product.
Specifically, the feature map depth convolution layer of the second branch in the information extraction module can use an H × W × C convolution kernel, where H × W represents the size of the convolution kernel and C represents the number of convolution kernels, whose value is equal to the number of pixels in the XY plane of the output feature map. Optionally, as shown in FIG. 4, which is a schematic diagram of the convolution calculation of the feature map depth convolution layer of the second branch in the information extraction module: to calculate the output of the depth convolution of a two-dimensional feature map, the H × W convolution kernel is first placed at the upper-left corner of the feature map and the first convolution operation is performed. Then, the convolution kernel is slid along the Z axis, performing the same convolution operation in turn in the direction perpendicular to the feature map. Finally, the results of the convolution operations of the C convolution kernels are arranged on the XY plane according to their positions in the feature map to obtain the spatial position information.
In an implementation example, if the feature map generated by the information extraction module, that is, the feature map obtained by the convolution calculations of the several convolutional layers of the second branch, is a three-dimensional feature map, the formula used by the feature map depth convolution layer of the second branch in the information extraction module to calculate the spatial position relationship between pixels in the feature map is:
Figure PCTCN2020129521-appb-000004
where a is the spatial position information; δ is the activation function; l is the number of convolutional layers of the convolutional neural network; w(i,j,k) is the weight coefficient of the pixel with coordinates (i,j,k) in the feature map; m is the number of channels of the feature map; b is the offset; and ⊙ is the Hadamard product.
Specifically, the feature map depth convolution layer of the second branch in the information extraction module can use an H × W × P × C convolution kernel, where H × W × P represents the size of the convolution kernel and C represents the number of convolution kernels, whose value is equal to the number of pixels in the XY plane of the output feature map. To calculate the output of the depth convolution layer for a three-dimensional feature map, the H × W × P convolution kernel is first placed at the upper-left corner of the feature map and the first three-dimensional convolution operation is performed. Then, the convolution kernel is slid along the Z axis, performing the same three-dimensional convolution operation in turn in the direction perpendicular to the feature map. Finally, the calculation results of the C convolution kernels are arranged on the XY plane according to their positions in the feature map to obtain the spatial position information.
S230. Fuse the feature map and the spatial position information to obtain a feature map containing spatial position information.
Context information is generated by fusing the feature map and spatial position information output by the Nth information extraction module through the pooling layer in the encoder.
S240. Segment the image to be segmented according to the feature map containing spatial position information, and output a target image.
The decoder segments the image to be segmented according to the context information encoded by the encoder and outputs the target image. Since the context information is generated from the feature map containing spatial position information, the decoder can derive the feature relationships between feature-map pixels from the spatial position relationships between pixels in the context information, thereby segmenting the boundary between "similar features of different classes" and "differing features of the same class" and achieving precise segmentation of the boundaries between different objects to be segmented.
Embodiment 3
FIG. 5 shows the image segmentation device provided by Embodiment 3 of the present invention. On the basis of Embodiment 1 or 2, an embodiment of the present invention further provides an image segmentation device 5, which includes:
an image feature and position information extraction module 501, configured to input an image to be segmented into an image segmentation model, perform feature extraction on the image to be segmented to generate a feature map, and calculate the spatial position relationship between pixels in the feature map to obtain spatial position information;
In an implementation example, when the image to be segmented is input into the image segmentation model, feature extraction is performed on the image to be segmented to generate a feature map, and the spatial position relationship between pixels in the feature map is calculated to obtain spatial position information, the image feature and position information extraction module 501 includes:
an image feature extraction unit, configured to perform feature extraction on the image to be segmented through N information extraction modules connected in series in an encoder to generate a feature map, the N information extraction modules being set according to preset scale information, N ≥ 1;
a position information extraction unit, configured to calculate, for each of the information extraction modules, the spatial position relationship between pixels in the feature map generated by the information extraction module to obtain spatial position information.
In an implementation example, when calculating, for each of the information extraction modules, the spatial position relationship between pixels in the feature map generated by the information extraction module to obtain spatial position information, the position information extraction unit includes:
a position information extraction subunit, configured to convolve, for each of the information extraction modules, the feature map through a convolutional neural network along the direction perpendicular to the feature map generated by the information extraction module, and calculate the spatial position relationship between pixels in the feature map to obtain spatial position information.
a feature fusion module 502, configured to fuse the feature map and the spatial position information to obtain a feature map containing spatial position information;
In an implementation example, when fusing the feature map and the spatial position information to obtain a feature map containing spatial position information, the feature fusion module 502 includes:
a feature fusion unit, configured to fuse, through the encoder, the feature map and the spatial position information output by the Nth information extraction module to generate context information.
an image segmentation module 503, configured to segment the image to be segmented according to the feature map containing spatial position information and output a target image.
In an implementation example, when segmenting the image to be segmented according to the feature map containing spatial position information and outputting a target image, the image segmentation module 503 includes:
an image segmentation unit, configured to segment, by a decoder, the image to be segmented according to the context information and output a target image.
In the image segmentation device provided by the embodiments of the present invention, an image to be segmented is input into an image segmentation model, feature extraction is performed on the image to be segmented to generate a feature map, and the spatial position relationship between pixels in the feature map is calculated to obtain spatial position information; the feature map and the spatial position information are fused to obtain a feature map containing spatial position information; and the image to be segmented is segmented according to the feature map containing spatial position information to output a target image. By calculating the spatial position relationship between pixels in the feature map to obtain the spatial position information, the relative positional relationship of pixels at different spatial positions in the feature map is extracted. After the feature map containing image information is fused with the calculated spatial position information to obtain a feature map containing spatial position information, the image to be segmented is segmented according to that feature map, so that the image segmentation model can derive the feature relationships between feature-map pixels from the spatial position relationships between pixels in the feature map, thereby segmenting the boundary between "similar features of different classes" and "differing features of the same class", achieving precise segmentation of the boundaries between different objects to be segmented, and improving the segmentation accuracy of the image.
Embodiment 4
FIG. 6 is a schematic structural diagram of the server provided by Embodiment 4 of the present invention. The server includes a processor 61, a memory 62, and a computer program 63 stored in the memory 62 and runnable on the processor 61, such as a program for the image segmentation method. When executing the computer program 63, the processor 61 implements the steps in the above image segmentation method embodiments, for example, steps S110 to S130 shown in FIG. 1.
Exemplarily, the computer program 63 may be divided into one or more modules, and the one or more modules are stored in the memory 62 and executed by the processor 61 to complete this application. The one or more modules may be a series of computer program instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer program 63 in the server. For example, the computer program 63 may be divided into an image feature and position information extraction module, a feature fusion module, and an image segmentation module, whose specific functions are as follows:
an image feature and position information extraction module, configured to input an image to be segmented into an image segmentation model, perform feature extraction on the image to be segmented to generate a feature map, and calculate the spatial position relationship between pixels in the feature map to obtain spatial position information;
a feature fusion module, configured to fuse the feature map and the spatial position information to obtain a feature map containing spatial position information;
an image segmentation module, configured to segment the image to be segmented according to the feature map containing spatial position information and output a target image.
The server may include, but is not limited to, the processor 61, the memory 62, and the computer program 63 stored in the memory 62. Those skilled in the art can understand that FIG. 6 is only an example of the server and does not constitute a limitation on it; the server may include more or fewer components than shown in the figure, or combine certain components, or have different components; for example, the server may also include input and output devices, network access devices, buses, and so on.
The processor 61 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 62 may be an internal storage unit of the server, such as a hard disk or memory of the server. The memory 62 may also be an external storage device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the server. Further, the memory 62 may include both an internal storage unit of the server and an external storage device. The memory 62 is used to store the computer program and other programs and data required by the image segmentation method. The memory 62 may also be used to temporarily store data that has been output or is to be output.
Those skilled in the art can clearly understand that, for convenience and brevity of description, only the division of the above functional units and modules is used as an example; in practical applications, the above functions can be allocated to different functional units and modules as required, that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit; the above integrated units may be implemented in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing them from each other and are not used to limit the protection scope of this application. For the specific working processes of the units and modules in the above system, reference may be made to the corresponding processes in the foregoing method embodiments, which will not be repeated here.
In the above embodiments, the description of each embodiment has its own focus. For parts that are not detailed or described in a certain embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art may realize that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the specific application and design constraints of the technical solution. Professionals may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed device/terminal device and method may be implemented in other ways. For example, the device/terminal device embodiments described above are only illustrative; for example, the division of the modules or units is only a logical function division, and there may be other division methods in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated units may be implemented in the form of hardware or in the form of software functional units.
If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the present invention can implement all or part of the processes in the methods of the above embodiments by instructing relevant hardware through a computer program; the computer program may be stored in a computer-readable storage medium, and when executed by a processor, the computer program can implement the steps of each of the above method embodiments. The computer program includes computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), electrical carrier signals, telecommunication signals, software distribution media, and so on. It should be noted that the content contained in the computer-readable medium may be appropriately added or deleted according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.
The above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions recorded in the foregoing embodiments or replace some of the technical features with equivalents; and these modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be included within the protection scope of the present invention.

Claims (10)

  1. An image segmentation method, characterized by comprising:
    inputting an image to be segmented into an image segmentation model, performing feature extraction on the image to be segmented to generate a feature map, and calculating the spatial position relationship between pixels in the feature map to obtain spatial position information;
    fusing the feature map and the spatial position information to obtain a feature map containing spatial position information;
    segmenting the image to be segmented according to the feature map containing spatial position information, and outputting a target image.
  2. The image segmentation method according to claim 1, characterized in that the inputting an image to be segmented into an image segmentation model, performing feature extraction on the image to be segmented to generate a feature map, and calculating the spatial position relationship between pixels in the feature map to obtain spatial position information comprises:
    performing feature extraction on the image to be segmented through N information extraction modules connected in series in an encoder to generate a feature map, the N information extraction modules being set according to preset scale information, N ≥ 1;
    for each information extraction module, calculating the spatial position relationship between pixels in the feature map generated by the information extraction module to obtain spatial position information.
  3. The image segmentation method according to claim 2, characterized in that the inputting an image to be segmented into an image segmentation model, performing feature extraction on the image to be segmented to generate a feature map, and calculating the spatial position relationship between pixels in the feature map to obtain spatial position information comprises:
    when the image to be segmented is input into the first information extraction module, performing feature extraction on the image to be segmented to generate a feature map, and calculating the spatial position relationship between pixels in the feature map to obtain spatial position information;
    fusing the feature map and the spatial position information to generate a new feature map and outputting it to the next information extraction module, so that the next information extraction module performs feature extraction and spatial position relationship calculation on the new feature map.
  4. The image segmentation method according to claim 3, characterized in that the fusing the feature map and the spatial position information to obtain a feature map containing spatial position information comprises:
    fusing, through the encoder, the feature map output by the Nth information extraction module and the spatial position information to generate context information.
  5. The image segmentation method according to claim 3, characterized in that the calculating, for each information extraction module, the spatial position relationship between pixels in the feature map generated by the information extraction module to obtain spatial position information comprises:
    for each information extraction module, convolving the feature map through a convolutional neural network along the direction perpendicular to the feature map generated by the information extraction module, and calculating the spatial position relationship between pixels in the feature map to obtain spatial position information.
  6. The image segmentation method according to claim 5, characterized in that the calculating, for each information extraction module, the spatial position relationship between pixels in the feature map generated by the information extraction module to obtain spatial position information comprises:
    if the feature map generated by the information extraction module is a two-dimensional feature map, the formula for calculating the spatial position relationship between pixels in the feature map generated by the information extraction module is:
    [formula rendered as image PCTCN2020129521-appb-100001 in the published application; not reproduced in the text]
    wherein a is the spatial position information; δ is the activation function; l is the number of convolutional layers of the convolutional neural network; w_(i,j) is the weight coefficient of the pixel with coordinates (i,j) in the feature map; k is the number of channels of the feature map; b is the offset; and ⊙ denotes the Hadamard product.
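    The claimed formula is available only as the image referenced above. A speculative LaTeX reconstruction from the variable definitions in this claim is given below; the summation structure, the placement of l, and the symbol x for the feature-map pixel values are assumptions, not the authoritative claimed formula:

    % hypothetical reconstruction of image PCTCN2020129521-appb-100001 (2-D case)
    a_{(i,j)} = \delta\Big( \sum_{c=1}^{k} w_{(i,j)}^{\,l} \odot x_{(i,j)}^{\,c} + b \Big)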
  7. The image segmentation method according to claim 5, characterized in that the calculating, for each information extraction module, the spatial position relationship between pixels in the feature map generated by the information extraction module to obtain spatial position information comprises:
    if the feature map generated by the information extraction module is a three-dimensional feature map, the formula for calculating the spatial position relationship between pixels in the feature map generated by the information extraction module is:
    [formula rendered as image PCTCN2020129521-appb-100002 in the published application; not reproduced in the text]
    wherein a is the spatial position information; δ is the activation function; l is the number of convolutional layers of the convolutional neural network; w_(i,j,k) is the weight coefficient of the pixel with coordinates (i,j,k) in the feature map; m is the number of channels of the feature map; b is the offset; and ⊙ denotes the Hadamard product.
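    As with claim 6, only the image reference survives; a speculative reconstruction on the same assumptions, extended to three spatial coordinates (here k is a coordinate, and m replaces it as the channel count):

    % hypothetical reconstruction of image PCTCN2020129521-appb-100002 (3-D case)
    a_{(i,j,k)} = \delta\Big( \sum_{c=1}^{m} w_{(i,j,k)}^{\,l} \odot x_{(i,j,k)}^{\,c} + b \Big)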
  8. The image segmentation method according to claim 4, characterized in that the segmenting the image to be segmented according to the feature map containing spatial position information and outputting a target image comprises:
    segmenting, through a decoder, the image to be segmented according to the context information, and outputting the target image.
  9. An image segmentation apparatus, characterized by comprising:
    an image feature and position information extraction module, configured to input an image to be segmented into an image segmentation model, perform feature extraction on the image to be segmented to generate a feature map, and calculate the spatial position relationship between pixels in the feature map to obtain spatial position information;
    a feature fusion module, configured to fuse the feature map and the spatial position information to obtain a feature map containing spatial position information;
    an image segmentation module, configured to segment the image to be segmented according to the feature map containing spatial position information and output a target image.
  10. A server, characterized by comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein when the processor executes the computer program, the steps of the image segmentation method according to any one of claims 1 to 8 are implemented.
PCT/CN2020/129521 2019-12-11 2020-11-17 Image segmentation method, device and server WO2021115061A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911266841.6A CN111145196A (zh) 2019-12-11 2019-12-11 Image segmentation method, device and server
CN201911266841.6 2019-12-11

Publications (1)

Publication Number Publication Date
WO2021115061A1 (zh)

Family

ID=70518054

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/129521 WO2021115061A1 (zh) 2019-12-11 2020-11-17 Image segmentation method, device and server

Country Status (2)

Country Link
CN (1) CN111145196A (zh)
WO (1) WO2021115061A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111145196A (zh) * 2019-12-11 2020-05-12 中国科学院深圳先进技术研究院 Image segmentation method, device and server
CN112363844B (zh) * 2021-01-12 2021-04-09 之江实验室 Convolutional neural network vertical partitioning method oriented to image processing

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109087318A (zh) * 2018-07-26 2018-12-25 东北大学 MRI brain tumor image segmentation method based on an optimized U-net network model
CN109461157A (zh) * 2018-10-19 2019-03-12 苏州大学 Image semantic segmentation method based on multi-level feature fusion and a Gaussian conditional random field
CN110163875A (zh) * 2019-05-23 2019-08-23 南京信息工程大学 Semi-supervised video object segmentation method based on a modulation network and feature attention pyramid
US20190311223A1 (en) * 2017-03-13 2019-10-10 Beijing Sensetime Technology Development Co., Ltd. Image processing methods and apparatus, and electronic devices
CN110428428A (zh) * 2019-07-26 2019-11-08 长沙理工大学 Image semantic segmentation method, electronic device and readable storage medium
CN111145196A (zh) * 2019-12-11 2020-05-12 中国科学院深圳先进技术研究院 Image segmentation method, device and server

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110838124B (zh) * 2017-09-12 2021-06-18 深圳科亚医疗科技有限公司 Method, system and medium for segmenting images of sparsely distributed objects

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610754A (zh) * 2021-06-28 2021-11-05 浙江文谷科技有限公司 Transformer-based defect detection method and system
CN113610754B (zh) * 2021-06-28 2024-05-07 浙江文谷科技有限公司 Transformer-based defect detection method and system

Also Published As

Publication number Publication date
CN111145196A (zh) 2020-05-12

Similar Documents

Publication Publication Date Title
WO2021115061A1 (zh) Image segmentation method, device and server
WO2020119527A1 (zh) Human action recognition method and apparatus, terminal device, and storage medium
CN109960742B (zh) Local information search method and apparatus
US10726580B2 Method and device for calibration
EP4027299A2 Method and apparatus for generating depth map, and storage medium
US20210272306A1 Method for training image depth estimation model and method for processing image depth information
CN110832501A (zh) Systems and methods for pose-invariant face alignment
CN107730514B (zh) Scene segmentation network training method and apparatus, computing device, and storage medium
EP3803803A1 Lighting estimation
CN113409382A (zh) Method and apparatus for measuring vehicle damage regions
CN111640180B (zh) Three-dimensional reconstruction method, apparatus, and terminal device
WO2022134464A1 (zh) Method and apparatus for determining target detection and localization confidence, electronic device, and storage medium
CN111967467A (zh) Image object detection method and apparatus, electronic device, and computer-readable medium
CN112336342A (zh) Hand keypoint detection method and apparatus, and terminal device
CN114219855A (zh) Point cloud normal vector estimation method and apparatus, computer device, and storage medium
CN113592015A (zh) Method and apparatus for localization and for training a feature matching network
EP4075381A1 Image processing method and system
CN111161348A (zh) Monocular-camera-based object pose estimation method, apparatus, and device
WO2019109410A1 (zh) Fully convolutional network model training method for segmenting abnormal signal regions in MRI images
CN110288691B (zh) Image rendering method and apparatus, electronic device, and computer-readable storage medium
CN111368860B (zh) Relocation method and terminal device
Geng et al. SANet: A novel segmented attention mechanism and multi-level information fusion network for 6D object pose estimation
US20230048643A1 High-Precision Map Construction Method, Apparatus and Electronic Device
EP4086853A2 Method and apparatus for generating object model, electronic device and storage medium
CN114494782B (zh) Image processing method, model training method, related apparatus, and electronic device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20900307; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20900307; Country of ref document: EP; Kind code of ref document: A1)
32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 20.01.2023))