WO2021128896A1 - Neural network model for segmenting images and image segmentation method thereof - Google Patents

Neural network model for segmenting images and image segmentation method thereof Download PDF

Info

Publication number
WO2021128896A1
WO2021128896A1 (PCT/CN2020/110983)
Authority
WO
WIPO (PCT)
Prior art keywords
feature map
intelligent selection
image segmentation
module
output
Prior art date
Application number
PCT/CN2020/110983
Other languages
English (en)
French (fr)
Inventor
王立
郭振华
赵雅倩
Original Assignee
浪潮电子信息产业股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浪潮电子信息产业股份有限公司 filed Critical 浪潮电子信息产业股份有限公司
Priority to EP20907894.8A priority Critical patent/EP4053739A4/en
Publication of WO2021128896A1 publication Critical patent/WO2021128896A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/464Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Definitions

  • This application relates to the field of computer vision technology, and in particular to a neural network model for segmenting images, an image segmentation method, a device, and a readable storage medium.
  • Image segmentation technology is an important research direction in the field of computer vision and an important part of image semantic understanding.
  • Image segmentation refers to the process of dividing an image into several regions with similar properties.
  • In recent years, image segmentation technology has developed by leaps and bounds, and it has been widely applied in industries such as unmanned driving, augmented reality, and security monitoring.
  • The purpose of this application is to provide a neural network model for segmenting images, an image segmentation method, a device, and a readable storage medium, so as to solve the problem that current neural network models for image segmentation involve a large amount of calculation and a large number of model parameters and therefore cannot run on mobile terminals such as mobile phones.
  • The specific scheme is as follows:
  • This application provides a neural network model for segmenting images, including: a feature extraction module, an intelligent selection module, an up-sampling module, and a classification module, wherein the intelligent selection module includes a feature extraction unit, a normalization unit, an intelligent selection unit, and an output unit;
  • The feature extraction module is used to extract the original feature map of a target image;
  • The feature extraction unit is configured to perform feature extraction on the original feature map of the target image using multiple dilated (atrous) convolutions of different scales, and to splice the multiple extracted feature maps together to obtain the output feature map of the feature extraction unit;
  • The normalization unit is configured to normalize and non-linearly map the output feature map of the feature extraction unit to obtain the output feature map of the normalization unit;
  • The intelligent selection unit is used to determine the first weight value of each channel in the output feature map of the normalization unit, where the first weight value characterizes the channel's contribution to the accuracy of image segmentation; to filter out, from the output feature map of the normalization unit, a preset number of target channels with the largest first weight values; and to weight the target channels according to the first weight values to obtain the output feature map of the intelligent selection unit;
  • The output unit is configured to add the output feature map of the intelligent selection unit and the original feature map of the target image to obtain the target feature map;
  • The up-sampling module is used for up-sampling the target feature map;
  • The classification module is used to generate an image segmentation result of the target image according to the feature map obtained by up-sampling.
  • Preferably, the intelligent selection unit is specifically configured to: filter out, from the output feature map of the normalization unit, a preset number of target channels with the largest first weight values; linearly weight the first weight value of each target channel according to a pre-trained global weight to obtain the second weight value of the target channel; and weight the target channels according to the second weight values to obtain the output feature map of the intelligent selection unit.
  • Preferably, the intelligent selection module includes a down-sampling type intelligent selection module, whose intelligent selection unit is used to filter out, from the output feature map of the normalization unit, a first preset number of target channels with the largest first weight values, where the first preset number is determined according to the convolution stride and the number of channels of the original feature map of the target image.
  • Preferably, the intelligent selection module includes a feature-maintaining type intelligent selection module, whose intelligent selection unit is used to filter out, from the output feature map of the normalization unit, a second preset number of target channels with the largest first weight values, where the second preset number is equal to the number of channels of the original feature map of the target image.
  • Preferably, the intelligent selection unit is specifically configured to obtain the first weight value of each channel in the output feature map of the normalization unit by sequentially performing an average pooling operation, a fully connected operation, a nonlinear mapping operation, a fully connected operation, and a normalization operation on that output feature map.
  • Preferably, the model comprises a plurality of the intelligent selection modules connected in series.
  • Preferably, the feature extraction module includes a first feature extraction module and a second feature extraction module, and the intelligent selection module includes a first intelligent selection module and a second intelligent selection module;
  • The first feature extraction module is connected in series with a first preset number of first intelligent selection modules connected in series, and the outputs of the second feature extraction module and the second intelligent selection module are passed through a connection module to a second preset number of second intelligent selection modules connected in series.
  • This application also provides an image segmentation method, implemented based on the above-mentioned neural network model for segmenting images, including: acquiring a target image to be segmented; and inputting the target image into the neural network model to obtain an image segmentation result.
  • This application also provides an image segmentation device, including:
  • a memory, used to store a computer program;
  • a processor, used to execute the computer program to implement the image segmentation method described above.
  • Finally, this application provides a readable storage medium for storing a computer program which, when executed by a processor, implements the image segmentation method described above.
  • The neural network model for segmenting images provided by this application includes: a feature extraction module, an intelligent selection module, an up-sampling module, and a classification module.
  • The intelligent selection module includes a feature extraction unit, a normalization unit, an intelligent selection unit, and an output unit.
  • Because the feature extraction unit of the intelligent selection module adopts multi-scale dilated convolution, information of the input feature map at different scales is obtained, and feature information of multiple scales is spliced together, providing a large amount of rich feature information for subsequent feature screening. In addition, the intelligent selection unit of the intelligent selection module trains a weight value that characterizes each channel's contribution to the accuracy of image segmentation and, according to the weight values, discards the channels with smaller weights and keeps the channels with larger weights, thereby intelligently selecting among the input feature map channels. The intelligent selection module can thus fuse multi-scale information and extract rich features, and can also reduce the amount of parameters and calculation while ensuring segmentation accuracy by retaining only the effective channels.
  • Therefore, by adopting the above intelligent selection module, the neural network model of this application can quickly extract effective image features with a small amount of calculation and few model parameters; it is a lightweight neural network model for segmenting images suitable for mobile terminals.
  • This application also provides an image segmentation method, a device, and a readable storage medium, whose technical effects correspond to those of the aforementioned neural network model and are not repeated here.
  • FIG. 1 is a schematic structural diagram of Embodiment 1 of a neural network model for segmenting images provided by this application;
  • FIG. 2 is a schematic structural diagram of the intelligent selection module in Embodiment 1 of a neural network model for segmenting images provided by this application;
  • FIG. 3 is a first schematic structural diagram of Embodiment 2 of a neural network model for segmenting images provided by this application;
  • FIG. 4 is a second schematic structural diagram of Embodiment 2 of a neural network model for segmenting images provided by this application.
  • At present, processing software related to computer vision uses convolutional neural network models, but mainstream convolutional neural network models involve a large amount of calculation and the models themselves are large, often hundreds of megabytes, so they are difficult to deploy on embedded systems with limited hardware resources.
  • In addition, running a large convolutional neural network requires large memory bandwidth for reading and computing and many floating-point operations; and because a large network cannot be placed entirely in DRAM, it requires many DRAM accesses, which consumes a lot of power.
  • Moreover, APP software is downloaded and updated through application stores, and for such applications the size of the software itself is very important: if an application is too large and downloads too slowly, the user experience suffers, and many users may not download or update the software at all.
  • To address the above problems, this application provides a neural network model for segmenting images, an image segmentation method, a device, and a readable storage medium.
  • The intelligent selection module in the neural network model fuses multi-scale information to obtain rich features and extracts the effective channels, effectively reducing the amount of parameters and calculation while ensuring segmentation accuracy. Therefore, the neural network model of this application can quickly extract effective image features with a small amount of calculation and few model parameters; it is a lightweight neural network model suitable for mobile terminals.
  • As shown in FIG. 1, Embodiment 1 includes: a feature extraction module, an intelligent selection module, an up-sampling module, and a classification module, wherein the intelligent selection module includes a feature extraction unit, a normalization unit, an intelligent selection unit, and an output unit;
  • The feature extraction module is used to extract the original feature map of the target image;
  • The feature extraction unit is configured to perform feature extraction on the original feature map of the target image using multiple dilated convolutions of different scales, and to splice the multiple extracted feature maps together to obtain the output feature map of the feature extraction unit;
  • The normalization unit is configured to normalize and non-linearly map the output feature map of the feature extraction unit to obtain the output feature map of the normalization unit;
  • The intelligent selection unit is used to determine the first weight value of each channel in the output feature map of the normalization unit, where the first weight value characterizes the channel's contribution to the accuracy of image segmentation; to filter out, from the output feature map of the normalization unit, a preset number of target channels with the largest first weight values; and to weight the target channels according to the first weight values to obtain the output feature map of the intelligent selection unit;
  • The output unit is configured to add the output feature map of the intelligent selection unit and the original feature map of the target image to obtain the target feature map;
  • The up-sampling module is used for up-sampling the target feature map;
  • The classification module is used to generate an image segmentation result of the target image according to the feature map obtained by up-sampling.
  • Since the intelligent selection module (Wise Select Block) is one of the cores of this application, it is introduced in detail below.
  • As shown in FIG. 2, the structure in the dashed box labeled Wise Select Block is the detailed structure of the intelligent selection module; each row is called a layer, and there are 14 layers in total.
  • The structure in the dashed box labeled feature extract module is the feature extraction unit described above; the structure in the dashed box labeled Wise Select module is the intelligent selection unit described above; the structure between the feature extraction unit and the intelligent selection unit is the normalization unit described above; and the structure after the intelligent selection unit is the output unit described above.
  • The scale of the input feature map of the intelligent selection module is H × W × In_Channel, where H and W represent height and width, and In_Channel represents the number of channels of the input feature map. In practical applications the scale may also be batchsize × H × W × In_Channel, where batchsize is the number of input images; this embodiment takes batchsize = 1 as an example, and the batchsize can be set according to actual needs, for example to 2 or 4.
  • The first layer of the intelligent selection module is the ConvBnPrelu layer, an integration of Conv (convolutional layer) + Bn (BatchNorm layer) + Prelu (or Relu) layer.
  • The output dimension of the ConvBnPrelu layer is H × W × layer1_kernel.
  • The second layer is a network composed of n parallel convolutional layers, named layer2_kernel_1, ..., layer2_kernel_n.
  • This layer of convolution has two characteristics: first, the convolution kernels use dilated convolution; second, to reduce the number of parameters, group convolution is used. Using dilated convolutions of different scales improves the receptive range over feature maps of different scales and improves classification accuracy, while group convolution reduces the amount of parameters and calculation, ensuring that the model can run on mobile devices.
  • The third layer is the concat (concatenate) layer, i.e., a connection layer, which splices the output results of the previous convolutional layers together. The input feature maps have sizes H × W × layer2_kernel_1, H × W × layer2_kernel_2, ..., H × W × layer2_kernel_n, and the output feature map has size H × W × (layer2_kernel_1 + layer2_kernel_2 + ... + layer2_kernel_n); in other words, the width and height of the input feature maps remain unchanged and the channel counts are summed. For convenience of description, (layer2_kernel_1 + layer2_kernel_2 + ... + layer2_kernel_n) is denoted layer3 below.
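  • As a rough illustration, the following PyTorch sketch shows layers 1 to 3 as just described: a ConvBnPrelu stem followed by parallel dilated group convolutions whose outputs are concatenated along the channel dimension. The class name, branch widths, dilation rates, and group count are illustrative assumptions, not values fixed by this application.

```python
import torch
import torch.nn as nn

class FeatureExtractUnit(nn.Module):
    """Sketch of layers 1-3 of the Wise Select Block (hyper-parameters
    such as branch widths and dilation rates are assumed for illustration)."""
    def __init__(self, in_channels, layer1_kernel=32,
                 branch_channels=(16, 16, 16), dilations=(1, 2, 4), groups=4):
        super().__init__()
        # Layer 1: ConvBnPrelu = Conv + BatchNorm + PReLU
        self.conv_bn_prelu = nn.Sequential(
            nn.Conv2d(in_channels, layer1_kernel, 3, padding=1, bias=False),
            nn.BatchNorm2d(layer1_kernel),
            nn.PReLU(layer1_kernel))
        # Layer 2: n parallel dilated group convolutions (layer2_kernel_1..n);
        # padding = dilation keeps H and W unchanged for a 3x3 kernel
        self.branches = nn.ModuleList([
            nn.Conv2d(layer1_kernel, c, 3, padding=d, dilation=d,
                      groups=groups, bias=False)
            for c, d in zip(branch_channels, dilations)])

    def forward(self, x):
        x = self.conv_bn_prelu(x)
        # Layer 3: concat along the channel dimension -> H x W x layer3
        return torch.cat([b(x) for b in self.branches], dim=1)
```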
  • The fourth layer is the bn (BatchNorm) layer, which normalizes the input feature map.
  • The fifth layer is the prelu layer, which performs a non-linear mapping of the input feature map.
  • After the fourth and fifth layers, the scale of the feature map is still H × W × layer3; that is, the bn layer and the prelu layer do not change the scale of the feature map. Briefly, Prelu, i.e., the Parametric Rectified Linear Unit, is an activation function with a learnable parameter a: Prelu(x) = max(0, x) + a · min(0, x).
  • The sixth to thirteenth layers, taken together, are called the dropChannel layer in this embodiment. As described above, the output feature map of the fifth layer has size H × W × layer3, and layer3 is large, which is unfavorable for computation and storage on mobile devices. The purpose of the dropChannel layer is therefore to screen these channels: remove the channels of little value and retain the channels conducive to segmentation. Specifically, a weight is assigned to each channel, so that channels more conducive to segmentation obtain larger weight values and channels that do not help improve segmentation accuracy obtain smaller weight values.
  • The dropChannel layer is described in detail below.
  • The sixth layer mainly passes the feature map through; it connects the fifth layer with the seventh layer, and the fifth layer with the thirteenth layer.
  • The seventh layer is the avepool layer, identified as the adaptive_avepool layer in the figure.
  • Its purpose is to average-pool the data of each channel of the input feature map.
  • With input H × W × layer3, the output after average pooling is 1 × layer3: each H × W feature map is averaged by avepooling into a single number. The output of this layer is therefore layer3 × 1, a vector.
  • The eighth layer is a fully connected layer, which linearly maps the layer3 × 1 input. Because the fully connected function is implemented with nn.Linear in PyTorch, it is named nn.liner in this embodiment; of course other names, such as full connection layer, could be used.
  • The output of this layer is fc1. In practical applications, the dimension of the output feature can be changed by changing the number of neurons in the fully connected layer.
  • The ninth layer is the relu layer, which performs a nonlinear mapping on the input feature map; the output of this layer is also of dimension fc1.
  • The tenth layer is another fully connected layer, which linearly maps the input feature map again; its output is fc2 (a vector of dimension fc2 × 1), where fc2 = layer3, ensuring that the output length matches the number of channels of the feature map.
  • The eleventh layer is the sigmoid layer, which normalizes each element of the input to the range [0, 1]; the output of this layer is fc2.
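  • The following is a minimal sketch of layers 7 to 11 (adaptive average pooling, two fully connected layers with a ReLU between them, and a sigmoid that yields one first weight value per channel); the class name ChannelWeights and the hidden width fc1 are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ChannelWeights(nn.Module):
    """Sketch of layers 7-11 of the dropChannel pipeline: one first
    weight value in [0, 1] is produced for each of the layer3 channels."""
    def __init__(self, layer3, fc1):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # layer 7: H x W x layer3 -> layer3
        self.fc_a = nn.Linear(layer3, fc1)   # layer 8: fully connected
        self.relu = nn.ReLU()                # layer 9: nonlinear mapping
        self.fc_b = nn.Linear(fc1, layer3)   # layer 10: back to layer3 (= fc2)
        self.sigmoid = nn.Sigmoid()          # layer 11: normalize to [0, 1]

    def forward(self, x):                    # x: (N, layer3, H, W)
        v = self.pool(x).flatten(1)          # (N, layer3)
        return self.sigmoid(self.fc_b(self.relu(self.fc_a(v))))
```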
  • The twelfth layer is the wiseSelect layer.
  • The purpose of this layer is to sort and select among the input channels.
  • The specific implementation is: sort the output of the eleventh layer from largest to smallest, recording the value of each element of fc2 and its position, where the position indicates the channel it belongs to; the output dimension of this layer is fc2 × 2.
  • The first column represents the weight value, and the second column represents the position of that weight value.
  • Table 1: the sorted input feature weight values and their positions:
  • Weight index
  • 0.98 7
  • 0.85 4
  • Because fc2 = layer3, fc2 is very long (the reason is that channel concatenation is carried out in the third layer, making the number of channels of the feature map soar, which is unfavorable for deploying the model to mobile devices).
  • The twelfth layer outputs the sorted weight values and position information of every channel, to facilitate the subsequent screening of channels.
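  • A small sketch of the wiseSelect sorting step follows, using illustrative sigmoid outputs chosen so that the two largest entries reproduce Table 1:

```python
import torch

# Sketch of the wiseSelect layer (layer 12): sort the sigmoid outputs in
# descending order and record each weight with its original channel position,
# giving the fc2 x 2 table of Table 1 (the input values are illustrative).
fc2 = torch.tensor([0.31, 0.07, 0.52, 0.12, 0.85, 0.44, 0.26, 0.98])
weights, index = torch.sort(fc2, descending=True)
table = torch.stack([weights, index.float()], dim=1)  # col 1: weight, col 2: position
print(table[:2])  # tensor([[0.9800, 7.0000], [0.8500, 4.0000]])
```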
  • The thirteenth layer is the wise_Multiply layer, whose purpose is to intelligently output the feature maps conducive to segmentation.
  • This layer has two inputs: one is the output feature map of the fifth layer, of dimension H × W × layer3; the other is the output of the twelfth layer, of dimension fc2 × 2.
  • The principle of this layer is described in detail below and not expanded here.
  • The fourteenth layer is the add layer, which adds the output feature maps of the thirteenth and sixth layers; the resulting feature map is the output feature map of the intelligent selection module.
  • Note that the number of channels of the output feature map of the thirteenth layer can be set according to actual needs, and falls into two cases: the number of channels of the output feature map equals the number of channels of the input feature map of the intelligent selection module, or it does not.
  • More specifically, intelligent selection modules can be divided into two types according to network requirements: the down-sampling intelligent selection module and the feature-maintaining intelligent selection module.
  • The number of channels of the output feature map of the down-sampling intelligent selection module is greater than the number of channels of its input feature map;
  • The number of channels of the output feature map of the feature-maintaining intelligent selection module is equal to the number of channels of its input feature map.
  • According to the sorting result of fc2, the thirteenth layer of the intelligent selection module keeps the k channels with the larger weight values in fc2 and discards the rest, streamlining the channels.
  • The operating principle is: based on the results of neural network training, select the channels more conducive to segmentation and keep them, and discard the unhelpful channels, thereby reducing the amount of parameters and calculation.
  • The specific steps (sketched in code after this list) are as follows:
  • S11. Obtain the output feature map of the fifth layer: feature_in = H × W × layer3;
  • S12. Obtain the output of the twelfth layer: weight_in = fc2 × 2;
  • S13. Select the k largest input weights of weight_in, denoted weight_valuable;
  • S14. Obtain the index values corresponding to weight_valuable, denoted index_valuable;
  • S15. Select the corresponding channels of feature_in according to index_valuable, i.e., extract the k channels of feature_in at those indices; the extracted output feature map has dimension H × W × k and is denoted channel_valuable;
  • S16. Multiply weight_valuable by the corresponding channel_valuable.
  • To explain: the dimension of weight_valuable is k × 1 and the dimension of channel_valuable is H × W × k; each element of weight_valuable is multiplied with the (H × W) matrix of the corresponding channel of channel_valuable, the scalar weight multiplying every element of that H × W matrix. weight_valuable is output by the eleventh (sigmoid) layer, so its values lie in [0, 1]; this embodiment uses these valuable weight values to assign the corresponding weights to the fifth layer's output feature map feature_in.
  • As a preferred implementation, this embodiment introduces a new positive value G, named the global weight, which can linearly enlarge or shrink weight_valuable; this is the linear weighting operation described above that converts the first weight values into the second weight values. The global weight is obtained through neural network training; with it, the values of weight_valuable are no longer restricted to the range [0, 1]: they may be greater than 1 or less than 1, but are always greater than 0. Introducing the global weight allows better channels to be given larger weights, i.e., weights greater than 1: wise_channel = (G × weight_valuable) × channel_valuable.
  • In summary, the dimension of the output feature map of the thirteenth layer is H × W × k, named wise_channel in this embodiment. When k = In_Channel, the intelligent selection module is the feature-maintaining type described above; when k = stride × In_Channel with stride > 1, it is the down-sampling type, where stride is the convolution stride.
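  • The following sketch combines steps S11 to S16 with the global weight G in PyTorch. The function name wise_multiply is hypothetical, and a batch dimension is added to the H × W × layer3 shapes used in the text; the tensor values in the usage example are illustrative.

```python
import torch

def wise_multiply(feature_in, weights, k, G):
    """Sketch of the wiseSelect / wise_Multiply layers (S11-S16): keep the
    k channels with the largest first weight values and rescale them by the
    trained global weight G."""
    # S13/S14: the k largest weights and their channel indices (layer 12 sort)
    weight_valuable, index_valuable = torch.topk(weights, k, dim=1)
    # S15: extract the corresponding channels -> (N, k, H, W)
    channel_valuable = feature_in[torch.arange(feature_in.size(0))[:, None],
                                  index_valuable]
    # S16 with global weight: wise_channel = (G x weight_valuable) x channel_valuable
    return (G * weight_valuable)[..., None, None] * channel_valuable

# Usage sketch: N = 1, layer3 = 8 channels, keep k = 4
x = torch.randn(1, 8, 16, 16)
w = torch.rand(1, 8)                      # first weight values from the sigmoid layer
out = wise_multiply(x, w, k=4, G=torch.tensor(1.3))
print(out.shape)                          # torch.Size([1, 4, 16, 16])
```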
  • This embodiment provides a neural network model for segmenting images, including a feature extraction module, an intelligent selection module, an up-sampling module, and a classification module, where the intelligent selection module includes a feature extraction unit, a normalization unit, an intelligent selection unit, and an output unit.
  • The neural network model has at least the following advantages:
  • First, the intelligent selection module uses dilated convolutions of different scales to fuse feature map information at different scales, obtaining richer feature maps and providing a large amount of feature information for subsequent feature screening;
  • Second, multi-scale dilated convolution, i.e., convolving the same input feature map with multiple branches of different scales, increases the amount of parameters and calculation; to offset this, as a preferred implementation, this application uses group convolution in each branch, greatly reducing the number of parameters with very little loss of accuracy;
  • Third, to further reduce the number of parameters and feature map channels, the intelligent selection module intelligently selects among the input feature map channels: it trains a weight value indicating which input channels are more conducive to the subsequent segmentation, discards the channels with smaller weight values, and keeps the channels with larger weight values, reducing the amount of parameters and calculation while ensuring segmentation accuracy;
  • Fourth, the intelligent selection module also trains the global weight, which linearly re-weights each channel's weight value, raising or suppressing the expression range of each channel's weight so that it is not limited to the range [0, 1].
  • Ultimately, by adopting the intelligent selection module in a neural network model for segmenting images, segmentation accuracy can be maintained with very few model parameters, realizing a lightweight neural network model applicable to mobile devices for segmenting images.
  • Embodiment 2 of a neural network model for segmenting images provided by this application is described in detail below.
  • Embodiment 1 introduced the intelligent selection module in detail on the basis of the general structure of the image segmentation model; Embodiment 2 takes a specific network structure as an example and introduces a concrete image segmentation model that uses the intelligent selection module.
  • The feature extraction module in Embodiment 2 includes a first feature extraction module and a second feature extraction module, and the intelligent selection module includes a first intelligent selection module and a second intelligent selection module; the first feature extraction module is connected in series with a first preset number of first intelligent selection modules connected in series, and the outputs of the second feature extraction module and the second intelligent selection module are passed through a connection module to a second preset number of second intelligent selection modules connected in series.
  • The second intelligent selection module is then connected in sequence to the up-sampling module and the classification module.
  • It is worth mentioning that dividing the intelligent selection modules into first and second intelligent selection modules is only for convenience in describing the structure of the whole neural network model.
  • Every intelligent selection module has the same structure (as described for the intelligent selection module in Embodiment 1), but the parameters of each intelligent selection module may differ; the specific parameter settings are described in detail below.
  • The detailed structure of the neural network model of Embodiment 2 is shown in FIG. 4; each network structure is described below in the order of the circled numbers in FIG. 4.
  • The network input is shown as circle 0, with input dimension H × W × 3; as a specific implementation, this embodiment sets it to 768 × 768 × 3.
  • The input image first enters circle 1. ConvBnPRelu represents a network composed of a Conv (convolutional) layer, a BatchNormal (Bn) layer, and a Relu layer.
  • ConvBnPRelu(3, 32, 3, 2, 1) represents such a network whose convolutional layer has a 3-channel input, a 32-channel output, a 3 × 3 convolution kernel, stride set to 2, and padding set to 1.
  • After the first ConvBnPRelu layer, the width and height of the feature map are halved and the output of this layer is 384 × 384 × 32; after the second ConvBnPRelu layer, the scale of the feature map remains unchanged and the output is 384 × 384 × 32; after the third ConvBnPRelu layer, the scale again remains unchanged and the output is 384 × 384 × 32.
  • At circle 4, the results of circle 1 and circle 2 are concatenated (channel-wise); circle 2 is a pooling layer with stride 2, giving an image down-sampled by a factor of 2 from the original. The output dimension of circle 4 is 384 × 384 × 35. Circle 5 is a BN+Prelu layer, which does not change the feature map dimension; its output is 384 × 384 × 35.
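  • A minimal sketch of the ConvBnPRelu block follows, assuming the activation is PReLU as the block's name suggests (the text also mentions Relu as an alternative):

```python
import torch.nn as nn

def conv_bn_prelu(in_ch, out_ch, kernel, stride, padding):
    """Sketch of the ConvBnPRelu(in, out, kernel, stride, padding) block used
    throughout FIG. 4; e.g. ConvBnPRelu(3, 32, 3, 2, 1) halves width and height."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel, stride=stride,
                  padding=padding, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.PReLU(out_ch))

# 768x768x3 -> 384x384x32, matching the first layer at circle 1
stem = conv_bn_prelu(3, 32, 3, 2, 1)
```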
  • At circle 6, the intelligent selection module obtains more features through dilated convolution operations and multi-branch channel fusion, and reduces the amount of parameters and calculation by intelligently selecting the optimal channels; this module can extract more streamlined, superior features.
  • The stride of the first convolutional layer of the intelligent selection module shown at circle 6 is 2, so the width and height of the feature map are reduced to half.
  • The output feature map dimension of circle 6 is 192 × 192 × 64, where 64 represents the number of output channels.
  • At circle 7, the stride of the first convolutional layer in the intelligent selection module is 1 and the feature map scale is unchanged.
  • The output feature map dimension of circle 7 is 192 × 192 × 64.
  • The intelligent selection module at circle 8 is the same as circle 7; by connecting intelligent selection modules in series, effective segmentation features are extracted through the stacked layers.
  • The output feature map dimension of circle 8 is 192 × 192 × 64.
  • The intelligent selection module at circle 9 can increase the number of channels: by specifying the number of selected channels, the number of output channels can be changed. In this embodiment the number of output channels is set to 128, and the output feature map dimension after circle 9 is 192 × 192 × 128.
  • Circle 10 is a concat module with the same function as circle 4; it concatenates the outputs of circle 9 and circle 3 along the channel dimension. The output dimension of circle 10 is 192 × 192 × (128+3).
  • Circle 11 is a BN+Prelu module, after which the feature map dimension is 192 × 192 × (128+3).
  • Circle 12 represents a stack of several wise select block modules. As with the intelligent selection modules discussed above, the first one reduces the width and height of the feature map by setting the convolution stride, the last one changes the number of channels of the final output feature map, and the other intelligent selection modules do not change the dimension of the feature map passed into them. After circle 12, the output feature map dimension becomes 96 × 96 × 256.
  • Circle 13 is a BN+Prelu module, after which the feature map dimension is 96 × 96 × 256.
  • Circle 14 is a dropout2d module, i.e., a dropout layer; after this layer the feature map dimension is 96 × 96 × 256.
  • Circle 15 is the convolutional layer conv2d, with the convolution kernel scale set to 3 × 3 and the number of kernel channels set to class, where class represents the number of categories of the segmentation training database samples, i.e., the segmentation categories.
  • Circle 16 is a non-linear interpolation layer (interpolate); after this layer the input feature map is interpolated back to the size of the original input feature map, the interpolation being by a factor of 8.
  • The output feature map size of this layer is 768 × 768 × class, where each class channel represents one segmentation category.
  • Circle 17 represents the final output, with output size 768 × 768 × class.
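  • As a rough sketch of circles 14 to 16, the following assumes a backbone output of 96 × 96 × 256 and a hypothetical number of classes; the dropout rate and interpolation mode are illustrative assumptions, since the text only specifies the 8× factor.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegHead(nn.Module):
    """Sketch of circles 14-16: dropout, a 3x3 conv producing one channel per
    segmentation class, and 8x interpolation back to the input resolution."""
    def __init__(self, in_ch=256, num_classes=21):
        super().__init__()
        self.drop = nn.Dropout2d(0.1)                             # circle 14
        self.conv = nn.Conv2d(in_ch, num_classes, 3, padding=1)   # circle 15

    def forward(self, x):                                         # x: (N, 256, 96, 96)
        x = self.conv(self.drop(x))
        # circle 16: interpolate by a factor of 8 -> (N, class, 768, 768)
        return F.interpolate(x, scale_factor=8, mode='bilinear',
                             align_corners=False)

logits = SegHead()(torch.randn(1, 256, 96, 96))
pred = logits.argmax(dim=1)   # per-pixel class map, one label per segmentation category
```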
  • The image segmentation method described below and the neural network model for segmenting images described above may be referred to in correspondence with each other.
  • The image segmentation method is implemented based on the neural network model for segmenting images described above, and includes the following steps: acquiring a target image to be segmented; and inputting the target image into the neural network model to obtain an image segmentation result.
  • Specifically, the image segmentation method includes the following steps:
  • S51. Extract the original feature map of the target image;
  • S52. Use the feature extraction unit to perform feature extraction on the original feature map of the target image with multiple dilated convolutions of different scales, and splice the multiple extracted feature maps together to obtain the output feature map of the feature extraction unit;
  • S53. Use the normalization unit to normalize and non-linearly map the output feature map of the feature extraction unit to obtain the output feature map of the normalization unit;
  • S54. Use the intelligent selection unit to determine the first weight value of each channel in the output feature map of the normalization unit, where the first weight value characterizes the channel's contribution to the accuracy of image segmentation; filter out, from the output feature map of the normalization unit, a preset number of target channels with the largest first weight values; and weight the target channels according to the first weight values to obtain the output feature map of the intelligent selection unit;
  • S55. Use the output unit to add the output feature map of the intelligent selection unit and the original feature map of the target image to obtain the target feature map;
  • S56. Up-sample the target feature map;
  • S57. Generate the image segmentation result of the target image according to the output feature map obtained by up-sampling.
  • The image segmentation method of this embodiment is implemented based on the neural network model for segmenting images described above; the specific implementation of the method can therefore be found in the embodiments of that model above, and its technical effects correspond to those of the above neural network model, so they are not repeated here.
  • In addition, this application also provides an image segmentation device, including:
  • a memory, used to store a computer program;
  • a processor, used to execute the computer program to implement the image segmentation method described above.
  • Finally, this application provides a readable storage medium for storing a computer program which, when executed by a processor, implements the image segmentation method described above.
  • The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two.
  • A software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

A neural network model for segmenting images, together with an image segmentation method, a device, and a readable storage medium, the model including an intelligent selection module which in turn includes a feature extraction unit and an intelligent selection unit. Because the feature extraction unit adopts multi-scale dilated convolution, information of the input feature map at different scales is obtained, providing a large amount of rich feature information for subsequent feature screening; the intelligent selection unit trains a weight value and intelligently screens the input feature map channels according to the magnitude of the weight values, so the intelligent selection module can reduce the amount of parameters and calculation while ensuring segmentation accuracy. Therefore, by adopting the above intelligent selection module, the neural network model of this application can quickly extract effective features of an image with a small amount of calculation and few model parameters, and is suitable for mobile terminals.

Description

Neural network model for segmenting images and image segmentation method thereof
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on December 22, 2019, with application number 201911332559.3, entitled "Neural network model for segmenting images and image segmentation method thereof", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of computer vision technology, and in particular to a neural network model for segmenting images, an image segmentation method, a device, and a readable storage medium.
Background
At present, solving problems in the field of computer vision such as image classification, image segmentation, and object detection through deep learning has become popular and has achieved great success.
Among them, image segmentation technology is an important research direction in the field of computer vision and an important part of image semantic understanding. Image segmentation refers to the process of dividing an image into several regions with similar properties. In recent years, image segmentation technology has developed by leaps and bounds, and related technologies such as scene object segmentation, human foreground-background segmentation, face and human-body parsing, and three-dimensional reconstruction have been widely applied in industries such as unmanned driving, augmented reality, and security monitoring.
In recent years, many excellent convolutional neural network models have emerged, but the amount of calculation and the size of these models are often very large; they can only be used on the server side with high-performance GPU acceleration, and cannot run on mobile devices such as smartphones.
Considering that mobile devices likewise have a huge demand for deep learning, how to design a lightweight convolutional neural network model that can be applied to mobile devices to realize image segmentation is a problem to be urgently solved by those skilled in the art.
Summary of the invention
The purpose of this application is to provide a neural network model for segmenting images, an image segmentation method, a device, and a readable storage medium, so as to solve the problem that current neural network models for image segmentation involve a large amount of calculation and a large number of model parameters and cannot run on mobile device terminals such as mobile phones. The specific scheme is as follows:
In a first aspect, this application provides a neural network model for segmenting images, including: a feature extraction module, an intelligent selection module, an up-sampling module, and a classification module, wherein the intelligent selection module includes a feature extraction unit, a normalization unit, an intelligent selection unit, and an output unit;
the feature extraction module is used to extract the original feature map of a target image;
the feature extraction unit is used to perform feature extraction on the original feature map of the target image using a plurality of dilated convolutions of different scales, and to splice the plurality of extracted feature maps together to obtain the output feature map of the feature extraction unit;
the normalization unit is used to normalize and non-linearly map the output feature map of the feature extraction unit to obtain the output feature map of the normalization unit;
the intelligent selection unit is used to determine a first weight value of each channel in the output feature map of the normalization unit, the first weight value characterizing the channel's contribution to the accuracy of image segmentation; to filter out, from the output feature map of the normalization unit, a preset number of target channels with the largest first weight values; and to perform a weighting operation on the target channels according to the first weight values to obtain the output feature map of the intelligent selection unit;
the output unit is used to add the output feature map of the intelligent selection unit and the original feature map of the target image to obtain a target feature map;
the up-sampling module is used to up-sample the target feature map;
the classification module is used to generate an image segmentation result of the target image according to the feature map obtained by up-sampling.
Preferably, the intelligent selection unit is specifically used to: filter out, from the output feature map of the normalization unit, a preset number of target channels with the largest first weight values; linearly weight the first weight value of each target channel according to a pre-trained global weight to obtain a second weight value of the target channel; and perform a weighting operation on the target channels according to the second weight values to obtain the output feature map of the intelligent selection unit.
Preferably, the intelligent selection module includes a down-sampling type intelligent selection module, whose intelligent selection unit is used to filter out, from the output feature map of the normalization unit, a first preset number of target channels with the largest first weight values, where the first preset number is determined according to the convolution stride and the number of channels of the original feature map of the target image.
Preferably, the intelligent selection module includes a feature-maintaining type intelligent selection module, whose intelligent selection unit is used to filter out, from the output feature map of the normalization unit, a second preset number of target channels with the largest first weight values, where the second preset number is equal to the number of channels of the original feature map of the target image.
Preferably, the intelligent selection unit is specifically used to obtain the first weight value of each channel in the output feature map of the normalization unit by sequentially performing an average pooling operation, a fully connected operation, a nonlinear mapping operation, a fully connected operation, and a normalization operation on the output feature map of the normalization unit.
Preferably, the model includes a plurality of the intelligent selection modules connected in series.
Preferably, the feature extraction module includes a first feature extraction module and a second feature extraction module, and the intelligent selection module includes a first intelligent selection module and a second intelligent selection module;
the first feature extraction module is connected in series with a first preset number of first intelligent selection modules connected in series, and the outputs of the second feature extraction module and the second intelligent selection module are passed through a connection module to a second preset number of second intelligent selection modules connected in series.
In a second aspect, this application provides an image segmentation method, implemented based on the neural network model for segmenting images described above, including:
acquiring a target image to be segmented;
inputting the target image into the neural network model to obtain an image segmentation result.
In a third aspect, this application provides an image segmentation device, including:
a memory, used to store a computer program;
a processor, used to execute the computer program to implement the image segmentation method described above.
In a fourth aspect, this application provides a readable storage medium for storing a computer program which, when executed by a processor, implements the image segmentation method described above.
The neural network model for segmenting images provided by this application includes: a feature extraction module, an intelligent selection module, an up-sampling module, and a classification module, the intelligent selection module including a feature extraction unit, a normalization unit, an intelligent selection unit, and an output unit. Because the feature extraction unit of the intelligent selection module adopts multi-scale dilated convolution, information of the input feature map at different scales is obtained and feature information of multiple scales is spliced together, providing a large amount of rich feature information for subsequent feature screening; in addition, the intelligent selection unit of the intelligent selection module trains a weight value characterizing each channel's contribution to the accuracy of image segmentation and, according to the magnitude of the weight values, discards the channels with smaller weights and keeps the channels with larger weights, thereby intelligently selecting among the input feature map channels. It can be seen that the intelligent selection module can fuse multi-scale information and extract rich features, and can also reduce the amount of parameters and calculation while ensuring segmentation accuracy by extracting the effective channels. Therefore, by adopting the above intelligent selection module, the neural network model of this application can quickly extract effective features of an image with a small amount of calculation and few model parameters, and is a lightweight neural network model for segmenting images suitable for mobile terminals.
This application also provides an image segmentation method, a device, and a readable storage medium, whose technical effects correspond to those of the above neural network model and are not repeated here.
Description of the drawings
To explain the technical solutions of the embodiments of this application or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic structural diagram of Embodiment 1 of a neural network model for segmenting images provided by this application;
FIG. 2 is a schematic structural diagram of the intelligent selection module in Embodiment 1 of a neural network model for segmenting images provided by this application;
FIG. 3 is a first schematic structural diagram of Embodiment 2 of a neural network model for segmenting images provided by this application;
FIG. 4 is a second schematic structural diagram of Embodiment 2 of a neural network model for segmenting images provided by this application.
Detailed description
To enable those skilled in the art to better understand the solution of this application, this application is further described in detail below with reference to the drawings and specific embodiments. Obviously, the described embodiments are only some of the embodiments of this application, not all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of this application.
At present, processing software related to the field of computer vision uses convolutional neural network models, but the amount of calculation and the size of mainstream convolutional neural network models are very large, often hundreds of megabytes, so they are difficult to deploy on embedded systems with limited hardware resources. In addition, running a large convolutional neural network requires large memory bandwidth for reading and computing and many floating-point operations; and because a large network cannot be placed entirely in DRAM, it requires many DRAM accesses, which consumes a lot of power. Meanwhile, APP software is downloaded and updated through application stores, and for such applications the size of the software itself is very important: if an application is too large and downloads too slowly, the user experience suffers, and many users may not download or update the software.
It can be seen that the limitation of computing resources, the problem of computing energy consumption, and the fact that the larger the terminal software the worse the user experience, constitute great obstacles to the large-scale application of convolutional neural network models on mobile devices such as smartphones.
In view of the above problems, this application provides a neural network model for segmenting images, an image segmentation method, a device, and a readable storage medium. The intelligent selection module in the neural network model fuses multi-scale information to obtain rich features and extracts the effective channels, effectively reducing the amount of parameters and calculation while ensuring segmentation accuracy. Therefore, the neural network model of this application can quickly extract effective features of an image with a small amount of calculation and few model parameters, and is a lightweight neural network model suitable for mobile terminals.
Embodiment 1 of a neural network model for segmenting images provided by this application is introduced below. Referring to FIG. 1, Embodiment 1 includes: a feature extraction module, an intelligent selection module, an up-sampling module, and a classification module, wherein the intelligent selection module includes a feature extraction unit, a normalization unit, an intelligent selection unit, and an output unit;
the feature extraction module is used to extract the original feature map of the target image;
the feature extraction unit is used to perform feature extraction on the original feature map of the target image using a plurality of dilated convolutions of different scales, and to splice the plurality of extracted feature maps together to obtain the output feature map of the feature extraction unit;
the normalization unit is used to normalize and non-linearly map the output feature map of the feature extraction unit to obtain the output feature map of the normalization unit;
the intelligent selection unit is used to determine the first weight value of each channel in the output feature map of the normalization unit, the first weight value characterizing the channel's contribution to the accuracy of image segmentation; to filter out, from the output feature map of the normalization unit, a preset number of target channels with the largest first weight values; and to perform a weighting operation on the target channels according to the first weight values to obtain the output feature map of the intelligent selection unit;
the output unit is used to add the output feature map of the intelligent selection unit and the original feature map of the target image to obtain the target feature map;
the up-sampling module is used to up-sample the target feature map;
the classification module is used to generate the image segmentation result of the target image according to the feature map obtained by up-sampling.
Since the intelligent selection module (Wise Select Block) is one of the cores of this application, the intelligent selection module is introduced in detail below.
As shown in FIG. 2, the structure in the dashed box labeled Wise Select Block is the detailed structure of the intelligent selection module; each row is called a layer, and there are 14 layers in total. The structure in the dashed box labeled feature extract module is the above feature extraction unit, the structure in the dashed box labeled Wise Select module is the above intelligent selection unit, the structure between the feature extraction unit and the intelligent selection unit is the above normalization unit, and the structure after the intelligent selection unit is the above output unit.
Referring to FIG. 2, the scale of the input feature map of the intelligent selection module is H×W×In_Channel, where H and W represent height and width, and In_Channel represents the number of channels of the input feature map. In practical applications, the scale of the input feature map may also be batchsize×H×W×In_Channel, where batchsize represents the number of input images; this embodiment takes batchsize = 1 as an example. It can be understood that, in practical applications, the batchsize can be set according to actual needs, for example to 2 or 4.
The first layer of the intelligent selection module is the ConvBnPrelu layer, which represents the integration of Conv (convolutional layer) + Bn (BatchNorm layer) + Prelu (or Relu) layer; the output dimension of the ConvBnPrelu layer is H×W×layer1_kernel.
The second layer is a network assembled from multiple convolutional layers, containing n convolutional layers, which this embodiment names layer2_kernel_1, ..., layer2_kernel_n. This layer of convolution has two characteristics: first, the convolution kernels use dilated convolution; second, to reduce the number of parameters, this embodiment uses group convolution. Using dilated convolutions of different scales improves the perception of feature maps of different scales and improves classification accuracy, while group convolution reduces the amount of parameters and calculation, ensuring that the model can run on mobile devices.
The third layer is the concat (concatenate) layer, i.e., a connection layer, used to splice together the output results of the convolutional layers of the previous layer. Thus the input feature map sizes are H×W×layer2_kernel_1, H×W×layer2_kernel_2, ..., H×W×layer2_kernel_n, and the output feature map size is H×W×(layer2_kernel_1+layer2_kernel_2+...+layer2_kernel_n); that is, the width and height of the input feature maps remain unchanged and the channel counts are summed. For convenience of description, this embodiment names (layer2_kernel_1+layer2_kernel_2+...+layer2_kernel_n) as layer3.
The fourth layer is the bn (BatchNorm) layer, used to normalize the input feature map.
The fifth layer is the prelu layer, used to non-linearly map the input feature map. After the fourth and fifth layers, the scale of the feature map is H×W×layer3; that is, the bn layer and the prelu layer do not change the scale of the feature map. Briefly, Prelu, i.e., the Parametric Rectified Linear Unit, is an activation function with a learnable parameter a: Prelu(x) = max(0, x) + a·min(0, x).
The sixth to thirteenth layers are introduced below; taken together, this embodiment calls them the dropChannel layer.
As mentioned above, the size of the output feature map of the fifth layer is H×W×layer3, where layer3 is large, which is unfavorable for calculation and storage on mobile devices. The purpose of the dropChannel layer in this embodiment is therefore to screen these channels, remove the channels of little value, and keep the channels beneficial to segmentation. Specifically, each channel is given a weight, so that channels more conducive to segmentation obtain larger weight values and channels not conducive to improving segmentation accuracy obtain smaller weight values. The dropChannel layer is described in detail below.
The sixth layer mainly passes the feature map through, connecting the fifth layer with the seventh layer, and the fifth layer with the thirteenth layer.
The seventh layer is the avepool layer, identified as the adaptive_avepool layer in the figure. Its purpose is to average-pool the data of each channel of the input feature map; for example, with input H×W×layer3, the output after average pooling is 1×layer3: each H×W feature map is averaged by avepooling into a single number. The output of this layer is therefore layer3×1, a vector.
The eighth layer is a fully connected layer, used to linearly map the layer3×1 input. Because the fully connected function is implemented with nn.Linear in PyTorch, this embodiment names it nn.liner; of course other names, such as full connection layer, could be used. The output of this layer is fc1; in practical applications, the dimension of the output feature can be changed by changing the number of neurons of the fully connected layer.
The ninth layer is the relu layer, used to non-linearly map the input feature map; the output of this layer is fc1.
The tenth layer is a fully connected layer, used to linearly map the input feature map again; the output of this layer is fc2 (a vector of dimension fc2×1), where fc2 = layer3, ensuring that the output of this layer is consistent with the number of channels of the feature map.
The eleventh layer is the sigmoid layer, which normalizes the input so that its values lie in [0, 1]; the output of this layer is fc2.
The twelfth layer is the wiseSelect layer, whose purpose is to sort and select among the inputs. The specific implementation is: sort the output of the eleventh layer from largest to smallest, recording the value of each element of fc2 and its position, the position representing the channel; the output dimension of this layer is fc2×2, where the first column represents the weight value and the second column the position of that weight value. As shown in Table 1, sorting yields the magnitudes and positions of the input feature weight values:
Table 1
Weight index
0.98 7
0.85 4
Because fc2 == layer3, fc2 is very long (the reason is that channel concatenation is carried out in the third layer, making the number of channels of the feature map soar, which is unfavorable for deploying the model to mobile devices). The twelfth layer outputs the sorted weight values and position information of every channel, to facilitate the subsequent screening of channels.
The thirteenth layer is the wise_Multiply layer, whose purpose is to intelligently output the feature maps conducive to segmentation. This layer has two inputs: one is the output feature map of the fifth layer, of dimension H×W×layer3; the other is the output of the twelfth layer, of dimension fc2×2. The principle of this layer is described in detail below and not expanded here.
The fourteenth layer is the add layer, used to add the output feature maps of the thirteenth and sixth layers; the feature map finally obtained is the output feature map of the intelligent selection module.
It should be particularly noted that the number of channels of the output feature map of the thirteenth layer can be set according to actual needs, and falls into two cases: the number of channels of the output feature map equals the number of channels of the input feature map of the intelligent selection module, or it does not.
More specifically, in practical applications, intelligent selection modules can be divided into the following two types according to network requirements: the down-sampling type intelligent selection module, and the feature-maintaining type intelligent selection module. The number of channels of the output feature map of the down-sampling type is greater than that of its input feature map, while the number of channels of the output feature map of the feature-maintaining type equals that of its input feature map.
According to the sorting result of fc2, the thirteenth layer of the intelligent selection module keeps the k channels with larger weight values in fc2 and discards the rest, streamlining the channels. The principle of this operation is: based on the results of neural network training, select the channels more conducive to segmentation and keep them, and discard the unhelpful channels, so as to reduce the amount of parameters and calculation. The specific steps are as follows:
S11. Obtain the output feature map of the fifth layer -> feature_in = H×W×layer3;
S12. Obtain the output of the twelfth layer -> weight_in = fc2×2;
S13. Select the k largest input weights of weight_in, denoted weight_valuable;
S14. Obtain the index values corresponding to weight_valuable, denoted index_valuable;
S15. Select the corresponding channels of feature_in according to the values of index_valuable, i.e., extract the channels of feature_in at those index values, k channels in total. After extraction, the dimension of the output feature map is H×W×k, denoted channel_valuable;
S16. Multiply weight_valuable by the corresponding channel_valuable.
To explain: the dimension of weight_valuable is k×1 and the dimension of channel_valuable is H×W×k; each element of weight_valuable is multiplied with the matrix (of dimension H×W) of the corresponding channel of channel_valuable, the multiplication being weight_valuable times every element of the H×W matrix. weight_valuable is the result output by the eleventh layer, the sigmoid layer, after which the values of weight_valuable lie in [0, 1]; this embodiment uses these valuable weight values to assign corresponding weights to the output feature map feature_in of the fifth layer.
As a preferred implementation, this embodiment introduces here a new value G, named the global weight. This value is positive and can linearly enlarge or shrink weight_valuable, i.e., the linear weighting operation described above that converts the first weight values into the second weight values. The global weight is obtained through neural network training; with it, the values of weight_valuable are not restricted to the range [0, 1]: they can be greater than 1 or less than 1, but are always greater than 0. By introducing the global weight, better channels can be given larger weights, i.e., weight values greater than 1. That is: wise_channel = (G × weight_valuable) × channel_valuable.
In summary, the dimension of the output feature map of the thirteenth layer is H×W×k, which this embodiment names wise_channel. When k = In_Channel, the intelligent selection module is the aforementioned feature-maintaining type intelligent selection module; when k = stride × In_Channel with stride > 1, it is the down-sampling type intelligent selection module, where stride is the convolution stride.
This embodiment provides a neural network model for segmenting images, including a feature extraction module, an intelligent selection module, an up-sampling module, and a classification module, the intelligent selection module including a feature extraction unit, a normalization unit, an intelligent selection unit, and an output unit. The neural network model has at least the following advantages:
First, the intelligent selection module adopts dilated convolutions of different scales and fuses feature map information of different scales, obtaining richer feature maps and providing a large amount of feature information for subsequent feature screening;
Second, the adoption of multi-scale dilated convolution, i.e., convolving the same input feature map with multiple convolution branches of different scales, increases the amount of parameters and calculation. To reduce them, as a preferred implementation, this application adopts group convolution in each branch, greatly reducing the number of parameters with very little loss of accuracy;
Third, to further reduce the number of parameters and feature map channels, the intelligent selection module intelligently selects among the input feature map channels, training a weight value to indicate which input feature map channels are more conducive to the coming segmentation; at the same time, according to the magnitude of the weight values, it discards the channels with smaller weight values and keeps the channels with larger weight values, reducing the amount of parameters and calculation while ensuring segmentation accuracy;
Fourth, the intelligent selection module also trains the global weight, linearly weighting each channel's weight value again and raising or suppressing the expression range of each channel's weight value so that it is not limited to the range [0, 1].
Ultimately, by adopting the intelligent selection module in a neural network model for segmenting images, image segmentation accuracy can be maintained with very few model parameters, realizing a lightweight neural network model applicable to mobile devices for segmenting images.
Embodiment 2 of a neural network model for segmenting images provided by this application is introduced in detail below. Embodiment 1 introduced the intelligent selection module in detail on the basis of the general structure of the image segmentation model; Embodiment 2 takes a specific network structure as an example and introduces a concrete image segmentation model adopting the intelligent selection module.
Referring to FIG. 3, the feature extraction module in Embodiment 2 includes a first feature extraction module and a second feature extraction module, and the intelligent selection module includes a first intelligent selection module and a second intelligent selection module; the first feature extraction module is connected in series with a first preset number of first intelligent selection modules connected in series, and the outputs of the second feature extraction module and the second intelligent selection module pass through a connection module to a second preset number of second intelligent selection modules connected in series. The second intelligent selection module is then connected in sequence to the up-sampling module and the classification module.
It is worth mentioning that dividing the intelligent selection modules into the first and second intelligent selection modules above is only for convenience in describing the structure of the entire neural network model; in this embodiment the structure of every intelligent selection module is the same (as described for the intelligent selection module in Embodiment 1), but the parameters of each intelligent selection module may differ. The specific parameter settings are described in detail below.
The detailed structure of the neural network model of Embodiment 2 is shown in FIG. 4; the network structures are described below one by one in the order of the circled numbers in FIG. 4.
1. The network input is shown as circle 0, with input dimension H×W×3; as a specific implementation, this embodiment sets it to 768×768×3.
2. The input image first enters circle 1. ConvBnPRelu represents a network composed of a Conv (convolutional) layer, a BatchNormal (Bn) layer, and a Relu layer; ConvBnPRelu(3, 32, 3, 2, 1) represents such a network whose convolutional layer has a 3-channel input, a 32-channel output, a 3×3 convolution kernel, stride set to 2, and padding set to 1.
Specifically, after the first ConvBnPRelu layer, the width and height of the feature map are halved and the output of this layer is 384×384×32; after the second ConvBnPRelu layer, the scale of the feature map remains unchanged and the output is 384×384×32; after the third ConvBnPRelu layer, the scale again remains unchanged and the output is 384×384×32.
3. At circle 4, the results of circle 1 and circle 2 are concatenated (channel-wise). Circle 2 is a pooling layer with convolution stride = 2, giving an image down-sampled by a factor of 2 from the original; the output dimension of circle 4 is 384×384×35.
4. Circle 5 is a BN+Prelu layer, which does not change the dimension of the feature map; the output feature map dimension is 384×384×35.
5. At circle 6, the intelligent selection module obtains more features through dilated convolution operations and multi-branch channel fusion, and reduces the amount of parameters and calculation by intelligently selecting the optimal channels; this module can extract more streamlined, superior features. The stride of the first convolutional layer of the intelligent selection module shown at circle 6 is 2, so the width and height of the feature map are reduced to half. The output feature map dimension of circle 6 is 192×192×64, where 64 represents the number of output channels.
6. At circle 7, the stride of the first convolutional layer in this intelligent selection module is 1 and the feature map scale is unchanged; the output feature map dimension of circle 7 is 192×192×64.
7. The intelligent selection module at circle 8 is the same as circle 7; by connecting intelligent selection modules in series, effective segmentation features are extracted through the stacked layers. The output feature map dimension of circle 8 is 192×192×64.
8. The intelligent selection module at circle 9 can increase the number of channels: by specifying the number of selected channels, the number of output channels can be changed. In this embodiment the number of output channels is set to 128, and the output feature map dimension after circle 9 is 192×192×128.
9. Circle 10 is a concat module with the same function as circle 4; circle 10 concatenates the outputs of circle 9 and circle 3 along the channel dimension. The output dimension of circle 10 is 192×192×(128+3).
10. Circle 11 is a BN+Prelu module, after which the feature map dimension is 192×192×(128+3).
11. Circle 12 represents a stack of several wise select block modules. Similar to the intelligent selection modules discussed above, the first intelligent selection module reduces the width and height of the feature map by setting the convolution stride, the last one changes the number of channels of the final output feature map, and the other intelligent selection modules do not change the dimension of the feature map passed into them. After circle 12, the output feature map dimension becomes 96×96×256.
12. Circle 13 is a BN+Prelu module, after which the feature map dimension is 96×96×256.
13. Circle 14 is a dropout2d module, i.e., a dropout layer, after which the feature map dimension is 96×96×256.
14. Circle 15 is the convolutional layer conv2d, with the convolution kernel scale set to 3×3 and the number of kernel channels set to class, where class represents the number of categories of the segmentation training database samples, i.e., the segmentation categories.
15. Circle 16 is a non-linear interpolation layer (interpolate); after this layer the input feature map is interpolated to the size of the original input feature map, the interpolation being by a factor of 8. The output feature map size of this layer is 768×768×class, and each class channel represents one segmentation category.
16. Circle 17 represents the final output, with output size 768×768×class.
An embodiment of an image segmentation method provided by this application is introduced below; the image segmentation method described below and the neural network model for segmenting images described above may be referred to in correspondence with each other.
The image segmentation method is implemented based on the neural network model for segmenting images described above, and includes the following steps: acquiring a target image to be segmented; inputting the target image into the neural network model to obtain an image segmentation result.
Specifically, the image segmentation method includes the following steps:
S51. Extract the original feature map of the target image;
S52. Use the feature extraction unit to perform feature extraction on the original feature map of the target image with a plurality of dilated convolutions of different scales, and splice the plurality of extracted feature maps together to obtain the output feature map of the feature extraction unit;
S53. Use the normalization unit to normalize and non-linearly map the output feature map of the feature extraction unit to obtain the output feature map of the normalization unit;
S54. Use the intelligent selection unit to determine the first weight value of each channel in the output feature map of the normalization unit, the first weight value characterizing the channel's contribution to the accuracy of image segmentation; filter out, from the output feature map of the normalization unit, a preset number of target channels with the largest first weight values; and perform a weighting operation on the target channels according to the first weight values to obtain the output feature map of the intelligent selection unit;
S55. Use the output unit to add the output feature map of the intelligent selection unit and the original feature map of the target image to obtain the target feature map;
S56. Up-sample the target feature map;
S57. Generate the image segmentation result of the target image according to the output feature map obtained by up-sampling.
The image segmentation method of this embodiment is implemented based on the neural network model for segmenting images described above; the specific implementation of the method can therefore be found in the embodiments of that model above, and its technical effects correspond to those of the above neural network model, so they are not repeated here.
In addition, this application also provides an image segmentation device, including:
a memory, used to store a computer program;
a processor, used to execute the computer program to implement the image segmentation method described above.
Finally, this application provides a readable storage medium for storing a computer program which, when executed by a processor, implements the image segmentation method described above.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another. As the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant parts may be found in the description of the method.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The solution provided by this application has been introduced in detail above. Specific examples have been used herein to explain the principle and implementations of this application, and the description of the above embodiments is only intended to help understand the method of this application and its core idea; meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementation and scope of application in accordance with the idea of this application. In summary, the contents of this specification should not be construed as limiting this application.

Claims (10)

  1. A neural network model for segmenting images, characterized by comprising: a feature extraction module, an intelligent selection module, an up-sampling module, and a classification module, wherein the intelligent selection module includes a feature extraction unit, a normalization unit, an intelligent selection unit, and an output unit;
    the feature extraction module is used to extract the original feature map of a target image;
    the feature extraction unit is used to perform feature extraction on the original feature map of the target image using a plurality of dilated convolutions of different scales, and to splice the plurality of extracted feature maps together to obtain the output feature map of the feature extraction unit;
    the normalization unit is used to normalize and non-linearly map the output feature map of the feature extraction unit to obtain the output feature map of the normalization unit;
    the intelligent selection unit is used to determine a first weight value of each channel in the output feature map of the normalization unit, the first weight value being used to characterize the channel's contribution to the accuracy of image segmentation; to filter out, from the output feature map of the normalization unit, a preset number of target channels with the largest first weight values; and to perform a weighting operation on the target channels according to the first weight values to obtain the output feature map of the intelligent selection unit;
    the output unit is used to add the output feature map of the intelligent selection unit and the original feature map of the target image to obtain a target feature map;
    the up-sampling module is used to up-sample the target feature map;
    the classification module is used to generate an image segmentation result of the target image according to the feature map obtained by up-sampling.
  2. The neural network model for segmenting images according to claim 1, characterized in that the intelligent selection unit is specifically used to: filter out, from the output feature map of the normalization unit, a preset number of target channels with the largest first weight values; linearly weight the first weight value of each of the target channels according to a pre-trained global weight to obtain a second weight value of the target channel; and perform a weighting operation on the target channels according to the second weight values to obtain the output feature map of the intelligent selection unit.
  3. The neural network model for segmenting images according to claim 1, characterized in that the intelligent selection module includes a down-sampling type intelligent selection module, whose intelligent selection unit is used to filter out, from the output feature map of the normalization unit, a first preset number of target channels with the largest first weight values, wherein the first preset number is a number determined according to the convolution stride and the number of channels of the original feature map of the target image.
  4. The neural network model for segmenting images according to claim 3, characterized in that the intelligent selection module includes a feature-maintaining type intelligent selection module, whose intelligent selection unit is used to filter out, from the output feature map of the normalization unit, a second preset number of target channels with the largest first weight values, wherein the second preset number is equal to the number of channels of the original feature map of the target image.
  5. The neural network model for segmenting images according to claim 1, characterized in that the intelligent selection unit is specifically used to obtain the first weight value of each channel in the output feature map of the normalization unit by sequentially performing an average pooling operation, a fully connected operation, a nonlinear mapping operation, a fully connected operation, and a normalization operation on the output feature map of the normalization unit.
  6. The neural network model for segmenting images according to any one of claims 1-5, characterized by comprising a plurality of the intelligent selection modules connected in series.
  7. The neural network model for segmenting images according to claim 6, characterized in that the feature extraction module includes a first feature extraction module and a second feature extraction module, and the intelligent selection module includes a first intelligent selection module and a second intelligent selection module;
    the first feature extraction module is connected in series with a first preset number of the first intelligent selection modules connected in series, and the outputs of the second feature extraction module and the second intelligent selection module pass through a connection module to a second preset number of second intelligent selection modules connected in series.
  8. An image segmentation method, characterized by being implemented based on the neural network model for segmenting images according to any one of claims 1-7, and comprising:
    acquiring a target image to be segmented;
    inputting the target image into the neural network model to obtain an image segmentation result.
  9. An image segmentation device, characterized by comprising:
    a memory, used to store a computer program;
    a processor, used to execute the computer program to implement the image segmentation method according to claim 8.
  10. A readable storage medium, characterized in that the readable storage medium is used to store a computer program which, when executed by a processor, is used to implement the image segmentation method according to claim 8.
PCT/CN2020/110983 2019-12-22 2020-08-25 一种用于分割图像的神经网络模型及其图像分割方法 WO2021128896A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP20907894.8A EP4053739A4 (en) 2019-12-22 2020-08-25 NEURONAL NETWORK MODEL FOR IMAGE SEGMENTATION AND ASSOCIATED IMAGE SEGMENTATION METHOD

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911332559.3 2019-12-22
CN201911332559.3A CN111079767B (zh) 2019-12-22 2019-12-22 一种用于分割图像的神经网络模型及其图像分割方法

Publications (1)

Publication Number Publication Date
WO2021128896A1 true WO2021128896A1 (zh) 2021-07-01

Family

ID=70316655

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/110983 WO2021128896A1 (zh) 2019-12-22 2020-08-25 一种用于分割图像的神经网络模型及其图像分割方法

Country Status (3)

Country Link
EP (1) EP4053739A4 (zh)
CN (1) CN111079767B (zh)
WO (1) WO2021128896A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114005028A (zh) * 2021-07-30 2022-02-01 北京航空航天大学 一种抗干扰的遥感图像目标检测轻量模型及其方法

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111079767B (zh) * 2019-12-22 2022-03-22 浪潮电子信息产业股份有限公司 一种用于分割图像的神经网络模型及其图像分割方法
CN112270668B (zh) * 2020-11-06 2021-09-21 威海世一电子有限公司 垂吊线缆检测方法、系统和电子设备

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106934397A (zh) * 2017-03-13 2017-07-07 北京市商汤科技开发有限公司 图像处理方法、装置及电子设备
CN110136141A (zh) * 2019-04-24 2019-08-16 佛山科学技术学院 一种面向复杂环境的图像语义分割方法及装置
CN110189337A (zh) * 2019-05-31 2019-08-30 广东工业大学 一种自动驾驶图像语义分割方法
CN110232394A (zh) * 2018-03-06 2019-09-13 华南理工大学 一种多尺度图像语义分割方法
CN110348411A (zh) * 2019-07-16 2019-10-18 腾讯科技(深圳)有限公司 一种图像处理方法、装置和设备
CN111079767A (zh) * 2019-12-22 2020-04-28 浪潮电子信息产业股份有限公司 一种用于分割图像的神经网络模型及其图像分割方法

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102601604B1 (ko) * 2017-08-04 2023-11-13 삼성전자주식회사 뉴럴 네트워크의 파라미터들을 양자화하는 방법 및 장치
CN110335290B (zh) * 2019-06-04 2021-02-26 大连理工大学 基于注意力机制的孪生候选区域生成网络目标跟踪方法
CN110378243A (zh) * 2019-06-26 2019-10-25 深圳大学 一种行人检测方法及装置

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106934397A (zh) * 2017-03-13 2017-07-07 北京市商汤科技开发有限公司 图像处理方法、装置及电子设备
CN110232394A (zh) * 2018-03-06 2019-09-13 华南理工大学 一种多尺度图像语义分割方法
CN110136141A (zh) * 2019-04-24 2019-08-16 佛山科学技术学院 一种面向复杂环境的图像语义分割方法及装置
CN110189337A (zh) * 2019-05-31 2019-08-30 广东工业大学 一种自动驾驶图像语义分割方法
CN110348411A (zh) * 2019-07-16 2019-10-18 腾讯科技(深圳)有限公司 一种图像处理方法、装置和设备
CN111079767A (zh) * 2019-12-22 2020-04-28 浪潮电子信息产业股份有限公司 一种用于分割图像的神经网络模型及其图像分割方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4053739A4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114005028A (zh) * 2021-07-30 2022-02-01 北京航空航天大学 一种抗干扰的遥感图像目标检测轻量模型及其方法
CN114005028B (zh) * 2021-07-30 2023-02-17 北京航空航天大学 一种抗干扰的遥感图像目标检测轻量系统及其方法

Also Published As

Publication number Publication date
EP4053739A1 (en) 2022-09-07
EP4053739A4 (en) 2023-04-26
CN111079767A (zh) 2020-04-28
CN111079767B (zh) 2022-03-22

Similar Documents

Publication Publication Date Title
WO2021128896A1 (zh) 一种用于分割图像的神经网络模型及其图像分割方法
WO2022017025A1 (zh) 图像处理方法、装置、存储介质以及电子设备
CN110569851B (zh) 门控多层融合的实时语义分割方法
CN112347248A (zh) 一种方面级文本情感分类方法及系统
WO2023138188A1 (zh) 特征融合模型训练及样本检索方法、装置和计算机设备
CN112580694B (zh) 基于联合注意力机制的小样本图像目标识别方法及系统
Zhao et al. A balanced feature fusion SSD for object detection
CN110298841B (zh) 一种基于融合网络的图像多尺度语义分割方法及装置
CN110993037A (zh) 一种基于多视图分类模型的蛋白质活性预测装置
CN110134967A (zh) 文本处理方法、装置、计算设备及计算机可读存储介质
CN113806564B (zh) 多模态信息性推文检测方法及系统
WO2020147259A1 (zh) 一种用户画像方法、装置、可读存储介质及终端设备
Gongguo et al. An improved small target detection method based on Yolo V3
CN116738983A (zh) 模型进行金融领域任务处理的词嵌入方法、装置、设备
WO2024040941A1 (zh) 神经网络结构搜索方法、装置及存储介质
CN115266141B (zh) 基于gru-c网络的点焊质量检测方法、装置及存储介质
CN115545168A (zh) 基于注意力机制和循环神经网络的动态QoS预测方法及系统
CN110826726B (zh) 目标处理方法、目标处理装置、目标处理设备及介质
CN116484067A (zh) 目标对象匹配方法、装置及计算机设备
CN114241234A (zh) 细粒度图像分类方法、装置、设备及介质
CN111242146A (zh) 基于卷积神经网络的poi信息分类
Zhang et al. Research on aesthetic models based on neural architecture search
Tang et al. Robust neighborhood preserving low-rank sparse CNN features for classification
Wang et al. A Hybrid Self-Attention Model for Pedestrians Detection
CN113409769B (zh) 基于神经网络模型的数据识别方法、装置、设备及介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20907894

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020907894

Country of ref document: EP

Effective date: 20220601

NENP Non-entry into the national phase

Ref country code: DE