AU2021105161A4 - Image segmentation based on feature attenuator - Google Patents

Image segmentation based on feature attenuator

Info

Publication number
AU2021105161A4
Authority
AU
Australia
Prior art keywords
feature
convolution
layer
edges
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2021105161A
Inventor
Zhenya Yue
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunshigao Technology Co Ltd
Original Assignee
Yunshigao Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yunshigao Technology Co Ltd filed Critical Yunshigao Technology Co Ltd
Priority to AU2021105161A priority Critical patent/AU2021105161A4/en
Application granted granted Critical
Publication of AU2021105161A4 publication Critical patent/AU2021105161A4/en
Ceased legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20084 - Artificial neural networks [ANN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 - Image coding
    • G06T9/20 - Contour coding, e.g. using detection of edges

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Image segmentation is one of the most popular techniques in vision applications, i.e., retrieving object information from an image. Segmentation provides not only the class and location of an object but its associated contour as well. Recent advances in segmentation deploy encoder-decoder architectures with skip connections and also utilize edge information to obtain the best contour. This patent focuses on maintaining the information flow from the skip connections and on identifying suitable features from the bottom layer. The feature selector can be applied to edge information to yield the segmentation contour, and our method can also maintain the deep learning feature maps through information filtering. The filtering module outperforms former schemes with skip connections. Moreover, the proposed method can attenuate the edge information so that the most important edges are selected for reconstructing each class.

Description

Fig. 1: Sets of filtering layers and factors (feature map, filtering layer, edge detector, edges of feature map, weighted edges for contour reconstruction).
Fig. 2: Panels (a) and (b).
1. Background and Purpose
Image segmentation aims to automatically label objects within images. The nature and characteristics of an object can then be identified from the segmented parts. Moreover, segmentation provides the location and contours of the object, which is far more informative than simple object detection. Image segmentation has become increasingly popular in applications that require both accuracy and the precise location of an object. However, designing fast and accurate image segmentation for real-time conditions is highly challenging due to the limitations of embedded systems and hardware.
Recently, image segmentation has been significantly improved with the aid of deep learning. In this approach, segmentation is performed pixel-wise, and each pixel is assigned to a specific segment. A Convolutional Neural Network (CNN) is commonly used as the feature extractor. Semantic segmentation is then executed, and objects are categorized into different classes, but without distinguishing instances. The most popular schemes for semantic segmentation deploy autoencoders such as ENet, ResNet, and DenseNet. These models have skip (layer-jump) connections, a structure that reduces the loss of information during the convolution process. However, not all skip-connection features are important for the subsequent reconstruction, and the unnecessary computation significantly reduces running speed. Another issue concerns contour detection: the detected edges do not contribute equally to segmentation contours in the classification problem.
This patent focuses on maintaining the information flow from the skip connections and on identifying suitable features from the bottom layer. The feature selector can be applied to edge information to yield the segmentation contour, and our method can also maintain the deep learning feature maps through information filtering. The filtering module outperforms former schemes with skip connections. Moreover, the proposed method can attenuate the edge information so that the most important edges are selected for reconstructing each class.
2. Image Segmentation Method Description
2.1 Efficient Network (ENet)
We use ENet as the basic structure for image segmentation. ENet consists of several kinds of layers: initial, bottleneck, and full convolution. The initial layer is composed of 13 convolution filters, concatenated with a max-pooling branch for the remaining channels to form the initial feature. Among the bottleneck layers, ENet has several different types: regular, dilated, and asymmetric. The regular convolution in ENet is a naive 3x3 convolution, and the dilated convolution is used to capture a wide input area at cheap computational cost. The asymmetric convolution is a separable convolution; in ENet its size is 5x5, separated into two convolutions, a 1x5 followed by a 5x1. The full convolution layer, also known as the transposed convolution, is used as the last upsampling layer. The other operations are max pooling for downsampling and max unpooling for upsampling. Max pooling uses a 2x2 kernel with stride 2, and the indices of the pooling process are saved for use in the upsampling process. During unpooling, the values from the lower spatial resolution are copied into the positions given by the saved indices, and the other values of the upsampled feature are set to zero.
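For concreteness, a minimal PyTorch sketch of such an initial block is given below. It is an illustration only, assuming the standard ENet configuration of a 3x3 stride-2 convolution with 13 filters concatenated with a 2x2 stride-2 max pooling of the 3 input channels; the class and variable names are our own.

```python
import torch
import torch.nn as nn

class InitialBlock(nn.Module):
    """ENet-style initial block: a 3x3/stride-2 convolution with 13 filters,
    concatenated with a 2x2/stride-2 max pooling of the input channels,
    giving a 16-channel initial feature map for an RGB input."""
    def __init__(self, in_channels: int = 3, conv_filters: int = 13):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, conv_filters,
                              kernel_size=3, stride=2, padding=1, bias=False)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.bn = nn.BatchNorm2d(conv_filters + in_channels)
        self.act = nn.PReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenate the learned filters with the pooled input channels.
        out = torch.cat([self.conv(x), self.pool(x)], dim=1)
        return self.act(self.bn(out))

# Example: a 512x512 RGB image yields a 16-channel, 256x256 initial feature.
feat = InitialBlock()(torch.randn(1, 3, 512, 512))
print(feat.shape)  # torch.Size([1, 16, 256, 256])
```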
2.2 Model Structure
Step 1: Attenuator layers
This layer filters the information before passing it to the top layer. By filtering the features based on their importance, the decoder is not overwhelmed, and the reconstruction can be performed better. There are two terms in the attenuator layers: the factor $f$ and the value $\theta$. The factor is a feature in the range 0 to 1 that acts as a weighting, indicating the importance of the feature. The value is the feature that is attenuated by the factor to form the final feature. The factor is the result of filtering the feature map $\theta$ by a sequence of convolutional layers $\phi$, followed by the sigmoid activation function $\sigma$:

$$f = \sigma(\phi(\theta)) \tag{1}$$

$$\theta' = \theta \odot f(\theta) \tag{2}$$

where $\odot$ denotes element-wise multiplication. The filtering layers analyze the importance of the feature, and a bottleneck layer is used for this process. The dimension of the feature is first reduced with a 1x1 convolution. Subsequently, a 3x3 separable convolution is applied, in which the 1x3 convolution is conducted before the 3x1. The feature is then expanded back to its original depth through a 1x1 convolution. A batch normalization layer and the PReLU activation function are deployed inside the filtering layer. Finally, the sigmoid activation function normalizes the factor to values between 0 and 1.
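Below is a minimal PyTorch sketch of one such attenuator layer under our reading of equations (1) and (2); the class name and the depth-reduction ratio are illustrative assumptions, not specified in the text.

```python
import torch
import torch.nn as nn

class Attenuator(nn.Module):
    """Filtering (attenuator) layer: computes a factor f = sigmoid(phi(theta))
    via a bottleneck of 1x1 -> separable 3x3 (1x3 then 3x1) -> 1x1
    convolutions, then attenuates the value: theta' = theta * f."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        mid = max(channels // reduction, 1)
        self.phi = nn.Sequential(
            nn.Conv2d(channels, mid, kernel_size=1, bias=False),  # reduce depth
            nn.BatchNorm2d(mid), nn.PReLU(),
            nn.Conv2d(mid, mid, kernel_size=(1, 3), padding=(0, 1), bias=False),
            nn.Conv2d(mid, mid, kernel_size=(3, 1), padding=(1, 0), bias=False),
            nn.BatchNorm2d(mid), nn.PReLU(),
            nn.Conv2d(mid, channels, kernel_size=1, bias=False),  # expand depth
            nn.BatchNorm2d(channels),
        )

    def forward(self, theta: torch.Tensor) -> torch.Tensor:
        f = torch.sigmoid(self.phi(theta))  # factor in (0, 1), eq. (1)
        return theta * f                    # attenuated value, eq. (2)
```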
Step 2: Edge selector method
In this method, every class has a different attenuator. After the attenuator produces the weights and weights the values, the results are averaged and summed for the respective class. The edge features of the object then need to be extracted from this information. In our method, a CNN layer is used to determine which information is edge information:

$$G_x = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} * \left( \begin{bmatrix} -1 & 0 & 1 \end{bmatrix} * \theta_{ib} \right) \tag{3}$$

$$G_y = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} * \left( \begin{bmatrix} 1 & 2 & 1 \end{bmatrix} * \theta_{ib} \right) \tag{4}$$

$$G = G_x + G_y \tag{5}$$

The input feature maps are fed into the Sobel operator to detect the edges of the feature map. The $G_x$ operator computes the gradient of the feature maps in the x direction, as shown in (3), where $*$ is the convolution operator. The separable Sobel operator is used to reduce the computational complexity compared with the original 3x3 kernel. The same procedure is applied in the y direction using the $G_y$ operator in (4). Finally, (5) is applied to the feature maps of the initial block in ENet, $\theta_{ib}$. The result then passes through an activation layer to filter for better edge features:

$$f_E = \sigma(\phi(E(\theta_{ib}))) \tag{6}$$
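The separable Sobel operator of equations (3)-(5) can be sketched in PyTorch as follows; applying it depthwise (one filter per channel) and the function name are our own assumptions.

```python
import torch
import torch.nn.functional as F

def sobel_edges(theta_ib: torch.Tensor) -> torch.Tensor:
    """Separable Sobel: G_x = [1,2,1]^T * ([-1,0,1] * x) and
    G_y = [-1,0,1]^T * ([1,2,1] * x), then G = G_x + G_y (eqs. 3-5).
    Applied depthwise so each feature channel is filtered independently."""
    c = theta_ib.shape[1]
    row = lambda k: torch.tensor(k, dtype=theta_ib.dtype).view(1, 1, 1, 3).repeat(c, 1, 1, 1)
    col = lambda k: torch.tensor(k, dtype=theta_ib.dtype).view(1, 1, 3, 1).repeat(c, 1, 1, 1)

    # G_x: horizontal derivative [-1,0,1], then vertical smoothing [1,2,1]^T
    gx = F.conv2d(theta_ib, row([-1.0, 0.0, 1.0]), padding=(0, 1), groups=c)
    gx = F.conv2d(gx, col([1.0, 2.0, 1.0]), padding=(1, 0), groups=c)

    # G_y: horizontal smoothing [1,2,1], then vertical derivative [-1,0,1]^T
    gy = F.conv2d(theta_ib, row([1.0, 2.0, 1.0]), padding=(0, 1), groups=c)
    gy = F.conv2d(gy, col([-1.0, 0.0, 1.0]), padding=(1, 0), groups=c)

    return gx + gy  # eq. (5)
```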
Step 3: Feature fusion
The attenuated edge filters are averaged to obtain the single-class dedicated edge feature $CE$ using (7):

$$CE = \frac{1}{d} \sum_{m=1}^{d} f_{E_m} \odot E(\theta_{ib}) \tag{7}$$

The $CE$ features are later concatenated to form the final attenuated edge feature $f_{CE}$ using (8):

$$f_{CE} = \{CE_1, CE_2, \ldots, CE_d\} \tag{8}$$

The $f_{CE}$ feature represents the most important edges across the feature maps that are beneficial for the specific class feature. This feature is later combined, using element-wise summation, with the class feature from the bottom layer $\theta_b$ to give the final feature map $\theta_f$ that is fed to the top layer, using (9):

$$\theta_f = f_{CE} + \theta_b \tag{9}$$
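A minimal sketch, under one reading of equations (7)-(9), of how the attenuated edge features might be averaged, stacked per class, and fused with the bottom-layer features; all tensor shapes and names here are illustrative assumptions.

```python
import torch

def fuse_features(edge_maps: torch.Tensor,
                  edge_factors: torch.Tensor,
                  theta_b: torch.Tensor) -> torch.Tensor:
    """edge_maps:    E(theta_ib), shape (N, d, H, W)
    edge_factors: per-class edge factors f_E, shape (N, num_classes, d, H, W)
    theta_b:      bottom-layer class features, shape (N, num_classes, H, W)
    Returns theta_f = f_CE + theta_b (eq. 9)."""
    # Eq. (7): average the attenuated edges over the d feature maps
    # to get one dedicated edge feature CE per class.
    ce = (edge_factors * edge_maps.unsqueeze(1)).mean(dim=2)  # (N, classes, H, W)
    # Eq. (8): the stacked CEs form the final attenuated edge feature f_CE.
    f_ce = ce
    # Eq. (9): element-wise summation with the bottom-layer class features.
    return f_ce + theta_b

# Example with 2 classes and d = 16 edge feature maps.
theta_f = fuse_features(torch.randn(1, 16, 64, 64),
                        torch.rand(1, 2, 16, 64, 64),
                        torch.randn(1, 2, 64, 64))
print(theta_f.shape)  # torch.Size([1, 2, 64, 64])
```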
3. Brief Description of the Drawings
Fig. 1: Selecting features based on their importance.
Fig. 2: Results of the filtered edge information. (a) The edge features for the sky class. (b) The edge features for the tree class.

Claims (2)

The claims defining the invention are as follows:
Image segmentation based on feature attenuator
1. Image Segmentation Method Description
1.1 Efficient Network (ENet)
We use ENet as the basic structure for image segmentation. ENet consists of several kinds of layers: initial, bottleneck, and full convolution. The initial layer is composed of 13 convolution filters, concatenated with a max-pooling branch for the remaining channels to form the initial feature. Among the bottleneck layers, ENet has several different types: regular, dilated, and asymmetric. The regular convolution in ENet is a naive 3x3 convolution, and the dilated convolution is used to capture a wide input area at cheap computational cost. The asymmetric convolution is a separable convolution; in ENet its size is 5x5, separated into two convolutions, a 1x5 followed by a 5x1. The full convolution layer, also known as the transposed convolution, is used as the last upsampling layer. The other operations are max pooling for downsampling and max unpooling for upsampling. Max pooling uses a 2x2 kernel with stride 2, and the indices of the pooling process are saved for use in the upsampling process. During unpooling, the values from the lower spatial resolution are copied into the positions given by the saved indices, and the other values of the upsampled feature are set to zero.
1.2 Model Structure
Step 1: Attenuator layers
This layer filters the information before passing it to the top layer. By filtering the features based on their importance, the decoder is not overwhelmed, and the reconstruction can be performed better. There are two terms in the attenuator layers: the factor $f$ and the value $\theta$. The factor is a feature in the range 0 to 1 that acts as a weighting, indicating the importance of the feature. The value is the feature that is attenuated by the factor to form the final feature. The factor is the result of filtering the feature map $\theta$ by a sequence of convolutional layers $\phi$, followed by the sigmoid activation function $\sigma$:

$$f = \sigma(\phi(\theta)) \tag{1}$$

$$\theta' = \theta \odot f(\theta) \tag{2}$$

where $\odot$ denotes element-wise multiplication. The filtering layers analyze the importance of the feature, and a bottleneck layer is used for this process. The dimension of the feature is first reduced with a 1x1 convolution. Subsequently, a 3x3 separable convolution is applied, in which the 1x3 convolution is conducted before the 3x1. The feature is then expanded back to its original depth through a 1x1 convolution. A batch normalization layer and the PReLU activation function are deployed inside the filtering layer. Finally, the sigmoid activation function normalizes the factor to values between 0 and 1.
Step 2: Edge selector method
In this method, every class has a different attenuator. After the attenuator produces the weights and weights the values, the results are averaged and summed for the respective class. The edge features of the object then need to be extracted from this information. In our method, a CNN layer is used to determine which information is edge information:

$$G_x = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} * \left( \begin{bmatrix} -1 & 0 & 1 \end{bmatrix} * \theta_{ib} \right) \tag{3}$$

$$G_y = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} * \left( \begin{bmatrix} 1 & 2 & 1 \end{bmatrix} * \theta_{ib} \right) \tag{4}$$

$$G = G_x + G_y \tag{5}$$

The input feature maps are fed into the Sobel operator to detect the edges of the feature map. The $G_x$ operator computes the gradient of the feature maps in the x direction, as shown in (3), where $*$ is the convolution operator. The separable Sobel operator is used to reduce the computational complexity compared with the original 3x3 kernel. The same procedure is applied in the y direction using the $G_y$ operator in (4). Finally, (5) is applied to the feature maps of the initial block in ENet, $\theta_{ib}$. The result then passes through an activation layer to filter for better edge features:

$$f_E = \sigma(\phi(E(\theta_{ib}))) \tag{6}$$
Step 3: Feature fusion
The attenuated edge filters are averaged to obtain the single-class dedicated edge feature $CE$ using (7):

$$CE = \frac{1}{d} \sum_{m=1}^{d} f_{E_m} \odot E(\theta_{ib}) \tag{7}$$

The $CE$ features are later concatenated to form the final attenuated edge feature $f_{CE}$ using (8):

$$f_{CE} = \{CE_1, CE_2, \ldots, CE_d\} \tag{8}$$

The $f_{CE}$ feature represents the most important edges across the feature maps that are beneficial for the specific class feature. This feature is later combined, using element-wise summation, with the class feature from the bottom layer $\theta_b$ to give the final feature map $\theta_f$ that is fed to the top layer, using (9):

$$\theta_f = f_{CE} + \theta_b \tag{9}$$
AU2021105161A 2021-08-09 2021-08-09 Image segmentation based on feature attenuator Ceased AU2021105161A4 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2021105161A AU2021105161A4 (en) 2021-08-09 2021-08-09 Image segmentation based on feature attenuator

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2021105161A AU2021105161A4 (en) 2021-08-09 2021-08-09 Image segmentation based on feature attenuator

Publications (1)

Publication Number Publication Date
AU2021105161A4 true AU2021105161A4 (en) 2021-10-07

Family

ID=77922952

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2021105161A Ceased AU2021105161A4 (en) 2021-08-09 2021-08-09 Image segmentation based on feature attenuator

Country Status (1)

Country Link
AU (1) AU2021105161A4 (en)


Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)
MK22 Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry