CN113362343A - Lightweight image semantic segmentation algorithm suitable for operating at Android end - Google Patents

Lightweight image semantic segmentation algorithm suitable for operating at Android end

Info

Publication number
CN113362343A
CN113362343A
Authority
CN
China
Prior art keywords
picture
semantic segmentation
network
image
mobile terminal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110692929.5A
Other languages
Chinese (zh)
Inventor
Zhang Yongjun (张永军)
Chen Xia (陈霞)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN202110692929.5A priority Critical patent/CN113362343A/en
Publication of CN113362343A publication Critical patent/CN113362343A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Image Processing (AREA)

Abstract

A lightweight semantic segmentation algorithm suitable for running at the Android end is disclosed. The MobileNetV3 network is first improved: its network structure is adjusted and an optimized activation function L-ReLU6 is used. The down-sampling part of the classic semantic segmentation algorithm FCN is then replaced by the improved MobileNetV3 to extract picture features, reducing the computation and time overhead of the model; the up-sampling part of the FCN is replaced by bilinear interpolation, and low-level and high-level features are combined by add fusion to address the loss of image features. On this basis, the lightweight semantic segmentation model is run at the mobile terminal to perform the picture segmentation task according to the task requirements of the system, realizing a meter reading system that pre-segments pictures at the Android mobile terminal.

Description

Lightweight image semantic segmentation algorithm suitable for operating at Android end
Technical Field
The invention relates to the field of deep learning model optimization and image semantic segmentation, in particular to a method for building a lightweight semantic segmentation network suitable for running at an Android end by using a deep learning framework.
Background
With the rapid development of deep learning algorithms and artificial intelligence technology, intelligent meter reading systems have gradually emerged. They can automatically, efficiently and accurately extract information from target areas in a picture, such as meter readings and two-dimensional codes, thereby reducing the cost of manual reading. However, intelligent meter reading systems still have shortcomings: uploading large-size pictures from the mobile terminal increases network transmission time; storing a large number of original-size pictures puts pressure on the server side's storage space; and because a picture contains many information elements, accurately extracting the target-area information requires a large number of training samples, which prolongs system running time. With the popularization of mobile intelligent terminals, the collected picture can be segmented into target areas at the mobile terminal according to actual needs before being uploaded, so that the server side can feed the received picture directly into the picture recognition model for information extraction, improving the overall operating efficiency of the system. The mobile phone terminal is small, portable and highly real-time, and can host a wide variety of apps. The mainstream mobile phone platforms at present are iOS and Android; the Android system has a huge user base and provides researchers with more room for exchange and discussion at the Android end.
Pre-segmenting the picture at the mobile terminal can rely on a picture segmentation algorithm. Traditional image segmentation algorithms such as thresholding, edge detection and region-based methods mostly process the image according to low-order visual information of the pixels, and they segment poorly on complex engineering segmentation tasks. The appearance of the FCN (Fully Convolutional Networks) semantic segmentation network in 2014 brought a major breakthrough to semantic segmentation: the picture segmentation results improved greatly over traditional methods, and the end-to-end training mode laid the foundation for the development of subsequent semantic segmentation algorithms. Many excellent networks have since been proposed on the basis of FCN; the more common semantic segmentation networks at present mainly include encoder-decoder based methods, dilated-convolution based methods and feature-fusion based methods. Although current FCN-based semantic segmentation models improve image segmentation accuracy, their computation and parameter counts also increase, making them unsuitable for running at a mobile terminal with limited computing and storage capacity.
Based on the above analysis, this invention improves on the classic FCN semantic segmentation algorithm and provides a lightweight semantic segmentation algorithm oriented to the mobile terminal.
Disclosure of Invention
The invention mainly provides a lightweight image semantic segmentation algorithm suitable for running at the Android end; the model obtained by training with this algorithm is run on the Android platform, realizing a meter reading system that pre-segments pictures at the Android mobile end. The network structure of the proposed lightweight semantic segmentation algorithm is divided into two parts: extracting picture features with MobileNetV3, and upsampling the feature map.
(1) Extracting picture features with MobileNetV3: first, the MobileNetV3 network is improved so that it is suitable for picture feature extraction in the picture segmentation task. The improvement to MobileNetV3 has two aspects:
a) Adjusting the network structure of MobileNetV3. To reduce computation overhead and memory footprint, only the layers of MobileNetV3 up to an output channel number of 160, i.e. the first 16 layers of the network, are used, avoiding the extra computing resources caused by the sudden surge in channel number. By changing the stride in the Bneck blocks, the feature map output after feature extraction is 1/16 the size of the original picture; compared with the original 1/32 feature map, relatively more picture features are retained.
b) Using the optimized activation function L-ReLU6. The first 7 layers of MobileNetV3 use the optimized L-ReLU6 activation function, while the higher layers of the network still use the H-Swish activation function. L-ReLU6 avoids the situation in which parts of the network stop learning (as can happen when a plain ReLU outputs zero for all negative inputs), so the network captures more picture features and the segmentation result is more accurate. The formula of the optimized activation function L-ReLU6 proposed herein is as follows:
L-ReLU6(x)=min(6,max(αx,x)) (1)
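As an illustrative, non-limiting sketch, the L-ReLU6 activation can be written directly from formula (1); the slope α is treated here as a hyperparameter, and the value 0.01 below is an assumption since the text does not fix it:

    import torch
    import torch.nn as nn

    class LReLU6(nn.Module):
        """L-ReLU6(x) = min(6, max(alpha*x, x)), cf. formula (1)."""
        def __init__(self, alpha: float = 0.01):   # alpha is an assumed value
            super().__init__()
            self.alpha = alpha

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # max(alpha*x, x) keeps a small slope for negative inputs,
            # and the outer min(...) caps the positive side at 6.
            return torch.clamp_max(torch.max(self.alpha * x, x), 6.0)

For α in (0, 1) this behaves like a LeakyReLU whose positive branch is clipped at 6, consistent with the ReLU6 family of activations used elsewhere in MobileNetV3.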
The picture is input into the model and first passes through the structure-adjusted MobileNetV3 part for picture feature extraction; the detailed structure of this part is shown in FIG. 2. The main body consists of a Conv2d layer and four large Bneck modules. The four Bneck modules change the size of the input picture, each is executed repeatedly, and the repetition counts n of the picture feature extraction operations from Bneck1 to Bneck4 are 2, 5 and 6 respectively. The input picture becomes 1/2 of its original size after Conv2d and Bneck1, 1/4 after Bneck2, 1/8 after Bneck3, and finally 1/16 after Bneck4; the 1/2, 1/4 and 1/8 feature maps need to be saved for the add fusion operation during picture upsampling. The Bneck3 and Bneck4 modules adjust the strides of the 8th, 10th and 14th layers of MobileNetV3 to ensure that the final output feature map is 1/16 of the original image. The specific operations of the Bneck structure are shown in FIG. 3, where NL denotes the activation function: the L-ReLU6 activation function proposed herein is used in the first three operation blocks, and the H-Swish activation function is used in the last two.
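The following PyTorch sketch illustrates one way the truncated feature extractor could be assembled from torchvision's MobileNetV3-Large. The slice index, the tap indices used to save the 1/2, 1/4 and 1/8 skip features, and the stride adjustment on the last downsampling Bneck are assumptions based on torchvision's layer layout rather than the layer numbering used above, and the L-ReLU6 substitution in the early layers is omitted for brevity:

    import torch
    import torch.nn as nn
    from torchvision.models import mobilenet_v3_large   # torchvision >= 0.13 assumed

    class TruncatedMobileNetV3(nn.Module):
        def __init__(self):
            super().__init__()
            # Keep only the first 16 feature layers (output channels up to 160),
            # dropping the 960-channel head to save computation and memory.
            self.features = mobilenet_v3_large(weights=None).features[:16]
            # Set the stride of the last downsampling Bneck to 1 so the final
            # feature map is 1/16 of the input instead of 1/32 (assumed internals).
            self.features[13].block[1][0].stride = (1, 1)
            # Layer indices after which the map is 1/2, 1/4, 1/8 and 1/16 in size.
            self.taps = {1: "s2", 3: "s4", 6: "s8", 15: "s16"}

        def forward(self, x: torch.Tensor) -> dict:
            feats = {}
            for i, layer in enumerate(self.features):
                x = layer(x)
                if i in self.taps:
                    feats[self.taps[i]] = x   # saved for add fusion during upsampling
            return feats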
(2) Feature map upsampling: after feature extraction by MobileNetV3, the feature map must be upsampled before the segmentation map can be obtained; the structure of the upsampling part is shown in FIG. 4. The feature map is upsampled by bilinear interpolation plus depthwise separable convolution. The greatest advantage of depthwise separable convolution is that it markedly reduces the computation and parameter count of the network; with a 3x3 convolution kernel, the parameter count falls to roughly one ninth, so a shorter running time can be obtained under limited resources. During upsampling, the feature maps at 1/2, 1/4 and 1/8 of the picture size are add-fused, through skip connections, with the feature maps of corresponding size in the upsampling path, effectively fusing low-level and high-level features and alleviating the loss of image feature information. The essence of the add operation is to sum the two feature maps, so the channel number of the result is unchanged; without adding channels, add fusion increases the amount of information in each dimension, making the final per-pixel classification more accurate.
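A hedged sketch of the upsampling path is given below. The D-block is assumed to be a bilinear 2x upsampling followed by a depthwise separable convolution, and the decoder channel widths are chosen to match the skip features of the encoder sketch above so that add fusion operates on equal channel counts; these widths and the final full-resolution stage are assumptions rather than the exact configuration of FIG. 4:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DepthwiseSeparableConv(nn.Module):
        # For a 3x3 kernel: 9*C_in + C_in*C_out parameters versus 9*C_in*C_out for
        # a standard convolution -- roughly a 1/9 reduction when C_out is large.
        def __init__(self, c_in: int, c_out: int):
            super().__init__()
            self.depthwise = nn.Conv2d(c_in, c_in, 3, padding=1, groups=c_in, bias=False)
            self.pointwise = nn.Conv2d(c_in, c_out, 1, bias=False)
            self.bn = nn.BatchNorm2d(c_out)
            self.act = nn.ReLU6(inplace=True)

        def forward(self, x):
            return self.act(self.bn(self.pointwise(self.depthwise(x))))

    class DBlock(nn.Module):
        """Bilinear 2x upsampling followed by a depthwise separable convolution."""
        def __init__(self, c_in: int, c_out: int):
            super().__init__()
            self.conv = DepthwiseSeparableConv(c_in, c_out)

        def forward(self, x):
            x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
            return self.conv(x)

    class Decoder(nn.Module):
        def __init__(self, num_classes: int):
            super().__init__()
            self.up16_8 = DBlock(160, 40)   # 1/16 -> 1/8, match the 40-channel skip
            self.up8_4 = DBlock(40, 24)     # 1/8 -> 1/4, match the 24-channel skip
            self.up4_2 = DBlock(24, 16)     # 1/4 -> 1/2, match the 16-channel skip
            self.up2_1 = DBlock(16, 16)     # 1/2 -> full resolution
            self.classifier = nn.Conv2d(16, num_classes, 1)

        def forward(self, feats: dict) -> torch.Tensor:
            x = self.up16_8(feats["s16"]) + feats["s8"]   # add fusion: channels unchanged
            x = self.up8_4(x) + feats["s4"]
            x = self.up4_2(x) + feats["s2"]
            x = self.up2_1(x)
            return self.classifier(x)       # per-pixel class scores

Add fusion, rather than channel concatenation, is what keeps the decoder narrow: the summation leaves the channel count untouched, which is why the 1x1 classifier at the end stays small.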
Tests show that, compared with the classic semantic segmentation network FCN, both the model size and the running time of the network designed herein are reduced by about a factor of 10; the segmentation model runs much faster while the segmentation accuracy is preserved, making it suitable for mobile terminals with limited memory resources.
The lightweight semantic segmentation model is run at the mobile terminal, and the related functional modules are developed, including a picture acquisition module, a model invocation module and a front-end/back-end communication module, realizing a meter reading system that pre-segments pictures at the Android mobile terminal.
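The description does not name the framework used to package the model for the Android end. Purely as one possible route (an assumption, not the stated method), a PyTorch model composed of the encoder and decoder sketches above could be exported for PyTorch Mobile as follows; the class and file names are illustrative:

    import torch
    from torch.utils.mobile_optimizer import optimize_for_mobile

    class LiteSegmenter(torch.nn.Module):
        """Composes the encoder and decoder sketches above (assumed available here)."""
        def __init__(self, num_classes: int = 2):
            super().__init__()
            self.encoder = TruncatedMobileNetV3()
            self.decoder = Decoder(num_classes)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.decoder(self.encoder(x))

    model = LiteSegmenter().eval()                 # trained weights would be loaded here
    example = torch.rand(1, 3, 512, 512)           # dummy input at the deployment size
    scripted = torch.jit.trace(model, example)     # trace the forward pass
    scripted = optimize_for_mobile(scripted)       # mobile-oriented graph optimizations
    scripted._save_for_lite_interpreter("segmenter.ptl")  # bundled with the Android app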
Drawings
FIG. 1 is the complete structure of the lightweight semantic segmentation model designed herein;
FIG. 2 is a detailed block diagram of the picture feature extraction portion using MobileNetV3 in the lightweight semantic segmentation model designed herein;
FIG. 3 is a diagram of the detailed operation steps in the Bneck structure;
FIG. 4 is a detailed block diagram of the feature map upsampling portion in the lightweight semantic segmentation model designed herein.
Detailed Description of the Invention
The following describes preferred embodiments of the present invention in detail with reference to the accompanying drawings, so that the advantages and features of the invention can be more easily understood by those skilled in the art and the scope of protection of the invention is more clearly defined.
Referring to FIGS. 1 to 4, an embodiment of the present invention includes:
(a) The lightweight network structure of FIG. 1 is run at the Android end, realizing a meter reading system that pre-segments pictures at the Android mobile terminal. Before the segmentation model is invoked in the system, it must first be determined whether the user exists; if not, the user must register before logging in. After entering the system, the user selects a meter reading task from the task list; once the picture of the target meter is obtained, it is sent into the segmentation model for segmentation.
(b) The picture is input into the model and enters the structure-adjusted MobileNetV3 part for picture feature extraction; the detailed structure is shown in FIG. 2. The main body consists of a Conv2d layer and four large Bneck modules. The four Bneck modules change the size of the input picture, each is executed repeatedly, and the repetition counts n of the picture feature extraction operations from Bneck1 to Bneck4 are 2, 5 and 6 respectively. The input picture becomes 1/16 of its original size after passing through the Bneck blocks. In addition, the 1/2, 1/4 and 1/8 feature maps need to be saved for the add fusion operation during picture upsampling. The Bneck3 and Bneck4 modules adjust the strides of the 8th, 10th and 14th layers of MobileNetV3 to ensure that the final output feature map is 1/16 of the original image. The first three operation modules use the L-ReLU6 activation function set forth herein, and the H-Swish activation function is used in the last two operation modules.
(c) After the picture features are extracted by the structure shown in FIG. 2, the feature map is upsampled to obtain the final segmentation map. The detailed structure of the upsampling part is shown in FIG. 4; the D-block module in the figure upsamples the feature map by a factor of 2 through bilinear interpolation and a depthwise separable operation. When decoding the feature map, a feature fusion part is added, which mainly add-fuses the feature maps at 1/2, 1/4 and 1/8 of the picture size with the feature maps of corresponding size produced by the decoder. The essence of this operation is to sum the two feature maps, so the channel number of the result is unchanged, and the features of the lower and higher network layers are effectively fused, making the segmentation result more accurate; a minimal usage sketch of this step is given below.
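A minimal usage sketch of the segmentation step in (c), reusing the encoder and decoder classes sketched in the disclosure above; the input tensor and the number of classes are placeholders:

    import torch

    encoder = TruncatedMobileNetV3()
    decoder = Decoder(num_classes=2)     # e.g. meter region vs. background (assumed)

    img = torch.rand(1, 3, 512, 512)     # stand-in for a captured meter picture
    with torch.no_grad():
        logits = decoder(encoder(img))   # shape (1, num_classes, 512, 512)
        mask = logits.argmax(dim=1)      # per-pixel class labels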
(d) If the picture segmentation succeeds, the segmented picture can be uploaded to the server side, and the relevant data is stored in the local database.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes performed by the present specification and drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (5)

1. A lightweight image semantic segmentation algorithm based on the Android mobile terminal, characterized by comprising the following steps:
step one: annotating the original pictures with an annotation tool, the annotated pictures and the original pictures forming a training set;
step two: extracting picture features with the improved MobileNetV3 and upsampling the feature map by bilinear interpolation to construct a lightweight semantic segmentation network;
step three: loading the training set and training the constructed segmentation network to obtain a trained model;
step four: feeding the picture to be segmented into the lightweight segmentation network and loading the trained model to obtain the final segmentation result.
2. The Android mobile terminal-based lightweight image semantic segmentation algorithm according to claim 1, characterized in that: the lightweight semantic segmentation network constructed in step two uses the improved MobileNetV3 as the downsampling part to extract picture features, the improvement of the MobileNetV3 network comprising adjustment of the network structure and use of the optimized activation function L-ReLU6; the image is upsampled by bilinear interpolation, and the low-level and high-level features are add-fused.
3. The Android mobile terminal-based lightweight image semantic segmentation algorithm according to claim 2, characterized in that: the adjustment of the MobileNetV3 structure mainly comprises three parts: only the layers of MobileNetV3 up to an output channel number of 160, i.e. the first 16 layers of the network, are used; the strides of the Bneck blocks in the 8th, 10th and 14th layers are changed so that the feature map output after feature extraction is 1/16 the size of the original image; and the activation functions of the first 7 layers all use the optimized L-ReLU6 activation function.
4. The Android mobile terminal-based lightweight image semantic segmentation algorithm according to claim 3, characterized in that: the optimized activation function L-ReLU6 has the following formula:
L-ReLU6(x)=min(6,max(αx,x))
5. The Android mobile terminal-based lightweight image semantic segmentation algorithm according to claim 2, characterized in that: the feature map is upsampled by a factor of 2 using bilinear interpolation and a depthwise separable operation, and the feature maps at 1/2, 1/4 and 1/8 of the picture size are add-fused with the feature maps of corresponding size obtained in the upsampling process.
CN202110692929.5A 2021-06-22 2021-06-22 Lightweight image semantic segmentation algorithm suitable for operating at Android end Pending CN113362343A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110692929.5A CN113362343A (en) 2021-06-22 2021-06-22 Lightweight image semantic segmentation algorithm suitable for operating at Android end

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110692929.5A CN113362343A (en) 2021-06-22 2021-06-22 Lightweight image semantic segmentation algorithm suitable for operating at Android end

Publications (1)

Publication Number Publication Date
CN113362343A true CN113362343A (en) 2021-09-07

Family

ID=77535665

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110692929.5A Pending CN113362343A (en) 2021-06-22 2021-06-22 Lightweight image semantic segmentation algorithm suitable for operating at Android end

Country Status (1)

Country Link
CN (1) CN113362343A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110110692A (en) * 2019-05-17 2019-08-09 南京大学 A kind of realtime graphic semantic segmentation method based on the full convolutional neural networks of lightweight
US20200151497A1 (en) * 2018-11-12 2020-05-14 Sony Corporation Semantic segmentation with soft cross-entropy loss
CN112183360A (en) * 2020-09-29 2021-01-05 上海交通大学 Lightweight semantic segmentation method for high-resolution remote sensing image
CN112634276A (en) * 2020-12-08 2021-04-09 西安理工大学 Lightweight semantic segmentation method based on multi-scale visual feature extraction

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200151497A1 (en) * 2018-11-12 2020-05-14 Sony Corporation Semantic segmentation with soft cross-entropy loss
CN110110692A (en) * 2019-05-17 2019-08-09 南京大学 A kind of realtime graphic semantic segmentation method based on the full convolutional neural networks of lightweight
CN112183360A (en) * 2020-09-29 2021-01-05 上海交通大学 Lightweight semantic segmentation method for high-resolution remote sensing image
CN112634276A (en) * 2020-12-08 2021-04-09 西安理工大学 Lightweight semantic segmentation method based on multi-scale visual feature extraction

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YONGJUN ZHANG et al.: "Lightweight semantic segmentation algorithm based on MobileNetV3 network", 2020 International Conference on Intelligent Computing, Automation and Systems (ICICAS) *

Similar Documents

Publication Publication Date Title
CN110147794A (en) A kind of unmanned vehicle outdoor scene real time method for segmenting based on deep learning
CN113780296A (en) Remote sensing image semantic segmentation method and system based on multi-scale information fusion
CN114943963A (en) Remote sensing image cloud and cloud shadow segmentation method based on double-branch fusion network
CN112085031A (en) Target detection method and system
CN112560701B (en) Face image extraction method and device and computer storage medium
CN111882620A (en) Road drivable area segmentation method based on multi-scale information
CN113658200A (en) Edge perception image semantic segmentation method based on self-adaptive feature fusion
CN112580567A (en) Model obtaining method, model obtaining device and intelligent equipment
CN115908793A (en) Coding and decoding structure semantic segmentation model based on position attention mechanism
CN113744185A (en) Concrete apparent crack segmentation method based on deep learning and image processing
CN113362343A (en) Lightweight image semantic segmentation algorithm suitable for operating at Android end
CN116561879A (en) Hydraulic engineering information management system and method based on BIM
CN113378598B (en) Dynamic bar code detection method based on deep learning
CN116310875A (en) Target detection method and device for satellite remote sensing image
CN112084815A (en) Target detection method based on camera focal length conversion, storage medium and processor
CN115984574A (en) Image information extraction model and method based on cyclic transform and application thereof
CN113223006B (en) Lightweight target semantic segmentation method based on deep learning
CN114863094A (en) Industrial image region-of-interest segmentation algorithm based on double-branch network
CN112991398B (en) Optical flow filtering method based on motion boundary guidance of cooperative deep neural network
CN115984603A (en) Fine classification method and system for urban green land based on GF-2 and open map data
CN115661097A (en) Object surface defect detection method and system
CN116680434B (en) Image retrieval method, device, equipment and storage medium based on artificial intelligence
CN116612287B (en) Image recognition method, device, computer equipment and storage medium
CN117830788B (en) Image target detection method for multi-source information fusion
CN117058498B (en) Training method of segmentation map evaluation model, and segmentation map evaluation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination