CN113362343A - Lightweight image semantic segmentation algorithm suitable for operating at Android end - Google Patents

Lightweight image semantic segmentation algorithm suitable for operating at Android end

Info

Publication number
CN113362343A
CN113362343A
Authority
CN
China
Prior art keywords
picture
semantic segmentation
network
image
mobile terminal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110692929.5A
Other languages
Chinese (zh)
Inventor
Zhang Yongjun (张永军)
Chen Xia (陈霞)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN202110692929.5A priority Critical patent/CN113362343A/en
Publication of CN113362343A publication Critical patent/CN113362343A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Image Processing (AREA)

Abstract

A lightweight semantic segmentation algorithm suitable for running at the Android end is disclosed. The MobileNetV3 network is first improved: its network structure is adjusted and an optimized activation function L-ReLU6 is used. The down-sampling part of the classic semantic segmentation algorithm FCN is then replaced by the improved MobileNetV3 to extract picture features, reducing the computation and time overhead of the model; the up-sampling part of the FCN is replaced by bilinear interpolation, and low-level and high-level features are combined by add fusion to address the loss of image features. On this basis, the lightweight semantic segmentation model is run at the mobile terminal to perform the picture segmentation task according to the task requirements of the system, realizing a meter reading system that pre-segments pictures at the Android mobile terminal.

Description

Lightweight image semantic segmentation algorithm suitable for operating at Android end
Technical Field
The invention relates to the field of deep learning model optimization and image semantic segmentation, in particular to a method for building a lightweight semantic segmentation network suitable for running at an Android end by using a deep learning framework.
Background
With the rapid development of deep learning algorithms and artificial intelligence technology, intelligent meter reading systems have gradually emerged. They can automatically, efficiently and accurately extract information from target areas in a picture, such as meter readings and two-dimensional codes, thereby reducing the cost of manual reading. However, intelligent meter reading systems still have shortcomings: uploading large-size pictures from the mobile terminal increases network transmission time; storing a large number of original-size pictures puts pressure on the server side's storage space; and because a picture contains many information elements, accurately extracting the target-area information requires a large number of training samples, which prolongs system running time. With the popularization of mobile intelligent terminals, the collected picture can be segmented into target areas at the mobile terminal according to actual needs before being uploaded, so that the server side can feed the received picture directly into the picture recognition model for information extraction, improving the overall operating efficiency of the system. The mobile phone terminal is small, portable and highly real-time, and can host a wide variety of apps. The mainstream mobile phone platforms at present are iOS and Android; the Android system has a huge user base and provides researchers with more room for exchange and discussion at the Android end.
Pre-segmenting the picture at the mobile terminal can rely on a picture segmentation algorithm. Traditional image segmentation algorithms such as thresholding, edge detection and region-based methods mostly process the image according to low-order visual information of the pixels, and they segment poorly on complex engineering segmentation tasks. The appearance of the FCN (Fully Convolutional Networks) semantic segmentation network in 2014 brought a major breakthrough to semantic segmentation: the picture segmentation results improved greatly over traditional methods, and the end-to-end training mode laid the foundation for the development of subsequent semantic segmentation algorithms. Many excellent networks have since been proposed on the basis of FCN; the more common semantic segmentation networks at present mainly include encoder-decoder based methods, dilated-convolution based methods and feature-fusion based methods. Although current FCN-based semantic segmentation models improve image segmentation accuracy, their computation and parameter counts also increase, making them unsuitable for running at a mobile terminal with limited computing and storage capacity.
Based on the above analysis, this invention improves on the classic FCN semantic segmentation algorithm and provides a lightweight semantic segmentation algorithm oriented to the mobile terminal.
Disclosure of Invention
The invention mainly provides a lightweight image semantic segmentation algorithm suitable for running at the Android end; the model obtained by training with this algorithm is run on the Android platform, realizing a meter reading system that pre-segments pictures at the Android mobile end. The network structure of the proposed lightweight semantic segmentation algorithm is divided into two parts: extracting picture features with MobileNetV3, and upsampling the feature map.
(1) Extracting picture features with MobileNetV3: first, the MobileNetV3 network is improved so that it is suitable for picture feature extraction in the picture segmentation task. The improvement to MobileNetV3 has two aspects:
a) Adjusting the network structure of MobileNetV3. To reduce computation overhead and memory footprint, only the layers of MobileNetV3 up to an output channel number of 160, i.e. the first 16 layers of the network, are used, avoiding the extra computing resources caused by the sudden surge in channel number. By changing the stride in the Bneck blocks, the feature map output after feature extraction is 1/16 the size of the original picture; compared with the original 1/32 feature map, relatively more picture features are retained.
b) Using the optimized activation function L-ReLU6. The first 7 layers of MobileNetV3 use the optimized L-ReLU6 activation function, while the higher layers of the network still use the H-Swish activation function. L-ReLU6 avoids the situation in which parts of the network stop learning (as can happen when a plain ReLU outputs zero for all negative inputs), so the network captures more picture features and the segmentation result is more accurate. The formula of the optimized activation function L-ReLU6 proposed herein is as follows:
L-ReLU6(x)=min(6,max(αx,x)) (1)
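As an illustrative, non-limiting sketch, the L-ReLU6 activation can be written directly from formula (1); the slope α is treated here as a hyperparameter, and the value 0.01 below is an assumption since the text does not fix it:

    import torch
    import torch.nn as nn

    class LReLU6(nn.Module):
        """L-ReLU6(x) = min(6, max(alpha*x, x)), cf. formula (1)."""
        def __init__(self, alpha: float = 0.01):   # alpha is an assumed value
            super().__init__()
            self.alpha = alpha

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # max(alpha*x, x) keeps a small slope for negative inputs,
            # and the outer min(...) caps the positive side at 6.
            return torch.clamp_max(torch.max(self.alpha * x, x), 6.0)

For α in (0, 1) this behaves like a LeakyReLU whose positive branch is clipped at 6, consistent with the ReLU6 family of activations used elsewhere in MobileNetV3.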
The picture is input into the model and first passes through the structure-adjusted MobileNetV3 part for picture feature extraction; the detailed structure of this part is shown in FIG. 2. The main body consists of a Conv2d layer and four large Bneck modules. The four Bneck modules change the size of the input picture, each is executed repeatedly, and the repetition counts n of the picture feature extraction operations from Bneck1 to Bneck4 are 2, 5 and 6 respectively. The input picture becomes 1/2 of its original size after Conv2d and Bneck1, 1/4 after Bneck2, 1/8 after Bneck3, and finally 1/16 after Bneck4; the 1/2, 1/4 and 1/8 feature maps need to be saved for the add fusion operation during picture upsampling. The Bneck3 and Bneck4 modules adjust the strides of the 8th, 10th and 14th layers of MobileNetV3 to ensure that the final output feature map is 1/16 of the original image. The specific operations of the Bneck structure are shown in FIG. 3, where NL denotes the activation function: the L-ReLU6 activation function proposed herein is used in the first three operation blocks, and the H-Swish activation function is used in the last two.
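The following PyTorch sketch illustrates one way the truncated feature extractor could be assembled from torchvision's MobileNetV3-Large. The slice index, the tap indices used to save the 1/2, 1/4 and 1/8 skip features, and the stride adjustment on the last downsampling Bneck are assumptions based on torchvision's layer layout rather than the layer numbering used above, and the L-ReLU6 substitution in the early layers is omitted for brevity:

    import torch
    import torch.nn as nn
    from torchvision.models import mobilenet_v3_large   # torchvision >= 0.13 assumed

    class TruncatedMobileNetV3(nn.Module):
        def __init__(self):
            super().__init__()
            # Keep only the first 16 feature layers (output channels up to 160),
            # dropping the 960-channel head to save computation and memory.
            self.features = mobilenet_v3_large(weights=None).features[:16]
            # Set the stride of the last downsampling Bneck to 1 so the final
            # feature map is 1/16 of the input instead of 1/32 (assumed internals).
            self.features[13].block[1][0].stride = (1, 1)
            # Layer indices after which the map is 1/2, 1/4, 1/8 and 1/16 in size.
            self.taps = {1: "s2", 3: "s4", 6: "s8", 15: "s16"}

        def forward(self, x: torch.Tensor) -> dict:
            feats = {}
            for i, layer in enumerate(self.features):
                x = layer(x)
                if i in self.taps:
                    feats[self.taps[i]] = x   # saved for add fusion during upsampling
            return feats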
(2) Feature map upsampling: after feature extraction by MobileNetV3, the feature map must be upsampled before the segmentation map can be obtained; the structure of the upsampling part is shown in FIG. 4. The feature map is upsampled by bilinear interpolation plus depthwise separable convolution. The greatest advantage of depthwise separable convolution is that it markedly reduces the computation and parameter count of the network; with a 3x3 convolution kernel, the parameter count falls to roughly one ninth, so a shorter running time can be obtained under limited resources. During upsampling, the feature maps at 1/2, 1/4 and 1/8 of the picture size are add-fused, through skip connections, with the feature maps of corresponding size in the upsampling path, effectively fusing low-level and high-level features and alleviating the loss of image feature information. The essence of the add operation is to sum the two feature maps, so the channel number of the result is unchanged; without adding channels, add fusion increases the amount of information in each dimension, making the final per-pixel classification more accurate.
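A hedged sketch of the upsampling path is given below. The D-block is assumed to be a bilinear 2x upsampling followed by a depthwise separable convolution, and the decoder channel widths are chosen to match the skip features of the encoder sketch above so that add fusion operates on equal channel counts; these widths and the final full-resolution stage are assumptions rather than the exact configuration of FIG. 4:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DepthwiseSeparableConv(nn.Module):
        # For a 3x3 kernel: 9*C_in + C_in*C_out parameters versus 9*C_in*C_out for
        # a standard convolution -- roughly a 1/9 reduction when C_out is large.
        def __init__(self, c_in: int, c_out: int):
            super().__init__()
            self.depthwise = nn.Conv2d(c_in, c_in, 3, padding=1, groups=c_in, bias=False)
            self.pointwise = nn.Conv2d(c_in, c_out, 1, bias=False)
            self.bn = nn.BatchNorm2d(c_out)
            self.act = nn.ReLU6(inplace=True)

        def forward(self, x):
            return self.act(self.bn(self.pointwise(self.depthwise(x))))

    class DBlock(nn.Module):
        """Bilinear 2x upsampling followed by a depthwise separable convolution."""
        def __init__(self, c_in: int, c_out: int):
            super().__init__()
            self.conv = DepthwiseSeparableConv(c_in, c_out)

        def forward(self, x):
            x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
            return self.conv(x)

    class Decoder(nn.Module):
        def __init__(self, num_classes: int):
            super().__init__()
            self.up16_8 = DBlock(160, 40)   # 1/16 -> 1/8, match the 40-channel skip
            self.up8_4 = DBlock(40, 24)     # 1/8 -> 1/4, match the 24-channel skip
            self.up4_2 = DBlock(24, 16)     # 1/4 -> 1/2, match the 16-channel skip
            self.up2_1 = DBlock(16, 16)     # 1/2 -> full resolution
            self.classifier = nn.Conv2d(16, num_classes, 1)

        def forward(self, feats: dict) -> torch.Tensor:
            x = self.up16_8(feats["s16"]) + feats["s8"]   # add fusion: channels unchanged
            x = self.up8_4(x) + feats["s4"]
            x = self.up4_2(x) + feats["s2"]
            x = self.up2_1(x)
            return self.classifier(x)       # per-pixel class scores

Add fusion, rather than channel concatenation, is what keeps the decoder narrow: the summation leaves the channel count untouched, which is why the 1x1 classifier at the end stays small.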
Tests show that, compared with the classic semantic segmentation network FCN, both the model size and the running time of the network designed herein are reduced by about a factor of 10; the segmentation model runs much faster while the segmentation accuracy is preserved, making it suitable for mobile terminals with limited memory resources.
The lightweight semantic segmentation model is run at the mobile terminal, and the related functional modules are developed, including a picture acquisition module, a model invocation module and a front-end/back-end communication module, realizing a meter reading system that pre-segments pictures at the Android mobile terminal.
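The description does not name the framework used to package the model for the Android end. Purely as one possible route (an assumption, not the stated method), a PyTorch model composed of the encoder and decoder sketches above could be exported for PyTorch Mobile as follows; the class and file names are illustrative:

    import torch
    from torch.utils.mobile_optimizer import optimize_for_mobile

    class LiteSegmenter(torch.nn.Module):
        """Composes the encoder and decoder sketches above (assumed available here)."""
        def __init__(self, num_classes: int = 2):
            super().__init__()
            self.encoder = TruncatedMobileNetV3()
            self.decoder = Decoder(num_classes)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.decoder(self.encoder(x))

    model = LiteSegmenter().eval()                 # trained weights would be loaded here
    example = torch.rand(1, 3, 512, 512)           # dummy input at the deployment size
    scripted = torch.jit.trace(model, example)     # trace the forward pass
    scripted = optimize_for_mobile(scripted)       # mobile-oriented graph optimizations
    scripted._save_for_lite_interpreter("segmenter.ptl")  # bundled with the Android app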
Drawings
FIG. 1 is the complete structure of the lightweight semantic segmentation model designed herein;
FIG. 2 is a detailed block diagram of the picture feature extraction portion using MobileNetV3 in the lightweight semantic segmentation model designed herein;
FIG. 3 is a diagram of the detailed operation steps in the Bneck structure;
FIG. 4 is a detailed block diagram of the feature map upsampling portion in the lightweight semantic segmentation model designed herein.
Detailed Description of the Invention
The following describes preferred embodiments of the present invention in detail with reference to the accompanying drawings, so that the advantages and features of the invention can be more easily understood by those skilled in the art and the scope of protection of the invention is more clearly defined.
Referring to FIGS. 1 to 4, an embodiment of the present invention includes:
(a) The lightweight network structure of FIG. 1 is run at the Android end, realizing a meter reading system that pre-segments pictures at the Android mobile terminal. Before the segmentation model is invoked in the system, it must first be determined whether the user exists; if not, the user must register before logging in. After entering the system, the user selects a meter reading task from the task list; once the picture of the target meter is obtained, it is sent into the segmentation model for segmentation.
(b) The picture is input into the model and enters the structure-adjusted MobileNetV3 part for picture feature extraction; the detailed structure is shown in FIG. 2. The main body consists of a Conv2d layer and four large Bneck modules. The four Bneck modules change the size of the input picture, each is executed repeatedly, and the repetition counts n of the picture feature extraction operations from Bneck1 to Bneck4 are 2, 5 and 6 respectively. The input picture becomes 1/16 of its original size after passing through the Bneck blocks. In addition, the 1/2, 1/4 and 1/8 feature maps need to be saved for the add fusion operation during picture upsampling. The Bneck3 and Bneck4 modules adjust the strides of the 8th, 10th and 14th layers of MobileNetV3 to ensure that the final output feature map is 1/16 of the original image. The first three operation modules use the L-ReLU6 activation function set forth herein, and the H-Swish activation function is used in the last two operation modules.
(c) After the picture features are extracted by the structure shown in FIG. 2, the feature map is upsampled to obtain the final segmentation map. The detailed structure of the upsampling part is shown in FIG. 4; the D-block module in the figure upsamples the feature map by a factor of 2 through bilinear interpolation and a depthwise separable operation. When decoding the feature map, a feature fusion part is added, which mainly add-fuses the feature maps at 1/2, 1/4 and 1/8 of the picture size with the feature maps of corresponding size produced by the decoder. The essence of this operation is to sum the two feature maps, so the channel number of the result is unchanged, and the features of the lower and higher network layers are effectively fused, making the segmentation result more accurate; a minimal usage sketch of this step is given below.
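A minimal usage sketch of the segmentation step in (c), reusing the encoder and decoder classes sketched in the disclosure above; the input tensor and the number of classes are placeholders:

    import torch

    encoder = TruncatedMobileNetV3()
    decoder = Decoder(num_classes=2)     # e.g. meter region vs. background (assumed)

    img = torch.rand(1, 3, 512, 512)     # stand-in for a captured meter picture
    with torch.no_grad():
        logits = decoder(encoder(img))   # shape (1, num_classes, 512, 512)
        mask = logits.argmax(dim=1)      # per-pixel class labels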
(d) If the picture segmentation succeeds, the segmented picture can be uploaded to the server side, and the relevant data is stored in the local database.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes performed by the present specification and drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (5)

1. A lightweight image semantic segmentation algorithm based on the Android mobile terminal, characterized by comprising the following steps:
step one: annotating the original pictures with an annotation tool, the annotated pictures and the original pictures forming a training set;
step two: extracting picture features with the improved MobileNetV3 and upsampling the feature map by bilinear interpolation to construct a lightweight semantic segmentation network;
step three: loading the training set and training the constructed segmentation network to obtain a trained model;
step four: feeding the picture to be segmented into the lightweight segmentation network and loading the trained model to obtain the final segmentation result.
2. The Android mobile terminal-based lightweight image semantic segmentation algorithm according to claim 1, characterized in that: the lightweight semantic segmentation network constructed in step two uses the improved MobileNetV3 as the downsampling part to extract picture features, the improvement of the MobileNetV3 network comprising adjustment of the network structure and use of the optimized activation function L-ReLU6; the image is upsampled by bilinear interpolation, and the low-level and high-level features are add-fused.
3. The Android mobile terminal-based lightweight image semantic segmentation algorithm according to claim 2, characterized in that: the adjustment of the MobileNetV3 structure mainly comprises three parts: only the layers of MobileNetV3 up to an output channel number of 160, i.e. the first 16 layers of the network, are used; the strides of the Bneck blocks in the 8th, 10th and 14th layers are changed so that the feature map output after feature extraction is 1/16 the size of the original image; and the activation functions of the first 7 layers all use the optimized L-ReLU6 activation function.
4. The Android mobile terminal-based lightweight image semantic segmentation algorithm according to claim 3, characterized in that: the optimized activation function L-ReLU6 has the following formula:
L-ReLU6(x)=min(6,max(αx,x))
5. The Android mobile terminal-based lightweight image semantic segmentation algorithm according to claim 2, characterized in that: the feature map is upsampled by a factor of 2 using bilinear interpolation and a depthwise separable operation, and the feature maps at 1/2, 1/4 and 1/8 of the picture size are add-fused with the feature maps of corresponding size obtained in the upsampling process.
CN202110692929.5A 2021-06-22 2021-06-22 Lightweight image semantic segmentation algorithm suitable for operating at Android end Pending CN113362343A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110692929.5A CN113362343A (en) 2021-06-22 2021-06-22 Lightweight image semantic segmentation algorithm suitable for operating at Android end

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110692929.5A CN113362343A (en) 2021-06-22 2021-06-22 Lightweight image semantic segmentation algorithm suitable for operating at Android end

Publications (1)

Publication Number Publication Date
CN113362343A true CN113362343A (en) 2021-09-07

Family

ID=77535665

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110692929.5A Pending CN113362343A (en) 2021-06-22 2021-06-22 Lightweight image semantic segmentation algorithm suitable for operating at Android end

Country Status (1)

Country Link
CN (1) CN113362343A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110110692A (en) * 2019-05-17 2019-08-09 南京大学 A kind of realtime graphic semantic segmentation method based on the full convolutional neural networks of lightweight
US20200151497A1 (en) * 2018-11-12 2020-05-14 Sony Corporation Semantic segmentation with soft cross-entropy loss
CN112183360A (en) * 2020-09-29 2021-01-05 上海交通大学 Lightweight semantic segmentation method for high-resolution remote sensing image
CN112634276A (en) * 2020-12-08 2021-04-09 西安理工大学 Lightweight semantic segmentation method based on multi-scale visual feature extraction

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200151497A1 (en) * 2018-11-12 2020-05-14 Sony Corporation Semantic segmentation with soft cross-entropy loss
CN110110692A (en) * 2019-05-17 2019-08-09 南京大学 A kind of realtime graphic semantic segmentation method based on the full convolutional neural networks of lightweight
CN112183360A (en) * 2020-09-29 2021-01-05 上海交通大学 Lightweight semantic segmentation method for high-resolution remote sensing image
CN112634276A (en) * 2020-12-08 2021-04-09 西安理工大学 Lightweight semantic segmentation method based on multi-scale visual feature extraction

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YONGJUN ZHANG et al.: "Lightweight semantic segmentation algorithm based on MobileNetV3 network", 2020 International Conference on Intelligent Computing, Automation and Systems (ICICAS) *

Similar Documents

Publication Publication Date Title
CN110147794A (en) A kind of unmanned vehicle outdoor scene real time method for segmenting based on deep learning
CN113780296A (en) Remote sensing image semantic segmentation method and system based on multi-scale information fusion
CN114943963A (en) Remote sensing image cloud and cloud shadow segmentation method based on double-branch fusion network
CN112085031A (en) Target detection method and system
CN112560701B (en) Face image extraction method and device and computer storage medium
CN111882620A (en) Road drivable area segmentation method based on multi-scale information
CN113658200A (en) Edge perception image semantic segmentation method based on self-adaptive feature fusion
CN112580567A (en) Model obtaining method, model obtaining device and intelligent equipment
CN115908793A (en) Coding and decoding structure semantic segmentation model based on position attention mechanism
CN113744185A (en) Concrete apparent crack segmentation method based on deep learning and image processing
CN113362343A (en) Lightweight image semantic segmentation algorithm suitable for operating at Android end
CN116561879A (en) Hydraulic engineering information management system and method based on BIM
CN113378598B (en) Dynamic bar code detection method based on deep learning
CN116310875A (en) Target detection method and device for satellite remote sensing image
CN112084815A (en) Target detection method based on camera focal length conversion, storage medium and processor
CN115984574A (en) Image information extraction model and method based on cyclic transform and application thereof
CN113223006B (en) Lightweight target semantic segmentation method based on deep learning
CN114863094A (en) Industrial image region-of-interest segmentation algorithm based on double-branch network
CN112991398B (en) Optical flow filtering method based on motion boundary guidance of cooperative deep neural network
CN115984603A (en) Fine classification method and system for urban green land based on GF-2 and open map data
CN115661097A (en) Object surface defect detection method and system
CN116680434B (en) Image retrieval method, device, equipment and storage medium based on artificial intelligence
CN116612287B (en) Image recognition method, device, computer equipment and storage medium
CN117830788B (en) Image target detection method for multi-source information fusion
CN117058498B (en) Training method of segmentation map evaluation model, and segmentation map evaluation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination