CN114612477A - Lightweight image segmentation method, system, medium, terminal and application - Google Patents

Lightweight image segmentation method, system, medium, terminal and application

Info

Publication number
CN114612477A
Authority
CN
China
Prior art keywords
convolution
module
net
network
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210208581.2A
Other languages
Chinese (zh)
Inventor
朱烨
胡伟
魏敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu University of Information Technology
Original Assignee
Chengdu University of Information Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu University of Information Technology filed Critical Chengdu University of Information Technology
Priority to CN202210208581.2A priority Critical patent/CN114612477A/en
Publication of CN114612477A publication Critical patent/CN114612477A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of image processing and discloses a lightweight image segmentation method, system, medium, terminal and application. A training data set is obtained, and the images and real label maps in the training data set are cut; an improved U-Net coding and decoding network is constructed and trained with the cut images and real label maps; lightweight image segmentation is then carried out with the trained improved U-Net coding and decoding network. The invention improves feature extraction capability, makes the network lighter, enhances training performance and improves inference efficiency; image segmentation performance is improved while segmentation efficiency is maintained.

Description

Lightweight image segmentation method, system, medium, terminal and application
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a lightweight image segmentation method, system, medium, terminal and application.
Background
At present, much of the information people use depends on the processing of image information; image segmentation is an important method of image preprocessing and an important link in extracting the value of an image. Image segmentation refers to decomposing an image into a set of non-overlapping regions. It has a wide range of applications, such as scene object segmentation, automatic driving, remote sensing image analysis and medical image analysis.
Traditional segmentation algorithms such as threshold segmentation, edge detection segmentation and region segmentation are time-consuming and can only extract low-level features of images such as color, shape and texture, so their segmentation effect is poor on images with complex and varied features.
In recent years, with the rapid iteration of computer software and hardware, deep learning has developed rapidly in computer vision, and image segmentation with deep learning has become a research hotspot. The U-Net network model is a fully convolutional neural network with a coding and decoding structure: the coding network part extracts target feature information, and the decoding network part recovers the detail information of the image and up-samples the feature maps with deconvolution to restore the size of the input picture. A skip connection structure is used between corresponding stages of the coding and decoding parts, so low-level feature information is reused and image detail information is better restored.
However, with the rapid development of society, more and more tasks are applied in increasingly complex scenes, which place higher demands on segmentation algorithms. Complex backgrounds and diverse class targets make the segmentation results inaccurate; meanwhile, deep learning algorithms often need to compute a large number of parameters, so real-time performance is difficult to achieve in environments with limited hardware.
Through the above analysis, the problems and defects of the prior art are as follows: existing segmentation methods perform poorly in complex scenes, the network model is prone to wrong segmentation and missed segmentation, the numbers of network parameters and computations are large, and training and inference efficiency is low.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a lightweight image segmentation method, system, medium, terminal and application, and particularly relates to a lightweight image segmentation method based on U-Net.
The invention is realized in such a way that a light-weight image segmentation method based on U-Net comprises the following steps:
step one, acquiring a training data set, and cutting and data-enhancing the images and the real label maps in the training data set;
step two, constructing an improved U-Net coding and decoding network, and training the constructed improved U-Net coding and decoding network with the images and real label maps in the cut training data set;
and step three, carrying out lightweight image segmentation with the trained improved U-Net coding and decoding network.
Further, the cutting and data enhancement processing of the image in the training data set and the real label map in the step one includes:
the method comprises the steps of cutting images and real label images in a training data set into 256 × 256 sizes in a sliding window mode with 256 pixels as step lengths, conducting oversampling cutting on objects with less class occupation ratio in data in order to relieve the problem of class imbalance of the data set, increasing the occupation ratio of the objects with less classes, and adopting a data enhancement mode in order to enhance the generalization and robustness of a network, specifically turning cut images at 90 degrees, 180 degrees and 270 degrees through an OpenCV library, and simultaneously conducting online enhancement by using a PyTorch depth learning framework to enable the images in the same batch to be turned over at random level.
Further, in step two, the codec network of the improved U-Net comprises:
an encoding end, which replaces the encoding part of the original U-Net network with an EfficientNetV2-S network, i.e. takes the EfficientNetV2-S network as the encoder of U-Net; the 5 feature maps output by the encoder at 1/2, 1/4, 1/8, 1/16 and 1/32 of the input resolution are used, and a multi-scale convolution fusion module is used during encoding;
and a decoding end, which decodes the 5 feature maps output by the encoder at 1/2, 1/4, 1/8, 1/16 and 1/32 using a convolutional structural re-parameterization method combined with a channel attention mechanism to obtain the segmentation result.
Further, the encoding by using the multi-scale convolution fusion module includes:
a multi-scale convolution module and a down-sampling module;
the multi-scale convolution module is used for extracting features by convolution with different convolution kernel sizes and splicing and fusing feature graphs obtained by the multi-scale convolution;
and the down-sampling module is used for transmitting the fused characteristic information to a deep network.
Further, the multi-scale convolution module is sequentially provided with three parallel convolutions with kernel sizes of 3 × 3, 5 × 5 and 7 × 7, a splicing of the three branch features, and a 3 × 3 convolution for fusion.
Further, the down-sampling module comprises:
a maximum pooling downsampling unit for performing feature map dimension change by 1 × 1 convolution;
a down-sampling unit for performing down-sampling by adopting a 3 × 3 convolution with a step length of 2;
and the fusion unit is used for fusing the maximum pooling downsampling unit and the downsampling unit by summing corresponding elements, and applying the SiLU function for nonlinear activation.
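A minimal PyTorch sketch of the two modules described above follows; the channel arguments and module names are illustrative assumptions, not the patent's implementation.

```python
import torch
import torch.nn as nn

class MultiScaleConv(nn.Module):
    """Parallel 3x3 / 5x5 / 7x7 convolutions, splicing, then a 3x3 fusion convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.b3 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.b5 = nn.Conv2d(in_ch, out_ch, 5, padding=2)
        self.b7 = nn.Conv2d(in_ch, out_ch, 7, padding=3)
        self.fuse = nn.Conv2d(3 * out_ch, out_ch, 3, padding=1)

    def forward(self, x):
        cat = torch.cat([self.b3(x), self.b5(x), self.b7(x)], dim=1)
        return self.fuse(cat)

class DownSample(nn.Module):
    """Max-pool + 1x1 conv branch fused with a stride-2 3x3 conv branch, then SiLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.pool = nn.Sequential(nn.MaxPool2d(2), nn.Conv2d(in_ch, out_ch, 1))
        self.conv = nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.pool(x) + self.conv(x))
```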
Further, the decoding of the 5 feature maps output by the encoder at 1/2, 1/4, 1/8, 1/16 and 1/32 using a convolutional structural re-parameterization method combined with a channel attention mechanism comprises:
taking the 1/32 feature map as the input of the decoding end, and splicing and fusing the 1/2, 1/4, 1/8 and 1/16 feature maps with the decoding end as skip-connection feature information;
expanding the size of the feature map to 2 times its original size by deconvolution, gradually restoring the feature map to the resolution of the original image, and splicing it with the coding features output by each layer of the encoding end; and mapping the feature map into a specific number of categories with an output layer to predict pixel categories, so as to obtain the segmentation result.
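The sketch below illustrates one decoding step of this kind (2× deconvolution, splicing with the skip feature, convolution, and a 1 × 1 output head). The plain convolution block here stands in for the RepVGG / RepVGG-SE modules described next, and the channel and class counts are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DecoderStage(nn.Module):
    """One decoding step: 2x deconvolution, splicing with the skip feature, convolution."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(out_ch + skip_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True))

    def forward(self, x, skip):
        x = self.up(x)                   # double the spatial size
        x = torch.cat([x, skip], dim=1)  # splice with the skip-connection feature
        return self.conv(x)

# Output head: map the final feature map to per-pixel class scores.
# 24 channels follows the description of the last decoding stage; 5 classes is illustrative.
head = nn.Conv2d(24, 5, kernel_size=1)
```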
Further, the decoding end includes:
the RepVGG module and the RepVGG-SE module;
the RepVGG module is used for enhancing the feature extraction capability and avoiding the problem of gradient disappearance;
and the RepVGG-SE module is used for improving the segmentation performance of the model, strengthening useful characteristics and inhibiting invalid information.
Further, the RepVGG module includes:
constructing an information flow as y = b(x) + g(x) + f(x); if the dimensions of x and f(x) do not match, y = g(x) + f(x), where b(x), g(x) and f(x) denote x passing through a BN layer (the identity branch), a 1 × 1 convolution layer and a 3 × 3 convolution layer, respectively; during inference, the trained model is converted: the convolution layer and BN layer of each branch are merged, and the multiple branch structures are integrated by linear combination, so that the branched network architecture becomes equivalent to y = h(x), where h(x) is implemented by a single 3 × 3 convolution layer followed by a ReLU activation layer.
Further, the RepVGG-SE module comprises:
on the basis of the RepVGG module, an SE channel attention mechanism is combined in an identity branch structure.
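A training-time sketch of such a block is shown below; it is an assumption-based illustration of the branch layout (BN identity branch, 1 × 1 branch, 3 × 3 branch, ReLU), not the patent's exact code. The RepVGG-SE variant additionally applies the SE channel attention described later on the identity branch.

```python
import torch
import torch.nn as nn

class RepVGGBlock(nn.Module):
    """Training-time structure: y = BN(x) + conv1x1(x) + conv3x3(x), then ReLU.
    The identity (BN) branch is dropped when input/output dimensions do not match."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.identity = (nn.BatchNorm2d(in_ch)
                         if in_ch == out_ch and stride == 1 else None)
        self.conv1x1 = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
            nn.BatchNorm2d(out_ch))
        self.conv3x3 = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_ch))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        y = self.conv3x3(x) + self.conv1x1(x)
        if self.identity is not None:
            y = y + self.identity(x)
        return self.relu(y)
```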
Further, in the second step, the training of the constructed codec network of the improved U-Net by using the image in the cut training data set and the real label graph includes:
inputting the images in the cut training data set into a constructed improved U-Net coding and decoding network by adopting a training strategy combining warmup and cosine annealing to obtain a segmentation result graph corresponding to the training images; comparing the obtained segmentation result graph with the real label graph, and calculating to obtain a loss function loss value; judging whether the loss function loss value is converged or not, and if not, updating the model parameters through back propagation; and when the loss function loss value is converged, obtaining the trained improved U-Net coding and decoding network.
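A minimal training-loop sketch of this procedure is given below; the optimizer, learning rate and epoch count are illustrative assumptions (the warmup plus cosine annealing schedule is sketched later in the embodiment, and the loss function is defined below).

```python
import torch

def train(model, loader, loss_fn, epochs=100, device="cuda"):
    """Illustrative training loop: forward pass, loss against the real label map,
    back-propagation, parameter update."""
    model.to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    for epoch in range(epochs):
        model.train()
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            preds = model(images)            # segmentation result map
            loss = loss_fn(preds, labels)    # compare with the real label map
            optimizer.zero_grad()
            loss.backward()                  # back-propagate to update parameters
            optimizer.step()
    return model
```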
Further, the loss function is as follows:
L_CE = -(1/N) Σ_i Σ_{c=1..M} y_ic log(p_ic)
y'_ic = y_ic(1 - ε) + ε/M
L_SCE = -(1/N) Σ_i Σ_{c=1..M} y'_ic log(p_ic)
L_Dice = 1 - 2|X ∩ Y| / (|X| + |Y|)
L_all = λ·L_SCE + μ·L_Dice
where L_CE represents the cross entropy loss, L_SCE represents the cross entropy loss smoothed with labels, L_Dice denotes the Dice loss, and L_all represents the total loss function; N represents the total number of samples, M represents the number of classes, y_ic represents the real category of sample i, taking 1 if the real category of sample i is c and 0 otherwise; y'_ic denotes the real category of sample i after label smoothing, p_ic represents the prediction result that sample i belongs to class c, ε represents the hyperparameter of the smoothing amount, |X ∩ Y| represents the intersection of sets X and Y, and |X| and |Y| represent the numbers of their elements; for the segmentation task, X denotes the Ground Truth segmented image, Y denotes the predicted segmented image, and λ and μ denote weighting coefficients.
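The sketch below is one way to implement this combined loss in PyTorch; the function name and the small smoothing constant in the Dice denominator are assumptions, and the defaults ε = 0.1, λ = μ = 0.5 follow the values used in the embodiment below.

```python
import torch
import torch.nn.functional as F

def smoothed_ce_dice_loss(logits, target, eps=0.1, lam=0.5, mu=0.5):
    """L_all = lam * L_SCE + mu * L_Dice, following the formulas above."""
    m = logits.shape[1]
    log_p = F.log_softmax(logits, dim=1)
    one_hot = F.one_hot(target, m).permute(0, 3, 1, 2).float()
    smooth = one_hot * (1 - eps) + eps / m            # y'_ic = y_ic(1 - eps) + eps / M
    l_sce = -(smooth * log_p).sum(dim=1).mean()       # label-smoothed cross entropy
    p = log_p.exp()
    inter = (p * one_hot).sum(dim=(0, 2, 3))
    union = p.sum(dim=(0, 2, 3)) + one_hot.sum(dim=(0, 2, 3))
    l_dice = 1 - (2 * inter / (union + 1e-7)).mean()  # soft Dice averaged over classes
    return lam * l_sce + mu * l_dice
```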
Further, the lightweight image segmentation method based on U-Net comprises the following steps: cutting the training data with a step of 256 pixels and performing data enhancement; inputting the images in the cut training data set into the constructed improved U-Net coding and decoding network to obtain a segmentation result map corresponding to the training images; comparing the obtained segmentation result map with the real label map, and calculating a loss function loss value; judging whether the loss function loss value has converged, and if not, updating the model parameters through back propagation; when the loss function loss value converges, the trained improved U-Net coding and decoding network is obtained. The trained improved U-Net coding and decoding network is then used to infer on the image to be segmented, obtaining the segmentation result map corresponding to the image.
Another object of the present invention is to provide a lightweight image segmentation system including:
the training data set image and real label image cutting processing module is used for acquiring a training data set and cutting the images in the training data set and the real label images;
the improved U-Net coding and decoding network training module is used for constructing an improved U-Net coding and decoding network and training the constructed improved U-Net coding and decoding network by utilizing the images in the cut training data set and the real label images;
and the lightweight image segmentation module is used for carrying out lightweight image segmentation by utilizing the trained improved U-Net coding and decoding network.
Another object of the present invention is to provide an information data processing terminal for implementing the method for lightweight image segmentation based on U-Net.
Another object of the present invention is to provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to execute the method for lightweight image segmentation based on U-Net.
The invention also aims to provide an application of the light-weight image segmentation method based on the U-Net in image segmentation in the fields of scene object segmentation, automatic driving, remote sensing image analysis and medical image analysis.
In combination with the technical solutions and the technical problems to be solved, the advantages and positive effects of the technical solutions to be protected by the present invention are analyzed from the following aspects:
first, aiming at the technical problems existing in the prior art and the difficulty in solving the problems, the technical problems to be solved by the technical scheme of the present invention are closely combined with results, data and the like in the research and development process, and some creative technical effects are brought after the problems are solved. The specific description is as follows:
the U-Net network model has poor image segmentation performance in a complex scene, the network model is easy to have the phenomena of wrong division and missed division, the network parameter quantity and the calculated quantity are large, and the training and reasoning efficiency is low. The U-Net network model is limited in segmentation performance in a complex scene, wrong segmentation and missed segmentation are easy to occur in a segmentation result, the essential reason is that the network model is limited in feature extraction capability, feature information of a target cannot be fully extracted in the complex scene, an image has rich spatial texture information and background information, the difficulty of feature extraction is increased, and meanwhile, the difficulty of correct classification of the classes is increased due to the fact that the scales of the objects in the classes in the image are changeable, the shapes of the objects are different in size, the edge structure of the image is complex and the like. For the low efficiency of network training and reasoning, the parameters and the calculated amount of a network model need to be reduced, so that the network becomes lighter. The method is based on deep learning, uses images as input, uses an EfficientNet V2-S network as a U-Net coding network, and enhances the feature extraction capability; a convolutional structure re-parameterization method is used in a decoding part, a multi-branch structure is adopted during training, the problem of gradient disappearance is avoided, meanwhile, the multi-branch structure enables a network to capture and utilize spatial information of feature maps with different scales, compared with a single-path architecture, the branch structure further integrates detail and semantic information, the extracted feature expression capacity is stronger, the training performance is improved, and a channel attention mechanism is combined, so that the extraction of effective features is enhanced, and useless information is suppressed; the feature map containing more feature information of different scales is obtained by combining a multi-scale convolution module, and the feature information is transmitted to a deep network through a down-sampling module, so that the extraction capability of the network on target features of different scales is improved, and the context information is better combined. Deep separable convolution is largely used in the EfficientNet V2-s, so that the overall parameter quantity and the calculated quantity of the network are reduced, and meanwhile, the deep separable convolution is replaced by common 3 multiplied by 3 convolution in a shallow Fused-MBConv module, so that the network speed can be obviously improved; the decoding network starts from 256 channels to the last 24 channels, compared with the original U-Net network, the number of the channels is greatly reduced from 1024, which is also the key point for greatly reducing the parameter and the calculation amount; after training, the branch structures are fused into a common 3 x 3 convolution, so that the parameter quantity and the calculated quantity during reasoning can be further reduced on the basis of the branch structures, and the advantage of optimizing and accelerating the 3 x 3 convolution by the bottom layer is fully utilized, so that the running speed during network reasoning is further increased. 
Experimental results show that the method improves the image segmentation performance and the segmentation efficiency.
Secondly, considering the technical scheme as a whole or from the perspective of products, the technical effect and advantages of the technical scheme to be protected by the invention are specifically described as follows:
the invention improves the image segmentation performance and considers the segmentation efficiency at the same time. The method is based on deep learning, uses the image as input, uses EfficientNet V2-S as a coding network, improves the feature extraction capability and is lighter; when a convolutional structure re-parameterization method is used for training, the multi-branch structure is used for enhancing the network training performance, the multi-branch structure is fused into a single-path structure during reasoning, the reasoning efficiency is improved, a channel attention mechanism is used in branches, useful features are strengthened, and invalid information is suppressed; the network coding part is combined with a multi-scale convolution module to fuse multi-scale features, and a down-sampling module is used for transmitting shallow information to a deep network, so that the extraction capability of the network on target features of different scales is enhanced, and context information is better combined.
Drawings
FIG. 1 is a schematic diagram of a lightweight image segmentation method based on U-Net according to an embodiment of the present invention.
FIG. 2 is a flowchart of a lightweight image segmentation method based on U-Net according to an embodiment of the present invention.
Fig. 3 is an overall structure diagram of the U-Net network according to the embodiment of the present invention.
Fig. 4 is an overall structure diagram of the deep learning network according to the embodiment of the present invention.
Fig. 5 is a structural diagram of a RepVGG module provided by an embodiment of the invention.
Fig. 6 is a block diagram of an SE channel attention module according to an embodiment of the present invention.
Fig. 7 is a structural diagram of the RepVGG-SE module provided by an embodiment of the invention.
Fig. 8 is a structural diagram of a multi-scale convolution module according to an embodiment of the present invention.
Fig. 9 is a block diagram of a downsampling module according to an embodiment of the present invention.
Fig. 10 is a graph illustrating a learning rate variation of a training strategy according to an embodiment of the present invention.
Fig. 11 is an image and real label cutting diagram provided by an embodiment of the present invention.
Fig. 12 is a diagram of an image segmentation result provided by the embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
First, an embodiment is explained. This section is an explanatory embodiment expanding on the claims so as to fully understand how the present invention is embodied by those skilled in the art.
As shown in fig. 1, the principle of the lightweight image segmentation method based on U-Net according to the embodiment of the present invention includes: cutting the training data by 256 steps, performing data enhancement operation, inputting the images in the cut training data set into a constructed improved U-Net coding and decoding network, and obtaining a segmentation result graph corresponding to the training images; comparing the obtained segmentation result graph with the real label graph, and calculating to obtain a loss function loss value; judging whether the loss function loss value is converged or not, and if not, updating the model parameters through back propagation; and when the loss function loss value is converged, obtaining the trained improved U-Net coding and decoding network.
As shown in fig. 2, a light-weight image segmentation method based on U-Net according to an embodiment of the present invention includes:
s101, acquiring a training data set, and cutting images in the training data set and a real label image;
s102, constructing an improved U-Net coding and decoding network, and training the constructed improved U-Net coding and decoding network by using the images in the cut training data set and the real label images;
and S103, carrying out light-weight image segmentation by using the trained improved U-Net coding and decoding network.
The method for cutting the images in the training data set and the real label images comprises the following steps:
the method comprises the steps of cutting images and real label images in a training data set into 256 × 256 sizes in a sliding window mode with 256 pixels as step lengths, conducting oversampling cutting on objects with less class occupation ratio in data in order to relieve the problem of class imbalance of the data set, increasing the occupation ratio of the objects with less classes, and adopting a data enhancement mode in order to enhance the generalization and robustness of a network, specifically turning cut images at 90 degrees, 180 degrees and 270 degrees through an OpenCV library, and simultaneously conducting online enhancement by using a PyTorch depth learning framework to enable the images in the same batch to be turned over at random level.
The encoding and decoding network of the improved U-Net provided by the embodiment of the invention comprises:
an encoding end, which replaces the encoding part of the original U-Net network with an EfficientNetV2-S network, i.e. takes the EfficientNetV2-S network as the encoder of U-Net; the 5 feature maps output by the encoder at 1/2, 1/4, 1/8, 1/16 and 1/32 of the input resolution are used, and a multi-scale convolution fusion module is used during encoding;
and a decoding end, which decodes the 5 feature maps output by the encoder at 1/2, 1/4, 1/8, 1/16 and 1/32 using a convolutional structural re-parameterization method combined with a channel attention mechanism to obtain the segmentation result.
The encoding by using the multi-scale convolution fusion module provided by the embodiment of the invention comprises the following steps:
a multi-scale convolution module and a down-sampling module;
the multi-scale convolution module is used for extracting features by convolution with different convolution kernel sizes and splicing and fusing feature graphs obtained by the multi-scale convolution;
and the down-sampling module is used for transmitting the fused characteristic information to a deep network.
The multi-scale convolution module provided by the embodiment of the invention is sequentially provided with three parallel convolutions with kernel sizes of 3 × 3, 5 × 5 and 7 × 7, a splicing of the three branch features, and a 3 × 3 convolution for fusion.
The down-sampling module provided by the embodiment of the invention comprises:
a maximum pooling downsampling unit for performing feature map dimension change by 1 × 1 convolution;
a down-sampling unit for performing down-sampling by adopting a 3 × 3 convolution with a step length of 2;
and the fusion unit is used for fusing the maximum pooling downsampling unit and the downsampling unit by summing corresponding elements, and applying the SiLU function for nonlinear activation.
The decoding, provided by the embodiment of the invention, of the 5 feature maps output by the encoder at 1/2, 1/4, 1/8, 1/16 and 1/32 using a convolutional structural re-parameterization method combined with a channel attention mechanism comprises:
taking the 1/32 feature map as the input of the decoding end, and splicing and fusing the 1/2, 1/4, 1/8 and 1/16 feature maps with the decoding end as skip-connection feature information;
expanding the size of the feature map to 2 times its original size by deconvolution, gradually restoring the feature map to the resolution of the original image, and splicing it with the coding features output by each layer of the encoding end; and mapping the feature map into a specific number of categories with an output layer to predict pixel categories, so as to obtain the segmentation result.
The decoding end provided by the embodiment of the invention comprises:
the RepVGG module and the RepVGG-SE module;
the RepVGG module is used for enhancing the feature extraction capability and avoiding the problem of gradient disappearance;
and the RepVGG-SE module is used for enhancing useful features.
The RepVGG module provided by the embodiment of the invention comprises:
constructing an information flow as y = b(x) + g(x) + f(x); if the dimensions of x and f(x) do not match, y = g(x) + f(x), where b(x), g(x) and f(x) denote x passing through a BN layer (the identity branch), a 1 × 1 convolution layer and a 3 × 3 convolution layer, respectively; during inference, the trained model is converted: the convolution layer and BN layer of each branch are merged, and the multiple branch structures are integrated by linear combination, so that the branched network architecture becomes equivalent to y = h(x), where h(x) is implemented by a single 3 × 3 convolution layer followed by a ReLU activation layer.
The RepVGG-SE module provided by the embodiment of the invention comprises:
on the basis of the RepVGG module, an SE channel attention mechanism is combined in an identity branch structure.
The training of the constructed encoding and decoding network of the improved U-Net by using the images in the cut training data set and the real label graph, which is provided by the embodiment of the invention, comprises the following steps:
inputting the images in the cut training data set into the constructed improved U-Net coding and decoding network with a training strategy combining warmup and cosine annealing to obtain a segmentation result map corresponding to the training images; comparing the obtained segmentation result map with the real label map, and calculating a loss function loss value; judging whether the loss function loss value has converged, and if not, updating the model parameters through back propagation; when the loss function loss value converges, the trained improved U-Net coding and decoding network is obtained.
The loss function provided by the embodiment of the invention is as follows:
L_CE = -(1/N) Σ_i Σ_{c=1..M} y_ic log(p_ic)
y'_ic = y_ic(1 - ε) + ε/M
L_SCE = -(1/N) Σ_i Σ_{c=1..M} y'_ic log(p_ic)
L_Dice = 1 - 2|X ∩ Y| / (|X| + |Y|)
L_all = λ·L_SCE + μ·L_Dice
where L_CE represents the cross entropy loss, L_SCE represents the cross entropy loss with label smoothing, L_Dice denotes the Dice loss, and L_all represents the total loss function; N represents the total number of samples, M represents the number of classes, y_ic represents the real category of sample i, taking 1 if the real category of sample i is c and 0 otherwise; y'_ic denotes the real category of sample i after label smoothing, p_ic represents the prediction result that sample i belongs to class c, ε represents the hyperparameter of the smoothing amount, |X ∩ Y| represents the intersection of sets X and Y, and |X| and |Y| represent the numbers of their elements; for the segmentation task, X denotes the Ground Truth segmented image, Y denotes the predicted segmented image, and λ and μ denote weighting coefficients.
The present invention also provides a lightweight image segmentation system including: the training data set image and real label image cutting processing module is used for acquiring a training data set and cutting the images in the training data set and the real label images;
the improved U-Net coding and decoding network training module is used for constructing an improved U-Net coding and decoding network and training the constructed improved U-Net coding and decoding network by utilizing the images in the cut training data set and the real label images;
and the lightweight image segmentation module is used for carrying out lightweight image segmentation by utilizing the trained improved U-Net coding and decoding network.
And II, application embodiment. In order to prove the creativity and the technical value of the technical scheme of the invention, the part is an application example of the technical scheme of the claims to a specific product or related technology.
The lightweight image segmentation method based on the U-Net is based on deep learning, an improved U-Net coding and decoding network is constructed, images serve as input of the network, and the network is output as a segmentation result graph of the corresponding images; the image segmentation method applied to remote sensing image segmentation comprises the following steps:
s1, preparing the training data set "AI classification and recognition competition of CCF satellite images"; cutting the data set images and real label maps into 256 × 256 patches in a sliding-window manner with a step of 256 pixels; performing oversampling cutting on objects of under-represented classes in the data; and performing data enhancement, specifically rotating the cut images by 90°, 180° and 270° with the OpenCV library and performing online enhancement with the PyTorch deep learning framework so that images in the same batch are randomly horizontally flipped, as shown in fig. 11;
s2, the base U-Net network, as shown in fig. 3, comprises: U-Net is composed of an encoding part and a decoding part. In the coding network, before each down-sampling, 2 convolution layers with 3 × 3 convolution kernels are used for feature extraction; a ReLU activation function is used after each convolution, and a max pooling operation of size 2 × 2 is used to reduce the feature dimensionality and increase the receptive field. With each down-sampling, the image size is halved and the dimensionality is doubled; through repeated operation, high-level features of the image can be fully extracted and unnecessary information filtered out. The decoding part performs up-sampling with deconvolution, and after up-sampling also uses 2 convolution layers with 3 × 3 convolution kernels, so the detail information of the image is gradually restored and the feature map is finally restored to the size of the input picture. With each up-sampling, the image size is doubled and the dimensionality is halved. A skip connection structure is used between corresponding stages of the coding and decoding parts, so low-level feature information is reused and image detail information is better restored.
The U-Net network model is improved, and the whole network is divided into encoding and decoding, as shown in fig. 4: the whole network is divided into an encoding part and a decoding part. The encoding part of the original U-Net network is replaced with an EfficientNetV2-S network, which is composed of Fused-MBConv modules and MBConv modules. The encoder has 5 outputs, namely 5 feature maps at 1/2, 1/4, 1/8, 1/16 and 1/32 of the input resolution; the 1/32 feature map is used as the input of the decoding end, and the other 4 feature maps are spliced and fused with the decoding end as skip-connection coding features. The decoding part expands the size of the feature map to 2 times its original size through deconvolution, gradually restores the feature map to the resolution of the original image, performs splicing with the 4 coding features of the encoding part to fuse more feature information and detail information, and finally maps the feature map into a specific number of categories through an output layer to perform pixel category prediction and obtain the segmentation result. In the decoding network, enhancement is carried out by introducing the RepVGG-SE module, which is a multi-branch structure. The multi-branch structure is kept during training; compared with a single-path architecture, the branch structure further integrates detail information and semantic information, the extracted features have stronger expression capability, and the training performance is improved. During inference, convolutional structural re-parameterization converts the structure into a 3 × 3 convolution and an SEBlock, which improves inference speed and saves memory; the SE channel attention mechanism in the identity branch structure can establish dependencies along the channel dimension, enhance the expression of useful information and suppress invalid features. When the input dimension and the output dimension do not match, the RepVGG module has no identity branch, only the 1 × 1 and 3 × 3 convolution branches. The multi-scale convolution module extracts features with convolutions of different kernel sizes, then splices and fuses the feature maps obtained by the multi-scale convolution, and injects shallow information into the deep network through the down-sampling module, so that the multi-scale feature information is better fused into each layer of the network while detail information lost in the deep network is compensated.
The encoding part replaces the encoding part of the original U-Net network with an EfficientNetV2-S network; the encoder has 5 outputs, namely 5 feature maps at 1/2, 1/4, 1/8, 1/16 and 1/32 of the input resolution, the 1/32 feature map is used as the input of the decoding end, and the other 4 feature maps are spliced and fused with the decoding end as skip-connection coding features;
in the coding network, a multi-scale convolution module, as shown in fig. 8, performs feature extraction by convolution with different convolution kernel sizes, then performs splicing and fusion on feature maps obtained by the multi-scale convolution, and injects shallow information into a deep network through a down-sampling module, as shown in fig. 9.
In fig. 8, the multi-scale convolution module is sequentially provided with three parallel convolutions with kernel sizes of 3 × 3, 5 × 5 and 7 × 7, a splicing of the three branch features, and one 3 × 3 convolution for fusion.
In fig. 9, the down-sampling module includes two branches: one branch is maximum pooling down-sampling followed by a 1 × 1 convolution to change the feature map dimension, and the other branch is down-sampling with a 3 × 3 convolution with a stride of 2; finally the two branches are fused by summing corresponding elements, and the SiLU function is applied for nonlinear activation.
A decoding part, which enlarges the size of the feature graph to 2 times of the original size through deconvolution operation, gradually restores the feature graph to the original image resolution size, performs splicing operation with the coding features output by each layer of the coding part, and finally maps the feature graph into a specific number of categories through an output layer for pixel category prediction to obtain a segmentation result;
in the coding network, deepening processing is performed by introducing a ReptVGG module and a ReptVGG-SE module, as shown in FIGS. 5 and 7 (on the basis of the ReptVGG module, an attention mechanism of an SE channel is combined in an identity branch structure) respectively.
In fig. 5, the information flow is constructed as y = b(x) + g(x) + f(x); if the dimensions of x and f(x) do not match, y = g(x) + f(x), where b(x), g(x) and f(x) denote x passing through the BN (identity) branch, the 1 × 1 convolution branch and the 3 × 3 convolution branch, respectively. During inference, conversion is performed from the trained model: the convolution layer and BN layer of each branch are merged, and the multiple branch structures are integrated by linear combination. A 1 × 1 convolution corresponds to a degenerate 3 × 3 convolution, and the 1 × 1 convolution kernel is zero-padded to obtain a 3 × 3 convolution; the identity branch is a special 1 × 1 convolution with an identity matrix as its kernel. After this transformation, the RepVGG module has only one 3 × 3 convolution kernel, two 1 × 1 convolution kernels and three bias parameters; the three bias parameters can be directly added into one bias, and the new convolution kernel is obtained by adding the 1 × 1 convolution kernel parameters to the center point of the 3 × 3 convolution kernel. All branch features and the final bias are assigned to one new 3 × 3 convolution with bias, so the branched network architecture is equivalent to y = h(x), where h(x) is only the final 3 × 3 convolution with bias, followed by the ReLU activation layer.
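The fusion just described can be sketched as follows. It assumes the training-time block layout from the earlier RepVGG sketch (attributes conv3x3, conv1x1 and identity), a group count of 1, and equal input/output channels for the identity branch; helper names are illustrative.

```python
import torch
import torch.nn.functional as F

def fuse_conv_bn(weight, bn):
    """Fold a BN layer that follows a convolution into the conv weight and bias."""
    std = (bn.running_var + bn.eps).sqrt()
    w = weight * (bn.weight / std).reshape(-1, 1, 1, 1)
    b = bn.bias - bn.running_mean * bn.weight / std
    return w, b

def reparameterize(block):
    """Merge the 3x3, 1x1 and identity (BN) branches into one 3x3 kernel and bias."""
    w3, b3 = fuse_conv_bn(block.conv3x3[0].weight, block.conv3x3[1])
    w1, b1 = fuse_conv_bn(block.conv1x1[0].weight, block.conv1x1[1])
    w1 = F.pad(w1, [1, 1, 1, 1])            # a 1x1 kernel is a zero-padded 3x3 kernel
    w, b = w3 + w1, b3 + b1
    if block.identity is not None:          # identity branch = 1x1 conv with identity kernel
        wid = torch.zeros_like(w3)
        for i in range(w3.shape[0]):
            wid[i, i, 1, 1] = 1.0
        wid, bid = fuse_conv_bn(wid, block.identity)
        w, b = w + wid, b + bid
    return w, b                             # load into a plain nn.Conv2d(..., 3, padding=1)
```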
The two modules are multi-branch structures; the branch structure is kept during training, and convolutional structural re-parameterization converts it into a 3 × 3 convolution and an SEBlock during inference. The SE channel attention mechanism in the identity branch structure, as shown in fig. 6, includes: the SEBlock is divided into two processes, Squeeze and Excitation. First, a global average pooling operation is performed on the feature map to obtain the globally compressed features of the current feature map; the result then enters two fully connected layers, where the first fully connected layer reduces the dimensionality and the second restores it to the original number of channels, adding more nonlinear processing to fit the complex correlations among channels; finally, a Sigmoid layer generates channel attention weights between 0 and 1, which are multiplied element-wise with the original feature map, so that the most informative features are enhanced and useless features are suppressed.
The method can establish the dependency relationship on the channel dimension, enhance the expression of useful information and inhibit invalid characteristics. When the input dimension and the output dimension do not match, the RepVGG module has no identity branch, only 1 × 1 and 3 × 3 convolution branches.
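A compact sketch of such an SE block is shown below; the reduction ratio of 16 is an illustrative assumption.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: global average pooling, FC down, FC up, Sigmoid gate."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, x):
        n, c, _, _ = x.shape
        s = x.mean(dim=(2, 3))                           # Squeeze: global average pooling
        s = torch.relu(self.fc1(s))                      # Excitation: reduce dimensionality
        w = torch.sigmoid(self.fc2(s)).view(n, c, 1, 1)  # restore channels, weights in (0, 1)
        return x * w                                     # channel-wise re-weighting
```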
S3, inputting the cut picture into a segmentation model;
s4, the segmentation result graph of the current picture is output by the model;
s5, comparing the segmentation result image with the real label image, and calculating to obtain a loss function loss value;
the loss function refers to a cross entropy loss function with label smoothing and a Dice loss function, alleviates the problems of overfitting and class imbalance, and then weights the two together to form a final loss function:
L_CE = -(1/N) Σ_i Σ_{c=1..M} y_ic log(p_ic)
y'_ic = y_ic(1 - ε) + ε/M
L_SCE = -(1/N) Σ_i Σ_{c=1..M} y'_ic log(p_ic)
L_Dice = 1 - 2|X ∩ Y| / (|X| + |Y|)
L_all = λ·L_SCE + μ·L_Dice
in the formula, L_CE is the cross entropy loss, L_SCE is the cross entropy loss with label smoothing, L_Dice is the Dice loss, and L_all is the total loss function; N represents the total number of samples, M represents the number of classes, y_ic represents the real category of sample i, taking 1 if the real category of sample i is c and 0 otherwise, y'_ic denotes the real category of sample i after label smoothing, p_ic represents the prediction result that sample i belongs to class c, and ε represents the hyperparameter of the smoothing amount, taken here as 0.1; |X ∩ Y| represents the intersection of sets X and Y, |X| and |Y| represent the numbers of their elements, X denotes the Ground Truth segmented image, Y denotes the predicted segmented image, and λ and μ are weighting coefficients, both taken here as 0.5.
And S6, when the loss function loss value is not converged, updating the model parameters through back propagation, and when the loss value is converged, indicating that the segmentation model is trained completely.
The model training strategy uses a combination of warmup and cosine annealing, as shown in FIG. 10.
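One way to express such a schedule in PyTorch is sketched below; the warmup length, total epoch count and minimum scale are illustrative assumptions, not values stated in the patent.

```python
import math

def warmup_cosine_lambda(warmup_epochs=5, total_epochs=100, min_scale=0.01):
    """Linear warmup followed by cosine annealing, as a multiplier on the base LR."""
    def fn(epoch):
        if epoch < warmup_epochs:
            return (epoch + 1) / warmup_epochs
        t = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
        return min_scale + 0.5 * (1 - min_scale) * (1 + math.cos(math.pi * t))
    return fn

# usage with an optimizer defined elsewhere:
# scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_cosine_lambda())
# call scheduler.step() once per epoch after the training loop body
```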
And thirdly, evidence of relevant effects of the embodiment. The embodiment of the invention achieves some positive effects in the process of research and development or use, and has great advantages compared with the prior art, and the following contents are described by combining data, diagrams and the like in the test process.
TABLE 1 comparison of the model results
TABLE 2 model lightweight comparison
Through the above implementation process, the model is trained and tested on an NVIDIA GeForce 2070S (8 GB) graphics card. As shown in fig. 12, from left to right are the remote sensing image, the real label map, the FCN-8s segmentation result map, the SegNet segmentation result map, the DeeplabV3+ segmentation result map, the U-Net segmentation result map, the UNet++ segmentation result map, the U2-Net segmentation result map, the Attention U-Net segmentation result map, and the segmentation result map of the invention. Combining fig. 12 and Table 1, the evaluation indexes of the invention are the highest among all the compared methods; the wrong segmentation and missed segmentation of each target are better improved, targets of different scales can be better identified and completely segmented, the results are closer to the label pictures, and the segmentation details are finer than those of the other networks. As can be seen from Table 2, the parameters and computation of the method are greatly reduced compared with the original U-Net network: the parameter count is reduced by 9.64 M, the computation is reduced by nearly 88%, and the training and testing times are also reduced by more than 50%, the best among all the methods. The experimental results in fig. 12, Table 1 and Table 2 show that the method of the present invention has higher segmentation performance and is lighter. The effect of the method provided by the invention is obvious.
It should be noted that the embodiments of the present invention can be realized by hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided on a carrier medium such as a disk, CD-or DVD-ROM, programmable memory such as read only memory (firmware), or a data carrier such as an optical or electronic signal carrier, for example. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., or by software executed by various types of processors, or by a combination of hardware circuits and software, e.g., firmware.
The above description is only for the purpose of illustrating the present invention and the appended claims are not to be construed as limiting the scope of the invention, which is intended to cover all modifications, equivalents and improvements that are within the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A lightweight image segmentation method, comprising:
acquiring a training data set, and cutting images in the training data set and a real label image;
constructing an improved U-Net coding and decoding network, and training the constructed improved U-Net coding and decoding network by using the images in the cut training data set and the real label images;
and thirdly, carrying out lightweight image segmentation by using the trained improved U-Net coding and decoding network.
2. The method for lightweight image segmentation as set forth in claim 1, wherein the cutting the images in the training dataset and the true label map comprises:
cutting the images and the real label map in the training dataset to 256 × 256 dimensions;
the encoding and decoding network of the improved U-Net comprises:
an encoding end, which replaces the encoding part of the original U-Net network with an EfficientNetV2-S network, i.e. takes the EfficientNetV2-S network as the encoder of U-Net; the 5 feature maps output by the encoder at 1/2, 1/4, 1/8, 1/16 and 1/32 of the input resolution are used, and a multi-scale convolution fusion module is used during encoding;
and a decoding end, which decodes the 5 feature maps output by the encoder at 1/2, 1/4, 1/8, 1/16 and 1/32 using a convolutional structural re-parameterization method combined with a channel attention mechanism to obtain the segmentation result.
3. The lightweight image segmentation method according to claim 2, wherein the encoding using the multi-scale convolution fusion module includes:
a multi-scale convolution module and a down-sampling module;
the multi-scale convolution module is used for extracting features by convolution with different convolution kernel sizes and splicing and fusing feature graphs obtained by the multi-scale convolution;
the down-sampling module is used for transmitting the fused feature information to a deep network;
the multi-scale convolution module is sequentially provided with three parallel convolutions with kernel sizes of 3 × 3, 5 × 5 and 7 × 7, a splicing of the three branch features, and a 3 × 3 convolution for fusion;
the down-sampling module comprises:
a maximum pooling downsampling unit for performing feature map dimension change by 1 × 1 convolution;
a down-sampling unit for performing down-sampling by adopting a 3 × 3 convolution with a step length of 2;
and the fusion unit is used for fusing the maximum pooling downsampling unit and the downsampling unit by summing corresponding elements, and applying the SiLU function for nonlinear activation.
4. The lightweight image segmentation method as set forth in claim 2, wherein the decoding of the 5 feature maps output by the encoder at 1/2, 1/4, 1/8, 1/16 and 1/32 using a convolution structure re-parameterization method in combination with a channel attention mechanism comprises:
taking the 1/32 feature map as the input of the decoding end, and splicing and fusing the 1/2, 1/4, 1/8 and 1/16 feature maps with the decoding end as skip-connection feature information;
expanding the size of the feature map to 2 times its original size by deconvolution, gradually restoring the feature map to the resolution of the original image, and splicing it with the coding features output by each layer of the encoding end; and mapping the feature map into a specific number of categories with an output layer to predict pixel categories, so as to obtain the segmentation result;
the decoding end includes:
the system comprises a RepVGG module, a RepVGG-SE module, a multi-scale convolution module and a down-sampling module;
the RepVGG module is used for enhancing the feature extraction capability and avoiding gradient disappearance;
the RepVGG-SE module is used for enhancing useful characteristics;
the RepVGG module includes:
constructing an information flow as y = b(x) + g(x) + f(x); if the dimensions of x and f(x) do not match, y = g(x) + f(x), where b(x), g(x) and f(x) denote x passing through a BN layer, a 1 × 1 convolution layer and a 3 × 3 convolution layer branch, respectively; during inference, the trained model is converted: the convolution layer and BN layer of each branch are merged, and the multiple branch structures are integrated by linear combination, so that the network branch architecture is equivalent to y = h(x), where h(x) is implemented by a single 3 × 3 convolution layer followed by a ReLU activation layer;
the RepVGG-SE module comprises:
on the basis of the RepVGG module, an SE channel attention mechanism is combined in an identity branch structure.
5. The method for lightweight image segmentation as set forth in claim 1, wherein the training of the constructed codec network of the improved U-Net using the images in the cut training data set and the true label map comprises: inputting the images in the cut training data set into a constructed improved U-Net coding and decoding network by adopting a training strategy combining warmup and cosine annealing to obtain a segmentation result graph corresponding to the training images; comparing the obtained segmentation result graph with the real label graph, and calculating to obtain a loss function loss value; judging whether the loss function loss value is converged or not, and if not, updating the model parameters through back propagation; when the loss function loss value is converged, obtaining a trained improved U-Net coding and decoding network;
the loss function is as follows:
L_CE = -(1/N) Σ_i Σ_{c=1..M} y_ic log(p_ic)
y'_ic = y_ic(1 - ε) + ε/M
L_SCE = -(1/N) Σ_i Σ_{c=1..M} y'_ic log(p_ic)
L_Dice = 1 - 2|X ∩ Y| / (|X| + |Y|)
L_all = λ·L_SCE + μ·L_Dice
where L_CE represents the cross entropy loss, L_SCE represents the cross entropy loss with label smoothing, L_Dice denotes the Dice loss, and L_all represents the total loss function; N represents the total number of samples, M represents the number of classes, y_ic represents the real category of sample i, taking 1 if the real category of sample i is c and 0 otherwise; y'_ic denotes the real category of sample i after label smoothing, p_ic represents the prediction result that sample i belongs to class c, ε represents the hyperparameter of the smoothing amount, |X ∩ Y| represents the intersection of sets X and Y, and |X| and |Y| represent the numbers of their elements; for the segmentation task, X denotes the Ground Truth segmented image, Y denotes the predicted segmented image, and λ and μ denote weighting coefficients.
6. The method for lightweight image segmentation according to claim 1, wherein the method for lightweight image segmentation based on U-Net further comprises: cutting the training data with a step of 256 pixels and performing data enhancement; inputting the images in the cut training data set into the constructed improved U-Net coding and decoding network to obtain a segmentation result map corresponding to the training images; comparing the obtained segmentation result map with the real label map, and calculating a loss function loss value; judging whether the loss function loss value has converged, and if not, updating the model parameters through back propagation; when the loss function loss value converges, the trained improved U-Net coding and decoding network is obtained; and inferring on the image to be segmented with the trained improved U-Net coding and decoding network to obtain the segmentation result map corresponding to the image.
7. A lightweight image segmentation system for implementing the lightweight image segmentation method of claim 1, the lightweight image segmentation system comprising:
the training data set image and real label image cutting processing module is used for acquiring a training data set and cutting the images in the training data set and the real label images;
the improved U-Net coding and decoding network training module is used for constructing an improved U-Net coding and decoding network and training the constructed improved U-Net coding and decoding network by utilizing the images in the cut training data set and the real label images;
and the lightweight image segmentation module is used for carrying out lightweight image segmentation by utilizing the trained improved U-Net coding and decoding network.
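As an illustration only, the three modules of claim 7 could be wired together as plain Python callables; the class and attribute names below are hypothetical and not part of the claimed system.

```python
class LightweightSegmentationSystem:
    def __init__(self, crop_module, train_module, segment_module):
        self.crop = crop_module        # training-set image / real label map cutting module
        self.train = train_module      # improved U-Net coding and decoding network training module
        self.segment = segment_module  # lightweight image segmentation (inference) module

    def run(self, dataset_images, label_maps, image_to_segment):
        patches = self.crop(dataset_images, label_maps)
        model = self.train(patches)
        return self.segment(model, image_to_segment)
```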
8. An information data processing terminal, characterized in that the information data processing terminal is used for realizing the light-weight image segmentation method based on U-Net according to any one of claims 1 to 6.
9. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the U-Net based lightweight image segmentation method according to any one of claims 1 to 6.
10. Use of the U-Net based lightweight image segmentation method according to any one of claims 1 to 6 in scene object segmentation, automatic driving, remote sensing image analysis, and medical image analysis.
CN202210208581.2A 2022-03-03 2022-03-03 Lightweight image segmentation method, system, medium, terminal and application Pending CN114612477A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210208581.2A CN114612477A (en) 2022-03-03 2022-03-03 Lightweight image segmentation method, system, medium, terminal and application

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210208581.2A CN114612477A (en) 2022-03-03 2022-03-03 Lightweight image segmentation method, system, medium, terminal and application

Publications (1)

Publication Number Publication Date
CN114612477A true CN114612477A (en) 2022-06-10

Family

ID=81860997

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210208581.2A Pending CN114612477A (en) 2022-03-03 2022-03-03 Lightweight image segmentation method, system, medium, terminal and application

Country Status (1)

Country Link
CN (1) CN114612477A (en)


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024092590A1 (en) * 2022-11-03 2024-05-10 华为技术有限公司 Image processing method and apparatus, model training method and apparatus, and terminal device
CN115829962A (en) * 2022-11-25 2023-03-21 江南大学 Medical image segmentation device, training method and medical image segmentation method
CN115829962B (en) * 2022-11-25 2024-04-16 江南大学 Medical image segmentation device, training method, and medical image segmentation method
CN116596920A (en) * 2023-07-12 2023-08-15 国网江西省电力有限公司电力科学研究院 Real-time zero measurement method and system for long-string porcelain insulator unmanned aerial vehicle
CN116596920B (en) * 2023-07-12 2023-11-07 国网江西省电力有限公司电力科学研究院 Real-time zero measurement method and system for long-string porcelain insulator unmanned aerial vehicle
CN117574789A (en) * 2024-01-17 2024-02-20 广东工业大学 Method for improving depth measurement range of phase contrast optical coherence tomography
CN117574789B (en) * 2024-01-17 2024-05-10 广东工业大学 Method for improving depth measurement range of phase contrast optical coherence tomography
CN117975173A (en) * 2024-04-02 2024-05-03 华侨大学 Child evil dictionary picture identification method and device based on light-weight visual converter

Similar Documents

Publication Publication Date Title
CN114612477A (en) Lightweight image segmentation method, system, medium, terminal and application
CN111210443B (en) Deformable convolution mixing task cascading semantic segmentation method based on embedding balance
CN111210446B (en) Video target segmentation method, device and equipment
CN114445430B (en) Real-time image semantic segmentation method and system for lightweight multi-scale feature fusion
CN116580241B (en) Image processing method and system based on double-branch multi-scale semantic segmentation network
CN114241274A (en) Small target detection method based on super-resolution multi-scale feature fusion
CN115908772A (en) Target detection method and system based on Transformer and fusion attention mechanism
CN116596966A (en) Segmentation and tracking method based on attention and feature fusion
Xia et al. Pedestrian detection algorithm based on multi-scale feature extraction and attention feature fusion
Cui et al. MTSCD-Net: A network based on multi-task learning for semantic change detection of bitemporal remote sensing images
CN113393435B (en) Video saliency detection method based on dynamic context sensing filter network
Kang et al. YOLO-FA: Type-1 fuzzy attention based YOLO detector for vehicle detection
CN113066089B (en) Real-time image semantic segmentation method based on attention guide mechanism
Li et al. Weather-degraded image semantic segmentation with multi-task knowledge distillation
Zhang et al. Embarrassingly simple binarization for deep single imagery super-resolution networks
Jiang et al. Multi-level graph convolutional recurrent neural network for semantic image segmentation
US20240062347A1 (en) Multi-scale fusion defogging method based on stacked hourglass network
CN117152438A (en) Lightweight street view image semantic segmentation method based on improved deep LabV3+ network
CN116740570A (en) Remote sensing image road extraction method, device and equipment based on mask image modeling
CN116524307A (en) Self-supervision pre-training method based on diffusion model
CN115115972A (en) Video processing method, video processing apparatus, computer device, medium, and program product
CN112200055A (en) Pedestrian attribute identification method, system and device of joint countermeasure generation network
Jothi Lakshmi et al. TA-DNN—two stage attention-based deep neural network for single image rain removal
Atif et al. Efficient context integration through factorized pyramidal learning for ultra-lightweight semantic segmentation
CN113947530B (en) Image redirection method based on relative saliency detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination