CN116188778A - Double-sided semantic segmentation method based on super resolution - Google Patents
Double-sided semantic segmentation method based on super resolution
- Publication number: CN116188778A (application CN202310159918.XA)
- Authority: CN (China)
- Prior art keywords: multiplied, resolution, feature map, super, semantic segmentation
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V10/26 — Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
- G06T3/4053 — Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
- G06V10/764 — Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
- G06V10/806 — Fusion, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level, of extracted features
- G06V10/82 — Image or video recognition or understanding using neural networks
Abstract
The invention discloses a bilateral semantic segmentation method based on super resolution. Built on super-resolution and attention-mechanism techniques, it adopts a dual-branch semantic segmentation design with high flexibility, and can improve the accuracy of image segmentation at low resolution without adding extra computation. By swapping the backbone network of the main branch, a more advanced semantic segmentation method can be substituted; meanwhile, image channels and pixels are associated and fused to obtain a high-accuracy segmentation result at high resolution, and the segmentation result is sent to a fusion module to guide the segmentation learning of the main branch.
Description
Technical Field
The invention relates to the field of image semantic segmentation, in particular to a bilateral semantic segmentation method based on super resolution.
Background
Over the past decade, deep-learning-based machine learning techniques have attracted widespread public attention. For example, the autonomous driving technology for new-energy vehicles developed in recent years is gradually being put into the hands of the general public. The foundation that makes this technology possible is image segmentation, which gives the machine the ability to identify roads, pedestrians, traffic lights, and ground markings. The whole deep learning process requires a suitable algorithm and a sufficient number of raw road-surface pictures for the machine to learn from, and image semantic segmentation is used throughout this process.
At present, image semantic segmentation is mainly applied in fields such as land-cover segmentation, autonomous driving, face segmentation, clothing classification, and precision agriculture. Unsolved problems remain in each of these fields. In land-cover segmentation, for example, large-scale publicly available data sets are needed for monitoring regional deforestation, urban growth, urban planning, and the like. In autonomous driving, the vehicle must perceive, plan, and execute the corresponding commands in a constantly changing environment; here safety is a link that cannot be neglected, and the task must be performed with the highest accuracy. In this task, semantic segmentation provides free-space information on roads and detects ground markings and traffic signs. However, the trade-off between real-time performance and segmentation accuracy remains a challenge.
In recent years, face recognition has been applied quite widely, and it often needs to run on smaller devices. The problem that follows is the need for fast, high-precision segmentation of pictures with relatively few pixels. In current semantic segmentation techniques, the best accuracy comes with high-resolution input, which means the amount of computation rises sharply and becomes difficult or even impossible on mobile devices or low-end terminals. Yet reducing the resolution of the image fed into the segmentation model causes a sharp drop in accuracy. According to experiments on the popular semantic segmentation model DeepLabv3+, segmentation accuracy is 70% with 512×1024 input, falls to 63.2% with 448×896 input, and is only 56.5% with 256×512 input. The drop in accuracy is quite obvious, and in scenes requiring accurate recognition it clearly fails to meet practical requirements. It is therefore necessary to provide a segmentation method that computes quickly at as low a resolution as possible while improving accuracy.
Disclosure of Invention
This section is intended to outline some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. Some simplifications or omissions may be made in this section, in the abstract, and in the title of the application to avoid obscuring their purpose; such simplifications or omissions should not be used to limit the scope of the invention.
The present invention has been made in view of the above problems in the prior art. The invention therefore provides a bilateral semantic segmentation method based on super resolution, to address the large amount of extra computation and low computation speed that image semantic segmentation incurs in practice, and the fact that when high-resolution input is pursued, some mobile devices and low-end terminals cannot carry out the computation, while lowering the input resolution greatly reduces segmentation accuracy.
In order to solve the technical problems, the invention provides the following technical scheme:
the invention provides a bilateral semantic segmentation method based on super resolution, comprising the following steps:
inputting the acquired image into the segmentation network of the main branch to obtain a corresponding feature map;
feeding the feature map into the slave branch to obtain a new feature map;
computing on image channels and on pixels respectively through the two sub-branches of the slave branch;
associating and fusing the image channels and the pixels to obtain a high-accuracy segmentation result at high resolution;
sending the segmentation result into a fusion module and, combining the fusion result, guiding the segmentation learning of the main branch.
As a preferable scheme of the super-resolution-based bilateral semantic segmentation method, the invention comprises the following steps: the image is input into a segmentation grid of a main branch to obtain a corresponding feature map, and the steps comprise:
through a backbone network, an input image is input into a pooling layer through a convolution layer, so that parameters in a parameter matrix and the number of parameters in a subsequent convolution layer are reduced, and meanwhile, the phenomenon of model overfitting is relieved; and then the image is operated for 3 times through the convolution layer by the same method, and a corresponding characteristic diagram is obtained.
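As a rough sketch of the pooling step just described, the following illustrative NumPy snippet (not taken from the patent) shows how a 2×2 max-pooling layer halves the spatial size of a feature map, which is what shrinks the activations and parameter cost of subsequent convolution layers:

```python
import numpy as np

def max_pool2x2(x):
    """2x2 max pooling with stride 2 on an (H, W, C) feature map.

    Halves the spatial resolution, so every subsequent convolution
    operates on a quarter of the activations, as the backbone stage
    described above intends.
    """
    h, w, c = x.shape
    x = x[: h - h % 2, : w - w % 2]          # drop odd remainder rows/cols
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

# A 224x224x64 feature map shrinks to 112x112x64 after one pooling step;
# repeating conv+pool stages keeps halving the spatial size.
feat = np.random.rand(224, 224, 64)
print(max_pool2x2(feat).shape)  # (112, 112, 64)
```

The layer and channel counts here are placeholders for illustration only.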
As a preferable scheme of the super-resolution-based bilateral semantic segmentation method, the invention comprises the following steps: the feature map is sent into a secondary branch, and a new feature map is obtained through reconstruction, and the method comprises the following steps: reconstructing a high-resolution picture from the branch, and inputting the high-resolution picture into the sub-branch to obtain a new feature map.
As a preferable scheme of the super-resolution-based bilateral semantic segmentation method, the invention comprises the following steps: the constructing a high resolution picture from a branch includes:
the sub-pixel convolution-based mode is adopted from the branch, and fine granularity structural information input in high resolution is effectively rebuilt according to the result of the feature map obtained by the main network, namely a single image super-resolution module;
As a preferable scheme of the super-resolution-based bilateral semantic segmentation method of the invention, the two sub-branches of the slave branch are an inter-channel attention module and an inter-pixel attention module.
As a preferable scheme of the super-resolution-based bilateral semantic segmentation method, the invention comprises the following steps: the inter-channel attention module calculates an image channel and pixels, and the method comprises the following steps:
carrying out reshape operation on the feature map A with the specification of H multiplied by W multiplied by C to obtain a feature map C multiplied by N, and then carrying out softmax operation once;
the characteristic diagram with the specification of C multiplied by C, which is obtained after softmax operation, is marked as X, the characteristic diagram is transposed and multiplied by B again, H multiplied by W multiplied by C is obtained through reshape operation, and then a coefficient beta is multiplied to obtain the characteristic diagram, which is marked as D;
adding the obtained feature map D with the feature map A to obtain a final result;
the initial value of the coefficient β is 0, which is the optimal value obtained after the debugging by deep learning.
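The channel-attention steps above can be sketched as follows. This is an illustrative NumPy reconstruction, not the patent's implementation; in particular, the reshape of A into B and the B·Bᵀ similarity product are inferred by analogy with the inter-pixel module, since the original text leaves them implicit:

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(a, beta=0.0):
    """Inter-channel attention sketch on a feature map a of shape (H, W, C).

    beta starts at 0 (as the text states) and would be learned in training.
    """
    h, w, c = a.shape
    b = a.reshape(h * w, c).T                 # B: C x N, with N = H*W
    x = softmax(b @ b.T, axis=-1)             # X: C x C channel similarities
    d = (x.T @ b).T.reshape(h, w, c) * beta   # D: back to H x W x C, scaled
    return a + d                              # residual addition to A

feat = np.random.rand(8, 8, 4)
out = channel_attention(feat, beta=0.1)
print(out.shape)  # (8, 8, 4)
```

With beta at its initial value 0 the module is an identity mapping, which matches the described initialization.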
As a preferable scheme of the super-resolution-based bilateral semantic segmentation method, the invention comprises the following steps: the inter-pixel attention module calculates an image channel and pixels, comprising:
the characteristic diagram A with the specification of H multiplied by W multiplied by C is subjected to a convolution layer to obtain a new characteristic diagram B with the specification of C multiplied by H multiplied by W; then carrying out reshape operation on the obtained product so that the size is changed into C multiplied by N, wherein N=H multiplied by W, and obtaining a new characteristic diagram S with the specification of N multiplied by N after softmax operation; the sum of each row in S is 1, S ij The pixel weight of the pixel at the j position to the pixel at the i position can be understood as that the sum of the weights of all the pixels j to a certain fixed pixel i is 1;
the obtained feature map S is transposed and then multiplied by a feature map B with the specification of C multiplied by N and subjected to reshape operation to obtain a feature map with the specification of C multiplied by N, and the feature map S is subjected to reshape operation to obtain a feature map with the specification of C multiplied by H multiplied by W;
multiplying the obtained characteristic diagram with the specification of C multiplied by H multiplied by W by a coefficient alpha, and then adding the characteristic diagram with the characteristic diagram A to obtain a final result;
the initial value of the coefficient α is 0, which is the optimal value obtained after the debugging through deep learning.
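The inter-pixel steps above can likewise be sketched in NumPy. This is an illustrative reconstruction, not the patent's implementation; `conv_w` is a hypothetical 1×1 convolution weight standing in for "a convolution layer", and the Bᵀ·B product that yields the N×N map S is inferred from the stated shapes:

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pixel_attention(a, conv_w, alpha=0.0):
    """Inter-pixel attention sketch on a feature map a of shape (H, W, C).

    conv_w: hypothetical (C, C) 1x1-conv weight; alpha starts at 0 and
    would be learned in training.
    """
    h, w, c = a.shape
    b = (a.reshape(h * w, c) @ conv_w).T      # B: C x N after the 1x1 conv
    s = softmax(b.T @ b, axis=-1)             # S: N x N, each row sums to 1
    out = (b @ s.T).T.reshape(h, w, c)        # C x N -> back to H x W x C
    return a + alpha * out                    # scaled residual addition to A

feat = np.random.rand(8, 8, 4)
out = pixel_attention(feat, conv_w=np.eye(4), alpha=0.1)
print(out.shape)  # (8, 8, 4)
```

As with β, setting α to its initial value 0 leaves the input unchanged.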
As a preferable scheme of the super-resolution-based bilateral semantic segmentation method, the invention comprises the following steps: the image channel and the pixel are associated and fused to obtain a high-accuracy segmentation result under high resolution, comprising the following steps: and carrying out element summation on the two convolution feature graphs generated in the inter-pixel attention module and the inter-channel attention module, and then sending the generated new feature graph into a convolution layer to obtain a final segmentation result.
As a preferable scheme of the super-resolution-based bilateral semantic segmentation method, the invention comprises the following steps: the feature map performs element summation, including: the two convolution feature graphs are respectively connected with one convolution layer, and then the summation operation of the two convolution layers is carried out.
As a preferable scheme of the super-resolution-based bilateral semantic segmentation method, the invention comprises the following steps: the segmentation result is sent to a fusion module, and the main branch is guided to carry out segmentation learning by combining the fusion result, comprising the following steps:
carrying out feature fusion on the feature map obtained in the claim 2 and the new feature map generated in the claim 8, so that the low-resolution picture obtains additional picture structure information;
searching a local optimal solution through a loss function of feature fusion, matching more proper super parameters, guiding semantic segmentation of a main branch and guiding segmentation learning of a module, and finally achieving an optimal semantic segmentation result;
the loss function for feature fusion is expressed as follows:
L = L_ce + w_1 · L_mse
where y_i represents the classification probability, S(X_i) represents the super-resolution output picture obtained by the super-resolution module, N represents the total number of pixel points of the current picture, and p_i represents the probability that pixel point i output by the main-branch segmentation network is judged to belong to the target class y; Y_i represents the true class of the current pixel point; w_1 is a hyper-parameter that can be adjusted during segmentation learning and is generally set to 0.1; L is the total loss function, composed of multiple terms: L_ce is the cross entropy commonly used as the semantic segmentation loss, and L_mse is the mean squared error.
Compared with the prior art, the invention has the following beneficial effects: the computation speed is markedly improved, and the accuracy of semantic segmentation is improved without additional computation; feature reconstruction is performed on the feature map generated at low resolution by means of super resolution, yielding fine-grained information at high resolution, where category boundaries are clearer; the serial attention-mechanism modules further improve segmentation accuracy; more importantly, the super-resolution module and attention-mechanism modules of the slave branch can be removed at the actual inference stage, so the higher computation cost is incurred only during training; under the same basic network architecture, adding the slave branch improves segmentation accuracy by 3-5% on low-resolution input.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. Wherein:
FIG. 1 is a flow chart of a super-resolution based bilateral semantic segmentation method according to one embodiment of the present invention;
FIG. 2 is a schematic diagram of an inter-channel attention module in a super-resolution based bilateral semantic segmentation method according to one embodiment of the present invention;
FIG. 3 is a schematic diagram of an inter-pixel attention module in a super-resolution based bilateral semantic segmentation method according to one embodiment of the present invention;
FIG. 4 is a graph of segmentation accuracy results of a super-resolution based bilateral semantic segmentation method according to one embodiment of the present invention;
fig. 5 is a schematic diagram of a super-resolution module of a super-resolution-based bilateral semantic segmentation method according to an embodiment of the present invention.
Detailed Description
So that the manner in which the above recited objects, features and advantages of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to the embodiments, some of which are illustrated in the appended drawings. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present invention is not limited to the specific embodiments disclosed below.
Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic can be included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
While the embodiments of the present invention have been illustrated and described in detail in the drawings, the cross-sectional view of the device structure is not to scale in the general sense for ease of illustration, and the drawings are merely exemplary and should not be construed as limiting the scope of the invention. In addition, the three-dimensional dimensions of length, width and depth should be included in actual fabrication.
Also in the description of the present invention, it should be noted that the orientation or positional relationship indicated by the terms "upper, lower, inner and outer", etc. are based on the orientation or positional relationship shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the apparatus or elements referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first, second, or third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The terms "mounted, connected, and coupled" should be construed broadly in this disclosure unless otherwise specifically indicated and defined, such as: can be fixed connection, detachable connection or integral connection; it may also be a mechanical connection, an electrical connection, or a direct connection, or may be indirectly connected through an intermediate medium, or may be a communication between two elements. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.
Example 1
Referring to fig. 1, a first embodiment of the present invention provides a method for bilateral semantic segmentation based on super resolution, including:
s1, inputting an image into a segmentation grid of a main branch to obtain a corresponding feature map; taking VGG16 as an example, removing the last full connection layer to obtain a feature map with the specification of 14 multiplied by 512;
s2, dividing the obtained feature image into two parts, and sending the first part into a Decoder module corresponding to the VGG16, wherein the semantic segmentation result is obtained from the bilinear difference value to the size of the input image; the second part is sent to a super resolution module;
s3, calculating an image channel and pixels respectively through two sub-branches under super-resolution, an inter-channel attention module and an inter-pixel attention module;
s4, the image channels and the pixels are associated and fused to obtain a high-accuracy segmentation result under high resolution;
s5, sending the segmentation result into a fusion module, guiding the semantic segmentation of the main branch and the segmentation learning of the guiding module by combining the fusion result, and finally achieving the optimal semantic segmentation result;
by the semantic segmentation method of the double branches, the computing speed of the system can be effectively improved on the premise of not improving the computing amount of the system.
Example 2
Referring to fig. 2 and 3, a second embodiment of the present invention provides a super-resolution-based bilateral semantic segmentation method, which includes: a block diagram of an inter-channel attention module and a block diagram of an inter-pixel attention module;
the inter-channel attention module of fig. 2 refines to:
reshaping the feature map A of size H×W×C into a feature map B of size C×N, where N=H×W; multiplying B by its transpose to obtain a C×C feature map and applying a softmax operation once;
denoting the C×C feature map obtained after the softmax operation as X; multiplying the transpose of X by B, reshaping the result back to H×W×C, and multiplying by a coefficient β to obtain the feature map denoted D;
adding the obtained feature map D to the feature map A to obtain the final result;
the inter-pixel attention module of fig. 3 refines to:
passing the feature map A of size H×W×C through a convolution layer to obtain a new feature map B of size C×H×W, then reshaping it so that its size becomes C×N, where N=H×W; multiplying the transpose of B by B and applying a softmax operation yields a new feature map S of size N×N. Each row of S sums to 1; S_ij can be understood as the attention weight of the pixel at position j on the pixel at position i, so that the weights of all pixels j with respect to a fixed pixel i sum to 1;
multiplying the feature map B of size C×N by the transpose of S gives a feature map of size C×N, which is reshaped into a feature map of size C×H×W;
multiplying the obtained C×H×W feature map by a coefficient α and adding it to the feature map A gives the final result;
the coefficients β and α are initialized to 0, and their optimal values are obtained through deep-learning training;
according to the invention, after the feature map passes through this series of operations in the inter-channel attention module and the inter-pixel attention module, the accuracy of image segmentation can be effectively improved.
Example 3
Referring to fig. 4, a third embodiment of the present invention provides a super-resolution-based bilateral semantic segmentation method, which includes:
2000 pictures from Cityscapes are randomly selected as the data set, their resolution is adjusted to 256×512, 320×640, 384×768, 448×896, and 512×1024 respectively, and each is input into the semantic segmentation method adopted by the invention (with VGG16, for example) to obtain the MIoU, i.e., the mean intersection-over-union results.
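The MIoU metric reported in this embodiment can be computed as in the following sketch (an illustrative NumPy implementation of the standard mean intersection-over-union, not code from the patent):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over the classes present in either map.

    pred, target: integer class-label arrays of the same shape.
    """
    ious = []
    for k in range(num_classes):
        inter = np.logical_and(pred == k, target == k).sum()
        union = np.logical_or(pred == k, target == k).sum()
        if union > 0:                 # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.array([0, 0, 1, 1])
target = np.array([0, 1, 1, 1])
print(mean_iou(pred, target, 2))  # ≈ 0.5833 (mean of 1/2 and 2/3)
```

In practice the per-class intersections and unions would be accumulated over all evaluation images before taking the mean.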
Example 4
Referring to fig. 5, a fourth embodiment of the present invention provides a bilateral semantic segmentation method based on super resolution, including:
Taking VGG16 as an example, a 14×14×512 feature map is obtained; to reach a super-resolution output of 448×448×3, a feature map of 112×112×12 is first obtained after two convolution layers; the 12 channels of each pixel point are rearranged into a 2×2×3 patch, and after all remaining pixels undergo the same operation, the generated patches are spliced together to obtain the reconstructed super-resolution image of 448×448×3. The formula is expressed as:
I_SR = f_L(I_LR)
where f_L is the combination of a convolution layer, which maps an H×W×C image to H×W×r²C, and a sub-pixel convolution layer; I_LR is the low-resolution picture and I_SR is the super-resolution picture;
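The sub-pixel rearrangement just described, in which each pixel's r²C channels are unpacked into an r×r×C patch, can be sketched as follows (an illustrative NumPy depth-to-space operation; the convolution layers preceding it are omitted):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Sub-pixel convolution's rearrangement step: an (H, W, r*r*C) map
    becomes an (r*H, r*W, C) image, each pixel's r*r*C channels unpacked
    into an r x r x C patch as described above."""
    h, w, rrc = x.shape
    c = rrc // (r * r)
    x = x.reshape(h, w, r, r, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(h * r, w * r, c)

# With r=2, a 112x112x12 feature map (12 channels = 2*2*3 per pixel)
# unpacks into a 224x224x3 image in one rearrangement step.
feat = np.random.rand(112, 112, 12)
print(pixel_shuffle(feat, 2).shape)  # (224, 224, 3)
```

A single 2×2 rearrangement of a 112×112×12 map yields 224×224×3; the 448×448×3 output stated in the example would require a further upscaling stage, which this sketch does not model.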
According to the invention, feature reconstruction is performed on the feature map generated at low resolution by means of super resolution, obtaining fine-grained information at high resolution and a clearer image.
It should be noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the present invention may be modified or substituted without departing from the spirit and scope of the technical solution of the present invention, which is intended to be covered in the scope of the claims of the present invention.
Claims (10)
1. A bilateral semantic segmentation method based on super resolution, characterized by comprising the following steps:
inputting the acquired image into the segmentation network of the main branch to obtain a corresponding feature map;
feeding the feature map into the slave branch to obtain a new feature map;
computing on image channels and on pixels respectively through the two sub-branches of the slave branch;
associating and fusing the image channels and the pixels to obtain a high-accuracy segmentation result at high resolution;
sending the segmentation result into a fusion module and, combining the fusion result, guiding the segmentation learning of the main branch.
2. The super-resolution-based bilateral semantic segmentation method of claim 1, wherein inputting the image into the segmentation network of the main branch to obtain a corresponding feature map comprises:
in the backbone network, passing the input image through a convolution layer and then a pooling layer, which reduces the parameters in the parameter matrix and the number of parameters in subsequent convolution layers while mitigating model overfitting; the image is then put through this same convolution-and-pooling operation three more times to obtain the corresponding feature map.
3. The super-resolution-based bilateral semantic segmentation method of claim 1 or 2, wherein feeding the feature map into the slave branch and reconstructing to obtain a new feature map comprises:
reconstructing a high-resolution picture in the slave branch, and inputting the high-resolution picture into the sub-branches to obtain a new feature map.
4. The super-resolution-based bilateral semantic segmentation method of claim 3, wherein reconstructing the high-resolution picture in the slave branch comprises:
the slave branch adopts a sub-pixel-convolution-based approach, namely a single-image super-resolution module, which effectively reconstructs the fine-grained structural information of the high-resolution input from the feature map produced by the backbone network.
5. The super-resolution-based bilateral semantic segmentation method as in claim 4, wherein the slave branch is divided into an inter-channel attention module and an inter-pixel attention module.
6. The super-resolution-based bilateral semantic segmentation method as in claim 4 or 5, wherein the calculation performed by the inter-channel attention module comprises the steps of:
the feature map A of size H×W×C is reshaped to obtain a feature map B of size C×N, where N = H×W; B is multiplied by its transpose and a softmax operation is applied;
the C×C feature map obtained after the softmax operation is denoted X; X is transposed and multiplied by B, the product is reshaped to H×W×C, and the result is multiplied by a coefficient β to obtain a feature map denoted D;
the feature map D is added to the feature map A to obtain the final result;
the coefficient β is initialized to 0, and its optimal value is learned through deep learning.
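The channel-attention steps above can be sketched in NumPy (an illustration only, not the patented implementation; the small 4×4×8 shapes are assumptions):

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(a, beta=0.0):
    """Inter-channel attention on a feature map A of shape (H, W, C)."""
    h, w, c = a.shape
    b = a.reshape(h * w, c).T                # B: C x N, with N = H*W
    x = softmax(b @ b.T, axis=-1)            # X: C x C channel-affinity map
    d = (x.T @ b).T.reshape(h, w, c) * beta  # D: re-weighted features, scaled by beta
    return a + d                             # residual connection back to A

a = np.random.rand(4, 4, 8)
out = channel_attention(a, beta=0.0)  # beta initialized to 0: identity at start
assert np.allclose(out, a)
```

With β initialized to 0 the module is an identity map, so attention is blended in gradually as β is learned.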
7. The super-resolution-based bilateral semantic segmentation method as in claim 6, wherein the calculation performed by the inter-pixel attention module comprises:
the feature map A of size H×W×C is passed through a convolution layer to obtain a new feature map B of size C×H×W; B is then reshaped to size C×N, where N = H×W, multiplied by its transpose, and a softmax operation is applied to obtain a new feature map S of size N×N; each row of S sums to 1, and S_ij can be understood as the weight of the pixel at position j with respect to the pixel at position i, so that the weights of all pixels j with respect to a fixed pixel i sum to 1;
the feature map S is transposed and multiplied by the reshaped feature map B of size C×N to obtain a feature map of size C×N, which is reshaped to size C×H×W;
the C×H×W feature map obtained in the previous step is multiplied by a coefficient α and added to the feature map A to obtain the final result;
the coefficient α is initialized to 0, and its optimal value is learned through deep learning.
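The inter-pixel (position) attention steps can likewise be sketched in NumPy; this is an illustration only, where `w_conv` stands in for the 1×1 convolution of the claim and the shapes are assumptions:

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pixel_attention(a, w_conv, alpha=0.0):
    """Inter-pixel attention on A (H, W, C); w_conv plays the role of the
    convolution layer that produces B."""
    h, w, c = a.shape
    b = (a.reshape(h * w, c) @ w_conv).T  # B: C x N, with N = H*W
    s = softmax(b.T @ b, axis=-1)         # S: N x N, each row sums to 1
    d = (b @ s.T).reshape(c, h, w)        # aggregate features per position
    d = d.transpose(1, 2, 0) * alpha      # back to (H, W, C), scaled by alpha
    return a + d                          # residual connection back to A

a = np.random.rand(4, 4, 8)
w_conv = np.random.rand(8, 8)
out = pixel_attention(a, w_conv, alpha=0.0)  # alpha initialized to 0
assert np.allclose(out, a)
```

Row i of S holds the weights of every pixel j with respect to pixel i, and the softmax guarantees each row sums to 1, matching the claim.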
8. The super-resolution-based bilateral semantic segmentation method as in claim 7, wherein associating and fusing the image-channel and pixel results to obtain a high-accuracy segmentation result at high resolution comprises:
the two convolution feature maps generated by the inter-pixel attention module and the inter-channel attention module are summed element-wise, and the resulting new feature map is fed into a convolution layer to obtain the final segmentation result.
9. The super-resolution-based bilateral semantic segmentation method according to claim 7 or 8, wherein the element-wise summation of the feature maps comprises:
each of the two convolution feature maps is first passed through its own convolution layer, and the outputs of the two convolution layers are then summed.
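The per-branch convolution, element-wise sum, and final convolution of claims 8 and 9 can be sketched in NumPy using 1×1 convolutions (an illustration; the weight shapes and class count are assumptions):

```python
import numpy as np

def conv1x1(x, w):
    """A 1x1 convolution over (H, W, C_in) is a per-pixel matrix multiply."""
    h, wd, c_in = x.shape
    return (x.reshape(h * wd, c_in) @ w).reshape(h, wd, -1)

def fuse(pixel_feat, channel_feat, w_p, w_c, w_out):
    """Pass each attention output through its own conv layer, sum the
    results element-wise, then apply a final conv for per-class scores."""
    summed = conv1x1(pixel_feat, w_p) + conv1x1(channel_feat, w_c)
    return conv1x1(summed, w_out)  # (H, W, num_classes)

h, w, c, num_classes = 4, 4, 8, 3
p = np.random.rand(h, w, c)
q = np.random.rand(h, w, c)
logits = fuse(p, q, np.random.rand(c, c), np.random.rand(c, c),
              np.random.rand(c, num_classes))
assert logits.shape == (4, 4, 3)
```

Giving each branch its own convolution before the sum lets the network learn how much each attention pathway contributes at every channel.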
10. The super-resolution-based bilateral semantic segmentation method as in claim 9, wherein the segmentation result is sent to the fusion module and the main branch is guided to perform segmentation learning in combination with the fusion result, comprising:
the feature map obtained in claim 2 is fused with the new feature map generated in claim 8, so that the low-resolution picture gains additional picture structure information;
and a local optimal solution is sought through the loss function of the feature fusion, more suitable hyperparameters are matched, and the semantic segmentation of the main branch and the segmentation learning of the module are guided, finally achieving an optimal semantic segmentation result.
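One way the combined objective could look is sketched below; the claim does not specify the loss terms, so the cross-entropy segmentation loss, the mean-squared fusion term, and the weight `lam` are all assumptions for illustration:

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean per-pixel cross-entropy for logits (H, W, K), integer labels (H, W)."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = e / e.sum(axis=-1, keepdims=True)
    h, w, k = logits.shape
    picked = probs.reshape(-1, k)[np.arange(h * w), labels.ravel()]
    return -np.log(picked).mean()

def total_loss(seg_logits, labels, main_feat, fused_feat, lam=0.1):
    """Segmentation loss plus a fusion term pulling the main branch's
    features toward the fused high-resolution features."""
    fusion_term = np.mean((main_feat - fused_feat) ** 2)
    return cross_entropy(seg_logits, labels) + lam * fusion_term

logits = np.random.rand(4, 4, 3)
labels = np.random.randint(0, 3, size=(4, 4))
f1 = np.random.rand(4, 4, 8)
f2 = f1.copy()  # identical features -> fusion term is zero
loss = total_loss(logits, labels, f1, f2, lam=0.1)
assert loss > 0 and np.isfinite(loss)
```

Minimizing such a joint loss is one concrete reading of "guiding the main branch's segmentation learning with the fusion result".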
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310159918.XA CN116188778A (en) | 2023-02-23 | 2023-02-23 | Double-sided semantic segmentation method based on super resolution |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116188778A true CN116188778A (en) | 2023-05-30 |
Family
ID=86434242
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN117409208A (en) * | 2023-12-14 | 2024-01-16 | 武汉纺织大学 | Real-time clothing image semantic segmentation method and system
CN117409208B (en) * | 2023-12-14 | 2024-03-08 | 武汉纺织大学 | Real-time clothing image semantic segmentation method and system
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||