CN112580649B - Semantic segmentation method based on regional context relation module - Google Patents

Semantic segmentation method based on regional context relation module

Info

Publication number
CN112580649B
CN112580649B (application CN202011478891.3A)
Authority
CN
China
Prior art keywords
region
feature
module
model
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011478891.3A
Other languages
Chinese (zh)
Other versions
CN112580649A (en)
Inventor
刘明皓
杜江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202011478891.3A priority Critical patent/CN112580649B/en
Publication of CN112580649A publication Critical patent/CN112580649A/en
Application granted granted Critical
Publication of CN112580649B publication Critical patent/CN112580649B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Abstract

The invention relates to a semantic segmentation method based on a regional context relation module, and belongs to the field of remote sensing image processing. The method comprises the following steps: S1: enhancing the remote sensing image; S2: constructing an RC-Module; S3: establishing a remote sensing image semantic segmentation model, RC-Net, based on the RC-Module; S4: MIOU test and evaluation. The RC-Module is a derivative of the attention mechanism within the semantic segmentation model: by learning the context relation around each region, it guides the model to learn the adjacency relations among regions, increases the amount of information available for classification from a statistical point of view, and thereby improves the classification precision of semantic segmentation. Meanwhile, the RC-Module is a plug-and-play module that can be combined with any existing semantic segmentation model to improve its precision.

Description

Semantic segmentation method based on regional context relation module
Technical Field
The invention belongs to the field of remote sensing image processing, and relates to a semantic segmentation method based on a regional context module.
Background
Semantic segmentation is a dense prediction task that performs pixel-level prediction for every pixel of an image. In the remote sensing field, where images offer high resolution and are readily obtainable, effective semantic segmentation models are urgently needed, and the attention mechanism can guide the learning process of a segmentation model in a targeted way so that the model learns a more precise feature representation of the image. Long et al. first applied the fully convolutional technique to semantic segmentation in 2015 and greatly improved the accuracy of the whole field; their practice of discarding the fully connected layer has been studied and adopted by later models ever since. Around 2015, U-Net, FCN, Seg-Net, Deconv-Net, DeeplabV1 and Parse-Net appeared in rapid succession, continuously broadening the applicability of semantic segmentation; many new techniques emerged during this period, such as the skip connections of Olaf et al., the unpooling of Vijay et al., the deconvolution of Hyeonwoo et al., and the atrous (dilated) convolution of the Deeplab series, which became the most widely adopted of these. In 2017, the authors of Non-Local successfully applied the idea of the attention mechanism from NLP to the field of semantic segmentation. The appearance of the attention mechanism provided a new line of research for semantic segmentation models, and many models have since been derived on its basis.
When a common semantic segmentation model performs prediction on an image, the boundaries of predicted regions are often blurred, disordered, or even misclassified, because the model does not learn the context relations between regions at the stage of learning and extracting image features; the attention mechanism can effectively address this problem. Many segmentation models in this field are derived from the attention mechanism: for example, CCNet designs a criss-cross attention module that emphasizes features along criss-cross paths, guiding the model to learn long-range feature relations to a certain extent; OCRNet designs an object context module so that the model learns an object-enhanced feature map. However, these models still do not learn the context relations between regions, so disordered and wrong classifications at region boundaries remain; a module that guides the model to learn regional context relations can effectively resolve this situation.
The present method enhances the segmentation precision of the model by taking the context relations among regions into account. Using the attention mechanism, it combines feature enhancement with attention-guided model learning and proposes a remote sensing image semantic segmentation method based on a region context relation module, which improves model precision to a certain extent and lets the model classify the boundaries between regions more accurately. The attention design trains quickly and occupies little extra memory. The RC-Module is plug-and-play: it can be combined with any semantic segmentation model as a regional context feature enhancement module, thereby improving model accuracy. From the viewpoint of information flow, Hengshuang Zhao designed a point-wise spatial attention module that considers the context relations between individual points, guiding the model to learn the influence relations among all pixels in an image; however, it does not consider the integrity of regions, treats each pixel independently, and easily produces salt-and-pepper prediction results. OCR-Net enhances object features so that the model learns object characteristics in a guided way, taking the features within an object's range as the object feature; this alleviates the salt-and-pepper effect but does not consider the context relations among regions. The remote sensing image semantic segmentation method based on the RC-Module performs semantic segmentation on remote sensing images and effectively learns the regional context characteristics of the image.
Disclosure of Invention
The invention provides a semantic segmentation method based on a region context relation module, comprising the following steps:
s1: enhancing the remote sensing image;
s2: constructing an RC-Module;
s3: establishing a remote sensing image semantic segmentation model RC-Net based on the RC-Module;
s4: MIOU test and evaluation.
Optionally, the S1 specifically includes:
s11: randomly crop the pictures to generate an additional data set equal in size to the original data set, add it to the original data set, and train the model on the combined set;
s12: select an image enhancement mode according to the characteristics of each category in the data set; during color jitter, if the image contains grassland, bare land, or other objects highly sensitive to color, reduce the color jitter range to 0.01; set the saturation, hue, and contrast jitter ranges of the image to 0.2 each; generate the same number of images as in S11 to replace the original data set;
s13: randomly flip the data set horizontally and vertically to generate the same number of images as in S12;
s14: randomly rotate the data set within a limited rotation range of 30 degrees to generate the same number of images as in S13;
s15: add Gaussian noise and salt-and-pepper noise to each image from S14 (a sketch of this five-step pipeline follows).
Optionally, the S2 specifically includes:
A semantic segmentation framework contains a feature extractor, i.e., a backbone composed of a series of convolution and pooling operations. The image passes through the backbone and its features are aggregated into a feature map P. The first step of the region context module is to generate a coarse region map R_soft on the basis of the feature P, calculated as:
r_i = f_i(x), i ∈ (0, K); R_soft = {r_0, r_1, …, r_(K-1)}
where x represents the original image, K represents the number of classes, f represents a convolution operation, and r represents the coarse region feature of the corresponding class.
On the basis of R_soft, the RC-Module uses the theory of the self-attention mechanism to design an autocorrelation module that computes the inter-region correlation W_ij:
W_ij = exp(r_i · r_j) / Σ_(k=0)^(K-1) exp(r_i · r_k)
where w_ij represents the influence factor of the j-th region on the i-th region.
At the same time, the pixel features P and the coarse region map R_soft are integrated to obtain the feature of each region, feature_soft-region:
(feature_soft-region)_i = unsqueeze(-1)(R_T(R_soft) · R_T'(P))_i, i ∈ (0, K)
where unsqueeze denotes adding a new dimension at a specified position, and R_T is an abbreviation for reshape and transpose; feature_soft-region is an N × C × K × 1 feature map, where N is the number of pictures, C the number of feature channels, and K the number of regions.
Taking the inter-region correlation W_ij as weights, the original coarse region feature_soft-region undergoes feature enhancement of regional relevance to obtain the region feature feature_R with enhanced regional context:
feature_R = W · feature_soft-region
The RC-Module thus designs a region context learning module following the idea of the attention mechanism. Finally, the region features with enhanced regional context are combined with the pixel-level features to form the integrated feature feature_region:
feature_region = R_T_1(R_T_2(P) · R_T(feature_R))
Finally, the most common skip-connection method links the integrated feature with the pixel features, yielding the enhanced feature F output by the RC-Module, so the final region context module is computed as:
F = cat(feature_region || P)
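For concreteness, the following is a minimal PyTorch sketch of an RC-Module built from these formulas; the class name RCModule and all variable names are illustrative assumptions, and the softmax-normalized dot product stands in for the self-attention correlation W_ij described above. The unsqueeze to N × C × K × 1 is folded into the batched matrix products.

```python
import torch
import torch.nn as nn

class RCModule(nn.Module):
    """Sketch of the region context relation module (names are illustrative)."""
    def __init__(self, channels: int, num_classes: int):
        super().__init__()
        self.region_conv = nn.Conv2d(channels, num_classes, kernel_size=1)

    def forward(self, P: torch.Tensor) -> torch.Tensor:
        N, C, H, W = P.shape
        # Step 1: coarse region map R_soft, one soft mask per class
        R = torch.softmax(self.region_conv(P), dim=1)          # N x K x H x W
        R_flat = R.view(N, -1, H * W)                          # N x K x HW
        P_flat = P.view(N, C, H * W)                           # N x C x HW
        # Step 2: per-region features feature_soft-region
        feat_region = torch.bmm(R_flat, P_flat.transpose(1, 2))  # N x K x C
        # Step 3: inter-region correlation W_ij via self-attention
        W = torch.softmax(
            torch.bmm(feat_region, feat_region.transpose(1, 2)), dim=-1)  # N x K x K
        # Step 4: region-context enhancement feature_R = W * feature_soft-region
        feat_R = torch.bmm(W, feat_region)                     # N x K x C
        # Step 5: redistribute region features back to pixel positions
        feat_pix = torch.bmm(feat_R.transpose(1, 2), R_flat)   # N x C x HW
        feat_pix = feat_pix.view(N, C, H, W)
        # Step 6: skip connection F = cat(feature_region || P)
        return torch.cat([feat_pix, P], dim=1)                 # N x 2C x H x W
```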
optionally, the S3 specifically includes:
DeeplabV3 is a multi-scale model that has been verified to work very well. Through its ASPP structure it preliminarily integrates the multi-scale features of an image using several different atrous (dilated) convolution rates; it also adopts the ParseNet approach and applies adaptive global pooling to obtain global information. The DeeplabV3 model is thus an effective model that considers both multiple scales and a certain amount of global context, so this method adopts DeeplabV3 as the feature extractor (backbone) of the model. The feature calculation formula of ASPP is:
Y_i = F_(d_i)(X), d_i ∈ D
where Y_i represents the output of the ASPP module, F represents the convolution operation performed with a given rate d_i, and D is the set of atrous rates; ASPP takes multi-scale information into account by gathering information at atrous rates of different sizes, and its commonly used D includes 1, 6, 12, and 18.
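As a sketch, an ASPP block along these lines can be written in PyTorch as follows; the channel sizes, the 3x3 kernel for the rate-1 branch, and the bilinear upsampling of the global pooling branch are illustrative assumptions rather than the patent's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Sketch of ASPP with rates D = {1, 6, 12, 18} plus global pooling."""
    def __init__(self, c_in=2048, c_out=256, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(c_in, c_out, 3, padding=r, dilation=r) for r in rates
        ])
        self.global_pool = nn.Sequential(          # ParseNet-style global branch
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(c_in, c_out, 1))
        self.project = nn.Conv2d(c_out * (len(rates) + 1), c_out, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        ys = [b(x) for b in self.branches]         # Y_i = F_(d_i)(X)
        g = F.interpolate(self.global_pool(x), size=(h, w),
                          mode='bilinear', align_corners=False)
        return self.project(torch.cat(ys + [g], dim=1))
```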
After the image features are integrated by the feature extractor DeeplabV3, they are fed into the RC-Module for context integration of the features, and finally a Decoder produces the prediction result.
The Decoder is composed of two 3x3 depthwise separable convolutions and one ordinary 1x1 convolution; the properties of depthwise separable convolution reduce the computational complexity of an ordinary decoder. The parameter count of an ordinary convolutional layer is:
P = K^2 × C_in × C_out
where P is the total number of parameters, K is the convolution kernel size (a square kernel is used by default), and C_in and C_out are the input and output channel numbers of the feature map.
The parameter count of a depthwise separable convolution is:
P = K^2 × C_in + C_in × C_out
Clearly the parameter count of the depthwise separable convolution is greatly reduced, and the computational complexity also drops from the original O(C_in × C_out) to O(C_in + C_out); decoupling the image channels greatly reduces the computation of the model.
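A quick numeric check of the two formulas above (bias terms ignored), using the illustrative values K = 3 and C_in = C_out = 256:

```python
def conv_params(k, c_in, c_out):
    return k * k * c_in * c_out            # standard convolution: K^2 * C_in * C_out

def dsconv_params(k, c_in, c_out):
    return k * k * c_in + c_in * c_out     # depthwise 3x3 + pointwise 1x1

print(conv_params(3, 256, 256))    # 589824
print(dsconv_params(3, 256, 256))  # 2304 + 65536 = 67840, roughly 8.7x fewer
```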
Optionally, the S4 specifically includes:
The calculation formula of MIoU (Mean Intersection over Union) is as follows:
MIoU = (1 / (K + 1)) × Σ_(i=0)^(K) p_ii / (Σ_(j=0)^(K) p_ij + Σ_(j=0)^(K) p_ji - p_ii)
where p_ij represents the number of pixels whose true class is i but which are predicted as class j, and K + 1 is the number of classes (including the empty class); p_ii is the number of correctly predicted pixels; p_ij and p_ji correspond to false positives and false negatives, respectively.
MIoU extends the IoU, a measure of the similarity of two sets, to multiple categories. Because of the particularity of the semantic segmentation task, when pixel accuracy is used the FP and FN counts easily dominate the overall score, leading to a wrong estimate of model precision, whereas MIoU does not suffer from this; MIoU is the most widely applied evaluation metric in the semantic segmentation field, so in the MIOU evaluation step MIoU is used as the measure of precision.
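A minimal numpy sketch of this metric, assuming a confusion matrix p of shape (K+1) × (K+1) with p[i, j] counting the pixels of true class i predicted as class j; the example values are illustrative only.

```python
import numpy as np

def mean_iou(p: np.ndarray) -> float:
    tp = np.diag(p)                              # p_ii: correct pixels per class
    denom = p.sum(axis=1) + p.sum(axis=0) - tp   # row sum + column sum - p_ii
    return float(np.mean(tp / np.maximum(denom, 1)))

# Example: two real classes plus an empty class
conf = np.array([[50, 2, 0],
                 [3, 40, 1],
                 [0, 1, 3]])
print(mean_iou(conf))   # about 0.787
```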
The present method enhances the segmentation precision of the model by taking the context relations among regions into account. Using the attention mechanism, it combines feature enhancement with attention-guided model learning and proposes a remote sensing image semantic segmentation method based on a region context relation module, which improves model precision to a certain extent and lets the model classify the boundaries between regions more accurately. The attention design trains quickly and occupies little extra memory. The RC-Module is plug-and-play: it can be combined with any semantic segmentation model as a regional context feature enhancement module, thereby improving model accuracy. From the viewpoint of information flow, Hengshuang Zhao designed a point-wise spatial attention module that considers the context relations between individual points, guiding the model to learn the influence relations among all pixels in an image; however, it does not consider the integrity of regions, treats each pixel independently, and easily produces salt-and-pepper prediction results. OCR-Net enhances object features so that the model learns object characteristics in a guided way, taking the features within an object's range as the object feature; this alleviates the salt-and-pepper effect but does not consider the context relations among regions. The remote sensing image semantic segmentation method based on the RC-Module performs semantic segmentation on remote sensing images and effectively learns the regional context characteristics of the image.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description.
Drawings
Fig. 1 is a schematic diagram of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
Wherein the showings are for the purpose of illustration only and not for the purpose of limiting the invention, shown in the drawings are schematic representations and not in the form of actual drawings; to better illustrate the embodiments of the present invention, some parts of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components; in the description of the present invention, it should be understood that if there is an orientation or positional relationship indicated by terms such as "upper", "lower", "left", "right", "front", "rear", etc., based on the orientation or positional relationship shown in the drawings, it is only for convenience of description and simplification of description, but it is not an indication or suggestion that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and therefore, the terms describing the positional relationship in the drawings are only used for illustrative purposes, and are not to be construed as limiting the present invention, and the specific meaning of the terms may be understood by those skilled in the art according to specific situations.
Fig. 1 is a schematic diagram of the present invention.
1 technical process and method
1.1 remote sensing image enhancement technical process
The remote sensing image enhancement technique used by the invention consists of five steps; the specific execution flow is as follows.
The process of remote sensing image enhancement is: (1) randomly crop the pictures to generate an additional data set equal in size to the original data set, add it to the original data set, and train the model on the combined set; (2) select an image enhancement mode according to the characteristics of each category in the data set; during color jitter, if the image contains grassland, bare land, or other objects highly sensitive to color, reduce the color jitter range to 0.01, set the saturation, hue, and contrast jitter ranges to 0.2 each, and generate the same number of images as in step S11 to replace the original data set; (3) randomly flip the data set horizontally and vertically to generate the same number of images as in S12; (4) randomly rotate the data set within a limited rotation range of 30 degrees to generate the same number of images as in S13; (5) add Gaussian noise and salt-and-pepper noise to each image from S14.
1.2 technical process of RC-Module
The RC-Module provided by the invention is constructed on the basis of the self-attention mechanism and a correlation attention mechanism.
A semantic segmentation framework contains a feature extractor, i.e., a backbone composed of a series of convolution and pooling operations. The image passes through the backbone and its features are aggregated into a feature map P. The first step of the region context module is to generate a coarse region map R_soft on the basis of the feature P, calculated as:
r_i = f_i(x), i ∈ (0, K); R_soft = {r_0, r_1, …, r_(K-1)}
where x represents the original image, K represents the number of classes, f represents a convolution operation, and r represents the coarse region feature of the corresponding class.
On the basis of R_soft, the RC-Module uses the theory of the self-attention mechanism to design an autocorrelation module that computes the inter-region correlation W_ij:
W_ij = exp(r_i · r_j) / Σ_(k=0)^(K-1) exp(r_i · r_k)
where w_ij represents the influence factor of the j-th region on the i-th region.
At the same time, the pixel features P and the coarse region map R_soft are integrated to obtain the feature of each region, feature_soft-region:
(feature_soft-region)_i = unsqueeze(-1)(R_T(R_soft) · R_T'(P))_i, i ∈ (0, K)
where unsqueeze denotes adding a new dimension at a specified position, and R_T is an abbreviation for reshape and transpose; feature_soft-region is an N × C × K × 1 feature map, where N is the number of pictures, C the number of feature channels, and K the number of regions.
Taking the inter-region correlation W_ij as weights, the original coarse region feature_soft-region undergoes feature enhancement of regional relevance to obtain the region feature feature_R with enhanced regional context:
feature_R = W · feature_soft-region
The RC-Module thus designs a region context learning module following the idea of the attention mechanism. Finally, the region features with enhanced regional context are combined with the pixel-level features to form the integrated feature feature_region:
feature_region = R_T_1(R_T_2(P) · R_T(feature_R))
Finally, the most common skip-connection method links the integrated feature with the pixel features, yielding the enhanced feature F output by the RC-Module, so the final region context module is computed as:
F = cat(feature_region || P)
1.3 technical Process of RC-Net
DeeplabV3 is a multi-scale model that has been verified to work very well. Through its ASPP structure it preliminarily integrates the multi-scale features of an image using several different atrous (dilated) convolution rates; it also adopts the ParseNet approach and applies adaptive global pooling to obtain global information. The DeeplabV3 model is thus an effective model that considers both multiple scales and a certain amount of global context, so this method adopts DeeplabV3 as the feature extractor (backbone) of the model. The feature calculation formula of ASPP is:
Y_i = F_(d_i)(X), d_i ∈ D
where Y_i represents the output of the ASPP module, F represents the convolution operation performed with a given rate d_i, and D is the set of atrous rates; ASPP takes multi-scale information into account by gathering information at atrous rates of different sizes, and its commonly used D includes 1, 6, 12, and 18.
After the image features are integrated by the feature extractor DeeplabV3, they are fed into the RC-Module for context integration of the features, and finally a Decoder produces the prediction result.
The Decoder is composed of two 3x3 depthwise separable convolutions and one ordinary 1x1 convolution; the properties of depthwise separable convolution reduce the computational complexity of an ordinary decoder. The parameter count of an ordinary convolutional layer is:
P = K^2 × C_in × C_out
where P is the total number of parameters, K is the convolution kernel size (a square kernel is used by default), and C_in and C_out are the input and output channel numbers of the feature map.
The parameter count of a depthwise separable convolution is:
P = K^2 × C_in + C_in × C_out
Clearly the parameter count of the depthwise separable convolution is greatly reduced, and the computational complexity also drops from the original O(C_in × C_out) to O(C_in + C_out); decoupling the image channels greatly reduces the computation of the model.
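A minimal PyTorch sketch of such a Decoder, with two 3x3 depthwise separable convolutions followed by a 1x1 prediction convolution; the channel widths, the normalization and activation layers, and the class count are illustrative assumptions.

```python
import torch.nn as nn

def depthwise_separable(c_in, c_out):
    """3x3 depthwise convolution followed by a 1x1 pointwise convolution."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_in, 3, padding=1, groups=c_in),  # depthwise 3x3
        nn.Conv2d(c_in, c_out, 1),                          # pointwise 1x1
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True))

def make_decoder(c_in=512, c_mid=256, num_classes=6):
    return nn.Sequential(
        depthwise_separable(c_in, c_mid),
        depthwise_separable(c_mid, c_mid),
        nn.Conv2d(c_mid, num_classes, 1))                   # 1x1 prediction conv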
1.4 MIOU test and evaluation
The calculation formula of MIoU (Mean Intersection over Union) is as follows:
MIoU = (1 / (K + 1)) × Σ_(i=0)^(K) p_ii / (Σ_(j=0)^(K) p_ij + Σ_(j=0)^(K) p_ji - p_ii)
where p_ij represents the number of pixels whose true class is i but which are predicted as class j, and K + 1 is the number of classes (including the empty class); p_ii is the number of correctly predicted pixels; p_ij and p_ji correspond to false positives and false negatives, respectively.
MIoU extends the IoU, a measure of the similarity of two sets, to multiple categories. Because of the particularity of the semantic segmentation task, when pixel accuracy is used the FP and FN counts easily dominate the overall score, leading to a wrong estimate of model precision, whereas MIoU does not suffer from this; MIoU is the most widely applied evaluation metric in the semantic segmentation field, so in the MIOU evaluation step MIoU is used as the measure of precision.
2 summary of the invention
The invention provides a plug-and-play RC-Module and, on the basis of DeeplabV3, designs RC-Net for learning the context relations of regions. RC-Net is a derivative of the attention mechanism: by separately designing region features and enhancing region correlation features, it combines the enhanced features for the final semantic segmentation. RC-Net is another extension and development of the attention mechanism in the semantic segmentation field, and it is believed that it can be widely used not only for remote sensing images but also in other fields in future research.

Claims (1)

1. A semantic segmentation method based on a region context relation module, characterized in that the method comprises the following steps:
s1: enhancing the remote sensing image;
s2: constructing an RC-Module;
s3: establishing a remote sensing image semantic segmentation model RC-Net based on RC-Module;
s4: MIOU test and evaluation;
the S1 specifically includes:
s11: randomly cropping the pictures to generate an additional data set equal in size to the original data set, adding it to the original data set, and training the model on the combined set;
s12: selecting an image enhancement mode according to the characteristics of each category in the data set; during color jitter, if the image contains grassland, bare land, or other objects highly sensitive to color, reducing the color jitter range to 0.01; setting the saturation, hue, and contrast jitter ranges of the image to 0.2 each; and generating the same number of images as in S11 to replace the original data set;
s13: randomly flipping the data set horizontally and vertically to generate the same number of images as in S12;
s14: randomly rotating the data set within a limited rotation range of 30 degrees to generate the same number of images as in S13;
s15: adding Gaussian noise and salt-and-pepper noise to each image from S14;
the S2 specifically includes:
a semantic segmentation framework contains a feature extractor, i.e., a backbone composed of a series of convolution and pooling operations; the image passes through the backbone and its features are aggregated into a feature map P; the first step of the region context module is to generate a coarse region map R_soft on the basis of the feature P, calculated as:
r_i = f_i(x), i ∈ (0, K); R_soft = {r_0, r_1, …, r_(K-1)}
wherein x represents the original image, K represents the number of categories, f represents a convolution operation, and r represents the coarse region feature of the corresponding category;
on the basis of R_soft, the RC-Module uses the theory of the self-attention mechanism to design an autocorrelation module for calculating the inter-region correlation W_ij:
W_ij = exp(r_i · r_j) / Σ_(k=0)^(K-1) exp(r_i · r_k)
wherein w_ij represents the influence factor of the j-th region on the i-th region;
at the same time, the pixel features P and the coarse region map R_soft are integrated to obtain the feature of each region, feature_soft-region:
(feature_soft-region)_i = unsqueeze(-1)(R_T(R_soft) · R_T(P))_i, i ∈ (0, K)
wherein unsqueeze represents adding a new dimension at a specified position, and R_T is an abbreviation for reshape and transpose; feature_soft-region is an N × C × K × 1 feature map, wherein N represents the number of pictures, C represents the number of feature channels, and K represents the number of regions;
taking the inter-region correlation W_ij as weights, the original coarse region feature_soft-region is subjected to feature enhancement of regional relevance to obtain the region feature feature_R with enhanced regional context features:
feature_R = W · feature_soft-region
the RC-Module designs a region context learning module using the idea of the attention mechanism, and the region features with enhanced regional context features are combined with the pixel-level features to form the integrated feature feature_region:
feature_region = R_T_1(R_T_2(P) · R_T(feature_R))
the integrated feature is linked with the pixel features by a skip-connection method to obtain the enhanced feature F output by the RC-Module, so the final region context module is calculated as:
F = cat(feature_region || P)
the S3 specifically includes:
the DeeplabV3 model is a multi-scale model; through the ASPP structure, multi-scale features of the image are preliminarily integrated using several different atrous convolution rates; the ParseNet method is adopted, and adaptive global pooling is used globally to obtain global information; the DeeplabV3 model is an effective model that considers both multiple scales and a certain global context relation, so DeeplabV3 is adopted as the feature extractor (backbone) of the model, wherein the feature calculation formula of ASPP is:
Y_i = F_(d_i)(X), d_i ∈ D
wherein Y_i represents the output of the ASPP module, F represents the convolution operation performed with a given rate d_i, D is the set of atrous rates, ASPP takes multi-scale information into account by gathering information at atrous rates of different sizes, and D is 1, 6, 12, and 18;
after the image features are integrated by the feature extractor DeeplabV3, they are fed into the RC-Module for context integration of the features, and finally a Decoder produces the prediction result;
the Decoder is composed of two 3x3 depthwise separable convolutions and one ordinary 1x1 convolution, and the properties of depthwise separable convolution reduce the computational complexity of an ordinary decoder; the parameters of the ordinary convolutional layer are calculated as follows:
P = K^2 × C_in × C_out
wherein P represents the total number of parameters, K represents the convolution kernel size, a square convolution kernel is used, and C_in and C_out represent the input and output channel numbers of the feature map;
the parameter calculation formula of the depthwise separable convolution is as follows:
P = K^2 × C_in + C_in × C_out
the S4 specifically includes:
the formula for calculating the mean intersection over union MIoU is as follows:
MIoU = (1 / (K + 1)) × Σ_(i=0)^(K) p_ii / (Σ_(j=0)^(K) p_ij + Σ_(j=0)^(K) p_ji - p_ii)
wherein p_ij represents the number of pixels whose true class is i but which are predicted as class j, and K + 1 is the number of categories, including the empty category; p_ii is the number of correctly predicted pixels; p_ij and p_ji correspond to false positives and false negatives, respectively.
CN202011478891.3A 2020-12-15 2020-12-15 Semantic segmentation method based on regional context relation module Active CN112580649B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011478891.3A CN112580649B (en) 2020-12-15 2020-12-15 Semantic segmentation method based on regional context relation module

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011478891.3A CN112580649B (en) 2020-12-15 2020-12-15 Semantic segmentation method based on regional context relation module

Publications (2)

Publication Number Publication Date
CN112580649A CN112580649A (en) 2021-03-30
CN112580649B true CN112580649B (en) 2022-08-02

Family

ID=75135153

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011478891.3A Active CN112580649B (en) 2020-12-15 2020-12-15 Semantic segmentation method based on regional context relation module

Country Status (1)

Country Link
CN (1) CN112580649B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113380237A (en) * 2021-06-09 2021-09-10 中国科学技术大学 Unsupervised pre-training speech recognition model for enhancing local dependency relationship and training method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062756A (en) * 2018-01-29 2018-05-22 重庆理工大学 Image, semantic dividing method based on the full convolutional network of depth and condition random field
CN110197182A (en) * 2019-06-11 2019-09-03 中国电子科技集团公司第五十四研究所 Remote sensing image semantic segmentation method based on contextual information and attention mechanism

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108257099B (en) * 2018-01-11 2021-09-10 重庆邮电大学 Self-adaptive infrared image enhancement method based on visual contrast resolution
CN109447994B (en) * 2018-11-05 2019-12-17 陕西师范大学 Remote sensing image segmentation method combining complete residual error and feature fusion
US11179064B2 (en) * 2018-12-30 2021-11-23 Altum View Systems Inc. Method and system for privacy-preserving fall detection
CN110097544A (en) * 2019-04-25 2019-08-06 武汉精立电子技术有限公司 A kind of display panel open defect detection method
CN111563508B (en) * 2020-04-20 2023-05-23 华南理工大学 Semantic segmentation method based on spatial information fusion
CN111797779A (en) * 2020-07-08 2020-10-20 兰州交通大学 Remote sensing image semantic segmentation method based on regional attention multi-scale feature fusion
CN111932553B (en) * 2020-07-27 2022-09-06 北京航空航天大学 Remote sensing image semantic segmentation method based on area description self-attention mechanism

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062756A (en) * 2018-01-29 2018-05-22 重庆理工大学 Image, semantic dividing method based on the full convolutional network of depth and condition random field
CN110197182A (en) * 2019-06-11 2019-09-03 中国电子科技集团公司第五十四研究所 Remote sensing image semantic segmentation method based on contextual information and attention mechanism

Also Published As

Publication number Publication date
CN112580649A (en) 2021-03-30

Similar Documents

Publication Publication Date Title
CN111310862B (en) Image enhancement-based deep neural network license plate positioning method in complex environment
CN108416250B (en) People counting method and device
CN111476292A (en) Small sample element learning training method for medical image classification processing artificial intelligence
WO2020114378A1 (en) Video watermark identification method and apparatus, device, and storage medium
CN106611420B (en) The SAR image segmentation method constrained based on deconvolution network and sketch map direction
CN112288011B (en) Image matching method based on self-attention deep neural network
CN109035300B (en) Target tracking method based on depth feature and average peak correlation energy
CN111652317B (en) Super-parameter image segmentation method based on Bayes deep learning
CN110569782A (en) Target detection method based on deep learning
CN109902584A (en) A kind of recognition methods, device, equipment and the storage medium of mask defect
CN109165658B (en) Strong negative sample underwater target detection method based on fast-RCNN
CN112116593A (en) Domain self-adaptive semantic segmentation method based on Gini index
CN109635653A (en) A kind of plants identification method
CN110517270B (en) Indoor scene semantic segmentation method based on super-pixel depth network
CN111881731A (en) Behavior recognition method, system, device and medium based on human skeleton
CN109344851A (en) Image classification display methods and device, analysis instrument and storage medium
CN112580649B (en) Semantic segmentation method based on regional context relation module
Rout et al. Walsh–Hadamard-kernel-based features in particle filter framework for underwater object tracking
Zhang et al. Visual saliency based object tracking
CN113436251B (en) Pose estimation system and method based on improved YOLO6D algorithm
CN113421268B (en) Semantic segmentation method based on deplapv 3+ network of multi-level channel attention mechanism
CN111738237B (en) Heterogeneous convolution-based target detection method for multi-core iteration RPN
CN107423771B (en) Two-time-phase remote sensing image change detection method
CN111967399A (en) Improved fast RCNN behavior identification method
CN112241758A (en) Apparatus and method for evaluating a saliency map determiner

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant