CN112801104B - Image pixel level pseudo label determination method and system based on semantic segmentation

Image pixel level pseudo label determination method and system based on semantic segmentation

Info

Publication number
CN112801104B
CN112801104B (application CN202110074943.9A)
Authority
CN
China
Prior art keywords
image
pixel
map
semantic segmentation
target
Prior art date
Legal status
Expired - Fee Related
Application number
CN202110074943.9A
Other languages
Chinese (zh)
Other versions
CN112801104A (en)
Inventor
于哲舟
张哲�
王碧琳
李志远
王兰亭
赵凤志
Current Assignee
Jilin University
Original Assignee
Jilin University
Priority date
Filing date
Publication date
Application filed by Jilin University
Priority to CN202110074943.9A
Publication of CN112801104A
Application granted
Publication of CN112801104B
Status: Expired - Fee Related
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 - Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 - Complex mathematical operations
    • G06F17/16 - Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 - Distances to prototypes
    • G06F18/24137 - Distances to cluster centroïds
    • G06F18/2414 - Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Abstract

The invention relates to a semantic segmentation-based image pixel level pseudo label determination method and system. The method comprises: acquiring a first image and performing feature extraction on it to obtain a first feature map; obtaining a second feature map and a plurality of third feature maps from the first feature map; further obtaining a plurality of first pixel relation measurement matrices and, by fusing them, a second pixel relation measurement matrix; obtaining a fourth feature map from the second feature map and the second pixel relation measurement matrix; further obtaining a tensor matrix and the functional relation between the tensor matrix and the image output probability; training a classification network with the loss function corresponding to this functional relation; obtaining a target location map and a background target map from the trained classification network and the fourth feature map; training a semantic segmentation network model with the first image, the target location map and the background target map; and inputting the image to be detected into the trained semantic segmentation network model to obtain image pixel-level pseudo labels. The invention can thus obtain pixel-level pseudo labels for a segmentation network.

Description

Image pixel level pseudo label determination method and system based on semantic segmentation
Technical Field
The invention relates to the field of image semantic segmentation, in particular to a semantic segmentation-based image pixel level pseudo tag determination method and system.
Background
Because semantic segmentation labels require annotating every pixel of an image, producing them costs a great deal of time and effort. The generation of training data sets has long been the bottleneck of semantic segmentation research. Obtaining pixel-level annotations for a given image in an inexpensive and efficient manner is therefore a promising direction for future semantic segmentation work.
Disclosure of Invention
The invention aims to provide a method and a system for determining image pixel-level pseudo labels based on semantic segmentation, which can obtain pixel-level pseudo labels for a segmentation network from images and their image-level classification labels.
In order to achieve the purpose, the invention provides the following scheme:
a semantic segmentation based image pixel level pseudo label determination method comprises the following steps:
acquiring an initial image and preprocessing the initial image to obtain a first image;
performing feature extraction on the first image by using a feature extractor to obtain a first feature map;
inputting the first feature map into a dilated-convolution pixel relation model to obtain a second feature map and a plurality of third feature maps;
performing a matrix product between the second feature map and each third feature map to correspondingly obtain a plurality of first pixel relation measurement matrices;
averaging the plurality of first pixel relation measurement matrices to obtain a second pixel relation measurement matrix;
performing a matrix product between the second feature map and the second pixel relation measurement matrix to obtain a fourth feature map;
inputting the fourth feature map into a global average pooling layer to obtain a tensor matrix;
inputting the tensor matrix into a softmax classification layer for classification to obtain a functional relation between the tensor matrix and the image output probability;
training a classification network with the loss function corresponding to this functional relation to obtain a trained classification network;
obtaining a target location map and a background target map according to the trained classification network and the fourth feature map;
obtaining a semantic segmentation network model;
training the semantic segmentation network model according to the first image, the target location map and the background target map to obtain a trained semantic segmentation network model;
and inputting the image to be detected into the trained semantic segmentation network model to obtain the image pixel level pseudo label.
Optionally, the initial image is randomly scaled to a size in the range [321, 481], and the picture is then cropped to 321 × 321 to obtain the first image.
Optionally, the feature extractor is an improved VGG-16 network model in which the last two pooling layers of the VGG-16 model structure are removed.
Optionally, the functional relation between the tensor matrix and the image output probability is

$P_n = \frac{\exp(w_n^{\top} F_C)}{\sum_{k} \exp(w_k^{\top} F_C)}$

where $w_n$ denotes the weight parameter of $F_C$ for class n, $P_n$ denotes the image output probability of class n, and $F_C$ denotes the tensor matrix.
Optionally, the semantic segmentation network model is a DeepLab-ASPP network model.
Optionally, the initial image employs a PASCAL VOC 2012 data set.
Optionally, after the step of inputting the first feature map into the dilated-convolution pixel relation model to obtain a second feature map and a plurality of third feature maps, and before the step of performing a matrix product between the second feature map and each third feature map to obtain the plurality of first pixel relation measurement matrices, the method further includes:
and performing size reshaping on the second feature map and the plurality of third feature maps.
Optionally, the first feature map is obtained from the conv5_3 layer of the improved VGG-16 network model.
Optionally, the first feature map size is C × H × W, where C is the number of channels and H and W are the feature map height and width.
An image pixel level pseudo tag determination system based on semantic segmentation, comprising:
the system comprises a preprocessing module, a first image acquisition module, a second image acquisition module and a second image acquisition module, wherein the preprocessing module is used for acquiring an initial image and preprocessing the initial image to obtain a first image;
the characteristic extraction module is used for extracting the characteristics of the first image by using a characteristic extractor to obtain a first characteristic diagram;
the first input module is used for inputting the first feature map into the dilated-convolution pixel relation model to obtain a second feature map and a plurality of third feature maps;
the first matrix product operation module is used for performing a matrix product between the second feature map and each third feature map to correspondingly obtain a plurality of first pixel relation measurement matrices;
the matrix fusion module is used for averaging the plurality of first pixel relation measurement matrices to obtain a second pixel relation measurement matrix;
the second matrix product operation module is used for performing a matrix product between the second feature map and the second pixel relation measurement matrix to obtain a fourth feature map;
the second input module is used for inputting the fourth feature map into a global average pooling layer to obtain a tensor matrix;
the classification module is used for inputting the tensor matrix into a softmax classification layer for classification to obtain a functional relation between the tensor matrix and the image output probability;
the first network training module is used for training a classification network according to the loss function corresponding to the function relation to obtain the trained classification network;
the target location map and background target map determining module is used for obtaining a target location map and a background target map according to the trained classification network and the fourth feature map;
the model acquisition module is used for acquiring a semantic segmentation network model;
the second network training module is used for training the semantic segmentation network model according to the first image, the target location map and the background target map to obtain a trained semantic segmentation network model;
and the pseudo label determining module is used for inputting the image to be detected into the trained semantic segmentation network model to obtain the image pixel level pseudo label.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
according to the invention, a deep neural classification network is designed to provide a high-quality pseudo label training segmentation network for a segmentation network, so that image semantic segmentation is carried out. A void convolution pixel relation network is provided, a pixel relation model between a void convolution characteristic diagram and a general convolution characteristic diagram is generated by combining the void convolution and an attention mechanism in a classification network, and a class excitation diagram generated by the classification network can highlight a more complete target area, so that a segmentation network pseudo label with higher quality is generated, and the segmentation capability of the segmentation network is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.
FIG. 1 is a flow chart of a semantic segmentation image pixel level pseudo tag determination method of the present invention;
FIG. 2 is a diagram of the dilated-convolution pixel relation network architecture of the present invention;
FIG. 3 is a diagram of the dilated-convolution pixel relation model of the present invention;
FIG. 4 is a block diagram of a semantic segmentation image pixel level pseudo tag determination system according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention aims to provide a method and a system for determining image pixel-level pseudo labels based on semantic segmentation, which can obtain the pixel-level pseudo labels of a segmentation network through images and labels of the classification network.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
As shown in fig. 1, the invention discloses a method for determining semantic segmented image pixel level pseudo labels, comprising:
step 101: the method comprises the steps of obtaining an initial image and preprocessing the initial image to obtain a first image.
Step 102: and performing feature extraction on the first image by using a feature extractor to obtain a first feature map.
Step 103: and inputting the first characteristic diagram into a cavity convolution pixel relation model to obtain a second characteristic diagram and a plurality of third characteristic diagrams.
Step 104: and performing matrix product operation on the second characteristic diagram and each third characteristic diagram respectively to correspondingly obtain a plurality of first pixel relation measurement matrixes.
Step 105: and carrying out average fusion on the plurality of first pixel relation measurement matrixes to obtain a second pixel relation measurement matrix.
Step 106: and performing matrix product operation on the second characteristic diagram and the second pixel relation measurement matrix to obtain a fourth characteristic diagram.
Step 107: and inputting the fourth feature map into a global average pooling layer to obtain a tensor matrix.
Step 108: and inputting the tensor matrix into a softmax classification layer for classification to obtain a functional relation between the tensor matrix and the image output probability.
Step 109: and training a classification network according to the loss function corresponding to the function relationship to obtain the trained classification network.
Step 110: and obtaining a target position image and a background target image according to the trained classification network and the fourth feature image.
Step 111: and acquiring a semantic segmentation network model.
Step 112: and training the semantic segmentation network model according to the first image, the target position graph and the background target graph to obtain the trained semantic segmentation network model.
Step 113: and inputting the image to be detected into the trained semantic segmentation network model to obtain the image pixel level pseudo label.
Step 101 specifically includes:
the PASCAL VOC 2012 data set (20 foreground classes and one background class) is used, which includes 1464 pictures as training set, 1449 pictures as verification set, and 1456 pictures as test set.
The pictures are randomly scaled to a size in the range [321, 481] and then cropped to 321 × 321 to form the input image set A of the network.
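For concreteness, a minimal Python (PIL) sketch of this preprocessing step follows; the assumption that the rescale targets the shorter side, and the function name, are illustrative rather than taken from the patent:

```python
import random
from PIL import Image

def preprocess(img: Image.Image) -> Image.Image:
    """Randomly rescale, then take a random 321x321 crop (step 101)."""
    # Rescale so the shorter side lands in [321, 481] (assumed interpretation
    # of "randomly scaled by [321, 481]").
    target = random.randint(321, 481)
    w, h = img.size
    scale = target / min(w, h)
    img = img.resize((round(w * scale), round(h * scale)), Image.BILINEAR)
    # Random 321x321 crop.
    w, h = img.size
    left = random.randint(0, w - 321)
    top = random.randint(0, h - 321)
    return img.crop((left, top, left + 321, top + 321))
```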
Step 102 specifically includes:
2.1 A VGG-16 model pre-trained on the ImageNet database is used as the feature extractor.
2.2 The last two pooling layers in the VGG-16 model structure are removed to improve the resolution of the feature map.
2.3 The input image set A is passed into the modified VGG-16 model for feature extraction.
2.4 The feature map Z of size C × H × W is obtained at the conv5_3 layer of the VGG-16 model, where C is the number of channels and H and W are the feature map height and width.
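A minimal PyTorch sketch of this modified backbone, assuming torchvision's layer layout for VGG-16 (indices 23 and 30 are pool4 and pool5); this is one plausible implementation, not the patent's own code:

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained VGG-16 (older torchvision uses pretrained=True).
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
layers = list(vgg.features.children())

# Keep everything up to relu5_3 (index 29) and drop pool4 (index 23), so the
# last two pooling layers are removed and the output is the conv5_3 response.
backbone = nn.Sequential(*[m for i, m in enumerate(layers[:30]) if i != 23])

x = torch.randn(1, 3, 321, 321)
z = backbone(x)        # first feature map Z
print(z.shape)         # torch.Size([1, 512, 40, 40]): C=512, H=W=40
```

With a 321 × 321 input, removing pool4 and pool5 leaves a 40 × 40, 512-channel conv5_3 feature map, i.e. 8× rather than 32× downsampling.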
Steps 103-106 specifically comprise:
3.1 The feature map obtained from the modified VGG-16 is fed into the dilated-convolution pixel relation model (DCPAM).
3.2 The DCPAM passes the feature map Z (the first feature map) through a dilated convolution unit and a standard convolution unit, respectively, to obtain the feature maps $D \in R^{C \times H \times W}$ (the third feature maps) and $S \in R^{C \times H \times W}$ (the second feature map).
3.3 S and D are reshaped to size $R^{C \times N}$, where N = H × W is the total number of positions in the S and D feature maps.
3.4 The first pixel relation measurement matrix $A \in R^{N \times N}$ between the reshaped feature maps S and D is obtained by matrix-multiplying them. Its entries are

$A_{i,j} = S_i^{\top} D_j$

where i and j are position indices in the reshaped feature maps, and $S_i$ and $D_j$ are the feature vectors of the reshaped feature maps S and D at positions i and j.
3.5 Each entry $A_{i,j}$ is normalized along j by a softmax layer:

$A_{i,j} \leftarrow \frac{\exp(A_{i,j})}{\sum_{j'=1}^{N} \exp(A_{i,j'})}$
3.6 The multiple first pixel relation measurement matrices between the feature map output by the standard convolution unit (the second feature map) and the feature maps output by convolution kernels with different dilation rates in the dilated convolution unit (the third feature maps) are averaged to obtain the second pixel relation measurement matrix

$A = \frac{1}{m} \sum_{d} A_d$

where d denotes the dilation rate, m the number of dilation rates, and $A_d$ the first pixel relation measurement matrix between the feature map output by the standard convolution unit and the feature map output by the convolution kernel with dilation rate d.
3.7 The reshaped feature map S generated by the standard convolution unit is strengthened with the second pixel relation measurement matrix A: a matrix multiplication is performed between S and A, the result is reshaped back to $R^{C \times H \times W}$, and an element-wise sum with S yields the enhanced feature map (fourth feature map) $E \in R^{C \times H \times W}$:

$E_u = \lambda \sum_{j=1}^{N} A_{u,j} S_j + S_u$

where u indexes the N positions and λ is initialized to 0 and gradually learned through training.
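Putting substeps 3.1-3.7 together, a minimal PyTorch sketch of the DCPAM module follows; the kernel size of 3 × 3 and the dilation rates (2, 4, 6) are assumptions, since the text does not specify them:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DCPAM(nn.Module):
    """Dilated-convolution pixel relation model (substeps 3.1-3.7)."""

    def __init__(self, channels: int, dilations=(2, 4, 6)):
        super().__init__()
        # Standard convolution unit -> second feature map S.
        self.standard = nn.Conv2d(channels, channels, 3, padding=1)
        # Dilated convolution unit: one kernel per dilation rate -> third feature maps D.
        self.dilated = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations)
        self.lam = nn.Parameter(torch.zeros(1))  # lambda, initialized to 0

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        b, c, h, w = z.shape
        n = h * w
        s = self.standard(z)
        s_flat = s.view(b, c, n)                 # reshape S to C x N
        a_sum = 0.0
        for conv in self.dilated:
            d_flat = conv(z).view(b, c, n)       # reshape each D to C x N
            a = torch.bmm(s_flat.transpose(1, 2), d_flat)  # A_{i,j} = S_i . D_j
            a_sum = a_sum + F.softmax(a, dim=-1)           # softmax normalization
        a_mean = a_sum / len(self.dilated)       # average fusion -> second matrix A
        # Strengthen S with A, reshape back, and add the residual connection.
        e = torch.bmm(s_flat, a_mean.transpose(1, 2)).view(b, c, h, w)
        return self.lam * e + s                  # fourth (enhanced) feature map E
```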
Step 107 specifically includes:
the obtained enhanced feature map (fourth feature map) is transferred into an average pooling layer, and the result of performing a global average pooling layer for channel C is
Figure BDA0002907252110000074
Finally, a tensor matrix R is obtainedC×H×W∈RC×1×1As an image representation.
Step 108 specifically includes:
The image-representation tensor matrix is substituted into a softmax classification layer for classification; the softmax output for class n is

$P_n = \frac{\exp(w_n^{\top} F_C)}{\sum_{k} \exp(w_k^{\top} F_C)}$

where $w_n$ denotes the weight parameter of $F_C$ for class n, $P_n$ denotes the image output probability of class n, and $F_C$ denotes the tensor matrix.
Step 109 specifically includes:
The cross-entropy loss function is computed from the image class labels and used to train the classification network; the loss function is

$L_{cls} = -\sum_{n} y_n \log P_n$

where $y_n$ is the image-level label of class n in the dataset. This loss function optimizes the classification network by stochastic gradient descent.
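Steps 107-109 amount to a standard global-average-pooling classification head. A minimal sketch, assuming a single-label cross-entropy as the reading of the patent's loss (class names and sizes are illustrative):

```python
import torch
import torch.nn as nn

class ClassificationHead(nn.Module):
    """GAP over E, then a linear layer whose weights w_n feed the softmax."""

    def __init__(self, channels: int, num_classes: int):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)                      # step 107
        self.fc = nn.Linear(channels, num_classes, bias=False)  # weights w_n

    def forward(self, e: torch.Tensor) -> torch.Tensor:
        f_c = self.gap(e).flatten(1)   # tensor matrix F_C
        return self.fc(f_c)            # logits w_n^T F_C; softmax lives in the loss

head = ClassificationHead(channels=512, num_classes=20)
logits = head(torch.randn(4, 512, 40, 40))
labels = torch.randint(0, 20, (4,))
loss = nn.CrossEntropyLoss()(logits, labels)  # softmax + cross-entropy (step 109)
loss.backward()                               # optimized by SGD during training
```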
Step 110 specifically includes:
7.1 The weight parameters $w_n$ between the GAP layer and the classification layer of the trained classification network are applied to the enhanced feature map E (the fourth feature map) to obtain the target location map:

$M_n(i, j) = \sum_{c=1}^{C} w_{n,c} E_c(i, j)$

where $M_n(i, j)$ denotes the target location map belonging to class n.
7.2 The location maps of the lowest-probability classes in the classification network highlight regions unrelated to the target; the first X classes with the lowest probability are therefore taken as the background target and their location maps are merged through the balance fusion function B(x). (The fusion equation and the definition of B(x) appear only as images in the source and are not recoverable here.)
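A minimal sketch of substeps 7.1-7.2, computing class activation maps from the trained classifier weights; the plain averaging used for the background fusion stands in for the unrecoverable B(x) and is an assumption:

```python
import torch

def target_location_maps(e: torch.Tensor, fc_weight: torch.Tensor) -> torch.Tensor:
    # e: (B, C, H, W) enhanced features; fc_weight: (num_classes, C) = w_n.
    # M_n(i, j) = sum_c w_{n,c} * E_c(i, j)
    return torch.einsum("nc,bchw->bnhw", fc_weight, e)

def background_map(maps: torch.Tensor, probs: torch.Tensor, x: int = 3) -> torch.Tensor:
    # Fuse the location maps of the x lowest-probability classes.
    # Plain averaging stands in for the patent's balance fusion function B(x).
    idx = probs.argsort(dim=1)[:, :x]                      # x lowest classes per image
    picked = torch.stack([maps[b, idx[b]] for b in range(maps.size(0))])
    return picked.mean(dim=1)

e = torch.randn(2, 512, 40, 40)
w = torch.randn(20, 512)                  # trained weights between GAP and classifier
m = target_location_maps(e, w)            # (2, 20, 40, 40) target location maps
bg = background_map(m, torch.rand(2, 20)) # (2, 40, 40) background target map
```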
Steps 111-112 train the segmentation network using the target location map as its pseudo labels; the hyper-parameter settings for training the segmentation network follow.
The method specifically comprises the following steps:
8.1 DeepLab-ASPP is adopted as a semantic segmentation network model.
8.2 The pixels with the top 20% highest values in the target location map from substep 7.1 are taken as the foreground target.
8.3 The pixels with the top 30% highest values in the background target map from substep 7.2 are taken as the background target; p is set to 3 and q to the number of dataset categories minus p.
8.4 All unassigned and conflicting pixels are ignored during training.
8.5 The PASCAL VOC 2012 dataset is used as the training data of the segmentation network, defined as G; any training image satisfies g ∈ G.
8.6 The label set is defined as $N = n_{fg} \cup n_{bg}$, where $n_{fg}$ are the foreground labels and $n_{bg}$ the background labels.
8.7 The segmentation network model is defined as f(g; θ), where θ is an optimizable parameter. $f_{u,c}(g; \theta)$ denotes the conditional probability, modeled by the segmentation model, of label c at position u of the class confidence map.
8.8 The balanced seed loss function is defined as:

$L_{seed} = -\frac{1}{\sum_{c \in n_{fg}} |H_c|} \sum_{c \in n_{fg}} \sum_{u \in H_c} \log f_{u,c}(g;\theta) - \frac{1}{\sum_{c \in n_{bg}} |H_c|} \sum_{c \in n_{bg}} \sum_{u \in H_c} \log f_{u,c}(g;\theta)$

where $H_c$ denotes the pixel-level segmentation pseudo label generated from the target location map $M_n(i, j)$ and |·| denotes the number of pixels.
8.9 The auxiliary seed loss function is defined as:

$L_{seg} = -\frac{1}{\sum_{c \in N} |\hat{H}_c|} \sum_{c \in N} \sum_{u \in \hat{H}_c} \log f_{u,c}(g;\theta)$

where $\hat{H}_c$ denotes the target location labels predicted on-line for the image by the segmentation model.
8.10 The boundary constraint loss function is defined through a fully connected Conditional Random Field (CRF):

$L_{boundary} = \frac{1}{H \times W} \sum_{u} \sum_{c \in N} R_{u,c}(g, f(g;\theta)) \log \frac{R_{u,c}(g, f(g;\theta))}{f_{u,c}(g;\theta)}$

where $R_{u,c}(g, f(g;\theta))$ is the output probability map of the fully connected CRF.
8.11 The loss function of the final model is defined as $L = L_{seed} + L_{seg} + L_{boundary}$.
8.12 The mini-batch is set to 10 images, momentum to 0.9 and weight decay to 0.0005. The initial learning rate is 5e-3 and is divided by 10 every 2000 iterations; training is terminated after 10000 iterations.
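As one reading of substeps 8.8-8.12, a minimal sketch of the balanced seed loss; the seed tensor layout, the ignore value -1 and the foreground/background index split are assumptions, and L_seg and L_boundary (the latter via a fully connected CRF, e.g. the pydensecrf package) would be added analogously:

```python
import torch
import torch.nn.functional as F

def balanced_seed_loss(logits: torch.Tensor, seeds: torch.Tensor,
                       num_fg: int) -> torch.Tensor:
    # logits: (B, K, H, W) segmentation scores f_{u,c}; seeds: (B, H, W) pseudo
    # labels H_c, with unassigned/conflicting pixels marked -1 (substep 8.4).
    # Classes 0..num_fg-1 are assumed foreground, the rest background.
    log_p = F.log_softmax(logits, dim=1)
    picked = log_p.gather(1, seeds.clamp(min=0).unsqueeze(1)).squeeze(1)
    loss = logits.new_zeros(())
    for classes in (range(num_fg), range(num_fg, logits.size(1))):
        mask = torch.zeros_like(seeds, dtype=torch.bool)
        for c in classes:
            mask |= seeds == c
        if mask.any():
            # Average separately over foreground and background seed pixels.
            loss = loss - picked[mask].mean()
    return loss

logits = torch.randn(2, 21, 41, 41, requires_grad=True)  # 20 fg classes + bg
seeds = torch.randint(-1, 21, (2, 41, 41))
balanced_seed_loss(logits, seeds, num_fg=20).backward()
```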
After the step 112, the trained semantic segmentation network model is obtained, and then the image to be detected can be directly input into the trained semantic segmentation network model to obtain the image pixel level pseudo label.
In addition, the invention also discloses a semantic segmentation-based image pixel level pseudo label determination system. As shown in fig. 4, the system comprises:
the preprocessing module 201 is configured to acquire an initial image and preprocess the initial image to obtain a first image.
A feature extraction module 202, configured to perform feature extraction on the first image by using a feature extractor to obtain a first feature map.
A first input module 203, configured to input the first feature map into the dilated-convolution pixel relation model to obtain a second feature map and multiple third feature maps.
And a first matrix product operation module 204, configured to perform matrix product operation on the second feature map and each third feature map respectively, so as to obtain a plurality of first pixel relationship measurement matrices correspondingly.
A matrix fusion module 205, configured to perform average fusion on the plurality of first pixel relationship measurement matrices to obtain a second pixel relationship measurement matrix.
And a second matrix product operation module 206, configured to perform matrix product operation on the second feature map and the second pixel relation measurement matrix to obtain a fourth feature map.
A second input module 207, configured to input the fourth feature map into a global average pooling layer, so as to obtain a tensor matrix.
And the classification module 208 is configured to input the tensor matrix into a softmax classification layer for classification, so as to obtain a functional relationship between the tensor matrix and the image output probability.
And the first network training module 209 is configured to train a classification network according to the loss function corresponding to the functional relationship, so as to obtain a trained classification network.
And a target location map and background target map determining module 210, configured to obtain a target location map and a background target map according to the trained classification network and the fourth feature map.
And the model obtaining module 211 is configured to obtain a semantic segmentation network model.
And the second network training module 212 is configured to train the semantic segmentation network model according to the first image, the target location map and the background target map, so as to obtain a trained semantic segmentation network model.
And a pseudo label determining module 213, configured to input the image to be detected into the trained semantic segmentation network model to obtain an image pixel-level pseudo label.
The invention also discloses the following technical effects:
1. By combining the advantages of dilated convolution and the attention mechanism, the method can effectively enlarge the highlighted target region, enhance the generation of class-related target regions while suppressing class-unrelated regions, and obtain higher-quality semantic segmentation pseudo labels, further improving the segmentation capability of the segmentation network.
2. According to the invention, a deep neural classification network is designed to provide high-quality pseudo labels for training a segmentation network, which then performs image semantic segmentation. A dilated-convolution pixel relation network is proposed: by combining dilated convolution with an attention mechanism in the classification network, a pixel relation model between the dilated-convolution feature maps and the standard-convolution feature map is generated, and the class activation maps produced by the classification network highlight a more complete target region, thereby generating higher-quality pseudo labels for the segmentation network and improving its segmentation capability.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (8)

1. A semantic segmentation-based image pixel level pseudo label determination method is characterized by comprising the following steps:
acquiring an initial image and preprocessing the initial image to obtain a first image;
performing feature extraction on the first image by using a feature extractor to obtain a first feature map;
inputting the first feature map into a dilated-convolution pixel relation model to obtain a second feature map and a plurality of third feature maps;
wherein inputting the first feature map into the dilated-convolution pixel relation model to obtain the second feature map and the plurality of third feature maps specifically comprises: passing the first feature map through a dilated convolution unit and a standard convolution unit, respectively, to obtain the third feature maps and the second feature map;
performing a matrix product between the second feature map and each third feature map to correspondingly obtain a plurality of first pixel relation measurement matrices;
averaging the plurality of first pixel relation measurement matrices to obtain a second pixel relation measurement matrix;
performing a matrix product between the second feature map and the second pixel relation measurement matrix to obtain a fourth feature map;
inputting the fourth feature map into a global average pooling layer to obtain a tensor matrix;
inputting the tensor matrix into a softmax classification layer for classification to obtain a functional relation between the tensor matrix and the image output probability;
the function relation between the tensor matrix and the image output probability is
Figure FDA0003286605860000011
Wherein the content of the first and second substances,
Figure FDA0003286605860000012
indicating that for classes n, FCWeight parameter of PnRepresenting the probability of image output of class n, FCA tensor matrix is represented, C represents the first eigen mapThe number of channels of (a);
training a classification network according to the loss function corresponding to the function relationship to obtain a trained classification network;
obtaining a target location map and a background target map according to the trained classification network and the fourth feature map;
the obtaining of the target position map and the background target map according to the trained classification network and the fourth feature map specifically includes:
weighting parameters between the GAP layer and the classification layer in the trained classification network
Figure FDA0003286605860000013
Transmitting the fourth feature map into a classification network, and operating to obtain a target position map; the target position map is
Figure FDA0003286605860000014
Wherein M isn(i, j) represents a target location map belonging to category n; i and j represent feature map location indices;
according to the regions unrelated to the target that are highlighted by the location maps of the lowest-probability classes in the classification network, taking the first X classes with the lowest probability and merging their location maps through the balance fusion function B(x) to form the background target map (the fusion equation and the definition of B(x) appear only as images in the source and are not recoverable here);
obtaining a semantic segmentation network model; the semantic segmentation network model is a DeepLab-ASPP network model;
training the semantic segmentation network model according to the first image, the target location map and the background target map to obtain a trained semantic segmentation network model;
the training of the semantic segmentation network model according to the first image, the target location map and the background target map to obtain the trained semantic segmentation network model specifically comprises the following steps:
selecting the pixels with the top 20% highest values in the target location map as the foreground target;
selecting the pixels with the top 30% highest values in the background target map as the background target, and setting p = 3 and q = the number of dataset categories minus p;
ignoring all unassigned and conflicting pixels;
adopting the PASCAL VOC 2012 data set as the training data of the semantic segmentation network model, defining the training data as G, where any training image satisfies g ∈ G;
defining the label set as $N = n_{fg} \cup n_{bg}$, where $n_{fg}$ are foreground labels and $n_{bg}$ background labels;
defining the semantic segmentation network model as f(g; θ), where θ is an optimizable parameter and $f_{u,c}(g; \theta)$ denotes the conditional probability, modeled by the segmentation model, of label c at position u of the class confidence map;
defining a balanced seed loss function:
$L_{seed} = -\frac{1}{\sum_{c \in n_{fg}} |H_c|} \sum_{c \in n_{fg}} \sum_{u \in H_c} \log f_{u,c}(g;\theta) - \frac{1}{\sum_{c \in n_{bg}} |H_c|} \sum_{c \in n_{bg}} \sum_{u \in H_c} \log f_{u,c}(g;\theta)$

where $H_c$ denotes the pixel-level segmentation pseudo label generated from the target location map $M_n(i, j)$, and |·| denotes the number of pixels;
defining a helper seed loss function:
$L_{seg} = -\frac{1}{\sum_{c \in N} |\hat{H}_c|} \sum_{c \in N} \sum_{u \in \hat{H}_c} \log f_{u,c}(g;\theta)$

where $\hat{H}_c$ denotes the target location labels predicted on-line for the image by the segmentation model;
the boundary constraint loss function is defined by the conditional random field:
$L_{boundary} = \frac{1}{H \times W} \sum_{u} \sum_{c \in N} R_{u,c}(g, f(g;\theta)) \log \frac{R_{u,c}(g, f(g;\theta))}{f_{u,c}(g;\theta)}$

where $R_{u,c}(g, f(g;\theta))$ is the output probability map of the fully connected CRF;
the loss function of the final model is defined as: l ═ Lseed+Lseg+Lboundary
setting the mini-batch to 10 images, momentum to 0.9 and weight decay to 0.0005, with an initial learning rate of 5e-3 that is divided by 10 every 2000 iterations, training being terminated after 10000 iterations;
and inputting the image to be detected into the trained semantic segmentation network model to obtain the image pixel level pseudo label.
2. The semantic segmentation based image pixel level pseudo label determination method according to claim 1, characterized in that the initial image is randomly scaled to a size in the range [321, 481] and the picture is then cropped to 321 × 321 to obtain the first image.
3. The image pixel-level pseudo tag determination method based on semantic segmentation according to claim 1, wherein the feature extractor is an improved VGG-16 network model which removes the last two pooling layers in the VGG-16 model structure.
4. The semantic segmentation based image pixel level pseudo label determination method according to claim 1, characterized in that the initial image employs the PASCAL VOC 2012 data set.
5. The image pixel-level pseudo tag determination method based on semantic segmentation according to claim 1, wherein after the step of inputting the first feature map into the dilated-convolution pixel relation model to obtain a second feature map and a plurality of third feature maps, and before the step of performing a matrix product between the second feature map and each of the third feature maps to correspondingly obtain a plurality of first pixel relation measurement matrices, the method further comprises:
and performing size reshaping on the second feature map and the plurality of third feature maps.
6. The image pixel-level pseudo tag determination method based on semantic segmentation according to claim 3, wherein the first feature map is obtained through a conv5_3 layer of a modified VGG-16 network model.
7. The image pixel-level pseudo label determination method based on semantic segmentation according to claim 1 or 6, wherein the first feature map size is C × H × W, where C is the number of channels of the first feature map and H and W are the feature map height and width.
8. An image pixel level pseudo label determination system based on semantic segmentation, comprising:
the system comprises a preprocessing module, a first image acquisition module, a second image acquisition module and a second image acquisition module, wherein the preprocessing module is used for acquiring an initial image and preprocessing the initial image to obtain a first image;
the characteristic extraction module is used for extracting the characteristics of the first image by using a characteristic extractor to obtain a first characteristic diagram;
the first input module is used for inputting the first feature map into the dilated-convolution pixel relation model to obtain a second feature map and a plurality of third feature maps;
the first input module is specifically configured to pass the first feature map through a dilated convolution unit and a standard convolution unit, respectively, to obtain the third feature maps and the second feature map;
the first matrix product operation module is used for performing a matrix product between the second feature map and each third feature map to correspondingly obtain a plurality of first pixel relation measurement matrices;
the matrix fusion module is used for averaging the plurality of first pixel relation measurement matrices to obtain a second pixel relation measurement matrix;
the second matrix product operation module is used for performing a matrix product between the second feature map and the second pixel relation measurement matrix to obtain a fourth feature map;
the second input module is used for inputting the fourth feature map into a global average pooling layer to obtain a tensor matrix;
the classification module is used for inputting the tensor matrix into a softmax classification layer for classification to obtain a functional relation between the tensor matrix and the image output probability; the functional relation between the tensor matrix and the image output probability is

$P_n = \frac{\exp(w_n^{\top} F_C)}{\sum_{k} \exp(w_k^{\top} F_C)}$

where $w_n$ denotes the weight parameter of $F_C$ for class n, $P_n$ denotes the image output probability of class n, $F_C$ denotes the tensor matrix, and C denotes the number of channels of the first feature map;
the first network training module is used for training a classification network according to the loss function corresponding to the function relation to obtain the trained classification network;
the target position graph and background target graph determining module is used for obtaining a target position graph and a background target graph according to the trained classification network and the fourth feature graph;
the target position map and background target map determining module specifically comprises:
weighting parameters between the GAP layer and the classification layer in the trained classification network
Figure FDA0003286605860000043
Transmitting the fourth feature map into a classification network, and operating to obtain a target position map; the target position map is
Figure FDA0003286605860000044
Wherein M isn(i, j) represents a target location map belonging to category n; i and j represent feature map location indices;
according to the region which is highlighted by the position map of the class with the lowest probability in the classification network and is irrelevant to the target, adopting the front X classes with the lowest probability as background target maps:
Figure FDA0003286605860000051
wherein b (x) is the equilibrium fusion function:
Figure FDA0003286605860000052
the model acquisition module is used for acquiring a semantic segmentation network model; the semantic segmentation network model is a DeepLab-ASPP network model;
the second network training module is used for training the semantic segmentation network model according to the first image, the target position graph and the background target graph to obtain a trained semantic segmentation network model;
the second network training module specifically includes:
selecting the pixels with the top 20% highest values in the target location map as the foreground target;
selecting the pixels with the top 30% highest values in the background target map as the background target, and setting p = 3 and q = the number of dataset categories minus p;
ignoring all unassigned and conflicting pixels;
adopting the PASCAL VOC 2012 data set as the training data of the semantic segmentation network model, defining the training data as G, where any training image satisfies g ∈ G;
defining the label set as $N = n_{fg} \cup n_{bg}$, where $n_{fg}$ are foreground labels and $n_{bg}$ background labels;
defining the semantic segmentation network model as f(g; θ), where θ is an optimizable parameter and $f_{u,c}(g; \theta)$ denotes the conditional probability, modeled by the segmentation model, of label c at position u of the class confidence map;
defining a balanced seed loss function:
$L_{seed} = -\frac{1}{\sum_{c \in n_{fg}} |H_c|} \sum_{c \in n_{fg}} \sum_{u \in H_c} \log f_{u,c}(g;\theta) - \frac{1}{\sum_{c \in n_{bg}} |H_c|} \sum_{c \in n_{bg}} \sum_{u \in H_c} \log f_{u,c}(g;\theta)$

where $H_c$ denotes the pixel-level segmentation pseudo label generated from the target location map $M_n(i, j)$, and |·| denotes the number of pixels;
defining a helper seed loss function:
$L_{seg} = -\frac{1}{\sum_{c \in N} |\hat{H}_c|} \sum_{c \in N} \sum_{u \in \hat{H}_c} \log f_{u,c}(g;\theta)$

where $\hat{H}_c$ denotes the target location labels predicted on-line for the image by the segmentation model;
the boundary constraint loss function is defined by the conditional random field:
$L_{boundary} = \frac{1}{H \times W} \sum_{u} \sum_{c \in N} R_{u,c}(g, f(g;\theta)) \log \frac{R_{u,c}(g, f(g;\theta))}{f_{u,c}(g;\theta)}$

where $R_{u,c}(g, f(g;\theta))$ is the output probability map of the fully connected CRF;
the loss function of the final model is defined as $L = L_{seed} + L_{seg} + L_{boundary}$;
setting the mini-batch to 10 images, momentum to 0.9 and weight decay to 0.0005, with an initial learning rate of 5e-3 that is divided by 10 every 2000 iterations, training being terminated after 10000 iterations;
and the pseudo label determining module is used for inputting the image to be detected into the trained semantic segmentation network model to obtain the image pixel level pseudo label.
CN202110074943.9A 2021-01-20 2021-01-20 Image pixel level pseudo label determination method and system based on semantic segmentation Expired - Fee Related CN112801104B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110074943.9A CN112801104B (en) 2021-01-20 2021-01-20 Image pixel level pseudo label determination method and system based on semantic segmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110074943.9A CN112801104B (en) 2021-01-20 2021-01-20 Image pixel level pseudo label determination method and system based on semantic segmentation

Publications (2)

Publication Number Publication Date
CN112801104A CN112801104A (en) 2021-05-14
CN112801104B true CN112801104B (en) 2022-01-07

Family

ID=75810732

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110074943.9A Expired - Fee Related CN112801104B (en) 2021-01-20 2021-01-20 Image pixel level pseudo label determination method and system based on semantic segmentation

Country Status (1)

Country Link
CN (1) CN112801104B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113223037B (en) * 2021-05-31 2023-04-07 南开大学 Unsupervised semantic segmentation method and unsupervised semantic segmentation system for large-scale data
CN114693967B (en) * 2022-03-20 2023-10-31 电子科技大学 Multi-classification semantic segmentation method based on classification tensor enhancement
CN116664845B (en) * 2023-07-28 2023-10-13 山东建筑大学 Intelligent engineering image segmentation method and system based on inter-block contrast attention mechanism


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111127493A (en) * 2019-11-12 2020-05-08 中国矿业大学 Remote sensing image semantic segmentation method based on attention multi-scale feature fusion

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110232394A (en) * 2018-03-06 2019-09-13 华南理工大学 A kind of multi-scale image semantic segmentation method
CN109509192A (en) * 2018-10-18 2019-03-22 天津大学 Merge the semantic segmentation network in Analysis On Multi-scale Features space and semantic space
CN110136141A (en) * 2019-04-24 2019-08-16 佛山科学技术学院 A kind of image, semantic dividing method and device towards complex environment
WO2020220126A1 (en) * 2019-04-30 2020-11-05 Modiface Inc. Image processing using a convolutional neural network to track a plurality of objects
CN110188817A (en) * 2019-05-28 2019-08-30 厦门大学 A kind of real-time high-performance street view image semantic segmentation method based on deep learning
WO2020243826A1 (en) * 2019-06-04 2020-12-10 University Of Manitoba Computer-implemented method of analyzing an image to segment article of interest therein
CN110428428A (en) * 2019-07-26 2019-11-08 长沙理工大学 A kind of image, semantic dividing method, electronic equipment and readable storage medium storing program for executing
CN110544258A (en) * 2019-08-30 2019-12-06 北京海益同展信息科技有限公司 Image segmentation method and device, electronic equipment and storage medium
CN111680695A (en) * 2020-06-08 2020-09-18 河南工业大学 Semantic segmentation method based on reverse attention model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Image Semantic Segmentation Based on Deep Feature Fusion" (《基于深度特征融合的图像语义分割》); Zhou Pengcheng, Gong Shengrong, Zhong Shan, Bao Zongming, Dai Xinghua; Computer Science (《计算机科学》); 2020-02-15; vol. 47, no. 2; pp. 126-134 *

Also Published As

Publication number Publication date
CN112801104A (en) 2021-05-14

Similar Documents

Publication Publication Date Title
CN112801104B (en) Image pixel level pseudo label determination method and system based on semantic segmentation
Zhang et al. Deep gated attention networks for large-scale street-level scene segmentation
Kim et al. Fully deep blind image quality predictor
CN109345508B (en) Bone age evaluation method based on two-stage neural network
CN110706302B (en) System and method for synthesizing images by text
Mnih Machine learning for aerial image labeling
WO2019089578A1 (en) Font identification from imagery
CN111738344B (en) Rapid target detection method based on multi-scale fusion
CN111369581A (en) Image processing method, device, equipment and storage medium
KR20230004710A (en) Processing of images using self-attention based neural networks
CN113111716B (en) Remote sensing image semiautomatic labeling method and device based on deep learning
CN112906794A (en) Target detection method, device, storage medium and terminal
CN111461043A (en) Video significance detection method based on deep network
CN113378812A (en) Digital dial plate identification method based on Mask R-CNN and CRNN
CN116363489A (en) Copy-paste tampered image data detection method, device, computer and computer-readable storage medium
CN115222998A (en) Image classification method
CN114511785A (en) Remote sensing image cloud detection method and system based on bottleneck attention module
CN109447897B (en) Real scene image synthesis method and system
CN111209886A (en) Rapid pedestrian re-identification method based on deep neural network
CN110866552A (en) Hyperspectral image classification method based on full convolution space propagation network
US20190156182A1 (en) Data inference apparatus, data inference method and non-transitory computer readable medium
CN112529081B (en) Real-time semantic segmentation method based on efficient attention calibration
Xu et al. Steganography algorithms recognition based on match image and deep features verification
CN114596466A (en) Multi-modal image missing completion classification method based on tensor network model
CN113095328A (en) Self-training-based semantic segmentation method guided by Gini index

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220107