CN114758135A - Unsupervised image semantic segmentation method based on attention mechanism - Google Patents
- Publication number
- CN114758135A (application number CN202210504797.3A)
- Authority
- CN
- China
- Prior art keywords
- image
- pixel
- semantic segmentation
- feature
- loss function
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
An attention mechanism-based unsupervised image semantic segmentation method comprises: removing part of the redundant background information of an RGB image with an attention module; and extracting image semantic information with an unsupervised image semantic segmentation network, assigning the same label to pixels that belong to the same category in the image. The method addresses the waste of computing power and the loss of segmentation precision caused by large amounts of redundant background information in unsupervised image semantic segmentation.
Description
Technical Field
The invention belongs to the field of computer vision, and particularly relates to an unsupervised image semantic segmentation method based on an attention mechanism.
Background
As the volume of image data grows explosively, the workload of image processing has increased correspondingly. To reduce this workload as far as possible, researchers continue to develop more automatic and more accurate image processing algorithms, with the goal of bringing a machine's sensors close to the recognition capability of the human eye so that the machine can analyze and judge autonomously according to what it "sees", thereby reducing the burden on people.
As one of the important characteristics of the human visual system, the visual attention mechanism enables a human to quickly select a few salient objects from a complex scene for attention, and is an important means by which humans process a large amount of external information with limited resources. Given this efficient data screening capability, introducing the visual attention mechanism into computer information processing, and particularly into image processing, which must handle massive amounts of data, has important theoretical value and practical significance. The attention mechanism originates from research on human attention; with the rapid development of deep learning, it has been widely applied across the field and has become a core technology in natural language processing, statistical learning, image detection and other areas. A structural model based on the attention mechanism can not only record the positional relationships among pieces of information, but also measure the importance of different information features according to their weights. By judging which information features are relevant and which are irrelevant, dynamic weight parameters are established that strengthen key information and weaken useless information, which improves the efficiency of deep learning algorithms and remedies some shortcomings of traditional deep learning.
Image semantic segmentation is an image segmentation technique that classifies each pixel according to the semantic content it expresses in the image. Because it retains only the semantic content of the image, semantic segmentation can greatly reduce the memory an image occupies; it is an important application of semantic communication and one of the most important basic technologies in visual intelligence. The quality of semantic segmentation determines how well an intelligent system understands its application scene, so it has great application value in important fields such as unmanned driving, robot cognition and navigation, security monitoring, and unmanned aerial vehicle landing systems. However, these tasks typically require a large amount of labeled data that matches the scene under consideration to achieve reliable performance. Collecting and labeling large data sets for each new task and domain is very expensive, time consuming, and error prone. Furthermore, in many cases sufficient training data is unavailable for various reasons, even though a large amount of data from other domains and tasks is somewhat related to the task at hand. These considerations apply particularly to semantic segmentation, where the learning framework requires a large amount of manually labeled data that is very expensive to acquire. To address the difficulty of labeling training data, more flexible and more extensible unsupervised semantic segmentation methods are attracting increasing attention, and unsupervised semantic segmentation is a future development trend.
Although current semantic segmentation technology has achieved remarkable results, its development history is short and its complexity is high; before it can be fully applied in real life, many problems, such as insufficient segmentation precision and low algorithm efficiency, urgently need to be solved. Research on and improvement of semantic segmentation algorithms is therefore of great significance for the development of computer vision technology.
Disclosure of Invention
Traditional unsupervised image semantic segmentation requires a large amount of computing power and, for images with a large amount of redundant background information, wastes part of it. To overcome this defect, the invention provides an unsupervised image semantic segmentation method based on an attention mechanism: a spatial attention mechanism reduces the redundant background information of the image and retains its key region, and an unsupervised image semantic segmentation model then segments the image, effectively improving the efficiency and precision of existing unsupervised image semantic segmentation.
In order to solve the technical problem, the invention adopts the following technical scheme:
an unsupervised image semantic segmentation method based on an attention mechanism comprises the following steps:
S1: Obtain an affine transformation matrix θ: first initialize θ as the identity transformation matrix, then continuously correct the parameters of θ through a loss function to finally obtain the desired affine transformation matrix;
S2: After the RGB image U is input, compute the position in the input image U that corresponds to each coordinate point of the feature map V, using the affine transformation matrix obtained in the previous step, as follows:
$$\begin{pmatrix} x_i^s \\ y_i^s \end{pmatrix} = A_\theta \begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix} \tag{1}$$
where $(x_i, y_i)$ represents the position of pixel i, the superscript s denotes a coordinate point of the input feature image, the superscript t denotes a coordinate point of the output feature map, and $A_\theta$ is the affine transformation matrix obtained in step S1;
S3: Compute the gray value of each pixel in the output feature map by interpolation, as follows:
$$V_i^c = \sum_{n=1}^{H} \sum_{m=1}^{W} U_{nm}^c \, \max(0, 1 - |x_i^s - m|) \, \max(0, 1 - |y_i^s - n|) \tag{2}$$
where W and H are the width and height of the input image, $V_i^c$ is the gray value of pixel i at position $(x_i^t, y_i^t)$ in channel c of the output feature map, and $U_{nm}^c$ is the gray value of the point (n, m) in channel c of the input feature map;
S4: Extract deep features $\{x_n\}$ from the input image using a feature extraction module;
S5: Compute the feature response vectors $\{r_n\}$ in the q-dimensional class space with a one-dimensional (1D) convolutional layer;
S6: Apply batch normalization to the feature response vectors $\{r_n\}$ along each axis of the pixel class space to obtain $\{r'_n\}$, so that $\{r'_n\}$ has zero mean and unit variance;
S7: Using the argmax function, select the dimension with the maximum value in each $r'_n$ to obtain the class label $\{c_n\}$ of each pixel;
S8: Compute the loss function and back-propagate it to update the parameters. The loss function consists of a feature similarity loss and a spatial continuity loss, and μ is the weight that balances the two terms. It is defined as follows:
$$L = L_{sim}(\{r'_n, c_n\}) + \mu L_{con}(\{r'_n\}) \tag{3}$$
The feature similarity loss function is as follows:
$$L_{sim}(\{r'_n, c_n\}) = \sum_{n=1}^{N} \sum_{i=1}^{q} -\delta(i - c_n) \ln r'_{n,i}$$
where
$$\delta(t) = \begin{cases} 1, & t = 0 \\ 0, & \text{otherwise} \end{cases}$$
and N is the total number of pixels in the input image; the response map $\{r_n = W_c x_n\}$ is obtained by applying a linear classifier, where $W_c \in \mathbb{R}^{q \times p}$, and the response map is then normalized to $\{r'_n\}$;
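The spatial continuity loss function is defined as follows:
$$L_{con}(\{r'_n\}) = \sum_{\xi=1}^{W-1} \sum_{\eta=1}^{H-1} \left\| r'_{\xi+1,\eta} - r'_{\xi,\eta} \right\|_1 + \left\| r'_{\xi,\eta+1} - r'_{\xi,\eta} \right\|_1$$
where $r'_{\xi,\eta}$ is the pixel value at $(\xi, \eta)$ in the response map $\{r'_n\}$;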
Applying the spatial continuity loss suppresses the excessive number of pixel labels caused by complex patterns or textures.
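Steps S5 to S7 can be sketched in a few lines of plain Python. This is a minimal illustration, not the patented implementation: the function name `classify_pixels`, the list-of-lists data layout, and the small `eps` constant are assumptions, the linear classifier stands in for the 1D convolutional layer, and the learnable scale and shift that batch normalization layers usually carry are omitted.

```python
import math


def classify_pixels(features, weights, eps=1e-5):
    """Return (normalised responses r', class labels c) for N pixels.

    features: N feature vectors x_n of length p (hypothetical layout).
    weights:  q rows of length p, playing the role of the classifier W_c.
    """
    # S5: response vectors r_n = W_c x_n in the q-dimensional class space.
    responses = [
        [sum(w * x for w, x in zip(row, feat)) for row in weights]
        for feat in features
    ]
    n, q = len(responses), len(weights)

    # S6: normalise each axis of the class space over all pixels so that
    # it has zero mean and (up to eps) unit variance.
    normalised = [[0.0] * q for _ in range(n)]
    for i in range(q):
        column = [r[i] for r in responses]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n
        std = math.sqrt(var + eps)
        for k in range(n):
            normalised[k][i] = (column[k] - mean) / std

    # S7: the label of each pixel is the axis with the largest response.
    labels = [max(range(q), key=lambda i: r[i]) for r in normalised]
    return normalised, labels
```

Because the labels come from the network's own normalised responses, no ground-truth annotation is needed at this stage.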
Further, in the step S1, the affine transformation matrix θ is a 2 × 3 matrix in the two-dimensional image.
Still further, in step S2, the coordinate mapping runs from the target picture to the input picture: because sampling gathers pixels from varying coordinates of the original picture into the target picture, every sampling pass traverses the coordinates of the target picture in order, whereas the coordinates sampled in the original picture are not fixed. In this way, the coordinate point on the input feature map that corresponds to each position of the transformed output feature map is obtained.
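The target-to-source mapping of step S2 can be illustrated with a short sketch. It assumes coordinates normalised to [-1, 1], a convention the text does not specify; the names `affine_grid`, `IDENTITY` and `ZOOM_CENTRE` are hypothetical and chosen only for this example.

```python
def affine_grid(theta, out_h, out_w):
    """Map every output (target) pixel to a source coordinate.

    theta is a 2 x 3 affine matrix given as two rows of three floats.
    Coordinates are normalised to [-1, 1] (an assumption of this sketch).
    Returns out_h rows, each holding out_w (x_s, y_s) tuples.
    """
    grid = []
    for row in range(out_h):
        # Normalised target y coordinate of this row.
        y_t = 2.0 * row / (out_h - 1) - 1.0 if out_h > 1 else 0.0
        line = []
        for col in range(out_w):
            x_t = 2.0 * col / (out_w - 1) - 1.0 if out_w > 1 else 0.0
            # (x_s, y_s)^T = theta * (x_t, y_t, 1)^T
            x_s = theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2]
            y_s = theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2]
            line.append((x_s, y_s))
        grid.append(line)
    return grid


# Identity initialisation of theta, as described for step S1.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
# An illustrative matrix that samples only the central half of the input.
ZOOM_CENTRE = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]]
```

With `IDENTITY` the grid reproduces the target coordinates unchanged; with `ZOOM_CENTRE` the corners of the output map to the points (±0.5, ±0.5) of the input, which is how a learned θ can crop away background.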
Further, in step S3, when $|x_i^s - m|$ or $|y_i^s - n|$ is greater than 1, the corresponding max() term is 0, so only the gray values of the 4 points surrounding $(x_i^s, y_i^s)$ determine the gray value of the target pixel; the smaller $|x_i^s - m|$ and $|y_i^s - n|$ are (i.e., the closer the sampling position is to the point (n, m)), the greater the influence of that point and the larger its weight.
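The weighting rule described for step S3 is bilinear interpolation, and a direct transcription makes the behaviour of the max() terms visible. The sketch below assumes source coordinates in pixel units (column index for x, row index for y) and a single channel stored as a list of rows; the function name `bilinear_sample` is hypothetical.

```python
def bilinear_sample(channel, x_s, y_s):
    """Interpolate one output value from one input channel.

    Computes sum_n sum_m U[n][m] * max(0, 1-|x_s-m|) * max(0, 1-|y_s-n|),
    where channel[n][m] is the gray value at row n, column m.
    """
    value = 0.0
    for n, row in enumerate(channel):
        # Vertical weight; zero for rows at least one pixel away.
        wy = max(0.0, 1.0 - abs(y_s - n))
        if wy == 0.0:
            continue
        for m, u in enumerate(row):
            # Horizontal weight; zero for columns at least one pixel away.
            wx = max(0.0, 1.0 - abs(x_s - m))
            value += u * wx * wy
    return value
```

For the 2 x 2 channel `[[0.0, 10.0], [20.0, 30.0]]`, sampling at the centre (0.5, 0.5) weights all four neighbours by 0.25, and sampling exactly on a grid point returns that point's value.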
Further, in step S8, the goal of the feature similarity loss function is to enhance the similarity of similar features: once image pixels are clustered according to their features, the feature vectors within the same class should be similar to each other, while the feature vectors of different classes should differ from each other. By minimizing this loss function, the network weights are updated to facilitate the extraction of more effective features for classification.
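One common way to realise such a feature similarity loss is a cross-entropy between the normalised responses and the argmax labels they produce. The sketch below takes that route and is an assumption rather than the formula of the patent: it passes each response vector through a softmax before taking the logarithm, so that the logarithm is always defined, and the name `similarity_loss` is hypothetical.

```python
import math


def similarity_loss(normalised, labels):
    """Cross-entropy between responses r'_n and their own labels c_n.

    normalised: N response vectors of length q.
    labels:     N integer class labels (the argmax of each vector).
    A softmax is applied first (assumption of this sketch).
    """
    total = 0.0
    for r, c in zip(normalised, labels):
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(r)
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in r))
        # Negative log-probability of the assigned class.
        total += -(r[c] - log_norm)
    return total
```

The loss shrinks as the response on the assigned axis dominates the others, which pushes pixels with similar features toward the same label.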
The beneficial effects of the invention are as follows: a spatial attention mechanism reduces the redundant background information of the image and retains its key region, and an unsupervised image semantic segmentation model then segments the image, effectively improving the efficiency and precision of existing unsupervised image semantic segmentation.
Drawings
FIG. 1 is a flow chart of an unsupervised image semantic segmentation method based on an attention mechanism according to the present invention.
FIG. 2 is a schematic flow chart of an unsupervised image semantic segmentation method based on the attention mechanism according to another embodiment of the present invention.
Detailed Description
The invention will be further illustrated with reference to specific embodiments. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.
Referring to fig. 1 and 2, an unsupervised image semantic segmentation method based on an attention mechanism includes the following steps:
S1: Obtain an affine transformation matrix θ: first initialize θ as the identity transformation matrix, then continuously correct the parameters of θ through a loss function to finally obtain the desired affine transformation matrix, where θ is a 2 × 3 matrix for a two-dimensional image;
S2: After the RGB image U is input, compute the position in the input image U that corresponds to each coordinate point of the feature map V, using the affine transformation matrix obtained in the previous step, as follows:
$$\begin{pmatrix} x_i^s \\ y_i^s \end{pmatrix} = A_\theta \begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix} \tag{1}$$
where $(x_i, y_i)$ represents the position of pixel i, the superscript s denotes a coordinate point of the input feature image, the superscript t denotes a coordinate point of the output feature map, and $A_\theta$ is the affine transformation matrix obtained in step S1. Through this step, the coordinate point on the input feature map that corresponds to each position of the transformed output feature map is obtained;
S3: Compute the gray value of each pixel in the output feature map by interpolation, as follows:
$$V_i^c = \sum_{n=1}^{H} \sum_{m=1}^{W} U_{nm}^c \, \max(0, 1 - |x_i^s - m|) \, \max(0, 1 - |y_i^s - n|) \tag{2}$$
where W and H are the width and height of the input image, $V_i^c$ is the gray value of pixel i at position $(x_i^t, y_i^t)$ in channel c of the output feature map, and $U_{nm}^c$ is the gray value of the point (n, m) in channel c of the input feature map. When $|x_i^s - m|$ or $|y_i^s - n|$ is greater than 1, the corresponding max() term is 0; that is, only the gray values of the 4 points surrounding $(x_i^s, y_i^s)$ determine the gray value of the target pixel, and the smaller $|x_i^s - m|$ and $|y_i^s - n|$ are (i.e., the closer the sampling position is to the point (n, m)), the greater the influence of that point and the larger its weight;
S4: Extract deep features $\{x_n\}$ from the input image using a feature extraction module;
S5: Compute the feature response vectors $\{r_n\}$ in the q-dimensional class space with a one-dimensional (1D) convolutional layer;
S6: Apply batch normalization to the response vectors along each axis of the class space to obtain $\{r'_n\}$, so that $\{r'_n\}$ has zero mean and unit variance;
S7: Using the argmax function, select the dimension with the maximum value in each $r'_n$ to obtain the class label $\{c_n\}$ of each pixel;
S8: Compute the loss function and back-propagate it to update the parameters. The loss function consists of a feature similarity loss and a spatial continuity loss, and μ is the weight that balances the two terms. It is defined as follows:
$$L = L_{sim}(\{r'_n, c_n\}) + \mu L_{con}(\{r'_n\}) \tag{3}$$
The feature similarity loss function is as follows:
$$L_{sim}(\{r'_n, c_n\}) = \sum_{n=1}^{N} \sum_{i=1}^{q} -\delta(i - c_n) \ln r'_{n,i}$$
where
$$\delta(t) = \begin{cases} 1, & t = 0 \\ 0, & \text{otherwise} \end{cases}$$
Here $c_n$ is the class label, determined by assigning a class ID to each response vector with the argmax function, and N is the total number of pixels in the input image. The response map $\{r_n = W_c x_n\}$ is obtained by applying a linear classifier, where $W_c \in \mathbb{R}^{q \times p}$; the response map is then normalized to $\{r'_n\}$ so that $\{r'_n\}$ has zero mean and unit variance. The goal of this loss function is to enhance the similarity of similar features: once image pixels are clustered according to their features, the feature vectors within the same class should be similar to each other, while the feature vectors of different classes should differ from each other. By minimizing this loss function, the network weights are updated to facilitate the extraction of more effective features for classification;
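The spatial continuity loss function is defined as follows:
$$L_{con}(\{r'_n\}) = \sum_{\xi=1}^{W-1} \sum_{\eta=1}^{H-1} \left\| r'_{\xi+1,\eta} - r'_{\xi,\eta} \right\|_1 + \left\| r'_{\xi,\eta+1} - r'_{\xi,\eta} \right\|_1$$
where $r'_{\xi,\eta}$ is the pixel value at $(\xi, \eta)$ in the response map $\{r'_n\}$;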
Applying the spatial continuity loss removes, to some extent, the excess pixel labels caused by complex patterns, textures, or other factors.
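The effect of the spatial continuity term, and the way μ combines it with the similarity term in equation (3), can be sketched as follows. The sketch assumes a common form of the continuity loss, the sum of L1 differences between each response vector and its right and lower neighbours; the names `continuity_loss` and `total_loss` and the nested-list layout are hypothetical.

```python
def continuity_loss(response_map):
    """Sum of L1 differences between neighbouring response vectors.

    response_map[eta][xi] is the q-dimensional vector r' at row eta and
    column xi. The L1 neighbour-difference form is an assumption here.
    """
    h, w = len(response_map), len(response_map[0])
    total = 0.0
    for eta in range(h - 1):
        for xi in range(w - 1):
            here = response_map[eta][xi]
            right = response_map[eta][xi + 1]
            below = response_map[eta + 1][xi]
            # Horizontal and vertical differences, summed over all axes.
            total += sum(abs(a - b) for a, b in zip(right, here))
            total += sum(abs(a - b) for a, b in zip(below, here))
    return total


def total_loss(l_sim, l_con, mu):
    """Equation (3): L = L_sim + mu * L_con."""
    return l_sim + mu * l_con
```

A perfectly flat response map gives a continuity loss of zero, so a larger μ favours fewer, smoother segments, while a smaller μ lets the similarity term keep finer detail.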
To have the loss function propagate backwards, the associated gradient is defined as follows:
The method provided by this embodiment mitigates the waste of computing power that traditional unsupervised image semantic segmentation methods incur when the redundant background area is too large. It uses a spatial attention mechanism as a tool for removing image redundancy and an unsupervised image semantic segmentation algorithm to generate the semantic image, and can be used to address the waste of computing power and the loss of precision that excessive redundant background information causes in unsupervised image semantic segmentation.
The embodiments described in this specification are merely exemplary of implementations of the inventive concepts and are provided for illustrative purposes only. The scope of the present invention should not be construed as being limited to the particular forms set forth in the embodiments, but is to be accorded the widest scope consistent with the principles and equivalents thereof as contemplated by those skilled in the art.
Claims (5)
1. An attention-based unsupervised image semantic segmentation method is characterized by comprising the following steps of:
S1: obtaining an affine transformation matrix θ by first initializing θ as the identity transformation matrix and then continuously correcting the parameters of θ through a loss function to finally obtain the desired affine transformation matrix;
S2: after the RGB image U is input, computing the position in the input image U that corresponds to each coordinate point of the feature map V, using the affine transformation matrix obtained in the previous step, as follows:
$$\begin{pmatrix} x_i^s \\ y_i^s \end{pmatrix} = A_\theta \begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix} \tag{1}$$
where $(x_i, y_i)$ represents the position of pixel i, the superscript s denotes a coordinate point of the input feature image, the superscript t denotes a coordinate point of the output feature map, and $A_\theta$ is the affine transformation matrix obtained in step S1;
S3: computing the gray value of each pixel in the output feature map by interpolation, as follows:
$$V_i^c = \sum_{n=1}^{H} \sum_{m=1}^{W} U_{nm}^c \, \max(0, 1 - |x_i^s - m|) \, \max(0, 1 - |y_i^s - n|) \tag{2}$$
where W and H are the width and height of the input image, $V_i^c$ is the gray value of pixel i at position $(x_i^t, y_i^t)$ in channel c of the output feature map, and $U_{nm}^c$ is the gray value of the point (n, m) in channel c of the input feature map;
S4: extracting deep features $\{x_n\}$ from the input image using a feature extraction module;
S5: computing the feature response vectors $\{r_n\}$ in the q-dimensional class space with a one-dimensional (1D) convolutional layer;
S6: applying batch normalization to the feature response vectors $\{r_n\}$ along each axis of the pixel class space to obtain $\{r'_n\}$, so that $\{r'_n\}$ has zero mean and unit variance;
S7: using the argmax function, selecting the dimension with the maximum value in each $r'_n$ to obtain the class label $\{c_n\}$ of each pixel;
S8: computing the loss function and back-propagating it to update the parameters, wherein the loss function consists of a feature similarity loss and a spatial continuity loss, μ is the weight that balances the two terms, and the loss function is defined as follows:
$$L = L_{sim}(\{r'_n, c_n\}) + \mu L_{con}(\{r'_n\}) \tag{3}$$
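wherein the feature similarity loss function is as follows:
$$L_{sim}(\{r'_n, c_n\}) = \sum_{n=1}^{N} \sum_{i=1}^{q} -\delta(i - c_n) \ln r'_{n,i}$$
where
$$\delta(t) = \begin{cases} 1, & t = 0 \\ 0, & \text{otherwise} \end{cases}$$
and N is the total number of pixels in the input image; the response map $\{r_n = W_c x_n\}$ is obtained by applying a linear classifier, where $W_c \in \mathbb{R}^{q \times p}$, and the response map is then normalized to $\{r'_n\}$;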
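the spatial continuity loss function is defined as follows:
$$L_{con}(\{r'_n\}) = \sum_{\xi=1}^{W-1} \sum_{\eta=1}^{H-1} \left\| r'_{\xi+1,\eta} - r'_{\xi,\eta} \right\|_1 + \left\| r'_{\xi,\eta+1} - r'_{\xi,\eta} \right\|_1$$
where $r'_{\xi,\eta}$ is the pixel value at $(\xi, \eta)$ in the response map $\{r'_n\}$;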
wherein applying the spatial continuity loss suppresses the excessive number of pixel labels caused by complex patterns or textures.
2. The unsupervised image semantic segmentation method based on the attention mechanism as claimed in claim 1, wherein in the step S1, the affine transformation matrix θ is a 2 x 3 matrix in the two-dimensional image.
3. The unsupervised image semantic segmentation method based on the attention mechanism as claimed in claim 1 or 2, wherein in step S2 the coordinate mapping runs from the target picture to the input picture: because sampling gathers pixels from varying coordinates of the original picture into the target picture, every sampling pass traverses the coordinates of the target picture in order, whereas the coordinates sampled in the original picture are not fixed, so that the coordinate point on the input feature map that corresponds to each position of the transformed output feature map is obtained.
4. The unsupervised image semantic segmentation method based on the attention mechanism as claimed in claim 1 or 2, wherein in step S3, when $|x_i^s - m|$ or $|y_i^s - n|$ is greater than 1, the corresponding max() term is 0, so only the gray values of the 4 points surrounding $(x_i^s, y_i^s)$ determine the gray value of the target pixel; the smaller $|x_i^s - m|$ and $|y_i^s - n|$ are (i.e., the closer the sampling position is to the point (n, m)), the greater the influence of that point and the larger its weight.
5. The unsupervised image semantic segmentation method based on the attention mechanism as claimed in claim 1 or 2, wherein in step S8 the goal of the feature similarity loss function is to enhance the similarity of similar features: once image pixels are clustered according to their features, the feature vectors within the same class should be similar to each other, while the feature vectors of different classes should differ from each other, and by minimizing this loss function the network weights are updated to facilitate the extraction of more effective features for classification.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210504797.3A CN114758135A (en) | 2022-05-10 | 2022-05-10 | Unsupervised image semantic segmentation method based on attention mechanism |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210504797.3A CN114758135A (en) | 2022-05-10 | 2022-05-10 | Unsupervised image semantic segmentation method based on attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114758135A (en) | 2022-07-15 |
Family
ID=82334627
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210504797.3A Pending CN114758135A (en) | 2022-05-10 | 2022-05-10 | Unsupervised image semantic segmentation method based on attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114758135A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117079061A (en) * | 2023-10-17 | 2023-11-17 | 四川迪晟新达类脑智能技术有限公司 | Target detection method and device based on attention mechanism and Yolov5 |
-
2022
- 2022-05-10 CN CN202210504797.3A patent/CN114758135A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114241282A (en) | Knowledge distillation-based edge equipment scene identification method and device | |
CN111899172A (en) | Vehicle target detection method oriented to remote sensing application scene | |
CN109086777B (en) | Saliency map refining method based on global pixel characteristics | |
CN111008978B (en) | Video scene segmentation method based on deep learning | |
CN109034035A (en) | Pedestrian's recognition methods again based on conspicuousness detection and Fusion Features | |
CN111027377B (en) | Double-flow neural network time sequence action positioning method | |
CN110728694B (en) | Long-time visual target tracking method based on continuous learning | |
CN108427919B (en) | Unsupervised oil tank target detection method based on shape-guided saliency model | |
CN110443279B (en) | Unmanned aerial vehicle image vehicle detection method based on lightweight neural network | |
WO2022218396A1 (en) | Image processing method and apparatus, and computer readable storage medium | |
CN113592894B (en) | Image segmentation method based on boundary box and co-occurrence feature prediction | |
CN115205633A (en) | Automatic driving multi-mode self-supervision pre-training method based on aerial view comparison learning | |
CN112785636A (en) | Multi-scale enhanced monocular depth estimation method | |
Alsanad et al. | Real-time fuel truck detection algorithm based on deep convolutional neural network | |
CN114758135A (en) | Unsupervised image semantic segmentation method based on attention mechanism | |
Yin | Object Detection Based on Deep Learning: A Brief Review | |
CN112668662B (en) | Outdoor mountain forest environment target detection method based on improved YOLOv3 network | |
CN116579616B (en) | Risk identification method based on deep learning | |
CN105844299B (en) | A kind of image classification method based on bag of words | |
CN110363240B (en) | Medical image classification method and system | |
CN111950476A (en) | Deep learning-based automatic river channel ship identification method in complex environment | |
CN108765384B (en) | Significance detection method for joint manifold sequencing and improved convex hull | |
CN116386042A (en) | Point cloud semantic segmentation model based on three-dimensional pooling spatial attention mechanism | |
CN113223037B (en) | Unsupervised semantic segmentation method and unsupervised semantic segmentation system for large-scale data | |
CN112784800B (en) | Face key point detection method based on neural network and shape constraint |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||