CN116597151B - Unsupervised semantic segmentation method based on fine-grained feature grouping - Google Patents

Unsupervised semantic segmentation method based on fine-grained feature grouping

Info

Publication number
CN116597151B
CN116597151B (application number CN202310871120.8A)
Authority
CN
China
Prior art keywords
segmentation
feature
image
unsupervised
semantic
Prior art date
Legal status
Active
Application number
CN202310871120.8A
Other languages
Chinese (zh)
Other versions
CN116597151A (en)
Inventor
于潇丹
Current Assignee
Nanjing Yaxin Software Co., Ltd.
Original Assignee
Nanjing Yaxin Software Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Nanjing Yaxin Software Co., Ltd.
Priority to CN202310871120.8A
Publication of CN116597151A
Application granted
Publication of CN116597151B
Legal status: Active


Classifications

    • G06V 10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06N 3/0455: Auto-encoder networks; encoder-decoder networks
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06N 3/047: Probabilistic or stochastic networks
    • G06N 3/048: Activation functions
    • G06N 3/088: Non-supervised learning, e.g. competitive learning
    • G06V 10/52: Scale-space analysis, e.g. wavelet analysis
    • G06V 10/763: Non-hierarchical clustering techniques, e.g. based on statistics of modelling distributions
    • G06V 10/764: Classification, e.g. of video objects, using pattern recognition or machine learning
    • G06V 10/771: Feature selection, e.g. selecting representative features from a multi-dimensional feature space
    • G06V 10/806: Fusion of extracted features at the sensor, preprocessing, feature-extraction or classification level
    • G06V 10/82: Image or video recognition or understanding using neural networks
    • G06V 20/70: Labelling scene content, e.g. deriving syntactic or semantic representations
    • Y02T 10/40: Engine management systems (climate change mitigation technologies related to transportation)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an unsupervised semantic segmentation method based on fine-grained feature grouping, which uses a capsule network to screen the features extracted by a convolutional neural network, retaining important information, reducing interference from irrelevant information, and improving the overall segmentation effect. In addition, superpixel segmentation is used to pre-segment the image, and the boundaries of the pre-segmented regions guide the fine-grained grouping of subsequent pixel features, alleviating poor segmentation where boundaries are blurred. According to the invention, the subsequent high-level semantic information is grouped into fine-grained features along the segmentation boundaries and used as the input of a Capsule layer, which reduces the number of capsules; center differencing within each superpixel block highlights the detail of segmentation edges inside the block; and fusing the difference map with the multi-scale feature map as the joint input of the capsule layer improves the network's ability to express segmentation boundaries.

Description

Unsupervised semantic segmentation method based on fine-grained feature grouping
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to an unsupervised semantic segmentation method based on fine-grained feature grouping.
Background
Image semantic segmentation is an important task in the field of computer vision. It aims to classify every pixel in an image so that regions of interest are highlighted, and is widely applied in scenarios such as face recognition, license plate recognition, satellite image analysis, autonomous driving, human-computer interaction, and video processing. Most conventional semantic segmentation methods are supervised: target regions must be labeled at the pixel level in advance, and the large amount of training data required often costs considerable manpower and time. Moreover, because the label set is fixed, the learned model is limited to the few labeled categories and cannot generalize to unknown ones. Unsupervised semantic segmentation addresses this problem well, achieving end-to-end semantic segmentation without data annotation. However, its segmentation quality still falls far short of supervised methods, so improving the accuracy of unsupervised semantic segmentation has become an important research direction.
The main approaches of existing published unsupervised semantic segmentation methods are as follows:
A Chinese patent application with publication number CN 202110600887 (hereinafter referred to as patent 1) provides an unsupervised semantic segmentation method and system for large-scale data. First, a plurality of images to be segmented are acquired; the acquired images are input into a segmentation network model to obtain semantic segmentation results. The segmentation network model is trained in an unsupervised manner as follows: representation learning based on a pixel attention mechanism is performed on the training images to obtain image representations; the representations are clustered to obtain pseudo labels; and the segmentation network model is trained with these pseudo labels. Through the pixel attention and pixel alignment mechanisms, foreground saliency information generated by an unsupervised method supervises the learning of the pixel attention mechanism, improving the efficiency and precision of semantic segmentation.
The paper "PiCIE: Unsupervised Semantic Segmentation using Invariance and Equivariance in Clustering" (Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 16794-16804; hereinafter referred to as paper 1) proposes an unsupervised semantic segmentation method that incorporates photometric invariance and geometric equivariance into a deep convolutional framework to learn high-level semantic concepts without extra hyper-parameters or pre-processing. The algorithm flow is as follows: photometric and geometric transformations are applied to the input data set and a convolutional neural network extracts a feature vector for each pixel; the feature vectors are then clustered with k-means to obtain K cluster centers and a label for each pixel. After all training images are clustered, every pixel has a corresponding pseudo label, and these pseudo labels finally supervise training until a stable result is obtained.
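For concreteness, the following is a minimal Python (PyTorch) sketch of the invariance and equivariance constraints just described; the helper callables (feature_net, photometric_jitter, geometric_warp) are hypothetical placeholders, not paper 1's actual implementation.

```python
import torch
import torch.nn.functional as F

def invariance_equivariance_loss(feature_net, image, photometric_jitter, geometric_warp):
    """Two photometric views of a pixel should give the same feature (invariance);
    warping then embedding should equal embedding then warping (equivariance)."""
    v1 = feature_net(photometric_jitter(image))      # view 1: color-jittered input
    v2 = feature_net(photometric_jitter(image))      # view 2: independent jitter
    invariance = F.mse_loss(v1, v2)                  # same position -> same feature

    warped_then_embedded = feature_net(geometric_warp(image))
    embedded_then_warped = geometric_warp(feature_net(image))
    equivariance = F.mse_loss(warped_then_embedded, embedded_then_warped)
    return invariance + equivariance
```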
Unsupervised semantic segmentation is a pixel-level classification problem. Conventional methods depend heavily on the semantic consistency of the segmentation target, and the feature clustering they adopt suits single-label, object-centric images; on multi-label, scene-centric images, or images where the target object is small, the segmentation effect is poor. Meanwhile, the classification network and the clustering algorithm iterate simultaneously throughout the unsupervised training process, and without effective constraints and training strategies the training effect is hard to guarantee.
Among common unsupervised semantic segmentation methods, end-to-end approaches usually learn the clustering function by enforcing consistency between the cluster assignments of pixels across augmented views of an image. However, these methods tend to latch onto low-level image cues such as color or texture, and the clustering depends strongly on the network initialization, both of which make training difficult. Another, bottom-up, line of work exploits low-level or mid-level visual priors such as edge detection or saliency estimation to find image regions that likely share the same semantics, and uses those regions to learn pixel embeddings that capture semantic information. The image regions act as a regularizer, removing the dependence of segmentation on network initialization; the pixel embeddings are then clustered, e.g. with k-means, to obtain the segmentation. Although bottom-up methods achieve better results, they have drawbacks: the reliance on hand-crafted priors (e.g. edges or saliency) to group pixels limits their applicability. Saliency estimation, for instance, only works for object-centric images, and some works additionally require markers to identify suitable image regions.
Patent 1 performs pixel-level representation learning and clustering of images with a pixel attention mechanism and SwAV, and uses the resulting pseudo labels of all pixels to guide training of the segmentation model (a modified DeepLabv3+). Although this is more fine-grained than learning image-level representations, in end-to-end learning pixel-level clustering easily focuses directly on low-level image features (such as color and contrast) while ignoring higher-level semantic information.
In paper 1, the image is transformed based on photometric invariance and geometric equivariance. Photometric invariance means that when the illumination of an image fluctuates slightly, pixels at the same position should still receive the same label; that is, the feature representations obtained after two different photometric transformations of a pixel should be identical. Geometric equivariance means that when a picture is enlarged or reduced, the labeling result should be the correspondingly enlarged or reduced version of the original result. By constraining invariance and equivariance, and using a convolutional neural network for feature representation, the overall segmentation effect is improved. The scheme uses ResNet-18 for feature representation; because convolutional neural networks are insensitive to small changes in illumination, position and the like, such photometric and geometric transformations must be constructed in advance to constrain the clustering. In addition, as the network deepens, the meaning of the layer-by-layer features extracted by the convolutional neural network becomes unclear, which is unfavorable for subsequent clustering.
Disclosure of Invention
In order to solve the above problems, the invention discloses an unsupervised semantic segmentation method based on fine-grained feature grouping, which uses a capsule network to screen the features extracted by a convolutional neural network, retaining important information, reducing interference from irrelevant information, and improving the overall segmentation effect. In addition, superpixel segmentation pre-segments the picture, and the boundaries of the pre-segmented regions guide the fine-grained grouping of subsequent pixel features, alleviating poor segmentation where boundaries are blurred.
The "capsules" in the capsule network represent various features of a particular entity in the image, such as position, size, orientation, speed, hue, texture, etc., as a single logical unit. Then, using a protocol routing algorithm, when a capsule passes its own learned and predicted data to a higher level capsule, if the predictions agree, the higher level capsule becomes active, a process called dynamic routing. With the continued iteration of the routing mechanism, various capsules can be trained into units that learn different ideas. Meanwhile, the capsule network requires the model to learn feature variables in the capsule, and the valuable information is reserved to the maximum extent, so that the obtained pixel features are more easily clustered compared with CNN by using the model as a network for representing the image features.
In order to achieve the above purpose, the technical scheme of the invention is as follows:
An unsupervised semantic segmentation method based on fine-grained feature grouping comprises the following steps:
Step 1, obtaining an image to be segmented;
Step 2, inputting the image into an unsupervised semantic segmentation model to obtain a semantic segmentation result, specifically comprising the following steps:
Step 2-1, extracting feature maps, comprising:
the Encoder module in the unsupervised semantic segmentation model performs multi-scale feature extraction on the image with a DCNN with ASPP to obtain a multi-scale feature map;
a 7x7 convolution is applied to the original image to obtain a low-level semantic feature map;
Step 2-2, using superpixel segmentation as a prior, performing fine-grained feature grouping and fusion on the feature maps with an FFG-Capsule module, specifically:
superpixel segmentation is performed on the original input image with the SLIC method to obtain segmentation boundaries based on low-level semantic information (see the sketch following these steps);
taking the segmentation boundaries as a prior, the low-level semantic feature map is grouped into fine-grained features according to the superpixel block boundaries, each feature block forming one group; center differencing is performed within each feature block to obtain a center difference map;
the multi-scale feature map is likewise grouped according to the superpixel block boundaries;
the grouped multi-scale feature map and the center difference map are input together into a Capsule layer for feature screening to obtain a semantic feature map;
Step 2-3, sending the semantic feature map output by the FFG-Capsule module to a Decoder module to obtain the segmentation result.
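As referenced in step 2-2, the following is a minimal sketch of the superpixel pre-segmentation using the scikit-image SLIC implementation; the file name is a placeholder, and n_segments=256 follows the preset number given in the embodiment below.

```python
from skimage.io import imread
from skimage.segmentation import slic, find_boundaries

image = imread("input.jpg")   # placeholder path; any H x W x 3 RGB image
segments = slic(image, n_segments=256, compactness=10, start_label=0)
boundaries = find_boundaries(segments, mode="thick")  # boundary prior for grouping
print(segments.shape, int(segments.max()) + 1, "superpixels")
```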
Furthermore, the unsupervised semantic segmentation model is trained in an unsupervised manner.
Further, during unsupervised training, pixel-level features of a plurality of images are clustered with k-means based on the feature map output by the FFG-Capsule module to obtain a cluster map for each image, which serves as the network's pseudo label and participates in training the unsupervised semantic segmentation model.
Further, the model training process further comprises an Auxiliary Decoder module: the capsule vector with the largest norm among the Capsule vectors output by the FFG-Capsule module is input into the Auxiliary Decoder module, the image is reconstructed through two fully connected layers, and training of the network is supervised based on the difference between the reconstructed image and the original image.
Further, the Decoder module includes a 3x3 convolutional layer and a softmax module.
The beneficial effects of the invention are as follows:
according to the invention, the ultra-pixel segmentation is carried out on the original image by using the slec method, so that a segmentation boundary based on low-level semantic information is obtained, and the subsequent high-level semantic information is subjected to fine-granularity feature grouping according to the segmentation boundary and is used as the input of a Capsule layer, so that the number of capsules can be effectively reduced.
In the prior art, ResNet-50 downsamples image features with max pooling, which may lose part of the image information. The SLIC algorithm segments the image according to shallow semantic information, so over-segmentation can occur; and as the network structure deepens, the resulting high-level semantic information lacks clear supervision during clustering, so the segmentation effect may be poor. To solve this problem, the invention takes the SLIC segmentation result based on low-level semantic information as a prior, fuses it into the whole network, and performs center differencing within each superpixel block to highlight the detail of segmentation edges inside the block. The difference map is fused with the multi-scale feature map as the input of the capsule layer; the capsule layer selects representative multi-scale features and merges the segmentation boundaries, improving the network's expression of segmentation boundaries and the segmentation of region boundaries.
According to the invention, capsuleNet decoder is used as an auxiliary network, so that the training of the segmentation network is supervised, and the model segmentation effect is improved.
Drawings
Fig. 1 is a schematic diagram of the model architecture of the unsupervised semantic segmentation method based on fine-grained feature grouping provided by the invention.
Description of the embodiments
The technical scheme provided by the present invention will be described in detail with reference to the following specific examples, and it should be understood that the following specific examples are only for illustrating the present invention and are not intended to limit the scope of the present invention.
The invention provides an unsupervised semantic segmentation method based on fine-grained feature grouping; the overall implementation architecture is shown in Fig. 1, and the method comprises the following steps:
and step 1, acquiring an image to be segmented.
And 2, inputting the image into an unsupervised semantic segmentation model to obtain a semantic segmentation result.
The overall framework of the unsupervised semantic segmentation model is based on DeepLabv3+, with modified Encoder and Decoder modules and an added auxiliary network. The model is trained in an unsupervised manner.
The network structure of the unsupervised semantic segmentation model is shown in Fig. 1 and consists of an Encoder, a Decoder and an Auxiliary Decoder. The Encoder performs multi-scale feature extraction on the image using the DCNN with ASPP (Atrous Spatial Pyramid Pooling) from DeepLabv3+ to obtain a multi-scale feature map; the backbone network is a ResNet-50. For the multi-scale feature map, the original DeepLabv3+ simply concatenates the branches and applies a 1x1 convolution for multi-scale fusion. To reduce background interference, make the obtained pixel features more representative, and improve the accuracy of subsequent clustering, the invention instead fuses and screens the features during multi-scale fusion with an FFG-Capsule module (Fine-grained Feature Grouping Capsule module, abbreviated FFG-Capsule).
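The following is a minimal PyTorch sketch of an ASPP block of the kind the Encoder borrows from DeepLabv3+; the channel sizes and dilation rates are the common DeepLabv3+ defaults, assumed here rather than specified by the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling: parallel dilated convolutions plus
    global pooling, concatenated and projected to one multi-scale feature map."""
    def __init__(self, in_ch=2048, out_ch=256, rates=(6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 1, bias=False)]
            + [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False) for r in rates]
        )
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(in_ch, out_ch, 1, bias=False))
        self.project = nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1, bias=False)

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]
        pooled = F.interpolate(self.pool(x), size=x.shape[2:], mode="bilinear", align_corners=False)
        return self.project(torch.cat(feats + [pooled], dim=1))
```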
The input of the FFG-Capsule module consists of two parts: one is low-level semantic information, i.e. the feature map U1 obtained by the 7x7 convolution of the original image (the first convolutional layer of ResNet-50); the other is the feature map output by the ASPP module. The second layer of the original ResNet-50 would be a max-pooling layer, which captures global information and reduces feature dimensionality but also loses image information, in particular blurring the edge information needed for semantic segmentation. To improve segmentation at region boundaries, this patent fuses the processed feature map U1 with the feature map output by the ASPP module. U1 is processed as follows: first, superpixel segmentation is performed on the original input image with the SLIC (simple linear iterative clustering) algorithm to obtain segmentation blocks and boundaries, which are taken as a prior; segmentation boundaries derived from low-level semantics typically contain the final segmentation boundaries. After SLIC segmentation, the image is divided into superpixel blocks (a preset number of 256 segments) and segmentation boundaries based on low-level semantic information are obtained. The feature map U1 is then grouped into fine-grained features according to the superpixel blocks, each feature block forming one group, and a center difference (similar to CDC) is computed within each feature block to enhance boundary information. Finally, the multi-scale feature map from ASPP is grouped by the same superpixel boundaries, and together with the center difference map is fed to a Capsule layer for feature screening, yielding a more representative semantic feature map.
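The sketch below shows one plausible reading of the fine-grained grouping and center differencing just described: pixel features are grouped by SLIC superpixel id, and each pixel's feature is differenced against the mean (center) of its group. The exact differencing formula is not given in the patent, so this is an assumption.

```python
import torch

def center_difference(features, segments):
    """features: (C, H, W) float feature map; segments: (H, W) long superpixel ids.
    Returns each pixel's feature minus the mean feature (center) of its superpixel."""
    c, h, w = features.shape
    flat = features.reshape(c, -1)                                    # (C, H*W)
    seg = segments.reshape(-1)                                        # (H*W,)
    n = int(seg.max()) + 1
    sums = torch.zeros(c, n, device=flat.device).index_add_(1, seg, flat)
    counts = torch.zeros(n, device=flat.device, dtype=flat.dtype).index_add_(
        0, seg, torch.ones_like(seg, dtype=flat.dtype))
    centers = sums / counts.clamp(min=1)                              # per-superpixel mean
    return (flat - centers[:, seg]).reshape(c, h, w)                  # center difference map
```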
The feature map output by the FFG-Capsule module is sent to the Decoder module, where a 3x3 convolution and a softmax activation function produce a segmentation map of the same size as the input image. During unsupervised training, pixel-level features of a plurality of images are obtained and clustered with k-means to give a cluster map for each image, which participates in training as the network's pseudo label.
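A minimal sketch of this pseudo-label step, using scikit-learn's k-means; the cluster count k is an assumption, as the patent does not fix it.

```python
import numpy as np
from sklearn.cluster import KMeans

def make_pseudo_labels(feature_maps, k=20):   # k is an assumed cluster count
    """feature_maps: list of (C, H, W) numpy arrays output by the FFG-Capsule module.
    Clusters all pixel features jointly; returns one (H, W) pseudo-label map per image."""
    pixels = np.concatenate([f.reshape(f.shape[0], -1).T for f in feature_maps])  # (N, C)
    labels = KMeans(n_clusters=k, n_init=10).fit(pixels).labels_
    maps, start = [], 0
    for f in feature_maps:
        n = f.shape[1] * f.shape[2]
        maps.append(labels[start:start + n].reshape(f.shape[1], f.shape[2]))
        start += n
    return maps
```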
In addition, to improve the segmentation effect, this patent adds an Auxiliary Decoder module after the Encoder, similar to the decoder in CapsNet. The capsule vector with the largest norm among the Capsule vectors output by the FFG-Capsule module is taken as input, the image is reconstructed through two fully connected layers, and the difference between the reconstruction and the original image supervises the training of the network. This module participates only in training, not in inference.
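A minimal PyTorch sketch of the Auxiliary Decoder described above: the max-norm capsule vector is reconstructed into an image through two fully connected layers. The hidden width and output resolution are assumptions.

```python
import torch
import torch.nn as nn

class AuxiliaryDecoder(nn.Module):
    """Reconstructs the input image from the max-norm capsule vector through
    two fully connected layers; used only during training."""
    def __init__(self, capsule_dim=16, hidden=512, out_hw=(64, 64)):
        super().__init__()
        self.out_hw = out_hw
        self.fc = nn.Sequential(
            nn.Linear(capsule_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 3 * out_hw[0] * out_hw[1]), nn.Sigmoid(),
        )

    def forward(self, capsules):
        """capsules: (batch, n_capsules, capsule_dim)."""
        idx = capsules.norm(dim=-1).argmax(dim=1)                  # max-norm capsule per sample
        best = capsules[torch.arange(capsules.size(0)), idx]       # (batch, capsule_dim)
        return self.fc(best).view(-1, 3, *self.out_hw)             # reconstructed image
```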
The invention uses two loss functions in model training: for the segmentation network, the cross-entropy loss between the segmentation map and the pseudo labels; for the auxiliary network, the Euclidean distance between the original image and the reconstructed image.
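A minimal sketch of the two losses combined; the relative weighting of the two terms is an assumption, as the patent does not specify it.

```python
import torch
import torch.nn.functional as F

def total_loss(seg_logits, pseudo_labels, recon, original, recon_weight=1.0):
    """seg_logits: (B, K, H, W); pseudo_labels: (B, H, W) long k-means labels;
    recon / original: (B, 3, h, w) images for the auxiliary branch."""
    seg_loss = F.cross_entropy(seg_logits, pseudo_labels)                   # segmentation network
    recon_loss = ((recon - original) ** 2).flatten(1).sum(1).sqrt().mean()  # Euclidean distance
    return seg_loss + recon_weight * recon_loss
```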
It should be noted that the foregoing merely illustrates the technical idea of the present invention and is not intended to limit the scope of the present invention, and that a person skilled in the art may make several improvements and modifications without departing from the principles of the present invention, which fall within the scope of the claims of the present invention.

Claims (5)

1. An unsupervised semantic segmentation method based on fine-grained feature grouping, characterized by comprising the following steps:
step 1, obtaining an image to be segmented;
step 2, inputting the image into an unsupervised semantic segmentation model to obtain a semantic segmentation result, specifically comprising the following steps:
step 2-1, extracting feature maps, comprising:
an Encoder module in the unsupervised semantic segmentation model performs multi-scale feature extraction on the image with a DCNN with ASPP to obtain a multi-scale feature map;
a 7x7 convolution is applied to the original image to obtain a low-level semantic feature map;
step 2-2, using superpixel segmentation as a prior, performing fine-grained feature grouping and fusion on the feature maps with an FFG-Capsule module, specifically:
superpixel segmentation is performed on the original input image with the SLIC method to obtain segmentation boundaries based on low-level semantic information;
taking the segmentation boundaries as a prior, the low-level semantic feature map is grouped into fine-grained features according to the superpixel block boundaries, each feature block forming one group; center differencing is performed within each feature block to obtain a center difference map;
the multi-scale feature map is grouped according to the superpixel block boundaries;
the grouped multi-scale feature map and the center difference map are input together into a Capsule layer for feature screening to obtain a semantic feature map;
step 2-3, sending the semantic feature map output by the FFG-Capsule module to a Decoder module to obtain the segmentation result.
2. The unsupervised semantic segmentation method based on fine-grained feature grouping according to claim 1, characterized in that the unsupervised semantic segmentation model is trained in an unsupervised manner.
3. The unsupervised semantic segmentation method based on fine-grained feature grouping according to claim 2, characterized in that during unsupervised training, pixel-level features of a plurality of images are clustered with k-means based on the feature map output by the FFG-Capsule module to obtain a cluster map for each image, and the cluster map serves as the network's pseudo label and participates in training the unsupervised semantic segmentation model.
4. The unsupervised semantic segmentation method based on fine-grained feature grouping according to claim 1 or 2, characterized in that the model training process further comprises an Auxiliary Decoder module, wherein the capsule vector with the largest norm among the Capsule vectors output by the FFG-Capsule module is input into the Auxiliary Decoder module, the image is reconstructed through two fully connected layers, and training of the network is supervised based on the difference between the reconstructed image and the original image.
5. The unsupervised semantic segmentation method based on fine-grained feature grouping according to claim 1, characterized in that the Decoder module comprises a 3x3 convolutional layer and a softmax module.
CN202310871120.8A 2023-07-17 2023-07-17 Unsupervised semantic segmentation method based on fine-grained feature grouping Active CN116597151B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310871120.8A 2023-07-17 2023-07-17 Unsupervised semantic segmentation method based on fine-grained feature grouping (granted as CN116597151B)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310871120.8A 2023-07-17 2023-07-17 Unsupervised semantic segmentation method based on fine-grained feature grouping (granted as CN116597151B)

Publications (2)

Publication Number Publication Date
CN116597151A (en) 2023-08-15
CN116597151B (en) 2023-09-26

Family

ID=87611990

Family Applications (1)

Application Number Priority Date Filing Date Title
CN202310871120.8A (granted as CN116597151B, Active) 2023-07-17 2023-07-17 Unsupervised semantic segmentation method based on fine-grained feature grouping

Country Status (1)

Country Link
CN (1) CN116597151B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112488132A * 2020-12-18 2021-03-12 Guizhou University Fine-grained image classification method based on semantic feature enhancement
CN113111916A * 2021-03-15 2021-07-13 Institute of Computing Technology, Chinese Academy of Sciences Medical image semantic segmentation method and system based on weak supervision
CN113160246A * 2021-04-14 2021-07-23 Institute of Optics and Electronics, Chinese Academy of Sciences Image semantic segmentation method based on deep supervision
CN115482387A * 2022-09-28 2022-12-16 Shandong Juxiang Machinery Co., Ltd. Weakly supervised image semantic segmentation method and system based on multi-scale class prototypes


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"A Coarse-to-fine Capsule Network for Fine-grained Image Categorization";zhongqi等;《ResearchGate》;第1-10页 *
"基于卷积神经网络的零件识别及姿态估计";李昌明;《中国优秀硕士论文电子期刊网》;第33-43页 *

Also Published As

Publication number Publication date
CN116597151A (en) 2023-08-15

Similar Documents

Publication Publication Date Title
Hafiz et al. A survey on instance segmentation: state of the art
Lim et al. Learning multi-scale features for foreground segmentation
WO2022000426A1 (en) Method and system for segmenting moving target on basis of twin deep neural network
Spencer et al. Defeat-net: General monocular depth via simultaneous unsupervised representation learning
Pandey et al. Hybrid deep neural network with adaptive galactic swarm optimization for text extraction from scene images
Zhang et al. Self-supervised visual representation learning from hierarchical grouping
CN110929593B (en) Real-time significance pedestrian detection method based on detail discrimination
Zhou et al. Multi-scale context for scene labeling via flexible segmentation graph
Le et al. Deeply Supervised 3D Recurrent FCN for Salient Object Detection in Videos.
Gong et al. Advanced image and video processing using MATLAB
Geng et al. Using deep learning in infrared images to enable human gesture recognition for autonomous vehicles
Li et al. ComNet: Combinational neural network for object detection in UAV-borne thermal images
Hurtado et al. Semantic scene segmentation for robotics
Perreault et al. FFAVOD: Feature fusion architecture for video object detection
CN113221770A (en) Cross-domain pedestrian re-identification method and system based on multi-feature hybrid learning
Maggiolo et al. Improving maps from CNNs trained with sparse, scribbled ground truths using fully connected CRFs
Liang et al. Cross-scene foreground segmentation with supervised and unsupervised model communication
Tsutsui et al. Distantly supervised road segmentation
Muzammul et al. A survey on deep domain adaptation and tiny object detection challenges, techniques and datasets
Sravani et al. Robust detection of video text using an efficient hybrid method via key frame extraction and text localization
CN116597151B (en) Unsupervised semantic segmentation method based on fine-grained feature grouping
Patel et al. A novel approach for detecting number plate based on overlapping window and region clustering for Indian conditions
Girisha et al. Semantic segmentation with enhanced temporal smoothness using crf in aerial videos
Seth et al. State of the art techniques to advance deep networks for semantic segmentation: A systematic review
Moussaoui et al. Enhancing automated vehicle identification by integrating YOLO v8 and OCR techniques for high-precision license plate detection and recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant