CN114693967B - Multi-classification semantic segmentation method based on classification tensor enhancement - Google Patents

Multi-classification semantic segmentation method based on classification tensor enhancement

Info

Publication number
CN114693967B
CN114693967B (application CN202210274049.0A)
Authority
CN
China
Prior art keywords
classification
tensors
network
features
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210274049.0A
Other languages
Chinese (zh)
Other versions
CN114693967A (en)
Inventor
李宏亮
高翔宇
邱奔流
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202210274049.0A
Publication of CN114693967A
Application granted
Publication of CN114693967B
Legal status: Active

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/2431Multiple classes

Abstract

The invention provides a multi-class semantic segmentation method based on binary classification tensor enhancement, comprising the following steps: inputting the picture to be classified into an original segmentation network; the feature extraction part of the original segmentation network extracts features, and the extracted features are fed in parallel into N binary classification heads and a transit part; the N binary classification heads each perform binary classification on the input features and output N binary classification tensors, while the transit part outputs the features to be classified; the N binary foreground score maps are concatenated with the features to be classified, and the concatenated tensor is finally fed into a multi-classification head, which performs N-class classification on the input and outputs an N-class tensor as the final multi-classification result. The invention can be added simply to most segmentation network structures with only a small increase in network parameters. Compared with directly optimizing the multi-classification head result with CE loss alone, the method improves the classification performance of the segmentation network at the cost of a small increase in parameters.

Description

Multi-classification semantic segmentation method based on classification tensor enhancement
Technical Field
The present invention relates to multi-class classification technology, and more particularly to a technique for supporting multi-class segmentation based on binary classification tensor enhancement.
Background
With the continuous development of hardware computing power and deep learning, the need for high-precision, pixel-level processing of images is becoming increasingly common. Image segmentation, one of the most common tasks in computer vision, realizes pixel-level classification of images by means of deep neural networks. In many application scenarios, such as autonomous driving and augmented reality, image segmentation is an indispensable link in the processing pipeline, and the segmentation result directly affects downstream processing. The present method strengthens the classification stage of the segmentation network, thereby improving the segmentation effect.
Most networks implementing semantic segmentation optimize the final multi-class output tensor only with Cross Entropy (CE) loss, achieving relatively good segmentation results on most tasks. However, since CE loss only encourages the prediction score of the correct class, suppression of similar-class scores is easily neglected, which in turn makes the network prone to confusion when distinguishing similar classes. Especially in fine-grained tasks such as human parsing, the influence of misclassification caused by confusable classes is not negligible. Using binary classification heads to support the multi-classification head can increase the classification ability of the segmentation network, but the relations between the binary predictions of different classes are difficult to capture through BCE loss alone, so the output tensors of the binary heads should be further constrained by a new loss to better support the final result.
Disclosure of Invention
The technical problem the invention aims to solve is to provide a method that optimizes the binary classification tensors with a novel loss, so that they better support the final result of the semantic segmentation network. The method aims to reduce the prediction scores of incorrect classes within each truth region and to further enhance the scores of correct predictions in each truth region, thereby improving the classification ability of the segmentation network for similar classes. The loss is applied by adding simple binary classification heads to the existing network structure, improving the supporting effect of the binary classification tensors on the final result and thus the mean intersection-over-union of the final segmentation result.
The technical scheme adopted by the invention to solve this problem is a multi-class semantic segmentation method based on binary classification tensor enhancement, comprising the following steps:
1) Input the picture to be classified into an original segmentation network; the original segmentation network comprises a feature extraction part and a multi-classification head;
2) The feature extraction part of the original segmentation network extracts features, and the extracted features are fed in parallel into N binary classification heads and a transit part;
3) The N binary classification heads each perform binary classification on the input features and output N binary classification tensors, while the transit part keeps the dimension of the input features, performs a simple feature transformation, and outputs the features to be classified;
4) Concatenate the N binary foreground score maps with the features to be classified, and finally feed the concatenated tensor into the multi-classification head, which performs N-class classification on the input and outputs an N-class tensor as the final multi-classification result.
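The data flow of steps 1)-4) can be sketched in PyTorch as below. This is a minimal illustration, not the patented implementation: the backbone, the channel width `feat_ch`, and the single-convolution heads are placeholder assumptions (the patent's heads follow the structure of Fig. 3).

```python
import torch
import torch.nn as nn

class BinaryEnhancedSegNet(nn.Module):
    # Hypothetical sketch: backbone, N binary heads, a transit branch,
    # and a multi-classification head over the concatenated tensor.
    def __init__(self, backbone, num_classes, feat_ch=64):
        super().__init__()
        self.backbone = backbone                      # feature extraction part
        self.binary_heads = nn.ModuleList(
            [nn.Conv2d(feat_ch, 1, kernel_size=1) for _ in range(num_classes)]
        )
        self.transit = nn.Conv2d(feat_ch, feat_ch, kernel_size=1)  # keeps channel count
        self.multi_head = nn.Conv2d(feat_ch + num_classes, num_classes, kernel_size=1)

    def forward(self, x):
        feat = self.backbone(x)
        # N binary foreground score maps, each in [0, 1]
        binary = torch.cat([torch.sigmoid(h(feat)) for h in self.binary_heads], dim=1)
        trans = self.transit(feat)                    # features to be classified
        fused = torch.cat([binary, trans], dim=1)     # cascade (concatenation)
        return self.multi_head(fused), binary         # N-class logits + binary tensors
```

With the N binary score maps concatenated ahead of the features, the multi-classification head gains N extra input channels, matching the description above.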
Further, the loss function L adopted during training by the whole network implementing the multi-classification method is: L = L_CE + α·L_BCE + β·L_B2M; where L_CE and L_BCE are the cross-entropy losses of the multi-classification head and the binary classification heads respectively, and α and β are hyperparameters;
L_B2M is the loss that enhances the supporting property of the binary classification tensors, L_B2M = L_overlap + L_missing; L_overlap is the loss term reflecting unreasonable overlap in the binary predictions, and L_missing is the loss term reflecting missing predictions in the binary predictions.
The advantage of the invention is that the method can be added simply to most segmentation network structures with only a small increase in network parameters; the improved strategy raises the mean intersection-over-union of the segmentation results, with the output tensors of the simple binary classification heads further constrained by the designed B2M loss. Compared with directly optimizing the multi-classification head result with CE loss alone, the method improves the classification performance of the segmentation network at the cost of a small increase in parameters.
Drawings
Fig. 1: multi-classification schematic diagram of the invention;
Fig. 2: schematic of the network used in the invention;
Fig. 3: structure of the binary classification heads.
Detailed Description
The semantic segmentation task can be regarded as a pixel-by-pixel classification task: given N label classes, a conventional semantic segmentation network finally performs N-class classification of each pixel through a multi-classification head. Considering that binary classification heads can assist the classification of the multi-classification head, the applicant proposes an improved strategy that enhances the segmentation effect by strengthening the characteristics of the binary classification tensors. Structurally, simple binary classification heads and a transit part are designed; a loss that strengthens the supporting property of the binary classification tensors is designed and named the B2M (Binary to Multiple) loss.
As shown in Fig. 1, we select HRNetV2 as the baseline network for implementing the improved strategy. First, the tensor that would pass through the multi-classification head in the original segmentation network is fed simultaneously into multiple parallel binary classification heads and a transit part. The parallel binary classification heads convert the N-class problem into N binary problems and output N binary classification results, while the transit part maintains the channel number of the tensor to be classified and outputs the features to be classified. The N binary classification results are then concatenated with the features to be classified, and the concatenated tensor is finally fed into the multi-classification head, which outputs the final N-class result. The multi-classification head adds N input channels to the original network design to match the binary classification tensors added in the concatenation.
For training, the final result of the network is still optimized using CE loss; optimization of the binary classification tensors requires the proposed B2M loss in addition to the BCE loss.
By converting the N-class problem into N binary problems, we obtain the truth labels g_1, g_2, ..., g_N of the binary classification heads from the multi-class truth label. Given an input picture, the binary classification heads output N binary foreground score maps (the binary classification tensors), each corresponding to one class; the k-th is denoted p_k, with p_k ∈ [0,1]^{H×W}, k = 1, 2, ..., N, where H and W are the height and width of the score map, and its corresponding truth is g_k ∈ {0,1}^{H×W}. The region of p_k with foreground score greater than 0.5 is the foreground region. The serial numbers of the label classes actually present in the input picture are denoted l_1, l_2, ..., l_C, where C is the number of truth classes contained in the picture. Since the score map of each class in the binary classification tensor is optimized by BCE loss alone, the foreground regions predicted for different classes may overlap or be missing: some pixels have a prediction score above 0.5 in more than one class, or below 0.5 in all classes. Such overlapping and missing foreground regions weaken the support for the multi-classification result at the corresponding positions.
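The label conversion and the overlap/missing phenomena described above can be made concrete with a small NumPy example. The 2×2 label and the score maps p are fabricated purely for illustration:

```python
import numpy as np

def multi_to_binary_labels(label, num_classes):
    """Convert an H×W multi-class truth label into N binary masks g_1..g_N."""
    return np.stack([(label == k).astype(np.float32) for k in range(num_classes)])

# Toy 2×2 truth label containing classes {0, 1}
label = np.array([[0, 1],
                  [1, 1]])
g = multi_to_binary_labels(label, num_classes=3)   # shape (3, 2, 2)

# Binary foreground score maps p_k; foreground where p_k > 0.5
p = np.array([[[0.9, 0.6],     # class 0 wrongly claims pixel (0,1) -> overlap
               [0.1, 0.2]],
              [[0.2, 0.8],
               [0.3, 0.7]],    # class 1 misses pixel (1,0)      -> missing
              [[0.1, 0.1],
               [0.1, 0.1]]])
fg = p > 0.5
overlap_pixels = fg.sum(axis=0) > 1    # claimed by more than one class
missing_pixels = fg.sum(axis=0) == 0   # claimed by no class at all
```

Here pixel (0,1) has a score above 0.5 for both class 0 and class 1 (overlap), while pixel (1,0) has no score above 0.5 for any class (missing), exactly the two failure modes the B2M loss targets.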
For the prediction of class l_i, we calculate the overlap degree Overlap(l_i) of the other classes within its corresponding truth region, where "sum" means summing all elements and "×" means multiplying corresponding position elements.
Having obtained the overlap degree Overlap(l_i), it is mapped through a nonlinearity, where σ denotes the Sigmoid function and k and b are hyperparameters:
f(x) = σ(k·x + b) − σ(b) (2)
After mapping the overlap degrees of the C classes in turn and averaging over the classes, we obtain the first term of the B2M loss, denoted L_overlap; this loss term is mainly directed at suppressing unreasonable overlap in the binary predictions:
for areas lacking predictions, we further enhance the correct score of the true areas by using a similar method to calculate the cross-ratios. Note this loss as L missing The calculation method is as follows:
from this, the proposed B2M loss, L, can be calculated B2M
L B2M =L overlap +L missing (5)
The loss function of the whole network is as follows, where L_CE and L_BCE are the cross-entropy losses of the multi-classification head and the binary classification heads respectively, and α and β are hyperparameters:
L = L_CE + α·L_BCE + β·L_B2M (6)
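A hedged sketch of the B2M loss follows. Equations (1), (3) and (4) appear only as images in the original document, so the normalized overlap and missing terms below are one plausible reading of the textual description ("overlap degree of other classes in the truth region", "average score of each truth class in the truth area"), not the patented formulas.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def f(x, k=1.0, b=0.0):
    # Nonlinear mapping of equation (2): f(x) = sigmoid(k*x + b) - sigmoid(b)
    return sigmoid(k * x + b) - sigmoid(b)

def b2m_loss(p, g, truth_classes, k=1.0, b=0.0):
    """Hedged sketch of the B2M loss.

    p: (N, H, W) binary score maps, g: (N, H, W) binary truth masks,
    truth_classes: indices of the classes present in the picture.
    """
    overlap_terms, missing_terms = [], []
    for li in truth_classes:
        region = g[li]                      # truth region of class l_i
        area = region.sum() + 1e-6
        others = np.delete(p, li, axis=0)   # score maps of the other classes
        # Overlap(l_i): average score of the other classes inside the region
        overlap = (region * others.sum(axis=0)).sum() / area
        overlap_terms.append(f(overlap, k, b))
        # Missing term: penalize a low average correct score inside the region
        missing_terms.append(1.0 - (region * p[li]).sum() / area)
    # Equation (5): L_B2M = L_overlap + L_missing, each averaged over classes
    return np.mean(overlap_terms) + np.mean(missing_terms)
```

With a perfect binary prediction (p equal to g) both terms vanish, and the loss grows as other classes intrude into a truth region or the correct class's scores drop inside it.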
The invention is implemented on a server containing 8 TITAN X (Pascal) GPUs, and the network uses HRNetV2 as the base, as shown in Fig. 2. The whole network mainly comprises a backbone, a binary classification head stage, and a multi-classification head stage. The backbone is HRNetV2-W48 and is used to extract features. The structure of the binary classification heads is shown in Fig. 3; the transit (transformation) part is similar in structure, consisting of two 1×1 convolution layers, a Batch Normalization layer, and a ReLU activation layer, differing only in the number of output channels of the last convolution layer. The multi-classification head structure is similar to the baseline. The main steps of the designed strategy are: obtain the binary classification tensors with lightweight binary classification heads, calculate the overlap degree for each truth class and average over classes, calculate the average score of each truth class within its truth region and average over classes, and optimize the binary classification tensors with these two averages.
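Following the description of Fig. 3, a lightweight binary classification head might look as follows in PyTorch; the hidden width and the conv-BN-ReLU-conv ordering are assumptions, and the transit part reuses the same structure with a different output channel count.

```python
import torch
import torch.nn as nn

class BinaryHead(nn.Module):
    """Sketch of a lightweight head: two 1×1 convolutions with Batch
    Normalization and ReLU in between. Hidden width is an assumed value."""
    def __init__(self, in_ch, hidden_ch=64, out_ch=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden_ch, kernel_size=1),
            nn.BatchNorm2d(hidden_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden_ch, out_ch, kernel_size=1),  # only this width differs
        )

    def forward(self, x):
        return self.block(x)

# Transit part: same structure, but the last conv keeps the channel count,
# e.g. BinaryHead(in_ch=720, out_ch=720) for HRNetV2-W48-sized features.
```

A binary head uses out_ch=1 (one foreground score map per class), while the transit variant keeps out_ch equal to in_ch, matching the "keeps the channel number" requirement.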
The effect of the invention is illustrated below with experimental results. Since the data sets of the human parsing task in semantic segmentation contain many similar classes, the following three data sets were selected for the experiments. The same experimental conditions were used for training and testing of both the baseline and the network improved by the strategy:
Table 1: Comparative experiments with mIoU (percent) on three data sets
The method (ours) yields a clear improvement in mIoU over the original segmentation network (baseline) on all three data sets (LIP, ATR, PPSS), demonstrating the effectiveness of the strategy in improving network performance.

Claims (2)

1. A multi-class semantic segmentation method based on binary classification tensor enhancement, characterized by comprising the following steps:
1) Input the picture to be classified into an original segmentation network; the original segmentation network comprises a feature extraction part and a multi-classification head;
2) The feature extraction part of the original segmentation network extracts features, and the extracted features are fed in parallel into N binary classification heads and a transit part;
3) The N binary classification heads each perform binary classification on the input features and output N binary classification tensors, while the transit part keeps the channel number of the input features and outputs the features to be classified;
4) Concatenate the N binary foreground score maps with the features to be classified, and finally feed the concatenated tensor into the multi-classification head, which performs N-class classification on the input and outputs an N-class tensor as the final multi-classification result;
The loss function L adopted during training by the whole network implementing the multi-classification method is: L = L_CE + α·L_BCE + β·L_B2M;
where L_CE and L_BCE are the cross-entropy losses of the multi-classification head and the binary classification heads respectively, and α and β are hyperparameters; L_B2M is the loss that enhances the supporting property of the binary classification tensors, L_B2M = L_overlap + L_missing; L_overlap is the loss term reflecting unreasonable overlap in the binary predictions, and L_missing is the loss term reflecting missing predictions in the binary predictions;
The loss term L_overlap reflecting unreasonable overlap in the binary predictions is calculated as follows:
wherein C represents the number of truth classes contained in the current input picture; for the i-th truth class contained, l_i denotes the serial number of the class label; f(x) denotes a nonlinear mapping of the input x, f(x) = σ(k·x + b) − σ(b), where σ denotes the Sigmoid function and k and b are hyperparameters; Overlap(l_i) is the overlap degree of the other classes within the truth region corresponding to class l_i;
The loss term L_missing reflecting missing predictions in the binary predictions is calculated as follows:
where sum denotes summing all elements, × denotes multiplying corresponding position elements, p_{l_i} denotes the binary classification prediction score map corresponding to class l_i, and g_{l_i} denotes the binary truth label.
2. The method of claim 1, wherein the original split network is HRNetV2.
CN202210274049.0A 2022-03-20 2022-03-20 Multi-classification semantic segmentation method based on classification tensor enhancement Active CN114693967B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210274049.0A CN114693967B (en) 2022-03-20 2022-03-20 Multi-classification semantic segmentation method based on classification tensor enhancement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210274049.0A CN114693967B (en) 2022-03-20 2022-03-20 Multi-classification semantic segmentation method based on classification tensor enhancement

Publications (2)

Publication Number Publication Date
CN114693967A CN114693967A (en) 2022-07-01
CN114693967B true CN114693967B (en) 2023-10-31

Family

ID=82138917

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210274049.0A Active CN114693967B (en) 2022-03-20 2022-03-20 Multi-classification semantic segmentation method based on classification tensor enhancement

Country Status (1)

Country Link
CN (1) CN114693967B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108268870A (en) * 2018-01-29 2018-07-10 重庆理工大学 Multi-scale feature fusion ultrasonoscopy semantic segmentation method based on confrontation study
CN109509192A (en) * 2018-10-18 2019-03-22 天津大学 Merge the semantic segmentation network in Analysis On Multi-scale Features space and semantic space
CN111462163A (en) * 2020-01-03 2020-07-28 华中科技大学 Weakly supervised semantic segmentation method and application thereof
WO2020192469A1 (en) * 2019-03-26 2020-10-01 腾讯科技(深圳)有限公司 Method and apparatus for training image semantic segmentation network, device, and storage medium
CN111860514A (en) * 2020-05-21 2020-10-30 江苏大学 Orchard scene multi-class real-time segmentation method based on improved deep Lab
CN112465844A (en) * 2020-12-29 2021-03-09 华北电力大学 Multi-class loss function for image semantic segmentation and design method thereof
CN112801104A (en) * 2021-01-20 2021-05-14 吉林大学 Image pixel level pseudo label determination method and system based on semantic segmentation
WO2021097055A1 (en) * 2019-11-14 2021-05-20 Nec Laboratories America, Inc. Domain adaptation for semantic segmentation via exploiting weak labels
CN113191392A (en) * 2021-04-07 2021-07-30 山东师范大学 Breast cancer image information bottleneck multi-task classification and segmentation method and system
CN114092818A (en) * 2022-01-07 2022-02-25 中科视语(北京)科技有限公司 Semantic segmentation method and device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11188799B2 (en) * 2018-11-12 2021-11-30 Sony Corporation Semantic segmentation with soft cross-entropy loss


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Lian Xu et al. Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2022, 4310-4319. *
Longrong Yang et al. Learning with Noisy Class Labels for Instance Segmentation. Computer Vision – ECCV 2020. 2020, 38-53. *
Rosario Delgado et al. Enhancing Confusion Entropy (CEN) for binary and multiclass classification. PLOS ONE. 2019, 1-30. *
Zhang Hongzhao et al. Multi-scale adversarial network image semantic segmentation algorithm based on weighted loss function. Computer Applications and Software. 2020, (No. 01), 290-297. *
Wang Shan. Research on MRI image segmentation based on multi-scale convolution. China Master's Theses Full-text Database, Medicine and Health Sciences. 2021, (No. 05), E060-49. *

Also Published As

Publication number Publication date
CN114693967A (en) 2022-07-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant