CN114693967B - Multi-classification semantic segmentation method based on classification tensor enhancement - Google Patents
Multi-classification semantic segmentation method based on classification tensor enhancement
- Publication number
- CN114693967B (application CN202210274049.0A)
- Authority
- CN
- China
- Prior art keywords
- classification
- tensors
- network
- features
- input
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2431—Multiple classes
Abstract
The invention provides a multi-class semantic segmentation method based on two-class (binary) tensor enhancement, comprising the following steps: input the picture to be classified into an original segmentation network; the feature extraction part of the original segmentation network extracts features, and the extracted features are fed in parallel into N binary classification heads and a transit part; the N binary heads each perform binary classification on the input features and output N two-class tensors, while the transit part outputs the features to be classified; the N binary foreground score maps are concatenated with the features to be classified, and the concatenated tensor is finally fed into a multi-class head, which performs N-class classification on it and outputs an N-class tensor as the final multi-class result. The invention can be added simply to most segmentation network structures at the cost of only a small increase in network parameters. Compared with optimizing the multi-class head result with CE loss alone, the method improves the classification performance of the segmentation network for a small additional parameter cost.
Description
Technical Field
The present invention relates to multi-class classification technology, and more particularly to a technique for supporting multi-class segmentation based on two-class (binary) tensor enhancement.
Background
With the continuous development of hardware computing power and deep learning, the need for high-precision pixel-level processing of images is becoming increasingly common. Image segmentation, one of the most common tasks in computer vision, realizes pixel-level classification of images by means of deep neural networks. In many application scenarios, such as autonomous driving and augmented reality, segmentation is an indispensable link in the processing pipeline, and the segmentation result directly affects downstream processing. The present method mainly strengthens the classification stage of the segmentation network and thereby improves the segmentation result.
Most semantic segmentation networks optimize the final multi-class output tensor only with Cross Entropy (CE) loss, which achieves reasonably good segmentation results on most tasks. However, since CE loss only rewards the prediction score of the correct class, it tends to neglect the suppression of similar-class scores, which in turn makes the network prone to confusion when distinguishing similar classes. Especially in fine-grained tasks such as human parsing, the misclassifications caused by confusable classes are not negligible. Using binary (two-class) heads to support the multi-class head can increase the classification capability of the segmentation network, but BCE loss alone cannot capture the relations among the binary predictions of the different classes; the output tensors of the binary heads therefore benefit from an additional loss constraint so that they better support the final result.
Disclosure of Invention
The technical problem the invention aims to solve is to provide a method that optimizes the two-class tensors with a novel loss so that they better support the final multi-class result of the semantic segmentation network. The goal is to reduce, within each truth region, the scores of incorrect-class binary predictions and to further raise the scores of correct predictions, thereby improving the segmentation network's ability to distinguish similar classes. The loss is applied by adding simple binary heads to an existing network structure, which improves the support the two-class tensors give the final result and, in turn, the mean intersection-over-union (mIoU) of the final segmentation.
The technical scheme adopted by the invention to solve this problem is a multi-class semantic segmentation method based on two-class tensor enhancement, comprising the following steps:
1) Input the picture to be classified into an original segmentation network; the original segmentation network comprises a feature extraction part and a multi-class head;
2) The feature extraction part of the original segmentation network extracts features, and the extracted features are fed in parallel into N binary classification heads and a transit part;
3) The N binary heads each perform binary classification on the input features and output N two-class tensors; the transit part keeps the dimensions of the input features, performs a simple feature transformation, and outputs the features to be classified;
4) The N binary foreground score maps are concatenated with the features to be classified, and the concatenated tensor is finally fed into the multi-class head, which performs N-class classification on it and outputs an N-class tensor as the final multi-class result.
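Steps 2)-4) above can be sketched numerically as follows. This is only an illustrative sketch: the heads are modeled as random 1×1 convolutions (channel-mixing matrices), and all weight values and channel sizes here are hypothetical, not taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward_sketch(features, n_classes):
    """Sketch of steps 2)-4): features (C, H, W) from the backbone are fed
    to N binary heads and a transit part, then concatenated and re-classified."""
    C, H, W = features.shape
    # N binary heads: each reduces C channels to one foreground score map (1x1 conv).
    w_bin = rng.standard_normal((n_classes, C)) * 0.01
    score_maps = sigmoid(np.einsum('nc,chw->nhw', w_bin, features))   # (N, H, W)
    # Transit part: keeps the channel count C while transforming the features.
    w_transit = rng.standard_normal((C, C)) * 0.01
    transit = np.maximum(np.einsum('dc,chw->dhw', w_transit, features), 0.0)  # ReLU
    # Concatenate the N score maps with the features to be classified: (C+N, H, W).
    cascaded = np.concatenate([transit, score_maps], axis=0)
    # Multi-class head with C+N input channels outputs N class logits per pixel.
    w_multi = rng.standard_normal((n_classes, C + n_classes)) * 0.01
    logits = np.einsum('nk,khw->nhw', w_multi, cascaded)
    return score_maps, logits

feats = rng.standard_normal((8, 4, 4))
scores, logits = forward_sketch(feats, n_classes=3)
```

Note how the multi-class head consumes C+N channels, matching the patent's remark that it adds N input channels to accommodate the concatenated binary tensors.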
Further, the loss function L used to train the whole network is: L = L_CE + α·L_BCE + β·L_B2M, where L_CE and L_BCE are the cross-entropy losses of the multi-class head and the binary heads respectively, and α and β are hyper-parameters;
L_B2M is the loss that strengthens the support property of the two-class tensors, L_B2M = L_overlap + L_missing; L_overlap is the loss term reflecting unreasonable overlap among the binary predictions, and L_missing is the loss term reflecting missing predictions.
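The patent's equations (1), (3) and (4) appear only as images in the source and are not reproduced here, so the following is a hedged sketch of how a B2M-style loss could combine an overlap term and a missing term, following only the verbal description; the exact formulas and normalizations in the patent may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def f(x, k=10.0, b=-5.0):
    # Nonlinear mapping of equation (2): f(x) = sigmoid(k*x + b) - sigmoid(b).
    return sigmoid(k * x + b) - sigmoid(b)

def b2m_loss(p, g, present, k=10.0, b=-5.0):
    """Hypothetical sketch of L_B2M = L_overlap + L_missing.
    p: (N, H, W) binary foreground score maps; g: (N, H, W) binary truth masks;
    present: indices l_1..l_C of the classes present in the picture."""
    overlap_terms, missing_terms = [], []
    for li in present:
        region = g[li]                       # truth region of class l_i
        area = region.sum()
        others = np.delete(p, li, axis=0)    # predictions of all other classes
        # Degree to which other classes overlap the truth region of l_i
        # (normalization by area and class count is an assumption).
        overlap = (others * region).sum() / (area * len(others))
        overlap_terms.append(f(overlap, k, b))
        # Mean correct score inside the truth region; a low mean indicates
        # a missing prediction, so 1 - mean is penalized.
        missing_terms.append(1.0 - (p[li] * region).sum() / area)
    return float(np.mean(overlap_terms) + np.mean(missing_terms))
```

With perfect, mutually disjoint binary predictions (p equal to g), both terms vanish, which matches the stated goal of suppressing overlap and filling in missing foreground.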
The advantage of the invention is that the method can be added simply to most segmentation network structures at the cost of only a small increase in network parameters; the improved strategy raises the mean intersection-over-union of the segmentation results, and the output tensors of the simple binary heads are further constrained by the designed B2M loss. Compared with optimizing the multi-class head result with CE loss alone, the method improves the classification performance of the segmentation network for a small additional parameter cost.
Drawings
Fig. 1: multi-classification schematic diagram of the invention;
Fig. 2: schematic of the network used in the invention;
Fig. 3: structure of the two-class heads.
Detailed Description
The semantic segmentation task can be regarded as a pixel-wise classification task: assuming the labels comprise N classes, a conventional semantic segmentation network finally classifies each pixel into one of the N classes through a multi-class head. Considering that binary heads can assist the classification of the multi-class head, the applicant proposes an improved strategy that enhances the segmentation result by strengthening the characteristics of the two-class tensors. Structurally, simple binary heads and a transit part are designed; a loss designed to strengthen the support property of the two-class tensors is named the B2M (Binary to Multiple) loss.
As shown in Fig. 1, we select HRNetV2 as the baseline network for implementing the improvement strategy. First, the tensor that would pass through the multi-class head of the original segmentation network is fed simultaneously into multiple parallel binary heads and a transit part. The parallel binary heads convert the N-class problem into N binary problems and output N binary results; the transit part keeps the channel count of the tensor to be classified and outputs the features to be classified. The N binary results are then concatenated with the features to be classified, and the concatenated tensor is finally fed into the multi-class head, which outputs the final N-class result. The multi-class head keeps the original network design but adds N input channels to match the concatenated binary tensors.
During training, the final result of the network is still optimized with CE loss; optimizing the two-class tensors requires the proposed B2M loss in addition to BCE loss.
By converting the N-class problem into N binary problems, we obtain the truth labels g_1, g_2, ..., g_N of the binary heads from the multi-class truth label. Given an input picture, the binary heads output N binary foreground score maps (the two-class tensors), one per class; the k-th is denoted p_k, with p_k ∈ [0,1]^{H×W}, k = 1, 2, ..., N, where H and W are the height and width of the score map, and its truth value is g_k ∈ {0,1}^{H×W}. The region of p_k with foreground score greater than 0.5 is the foreground region. The indices of the label classes actually present in the input picture are denoted l_1, l_2, ..., l_C, where C is the number of truth classes contained in the picture. Since the score map of each class in the two-class tensor is optimized by BCE loss alone, the foreground regions predicted for different classes may overlap or be missing, i.e., some pixels have a prediction score above 0.5 on more than one class, or below 0.5 on all classes. Such overlaps and omissions weaken the support that the binary predictions provide to the multi-class result at the corresponding positions.
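The conversion from a multi-class truth label to the N binary truth labels g_1...g_N described above is a straightforward one-vs-rest split; a minimal sketch (function names are illustrative):

```python
import numpy as np

def to_binary_labels(label_map, n_classes):
    """Convert a multi-class truth label (H, W) with values in {0..N-1}
    into N binary truth masks g_1..g_N, each in {0,1}^{H x W}."""
    return np.stack([(label_map == k).astype(np.uint8) for k in range(n_classes)])

# Example: a 2x2 picture containing classes 0, 1 and 2.
label = np.array([[0, 1], [2, 1]])
g = to_binary_labels(label, n_classes=3)
# Indices l_1..l_C of the truth classes actually present in the picture.
present = [k for k in range(3) if g[k].any()]
```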
For the prediction of class l_i, we calculate the degree Overlap(l_i) to which the other classes overlap its truth region, where "sum" denotes summing all elements and "*" denotes element-wise multiplication:
Having obtained the overlap degree Overlap(l_i), we map it through a nonlinearity, where "σ" denotes the Sigmoid function and k and b are hyper-parameters:
f(x)=σ(k·x+b)-σ(b) (2)
after mapping the overlapping degree of the C categories in turn, averaging over the categories, we get the first term of B2M loss, denoted L overlap The loss term is mainly directed to suppression of unreasonable overlap in two-class prediction:
for areas lacking predictions, we further enhance the correct score of the true areas by using a similar method to calculate the cross-ratios. Note this loss as L missing The calculation method is as follows:
from this, the proposed B2M loss, L, can be calculated B2M :
L B2M =L overlap +L missing (5)
The loss function of the whole network is as follows, where L_CE and L_BCE are the cross-entropy losses of the multi-class head and the binary heads respectively, and α and β are hyper-parameters:
L = L_CE + α·L_BCE + β·L_B2M (6)
the invention is implemented on a server containing 8 TITAN X PASCALs, and the network adopts HRNetV2 as a base, as shown in figure 2. The whole network mainly comprises a backbone network backbone, a two-classification link binary classitication head and a multi-classification link multi-classitication head. The backbone network is HRNetV2-W48 and is used for extracting characteristics; the structure of the two sorting heads is shown in fig. 3, and the transit part transformation part is similar to the structure of the two sorting heads, and is composed of 2 1×1 convolution layers, a batch standardization Batch Normalization layer and an activation function Relu layer, which are only different in the number of output channels of the last convolution layer; the multi-sort head structure is similar to baseline. The main steps of the design strategy are as follows: and obtaining two classification tensors by using a light-weight two-classification head, calculating the overlapping degree corresponding to each truth class, taking an average value on the class, calculating the average score of each truth class in the truth area, taking an average value on the class, and optimizing the two classification tensors by using the two average values.
The effect of the invention is illustrated with experimental results. Since the datasets for human parsing tasks in semantic segmentation contain many similar classes, the following three datasets are selected for the experiments. The baseline and the network improved by the strategy were trained and tested under identical experimental conditions:
Table 1. Comparative experiments, mIoU (percent), on three datasets
On all three datasets (LIP, ATR, PPSS), the proposed method (ours) clearly improves mIoU over the original segmentation network (baseline), which demonstrates the effectiveness of the strategy for improving network performance.
Claims (2)
1. A multi-class semantic segmentation method based on two-class tensor enhancement, characterized by comprising the following steps:
1) Input the picture to be classified into an original segmentation network; the original segmentation network comprises a feature extraction part and a multi-class head;
2) The feature extraction part of the original segmentation network extracts features, and the extracted features are fed in parallel into N binary classification heads and a transit part;
3) The N binary heads each perform binary classification on the input features and output N two-class tensors; the transit part keeps the channel count of the input features and outputs the features to be classified;
4) The N binary foreground score maps are concatenated with the features to be classified, and the concatenated tensor is finally fed into the multi-class head, which performs N-class classification on it and outputs an N-class tensor as the final multi-class result;
the loss function L adopted by the whole network for realizing the multi-classification method in the training process is as follows: l=l CE +α.L BCE +β.L B2M ;
Wherein L is CE And L is equal to BCE Cross entropy loss of multi-class header and two-class header respectively, alpha and beta are super parameters; l (L) B2M To enhance loss of the classification tensor support attribute, L B2M =L overlap +L missing ;L overlap To reflect unreasonably overlapping loss terms in the two classifications, L missing To reflect in two categoriesMissing a predicted penalty term;
The loss term L_overlap reflecting unreasonable overlap among the binary predictions is calculated as follows:
where C denotes the number of truth classes contained in the current input picture; for the i-th truth class contained, l_i denotes the index of its class label; f(x) denotes a nonlinear mapping of the input x, f(x) = σ(k·x + b) − σ(b), where σ denotes the Sigmoid function and k and b are hyper-parameters; Overlap(l_i) is the degree to which the other classes overlap the truth region of class l_i;
The loss term L_missing reflecting missing predictions is calculated as follows:
where "sum" denotes summing all elements and "*" denotes element-wise multiplication; p_{l_i} denotes the binary prediction score map corresponding to class l_i, and g_{l_i} denotes the binary truth label of class l_i.
2. The method of claim 1, wherein the original split network is HRNetV2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210274049.0A CN114693967B (en) | 2022-03-20 | 2022-03-20 | Multi-classification semantic segmentation method based on classification tensor enhancement |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114693967A CN114693967A (en) | 2022-07-01 |
CN114693967B true CN114693967B (en) | 2023-10-31 |
Family
ID=82138917
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210274049.0A Active CN114693967B (en) | 2022-03-20 | 2022-03-20 | Multi-classification semantic segmentation method based on classification tensor enhancement |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114693967B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108268870A (en) * | 2018-01-29 | 2018-07-10 | 重庆理工大学 | Multi-scale feature fusion ultrasound image semantic segmentation method based on adversarial learning
CN109509192A (en) * | 2018-10-18 | 2019-03-22 | 天津大学 | Semantic segmentation network fusing multi-scale feature space and semantic space
CN111462163A (en) * | 2020-01-03 | 2020-07-28 | 华中科技大学 | Weakly supervised semantic segmentation method and application thereof |
WO2020192469A1 (en) * | 2019-03-26 | 2020-10-01 | 腾讯科技(深圳)有限公司 | Method and apparatus for training image semantic segmentation network, device, and storage medium |
CN111860514A (en) * | 2020-05-21 | 2020-10-30 | 江苏大学 | Orchard scene multi-class real-time segmentation method based on improved deep Lab |
CN112465844A (en) * | 2020-12-29 | 2021-03-09 | 华北电力大学 | Multi-class loss function for image semantic segmentation and design method thereof |
CN112801104A (en) * | 2021-01-20 | 2021-05-14 | 吉林大学 | Image pixel level pseudo label determination method and system based on semantic segmentation |
WO2021097055A1 (en) * | 2019-11-14 | 2021-05-20 | Nec Laboratories America, Inc. | Domain adaptation for semantic segmentation via exploiting weak labels |
CN113191392A (en) * | 2021-04-07 | 2021-07-30 | 山东师范大学 | Breast cancer image information bottleneck multi-task classification and segmentation method and system |
CN114092818A (en) * | 2022-01-07 | 2022-02-25 | 中科视语(北京)科技有限公司 | Semantic segmentation method and device, electronic equipment and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11188799B2 (en) * | 2018-11-12 | 2021-11-30 | Sony Corporation | Semantic segmentation with soft cross-entropy loss |
-
2022
- 2022-03-20 CN CN202210274049.0A patent/CN114693967B/en active Active
Non-Patent Citations (5)
Title |
---|
Lian Xu 等.Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation.《Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 》.2022,4310-4319. * |
Longrong Yang等.Learning with Noisy Class Labels for Instance Segmentation.《Computer Vision – ECCV 2020》.2020,38–53. * |
Rosario Delgado 等.Enhancing Confusion Entropy (CEN) for binary and multiclass classification.《PLOS ONE》.2019,1-30. * |
Zhang Hongzhao et al. Multi-scale adversarial network image semantic segmentation algorithm based on a weighted loss function. Computer Applications and Software, 2020, (01), 290-297. *
Wang Shan. Research on MRI image segmentation based on multi-scale convolution. China Master's Theses Full-text Database, Medicine and Health Sciences, 2021, (05), E060-49. *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||