CN114463335A - Weak supervision semantic segmentation method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN114463335A
Authority
CN
China
Prior art keywords
training
label
semantic segmentation
branch
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111602397.8A
Other languages
Chinese (zh)
Inventor
张兆翔
李靖
樊峻菘
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN202111602397.8A priority Critical patent/CN114463335A/en
Publication of CN114463335A publication Critical patent/CN114463335A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image

Abstract

The embodiments of the present application disclose a weakly supervised semantic segmentation method and apparatus, an electronic device, and a storage medium. The method comprises: acquiring a picture to be recognized and inputting it into a semantic segmentation model to obtain a semantic segmentation result. The semantic segmentation model is obtained by training a basic semantic segmentation model on training pseudo labels; the training pseudo labels are obtained by recognizing training pictures with a dual-branch model; the dual-branch model is obtained through iterative training based on a first training label and a second training label; the first training label is an initial label generated from the class activation map (CAM); the second training label is an online label output by the dual-branch model. By training the dual-branch model in an iteratively optimized manner, the method predicts higher-quality object boundaries and segmentation results, from which high-quality pseudo labels are finally generated for training the basic semantic segmentation model, so that a high-precision semantic segmentation model is obtained.

Description

Weak supervision semantic segmentation method and device, electronic equipment and storage medium
Technical Field
The application relates to the technical field of computer vision, in particular to a weak supervision semantic segmentation method and device, electronic equipment and a storage medium.
Background
Semantic segmentation is an important and classical computer vision task with wide applications in image editing, scene analysis, and other areas. Although significant progress has been made in semantic segmentation based on deep neural networks, these methods rely heavily on time-consuming and labor-intensive pixel-level segmentation labels.
In order to reduce the cost of image labeling, weakly supervised semantic segmentation methods based on image-level category labels have been widely studied. Most existing methods train a classification network with the category labels, obtain the position and shape information of foreground objects from the class activation map (CAM) of the last convolutional layer of the classification network, generate initial labels (seed labels), train a standard semantic segmentation model with these initial labels, and finally predict the semantic segmentation result of the picture to be recognized with the trained segmentation model. However, the foreground regions of the activation map are usually only locally highlighted, so the initial labels generally mark only part of each foreground object; the resulting low foreground recall limits the performance of the segmentation model.
Some recent works train a boundary detection model with the initial labels (seed labels) to extract foreground object boundaries (contours), and propagate foreground category scores under the contour constraint, making the highlighted foreground regions in the activation map more complete. However, the object boundary maps (contour maps) predicted by these boundary detection models contain many false positives (edges inside objects), which block the foreground category score propagation; the highlighted foreground regions in the modified activation map therefore remain incomplete, and the recall of the initial labels is still low.
Disclosure of Invention
Because the existing methods have the above problems, the embodiments of the present application provide a weakly supervised semantic segmentation method and apparatus, an electronic device, and a storage medium, focusing on the problem that the recall of the initial labels is low.
Specifically, the embodiment of the present application provides the following technical solutions:
in a first aspect, an embodiment of the present application provides a weak supervised semantic segmentation method, including:
acquiring a picture to be recognized, and inputting the picture to be recognized into a semantic segmentation model to obtain a semantic segmentation result of the picture to be recognized;
the semantic segmentation model is obtained by training a basic semantic segmentation model based on a training pseudo label; the training pseudo label is obtained by identifying the picture by a double-branch model; the double-branch model is obtained after iterative training is carried out on the basis of a first training label and a second training label; wherein the first training label is an initial label generated by a classification network activation map (CAM); the initial label comprises foreground object position and shape information of the picture; the second training label is an online label output by the double-branch model; the online label is generated based on a semantic segmentation branch prediction result and an object boundary detection branch prediction result; the double-branch model is composed of the semantic segmentation branch and the object boundary detection branch, and the semantic segmentation branch and the object boundary detection branch share one trunk branch for extracting picture features.
Optionally, the CAM is obtained by performing feature recognition on the picture by using a classification network model; the classification network model is obtained after training based on the image category labels.
Optionally, the training pseudo tag is obtained by identifying a picture by a dual-branch model, and includes:
and obtaining the training pseudo label according to a semantic segmentation prediction result obtained by identifying the picture by the semantic segmentation branch and an object boundary result obtained by identifying the picture by the object boundary detection branch.
Optionally, the dual-branch model is obtained after iterative training based on a first training label and a second training label, and includes:
processing the CAM and generating a first training label offline; under the constraint of an object boundary graph generated by the object boundary detection branch, a foreground category score in an initial segmentation probability graph generated by the semantic segmentation branch is propagated in a foreground category score propagation mode to obtain a modified segmentation probability graph, and a second training label is generated based on the modified segmentation probability graph;
according to the first training label and the second training label, supervising and training the object boundary detection branch and the semantic segmentation branch in the double-branch model;
processing the initial segmentation probability map with a dense conditional random field (denseCRF) to obtain a background reference label, and correcting the second training label according to the background reference label to obtain a corrected second training label; and supervising the training of the object boundary detection branch in the double-branch model according to the first training label and the corrected second training label.
Optionally, the obtaining of the training pseudo label according to the semantic segmentation prediction result obtained by recognizing the picture with the semantic segmentation branch and the object boundary result obtained by recognizing the picture with the object boundary detection branch includes:
after the picture is subjected to multi-scale scaling and horizontal turning, inputting the trained semantic segmentation branch to obtain a semantic segmentation prediction result, and inputting the trained object boundary detection branch to obtain an object boundary result;
and generating the training pseudo label according to the semantic segmentation prediction result and the object boundary result.
In a second aspect, an embodiment of the present application provides a weak supervised semantic segmentation apparatus, including:
the processing module is used for acquiring a picture to be recognized and inputting the picture to be recognized into a semantic segmentation model to obtain a semantic segmentation result of the picture to be recognized;
the semantic segmentation model is obtained by training a basic semantic segmentation model based on a training pseudo label; the training pseudo label is obtained by identifying the picture by a double-branch model; the double-branch model is obtained after iterative training is carried out on the basis of a first training label and a second training label; wherein the first training label is an initial label generated by a classification network activation map (CAM); the initial label comprises foreground object position and shape information of the picture; the second training label is an online label output by the double-branch model; the online label is generated based on a semantic segmentation branch prediction result and an object boundary detection branch prediction result; the double-branch model is composed of the semantic segmentation branch and the object boundary detection branch, and the semantic segmentation branch and the object boundary detection branch share one trunk branch for extracting picture features.
Optionally, the CAM is obtained by performing feature recognition on the picture by using a classification network model; the classification network model is obtained after training based on the image category labels.
Optionally, the processing module is specifically configured to:
and obtaining the training pseudo label according to a semantic segmentation prediction result obtained by identifying the picture by the semantic segmentation branch and an object boundary result obtained by identifying the picture by the object boundary detection branch.
In a third aspect, an embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the weak supervised semantic segmentation method according to the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the weak supervised semantic segmentation method according to the first aspect is implemented.
According to the technical scheme, the image to be recognized is input into a semantic segmentation model, and a semantic segmentation result of the image to be recognized is obtained; the semantic segmentation model is obtained by training a basic semantic segmentation model based on a training pseudo label; the training pseudo label is obtained by identifying the picture by a double-branch model; the double-branch model is obtained after iterative training is carried out on the basis of a first training label and a second training label; wherein the first training label is an initial label generated by a classification network activation map (CAM); the initial label comprises foreground object position and shape information of the picture; the second training label is an online label output by the double-branch model; the online label is generated based on a semantic segmentation branch prediction result and an object boundary detection branch prediction result; the double-branch model is composed of the semantic segmentation branch and the object boundary detection branch, and the semantic segmentation branch and the object boundary detection branch share one trunk branch for extracting picture features. Therefore, the two branches of the double-branch model are subjected to iterative optimization through the online labels, the foreground category scores in the segmentation results are propagated to the periphery under the constraint of the object boundary in forward propagation, the second training label is generated, the label predicts a more complete and accurate foreground region, and the two branches of the double-branch model are well optimized in backward propagation. 
Compared with the existing scheme of supervising the object boundary branch using only the initial label (the first training label), the method and apparatus can effectively suppress false positives (boundaries inside objects) in the object boundary map, which helps the foreground category scores propagate from salient to non-salient regions. The embodiments of the present application optimize the segmentation result of the segmentation submodel by score propagation to generate the training pseudo labels; compared with traditional CAM-based methods, the generated training pseudo labels are more accurate, so a basic semantic segmentation model with higher performance can be trained and the accuracy of the semantic segmentation result is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flowchart illustrating steps of a weakly supervised semantic segmentation method according to an embodiment of the present application;
FIG. 2 is a second flowchart illustrating steps of a weakly supervised semantic segmentation method according to an embodiment of the present application;
FIG. 3 is a block diagram of an iteratively trained two-branch model provided by an embodiment of the present application;
fig. 4 is a schematic diagram of a network structure of a dual-branch model according to an embodiment of the present disclosure;
fig. 5 is a second schematic diagram of a network structure of a dual-branch model according to an embodiment of the present application;
fig. 6 is a third schematic diagram of a network structure of a dual-branch model according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a weakly supervised semantic segmentation apparatus provided by an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Fig. 1 and fig. 2 are flowcharts of the weakly supervised semantic segmentation method provided by the embodiments of the present application, fig. 3 is a framework diagram of the iteratively trained dual-branch model, and fig. 4 to fig. 6 are schematic diagrams of the network structure of the dual-branch model. The weakly supervised semantic segmentation method provided by the embodiments of the present application is explained in detail below with reference to fig. 1 to fig. 6. As shown in fig. 1, the method includes:
step 101: acquiring a picture to be recognized, and inputting the picture to be recognized into a semantic segmentation model to obtain a semantic segmentation result of the picture to be recognized;
the semantic segmentation model is obtained by training a basic semantic segmentation model based on a training pseudo label; the training pseudo label is obtained by identifying the picture by a double-branch model; the double-branch model is obtained after iterative training is carried out on the basis of a first training label and a second training label; wherein the first training label is an initial label generated by a classification network activation map (CAM); the initial label comprises foreground object position and shape information of the picture; the second training label is an online label output by the double-branch model; the online label is generated based on a semantic segmentation branch prediction result and an object boundary detection branch prediction result; the double-branch model is composed of the semantic segmentation branch and the object boundary detection branch, and the semantic segmentation branch and the object boundary detection branch share one trunk branch for extracting picture features.
In this step, it should be noted that a classification network model first needs to be trained with the picture category labels. As shown in fig. 3, the backbone of the classification network model is initialized with ImageNet pre-trained weights; the fully connected layer used for classification has no bias term, and its weights are randomly initialized. During training, input pictures are randomly augmented before being fed to the network, which is optimized with SGD.
In this step, after the classification network model is trained, a training picture is input into it and the un-pooled feature map F of the last convolutional layer is taken. The weights of the fully connected layer are reshaped into 1×1 convolution kernels, F is convolved with them, and the result is passed through a ReLU activation function, yielding the class activation map (CAM).
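The CAM computation described above (1×1 convolution with the bias-free classifier weights, followed by ReLU) can be sketched as follows. This is an illustrative NumPy sketch, not part of the patent; the function name `compute_cam` and the toy shapes are assumptions:

```python
import numpy as np

def compute_cam(feature_map, fc_weights):
    """Compute class activation maps from the last conv feature map.

    feature_map: (C, H, W) un-pooled output of the last convolutional layer.
    fc_weights:  (K, C) weights of the bias-free fully connected layer.
    Returns:     (K, H, W) CAM, one map per class, after ReLU.
    """
    # A 1x1 convolution with the FC weights is a per-pixel matrix product.
    cam = np.einsum('kc,chw->khw', fc_weights, feature_map)
    return np.maximum(cam, 0.0)  # ReLU

# Toy example: 4 feature channels, 3 classes, 5x5 spatial grid.
F = np.random.rand(4, 5, 5)
W = np.random.rand(3, 4)
cam = compute_cam(F, W)
```

In a real pipeline the CAM would then be upsampled to the input size, as the next step describes.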
In this step, after the CAM is obtained, it is upsampled to the size of the input picture, and the channels of categories not present in the picture are set to 0. A constant background channel is then appended to the CAM: setting the background channel to τ1 gives CAM1, and setting it to τ2 (τ2 < τ1) gives CAM2. CAM1 and CAM2 are each passed through an argmax function to obtain two initial labels, which are refined with the pydensecrf package to obtain Yfg and Ybg. Pixels labeled background in Yfg but foreground in Ybg are relabeled as uncertain (value 255); modifying Yfg in this way yields the first training label Yinit (Yinit[i] denotes the i-th pixel of Yinit, and similarly for Yfg[i] and Ybg[i]):

Yinit[i] = 255 if Yfg[i] is background and Ybg[i] is foreground; otherwise Yinit[i] = Yfg[i].
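The two-threshold seed-label construction can be sketched as follows. This NumPy sketch follows the rule stated above but omits the denseCRF refinement step; the function names are hypothetical:

```python
import numpy as np

UNCERTAIN = 255  # label value for pixels left unsupervised

def seed_label(cam, tau):
    """Append a constant background channel tau to the (K, H, W) CAM and
    take the per-pixel argmax: 0 = background, 1..K = foreground class."""
    k, h, w = cam.shape
    bg = np.full((1, h, w), tau)
    return np.argmax(np.concatenate([bg, cam]), axis=0)

def initial_label(cam, tau_fg, tau_bg):
    """Two-threshold seed label (tau_bg < tau_fg).

    The strict threshold tau_fg yields confident foreground (Yfg); the
    loose threshold tau_bg yields confident background (Ybg).  Pixels
    that Yfg calls background but Ybg calls foreground are ambiguous
    and are relabeled UNCERTAIN (255)."""
    y_fg = seed_label(cam, tau_fg)
    y_bg = seed_label(cam, tau_bg)
    y_init = y_fg.copy()
    y_init[(y_fg == 0) & (y_bg > 0)] = UNCERTAIN
    return y_init
```

With one foreground class and thresholds 0.7 / 0.3, a pixel whose CAM score lies between the two thresholds ends up uncertain, which matches the intent of the rule above.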
In this step, as shown in fig. 4, the dual-branch model uses ResNet50 or ResNet101 as the backbone. The stride of ResNet stages 4 and 5 is changed from 2 to 1, so that stage 5 outputs a feature map Fs8 with an overall stride of 8, and the dilation of the convolutional layers in stages 4 and 5 is adjusted so that the receptive field at each location of Fs8 is as large as at the corresponding location of the original ResNet. The dual-branch model is composed of a semantic segmentation branch (segmentation submodel) and an object boundary detection branch (object boundary submodel). As shown in fig. 5, the segmentation submodel is constructed by adding a seg head after stage 5; the seg head adopts an ASPP module consisting of four different 3×3 convolutions. Fs8 is fed through the four convolutions, the results are summed, upsampled 2× in the spatial dimensions, and finally passed through softmax to obtain the semantic segmentation probability map M. As shown in fig. 6, the object boundary submodel is constructed by reducing the output features of stages 1 to 5 to 32 channels each through five edge_layers, concatenating the resulting five feature maps, and feeding them into edge_layer6 (a 1×1 convolution), which outputs a 1-channel object boundary map; a sigmoid function maps it to [0, 1], and the result is denoted B.
In this step, besides the initial label Yinit (the first training label), the segmentation result M and the object boundary B are also used during the training phase of the dual-branch model to generate an online label Yonline (the second training label). Training pictures are augmented by random scaling and cropping before being input into the dual-branch model, so only a rectangular region R of the input picture I contains content of the original picture; the other regions are zero-padded during augmentation. The zero-padded area of Yonline is set to 255 (uncertain label), and the labels of the rectangular region R in Yonline are obtained by score propagation using the valid region R' of M and B (R' corresponds to R). Yonline fuses the information of M and B, so its result is more accurate. On the other hand, a small number of highlighted background regions in M expand rapidly through score propagation, so the generated Yonline contains many false positive labels where these regions are predicted as foreground. Yonline therefore needs to be corrected by resetting these erroneous labels to background labels, which yields the corrected second label Yrefine.
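The patent describes propagating foreground category scores under the boundary constraint but does not spell out the update rule. The following is a minimal NumPy sketch assuming 4-connected max-propagation gated by (1 − boundary); the function name `propagate_scores` and the specific rule are assumptions:

```python
import numpy as np

def propagate_scores(m_fg, boundary, n_iter=10):
    """Spread foreground class scores to neighbouring pixels, attenuated
    by the boundary map so scores do not leak across object contours.

    m_fg:     (H, W) foreground probability from the segmentation branch.
    boundary: (H, W) boundary strength in [0, 1] from the boundary branch.
    """
    score = m_fg.copy()
    pass_through = 1.0 - boundary  # strong boundary -> little propagation
    for _ in range(n_iter):
        for shift, axis in ((1, 0), (-1, 0), (1, 1), (-1, 1)):
            neighbour = np.roll(score, shift, axis=axis)
            # zero out the wrapped-around row/column introduced by roll
            if axis == 0:
                if shift == 1: neighbour[0, :] = 0
                else: neighbour[-1, :] = 0
            else:
                if shift == 1: neighbour[:, 0] = 0
                else: neighbour[:, -1] = 0
            score = np.maximum(score, neighbour * pass_through)
    return score
```

In this toy formulation a high boundary value at a pixel blocks incoming scores, which illustrates why false-positive internal boundaries (the problem noted in the Background) stall the propagation.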
In this step, after the first training label Yinit, the second training label Yonline, and the corrected second training label Yrefine are obtained, Yinit and Yonline are used with a cross-entropy loss to supervise the segmentation prediction result M of the dual-branch model. Yinit and Yrefine can be used to obtain a semantic affinity matrix between different pixels, and the object boundary prediction B of the dual-branch model is supervised indirectly by using this matrix to supervise the affinity matrix generated from B.
In this step, after the training of the dual-branch model is completed, the pictures in the training set are multi-scale scaled and horizontally flipped and then input into the trained dual-branch model; the segmentation submodel produces a semantic segmentation prediction result, the object boundary submodel produces an object boundary result, and a training pseudo label is then generated by score propagation from the two results.
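The test-time ensembling can be sketched as follows. For brevity this NumPy sketch averages over horizontal flips only, standing in for the full multi-scale + flip scheme; the function names and the uncertainty threshold are assumptions:

```python
import numpy as np

def tta_predict(model, image):
    """Average model predictions over horizontal flips (a simplified
    stand-in for the multi-scale + flip ensembling described above).

    model: callable mapping an (H, W, 3) image to an (H, W, K) score map.
    """
    pred = model(image)
    flipped = model(image[:, ::-1])[:, ::-1]  # flip input, un-flip output
    return (pred + flipped) / 2.0

def pseudo_label(avg_scores, uncertain_thresh=0.5):
    """Turn averaged scores into a pseudo label, marking low-confidence
    pixels as uncertain (255) so they are ignored during training."""
    label = np.argmax(avg_scores, axis=-1)
    label[avg_scores.max(axis=-1) < uncertain_thresh] = 255
    return label
```

The full pipeline would also rescale each prediction back to the original resolution before averaging; that step is omitted here.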
In this step, a basic semantic segmentation model (for example, DeepLab) is trained with the generated pseudo labels. After training is completed, the picture to be recognized is input into the semantic segmentation model to obtain its semantic segmentation result.
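Training on pseudo labels that contain uncertain pixels (255) requires a loss that skips those pixels. A minimal NumPy sketch of such a masked cross-entropy follows (the function name and interface are assumptions, not the patent's implementation):

```python
import numpy as np

def cross_entropy_ignore(probs, labels, ignore_index=255):
    """Mean pixel-wise cross-entropy that skips uncertain pixels.

    probs:  (H, W, K) predicted class probabilities (each pixel sums to 1).
    labels: (H, W) integer labels; ignore_index pixels contribute no loss.
    """
    valid = labels != ignore_index
    if not valid.any():
        return 0.0
    p = probs[valid, labels[valid]]          # probability of the true class
    return float(-np.log(np.clip(p, 1e-12, None)).mean())
```

Deep learning frameworks expose the same behaviour via an ignore-index option on their cross-entropy losses, which is the typical way uncertain pseudo-label pixels are excluded.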
According to the above technical solution, the picture to be recognized is input into a semantic segmentation model to obtain its semantic segmentation result. The semantic segmentation model is obtained by training a basic semantic segmentation model based on training pseudo labels; the training pseudo labels are obtained by recognizing pictures with a dual-branch model; the dual-branch model is obtained after iterative training based on a first training label and a second training label. The first training label is the initial label Yinit generated from the class activation map CAM, and contains the position and shape information of the foreground objects in the picture. The second training label Yonline is the online label output by the dual-branch model, generated from the prediction results of the semantic segmentation branch and the object boundary detection branch. The dual-branch model consists of the segmentation branch and the object boundary detection branch, which share one trunk branch for extracting features from the input picture. The two branches of the dual-branch model are thus iteratively optimized through the online label Yonline: in the forward pass, the foreground category scores in the segmentation result are propagated to their surroundings under the constraint of the object boundary, and the generated Yonline marks a more complete and accurate foreground region; in the backward pass, both branch submodels of the dual-branch model are optimized.
Compared with existing schemes that supervise the object boundary branch using only the initial label Yinit (the first training label), the embodiments of the present application can effectively suppress false positives (boundaries inside objects) in the object boundary map, which helps the foreground category scores propagate from salient to non-salient regions. The segmentation prediction result of the segmentation submodel is optimized by score propagation to generate the training pseudo labels; compared with traditional CAM-based methods, the generated pseudo labels are more accurate, so a basic semantic segmentation model with higher performance can be trained and the accuracy of the semantic segmentation result is improved.
Based on the content of the above embodiment, in this embodiment, the CAM is obtained by performing feature identification on the picture by using a classification network model; the classification network model is obtained after training based on the image category label; the picture category labels are provided by a training data set.
Based on the content of the foregoing embodiment, in this embodiment, the training pseudo tag is obtained by identifying a picture by a dual-branch model, and includes:
and obtaining a semantic segmentation prediction result after identifying the picture according to the semantic segmentation branch, obtaining an object boundary result after identifying the picture according to the object boundary detection branch, and obtaining the training pseudo label by using the two results.
In this embodiment, it should be noted that each picture in the training set is multi-scale scaled and horizontally flipped to generate several pictures, which are input into the trained dual-branch model to obtain several semantic segmentation prediction results and object boundary results. The averages of these results are taken, and the training pseudo label is generated from the averages in a manner similar to the generation of the second training label.
Based on the content of the foregoing embodiment, in this embodiment, the dual-branch model is obtained after performing iterative training based on the first training label and the second training label, and includes:
processing the CAM and generating a first training label offline; under the constraint of an object boundary graph generated by the object boundary detection branch, a foreground category score in an initial segmentation probability graph generated by the semantic segmentation branch is propagated in a foreground category score propagation mode to obtain a modified segmentation probability graph, and a second training label is generated based on the modified segmentation probability graph;
supervising training of the object boundary detection branches in the two-branch model according to the first training label and the second training label;
when the second training label is used for monitoring the object boundary detection branch, the second training label can be corrected to a certain extent, and the corrected second training label is used as a monitoring signal. Firstly, processing the initial segmentation probability map based on the dense conditional random field dense CRF to obtain a background reference label, and then correcting the second training label according to the background reference label to obtain a corrected second training label;
and finally, supervising and training the object boundary sub-model in the double-branch model according to the first training label and the corrected second training label.
In this embodiment, it should be noted that after the first training label is generated from the activation map (CAM), a second training label is obtained through network foreground category score propagation, and the segmentation sub-model in the dual-branch model is supervised and trained according to the first training label and the second training label. Furthermore, the second training label may be corrected in order to better supervise the object boundary branch. The reason is that a small number of highlighted background areas in the segmentation probability map expand rapidly through score propagation, so the generated second training label predicts these areas as foreground and contains many false positive labels. The second training label therefore needs to be corrected so that these false labels become background labels: dense CRF processing is performed on the segmentation result to obtain a reference label, the second training label is corrected according to the reference label to obtain a corrected second training label, and the object boundary sub-model in the dual-branch model is then supervised and trained according to the first training label and the corrected second training label. Therefore, in the embodiment of the application, during back-propagation of the dual-branch model, the initial label (the first training label) and the online label (the second training label) supervise the training of the semantic segmentation sub-model, while the initial label and the corrected second training label supervise the object boundary sub-model.
The initial label initializes and stabilizes the training process; the online label and the corrected second training label integrate information from both branch sub-models, so the two branch sub-models are iteratively optimized during training, and the corrected second training label avoids, to a certain extent, the adverse effects of low-quality object segmentation results. After training of the dual-branch network is completed, the segmentation prediction of the segmentation sub-model is refined using the object boundary information, and the training pseudo label is generated.
The present application will be specifically described below with reference to specific examples.
The first embodiment:
in this embodiment, a certain semantic segmentation database is taken as an example; the database contains 21 semantic categories in total and has 10582 training images with corresponding semantic segmentation labels. In this embodiment, only image category labels are used, which can be obtained by converting the semantic segmentation labels.
Fig. 2 is a flowchart of the present invention, and as shown in the drawing, the weak supervised semantic segmentation method provided in the embodiment of the present application specifically includes the following steps:
Step S0, train a classification network using the picture category labels, as shown in fig. 3. A classical model such as resnet50 may be adopted; the network backbone weights are initialized from the backbone of an ImageNet pre-trained model, and the fully connected layer used for classification has no bias and its weights are randomly initialized. During training, the input picture is randomly scaled (long edge within the range 320-640), randomly flipped horizontally, and pixel-value normalized (pixel values are first divided by 255 to fall in [0,1], then the RGB channels are normalized with means 0.485, 0.456, 0.406 and standard deviations 0.229, 0.224, 0.225, respectively), and then randomly cropped to 512 × 512, with the missing part zero-padded during cropping. The cropped pictures are input into the network for training and optimized with SGD (stochastic gradient descent); the backbone learning rate is 0.1, the learning rate of the fully connected classification layer is 1.0, the batch_size is 16, and 5 epochs are trained.
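The normalization described in step S0 can be sketched as follows (an illustrative Python sketch; the function name is assumed, and zero-padding to a fixed canvas stands in for the random scale/flip/crop augmentation, which is omitted):

```python
import numpy as np

def preprocess(img, out_size=512):
    """Normalize an HxWx3 uint8 RGB image as in step S0 and zero-pad
    it onto an out_size x out_size canvas (crop/flip/scale omitted)."""
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    x = img.astype(np.float32) / 255.0   # map pixel values to [0, 1]
    x = (x - mean) / std                 # per-channel normalization
    h, w = x.shape[:2]
    canvas = np.zeros((out_size, out_size, 3), dtype=np.float32)
    canvas[:min(h, out_size), :min(w, out_size)] = x[:out_size, :out_size]
    return canvas
```

In the real pipeline the crop is random and 512 × 512; here the picture is simply placed at the top-left corner, with the insufficient part filled with 0 as described above.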
Step S1, after training of the classification network is completed, the training picture is input into the network, the last convolution layer outputs a feature map F, the weights of the fully connected layer are taken as the weights of a 1 × 1 convolution kernel, F is convolved with them, and the result is input into a ReLU activation function. An activation map (CAM) of the same size as F with 20 channels (20 foreground object classes) is obtained, as shown in fig. 3.
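Step S1 can be sketched as follows (a minimal numpy sketch; the function name is illustrative — a 1 × 1 convolution with the FC weights reduces to a matrix product over the channel dimension):

```python
import numpy as np

def compute_cam(feature_map, fc_weight):
    """Reuse the classifier's bias-free FC weights as a 1x1 convolution
    over the last feature map, then apply ReLU (step S1).

    feature_map: (C, H, W) output of the last conv layer.
    fc_weight:   (num_classes, C) fully connected weights."""
    C, H, W = feature_map.shape
    # 1x1 convolution == matrix product over the channel dimension
    cam = fc_weight @ feature_map.reshape(C, H * W)  # (num_classes, H*W)
    cam = np.maximum(cam, 0.0)                       # ReLU
    return cam.reshape(-1, H, W)
```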
Step S2, the input image size is 512 × 512, and the activation map (CAM) is upsampled to 512 × 512. Channels corresponding to classes that do not appear in the input image are set to 0, and the values of the remaining channels are normalized to [0,1] (each channel is divided by its maximum over all positions). A background channel with a constant value of 0.3 is added before the first channel to obtain a new activation map CAM1. CAM1 is input into an argmax function, taking the maximum over the channel dimension to obtain a segmentation label, and dense conditional random field processing is applied to the segmentation label using the pydensecrf package to obtain the label Yfg (addPairwiseGaussian parameters sxy=3, compat=3; addPairwiseBilateral parameters sxy=50, srgb=5, compat=10; unary_from_labels parameters gt_prob=0.7, zero_unsure=False; inference run 10 times). Similarly, a background channel with value 0.05 is added before the first channel and the same subsequent operations are performed, yielding the label Ybg. For Yfg, pixels that are background in Yfg but foreground in Ybg are re-marked as uncertain pixels, giving Yinit (512 × 512), as shown below (Yinit[i] denotes the ith pixel of Yinit, and similarly for Yfg[i], Ybg[i]; 0 denotes the background category and 255 denotes an uncertain pixel):
Yinit[i] = 255, if Yfg[i] = 0 and Ybg[i] ≠ 0; otherwise Yinit[i] = Yfg[i]
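The merging rule above can be sketched as follows (an illustrative numpy sketch of the Yinit construction; the function name is assumed):

```python
import numpy as np

def merge_labels(y_fg, y_bg):
    """Build Y_init from the two dense-CRF labels: keep Y_fg, but where
    Y_fg says background (0) and Y_bg says foreground, the pixel is
    re-marked as uncertain (255)."""
    y_init = y_fg.copy()
    y_init[(y_fg == 0) & (y_bg != 0)] = 255
    return y_init
```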
Step S3, a dual-branch network backbone is constructed, as shown in fig. 4. The picture input to the dual-branch network is 512 × 512; resnet50 or resnet101 is selected as the backbone, and the stride of resnet stages 4 and 5 is changed from 2 to 1. Meanwhile, the dilation of the 3 × 3 convolutions from the 2nd layer to the last layer of stage 4 is set to 2 and that of the first 3 × 3 convolution to 1, and the dilation of the 3 × 3 convolutions from the 2nd layer to the last layer of stage 5 is set to 4 and that of the first 3 × 3 convolution to 2. Stage 5 thus finally outputs a feature map Fs8 with stride 8 (size 64 × 64), and the receptive field of Fs8 at each location is as large as that at the corresponding location of the original resnet network.
Step S4, the segmentation branch sub-model of the dual-branch network is constructed by adding a seg head after stage 5, adopting an ASPP module, as shown in fig. 5, composed of four 3 × 3 convolutions with bias, output channel 21, and dilations 6, 12, 18 and 24 respectively. Fs8 is input to the four convolutions and their results are summed, then 2× upsampling is performed and softmax is computed over the channel dimension to obtain the semantic segmentation result M, which has 21 channels (foreground + background) and size 128 × 128.
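The segmentation head of step S4 can be sketched as follows (an assumption-level PyTorch reimplementation, not the original code; the input channel count of Fs8 is an assumption and is left configurable):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AsppHead(nn.Module):
    """ASPP-style seg head: four dilated 3x3 convolutions with bias,
    summed, 2x upsampled, softmaxed over the channel dimension."""
    def __init__(self, in_ch=2048, num_classes=21):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, num_classes, 3, padding=d, dilation=d, bias=True)
            for d in (6, 12, 18, 24))

    def forward(self, f_s8):                      # f_s8: (N, in_ch, 64, 64)
        out = sum(b(f_s8) for b in self.branches)
        out = F.interpolate(out, scale_factor=2, mode="bilinear",
                            align_corners=False)  # -> (N, 21, 128, 128)
        return torch.softmax(out, dim=1)          # segmentation result M
```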
Step S5, the object boundary branch sub-model of the dual-branch network is constructed. As shown in fig. 6, the output features of stages 1 through 5 are each reduced to 32 channels by five edge_layers, called edge_layer1, edge_layer2, ..., edge_layer5. Each edge_layer consists, in turn, of a 1 × 1 convolution, a group norm layer (with 4 groups), and a ReLU layer; edge_layer3, edge_layer4 and edge_layer5 perform 2× upsampling before the ReLU layer. The five resulting feature maps are concatenated and input into edge_layer6 (a 1 × 1 convolution), which outputs an object boundary map with 1 channel whose values are mapped to [0,1] by a sigmoid function, denoted B, of size 128 × 128.
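The boundary head of step S5 can be sketched as follows (an assumption-level PyTorch sketch; the per-stage channel counts and spatial sizes — stages 1-2 at 128 × 128 and stages 3-5 at 64 × 64 for a 512 × 512 input — are assumptions inferred from the text):

```python
import torch
import torch.nn as nn

class EdgeHead(nn.Module):
    """Five edge_layers (1x1 conv -> GroupNorm(4 groups) -> ReLU, with 2x
    upsampling before the ReLU for the three deepest stages), concatenated
    and fused by a final 1x1 conv into a 1-channel boundary map."""
    def __init__(self, stage_channels=(64, 256, 512, 1024, 2048)):
        super().__init__()
        self.reduce = nn.ModuleList()
        for i, c in enumerate(stage_channels):
            layers = [nn.Conv2d(c, 32, 1), nn.GroupNorm(4, 32)]
            if i >= 2:  # edge_layer3..5: upsample 2x before the ReLU
                layers.append(nn.Upsample(scale_factor=2, mode="bilinear",
                                          align_corners=False))
            layers.append(nn.ReLU(inplace=True))
            self.reduce.append(nn.Sequential(*layers))
        self.fuse = nn.Conv2d(5 * 32, 1, 1)  # edge_layer6

    def forward(self, feats):  # five stage features, aligned to 128x128
        maps = [layer(f) for layer, f in zip(self.reduce, feats)]
        b = self.fuse(torch.cat(maps, dim=1))
        return torch.sigmoid(b)  # boundary map B with values in [0, 1]
```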
Step S6, the segmentation result M and the object boundary B are used to generate the online label Yonline (the second training label, 512 × 512). Before a training picture is input into the model, random scaling and cropping augmentation is applied, so only a certain rectangular region R (h × w) of the input picture I contains content of the original picture, and the other regions are zero-padded. Region R corresponds to a valid region R′ (h/4 × w/4) in M and B. During the forward pass of the dual-branch model, the R′ regions of M and B, denoted M^R′ and B^R′, are selected, and the label of region R in Yonline, denoted Y^R_online, is generated through score propagation; the zero-padded area of Yonline is set to 255 (uncertain label).
The score propagation process is described below. To reduce computation and facilitate batch processing, M^R′ and B^R′ are resized to 64 × 64, denoted M64 and B64. First, a pixel-affinity sparse matrix A of size 4096 × 4096 is computed based on B64: for two pixels i, j in B64 whose distance does not exceed 3, the maximum boundary confidence β is taken over i, j and the pixels near their connecting line (the several pixels vertically closest to the i-j segment), and (1 − β)^10 is taken as the affinity of i and j, i.e. A_i,j = A_j,i = (1 − β)^10; if the distance between pixels m, n exceeds 3, A_m,n = A_n,m = 0. Pixel affinity is propagated by matrix multiplication: each column of A is normalized so that it sums to 1, and the normalized A is repeatedly multiplied with itself to obtain Â. Â is dense and describes the semantic affinity between long-distance pixels; when two pixels are far apart, their affinity cannot be computed accurately from the boundary confidences on their connecting line alone, so the long-distance pixel affinity is obtained by continued matrix multiplication.
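The affinity construction described above can be sketched as follows (a simplified, assumption-level numpy sketch: sampling along the connecting line stands in for "the pixels vertically closest to the line", and a brute-force dense matrix stands in for the sparse one):

```python
import numpy as np

def affinity_matrix(boundary, radius=3, power=10):
    """For pixel pairs at distance <= radius, take the maximum boundary
    confidence beta sampled along the segment between them and set
    A[i, j] = (1 - beta) ** power; farther pairs stay 0."""
    h, w = boundary.shape
    n = h * w
    A = np.zeros((n, n), dtype=np.float32)
    ys, xs = np.divmod(np.arange(n), w)  # pixel coordinates by flat index
    for i in range(n):
        for j in range(i + 1, n):
            if (ys[i] - ys[j]) ** 2 + (xs[i] - xs[j]) ** 2 > radius ** 2:
                continue
            # sample boundary confidence along the connecting line
            ts = np.linspace(0.0, 1.0, num=2 * radius)
            ly = np.round(ys[i] + ts * (ys[j] - ys[i])).astype(int)
            lx = np.round(xs[i] + ts * (xs[j] - xs[i])).astype(int)
            beta = boundary[ly, lx].max()
            A[i, j] = A[j, i] = (1.0 - beta) ** power
    return A
```

Column normalization and repeated multiplication of the result then yield the dense propagated matrix Â.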
After Â is obtained, the channels of M64 (the 64 × 64 segmentation result) corresponding to categories not contained in the input picture are set to 0, and the background channel is set to 0.25. For each category i contained in the input picture, the ith channel of M64 is flattened into a 1 × 4096 vector, its values are normalized to [0,1], and it is matrix-multiplied with Â to obtain a new vector V_i; V_i adjusted to 64 × 64 size, V_i^64×64, is the corrected ith channel. The corrected segmentation result M̂ is finally obtained and input into an argmax function, taking the maximum over the channel dimension to obtain the corresponding online label Y^R′_online. This is adjusted to the size of R (h × w) to obtain Y^R_online, and the zero-padded region is filled with 255 to obtain the complete Yonline. V_i is computed as follows (Vec(·) denotes vectorization, M64_i denotes the ith channel of M64, and label_I denotes the category labels of the input picture):
V_i = normalize_[0,1](Vec(M64_i)) · Â, for each category i ∈ label_I
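The channel correction above can be sketched as follows (an illustrative numpy sketch; the function name and the handling of the fixed background channel are assumptions based on the surrounding text):

```python
import numpy as np

def propagate_scores(seg, A_hat, present_classes, bg_score=0.25):
    """Zero the channels of absent classes, fix the background channel
    to bg_score, normalize each present channel to [0, 1] and multiply
    it, flattened, by the propagated affinity matrix A_hat."""
    k, h, w = seg.shape
    out = np.zeros_like(seg)
    out[0] = bg_score                         # background channel
    for c in present_classes:                 # foreground classes in the image
        v = seg[c].reshape(-1)                # Vec(M64_c), 1 x (h*w)
        m = v.max()
        if m > 0:
            v = v / m                         # normalize to [0, 1]
        out[c] = (v @ A_hat).reshape(h, w)    # propagate along affinities
    return out
```

Taking argmax over the channel dimension of the returned map then yields the online label for the valid region.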
The Yonline obtained above is used to supervise the segmentation branch; to better supervise the object boundary branch, some corrections are made to Yonline. In the 64 × 64 segmentation result, the channels corresponding to categories not contained in the input picture are set to 0 and the background channel is set to 0.05; dense CRF processing is then performed and the result is adjusted to the size of R (h × w) to obtain Y_crf (DenseCRF parameters: iter_max=10, pos_xy_std=1, pos_w=3, bi_xy_std=67, bi_rgb_std=3, bi_w=4). Since the background threshold used to generate Y_crf (0.05) is much smaller than the background threshold used to generate Y^R_online (0.25), the former has higher confidence in background regions, so the background regions of Y_crf are used to correct the labels of Y^R_online, yielding Y^R_refine, as shown below (Y^R_refine[i] denotes the ith pixel of Y^R_refine):
Y^R_refine[i] = 0, if Y_crf[i] = 0; otherwise Y^R_refine[i] = Y^R_online[i]
The zero-padded area is filled with 255 to obtain the complete Yrefine, and Yrefine and Yinit are used to supervise the object boundary branch.
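The correction step can be sketched as follows (a minimal numpy sketch; the function name is assumed, and 0 denotes the background label as above):

```python
import numpy as np

def refine_online_label(y_online, y_crf):
    """Where the dense-CRF reference label (generated with the much lower
    background threshold) says background, overwrite the online label
    with background; other pixels are kept unchanged."""
    y_refine = y_online.copy()
    y_refine[y_crf == 0] = 0
    return y_refine
```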
Step S7, the dual-branch network is trained, as shown in fig. 3. M is supervised with a Cross Entropy loss using Yinit and Yonline, while a semantic affinity matrix between different pixels is obtained from Yinit and Yrefine and used to indirectly supervise B.
Yinit or Yrefine is downsampled to the size of B (128 × 128). Only the semantic affinity between pixels with definite class labels is considered to supervise the boundary confidence at the relevant positions in B: for each pixel p, the class labels of all other pixels at a distance of no more than 10 are considered; pixels with the same class label as p form positive pairs with p, pixels with different class labels form negative pairs, and pixels with uncertain labels are ignored. For each pair, the maximum boundary confidence in B over the two pixels and the pixels near their connecting line is supervised: the label of this maximum is set to 0 for positive pairs and to 1 for negative pairs, supervised through a Binary Cross Entropy loss. The total loss function is (L_CE is the Cross Entropy loss, L_A is the inter-pixel affinity loss):
L = L_A(B, Y_refine) + L_A(B, Y_init) + L_CE(M, Y_online) + L_CE(M, Y_init)
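The two cross-entropy terms of the total loss can be sketched as follows (an assumption-level PyTorch sketch operating on pre-softmax scores; the affinity terms L_A on the boundary map are omitted for brevity, and the function name is illustrative):

```python
import torch
import torch.nn.functional as F

def dual_supervision_ce(m_logits, y_online, y_init, ignore=255):
    """Sum of the cross-entropy losses against the online and initial
    labels; pixels labelled 255 are uncertain and excluded.

    m_logits: (N, C, H, W) pre-softmax segmentation scores.
    y_online, y_init: (N, H, W) integer label maps."""
    return (F.cross_entropy(m_logits, y_online, ignore_index=ignore)
            + F.cross_entropy(m_logits, y_init, ignore_index=ignore))
```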
When the dual-branch network is trained, the input picture is randomly scaled by a factor in [0.5,1.5], randomly flipped horizontally, its pixel values are normalized to [-1,1], and it is then randomly cropped to 512 × 512, with the missing part zero-padded during cropping. The cropped pictures are input into the network for training; the backbone learning rate is 0.0025, all edge_layer and seg head learning rates are 0.025, the batch_size is 10, and 19 epochs are trained.
Step S8, after model training is finished, each of the 10582 training pictures is horizontally flipped and scaled by factors of 1, 1.5 and 2 to obtain 6 pictures, which are input into the dual-branch network; the segmentation result and the object boundary take the averages of the 6 results, M_ave and B_ave. At this point all regions of M_ave and B_ave are valid. As in the generation of Yonline in step S6, an affinity matrix Â is generated based on B_ave (when generating the sparse affinity matrix A, pixel pairs whose distance does not exceed 5 are considered), and score propagation is performed on M_ave to obtain the training pseudo label.
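The test-time averaging of step S8 can be sketched as follows (an illustrative numpy sketch; `predict_fn` is an assumed callable taking an (H, W, 3) image and a scale factor and returning a (classes, h, w) score map at a fixed common resolution):

```python
import numpy as np

def averaged_prediction(predict_fn, image, scales=(1.0, 1.5, 2.0)):
    """Run each scale with and without horizontal flip (6 variants),
    undo the flip on the flipped predictions, and average the maps."""
    outs = []
    for s in scales:
        for flip in (False, True):
            img = image[:, ::-1] if flip else image    # horizontal flip
            p = predict_fn(img, s)
            outs.append(p[:, :, ::-1] if flip else p)  # flip prediction back
    return np.mean(outs, axis=0)
```

The same averaging is applied to the boundary output to obtain B_ave before generating the pseudo labels.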
Step S9, a basic semantic segmentation model (such as DeepLab) is trained using the generated pseudo labels; after training, the picture to be recognized is input into the semantic segmentation model to obtain the semantic segmentation result of the picture.
According to the technical scheme, a classification model is trained with image classification labels, an activation map (CAM) is used to obtain the first training label (initial label) of the training picture, and the first training label serves as a supervision signal to train a dual-branch model that predicts the object boundary and the semantic segmentation result. During training of the dual-branch model, a second training label (online label) is generated using the object boundary and the semantic segmentation prediction, the object boundary and semantic segmentation branches are supervised, and iterative optimization is performed. After model training is finished, a high-quality training pseudo label is generated using the object boundary and the semantic segmentation prediction, a standard semantic segmentation model is trained, and this model performs semantic segmentation on pictures. On the one hand, the segmentation result predicted by the network is more accurate than the activation map (CAM); on the other hand, iterative optimization reduces false positives in the object boundaries and facilitates foreground category score propagation, so the finally generated training pseudo labels mark more complete foreground regions, and the segmentation results of the basic semantic segmentation model trained on these pseudo labels are more accurate.
Based on the same inventive concept, another embodiment of the present invention provides a weakly supervised semantic segmentation apparatus, as shown in fig. 7, including:
the processing module 1 is used for acquiring a picture to be recognized and inputting the picture to be recognized into a semantic segmentation model to obtain a semantic segmentation result of the picture to be recognized;
the semantic segmentation model is obtained by training a basic semantic segmentation model based on a training pseudo label; the training pseudo label is obtained by identifying the picture by a double-branch model; the double-branch model is obtained after iterative training is carried out on the basis of a first training label and a second training label; wherein the first training label is an initial label generated by a classification network activation map (CAM); the initial label comprises foreground object position and shape information of the picture; the second training label is an online label output by the double-branch model; the online label is generated based on a semantic segmentation branch prediction result and an object boundary detection branch prediction result; the double-branch model is composed of the semantic segmentation branch and the object boundary detection branch, and the semantic segmentation branch and the object boundary detection branch share one trunk branch for extracting picture features.
The weak supervised semantic segmentation apparatus described in this embodiment may be used to implement the above method embodiments, and the principle and technical effect are similar, which are not described herein again.
Based on the same inventive concept, another embodiment of the present invention provides an electronic device, which refers to the schematic structural diagram of the electronic device shown in fig. 8, and specifically includes the following contents: a processor 801, a memory 802, a communication interface 803, and a communication bus 804;
the processor 801, the memory 802 and the communication interface 803 complete mutual communication through the communication bus 804; the communication interface 803 is used for realizing information transmission between devices;
the processor 801 is configured to call a computer program in the memory 802, and when the processor executes the computer program, the processor implements all the steps of one of the weak supervised semantic segmentation methods described above, for example: acquiring a picture to be recognized, and inputting the picture to be recognized into a semantic segmentation model to obtain a semantic segmentation result of the picture to be recognized; the semantic segmentation model is obtained by training a basic semantic segmentation model based on a training pseudo label; the training pseudo label is obtained by identifying the picture by a double-branch model; the double-branch model is obtained after iterative training is carried out on the basis of a first training label and a second training label; wherein the first training label is an initial label generated by a classification network activation map (CAM); the initial label comprises foreground object position and shape information of the picture; the second training label is an online label output by the double-branch model; the online label is generated based on a semantic segmentation branch prediction result and an object boundary detection branch prediction result; the double-branch model is composed of the semantic segmentation branch and the object boundary detection branch, and the semantic segmentation branch and the object boundary detection branch share one trunk branch for extracting picture features.
Based on the same inventive concept, yet another embodiment of the present invention provides a non-transitory computer-readable storage medium having stored thereon a computer program, which when executed by a processor implements all the steps of one of the above-mentioned weakly supervised semantic segmentation methods, such as: acquiring a picture to be recognized, and inputting the picture to be recognized into a semantic segmentation model to obtain a semantic segmentation result of the picture to be recognized; the semantic segmentation model is obtained by training a basic semantic segmentation model based on a training pseudo label; the training pseudo label is obtained by identifying the picture by a double-branch model; the double-branch model is obtained after iterative training is carried out on the basis of a first training label and a second training label; wherein the first training label is an initial label generated by a classification network activation map (CAM); the initial label comprises foreground object position and shape information of the picture; the second training label is an online label output by the double-branch model; the online label is generated based on a semantic segmentation branch prediction result and an object boundary detection branch prediction result; the double-branch model is composed of the semantic segmentation branch and the object boundary detection branch, and the semantic segmentation branch and the object boundary detection branch share one trunk branch for extracting picture features. In addition, the logic instructions in the memory may be implemented in the form of software functional units and may be stored in a computer readable storage medium when sold or used as a stand-alone product. 
Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, and various media capable of storing program codes.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on such understanding, the above technical solutions may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the weak semantic segmentation method described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A weakly supervised semantic segmentation method is characterized by comprising the following steps:
acquiring a picture to be recognized, and inputting the picture to be recognized into a semantic segmentation model to obtain a semantic segmentation result of the picture to be recognized;
the semantic segmentation model is obtained by training a basic semantic segmentation model based on a training pseudo label; the training pseudo label is obtained by identifying the picture by a double-branch model; the double-branch model is obtained after iterative training is carried out on the basis of a first training label and a second training label; wherein the first training label is an initial label generated by a classification network activation map (CAM); the initial label comprises foreground object position and shape information of the picture; the second training label is an online label output by the double-branch model; the online label is generated based on a semantic segmentation branch prediction result and an object boundary detection branch prediction result; the double-branch model is composed of the semantic segmentation branch and the object boundary detection branch, and the semantic segmentation branch and the object boundary detection branch share one trunk branch for extracting picture features.
2. The weak supervised semantic segmentation method according to claim 1, wherein the CAM is obtained by performing feature recognition on the picture by a classification network model; the classification network model is obtained after training based on the image category labels.
3. The weakly supervised semantic segmentation method according to claim 1, wherein the training pseudo labels are obtained by identifying pictures by a dual-branch model, and include:
and obtaining the training pseudo label according to a semantic segmentation prediction result obtained by identifying the picture by the semantic segmentation branch and an object boundary result obtained by identifying the picture by the object boundary detection branch.
4. The weak supervised semantic segmentation method according to claim 1, wherein the dual-branch model is obtained after iterative training based on a first training label and a second training label, and includes:
processing the CAM and generating a first training label offline; under the constraint of an object boundary graph generated by the object boundary detection branch, a foreground category score in an initial segmentation probability graph generated by the semantic segmentation branch is propagated in a foreground category score propagation mode to obtain a modified segmentation probability graph, and a second training label is generated based on the modified segmentation probability graph;
according to the first training label and the second training label, supervising and training the object boundary detection branch and the semantic segmentation branch in the double-branch model;
processing the initial segmentation probability map based on the dense conditional random field dense CRF to obtain a background reference label, and correcting the second training label according to the background reference label to obtain a corrected second training label; and monitoring and training the object boundary submodel in the double-branch model according to the first training label and the modified second training label.
5. The weak supervision semantic segmentation method according to claim 3 or 4, wherein obtaining the training pseudo label according to a semantic segmentation prediction result obtained by recognizing a picture according to the semantic segmentation branch and an object boundary result obtained by recognizing the picture according to the object boundary submodel comprises:
after the picture is subjected to multi-scale scaling and horizontal turning, inputting the trained semantic segmentation branch to obtain a semantic segmentation prediction result, and inputting the trained object boundary detection branch to obtain an object boundary result;
and generating the training pseudo label according to the semantic segmentation prediction result and the object boundary result.
6. A weakly supervised semantic segmentation apparatus, comprising:
the processing module is used for acquiring a picture to be recognized and inputting the picture to be recognized into a semantic segmentation model to obtain a semantic segmentation result of the picture to be recognized;
the semantic segmentation model is obtained by training a basic semantic segmentation model based on a training pseudo label; the training pseudo label is obtained by identifying the picture by a double-branch model; the double-branch model is obtained after iterative training is carried out on the basis of a first training label and a second training label; wherein the first training label is an initial label generated by a classification network activation map (CAM); the initial label comprises foreground object position and shape information of the picture; the second training label is an online label output by the double-branch model; the online label is generated based on a semantic segmentation branch prediction result and an object boundary detection branch prediction result; the double-branch model is composed of the semantic segmentation branch and the object boundary detection branch, and the semantic segmentation branch and the object boundary detection branch share one trunk branch for extracting picture features.
7. The weakly supervised semantic segmentation apparatus according to claim 6, wherein the CAM is obtained by performing feature recognition on the picture by a classification network model; the classification network model is obtained after training based on the image category labels.
8. The weakly supervised semantic segmentation apparatus according to claim 6, wherein the processing module is specifically configured to:
and obtaining the training pseudo label according to a semantic segmentation prediction result obtained by identifying the picture by the semantic segmentation branch and an object boundary result obtained by identifying the picture by the object boundary detection branch.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program performs the steps of the weakly supervised semantic segmentation method of any of claims 1 to 5.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the weakly supervised semantic segmentation method according to any one of claims 1 to 5.
CN202111602397.8A 2021-12-24 2021-12-24 Weak supervision semantic segmentation method and device, electronic equipment and storage medium Pending CN114463335A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111602397.8A CN114463335A (en) 2021-12-24 2021-12-24 Weak supervision semantic segmentation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111602397.8A CN114463335A (en) 2021-12-24 2021-12-24 Weak supervision semantic segmentation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114463335A true CN114463335A (en) 2022-05-10

Family

ID=81408245

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111602397.8A Pending CN114463335A (en) 2021-12-24 2021-12-24 Weak supervision semantic segmentation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114463335A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114998595A (en) * 2022-07-18 2022-09-02 赛维森(广州)医疗科技服务有限公司 Weak supervision semantic segmentation method, semantic segmentation method and readable storage medium
CN114998595B (en) * 2022-07-18 2022-11-08 赛维森(广州)医疗科技服务有限公司 Weak supervision semantic segmentation method, semantic segmentation method and readable storage medium
CN115471662A (en) * 2022-11-03 2022-12-13 深圳比特微电子科技有限公司 Training method, recognition method, device and storage medium of semantic segmentation model

Similar Documents

Publication Publication Date Title
CN108470320B (en) Image stylization method and system based on CNN
WO2020238560A1 (en) Video target tracking method and apparatus, computer device and storage medium
US20220165045A1 (en) Object recognition method and apparatus
US11823443B2 (en) Segmenting objects by refining shape priors
CN109726627B (en) Neural network model training and universal ground wire detection method
EP4099220A1 (en) Processing apparatus, method and storage medium
CN111902825A (en) Polygonal object labeling system and method for training object labeling system
CN109960742B (en) Local information searching method and device
CN111612008A (en) Image segmentation method based on convolution network
CN112132156A (en) Multi-depth feature fusion image saliency target detection method and system
CN112308866B (en) Image processing method, device, electronic equipment and storage medium
CN114463335A (en) Weak supervision semantic segmentation method and device, electronic equipment and storage medium
US11163989B2 (en) Action localization in images and videos using relational features
CN113657560B (en) Weak supervision image semantic segmentation method and system based on node classification
CN113076871A (en) Fish shoal automatic detection method based on target shielding compensation
CN112927209B (en) CNN-based significance detection system and method
CN111523463B (en) Target tracking method and training method based on matching-regression network
CN111598087B (en) Irregular character recognition method, device, computer equipment and storage medium
CN111028923A (en) Digital pathological image dyeing normalization method, electronic device and storage medium
US20230153965A1 (en) Image processing method and related device
CN114861842B (en) Few-sample target detection method and device and electronic equipment
CN112861718A (en) Lightweight feature fusion crowd counting method and system
CN114897136A (en) Multi-scale attention mechanism method and module and image processing method and device
CN113569852A (en) Training method and device of semantic segmentation model, electronic equipment and storage medium
CN113393434A (en) RGB-D significance detection method based on asymmetric double-current network architecture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination