CN115311449A - Weak supervision image target positioning analysis system based on class reactivation mapping chart - Google Patents
Weak supervision image target positioning analysis system based on class reactivation mapping chart
- Publication number
- CN115311449A CN115311449A CN202210864306.6A CN202210864306A CN115311449A CN 115311449 A CN115311449 A CN 115311449A CN 202210864306 A CN202210864306 A CN 202210864306A CN 115311449 A CN115311449 A CN 115311449A
- Authority
- CN
- China
- Prior art keywords
- class
- foreground
- image
- reactivation
- category
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/24—Aligning, centring, orientation detection or correction of the image
- G06V10/245—Aligning, centring, orientation detection or correction of the image by locating a pattern; Special marks for positioning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/762—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/809—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of classification results, e.g. where the classifiers operate on the same input data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Abstract
The invention belongs to the technical field of image processing, and specifically relates to a weakly supervised image target localization analysis system based on class reactivation maps. The invention comprises a category context feature learning module, a class map reactivation module, and a class map calibration module. The category context feature learning module extracts image features with a convolutional neural network and generates an initial class map, which serves as an index for learning category context features; the class map reactivation module takes the category context features as cluster centers, applies an expectation-maximization algorithm to cluster image pixel features, and takes the hidden variables as the class reactivation map; the class map calibration module calibrates the foreground and background activation values of the class reactivation map and fuses it with the class map. The method effectively resolves the confusion between foreground and background activation values in the initial class map, makes the two clearly distinguishable, and improves target localization results when only image category labels are available as supervision.
Description
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a weak supervision image target positioning analysis system based on a class reactivation mapping chart.
Background
In recent years, deep learning has achieved impressive results in a variety of computer vision applications. As one of the problems most worth exploring, image target localization aims to locate the key objects in a given image, a task that plays a crucial role in image content analysis and scene understanding. From this problem setting, the weakly supervised target localization task is derived. Compared with traditional image target localization, this task has no object location labels during training, only image category labels carrying weak image-level semantics. The weakly supervised task is therefore more difficult than traditional target localization, but better suited to practical applications: the internet offers a vast number of image-category label pairs but very few finely annotated object locations, and the development of weakly supervised object localization techniques makes learning from large amounts of internet data possible.
The mainstream approach to weakly supervised image target localization is to train a classification model and compute a class map to localize the image target. In the class map, the foreground is typically the region whose activation value exceeds a threshold τ, and the rest is defined as background. However, the class map usually localizes only the most discriminative region of the target, resulting in incomplete localization. The main reason is that during training the classification model only needs to attend to the most discriminative part of the image to make its classification decision; it need not attend to all regions of the object. The class map therefore lacks the ability to localize the target completely. Existing remedies for this problem fall into one-stage and two-stage methods. One-stage methods include: (1) continually erasing the most discriminative region of the image while training the classification model, forcing the model to attend to other parts; (2) adding a regularization term to the model loss function to constrain the model to cover more of the object; and (3) adding an attention module to perceive the remaining parts of the object. Although these methods have achieved good results in alleviating incomplete target localization, they remain confined to performing weakly supervised localization within a classification framework based on class maps, which frames the weakly supervised localization problem as "which pixels contribute to the final class prediction". Two-stage methods additionally train a target localizer on top of the first stage, decoupling the two subtasks of target localization and image classification.
That line of work builds on the observation that, in one-stage methods, the model perceives the overall position of the target well early in training but classifies poorly, while later in training classification accuracy improves but localization becomes incomplete; decoupling localization from classification therefore lets both achieve good results simultaneously. The system of the present invention can be applied to both one-stage and two-stage methods. To solve the above problems, it is necessary to introduce an entirely new target localization paradigm beyond sole reliance on the classification model, increase the discrimination between the foreground and background parts of the class map, and improve weakly supervised image target localization results.
Disclosure of Invention
The invention aims to provide a weak supervision image target positioning system based on a class reactivation mapping chart, which is used for solving the problem that the foreground positioning in the current weak supervision image target positioning is incomplete.
The invention provides a weak supervision image target positioning system based on a class reactivation mapping chart, which comprises a class context feature learning module, a class map reactivation module and a class reactivation mapping chart calibration module; the category context feature learning module extracts image features and generates an initial category mapping graph as index learning category context features; the class mapping map reactivation module receives the image features and the class context features, judges the foreground and the background through pixel-level clustering, generates a class reactivation mapping map and inputs the class reactivation mapping map to the class mapping map calibration module; the class map calibration module locates the coarse foreground and background regions according to the class map and directs the class reactivation map to calibrate the foreground and background activation values.
In the invention, the category context feature learning module comprises an image feature extraction network and a fully connected neural network classifier. The image feature extraction network performs hierarchical feature extraction on the image using a VGG16, Inception-V3 or ResNet50 deep convolutional neural network, generating a spatial feature vector f of dimension h × w × 1024. The feature vector f is fed into the fully connected neural network classifier, which computes a weighted sum of the spatial feature vector f and the fully connected network weights w for category c to obtain the initial class map M_c of dimension h × w. The process can be represented as:

M_c(i, j) = Σ_k w_k^c · f_k(i, j)    (1)

where f_k is the k-th component of the spatial feature vector f, and w_k^c is the k-th component of the weight vector corresponding to the c-th class. Based on the class map, the classifier's final class prediction for the image can be expressed as:

s_c = Σ_{i,j} M_c(i, j)    (2)
where (i, j) denotes a spatial position. From equation (2), solving weakly supervised image target localization with the fully connected neural network classifier and the class map reduces to answering "which pixels contribute to the class prediction". The class map is normalized to the interval [0, 1] and binarized by a threshold τ: each position whose normalized value exceeds τ is taken as foreground, and the rest as background. Incomplete localization results from the class map focusing too heavily on the salient regions of the object.
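The class-map computation and thresholding described above can be sketched in NumPy as follows (a minimal sketch; the array shapes and function names are illustrative assumptions, not the patent's implementation):

```python
import numpy as np

def class_activation_map(f, w_c):
    """Initial class map M_c: weighted sum of the spatial feature
    volume f (h x w x k) with the FC weights w_c (k,) of class c."""
    return np.tensordot(f, w_c, axes=([2], [0]))   # -> (h, w)

def binarize(M, tau=0.5):
    """Normalize M to [0, 1]; positions above tau become foreground."""
    M = (M - M.min()) / (M.max() - M.min() + 1e-8)
    return (M > tau).astype(np.float32)
```

A 7 × 7 × 1024 feature volume, as produced by the backbones named above, then yields a 7 × 7 class map that is normalized and thresholded into a foreground mask.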
The present invention further maintains per-class context feature vectors. For each class c, the foreground and background context feature vectors are denoted V_c^fg and V_c^bg, respectively (superscripts fg and bg denote foreground and background throughout). Both are d-dimensional feature vectors that serve as the cluster centers of the class, summarizing the common foreground and background features of that class. First, the invention binarizes the initial class map:

M̂^fg = 1(M_c > δ),  M̂^bg = 1(M_c ≤ δ)    (3)

where δ is a threshold and 1(·) is the indicator function. M̂^fg and M̂^bg serve as rough estimates of the foreground and background. For each sample with deep features F, the foreground and background features are obtained using these estimates, and the context feature vectors are updated with their means. The process can be expressed as:

V̄^fg = (Σ_{i,j} M̂^fg_ij · F_ij) / ‖M̂^fg‖_0,  V̄^bg = (Σ_{i,j} M̂^bg_ij · F_ij) / ‖M̂^bg‖_0    (4)

where F_ij denotes the value of the feature F at spatial location (i, j), and ‖·‖_0 counts all non-zero entries. The foreground and background context feature vectors are then updated with momentum, with momentum parameter λ:

V^fg ← λ V^fg + (1 − λ) V̄^fg,  V^bg ← λ V^bg + (1 − λ) V̄^bg    (5)

Momentum updating ensures that the context features change slowly and retain more historical information.
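The masked-mean and momentum update of the context vectors can be sketched as follows (a simplified single-class, single-sample version; `lam` plays the role of the momentum parameter λ, and all names are assumptions):

```python
import numpy as np

def update_context_vectors(F, M_fg, M_bg, V_fg, V_bg, lam=0.8):
    """Mean feature over the estimated fg/bg pixels, then momentum
    update of the per-class context vectors."""
    eps = 1e-8
    # masked mean: sum of features inside the mask / number of mask pixels
    v_fg = (F * M_fg[..., None]).sum(axis=(0, 1)) / (M_fg.sum() + eps)
    v_bg = (F * M_bg[..., None]).sum(axis=(0, 1)) / (M_bg.sum() + eps)
    V_fg = lam * V_fg + (1 - lam) * v_fg
    V_bg = lam * V_bg + (1 - lam) * v_bg
    return V_fg, V_bg
```

With λ close to 1, the vectors drift slowly and accumulate history across samples, as the text describes.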
In the invention, the category mapping image reactivation module reactivates the category mapping image, improves the activation value of the foreground part, increases the discrimination of the foreground and background activation values, and enables the target positioning to be more accurate. The module defines the reactivation problem as a gaussian mixture model based parameter estimation problem and solves it using an expectation maximization algorithm. The expectation-maximization algorithm is an extension of the maximum likelihood estimation on a probability model containing hidden variables. Specifically, the method comprises the following steps:
For each sample x, the goal is to maximize the likelihood:

max_θ Π_{i,j} p(x_ij; θ)    (6)

where θ = {a^fg, a^bg, V^fg, V^bg} are the model parameters, and superscripts fg and bg denote foreground and background, respectively. Each image pixel x_ij obeys a probability mixture model composed of a foreground Gaussian distribution and a background Gaussian distribution:

p(x_ij) = a^fg · p^fg(x_ij) + a^bg · p^bg(x_ij)    (7)

where the mixing weights a^fg and a^bg are real numbers in [0, 1] with a^fg + a^bg = 1. The foreground and background base models p^fg and p^bg measure the image features against the learned category context feature vectors. For implementation efficiency, the invention does not adopt the radial basis functions of the standard Gaussian mixture model, and instead takes cosine similarity as the measure:

p^fg(x_ij) ∝ exp(cos(x_ij, V^fg) / σ),  p^bg(x_ij) ∝ exp(cos(x_ij, V^bg) / σ)

where σ is a hyper-parameter controlling the smoothness.
Next, the model is solved using an expectation-maximization algorithm. Hidden variables Z^fg and Z^bg are defined, representing the probability that the image pixel at position (i, j) belongs to the foreground and the background, respectively.

The expectation-maximization algorithm first assumes initial distribution parameters V^fg and V^bg and, given these parameters, computes the expected value of the hidden variable for each pixel (E step); it then re-estimates the distribution parameters by maximum likelihood given the current assignments (M step); the two steps alternate until convergence.

In the E step, the posterior distribution of the hidden variables is computed using the current model parameters. In the t-th iteration (1 ≤ t ≤ T), with the model parameters held fixed, the hidden variables are computed as:

Z^fg(t)_ij = a^fg · p^fg(x_ij) / (a^fg · p^fg(x_ij) + a^bg · p^bg(x_ij)),  Z^bg(t)_ij = 1 − Z^fg(t)_ij    (8)
from the clustering perspective, formula (8) calculates the similarity between the pixel-by-pixel feature and the foreground and background context feature vectors, and assigns a soft label to the pixel, i.e., the probability of belonging to the foreground or the background. The method is different from a common expectation maximization algorithm in that the common expectation maximization algorithm uses random initialization, and the method is obtained by class context characteristic learning, so that a clustering center has more definite and richer image class semantic information.
In the M step, the goal is to adjust the context feature vectors to fit the current image features, by maximizing the expected likelihood of the image features using the hidden-variable values computed in the E step. The parameter updates are:

V^fg = (Σ_{i,j} Z^fg(t)_ij · x_ij) / (Σ_{i,j} Z^fg(t)_ij),  a^fg = (Σ_{i,j} Z^fg(t)_ij) / (h · w)

and symmetrically for V^bg and a^bg. That is, V^fg and V^bg are updated by a weighted average of the features, and a^fg and a^bg by the (soft) number of pixels assigned to each component.
The E step and the M step alternate until convergence, at which point the hidden variables Z^fg(T) and Z^bg(T) represent the probabilities that each pixel feature belongs to foreground and background. The hidden variables succeed at reactivation because the clustering process uses probabilities instead of the original activation values: the larger the probability, the more the pixel belongs to the foreground. In addition, using feature clustering instead of the class map from the original classification model also avoids the implicit bias toward local regions induced by the global average pooling layer.
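The full E/M alternation over the two-component cosine-similarity mixture can be sketched as follows (a simplified NumPy version; the values of `sigma` and `T` and the uniform initialization of the mixing weights are assumptions):

```python
import numpy as np

def cosine_density(F, V, sigma):
    """Unnormalized component density exp(cos(x_ij, V) / sigma)."""
    Fn = F / (np.linalg.norm(F, axis=-1, keepdims=True) + 1e-8)
    Vn = V / (np.linalg.norm(V) + 1e-8)
    return np.exp((Fn @ Vn) / sigma)               # (h, w)

def em_reactivate(F, V_fg, V_bg, sigma=0.1, T=10):
    """EM over the fg/bg mixture; returns the hidden variables
    Z_fg, Z_bg used as the class reactivation maps."""
    a_fg = 0.5                                     # assumed uniform init
    for _ in range(T):
        # E step: posterior responsibility of each pixel (soft label)
        p_fg = a_fg * cosine_density(F, V_fg, sigma)
        p_bg = (1 - a_fg) * cosine_density(F, V_bg, sigma)
        Z_fg = p_fg / (p_fg + p_bg)
        Z_bg = 1.0 - Z_fg
        # M step: responsibility-weighted means and mixing weight
        V_fg = (F * Z_fg[..., None]).sum((0, 1)) / (Z_fg.sum() + 1e-8)
        V_bg = (F * Z_bg[..., None]).sum((0, 1)) / (Z_bg.sum() + 1e-8)
        a_fg = Z_fg.mean()
    return Z_fg, Z_bg
```

Seeding the loop with the learned context vectors, rather than random centers, is what distinguishes this from a vanilla EM run.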
In the invention, the class reactivation map calibration module calibrates the class reactivation map. Because the expectation-maximization algorithm is used in the class map reactivation module, the resulting hidden variables Z^fg(T) and Z^bg(T), taken as initial reactivation maps, do not guarantee that the foreground activation values exceed the background activation values, so calibration is required: Z^fg(T) is kept as the class reactivation map if and only if its foreground activation is higher; otherwise Z^bg(T) is selected. The module uses the initial class activation map as guidance for this calibration. From equation (3), the coarse foreground portion M̂^fg and background portion M̂^bg are available. The invention decides by estimating the average probabilities μ^fg and μ^bg with which Z^fg(T) and Z^bg(T) belong to the foreground:

μ^fg = (Σ_{i,j} M̂^fg_ij · Z^fg(T)_ij) / ‖M̂^fg‖_0    (16)

μ^bg = (Σ_{i,j} M̂^fg_ij · Z^bg(T)_ij) / ‖M̂^fg‖_0    (17)

where the foreground region is given by M̂^fg. Although M̂^fg marks only part of the foreground, the coarse foreground and background regions are unaffected, and the salient foreground region still serves as a valid foreground cue. According to formulas (16) and (17), the reactivation map with the higher average foreground probability is the one belonging to the foreground. The calibrated foreground reactivation map is computed as Z^fg(T) if μ^fg ≥ μ^bg, and Z^bg(T) otherwise.
to further distinguish the foreground and background partial activation values, the class reactivation map and the initial class map are fused in the class reactivation map calibration module, and the calculation process can be expressed as:
the resulting final class reactivation map, after normalization and thresholding, may generate a mask or bounding box marker image target location.
The invention relates to a weak supervision image target positioning analysis system based on a class reactivation mapping chart, which comprises the following working procedures:
(1) First, the deep convolutional neural network model in the category context feature learning module is trained to extract deep image feature representations; the fully connected network classifier computes a weighted sum over the deep image features to obtain the initial class map; the average foreground and background feature vectors are computed from the initial class map and the image features, and the category context features are updated with momentum;

(2) The deep image features and category context features are fed into the class map reactivation module, which applies an expectation-maximization algorithm to cluster each pixel feature of the image into foreground or background according to its similarity to the category context features, taking the foreground and background hidden variables as class reactivation maps;

(3) The foreground and background class reactivation maps are fed into the class reactivation map calibration module, which distinguishes foreground from background and calibrates the class reactivation map according to the coarse foreground and background localization of the initial class map; at the same time, the initial class map and the class reactivation map are fused to obtain the final image target localization result.
According to the invention, a target positioning label is not needed in the whole process, the image weak supervision target positioning is completed only by using the image type label, and an accurate positioning result is obtained.
The advantages of the invention include:
First, it is found that incomplete localization in the initial class map stems from confusion between the activation values of its foreground and background parts. To solve this, a weakly supervised image target localization analysis system based on class reactivation maps is proposed, achieving accurate and complete image target localization;

Second, a category context feature learning module is proposed for the first time, which momentum-updates the foreground and background context features using the image features and the initial class map, so that each category's context features represent the common characteristics of that category's image foreground and background regions;

Third, a class map reactivation module is proposed for the first time, which casts reactivation as a parameter estimation problem for a Gaussian mixture model, solves it with an expectation-maximization algorithm, and takes the resulting hidden variables as the class reactivation map. On this basis, a class reactivation map calibration module is proposed, which calibrates the class reactivation map and fuses in the initial class map. The resulting final class reactivation map clearly distinguishes foreground from background;

Finally, state-of-the-art weakly supervised image target localization results are obtained on the public datasets ImageNet, CUB and OpenImages, and the localization results are interpretable.
Drawings
FIG. 1 is a system diagram of the present invention.
Fig. 2 is a full framework diagram of the model in the present invention.
Detailed Description
As is known in the art, most previous studies face the same problem: incomplete target localization in the initial class map generated by a classification model. The present invention investigates this problem in depth and finds that incomplete target localization is caused by confusion between the activation values of the foreground and background parts of the initial class map. To address it, the invention proposes a weakly supervised image target localization analysis system based on class reactivation maps, combining a deep convolutional neural network with the classical expectation-maximization algorithm to solve weakly supervised target localization from a novel clustering paradigm. The proposed weakly supervised image target localization system is applicable to all one-stage and two-stage localization models and significantly improves localization accuracy.
The invention will be described in detail hereinafter with reference to the drawings.
As shown in FIG. 1, the class reactivation map-based weakly supervised image target localization analysis system of the present invention includes a category context feature learning module, a class map reactivation module, and a class map calibration module. Its workflow is as follows:

First: the category context feature learning module extracts hierarchical image features, generates a spatial feature vector, and feeds it into the fully connected neural network classifier; the spatial feature vector is weighted and summed with the fully connected network weights to obtain the initial class map M_c, as in equation (1).

On this basis, the category context features V_c^fg and V_c^bg are defined, representing the common features of the foreground and background of the c-th category, respectively. From the initial class map, the foreground and background regions M̂^fg and M̂^bg are obtained, multiplied element-wise with the deep image features, and averaged to yield foreground-region and background-region features, which update the category context features, as in equations (4) and (5).
Second: the class map reactivation module reactivates the class map, raising the activation values of the foreground part and increasing the discrimination between foreground and background activation values, making target localization more accurate. Class map reactivation is first cast as a parameter estimation problem for a Gaussian mixture model: each image pixel x_ij obeys a probability mixture model composed of a foreground Gaussian distribution and a background Gaussian distribution.

The problem is solved by the expectation-maximization algorithm. Hidden variables Z^fg and Z^bg are introduced, representing the probability that the image pixel at position (i, j) belongs to the foreground and the background, respectively. The algorithm learns by alternately iterating the E and M steps. In the E step, the hidden variables are estimated with the model parameters held fixed; in the M step, the model parameters are updated with the hidden variables held fixed.

After T iterations the model converges. The resulting hidden variables Z^fg(T) and Z^bg(T), taken as the initial class reactivation map, can clearly distinguish foreground from background.
Third: the class reactivation map calibration module calibrates the class reactivation maps and fuses in the initial class map. The calibration process uses the initial class map as a foreground index and computes the average probabilities with which Z^fg(T) and Z^bg(T) belong to the foreground. The map with the higher average probability is selected as the calibrated class reactivation map, and the initial class map is fused in to further enhance the discrimination between foreground and background activation values, making foreground localization more accurate.
The public datasets CUB, ILSVRC and OpenImages were used for the experiments. The CUB dataset contains 200 species of birds; its training and test sets contain 5,994 and 5,794 pictures, respectively. ILSVRC contains 1.2 million training pictures and 50,000 validation pictures used for testing. Both CUB and ILSVRC provide object bounding boxes as labels, so the maximal box accuracy (MaxBoxAccV2) is adopted as the evaluation metric. The OpenImages dataset contains 100 classes of pictures; its training, validation and test sets contain 29,819, 2,500 and 5,000 pictures, respectively. OpenImages provides only pixel-level labels, so the threshold-independent pixel-level average precision (PxAP) is used for evaluation. The feature extractor of the category context feature learning module adopts VGG16, Inception-V3 or ResNet50 pre-trained on the ImageNet dataset; the input image is first scaled to 256 × 256 resolution and randomly cropped to 224 × 224. The total numbers of training epochs on CUB, ILSVRC and OpenImages were set to 50, 6 and 10, respectively. The initial learning rate was 0.001 and was reduced by a factor of 10 every 15, 2 and 3 epochs, respectively. The neural network classifier in the category context feature learning module is randomly initialized, with its learning rate set to 10 times the base learning rate. Model parameters were updated using stochastic gradient descent with the batch size set to 32. The comparison methods are the classical weakly supervised object localization models CAM, HaS, ACoL, SPG, ADL and CutMix. The experimental results on the three datasets CUB, ILSVRC and OpenImages are shown in Table 1, Table 2 and Table 3, respectively.
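As a reference for the PxAP metric mentioned above, a simplified threshold-free sketch is shown below; note the official metric averages pixel precision over an explicit grid of thresholds, so this ranking-based version is only an approximation:

```python
import numpy as np

def pixel_average_precision(score_map, gt_mask):
    """Approximate pixel-level AP: average precision of the score map's
    ranking of pixels against a binary ground-truth mask."""
    s = score_map.ravel()
    y = gt_mask.ravel().astype(bool)
    order = np.argsort(-s)             # rank pixels by descending score
    y = y[order]
    tp = np.cumsum(y)
    precision = tp / np.arange(1, len(y) + 1)
    # AP as precision averaged at each positive pixel
    return precision[y].mean() if y.any() else 0.0
```

A score map that ranks every foreground pixel above every background pixel reaches an AP of 1.0 regardless of the absolute activation values, which is exactly the threshold-independence that motivates PxAP.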
In the tables, HaS, ACoL, SPG, ADL, CutMix and CREAM (the present invention) are reported as differences relative to CAM (the class map baseline) to show their relative merit. On all three datasets the proposed method is clearly superior to the other methods and a clear improvement over the class map. On the CUB dataset in particular, the network with ResNet as the feature extractor improves over CAM by 10.5. The invention exhibits more accurate localization capability both in maximal box accuracy at box granularity and in pixel-level average precision at pixel granularity.
In summary, on the premise of using only weak image-level annotation, the present invention provides a novel class reactivation map-based weakly supervised image target localization analysis system addressing the confusion between foreground and background activation values of class maps. Through three modules, namely category context feature learning, an expectation-maximization algorithm that reactivates the initial class map, and class reactivation map calibration, it achieves accurate and complete localization of image targets, making it possible to perform image analysis with the large-scale coarse-grained annotation data available on the internet.
TABLE 1
TABLE 2
Method | VGG | Inception | ResNet | Average |
Center Gauss | 48.9 | 48.9 | 48.9 | 48.9 |
CAM | 60.0 | 63.7 | 63.7 | 62.4 |
HaS | +0.6 | +0.3 | -0.3 | +0.2 |
ACoL | -2.6 | +0.3 | -1.4 | -1.2 |
SPG | -0.1 | -0.1 | -0.4 | -0.2 |
ADL | -0.2 | -2.0 | +0.0 | -0.7 |
CutMix | -0.6 | +0.5 | -0.4 | -0.2 |
CREAM | +6.2 | +2.1 | +3.7 | +5.1 |
TABLE 3
Method | VGG | Inception | ResNet | Average |
---|---|---|---|---|
Center Gauss | 54.4 | 54.4 | 54.4 | 54.4 |
CAM | 58.3 | 63.2 | 58.5 | 60.0 |
HaS | -0.2 | -5.1 | -2.6 | -2.6 |
ACoL | -4.0 | -6.0 | -1.2 | -3.7 |
SPG | +0.0 | -0.9 | -1.8 | -0.9 |
ADL | +0.4 | -6.4 | -3.3 | -3.1 |
CutMix | -0.2 | -0.7 | -0.8 | -0.6 |
CREAM | +3.7 | +1.4 | +6.2 | +3.8 |
Claims (7)
1. A weakly supervised image target localization analysis system based on class reactivation maps, characterized by comprising a category context feature learning module, a class map reactivation module and a class reactivation map calibration module; the category context feature learning module extracts image features and generates an initial class map that serves as an index for learning category context features; the class map reactivation module receives the image features and the category context features, separates foreground from background through pixel-level clustering, generates a class reactivation map and feeds it to the class reactivation map calibration module; the calibration module locates coarse foreground and background regions according to the initial class map and guides the class reactivation map to calibrate its foreground and background activation values.
2. The weakly supervised image target localization analysis system of claim 1, wherein the category context feature learning module comprises an image feature extraction network and a fully connected neural network classifier; the image feature extraction network uses a VGG16, Inception-V3 or ResNet50 deep convolutional neural network to extract hierarchical features of the image, producing a spatial feature vector f of dimension h × w × 1024; the feature vector f is fed into the fully connected classifier, which performs a weighted summation of f with the fully connected weights w for class c to obtain the initial class map M_c of dimension h × w; the process is represented as:

M_c(i, j) = Σ_k w_k^c · f_k(i, j)

where f_k is the k-th component of the spatial feature vector f and w_k^c is the k-th component of the weight w corresponding to the c-th class; based on the class map, the classifier's final class prediction for the image is represented as:

p_c = exp(Σ_{i,j} M_c(i, j)) / Σ_{c'} exp(Σ_{i,j} M_{c'}(i, j))
where i, j denote a spatial position; from the class prediction above, solving weakly supervised target localization with a neural network classifier and a class map reduces to answering "which pixels contribute to the class prediction"; the class map is normalized to the interval [0, 1] and binarized with a threshold τ: each position whose value exceeds τ is regarded as foreground, otherwise as background.
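The weighted summation and thresholding of claim 2 can be sketched in numpy as follows (an illustrative sketch only: the function name, the default τ = 0.2 and the toy shapes are assumptions; in the actual system `fc_weights` comes from the trained fully connected classifier):

```python
import numpy as np

def class_activation_map(features, fc_weights, c, tau=0.2):
    """Build the initial class map M_c for class c (sketch of claim 2).

    features:   (h, w, k) spatial feature volume f from the backbone.
    fc_weights: (num_classes, k) fully connected classifier weights w.
    Returns the [0, 1]-normalized map and its foreground mask M_c > tau.
    """
    m_c = features @ fc_weights[c]                  # weighted sum over channels -> (h, w)
    m_c = (m_c - m_c.min()) / (m_c.max() - m_c.min() + 1e-8)  # normalize to [0, 1]
    return m_c, m_c > tau                           # map and binary foreground estimate
```

The small epsilon guards against a constant map; everything else follows the claim's description directly (channel-wise weighted sum, normalization, threshold τ).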
3. The weakly supervised image target localization analysis system of claim 2, wherein generating the initial class map as an index for learning category context features specifically comprises: for each class c, the foreground and background context feature vectors are denoted V_c^fg and V_c^bg respectively; both are d-dimensional feature vectors serving as the cluster centers of the class, summarizing its common foreground and background features; first, the initial class map is binarized:

M̂_c = 1(M_c > δ)

where δ is a threshold and 1(·) is the indicator function; M̂_c and 1 − M̂_c serve as rough estimates of the foreground and background; for each sample with deep features F, these estimates select the foreground and background features respectively, and the context feature vectors are updated with their means; the specific process is:
f̄_fg = Σ_{i,j} M̂_c(i, j) · F_ij / ‖M̂_c‖_0,   f̄_bg = Σ_{i,j} (1 − M̂_c(i, j)) · F_ij / ‖1 − M̂_c‖_0

where F_ij denotes the value of the feature F at spatial location (i, j) and ‖·‖_0 counts the number of nonzero values; the foreground and background context feature vectors are then updated with momentum, with momentum parameter λ:

V_c^fg ← λ · V_c^fg + (1 − λ) · f̄_fg,   V_c^bg ← λ · V_c^bg + (1 − λ) · f̄_bg

The momentum update ensures that the context features change slowly, preserving more historical information.
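The binarize-then-average-then-momentum procedure of claim 3 can be sketched as follows (illustrative only: the function name, single-class handling and the default λ = 0.9 are assumptions; the claim itself does not fix λ):

```python
import numpy as np

def update_context_features(F, fg_mask, v_fg, v_bg, lam=0.9):
    """Momentum update of the class context feature vectors (claim 3 sketch).

    F:       (h, w, d) deep features of one sample.
    fg_mask: (h, w) boolean rough foreground estimate from the binarized map.
    v_fg, v_bg: current d-dim context vectors; lam is the momentum λ.
    """
    bg_mask = ~fg_mask
    # mean feature over the rough foreground / background regions
    mean_fg = F[fg_mask].mean(axis=0) if fg_mask.any() else v_fg
    mean_bg = F[bg_mask].mean(axis=0) if bg_mask.any() else v_bg
    # slow momentum update keeps historical information (λ close to 1)
    v_fg = lam * v_fg + (1 - lam) * mean_fg
    v_bg = lam * v_bg + (1 - lam) * mean_bg
    return v_fg, v_bg
```

With λ = 0 the vectors jump straight to the current means; with λ = 1 they never move, which is why the claim describes momentum as preserving history.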
4. The weakly supervised image target localization analysis system of claim 3, wherein the class map reactivation module reactivates the class map to raise the activation values of the foreground part and increase their discriminability, making target localization more accurate; the module formulates reactivation as a parameter estimation problem for a Gaussian-mixture-style model and solves it with the expectation-maximization algorithm, which extends maximum likelihood estimation to probabilistic models containing hidden variables; specifically:
For each sample x, the goal is to maximize the likelihood:

max_θ Π_{i,j} p(x_ij | θ)

where θ = {V_fg, V_bg, a_fg, a_bg} are the model parameters (fg and bg denote foreground and background respectively); each image pixel x_ij obeys a probabilistic mixture model composed of a foreground and a background distribution:

p(x_ij) = a_fg · p_fg(x_ij) + a_bg · p_bg(x_ij)

where the mixing weights a_fg and a_bg are real numbers in [0, 1] satisfying a_fg + a_bg = 1; the foreground and background base models p_fg and p_bg measure the agreement between the image features and the learned category context feature vectors, taking cosine similarity as the measure:

p_fg(x_ij) ∝ exp(cos(F_ij, V_fg) / σ),   p_bg(x_ij) ∝ exp(cos(F_ij, V_bg) / σ)

where σ is a hyper-parameter controlling the smoothness of the distributions.
5. The weakly supervised image target localization analysis system of claim 4, wherein the problem is solved using an expectation maximization algorithm, in particular:
Hidden variables Z_fg and Z_bg are defined, representing the probabilities that the image pixel at position (i, j) belongs to the foreground and to the background, respectively;
E step: the expectation-maximization algorithm first assigns an initial distribution to each component by assuming distribution parameters V_fg and V_bg, and computes the expectation of the hidden variable of each datum under these parameters;

M step: the maximum likelihood estimate of the distribution parameters is then computed from the soft assignment, and the hidden-variable expectations are recomputed from it; the loop repeats until convergence.

In the E step, the parameters of the current model are used to compute the posterior distribution of the hidden variables Z_fg and Z_bg; in the t-th iteration (1 ≤ t ≤ T), with the model parameters held fixed, the hidden variables are computed as:

Z_fg(i, j) = a_fg · p_fg(x_ij) / (a_fg · p_fg(x_ij) + a_bg · p_bg(x_ij)),   Z_bg(i, j) = 1 − Z_fg(i, j)
From a clustering viewpoint, the formula above computes the similarity between the pixel-wise features and the foreground and background context feature vectors and assigns each pixel a soft label, namely its probability of belonging to the foreground or the background;
In the M step, the goal is to adjust the context feature vectors to match the current image features by maximizing the expected likelihood of the image features under the computed hidden-variable values; the parameter update is:

V_fg = Σ_{i,j} Z_fg(i, j) · F_ij / Σ_{i,j} Z_fg(i, j),   V_bg = Σ_{i,j} Z_bg(i, j) · F_ij / Σ_{i,j} Z_bg(i, j)

a_fg = Σ_{i,j} Z_fg(i, j) / (h · w),   a_bg = 1 − a_fg

that is, V_fg and V_bg are updated by a weighted average of the features, and a_fg and a_bg by the effective pixel counts;
the E step and the M step are alternately carried out until convergence.
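The alternating E and M steps above can be sketched as follows (a minimal numpy illustration assuming the cosine-similarity base models of claim 4; the function name, default σ and iteration count are assumptions, not values fixed by the claims):

```python
import numpy as np

def em_reactivate(F, v_fg, v_bg, sigma=0.1, iters=10):
    """EM sketch for class-map reactivation (claims 4-5).

    F: (h, w, d) deep features; v_fg, v_bg: d-dim context vectors.
    Returns hidden-variable maps Z_fg, Z_bg of shape (h, w).
    """
    h, w, d = F.shape
    X = F.reshape(-1, d)
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-8)
    a_fg = a_bg = 0.5                      # initial mixing weights
    for _ in range(iters):
        # E step: posterior responsibility via cosine-similarity densities
        p_fg = np.exp(Xn @ (v_fg / (np.linalg.norm(v_fg) + 1e-8)) / sigma)
        p_bg = np.exp(Xn @ (v_bg / (np.linalg.norm(v_bg) + 1e-8)) / sigma)
        z_fg = a_fg * p_fg / (a_fg * p_fg + a_bg * p_bg)
        z_bg = 1.0 - z_fg
        # M step: weighted-mean update of context vectors and mixing weights
        v_fg = (z_fg[:, None] * X).sum(0) / (z_fg.sum() + 1e-8)
        v_bg = (z_bg[:, None] * X).sum(0) / (z_bg.sum() + 1e-8)
        a_fg = z_fg.mean()
        a_bg = 1.0 - a_fg
    return z_fg.reshape(h, w), z_bg.reshape(h, w)
```

By construction Z_fg + Z_bg = 1 at every pixel, matching the two-component mixture of claim 4; a fixed iteration count stands in for the convergence test of claim 5.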
6. The system of claim 5, wherein the class reactivation map calibration module calibrates the class reactivation map; because the expectation-maximization algorithm in the class map reactivation module yields the hidden variables Z_fg^(T) and Z_bg^(T) as the initial reactivation maps, it cannot be guaranteed that the foreground activation values exceed the background activation values, so calibration is required: Z_fg^(T) is used as the class reactivation map if and only if its foreground activation is higher than the background activation; otherwise Z_bg^(T) is selected. Specifically, the initial class activation map serves as a guide: from the binarized initial class map, the coarse foreground part M̂_c and background part 1 − M̂_c are obtained, and the average probabilities of Z_fg^(T) and Z_bg^(T) over the coarse foreground,

P_fg = Σ_{i,j} M̂_c(i, j) · Z_fg^(T)(i, j) / ‖M̂_c‖_0,   P_bg = Σ_{i,j} M̂_c(i, j) · Z_bg^(T)(i, j) / ‖M̂_c‖_0,

are compared; the reactivation map with the higher average foreground probability is the one belonging to the foreground, so the calibrated foreground reactivation map M_re is computed as:

M_re = Z_fg^(T) if P_fg ≥ P_bg, otherwise M_re = Z_bg^(T)
To further separate the activation values of the foreground and background parts, the class reactivation map calibration module fuses the class reactivation map with the initial class map; after the resulting final class reactivation map is normalized and thresholded, a mask or bounding box is generated to mark the localized image target.
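The calibration rule of claim 6, keeping whichever hidden-variable map has the higher average probability over the coarse foreground of the initial class map, can be sketched as follows (names are illustrative; the subsequent fusion with the initial class map is omitted because its exact formula is not reproduced in this text):

```python
import numpy as np

def calibrate(z_fg, z_bg, init_fg_mask):
    """Pick the reactivation map that actually activates the foreground.

    z_fg, z_bg:   (h, w) hidden-variable maps from the EM step.
    init_fg_mask: (h, w) boolean coarse foreground of the initial class map.
    Returns the calibrated foreground reactivation map M_re.
    """
    # average probability of each candidate over the coarse foreground region
    if z_fg[init_fg_mask].mean() >= z_bg[init_fg_mask].mean():
        return z_fg
    return z_bg
```

This implements the comparison P_fg ≥ P_bg from the claim: if EM happened to label the object pixels as "background", the roles of the two maps are simply swapped.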
7. The weakly supervised image target localization analysis system of claim 6, wherein the workflow is:
First, the deep convolutional neural network model in the category context feature learning module is trained to extract a deep feature representation of the image; the fully connected classifier performs a weighted summation over these deep features to obtain the initial class map; based on the initial class map and the image features, the average feature vectors of the foreground and the background are computed and the category context features are updated with momentum;

Second, the deep image features and the category context features are fed to the class map reactivation module; according to the similarity between the deep features and the category context features, the expectation-maximization algorithm clusters each pixel feature into foreground or background, and the foreground and background hidden variables are taken as class reactivation maps;

Third, the foreground and background class reactivation maps are fed to the class reactivation map calibration module; according to the coarse foreground and background localization given by the initial class map, the foreground and background of the reactivation maps are distinguished and the class reactivation map is calibrated; finally, the initial class map and the class reactivation map are fused to obtain the final image target localization result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210864306.6A CN115311449A (en) | 2022-07-20 | 2022-07-20 | Weak supervision image target positioning analysis system based on class reactivation mapping chart |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115311449A true CN115311449A (en) | 2022-11-08 |
Family
ID=83856260
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115311449A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115908296A (en) * | 2022-11-10 | 2023-04-04 | 深圳大学 | Medical image class activation mapping evaluation method and device, computer equipment and storage medium |
CN115908296B (en) * | 2022-11-10 | 2023-09-22 | 深圳大学 | Medical image class activation mapping evaluation method, device, computer equipment and storage medium |
CN116563953A (en) * | 2023-07-07 | 2023-08-08 | 中国科学技术大学 | Bottom-up weak supervision time sequence action detection method, system, equipment and medium |
CN116563953B (en) * | 2023-07-07 | 2023-10-20 | 中国科学技术大学 | Bottom-up weak supervision time sequence action detection method, system, equipment and medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113378632B (en) | Pseudo-label optimization-based unsupervised domain adaptive pedestrian re-identification method | |
CN109190524B | Human body action recognition method based on generative adversarial networks | |
CN107633226B (en) | Human body motion tracking feature processing method | |
CN110738247B (en) | Fine-grained image classification method based on selective sparse sampling | |
Zhao et al. | Closely coupled object detection and segmentation | |
Zhou et al. | Salient object detection via fuzzy theory and object-level enhancement | |
CN115311449A (en) | Weak supervision image target positioning analysis system based on class reactivation mapping chart | |
CN110909618B (en) | Method and device for identifying identity of pet | |
CN114021799A (en) | Day-ahead wind power prediction method and system for wind power plant | |
CN111027493A (en) | Pedestrian detection method based on deep learning multi-network soft fusion | |
CN112052802B (en) | Machine vision-based front vehicle behavior recognition method | |
CN111461039B (en) | Landmark identification method based on multi-scale feature fusion | |
CN112668579A (en) | Weak supervision semantic segmentation method based on self-adaptive affinity and class distribution | |
CN113408605A (en) | Hyperspectral image semi-supervised classification method based on small sample learning | |
CN110716792B (en) | Target detector and construction method and application thereof | |
CN110363165B (en) | Multi-target tracking method and device based on TSK fuzzy system and storage medium | |
CN113362341B (en) | Air-ground infrared target tracking data set labeling method based on super-pixel structure constraint | |
Chen et al. | A semisupervised context-sensitive change detection technique via gaussian process | |
CN114139631B (en) | Multi-target training object-oriented selectable gray box countermeasure sample generation method | |
CN114201632B (en) | Label noisy data set amplification method for multi-label target detection task | |
CN115393631A (en) | Hyperspectral image classification method based on Bayesian layer graph convolution neural network | |
CN111815640A (en) | Memristor-based RBF neural network medical image segmentation algorithm | |
CN114549909A (en) | Pseudo label remote sensing image scene classification method based on self-adaptive threshold | |
CN112270285B (en) | SAR image change detection method based on sparse representation and capsule network | |
Han et al. | Accurate and robust vanishing point detection method in unstructured road scenes |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||