CN112182274B - Labeling method and system for multi-label social network image - Google Patents


Info

Publication number
CN112182274B
Authority
CN
China
Prior art keywords
matrix
image
tag
label
noise
Legal status
Active
Application number
CN202011045407.8A
Other languages
Chinese (zh)
Other versions
CN112182274A (en)
Inventor
李泽超
练连荣
Current Assignee
Nanjing Haoxiang Basic Software Research Institute Co ltd
Nanjing University of Science and Technology
Original Assignee
Nanjing Haoxiang Basic Software Research Institute Co ltd
Nanjing University of Science and Technology
Application filed by Nanjing Haoxiang Basic Software Research Institute Co ltd and Nanjing University of Science and Technology
Priority to CN202011045407.8A
Publication of CN112182274A
Application granted
Publication of CN112182274B
Current legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/5866 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a labeling method and system for multi-label social network images, in the technical field of social network image labeling. The method comprises: acquiring an image dataset; obtaining a first tag matrix from the image dataset; denoising the first tag matrix using the Cauchy distribution to obtain a second tag matrix; inputting the image dataset into a Resnet50 residual network to obtain an image feature matrix; inputting the second tag matrix and the image feature matrix into a CNN for training and optimization to obtain an optimized CNN; determining the feature vector of an image to be labeled; and inputting that feature vector into the optimized CNN to obtain the labels of the image. The method and system can accurately label weakly supervised social network images.

Description

Labeling method and system for multi-label social network image
Technical Field
The invention relates to the technical field of social network image labeling, and in particular to a labeling method and system for multi-label social network images.
Background
With the rapid growth of social networks in recent years, massive numbers of social network images are shared and browsed by users. The sheer volume of image data makes accurate retrieval difficult, so effective image retrieval techniques are urgently needed. Tag-based image retrieval works by establishing semantic relationships between images and tags, which makes image annotation central to retrieval. User-provided tags describe the visual content to some extent but are inaccurate: possibly only about half of them describe the visual content of the image. In practice, social network tags are often incomplete and inaccurate, and a large share of images carry no tags at all (for example, more than 50% of the pictures in the MIRFlickr dataset have no tags), so user-provided tags offer only weak supervision. This makes related multimedia tasks harder, so it is necessary to improve the tag quality of social images by learning the inherent links between their visual information and tag semantics.
Previous work has proposed various solutions to the re-labeling (multi-label annotation) of social network images. One approach uses matrix factorization to minimize noise and thereby learns the internal image-tag relationship. Another approach enforces a low-rank constraint on the re-labeled tag matrix, which accounts for both the consistency of image visual features and the correlation between tags. A third approach builds on a low-rank non-negative model and introduces two latent factor matrices to split the optimization function. This better reduces the difference between the re-labeled tags and the observed tags and yields a better image-tag relation model. This line of work verified the effectiveness of low-rank non-negative models for social image re-labeling.
Previous work generally adopts a squared loss function as the objective, chosen for model generalization. This implicitly assumes that the noise in social tags is Gaussian. Owing to the central limit theorem, the Gaussian probability density function is widely used in signal processing, image analysis and related fields, and it fits the most common white noise well. In reality, however, the true probability distribution of data noise is unknown, and many types of noise from diverse sources may be present. Suppose the noise is the sum of many independent random variables with different distributions. The central limit theorem then says its distribution approaches a Gaussian as the number of noise sources grows. The Gaussian assumption handles small noise effectively, but it is overly sensitive to large noise, and social image tags contain large noise.
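The sensitivity argument can be made concrete by comparing how strongly each loss reacts to a residual r. The squared loss (Gaussian assumption) has gradient 2r, which grows without bound. The Cauchy loss log(1 + (r/γ)²) has gradient 2r/(γ² + r²), which is bounded. This is a minimal illustration only. The scale `gamma` and the function names are hypothetical and do not come from the source.

```python
import numpy as np

def squared_loss_grad(r: np.ndarray) -> np.ndarray:
    """Gradient of r**2 (Gaussian noise assumption): grows linearly with the residual."""
    return 2.0 * r

def cauchy_loss(r: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    """log(1 + (r/gamma)**2): Cauchy negative log-likelihood up to an additive constant.
    gamma is a hypothetical scale parameter."""
    return np.log1p((r / gamma) ** 2)

def cauchy_loss_grad(r: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    """Gradient 2r / (gamma**2 + r**2): bounded, largest at |r| = gamma, then decays."""
    return 2.0 * r / (gamma ** 2 + r ** 2)

# Small, medium and large residuals (e.g. a badly wrong tag).
residuals = np.array([0.1, 1.0, 10.0])
print(squared_loss_grad(residuals))
print(cauchy_loss_grad(residuals))
```

With γ = 1, the squared-loss gradient reaches 20 at r = 10. The Cauchy gradient there is only about 0.198, the same as at r = 0.1, and it peaks at 1 when r = γ. A single grossly wrong tag therefore cannot dominate the fit.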
In image annotation, image features matter greatly for model training, and features with deeper semantics can markedly improve training. Conventional hand-crafted features (e.g., GIST, SIFT, HOG) are used in many tasks, but they lose deep semantic information.
Currently, multi-label annotation of social images is approached mainly from two directions: tag correlation and image-tag relationships. Tag correlation is mainly studied by factorizing the tag matrix. An advanced example is Li Z, Tang J. Weakly-supervised deep nonnegative low-rank model for social image tag refinement and assignment (the WDNL model). However, this method does not analyze tag noise, even though the tag noise distribution in real data is unknown. Image-tag relationships are mostly studied with deep learning. Existing models include DMF, MPMF, LSCCA, TCCA, DNMF, WDMF, WDNL and DCE, but none of them can accurately label weakly supervised social network images.
Disclosure of Invention
The invention aims to provide a labeling method and system for multi-label social network images that accurately label weakly supervised social network images.
To achieve the above object, the invention provides the following solution:
A method for labeling multi-label social network images, comprising:
acquiring an image dataset, the image dataset comprising a plurality of images and the tags corresponding to each image;
obtaining a first tag matrix from the image dataset;
denoising the first tag matrix using the Cauchy distribution to obtain a second tag matrix;
inputting the image dataset into a Resnet50 residual network to obtain an image feature matrix, the image feature matrix comprising a 2048-dimensional feature vector for each image in the image dataset;
inputting the second tag matrix and the image feature matrix into a CNN for training and optimization to obtain an optimized CNN;
determining a feature vector to be labeled, namely the 2048-dimensional feature vector of the image to be labeled; and
inputting the feature vector to be labeled into the optimized CNN to obtain the labels of the image to be labeled.
Optionally, the first tag matrix is obtained from the image dataset by the formula
$$F \in \{0,1\}^{n\times m},\qquad F_{ij} = \begin{cases}1, & \text{if image } i \text{ has tag } j,\\ 0, & \text{otherwise,}\end{cases}$$
where F is the first tag matrix and $F_{ij}$ indicates whether image i (of the n images) has tag j (of the m tags).
Optionally, denoising the first tag matrix using the Cauchy distribution to obtain the second tag matrix specifically comprises:
assuming that the noise in the first tag matrix follows a Cauchy distribution, and using the Cauchy distribution to obtain the distribution of each noise entry in the first tag matrix; and
denoising the first tag matrix according to the distribution of each noise entry to obtain the second tag matrix.
Optionally, inputting the second tag matrix and the image feature matrix into the CNN for training and optimization to obtain the optimized CNN specifically comprises:
inputting the second tag matrix and the image feature matrix into the CNN (convolutional neural network) to obtain the tag matrix actually output by the CNN;
determining whether the error between the tag matrix actually output by the CNN and the second tag matrix is smaller than a set value;
if so, outputting the optimized CNN;
if not, denoising the tag matrix actually output by the CNN using the Cauchy distribution to obtain a third tag matrix; and
updating the second tag matrix with the third tag matrix, and returning to the step of inputting the second tag matrix and the image feature matrix into the CNN to obtain the tag matrix actually output by the CNN.
Optionally, determining the feature vector to be labeled specifically comprises:
inputting the image to be labeled into the Resnet50 residual network to obtain the feature vector to be labeled, namely the 2048-dimensional feature vector of the image to be labeled.
The invention further provides the following solution:
A labeling system for multi-label social network images, comprising:
an image dataset acquisition module, configured to acquire an image dataset comprising a plurality of images and the tags corresponding to each image;
a first tag matrix acquisition module, configured to obtain a first tag matrix from the image dataset;
a second tag matrix acquisition module, configured to denoise the first tag matrix using the Cauchy distribution to obtain a second tag matrix;
an image feature matrix acquisition module, configured to input the image dataset into a Resnet50 residual network to obtain an image feature matrix comprising a 2048-dimensional feature vector for each image in the image dataset;
a training and optimization module, configured to input the second tag matrix and the image feature matrix into a CNN for training and optimization to obtain an optimized CNN;
a to-be-labeled feature vector determination module, configured to determine the feature vector to be labeled, namely the 2048-dimensional feature vector of the image to be labeled; and
an image label acquisition module, configured to input the feature vector to be labeled into the optimized CNN to obtain the labels of the image to be labeled.
Optionally, the first tag matrix acquisition module uses the formula
$$F \in \{0,1\}^{n\times m},\qquad F_{ij} = \begin{cases}1, & \text{if image } i \text{ has tag } j,\\ 0, & \text{otherwise,}\end{cases}$$
where F is the first tag matrix and $F_{ij}$ indicates whether image i (of the n images) has tag j (of the m tags).
Optionally, the second tag matrix acquisition module specifically comprises:
a noise distribution acquisition unit, configured to assume that the noise in the first tag matrix follows a Cauchy distribution and to use the Cauchy distribution to obtain the distribution of each noise entry in the first tag matrix; and
a second tag matrix acquisition unit, configured to denoise the first tag matrix according to the distribution of each noise entry to obtain the second tag matrix.
Optionally, the training and optimization module specifically comprises:
an actual output tag matrix acquisition module, configured to input the second tag matrix and the image feature matrix into the CNN to obtain the tag matrix actually output by the CNN;
a judging module, configured to determine whether the error between the tag matrix actually output by the CNN and the second tag matrix is smaller than a set value;
an output module, configured to output the optimized CNN when the judging module determines that the error is smaller than the set value;
a third tag matrix acquisition module, configured to denoise the tag matrix actually output by the CNN using the Cauchy distribution to obtain a third tag matrix when the judging module determines that the error is greater than or equal to the set value; and
a loop module, configured to update the second tag matrix with the third tag matrix and return to the actual output tag matrix acquisition module.
Optionally, the to-be-labeled feature vector determination module specifically comprises:
a to-be-labeled feature vector determination unit, configured to input the image to be labeled into the Resnet50 residual network to obtain the feature vector to be labeled, namely the 2048-dimensional feature vector of the image to be labeled.
The specific embodiments provided herein achieve the following technical effects:
For social images with large noise, the labeling method and system fit tag noise with the Cauchy distribution, which is more robust to diverse noise than the Gaussian. The heavy tails of the Cauchy distribution model large noise effectively. Because it is smooth at its peak, it also handles dense noise well, so it is well suited to modeling social image noise, and denoising the tag matrix with it yields a more accurate tag matrix. Meanwhile, Resnet50 extracts deep visual features of the images, and a three-layer CNN is trained on the relationship between these features and the more accurate tag matrix. By learning this image-tag relationship, the method refines the tags of social network images, completing missing tags and deleting wrong ones. It can also label new images, and thereby accurately labels weakly supervised social network images.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of an embodiment of a method for labeling a multi-labeled social network image of the present invention;
FIG. 2 is a schematic diagram of a social network image annotation and re-annotation model (CDNL) based on a noise Cauchy distribution of the present invention;
FIG. 3 is a block diagram of an embodiment of a tagging system for multi-tag social networking images of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention aims to provide a labeling method and system for multi-label social network images that accurately label weakly supervised social network images.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
FIG. 1 is a flowchart of an embodiment of a method for labeling a multi-labeled social network image of the present invention. Referring to FIG. 1, the labeling method of the multi-label social network image includes:
Step 101: Acquire an image dataset comprising a plurality of images and the tags corresponding to each image.
Step 102: Obtain a first tag matrix from the image dataset.
Step 102 uses the formula
$$F \in \{0,1\}^{n\times m},\qquad F_{ij} = \begin{cases}1, & \text{if image } i \text{ has tag } j,\\ 0, & \text{otherwise,}\end{cases}$$
where F is the first tag matrix and $F_{ij}$ indicates whether image i (of the n images) has tag j (of the m tags).
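Step 102 can be sketched as follows. The sketch assumes that each image's tags arrive as a list of strings and that tags outside a fixed tag vocabulary are ignored. Both points are assumptions, and `build_tag_matrix` is a hypothetical helper, not part of the source.

```python
import numpy as np
from typing import Sequence

def build_tag_matrix(image_tags: Sequence[Sequence[str]], vocabulary: Sequence[str]) -> np.ndarray:
    """Binary n x m matrix F with F[i, j] = 1 iff image i carries vocabulary tag j."""
    index = {tag: j for j, tag in enumerate(vocabulary)}  # tag string -> column j
    F = np.zeros((len(image_tags), len(vocabulary)), dtype=np.float32)
    for i, tags in enumerate(image_tags):
        for tag in tags:
            j = index.get(tag)
            if j is not None:  # assumption: out-of-vocabulary tags are dropped
                F[i, j] = 1.0
    return F

# Three images, three-tag vocabulary; the second image is untagged.
F = build_tag_matrix([["sky", "beach"], [], ["dog", "cat"]], ["sky", "dog", "beach"])
print(F)
```

Untagged images simply become all-zero rows. This is the weak-supervision case described in the background, where many images have no tags at all.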
Step 103: Denoise the first tag matrix using the Cauchy distribution to obtain a second tag matrix.
Step 103 specifically comprises:
assuming that the noise in the first tag matrix follows a Cauchy distribution, and using the Cauchy distribution to obtain the distribution of each noise entry in the first tag matrix; and
denoising the first tag matrix according to the distribution of each noise entry to obtain the second tag matrix.
Step 104: Input the image dataset into a Resnet50 residual network to obtain an image feature matrix comprising a 2048-dimensional feature vector for each image in the dataset.
Step 104 is the function of the image-tag training module. To capture deeper visual content, this embodiment uses a pre-trained Resnet50 residual network to extract 2048-dimensional visual features. The goal is to bring the predicted tags closer to the ideal tags, i.e., the denoised, accurate tags of the images. To this end, the deep visual features X (the 2048-dimensional features extracted by Resnet50) and the first tag matrix (the observed tag matrix F) are input into the CNN to pre-train the network. X is an n × 2048 matrix, where n is the number of images. The observed tag matrix F comes with the image dataset, in which each picture has a number of tags. F is an n × m matrix, where m is the number of tags.
Step 105: Input the second tag matrix and the image feature matrix into a CNN for training and optimization to obtain an optimized CNN.
Step 105 specifically comprises:
inputting the second tag matrix and the image feature matrix into the CNN to obtain the tag matrix actually output by the CNN;
determining whether the error between the tag matrix actually output by the CNN and the second tag matrix is smaller than a set value, the set value being $10^{-5}$;
if so, outputting the optimized CNN;
if not, denoising the tag matrix actually output by the CNN using the Cauchy distribution to obtain a third tag matrix; and
updating the second tag matrix with the third tag matrix, and returning to the step of inputting the second tag matrix and the image feature matrix into the CNN to obtain the tag matrix actually output by the CNN.
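The control flow of step 105 can be sketched as an alternating loop. The training step and the Cauchy denoising step are passed in as hypothetical callables (`fit_and_predict`, `cauchy_denoise`) because their internals are not specified at this point. The error measure, mean squared difference against the $10^{-5}$ set value, is an assumption, since the source does not say which error is used.

```python
from typing import Callable, Tuple
import numpy as np

def alternate_train(
    X: np.ndarray,
    Y: np.ndarray,
    fit_and_predict: Callable[[np.ndarray, np.ndarray], np.ndarray],
    cauchy_denoise: Callable[[np.ndarray], np.ndarray],
    tol: float = 1e-5,
    max_iter: int = 100,
) -> Tuple[np.ndarray, int]:
    """Alternate network training and Cauchy denoising until the output matches the target tags.

    fit_and_predict(X, Y): hypothetical hook that trains on features X and target tags Y
        and returns the tag matrix the network actually outputs.
    cauchy_denoise(P): hypothetical hook that denoises the network output.
    Returns the final target tag matrix and the iteration at which the loop stopped.
    """
    for iteration in range(1, max_iter + 1):
        P = fit_and_predict(X, Y)             # tag matrix actually output by the CNN
        error = float(np.mean((P - Y) ** 2))  # assumed error measure
        if error < tol:                       # converged: the current network is the optimized one
            return Y, iteration
        Y = cauchy_denoise(P)                 # third tag matrix replaces the second
    return Y, max_iter
```

As a toy check, a "network" that halves its targets, combined with an identity "denoiser", gives an error of 0.25^k at iteration k. That error first drops below $10^{-5}$ at k = 9. The `max_iter` cap is an added safeguard, not part of the described method.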
In step 105, a three-layer CNN learns the mapping between the extracted visual features and high-level semantics, using the observed tags. Because the task is multi-label classification, the Sigmoid function is used as the activation function and binary cross-entropy (BCE) as the loss function. In addition, to speed up training, the model is pre-trained with the feature matrix and the observed tag matrix.
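The Sigmoid/BCE choice can be sketched as a forward pass. Because the input is already a 2048-dimensional vector per image and the source gives no convolution details, this sketch uses three dense layers as a stand-in for the three-layer CNN. The hidden sizes (1024, 512), the ReLU hidden activation and the initialization are all assumptions, and the function names are hypothetical. Only the Sigmoid outputs and the BCE loss follow the text.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    """Element-wise logistic function: an independent probability per tag."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(P: np.ndarray, Y: np.ndarray, eps: float = 1e-7) -> float:
    """Mean BCE over all (image, tag) entries; P are probabilities, Y targets in [0, 1]."""
    P = np.clip(P, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(Y * np.log(P) + (1.0 - Y) * np.log(1.0 - P)))

def init_params(sizes, seed=0):
    """He-style random weights for a dense stack, e.g. sizes = (2048, 1024, 512, m)."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, np.sqrt(2.0 / d_in), size=(d_in, d_out)), np.zeros(d_out))
            for d_in, d_out in zip(sizes[:-1], sizes[1:])]

def forward(X: np.ndarray, params) -> np.ndarray:
    """Three layers: ReLU on hidden layers (assumed), Sigmoid on the output layer."""
    H = X
    for k, (W, b) in enumerate(params):
        Z = H @ W + b
        H = sigmoid(Z) if k == len(params) - 1 else np.maximum(Z, 0.0)
    return H  # n x m matrix of per-tag probabilities

X = np.random.default_rng(1).normal(size=(4, 2048))  # stand-in for Resnet50 features
P = forward(X, init_params((2048, 1024, 512, 5)))
```

Sigmoid, unlike softmax, scores each tag independently, so one image can receive several tags. That is why it pairs with a per-entry binary cross-entropy.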
Step 106: Determine the feature vector to be labeled, namely the 2048-dimensional feature vector of the image to be labeled.
Step 106 specifically comprises:
inputting the image to be labeled into the Resnet50 residual network to obtain the feature vector to be labeled, namely its 2048-dimensional feature vector.
Step 107: Input the feature vector to be labeled into the optimized CNN to obtain the labels of the image to be labeled.
Step 107 is the function of the tag labeling/refinement module. The CNN converges when the error between the tag matrix actually output by the CNN and the second tag matrix is smaller than $10^{-5}$. Once it converges, the trained image-tag model (the optimized CNN) is used to re-annotate the images of the current dataset and to provide tags for new images. The labels obtained in step 107 are the tags predicted by the optimized CNN.
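The source does not say how the CNN's per-tag outputs become a final tag list. A common choice is a probability threshold, sketched below. The 0.5 threshold and the helper name `scores_to_tags` are hypothetical.

```python
import numpy as np
from typing import List, Sequence

def scores_to_tags(scores: np.ndarray, vocabulary: Sequence[str], threshold: float = 0.5) -> List[List[str]]:
    """For each image (row), return the tags whose predicted probability is >= threshold, highest first."""
    result = []
    for row in scores:
        order = np.argsort(-row)  # tag indices sorted by descending score
        result.append([vocabulary[j] for j in order if row[j] >= threshold])
    return result

predicted = np.array([[0.9, 0.2, 0.6],
                      [0.1, 0.3, 0.4]])
tags = scores_to_tags(predicted, ["sky", "dog", "beach"])
```

Here the first image receives "sky" and "beach", and the second receives no tag. A top-k rule per image would be an equally plausible alternative.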
In the method steps of this embodiment, step 104 is the function of the image-tag training module and step 107 is the function of the tag labeling/refinement module. The remaining steps are functions of the optimization function module. FIG. 2 is a schematic diagram of the social network image annotation and re-annotation model (CDNL) based on the Cauchy distribution of noise. Referring to FIG. 2, this embodiment provides a multi-label social image refinement and annotation method based on the Cauchy distribution of noise. Overall, the method uses a low-rank non-negative matrix model to learn the social image-tag relationship. To better handle the noise distribution of weakly supervised tags, it fits the noise with the Cauchy distribution, which is more robust to diverse noise. This gives a low-rank non-negative model based on the Cauchy distribution of noise (CDNL). The method also uses Resnet50 to extract deep visual features of the images, and a three-layer CNN to learn the relationship between these visual features and tag semantics. Cauchy models with different scales are built to simulate the tag noise, and the one that best fits the tag noise is selected. This better reduces the difference between the ideal tag matrix (the predicted tag matrix finally output by the CNN) and the observed tag matrix (the original tag matrix initially input into the CNN), improving the CDNL model. By learning the image-tag relationship, the CDNL model refines the tags of social network images, completing missing tags and deleting wrong ones. It can also label new images.
The method thus builds a social network image annotation and re-annotation model based on the Cauchy distribution of noise. Its main body consists of three modules: the image-tag training module, the optimization function module, and the tag labeling/refinement module. Better results come from three choices: fitting the noise distribution with the Cauchy distribution, extracting image features with the Resnet50 residual network, and training the image-tag relationship with a pre-trained CNN.
The optimization function module works as follows:
Low-rank non-negative model based on the Cauchy distribution of noise (CDNL): building on the low-rank non-negative model, tag noise is modeled with the Cauchy distribution. Because the noise is sparse, the noise matrix is optimized with the $\ell_1$ norm. The modeling process is as follows:
Let A be a matrix (all methods and derivations in this embodiment are matrix-based), with $A_{ij}$ its (i, j)-th element. The nuclear norm of A is $\|A\|_*$, the sum of its singular values. The $\ell_1$ norm of A is $\|A\|_1 = \sum_{i,j}|A_{ij}|$, and the Frobenius norm of A is $\|A\|_F = \sqrt{\sum_{i,j}A_{ij}^2}$.
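The three norms defined above can be computed directly with NumPy. This is a minimal sketch, and the function names are illustrative only.

```python
import numpy as np

def nuclear_norm(A: np.ndarray) -> float:
    """||A||_*: sum of the singular values of A."""
    return float(np.sum(np.linalg.svd(A, compute_uv=False)))

def l1_norm(A: np.ndarray) -> float:
    """||A||_1: sum of the absolute values of all entries."""
    return float(np.sum(np.abs(A)))

def frobenius_norm(A: np.ndarray) -> float:
    """||A||_F: square root of the sum of squared entries."""
    return float(np.sqrt(np.sum(A ** 2)))

A = np.array([[3.0, 0.0],
              [0.0, -4.0]])
```

For this diagonal A the singular values are 4 and 3. So $\|A\|_* = 7$, $\|A\|_1 = 7$ and $\|A\|_F = 5$. The first and last agree with `np.linalg.norm(A, 'nuc')` and `np.linalg.norm(A, 'fro')`.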
In the image labeling problem, the image dataset contains n images and m user tags. The tags of each image are known from the dataset, which gives the original tag matrix F. Each image has several tags, forming the binary matrix F. For example, a dataset may contain 2000 pictures and 400 tags, and no picture has all of them. For image i and tag j, $F_{ij} = 1$ in the observed tag matrix F if i has tag j, and $F_{ij} = 0$ otherwise. The ideal tag matrix Y is the final desired output; it has the same form as F and gives the final ideal tags. The image feature matrix X consists of the feature vectors $x_i$ of the images in the dataset. X holds the Resnet50 features of the dataset images, and each $x_i$ is the 2048-dimensional vector that Resnet50 extracts for image i.
Under the low-rank framework, the key elements are the noise tag matrix E, the ideal tag matrix Y, and the loss function for tag prediction. The noise tag matrix E is defined only for convenience of description and does not take part in the updates. Only the ideal tag matrix Y is updated, with E = F − Y. The loss function is the optimization function, equation (2), and it is derived as follows:
The observed tag matrix F consists of the ideal tag matrix Y plus the noise tag matrix E:
$$F = Y + E \qquad (1)$$
Equation (1) gives E = F − Y, the original tag matrix F minus the ideal tag matrix Y. Hence S(E) = S(F − Y), which simplifies the subsequent derivation.
The model uses the following terms, following previous work. rank(Y) measures the rank of Y. S(E), whose form depends on the chosen noise model, is the loss computed on the noise matrix E, and a smaller S(E) means less noise. $\mathrm{loss}(Y, W_g(X))$ is the CNN loss, which measures the tag-prediction error and is used to optimize the CNN; here $W_g(X)$ is the tag matrix predicted by the CNN. Finally, a regularization term Ω(θ) is introduced. Together these form the optimization function of the model, equation (2).
In equation (2), $\lambda_1$, $\lambda_2$ and $\lambda_3$ are hyperparameters of the model, tuned for different conditions during the experiments.
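Equation (2) combines the four terms just listed. A plausible form is shown below. It assumes that each hyperparameter weights one of the three terms after rank(Y) and that θ denotes the CNN parameters. Both are assumptions, since the exact placement of the weights is not stated.

$$\min_{Y,\,\theta}\ \operatorname{rank}(Y) + \lambda_1\, S(F - Y) + \lambda_2\,\mathrm{loss}\big(Y, W_g(X)\big) + \lambda_3\,\Omega(\theta)$$

By equation (1), S(F − Y) = S(E), so only Y (and the network) is optimized and E never has to be stored separately.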
In this optimization function, Y is low rank, so rank(Y) is relaxed to the nuclear norm $\|Y\|_*$, its standard convex surrogate. For the optimization of S(E), the noise tag matrix E is assumed to follow a Cauchy distribution:
In formula (3), p is the probability density of the Cauchy distribution. Formula (3) fits the noise with a Cauchy distribution, and the resulting term is added to the optimization in formula (4). Together, formulas (3) and (4) introduce the Cauchy distribution to model the noise.
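Formulas (3) and (4) are not reproduced here. The sketch below assumes a zero-location Cauchy distribution with scale b and takes S(E) to be its negative log-likelihood with the additive constant dropped. It shows why this choice is robust: a gross error adds only a logarithmic penalty, whereas the squared (Gaussian) loss grows quadratically. Both function names are hypothetical.

```python
import numpy as np

def cauchy_noise_loss(F, Y, b=1.0):
    """Cauchy negative log-likelihood of the noise E = F - Y, constants dropped.

    Assumes every entry E_ij ~ Cauchy(0, b), with density p(e) = b / (pi * (e**2 + b**2)),
    so -log p(e) = log(1 + (e / b)**2) + log(pi * b); the constant term is omitted.
    """
    E = np.asarray(F, dtype=float) - np.asarray(Y, dtype=float)
    return float(np.sum(np.log1p((E / b) ** 2)))

def squared_noise_loss(F, Y):
    """Gaussian-noise counterpart (sum of squared residuals), for comparison."""
    E = np.asarray(F, dtype=float) - np.asarray(Y, dtype=float)
    return float(np.sum(E ** 2))

# One gross outlier dominates the squared loss but adds only a log-sized Cauchy penalty.
F = np.array([[1.0, 0.0, 1.0, 0.0]])
Y = np.array([[0.9, 0.1, 1.1, 5.0]])
print(squared_noise_loss(F, Y), cauchy_noise_loss(F, Y, b=0.8))
```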
The optimization for S (E) is:
For the deep learning module (the image-tag training module, i.e., the CNN network), a corresponding prediction loss is adopted, and the final optimization objective is:
To solve this optimization problem, two auxiliary variables $Y_1$ and $Y_2$ are introduced to split the ideal tag matrix Y, which separates the problem into the following form:
s.t. $Y_1 = Y,\ Y_2 = Y,\ Y_2 \ge 0$
Then, following the inexact augmented Lagrange multiplier method, multipliers $Z_1$ and $Z_2$ are introduced for the constraints on $Y_1$ and $Y_2$, giving the augmented Lagrangian:
In formula (7), η is an intermediate update variable (the penalty parameter of the augmented Lagrangian); it is set when the updates start.
Setting the partial derivatives of this function to zero gives the update formulas for Y, $Y_1$ and $Y_2$:
In the formulas above, $\mathrm{soft}(A_{ij}, \delta, \tau) = \operatorname{sign}(A_{ij}) \max(|A_{ij}| - \delta, \tau)$, and $A = U \Lambda V^T$ is the singular value decomposition of A.
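The soft-thresholding operator defined above is the building block of singular value thresholding, the standard proximal step for the nuclear norm. The sketch below implements `soft` exactly as defined and applies it to the singular values Λ with τ = 0 (the usual choice). How formulas (8)-(10) use it is not reproduced here, so this is an illustration of the operator, not the patent's exact update.

```python
import numpy as np

def soft(A, delta, tau=0.0):
    """Elementwise soft(A_ij, delta, tau) = sign(A_ij) * max(|A_ij| - delta, tau)."""
    return np.sign(A) * np.maximum(np.abs(A) - delta, tau)

def singular_value_threshold(A, delta):
    """Shrink the singular values of A = U diag(s) V^T by delta (tau = 0).

    This is the proximal operator of delta * ||.||_* (nuclear norm): small singular
    values are set to zero, which lowers the rank of the result.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_shrunk = soft(s, delta)      # singular values are >= 0, so sign(s) is 0 or 1
    return (U * s_shrunk) @ Vt     # same as U @ np.diag(s_shrunk) @ Vt

A = np.diag([3.0, 1.0, 0.5])
print(singular_value_threshold(A, 1.0))
```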
Formulas (3)-(10) together fit the noise with a Cauchy distribution and yield the update of Y under the Cauchy noise model (formula (8)); in other words, this Cauchy model produces a more ideal tag matrix Y.
This process starts from the optimization function, formula (2), and solves it with the inexact augmented Lagrange multiplier method. The derivation explores label correlation: fitting the noise with a Cauchy distribution yields a better, more ideal tag matrix Y, which in turn benefits the subsequent learning and training of the CNN network. Formulas (8)-(10) update Y, $Y_1$ and $Y_2$. The updated ideal tag matrix Y serves two purposes: it recovers the correct tags of the current images, and it improves the CNN update, so that a more robust CNN model can better label new images and correct the tags of existing ones. Each tag matrix predicted by the CNN network goes through this optimization process, producing the ideal tag matrix Y and the factor matrices $Y_1$, $Y_2$ (kept for the next update). Y is then fed back into the CNN image-tag relationship model for training, giving a better model of the relationship between visual features and semantics.
The entire Cauchy model is derived from formula (5) and has two parts. The first studies label correlation: it obtains more ideal tags by updating the optimization function, i.e., it fits the noise with a Cauchy distribution to obtain the ideal tag matrix. The second studies the image-tag relationship, which is learned by a CNN network. The two parts are updated alternately and depend on each other, and the final outputs are the CNN model and Y. The subsequent tag labeling module relies on the converged CNN model.
The algorithm flow of the low-rank non-negative model based on Cauchy noise distribution (CDNL) is as follows:
Input: deep visual features X and the observed tag matrix F.
Pre-training: input X and F into the CNN network, with a Sigmoid activation function, binary cross-entropy (Binary Cross Entropy) as the loss function, a learning rate of 0.0001 and exponential decay rates of 0.9 and 0.999, and pre-train the CNN.
Preparation: load the pre-trained CNN and randomly initialize $Y_1, Y_2, Z_1, Z_2$ from a normal distribution.
Training: repeat
η = 0.1, ρ = 1.1;
update Y, $Y_1$, $Y_2$ according to formulas (8), (9) and (10);
$Z_1 = Z_1 + \eta(Y - Y_1)$;
$Z_2 = Z_2 + \eta(Y - Y_2)$;
η = η·ρ;
update the CNN model $W_g(X)$ (parameters as above);
until the model converges.
Output: the ideal tag matrix Y, the model parameters W, and evaluation data.
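The pre-training step can be sketched in NumPy as below. Assumptions: the "exponential decay rates 0.9, 0.999" are read as Adam's β1 and β2; the CNN is replaced by a single linear layer with a Sigmoid output on the 2048-dimensional Resnet50 features; `pretrain_linear_head` is a hypothetical name. In a deep-learning framework the same setup would be a Sigmoid output with binary cross-entropy and an Adam optimizer with lr = 0.0001.

```python
import numpy as np

def pretrain_linear_head(X, F, lr=1e-4, betas=(0.9, 0.999), eps=1e-8, steps=200, seed=0):
    """Pre-train a Sigmoid output layer on (X, F) with binary cross-entropy and Adam.

    Hypothetical stand-in for the CNN: one linear layer W (d x m) plus bias c.
    Returns the parameters [W, c] and the loss recorded before each step.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = F.shape[1]
    params = [rng.normal(0.0, 0.01, size=(d, m)), np.zeros(m)]
    moments = [[np.zeros_like(p), np.zeros_like(p)] for p in params]
    b1, b2 = betas
    losses = []
    for t in range(1, steps + 1):
        W, c = params
        P = 1.0 / (1.0 + np.exp(-(X @ W + c)))                 # Sigmoid activation
        Pc = np.clip(P, 1e-7, 1 - 1e-7)                         # avoid log(0)
        losses.append(float(-np.mean(F * np.log(Pc) + (1 - F) * np.log(1 - Pc))))
        G = (P - F) / (n * m)                                   # d(mean BCE)/d(logits)
        grads = [X.T @ G, G.sum(axis=0)]
        for p, g, mv in zip(params, grads, moments):
            mv[0] = b1 * mv[0] + (1 - b1) * g                   # first-moment estimate
            mv[1] = b2 * mv[1] + (1 - b2) * g * g               # second-moment estimate
            m_hat = mv[0] / (1 - b1 ** t)                       # bias correction
            v_hat = mv[1] / (1 - b2 ** t)
            p -= lr * m_hat / (np.sqrt(v_hat) + eps)            # in-place Adam step
    return params, losses
```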
Application: re-label the training set with the trained model, and label new pictures in the test set.
The whole flow can be summarized as follows. In the optimization function, the only quantities that need iterative updating are Y and $W_g(X)$; F is known, and Ω(θ) is a regularization term. Taking partial derivatives of formula (5) gives the update formula for Y. In principle the update of $W_g(X)$ could be derived the same way, but here a deep CNN network is introduced to update $W_g(X)$. The algorithm framework therefore has two parts. The first studies label correlation and updates Y to obtain more ideal tags, i.e., it fits the noise with a Cauchy distribution to obtain the ideal tag matrix Y. The second studies the image-tag relationship, which is learned by a CNN network. The updates of $W_g(X)$ and Y depend on each other: formula (8) is the update formula for Y, with $W_g(X)$ as an input and the remaining variables as intermediate process variables, while $W_g(X)$ is updated by the CNN network, whose inputs are the ideal tag matrix Y and the image features X and whose output is $W_g(X)$. The two parts are iterated until the model converges, and the final outputs are the CNN model and Y.
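The alternation between the Y updates and the predictor updates can be sketched end to end as below. This is not the patent's exact method: formulas (8)-(10) are not reproduced, so the Y-step is replaced by plain gradient steps on an assumed objective (Cauchy noise term, squared prediction loss, and the augmented-Lagrangian terms); the $Y_1$ and $Y_2$ steps use the standard nuclear-norm and non-negativity proximal maps; and the CNN $W_g(X)$ is replaced by a hypothetical ridge-regression predictor `fit_predictor`. The multiplier and penalty updates follow the algorithm listing (η = 0.1, ρ = 1.1, $Z_k \leftarrow Z_k + \eta(Y - Y_k)$), with η reset at each outer iteration, which is one reading of that listing.

```python
import numpy as np

def svt(A, delta):
    """Singular value thresholding: proximal operator of delta * nuclear norm."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - delta, 0.0)) @ Vt

def fit_predictor(X, T, ridge=1e-2):
    """Hypothetical stand-in for the CNN W_g(X): ridge regression from features X to tags T."""
    d = X.shape[1]
    W = np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ T)
    return X @ W

def cdnl_sketch(X, F, lam1=1.0, lam2=1.0, b=0.8, outer=5, inner=20, lr=0.05, seed=0):
    """Alternate an inexact-ALM refinement of Y with refitting the predictor (sketch)."""
    rng = np.random.default_rng(seed)
    Y1, Y2, Z1, Z2 = (rng.normal(size=F.shape) for _ in range(4))  # normal random init
    Y = F.astype(float).copy()
    P = fit_predictor(X, F)                  # plays the role of the pre-trained W_g(X)
    for _ in range(outer):
        eta, rho = 0.1, 1.1                  # reset per outer iteration (assumption)
        for _ in range(inner):
            # Assumed Y-step in place of formula (8): gradient steps on
            # lam1*sum log(1+((F-Y)/b)^2) + lam2*||Y-P||_F^2 + <Z1+Z2, Y>
            # + eta/2*(||Y-Y1||_F^2 + ||Y-Y2||_F^2)
            for _ in range(10):
                E = F - Y
                grad = (-2.0 * lam1 * E / (b ** 2 + E ** 2)
                        + 2.0 * lam2 * (Y - P)
                        + Z1 + Z2 + eta * (2.0 * Y - Y1 - Y2))
                Y = Y - lr * grad
            Y1 = svt(Y + Z1 / eta, 1.0 / eta)    # low-rank factor (in place of formula (9))
            Y2 = np.maximum(Y + Z2 / eta, 0.0)   # non-negative factor (in place of formula (10))
            Z1 = Z1 + eta * (Y - Y1)             # multiplier updates, as in the listing
            Z2 = Z2 + eta * (Y - Y2)
            eta = eta * rho
        P = fit_predictor(X, Y)              # "CNN model update" on the refined tags
    return Y, Y1, Y2, P
```

Keeping the predictor behind a single function makes it easy to swap the ridge stand-in for a real network trained on (X, Y).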
In this embodiment, the model is evaluated with three metrics: MicroAUC, MacroAUC and mean average precision (mAP). Specifically:
The CDNL model is compared with existing methods, including DMF, MPMF, LSCCA, TCCA, DNMF, WDMF, WDNL and DCE, and the CDNL model of this embodiment is evaluated on two tasks: image re-labeling and new-image labeling. On MIRFlickr, CDNL performs significantly better than the existing mainstream methods; on NUS-WIDE it also improves on them and is only slightly worse than DCE. The proposed model thus improves markedly in both respects, with a clear advantage over other methods on MIRFlickr and gains on NUS-WIDE. Fitting the noise with a Cauchy distribution effectively improves the accuracy of CDNL: although the effect differs between the two datasets, the robustness of the Cauchy distribution to most kinds of noise still improves the model well. Extracting the deep semantic content of images with Resnet50 yields better visual features, and the CNN network contributes substantially to training the image-tag relationship model. CDNL improves image annotation accuracy through these three aspects combined.
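The three metrics can be computed as sketched below. Assumptions: MicroAUC pools all (image, tag) entries; MacroAUC and mAP are computed per tag (column) and averaged, since the text does not say whether mAP is averaged over tags or images; tags lacking positives or negatives are skipped. The function names are hypothetical.

```python
import numpy as np

def auc(scores, labels):
    """ROC AUC as the probability that a random positive outranks a random negative (ties count 1/2)."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        return float("nan")                    # undefined without both classes
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def average_precision(scores, labels):
    """Mean of precision@k over the ranks k at which positives appear."""
    order = np.argsort(-np.asarray(scores, float), kind="stable")
    hits = np.asarray(labels, bool)[order]
    if not hits.any():
        return float("nan")
    precision_at_k = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float(precision_at_k[hits].mean())

def evaluate(S, Y_true):
    """S, Y_true: n x m score and ground-truth matrices (images x tags)."""
    micro_auc = auc(S.ravel(), Y_true.ravel())
    per_tag_auc = [auc(S[:, j], Y_true[:, j]) for j in range(S.shape[1])]
    per_tag_ap = [average_precision(S[:, j], Y_true[:, j]) for j in range(S.shape[1])]
    return micro_auc, float(np.nanmean(per_tag_auc)), float(np.nanmean(per_tag_ap))
```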
This embodiment also compares specific forms of the Cauchy noise distribution. In the Cauchy distribution, the scale parameter b is the half-width at half-maximum and characterizes how the noise is spread. By adjusting b to analyze the distribution and state of the noise in each dataset, experiments on the two datasets show that MIRFlickr is best fitted by a Cauchy distribution with b = 0.8 and NUS-WIDE by one with b = 0.6. A well-chosen noise fit can therefore noticeably improve the accuracy of the model.
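The half-width interpretation of b can be checked numerically with the Cauchy density, assuming location 0 as in the noise model. The sketch prints, for the two values reported above, the peak density, the ratio of the density at x = b to the peak (exactly one half), and the relative density of a full label flip (residual 1), which shrinks as b decreases. `cauchy_pdf` is a hypothetical helper.

```python
import math

def cauchy_pdf(x, b, x0=0.0):
    """Cauchy density with location x0 and scale b."""
    return b / (math.pi * ((x - x0) ** 2 + b ** 2))

for name, b in [("MIRFlickr", 0.8), ("NUS-WIDE", 0.6)]:
    peak = cauchy_pdf(0.0, b)
    half_ratio = cauchy_pdf(b, b) / peak      # 0.5: b is the half-width at half-maximum
    flip_ratio = cauchy_pdf(1.0, b) / peak    # relative density of a residual of 1 (a flipped tag)
    print(name, b, round(peak, 4), half_ratio, round(flip_ratio, 4))
```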
To solve the problem of labeling weakly supervised social network images, the labeling method for multi-label social network images disclosed in this embodiment proposes a low-rank non-negative model based on Cauchy noise distribution (the CDNL model). The input of the model is the observed tag matrix F, a binary matrix formed by the tags of each image, together with the image feature matrix X; the outputs are a CNN model and the ideal tag matrix Y. For image tag labeling, the key is to reveal the inherent association between visual content and semantic tags. On social networks, users typically provide tags for their images. The semantic space formed by these tags can be approximated by a compact subset of tags in the true tag space, and users tend to choose semantically related tags to label an image. The tag-image correlation matrix of social network images is therefore inherently low rank. In addition, tagging errors are relatively uncommon, so the noise matrix is sparse. This embodiment therefore adopts a low-rank non-negative model for social image annotation. The low-rank non-negative model mainly addresses label correlation: it links related tags (for example, if the tag "animal" frequently appears whenever the tag "cat" appears, the two tags are closely related), removes wrong tags as far as possible according to this correlation, and adds correct ones.
The low-rank non-negative matrix factorization method can effectively establish relationships between different categories. To better handle the distribution of tag noise, this embodiment fits the noise with a Cauchy distribution, which is more robust to various kinds of noise, yielding the low-rank non-negative model based on Cauchy noise distribution (CDNL). Because the visual features of images are critical for image-tag learning, this embodiment uses Resnet50 to extract deep visual features. Moreover, although the observed tag matrix contains many irrelevant and wrong tags, the visual features of an image and its tags remain closely related, so this embodiment uses a CNN framework to learn the mapping between images and tags. With these technical means, the disclosed labeling method can find the correct tags among many wrong and inaccurate ones, link them to the visual features of the image, and build a model that returns the corresponding correct tags when a new picture is input.
FIG. 3 is a block diagram of an embodiment of the labeling system for multi-label social network images according to the present invention. Referring to FIG. 3, the labeling system for multi-label social network images includes:
An image dataset acquisition module 301, configured to acquire an image dataset; the image dataset includes a plurality of images and the tags corresponding to each image.
A first tag matrix acquisition module 302, configured to obtain a first tag matrix according to the image dataset.
The first tag matrix acquisition module 302 uses the formula:
$$F_{ij} = \begin{cases} 1, & \text{image } i \text{ carries tag } j \\ 0, & \text{otherwise} \end{cases} \qquad i = 1, \dots, n,\ j = 1, \dots, m$$
where F is the first tag matrix and $F_{ij}$ indicates whether image i of the n images carries tag j of the m tags: $F_{ij} = 1$ if image i carries tag j, and $F_{ij} = 0$ if it does not.
A second tag matrix acquisition module 303, configured to denoise the first tag matrix using the Cauchy distribution to obtain a second tag matrix.
The second tag matrix acquisition module 303 specifically includes:
A noise distribution acquisition unit, configured to obtain the distribution of each noise entry in the first tag matrix using the Cauchy distribution, under the assumption that the noise in the first tag matrix follows a Cauchy distribution.
A second tag matrix acquisition unit, configured to denoise the first tag matrix according to the distribution of each noise entry to obtain the second tag matrix.
An image feature matrix acquisition module 304, configured to input the image dataset into a Resnet50 residual network to obtain the image feature matrix; the image feature matrix contains the 2048-dimensional feature vector of each image in the image dataset.
A training and optimization module 305, configured to input the second tag matrix and the image feature matrix into a CNN network for training and optimization, obtaining an optimized CNN network.
The training and optimization module 305 specifically includes:
An actual-output tag matrix acquisition module, configured to input the second tag matrix and the image feature matrix into the CNN network to obtain the tag matrix actually output by the CNN network.
A judging module, configured to judge whether the error between the tag matrix actually output by the CNN network and the second tag matrix is smaller than a set value.
An output module, configured to output the optimized CNN network when the judging module finds that the error between the tag matrix actually output by the CNN network and the second tag matrix is smaller than the set value.
A third tag matrix acquisition module, configured to denoise the tag matrix actually output by the CNN network using the Cauchy distribution, obtaining a third tag matrix, when the judging module finds that the error between the tag matrix actually output by the CNN network and the second tag matrix is greater than or equal to the set value.
A loop module, configured to update the second tag matrix with the third tag matrix and return to the actual-output tag matrix acquisition module.
A to-be-labeled feature vector determination module 306, configured to determine the feature vector to be labeled, which is the 2048-dimensional feature vector of the image to be labeled.
The feature vector to be annotated determination module 306 specifically includes:
A to-be-labeled feature vector determination unit, configured to input the image to be labeled into the Resnet50 residual network to obtain the feature vector to be labeled, which is the 2048-dimensional feature vector of the image to be labeled.
A to-be-labeled image tag acquisition module 307, configured to input the feature vector to be labeled into the optimized CNN network to obtain the tags of the image to be labeled.
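Turning the optimized network's per-tag outputs into a tag list requires a decision rule that the text does not specify. The sketch below assumes Sigmoid probabilities and offers two common rules, a 0.5 threshold or the top-k tags. `scores_to_tags` is a hypothetical helper.

```python
import numpy as np

def scores_to_tags(probs, vocabulary, threshold=0.5, top_k=None):
    """Convert one image's per-tag probabilities into tag strings, most confident first.

    If top_k is given, keep the k highest-scoring tags; otherwise keep tags with prob >= threshold.
    """
    probs = np.asarray(probs, dtype=float)
    if top_k is not None:
        idx = np.argsort(-probs, kind="stable")[:top_k]
    else:
        idx = np.flatnonzero(probs >= threshold)
        idx = idx[np.argsort(-probs[idx], kind="stable")]   # order kept tags by confidence
    return [vocabulary[j] for j in idx]

vocab = ["cat", "animal", "sky"]
print(scores_to_tags([0.9, 0.6, 0.1], vocab))
```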
The invention discloses a labeling method and system for multi-label social network images and proposes CDNL, a weakly supervised non-negative low-rank deep model based on Cauchy noise distribution, to re-label wrongly tagged images and to label new images. The non-negative low-rank model mainly optimizes the ideal tags and suppresses noise, and two latent factors are introduced to split the optimization objective. The Cauchy distribution fits tag noise better, and optimizing the loss function reduces the difference between the ideal tag matrix and the observed tag matrix. In addition, to better learn the intrinsic mapping between images and ideal tags, a Resnet50 network extracts deep image features, and a CNN network learns the mapping between the feature matrix and the tag matrix. For the weakly supervised multi-label classification of social images, the invention proposes a low-rank non-negative model based on Cauchy noise distribution (CDNL) and, for the first time, introduces the Cauchy distribution into a low-rank non-negative model to model the noise, replacing the prior-art approach of modeling noise with a Gaussian distribution.
In this specification, the embodiments are described progressively: each embodiment focuses on its differences from the others, and the identical or similar parts of the embodiments can be cross-referenced. Since the system disclosed in the embodiment corresponds to the disclosed method, its description is relatively brief; for related details, refer to the description of the method.
The principles and implementations of the present invention are described herein with specific examples, which are intended only to help understand the method of the invention and its core ideas. Those of ordinary skill in the art may, following the ideas of the invention, make changes to the specific implementations and the scope of application. In summary, this description should not be construed as limiting the invention.

Claims (4)

1. A method for labeling multi-label social network images, the method comprising:
step S1: acquiring an image dataset, the image dataset comprising a plurality of images and the tags corresponding to each image;
step S2: obtaining an observed tag matrix F from the image dataset, the observed tag matrix F consisting of an ideal tag matrix Y and a noise tag matrix E; when the noise tag matrix E follows the Cauchy distribution, the loss function S(E) of the noise tag matrix E satisfies the corresponding expression, wherein p is the probability density of the Cauchy distribution, $E_{i,j}$ is the (i, j)-th element of the noise tag matrix E, m is the total number of tags in the image dataset, n is the total number of images in the image dataset, i is the image index, j is the tag index, $F_{ij}$ is the (i, j)-th element of the observed tag matrix F, $Y_{i,j}$ is the (i, j)-th element of the matrix Y, and $\|\cdot\|_1$ denotes the $\ell_1$ norm;
step S3: training the ideal tag matrix Y with a deep residual network and obtaining a low-rank non-negative model based on the characteristics of the tags, together with the loss function of the deep residual network, wherein W is the model parameter, rank(Y) is the rank of Y, loss(Y, $W_g(X)$) is the loss function of the CNN network, $W_g(X)$ is the tag matrix predicted by the CNN network, and loss(x, y) is the loss function of the deep residual network;
step S4: combining the two expressions above to obtain a low-rank non-negative model based on the Cauchy distribution, wherein $\lambda_1, \lambda_2, \lambda_3$ are hyperparameters of the model, S(E) is the loss of the noise tag matrix E, Ω(θ) is the regularization term, $\|Y\|_*$ is the nuclear norm of Y, and $\|\cdot\|_F$ is the Frobenius norm;
step S5: solving the model to obtain the update formula for Y, wherein $Z_1$ and $Z_2$ are constant matrices corresponding to the constants in the formula, η is an intermediate update variable, and $Y_1$ and $Y_2$ are factor matrices for the next update;
step S6: acquiring an original image matrix X;
step S7: inputting the original image matrix X and the observed tag matrix F into a CNN network for pre-training to obtain an updated model of the CNN network, the CNN network being W in $W_g(X)$;
step S8: randomly initializing the variables of the update formula and substituting X into it to solve for Y;
step S9: replacing the observed tag matrix F with Y and feeding Y into the CNN network for training to obtain an updated model of the CNN network;
step S10: judging whether the change in the value of the model objective is within a set range;
if not, returning to the step S7;
if yes, executing step S11: stopping training to obtain the ideal tag matrix Y, the trained model and the image matrix X; at this point Y, $E = Y - W_g(X)$ and X are known;
step S12: inputting the image matrix X into the model $W_g(X)$ to obtain the ideal tag matrix Y, wherein the noise matrix E is solved as $E = Y - W_g(X)$ and is used for updating Y;
step S13: and marking the characteristics by using the ideal label matrix Y.
2. The method for labeling multi-label social network images according to claim 1, wherein the observed tag matrix is obtained from the image dataset by the formula:
$$F_{ij} = \begin{cases} 1, & \text{image } i \text{ carries tag } j \\ 0, & \text{otherwise} \end{cases} \qquad i = 1, \dots, n,\ j = 1, \dots, m$$
wherein F is the observed tag matrix and $F_{ij}$ indicates whether image i of the n images carries tag j of the m tags: $F_{ij} = 1$ if image i carries tag j, and $F_{ij} = 0$ if it does not.
3. A labeling system for multi-label social network images, comprising:
an image dataset acquisition module, configured to acquire an image dataset, the image dataset comprising a plurality of images and the tags corresponding to each image;
an observed tag matrix acquisition module, configured to obtain an observed tag matrix F from the image dataset, the observed tag matrix F consisting of an ideal tag matrix Y and a noise tag matrix E; when the noise tag matrix E follows the Cauchy distribution, the loss function S(E) of the noise tag matrix E satisfies the corresponding expression, wherein p is the probability density of the Cauchy distribution, $E_{i,j}$ is the (i, j)-th element of the noise tag matrix E, m is the total number of tags in the image dataset, n is the total number of images in the image dataset, i is the image index, j is the tag index, $F_{ij}$ is the (i, j)-th element of the observed tag matrix F, $Y_{i,j}$ is the (i, j)-th element of the matrix Y, and $\|\cdot\|_1$ denotes the $\ell_1$ norm;
an ideal tag matrix training module, configured to train the ideal tag matrix Y with a deep residual network and obtain a low-rank non-negative model based on the characteristics of the tags, together with the loss function of the deep residual network, wherein W is the model parameter, rank(Y) is the rank of Y, loss(Y, $W_g(X)$) is the loss function of the CNN network, $W_g(X)$ is the tag matrix predicted by the CNN network, and loss(x, y) is the loss function of the deep residual network;
a Cauchy-distribution-based low-rank non-negative model acquisition module, configured to combine the two expressions above into a low-rank non-negative model based on the Cauchy distribution, wherein $\lambda_1, \lambda_2, \lambda_3$ are hyperparameters of the model, S(E) is the loss of the noise tag matrix E, Ω(θ) is the regularization term, $\|Y\|_*$ is the nuclear norm of Y, and $\|\cdot\|_F$ is the Frobenius norm;
a Cauchy-distribution-based low-rank non-negative model solving module, configured to solve the model to obtain the update formula for Y, wherein $Z_1$ and $Z_2$ are constant matrices corresponding to the constants in the formula, η is an intermediate update variable, and $Y_1$ and $Y_2$ are factor matrices for the next update;
the original image matrix acquisition module is used for acquiring an original image matrix X;
a CNN network pre-training module, configured to input the original image matrix X and the observed tag matrix F into a CNN network for pre-training to obtain an updated model of the CNN network, the CNN network being W in $W_g(X)$;
a Y solving module, configured to randomly initialize the variables of the update formula and substitute X into it to solve for Y;
a CNN network update model acquisition module, configured to replace the observed tag matrix F with Y and feed Y into the CNN network for training to obtain an updated model of the CNN network;
a judging module, configured to judge whether the change in the value of the model objective is within a set range;
if not, returning to the CNN network pre-training module;
if yes, executing a training stopping module, configured to stop training to obtain the ideal tag matrix Y, the trained model and the image matrix X; at this point Y, $E = Y - W_g(X)$ and X are known;
an ideal tag matrix Y acquisition module, configured to input the image matrix X into the model $W_g(X)$ to obtain the ideal tag matrix Y, wherein the noise matrix E is solved as $E = Y - W_g(X)$ and is used for updating Y;
and the characteristic labeling module is used for labeling the characteristics by utilizing the ideal label matrix Y.
4. The labeling system for multi-label social network images according to claim 3, wherein the observed tag matrix acquisition module uses the formula:
$$F_{ij} = \begin{cases} 1, & \text{image } i \text{ carries tag } j \\ 0, & \text{otherwise} \end{cases} \qquad i = 1, \dots, n,\ j = 1, \dots, m$$
wherein F is the observed tag matrix and $F_{ij}$ indicates whether image i of the n images carries tag j of the m tags: $F_{ij} = 1$ if image i carries tag j, and $F_{ij} = 0$ if it does not.
CN202011045407.8A 2020-09-29 2020-09-29 Labeling method and system for multi-label social network image Active CN112182274B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011045407.8A CN112182274B (en) 2020-09-29 2020-09-29 Labeling method and system for multi-label social network image

Publications (2)

Publication Number Publication Date
CN112182274A CN112182274A (en) 2021-01-05
CN112182274B true CN112182274B (en) 2023-12-01

Family

ID=73946470

Country Status (1)

Country Link
CN (1) CN112182274B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107590167A (en) * 2017-01-18 2018-01-16 南京邮电大学 A kind of extensive social Automatic image annotation algorithm based on conclusion type matrix completion
CN110069644A (en) * 2019-04-24 2019-07-30 南京邮电大学 A kind of compression domain large-scale image search method based on deep learning

Non-Patent Citations (1)

Title
A filtering algorithm for removing Cauchy noise from images; Wang Xiaoyan et al.; Acta Mathematica Scientia; 2018-04-30; Vol. 38, No. 4, pp. 823-832 *

Similar Documents

Publication Publication Date Title
CN110309331B (en) Cross-modal deep hash retrieval method based on self-supervision
CN106547880B (en) Multi-dimensional geographic scene identification method fusing geographic area knowledge
CN110119753B (en) Lithology recognition method by reconstructed texture
CN114239560B (en) Three-dimensional image classification method, apparatus, device, and computer-readable storage medium
CN107871014A (en) A kind of big data cross-module state search method and system based on depth integration Hash
CN113821670B (en) Image retrieval method, device, equipment and computer readable storage medium
CN111612100B (en) Object re-identification method, device, storage medium and computer equipment
CN111125406A (en) Visual relation detection method based on self-adaptive cluster learning
CN113761259A (en) Image processing method and device and computer equipment
CN112749737A (en) Image classification method and device, electronic equipment and storage medium
Su et al. Clustering and recognition of spatiotemporal features through interpretable embedding of sequence to sequence recurrent neural networks
CN113553975B (en) Pedestrian re-identification method, system, equipment and medium based on sample pair relation distillation
Lonij et al. Open-world visual recognition using knowledge graphs
Hong et al. Graph-induced aligned learning on subspaces for hyperspectral and multispectral data
CN113762005B (en) Feature selection model training and object classification methods, devices, equipment and media
CN111506832B (en) Heterogeneous object completion method based on block matrix completion
CN113569081A (en) Image recognition method, device, equipment and storage medium
CN117475266A (en) Robot vision perception method and device based on multi-expert attention fusion
CN117171746A (en) Malicious code homology analysis method and device, electronic equipment and storage medium
CN112182274B (en) Labeling method and system for multi-label social network image
CN111582449A (en) Training method, device, equipment and storage medium for target domain detection network
CN116630694A (en) Target classification method and system for partial multi-label images and electronic equipment
Cong et al. Machine vision-based estimation of body size and weight of pearl gentian grouper
Stork Knowledge extraction from archives of natural history collections
Blount et al. Comparison of two individual identification algorithms for snow leopards after automated detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant