CN104036021A - Method for semantically annotating images on basis of hybrid generative and discriminative learning models - Google Patents

Method for semantically annotating images on basis of hybrid generative and discriminative learning models Download PDF

Info

Publication number
CN104036021A
CN104036021A CN201410295467.3A CN201410295467A CN104036021A
Authority
CN
China
Prior art keywords
semantic
image
value
vector
test pattern
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410295467.3A
Other languages
Chinese (zh)
Inventor
李志欣
张灿龙
吴璟莉
王金艳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi Normal University
Original Assignee
Guangxi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi Normal University filed Critical Guangxi Normal University
Priority to CN201410295467.3A priority Critical patent/CN104036021A/en
Publication of CN104036021A publication Critical patent/CN104036021A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/5866Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2132Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on discrimination criteria, e.g. discriminant analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for semantically annotating images based on a hybrid generative and discriminative learning model. In the generative learning stage, the method models the images generatively with continuous PLSA (probabilistic latent semantic analysis), obtaining the model parameters and the topic distribution of each image, which serves as the image's intermediate representation vector. In the discriminative learning stage, an ensemble of classifier chains is constructed to learn discriminatively from these intermediate representation vectors, building the classifier chains and integrating contextual information among the annotation keywords. In the annotation stage, the visual features of a given unknown image are extracted automatically, its topic vector representation is obtained with the continuous PLSA parameter estimation algorithm, the topic vector is classified by the trained ensemble of classifier chains, and the image is annotated with the several semantic keywords of highest confidence. The annotation and retrieval performance of the method is superior to that of most current typical automatic image annotation methods.

Description

Image semantic annotation method based on a hybrid generative and discriminative learning model
Technical field
The present invention relates to the field of image retrieval, and in particular to an image semantic annotation method based on a hybrid generative and discriminative learning model.
Background art
According to the machine learning methods they employ, existing automatic image annotation methods fall broadly into two categories: methods based on generative models and methods based on discriminative models.
Methods based on generative models first learn the joint probability of image features and keywords, then use Bayes' rule to compute the posterior probability of each keyword given the image features, and annotate the image according to these posteriors. Such methods have a scalable training process and place low demands on the quality of the manual annotations of the training set.
Methods based on discriminative models assume that the mapping from image features to keywords is some parameterized function, learn the parameters of this function directly from the training data, and obtain a classifier for each semantic concept. These methods treat each semantic concept as an independent class and generally achieve higher annotation precision, but they cannot easily exploit domain-specific prior knowledge.
The probabilistic graphical models of the generative and discriminative methods are shown in Fig. 1(a) and (b) respectively. The main differences between the two are: (1) discriminative methods treat images as training data and semantic concepts as classes, aiming to assign images to semantic classes, whereas generative methods treat both images and text as training data, aiming to learn the associations between them; (2) discriminative methods train one classifier per semantic concept, whereas generative methods learn a single correlation model and apply it to all semantic concepts; (3) the independence assumptions differ: discriminative methods assume the semantic classes are mutually independent, whereas generative methods assume that, conditioned on the latent variables, visual elements and text elements are independent.
In summary, generative and discriminative models each have their own advantages and drawbacks.
Summary of the invention
To address the "semantic gap" in image retrieval and the respective shortcomings of generative and discriminative models, the present invention provides an image semantic annotation method based on a hybrid generative and discriminative learning model. Building on continuous probabilistic latent semantic analysis and multi-label learning, it proposes HGDM (hybrid generative/discriminative model), an automatic image annotation model combining generative and discriminative learning, and further realizes keyword-based semantic image retrieval.
To solve the above problems, the present invention adopts the following technical solution:
An image semantic annotation method based on a hybrid generative and discriminative learning model, comprising the following steps:
(1) training on the training images:
(1.1) model the visual features of the training images with continuous probabilistic latent semantic analysis (PLSA), obtaining the Gaussian distribution parameters μ_k and Σ_k for each topic z_k, and the topic vector P(z_k|d_i) of each training image;
(1.2) using the topic vector P(z_k|d_i) of each training image together with its original semantic annotations, construct classifier chains by a multi-label learning method;
(2) annotating a test image:
(2.1) using the Gaussian distribution parameters μ_k and Σ_k obtained in step (1.1) and the visual features of the test image, compute the topic vector P(z_k|d_new) of each test image with the expectation-maximization (EM) method;
(2.2) using the classifier chains obtained in step (1.2), perform semantic classification of the test image on this topic vector P(z_k|d_new);
(2.3) take the X semantic classes of highest confidence as the semantic annotation of the test image, where the parameter X is a manually preset value.
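Step (2.3) reduces to a top-X selection over per-class confidence scores. A minimal sketch follows; the function name and the use of NumPy are illustrative assumptions, not part of the patent:

```python
import numpy as np

def top_x_annotation(confidences, labels, x=5):
    """Step (2.3): keep the x semantic classes with the highest
    confidence as the annotation; x is the manually preset parameter X
    (set to 5 in claim 7)."""
    order = np.argsort(confidences)[::-1][:x]   # indices of the x largest scores
    return [labels[i] for i in order]
```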
Step (1.2), the construction (i.e. training) process of the classifier chains, is specifically: following a specified label order, each iteration learns the binary classifier associated with one semantic keyword label, and each iteration appends the label information of the binary classifiers already learnt, thereby constructing a chain of binary classifiers. Each binary classifier C_j in this chain is responsible for the learning and prediction related to semantic keyword label l_j, where j = 1, 2, …, |L| and |L| is the number of semantic keywords.
Step (2.2), the semantic classification process, i.e. the classification process of the classifier chain, is specifically: with the chain of binary classifiers constructed during training, propagation starts from binary classifier C_1, which determines the classification result Pr(l_1|x) for semantic keyword label l_1; this result Pr(l_1|x) is then appended in binary form to the topic vector of the test image, and so on, so that each subsequent binary classifier C_j determines the classification result Pr(l_j|x, l_1, l_2, …, l_{j-1}), where x is the topic vector. Here j = 1, 2, …, |L| and |L| is the number of semantic keywords.
Steps (1.1) and (2.1) further comprise a visual feature extraction process applied to the training images and test images:
First, each image is divided into m × n regular blocks;
Then, for each block, an (a+b)-dimensional feature vector is extracted, comprising an a-dimensional color feature and a b-dimensional texture feature; the color feature is a color autocorrelogram computed over quantized colors and city-block distances, and the texture feature consists of Gabor energy coefficients computed over several scales and orientations;
Finally, the visual feature of each image is the set of m × n feature vectors of dimension (a+b);
where the parameters m, n, a and b are manually preset values.
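The m × n block decomposition above can be sketched as follows. The per-block feature function here is a crude per-channel mean, standing in for the patent's a-dimensional color autocorrelogram plus b-dimensional Gabor energies, which are not implemented in this sketch:

```python
import numpy as np

def bag_of_features(image, m=16, n=16, feat_fn=None):
    """Divide an H x W x C image array into an m x n grid of regular
    blocks and extract one feature vector per block, producing the
    (m*n)-vector "feature bag" of the method.  feat_fn is a placeholder
    for the colour-autocorrelogram + Gabor-energy descriptor."""
    if feat_fn is None:
        feat_fn = lambda block: block.mean(axis=(0, 1))  # illustrative only
    h, w = image.shape[0] // m, image.shape[1] // n
    feats = [feat_fn(image[i * h:(i + 1) * h, j * w:(j + 1) * w])
             for i in range(m) for j in range(n)]
    return np.stack(feats)   # shape: (m*n, feature_dim)
```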
Compared with the prior art, the present invention integrates generative and discriminative models within the learning process: a generative model is used to learn the visual features of the input images, and a discriminative model is used for the semantic learning process, which yields the following advantages:
(1) In the generative learning stage, continuous PLSA models the image visual features directly, so no quantization of the visual features is needed and no important visual information is lost.
(2) Continuous PLSA transforms each image from a feature-set representation into a K-dimensional topic vector, which can also be regarded as a dimensionality reduction. This topic vector representation also integrates the latent semantic information of the visual content, which is significant for semantic image retrieval.
(3) The classifiers are built with a multi-label learning method, so the associations among annotation keywords are integrated when images are classified. This copes well with the weak-labeling problem and scales to large training sets.
(4) A discriminative ensemble of classifier chains performs the semantic classification of images, and each binary classifier is built on a support vector machine (SVM), so both runtime efficiency and classification precision are high.
Brief description of the drawings
Fig. 1 shows the probabilistic graphical models of the two classes of automatic image annotation methods: (a) methods based on discriminative models; (b) methods based on generative models.
Fig. 2 shows the automatic image annotation framework of the hybrid generative and discriminative model.
Embodiment
An automatic image annotation method based on a hybrid generative and discriminative model. In the generative learning stage, continuous PLSA models the images generatively, making full use of the prior knowledge in the training set and yielding the model parameters and the topic distribution of each image. This topic distribution serves as the intermediate representation vector of each image, which converts the automatic image annotation problem into a classification problem based on multi-label learning, so that annotation precision higher than that of a purely generative model can be obtained. In the discriminative learning stage, an ensemble of classifier chains learns discriminatively from the intermediate representation vectors of the images; building the classifier chains also integrates the contextual information among annotation keywords, so the associations between image labels are taken into account during classification, leading to higher annotation precision and better retrieval performance. In the annotation stage, given an unknown image, its topic vector representation is obtained by automatic visual feature extraction and the parameter estimation algorithm of continuous PLSA; the trained ensemble of classifier chains then classifies this topic vector; finally, the several semantic keywords of highest confidence are taken as the semantic annotation of the image.
The learning and annotation framework of the hybrid generative and discriminative model (HGDM) is shown in Fig. 2.
The training process on the training images consists of two steps. First, continuous PLSA models the visual features of the training images, yielding the Gaussian distribution parameters μ_k and Σ_k for each topic z_k and the topic distribution P(z_k|d_i) of each training image; this is a generative learning process. The Gaussian parameters μ_k and Σ_k obtained here are the parameters of continuous PLSA; by the independence assumption of continuous PLSA these parameters remain valid for images outside the training set, whereas the topic distribution P(z_k|d_i) corresponds only to each training image itself and carries no prior information to test images. Nevertheless, this representation lets every training image be expressed as a K-dimensional topic vector (K being the number of latent topics), and the space formed by these vectors is a simplex. Second, a classifier is constructed using the topic vector representation of each training image together with its original annotations, with each class corresponding to a semantic concept in the text vocabulary; this is a discriminative learning process. Since every image is now represented by a single topic vector but corresponds to multiple keyword labels, which matches the multi-label learning setting, a multi-class classifier is constructed with a multi-label learning method that also integrates the correlation information among keywords.
The annotation process for a test image likewise consists of two steps: (1) first, using the model parameters μ_k and Σ_k obtained in the training stage and the visual features of the test image, the topic vector P(z_k|d_new) of each test image is computed with the expectation-maximization (EM) algorithm; (2) then, the trained classifier classifies this topic vector, and the semantic classes of highest confidence are taken as the semantic annotation of the test image.
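The EM "folding-in" of step (2.1) can be sketched as below. Diagonal covariances and the exact update schedule are assumptions made for illustration; the patent does not spell out the covariance structure:

```python
import numpy as np

def fold_in_topic_vector(X, mu, var, n_iter=50):
    """Estimate the topic vector P(z_k | d_new) of a new image by EM,
    holding the trained Gaussian parameters fixed.
    X   : (n_features, D) feature bag of the new image
    mu  : (K, D) topic means mu_k
    var : (K, D) diagonal topic variances (diagonal Sigma_k assumed)"""
    K = mu.shape[0]
    p_z = np.full(K, 1.0 / K)                       # uniform initialisation
    for _ in range(n_iter):
        # E-step: P(z_k | x_i, d_new) proportional to P(z_k | d_new) N(x_i; mu_k, var_k)
        log_lik = -0.5 * (((X[:, None, :] - mu) ** 2 / var).sum(-1)
                          + np.log(2 * np.pi * var).sum(-1))
        log_post = log_lik + np.log(p_z)
        log_post -= log_post.max(axis=1, keepdims=True)   # numerical stability
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: re-estimate only the mixing weights of the new image
        p_z = post.mean(axis=0)
    return p_z   # the topic vector P(z_k | d_new); sums to 1
```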
The visual feature extraction method of the present invention first divides each image in the dataset into regular blocks (the 16 × 16 grid was determined on a validation set), then extracts a 36-dimensional feature vector for each block, comprising a 24-dimensional color feature and a 12-dimensional texture feature: the color feature is a color autocorrelogram computed over 8 quantized colors and 3 city-block distances, and the texture feature consists of Gabor energy coefficients computed over 3 scales and 4 orientations. Each block is thus represented by a 36-dimensional feature vector, and each image by a "bag of features", i.e. a set of 36-dimensional visual feature vectors, which provides a consistent interface for further modeling with a topic model.
In the generative learning stage, the choice of the number of topics for continuous PLSA is important, since it determines the dimensionality of the intermediate representation of the images: too large a number reduces system efficiency, while too small a number loses image information. Because fitting continuous PLSA is time-consuming, five topic numbers (90, 120, 150, 180 and 210) were tried; the experiments showed the system performed best with 180 topics, so the final number of topics was set to 180.
In the discriminative learning stage, HGDM performs multi-label classification with an ensemble of classifier chains, where each binary classifier is implemented with a support vector machine (SVM). This approach accounts for the correlations among labels while keeping an acceptable computational complexity.
Like the binary relevance (BR) method, a classifier chain (CC) comprises |L| binary classifiers, each handling the binary problem of one label. Unlike BR, however, these binary classifiers are linked in a chain, and the feature space of each node includes the class labels of the preceding nodes.
The training process of the classifier chain is shown in Table 1, where a training sample is denoted (x, S): S is the set of semantic keywords annotating a training image, L is the set of all semantic keywords, the elements of S can be represented as a binary vector of semantic keyword labels (l_1, l_2, …, l_|L|), and x is the topic vector. Following the specified label order, each iteration of the algorithm learns the binary classifier associated with one label; crucially, each iteration extends the feature space with the label information of the binary classifiers already learnt, so the feature information is continually enriched. The result is a chain of binary classifiers in which each classifier C_j is responsible for the learning and prediction related to label l_j, one binary classifier per semantic keyword, with j = 1, 2, …, |L|.
The classification process of the classifier chain is shown in Table 2: it starts from binary classifier C_1 and propagates along the chain. C_1 determines the classification result Pr(l_1|x) for label l_1; this result is then appended in binary form to the features of the test sample, and each subsequent classifier determines Pr(l_j|x, l_1, l_2, …, l_{j-1}).
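A compact sketch of the two procedures of Tables 1 and 2 (the tables themselves are not reproduced in this text). The tiny centroid-based binary classifier below is a stand-in for the SVM used by HGDM, and all names are illustrative:

```python
import numpy as np

class CentroidClassifier:
    """Stand-in binary classifier: nearest class centroid with a sigmoid
    score (the patent uses an SVM at each chain node)."""
    def fit(self, X, y):
        self.c0 = X[y == 0].mean(axis=0)
        self.c1 = X[y == 1].mean(axis=0)
        return self
    def score(self, X):
        d0 = ((X - self.c0) ** 2).sum(axis=1)
        d1 = ((X - self.c1) ** 2).sum(axis=1)
        return 1.0 / (1.0 + np.exp(d1 - d0))   # rough Pr(l_j = 1 | features)

def train_chain(X, Y):
    """Table 1: learn one binary classifier per label, appending each
    learnt label as an extra feature for the next node."""
    chain, X_aug = [], X.copy()
    for j in range(Y.shape[1]):
        chain.append(CentroidClassifier().fit(X_aug, Y[:, j]))
        X_aug = np.hstack([X_aug, Y[:, [j]].astype(float)])
    return chain

def predict_chain(chain, X, threshold=0.5):
    """Table 2: propagate from C_1 along the chain, feeding each node's
    binary decision into the feature vector of the next node."""
    X_aug, probs = X.copy(), []
    for clf in chain:
        p = clf.score(X_aug)
        probs.append(p)
        X_aug = np.hstack([X_aug, (p >= threshold).astype(float)[:, None]])
    return np.column_stack(probs)   # column j ~ Pr(l_j | x, l_1..l_{j-1})
```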
Use the method for chain can between sorter, transmit label information, consider the related information between mark simultaneously, thereby can overcome the mark question of independence in BR method.And classifier chains still keeps the advantage of BR method, comprise that storage demand is low and operational efficiency is high.
Although on average will increase for each example | the characteristic amount of L|/2 dimensions, due in practice | L| is generally a limited value, thereby the computation complexity problem being caused by this reason is almost negligible.The computation complexity of classifier chains and BR method are very approaching, depend on the number of mark and the complexity of basic two-value sorter.The complexity of BR method is O (| L| × f (| X|, | D|)), and wherein f (| X|, | D|) is the complexity of basic two-value sorter.The complexity of classifier chains is O (| L| × f (| X|+|L|, | D|)), namely many | L| ties up additional eigenwert.And HGDM adopt SVM as basic two-value sorter, so the complexity of classifier chains can be reduced to O (| L| × | X| × | D|+|L| × | L| × | D|).Can see, as long as | L|<|X|, Section 1 will play a major role.Like this computation complexity of classifier chains be O (| L| × | X| × | D|), identical with the computation complexity of BR method.And only have when | when L|>|X|, the computation complexity of classifier chains just can be higher than BR method.
In addition, although the process of chain type means that classifier chains can not parallelization, it can serialization, namely at any time in internal memory, only need to retain a two-value sorter, and this is an obvious advantage for contrast method for distinguishing.
The order of classifier chains obviously can affect its precision.Although there are some heuritic approaches to determine the order of chain, we still adopt concentrating type framework to solve this problem.Adopt concentrating type method can improve overall precision, avoid over-fitting, also can realize parallelization.Here said cluster refers to the cluster of multiple labeling method, namely the cluster of classifier chains.
The ensemble of classifier chains trains m classifier chains C_1, C_2, …, C_m, each trained with a random chain order on a random subset of the training set. Every model C_k is therefore different and yields a different multi-label classification result. These results are summed per label, so each label receives a number of votes; a threshold then selects the highest-voted labels to form a multi-label set, which is taken as the final classification result.
Let the prediction of the k-th individual model be the vector y_k = (l_1, l_2, …, l_|L|) ∈ {0,1}^|L|. Summing over all models gives the vector W = (λ_1, λ_2, …, λ_|L|) ∈ R^|L|, so each λ_j ∈ W represents the vote total for label l_j ∈ L. Normalizing W yields a value in [0, 1] for each label, and after thresholding, the labels can be ranked by these values. As with other automatic image annotation models, HGDM takes the 5 keyword labels of highest confidence as the semantic annotation of an image.
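The vote aggregation just described can be sketched as follows. Normalising by the maximum vote count is one plausible reading of the normalisation step, which the text leaves unspecified:

```python
import numpy as np

def ensemble_vote(predictions, top_k=5):
    """Sum the binary prediction vectors y_k of the m chains into the
    vote vector W, normalise it into [0, 1], and return the indices of
    the top_k highest-voted labels together with the normalised scores."""
    W = np.asarray(predictions, dtype=float).sum(axis=0)   # lambda_j votes
    W /= max(W.max(), 1e-12)                               # map onto [0, 1]
    top = np.argsort(W, kind="stable")[::-1][:top_k]
    return top, W
```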
On the Corel5k image database, the method builds an ensemble of 90 classifier chains, each trained on a randomly chosen subset of 500 images. In the experiments on the IAPR-TC12 and MIRFLICKR25000 datasets, an ensemble of 150 classifier chains is used, each trained on a randomly chosen subset of 1000 images. The binary classifier at each node of a chain is implemented with the LIBSVM package using the RBF kernel K(x, x') = exp(−γ‖x − x'‖²); the corresponding parameters were determined by grid search as (C, γ) = (2⁷, 2¹), where C is the error penalty factor and γ is the kernel parameter.
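The RBF kernel K(x, x') = exp(−γ‖x − x'‖²) used at each chain node can be computed directly; a one-line sketch (the function name is illustrative, and γ = 2 matches the grid-searched value 2¹):

```python
import numpy as np

def rbf_kernel(x, x_prime, gamma=2.0):
    """K(x, x') = exp(-gamma * ||x - x'||^2), the kernel of each binary
    SVM node; gamma = 2**1 and C = 2**7 were selected by grid search."""
    diff = np.asarray(x, dtype=float) - np.asarray(x_prime, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))
```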
Through careful design of the learning framework, the image semantic annotation method based on a hybrid generative and discriminative learning model effectively combines generative and discriminative learning and inherits their respective advantages, achieving better performance. Experimental results show that the method both retains the generative model's ability to make full use of the training data and achieves the high classification precision of discriminative models; its annotation and retrieval performance is superior to that of most current typical automatic image annotation methods.

Claims (7)

1. An image semantic annotation method based on a hybrid generative and discriminative learning model, characterized by comprising the following steps:
(1) training on the training images:
(1.1) modeling the visual features of the training images with continuous probabilistic latent semantic analysis, obtaining the Gaussian distribution parameters μ_k and Σ_k for each topic z_k, and the topic vector P(z_k|d_i) of each training image;
(1.2) using the topic vector P(z_k|d_i) of each training image together with its original semantic annotations, constructing classifier chains by a multi-label learning method;
(2) annotating a test image:
(2.1) using the Gaussian distribution parameters μ_k and Σ_k obtained in step (1.1) and the visual features of the test image, computing the topic vector P(z_k|d_new) of each test image with the expectation-maximization (EM) method;
(2.2) using the classifier chains obtained in step (1.2), performing semantic classification of the test image on this topic vector P(z_k|d_new);
(2.3) taking the X semantic classes of highest confidence as the semantic annotation of the test image, where the parameter X is a manually preset value.
2. The image semantic annotation method based on a hybrid generative and discriminative learning model according to claim 1, characterized in that step (1.2) is specifically: following a specified label order, each iteration learns the binary classifier associated with one semantic keyword label, and each iteration appends the label information of the binary classifiers already learnt, thereby constructing a chain of binary classifiers; each binary classifier C_j in this chain is responsible for the learning and prediction related to semantic keyword label l_j, where j = 1, 2, …, |L| and |L| is the number of semantic keywords.
3. The image semantic annotation method based on a hybrid generative and discriminative learning model according to claim 2, characterized in that step (2.2) is specifically: with the chain of binary classifiers constructed in step (1.2), propagation starts from binary classifier C_1, which determines the classification result Pr(l_1|x) for semantic keyword label l_1; this result Pr(l_1|x) is then appended in binary form to the topic vector of the test image, and so on, so that each subsequent binary classifier C_j determines the classification result Pr(l_j|x, l_1, l_2, …, l_{j-1}), where x is the topic vector; here j = 1, 2, …, |L| and |L| is the number of semantic keywords.
4. The image semantic annotation method based on a hybrid generative and discriminative learning model according to any one of claims 1 to 3, characterized in that steps (1.1) and (2.1) further comprise a visual feature extraction process applied to the training images and test images:
First, each image is divided into m × n regular blocks;
Then, for each block, an (a+b)-dimensional feature vector is extracted, comprising an a-dimensional color feature and a b-dimensional texture feature; the color feature is a color autocorrelogram computed over quantized colors and city-block distances, and the texture feature consists of Gabor energy coefficients computed over several scales and orientations;
Finally, the visual feature of each image is the set of m × n feature vectors of dimension (a+b);
where the parameters m, n, a and b are manually preset values.
5. The image semantic annotation method based on a hybrid generative and discriminative learning model according to claim 4, characterized in that the parameters m and n are both set to 16, the parameter a to 24, and the parameter b to 12; that is, each image is divided into 16 × 16 regular blocks, and each block yields a 36-dimensional feature vector comprising a 24-dimensional color feature and a 12-dimensional texture feature.
6. The image semantic annotation method based on a hybrid generative and discriminative learning model according to claim 1, characterized in that in step (1.1) the number of topics for continuous probabilistic latent semantic analysis is set to 180.
7. The image semantic annotation method based on a hybrid generative and discriminative learning model according to claim 1, characterized in that in step (2.3) the parameter X is set to 5, i.e. the 5 semantic classes of highest confidence are taken as the semantic annotation of the test image.
CN201410295467.3A 2014-06-26 2014-06-26 Method for semantically annotating images on basis of hybrid generative and discriminative learning models Pending CN104036021A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410295467.3A CN104036021A (en) 2014-06-26 2014-06-26 Method for semantically annotating images on basis of hybrid generative and discriminative learning models

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410295467.3A CN104036021A (en) 2014-06-26 2014-06-26 Method for semantically annotating images on basis of hybrid generative and discriminative learning models

Publications (1)

Publication Number Publication Date
CN104036021A true CN104036021A (en) 2014-09-10

Family

ID=51466791

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410295467.3A Pending CN104036021A (en) 2014-06-26 2014-06-26 Method for semantically annotating images on basis of hybrid generative and discriminative learning models

Country Status (1)

Country Link
CN (1) CN104036021A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102542067A (en) * 2012-01-06 2012-07-04 Shanghai Jiao Tong University Automatic image semantic annotation method based on scale learning and correlated label propagation
CN103336969A (en) * 2013-05-31 2013-10-02 Institute of Automation, Chinese Academy of Sciences Image semantic parsing method based on soft glance learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHIXIN LI et al.: "Learning semantic concepts from image database with hybrid generative/discriminative approach", Engineering Applications of Artificial Intelligence *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107851174A (en) * 2015-07-08 2018-03-27 Beijing SenseTime Technology Development Co., Ltd. Apparatus and method for image semantic annotation
CN107851174B (en) * 2015-07-08 2021-06-01 Beijing SenseTime Technology Development Co., Ltd. Image semantic annotation equipment and method, and generation method and system of image semantic annotation model
CN105760365A (en) * 2016-03-14 2016-07-13 Yunnan University Probability latent parameter estimation model of image semantic data based on Bayesian algorithm
WO2017166137A1 (en) * 2016-03-30 2017-10-05 Institute of Automation, Chinese Academy of Sciences Method for multi-task deep learning-based aesthetic quality assessment on natural image
US10685434B2 (en) 2016-03-30 2020-06-16 Institute Of Automation, Chinese Academy Of Sciences Method for assessing aesthetic quality of natural image based on multi-task deep learning
US10908616B2 (en) 2017-05-05 2021-02-02 Hrl Laboratories, Llc Attribute aware zero shot machine vision system via joint sparse representations
WO2019055114A1 (en) * 2017-09-12 2019-03-21 Hrl Laboratories, Llc Attribute aware zero shot machine vision system via joint sparse representations
CN107644235A (en) * 2017-10-24 2018-01-30 Guangxi Normal University Automatic image annotation method based on semi-supervised learning

Similar Documents

Publication Publication Date Title
Yu et al. Hierarchical deep click feature prediction for fine-grained image recognition
Sener et al. Learning transferrable representations for unsupervised domain adaptation
US9190026B2 (en) Systems and methods for feature fusion
Cao et al. Spatially coherent latent topic model for concurrent segmentation and classification of objects and scenes
Fang et al. Unbiased metric learning: On the utilization of multiple datasets and web images for softening bias
Cao et al. Spatially coherent latent topic model for concurrent object segmentation and classification
Grauman et al. Learning a tree of metrics with disjoint visual features
Moon et al. Multimodal transfer deep learning with applications in audio-visual recognition
CN104036021A (en) Method for semantically annotating images on basis of hybrid generative and discriminative learning models
CN105808752B Automatic image annotation method based on CCA and 2PKNN
CN105205096A Cross-modal data retrieval method for text and image modalities
CN106202256A Web image search method based on semantic propagation and hybrid multi-instance learning
Lim et al. Context by region ancestry
CN105184298A (en) Image classification method through fast and locality-constrained low-rank coding process
US20150131899A1 (en) Devices, systems, and methods for learning a discriminant image representation
Abdul-Rashid et al. Shrec’18 track: 2d image-based 3d scene retrieval
CN104281572A (en) Target matching method and system based on mutual information
Fidler et al. A coarse-to-fine taxonomy of constellations for fast multi-class object detection
Wang et al. Improved object categorization and detection using comparative object similarity
Xu et al. Transductive visual-semantic embedding for zero-shot learning
Zhou et al. Classify multi-label images via improved CNN model with adversarial network
CN103942214B Natural image classification method and device based on multi-modal matrix completion
Chen et al. RRGCCAN: Re-ranking via graph convolution channel attention network for person re-identification
Ye et al. Practice makes perfect: An adaptive active learning framework for image classification
CN105117735A (en) Image detection method in big data environment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20140910
