CN104036021A - Method for semantically annotating images on basis of hybrid generative and discriminative learning models - Google Patents
- Publication number
- CN104036021A CN104036021A CN201410295467.3A CN201410295467A CN104036021A CN 104036021 A CN104036021 A CN 104036021A CN 201410295467 A CN201410295467 A CN 201410295467A CN 104036021 A CN104036021 A CN 104036021A
- Authority
- CN
- China
- Prior art keywords
- semantic
- image
- value
- vector
- test image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/5866—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2132—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on discrimination criteria, e.g. discriminant analysis
Abstract
The invention discloses a method for semantically annotating images based on hybrid generative and discriminative learning models. In the generative learning stage, the method models the images generatively with continuous PLSA (probabilistic latent semantic analysis), obtaining the model parameters and the topic distribution of each image, which serves as the intermediate representation vector of the image. In the discriminative learning stage, it constructs an ensemble of classifier chains that learns discriminatively from these intermediate representation vectors, integrating contextual information among the annotation keywords as the chains are built. In the annotation stage, the visual features of a given unknown image are extracted automatically, its topic vector representation is obtained with the parameter estimation algorithm of continuous PLSA, the topic vector is classified with the trained ensemble of classifier chains, and the image is annotated with the several semantic keywords of highest confidence. The annotation and retrieval performance of the method is superior to that of most current typical automatic image annotation methods.
Description
Technical field
The present invention relates to the field of image retrieval, and specifically to a method for semantic image annotation based on hybrid generative and discriminative learning models.
Background technology
According to the machine learning methods they employ, existing automatic image annotation methods fall broadly into two categories: methods based on generative models and methods based on discriminative models.
Methods based on generative models first learn the joint probability of image features and keywords, then use Bayes' rule to compute the posterior probability of each keyword given the image features, and annotate the image according to these posteriors. Such methods have a scalable training process and place low demands on the quality of the manual annotations of the training set.
Methods based on discriminative models assume that the mapping from image features to keywords is some parameterized function, learn the parameters of this function directly from the training data, and obtain a classifier for each semantic concept. These methods treat each semantic concept as an independent class; they generally achieve higher annotation precision, but cannot easily exploit domain-specific prior knowledge.
The probabilistic graphical models of the two method families are shown in Fig. 1. Comparing them reveals three main differences: (1) discriminative methods treat images as training data and semantic concepts as classes, aiming to sort images into the semantic classes, whereas generative methods treat both images and text as training data, aiming to learn the association between images and text; (2) discriminative methods train one classifier per semantic concept, whereas generative methods learn a single correlation model and apply it to all semantic concepts; (3) the independence assumptions differ: discriminative methods assume the semantic classes are mutually independent, whereas generative methods assume that visual elements and text elements are conditionally independent given the latent variables.
In summary, generative and discriminative models each have their own advantages and shortcomings.
Summary of the invention
To address the "semantic gap" in image retrieval and the respective shortcomings of generative and discriminative models, the present invention provides a method for semantic image annotation based on hybrid generative and discriminative learning models. On the basis of continuous probabilistic latent semantic analysis and multi-label learning, it proposes the hybrid generative/discriminative automatic image annotation model HGDM (hybrid generative/discriminative model), and further realizes keyword-based semantic image retrieval.
To solve the above problems, the present invention is achieved by the following technical solution:
The method for semantic image annotation based on hybrid generative and discriminative learning models comprises the following steps:
(1) Training on the training images:
(1.1) Model the visual features of the training images with continuous probabilistic latent semantic analysis (PLSA), obtaining the Gaussian distribution parameters μ_k and Σ_k of each topic z_k, together with the topic vector P(z_k | d_i) of each training image;
(1.2) Using the topic vector P(z_k | d_i) of each training image and its original semantic annotations, construct classifier chains with a multi-label learning method;
(2) Annotating a test image:
(2.1) Using the Gaussian distribution parameters μ_k and Σ_k obtained in step (1.1) and the visual features of the test image, compute the topic vector P(z_k | d_new) of each test image with the expectation maximization (EM) method;
(2.2) Using the classifier chains obtained in step (1.2), classify the topic vector P(z_k | d_new) to obtain the semantic classes of the test image;
(2.3) Take the X semantic classes with the highest confidence as the semantic annotation of the test image, where the parameter X is preset manually.
Step (1.2) constructs the classifier chains. The training process of a classifier chain is as follows: following a specified label order, each iteration learns the binary classifier associated with one semantic keyword label, and each iteration appends the label information of the binary classifiers already learned, thereby constructing a chain of binary classifiers. Each binary classifier C_j in the chain is responsible for learning and predicting the semantic keyword label l_j, for j = 1, 2, ..., |L|, where |L| is the number of semantic keywords.
Step (2.2) is the semantic classification process. Classification with a classifier chain proceeds as follows: starting from binary classifier C_1, predictions are propagated along the chain of binary classifiers built in the training process. Classifier C_1 determines the classification result Pr(l_1 | x) for label l_1; this result Pr(l_1 | x) is appended in binary form to the topic vector of the test image, and so on, so that each subsequent binary classifier C_j determines the classification result Pr(l_j | x, l_1, l_2, ..., l_(j-1)) for label l_j, where x is the topic vector. Here j = 1, 2, ..., |L|, and |L| is the number of semantic keywords.
Steps (1.1) and (2.1) further include a process of extracting visual features from the training and test images:
First, each image is divided into m × n regular squares;
Then, an (a+b)-dimensional feature vector is extracted for each square, comprising an a-dimensional color feature and a b-dimensional texture feature; the color feature is a color autocorrelogram computed over quantized colors and city-block distances, and the texture feature consists of Gabor energy coefficients computed over several scales and orientations;
Finally, the visual feature of each image is the set of its m × n feature vectors of dimension (a+b);
The parameters m, n, a and b are preset manually.
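As an illustration only, the grid feature extraction above can be sketched as follows. The color autocorrelogram and Gabor energies are replaced by deliberately simple stand-ins (grey-level histograms and orientation-binned gradient energies), so the function and its defaults are assumptions, not the patent's implementation:

```python
import numpy as np

def extract_grid_features(image, m=4, n=4, a=6, b=6):
    """Split an image into m x n regular squares and compute an
    (a+b)-dim feature vector per square: 'a' color dims and 'b'
    texture dims. Stand-ins only: grey-level histogram for color,
    orientation-binned gradient energy for texture."""
    h, w, _ = image.shape
    bh, bw = h // m, w // n
    features = []
    for i in range(m):
        for j in range(n):
            block = image[i*bh:(i+1)*bh, j*bw:(j+1)*bw]
            grey = block.mean(axis=2)
            # color stand-in: a-bin grey-level histogram
            color, _ = np.histogram(grey, bins=a, range=(0, 255), density=True)
            # texture stand-in: gradient energy binned into b orientations
            gy, gx = np.gradient(grey)
            angle = np.arctan2(gy, gx)
            mag = np.hypot(gx, gy)
            texture = np.array(
                [mag[(angle >= lo) & (angle < lo + 2*np.pi/b)].sum()
                 for lo in np.linspace(-np.pi, np.pi, b, endpoint=False)])
            texture /= texture.sum() + 1e-12
            features.append(np.concatenate([color, texture]))
    return np.array(features)          # shape: (m*n, a+b), the "feature bag"
```

Each image is thus reduced to an (m·n) × (a+b) matrix of region features, the "bag of features" handed to the topic model.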
Compared with the prior art, the present invention integrates generative and discriminative models in the learning process: a generative model learns the visual features of the input images, and a discriminative model handles the semantic learning process. This yields the following features:
(1) In the generative learning stage, continuous PLSA models the image visual features directly, with no feature quantization step, so no important visual information is lost.
(2) Continuous PLSA transforms each image from a set of features into a K-dimensional topic vector, which can also be viewed as dimensionality reduction. This topic vector representation also integrates the latent semantic information of the visual content, which is significant for semantic image retrieval.
(3) The classifier is built with a multi-label learning method, so the associations among annotation keywords are integrated when images are classified. This handles the weak-labeling problem well and scales with the size of the training set.
(4) A discriminative ensemble of classifier chains performs the semantic classification, with each binary classifier built on a support vector machine (SVM), so both runtime efficiency and classification precision are high.
Brief description of the drawings
Fig. 1 shows the probabilistic graphical model representations of the two classes of automatic image annotation methods: (a) methods based on discriminative models; (b) methods based on generative models.
Fig. 2 shows the automatic image annotation framework of the hybrid generative and discriminative model.
Embodiment
An automatic image annotation method hybridizing generative and discriminative models. In the generative learning stage, continuous PLSA models the images generatively, making full use of the prior knowledge in the training set and yielding the model parameters and the topic distribution of each image. With this topic distribution as the intermediate representation vector of each image, the automatic image annotation problem is converted into a classification problem amenable to multi-label learning, so that annotation precision higher than that of a generative model alone can be obtained. In the discriminative learning stage, an ensemble of classifier chains learns discriminatively from the intermediate representation vectors, and the contextual information among annotation keywords is integrated as the chains are built; the associations among image annotations are thus taken into account during classification, yielding higher annotation precision and better retrieval performance. In the annotation stage, for a given unknown image, its topic vector representation is obtained by automatic visual feature extraction and the parameter estimation algorithm of continuous PLSA; the trained ensemble of classifier chains then classifies this topic vector; finally, the semantic keywords with the highest confidence are taken as the semantic annotation of the image.
The learning and annotation framework of the hybrid generative and discriminative model (HGDM) is shown in Fig. 2.
The training process on the training images is divided into two steps. First, the visual features of the training images are modeled with continuous PLSA, obtaining the Gaussian distribution parameters μ_k and Σ_k of each topic z_k and the topic distribution P(z_k | d_i) of each training image; this is a generative learning process. The Gaussian parameters μ_k and Σ_k obtained here are the parameters of continuous PLSA; by the independence assumption of continuous PLSA, these parameters remain valid for images outside the training set, whereas the topic distribution P(z_k | d_i) pertains only to each training image itself and provides no prior information for test images. However, this topic distribution lets each training image be represented as a K-dimensional topic vector (K being the number of latent topics), and the space formed by these vectors is a simplex. Second, a classifier is built from the topic vector representation of each training image together with its original annotations, with each class corresponding to a semantic class of the text vocabulary; this is a discriminative learning process. Since every image is now represented by a single topic vector but corresponds to multiple keyword labels, the situation matches multi-label learning, so a multi-label learning method is adopted to construct the multi-class classifier while also integrating the associations among keywords.
The annotation process for a test image is likewise divided into two steps: (1) first, using the model parameters μ_k and Σ_k obtained in the training stage together with the visual features of the test image, the expectation maximization (EM) algorithm computes the topic vector P(z_k | d_new) of each test image; (2) then, the trained classifier classifies this topic vector, and the semantic classes with the highest confidence are taken as the semantic annotation of the test image.
The visual feature extraction method first divides every image in the data set into regular squares (the square size is fixed at 16 × 16 via the validation set), then extracts a 36-dimensional feature vector for each square, comprising a 24-dimensional color feature and a 12-dimensional texture feature. The color feature is a color autocorrelogram computed over 8 quantized colors and 3 city-block distances, and the texture feature consists of Gabor energy coefficients computed over 3 scales and 4 orientations. Each square is thus represented by a 36-dimensional feature vector, and each image by a "bag of features", i.e. a set of 36-dimensional visual feature vectors, which provides a consistent interface for further modeling with topic models.
In the generative learning stage, the setting of the number of topics of continuous PLSA is important, because it determines the dimension of the intermediate representation of the image: too many topics reduce the efficiency of the system, too few lose image information. Because fitting continuous PLSA is rather time-consuming, five topic numbers (90, 120, 150, 180 and 210) were tried; the experimental results show that system performance is best with 180 topics, so the final number of topics is fixed at 180.
In the discriminative learning stage, HGDM performs multi-label classification with the ensemble-of-classifier-chains method from multi-label learning, with each binary classifier implemented as a support vector machine (SVM). This method takes the correlations among labels into account while keeping an acceptable computational complexity.
Like the binary relevance (BR) method, a classifier chain (CC) comprises |L| binary classifiers, each handling the binary relevance problem of one label. Unlike BR, however, these binary classifiers are linked into a chain, and the feature space of each node includes the class labels of the preceding nodes.
The training process of a classifier chain is shown in Table 1, where a training sample is written (x, S): S is the set of semantic keywords annotating the training image, L is the set of all semantic keywords, the elements of S are represented as a binary vector over the keyword labels l_j (l_1, l_2, ..., l_|L|), and x is the topic vector. Following a specified label order, each iteration of the algorithm learns the binary classifier associated with one label; more importantly, each iteration appends to the feature space the label information of the binary classifiers already learned, so the feature information is continually enriched. The result is a chain of binary classifiers in which each classifier C_j is responsible for learning and predicting label l_j, one binary classifier per semantic keyword, j = 1, 2, ..., |L|.
The classification process of a classifier chain is shown in Table 2. Starting from binary classifier C_1, predictions are propagated along the chain: C_1 determines the classification result Pr(l_1 | x) for label l_1, this result is appended in binary form to the features of the test sample, and each subsequent classifier then determines the result Pr(l_j | x, l_1, l_2, ..., l_(j-1)) for label l_j.
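The training (Table 1) and classification (Table 2) procedures above can be sketched as follows. The nearest-centroid base classifier is a deliberately minimal stand-in for the SVM that HGDM uses at each chain node, and all names here are illustrative:

```python
import numpy as np

class CentroidBinary:
    """Minimal base binary classifier (nearest class centroid);
    HGDM uses an SVM at each chain node instead."""
    def fit(self, X, y):
        self.c0 = X[y == 0].mean(axis=0)
        self.c1 = X[y == 1].mean(axis=0)
        return self

    def predict(self, X):
        d0 = ((X - self.c0) ** 2).sum(axis=1)
        d1 = ((X - self.c1) ** 2).sum(axis=1)
        return (d1 < d0).astype(float)

def train_chain(X, Y, order):
    """Table 1: in the given label order, learn one binary classifier
    per label, appending each true-label column to the feature space
    before the next classifier is trained."""
    chain, Xa = [], X
    for j in order:
        chain.append((j, CentroidBinary().fit(Xa, Y[:, j])))
        Xa = np.hstack([Xa, Y[:, j:j + 1]])
    return chain

def predict_chain(chain, X):
    """Table 2: starting from C_1, propagate each binary prediction by
    appending it to the features seen by the next classifier."""
    Xa, out = X, np.zeros((X.shape[0], len(chain)))
    for j, clf in chain:
        y = clf.predict(Xa)
        out[:, j] = y
        Xa = np.hstack([Xa, y[:, None]])
    return out
```

Because each node sees the labels predicted earlier in the chain, correlated keywords reinforce each other, which is exactly the label-dependence information BR discards.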
Use the method for chain can between sorter, transmit label information, consider the related information between mark simultaneously, thereby can overcome the mark question of independence in BR method.And classifier chains still keeps the advantage of BR method, comprise that storage demand is low and operational efficiency is high.
Although on average each instance gains |L|/2 extra feature dimensions, |L| is usually a bounded value in practice, so the resulting extra computational cost is almost negligible. The computational complexity of a classifier chain is very close to that of BR, depending on the number of labels and the complexity of the base binary classifier. The complexity of BR is O(|L| × f(|X|, |D|)), where f(|X|, |D|) is the complexity of the base binary classifier; the complexity of a classifier chain is O(|L| × f(|X| + |L|, |D|)), i.e. with |L| additional feature dimensions. Since HGDM uses SVMs as base binary classifiers, the complexity of the chain reduces to O(|L| × |X| × |D| + |L| × |L| × |D|). As long as |L| < |X|, the first term dominates, so the complexity of the chain is O(|L| × |X| × |D|), identical to that of BR; only when |L| > |X| does the chain become more expensive than BR.
In addition, although the chaining process means that a classifier chain cannot be parallelized, it can be serialized: at any time only one binary classifier needs to be kept in memory, which is an obvious advantage over competing methods.
The order of a classifier chain clearly affects its precision. Although heuristic algorithms exist for choosing the chain order, we instead adopt an ensemble framework to address this problem. The ensemble approach improves overall precision, avoids overfitting, and also allows parallelization. "Ensemble" here means an ensemble of multi-label methods, namely an ensemble of classifier chains.
The ensemble of classifier chains trains M chains C_1, C_2, ..., C_M, each with a random chain order on a random subset of the training set. Every model C_k is therefore different and yields a different multi-label classification result. These results are summed per label, so each label receives some number of votes; selecting the labels whose vote count exceeds a threshold yields a multi-label set, which is taken as the final classification result.
Let the prediction of the k-th individual model be the vector y_k = (l_1, l_2, ..., l_|L|) ∈ {0,1}^|L|. Summing over all models gives the vector W = (λ_1, λ_2, ..., λ_|L|) ∈ R^|L|, where each λ_j is the sum of the j-th components of the y_k; each λ_j thus represents the number of votes received by label l_j ∈ L. Normalizing W to W_norm gives each label a confidence in [0, 1]; after thresholding, the labels can be ranked according to this distribution. As in other automatic image annotation models, HGDM takes the 5 keyword labels with the highest confidence as the semantic annotation of an image.
On the Corel5k image database, the method builds an ensemble of 90 classifier chains, each trained on a randomly chosen subset of 500 images; in the experiments on the IAPR-TC12 and MIRFLICKR25000 data sets, an ensemble of 150 classifier chains is used, each trained on a randomly chosen subset of 1000 images. In addition, the binary classifier at each node of a chain is implemented with the LIBSVM package, using the RBF kernel K(x, x') = exp(−γ||x − x'||²); the corresponding parameters are determined by grid search as (C, γ) = (2⁷, 2¹), where C is the error penalty factor and γ is the kernel parameter.
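As a rough sketch of such a grid search over (C, γ), with an RBF kernel ridge classifier standing in for the LIBSVM SVM (the solver, not the search procedure, is the assumption here):

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """K(x, x') = exp(-gamma * ||x - x'||^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_krr(X, y, C, gamma):
    """Kernel ridge stand-in for a binary SVM with labels in {-1, +1};
    larger C means weaker regularization, loosely like SVM's C."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + np.eye(len(X)) / C, y)

def grid_search(X, y, Xv, yv, Cs, gammas):
    """Pick the (C, gamma) pair maximizing validation accuracy,
    mirroring the grid search that settles on (2**7, 2**1)."""
    best, best_acc = None, -1.0
    for C in Cs:
        for g in gammas:
            alpha = fit_krr(X, y, C, g)
            pred = np.sign(rbf_kernel(Xv, X, g) @ alpha)
            acc = (pred == yv).mean()
            if acc > best_acc:
                best, best_acc = (C, g), acc
    return best, best_acc
```

In practice one would search a logarithmic grid such as C ∈ {2⁻⁵, ..., 2¹⁵} and γ ∈ {2⁻¹⁵, ..., 2³}, holding out a validation split as the patent does.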
Through careful design of the learning framework, the proposed method for semantic image annotation based on hybrid generative and discriminative learning models effectively combines generative and discriminative learning and inherits their respective advantages, achieving better performance. The experimental results show that the method both makes full use of the training data, like a generative model, and attains high classification precision, like a discriminative model; its annotation and retrieval performance surpasses that of most current typical automatic image annotation methods.
Claims (7)
1. A method for semantic image annotation based on hybrid generative and discriminative learning models, characterized by comprising the steps of:
(1) training on the training images:
(1.1) modeling the visual features of the training images with continuous probabilistic latent semantic analysis, obtaining the Gaussian distribution parameters μ_k and Σ_k of each topic z_k and the topic vector P(z_k | d_i) of each training image;
(1.2) constructing classifier chains with a multi-label learning method from the topic vector P(z_k | d_i) of each training image and its original semantic annotations;
(2) annotating a test image:
(2.1) computing the topic vector P(z_k | d_new) of each test image with the expectation maximization (EM) method, using the Gaussian distribution parameters μ_k and Σ_k obtained in step (1.1) and the visual features of the test image;
(2.2) classifying the topic vector P(z_k | d_new) with the classifier chains obtained in step (1.2) to obtain the semantic classes of the test image;
(2.3) taking the X semantic classes with the highest confidence as the semantic annotation of the test image, where the parameter X is preset manually.
2. The method for semantic image annotation based on hybrid generative and discriminative learning models according to claim 1, characterized in that step (1.2) is specifically: following a specified label order, each iteration learns the binary classifier associated with one semantic keyword label, and each iteration appends the label information of the binary classifiers already learned, thereby constructing a chain of binary classifiers; each binary classifier C_j in the chain is responsible for learning and predicting the semantic keyword label l_j, j = 1, 2, ..., |L|, where |L| is the number of semantic keywords.
3. The method for semantic image annotation based on hybrid generative and discriminative learning models according to claim 2, characterized in that step (2.2) is specifically: starting from binary classifier C_1 of the chain of binary classifiers constructed in step (1.2), predictions are propagated along the chain; classifier C_1 determines the classification result Pr(l_1 | x) for the semantic keyword label l_1; this result Pr(l_1 | x) is appended in binary form to the topic vector of the test image, and so on, so that each subsequent binary classifier C_j determines the classification result Pr(l_j | x, l_1, l_2, ..., l_(j-1)) for label l_j, where x is the topic vector; j = 1, 2, ..., |L|, and |L| is the number of semantic keywords.
4. The method for semantic image annotation based on hybrid generative and discriminative learning models according to any one of claims 1 to 3, characterized in that steps (1.1) and (2.1) further comprise a process of extracting visual features from the training and test images:
first, each image is divided into m × n regular squares;
then, an (a+b)-dimensional feature vector is extracted for each square, comprising an a-dimensional color feature and a b-dimensional texture feature, wherein the color feature is a color autocorrelogram computed over quantized colors and city-block distances, and the texture feature consists of Gabor energy coefficients computed over several scales and orientations;
finally, the visual feature of each image is the set of its m × n feature vectors of dimension (a+b);
the parameters m, n, a and b are preset manually.
5. The method for semantic image annotation based on hybrid generative and discriminative learning models according to claim 4, characterized in that the parameters m and n are both set to 16, parameter a is set to 24, and parameter b is set to 12; that is, each image is divided into 16 × 16 regular squares, and a 36-dimensional feature vector, comprising a 24-dimensional color feature and a 12-dimensional texture feature, is extracted for each square.
6. The method for semantic image annotation based on hybrid generative and discriminative learning models according to claim 1, characterized in that in step (1.1) the number of topics of the continuous probabilistic latent semantic analysis is set to 180.
7. The method for semantic image annotation based on hybrid generative and discriminative learning models according to claim 1, characterized in that in step (2.3) the parameter X is set to 5, i.e. the 5 semantic classes with the highest confidence are taken as the semantic annotation of the test image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410295467.3A CN104036021A (en) | 2014-06-26 | 2014-06-26 | Method for semantically annotating images on basis of hybrid generative and discriminative learning models |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104036021A true CN104036021A (en) | 2014-09-10 |
Family
ID=51466791
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410295467.3A Pending CN104036021A (en) | 2014-06-26 | 2014-06-26 | Method for semantically annotating images on basis of hybrid generative and discriminative learning models |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104036021A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105760365A (en) * | 2016-03-14 | 2016-07-13 | 云南大学 | Probability latent parameter estimation model of image semantic data based on Bayesian algorithm |
WO2017166137A1 (en) * | 2016-03-30 | 2017-10-05 | 中国科学院自动化研究所 | Method for multi-task deep learning-based aesthetic quality assessment on natural image |
CN107644235A (en) * | 2017-10-24 | 2018-01-30 | 广西师范大学 | Image automatic annotation method based on semi-supervised learning |
CN107851174A (en) * | 2015-07-08 | 2018-03-27 | 北京市商汤科技开发有限公司 | The apparatus and method of linguistic indexing of pictures |
WO2019055114A1 (en) * | 2017-09-12 | 2019-03-21 | Hrl Laboratories, Llc | Attribute aware zero shot machine vision system via joint sparse representations |
US10908616B2 (en) | 2017-05-05 | 2021-02-02 | Hrl Laboratories, Llc | Attribute aware zero shot machine vision system via joint sparse representations |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102542067A (en) * | 2012-01-06 | 2012-07-04 | 上海交通大学 | Automatic image semantic annotation method based on scale learning and correlated label dissemination |
CN103336969A (en) * | 2013-05-31 | 2013-10-02 | 中国科学院自动化研究所 | Image meaning parsing method based on soft glance learning |
- 2014-06-26: application CN201410295467.3A filed in China; published as CN104036021A, status pending
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102542067A (en) * | 2012-01-06 | 2012-07-04 | 上海交通大学 | Automatic image semantic annotation method based on scale learning and correlated label dissemination |
CN103336969A (en) * | 2013-05-31 | 2013-10-02 | Institute of Automation, Chinese Academy of Sciences | Image semantic parsing method based on soft glance learning |
Non-Patent Citations (1)
Title |
---|
ZHIXIN LI et al.: "Learning semantic concepts from image database with hybrid generative/discriminative approach", Engineering Applications of Artificial Intelligence * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107851174A (en) * | 2015-07-08 | 2018-03-27 | Beijing SenseTime Technology Development Co., Ltd. | Apparatus and method for semantic annotation of images |
CN107851174B (en) * | 2015-07-08 | 2021-06-01 | Beijing SenseTime Technology Development Co., Ltd. | Image semantic annotation device and method, and generation method and system of image semantic annotation model |
CN105760365A (en) * | 2016-03-14 | 2016-07-13 | Yunnan University | Probabilistic latent parameter estimation model for image semantic data based on Bayesian algorithm |
WO2017166137A1 (en) * | 2016-03-30 | 2017-10-05 | Institute of Automation, Chinese Academy of Sciences | Method for multi-task deep learning-based aesthetic quality assessment of natural images |
US10685434B2 (en) | 2016-03-30 | 2020-06-16 | Institute Of Automation, Chinese Academy Of Sciences | Method for assessing aesthetic quality of natural image based on multi-task deep learning |
US10908616B2 (en) | 2017-05-05 | 2021-02-02 | Hrl Laboratories, Llc | Attribute aware zero shot machine vision system via joint sparse representations |
WO2019055114A1 (en) * | 2017-09-12 | 2019-03-21 | Hrl Laboratories, Llc | Attribute aware zero shot machine vision system via joint sparse representations |
CN107644235A (en) * | 2017-10-24 | 2018-01-30 | Guangxi Normal University | Automatic image annotation method based on semi-supervised learning |
Similar Documents
Publication | Title |
---|---|
Yu et al. | Hierarchical deep click feature prediction for fine-grained image recognition |
Sener et al. | Learning transferrable representations for unsupervised domain adaptation |
US9190026B2 (en) | Systems and methods for feature fusion |
Cao et al. | Spatially coherent latent topic model for concurrent segmentation and classification of objects and scenes |
Fang et al. | Unbiased metric learning: On the utilization of multiple datasets and web images for softening bias |
Cao et al. | Spatially coherent latent topic model for concurrent object segmentation and classification |
Grauman et al. | Learning a tree of metrics with disjoint visual features |
Moon et al. | Multimodal transfer deep learning with applications in audio-visual recognition |
CN104036021A (en) | Method for semantically annotating images on basis of hybrid generative and discriminative learning models |
CN105808752B (en) | Automatic image annotation method based on CCA and 2PKNN |
CN105205096A (en) | Cross-modal data retrieval method for text and image modalities |
CN106202256A (en) | Web image search method based on semantic propagation and hybrid multi-instance learning |
Lim et al. | Context by region ancestry |
CN105184298A (en) | Image classification method based on fast locality-constrained low-rank coding |
US20150131899A1 (en) | Devices, systems, and methods for learning a discriminant image representation |
Abdul-Rashid et al. | SHREC'18 track: 2D image-based 3D scene retrieval |
CN104281572A (en) | Target matching method and system based on mutual information |
Fidler et al. | A coarse-to-fine taxonomy of constellations for fast multi-class object detection |
Wang et al. | Improved object categorization and detection using comparative object similarity |
Xu et al. | Transductive visual-semantic embedding for zero-shot learning |
Zhou et al. | Classify multi-label images via improved CNN model with adversarial network |
CN103942214B (en) | Natural image classification method and device based on multi-modal matrix completion |
Chen et al. | RRGCCAN: Re-ranking via graph convolution channel attention network for person re-identification |
Ye et al. | Practice makes perfect: An adaptive active learning framework for image classification |
CN105117735A (en) | Image detection method in big data environment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20140910 |