CN108984726A - Method for annotating images with titles based on an extended sLDA model - Google Patents

Method for annotating images with titles based on an extended sLDA model

Info

Publication number
CN108984726A
CN108984726A
Authority
CN
China
Prior art keywords
parameter
image
slda
formula
model
Prior art date
Legal status
Granted
Application number
CN201810759844.2A
Other languages
Chinese (zh)
Other versions
CN108984726B (en)
Inventor
秦丹阳
冯攀
纪萍
马静雅
张岩
杨松祥
Current Assignee
Shenzhen Litong Information Technology Co ltd
Original Assignee
Heilongjiang University
Priority date
Filing date
Publication date
Application filed by Heilongjiang University
Priority to CN201810759844.2A
Publication of CN108984726A
Application granted
Publication of CN108984726B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions, with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a method for annotating images with titles based on an extended sLDA model. It is proposed to overcome the shortcomings of existing image annotation methods, which run into scalability problems, can only handle a small annotation vocabulary, and lack generality and ease of use. The method comprises: for an input image, extracting the local features of the image and obtaining the N visual words of the image using the K-means algorithm; expressing the posterior distribution of the hidden variables of a given document using the LDA model; introducing a response variable and defining the response variable's distribution as a multivariate Bernoulli distribution; approximating the formula using the convexity-based variational inference algorithm for LDA; finding the values of the variational parameters; estimating the model parameters; and predicting the distribution of the response variable. The present invention is suitable for image title annotation systems.

Description

Method for annotating images with titles based on an extended sLDA model
Technical field
The present invention relates to the field of image annotation methods, and in particular to a method for annotating images with titles based on an extended sLDA model.
Background technique
Over the past few decades, the problems of image and video retrieval have been at the forefront of computer vision research. Even so, with the huge number of pictures and videos now available online, the demand for efficient algorithms to search and navigate large-scale collections continues to grow. Current state-of-the-art image search engines rely heavily on annotated text or titles to identify and retrieve images. Although this approach allows high-level semantic queries, the title information that is vital to the success of text-based search techniques is usually obtained manually, and this manual process cannot scale with the ever-growing size of today's multimedia corpora. It is therefore necessary to automate the annotation process. Because of its potential impact on a wide range of applications involving digital media archives, attention to designing and developing automated tools for annotating images and videos has grown steadily in recent years.
In the absence of titles, the task of an annotation algorithm is to predict the missing title by learning the patterns of association between images and text. Past work in this field can be roughly divided into two groups. In the first group, the image annotation problem is cast as a supervised learning problem in which annotations are treated as concept classes. For each word in the vocabulary, a class-conditional density is learned from labeled images. During annotation, the posterior distribution of the class labels is computed, and the concepts with the highest probability are used as the predicted title. In practice, this approach runs into scalability problems and can only handle a small annotation vocabulary, because a class-conditional density must be learned for every word.
The second group treats annotations and image data on a more equal footing, modeling the joint statistical correlation between the two data types. These models use a latent-variable framework: by assuming that each document has a set of hidden factors that govern the association between image features and the corresponding caption words, they learn the joint probability distribution of text and image features.
Summary of the invention
The purpose of the present invention is to overcome the shortcomings of existing image annotation methods, which run into scalability problems, can only handle a small annotation vocabulary, and lack generality and ease of use, and to propose a method for annotating images with titles based on an extended sLDA model that can handle the multidimensional binary response variable of the annotation data, comprising:
Step 1: For an input image, extract the local features of the image and use the K-means algorithm to obtain the N visual words w_n of the image, where w_n ∈ {1, 2, ..., N}.
Step 2: Use the LDA model to express the posterior distribution of the hidden variables of a given document, where α and β are model parameters, and z and θ are the topic variables and topic proportions, respectively.
Step 3: Introduce the response variable y and the response-variable parameters η and δ into step 2, and define the distribution of the response variable as a multivariate Bernoulli distribution, i.e., expressed as formula (3):
Step 4: Following the convexity-based variational inference algorithm for LDA, approximate formula (5) by the variational distribution q(θ, z | γ, φ) = q(θ | γ) ∏_{n=1..N} q(z_n | φ_n), where the Dirichlet parameter γ and the multinomial parameters (φ_1, φ_2, ..., φ_N) are free variational parameters; z_n is the topic variable of the n-th word; and the difference between the expectations of log p(θ, z, w | α, β, η, δ) and log q(θ, z | γ, φ) under q is denoted L.
Step 5: Find the variational parameters γ and φ that maximize the lower bound of L.
Step 6: Estimate the model parameters ψ = {α, β, η, δ}.
Step 7: Predict the distribution p(y | w) of the response variable y from the model parameters ψ and the variational parameters γ and φ.
Further, step 3 is specifically:
Generate the response variable y using z̄, η, and δ, where z̄ = (1/N) Σ_{n=1..N} z_n is the empirical topic frequency of the document. If the distribution of the response variable y satisfies a generalized linear model:
p(y | z_{1:N}, η, δ) = h(y, δ) exp{((η^⊤z̄)y − A(η^⊤z̄)) / δ}
where A(·) is the log-normalizer, then formula (3) can be expressed as formula (5).
Further, step 4 is specifically:
Approximate formula (5) by the variational distribution q(θ, z | γ, φ) = q(θ | γ) ∏_{n=1..N} q(z_n | φ_n). Bounding the log-likelihood with Jensen's inequality gives formula (8); letting L(γ, φ; α, β) denote the right-hand side of formula (8), formula (8) is expressed as
log p(w | α, β) = L(γ, φ; α, β) + D(q(θ, z | γ, φ) || p(θ, z | w, α, β)) (9)
L is then written as formula (10) by using the factorization of p and q.
Further, step 5 is specifically:
Step 5.1: In formula (13), maximize the lower bound of L with respect to φ_ni, where φ_ni denotes the probability that the n-th visual word is generated by hidden topic i, so that Σ_i φ_ni = 1. Separate the terms containing φ_ni and add the appropriate Lagrange multiplier to form the Lagrangian, in which Ψ(x) is the digamma function.
Compute the derivative with respect to φ_ni, where β_iv denotes p(w_n^v = 1 | z_n^i = 1) for the appropriate v, v being the v-th word of the dictionary.
In the case where the response variable obeys a Bernoulli distribution, this further yields the update formula (16) for the parameter φ_n.
Step 5.2: Maximize the above with respect to γ_i, where γ_i denotes the i-th component of the posterior Dirichlet parameter. Collect the terms containing γ_i, differentiate with respect to γ_i, and set the derivative to zero, which gives
γ_i = α_i + Σ_{n=1..N} φ_ni (19)
Iterate equations (16) through (19) until the bound converges; this yields the variational parameters γ and φ that maximize the lower bound of L.
Further, step 6 is specifically:
Step 6.1: The formula for estimating the parameter β is:
β_ij ∝ Σ_{d=1..M} Σ_{n=1..N_d} φ_dni w_dn^j
Step 6.2: The procedure for estimating the parameter α is: differentiate formula (22) to obtain formula (23), then find the value of α by applying Newton's method to formula (23).
Step 6.3: The procedure for estimating the parameters η and σ², where μ(·) = E_GLM[Y | ·], is: differentiate with respect to σ² and evaluate at the estimate of η.
The parameter estimates are finally obtained by this computation; combining the parameters α_i, β_ij, η_i, and δ_i yields the model parameters ψ = {α, β, η, δ}.
Further, step 7 is specifically:
Take a new document w without a title as input; the task is to infer the most probable caption words. Use φ_n and q(θ) to approximately solve the conditional probability p(y | w); the resulting p(y | w) is used to infer the most probable caption words for the new document w.
The invention has the following beneficial effects:
1. The present invention adjusts the structure of Corr-LDA by deleting the variable x, so that the image topics can be used directly to predict caption topics, without integrating over the posterior probability of the captions (a step that Corr-LDA requires). It also extends sLDA so that the model can handle a multivariate binary response variable, remedying the shortcoming that sLDA can handle only a single response variable. Image annotation is therefore more detailed, and image retrieval correspondingly more convenient and accurate.
2. When the number of topics and the vocabulary size are large, the prediction accuracy of the present invention is clearly higher than that of the Corr-LDA model, by 0.04 on average.
Description of the drawings
Fig. 1 is the graphical model structure of sLDA-bin;
Fig. 2 is the graphical model structure of Corr-LDA;
Fig. 3 shows the error curves between the predicted and observed responses for sLDA-bin and LDA;
Fig. 4 shows the caption prediction curves of Corr-LDA and sLDA-bin when K = 30;
Fig. 5 shows the caption prediction curves of Corr-LDA and sLDA-bin when N = 256;
Fig. 6 shows the caption prediction curves of Corr-LDA and sLDA-bin when N = 512;
Fig. 7 shows the accuracy curves of Corr-LDA and sLDA-bin when annotating a selection of objects.
Specific embodiments
The method of the present invention for annotating images with titles based on an extended sLDA model, referred to for short as sLDA-bin, comprises:
Step 1: For an input image, extract the local features of the image and use the K-means algorithm to obtain the N visual words w_n of the image, where w_n ∈ {1, 2, ..., N}.
Step 2: Use the LDA model to express the posterior distribution of the hidden variables of a given document:
p(θ, z | w, α, β) = p(θ, z, w | α, β) / p(w | α, β)
where α and β are model parameters, and z and θ are the topic variables and topic proportions, respectively.
Step 3: Introduce the response variable y and the response-variable parameters η and δ into step 2, and define the distribution of the response variable as a multivariate Bernoulli distribution, i.e., expressed as formula (3):
Step 4: Following the convexity-based variational inference algorithm for LDA, approximate formula (5) by the variational distribution q(θ, z | γ, φ) = q(θ | γ) ∏_{n=1..N} q(z_n | φ_n), where the Dirichlet parameter γ and the multinomial parameters (φ_1, φ_2, ..., φ_N) are free variational parameters; z_n is the topic variable of the n-th word. Denote by L the difference between the expectations of log p(θ, z, w | α, β, η, δ) and log q(θ, z | γ, φ) under q, and determine the expression for the lower bound of L.
Step 5: Find the variational parameters γ and φ that maximize the lower bound of L.
Step 6: Estimate the model parameters ψ = {α, β, η, δ}.
Step 7: Predict the distribution p(y | w) of the response variable y from the model parameters ψ and the variational parameters γ and φ.
The principle and process of this embodiment are described in detail below. Note that a variable with subscript n has the same meaning as the same variable without it; the subscript merely emphasizes that the variable is the one corresponding to the n-th of the N words. For example, z and z_n have the same meaning; z_n emphasizes that it is the topic variable of the n-th word, while z does not emphasize this, but they mean the same thing.
Step 1: Data representation. Extract N local features of the image, then cluster the N features with K-means. Given k initial mean points m_1, ..., m_k, alternate between the following two steps:
Assignment: Assign each observation point to a cluster so that the within-group sum of squares is minimized. Since this sum of squares is just the squared Euclidean distance, this intuitively assigns each observation to its nearest mean point:
S_i = {x_p : ||x_p − m_i||² ≤ ||x_p − m_j||² for all j}
where each x_p is assigned to exactly one cluster S_i, even though in theory it could be assigned to two or more.
Update: For each cluster obtained in the previous step, take the centroid of the observations in the cluster as the new mean point:
m_i = (1/|S_i|) Σ_{x_p ∈ S_i} x_p
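For illustration, a minimal sketch of this step follows (a sketch only: the method does not fix a particular local-feature extractor, and the use of scikit-learn's KMeans here is an assumption for concreteness):

import numpy as np
from sklearn.cluster import KMeans

# Fit the visual dictionary on descriptors pooled from the training images.
def fit_codebook(descriptor_pool, n_words=256, seed=0):
    # KMeans.fit alternates exactly the assignment and update steps above.
    return KMeans(n_clusters=n_words, n_init=10, random_state=seed).fit(descriptor_pool)

# Quantize one image's local descriptors into its visual-word sequence w_1..w_N.
def to_visual_words(codebook, image_descriptors):
    return codebook.predict(image_descriptors)  # nearest-centroid index per descriptor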
After the image has been converted into a document composed of visual words, document annotation is carried out with the LDA model. LDA is a typical bag-of-words model: it treats a document as a set formed by a group of words, with no ordering or sequential relationship between words. A document may contain multiple topics, and each word in the document is generated by one of those topics.
Step 2: In the LDA model, a document is generated as follows:
Sample the topic distribution θ_i of document i from the Dirichlet distribution α; sample the topic z_i,j of the j-th word of document i from the multinomial distribution θ_i; sample the word distribution φ_{z_i,j} of topic z_i,j from the Dirichlet distribution β; and sample the word w_i,j from the multinomial distribution φ_{z_i,j}.
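A minimal sketch of this generative process (NumPy; here K, V, and n_words are illustrative sizes, and the topic-word distributions are sampled inside the function only to keep the sketch self-contained — in full LDA they are shared across the corpus):

import numpy as np

rng = np.random.default_rng(0)

def generate_document(alpha, beta_prior, K, V, n_words):
    # Sample one bag-of-words document from the LDA generative process.
    phi = rng.dirichlet(np.full(V, beta_prior), size=K)   # topic-word distributions
    theta = rng.dirichlet(np.full(K, alpha))              # document's topic proportions
    z = rng.choice(K, size=n_words, p=theta)              # topic of each word
    w = np.array([rng.choice(V, p=phi[zi]) for zi in z])  # word drawn from its topic
    return w, z, theta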
To use LDA, the key inference problem that needs to be solved is computing the posterior distribution of the hidden variables given a document:
p(θ, z | w, α, β) = p(θ, z, w | α, β) / p(w | α, β)
Step 3: Generate the response variable y using z̄, η, and δ, where z̄ = (1/N) Σ_{n=1..N} z_n. This variable could be the number of stars given to a movie, the number of downloads of an online article, or the category of a document. We model the documents and responses jointly, in order to find the latent topics that best predict the response variables of future unlabeled documents.
sLDA accommodates various types of response using the same probability machinery as the generalized linear model: unconstrained real values, real values constrained to be positive (such as failure times), ordered or unordered class labels, non-negative integers (such as count data), and other types. The distribution of the response variable is a generalized linear model (GLM):
p(y | z_{1:N}, η, δ) = h(y, δ) exp{((η^⊤z̄)y − A(η^⊤z̄)) / δ}
The GLM framework adds flexibility: as long as the distribution of the response variable can be written in the form above, it can be modeled. Different distributions correspond to different h(y, δ) and log-normalizers A(·).
Formula (3) then becomes formula (5), the posterior of the hidden variables given both the words and the response, where z̄ = (1/N) Σ_n z_n.
However, the coupling between θ and β over the latent topics makes this quantity intractable to compute directly, so the convexity-based variational inference algorithm for LDA is used.
To handle the multivariate binary response variable of the annotation data, the distribution of the response variable is modeled as a multivariate Bernoulli distribution whose probabilities are defined through the logistic link function, where y_i ∈ {0, 1}, σ(x) = 1/(1 + e^{−x}) is the logistic function, and {η_i, δ_i} are the regression coefficients of caption word i.
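A minimal sketch of this response model under the assumptions just stated (logistic link applied to the empirical topic frequencies z̄; treating δ_i as a per-caption-word intercept is our reading of the regression coefficients {η_i, δ_i}):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def caption_log_likelihood(y, z_bar, eta, delta):
    # log p(y | z_bar) under the multivariate Bernoulli response model.
    # y:     (C,) binary caption-word indicators, y_i in {0, 1}
    # z_bar: (K,) empirical topic frequencies of the document
    # eta:   (C, K) per-caption-word regression coefficients
    # delta: (C,) per-caption-word intercepts (assumed interpretation)
    p = np.clip(sigmoid(eta @ z_bar + delta), 1e-12, 1 - 1e-12)
    return np.sum(y * np.log(p) + (1 - y) * np.log1p(-p))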
Step 4: Variational computation. The basic idea of convexity-based variational inference is to use Jensen's inequality to obtain an adjustable lower bound on the log-likelihood. This family is characterized by the following variational distribution:
q(θ, z | γ, φ) = q(θ | γ) ∏_{n=1..N} q(z_n | φ_n)
where the Dirichlet parameter γ and the multinomial parameters (φ_1, φ_2, ..., φ_N) are all free variational parameters.
After specifying this simplified family of probability distributions, the next step is to set up an optimization problem that determines the values of the variational parameters γ and φ. The requirement of finding the tightest lower bound on the log-likelihood translates into the following optimization problem:
(γ*, φ*) = argmin_{γ,φ} D(q(θ, z | γ, φ) || p(θ, z | w, α, β))
That is, the goal is achieved by minimizing the relative entropy between the variational distribution and the true posterior distribution p(θ, z | w, α, β).
First, bound the log-likelihood of a document using Jensen's inequality. For simplicity, the parameters γ and φ are omitted:
log p(w | α, β) ≥ E_q[log p(θ, z, w | α, β)] − E_q[log q(θ, z)]
Jensen's inequality thus provides a lower bound on the log-likelihood for any variational distribution q(θ, z | γ, φ). The difference between the two sides of the above formula is exactly the relative entropy between the variational distribution and the true posterior. The first term on the right is the expectation of the log joint probability of the hidden and observed variables; the second term is the entropy of the variational distribution, H(q) = −E_q[log q(θ, z)]. Letting L(γ, φ; α, β) denote the right-hand side, the above formula becomes
log p(w | α, β) = L(γ, φ; α, β) + D(q(θ, z | γ, φ) || p(θ, z | w, α, β)) (9)
This shows that maximizing the lower bound L(γ, φ; α, β) over γ and φ is equivalent to minimizing the relative entropy between the variational posterior and the true posterior.
Expand the lower bound by using the factorization of p and q:
L = E_q[log p(θ | α)] + E_q[log p(z | θ)] + E_q[log p(w | z, β)] + E_q[log p(y | z, η, δ)] + H(q)
The above is then expanded into an equation in the model parameters (α, β) and the variational parameters (γ, φ). The first three terms are the expectations under q of the log probabilities of the topic proportions, the topic assignments, and the words; H(q) = −E_q[log q(θ, z)] is the entropy of the variational distribution.
Note that the fourth term of the sLDA variational objective differs from LDA: it is the expected log probability of the response variable given the latent topic assignments.
It can be seen that computing the lower bound involves two expectations. The first is E_q[z̄] = φ̄ := (1/N) Σ_{n=1..N} φ_n; the second is the expected log-normalizer E_q[A(η^⊤z̄)]. The first can be computed directly; the second can be computed exactly in some models but usually requires approximation, in a manner that depends on the type of the response.
Here, the logistic function complicates the computation of the expectation E[log p(y | z, η, δ)]. Using convex duality, the logistic function is expressed as a pointwise supremum of quadratic functions; more specifically, each log-logistic term is bounded below by a quadratic function of η^⊤z̄ with an auxiliary variational parameter.
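For reference, the standard convex-duality bound of Jaakkola and Jordan has exactly this pointwise-supremum form; we reproduce it here as a presumed reconstruction of the omitted formula:

log σ(x) ≥ log σ(ξ) + (x − ξ)/2 − λ(ξ)(x² − ξ²), with λ(ξ) = tanh(ξ/2)/(4ξ)

Equality holds at x = ±ξ, so maximizing over the auxiliary parameter ξ recovers log σ(x) as a supremum; substituting the bound makes E[log p(y | z, η, δ)] quadratic in z̄ and hence tractable under q.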
Step 5: Maximize the lower bound L over the variational parameters γ and φ.
Step 5.1: Maximize with respect to φ_ni, which denotes the probability that the n-th visual word is generated by hidden topic i, so that Σ_i φ_ni = 1. Separate the terms containing φ_ni and add the appropriate Lagrange multiplier to form the Lagrangian:
For simplicity, the arguments of L are dropped, and the subscript φ_ni indicates that only the terms of L that are functions of φ_ni are retained. The derivative with respect to φ_ni is obtained:
where β_iv denotes p(w_n^v = 1 | z_n^i = 1) for the appropriate v, v being the v-th word of the dictionary. As can be seen, the update for φ_ni depends on the expected response term: for some distributions, such as the Gaussian or the Poisson, an exact result can be obtained, while in other cases a gradient-based optimization method is needed.
In the case where the response variable obeys a Bernoulli distribution, the update formula (16) for the parameter φ_n is computed.
Step 5.2: Maximize the above with respect to γ_i, the i-th component of the posterior Dirichlet parameter. Collect the terms containing γ_i, differentiate with respect to γ_i, and set the derivative to zero:
γ_i = α_i + Σ_{n=1..N} φ_ni (19)
Since equation (19) depends on the variational multinomials φ, complete variational inference requires alternating between equations (16) and (19) until the bound converges.
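A minimal sketch of this alternation (plain-LDA updates only: the Bernoulli response correction to φ in equation (16) is omitted because its exact form is not reproduced above; the γ update is equation (19)):

import numpy as np
from scipy.special import digamma

def variational_e_step(word_ids, alpha, beta, n_iter=100, tol=1e-5):
    # Coordinate ascent on (gamma, phi) for one document.
    # word_ids: length-N array of visual-word indices w_1..w_N
    # alpha:    (K,) Dirichlet parameter
    # beta:     (K, V) topic-word probabilities
    K, N = alpha.shape[0], len(word_ids)
    phi = np.full((N, K), 1.0 / K)         # q(z_n = i) = phi[n, i]
    gamma = alpha + N / K                  # standard initialization
    for _ in range(n_iter):
        old_gamma = gamma.copy()
        # phi_ni proportional to beta_{i, w_n} * exp(digamma(gamma_i))
        log_phi = np.log(beta[:, word_ids].T + 1e-100) + digamma(gamma)
        phi = np.exp(log_phi - log_phi.max(axis=1, keepdims=True))
        phi /= phi.sum(axis=1, keepdims=True)
        gamma = alpha + phi.sum(axis=0)    # equation (19)
        if np.abs(gamma - old_gamma).mean() < tol:
            break
    return gamma, phi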
Step 6: Parameter estimation.
Step 6.1: Maximize the lower bound with respect to the conditional multinomial parameter β. Extract the terms containing β and add a Lagrange multiplier; differentiating with respect to β and setting the derivative to zero finally gives
β_ij ∝ Σ_{d=1..M} Σ_{n=1..N_d} φ_dni w_dn^j
where d indexes the M training documents and w_dn^j = 1 if the n-th word of document d is the j-th dictionary word.
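A sketch of this M-step across a corpus (standard smoothed form; docs and phis come from per-document E-steps like the sketch above):

import numpy as np

def m_step_beta(docs, phis, K, V, smooth=1e-9):
    # beta_ij proportional to sum_d sum_n phi_dni * w_dn^j, rows normalized.
    # docs: list of length-N_d arrays of word indices
    # phis: list of (N_d, K) variational multinomials from the E-step
    beta = np.full((K, V), smooth)
    for word_ids, phi in zip(docs, phis):
        np.add.at(beta.T, word_ids, phi)   # accumulate phi_n into column w_n
    return beta / beta.sum(axis=1, keepdims=True)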
Step 6.2: Maximize the lower bound with respect to the Dirichlet parameter α. Collect the terms containing α and differentiate with respect to α_i. The derivative depends on α_j for j ≠ i, so an iterative method must be used to find a suitable α. In particular, the Hessian matrix can be written in the form H = diag(h) + 1z1^⊤, so a linear-time Newton iteration can be invoked. The update of α can then be written
α_new = α_old − H(α_old)^{-1} g(α_old) (25)
where H(α) and g(α) are the Hessian matrix and the gradient at the point α, respectively.
Multiplying the inverse Hessian by the gradient gives the i-th component:
(H^{-1}g)_i = (g_i − c)/h_i, where c = (Σ_{j=1..K} g_j/h_j) / (z^{-1} + Σ_{j=1..K} h_j^{-1})
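A sketch of one such linear-time Newton step for α (using the special-structure inversion above; in practice the step is damped if it would drive any α_i non-positive):

import numpy as np
from scipy.special import digamma, polygamma

def alpha_newton_step(alpha, gammas):
    # One Newton update for the Dirichlet parameter alpha in linear time.
    # gammas: (M, K) variational Dirichlet parameters of the M documents
    M, K = gammas.shape
    g = M * (digamma(alpha.sum()) - digamma(alpha)) \
        + (digamma(gammas) - digamma(gammas.sum(axis=1, keepdims=True))).sum(axis=0)
    h = -M * polygamma(1, alpha)           # H = diag(h) + 1 z 1^T
    z = M * polygamma(1, alpha.sum())
    c = (g / h).sum() / (1.0 / z + (1.0 / h).sum())
    return alpha - (g - c) / h             # alpha_new = alpha_old - H^{-1} g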
Step 6.3: Estimate the GLM parameters η and σ²,
where μ(·) = E_GLM[Y | ·].
Differentiate with respect to σ², evaluating at the estimate of η:
Equation (29) can be computed because the rightmost summation has already been evaluated, exactly or approximately, while optimizing the coefficients η. From h(y, δ) and its partial derivative with respect to δ, the estimate of δ can be obtained, either in closed form or by one-dimensional numerical optimization. By this computation, the parameter estimates of the model are finally obtained.
Step 7: Prediction. Given a new document w with no title, the task is to infer the most probable title, i.e., the response variable y. To complete this task, we run the variational computation on the document until convergence, and use φ_n together with q(θ) to approximately solve the conditional probability p(y | w). With the titles held out, we infer φ_n from the test image, and we compare the performance of sLDA-bin and Corr-LDA. To measure the caption quality of the models, we compute the caption prediction probability defined below.
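A minimal sketch of this prediction step under the assumptions already made above (Bernoulli response with logistic link, approximating z̄ by its variational mean φ̄; the exact approximation in the omitted formula may differ):

import numpy as np

def predict_caption_probs(word_ids, alpha, beta, eta, delta):
    # Approximate p(y_i = 1 | w) for each caption word of a new image.
    gamma, phi = variational_e_step(word_ids, alpha, beta)  # sketch above
    z_bar_mean = phi.mean(axis=0)          # E_q[z_bar] = phi_bar
    scores = eta @ z_bar_mean + delta      # per-caption-word linear predictor
    return 1.0 / (1.0 + np.exp(-scores))   # logistic link

# The most probable caption words are those with the highest probabilities, e.g.:
# top5 = np.argsort(-predict_caption_probs(w, alpha, beta, eta, delta))[:5]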
The advantages of the present invention over the prior art can be described in detail with reference to Fig. 1 through Fig. 7:
Fig. 1 shows the graphical model structure of sLDA-bin. Owing to the outstanding performance of the sLDA model in caption prediction, we adopt the structure of sLDA. As can be seen, sLDA-bin adjusts the structure of Corr-LDA by deleting a variable, so that the image topics are used directly to predict captions, without integrating over the posterior probability of the captions (a step that Corr-LDA requires). sLDA is also extended so that the model can handle a multivariate binary response variable, remedying the shortcoming that sLDA can handle only a single response variable. Image annotation is therefore more detailed, and image retrieval correspondingly more convenient and accurate.
Fig. 2 shows the structure of Corr-LDA, a currently popular LDA model. Unlike sLDA-bin, Corr-LDA restricts each caption word to be associated with a specific image region: each caption word is generated by first selecting an image region, which is reflected in the additional variable x in Fig. 2.
Prediction performance analysis for different numbers of topics:
Fig. 3 shows a comparison between the sLDA-bin model and the unsupervised LDA model on movie-review prediction. We assess prediction quality using five-fold cross-validation, measuring the correlation between the predictions and the response variable: the stronger the correlation, the closer the predictions are to the observed responses and the higher the prediction quality. As can be seen from the figure, when the number of topics is low, the prediction quality of unsupervised LDA is higher than that of sLDA-bin, but as the number of topics increases, the prediction quality of sLDA-bin rises rapidly and surpasses the unsupervised LDA model. This is because sLDA-bin uses the structure of sLDA, in which the topic variables of the model are directly related to the response variable; increasing the number of topics therefore gives us more information about the response variable, which is of real help in predicting it correctly.
Figs. 5 and 6 show, for vocabulary sizes N = 256 and N = 512 respectively, the effect of the number of topics K on the prediction probabilities of the two models. We randomly selected five settings between K = 5 and K = 60 for comparison. First, whether N = 256 or N = 512, the prediction curve of sLDA-bin is consistently above that of Corr-LDA; we attribute this good predictive ability to the direct association the model makes between image topics and the corresponding captions. By eliminating the hidden variable, the image topics are used directly to predict captions, without integrating over the posterior probability of the caption topics (a step Corr-LDA requires). With the vocabulary size fixed, sLDA-bin is more sensitive to the number of topics: as the number of topics grows, its prediction quality improves markedly, but if the number of topics is too large, prediction deteriorates. This is because the response variable of sLDA-bin is directly associated with the topics, so too many topics lead to overfitting and, counterproductively, poor prediction. The performance of sLDA-bin therefore depends on a suitable combination of the number of topics and the vocabulary size.
Prediction performance analysis for different vocabulary sizes:
As shown in Fig. 4, with the number of topics fixed at K = 30 and the vocabulary size N set to 128, 256, and 512, sLDA-bin gives better prediction than the Corr-LDA model in every case. We therefore conclude that as the number of visual words in the visual dictionary increases, the prediction probability also improves, which can likewise be seen by comparing Fig. 5 and Fig. 6. With a larger vocabulary, the model captures more detailed information about the images and thus understands their content better, producing better image-caption matching.
Image classification performance evaluation:
Fig. 7 shows the annotation accuracy of the sLDA-bin model and the Corr-LDA model on a selection of common objects. For every item, the annotation accuracy of sLDA-bin is higher than that of Corr-LDA; overall, the accuracy of sLDA-bin exceeds that of Corr-LDA by about 0.04 on average. The reason is that Corr-LDA restricts each caption word to be associated with one specific image region. In reality, some annotation words describe the whole scene, and using such a restrictive association model is grossly inaccurate. By performing regression on the topic proportions, sLDA-bin allows each caption word to be influenced by the topics of all image regions as well as by specific regions, according to the corresponding regression coefficients. The correlation model of the present invention is therefore more general and more accurately reflects how real annotations are generated.
The present invention may also have various other embodiments. Without departing from the spirit and essence of the present invention, those skilled in the art can make various corresponding changes and modifications in accordance with the present invention, and all such corresponding changes and modifications shall fall within the protection scope of the appended claims of the present invention.

Claims (6)

1. A method for annotating images with titles based on an extended sLDA model, characterized by comprising:
Step 1: For an input image, extracting the local features of the image and using the K-means algorithm to obtain the N visual words w_n of the image, where n ∈ {1, 2, ..., N} and N is a positive integer;
Step 2: Using the LDA model to express the posterior distribution of the hidden variables of a given document:
p(θ, z | w, α, β) = p(θ, z, w | α, β) / p(w | α, β)
where α and β are model parameters, and z and θ are the topic variables and topic proportions, respectively;
Step 3: Introducing the response variable y and the response-variable parameters η and δ into step 2, and defining the distribution of the response variable as a multivariate Bernoulli distribution, i.e., expressed as formula (3);
Step 4: Following the convexity-based variational inference algorithm for LDA, approximating formula (5) by the variational distribution q(θ, z | γ, φ) = q(θ | γ) ∏_{n=1..N} q(z_n | φ_n), where the Dirichlet parameter γ and the multinomial parameters (φ_1, φ_2, ..., φ_N) are free variational parameters, z_n is the topic variable of the n-th word, and the difference between the expectations of log p(θ, z, w | α, β, η, δ) and log q(θ, z | γ, φ) under q is denoted L;
Step 5: Finding the variational parameters γ and φ that maximize the lower bound of L;
Step 6: Estimating the model parameters ψ = {α, β, η, δ};
Step 7: Predicting the distribution p(y | w) of the response variable y from the model parameters ψ and the variational parameters γ and φ.
2. The method for annotating images with titles based on an extended sLDA model according to claim 1, characterized in that step 3 is specifically:
generating the response variable y using z̄, η, and δ, where z̄ = (1/N) Σ_{n=1..N} z_n; if the distribution of the response variable y satisfies a generalized linear model:
p(y | z_{1:N}, η, δ) = h(y, δ) exp{((η^⊤z̄)y − A(η^⊤z̄)) / δ}
where A(·) is the log-normalizer, then formula (3) can be expressed as formula (5).
3. The method for annotating images with titles based on an extended sLDA model according to claim 2, characterized in that step 4 is specifically:
approximating formula (5) by the variational distribution q(θ, z | γ, φ) = q(θ | γ) ∏_{n=1..N} q(z_n | φ_n); letting L(γ, φ; α, β) denote the right-hand side of formula (8), formula (8) is expressed as
log p(w | α, β) = L(γ, φ; α, β) + D(q(θ, z | γ, φ) || p(θ, z | w, α, β)) (9)
and L is written as formula (10) by using the factorization of p and q.
4. The method for annotating images with titles based on an extended sLDA model according to claim 3, characterized in that step 5 is specifically:
Step 5.1: maximizing the lower bound of L with respect to φ_ni, where φ_ni denotes the probability that the n-th visual word is generated by hidden topic i, so that Σ_i φ_ni = 1, by separating the terms containing φ_ni and adding the appropriate Lagrange multiplier to form the Lagrangian, in which Ψ(x) is the digamma function;
computing the derivative with respect to φ_ni, where β_iv denotes p(w_n^v = 1 | z_n^i = 1) for the appropriate v, v being the v-th word of the dictionary;
further obtaining, in the case where the response variable obeys a Bernoulli distribution, the update formula (16) for the parameter φ_n;
Step 5.2: maximizing the lower bound of L with respect to γ_i, where γ_i denotes the i-th component of the posterior Dirichlet parameter, by collecting the terms containing γ_i, differentiating with respect to γ_i, and setting the derivative to zero, which gives
γ_i = α_i + Σ_{n=1..N} φ_ni (19)
and iterating equations (16) through (19) until the bound converges, thereby obtaining the variational parameters γ and φ that maximize the lower bound of L.
5. The method for annotating images with titles based on an extended sLDA model according to claim 4, characterized in that step 6 is specifically:
Step 6.1: the formula for estimating the parameter β is:
β_ij ∝ Σ_{d=1..M} Σ_{n=1..N_d} φ_dni w_dn^j
Step 6.2: the procedure for estimating the parameter α is: for formula (22), differentiating to obtain formula (23), and finding the value of α by applying Newton's method to formula (23), where M denotes the number of documents in the training set and the subscript d denotes the d-th document;
Step 6.3: the procedure for estimating the parameters η and σ², where μ(·) = E_GLM[Y | ·], is: differentiating with respect to σ² and evaluating at the estimate of η;
finally obtaining the parameter estimates by this computation, and combining the parameters α_i, β_ij, η_i, and δ_i to obtain the model parameters ψ = {α, β, η, δ}.
6. The method for annotating images with titles based on an extended sLDA model according to claim 5, characterized in that step 7 is specifically:
taking a new document w without a title as input, and using φ_n and q(θ) to approximately solve the conditional probability p(y | w), which is used to infer the most probable caption words for the new document w.
CN201810759844.2A 2018-07-11 2018-07-11 Method for performing title annotation on image based on expanded sLDA model Active CN108984726B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810759844.2A CN108984726B (en) 2018-07-11 2018-07-11 Method for performing title annotation on image based on expanded sLDA model


Publications (2)

Publication Number Publication Date
CN108984726A true CN108984726A (en) 2018-12-11
CN108984726B CN108984726B (en) 2022-10-04

Family

ID=64537058

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810759844.2A Active CN108984726B (en) 2018-07-11 2018-07-11 Method for performing title annotation on image based on expanded sLDA model

Country Status (1)

Country Link
CN (1) CN108984726B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112487185A (en) * 2020-11-27 2021-03-12 国家电网有限公司客户服务中心 Data classification method in power customer field

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102112987A (en) * 2008-05-30 2011-06-29 微软公司 Statistical approach to large-scale image annotation
CN102819746A (en) * 2012-07-10 2012-12-12 电子科技大学 Method for automatically marking category of remote sensing image based on author-genre theme model
CN103810500A (en) * 2014-02-25 2014-05-21 北京工业大学 Place image recognition method based on supervised learning probability topic model
CN103942274A (en) * 2014-03-27 2014-07-23 东莞中山大学研究院 Labeling system and method for biological medical treatment image on basis of LDA
CN106980867A (en) * 2016-01-15 2017-07-25 奥多比公司 Semantic concept in embedded space is modeled as distribution

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
D. Putthividhya et al.: "Supervised topic model for automatic image annotation", 2010 IEEE International Conference on Acoustics, Speech and Signal Processing *
Li Xiaoxu: "Research on image classification and annotation based on probabilistic topic models", China Doctoral Dissertations Full-text Database, Information Science and Technology *
Zhao Yedong: "Image-based traffic scene understanding", China Master's Theses Full-text Database, Information Science and Technology *


Also Published As

Publication number Publication date
CN108984726B (en) 2022-10-04


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20221205

Address after: 212000, First Floor, No. 103, Zhongshan West Road, Runzhou District, Zhenjiang, Jiangsu

Patentee after: Zhenjiang Kangda Lianhe Intelligent Technology Co.,Ltd.

Address before: 150080 No. 74, Xuefu Road, Nangang District, Heilongjiang, Harbin

Patentee before: Heilongjiang University

TR01 Transfer of patent right

Effective date of registration: 20230113

Address after: 150000 No. 74, Xuefu Road, Nangang District, Heilongjiang, Harbin

Patentee after: Heilongjiang University

Address before: 212000, First Floor, No. 103, Zhongshan West Road, Runzhou District, Zhenjiang, Jiangsu

Patentee before: Zhenjiang Kangda Lianhe Intelligent Technology Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20231117

Address after: 509 Kangrui Times Square, Keyuan Business Building, 39 Huarong Road, Gaofeng Community, Dalang Street, Longhua District, Shenzhen, Guangdong Province, 518000

Patentee after: Shenzhen Litong Information Technology Co.,Ltd.

Address before: 150000 No. 74, Xuefu Road, Nangang District, Heilongjiang, Harbin

Patentee before: Heilongjiang University