CN105760365A

CN105760365A - Probability latent parameter estimation model of image semantic data based on Bayesian algorithm

Info

Publication number: CN105760365A
Application number: CN201610142356.8A
Authority: CN
Inventors: 文珊; 曹良坤; 肖湘云; 余洁
Original assignee: Yunnan University YNU
Current assignee: Yunnan University YNU
Priority date: 2016-03-14
Filing date: 2016-03-14
Publication date: 2016-07-13

Abstract

The invention provides a Bayesian probability estimation model, aiming at the problem of a semantic gap in image search at present. Probability latent semantic subject term (hereinafter referred to as a subject term) parameters are added on the basis of image semantic retrieval to establish a document-subject term-semantic characteristic word probability relation; a Bayesian probability estimation algorithm theory is combined to calculate posterior probability, wherein the posterior probability is equal to an arithmetic product of prior probability and a likelihood function; a maximum likelihood estimated value of the posterior probability is solved by using an EM algorithm, so that the Bayesian probability estimation model is established. In view of the document-subject term-semantic characteristic word relation, a document is mapped, and the document corresponds to an image, so that the image needed by a user is finally displayed.

Description

Based on the bayesian algorithm potential parameter estimation model of probability to image, semantic data

Technical field

The present invention relates to about Bayesian probability algorithm for estimating.

Background technology

Theory of probability is the rigorous inferential strong mathematics branch of logic, and Bayesian formula is the most important in theory of probability Formula, many terms that Bayes is used are used so far.The Major Difficulties of machine learning be " being set forth " morphology and The semantic difference " really expressed ".Produce the reason of this problem mainly: 1. a word can have multiple meaning with many Individual usage.2. synonym and near synonym, and according to different linguistic context or other factors, different words is also possible to represent phase The same meaning.

The Major Difficulties utilizing Bayesian probability estimation principle to carry out computing machine study is very easily, is also current optimum Elegant method.

The content characteristic retrieval of image, is up to the present the most ripe a kind of searching system, but his retrieval effectiveness is Unsatisfactory.Subject matter is exactly problem produced by machine learning, and the present invention utilizes Bayesian probability algorithm for estimating exactly The problem solving machine learning.

Summary of the invention

The present invention carries out semantic number by Bayesian probability latent semantic analysis to the semantic feature of unsupervised learning image According to excavation, set up the model of a probability potential applications algorithm.Mainly solve the subject matter 1 in image retrieval, semanteme Wide gap；2, adopted many words and polysemy produced retrieval data are not in place and repeated retrieval.

If the corresponding document of piece image, if the key words of image, semantic feature correspondence document, set up a document- Key words semantic vector co-occurrence matrix.According to Bayesian probability estimation principle, the computational methods of conditional probability, applied probability is dived The maximum similarity of document is calculated in descriptor.

The technical scheme is that Bayesian Estimation algorithm based on machine learning sets up probability latent semantic model, bag Include following steps:

Step 1, is excavated without supervision image, semantic knowledge base data by probability latent semantic analysis method (PLSA), builds Probability semantic relation between vertical document (image)---potential applications---word (image, semantic feature), it is simply that each group (D, W) All relevant with potential applications Z；

Step 2, calculates potential applications main body word maximal possibility estimation and expectation maximization with Bayesian probability algorithm for estimating；

Step 3, sets up probability latent semantic model.

And, in step 1, unsupervised learning is a kind of based on image, semantic knowledge base, and its generation is to be used by computer The linguistic indexing of pictures method of gauss hybrid models (Gaussian Mixture Model, GMM) automatically generates；

And, in step 2, when Bayesian Estimation calculates, prior probability distribution meets BETA distribution, according to prior probability, seemingly So function calculates posterior probability；

And, in step 2, potential function Z maximal possibility estimation is trained, and algorithm the more commonly used in maximal possibility estimation is just It it is expectation-maximization algorithm.Expectation-maximization algorithm is divided into two steps:

1. the estimation of Expectation Step implicit parameter

2. Maximization Step determines actual parameter, then does maximal possibility estimation according to actual parameter.

Accompanying drawing explanation

Accompanying drawing 1 potential descriptor figure.

Accompanying drawing 2 asymmetric probability latent semantic model.

The symmetrical probability latent semantic model of accompanying drawing 3.

Accompanying drawing 4 implements schematic diagram for the present invention.

Detailed description of the invention

Technical solution of the present invention is described in detail below in conjunction with drawings and Examples.

The present invention proposes to supervise nothing based on Bayesian probability algorithm the parameter of the probability potential applications of image, semantic data Estimate model, it is assumed that image, semantic knowledge base by computer with gauss hybrid models (Gaussian Mixture Model, GMM) linguistic indexing of pictures method automatically generates, the most independent between mark semanteme, tandem useless.In order to the most detailed Thin elaboration specific embodiments of the present invention, are further described below based on accompanying drawing, and implementing procedure is as follows.

Step 1, if the corresponding document space of piece image, the keyword space of the Feature Mapping of image to document.Make A document, i.e. one vector being made up of word is represented with bag-of-word model.Word, independent of text semantic, is neglected Omit word order in a document, the frequency that only performance word occurs in a document, a dimension of a word representation space Degree, thus obtain document word co-occurrence matrix, each item of this matrix represents i-th word in jth piece document Weight.Co-occurrence matrix has passed through term frequency-inverse document frequency (TF IDF) dimensionality reduction.

Step 2, sets up the probabilistic relation between document, document key word and potential scenic themes word, Fig. 1.By generally Rate latent semantic analysis sets up PLSA probabilistic model figure, Fig. 2.Wherein observed data D represents document, and W represents one in document Feature word, potential applications descriptor Z is non-observed data.D=(d₁, d_2,..., d_i) there is the document sets of N piece document；Z=(Z₁, Z_2,..., Z_k) document have K hidden variable descriptor, theme word set；W=(W₁, W_2,..., W_j) document have M Word, set of words.Generating a document and the text set of word, co-occurrence matrix A=[P (di, Wj)] | D | × | W | represents, d generation Table a certain piece document, W represents some word, and (di, Wj) represents document di and word the Wj frequency of occurrences, namely P simultaneously (di, Wj) is the number that two variablees occur simultaneously.Every a line of matrix A represents a document, and every string represents a list Word.Then P (di, Wj) is an observable number, according to the mapping of Fig. 1, introduces unobservable hidden variable Z, and Z makes Document and word are with good conditionsi separate, and Z belongs in hidden variable collection Z, and wherein the value of K is sized depending on experience, on the one hand Wish that K is sufficiently large, it is possible to be suitable for all of potential applications structure, if but excessive, it is easily introduced noise, will produce for using Certain impact；If too small, then can not manifest error and other minor details of sample, generally take 20 to 100 Between.A (di) from N(i=1,2,3 ... ..N) piece document sets selects the probability of a document di；A (di, Wj) is that word Wj exists The frequency occurred in document di；A (Wj | Zk) represent under conditions of unobservable hidden variable Zk determines, word wj goes out Existing conditional probability；A(Z_k| di) represent that document di belongs to kth under conditions of the probability of occurrence of given i-th document The probability of descriptor；

Utilize Bayesian probability algorithm for estimating, can the generation model of the co-occurrence data of defined terms-document according to the following procedure:

1, first from document sets, select a document di with probability A (di)；

2, again with probability A (Z in document di_k| di) select a hiding theme class variable Z_kProbability；

3, determine hiding class variable Z_kAfter, with probability A (Wj | Z_k) select a word Wj probability.So can be obtained by one Individual observed value (di, Wj), and hidden variable Z not being observed_kIt is dropped.Assume that document d and word w are at given theme list It is satisfied with independent same distribution, say, that implicit class Z and word W are not dependent on what specific document D generated under conditions of word z. Then data above generation process can be stated by following Bayesian probability estimation formulas:

A (di, Wj)=A (di) A (Wj | di) (1)

Wherein: A (di): be prior distribution, A (Wj | di): it is likelihood function, A (di, Wj): be Posterior distrbutionp.Bayes side Method is thought, the data of learning parameter=priori+observe.The prior distribution of priori=parameter choose+distributed constant Choose.According to bayesian theory, in order to reach the distribution that Posterior distrbutionp and prior distribution belong to same, choosing prior distribution is Beta is conjugated distribution:

(2)

2 formulas are substituted into 1 formula obtain

(3)

Observe Fig. 2 to describe and (2 formula), according to the feature of conjugation distribution, so they are symmetric function, Fig. 2 model conversation be Model shown in Fig. 3.By the general A of bayes rule turn around condition (z/d), (1) can be rewritten as equation below

(4)

According to (1) formula, A (di, Wj) is that posteriority 2 is distributed, A (Z₁) it is prior distribution, and A (Wj | Z_k) A(di| Z_k) be Likelihood function.Prior distribution Z₁Being uniformly distributed, according to Bayesian assumption, A (Z on interval (0,1)₁)=1.So from (4) formula In we to obtain sample (di, Wj) be the linear combination about implicit variable Z, be made up of two multinomial distribution, two The parameter of multinomial distribution is A (W | Z) and A (d | Z), and namely probability latent semantic model is former according to Maximum-likelihood estimation Reason, by asking for the maximum of following log-likelihood function, obtains the model parameter of potential applications.

Step 3, existing maximal possibility estimation comes computation model parameter A (W | Z) and A (Z | d), the public affairs of likelihood function Formula:

(5)

Here the word frequency of all words that (di) occurred in being document di and, his value is to model parameter A (W | Z) and A (d | Z) there is no any impact.Upper examination can be write as

(6)

Owing to symbol of suing for peace in this object function occurs in integration, therefore this letter Number is difficult to maximize, in order to solve this problem, it is possible to use EM algorithm.

Step 4, (Expectation Maximization Algorithm, translates again expectation maximization to EM algorithm Algorithm), it is a kind of iterative algorithm, for the maximum likelihood of the probability parameter model containing hidden variable (hidden variable) Estimate or maximum a posteriori estimate.In statistical computation, greatest hope (EM) algorithm is at probability (probabilistic) mould Finding parameter maximal possibility estimation or the algorithm of MAP estimation in type, wherein probabilistic model depends on cannot observe hidden Hide variable (Latent Variable).EM algorithm alternately calculates through two steps:

The algorithm flow of EM is as follows:

1. initialize distributed constant

V=A (Wj | Zk) } m*k, U={A (di | Zk) } n*k

With square error criterion K-Means algorithm, calculate the initial value of U, V parameter respectively.Idiographic flow:

(1), select N number of sample to be divided into disjoint K subset, calculate average α of each subset₁,α₂,......,α_kAnd β, If i-th subset has Ni sample, totalIf Si represents i-th subset.Wherein:

β is error sum of squares clustering criteria, and it is k cluster centre α₁,α₂,......,α_kRepresent k sample set S1, S2 ... produced total square-error during Sk, the cluster making β minimum is exactly the optimum knot under error sum of squares criterion Really；

(2), an alternative sample x is randomly choosed, if this；

(3) if Ni=1, then forward to (2), the most discontinuously；

(4), calculate；

(5), for j=1 used, 2 ..., if there is certain in kMake, then x is moved on to from Si In St；

(6), α is recalculated_iValue, and revise β；

(7), β value keep constant, then iteration terminates, and otherwise continues iteration, turn (2).The result calculated is put into U and V respectively Middle as initial value；

2. repeat until restraining:

The first step is to calculate expectation (E), utilizes the existing estimated value to hidden variable, calculates its maximum likelihood estimator；

(7)

The formula left side represents under the Probability Condition i-th document, jth word occur, obtains existing kth and implies the general of theme Rate, it is understood that for containing document and the jth word contribution degree to kth theme of k theme probability given one, I-th document obtaining existing jth word belongs to the probability of kth theme.Denominator on the right of formula represents to be needed to join model Two products of number A (W | Z) and A (Z | d) are normalized that (concrete normalized mode is: belong to the document Arbitrarily theme and the product of this theme contribution degree is rescued by this word equally and be molecule, then belongs to specific master with document Probability and the word of topic are denominator to this particular topic contribution degree, ask business.Second step is to maximize (M), maximizes The maximum likelihood value tried to achieve in E step is to calculate the value of parameter:

(8)

Calculate and give the word j contribution degree to implicit theme k:

(9)

Calculate and give the probability that document (i-th document) belongs to implicit theme k:

(10)

K=K in implicit theme, when being recycled toTime stop calculate.(5th) formula reaches maximum.Finally obtain Degree of association between the joint probability of (di, wj), i.e. (di, wj).

Claims

1. the technical scheme is that based on Bayesian probability algorithm the probability potential applications without supervision image, semantic data Parameter estimation model, its feature is, use Bayesian Learning Theory basic concept for excavate without supervision image, semantic Potential relation between data:

Step 1, sets up image document---potential applications Feature Words---word, the probabilistic relation between image, semantic feature three；

Step 2, uses Bayesian probability algorithm for estimating, and posterior probability=prior probability * likelihood function, with Bayesian assumption to priori Probability carries out paying value；

Step 3, utilizes EM algorithm to seek the maximum likelihood estimator of posterior probability；

Step 4, carries out EM initial value setting with K-m algorithm；

Step 5, E step calculates desired value；

Step 6, M step calculates parameter maximum likelihood estimator；

Step 7, sets loop parameter K, and k < continues EM to calculate, terminates to calculate during k=K during K.