CN104933029A - Joint text-image semantic analysis method based on a probabilistic topic model

Info

Publication number: CN104933029A
Application number: CN201510350978.5A
Authority: CN
Inventors: 朱海龙; 庞彦伟
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2015-06-23
Filing date: 2015-06-23
Publication date: 2015-09-23

Abstract

The invention provides a joint text-image semantic analysis method based on a probabilistic topic model. The method comprises the following steps: collecting a large number of texts that contain images, suitably processing the texts and images, and building an image-text pair database in which each image is paired one-to-one with a text; training on these samples to obtain a joint topic distribution model for text-image semantic analysis; extracting the visual feature vocabulary of an input image to be analyzed; applying a PLSA (Probabilistic Latent Semantic Analysis) model to the image and its visual vocabulary and combining the result with the joint text-image topic distribution to obtain the topic semantics of the image; matching the topic semantics obtained in the previous step against the topics of the texts in the image-text pair database and selecting the best-matching text; and performing a semantic evaluation of the matched text in combination with the input image. The method can obtain semantic knowledge beyond the directly visible scene and object information.

Description

A joint text-image semantic analysis method based on a probabilistic topic model

Technical field

The present invention relates to text-image semantic analysis in fields such as computer vision, pattern analysis and artificial intelligence, and in particular to a joint text-image semantic analysis method based on a probabilistic topic model.

Background

Image understanding (IU) is the semantic interpretation of images. It takes images as its object and knowledge as its core, and studies what targets an image contains (what is where), how the targets relate to one another, what scene the image depicts, and how the scene can be applied. Its input is data and its output is knowledge, which places it among the high-level topics of image research [1] [2]. Semantics, as the basic carrier for describing knowledge, can convert the complete content of an image into a text-like language expression that can be understood intuitively, and therefore plays a vital role in image understanding.

Semantic analysis in image understanding has great application potential. The rich semantic knowledge in images can support more accurate image search engines, intelligent digital photo albums, and the generation of visual scene descriptions in virtual worlds. Within image understanding research itself, it can also form a mutually driven "data-knowledge" system that incorporates meaningful context information and hierarchical structure information, so that specific targets in a scene can be recognized and detected faster and more accurately.

Although semantic analysis occupies a very important position in image understanding, traditional image analysis methods have largely avoided semantic problems and analyze only the raw image data. There are two main reasons: 1) it is difficult to establish a reasonable association between the visual representation of an image and its semantics, which produces a large semantic gap between the two descriptions; 2) semantics is itself ambiguous and uncertain in expression. A growing body of research now addresses these bottlenecks and seeks effective models and methods for semantic representation in image understanding.

Bridging the semantic gap in image understanding requires establishing a correspondence between images and text. The approaches can be roughly divided into three categories. The first focuses on the image itself: it builds models or methods consistent with the image content and incorporates semantics into them implicitly, establishing a directed "text-to-image" link. The core question is how to integrate semantics into the model or method, and research following this strategy mostly uses generative or discriminative approaches. The second starts from the grammar and structural relations of semantics itself: it analyzes their components and interrelations, builds a structurally similar representation of the visual elements of the image, and explicitly embeds semantic description and analysis into a visual graph that contains syntactic relations, establishing a directed "image-to-text" link. The core question is how to build a visual relation graph that conforms to semantic rules. The third is application-oriented: centered on content-based image retrieval (CBIR), it enlarges the semantic vocabulary and builds multi-semantic, multi-user, multi-process image retrieval and query systems.

Resolving the ambiguity of semantics itself requires a sound description standard and structural system. As early as the 1980s, cognitive scientists and linguists at Princeton University studied and built a fairly unified tree-like class structure [3]. It is now regarded as the accepted reference standard for semantic relations in visual image research, and is used in the design and annotation of large-scale image datasets, where it effectively categorizes and unifies polysemous words. In addition, objective evaluation criteria for semantic retrieval are being actively explored.

Objective semantic evaluation is an important means of measuring the quality of an algorithm. Traditional methods generally evaluate recall and precision over a limited set of semantic categories by judging whether a target appears in the scene. The recall-precision curve (RPC) formed by the two measures is commonly used as the basic evaluation tool.

This patent mainly addresses the semantic gap in image understanding by establishing a correspondence between images and text. It borrows probabilistic topic model analysis from text semantic analysis to obtain a joint text-image semantic analysis method, which belongs to the generative family of image semantic analysis methods. Semantic understanding of text is by now relatively mature, with the probabilistic latent semantic analysis (PLSA) model [4] [5] and the latent Dirichlet allocation (LDA) model [6]. Existing work that combines text and images has applied probabilistic topic models to enriching text [7] [8] [9], but not to the semantic understanding of images.

To borrow the text analysis strategy, corresponding objects must first be constructed: a whole image corresponds to a whole document, and the words (lexicon) of a document correspond to visual words. Visual words are generally obtained by extracting low-level features of the image through saliency analysis of the image information. Low-level features are mostly obtained from the image data and include simple point, line, and surface features as well as some special complex features. Suitable visual words are then generated through a robust feature representation. Visual words generally have high reusability and some invariance properties.
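A minimal sketch of how an image could be turned into a "document" of visual words. The text does not fix the quantization scheme, so nearest-centroid assignment against a pre-built codebook is an assumption here, and the names `quantize`, `visual_word_histogram`, `codebook` and the toy two-dimensional descriptors are hypothetical.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equally long feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(descriptors, codebook):
    """Map each local descriptor to the index of its nearest codebook entry.

    Assumption: the codebook (one centroid per visual word) already exists,
    for example from clustering descriptors of the training images.
    """
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(d, codebook[k]))
        for d in descriptors
    ]


def visual_word_histogram(descriptors, codebook):
    """Count how often each visual word occurs in one image."""
    counts = Counter(quantize(descriptors, codebook))
    return [counts.get(k, 0) for k in range(len(codebook))]


# Toy data: three visual words and four local descriptors from one image.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, -0.1), (0.9, 1.2), (4.8, 5.1), (1.1, 0.8)]
histogram = visual_word_histogram(descriptors, codebook)
print(histogram)
```

The resulting histogram plays the role of the word counts c(w, d) of a text document, with the image as the document d.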

References:

[1] J. Gao, Z. Xie. Image Understanding Theory and Approach. Beijing, China: Science Press, 2009 (in Chinese).

[2] Z. Xie, J. Gao. A Novel Method for Scene Categorization with Constraint Mechanism Based on Gaussian Statistical Model [J]. Acta Electronica Sinica, 2009 (in Chinese).

[3] D. Cruse. Lexical Semantics. Cambridge, UK: Cambridge University Press, 1986.

[4] T. Hofmann. Unsupervised Learning by Probabilistic Latent Semantic Analysis [J]. Machine Learning, 2001.

[5] T. Hofmann. Probabilistic Latent Semantic Indexing [C]. Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence. Stockholm, Sweden, 1999.

[6] D. M. Blei, A. Y. Ng, M. I. Jordan. Latent Dirichlet Allocation [J]. Journal of Machine Learning Research, 2003.

[7] M. Bressan, G. Csurka, Y. Hoppenot, J. M. Renders. Travel Blog Assistant System (TBAS) - An Example Scenario of How to Enrich Text with Images and Images with Text using Online Multimedia Repositories [C]. VISAPP Workshop on Metadata Mining for Image Understanding, 2008.

[8] Y. Pang, X. Lu, Y. Yuan, X. Li. Travelogue enriching and scenic spot overview based on textual and visual topic models [J]. International Journal of Pattern Recognition and Artificial Intelligence, 2011.

[9] Y. Pang, Q. Hao, Y. Yuan, T. Hu, R. Cai, L. Zhang. Summarizing tourist destinations by mining user-generated travelogues and photos [J]. Computer Vision and Image Understanding, 2011.

[10] Z. Xie, J. Gao. Object Localization Based on Visual Statistical Probabilistic Models [J]. Journal of Image and Graphics, 2007, 12(7): 1234-1242 (in Chinese).

[11] H. P. Moravec. Obstacle Avoidance and Navigation in the Real World by a Seeing Robot Rover. Technical Report CMU-RI-TR-80-03, Pittsburgh, USA: Carnegie Mellon University, Robotics Institute, 1980.

Summary of the invention

The object of the invention is to overcome the shortcomings of existing approaches. Traditional image analysis methods avoid semantic problems and analyze only the raw image data, while image semantic analysis methods based on region segmentation and object annotation provide little information, reach only a low level of understanding of image content, and leave the background of the image and the relationship between scene and targets unclear. To this end, the invention proposes a joint text-image semantic analysis method based on a probabilistic topic model, which exploits large-scale web data to mine as much of the rich high-level semantics of images as possible. The technical solution of the invention is as follows:

A joint text-image semantic analysis method based on a probabilistic topic model, comprising the following steps:

Step 1: Collect a large number of texts that contain images, suitably process the texts and images, and build an image-text pair database in which each image is paired one-to-one with a text;

Step 2: Train on these samples to obtain the joint topic distribution model for text-image semantic analysis;

Step 3: For an input image to be analyzed, extract its visual feature vocabulary;

Step 4: Apply the PLSA model to the image and its visual vocabulary and, in combination with the joint text-image topic distribution, obtain the topic semantics of the image to be analyzed;

Step 5: Match the topic semantics obtained in the previous step against the topics of the texts in the image-text pair database and select the best-matching text; the similarity measure used for matching may be the Euclidean distance, the KL distance, the cosine of the included angle, or a similar measure;

Step 6: For the matched text obtained, perform a semantic evaluation in combination with the input image.
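The matching in step 5 can be sketched as follows, assuming that every text in the database and the query image are each represented by a topic probability vector. The function names, the small smoothing constant `eps` in the KL distance, and the toy vectors are illustrative assumptions rather than details given in the text.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two topic vectors (smaller is more similar)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kl_divergence(p, q, eps=1e-12):
    """KL distance D(p || q); eps is an assumed guard against division by zero."""
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q) if a > 0)


def cosine_similarity(p, q):
    """Cosine of the angle between two topic vectors (larger is more similar)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm > 0 else 0.0


def best_match(query, text_topics, measure="cosine"):
    """Return the index of the database text whose topics best match the query."""
    if measure == "cosine":
        scores = [cosine_similarity(query, t) for t in text_topics]
        return max(range(len(scores)), key=lambda i: scores[i])
    distance = {"euclidean": euclidean, "kl": kl_divergence}[measure]
    scores = [distance(query, t) for t in text_topics]
    return min(range(len(scores)), key=lambda i: scores[i])


# Toy data: the topic vector of one input image and of three database texts.
query = [0.7, 0.2, 0.1]
text_topics = [[0.1, 0.1, 0.8], [0.6, 0.3, 0.1], [0.3, 0.4, 0.3]]
matches = {m: best_match(query, text_topics, m) for m in ("euclidean", "kl", "cosine")}
print(matches)
```

On this toy data the three measures agree; on real topic vectors they can rank candidates differently, which is presumably why the choice of measure is left open.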

Adopt the method for the invention, analyzed by text image combination semantic, the abundanter semantic information except object except seeing intuitively from image and scene can be obtained.Relative to traditional simple image object and area marking, and the image understanding method of image Scene afterwards and object relationship, text image combination semantic analytical approach based on probability topic model by means of the strength of large data, when carrying out image understanding, employ the reference of more text image information, the more multi-semantic meaning knowledge except object scene information intuitively can be obtained, backstory of such as news picture etc.

Brief description of the drawings

The main framework of the technical solution adopted by the invention is illustrated with reference to the accompanying drawings.

Fig. 1 shows the probabilistic latent semantic analysis model of images.

Fig. 2 shows the learning process of the joint text-image analysis model based on the probabilistic topic model.

Fig. 3 shows the implementation flow of the joint text-image semantic analysis method adopted by the invention.

Detailed description of the embodiments

The preferred embodiment is described briefly below, taking the semantic understanding of news pictures as a concrete example. The invention does not, of course, restrict the category of texts and images.

In the PLSA model for text semantic analysis, the joint probability distribution of documents and words is expressed as

P(w, d) = P(d) \sum_{z} P(w \mid z) P(z \mid d) \qquad (1.1)

where d denotes a document, w denotes a word in the document, z denotes a topic of the document, P(d) is the probability of document d, P(w|z) is the conditional distribution of words given a topic, and P(z|d) is the conditional distribution of topics given a document.
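A small numerical sketch of formula (1.1), assuming the three distributions are stored as nested lists. The function name and the toy probabilities are made up for illustration.

```python
def joint_probability(p_d, p_w_given_z, p_z_given_d):
    """Evaluate P(w, d) = P(d) * sum_z P(w | z) P(z | d) for every (d, w) pair.

    p_d[d]             : probability of document d
    p_w_given_z[z][w]  : probability of word w under topic z
    p_z_given_d[d][z]  : probability of topic z in document d
    """
    n_topics = len(p_w_given_z)
    n_words = len(p_w_given_z[0])
    return [
        [
            p_d[d] * sum(p_w_given_z[z][w] * p_z_given_d[d][z] for z in range(n_topics))
            for w in range(n_words)
        ]
        for d in range(len(p_d))
    ]


# Toy model: two documents, two topics, three words.
p_d = [0.5, 0.5]
p_w_given_z = [[0.8, 0.2, 0.0], [0.1, 0.1, 0.8]]
p_z_given_d = [[1.0, 0.0], [0.5, 0.5]]
joint = joint_probability(p_d, p_w_given_z, p_z_given_d)
total = sum(sum(row) for row in joint)
print(joint, total)
```

Because each input is a proper probability distribution, the entries of `joint` sum to one over all document-word pairs.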

According to the PLSA model, the parameters to be estimated are θ = {P(w|z_k), P(z_k|d) | w ∈ V, d ∈ C, 1 ≤ k ≤ K}, where C is the document collection, V is the set of all words in C, and K is the number of topics. The log-likelihood of the document collection C can be expressed as:

\begin{aligned} L(\theta) = \log P(C \mid \theta) &= \sum_{d \in C} \sum_{w \in V} c(w, d) \log P(w, d) \\ &= \sum_{d \in C} \sum_{w \in V} c(w, d) \log \sum_{k=1}^{K} P(z_k \mid d) P(w \mid z_k) \end{aligned} \qquad (1.2)

Here c(w, d) is the number of times word w occurs in document d. The second line omits the term in P(d), which does not depend on the parameters θ.

The EM algorithm is used to solve iteratively for the distribution of the hidden topic variable:

\begin{aligned} P(z_k \mid d_i) &= \frac{\sum_{w} c(w, d_i) P(z_k \mid d_i, w)}{\sum_{w} c(w, d_i)}, \quad k = 1, \dots, K, \; d_i \in C \\ P(w_j \mid z_k) &= \frac{\sum_{d} c(w_j, d) P(z_k \mid d, w_j)}{\sum_{d} \sum_{w} c(w, d) P(z_k \mid d, w)}, \quad k = 1, \dots, K, \; w_j \in V \end{aligned} \qquad (1.3)
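The updates in (1.3) can be sketched in plain Python as below. The posterior P(z_k | d, w) is not written out in the text; the sketch assumes the standard PLSA E-step, in which it is proportional to P(z_k | d) P(w | z_k). The random initialization, the fixed iteration count, and the toy count matrix are further assumptions.

```python
import random


def normalize(values):
    """Scale non-negative values to sum to one (uniform if they are all zero)."""
    total = sum(values)
    if total <= 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def plsa_em(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM. counts[d][w] is c(w, d); returns (P(z | d), P(w | z))."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_z_d = [normalize([rng.random() for _ in range(n_topics)]) for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() for _ in range(n_words)]) for _ in range(n_topics)]
    for _ in range(n_iter):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step (assumed standard form): P(z | d, w) ∝ P(z | d) P(w | z).
                posterior = normalize([p_z_d[d][k] * p_w_z[k][w] for k in range(n_topics)])
                for k in range(n_topics):
                    weighted = counts[d][w] * posterior[k]
                    new_z_d[d][k] += weighted   # numerator of the P(z_k | d_i) update
                    new_w_z[k][w] += weighted   # numerator of the P(w_j | z_k) update
        # M-step: dividing by the row sums gives the denominators in (1.3).
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [normalize(row) for row in new_w_z]
    return p_z_d, p_w_z


# Toy corpus: four documents over four words, with two apparent word groups.
counts = [[4, 3, 0, 0], [5, 2, 0, 0], [0, 0, 3, 4], [0, 0, 2, 6]]
p_z_d, p_w_z = plsa_em(counts, n_topics=2)
print(p_z_d)
```

EM only finds a local maximum of the likelihood (1.2), so different seeds may label or split the topics differently.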

Earlier research borrowed Hofmann's probabilistic latent semantic analysis (PLSA) model from text analysis: the "semantic" description is placed in a latent space Z and corresponding topic nodes are generated, as outlined in Fig. 1. D is the set of M images d, and z denotes the concept category of a target (called a topic). Every image is formed as a convex combination of K topic vectors. The parameters are estimated iteratively by maximum likelihood; the likelihood takes an exponential form in p(w|d), with exponents given by the frequency with which each semantic word occurs in each image. The model uses the expectation maximization (EM) algorithm, alternating between the E-step (computing the posterior expectation of the hidden variable) and the M-step (updating the parameters to maximize the likelihood). In the decision process, the semantic assignment of the hidden variable satisfies

z^{*} = \arg\max_{z} P(z \mid d) \qquad (1.4)

The PLSA model establishes the correspondence between features and images through hidden variables, and each text unit is a proportional combination of several semantic concepts. In essence, the semantic distribution in the latent space remains a sparse discrete distribution, which makes it hard to satisfy the conditions for statistical sufficiency. In addition, a combination of semantic concepts obtained from the visual words of an image alone adds very little information beyond the visual words themselves, and can hardly provide richer semantics or more background knowledge about the image.

To obtain richer and more complete semantic knowledge from the visual feature vocabulary of an image, the invention combines the image PLSA model with a text PLSA model to form the joint text-image semantic analysis method based on a probabilistic topic model. For each image-text pair, the joint text-image topic distribution is first established from the image PLSA model and the corresponding text PLSA model, on the principle that the two share the same topics. For an input image, the relevant topic concepts are then obtained by combining the image PLSA model with the joint text-image topic distribution. Finally, these topic concepts are matched against the topics of the texts in the database, and the text of the best match serves as the semantic understanding of the input image.
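One plausible way to obtain the topic concepts of a new input image is the usual PLSA "fold-in": the trained topic-word distributions P(w | z) are held fixed and only the topic mixture of the new image is re-estimated by EM. The text does not prescribe this procedure, so the sketch below is an assumption, and `fold_in` and the toy values are hypothetical.

```python
def fold_in(word_counts, p_w_z, n_iter=50):
    """Estimate P(z | d_new) for an unseen image with P(w | z) held fixed.

    word_counts[w] : count of visual word w in the new image
    p_w_z[k][w]    : trained probability of visual word w under topic k
    """
    n_topics = len(p_w_z)
    p_z = [1.0 / n_topics] * n_topics          # start from a uniform mixture
    if sum(word_counts) == 0:
        return p_z
    for _ in range(n_iter):
        acc = [0.0] * n_topics
        for w, c in enumerate(word_counts):
            if c == 0:
                continue
            joint = [p_z[k] * p_w_z[k][w] for k in range(n_topics)]
            norm = sum(joint)
            if norm == 0:
                continue                        # word unseen under every topic
            for k in range(n_topics):
                acc[k] += c * joint[k] / norm   # E-step posterior, weighted by count
        total = sum(acc)
        if total == 0:
            break
        p_z = [a / total for a in acc]          # M-step for P(z | d_new) only
    return p_z


# Toy model: topic 0 favors visual words 0 and 1, topic 1 favors words 2 and 3.
p_w_z = [[0.45, 0.45, 0.05, 0.05], [0.05, 0.05, 0.45, 0.45]]
topic_vector = fold_in([6, 4, 0, 0], p_w_z)
print(topic_vector)
```

The returned vector can then be compared with the topic vectors of the database texts.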

After the text and the image have each been processed by a PLSA model, both remain ambiguous and diverse in their semantic expression. To obtain a common semantic expression, the topic variable is selected in the decision process by the following formula:

z^{*} = \arg\max_{z} \left[ P(z \mid d_{doc}) + \lambda P(z \mid d_{img}) \right] \qquad (1.5)

The topic variable that expresses both the text and the image is selected and assigned to both. Because text and images are inherently many-to-many, topics are assigned additively: a topic is appended to a text or image only if it has not already been assigned to it. The parameter λ balances the semantic weight of the text against that of the image.
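Formula (1.5) and the additive assignment can be sketched as follows. Representing the topics already assigned to a text or image as Python sets is an assumption made for brevity, and the function names and toy probabilities are hypothetical.

```python
def joint_topic(p_z_doc, p_z_img, lam=1.0):
    """Return the topic index maximizing P(z | d_doc) + lam * P(z | d_img)."""
    scores = [pd + lam * pi for pd, pi in zip(p_z_doc, p_z_img)]
    return max(range(len(scores)), key=lambda k: scores[k])


def assign_topic(topic, text_topics, image_topics):
    """Append the topic to the text and the image only where it is still missing."""
    text_topics.add(topic)    # set.add leaves an already assigned topic unchanged
    image_topics.add(topic)


# Toy topic distributions for one text-image pair.
p_z_doc = [0.5, 0.4, 0.1]
p_z_img = [0.1, 0.6, 0.3]

balanced = joint_topic(p_z_doc, p_z_img, lam=1.0)    # text and image weighted equally
text_only = joint_topic(p_z_doc, p_z_img, lam=0.0)   # image ignored

text_topics, image_topics = {0}, set()
assign_topic(balanced, text_topics, image_topics)
print(balanced, text_only, sorted(text_topics), sorted(image_topics))
```

With λ = 0 the decision reduces to formula (1.4) applied to the text alone, while larger values of λ let the image dominate the choice.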

In summary, the method of the invention is as follows:

Step 1: Collect a large number of Chinese news reports with pictures, and separate the news text from the pictures to form one-to-one pairs. If a news item contains several pictures, each picture forms its own one-to-one text-image pair with the text.

Step 2: Train on these pairs to obtain the joint text-image topic probability distribution and to generate topics for every text-image pair.

Step 3: For an input image to be analyzed, extract its visual feature vocabulary in the same way as the images are processed in step 2.

Step 4: Apply the PLSA model to the image and its visual vocabulary and, in combination with the topic set of the joint text-image topic distribution, obtain the topic vector of the image to be analyzed.

Step 5: Match the topic vector obtained in the previous step against the topics of the texts in the image-text pair database and select the best-matching text. The similarity measure used for matching may be the Euclidean distance, the KL distance, the cosine of the included angle, or a similar measure.

Step 6: For the matched text obtained, compute the recall and precision of the scene objects. Alternatively, the semantic evaluation can compute the similarity between the input image and the image paired with the best-matching text, and use it as the accuracy of semantic understanding.
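The recall/precision evaluation in step 6 can be sketched as below, assuming the scene objects mentioned in the matched text and the objects actually present in the input image are each available as a set of labels. The labels themselves are invented for illustration.

```python
def precision_recall(predicted, relevant):
    """Precision and recall of predicted object labels against the true labels."""
    predicted, relevant = set(predicted), set(relevant)
    hits = len(predicted & relevant)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical labels: objects named in the matched text versus objects in the image.
predicted = {"person", "podium", "flag", "car"}
relevant = {"person", "podium", "flag", "microphone", "crowd"}
precision, recall = precision_recall(predicted, relevant)
print(precision, recall)
```

Here three of the four predicted objects are correct and three of the five true objects are found. Repeating the computation over a ranked list of candidates would give the points of a recall-precision curve.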

Claims

1. A joint text-image semantic analysis method based on a probabilistic topic model, comprising the following steps: