CN113343679A - Multi-modal topic mining method based on label constraint - Google Patents
- Publication number
- CN113343679A (application number CN202110762186.4A)
- Authority
- CN
- China
- Prior art keywords
- document
- text
- topic
- label
- visual
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 31
- 238000005065 mining Methods 0.000 title claims abstract description 14
- 230000000007 visual effect Effects 0.000 claims abstract description 121
- 238000009826 distribution Methods 0.000 claims abstract description 64
- 238000005070 sampling Methods 0.000 claims abstract description 12
- 238000007418 data mining Methods 0.000 abstract description 6
- 230000002349 favourable effect Effects 0.000 abstract 1
- 230000006870 function Effects 0.000 description 4
- 239000011159 matrix material Substances 0.000 description 3
- 230000006399 behavior Effects 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 238000003064 k means clustering Methods 0.000 description 2
- 238000013476 bayesian approach Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 238000004091 panning Methods 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 230000001105 regulatory effect Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a multi-modal topic mining method based on label constraint, which comprises the following steps: 1. constructing a data set of multi-modal documents; 2. modeling the document tag-topic distribution; 3. modeling the text tag topics and visual tag topics in the documents; 4. constructing a multi-modal topic model based on label constraint; 5. learning the parameters with a collapsed Gibbs sampling algorithm. When applied to tagged, associated text and image data, the method can learn multi-modal topics quickly and accurately, thereby providing effective support for data mining tasks such as recommendation and retrieval.
Description
Technical Field
The invention relates to the technical field of topic mining of multi-modal data, in particular to a multi-modal topic mining method based on label constraint.
Background
Data mining tasks are typically data-driven, and large amounts of data are essential for learning accurate results. With the rapid development of Internet technology and the widespread use of website platforms (e.g., Facebook, Twitter), the volume of multimodal data keeps increasing. Typical websites such as Weibo and Taobao not only allow users to upload and share their multimodal data, but also let them attach relevant semantic description terms (tags). Moreover, associated text and images correspond well to each other, which makes their semantic content easier to understand. For data mining tasks such as recommendation, image retrieval, and classification, jointly modeling tagged text and images is therefore necessary.
In recent years, research on data mining has continued to grow. For example, the document [Understanding Large-Scale Dynamic Purchasing Behavior, 2021] performs topic modeling on consumers' historical purchase data to understand their purchasing behavior; [A Probabilistic Topic Model for Hybrid Recommender Systems: A Stochastic Variational Bayesian Approach, 2018] performs topic modeling on product data, describes products concisely in terms of latent topics, and discovers consumer preferences through those topics in order to design a recommender system; [Discriminative Sketch Topic Model With Structural Constraint for SAR Image Classification, 2020] classifies radar images with a topic-model approach; [Online Multi-modal Multi-expert Learning for Social Event Tracking, 2018] analyzes media data with a multi-modal topic model and automatically identifies events; [Image Tag Refinement by Regularized Latent Dirichlet Allocation, 2014] refines tags with a topic model to support image retrieval. However, none of these methods can process tagged, associated text and image data. Furthermore, learning large-scale data with the Gibbs sampling algorithm makes the learning process slow.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a multi-modal topic mining method based on label constraint, so that multi-modal topics can be learned quickly and accurately when dealing with large-scale multi-modal data, improving both the speed and the accuracy of data mining.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention relates to a multi-modal topic mining method based on label constraint, which is characterized by comprising the following steps of:
step 1, construct a data set D of the multimodal documents;
step 1.1, construct the text content set of the multimodal documents, denoted $W=\{W_1,W_2,\ldots,W_m,\ldots,W_M\}$, where $W_m=\{w_{m,1},\ldots,w_{m,t},\ldots,w_{m,N_m}\}$ represents the text data of the m-th piece of text content, $w_{m,t}$ is the t-th text word in the m-th piece of text content, $N_m$ is the number of words in the m-th piece of text content, and M is the number of multimodal documents;
step 1.2, construct the visual content set of the multimodal documents, denoted $V=\{V_1,V_2,\ldots,V_m,\ldots,V_M\}$, where $V_m=\{v_{m,1},\ldots,v_{m,p},\ldots,v_{m,L_m}\}$ represents the image data of the m-th piece of visual content, $v_{m,p}$ is the p-th visual word in the m-th piece of visual content, and $L_m$ is the number of words in the m-th piece of visual content;
step 1.3, construct the tag content set of the multimodal documents, denoted $\Lambda=\{\Lambda_1,\Lambda_2,\ldots,\Lambda_m,\ldots,\Lambda_M\}$, where $\Lambda_m$ is the set of tags of the m-th multimodal document; define the tag space $\{1,2,\ldots,l,\ldots,L\}$, where L is the number of distinct tags and l is any tag index in the tag space;
step 1.4, construct the data set $D=\{W,V,\Lambda\}$ consisting of the text content set W, the visual content set V and the tag content set $\Lambda$;
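For readers who prefer code, the data set D = {W, V, Λ} of step 1 can be sketched as follows. This is only an illustrative Python layout; the class and field names (MultimodalDocument, text_words, visual_words, tags) are ours and not part of the patent.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MultimodalDocument:
    """One multimodal document: its text words, visual words, and tags (step 1)."""
    text_words: List[int]    # w_{m,1..N_m}, indices into a text vocabulary of size T
    visual_words: List[int]  # v_{m,1..L_m}, indices into a visual vocabulary of size P
    tags: List[int]          # Lambda_m, a subset of the tag space {1, ..., L}

# The data set D = {W, V, Lambda} is then simply a list of M such documents.
corpus: List[MultimodalDocument] = []
```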
step 2, model the tag-topic distribution of the multimodal documents;
define the topic distribution of the multimodal documents as $\theta=\{\theta_1,\theta_2,\ldots,\theta_m,\ldots,\theta_M\}$, where $\theta_m$ represents the topic distribution of the m-th multimodal document and obeys a Dirichlet distribution with parameter $\alpha$;
define the topic distribution of tag j in the m-th multimodal document as $\theta_{m,j}=\{\theta_{m,j,1},\ldots,\theta_{m,j,k},\ldots,\theta_{m,j,K_j}\}$, where $K_j$ is the number of topics associated with tag j and $\theta_{m,j,k}$ is the interest weight of tag j on the k-th topic in the m-th multimodal document, $k\in\{1,2,\ldots,K_j\}$, $j\in\Lambda_m$; each tag can be associated with multiple topics, but each topic can only be assigned to one tag;
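The constraint that every topic belongs to exactly one tag while a tag may own several topics amounts to partitioning the K topics into per-tag blocks of sizes $K_j$. A minimal sketch of this bookkeeping, with function and variable names of our own choosing:

```python
import numpy as np

def build_topic_blocks(topics_per_tag):
    """Assign each tag j a disjoint block of K_j topics, so that every topic
    belongs to exactly one tag while a tag may own several topics."""
    topic_of = []          # global topic index -> owning tag
    topics_of_tag = {}     # tag -> list of its global topic indices
    for j, K_j in enumerate(topics_per_tag):
        start = len(topic_of)
        topics_of_tag[j] = list(range(start, start + K_j))
        topic_of.extend([j] * K_j)
    return np.array(topic_of), topics_of_tag

# e.g. three tags owning 2, 3 and 1 topics respectively -> K = 6 topics in total
topic_of, topics_of_tag = build_topic_blocks([2, 3, 1])
```

During sampling, a word in document m may only take a topic whose owning tag appears in $\Lambda_m$, which is exactly the constraint stated above.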
step 3, model the text tag topics and visual tag topics in the multimodal documents;
step 3.1, set the number of topics in the multimodal documents to K;
step 3.2, define the text probability distribution of the k-th topic under tag j as $\phi^w_{j,k}=\{\phi^w_{j,k,1},\ldots,\phi^w_{j,k,t},\ldots,\phi^w_{j,k,T}\}$, obeying a Dirichlet distribution with parameter $\beta^w$, where T is the number of distinct text words in the text content set and $\phi^w_{j,k,t}$ is the interest weight of the k-th topic under tag j on the t-th text word;
step 3.3, define the visual probability distribution of the k-th topic under tag j as $\phi^v_{j,k}=\{\phi^v_{j,k,1},\ldots,\phi^v_{j,k,p},\ldots,\phi^v_{j,k,P}\}$, obeying a Dirichlet distribution with parameter $\beta^v$, where P is the number of distinct visual words in the visual content set and $\phi^v_{j,k,p}$ is the interest weight of the k-th topic under tag j on the p-th visual word;
step 4, establish the multi-modal topic model based on label constraint;
step 4.1, define the topic assignments of all text words in the m-th multimodal document as $z^w_m=\{z^w_{m,1},\ldots,z^w_{m,t},\ldots,z^w_{m,N_m}\}$, where $z^w_{m,t}$ is the topic number of the t-th text word in the m-th multimodal document and obeys a multinomial distribution with parameter $\theta_{m,j}$, so that $\theta_{m,j}$ and $z^w_{m,t}$ form a Dirichlet-multinomial conjugate pair; the tag assignment of the t-th text word $w_{m,t}$ in the m-th multimodal document is denoted $l_{m,t}$, with $l_{m,t}=j$ meaning that the word belongs to tag j and draws its topic from $\theta_{m,j}$;
step 4.2, define the topic assignments of all visual words in the m-th multimodal document as $z^v_m=\{z^v_{m,1},\ldots,z^v_{m,p},\ldots,z^v_{m,L_m}\}$, where $z^v_{m,p}$ is the topic number of the p-th visual word in the m-th multimodal document and obeys a multinomial distribution with parameter $\theta_{m,j}$, so that $\theta_{m,j}$ and $z^v_{m,p}$ form a Dirichlet-multinomial conjugate pair; the tag assignment of the p-th visual word $v_{m,p}$ in the m-th document is denoted $l_{m,p}$, with $l_{m,p}=j$ meaning that the word belongs to tag j;
step 5, learn the three interest weights $\theta_{m,j,k}$, $\phi^w_{j,k,t}$ and $\phi^v_{j,k,p}$ with the collapsed Gibbs sampling method;
step 5.1, use formula (1) to compute the joint probability distribution $p(W,V,z^w,z^v,l\mid\alpha,\beta^w,\beta^v)$ of the observed text words W and visual words V and the unobserved tag assignments l and topic assignments $z^w$ of the text content of all multimodal documents:
$p(W,V,z^w,z^v,l\mid\alpha,\beta^w,\beta^v)=p(W\mid z^w,l,\beta^w)\,p(V\mid z^v,l,\beta^v)\,p(z^w,z^v,l\mid\alpha)$  (1)
In formula (1), $z^v$ denotes the topic assignments of the visual content of all multimodal documents, and $\alpha$ is a hyper-parameter;
step 5.1.1, use formula (2) to compute the generation probability $p(W\mid z^w,l,\beta^w)$ of all text words in the multimodal documents:
$p(W\mid z^w,l,\beta^w)=\prod_{j=1}^{L}\prod_{k=1}^{K_j}\frac{\Delta(\mathbf{n}_{\cdot,j,k}+\beta^w)}{\Delta(\beta^w)}$  (2)
In formula (2), $\mathbf{n}_{\cdot,j,k}=(n_{\cdot,j,k,1},\ldots,n_{\cdot,j,k,T})$, and $n_{\cdot,j,k,b}$ denotes the number of times text word b is generated by the k-th topic under tag j; $\Delta$ is the operator defined, for any K-dimensional vector X, by $\Delta(X)=\prod_{k=1}^{K}\Gamma(x_k)\,/\,\Gamma\!\left(\sum_{k=1}^{K}x_k\right)$, where $x_k$ is the k-th component of X and $\Gamma(\cdot)$ is the gamma function;
step 5.1.2, use formula (3) to compute the generation probability $p(V\mid z^v,l,\beta^v)$ of all visual words in the multimodal documents:
$p(V\mid z^v,l,\beta^v)=\prod_{j=1}^{L}\prod_{k=1}^{K_j}\frac{\Delta(\mathbf{d}_{\cdot,j,k}+\beta^v)}{\Delta(\beta^v)}$  (3)
In formula (3), $\mathbf{d}_{\cdot,j,k}=(d_{\cdot,j,k,1},\ldots,d_{\cdot,j,k,P})$, and $d_{\cdot,j,k,c}$ denotes the number of times visual word c is generated by the k-th topic under tag j;
step 5.1.3, use formula (4) to compute the generation probability $p(z^w,z^v,l\mid\alpha)$ of the tag topics of all multimodal documents:
$p(z^w,z^v,l\mid\alpha)=\prod_{m=1}^{M}\frac{\Delta(\mathbf{n}_{m}+\mathbf{d}_{m}+\alpha)}{\Delta(\alpha)}$  (4)
In formula (4), $\mathbf{n}_m$ and $\mathbf{d}_m$ collect, over the topics of the tags $j\in\Lambda_m$, the counts $n_{m,j,k,\cdot}$ and $d_{m,j,k,\cdot}$, where $n_{m,j,k,\cdot}$ is the number of text words assigned to the k-th topic under tag j in the m-th multimodal document and $d_{m,j,k,\cdot}$ is the number of visual words assigned to the k-th topic under tag j in the m-th multimodal document;
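The $\Delta$ operator above is the normalizing constant of a Dirichlet distribution and is best evaluated in log space. A small sketch using SciPy; the helper name log_delta is ours:

```python
import numpy as np
from scipy.special import gammaln

def log_delta(x):
    """log Delta(x) = sum_k log Gamma(x_k) - log Gamma(sum_k x_k),
    the log Dirichlet normalizing constant used in formulas (2)-(4)."""
    x = np.asarray(x, dtype=float)
    return gammaln(x).sum() - gammaln(x.sum())

# example: log Delta(beta^w * 1_T) for a symmetric prior over T text words
T, beta_w = 5000, 0.01
print(log_delta(np.full(T, beta_w)))
```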
step 5.2, use formula (5) to compute the probability that the t-th text word e in the m-th multimodal document is assigned to the k-th topic under tag j:
$p(l_{m,t}=j,z^w_{m,t}=k\mid w_{m,t}=e,l_{\neg(m,t)},z^w_{\neg(m,t)},\cdot)\;\propto\;I(j\in\Lambda_m)\cdot\frac{n^{\neg(m,t)}_{\cdot,j,k,e}+\beta^w}{\sum_{b=1}^{T}\left(n^{\neg(m,t)}_{\cdot,j,k,b}+\beta^w\right)}\cdot\left(n^{\neg(m,t)}_{m,j,k,\cdot}+d_{m,j,k,\cdot}+\alpha\right)$  (5)
In formula (5), $\propto$ means "proportional to" and $I(\cdot)$ is the indicator function; $\Lambda_m$ is the tag set of the m-th multimodal document; $l_{m,t}=j$ means that the t-th text word of the m-th multimodal document is assigned to tag j, and $z^w_{m,t}=k$ means that its topic number is k; $l_{\neg(m,t)}$ and $z^w_{\neg(m,t)}$ denote the tag and topic assignments of all text words other than the t-th text word of the m-th multimodal document; $w_{m,t}=e$ means that the t-th text word of the m-th multimodal document is e; $n^{\neg(m,t)}_{\cdot,j,k,e}$ is the number of times text word e is generated by topic k under tag j, excluding the t-th text word of the m-th multimodal document; $n^{\neg(m,t)}_{m,j,k,\cdot}$ is the number of text words assigned to the k-th topic under tag j in document m, excluding the t-th text word of the m-th multimodal document; $d_{m,j,k,\cdot}$ is the number of visual words assigned to the k-th topic under tag j in the m-th multimodal document;
step 5.3, use formula (6) to compute the probability that the t-th visual word f in the m-th multimodal document is assigned to the k-th topic under tag j:
$p(l_{m,t}=j,z^v_{m,t}=k\mid v_{m,t}=f,l_{\neg(m,t)},z^v_{\neg(m,t)},\cdot)\;\propto\;I(j\in\Lambda_m)\cdot\frac{d^{\neg(m,t)}_{\cdot,j,k,f}+\beta^v}{\sum_{c=1}^{P}\left(d^{\neg(m,t)}_{\cdot,j,k,c}+\beta^v\right)}\cdot\left(d^{\neg(m,t)}_{m,j,k,\cdot}+n_{m,j,k,\cdot}+\alpha\right)$  (6)
In formula (6), $z^v_{m,t}=k$ means that the topic number of the t-th visual word in the m-th multimodal document is k; $l_{\neg(m,t)}$ and $z^v_{\neg(m,t)}$ denote the tag and topic assignments of all visual words other than the t-th visual word of the m-th multimodal document; $v_{m,t}=f$ means that the t-th visual word of the m-th multimodal document is f; $d^{\neg(m,t)}_{\cdot,j,k,f}$ is the number of times visual word f is generated by the k-th topic under tag j, excluding the t-th visual word of the m-th multimodal document; $d^{\neg(m,t)}_{m,j,k,\cdot}$ is the number of visual words assigned to the k-th topic under tag j in document m, excluding the t-th visual word of the m-th multimodal document; $n_{m,j,k,\cdot}$ is the number of text words assigned to the k-th topic under tag j in the m-th multimodal document;
step 5.4, repeat steps 5.2 and 5.3 in a loop, assigning tag topics to all text words and visual words in the multimodal documents by the collapsed Gibbs sampling method, until the iteration condition is met;
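A compact sketch of one collapsed Gibbs update for a text word, under the count layout assumed here (n_word[j] a K_j-by-T array of the counts $n_{\cdot,j,k,e}$, n_doc[m][j] and d_doc[m][j] length-K_j arrays of $n_{m,j,k,\cdot}$ and $d_{m,j,k,\cdot}$); all names are illustrative, not prescribed by the patent:

```python
import numpy as np

def resample_text_word(m, t, word, doc_tags, assign, n_word, n_doc, d_doc,
                       K_j, T, alpha, beta_w):
    """One collapsed Gibbs update for the t-th text word of document m (cf. formula (5))."""
    j0, k0 = assign[(m, t)]               # remove the word's current (tag, topic) assignment
    n_word[j0][k0, word] -= 1
    n_doc[m][j0][k0] -= 1

    cands, weights = [], []
    for j in doc_tags[m]:                 # only tags attached to document m are admissible
        word_part = (n_word[j][:, word] + beta_w) / (n_word[j].sum(axis=1) + T * beta_w)
        doc_part = n_doc[m][j] + d_doc[m][j] + alpha
        for k in range(K_j[j]):
            cands.append((j, k))
            weights.append(word_part[k] * doc_part[k])

    weights = np.asarray(weights)
    j1, k1 = cands[np.random.choice(len(cands), p=weights / weights.sum())]

    n_word[j1][k1, word] += 1             # add the newly sampled assignment back
    n_doc[m][j1][k1] += 1
    assign[(m, t)] = (j1, k1)
```

The update for a visual word (formula (6)) is symmetric: swap the n counts for the d counts and $\beta^w$ for $\beta^v$. An outer loop over all documents and all words, repeated until the iteration condition is met, implements step 5.4.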
step 5.5, use formula (7) to compute the interest weight $\theta_{m,j,k}$ of the k-th topic under tag j in the m-th multimodal document:
$\theta_{m,j,k}=\frac{n_{m,j,k,\cdot}+d_{m,j,k,\cdot}+\alpha}{\sum_{k'=1}^{K_j}\left(n_{m,j,k',\cdot}+d_{m,j,k',\cdot}+\alpha\right)}$  (7)
step 5.6, use formula (8) to compute the interest weight $\phi^w_{j,k,e}$ of the k-th topic under tag j on text word e:
$\phi^w_{j,k,e}=\frac{n_{\cdot,j,k,e}+\beta^w}{\sum_{b=1}^{T}\left(n_{\cdot,j,k,b}+\beta^w\right)}$  (8)
In formula (8), $n_{\cdot,j,k,e}$ is the number of times text word e is generated by the k-th topic under tag j;
step 5.7, use formula (9) to compute the interest weight $\phi^v_{j,k,f}$ of the k-th topic under tag j on visual word f:
$\phi^v_{j,k,f}=\frac{d_{\cdot,j,k,f}+\beta^v}{\sum_{c=1}^{P}\left(d_{\cdot,j,k,c}+\beta^v\right)}$  (9)
In formula (9), $d_{\cdot,j,k,f}$ is the number of times visual word f is generated by the k-th topic under tag j;
The topic distribution of the multimodal documents, the text topic-word distributions, and the visual topic-word distributions obtained from these interest weights are taken as the topic mining result.
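Once sampling has converged, formulas (7)-(9) reduce to normalizing the accumulated counts. A sketch under the same assumed count layout as above (d_word[j] holds the $d_{\cdot,j,k,c}$ counts; the function name is ours):

```python
def estimate_parameters(n_doc, d_doc, n_word, d_word, T, P, alpha, beta_w, beta_v):
    """Point estimates theta, phi^w, phi^v from the final counts (cf. formulas (7)-(9))."""
    theta, phi_w, phi_v = {}, {}, {}
    for m in n_doc:                                  # document-level tag-topic weights, formula (7)
        theta[m] = {}
        for j in n_doc[m]:
            cnt = n_doc[m][j] + d_doc[m][j] + alpha  # n_{m,j,k,.} + d_{m,j,k,.} + alpha
            theta[m][j] = cnt / cnt.sum()
    for j in n_word:                                 # topic-word weights, formulas (8) and (9)
        phi_w[j] = (n_word[j] + beta_w) / (n_word[j].sum(axis=1, keepdims=True) + T * beta_w)
        phi_v[j] = (d_word[j] + beta_v) / (d_word[j].sum(axis=1, keepdims=True) + P * beta_v)
    return theta, phi_w, phi_v
```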
Compared with the prior art, the invention has the beneficial effects that:
1. The invention jointly models text, image, and tag information, i.e., tagged, associated text and image data sets, which makes it more convenient and valuable in practical applications. Moreover, the multi-modal topics learned by the topic model are associated with the tags, which effectively narrows the semantic gap and makes the topics easier to understand and interpret. In addition, visual information is integrated into the model, giving it good interpretability.
2. A collapsed Gibbs sampling method is designed, making the method more efficient, more accurate, and easier to scale to big data; when dealing with large-scale multi-modal data, valuable topics can be learned more quickly.
Drawings
FIG. 1 is a probability model diagram of a multi-modal topic mining method based on tag constraint according to the present invention.
FIG. 2 is a flow chart of the collapsed Gibbs sampling algorithm of the multi-modal topic mining method based on tag constraint.
Detailed Description
In this embodiment, a multi-modal topic mining method based on tag constraint is designed for tagged, associated text and image data sets. Tags are introduced into the topic model as supervision information, a background tag covering the whole data set is added, and a collapsed Gibbs sampling method is used to approximately estimate the model, so that valuable multi-modal topics can be learned and applied to data mining tasks such as recommendation and classification. The specific steps are as follows:
Step 1, construct a data set D of the multimodal documents;
The probability model diagram of FIG. 1 uses the following symbols: W denotes the text content set, V the visual content set, and $\Lambda$ the tag content set; l is any tag index in the tag space; M is the number of multimodal documents; $N_m$ is the number of words in the m-th piece of text content; $L_m$ is the number of words in the m-th piece of visual content; K is the number of topics in the multimodal documents and $K_m$ is the number of topics in the m-th multimodal document; $\theta$ is an M × K matrix representing the topic distributions of the multimodal documents; $\phi^w$ is a K × T matrix representing the text topic-word distributions; $\phi^v$ is a K × P matrix representing the visual topic-word distributions; $\alpha$ is the parameter of the document topic distribution, $\beta^w$ the text topic-word distribution parameter, and $\beta^v$ the visual topic-word distribution parameter.
Step 1.1, construct the text content set of the multimodal documents, denoted $W=\{W_1,W_2,\ldots,W_m,\ldots,W_M\}$, where $W_m=\{w_{m,1},\ldots,w_{m,t},\ldots,w_{m,N_m}\}$ represents the text data of the m-th piece of text content, $w_{m,t}$ is the t-th text word in the m-th piece of text content, $N_m$ is the number of words in the m-th piece of text content, and M is the number of multimodal documents;
Step 1.2, construct the visual content set of the multimodal documents. Each image is encoded with a bag-of-visual-words (BOVW) model using the occurrence frequencies of the visual words it contains. Because visual words cannot be read directly from an image the way words can from text, they must first be extracted. The scale-invariant feature transform (SIFT) algorithm is currently the most widely used algorithm for extracting local invariant features from images, so SIFT is used here to extract invariant feature points from the images as visual words. The specific steps of representing an image with the BOVW model are as follows (a code sketch follows the set definition below):
Step 1: extract feature points from the image, which is important for understanding the image. Visual features are extracted from each image with the SIFT algorithm and all visual features are recorded.
Step 2: after feature extraction, build a dictionary from the extracted image features by dictionary learning. To make the dictionary representative and effective, a large number of samples are randomly selected from the images in the data set, and the dictionary is then learned by K-means clustering.
According to the feature points extracted by SIFT, H cluster centres are randomly initialized and the K-means algorithm is iterated until convergence, finally yielding H cluster centres. Each cluster centre is a visual word, and together they form the visual dictionary. The sum of squared Euclidean distances is used as the distance criterion of the K-means clustering algorithm.
Step 3: through dictionary learning, a specific vocabulary for representing image features is obtained. With the SIFT algorithm, a number of feature points can be extracted from each image, and each feature point can be approximately replaced by a visual word from the dictionary. Each image can therefore be converted into a visual-word histogram, in which the abscissa is the visual word and the ordinate is the number of times that visual word occurs.
The resulting visual content set is denoted $V=\{V_1,V_2,\ldots,V_m,\ldots,V_M\}$, where $V_m=\{v_{m,1},\ldots,v_{m,p},\ldots,v_{m,L_m}\}$ represents the image data of the m-th piece of visual content, $v_{m,p}$ is the p-th visual word in the m-th piece of visual content, and $L_m$ is the number of words in the m-th piece of visual content;
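The following is a minimal sketch of this BOVW pipeline, assuming OpenCV's SIFT implementation and scikit-learn's KMeans as concrete stand-ins for the SIFT and K-means steps above; the function names and the per-image sampling strategy are illustrative, not prescribed by the patent.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def build_visual_vocabulary(image_paths, H=500, sample_per_image=200):
    """Learn a visual dictionary of H cluster centres from SIFT descriptors
    (steps 1-2 of the BOVW construction described above)."""
    sift = cv2.SIFT_create()
    samples = []
    for path in image_paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = sift.detectAndCompute(img, None)
        if desc is not None:
            idx = np.random.choice(len(desc), min(sample_per_image, len(desc)), replace=False)
            samples.append(desc[idx])
    kmeans = KMeans(n_clusters=H, n_init=10).fit(np.vstack(samples))
    return sift, kmeans

def image_to_visual_words(img_path, sift, kmeans):
    """Quantize an image's SIFT descriptors to dictionary words and return
    the visual-word histogram (step 3)."""
    img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
    _, desc = sift.detectAndCompute(img, None)
    if desc is None:
        return np.zeros(kmeans.n_clusters, dtype=int)
    words = kmeans.predict(desc.astype(np.float64))
    return np.bincount(words, minlength=kmeans.n_clusters)
```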
Step 1.3, construct the tag content set of the multimodal documents, denoted $\Lambda=\{\Lambda_1,\Lambda_2,\ldots,\Lambda_m,\ldots,\Lambda_M\}$, where $\Lambda_m$ is the set of tags of the m-th multimodal document; define the tag space $\{1,2,\ldots,l,\ldots,L\}$, where L is the number of distinct tags and l is any tag index in the tag space; a global hidden background tag B is also included in the tag space;
Step 1.4, construct the data set $D=\{W,V,\Lambda\}$ consisting of the text content set W, the visual content set V and the tag content set $\Lambda$;
Step 2, model the tag-topic distribution of the multimodal documents;
Define the topic distribution of the multimodal documents as $\theta=\{\theta_1,\theta_2,\ldots,\theta_m,\ldots,\theta_M\}$, where $\theta_m$ represents the topic distribution of the m-th multimodal document and obeys a Dirichlet distribution with parameter $\alpha$;
Define the topic distribution of tag j in the m-th multimodal document as $\theta_{m,j}=\{\theta_{m,j,1},\ldots,\theta_{m,j,k},\ldots,\theta_{m,j,K_j}\}$, where $K_j$ is the number of topics associated with tag j and $\theta_{m,j,k}$ is the interest weight of tag j on the k-th topic in the m-th multimodal document, $k\in\{1,2,\ldots,K_j\}$, $j\in\Lambda_m$; each tag can be associated with multiple topics, but each topic can only be assigned to one tag;
Step 3, model the text tag topics and visual tag topics in the multimodal documents;
Step 3.1, set the number of topics in the multimodal documents to K;
Step 3.2, define the text probability distribution of the k-th topic under tag j as $\phi^w_{j,k}=\{\phi^w_{j,k,1},\ldots,\phi^w_{j,k,t},\ldots,\phi^w_{j,k,T}\}$, obeying a Dirichlet distribution with parameter $\beta^w$, where T is the number of distinct text words in the text content set and $\phi^w_{j,k,t}$ is the interest weight of the k-th topic under tag j on the t-th text word;
Step 3.3, define the visual probability distribution of the k-th topic under tag j as $\phi^v_{j,k}=\{\phi^v_{j,k,1},\ldots,\phi^v_{j,k,p},\ldots,\phi^v_{j,k,P}\}$, obeying a Dirichlet distribution with parameter $\beta^v$, where P is the number of distinct visual words in the visual content set and $\phi^v_{j,k,p}$ is the interest weight of the k-th topic under tag j on the p-th visual word;
Step 4, establish the multi-modal topic model based on label constraint;
Step 4.1, define the topic assignments of all text words in the m-th multimodal document as $z^w_m=\{z^w_{m,1},\ldots,z^w_{m,t},\ldots,z^w_{m,N_m}\}$, where $z^w_{m,t}$ is the topic number of the t-th text word in the m-th multimodal document and obeys a multinomial distribution with parameter $\theta_{m,j}$, so that $\theta_{m,j}$ and $z^w_{m,t}$ form a Dirichlet-multinomial conjugate pair; the tag assignment of the t-th text word $w_{m,t}$ in the m-th multimodal document is denoted $l_{m,t}$, with $l_{m,t}=j$ meaning that the word belongs to tag j and draws its topic from $\theta_{m,j}$;
Step 4.2, define the topic assignments of all visual words in the m-th multimodal document as $z^v_m=\{z^v_{m,1},\ldots,z^v_{m,p},\ldots,z^v_{m,L_m}\}$, where $z^v_{m,p}$ is the topic number of the p-th visual word in the m-th multimodal document and obeys a multinomial distribution with parameter $\theta_{m,j}$, so that $\theta_{m,j}$ and $z^v_{m,p}$ form a Dirichlet-multinomial conjugate pair; the tag assignment of the p-th visual word $v_{m,p}$ in the m-th document is denoted $l_{m,p}$, with $l_{m,p}=j$ meaning that the word belongs to tag j;
Step 5, learn the three interest weights $\theta_{m,j,k}$, $\phi^w_{j,k,t}$ and $\phi^v_{j,k,p}$ with the collapsed Gibbs sampling method. First, the joint probability distribution $p(W,V,z^w,z^v,l\mid\alpha,\beta^w,\beta^v)$ of the observed text words W and visual words V and the unobserved tag assignments l and topic assignments $z^w$ of the text content of all multimodal documents is computed with formula (1):
$p(W,V,z^w,z^v,l\mid\alpha,\beta^w,\beta^v)=p(W\mid z^w,l,\beta^w)\,p(V\mid z^v,l,\beta^v)\,p(z^w,z^v,l\mid\alpha)$  (1)
In formula (1), $z^v$ denotes the topic assignments of the visual content of all multimodal documents, and $\alpha$ is a hyper-parameter;
Step 5.1, use formula (2) to compute the generation probability $p(W\mid z^w,l,\beta^w)$ of all text words in the multimodal documents:
$p(W\mid z^w,l,\beta^w)=\prod_{j=1}^{L}\prod_{k=1}^{K_j}\frac{\Delta(\mathbf{n}_{\cdot,j,k}+\beta^w)}{\Delta(\beta^w)}$  (2)
In formula (2), $\mathbf{n}_{\cdot,j,k}=(n_{\cdot,j,k,1},\ldots,n_{\cdot,j,k,T})$, and $n_{\cdot,j,k,b}$ denotes the number of times text word b is generated by the k-th topic under tag j; $\Delta$ is the operator defined, for any K-dimensional vector X, by $\Delta(X)=\prod_{k=1}^{K}\Gamma(x_k)\,/\,\Gamma\!\left(\sum_{k=1}^{K}x_k\right)$, where $x_k$ is the k-th component of X and $\Gamma(\cdot)$ is the gamma function;
Step 5.2, use formula (3) to compute the generation probability $p(V\mid z^v,l,\beta^v)$ of all visual words in the multimodal documents:
$p(V\mid z^v,l,\beta^v)=\prod_{j=1}^{L}\prod_{k=1}^{K_j}\frac{\Delta(\mathbf{d}_{\cdot,j,k}+\beta^v)}{\Delta(\beta^v)}$  (3)
In formula (3), $\mathbf{d}_{\cdot,j,k}=(d_{\cdot,j,k,1},\ldots,d_{\cdot,j,k,P})$, and $d_{\cdot,j,k,c}$ denotes the number of times visual word c is generated by the k-th topic under tag j;
Step 5.3, use formula (4) to compute the generation probability $p(z^w,z^v,l\mid\alpha)$ of the tag topics of all multimodal documents:
$p(z^w,z^v,l\mid\alpha)=\prod_{m=1}^{M}\frac{\Delta(\mathbf{n}_{m}+\mathbf{d}_{m}+\alpha)}{\Delta(\alpha)}$  (4)
In formula (4), $\mathbf{n}_m$ and $\mathbf{d}_m$ collect, over the topics of the tags $j\in\Lambda_m$, the counts $n_{m,j,k,\cdot}$ and $d_{m,j,k,\cdot}$, where $n_{m,j,k,\cdot}$ is the number of text words assigned to the k-th topic under tag j in the m-th multimodal document and $d_{m,j,k,\cdot}$ is the number of visual words assigned to the k-th topic under tag j in the m-th multimodal document;
As shown in FIG. 2, the specific steps of the collapsed Gibbs sampling algorithm are as follows:
In the first step, the probability that the t-th text word e in the m-th multimodal document is assigned to the k-th topic under tag j is computed with formula (5):
$p(l_{m,t}=j,z^w_{m,t}=k\mid w_{m,t}=e,l_{\neg(m,t)},z^w_{\neg(m,t)},\cdot)\;\propto\;I(j\in\Lambda_m)\cdot\frac{n^{\neg(m,t)}_{\cdot,j,k,e}+\beta^w}{\sum_{b=1}^{T}\left(n^{\neg(m,t)}_{\cdot,j,k,b}+\beta^w\right)}\cdot\left(n^{\neg(m,t)}_{m,j,k,\cdot}+d_{m,j,k,\cdot}+\alpha\right)$  (5)
In formula (5), $\propto$ means "proportional to" and $I(\cdot)$ is the indicator function; $\Lambda_m$ is the tag set of the m-th multimodal document; $l_{m,t}=j$ means that the t-th text word of the m-th multimodal document is assigned to tag j, and $z^w_{m,t}=k$ means that its topic number is k; $l_{\neg(m,t)}$ and $z^w_{\neg(m,t)}$ denote the tag and topic assignments of all text words other than the t-th text word of the m-th multimodal document; $w_{m,t}=e$ means that the t-th text word of the m-th multimodal document is e; $n^{\neg(m,t)}_{\cdot,j,k,e}$ is the number of times text word e is generated by topic k under tag j, excluding the t-th text word of the m-th multimodal document; $n^{\neg(m,t)}_{m,j,k,\cdot}$ is the number of text words assigned to the k-th topic under tag j in document m, excluding the t-th text word of the m-th multimodal document; $d_{m,j,k,\cdot}$ is the number of visual words assigned to the k-th topic under tag j in the m-th multimodal document.
In the second step, the probability that the t-th visual word f in the m-th multimodal document is assigned to the k-th topic under tag j is computed with formula (6):
$p(l_{m,t}=j,z^v_{m,t}=k\mid v_{m,t}=f,l_{\neg(m,t)},z^v_{\neg(m,t)},\cdot)\;\propto\;I(j\in\Lambda_m)\cdot\frac{d^{\neg(m,t)}_{\cdot,j,k,f}+\beta^v}{\sum_{c=1}^{P}\left(d^{\neg(m,t)}_{\cdot,j,k,c}+\beta^v\right)}\cdot\left(d^{\neg(m,t)}_{m,j,k,\cdot}+n_{m,j,k,\cdot}+\alpha\right)$  (6)
In formula (6), $z^v_{m,t}=k$ means that the topic number of the t-th visual word in the m-th multimodal document is k; $l_{\neg(m,t)}$ and $z^v_{\neg(m,t)}$ denote the tag and topic assignments of all visual words other than the t-th visual word of the m-th multimodal document; $v_{m,t}=f$ means that the t-th visual word of the m-th multimodal document is f; $d^{\neg(m,t)}_{\cdot,j,k,f}$ is the number of times visual word f is generated by the k-th topic under tag j, excluding the t-th visual word of the m-th multimodal document; $d^{\neg(m,t)}_{m,j,k,\cdot}$ is the number of visual words assigned to the k-th topic under tag j in document m, excluding the t-th visual word of the m-th multimodal document; $n_{m,j,k,\cdot}$ is the number of text words assigned to the k-th topic under tag j in the m-th multimodal document.
In the third step, the first and second steps are repeated in a loop, assigning tag topics to all text words and visual words in the multimodal documents by the collapsed Gibbs sampling method, until the iteration condition is met.
In the fourth step, the different interest weights are computed:
The interest weight $\theta_{m,j,k}$ of the k-th topic under tag j in the m-th multimodal document is computed with formula (7):
$\theta_{m,j,k}=\frac{n_{m,j,k,\cdot}+d_{m,j,k,\cdot}+\alpha}{\sum_{k'=1}^{K_j}\left(n_{m,j,k',\cdot}+d_{m,j,k',\cdot}+\alpha\right)}$  (7)
The interest weight $\phi^w_{j,k,e}$ of the k-th topic under tag j on text word e is computed with formula (8):
$\phi^w_{j,k,e}=\frac{n_{\cdot,j,k,e}+\beta^w}{\sum_{b=1}^{T}\left(n_{\cdot,j,k,b}+\beta^w\right)}$  (8)
In formula (8), $n_{\cdot,j,k,e}$ is the number of times text word e is generated by the k-th topic under tag j;
The interest weight $\phi^v_{j,k,f}$ of the k-th topic under tag j on visual word f is computed with formula (9):
$\phi^v_{j,k,f}=\frac{d_{\cdot,j,k,f}+\beta^v}{\sum_{c=1}^{P}\left(d_{\cdot,j,k,c}+\beta^v\right)}$  (9)
In formula (9), $d_{\cdot,j,k,f}$ is the number of times visual word f is generated by the k-th topic under tag j;
The topic distribution of the multimodal documents, the text topic-word distributions, and the visual topic-word distributions obtained from these interest weights are taken as the topic mining result.
Claims (1)
1. A multi-modal topic mining method based on label constraint is characterized by comprising the following steps:
step 1, construct a data set D of the multimodal documents;
step 1.1, construct the text content set of the multimodal documents, denoted $W=\{W_1,W_2,\ldots,W_m,\ldots,W_M\}$, where $W_m=\{w_{m,1},\ldots,w_{m,t},\ldots,w_{m,N_m}\}$ represents the text data of the m-th piece of text content, $w_{m,t}$ is the t-th text word in the m-th piece of text content, $N_m$ is the number of words in the m-th piece of text content, and M is the number of multimodal documents;
step 1.2, construct the visual content set of the multimodal documents, denoted $V=\{V_1,V_2,\ldots,V_m,\ldots,V_M\}$, where $V_m=\{v_{m,1},\ldots,v_{m,p},\ldots,v_{m,L_m}\}$ represents the image data of the m-th piece of visual content, $v_{m,p}$ is the p-th visual word in the m-th piece of visual content, and $L_m$ is the number of words in the m-th piece of visual content;
step 1.3, construct the tag content set of the multimodal documents, denoted $\Lambda=\{\Lambda_1,\Lambda_2,\ldots,\Lambda_m,\ldots,\Lambda_M\}$, where $\Lambda_m$ is the set of tags of the m-th multimodal document; define the tag space $\{1,2,\ldots,l,\ldots,L\}$, where L is the number of distinct tags and l is any tag index in the tag space;
step 1.4, construct the data set $D=\{W,V,\Lambda\}$ consisting of the text content set W, the visual content set V and the tag content set $\Lambda$;
step 2, model the tag-topic distribution of the multimodal documents;
define the topic distribution of the multimodal documents as $\theta=\{\theta_1,\theta_2,\ldots,\theta_m,\ldots,\theta_M\}$, where $\theta_m$ represents the topic distribution of the m-th multimodal document and obeys a Dirichlet distribution with parameter $\alpha$;
define the topic distribution of tag j in the m-th multimodal document as $\theta_{m,j}=\{\theta_{m,j,1},\ldots,\theta_{m,j,k},\ldots,\theta_{m,j,K_j}\}$, where $K_j$ is the number of topics associated with tag j and $\theta_{m,j,k}$ is the interest weight of tag j on the k-th topic in the m-th multimodal document, $k\in\{1,2,\ldots,K_j\}$, $j\in\Lambda_m$; each tag can be associated with multiple topics, but each topic can only be assigned to one tag;
step 3, model the text tag topics and visual tag topics in the multimodal documents;
step 3.1, set the number of topics in the multimodal documents to K;
step 3.2, define the text probability distribution of the k-th topic under tag j as $\phi^w_{j,k}=\{\phi^w_{j,k,1},\ldots,\phi^w_{j,k,t},\ldots,\phi^w_{j,k,T}\}$, obeying a Dirichlet distribution with parameter $\beta^w$, where T is the number of distinct text words in the text content set and $\phi^w_{j,k,t}$ is the interest weight of the k-th topic under tag j on the t-th text word;
step 3.3, define the visual probability distribution of the k-th topic under tag j as $\phi^v_{j,k}=\{\phi^v_{j,k,1},\ldots,\phi^v_{j,k,p},\ldots,\phi^v_{j,k,P}\}$, obeying a Dirichlet distribution with parameter $\beta^v$, where P is the number of distinct visual words in the visual content set and $\phi^v_{j,k,p}$ is the interest weight of the k-th topic under tag j on the p-th visual word;
step 4, establishing a multi-modal topic model based on label constraint;
step 4.1, define the topic assignments of all text words in the m-th multimodal document as $z^w_m=\{z^w_{m,1},\ldots,z^w_{m,t},\ldots,z^w_{m,N_m}\}$, where $z^w_{m,t}$ is the topic number of the t-th text word in the m-th multimodal document and obeys a multinomial distribution with parameter $\theta_{m,j}$, so that $\theta_{m,j}$ and $z^w_{m,t}$ form a Dirichlet-multinomial conjugate pair; the tag assignment of the t-th text word $w_{m,t}$ in the m-th multimodal document is denoted $l_{m,t}$, with $l_{m,t}=j$ meaning that the word belongs to tag j and draws its topic from $\theta_{m,j}$;
step 4.2, define the topic assignments of all visual words in the m-th multimodal document as $z^v_m=\{z^v_{m,1},\ldots,z^v_{m,p},\ldots,z^v_{m,L_m}\}$, where $z^v_{m,p}$ is the topic number of the p-th visual word in the m-th multimodal document and obeys a multinomial distribution with parameter $\theta_{m,j}$, so that $\theta_{m,j}$ and $z^v_{m,p}$ form a Dirichlet-multinomial conjugate pair; the tag assignment of the p-th visual word $v_{m,p}$ in the m-th document is denoted $l_{m,p}$, with $l_{m,p}=j$ meaning that the word belongs to tag j;
step 5, learn the three interest weights $\theta_{m,j,k}$, $\phi^w_{j,k,t}$ and $\phi^v_{j,k,p}$ by applying the collapsed Gibbs sampling method;
step 5.1, use formula (1) to compute the joint probability distribution $p(W,V,z^w,z^v,l\mid\alpha,\beta^w,\beta^v)$ of the observed text words W and visual words V and the unobserved tag assignments l and topic assignments $z^w$ of the text content of all multimodal documents:
$p(W,V,z^w,z^v,l\mid\alpha,\beta^w,\beta^v)=p(W\mid z^w,l,\beta^w)\,p(V\mid z^v,l,\beta^v)\,p(z^w,z^v,l\mid\alpha)$  (1)
In formula (1), $z^v$ denotes the topic assignments of the visual content of all multimodal documents, and $\alpha$ is a hyper-parameter;
step 5.1.1, use formula (2) to compute the generation probability $p(W\mid z^w,l,\beta^w)$ of all text words in the multimodal documents:
$p(W\mid z^w,l,\beta^w)=\prod_{j=1}^{L}\prod_{k=1}^{K_j}\frac{\Delta(\mathbf{n}_{\cdot,j,k}+\beta^w)}{\Delta(\beta^w)}$  (2)
In formula (2), $\mathbf{n}_{\cdot,j,k}=(n_{\cdot,j,k,1},\ldots,n_{\cdot,j,k,T})$, and $n_{\cdot,j,k,b}$ denotes the number of times text word b is generated by the k-th topic under tag j; $\Delta$ is the operator defined, for any K-dimensional vector X, by $\Delta(X)=\prod_{k=1}^{K}\Gamma(x_k)\,/\,\Gamma\!\left(\sum_{k=1}^{K}x_k\right)$, where $x_k$ is the k-th component of X and $\Gamma(\cdot)$ is the gamma function;
step 5.1.2, use formula (3) to compute the generation probability $p(V\mid z^v,l,\beta^v)$ of all visual words in the multimodal documents:
$p(V\mid z^v,l,\beta^v)=\prod_{j=1}^{L}\prod_{k=1}^{K_j}\frac{\Delta(\mathbf{d}_{\cdot,j,k}+\beta^v)}{\Delta(\beta^v)}$  (3)
In formula (3), $\mathbf{d}_{\cdot,j,k}=(d_{\cdot,j,k,1},\ldots,d_{\cdot,j,k,P})$, and $d_{\cdot,j,k,c}$ denotes the number of times visual word c is generated by the k-th topic under tag j;
step 5.1.3, use formula (4) to compute the generation probability $p(z^w,z^v,l\mid\alpha)$ of the tag topics of all multimodal documents:
$p(z^w,z^v,l\mid\alpha)=\prod_{m=1}^{M}\frac{\Delta(\mathbf{n}_{m}+\mathbf{d}_{m}+\alpha)}{\Delta(\alpha)}$  (4)
In formula (4), $\mathbf{n}_m$ and $\mathbf{d}_m$ collect, over the topics of the tags $j\in\Lambda_m$, the counts $n_{m,j,k,\cdot}$ and $d_{m,j,k,\cdot}$, where $n_{m,j,k,\cdot}$ is the number of text words assigned to the k-th topic under tag j in the m-th multimodal document and $d_{m,j,k,\cdot}$ is the number of visual words assigned to the k-th topic under tag j in the m-th multimodal document;
step 5.2, use formula (5) to compute the probability that the t-th text word e in the m-th multimodal document is assigned to the k-th topic under tag j:
$p(l_{m,t}=j,z^w_{m,t}=k\mid w_{m,t}=e,l_{\neg(m,t)},z^w_{\neg(m,t)},\cdot)\;\propto\;I(j\in\Lambda_m)\cdot\frac{n^{\neg(m,t)}_{\cdot,j,k,e}+\beta^w}{\sum_{b=1}^{T}\left(n^{\neg(m,t)}_{\cdot,j,k,b}+\beta^w\right)}\cdot\left(n^{\neg(m,t)}_{m,j,k,\cdot}+d_{m,j,k,\cdot}+\alpha\right)$  (5)
In formula (5), $\propto$ means "proportional to" and $I(\cdot)$ is the indicator function; $\Lambda_m$ is the tag set of the m-th multimodal document; $l_{m,t}=j$ means that the t-th text word of the m-th multimodal document is assigned to tag j, and $z^w_{m,t}=k$ means that its topic number is k; $l_{\neg(m,t)}$ and $z^w_{\neg(m,t)}$ denote the tag and topic assignments of all text words other than the t-th text word of the m-th multimodal document; $w_{m,t}=e$ means that the t-th text word of the m-th multimodal document is e; $n^{\neg(m,t)}_{\cdot,j,k,e}$ is the number of times text word e is generated by topic k under tag j, excluding the t-th text word of the m-th multimodal document; $n^{\neg(m,t)}_{m,j,k,\cdot}$ is the number of text words assigned to the k-th topic under tag j in document m, excluding the t-th text word of the m-th multimodal document; $d_{m,j,k,\cdot}$ is the number of visual words assigned to the k-th topic under tag j in the m-th multimodal document;
step 5.3, use formula (6) to compute the probability that the t-th visual word f in the m-th multimodal document is assigned to the k-th topic under tag j:
$p(l_{m,t}=j,z^v_{m,t}=k\mid v_{m,t}=f,l_{\neg(m,t)},z^v_{\neg(m,t)},\cdot)\;\propto\;I(j\in\Lambda_m)\cdot\frac{d^{\neg(m,t)}_{\cdot,j,k,f}+\beta^v}{\sum_{c=1}^{P}\left(d^{\neg(m,t)}_{\cdot,j,k,c}+\beta^v\right)}\cdot\left(d^{\neg(m,t)}_{m,j,k,\cdot}+n_{m,j,k,\cdot}+\alpha\right)$  (6)
In formula (6), $z^v_{m,t}=k$ means that the topic number of the t-th visual word in the m-th multimodal document is k; $l_{\neg(m,t)}$ and $z^v_{\neg(m,t)}$ denote the tag and topic assignments of all visual words other than the t-th visual word of the m-th multimodal document; $v_{m,t}=f$ means that the t-th visual word of the m-th multimodal document is f; $d^{\neg(m,t)}_{\cdot,j,k,f}$ is the number of times visual word f is generated by the k-th topic under tag j, excluding the t-th visual word of the m-th multimodal document; $d^{\neg(m,t)}_{m,j,k,\cdot}$ is the number of visual words assigned to the k-th topic under tag j in document m, excluding the t-th visual word of the m-th multimodal document; $n_{m,j,k,\cdot}$ is the number of text words assigned to the k-th topic under tag j in the m-th multimodal document;
step 5.4, repeat steps 5.2 and 5.3 in a loop, assigning tag topics to all text words and visual words in the multimodal documents by the collapsed Gibbs sampling method, until the iteration condition is met;
step 5.5, use formula (7) to compute the interest weight $\theta_{m,j,k}$ of the k-th topic under tag j in the m-th multimodal document:
$\theta_{m,j,k}=\frac{n_{m,j,k,\cdot}+d_{m,j,k,\cdot}+\alpha}{\sum_{k'=1}^{K_j}\left(n_{m,j,k',\cdot}+d_{m,j,k',\cdot}+\alpha\right)}$  (7)
step 5.6, use formula (8) to compute the interest weight $\phi^w_{j,k,e}$ of the k-th topic under tag j on text word e:
$\phi^w_{j,k,e}=\frac{n_{\cdot,j,k,e}+\beta^w}{\sum_{b=1}^{T}\left(n_{\cdot,j,k,b}+\beta^w\right)}$  (8)
In formula (8), $n_{\cdot,j,k,e}$ is the number of times text word e is generated by the k-th topic under tag j;
step 5.7, use formula (9) to compute the interest weight $\phi^v_{j,k,f}$ of the k-th topic under tag j on visual word f:
$\phi^v_{j,k,f}=\frac{d_{\cdot,j,k,f}+\beta^v}{\sum_{c=1}^{P}\left(d_{\cdot,j,k,c}+\beta^v\right)}$  (9)
In formula (9), $d_{\cdot,j,k,f}$ is the number of times visual word f is generated by the k-th topic under tag j;
The topic distribution of the multimodal documents, the text topic-word distributions, and the visual topic-word distributions obtained from these interest weights are taken as the topic mining result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110762186.4A CN113343679B (en) | 2021-07-06 | 2021-07-06 | Multi-modal topic mining method based on label constraint
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110762186.4A CN113343679B (en) | 2021-07-06 | 2021-07-06 | Multi-modal topic mining method based on label constraint
Publications (2)
Publication Number | Publication Date |
---|---|
CN113343679A true CN113343679A (en) | 2021-09-03 |
CN113343679B CN113343679B (en) | 2024-02-13 |
Family
ID=77482659
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110762186.4A Active CN113343679B (en) | 2021-07-06 | 2021-07-06 | Multi-modal topic mining method based on label constraint
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113343679B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070050388A1 (en) * | 2005-08-25 | 2007-03-01 | Xerox Corporation | Device and method for text stream mining |
US20080319974A1 (en) * | 2007-06-21 | 2008-12-25 | Microsoft Corporation | Mining geographic knowledge using a location aware topic model |
US8630975B1 (en) * | 2010-12-06 | 2014-01-14 | The Research Foundation For The State University Of New York | Knowledge discovery from citation networks |
CN105005558A (en) * | 2015-08-14 | 2015-10-28 | 武汉大学 | Multi-modal data fusion method based on crowd sensing |
CN105354280A (en) * | 2015-10-30 | 2016-02-24 | 中国科学院自动化研究所 | Social event tracking and evolving method based on social media platform |
CN105760507A (en) * | 2016-02-23 | 2016-07-13 | 复旦大学 | Cross-modal subject correlation modeling method based on deep learning |
KR20190008699A (en) * | 2017-07-17 | 2019-01-25 | 경희대학교 산학협력단 | Method, system and computer program for semantic image retrieval based on topic modeling |
CN113051932A (en) * | 2021-04-06 | 2021-06-29 | 合肥工业大学 | Method for detecting category of network media event of semantic and knowledge extension topic model |
- 2021-07-06: CN application CN202110762186.4A granted as patent CN113343679B (en), status Active
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070050388A1 (en) * | 2005-08-25 | 2007-03-01 | Xerox Corporation | Device and method for text stream mining |
US20080319974A1 (en) * | 2007-06-21 | 2008-12-25 | Microsoft Corporation | Mining geographic knowledge using a location aware topic model |
US8630975B1 (en) * | 2010-12-06 | 2014-01-14 | The Research Foundation For The State University Of New York | Knowledge discovery from citation networks |
CN105005558A (en) * | 2015-08-14 | 2015-10-28 | 武汉大学 | Multi-modal data fusion method based on crowd sensing |
CN105354280A (en) * | 2015-10-30 | 2016-02-24 | 中国科学院自动化研究所 | Social event tracking and evolving method based on social media platform |
CN105760507A (en) * | 2016-02-23 | 2016-07-13 | 复旦大学 | Cross-modal subject correlation modeling method based on deep learning |
KR20190008699A (en) * | 2017-07-17 | 2019-01-25 | 경희대학교 산학협력단 | Method, system and computer program for semantic image retrieval based on topic modeling |
CN113051932A (en) * | 2021-04-06 | 2021-06-29 | 合肥工业大学 | Method for detecting category of network media event of semantic and knowledge extension topic model |
Non-Patent Citations (2)
Title |
---|
Zhang Zhiyuan; Yang Hongjing; Zhao Yue: "Topic text network construction method based on Gibbs sampling results", Computer Engineering, no. 06 *
Zhao Chensheng; Wu Guowen; Hu Fuling: "Joint topic mining of microblogs based on comments and reposts", Intelligent Computer and Applications, no. 01 *
Also Published As
Publication number | Publication date |
---|---|
CN113343679B (en) | 2024-02-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107832663B (en) | Multi-modal emotion analysis method based on quantum theory | |
CN107590177B (en) | Chinese text classification method combined with supervised learning | |
CN111160037A (en) | Fine-grained emotion analysis method supporting cross-language migration | |
CN111966917A (en) | Event detection and summarization method based on pre-training language model | |
CN110969020A (en) | CNN and attention mechanism-based Chinese named entity identification method, system and medium | |
EP3166020A1 (en) | Method and apparatus for image classification based on dictionary learning | |
CN109086265B (en) | Semantic training method and multi-semantic word disambiguation method in short text | |
CN106778878B (en) | Character relation classification method and device | |
CN110825850B (en) | Natural language theme classification method and device | |
CN111475622A (en) | Text classification method, device, terminal and storage medium | |
CN110008365B (en) | Image processing method, device and equipment and readable storage medium | |
Zhou et al. | Comparing the interpretability of deep networks via network dissection | |
Patel et al. | Dynamic lexicon generation for natural scene images | |
He et al. | Deep learning in natural language generation from images | |
CN115965818A (en) | Small sample image classification method based on similarity feature fusion | |
Dhar et al. | Bengali news headline categorization using optimized machine learning pipeline | |
Phukan et al. | An efficient technique for image captioning using deep neural network | |
Annisa et al. | Analysis and Implementation of CNN in Real-time Classification and Translation of Kanji Characters | |
CN107291686B (en) | Method and system for identifying emotion identification | |
CN113343679B (en) | Multi-mode subject mining method based on label constraint | |
CN110674293A (en) | Text classification method based on semantic migration | |
CN115906824A (en) | Text fine-grained emotion analysis method, system, medium and computing equipment | |
CN115687576A (en) | Keyword extraction method and device represented by theme constraint | |
Mousavi et al. | Collaborative learning of semi-supervised clustering and classification for labeling uncurated data | |
Yang et al. | Automatic metadata information extraction from scientific literature using deep neural networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||