CN113343679A - Multi-modal topic mining method based on label constraint - Google Patents

Multi-modal topic mining method based on label constraint

Info

Publication number
CN113343679A
CN113343679A (application number CN202110762186.4A)
Authority
CN
China
Prior art keywords
document
text
topic
label
visual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110762186.4A
Other languages
Chinese (zh)
Other versions
CN113343679B (en)
Inventor
姜元春
李浩
钱洋
柴一栋
刘业政
孙见山
周凡
袁昆
梁瑞成
陶守正
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology
Priority to CN202110762186.4A
Publication of CN113343679A
Application granted
Publication of CN113343679B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a multi-modal topic mining method based on label constraint, which comprises the following steps: 1. constructing a data set of multi-modal documents; 2. modeling the document label-topic distribution; 3. modeling the text label topics and visual label topics in the documents; 4. establishing a multi-modal topic model based on label constraint; and 5. learning the parameters with a collapsed Gibbs sampling algorithm. When dealing with tagged, associated text and image data, the method can learn multi-modal topics quickly and accurately, thereby providing favorable support for data mining tasks such as recommendation and retrieval.

Description

Multi-modal topic mining method based on label constraint
Technical Field
The invention relates to the technical field of topic mining of multi-modal data, in particular to a multi-modal topic mining method based on label constraint.
Background
The data mining task is a typical data-driven process, and a large amount of data is of great significance for learning accurate results. With the rapid development of internet technology and the widespread use of various website platforms (e.g., Facebook, Twitter), the amount of multi-modal data is increasing. Some typical websites, such as Weibo, Sina and Taobao, not only allow users to upload and share their multi-modal data, but also allow them to provide relevant semantic description tags. Moreover, associated text and images not only correspond well to each other, but also make each other's semantic content easier to understand. For some data mining tasks, such as recommendation, image retrieval and classification, jointly modeling tagged text and images is necessary.
In recent years, there has been increasing research on data mining. For example, the document [Understanding Large-Scale Dynamic Purchase Behavior, 2021] performs topic modeling on consumers' historical purchase data to understand their purchase behavior; the document [Probabilistic Topic Model for Hybrid Recommender Systems: A Stochastic Variational Bayesian Approach, 2018] performs topic modeling on product data, concisely describes products in terms of hidden topics, and discovers consumer preferences through the topics so as to design a recommender system; the document [Discriminative Sketch Topic Model With Structural Constraint for SAR Image Classification, 2020] classifies radar images using a topic model; the document [Online Multi-modal Multi-expert Learning for Social Event Tracking, 2018] analyzes media data using a multi-modal topic model and automatically identifies events; the document [Image Tag Refinement by Regularized Latent Dirichlet Allocation, 2014] refines tags using a topic model to accomplish the image retrieval task. However, none of these methods can process tagged, associated text and image data. Furthermore, learning large-scale data through the Gibbs sampling algorithm results in a slow learning process.
Disclosure of Invention
In order to overcome the deficiencies of the prior art, the invention provides a multi-modal topic mining method based on label constraint, so that multi-modal topics can be learned rapidly and accurately when dealing with large-scale multi-modal data, improving the speed and accuracy of data mining.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention relates to a multi-modal topic mining method based on label constraint, which is characterized by comprising the following steps of:
Step 1, construct a data set D of multi-modal documents;
Step 1.1, construct the text content set of the multi-modal documents, denoted W = {W_1, ..., W_m, ..., W_M}, where W_m = {w_{m,1}, ..., w_{m,t}, ..., w_{m,N_m}} represents the text data of the m-th piece of text content, w_{m,t} is the t-th text word in the m-th piece of text content, N_m is the number of words in the m-th piece of text content, and M is the number of multi-modal documents;
Step 1.2, construct the visual content set of the multi-modal documents, denoted V = {V_1, ..., V_m, ..., V_M}, where V_m = {v_{m,1}, ..., v_{m,p}, ..., v_{m,L_m}} represents the image data of the m-th piece of visual content, v_{m,p} is the p-th visual word in the m-th piece of visual content, and L_m is the number of visual words in the m-th piece of visual content;
Step 1.3, construct the label content set of the multi-modal documents, denoted Λ = {Λ_1, Λ_2, ..., Λ_m, ..., Λ_M}, where Λ_m is the set of labels of the m-th multi-modal document; define the label space as {1, 2, ..., l, ..., L}, where L is the number of distinct labels and l is any label index in the label space;
Step 1.4, construct the data set D = {W, V, Λ}, comprising the text content set W, the visual content set V and the label content set Λ;
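For concreteness, the data set of step 1 can be held in plain containers. The following is a minimal sketch, not part of the patent; all class and field names are hypothetical, and words are assumed to be pre-mapped to vocabulary indices:

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class MultimodalDocument:
    text_words: List[int]    # w_{m,1..N_m}: indices into a text vocabulary of size T
    visual_words: List[int]  # v_{m,1..L_m}: indices into a visual vocabulary of size P
    labels: Set[int]         # Λ_m: a subset of the label space {0, ..., L-1}

@dataclass
class Dataset:
    docs: List[MultimodalDocument] = field(default_factory=list)

    @property
    def M(self) -> int:
        # number of multi-modal documents
        return len(self.docs)
```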
Step 2, model the label-topic distribution of the multi-modal documents;
Define the topic distributions of the multi-modal documents as θ = {θ_1, ..., θ_m, ..., θ_M}, where θ_m represents the topic distribution of the m-th multi-modal document and obeys a Dirichlet distribution with parameter α;
Define the topic distribution of label j in the m-th multi-modal document as θ_{m,j} = {θ_{m,j,1}, ..., θ_{m,j,k}, ..., θ_{m,j,K_j}}, where K_j is the number of topics associated with label j and θ_{m,j,k} is the interest weight of label j in the m-th multi-modal document on the k-th topic, k ∈ {1, 2, ..., K_j}, j ∈ Λ_m; each label can be associated with multiple topics, but each topic can only be assigned to one label;
Step 3, model the text label topics and the visual label topics in the multi-modal documents;
Step 3.1, set the number of topics in the multi-modal documents to K;
Step 3.2, define the text probability distribution of the k-th topic under label j as φ^w_{j,k} = {φ^w_{j,k,1}, ..., φ^w_{j,k,t}, ..., φ^w_{j,k,T}}, which obeys a Dirichlet distribution with parameter β^w, where T is the number of distinct text words in the text content set and φ^w_{j,k,t} is the interest weight of the k-th topic under label j on the t-th text word;
Step 3.3, define the visual probability distribution of the k-th topic under label j as φ^v_{j,k} = {φ^v_{j,k,1}, ..., φ^v_{j,k,p}, ..., φ^v_{j,k,P}}, which obeys a Dirichlet distribution with parameter β^v, where P is the number of distinct visual words in the visual content set and φ^v_{j,k,p} is the interest weight of the k-th topic under label j on the p-th visual word;
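The priors of steps 2-3 are ordinary Dirichlet draws; the following is a sketch with numpy, assuming symmetric scalar hyper-parameters and purely illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta_w, beta_v = 0.1, 0.01, 0.01   # symmetric hyper-parameters (assumed scalars)
K_j, T, P = 5, 10_000, 2_000              # example sizes only

# theta_{m,j} ~ Dirichlet(alpha): topic weights of label j in document m
theta_mj = rng.dirichlet(np.full(K_j, alpha))
# phi^w_{j,k} ~ Dirichlet(beta_w): text-word distribution of topic k under label j
phi_w_jk = rng.dirichlet(np.full(T, beta_w))
# phi^v_{j,k} ~ Dirichlet(beta_v): visual-word distribution of topic k under label j
phi_v_jk = rng.dirichlet(np.full(P, beta_v))
```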
Step 4, establish the multi-modal topic model based on label constraint;
Step 4.1, define the topic assignments of all text words in the m-th multi-modal document as z^w_m = {z^w_{m,1}, ..., z^w_{m,t}, ..., z^w_{m,N_m}}, where z^w_{m,t} is the topic assignment of the t-th text word in the m-th multi-modal document and obeys a multinomial distribution with parameter θ_{m,j}; θ_{m,j} and z^w_{m,t} form a Dirichlet-multinomial conjugate pair; define λ_{m,t} = j to denote that the t-th text word w_{m,t} in the m-th multi-modal document belongs to label j;
Step 4.2, define the topic assignments of all visual words in the m-th multi-modal document as z^v_m = {z^v_{m,1}, ..., z^v_{m,p}, ..., z^v_{m,L_m}}, where z^v_{m,p} is the topic assignment of the p-th visual word in the m-th multi-modal document and obeys a multinomial distribution with parameter θ_{m,j}; θ_{m,j} and z^v_{m,p} form a Dirichlet-multinomial conjugate pair; define λ_{m,p} = j to denote that the p-th visual word v_{m,p} in the m-th multi-modal document belongs to label j;
Step 5, apply the collapsed Gibbs sampling method to learn the three interest weights θ_{m,j,k}, φ^w_{j,k,t} and φ^v_{j,k,p};
Step 5.1, compute the joint probability distribution p(W, V, z^w, z^v, l | α, β^w, β^v) of the observed text words W, the unobserved labels l and the topic assignments z^w of the text content of all multi-modal documents using formula (1):

p(W, V, z^w, z^v, l | α, β^w, β^v) = p(W | z^w, l, β^w) · p(V | z^v, l, β^v) · p(z^w, z^v, l | α)   (1)

In formula (1), z^v denotes the topic assignments of the visual content of all multi-modal documents and α denotes a hyper-parameter;
Step 5.1.1, compute the generation probability p(W | z^w, l, β^w) of all text words in the multi-modal documents using formula (2):

p(W | z^w, l, β^w) = ∏_{j=1}^{L} ∏_{k=1}^{K_j} Δ(n_{·,j,k} + β^w) / Δ(β^w)   (2)

In formula (2), n_{·,j,k,b} denotes the number of times text word b is generated by the k-th topic under label j, n_{·,j,k} = (n_{·,j,k,1}, ..., n_{·,j,k,T}), and β^w inside Δ(·) is understood as the T-dimensional symmetric vector; Δ is the operator such that for any K-dimensional vector X = (x_1, ..., x_K), Δ(X) = ∏_{k=1}^{K} Γ(x_k) / Γ(∑_{k=1}^{K} x_k), where x_k is the k-th component of X and Γ(·) is the gamma function;
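In implementations, the Δ operator of formula (2) is usually evaluated in log space. A sketch with scipy (an assumption; the patent does not prescribe a library):

```python
import numpy as np
from scipy.special import gammaln  # log Gamma function

def log_delta(x: np.ndarray) -> float:
    """log Δ(X) = Σ_k log Γ(x_k) − log Γ(Σ_k x_k), for positive components."""
    return float(np.sum(gammaln(x)) - gammaln(np.sum(x)))

# e.g. the contribution of one (j, k) pair to log p(W | z^w, l, β^w):
# log_delta(n_jk + beta_w) - log_delta(np.full(T, beta_w))
```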
Step 5.1.2, compute the generation probability p(V | z^v, l, β^v) of all visual words in the multi-modal documents using formula (3):

p(V | z^v, l, β^v) = ∏_{j=1}^{L} ∏_{k=1}^{K_j} Δ(d_{·,j,k} + β^v) / Δ(β^v)   (3)

In formula (3), d_{·,j,k,c} denotes the number of times visual word c is generated by the k-th topic under label j, and d_{·,j,k} = (d_{·,j,k,1}, ..., d_{·,j,k,P});
Step 5.1.3, compute the generation probability p(z^w, z^v, l | α) of the label topics of all multi-modal documents using formula (4):

p(z^w, z^v, l | α) = ∏_{m=1}^{M} Δ(n_m + d_m + α) / Δ(α)   (4)

In formula (4), n_{m,j,k,·} denotes the number of text words assigned to the k-th topic under label j in the m-th multi-modal document, d_{m,j,k,·} denotes the number of visual words assigned to the k-th topic under label j in the m-th multi-modal document, and n_m + d_m collects these counts over all label-topic pairs (j, k) with j ∈ Λ_m;
Step 5.2, compute the probability that the t-th text word e in the m-th multi-modal document is assigned to the k-th topic under label j using formula (5):

p(λ_{m,t} = j, z^w_{m,t} = k | l^{¬(m,t)}, z^{w,¬(m,t)}, w_{m,t} = e, ·) ∝ I(j ∈ Λ_m) · (n^{¬(m,t)}_{·,j,k,e} + β^w) / (∑_{b=1}^{T} n^{¬(m,t)}_{·,j,k,b} + T·β^w) · (n^{¬(m,t)}_{m,j,k,·} + d_{m,j,k,·} + α)   (5)

In formula (5), ∝ denotes "proportional to" and I(·) denotes the indicator function; λ_{m,t} = j indicates that the label of the t-th text word in the m-th multi-modal document is j; z^w_{m,t} = k indicates that the topic assignment of the t-th text word in the m-th multi-modal document is k; l^{¬(m,t)} denotes the labels of all text words except the t-th text word in the m-th multi-modal document; z^{w,¬(m,t)} denotes the topic assignments of all text words except the t-th text word in the m-th multi-modal document; w_{m,t} = e means that the t-th text word in the m-th multi-modal document is e; n^{¬(m,t)}_{·,j,k,e} denotes the number of times text word e is generated by the k-th topic under label j, excluding the t-th text word of the m-th multi-modal document; n^{¬(m,t)}_{m,j,k,·} denotes the number of text words assigned to the k-th topic under label j in document m, excluding the t-th text word of the m-th multi-modal document; d_{m,j,k,·} denotes the number of visual words assigned to the k-th topic under label j in the m-th multi-modal document;
Step 5.3, compute the probability that the p-th visual word f in the m-th multi-modal document is assigned to the k-th topic under label j using formula (6):

p(λ_{m,p} = j, z^v_{m,p} = k | l^{¬(m,p)}, z^{v,¬(m,p)}, v_{m,p} = f, ·) ∝ I(j ∈ Λ_m) · (d^{¬(m,p)}_{·,j,k,f} + β^v) / (∑_{c=1}^{P} d^{¬(m,p)}_{·,j,k,c} + P·β^v) · (n_{m,j,k,·} + d^{¬(m,p)}_{m,j,k,·} + α)   (6)

In formula (6), z^v_{m,p} = k indicates that the topic assignment of the p-th visual word in the m-th multi-modal document is k; l^{¬(m,p)} denotes the labels of all visual words except the p-th visual word in the m-th multi-modal document; z^{v,¬(m,p)} denotes the topic assignments of all visual words except the p-th visual word in the m-th multi-modal document; v_{m,p} = f means that the p-th visual word in the m-th multi-modal document is f; d^{¬(m,p)}_{·,j,k,f} denotes the number of times visual word f is generated by the k-th topic under label j, excluding the p-th visual word of the m-th multi-modal document; d^{¬(m,p)}_{m,j,k,·} denotes the number of visual words assigned to the k-th topic under label j in document m, excluding the p-th visual word of the m-th multi-modal document; n_{m,j,k,·} denotes the number of text words assigned to the k-th topic under label j in the m-th multi-modal document;
Step 5.4, repeat step 5.2 and step 5.3 in a loop, assigning label topics to all text words and visual words in the multi-modal documents by the collapsed Gibbs sampling method until the iteration condition is met;
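A minimal sketch of one update of step 5.2 under formula (5); array names and layouts (global topic indices with a per-topic label lookup) are assumptions, not the patent's notation:

```python
import numpy as np

def sample_text_word(m, t, e, z_w, n_word, n_doc, d_doc,
                     label_of_topic, doc_labels, alpha, beta_w, T, rng):
    """One collapsed Gibbs update (formula (5)) for text word e at position t
    of document m. n_word: K x T topic-word counts; n_doc, d_doc: M x K
    per-document text/visual topic counts; label_of_topic[k] is the unique
    label owning topic k; doc_labels[m] is the label set Λ_m."""
    K = n_word.shape[0]
    k_old = z_w[m][t]
    # remove the current assignment to obtain the "¬(m,t)" counts
    n_word[k_old, e] -= 1
    n_doc[m, k_old] -= 1
    probs = np.zeros(K)
    for k in range(K):
        if label_of_topic[k] not in doc_labels[m]:
            continue  # the indicator I(j ∈ Λ_m) in formula (5)
        word_term = (n_word[k, e] + beta_w) / (n_word[k].sum() + T * beta_w)
        doc_term = n_doc[m, k] + d_doc[m, k] + alpha
        probs[k] = word_term * doc_term
    k_new = rng.choice(K, p=probs / probs.sum())
    # record the new assignment and restore the counts
    n_word[k_new, e] += 1
    n_doc[m, k_new] += 1
    z_w[m][t] = k_new
```

The visual-word update of formula (6) is symmetric, with the visual count arrays, β^v and P in place of n_word, β^w and T.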
Step 5.5, compute the interest weight θ_{m,j,k} of the k-th topic under label j in the m-th multi-modal document using formula (7):

θ_{m,j,k} = (n_{m,j,k,·} + d_{m,j,k,·} + α) / ∑_{j'∈Λ_m} ∑_{k'=1}^{K_{j'}} (n_{m,j',k',·} + d_{m,j',k',·} + α)   (7)

Step 5.6, compute the interest weight φ^w_{j,k,e} of the k-th topic under label j on text word e using formula (8):

φ^w_{j,k,e} = (n_{·,j,k,e} + β^w) / (∑_{b=1}^{T} n_{·,j,k,b} + T·β^w)   (8)

In formula (8), n_{·,j,k,e} denotes the number of times text word e is generated by the k-th topic under label j;
Step 5.7, compute the interest weight φ^v_{j,k,f} of the k-th topic under label j on visual word f using formula (9):

φ^v_{j,k,f} = (d_{·,j,k,f} + β^v) / (∑_{c=1}^{P} d_{·,j,k,c} + P·β^v)   (9)

In formula (9), d_{·,j,k,f} denotes the number of times visual word f is generated by the k-th topic under label j;
The topic distributions of the multi-modal documents, the text topic-word distributions and the visual topic-word distributions obtained from these interest weights are taken as the topic mining result.
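The final estimates of formulas (7)-(9) are simple normalized counts; a sketch consistent with the sampler above, whose array layouts are assumptions:

```python
import numpy as np

def estimate_parameters(n_word, d_word, n_doc, d_doc, alpha, beta_w, beta_v):
    """Formulas (7)-(9) from the final count arrays.
    n_word: K x T, d_word: K x P, n_doc and d_doc: M x K."""
    T, P = n_word.shape[1], d_word.shape[1]
    # formula (7); for brevity this normalizes over all K topics, whereas the
    # patent sums only over the topics of the labels in Λ_m
    theta = n_doc + d_doc + alpha
    theta = theta / theta.sum(axis=1, keepdims=True)
    # formula (8)
    phi_w = (n_word + beta_w) / (n_word.sum(axis=1, keepdims=True) + T * beta_w)
    # formula (9)
    phi_v = (d_word + beta_v) / (d_word.sum(axis=1, keepdims=True) + P * beta_v)
    return theta, phi_w, phi_v
```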
Compared with the prior art, the invention has the beneficial effects that:
1. The invention jointly models text, image and label information, i.e., tagged, associated text-image data sets, which makes it more convenient and valuable in practical applications. Secondly, the multi-modal topics learned by the topic model can be associated with the labels, which effectively narrows the semantic gap and facilitates understanding and interpreting the meaning of the topics. In addition, visual information is integrated into the model, giving the model good interpretability.
2. A collapsed Gibbs sampling method is designed, making the approach more efficient, more accurate and easier to scale to big data; when dealing with large-scale multi-modal data, valuable topics can be learned more quickly.
Drawings
FIG. 1 is a probability model diagram of a multi-modal topic mining method based on tag constraint according to the present invention.
FIG. 2 is a flow chart of the collapsed Gibbs sampling algorithm of the multi-modal topic mining method based on label constraint.
Detailed Description
In this embodiment, a multi-modal topic mining method based on label constraint is designed for tagged, associated text-and-image data sets. Labels are introduced into the topic model as supervision information, a background label covering the whole data set is introduced, and a collapsed Gibbs sampling method is adopted to approximate the model, so that valuable multi-modal topics can be learned and applied to data mining tasks such as recommendation or classification. The specific steps are as follows:
Step 1, construct a data set D of multi-modal documents;
The probability model diagram shown in FIG. 1 uses the following symbols: W denotes the text content set, V the visual content set and Λ the label content set; l is any label index in the label space; M is the number of multi-modal documents; N_m is the number of words in the m-th piece of text content; L_m is the number of words in the m-th piece of visual content; K is the number of topics in the multi-modal documents; K_j is the number of topics associated with label j; θ denotes the topic distributions of the multi-modal documents; φ^w is a K × T matrix denoting the text topic-word distributions; φ^v is a K × P matrix denoting the visual topic-word distributions; α is the parameter of the document topic distributions, β^w is the text topic-word distribution parameter and β^v is the visual topic-word distribution parameter.
Step 1.1, construct the text content set of the multi-modal documents, denoted W = {W_1, ..., W_m, ..., W_M}, where W_m = {w_{m,1}, ..., w_{m,t}, ..., w_{m,N_m}} represents the text data of the m-th piece of text content, w_{m,t} is the t-th text word in the m-th piece of text content, N_m is the number of words in the m-th piece of text content, and M is the number of multi-modal documents;
Step 1.2, construct the visual content set of the multi-modal documents. Each image is encoded by the statistical frequency of the visual words it contains, using a bag-of-visual-words (BOVW) model. Since visual words are not as readily available in an image as words are in text, they need to be extracted from the image. The scale-invariant feature transform (SIFT) algorithm is currently the most widely used algorithm for extracting local invariant features from images; therefore, the SIFT algorithm is used to extract invariant feature points from the images as visual words. The specific steps of representing an image with the BOVW model are as follows (a code sketch follows the set notation below):
Step 1: extract feature points from the image, which is important for understanding the image. Visual features are extracted from the image using the SIFT algorithm, and all visual features are labeled.
Step 2: after feature extraction is finished, build a dictionary for the extracted image feature information by dictionary learning. To make the dictionary representative and effective, a large number of samples are randomly selected from the images in the data set, and the dictionary is then learned by K-means clustering.
According to the feature points extracted by SIFT, H cluster centers are selected at random and the K-means clustering algorithm is iterated until convergence, finally yielding H cluster centers. Each cluster center is a visual word, and together they form the visual dictionary. The sum of squared Euclidean distances is defined as the distance measure of the K-means clustering algorithm.
Step 3: through dictionary learning, a vocabulary for image feature representation is obtained. With the SIFT algorithm, a number of feature points can be extracted from each image, and each feature point can be approximately replaced by a visual word in the dictionary. Each image can thus be converted into a visual histogram, in which the abscissa represents the visual words and the ordinate represents the number of times each visual word occurs.
The visual content set is denoted V = {V_1, ..., V_m, ..., V_M}, where V_m = {v_{m,1}, ..., v_{m,p}, ..., v_{m,L_m}} represents the image data of the m-th piece of visual content, v_{m,p} is the p-th visual word in the m-th piece of visual content, and L_m is the number of visual words in the m-th piece of visual content;
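A sketch of this BOVW pipeline, assuming OpenCV (with SIFT support) and scikit-learn are available; H and all function names are illustrative:

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def build_dictionary(image_paths, H=500):
    """Steps 1-2: extract SIFT descriptors and cluster them into H visual words."""
    sift = cv2.SIFT_create()
    descriptors = []
    for path in image_paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = sift.detectAndCompute(img, None)
        if desc is not None:
            descriptors.append(desc)
    all_desc = np.vstack(descriptors)
    # K-means with the squared Euclidean distance, as in the text above
    return KMeans(n_clusters=H, n_init=10, random_state=0).fit(all_desc)

def image_to_visual_words(path, kmeans):
    """Step 3: map each SIFT feature point to its nearest visual word."""
    sift = cv2.SIFT_create()
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, desc = sift.detectAndCompute(img, None)
    return kmeans.predict(desc) if desc is not None else np.array([], dtype=int)
```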
Step 1.3, construct the label content set of the multi-modal documents, denoted Λ = {Λ_1, Λ_2, ..., Λ_m, ..., Λ_M}, where Λ_m is the set of labels of the m-th multi-modal document; define the label space as {1, 2, ..., l, ..., L}, where L is the number of distinct labels and l is any label index in the label space; the label space additionally contains a global hidden background label B;
Step 1.4, construct the data set D = {W, V, Λ}, comprising the text content set W, the visual content set V and the label content set Λ;
Step 2, model the label-topic distribution of the multi-modal documents;
Define the topic distributions of the multi-modal documents as θ = {θ_1, ..., θ_m, ..., θ_M}, where θ_m represents the topic distribution of the m-th multi-modal document and obeys a Dirichlet distribution with parameter α;
Define the topic distribution of label j in the m-th multi-modal document as θ_{m,j} = {θ_{m,j,1}, ..., θ_{m,j,k}, ..., θ_{m,j,K_j}}, where K_j is the number of topics associated with label j and θ_{m,j,k} is the interest weight of label j in the m-th multi-modal document on the k-th topic, k ∈ {1, 2, ..., K_j}, j ∈ Λ_m; each label can be associated with multiple topics, but each topic can only be assigned to one label;
Step 3, model the text label topics and the visual label topics in the multi-modal documents;
Step 3.1, set the number of topics in the multi-modal documents to K;
Step 3.2, define the text probability distribution of the k-th topic under label j as φ^w_{j,k} = {φ^w_{j,k,1}, ..., φ^w_{j,k,t}, ..., φ^w_{j,k,T}}, which obeys a Dirichlet distribution with parameter β^w, where T is the number of distinct text words in the text content set and φ^w_{j,k,t} is the interest weight of the k-th topic under label j on the t-th text word;
Step 3.3, define the visual probability distribution of the k-th topic under label j as φ^v_{j,k} = {φ^v_{j,k,1}, ..., φ^v_{j,k,p}, ..., φ^v_{j,k,P}}, which obeys a Dirichlet distribution with parameter β^v, where P is the number of distinct visual words in the visual content set and φ^v_{j,k,p} is the interest weight of the k-th topic under label j on the p-th visual word;
Step 4, establish the multi-modal topic model based on label constraint;
Step 4.1, define the topic assignments of all text words in the m-th multi-modal document as z^w_m = {z^w_{m,1}, ..., z^w_{m,t}, ..., z^w_{m,N_m}}, where z^w_{m,t} is the topic assignment of the t-th text word in the m-th multi-modal document and obeys a multinomial distribution with parameter θ_{m,j}; θ_{m,j} and z^w_{m,t} form a Dirichlet-multinomial conjugate pair; define λ_{m,t} = j to denote that the t-th text word w_{m,t} in the m-th multi-modal document belongs to label j;
Step 4.2, define the topic assignments of all visual words in the m-th multi-modal document as z^v_m = {z^v_{m,1}, ..., z^v_{m,p}, ..., z^v_{m,L_m}}, where z^v_{m,p} is the topic assignment of the p-th visual word in the m-th multi-modal document and obeys a multinomial distribution with parameter θ_{m,j}; θ_{m,j} and z^v_{m,p} form a Dirichlet-multinomial conjugate pair; define λ_{m,p} = j to denote that the p-th visual word v_{m,p} in the m-th multi-modal document belongs to label j;
Step 5, apply the collapsed Gibbs sampling method to learn the three interest weights θ_{m,j,k}, φ^w_{j,k,t} and φ^v_{j,k,p};
Compute the joint probability distribution p(W, V, z^w, z^v, l | α, β^w, β^v) of the observed text words W, the unobserved labels l and the topic assignments z^w of the text content of all multi-modal documents using formula (1):

p(W, V, z^w, z^v, l | α, β^w, β^v) = p(W | z^w, l, β^w) · p(V | z^v, l, β^v) · p(z^w, z^v, l | α)   (1)

In formula (1), z^v denotes the topic assignments of the visual content of all multi-modal documents and α denotes a hyper-parameter;
Step 5.1, compute the generation probability p(W | z^w, l, β^w) of all text words in the multi-modal documents using formula (2):

p(W | z^w, l, β^w) = ∏_{j=1}^{L} ∏_{k=1}^{K_j} Δ(n_{·,j,k} + β^w) / Δ(β^w)   (2)

In formula (2), n_{·,j,k,b} denotes the number of times text word b is generated by the k-th topic under label j, n_{·,j,k} = (n_{·,j,k,1}, ..., n_{·,j,k,T}), and β^w inside Δ(·) is understood as the T-dimensional symmetric vector; Δ is the operator such that for any K-dimensional vector X = (x_1, ..., x_K), Δ(X) = ∏_{k=1}^{K} Γ(x_k) / Γ(∑_{k=1}^{K} x_k), where x_k is the k-th component of X and Γ(·) is the gamma function;
Step 5.2, compute the generation probability p(V | z^v, l, β^v) of all visual words in the multi-modal documents using formula (3):

p(V | z^v, l, β^v) = ∏_{j=1}^{L} ∏_{k=1}^{K_j} Δ(d_{·,j,k} + β^v) / Δ(β^v)   (3)

In formula (3), d_{·,j,k,c} denotes the number of times visual word c is generated by the k-th topic under label j, and d_{·,j,k} = (d_{·,j,k,1}, ..., d_{·,j,k,P});
Step 5.3, compute the generation probability p(z^w, z^v, l | α) of the label topics of all multi-modal documents using formula (4):

p(z^w, z^v, l | α) = ∏_{m=1}^{M} Δ(n_m + d_m + α) / Δ(α)   (4)

In formula (4), n_{m,j,k,·} denotes the number of text words assigned to the k-th topic under label j in the m-th multi-modal document, d_{m,j,k,·} denotes the number of visual words assigned to the k-th topic under label j in the m-th multi-modal document, and n_m + d_m collects these counts over all label-topic pairs (j, k) with j ∈ Λ_m;
As shown in FIG. 2, the flow of the collapsed Gibbs sampling algorithm comprises the following specific steps:
In the first step, the probability that the t-th text word e in the m-th multi-modal document is assigned to the k-th topic under label j is solved using formula (5):

p(λ_{m,t} = j, z^w_{m,t} = k | l^{¬(m,t)}, z^{w,¬(m,t)}, w_{m,t} = e, ·) ∝ I(j ∈ Λ_m) · (n^{¬(m,t)}_{·,j,k,e} + β^w) / (∑_{b=1}^{T} n^{¬(m,t)}_{·,j,k,b} + T·β^w) · (n^{¬(m,t)}_{m,j,k,·} + d_{m,j,k,·} + α)   (5)

In formula (5), ∝ denotes "proportional to" and I(·) denotes the indicator function; λ_{m,t} = j indicates that the label of the t-th text word in the m-th multi-modal document is j; z^w_{m,t} = k indicates that the topic assignment of the t-th text word in the m-th multi-modal document is k; l^{¬(m,t)} denotes the labels of all text words except the t-th text word in the m-th multi-modal document; z^{w,¬(m,t)} denotes the topic assignments of all text words except the t-th text word in the m-th multi-modal document; w_{m,t} = e means that the t-th text word in the m-th multi-modal document is e; n^{¬(m,t)}_{·,j,k,e} denotes the number of times text word e is generated by the k-th topic under label j, excluding the t-th text word of the m-th multi-modal document; n^{¬(m,t)}_{m,j,k,·} denotes the number of text words assigned to the k-th topic under label j in document m, excluding the t-th text word of the m-th multi-modal document; d_{m,j,k,·} denotes the number of visual words assigned to the k-th topic under label j in the m-th multi-modal document;
In the second step, the probability that the p-th visual word f in the m-th multi-modal document is assigned to the k-th topic under label j is solved using formula (6):

p(λ_{m,p} = j, z^v_{m,p} = k | l^{¬(m,p)}, z^{v,¬(m,p)}, v_{m,p} = f, ·) ∝ I(j ∈ Λ_m) · (d^{¬(m,p)}_{·,j,k,f} + β^v) / (∑_{c=1}^{P} d^{¬(m,p)}_{·,j,k,c} + P·β^v) · (n_{m,j,k,·} + d^{¬(m,p)}_{m,j,k,·} + α)   (6)

In formula (6), z^v_{m,p} = k indicates that the topic assignment of the p-th visual word in the m-th multi-modal document is k; l^{¬(m,p)} denotes the labels of all visual words except the p-th visual word in the m-th multi-modal document; z^{v,¬(m,p)} denotes the topic assignments of all visual words except the p-th visual word in the m-th multi-modal document; v_{m,p} = f means that the p-th visual word in the m-th multi-modal document is f; d^{¬(m,p)}_{·,j,k,f} denotes the number of times visual word f is generated by the k-th topic under label j, excluding the p-th visual word of the m-th multi-modal document; d^{¬(m,p)}_{m,j,k,·} denotes the number of visual words assigned to the k-th topic under label j in document m, excluding the p-th visual word of the m-th multi-modal document; n_{m,j,k,·} denotes the number of text words assigned to the k-th topic under label j in the m-th multi-modal document;
In the third step, the first step and the second step are repeated in a loop, and label topics are assigned to all text words and visual words in the multi-modal documents by the collapsed Gibbs sampling method until the iteration condition is met;
In the fourth step, the different interest weights are calculated:
The interest weight θ_{m,j,k} of the k-th topic under label j in the m-th multi-modal document is calculated using formula (7):

θ_{m,j,k} = (n_{m,j,k,·} + d_{m,j,k,·} + α) / ∑_{j'∈Λ_m} ∑_{k'=1}^{K_{j'}} (n_{m,j',k',·} + d_{m,j',k',·} + α)   (7)

The interest weight φ^w_{j,k,e} of the k-th topic under label j on text word e is calculated using formula (8):

φ^w_{j,k,e} = (n_{·,j,k,e} + β^w) / (∑_{b=1}^{T} n_{·,j,k,b} + T·β^w)   (8)

In formula (8), n_{·,j,k,e} denotes the number of times text word e is generated by the k-th topic under label j;
The interest weight φ^v_{j,k,f} of the k-th topic under label j on visual word f is calculated using formula (9):

φ^v_{j,k,f} = (d_{·,j,k,f} + β^v) / (∑_{c=1}^{P} d_{·,j,k,c} + P·β^v)   (9)

In formula (9), d_{·,j,k,f} denotes the number of times visual word f is generated by the k-th topic under label j;
The topic distributions of the multi-modal documents, the text topic-word distributions and the visual topic-word distributions obtained from these interest weights are taken as the topic mining result.

Claims (1)

1. A multi-modal topic mining method based on label constraint, characterized by comprising the following steps:
step 1, constructing a data set D of multi-modal documents;
step 1.1, constructing the text content set of the multi-modal documents, denoted W = {W_1, ..., W_m, ..., W_M}, where W_m = {w_{m,1}, ..., w_{m,t}, ..., w_{m,N_m}} represents the text data of the m-th piece of text content, w_{m,t} is the t-th text word in the m-th piece of text content, N_m is the number of words in the m-th piece of text content, and M is the number of multi-modal documents;
step 1.2, constructing the visual content set of the multi-modal documents, denoted V = {V_1, ..., V_m, ..., V_M}, where V_m = {v_{m,1}, ..., v_{m,p}, ..., v_{m,L_m}} represents the image data of the m-th piece of visual content, v_{m,p} is the p-th visual word in the m-th piece of visual content, and L_m is the number of visual words in the m-th piece of visual content;
step 1.3, constructing the label content set of the multi-modal documents, denoted Λ = {Λ_1, Λ_2, ..., Λ_m, ..., Λ_M}, where Λ_m is the set of labels of the m-th multi-modal document; defining the label space as {1, 2, ..., l, ..., L}, where L is the number of distinct labels and l is any label index in the label space;
step 1.4, constructing the data set D = {W, V, Λ}, which comprises the text content set W, the visual content set V and the label content set Λ;
step 2, modeling the label-topic distribution of the multi-modal documents;
defining the topic distributions of the multi-modal documents as θ = {θ_1, ..., θ_m, ..., θ_M}, where θ_m represents the topic distribution of the m-th multi-modal document and obeys a Dirichlet distribution with parameter α;
defining the topic distribution of label j in the m-th multi-modal document as θ_{m,j} = {θ_{m,j,1}, ..., θ_{m,j,k}, ..., θ_{m,j,K_j}}, where K_j is the number of topics associated with label j and θ_{m,j,k} is the interest weight of label j in the m-th multi-modal document on the k-th topic, k ∈ {1, 2, ..., K_j}, j ∈ Λ_m; each label can be associated with multiple topics, but each topic can only be assigned to one label;
step 3, modeling the text label topics and the visual label topics in the multi-modal documents;
step 3.1, determining the number of topics in the multi-modal documents as K;
step 3.2, defining the text probability distribution of the k-th topic under label j as φ^w_{j,k} = {φ^w_{j,k,1}, ..., φ^w_{j,k,t}, ..., φ^w_{j,k,T}}, which obeys a Dirichlet distribution with parameter β^w, where T is the number of distinct text words in the text content set and φ^w_{j,k,t} is the interest weight of the k-th topic under label j on the t-th text word;
step 3.3, defining the visual probability distribution of the k-th topic under label j as φ^v_{j,k} = {φ^v_{j,k,1}, ..., φ^v_{j,k,p}, ..., φ^v_{j,k,P}}, which obeys a Dirichlet distribution with parameter β^v, where P is the number of distinct visual words in the visual content set and φ^v_{j,k,p} is the interest weight of the k-th topic under label j on the p-th visual word;
step 4, establishing a multi-modal topic model based on label constraint;
step 4.1, defining the topic assignments of all text words in the m-th multi-modal document as z^w_m = {z^w_{m,1}, ..., z^w_{m,t}, ..., z^w_{m,N_m}}, where z^w_{m,t} is the topic assignment of the t-th text word in the m-th multi-modal document and obeys a multinomial distribution with parameter θ_{m,j}; θ_{m,j} and z^w_{m,t} form a Dirichlet-multinomial conjugate pair; defining λ_{m,t} = j to denote that the t-th text word w_{m,t} in the m-th multi-modal document belongs to label j;
step 4.2, defining the topic assignments of all visual words in the m-th multi-modal document as z^v_m = {z^v_{m,1}, ..., z^v_{m,p}, ..., z^v_{m,L_m}}, where z^v_{m,p} is the topic assignment of the p-th visual word in the m-th multi-modal document and obeys a multinomial distribution with parameter θ_{m,j}; θ_{m,j} and z^v_{m,p} form a Dirichlet-multinomial conjugate pair; defining λ_{m,p} = j to denote that the p-th visual word v_{m,p} in the m-th multi-modal document belongs to label j;
step 5, applying the collapsed Gibbs sampling method to learn the three interest weights θ_{m,j,k}, φ^w_{j,k,t} and φ^v_{j,k,p};
step 5.1, computing the joint probability distribution p(W, V, z^w, z^v, l | α, β^w, β^v) of the observed text words W, the unobserved labels l and the topic assignments z^w of the text content of all multi-modal documents using formula (1):

p(W, V, z^w, z^v, l | α, β^w, β^v) = p(W | z^w, l, β^w) · p(V | z^v, l, β^v) · p(z^w, z^v, l | α)   (1)

in formula (1), z^v denotes the topic assignments of the visual content of all multi-modal documents and α denotes a hyper-parameter;
step 5.1.1, computing the generation probability p(W | z^w, l, β^w) of all text words in the multi-modal documents using formula (2):

p(W | z^w, l, β^w) = ∏_{j=1}^{L} ∏_{k=1}^{K_j} Δ(n_{·,j,k} + β^w) / Δ(β^w)   (2)

in formula (2), n_{·,j,k,b} denotes the number of times text word b is generated by the k-th topic under label j, n_{·,j,k} = (n_{·,j,k,1}, ..., n_{·,j,k,T}), and β^w inside Δ(·) is understood as the T-dimensional symmetric vector; Δ is the operator such that for any K-dimensional vector X = (x_1, ..., x_K), Δ(X) = ∏_{k=1}^{K} Γ(x_k) / Γ(∑_{k=1}^{K} x_k), where x_k is the k-th component of X and Γ(·) is the gamma function;
step 5.1.2, computing the generation probability p(V | z^v, l, β^v) of all visual words in the multi-modal documents using formula (3):

p(V | z^v, l, β^v) = ∏_{j=1}^{L} ∏_{k=1}^{K_j} Δ(d_{·,j,k} + β^v) / Δ(β^v)   (3)

in formula (3), d_{·,j,k,c} denotes the number of times visual word c is generated by the k-th topic under label j, and d_{·,j,k} = (d_{·,j,k,1}, ..., d_{·,j,k,P});
step 5.1.3, computing the generation probability p(z^w, z^v, l | α) of the label topics of all multi-modal documents using formula (4):

p(z^w, z^v, l | α) = ∏_{m=1}^{M} Δ(n_m + d_m + α) / Δ(α)   (4)

in formula (4), n_{m,j,k,·} denotes the number of text words assigned to the k-th topic under label j in the m-th multi-modal document, d_{m,j,k,·} denotes the number of visual words assigned to the k-th topic under label j in the m-th multi-modal document, and n_m + d_m collects these counts over all label-topic pairs (j, k) with j ∈ Λ_m;
step 5.2, solving the probability that the t-th text word e in the m-th multi-modal document is assigned to the k-th topic under label j using formula (5):

p(λ_{m,t} = j, z^w_{m,t} = k | l^{¬(m,t)}, z^{w,¬(m,t)}, w_{m,t} = e, ·) ∝ I(j ∈ Λ_m) · (n^{¬(m,t)}_{·,j,k,e} + β^w) / (∑_{b=1}^{T} n^{¬(m,t)}_{·,j,k,b} + T·β^w) · (n^{¬(m,t)}_{m,j,k,·} + d_{m,j,k,·} + α)   (5)

in formula (5), ∝ denotes "proportional to" and I(·) denotes the indicator function; λ_{m,t} = j indicates that the label of the t-th text word in the m-th multi-modal document is j; z^w_{m,t} = k indicates that the topic assignment of the t-th text word in the m-th multi-modal document is k; l^{¬(m,t)} denotes the labels of all text words except the t-th text word in the m-th multi-modal document; z^{w,¬(m,t)} denotes the topic assignments of all text words except the t-th text word in the m-th multi-modal document; w_{m,t} = e means that the t-th text word in the m-th multi-modal document is e; n^{¬(m,t)}_{·,j,k,e} denotes the number of times text word e is generated by the k-th topic under label j, excluding the t-th text word of the m-th multi-modal document; n^{¬(m,t)}_{m,j,k,·} denotes the number of text words assigned to the k-th topic under label j in document m, excluding the t-th text word of the m-th multi-modal document; d_{m,j,k,·} denotes the number of visual words assigned to the k-th topic under label j in the m-th multi-modal document;
step 5.3, solving the probability that the p-th visual word f in the m-th multi-modal document is assigned to the k-th topic under label j using formula (6):

p(λ_{m,p} = j, z^v_{m,p} = k | l^{¬(m,p)}, z^{v,¬(m,p)}, v_{m,p} = f, ·) ∝ I(j ∈ Λ_m) · (d^{¬(m,p)}_{·,j,k,f} + β^v) / (∑_{c=1}^{P} d^{¬(m,p)}_{·,j,k,c} + P·β^v) · (n_{m,j,k,·} + d^{¬(m,p)}_{m,j,k,·} + α)   (6)

in formula (6), z^v_{m,p} = k indicates that the topic assignment of the p-th visual word in the m-th multi-modal document is k; l^{¬(m,p)} denotes the labels of all visual words except the p-th visual word in the m-th multi-modal document; z^{v,¬(m,p)} denotes the topic assignments of all visual words except the p-th visual word in the m-th multi-modal document; v_{m,p} = f means that the p-th visual word in the m-th multi-modal document is f; d^{¬(m,p)}_{·,j,k,f} denotes the number of times visual word f is generated by the k-th topic under label j, excluding the p-th visual word of the m-th multi-modal document; d^{¬(m,p)}_{m,j,k,·} denotes the number of visual words assigned to the k-th topic under label j in document m, excluding the p-th visual word of the m-th multi-modal document; n_{m,j,k,·} denotes the number of text words assigned to the k-th topic under label j in the m-th multi-modal document;
step 5.4, repeating step 5.2 and step 5.3 in a loop, assigning label topics to all text words and visual words in the multi-modal documents by the collapsed Gibbs sampling method until the iteration condition is met;
step 5.5, calculating the interest weight θ_{m,j,k} of the k-th topic under label j in the m-th multi-modal document using formula (7):

θ_{m,j,k} = (n_{m,j,k,·} + d_{m,j,k,·} + α) / ∑_{j'∈Λ_m} ∑_{k'=1}^{K_{j'}} (n_{m,j',k',·} + d_{m,j',k',·} + α)   (7)

step 5.6, calculating the interest weight φ^w_{j,k,e} of the k-th topic under label j on text word e using formula (8):

φ^w_{j,k,e} = (n_{·,j,k,e} + β^w) / (∑_{b=1}^{T} n_{·,j,k,b} + T·β^w)   (8)

in formula (8), n_{·,j,k,e} denotes the number of times text word e is generated by the k-th topic under label j;
step 5.7, calculating the interest weight φ^v_{j,k,f} of the k-th topic under label j on visual word f using formula (9):

φ^v_{j,k,f} = (d_{·,j,k,f} + β^v) / (∑_{c=1}^{P} d_{·,j,k,c} + P·β^v)   (9)

in formula (9), d_{·,j,k,f} denotes the number of times visual word f is generated by the k-th topic under label j;
and taking the topic distributions of the multi-modal documents, the text topic-word distributions and the visual topic-word distributions obtained from these interest weights as the topic mining result.
CN202110762186.4A 2021-07-06 2021-07-06 Multi-modal topic mining method based on label constraint Active CN113343679B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110762186.4A CN113343679B (en) 2021-07-06 2021-07-06 Multi-modal topic mining method based on label constraint

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110762186.4A CN113343679B (en) 2021-07-06 2021-07-06 Multi-modal topic mining method based on label constraint

Publications (2)

Publication Number Publication Date
CN113343679A true CN113343679A (en) 2021-09-03
CN113343679B CN113343679B (en) 2024-02-13

Family

ID=77482659

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110762186.4A Active CN113343679B (en) 2021-07-06 2021-07-06 Multi-mode subject mining method based on label constraint

Country Status (1)

Country Link
CN (1) CN113343679B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070050388A1 (en) * 2005-08-25 2007-03-01 Xerox Corporation Device and method for text stream mining
US20080319974A1 (en) * 2007-06-21 2008-12-25 Microsoft Corporation Mining geographic knowledge using a location aware topic model
US8630975B1 (en) * 2010-12-06 2014-01-14 The Research Foundation For The State University Of New York Knowledge discovery from citation networks
CN105005558A (en) * 2015-08-14 2015-10-28 武汉大学 Multi-modal data fusion method based on crowd sensing
CN105354280A (en) * 2015-10-30 2016-02-24 中国科学院自动化研究所 Social event tracking and evolving method based on social media platform
CN105760507A (en) * 2016-02-23 2016-07-13 复旦大学 Cross-modal subject correlation modeling method based on deep learning
KR20190008699A (en) * 2017-07-17 2019-01-25 경희대학교 산학협력단 Method, system and computer program for semantic image retrieval based on topic modeling
CN113051932A (en) * 2021-04-06 2021-06-29 合肥工业大学 Method for detecting category of network media event of semantic and knowledge extension topic model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
张志远; 杨宏敬; 赵越: "Topic text network construction method based on Gibbs sampling results" (基于吉布斯采样结果的主题文本网络构建方法), 计算机工程 (Computer Engineering), no. 06 *
赵臣升; 吴国文; 胡福玲: "Joint topic mining of microblogs based on comments and reposts" (基于评论与转发的微博联合主题挖掘), 智能计算机与应用 (Intelligent Computer and Applications), no. 01 *

Also Published As

Publication number Publication date
CN113343679B (en) 2024-02-13

Similar Documents

Publication Publication Date Title
CN107832663B (en) Multi-modal emotion analysis method based on quantum theory
CN107590177B (en) Chinese text classification method combined with supervised learning
CN111160037A (en) Fine-grained emotion analysis method supporting cross-language migration
CN111966917A (en) Event detection and summarization method based on pre-training language model
CN110969020A (en) CNN and attention mechanism-based Chinese named entity identification method, system and medium
EP3166020A1 (en) Method and apparatus for image classification based on dictionary learning
CN109086265B (en) Semantic training method and multi-semantic word disambiguation method in short text
CN106778878B (en) Character relation classification method and device
CN110825850B (en) Natural language theme classification method and device
CN111475622A (en) Text classification method, device, terminal and storage medium
CN110008365B (en) Image processing method, device and equipment and readable storage medium
Zhou et al. Comparing the interpretability of deep networks via network dissection
Patel et al. Dynamic lexicon generation for natural scene images
He et al. Deep learning in natural language generation from images
CN115965818A (en) Small sample image classification method based on similarity feature fusion
Dhar et al. Bengali news headline categorization using optimized machine learning pipeline
Phukan et al. An efficient technique for image captioning using deep neural network
Annisa et al. Analysis and Implementation of CNN in Real-time Classification and Translation of Kanji Characters
CN107291686B (en) Method and system for identifying emotion identification
CN113343679B (en) Multi-mode subject mining method based on label constraint
CN110674293A (en) Text classification method based on semantic migration
CN115906824A (en) Text fine-grained emotion analysis method, system, medium and computing equipment
CN115687576A (en) Keyword extraction method and device represented by theme constraint
Mousavi et al. Collaborative learning of semi-supervised clustering and classification for labeling uncurated data
Yang et al. Automatic metadata information extraction from scientific literature using deep neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant