CN112307746A - Social network user search intention processing system based on user aggregation topic model - Google Patents
- Publication number
- CN112307746A (application number CN202011344972.4A)
- Authority
- CN
- China
- Prior art keywords
- user
- distribution
- word
- search intention
- intention
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F40/216 — Handling natural language data; parsing using statistical methods
- G06F16/3344 — Information retrieval of unstructured textual data; query execution using natural language analysis
- G06F16/35 — Information retrieval of unstructured textual data; clustering; classification
- G06N3/044 — Neural networks; recurrent networks, e.g. Hopfield networks
- G06N3/045 — Neural networks; combinations of networks
- G06N3/047 — Neural networks; probabilistic or stochastic networks
- G06N3/08 — Neural networks; learning methods
- G06Q50/01 — Systems or methods specially adapted for specific business sectors; social networking
Abstract
The invention relates to a social network user search intention processing system based on a user aggregation topic model, comprising: an online social network data acquisition module for collecting network data from an online social network; a data preprocessing module for cleaning the network data to form a network data set; and a search intention acquisition module for establishing an online social network user aggregation topic model based on Dirichlet distributions and Gibbs sampling, processing the network data set to obtain the user search intention distribution, the followee search intention distribution and the word distribution of the user search intention, and aggregating the user intentions based on these two distributions to obtain the final social network user search intention. The system alleviates the sparsity of social network text, constructs a weighted representation of the user intention, processes social network users' search intentions, and improves the users' search experience.
Description
Technical Field
The invention belongs to the technical field of networks, and particularly relates to a social network user search intention processing system based on a user aggregation topic model.
Background
Social networks provide users with a lightweight, rapid communication environment in which they can propagate and share news events, daily chat, and updates on their life and work. When a user searches for relevant content on a social network, the system is expected to return the desired results and make recommendations based on the user's search intent. Existing research on social network user search intention processing mainly focuses on topic-model-based methods, user-clustering-based methods, and methods that comprehensively model a user's search intention from information such as the user's private data.
Conventional topic models are designed to model the semantics of standard news articles or other long documents; applied to social network text, which is semantically sparse and lacks word co-occurrence information, they cannot process users' search intentions well. Methods that comprehensively model search intentions from private user data such as search histories, access logs and click histories are another research hotspot, but they require specific data and depend heavily on that private data, which is difficult for researchers to obtain; they also ignore the relationships among social network words and the role of user attributes in intention understanding, so they cannot be applied universally to social network user search intention understanding. Clustering methods likewise do not consider the associations between words in social network text and neglect the influence of common words on the processing of users' search intentions.
Disclosure of Invention
In view of the above analysis, the present invention aims to disclose a social network user search intention processing system based on a user aggregation topic model, which solves the problems existing in the current user intention processing.
The invention discloses a social network user search intention processing system based on a user aggregation topic model, which comprises:
the online social network data acquisition module is used for collecting, by crawler technology, network data from an online social network, including user information, followee information and the user's online social content text;
the data preprocessing module is used for carrying out data cleaning on the network data to form a network data set;
the search intention acquisition module is used for establishing an online social network user aggregation topic model based on Dirichlet distributions and Gibbs sampling, and processing the network data set to obtain the user search intention distribution, the followee search intention distribution and the word distribution of the user search intention; and aggregating the user intentions based on the user search intention distribution and the followee search intention distribution to obtain the final social network user search intention.
Further, the search intention acquisition module comprises a topic model submodule, a prior parameter construction submodule and an intention aggregation submodule;
the topic model submodule comprises a topic-common word distribution model, a topic-word pair distribution model, a user-search intention distribution model, a user-followee search intention distribution model and a user-classification model, and is used for processing the network data set to obtain the user search intention distribution, the followee search intention distribution and the word distribution of the user search intention;
the prior parameter construction submodule is used for performing prior construction on the hyperparameters in the topic-word pair distribution model;
the intention aggregation submodule is used for performing user intention aggregation based on the user search intention distribution and the followee search intention distribution;
in the topic model sub-module,
processing the network data set based on the user-search intention distribution model to obtain user search intention distribution;
processing the network data set based on the user-followee search intention distribution model to obtain the followee search intention distribution;
and processing the network data set based on the topic-common word distribution model, the topic-word pair distribution model and the user-classification model to obtain the word distribution of the user search intention.
Further, the topic-common word distribution model conforms to a Dirichlet distribution with a first hyperparameter μ;
in the topic-word pair distribution model, one word w_i of the word pair (w_i, w_j) conforms to a Dirichlet distribution with a second hyperparameter γ_i, and the other word w_j conforms to a Dirichlet distribution with a third hyperparameter γ_j;
the user-search intention distribution model conforms to a Dirichlet distribution with a fourth hyperparameter α;
the user-followee search intention distribution model conforms to a Dirichlet distribution with a fifth hyperparameter β;
the user-classification model conforms to a Dirichlet distribution with a sixth hyperparameter η.
Further, the prior parameter construction submodule performs prior construction based on a recurrent neural network and the inverse document frequency to obtain the second hyperparameter γ_i and the third hyperparameter γ_j.
Further, the prior parameter construction sub-module comprises a Recurrent Neural Network (RNN) module, an inverse document frequency module, a word pair set construction module and a parameter construction module;
the recurrent neural network RNN module is used for learning words in the documents collected in the network data set through the recurrent neural network RNN to obtain the association probability of two associated words;
the inverse directionDocument frequency module for employing inverse document frequencyMeasuring the frequency of occurrence of each word; where | M | represents the total number of documents in the dataset, | Ml∈M:wi∈ml| representing the word wiThe number of occurrences in the document;
the word pair set construction module is used for constructing and extracting a word pair set C = {C_1, C_2, …, C_w, …, C_N} based on the output results of the recurrent neural network RNN module and the inverse document frequency module;
wherein each word pair C_w = (w_i, w_j) is scored from IDF_{w_i}, the inverse document frequency of the word w_i; IDF_{w_j}, the inverse document frequency of the word w_j; and o_t, the association probability of the associated words w_i and w_j learned by the recurrent neural network RNN; N is the total number of word pairs;
a parameter construction module for constructing the second hyperparameter γ_i and the third hyperparameter γ_j from the inverse document frequencies, scaled by a preset positive number.
Further, the hidden-layer activation function of the recurrent neural network in the recurrent neural network RNN module is the sigmoid function, and the output-layer activation function is the softmax function.
Further, for each word pair C_w ∈ C in the topic model submodule:
1) with the user search intention distribution θ_u output by the user-search intention distribution model as the parameter of a multinomial distribution, sample the intention assignment of the word pair: z_{u,C_w} ~ Multi(θ_u), where Multi denotes a multinomial distribution, z_{u,C_w} denotes the user's intention assignment, u denotes the user, and C_w denotes a word pair;
2) with the followee search intention distribution θ_e output by the user-followee search intention distribution model as the parameter of a multinomial distribution, sample the intention assignment of the word pair: z_{e,C_w} ~ Multi(θ_e), where z_{e,C_w} denotes the intention assignment of the user's followee and e denotes a followee;
3) for each word pair in the set C:
with the user classification distribution τ_u output by the user-classification model as the parameter of a Bernoulli distribution, sample the binary switch variable x ~ Bern(τ_u), where Bern denotes the Bernoulli distribution;
if x = 0, with the common word distribution φ_{z,b} output by the topic-common word distribution model as the parameter of a multinomial distribution, sample the two words w_i, w_j ~ Multi(φ_{z,b});
if x = 1, with the word distributions φ_{z,1} and φ_{z,2} output by the topic-word pair distribution model as the parameters of multinomial distributions, sample one word w_i ~ Multi(φ_{z,1}) and the other word w_j ~ Multi(φ_{z,2}).
Further, in the topic model submodule, Gibbs sampling is adopted to iteratively sample the established social network user aggregation topic model, obtaining the user search intention distribution, the followee intention distribution and the word distribution of the user.
Further, after Gibbs sampling iterative sampling, the topic model submodule outputs:
the user search intention distribution θ_{u,k} = (n_{u,k} + α) / (n_u + Kα);
the followee search intention distribution θ_{e,k} = (s_{u,k} + β) / (s_u + Kβ);
the word distribution of the user search intention φ_{k,v_i} = (n_{k,v_i} + γ) / (n_k + Vγ), i.e. φ_k = [φ_{k,v_1}, φ_{k,v_2}, …, φ_{k,v_i}, …, φ_{k,v_n}];
wherein n_{u,k} denotes the number of topic words assigned to topic k, n_u denotes the user's total number of words, s_u denotes the number of word pairs assigned to all topics of the user's followees, s_{u,k} denotes the number of word pairs assigned to topic k of the followees, and K denotes the number of topics in the data set; n_{k,v_i} denotes the number of times the word v_i in the word pair set C is assigned to topic word k; n_k denotes the total number of topic word assignments in the word pair set C, V denotes the number of all words in the documents, α and β are the fourth and fifth hyperparameters, and γ is the second hyperparameter γ_i or the third hyperparameter γ_j.
Further, the intention aggregation submodule aggregates the intentions by clustering to obtain the weight Ω of the user search intention, taken as the π-weighted combination Ω = π·θ_u + (1 − π)·θ_e, which is used to express the user's search intention; θ_u denotes the search intention distributions of all users, θ_e denotes the search intention distributions of all the users' followees, and π is the weight parameter.
The invention can realize at least one of the following beneficial effects:
the method aims at the problems that the current mainstream social network user search intention processing method needs specific privacy data and does not have universality;
the user search intention distribution is obtained by constructing a social network user aggregation topic model, the problem of sparsity of social network context is solved, modeling subject words and common words are distinguished, and social network word relation learning is realized; and (4) considering the user search intention distribution and the attention person intention distribution, constructing a user intention weight representation, and realizing the understanding and mining of the search intention of the social network user.
The social network user intention processing method can effectively understand and mine the search intention of the user under the condition that no available access log such as search history, click log and other data exists, and the performance is obviously improved.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, wherein like reference numerals are used to designate like parts throughout.
FIG. 1 is a schematic connection diagram of a social network user search intention processing system in the present embodiment;
FIG. 2 is a representation of an online social network user aggregation topic model in this embodiment;
fig. 3 is a structure diagram of the Elman RNN network in this embodiment.
Detailed Description
The preferred embodiments of the present invention will now be described in detail with reference to the accompanying drawings, which form a part hereof, and which together with the embodiments of the invention serve to explain the principles of the invention.
The embodiment discloses a social network user search intention processing system based on a user aggregation topic model, as shown in FIG. 1, comprising:
the online social network data acquisition module is used for collecting, by crawler technology, network data from an online social network, including user information, followee information and the user's online social content text;
specifically, the online social network data acquisition module crawls data from an online social network through web crawler software, for example from the Sina (Xinlang) Weibo microblog; the crawled data comprises information of microblog users, information of the users' followees, and the online social content text the users have published on the microblog platform.
The data preprocessing module is used for carrying out data cleaning on the network data to form a network data set;
specifically, the data preprocessing module is used for cleaning and processing the crawled data: deleting erroneous and redundant data and function words without specific content, and retaining only the backbone of the microblog content to form a network data set;
the data preprocessing module comprises an extraction unit, a word segmentation unit and a classification storage unit;
the extraction unit is used for extracting the user information, the user followee information and the user text content from the network data and removing garbled information from the text content.
The word segmentation unit is used for segmenting the cleaned text content into words, deleting erroneous and redundant words and function words without specific content so that only the backbone of the microblog content is retained, and deleting very short texts without specific meaning, such as "like" and "applause".
A classification storage unit, configured to classify and store the user data, the user followee data and the social content data to form a microblog text set M = {m_1, m_2, …, m_l, …, m_{N_d}}, a microblog user set U = {u_1, u_2, …} and a topic set Z = {z_1, z_2, …}.
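The classified storage can be pictured with a minimal container type; the class and field names below are hypothetical and only mirror the sets M, U and Z described above.

```python
from dataclasses import dataclass, field

@dataclass
class SocialDataset:
    """Hypothetical classified store: microblog texts M, users U with followee
    lists, and topic ids Z (names illustrative only)."""
    M: list[list[str]] = field(default_factory=list)        # tokenized microblogs m_1..m_Nd
    U: dict[str, list[str]] = field(default_factory=dict)   # user id -> followee ids
    Z: list[int] = field(default_factory=list)              # topic ids z_1..z_K

ds = SocialDataset()
ds.M.append(["weather", "beijing", "sunny"])   # one cleaned, segmented microblog
ds.U["u1"] = ["u2", "u3"]                      # user u1 follows u2 and u3
ds.Z = list(range(10))                         # K = 10 topics
```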
The search intention acquisition module is used for establishing an online social network user aggregation topic model based on Dirichlet distributions and Gibbs sampling, and processing the network data set to obtain the user search intention distribution, the followee search intention distribution and the word distribution of the user search intention; and aggregating the user intentions based on the user search intention distribution and the followee search intention distribution to obtain the final social network user search intention.
Specifically, the search intention acquisition module comprises a topic model submodule, a prior parameter construction submodule and an intention aggregation submodule;
the topic model submodule comprises a topic-common word distribution model, a topic-word pair distribution model, a user-search intention distribution model, a user-followee search intention distribution model and a user-classification model, and is used for processing the network data set to obtain the user search intention distribution, the followee search intention distribution and the word distribution of the user search intention;
the prior parameter construction submodule is used for performing prior construction on the hyperparameters in the topic-word pair distribution model;
the intention aggregation submodule is used for performing user intention aggregation based on the user search intention distribution and the followee search intention distribution;
in the topic model sub-module,
processing the network data set based on the user-search intention distribution model to obtain user search intention distribution;
processing the network data set based on the user-followee search intention distribution model to obtain the followee search intention distribution;
and processing the network data set based on the topic-common word distribution model, the topic-word pair distribution model and the user-classification model to obtain the word distribution of the user search intention.
More specifically,
the topic-common word distribution model conforms to a Dirichlet distribution with the first hyperparameter μ; that is, for each topic z, the obtained common word distribution of the microblog is φ_{z,b} ~ Dir(μ), where b denotes a common word;
in the topic-word pair distribution model, one word w_i of the word pair (w_i, w_j) conforms to a Dirichlet distribution with the second hyperparameter γ_i, and the other word w_j conforms to a Dirichlet distribution with the third hyperparameter γ_j; that is, for each topic z, one word distribution of the microblog word pair distribution is φ_{z,1} ~ Dir(γ_i) and the other is φ_{z,2} ~ Dir(γ_j);
the user-search intention distribution model conforms to a Dirichlet distribution with the fourth hyperparameter α; that is, for each user u, the resulting user search intention distribution is θ_u ~ Dir(α);
the user-followee search intention distribution model conforms to a Dirichlet distribution with the fifth hyperparameter β; that is, for each user u, the resulting search intention distribution of the user's followees is θ_e ~ Dir(β);
the user-classification model conforms to a Dirichlet distribution with the sixth hyperparameter η; that is, for each user u, the resulting classification distribution of the user is τ_u ~ Dir(η).
In particular, the online social network user aggregation topic model representation is shown in fig. 2.
Here the first hyperparameter μ, the fourth hyperparameter α, the fifth hyperparameter β and the sixth hyperparameter η can adopt conventional Dirichlet distribution hyperparameter values, such as 0.1 or 0.01.
The prior parameter construction submodule performs prior construction based on a recurrent neural network and the inverse document frequency to obtain the second hyperparameter γ_i and the third hyperparameter γ_j, so that the model learns more consistent user search intentions.
Specifically, the prior parameter construction sub-module comprises a Recurrent Neural Network (RNN) module, an inverse document frequency module, a word pair set construction module and a parameter construction module;
the recurrent neural network RNN module is used for learning words in the documents collected in the network data set through the recurrent neural network RNN to obtain the association probability of two associated words;
preferably, a network structure for learning relationships between words using Elman RNN is shown in fig. 3.
In FIG. 3, w_t denotes the current word vector at time t and T denotes the size of the vector; H_t denotes a hidden unit and o_t denotes the output unit at time t. x_t = [w_t, h_{t−1}] denotes the input layer, where h_{t−1} is the hidden state at time t−1. The hidden unit and the output unit are computed by equations (1) and (2):
H_t = δ(U x_t) (1)
o_t = g(V H_t) (2)
where U and V are parameter matrices, δ(·) denotes the sigmoid function δ(x) = 1 / (1 + e^{−x}) as shown in equation (3), and g(·) is the softmax function g(z_i) = e^{z_i} / Σ_j e^{z_j} as shown in equation (4).
In the output result, o_t denotes the association probability of the word pair (w_{j,1}, w_{j,2}), expressed by equation (5):
o_t = P(w_{j,2} | w_{j,1}, h_{t−1}) (5)
that is, o_t denotes the probability that w_{j,2} occurs given w_{j,1}. Since the hidden units H_t and H_{t−1} retain all previous words, the association of previous words with the current word can be learned through the properties of the recurrent neural network RNN.
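A minimal NumPy sketch of one Elman step, assuming one-hot word vectors and randomly initialized parameter matrices U and V (training is omitted):

```python
import numpy as np

rng = np.random.default_rng(1)

V, H = 50, 16   # vocabulary size and hidden size (illustrative)
U_mat = rng.normal(0, 0.1, size=(H, V + H))  # input->hidden parameters, eq. (1)
V_mat = rng.normal(0, 0.1, size=(V, H))      # hidden->output parameters, eq. (2)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    e = np.exp(a - a.max())   # shift for numerical stability
    return e / e.sum()

def elman_step(w_t, h_prev):
    """One Elman RNN step: x_t = [w_t, h_{t-1}], H_t = sigmoid(U x_t),
    o_t = softmax(V H_t), interpreted as P(next word | current word, h_{t-1})."""
    x_t = np.concatenate([w_t, h_prev])
    h_t = sigmoid(U_mat @ x_t)
    o_t = softmax(V_mat @ h_t)
    return h_t, o_t

w = np.zeros(V); w[3] = 1.0                  # one-hot current word
h, o = elman_step(w, np.zeros(H))
# o[j] plays the role of the association probability in eq. (5)
```

Because `h` is fed back into the next call, the hidden state accumulates information about all previous words, which is the property the patent relies on for learning word associations.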
The inverse document frequency module is used for measuring how rarely each word occurs with the inverse document frequency IDF_{w_i} = log(|M| / |{m_l ∈ M : w_i ∈ m_l}|), where |M| denotes the total number of documents in the data set and |{m_l ∈ M : w_i ∈ m_l}| denotes the number of documents containing the word w_i.
The word pair set construction module is used for constructing and extracting a word pair set C = {C_1, C_2, …, C_w, …, C_N} based on the output results of the recurrent neural network RNN module and the inverse document frequency module;
wherein each word pair C_w = (w_i, w_j) is scored from IDF_{w_i}, the inverse document frequency of the word w_i; IDF_{w_j}, the inverse document frequency of the word w_j; and o_t, the association probability of the associated words w_i and w_j learned by the recurrent neural network RNN; N is the total number of word pairs.
The parameter construction module constructs the second hyperparameter γ_i and the third hyperparameter γ_j from the inverse document frequencies, scaled by a preset positive number.
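Since the exact combination formulas for the word pair scores and for γ_i, γ_j are not legible in the source, the sketch below uses one plausible reading: score each RNN-associated pair by o_t · IDF_{w_i} · IDF_{w_j}, and set each γ proportional to the word's IDF through a preset positive number λ. Both choices are assumptions for illustration.

```python
import math

# Toy corpus: each document is a list of tokens.
M = [["weather", "beijing", "sunny"],
     ["weather", "rain", "umbrella"],
     ["beijing", "travel", "food"]]

def idf(word, docs):
    # IDF_w = log(|M| / |{m in M : w in m}|), as in the inverse document frequency module
    df = sum(1 for d in docs if word in d)
    return math.log(len(docs) / df)

# o_t: association probabilities assumed to come from the Elman RNN (stubbed here)
rnn_assoc = {("weather", "rain"): 0.6, ("beijing", "travel"): 0.4}

# Word pair set C: score each candidate pair by combining o_t with both IDFs
# (the exact combination in the patent is not legible; a product is one plausible choice)
C = {pair: o * idf(pair[0], M) * idf(pair[1], M) for pair, o in rnn_assoc.items()}

lam = 0.1  # preset positive number used in the hyperparameter construction (assumed role)
gamma = {w: lam * idf(w, M) for w in {x for p in C for x in p}}  # gamma_i / gamma_j sketch
```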
More specifically, for each word pair C_w ∈ C in the topic model submodule, the user intention assignment of the word pair, the followee intention assignment and the multinomial sampling of each word in the word pair are performed:
1) with the user search intention distribution θ_u output by the user-search intention distribution model as the parameter of a multinomial distribution, sample the intention assignment of the word pair: z_{u,C_w} ~ Multi(θ_u), where Multi denotes a multinomial distribution, z_{u,C_w} denotes the user's intention assignment, u denotes the user, and C_w denotes a word pair;
2) with the followee search intention distribution θ_e output by the user-followee search intention distribution model as the parameter of a multinomial distribution, sample the intention assignment of the word pair: z_{e,C_w} ~ Multi(θ_e), where z_{e,C_w} denotes the intention assignment of the user's followee and e denotes a followee;
3) for each word pair in the set C:
with the user classification distribution τ_u output by the user-classification model as the parameter of a Bernoulli distribution, sample the binary switch variable x ~ Bern(τ_u), where Bern denotes the Bernoulli distribution;
if x = 0, with the common word distribution φ_{z,b} output by the topic-common word distribution model as the parameter of a multinomial distribution, sample the two words w_i, w_j ~ Multi(φ_{z,b});
if x = 1, with the word distributions φ_{z,1} and φ_{z,2} output by the topic-word pair distribution model as the parameters of multinomial distributions, sample one word w_i ~ Multi(φ_{z,1}) and the other word w_j ~ Multi(φ_{z,2}).
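The per-word-pair generative steps 1)–3) can be simulated as follows. The sizes, hyperparameter values, and the simplification of using the user's topic assignment z_u when sampling both words are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

K, V = 5, 20
theta_u = rng.dirichlet([0.1] * K)         # user search intention distribution
theta_e = rng.dirichlet([0.1] * K)         # followee search intention distribution
tau_u = rng.dirichlet([0.1] * 2)           # user classification distribution
phi_b = rng.dirichlet([0.01] * V, size=K)  # topic-common word distributions
phi_1 = rng.dirichlet([0.1] * V, size=K)   # topic-word pair distributions (first word)
phi_2 = rng.dirichlet([0.1] * V, size=K)   # topic-word pair distributions (second word)

def generate_word_pair():
    z_u = rng.choice(K, p=theta_u)   # 1) intention assignment from theta_u
    z_e = rng.choice(K, p=theta_e)   # 2) followee intention assignment from theta_e
    x = rng.binomial(1, tau_u[1])    # 3) binary switch x ~ Bern(tau_u)
    z = z_u                          # topic used for the words (simplifying assumption)
    if x == 0:                       # common words: both sampled from phi_{z,b}
        wi, wj = rng.choice(V, p=phi_b[z]), rng.choice(V, p=phi_b[z])
    else:                            # topic words: sampled from phi_{z,1} and phi_{z,2}
        wi, wj = rng.choice(V, p=phi_1[z]), rng.choice(V, p=phi_2[z])
    return z_u, z_e, x, wi, wj

pairs = [generate_word_pair() for _ in range(3)]
```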
Further, in the topic model submodule, Gibbs sampling is adopted to iteratively sample the established social network user aggregation topic model, obtaining the user search intention distribution, the followee intention distribution and the word distribution of the user.
Unknown parameters in the social network User Aggregation Topic Model (UATM) can be derived using Gibbs sampling. The core of Gibbs sampling is iterative sampling of the hidden variables based on a prior estimate. In the sampling process, the user search intention distribution θ_u, the search intention distribution θ_e of the followees and the word distribution φ of the user need to be integrated out; the microblog set M, the topic assignments Z and the switch variable x are sampled iteratively, with the topic z sampled from the conditional distribution derived below.
For all users, n_{u,b} denotes the number of common words and n_{u,k} denotes the number of topic words; n_{v,b} denotes the number of times the word v is assigned as a common word, n_{k,v} denotes the number of times a word pair in C is assigned to topic word k, and n_{u,k} denotes the number of microblogs assigned to topic z. Note that n_u = n_{u,b} + n_{u,k}, and s_{u,z} is the number of word pairs assigned to the topics of the user's followees.
The hidden variables can be derived by equation (6), where Γ(x) denotes the gamma function and π is a weight parameter for adjusting the weighted expression of the user's and the followees' search intentions. Based on the joint distribution and the chain rule, the conditional probability distribution shown in equation (7) can be obtained,
where −i denotes statistical counts that exclude the i-th microblog; Φ is the set of all user search intention distributions; Θ is the set of search intention distributions of all the users' followees; Ψ is the set of word distributions in the data set.
After the conditional probability distribution is obtained, the topic z_{d,i} is sampled directly using the chain rule, and deriving the switch variable x gives the results shown in equations (8) and (9),
where −j denotes counts that exclude the j-th word and w_i denotes the i-th word in the microblog document.
In the initial state of Gibbs sampling, the hidden variables are sampled according to equations (8) and (9). After sufficient iterations are completed, the user search intention distribution, the intention distribution of the user's followees and the word distribution of the user output by the topic model submodule are as shown in equations (10), (11), (12) and (13).
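The printed equations (10)–(14) are not legible in this extraction; under the count definitions above, the standard Dirichlet-multinomial posterior-mean estimates would take the following form (an assumed reconstruction, not the patent's verbatim formulas):

```python
import numpy as np

# Illustrative counts consistent with the definitions around equations (10)-(14).
K, V = 3, 6
alpha, beta, gamma = 0.1, 0.1, 0.1
n_uk = np.array([4, 1, 0])     # user u's topic words assigned to each topic k
n_u = n_uk.sum()               # user's total number of (topic) words
s_uk = np.array([2, 2, 1])     # word pairs assigned to each followee topic
s_u = s_uk.sum()
n_kv = np.array([[3, 1, 0, 0, 1, 0],
                 [0, 2, 2, 0, 0, 1],
                 [1, 0, 0, 2, 0, 0]])  # times word v assigned to topic k in pair set C
n_k = n_kv.sum(axis=1)

# Assumed form of eqs. (10)-(13): smoothed normalized counts.
theta_u = (n_uk + alpha) / (n_u + K * alpha)   # user search intention distribution
theta_e = (s_uk + beta) / (s_u + K * beta)     # followee search intention distribution
phi = (n_kv + gamma) / (n_k[:, None] + V * gamma)  # rows are phi_k of eq. (14)
```

Each estimate is a proper probability distribution by construction, matching the roles the text assigns to θ_u, θ_e and φ_k.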
based on equation (11) and equation (12), the word distribution of the user search intention is obtained, as shown in equation (14):
φk=[φk,v1,φk,v2,,...,φk,vi,...,φk,vn] (14)
Specifically, the intention aggregation submodule constructs a weight representation Ω of the user search intention based on the user search intention and the followee search intention to jointly mine the user's search intention, calculated as the π-weighted combination shown in equation (15):
Ω = π·θ_u + (1 − π)·θ_e (15)
where θ_u denotes the search intention distributions of all users, θ_e denotes the search intention distributions of all the users' followees, and π is the weight parameter.
The final search intention of the social network user is obtained from the user search intention distribution by clustering. Thus, an operator of the social network can provide online social content according to the user's search intention and the word distribution of the search intention, shortening the user's search time and improving the user experience.
In summary, the embodiment addresses the problems that current mainstream social network user search intention processing methods require specific private data and lack universality. By constructing a social network user aggregation topic model, the user search intention distribution is obtained, the sparsity of social network text is alleviated, topic words and common words are modeled separately, and word relationships in social networks are learned; by jointly considering the user search intention distribution and the followee intention distribution, a weighted representation of the user intention is constructed, realizing the understanding and mining of social network users' search intentions. The user's search intention can be effectively understood and mined even when no access logs such as search histories or click logs are available, with significantly improved performance.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.
Claims (10)
1. A social network user search intent processing system based on a user aggregate topic model, comprising:
the online social network data acquisition module is used for acquiring, online from a social network by crawler technology, network data including user information, followee information and the users' online social content text;
the data preprocessing module is used for carrying out data cleaning on the network data to form a network data set;
the search intention acquisition module is used for establishing an online social network user aggregation topic model based on Dirichlet distributions and Gibbs sampling, and for processing the network data set to obtain the user search intention distribution, the followee search intention distribution and the word distribution of the user search intention; and for aggregating user intentions based on the user search intention distribution and the followee search intention distribution to obtain the final social network user search intention.
2. The social network user search intention processing system of claim 1, wherein the search intention acquisition module comprises a topic model sub-module, a prior parameter construction sub-module, and an intention aggregation sub-module;
the topic model sub-module comprises a topic-common word distribution model, a topic-word pair distribution model, a user-search intention distribution model, a user-followee search intention distribution model and a user-classification model, and is used for processing the network data set to obtain the user search intention distribution, the followee search intention distribution and the word distribution of the user search intention;
the prior parameter construction sub-module is used for performing prior construction on the hyper-parameters in the topic-word pair distribution model;
the intention aggregation sub-module is used for performing user intention aggregation based on the user search intention distribution and the followee search intention distribution;
in the topic model sub-module,
the network data set is processed based on the user-search intention distribution model to obtain the user search intention distribution;
the network data set is processed based on the user-followee search intention distribution model to obtain the followee search intention distribution;
and the network data set is processed based on the topic-common word distribution model, the topic-word pair distribution model and the user-classification model to obtain the word distribution of the user search intention.
3. The social network user search intent processing system of claim 2,
the topic-common word distribution model conforms to a Dirichlet distribution with a first hyper-parameter μ;
in the topic-word pair distribution model, one word w_i of a word pair (w_i, w_j) conforms to a Dirichlet distribution with a second hyper-parameter γ_i, and the other word w_j conforms to a Dirichlet distribution with a third hyper-parameter γ_j;
the user-search intention distribution model conforms to a Dirichlet distribution with a fourth hyper-parameter α;
the user-followee search intention distribution model conforms to a Dirichlet distribution with a fifth hyper-parameter β;
the user-classification model conforms to a Dirichlet distribution with a sixth hyper-parameter η.
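The Dirichlet priors of claim 3 can be sketched directly with NumPy. The topic count, vocabulary size and symmetric hyper-parameter values below are illustrative assumptions, not values from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)
K, V = 5, 100   # number of topics (intentions) and vocabulary size (assumed)

# Symmetric stand-ins for the hyper-parameters mu, alpha, beta, eta of claim 3
mu, alpha, beta, eta = 0.1, 0.1, 0.1, 0.5

phi_common = rng.dirichlet([mu] * V, size=K)   # topic-common word distributions
theta_u    = rng.dirichlet([alpha] * K)        # one user's search-intention distribution
theta_e    = rng.dirichlet([beta] * K)         # one followee's intention distribution
tau_u      = rng.dirichlet([eta] * 2)          # user-classification (switch) distribution
```

Each draw is a probability vector, so every row of `phi_common` and each of the θ and τ vectors sums to 1.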
4. The social network user search intention processing system of claim 3, wherein the prior parameter construction sub-module derives the second hyper-parameter γ_i and the third hyper-parameter γ_j by performing prior construction based on a recurrent neural network and inverse document frequency.
5. The social network user search intention processing system of claim 4, wherein the a priori parameter construction sub-module comprises a Recurrent Neural Network (RNN) module, an inverse document frequency module, a word pair set construction module, and a parameter construction module;
the recurrent neural network RNN module is used for learning words in the documents collected in the network data set through the recurrent neural network RNN to obtain the association probability of two associated words;
the inverse document frequency module is used for measuring the occurrence frequency of each word by the inverse document frequency IDF_{w_i} = log(|M| / |{m_l ∈ M : w_i ∈ m_l}|), where |M| represents the total number of documents in the data set, and |{m_l ∈ M : w_i ∈ m_l}| represents the number of documents containing the word w_i;
the word pair set construction module is used for constructing and extracting a word pair set C = {C_1, C_2, …, C_w, …, C_N} based on the output results of the recurrent neural network RNN module and the inverse document frequency module;
wherein IDF_{w_i} is the inverse document frequency of the word w_i, IDF_{w_j} is the inverse document frequency of the word w_j, o_t is the association probability of the associated words w_i and w_j learned by the recurrent neural network RNN, and N is the total number of word pairs;
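A sketch of the IDF measure and the word-pair selection of claim 5. The IDF definition follows the formula above; the thresholds and the AND-combination of the IDF and association-probability criteria are assumptions, since the claim does not reproduce the selection formula:

```python
import math

def idf(word, documents):
    """IDF(w) = log(|M| / |{m in M : w in m}|): |M| is the corpus size and
    the denominator counts the documents that contain the word."""
    n = sum(1 for d in documents if word in d)
    return math.log(len(documents) / n) if n else 0.0

def build_word_pair_set(pair_scores, documents, idf_min=0.3, assoc_min=0.5):
    """Keep a candidate pair (w_i, w_j) only if the RNN association
    probability o_t and both words' IDF values clear a threshold.
    Thresholds and combination rule are illustrative assumptions."""
    return [(wi, wj) for (wi, wj), o_t in pair_scores.items()
            if o_t >= assoc_min
            and idf(wi, documents) >= idf_min
            and idf(wj, documents) >= idf_min]

docs = [{"social", "network"}, {"topic", "model"}, {"social", "search"}]
pairs = {("topic", "model"): 0.9,     # strongly associated, rare words: kept
         ("social", "network"): 0.4}  # association probability too low: dropped
C = build_word_pair_set(pairs, docs)
```

Words that appear in fewer documents get a higher IDF, so rare, strongly associated pairs survive the filter.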
6. The social network user search intention processing system of claim 5, wherein the hidden-layer activation function of the recurrent neural network in the recurrent neural network RNN module is a sigmoid function, and the output-layer activation function is a softmax function.
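A minimal sketch of one step of the RNN described in claim 6, with a sigmoid hidden layer and a softmax output layer. The weight names and layer sizes are illustrative; the patent's actual architecture and training procedure are not reproduced here:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy):
    """One recurrence step: sigmoid hidden-layer activation (claim 6),
    softmax output layer producing a probability vector."""
    h_t = 1.0 / (1.0 + np.exp(-(W_xh @ x_t + W_hh @ h_prev)))  # sigmoid hidden layer
    logits = W_hy @ h_t
    e = np.exp(logits - logits.max())                          # stable softmax
    y_t = e / e.sum()
    return h_t, y_t

rng = np.random.default_rng(0)
d_in, d_h, d_out = 4, 3, 5                     # illustrative layer sizes
h, y = rnn_step(rng.normal(size=d_in), np.zeros(d_h),
                rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)),
                rng.normal(size=(d_out, d_h)))
```

The softmax output can be read as the association probabilities o_t used by the word-pair set construction module.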
7. The social network user search intention processing system of claim 5, wherein, in the topic model sub-module, for each word pair C_w ∈ C of the word pair set:
1) with the user search intention distribution θ_u output by the user-search intention distribution model as the parameter of a multinomial distribution, the intention assignment of the word pair is sampled from that multinomial distribution: z_{u,C_w} ~ Multi(θ_u), where Multi denotes a multinomial distribution, z_{u,C_w} denotes the user's intention assignment, u denotes the user, and C_w denotes the word pair;
2) with the followee search intention distribution output by the user-followee search intention distribution model as the parameter of a multinomial distribution, the intention assignment of the word pair is sampled: z_{e,C_w}, where z_{e,C_w} denotes the intention assignment of the user's followee and e denotes a followee;
3) for each word in the word pair set C:
with the user classification distribution τ_u output by the user-classification model as the parameter of a Bernoulli distribution, a binary switch variable is sampled: x ~ Bern(τ_u), where Bern denotes a Bernoulli distribution;
if x = 0, with the common word distribution φ_{z,b} output by the topic-common word distribution model as the parameter of a multinomial distribution, the two words are sampled separately: w_i, w_j ~ Multi(φ_{z,b});
if x = 1, with the word distributions φ_{z,1} and φ_{z,2} output by the topic-word pair distribution model as the parameters of multinomial distributions, one word is sampled as w_i ~ Multi(φ_{z,1}) and the other word as w_j ~ Multi(φ_{z,2}).
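The sampling steps of claim 7 can be sketched as one generative draw per word pair. The uniform toy distributions are placeholders, and drawing both the intention z and the switch x once per pair is our reading of the claim:

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_word_pair(theta_u, tau_u, phi_common, phi_pair1, phi_pair2):
    """One generative step: draw an intention z for the pair, flip the
    Bernoulli switch x, then draw (w_i, w_j) from the common-word
    distribution (x == 0) or the two topic-word-pair distributions (x == 1)."""
    z = rng.choice(len(theta_u), p=theta_u)   # z_{u,C_w} ~ Multi(theta_u)
    x = rng.binomial(1, tau_u)                # x ~ Bern(tau_u)
    if x == 0:
        wi = rng.choice(len(phi_common[z]), p=phi_common[z])
        wj = rng.choice(len(phi_common[z]), p=phi_common[z])
    else:
        wi = rng.choice(len(phi_pair1[z]), p=phi_pair1[z])
        wj = rng.choice(len(phi_pair2[z]), p=phi_pair2[z])
    return z, x, wi, wj

K, V = 3, 6                                   # toy topic and vocabulary sizes
theta_u = np.full(K, 1.0 / K)                 # uniform placeholder distributions
phi = np.full((K, V), 1.0 / V)
z, x, wi, wj = sample_word_pair(theta_u, 0.5, phi, phi, phi)
```

In a real Gibbs sweep these draws would be conditioned on all other assignments rather than on fixed placeholder distributions.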
8. The system of any one of claims 1 to 7, wherein, in the topic model sub-module, Gibbs sampling is used to iteratively sample the established social network user aggregation topic model, so as to obtain the user search intention distribution, the intention distribution of the user's followees, and the word distribution of the user search intention.
9. The social network user search intention processing system of claim 8, wherein, after the Gibbs sampling iterations, the topic model sub-module outputs:
the word distribution of the user search intention φ_k = [φ_{k,v_1}, φ_{k,v_2}, …, φ_{k,v_i}, …, φ_{k,v_n}];
wherein n_{u,k} is the number of the user's topic words assigned to topic k; n_u represents the total number of the user's words; s_u represents the number of word pairs assigned to all of the user's topics; s_{u,k} represents the number of word pairs assigned to the topics the user follows; K represents the number of topics in the data set; n_{k,v_i} represents the number of times the word v_i in the word pair set C is assigned to the topic; n_k represents the total number of topic assignments of words in the word pair set C; V represents the number of all words in the documents; α and β are the fourth and fifth hyper-parameters; and γ is the second hyper-parameter γ_i or the third hyper-parameter γ_j.
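The claim's estimator formulas are not reproduced in this text. A sketch of the count-to-distribution step assuming the usual LDA-style smoothed estimates built from the Gibbs counts and hyper-parameters named above (the patent's exact estimators may differ):

```python
import numpy as np

def estimate_distributions(n_uk, n_kv, alpha=0.1, gamma=0.01):
    """Smoothed posterior estimates from Gibbs-sampling counts, in the
    standard LDA-style form assumed here:
      theta_{u,k} = (n_{u,k} + alpha) / (n_u + K * alpha)
      phi_{k,v}   = (n_{k,v} + gamma) / (n_k + V * gamma)"""
    n_uk = np.asarray(n_uk, dtype=float)   # per-user topic-assignment counts
    n_kv = np.asarray(n_kv, dtype=float)   # per-topic word-assignment counts
    K, V = n_kv.shape
    theta = (n_uk + alpha) / (n_uk.sum() + K * alpha)
    phi = (n_kv + gamma) / (n_kv.sum(axis=1, keepdims=True) + V * gamma)
    return theta, phi

# Toy counts: 3 topics, 4 vocabulary words
theta, phi = estimate_distributions([3, 1, 0],
                                    [[2, 1, 0, 0], [0, 0, 1, 1], [1, 0, 0, 0]])
```

Smoothing by α and γ keeps every topic and word at non-zero probability even when its count is zero.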
10. The social network user search intention processing system of claim 9, wherein the intention aggregation sub-module performs aggregation to obtain the weight Ω of the user search intention, the weight Ω being used for expressing the user's search intention; θ_u is the search intention distribution of all users, the followee distribution is the search intention distribution of all the users' followees, and π is a weight parameter.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011344972.4A CN112307746B (en) | 2020-11-25 | 2020-11-25 | Social network user search intention processing system based on user aggregation topic model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112307746A true CN112307746A (en) | 2021-02-02 |
CN112307746B CN112307746B (en) | 2021-08-17 |
Family
ID=74487813
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011344972.4A Active CN112307746B (en) | 2020-11-25 | 2020-11-25 | Social network user search intention processing system based on user aggregation topic model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112307746B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130144854A1 (en) * | 2011-12-06 | 2013-06-06 | Microsoft Corporation | Modeling actions for entity-centric search |
US20170351772A1 (en) * | 2011-12-06 | 2017-12-07 | Microsoft Technology Licensing, Llc | Modeling actions for entity-centric search |
CN105830065A (en) * | 2013-12-19 | 2016-08-03 | 脸谱公司 | Generating recommended search queries on online social networks |
US20180032930A1 (en) * | 2015-10-07 | 2018-02-01 | 0934781 B.C. Ltd | System and method to Generate Queries for a Business Database |
CN108536868A (en) * | 2018-04-24 | 2018-09-14 | 北京慧闻科技发展有限公司 | The data processing method of short text data and application on social networks |
CN108921413A (en) * | 2018-06-22 | 2018-11-30 | 郑州大学 | A kind of social networks degree of belief calculation method based on user intention |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Priyadarshini et al. | A novel LSTM–CNN–grid search-based deep neural network for sentiment analysis | |
US10891321B2 (en) | Systems and methods for performing a computer-implemented prior art search | |
Xiaomei et al. | Microblog sentiment analysis with weak dependency connections | |
CN110704640A (en) | Representation learning method and device of knowledge graph | |
Wu et al. | Personalized microblog sentiment classification via multi-task learning | |
CN112307762B (en) | Search result sorting method and device, storage medium and electronic device | |
CN110807101A (en) | Scientific and technical literature big data classification method | |
CN109992784B (en) | Heterogeneous network construction and distance measurement method fusing multi-mode information | |
Dinh et al. | A proposal of deep learning model for classifying user interests on social networks | |
Devika et al. | A semantic graph-based keyword extraction model using ranking method on big social data | |
Liu | Research on deep learning-based algorithm and model for personalized recommendation of resources | |
Savelev et al. | The high-level overview of social media content search engine | |
Wu et al. | A novel topic clustering algorithm based on graph neural network for question topic diversity | |
CN116843162B (en) | Contradiction reconciliation scheme recommendation and scoring system and method | |
Yarushkina et al. | Intelligent instrumentation for opinion mining in social media | |
Wang et al. | Emotional contagion-based social sentiment mining in social networks by introducing network communities | |
Qiu et al. | CLDA: An effective topic model for mining user interest preference under big data background | |
CN112231476A (en) | Improved graph neural network scientific and technical literature big data classification method | |
CN112307746B (en) | Social network user search intention processing system based on user aggregation topic model | |
CN111859955A (en) | Public opinion data analysis model based on deep learning | |
Huang | Research on sentiment classification of tourist destinations based on convolutional neural network | |
Kamel et al. | Robust sentiment fusion on distribution of news | |
Jasim et al. | Analyzing Social Media Sentiment: Twitter as a Case Study | |
CN112364260A (en) | Social network user intention processing method | |
Dritsas et al. | Aspect-based community detection of cultural heritage streaming data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||