CN112307746A - Social network user search intention processing system based on user aggregation topic model - Google Patents


Info

Publication number
CN112307746A
CN112307746A
Authority
CN
China
Prior art keywords
user
distribution
word
search intention
intention
Prior art date
Legal status (assumed; not a legal conclusion)
Granted
Application number
CN202011344972.4A
Other languages
Chinese (zh)
Other versions
CN112307746B (en)
Inventor
石磊
费廷伟
崔斌
段正轩
潘菁菁
Current Assignee (the listed assignees may be inaccurate)
Beijing Jinghang Computing Communication Research Institute
Original Assignee
Beijing Jinghang Computing Communication Research Institute
Priority date (assumed; not a legal conclusion)
Filing date
Publication date
Application filed by Beijing Jinghang Computing Communication Research Institute filed Critical Beijing Jinghang Computing Communication Research Institute
Priority to CN202011344972.4A
Publication of CN112307746A
Application granted
Publication of CN112307746B
Status: Active

Classifications

    • G06F 40/216: Parsing using statistical methods (handling natural language data)
    • G06F 16/3344: Query execution using natural language analysis (information retrieval of unstructured textual data)
    • G06F 16/35: Clustering; Classification
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Combinations of networks
    • G06N 3/047: Probabilistic or stochastic networks
    • G06N 3/08: Learning methods
    • G06Q 50/01: Social networking

Abstract

The invention relates to a social network user search intention processing system based on a user aggregation topic model, which comprises: an online social network data acquisition module for acquiring network data from a social network online; a data preprocessing module for cleaning the network data to form a network data set; and a search intention acquisition module for establishing an online social network user aggregation topic model based on Dirichlet distributions and Gibbs sampling and processing the network data set to obtain the user search intention distribution, the followee search intention distribution, and the word distribution of the user search intention, and for aggregating the user intentions based on the user search intention distribution and the followee search intention distribution to obtain the final social network user search intention. The system alleviates the sparsity of social network contexts, constructs a weighted representation of the user intention, realizes the processing of social network users' search intentions, and improves the users' search experience.

Description

Social network user search intention processing system based on user aggregation topic model
Technical Field
The invention belongs to the technical field of networks, and particularly relates to a social network user search intention processing system based on a user aggregation topic model.
Background
A social network provides a lightweight, rapid communication environment in which users propagate and share news events, daily chat, and their life and work status. When a user searches for relevant content in a social network, the system is expected to return the desired results and make recommendations based on the user's search intent. Existing research on processing social network users' search intentions mainly focuses on topic-model-based methods, user-clustering-based methods, and methods that model a user's search intention comprehensively from the user's private data.
Conventional topic models are designed to model the semantic information of standard news documents or other long documents; applied to social network contexts, which are semantically sparse and lack word co-occurrence information, they cannot process the user's search intention well. Methods that model the search intention from private data such as search histories, access logs, and click histories are also a current research hotspot, but they require specific data and depend heavily on that private data, which is difficult for researchers to obtain; moreover, they ignore the relationships among social network words and the effect of user attributes on understanding search intention, so they cannot be applied universally to understanding social network users' search intentions. Clustering methods likewise do not consider the associations between words in a social network context and neglect the influence of common words on processing the user's search intention.
Disclosure of Invention
In view of the above analysis, the present invention aims to disclose a social network user search intention processing system based on a user aggregation topic model that solves the problems in current user intention processing.
The invention discloses a social network user search intention processing system based on a user aggregation topic model, which comprises:
the online social network data acquisition module is used for acquiring, by web crawling, network data including user information, followee information, and the user's online social content texts from a social network;
the data preprocessing module is used for carrying out data cleaning on the network data to form a network data set;
the search intention acquisition module is used for establishing an online social network user aggregation topic model based on Dirichlet distributions and Gibbs sampling, and processing the network data set to obtain the user search intention distribution, the followee search intention distribution, and the word distribution of the user search intention; and for aggregating the user intentions based on the user search intention distribution and the followee search intention distribution to obtain the final social network user search intention.
Further, the search intention acquisition module comprises a topic model submodule, a prior parameter construction submodule, and an intention aggregation submodule.
The topic model submodule comprises a topic-common word distribution model, a topic-word pair distribution model, a user-search intention distribution model, a user-followee search intention distribution model, and a user-classification model, and is used for processing the network data set to obtain the user search intention distribution, the followee search intention distribution, and the word distribution of the user search intention.
The prior parameter construction submodule is used for constructing priors for the hyperparameters of the topic-word pair distribution model.
The intention aggregation submodule is used for aggregating user intentions based on the user search intention distribution and the followee search intention distribution.
In the topic model submodule,
the network data set is processed based on the user-search intention distribution model to obtain the user search intention distribution;
the network data set is processed based on the user-followee search intention distribution model to obtain the followee search intention distribution;
and the network data set is processed based on the topic-common word distribution model, the topic-word pair distribution model, and the user-classification model to obtain the word distribution of the user search intention.
Further, the topic-common word distribution model conforms to a Dirichlet distribution with a first hyperparameter μ;
in the topic-word pair distribution model, one word of the pair (w_i, w_j) conforms to a Dirichlet distribution with a second hyperparameter γ_i, and the other word w_j conforms to a Dirichlet distribution with a third hyperparameter γ_j;
the user-search intention distribution model conforms to a Dirichlet distribution with a fourth hyperparameter α;
the user-followee search intention distribution model conforms to a Dirichlet distribution with a fifth hyperparameter β;
the user-classification model conforms to a Dirichlet distribution with a sixth hyperparameter η.
Further, the prior parameter construction submodule performs prior construction based on a recurrent neural network and the inverse document frequency to obtain the second hyperparameter γ_i and the third hyperparameter γ_j.
Further, the prior parameter construction submodule comprises a recurrent neural network (RNN) module, an inverse document frequency module, a word pair set building module, and a parameter construction module.
The RNN module is used for learning the words in the documents collected in the network data set through the recurrent neural network to obtain the association probability of two associated words.
The inverse document frequency module is used for measuring each word with the inverse document frequency IDF_{w_i} = log(|M| / |{m_l ∈ M : w_i ∈ m_l}|), where |M| denotes the total number of documents in the data set and |{m_l ∈ M : w_i ∈ m_l}| denotes the number of documents in which the word w_i occurs.
The word pair set building module is used for building and extracting the word pair set C = {C_1, C_2, …, C_w, …, C_N} based on the output results of the RNN module and the inverse document frequency module, where each C_w = (w_i, w_j) is selected according to the inverse document frequencies IDF_{w_i} and IDF_{w_j} and the association probability o_t of w_i and w_j learned by the RNN (selection formula given only as an image in the source), and N is the total number of word pairs.
The parameter construction module is used for constructing the second hyperparameter γ_i and the third hyperparameter γ_j from o_t and the inverse document frequencies, scaled by a preset positive number (construction formulas given only as images in the source).
Further, the hidden layer activation function of the recurrent neural network in the RNN module is the sigmoid function, and the output layer activation function is the softmax function.
Further, for each word pair C_w ∈ C of the word pair set in the topic model submodule:
1) using the user search intention distribution θ_u output by the user-search intention distribution model as the parameter of a multinomial distribution, sample the intention assignment of the word pair: z_{u,C_w} ~ Multi(θ_u), where Multi denotes a multinomial distribution, z_{u,C_w} denotes the user's intention assignment, u denotes the user, and C_w denotes a word pair;
2) using the followee search intention distribution ϑ_u output by the user-followee search intention distribution model as the parameter of a multinomial distribution, sample the intention assignment of the word pair: z_{e,C_w} ~ Multi(ϑ_u), where z_{e,C_w} denotes the intention assignment of the user's followee and e denotes a followee;
3) for each word in the word pair set C:
using the user classification distribution τ_u output by the user-classification model as the parameter of a Bernoulli distribution, sample the binary switch variable x ~ Bern(τ_u), where Bern denotes a Bernoulli distribution;
if x = 0, use the common-word distribution φ_{z,b} output by the topic-common word distribution model as the parameter of a multinomial distribution and sample the two words w_i, w_j ~ Multi(φ_{z,b});
if x = 1, use the word distributions φ_{z,1} and φ_{z,2} output by the topic-word pair distribution model as the parameters of multinomial distributions and sample one word w_i ~ Multi(φ_{z,1}) and the other word w_j ~ Multi(φ_{z,2}).
Further, in the topic model submodule, Gibbs sampling is adopted to iteratively sample the established social network user aggregation topic model to obtain the user search intention distribution, the intention distribution of the user's followees, and the word distribution of the user.
Further, after Gibbs iterative sampling, the topic model submodule outputs:
the user search intention distribution θ_{u,k} = (n_{u,k} + α) / (n_u + K·α);
the intention distribution of the user's followees ϑ_{u,k} = (s_{u,k} + β) / (s_u + K·β);
and the word distribution of the user search intention φ_k = [φ_{k,v_1}, φ_{k,v_2}, …, φ_{k,v_i}, …, φ_{k,v_n}];
where n_{u,k} is the number of topic words of the user, n_u the total number of words, s_u the number of word pairs assigned over all topics, s_{u,k} the number of word pairs assigned to topic k among the topics of the user's followees, and K the number of topics in the data set;
φ_{k,v_i} = (n_{k,v_i} + γ) / (n_k + V·γ), where n_{k,v_i} is the number of times the word v_i in the word pair set C is assigned to the topic word, n_k the total number of topic-word assignments in the word pair set C, V the number of all words in the document, α and β the fourth and fifth hyperparameters, and γ the second hyperparameter γ_i or the third hyperparameter γ_j.
Further, the intention aggregation submodule aggregates the intentions as Ω = π·θ_u + (1 − π)·ϑ_u to obtain the weight Ω of the user search intention, which is used to represent the user's search intention; here θ_u is the search intention distribution of all users, ϑ_u the search intention distribution of all the users' followees, and π a weight parameter.
The invention can realize at least one of the following beneficial effects:
the method aims at the problems that the current mainstream social network user search intention processing method needs specific privacy data and does not have universality;
the user search intention distribution is obtained by constructing a social network user aggregation topic model, the problem of sparsity of social network context is solved, modeling subject words and common words are distinguished, and social network word relation learning is realized; and (4) considering the user search intention distribution and the attention person intention distribution, constructing a user intention weight representation, and realizing the understanding and mining of the search intention of the social network user.
The social network user intention processing method can effectively understand and mine the search intention of the user under the condition that no available access log such as search history, click log and other data exists, and the performance is obviously improved.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, wherein like reference numerals are used to designate like parts throughout.
FIG. 1 is a schematic diagram of the connections of the social network user search intention processing system in this embodiment;
FIG. 2 is a representation of the online social network user aggregation topic model in this embodiment;
FIG. 3 is a structural diagram of the Elman RNN network in this embodiment.
Detailed Description
The preferred embodiments of the present invention will now be described in detail with reference to the accompanying drawings, which form a part hereof and, together with the embodiments of the invention, serve to explain the principles of the invention.
This embodiment discloses a social network user search intention processing system based on a user aggregation topic model, as shown in FIG. 1, comprising:
the online social network data acquisition module, used for acquiring, by web crawling, network data including user information, followee information, and the user's online social content texts from a social network.
specifically, the online social network data acquisition module crawls data in an online social network through web crawler software, for example, crawls the data of the Xinlang microblog; the crawled data comprises information of microblog users, information of followers of the microblog users and online social content text information issued by the microblog users on a microblog.
The data preprocessing module is used for carrying out data cleaning on the network data to form a network data set;
Specifically, the data preprocessing module cleans and processes the crawled data: it deletes erroneous and redundant data and empty words without specific content, keeping only the backbone of the microblog content, to form the network data set.
the data preprocessing module comprises an extraction unit, a word segmentation unit and a classification storage unit;
The extraction unit is used for extracting the user information, the user followee information, and the user text content from the network data, and for removing garbled information from the text content.
The word segmentation unit is used for segmenting the cleaned text content into words, deleting erroneous and redundant words and empty words without specific content so that only the backbone of the microblog content is retained, and deleting very short texts without specific meaning such as "like" and "applause".
The classification storage unit is used for classifying and storing the user data, the followee data, and the social content data to form the microblog text set M = {m_1, m_2, …, m_l, …, m_N}, the microblog user set U = {u_1, u_2, …}, and the topic set Z = {z_1, z_2, …}.
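As a rough illustration of the preprocessing described above, the following Python sketch cleans raw posts and drops empty or meaningless short texts. The stop-word list, the regular expressions, and the minimum token count are illustrative assumptions, not values fixed by the patent.

```python
import re

# Hypothetical stop words and minimum token count (not specified by the patent).
STOP_WORDS = {"like", "applause", "the", "a"}
MIN_TOKENS = 2

def clean_post(text):
    """Strip URLs, @-mentions and hashtags, tokenize, and drop stop words."""
    text = re.sub(r"https?://\S+|@\S+|#\S+", " ", text)
    tokens = [t for t in re.findall(r"\w+", text.lower()) if t not in STOP_WORDS]
    return tokens if len(tokens) >= MIN_TOKENS else []  # drop very short texts

corpus = ["Check this out https://t.co/x great talk on topic models @user",
          "like"]
dataset = [toks for toks in (clean_post(p) for p in corpus) if toks]
```

The cleaned token lists can then be stored per user, per followee, and per post, mirroring the classification storage unit.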
The search intention acquisition module is used for establishing an online social network user aggregation topic model based on Dirichlet distributions and Gibbs sampling, and processing the network data set to obtain the user search intention distribution, the followee search intention distribution, and the word distribution of the user search intention; and for aggregating the user intentions based on the user search intention distribution and the followee search intention distribution to obtain the final social network user search intention.
Specifically, the search intention acquisition module comprises a topic model submodule, a prior parameter construction submodule and an intention aggregation submodule;
The topic model submodule comprises a topic-common word distribution model, a topic-word pair distribution model, a user-search intention distribution model, a user-followee search intention distribution model, and a user-classification model, and is used for processing the network data set to obtain the user search intention distribution, the followee search intention distribution, and the word distribution of the user search intention.
The prior parameter construction submodule is used for constructing priors for the hyperparameters of the topic-word pair distribution model.
The intention aggregation submodule is used for aggregating user intentions based on the user search intention distribution and the followee search intention distribution.
In the topic model submodule,
the network data set is processed based on the user-search intention distribution model to obtain the user search intention distribution;
the network data set is processed based on the user-followee search intention distribution model to obtain the followee search intention distribution;
and the network data set is processed based on the topic-common word distribution model, the topic-word pair distribution model, and the user-classification model to obtain the word distribution of the user search intention.
More specifically:
the topic-common word distribution model conforms to a Dirichlet distribution with the first hyperparameter μ; that is, for each topic z, the common-word distribution of the microblog is φ_{z,b} ~ Dir(μ), where b denotes a common word;
in the topic-word pair distribution model, one word of the pair (w_i, w_j) conforms to a Dirichlet distribution with the second hyperparameter γ_i and the other word w_j to a Dirichlet distribution with the third hyperparameter γ_j; that is, for each topic z, one word distribution of the microblog word pair is φ_{z,1} ~ Dir(γ_i) and the other is φ_{z,2} ~ Dir(γ_j);
the user-search intention distribution model conforms to a Dirichlet distribution with the fourth hyperparameter α; that is, for each user u, the user search intention distribution is θ_u ~ Dir(α);
the user-followee search intention distribution model conforms to a Dirichlet distribution with the fifth hyperparameter β; that is, for each user u, the followee search intention distribution is ϑ_u ~ Dir(β);
the user-classification model conforms to a Dirichlet distribution with the sixth hyperparameter η; that is, for each user u, the user's classification distribution is τ_u ~ Dir(η).
In particular, the online social network user aggregation topic model is represented as shown in FIG. 2.
The first hyperparameter μ, the fourth hyperparameter α, the fifth hyperparameter β, and the sixth hyperparameter η can take conventional Dirichlet hyperparameter values, such as 0.1 or 0.01.
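Under the Dirichlet assumptions above, the model's parameter distributions can be initialized as draws from symmetric Dirichlet priors. The NumPy sketch below uses illustrative sizes K and V and, for simplicity, treats γ_i and γ_j as plain constants, although the patent constructs them from RNN and IDF priors.

```python
import numpy as np

rng = np.random.default_rng(0)
K, V = 10, 500          # number of topics / vocabulary size (illustrative)
mu, alpha, beta, eta = 0.1, 0.1, 0.01, 0.1   # symmetric hyperparameters as suggested
gamma_i = gamma_j = 0.1  # placeholders; the patent builds these from RNN/IDF priors

phi_zb = rng.dirichlet([mu] * V, size=K)        # topic -> common-word distributions
phi_z1 = rng.dirichlet([gamma_i] * V, size=K)   # topic -> first word of a pair
phi_z2 = rng.dirichlet([gamma_j] * V, size=K)   # topic -> second word of a pair
theta_u = rng.dirichlet([alpha] * K)            # user -> search intention
vartheta_u = rng.dirichlet([beta] * K)          # user -> followee search intention
tau_u = rng.dirichlet([eta] * 2)                # user -> classification (switch prior)
```

Each draw is a valid probability vector, which is what the multinomial and Bernoulli sampling steps described later consume.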
The prior parameter construction submodule performs prior construction based on the recurrent neural network and the inverse document frequency to obtain the second hyperparameter γ_i and the third hyperparameter γ_j, so that the model learns more coherent user search intentions.
Specifically, the prior parameter construction sub-module comprises a Recurrent Neural Network (RNN) module, an inverse document frequency module, a word pair set construction module and a parameter construction module;
the recurrent neural network RNN module is used for learning words in the documents collected in the network data set through the recurrent neural network RNN to obtain the association probability of two associated words;
preferably, a network structure for learning relationships between words using Elman RNN is shown in fig. 3.
In FIG. 3, w_t ∈ R^T denotes the vector of the current word, where T is the vector size; h_t denotes the hidden unit and o_t the output unit at time t; x_t = [w_t, h_{t−1}] denotes the input layer, where h_{t−1} is the hidden state at the previous time step. The hidden unit and the output unit are computed by equations (1) and (2):
H_t = δ(U·x_t)  (1)
o_t = g(V·H_t)  (2)
where U and V are the parameter matrices, and δ(·) denotes the sigmoid function computed as in equation (3):
δ(x) = 1 / (1 + e^{−x})  (3)
g(·) is the softmax function computed as in equation (4):
g(z_m) = e^{z_m} / Σ_k e^{z_k}  (4)
In the output, o_t represents the association probability of the word pair (w_{j,1}, w_{j,2}), expressed by equation (5):
o_t = P(w_{j,2} | w_{j,1}, h_{t−1})  (5)
that is, o_t denotes the probability of w_{j,2} occurring given w_{j,1}. Because the hidden units H_t and H_{t−1} retain all previous words, the association of previous words with the current word can be learned through the properties of the recurrent neural network.
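Equations (1)-(5) describe a standard Elman forward step. A minimal NumPy sketch follows, with randomly initialized matrices standing in for the trained parameters U and V, and illustrative dimensions:

```python
import numpy as np

def sigmoid(x):                      # equation (3)
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):                      # equation (4), shifted for numerical stability
    e = np.exp(z - z.max())
    return e / e.sum()

def elman_step(w_t, h_prev, U, V):
    """One Elman step: x_t = [w_t, h_{t-1}], H_t = sigmoid(U x_t), o_t = softmax(V H_t)."""
    x_t = np.concatenate([w_t, h_prev])   # input layer, equation-style concatenation
    H_t = sigmoid(U @ x_t)                # equation (1)
    o_t = softmax(V @ H_t)                # equation (2): distribution over next words
    return H_t, o_t

# Illustrative sizes: T-dim word vectors, H hidden units, a vocabulary of Vw words.
T, H, Vw = 8, 16, 100
rng = np.random.default_rng(1)
U = rng.normal(scale=0.1, size=(H, T + H))
V = rng.normal(scale=0.1, size=(Vw, H))
h, o = elman_step(rng.normal(size=T), np.zeros(H), U, V)
```

The entry of o corresponding to a candidate word plays the role of the association probability o_t in equation (5).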
The inverse document frequency module is used for measuring each word with the inverse document frequency IDF_{w_i} = log(|M| / |{m_l ∈ M : w_i ∈ m_l}|), where |M| denotes the total number of documents in the data set and |{m_l ∈ M : w_i ∈ m_l}| denotes the number of documents in which the word w_i occurs.
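The inverse document frequency above can be computed directly. A small sketch in which documents are token lists:

```python
import math

def inverse_document_frequency(word, documents):
    """IDF_w = log(|M| / |{m in M : w in m}|); documents are lists of tokens."""
    df = sum(1 for doc in documents if word in doc)  # documents containing the word
    return math.log(len(documents) / df) if df else 0.0

docs = [["topic", "model"], ["topic", "search"], ["user", "search"]]
```

Words appearing in fewer documents receive larger IDF values, so rare, informative words are favored when building word pairs.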
The word pair set building module is used for building and extracting the word pair set C = {C_1, C_2, …, C_w, …, C_N} based on the output results of the RNN module and the inverse document frequency module, where each C_w = (w_i, w_j) is selected according to the inverse document frequencies IDF_{w_i} and IDF_{w_j} and the association probability o_t of the words w_i and w_j learned by the RNN (selection formula given only as an image in the source), and N is the total number of word pairs.
The parameter construction module is used for constructing the second hyperparameter γ_i and the third hyperparameter γ_j from o_t and the inverse document frequencies, scaled by a preset positive number (construction formulas given only as images in the source).
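Since the exact selection formula is not reproduced in the text, the following sketch approximates the word-pair extraction with simple thresholds on the association probability o_t and the two IDF values; the threshold values and the thresholded form itself are assumptions.

```python
import math

def build_word_pairs(assoc_prob, idf, threshold=0.5, idf_min=0.1):
    """Keep a pair (wi, wj) when the RNN association probability o_t and both
    inverse document frequencies pass thresholds. The exact selection formula
    is given only as an image in the patent; this thresholded form is an
    assumption for illustration."""
    pairs = []
    for (wi, wj), o_t in assoc_prob.items():
        if o_t >= threshold and idf.get(wi, 0) >= idf_min and idf.get(wj, 0) >= idf_min:
            pairs.append((wi, wj))
    return pairs

idf = {"topic": math.log(3 / 2), "model": math.log(3), "the": 0.0}
assoc = {("topic", "model"): 0.8, ("the", "model"): 0.9, ("topic", "the"): 0.2}
C = build_word_pairs(assoc, idf)
```

Here a strongly associated pair of rare words survives, while pairs involving the common word "the" or a weak association are discarded.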
More specifically, for each word pair C_w ∈ C in the topic model submodule, the user intention assignment of the word pair, the intention assignment of the user's followees, and the multinomial sampling of each word in the pair are performed as follows:
1) using the user search intention distribution θ_u output by the user-search intention distribution model as the parameter of a multinomial distribution, sample the intention assignment of the word pair: z_{u,C_w} ~ Multi(θ_u), where Multi denotes a multinomial distribution, z_{u,C_w} denotes the user's intention assignment, u denotes the user, and C_w denotes a word pair;
2) using the followee search intention distribution ϑ_u output by the user-followee search intention distribution model as the parameter of a multinomial distribution, sample the intention assignment of the word pair: z_{e,C_w} ~ Multi(ϑ_u), where z_{e,C_w} denotes the intention assignment of the user's followee and e denotes a followee;
3) for each word in the word pair set C:
using the user classification distribution τ_u output by the user-classification model as the parameter of a Bernoulli distribution, sample the binary switch variable x ~ Bern(τ_u), where Bern denotes a Bernoulli distribution;
if x = 0, use the common-word distribution φ_{z,b} output by the topic-common word distribution model as the parameter of a multinomial distribution and sample the two words w_i, w_j ~ Multi(φ_{z,b});
if x = 1, use the word distributions φ_{z,1} and φ_{z,2} output by the topic-word pair distribution model as the parameters of multinomial distributions and sample one word w_i ~ Multi(φ_{z,1}) and the other word w_j ~ Multi(φ_{z,2}).
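The generative steps 1)-3) can be sketched as a single sampling routine. All the distributions here are random Dirichlet draws with illustrative hyperparameters rather than learned quantities:

```python
import numpy as np

rng = np.random.default_rng(2)
K, Vw = 5, 50                             # topics / vocabulary (illustrative)
theta_u = rng.dirichlet([0.1] * K)        # user search intention distribution
vartheta_u = rng.dirichlet([0.01] * K)    # followee search intention distribution
tau_u = rng.dirichlet([0.1] * 2)          # user classification distribution
phi_zb = rng.dirichlet([0.1] * Vw, size=K)  # topic -> common words
phi_z1 = rng.dirichlet([0.1] * Vw, size=K)  # topic -> first word of a pair
phi_z2 = rng.dirichlet([0.1] * Vw, size=K)  # topic -> second word of a pair

def sample_word_pair():
    z_u = rng.choice(K, p=theta_u)        # 1) intention assignment of the pair
    z_e = rng.choice(K, p=vartheta_u)     # 2) followee intention assignment
    x = rng.random() < tau_u[1]           # 3) binary switch x ~ Bern(tau_u)
    if not x:                             # x = 0: both words from the common-word dist.
        wi = rng.choice(Vw, p=phi_zb[z_u]); wj = rng.choice(Vw, p=phi_zb[z_u])
    else:                                 # x = 1: one word from each pair distribution
        wi = rng.choice(Vw, p=phi_z1[z_u]); wj = rng.choice(Vw, p=phi_z2[z_u])
    return z_u, z_e, x, wi, wj

z_u, z_e, x, wi, wj = sample_word_pair()
```

Repeating this routine over all word pairs generates a synthetic corpus from the model, which is the generative counterpart of the Gibbs inference described next.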
Further, in the topic model submodule, Gibbs sampling is adopted to iteratively sample the established social network user aggregation topic model to obtain the user search intention distribution, the intention distribution of the user's followees, and the word distribution of the user.
Unknown parameters in the social network User Aggregation Topic Model (UATM) are derived using Gibbs sampling, whose core is the iterative sampling of hidden variables from prior estimates. During sampling, the user search intention distribution θ_u, the followee search intention distribution ϑ_u, and the user's word distribution φ are integrated out, and the microblog set M, the topic Z, and the switch variable x are sampled iteratively; the topic Z is sampled according to equation (6) (given only as an image in the source).
For all users, n_{u,b} denotes the number of common words and n_{u,k} the number of topic words; n_{v,b} is the number of times the word v is assigned as a common word, n_{k,v} the number of times a word pair in C is assigned to the topic word, and n_{u,k} the number of microblogs assigned to the topic Z. Note that n_u = n_{u,b} + n_{u,k}, and s_{u,z} is the number of word pairs assigned to the topics of the user's followees.
The hidden variables can be derived from equation (6), where Γ(x) denotes the gamma function and π is the weight parameter that adjusts the weighted representation of the user's own and the followees' search intentions. From the joint distribution and the chain rule, the conditional probability distribution of equation (7) is obtained (given only as an image in the source), where −i denotes counts excluding the i-th microblog, Φ is the set of all users' search intention distributions, Θ the set of all followees' search intention distributions, and Ψ the set of word distributions in the data set.
After the conditional probability distribution is obtained, the topic z_{d,i} is sampled directly by the chain rule, and deriving the switch variable x yields equations (8) and (9) (given only as images in the source), where −j denotes counts excluding the j-th word and w_i denotes the i-th word in the microblog document.
In the initial state of gibbs sampling, the hidden variables are sampled according to equations (8) and (9). After sufficient iterations are completed, the user search intention distribution, the intention distribution of the user attendees and the word distribution of the user output by the topic model submodule are shown as formulas (10), (11), (12) and (13):
[formulas (10)–(13) — rendered as images in the original]
Based on equation (11) and equation (12), the word distribution of the user search intention is obtained, as shown in equation (14):

φ_k = [φ_{k,v1}, φ_{k,v2}, …, φ_{k,vi}, …, φ_{k,vn}]    (14)
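Formulas (10)–(13) appear as images in the original publication. As a concrete illustration of the kind of estimator such models use, here is the standard Dirichlet-smoothed posterior mean, (count + prior) / (total + dimension × prior) — an assumed form for illustration, not taken verbatim from the patent:

```python
import numpy as np

def topic_distribution(counts, prior):
    """Dirichlet-smoothed posterior mean over K outcomes:
    (n_k + prior) / (sum_k n_k + K * prior)."""
    counts = np.asarray(counts, dtype=float)
    return (counts + prior) / (counts.sum() + len(counts) * prior)

# e.g. a user whose word pairs were assigned to 3 topics 4, 1 and 0 times (alpha = 0.5):
theta_u = topic_distribution([4, 1, 0], 0.5)

# word distribution of one topic over a 4-word vocabulary (gamma = 0.1):
phi_k = topic_distribution([7, 2, 1, 0], 0.1)
```

The prior keeps every topic and every word at non-zero probability, which is what lets sparse microblog text still yield usable distributions.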
Specifically, the intention aggregation sub-module constructs a weight representation Ω of the user search intention from the user's own search intention and the search intentions of the user's followees, so as to jointly mine the user's search intention; the calculation is shown in formula (15):
Ω = π·θ_u + (1 − π)·θ̂_u    (15)

where θ_u is the search intention distribution of all users, θ̂_u is the search intention distribution of all the users' followees, and π is the weight parameter.
The final search intention of the social network user is obtained from the user search intention distribution given by the aggregation formula (15). An operator of the social network can then serve online social content according to the user's search intention and its word distribution, shortening the user's search time and improving the user experience.
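A minimal sketch of the intention aggregation step follows. It assumes the convex-combination form Ω = π·θ_u + (1 − π)·θ̂ with the followee distributions averaged; both the combination form and the averaging are assumptions for illustration:

```python
import numpy as np

def aggregate_intent(theta_user, theta_followees, pi):
    """Weighted aggregation of the user's own intent distribution with
    the averaged intent distributions of the users they follow.
    pi in [0, 1] trades off self evidence vs. followee evidence."""
    theta_user = np.asarray(theta_user, dtype=float)
    followee_mean = np.mean(np.asarray(theta_followees, dtype=float), axis=0)
    return pi * theta_user + (1.0 - pi) * followee_mean

omega = aggregate_intent([0.6, 0.3, 0.1],
                         [[0.2, 0.5, 0.3],
                          [0.4, 0.4, 0.2]],
                         pi=0.7)
# omega is again a probability distribution over the K topics
```

Because both inputs are probability distributions and π lies in [0, 1], the aggregated Ω is itself a valid distribution, so the top-weighted topic can be read off directly as the user's dominant search intention.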
In summary, the embodiments address the problem that existing social network user search intention processing methods rely on specific private data and therefore lack generality. A social network user aggregation topic model is constructed to obtain the user search intention distribution, which alleviates the sparsity of social network context; subject words and common words are modeled separately, realizing word relation learning on the social network. A user intention weight representation is built from the user's own search intention distribution and the intention distributions of the users they follow, so that the search intention of a social network user can be effectively understood and mined even without access logs such as search histories or click logs, with significantly improved performance.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.

Claims (10)

1. A social network user search intent processing system based on a user aggregate topic model, comprising:
the online social network data acquisition module is used for collecting, by means of a crawler, network data from the online social network, including user information, followee information, and the user's online social content text;
the data preprocessing module is used for carrying out data cleaning on the network data to form a network data set;
the search intention acquisition module is used for establishing an online social network user aggregation topic model based on Dirichlet distributions and Gibbs sampling, and for processing the network data set to obtain the user search intention distribution, the followee search intention distribution, and the word distribution of the user search intention; and for aggregating user intentions based on the user search intention distribution and the followee search intention distribution to obtain the final social network user search intention.
2. The social network user search intention processing system of claim 1, wherein the search intention acquisition module comprises a topic model sub-module, a prior parameter construction sub-module, and an intention aggregation sub-module;
the topic model submodule comprises a topic-common word distribution model, a topic-word pair distribution model, a user-search intention distribution model, a user-attention person search intention distribution model and a user-classification model and is used for processing a network data set to obtain user search intention distribution, attention person search intention distribution and word distribution of user search intention;
the prior parameter construction sub-module is used for performing prior construction on the hyper-parameters in the topic-word pair distribution model;
the intention aggregation submodule is used for carrying out user intention aggregation based on the user search intention distribution and the attention person search intention distribution;
in the topic model sub-module,
processing the network data set based on the user-search intention distribution model to obtain user search intention distribution;
processing the network data set based on the user-attendee search intention distribution model to obtain the followee search intention distribution;
and processing the network data set based on the topic-common word distribution model, the topic-word pair distribution model and the user-classification model to obtain the word distribution of the user search intention.
3. The social network user search intent processing system of claim 2,
the topic-ordinary word distribution model conforms to a Dirichlet distribution containing a first hyper-parameter μ;
in the topic-word pair distribution model, one word w_i of a word pair (w_i, w_j) conforms to a Dirichlet distribution with a second hyper-parameter γ_i, and the other word w_j conforms to a Dirichlet distribution with a third hyper-parameter γ_j;
the user-search intention distribution model conforms to a dirichlet distribution containing a fourth hyperparameter α;
the user-attendee search intention distribution model conforms to a dirichlet distribution containing a fifth hyperparameter β;
the user-classification model conforms to a dirichlet distribution that includes a sixth hyperparameter η.
4. The social network user search intention processing system of claim 3, wherein the prior parameter construction sub-module derives the second hyper-parameter γ_i and the third hyper-parameter γ_j by performing prior construction based on a recurrent neural network and the inverse document frequency.
5. The social network user search intention processing system of claim 4, wherein the a priori parameter construction sub-module comprises a Recurrent Neural Network (RNN) module, an inverse document frequency module, a word pair set construction module, and a parameter construction module;
the recurrent neural network RNN module is used for learning words in the documents collected in the network data set through the recurrent neural network RNN to obtain the association probability of two associated words;
the inverse document frequency module is used for measuring the occurrence frequency of each word using the inverse document frequency

IDF_{w_i} = log( |M| / |{m_l ∈ M : w_i ∈ m_l}| )

where |M| denotes the total number of documents in the dataset and |{m_l ∈ M : w_i ∈ m_l}| denotes the number of documents containing the word w_i;
the word pair set building module is used for building and extracting a word pair set C = {C_1, C_2, …, C_w, …, C_N} based on the output results of the recurrent neural network RNN module and the inverse document frequency module;

wherein [formula — rendered as image in the original], IDF_{w_i} is the inverse document frequency of the word w_i, IDF_{w_j} is the inverse document frequency of the word w_j, o_t is the association probability of the associated words w_i and w_j learned by the recurrent neural network RNN, and N is the total number of word pairs;
a parameter construction module for constructing the second hyper-parameter γ_i and the third hyper-parameter γ_j [construction formulas rendered as images in the original], wherein the constant appearing in the formulas is a preset positive number.
6. The social network user search intention processing system of claim 5, wherein the hidden-layer activation function of the recurrent neural network in the recurrent neural network RNN module is a sigmoid function, and the output-layer activation function is a softmax function.
7. The social network user search intent processing system of claim 5, wherein, in the topic model sub-module, for each word pair C_w ∈ C of the word pair set:

1) with the user search intention distribution θ_u output by the user-search intention distribution model as the parameter of a multinomial distribution, the intention assignment of the word pair is sampled: z_{u,Cw} ~ Multi(θ_u), wherein Multi denotes a multinomial distribution, z_{u,Cw} denotes the intention assignment of the user, u denotes the user, and C_w denotes the word pair;
2) with the followee search intention distribution θ̂_e output by the user-attendee search intention distribution model as the parameter of a multinomial distribution, the intention assignment of the word pair is sampled: z_{e,Cw} ~ Multi(θ̂_e), wherein z_{e,Cw} denotes the intention assignment of the user's followee and e denotes a followee;
3) for each word of the word pair C_w:

with the user class distribution τ_u output by the user-classification model as the parameter of a Bernoulli distribution, a binary switch variable is sampled: x ~ Bern(τ_u), wherein Bern denotes the Bernoulli distribution;

if x = 0, with the common word distribution φ_{z,b} output by the topic-common word distribution model as the parameter of a multinomial distribution, the two words are sampled separately: w_i, w_j ~ Multi(φ_{z,b});

if x = 1, with the word distributions φ_{z,1} and φ_{z,2} output by the topic-word pair distribution model as the parameters of multinomial distributions, one word w_i ~ Multi(φ_{z,1}) and the other word w_j ~ Multi(φ_{z,2}) are sampled.
8. The system of any one of claims 1 to 7, wherein, in the topic model submodule, Gibbs sampling is used to iteratively sample the established social network user aggregation topic model, so as to obtain user search intention distribution, intention distribution of user attendees, and word distribution of the user.
9. The social network user search intention processing system of claim 8, wherein the topic model sub-module outputs after Gibbs sampling iterative sampling:
user search intent distribution
Figure FDA0002799627870000041
Intent distribution of user followers
Figure FDA0002799627870000042
Word distribution of user search intentk=[φk,v1k,v2,,…,φk,vi,…,φk,vn];
wherein n_{u,k} denotes the number of subject words, n_u the total number of words, s_u the number of word pairs assigned to all topics of the user, s_{u,k} the number of word pairs assigned to the topics of the user's followees, and K the number of topics in the dataset;
φ_{k,vi} = (n_{k,vi} + γ) / (n_k + V·γ)

wherein n_{k,vi} represents the number of times the word v_i in the word pair set C is assigned as a subject word; n_k represents the total number of subject word assignments in the word pairs C; V represents the number of all words in the document; α and β are the fourth and fifth hyper-parameters; γ is the second hyper-parameter γ_i or the third hyper-parameter γ_j.
10. The social network user search intent processing system of claim 9, wherein the intent aggregation sub-module obtains the weight Ω of the user search intention, which is used to express the search intention of the user, by the aggregation formula

Ω = π·θ_u + (1 − π)·θ̂_u

wherein θ_u is the search intention distribution of all users, θ̂_u is the search intention distribution of all the users' followees, and π is the weight parameter.
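The inverse document frequency and word-pair prior construction of claim 5 can be sketched as follows. The IDF formula matches the claim; the way the association probability o_t and the IDF values combine into γ_i and γ_j is an assumption (a simple scaled product), since the construction formulas are rendered as images in the original:

```python
import math

def idf(word, documents):
    """IDF_w = log(|M| / |{m in M : w in m}|), per the formula in claim 5."""
    df = sum(1 for doc in documents if word in doc)
    return math.log(len(documents) / df) if df else 0.0

def word_pair_priors(w_i, w_j, o_t, documents, scale=1.0):
    """Assumed construction: each gamma scales with the RNN association
    probability o_t and the word's own inverse document frequency."""
    return (scale * o_t * idf(w_i, documents),
            scale * o_t * idf(w_j, documents))

# toy corpus of three documents represented as word sets
docs = [{"weather", "rain"}, {"rain", "umbrella"}, {"stock", "price"}]
rain_idf = idf("rain", docs)  # log(3/2)
```

This reproduces the intended behavior: rare but strongly associated word pairs receive large γ values, so the topic-word pair Dirichlet prior favors informative subject words over frequent common words.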
CN202011344972.4A 2020-11-25 2020-11-25 Social network user search intention processing system based on user aggregation topic model Active CN112307746B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011344972.4A CN112307746B (en) 2020-11-25 2020-11-25 Social network user search intention processing system based on user aggregation topic model

Publications (2)

Publication Number Publication Date
CN112307746A true CN112307746A (en) 2021-02-02
CN112307746B CN112307746B (en) 2021-08-17

Family

ID=74487813


Country Status (1)

Country Link
CN (1) CN112307746B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130144854A1 (en) * 2011-12-06 2013-06-06 Microsoft Corporation Modeling actions for entity-centric search
CN105830065A (en) * 2013-12-19 2016-08-03 脸谱公司 Generating recommended search queries on online social networks
US20180032930A1 (en) * 2015-10-07 2018-02-01 0934781 B.C. Ltd System and method to Generate Queries for a Business Database
CN108536868A (en) * 2018-04-24 2018-09-14 北京慧闻科技发展有限公司 The data processing method of short text data and application on social networks
CN108921413A (en) * 2018-06-22 2018-11-30 郑州大学 A kind of social networks degree of belief calculation method based on user intention

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130144854A1 (en) * 2011-12-06 2013-06-06 Microsoft Corporation Modeling actions for entity-centric search
US20170351772A1 (en) * 2011-12-06 2017-12-07 Microsoft Technology Licensing, Llc Modeling actions for entity-centric search
CN105830065A (en) * 2013-12-19 2016-08-03 脸谱公司 Generating recommended search queries on online social networks
US20180032930A1 (en) * 2015-10-07 2018-02-01 0934781 B.C. Ltd System and method to Generate Queries for a Business Database
CN108536868A (en) * 2018-04-24 2018-09-14 北京慧闻科技发展有限公司 The data processing method of short text data and application on social networks
CN108921413A (en) * 2018-06-22 2018-11-30 郑州大学 A kind of social networks degree of belief calculation method based on user intention

Also Published As

Publication number Publication date
CN112307746B (en) 2021-08-17

Similar Documents

Publication Publication Date Title
Priyadarshini et al. A novel LSTM–CNN–grid search-based deep neural network for sentiment analysis
US10891321B2 (en) Systems and methods for performing a computer-implemented prior art search
Xiaomei et al. Microblog sentiment analysis with weak dependency connections
CN110704640A (en) Representation learning method and device of knowledge graph
Wu et al. Personalized microblog sentiment classification via multi-task learning
CN112307762B (en) Search result sorting method and device, storage medium and electronic device
CN110807101A (en) Scientific and technical literature big data classification method
CN109992784B (en) Heterogeneous network construction and distance measurement method fusing multi-mode information
Dinh et al. A proposal of deep learning model for classifying user interests on social networks
Devika et al. A semantic graph-based keyword extraction model using ranking method on big social data
Liu Research on deep learning-based algorithm and model for personalized recommendation of resources
Savelev et al. The high-level overview of social media content search engine
Wu et al. A novel topic clustering algorithm based on graph neural network for question topic diversity
CN116843162B (en) Contradiction reconciliation scheme recommendation and scoring system and method
Yarushkina et al. Intelligent instrumentation for opinion mining in social media
Wang et al. Emotional contagion-based social sentiment mining in social networks by introducing network communities
Qiu et al. CLDA: An effective topic model for mining user interest preference under big data background
CN112231476A (en) Improved graph neural network scientific and technical literature big data classification method
CN112307746B (en) Social network user search intention processing system based on user aggregation topic model
CN111859955A (en) Public opinion data analysis model based on deep learning
Huang Research on sentiment classification of tourist destinations based on convolutional neural network
Kamel et al. Robust sentiment fusion on distribution of news
Jasim et al. Analyzing Social Media Sentiment: Twitter as a Case Study
CN112364260A (en) Social network user intention processing method
Dritsas et al. Aspect-based community detection of cultural heritage streaming data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant