CN115712772A - Topic propagation prediction method based on topic association - Google Patents
- Publication number
- CN115712772A (application number CN202211444811.1A)
- Authority
- CN
- China
- Prior art keywords
- topic
- user
- influence
- representing
- derived
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention belongs to the field of online public opinion analysis and relates in particular to a topic propagation prediction method based on topic association. The method comprises: obtaining topic information and extracting its internal and external attributes; selecting user interest feature keywords and user cognitive feature keywords from a user topic content set using the DTR2vec algorithm, and representing the selected keywords as vectors to obtain a user historical behavior feature vector; quantifying the influence of topics on users with evolutionary game theory, based on all internal attributes and some of the external attributes, to obtain a topic influence adjacency matrix; and extracting network structure features from the topic influence adjacency matrix to obtain the user's network structure feature vector. The method can help public opinion departments take targeted measures quickly and has good application prospects.
Description
Technical Field
The invention belongs to the field of network public opinion analysis, and particularly relates to a topic propagation prediction method based on topic association.
Background
Generally, a topic is a hot issue that attracts the most public attention within a certain time and scope. Topics of all kinds spread widely in social networks and carry a large volume of user speech and information behavior. Topic data reflect users' interests, behaviors and social relations, and studying these data supports effective information recommendation. At the same time, the information contained in real-world topics mixes truth with falsehood, and when a topic spreads widely through a social group it affects both people's cognition and social stability.
With the development of the Internet, the way topics propagate has changed dramatically. On the one hand, fast-growing social platforms such as microblogs, WeChat and forums provide channels for topic propagation that cut across space, time and region and reach the whole population. On the other hand, the spread of Internet use to a broader population has made online communities more diverse, information flow flatter, and the development and derivation of topics more complex. Compared with traditional modes of information dissemination, topics now spread faster, reach further and take more complex forms. Research on topic propagation therefore helps in understanding how information spreads and is important for preventing emergencies and for managing public opinion.
In recent years, scholars have studied topic propagation in social networks along multiple dimensions and achieved notable results. As the data volume of social networking platforms grows and deep learning matures, topic propagation prediction based on neural networks and deep learning models has gained favor. However, topic propagation prediction still faces several challenges:
1. The relevance and complexity of the derived topic feature space. A derived topic evolves from a native topic; compared with a single topic, the features of the derived and native topics are interwoven and exchange information dynamically, so extracting topic features effectively is a challenge.
2. The complex associations among users during the propagation of native and derived topics. During propagation the native topic and the derived topic compete with each other in a game; how to quantify the user influence of each and uncover the hidden relations among users is an urgent problem.
3. The staged and time-sensitive nature of the dynamic evolution of derived topics. The evolution of a derived topic changes over time and is also influenced by the native topic, and topic states evolve alternately; how to analyze the propagation of derived topics dynamically is a difficulty for current research.
Therefore, the invention provides a topic propagation prediction method based on topic association. By introducing derived topics, it can effectively predict how topics propagate and more faithfully reflect the association and game relationship between native and derived topics during propagation.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a topic propagation prediction method based on topic association, comprising: obtaining topic information and preprocessing it; inputting the preprocessed topic information into a topic propagation prediction model based on topic association to predict users' propagation tendencies toward topics; and managing topic propagation according to the predicted tendencies.
The topic propagation prediction model based on topic association processes the preprocessed topic information as follows:
S1: extracting internal attributes and external attributes of the topic information; the internal attributes comprise user activity, user topic perception rate and user historical forwarding rate, and the external attributes comprise user friend driving force, topic popularity and the user topic content set;
S2: selecting user interest feature keywords and user cognitive feature keywords from the user topic content set, and representing the selected keywords as vectors to obtain a user historical behavior feature vector;
S3: quantifying the influence of topics on users with evolutionary game theory, based on all internal attributes together with the user friend driving force and topic popularity from the external attributes, to obtain a topic influence adjacency matrix;
S4: extracting network structure features from the topic influence adjacency matrix to obtain the user's network structure feature vector;
S5: inputting the user historical behavior feature vector and the user network structure feature vector into a DT-GCN model to obtain a prediction of the user's topic propagation, the prediction comprising whether the user participates in topic propagation and which type of topic the user participates in.
Preferably, the process of selecting the user interest feature keywords and the user cognitive feature keywords from the user topic content set using the DTR2vec algorithm includes:
S21: the user topic content set comprises a native topic content set, a derived topic content set and a user social content set; the three content sets are each input into an LDA topic identification model to obtain a native topic keyword set key_pre, a derived topic keyword set key_deri and a user content keyword set key_user;
S22: calculating the degree of association between keywords in the native topic keyword set key_pre and keywords in the derived topic keyword set key_deri; calculating the similarity between the native topic content set and the derived topic content set, and putting each pair of keywords whose degree of association exceeds this similarity into the topic-associated feature word set key_com;
S23: calculating the interest weight and cognitive weight of the user content keyword set key_user according to the native topic keyword set key_pre, the derived topic keyword set key_deri and the topic-associated feature word set key_com;
S24: selecting the top-k keywords from the user content keyword set key_user according to interest weight and cognitive weight respectively, as the user interest feature keywords and the user cognitive feature keywords.
Further, the BM25 algorithm is used to calculate the similarity between the native topic content set and the derived topic content set, with the formula:
$$\mathrm{Score}(Q, d) = \sum_{i=1}^{n} W_i \cdot R(q_i, d)$$
wherein Score(Q, d) denotes the similarity score of the native topic content set and the derived topic content set, Q denotes the native topic content set, d denotes the derived topic content set, W_i denotes the weight of the i-th word, q_i denotes the i-th word in the set Q, n denotes the total number of words in the set Q, and R(q_i, d) denotes the relevance of the word q_i to the set d.
Further, a user content keyword set key is calculated user The formula of interest weight and cognitive weight of (a) is:
wherein w_{i,inter} denotes the interest weight of the i-th word in the user content keyword set and w_{i,cong} denotes its cognitive weight; sim(key_{i,u}, key_deri) denotes the similarity between the i-th candidate keyword and the derived topic keyword set key_deri; sim(key_{i,u}, key_com) denotes the similarity between the i-th candidate keyword and the topic-associated feature word set key_com; sim(key_{i,u}, key_pre) denotes the similarity between the i-th candidate keyword and the native topic keyword set key_pre; N denotes the total number of words in the user content keyword set; t denotes the current time, t' denotes the time at which the initial topic was generated, and w denotes a regularization factor.
Preferably, the process of quantifying the influence of topics on users with evolutionary game theory comprises the following steps:
S31: calculating the internal influence from the internal attributes, and calculating the external influence from the user friend driving force and the topic popularity;
S32: calculating the native topic influence and the derived topic influence from the internal influence and the external influence using a multiple linear regression algorithm;
S33: defining two game strategies, calculating a first benefit from the first game strategy and the native topic influence, and calculating a second benefit from the second game strategy and the derived topic influence;
S34: calculating the topic propagation behavior influence of users in the native topic and in the derived topic from the first benefit and the second benefit;
S35: calculating the topic influence adjacency matrix from the topic propagation behavior influence of users in the native topic and in the derived topic.
Further, the formula for calculating the internal influence and the external influence is as follows:
f_in(u_i) = Act(u_i) × Ret(u_i) × Pre(u_i)
wherein f_in(u_i) denotes the internal influence of user u_i; f_out(u_i, u_j) denotes the external influence of user u_j on user u_i, computed from the friend driving force of user u_j on user u_i and the topic popularity Hot(t) at the current time t; Act(u_i) denotes the activity of user u_i, Ret(u_i) denotes the historical forwarding rate of user u_i, and Pre(u_i) denotes the topic perception rate of user u_i.
Further, the formula for calculating the influence of the topic propagation behavior of the user in the original topic and the influence of the topic propagation behavior of the user in the derived topic is as follows:
wherein Mut_P(u_i, u_j) denotes the influence of user u_j on user u_i in the native topic, Mut_D(u_i, u_j) denotes the influence of user u_j on user u_i in the derived topic, Pro_P(u_i, u_j) denotes the first benefit, and Pro_D(u_i, u_j) denotes the second benefit.
Further, the topic influence adjacency matrix is represented as:
wherein the matrix is the topic influence adjacency matrix and its entries denote the topic propagation behavior influence among users; Mut_P(u_i, u_j) denotes the influence of user u_j on user u_i in the native topic, and Mut_D(u_i, u_j) denotes the influence of user u_j on user u_i in the derived topic.
Preferably, the DT-GCN model processes the user historical behavior feature vector and the user network structure feature vector as follows: an adjacency matrix is obtained from the connection information between nodes in the hybrid propagation network of the native and derived topics; the user historical behavior feature vector and the user network structure feature vector are input into a CNN for convolution to obtain a feature matrix; the feature matrix and the adjacency matrix are input into a GCN to which a dropout layer is added; and finally a softmax function is applied to obtain the user's topic propagation prediction.
Preferably, the expression of the DT-GCN model is as follows:
wherein Z denotes the class probabilities output for the users, A denotes the adjacency matrix, which is preprocessed before the graph convolution, H_0 denotes the input layer, the added dropout layer randomly samples parameters with a certain probability, CNN_model() denotes the CNN, and W_i is the weight matrix of the i-th network layer.
The beneficial effects of the invention are as follows. The method uses the hidden relation between native and derived topics to quantify user interest and the user's cognitive process, focuses on how the mutual promotion and inhibition of associated topics during propagation affects user behavior, and builds a topic propagation prediction model based on topic association by combining topic features, user features and a neural network. By introducing associated topics, the method can effectively predict how topics propagate and more faithfully reflect the association and game relationship between native and derived topics during propagation, so that the predictions agree more closely with the actual situation and are more accurate.
Drawings
FIG. 1 is a schematic structural diagram of a topic propagation prediction method based on topic association in the present invention;
FIG. 2 is a schematic diagram of a process of extracting network structure feature vectors of users according to the present invention;
FIG. 3 is a schematic diagram of the DT-GCN model structure of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a topic propagation prediction method based on topic association. As shown in FIG. 1, the method comprises: obtaining topic information and preprocessing it; inputting the preprocessed topic information into a topic propagation prediction model based on topic association to predict users' propagation tendencies toward topics; and managing topic propagation according to the predicted tendencies.
The topic information can be obtained from public data websites or through the public APIs of established social networks. It comprises the historical behavior information, topic participation information and basic user attribute information of all users who participate in the native and derived topics over the topic life cycle. The historical behavior information includes users' past forwards and comments; the topic participation information includes the times at which the native and derived topics were forwarded and commented on; and the basic user attribute information includes the friend relations of the participating users.
Preprocessing consists of simple data cleaning: most unstructured data are converted into structured form, and abnormal or null values are removed so that they do not hinder subsequent calculation.
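The cleaning step can be illustrated with a minimal Python sketch. The record fields (`user_id`, `topic_type`, `action`, `timestamp`) and the allowed values are hypothetical names chosen for illustration; the text does not specify a data schema.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Field names and allowed values are illustrative assumptions, not from the patent.
FIELDS = ("user_id", "topic_type", "action", "timestamp")
VALID_TOPICS = {"native", "derived"}
VALID_ACTIONS = {"forward", "comment"}


@dataclass(frozen=True)
class Record:
    user_id: str
    topic_type: str
    action: str
    timestamp: float


def _parse(row: dict) -> Optional[Record]:
    """Return a structured record, or None if the row has null or abnormal values."""
    if any(row.get(name) is None for name in FIELDS):
        return None
    try:
        timestamp = float(row["timestamp"])
    except (TypeError, ValueError):
        return None
    user_id = str(row["user_id"]).strip()
    topic_type = str(row["topic_type"]).strip().lower()
    action = str(row["action"]).strip().lower()
    if not user_id or topic_type not in VALID_TOPICS or action not in VALID_ACTIONS:
        return None
    if not math.isfinite(timestamp) or timestamp < 0:
        return None
    return Record(user_id, topic_type, action, timestamp)


def clean(raw: Iterable[dict]) -> List[Record]:
    """Drop unusable rows and convert the rest to a fixed structure."""
    return [rec for rec in map(_parse, raw) if rec is not None]
```

Rows that fail any check are dropped rather than repaired, which matches the "simple data cleaning" described above; imputation would be a different design choice.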
The topic propagation prediction model based on topic association processes the preprocessed topic information as follows:
S1: extracting internal attributes and external attributes of the topic information; the internal attributes comprise user activity, user topic perception rate and user historical forwarding rate, and the external attributes comprise user friend driving force, topic popularity and the user topic content set.
User activity Act(u_i):
User activity measures how actively a user forwards content relative to other users; the more active the user, the more forwards within a given period. The invention defines the activity of user u_i as:
wherein num[orig(u_i)] denotes the number of original posts by user u_i in the time period T before the outbreak of the derived topic, and num[retw(u_i)] denotes the number of posts forwarded by user u_i in the same period; because the quantity being predicted is the probability that the user forwards a topic, σ ∈ [0, 1] is used as a de-emphasis factor.
User topic perception rate Pre(u_i):
The user topic perception rate reflects the probability that a user comes into contact with a new topic. It is measured through the number of accounts the user follows and, to some extent, reflects the probability that the user participates when a new topic appears. It is defined as:
wherein Fol(u_i) denotes the number of accounts followed by user u_i, and Fol_ave(net) denotes the average number of accounts followed across all users in the social network.
User historical forwarding rate Ret(u_i):
The user historical forwarding rate reflects, to some extent, the user's tendency to forward. Users mainly learn about topics from information posted by their friends. The invention therefore defines the user historical forwarding rate as:
wherein RetwnNum(u_i) denotes the number of microblogs the user has forwarded historically, and GetRetNum(u_i) denotes the number of microblogs the user has forwarded from friends, a friend being a user whom the user follows.
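The defining formulas for these two rates are not reproduced in this text. The Python sketch below assumes the simplest reading of the symbol definitions, namely that each rate is a ratio: Pre(u_i) = Fol(u_i) / Fol_ave(net) and Ret(u_i) = GetRetNum(u_i) / RetwnNum(u_i). Both forms are assumptions made for illustration and may differ from the patent's actual definitions.

```python
def perception_rate(fol_user: int, fol_avg: float) -> float:
    """Assumed form: Pre(u_i) = Fol(u_i) / Fol_ave(net)."""
    if fol_avg <= 0:
        return 0.0  # no meaningful network average to compare against
    return fol_user / fol_avg


def historical_forwarding_rate(get_ret_num: int, retw_num: int) -> float:
    """Assumed form: Ret(u_i) = GetRetNum(u_i) / RetwnNum(u_i)."""
    if retw_num <= 0:
        return 0.0  # the user has never forwarded anything
    return get_ret_num / retw_num
```

Under this reading, a user who follows twice as many accounts as the network average has a perception rate of 2, and a user whose forwards all come from friends has a historical forwarding rate of 1.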
In a topic network, users usually join a topic under the influence of the propagation behavior of the users they follow; the more frequently two users interact, the stronger the driving force between them and the higher the probability that they forward each other's topics. Different friends exert different driving forces on a user, and the friend driving force is defined as:
wherein the defined quantity is the friend driving force of user u_j on user u_i; it is computed from the average number of friends' original microblogs that user u_i forwards and from Count(Fri), the number of the user's friends. A separate case is defined for a boundary condition or when user u_j is not a friend of user u_i.
Topic popularity Hot(t):
Topic popularity is reflected in the forwards, comments, likes and similar activity a topic receives in the social network. It can rise rapidly in a short time but falls rapidly once it peaks. Because this process resembles the half-life decay of a radioactive element, a half-life function is introduced and the topic popularity is defined as:
wherein Hot(t) denotes the popularity of the topic at the current time t, RetNum(t) and RetNum(t-1) denote the number of forwards of the topic up to the current time and up to the previous time respectively, t' denotes the time at which the initial topic was generated, and w denotes a regularization factor.
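The popularity formula itself is not reproduced in this text. As an illustration of the half-life idea only, the Python sketch below assumes that popularity is the number of new forwards since the previous time step, damped by a factor that halves every w time units after the topic was generated. This specific form is an assumption, not the patent's formula.

```python
def half_life_decay(t: float, t_start: float, w: float) -> float:
    """Assumed decay: halves every w time units after the topic starts at t_start."""
    if w <= 0:
        raise ValueError("w must be positive")
    return 0.5 ** ((t - t_start) / w)


def topic_popularity(ret_now: int, ret_prev: int, t: float, t_start: float, w: float) -> float:
    """Assumed form: Hot(t) = (RetNum(t) - RetNum(t-1)) * decay(t - t')."""
    return (ret_now - ret_prev) * half_life_decay(t, t_start, w)
```

With w = 10, a topic that gained 50 new forwards at t = 10 after starting at t' = 0 would have popularity 25 under these assumptions.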
User topic content set TInfo(t):
During topic propagation, different users hold different views on the same topic and comment differently, so topic comments reflect the attributes and features of the topic. Topic features also change as the topic spreads. The topic content set is expressed as:
TInfo(t) = {(u_i, info) | u_i ∈ U}
wherein info denotes the comments made by user u_i in the topic propagation space within time period t. The native topic content set is denoted TInfo_p(t), and the derived topic content set is denoted TInfo_d(t).
S2: selecting user interest feature keywords and user cognitive feature keywords from the user topic content set using the DTR2vec algorithm (a derived topic representation learning algorithm based on topic association), and representing the selected keywords as vectors to obtain a user historical behavior feature vector.
The DTR2vec algorithm designed by the invention first uses an LDA (latent Dirichlet allocation) topic identification model to construct the association features of the native and derived topics together with the user features, then extracts the user's accumulated cognition and degree of interest from the user's state transitions between the native and derived topics, and finally uses representation learning to embed users as low-dimensional vectors.
S21: the user topic content set comprises a native topic content set, a derived topic content set and a user social content set; the three content sets are each input into an LDA topic identification model to obtain a native topic keyword set key_pre, a derived topic keyword set key_deri and a user content keyword set key_user.
The LDA model is used to extract the association features of the native and derived topics. Specifically, each microblog that a user posts or forwards is treated as a paragraph, and the paragraphs are assembled into article-like content, which is divided into a native topic content set, a derived topic content set and a user social content set. Each of the three content sets is treated as one article and processed separately with the LDA model; the optimal number of topics is found by running multiple clustering experiments with different numbers of topics.
LDA processing yields the native topic keyword set key_pre, the derived topic keyword set key_deri and the user content keyword set key_user.
S22: calculating the degree of association between keywords in the native topic keyword set key_pre and keywords in the derived topic keyword set key_deri; calculating the similarity between the native topic content set and the derived topic content set, and putting each pair of keywords whose degree of association exceeds this similarity into the topic-associated feature word set key_com.
A derived topic develops from a native topic, so the two are necessarily associated. The similarity score of the native topic content set and the derived topic content set is obtained with the BM25 (Best Matching 25) algorithm and used as the association threshold for the native and derived topics. The calculation formula is:
$$\mathrm{Score}(Q, d) = \sum_{i=1}^{n} W_i \cdot R(q_i, d)$$
wherein Score(Q, d) denotes the similarity score of the native topic content set and the derived topic content set, Q denotes the native topic content set, d denotes the derived topic content set, W_i denotes the weight of the i-th word, q_i denotes the i-th word in the set Q, n denotes the total number of words in the set Q, and R(q_i, d) denotes the relevance of the word q_i to the set d.
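The score is a weighted sum over the words of Q, which can be sketched in Python as follows. The text does not say how W_i and R(q_i, d) are computed, so this sketch assumes the commonly used BM25 choices: an inverse document frequency weight for W_i and a saturating term-frequency function with parameters k1 and b for R. The `corpus` argument, needed for document frequencies and average length, is also an assumption.

```python
import math
from collections import Counter
from typing import List, Sequence


def bm25_score(query: Sequence[str], doc: Sequence[str],
               corpus: List[Sequence[str]], k1: float = 1.5, b: float = 0.75) -> float:
    """Score(Q, d) = sum_i W_i * R(q_i, d), with assumed standard BM25 W_i and R."""
    n_docs = len(corpus)
    total_len = sum(len(d) for d in corpus)
    if n_docs == 0 or total_len == 0:
        return 0.0
    avg_len = total_len / n_docs
    tf = Counter(doc)
    score = 0.0
    for q in query:
        # W_i: inverse document frequency of the query word (smoothed, non-negative).
        df = sum(1 for d in corpus if q in d)
        weight = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        # R(q_i, d): term frequency in d, saturated and length-normalized.
        f = tf[q]
        relevance = f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avg_len))
        score += weight * relevance
    return score
```

Here `query` would hold the tokenized native topic content and `doc` the tokenized derived topic content; a query word that never occurs in `doc` contributes nothing to the score.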
The degree of association between keywords in the native topic keyword set key_pre and keywords in the derived topic keyword set key_deri is calculated with the cosine formula:
$$\cos(\mathit{key1}, \mathit{key2}) = \frac{\sum_{i=1}^{m} X_i Y_i}{\sqrt{\sum_{i=1}^{m} X_i^{2}}\,\sqrt{\sum_{i=1}^{m} Y_i^{2}}}$$
wherein X_i and Y_i denote the i-th components of the vectors of keywords key1 and key2 respectively, and m denotes the dimension of the keyword vectors.
Each pair of keywords whose degree of association exceeds the similarity score is put into the topic-associated feature word set key_com.
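The construction of key_com can be sketched in Python as below. The keyword vectors are assumed to be given as dictionaries mapping each keyword to an embedding, and the threshold is passed in as a plain number; how the BM25 score is scaled to be comparable with a cosine value is not stated in the text, so that step is left to the caller.

```python
import math
from itertools import product
from typing import Dict, Sequence, Set


def cosine(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return dot / (norm_x * norm_y)


def associated_keywords(pre_vecs: Dict[str, Sequence[float]],
                        deri_vecs: Dict[str, Sequence[float]],
                        threshold: float) -> Set[str]:
    """Collect both keywords of every native/derived pair above the threshold."""
    key_com: Set[str] = set()
    for (k_pre, v_pre), (k_deri, v_deri) in product(pre_vecs.items(), deri_vecs.items()):
        if cosine(v_pre, v_deri) > threshold:
            key_com.update((k_pre, k_deri))
    return key_com
```

Every native/derived pair is compared, so the cost grows with the product of the two keyword set sizes, which is acceptable for the short keyword lists that LDA typically produces.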
S23: calculating the interest weight and cognitive weight of the user content keyword set key_user according to the native topic keyword set key_pre, the derived topic keyword set key_deri and the topic-associated feature word set key_com.
Whether a user forwards a topic is closely tied to the user's interest in and cognition of that topic. After the native topic appears, the user accumulates some cognition of it, while interest in the features shared by the native and derived topics weakens. The method therefore selects keywords using cosine distance as the weight, and the interest weight and cognitive weight are calculated as:
wherein w_{i,inter} denotes the interest weight of the i-th word in the user content keyword set and w_{i,cong} denotes its cognitive weight; sim(key_{i,u}, key_deri) denotes the similarity between the i-th candidate keyword and the derived topic keyword set key_deri; sim(key_{i,u}, key_com) denotes the similarity between the i-th candidate keyword and the topic-associated feature word set key_com; sim(key_{i,u}, key_pre) denotes the similarity between the i-th candidate keyword and the native topic keyword set key_pre; N denotes the total number of words in the user content keyword set; t denotes the current time, t' denotes the time at which the initial topic was generated, and w denotes a regularization factor.
S24: selecting the top-k keywords from the user content keyword set key_user according to interest weight and cognitive weight respectively, as the user interest feature keywords and the user cognitive feature keywords.
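The selection in S24 can be sketched in Python, taking the weights as already computed inputs since the weight formulas are not reproduced in this text. The dictionary layout (keyword mapped to weight) is an assumption made for illustration.

```python
import heapq
from typing import Dict, List


def top_k_keywords(weights: Dict[str, float], k: int) -> List[str]:
    """Return the k keywords with the largest weights, highest first."""
    best = heapq.nlargest(k, weights.items(), key=lambda item: item[1])
    return [keyword for keyword, _ in best]
```

The function would be called once with the interest weights and once with the cognitive weights, giving the two keyword lists that are later vectorized.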
The selected keywords (the user interest feature keywords and the user cognitive feature keywords) are represented as vectors with the Doc2vec algorithm, and the user historical behavior feature vector D is output, expressed as:
D = K × F_a(u)
wherein K denotes the number of users under the derived topic, and F_a(u) denotes the interest feature vector and cognitive feature vector of the corresponding user.
S3: quantifying the influence of topics on users with evolutionary game theory, based on all internal attributes together with the user friend driving force and topic popularity from the external attributes, to obtain a topic influence adjacency matrix.
The native topic and the derived topic compete with each other in a game during propagation, so the interaction between topics should be taken into account when predicting user propagation behavior. The invention introduces evolutionary game theory to quantify the influence of the native and derived topics on users, as follows:
S31: calculating the internal influence from the internal attributes, and calculating the external influence from the user friend driving force and the topic popularity.
As shown in FIG. 2, the internal influence is built from the internal attributes and the external influence from the external attributes. The calculation formula is:
f_in(u_i) = Act(u_i) × Ret(u_i) × Pre(u_i)
wherein f_in(u_i) denotes the internal influence of user u_i, f_out(u_i, u_j) denotes the external influence of user u_j on user u_i, Act(u_i) denotes the activity of user u_i, Ret(u_i) denotes the historical forwarding rate of user u_i, and Pre(u_i) denotes the topic perception rate of user u_i.
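A small Python sketch of the two influence terms follows. The internal influence is the product given in the formula; the external influence formula is not reproduced in this text, so the product of friend driving force and topic popularity used here is an assumption labeled as such.

```python
def internal_influence(act: float, ret: float, pre: float) -> float:
    """f_in(u_i) = Act(u_i) * Ret(u_i) * Pre(u_i), as given in the text."""
    return act * ret * pre


def external_influence(friend_force: float, hot: float) -> float:
    """Assumed form: f_out(u_i, u_j) = friend driving force * Hot(t)."""
    return friend_force * hot
```

Because f_in is a product, a user with zero activity, zero forwarding rate or zero perception rate has no internal influence regardless of the other two attributes.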
S32: calculating the native topic influence and the derived topic influence from the internal influence and the external influence using a multiple linear regression algorithm.
Combining the internal and external influence factors, the topic influence functions of the native topic and the derived topic are constructed with a multiple linear regression algorithm:
wherein Inf_p(u_i, u_j) denotes the native topic influence of user u_j on user u_i, Inf_d(u_i, u_j) denotes the derived topic influence of user u_j on user u_i, the two external influence terms denote the external influence of user u_j on user u_i in the native topic and in the derived topic respectively, and ρ_0, ρ_1 and ρ_2 are the first, second and third partial regression coefficients, obtained by training the multiple linear regression algorithm.
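The regression formula is not reproduced in this text. The Python sketch below assumes the usual two-predictor linear form, Inf = ρ_0 + ρ_1 · f_in + ρ_2 · f_out, and fits the coefficients by ordinary least squares with NumPy. The regression target (an observed influence value per user pair) is hypothetical, since the text does not say what the model is trained against.

```python
import numpy as np


def fit_influence(f_in, f_out, target) -> np.ndarray:
    """Fit rho_0, rho_1, rho_2 for the assumed model Inf = rho_0 + rho_1*f_in + rho_2*f_out."""
    f_in = np.asarray(f_in, dtype=float)
    f_out = np.asarray(f_out, dtype=float)
    design = np.column_stack([np.ones_like(f_in), f_in, f_out])
    rho, *_ = np.linalg.lstsq(design, np.asarray(target, dtype=float), rcond=None)
    return rho


def predict_influence(rho, f_in, f_out) -> np.ndarray:
    """Evaluate the fitted influence function on new internal/external influences."""
    return rho[0] + rho[1] * np.asarray(f_in, dtype=float) + rho[2] * np.asarray(f_out, dtype=float)
```

The same fit would be run twice, once with the native topic external influence and once with the derived topic external influence, giving Inf_p and Inf_d.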
S33: defining two game strategies, calculating a first benefit from the first game strategy and the native topic influence, and calculating a second benefit from the second game strategy and the derived topic influence.
Following the principles of game theory, the invention defines two game strategies: strategy 1, "forward the native topic", and strategy 2, "forward the derived topic". Let P_1 and P_2 denote the proportions of the target user's neighboring users who forward the native topic and the derived topic respectively. The payoff functions of the two strategies are:
Pro_P(u_i, u_j) = P_1 × Inf_p(u_i, u_j)
Pro_D(u_i, u_j) = P_2 × Inf_d(u_i, u_j)
wherein Pro_P(u_i, u_j) denotes the first benefit, that is, the benefit the user obtains by forwarding the native topic, and Pro_D(u_i, u_j) denotes the second benefit, that is, the benefit the user obtains by forwarding the derived topic.
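The two payoff functions can be sketched in Python as follows. The encoding of neighbor behavior as the strings "native" and "derived" (with None for a neighbor who forwards neither) and the choice to count non-forwarding neighbors in the denominator are assumptions made for illustration.

```python
from typing import Iterable, Optional, Tuple


def strategy_proportions(neighbor_actions: Iterable[Optional[str]]) -> Tuple[float, float]:
    """Return (P_1, P_2): shares of neighbors forwarding the native / derived topic."""
    actions = list(neighbor_actions)
    if not actions:
        return 0.0, 0.0
    p1 = sum(a == "native" for a in actions) / len(actions)
    p2 = sum(a == "derived" for a in actions) / len(actions)
    return p1, p2


def payoffs(p1: float, p2: float, inf_p: float, inf_d: float) -> Tuple[float, float]:
    """Pro_P = P_1 * Inf_p and Pro_D = P_2 * Inf_d, as given in the text."""
    return p1 * inf_p, p2 * inf_d
```

A strategy that many neighbors already follow and that carries a high topic influence therefore yields the larger payoff, which is what drives the evolutionary game in the next step.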
S34: calculating the topic propagation behavior influence of users in the native topic and in the derived topic from the first benefit and the second benefit.
wherein Mut_P(u_i, u_j) and Mut_D(u_i, u_j) denote the topic propagation behavior influence of user u_j on user u_i after the evolutionary game, in the native topic and in the derived topic respectively.
S35: calculating the topic influence adjacency matrix from the topic propagation behavior influence of users in the native topic and in the derived topic.
Taking the competition between the native and derived topics into account, the final native-derived topic influence adjacency matrix is obtained:
wherein the matrix entries denote the topic propagation behavior influence among users, and a separate value is defined for the diagonal case i = j.
S4: Extract network structure features from the topic influence adjacency matrix to obtain the network structure feature vector of the user.
Considering the complexity of the social network structure, the invention uses SDNE (Structural Deep Network Embedding, a graph network embedding algorithm) for network representation learning. It takes the native-derived topic influence adjacency matrix directly as input and outputs the network structure feature vectors of the users, expressed as:
$S = K \times F_b$
where $K$ is the number of network nodes, i.e. the number of users under the derived topic, and $F_b$ is the social structure feature vector of the corresponding user.
S5: inputting the historical behavior feature vector of the user and the network structure feature vector of the user into a DT-GCN model to obtain a prediction result of the topic propagation of the user, wherein the prediction result comprises whether the user participates in the topic propagation and the type of the topic which the user participates in.
The prediction task is to determine whether a potential user node will take part in forwarding a related topic and, if it does, whether it forwards the native topic or the derived topic; the problem is therefore cast as a three-class classification task. Because directly concatenating the two types of features would make the model input too long, the invention designs, as shown in fig. 3, a DT-GCN (graph convolutional neural network based on associated topics) model, which adds a CNN (convolutional neural network) layer in front of the GCN (graph convolutional network) model. The DT-GCN model processes the user historical behavior feature vector and the user network structure feature vector as follows: a native-derived topic propagation hybrid network is constructed from the topic information, and an adjacency matrix is obtained from the connection information between its nodes; the user historical behavior feature vector and the user network structure feature vector are input into the CNN network for convolution to obtain a feature matrix; the feature matrix and the adjacency matrix are input into the GCN, to which a dropout layer is added; finally, a softmax function converts the convolution output into the probability of each node belonging to each class, giving the prediction of the user's propagation of the topic.
In the GCN network the adjacency matrix is preprocessed; the preprocessed adjacency matrix is computed from the adjacency matrix as follows:
where $A$ denotes the original adjacency matrix obtained from the connection information between nodes in the native-derived topic propagation hybrid network, and $D$ denotes the degree matrix of $A$.
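The preprocessing formula itself is not reproduced in the text. A common choice in GCN models, assumed here purely for illustration, is the symmetric normalization $D^{-1/2} A D^{-1/2}$, optionally applied after adding self-loops to $A$. The following Python sketch implements that assumed form; the function name and the `add_self_loops` flag are hypothetical.

```python
import math
from typing import List


def normalize_adjacency(
    adj: List[List[float]], add_self_loops: bool = False
) -> List[List[float]]:
    """Symmetrically normalize an adjacency matrix: D^{-1/2} A D^{-1/2}.

    This is the normalization commonly used with GCNs; whether the source
    uses exactly this form is an assumption of the sketch.
    """
    n = len(adj)
    a = [row[:] for row in adj]
    if add_self_loops:
        for i in range(n):
            a[i][i] += 1.0
    # Degree of each node is the corresponding row sum of A.
    deg = [sum(row) for row in a]
    # Isolated nodes (degree 0) get a zero scaling factor to avoid division by zero.
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    return [[inv_sqrt[i] * a[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]
```

Normalizing in this way keeps the scale of node features stable when they are repeatedly multiplied by the adjacency matrix in successive graph convolution layers.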
The expression of the DT-GCN model is as follows:
where $Z$ denotes the class probabilities output for the user; $\mathrm{ReLU}(x) = \max(0, x)$ is the activation function; the added dropout layer randomly samples parameters with a certain probability $p$, i.e. the $j$-th neuron of the $i$-th network layer is sampled and its value is discarded, to prevent the model from overfitting; the preprocessed adjacency matrix is obtained from the adjacency matrix $A$; $H_l$ denotes the node feature attributes of each layer, the input layer $H_0$ being the user historical behavior feature vector and the user network structure feature vector; $\mathrm{CNN\_model}()$ denotes the CNN network; and $W_i$ is the weight matrix of the $i$-th network layer.
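A forward pass through a model of this kind can be sketched in plain Python as follows. The layer ordering (CNN, graph convolution with ReLU, dropout, second graph convolution, softmax) is an assumption based on the description, since the model expression is not reproduced in the text. `cnn_model` is a toy single-kernel 1-D convolution standing in for CNN_model(), and the inverted-dropout rescaling is likewise an assumption; all names are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def relu(m: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in m]


def dropout(m: Matrix, p: float, rng: random.Random, training: bool) -> Matrix:
    """Zero each value with probability p and rescale survivors (inverted dropout)."""
    if not training or p <= 0.0:
        return m
    keep = 1.0 - p
    return [[v / keep if rng.random() >= p else 0.0 for v in row] for row in m]


def softmax_rows(m: Matrix) -> Matrix:
    out = []
    for row in m:
        mx = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cnn_model(features: Matrix, kernel: List[float]) -> Matrix:
    """Toy stand-in for CNN_model(): one valid 1-D convolution plus ReLU per node."""
    k = len(kernel)
    conv = [[sum(row[i + t] * kernel[t] for t in range(k))
             for i in range(len(row) - k + 1)] for row in features]
    return relu(conv)


def dt_gcn_forward(a_hat: Matrix, h0: Matrix, kernel: List[float],
                   w0: Matrix, w1: Matrix, p: float = 0.5,
                   training: bool = False, seed: int = 0) -> Matrix:
    """Return one row of three class probabilities per user node."""
    rng = random.Random(seed)
    x = cnn_model(h0, kernel)                    # shorten the concatenated features
    h1 = relu(matmul(matmul(a_hat, x), w0))      # first graph convolution
    h1 = dropout(h1, p, rng, training)           # active only during training
    return softmax_rows(matmul(matmul(a_hat, h1), w1))  # second graph convolution
```

Here `h0` holds, for each user, the concatenation of the historical behavior feature vector and the network structure feature vector, and `a_hat` is the preprocessed adjacency matrix. Placing the convolution first reduces the width of the concatenated input before it reaches the graph layers, which is the stated motivation for adding a CNN layer in front of the GCN.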
Since the invention addresses a three-class prediction problem, the model output $Z = P(o, a, d \mid u_i)$ is defined as follows:
where $P(o \mid u_i)$ denotes the probability that the user forwards the native topic, $P(a \mid u_i)$ denotes the probability that the user does not participate in the topic, and $P(d \mid u_i)$ denotes the probability that the user forwards the derived topic. If the corresponding $Y = 1$, potential user $u_i$ is judged to forward the native topic in the next time period; if $Y = -1$, potential user $u_i$ is judged to forward the derived topic in the next time period; otherwise, potential user $u_i$ does not participate in forwarding the hot topic in the next time period.
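The mapping from the three output probabilities to the label $Y$ can be illustrated with a short Python sketch. It assumes that $Y$ is chosen as the most probable class and encodes "no participation" as 0; the text only fixes $Y = 1$ and $Y = -1$, so both the argmax rule and the value 0 are assumptions, and the names are hypothetical.

```python
NATIVE = 1    # Y = 1: the user forwards the native topic
DERIVED = -1  # Y = -1: the user forwards the derived topic
NONE = 0      # assumed encoding for "does not participate"


def predict_label(p_native: float, p_none: float, p_derived: float) -> int:
    """Pick the label with the highest predicted probability.

    On ties the first label in the order NATIVE, NONE, DERIVED is returned.
    """
    probs = {NATIVE: p_native, NONE: p_none, DERIVED: p_derived}
    return max(probs, key=probs.get)
```

A row of the softmax output such as (0.6, 0.3, 0.1), ordered as native, none, derived, would therefore be labeled as forwarding the native topic.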
The topic propagation prediction method based on topic association provided by the invention first starts from the association between the features of native and derived topics at different evolution stages and proposes the DTR2vec algorithm, which performs representation learning on the user's cognitive accumulation and degree of interest so that the topic feature space is represented by low-dimensional vectors. Then, considering the antagonism and symbiosis of native and derived topics during propagation, it introduces evolutionary game theory to form a network structure feature representation of the native-derived topic influence among users. Finally, it analyzes the association and game relationship between native and derived topics during propagation and integrates this information to predict and analyze how the topic will spread.
The method predicts the propagation trend of a topic by predicting how users will propagate it. It can be applied to personalized user recommendation and marketing, and helps in mining users' forwarding preferences, targeting advertisements accurately and drawing up publicity plans. In addition, the forwarding and propagation trends of monitored hot topics can be known in advance, which helps public opinion departments take targeted measures quickly against harmful information, thereby purifying the network environment, steering topics toward positive content and establishing sound value guidance in society.
The above embodiments further illustrate the objects, technical solutions and advantages of the present invention. It should be understood that they are only preferred embodiments of the invention and are not to be construed as limiting it; any modification, equivalent substitution or improvement made within the spirit and principle of the invention shall fall within its scope of protection.
Claims (10)
1. A topic propagation prediction method based on topic association is characterized by comprising the following steps: obtaining topic information and preprocessing the topic information; inputting the preprocessed topic information into a topic propagation prediction model based on topic association, and predicting the propagation trend of a user to topics; controlling the topic transmission trend according to the topic transmission trend of the user;
the process of processing the preprocessed topic information based on the topic propagation prediction model of topic association comprises the following steps:
s1: extracting internal attributes and external attributes of the topic information; the internal attributes comprise user activity, user topic perception rate and user historical forwarding rate, and the external attributes comprise user friend drive, topic popularity and user topic content set;
s2: selecting user interest characteristic keywords and user cognitive characteristic keywords from a user topic content set by adopting a DTR2vec algorithm, and performing vector representation on the selected keywords to obtain a user historical behavior characteristic vector;
s3: according to all the internal attributes and, among the external attributes, the user friend drive and the topic popularity, quantifying the influence of topics on the user by means of evolutionary game theory to obtain a topic influence adjacency matrix;
s4: extracting network structure features according to the topic influence adjacency matrix to obtain a network structure feature vector of the user;
s5: inputting the historical behavior feature vector of the user and the network structure feature vector of the user into a DT-GCN model to obtain a prediction result of the topic propagation of the user, wherein the prediction result comprises whether the user participates in the topic propagation and the type of the topic which the user participates in.
2. The topic propagation prediction method based on topic association as claimed in claim 1 wherein the process of selecting user interest feature keywords and user cognition feature keywords from a user topic content set using a DTR2vec algorithm comprises:
S21: the user topic content set comprises a native topic content set, a derived topic content set and a user social content set; the three content sets are respectively input into an LDA topic identification model to obtain a native topic keyword set $key_{pre}$, a derived topic keyword set $key_{deri}$ and a user content keyword set $key_{user}$;
S22: calculating the degree of association between keywords of the native topic keyword set $key_{pre}$ and the derived topic keyword set $key_{deri}$; calculating the similarity between the native topic content set and the derived topic content set, and putting each pair of keywords whose degree of association is greater than the similarity into the topic-associated feature word set $key_{com}$;
S23: calculating the interest weight and the cognitive weight of the user content keyword set $key_{user}$ from the native topic keyword set $key_{pre}$, the derived topic keyword set $key_{deri}$ and the topic-associated feature word set $key_{com}$;
S24: selecting the top-k keywords from the user content keyword set $key_{user}$ according to the interest weight and the cognitive weight, respectively, as the user interest feature keywords and the user cognitive feature keywords.
3. The topic propagation prediction method based on topic association as claimed in claim 2, wherein the BM25 algorithm is used to calculate the similarity between the native topic content set and the derived topic content set, and the formula is:
wherein $Score(Q, d)$ denotes the similarity score between the native topic content set and the derived topic content set, $Q$ denotes the native topic content set, $d$ denotes the derived topic content set, $W_i$ denotes the weight of a word, $q_i$ denotes the $i$-th word in set $Q$, $n$ denotes the total number of words in the native topic content set $Q$, and $R(q_i, d)$ denotes the relevance of the $i$-th word $q_i$ in set $Q$ to set $d$.
4. The topic propagation prediction method based on topic association as claimed in claim 2, wherein the formula for calculating the interest weight and the cognitive weight of the user content keyword set $key_{user}$ is:
wherein $w_{i,inter}$ denotes the interest weight of the $i$-th word in the user content keyword set, $w_{i,cong}$ denotes the cognitive weight of the $i$-th word in the user content keyword set, $sim(key_{i,u}, key_{deri})$ denotes the similarity between the $i$-th candidate keyword and the derived topic keyword set $key_{deri}$, $sim(key_{i,u}, key_{com})$ denotes the similarity between the $i$-th candidate keyword and the topic-associated feature word set $key_{com}$, $N$ denotes the total number of words in the user content keyword set, $sim(key_{i,u}, key_{pre})$ denotes the similarity between the $i$-th candidate keyword and the native topic keyword set $key_{pre}$, $t$ denotes the current time, $t'$ denotes the time at which the topic was initially generated, and $w$ denotes a regularization factor.
5. The topic propagation prediction method based on topic association as claimed in claim 1 wherein the process of quantifying the influence of topics on users using evolutionary game theory comprises:
s31: calculating internal influence according to the internal attribute, and calculating external influence according to the friend drive and topic popularity of the user;
s32: calculating the influence of the original topic and the influence of the derived topic by adopting a multiple linear regression algorithm according to the internal influence and the external influence;
s33: defining two game strategies, calculating first benefits according to the first game strategy and the influence of the primary topic, and calculating second benefits according to the second game strategy and the influence of the derived topic;
s34: calculating the topic propagation behavior influence of the user in the native topic and the topic propagation behavior influence of the user in the derived topic according to the first income and the second income;
s35: and calculating a topic influence adjacency matrix according to the topic propagation behavior influence of the users in the original topic and the topic propagation behavior influence of the users in the derived topic.
6. The topic propagation prediction method based on topic association as claimed in claim 5 wherein the formula for calculating the internal influence and the external influence is:
$f_{in}(u_i) = Act(u_i) \times Ret(u_i) \times Pre(u_i)$
wherein $f_{in}(u_i)$ denotes the internal influence of user $u_i$, $f_{out}(u_i, u_j)$ denotes the external influence of user $u_j$ on user $u_i$, $Act(u_i)$ denotes the activity of user $u_i$, $Ret(u_i)$ denotes the historical forwarding rate of user $u_i$, $Pre(u_i)$ denotes the topic perception rate of user $u_i$, the friend-drive term denotes the friend drive of user $u_j$ on user $u_i$, and $Hot(t)$ denotes the topic popularity at the current time $t$.
7. The topic propagation prediction method based on topic association as claimed in claim 5 wherein the formula for calculating the influence of topic propagation behavior of users in the original topic and the influence of topic propagation behavior of users in the derived topic is:
wherein $Mut_P(u_i, u_j)$ denotes the influence of user $u_j$ on user $u_i$ in the native topic, $Mut_D(u_i, u_j)$ denotes the influence of user $u_j$ on user $u_i$ in the derived topic, $Pro_P(u_i, u_j)$ denotes the first benefit, and $Pro_D(u_i, u_j)$ denotes the second benefit.
8. The topic propagation prediction method based on topic association as claimed in claim 5 wherein the topic influence adjacency matrix is expressed as:
wherein the expression defines the topic influence adjacency matrix, whose entries denote the topic propagation behavior influence among users, $Mut_P(u_i, u_j)$ denotes the influence of user $u_j$ on user $u_i$ in the native topic, and $Mut_D(u_i, u_j)$ denotes the influence of user $u_j$ on user $u_i$ in the derived topic.
9. The topic propagation prediction method based on topic association as claimed in claim 1, wherein the processing of the user historical behavior feature vector and the user network structure feature vector by the DT-GCN model comprises: obtaining an adjacency matrix from the connection information between nodes in the native-derived topic propagation hybrid network, and inputting the user historical behavior feature vector and the user network structure feature vector into a CNN network for convolution to obtain a feature matrix; and inputting the feature matrix and the adjacency matrix into a GCN, adding a dropout layer to the GCN, and finally applying a softmax function to obtain the topic propagation prediction result of the user.
10. The topic propagation prediction method based on topic association as claimed in claim 1 wherein the expression of the DT-GCN model is:
wherein $Z$ denotes the class probabilities output for the user, the preprocessed adjacency matrix is obtained from the adjacency matrix $A$, $H_0$ denotes the input layer, the added dropout layer randomly samples parameters with a certain probability, $CNN\_model()$ denotes the CNN network, and $W_i$ is the weight matrix of the $i$-th network layer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211444811.1A CN115712772A (en) | 2022-11-18 | 2022-11-18 | Topic propagation prediction method based on topic association |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211444811.1A CN115712772A (en) | 2022-11-18 | 2022-11-18 | Topic propagation prediction method based on topic association |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115712772A (en) | 2023-02-24 |
Family
ID=85233873
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211444811.1A Pending CN115712772A (en) | 2022-11-18 | 2022-11-18 | Topic propagation prediction method based on topic association |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115712772A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116628179A (en) * | 2023-05-30 | 2023-08-22 | 道有道科技集团股份公司 | User operation data visualization and man-machine interaction recommendation method |
CN117635190A (en) * | 2023-11-27 | 2024-03-01 | 河北数港科技有限公司 | Log data analysis method and system |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116628179A (en) * | 2023-05-30 | 2023-08-22 | 道有道科技集团股份公司 | User operation data visualization and man-machine interaction recommendation method |
CN116628179B (en) * | 2023-05-30 | 2023-12-22 | 道有道科技集团股份公司 | User operation data visualization and man-machine interaction recommendation method |
CN117635190A (en) * | 2023-11-27 | 2024-03-01 | 河北数港科技有限公司 | Log data analysis method and system |
CN117635190B (en) * | 2023-11-27 | 2024-05-14 | 河北数港科技有限公司 | Log data analysis method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||