CN115712772A - Topic propagation prediction method based on topic association - Google Patents


Publication number
CN115712772A
Authority
CN
China
Prior art keywords
topic
user
influence
representing
derived
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211444811.1A
Other languages
Chinese (zh)
Inventor
余翔
周心明
庞育才
段思睿
王蓉
肖云鹏
李暾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications
Priority to CN202211444811.1A
Publication of CN115712772A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the field of online public opinion analysis and relates in particular to a topic propagation prediction method based on topic association. The method comprises: obtaining topic information and extracting its internal attributes and external attributes; selecting user interest feature keywords and user cognitive feature keywords from a user topic content set using a DTR2vec algorithm, and representing the selected keywords as vectors to obtain a user historical behavior feature vector; quantifying, using evolutionary game theory and according to all internal attributes and part of the external attributes, the influence of topics on users to obtain a topic influence adjacency matrix; and extracting network structure features from the topic influence adjacency matrix to obtain a network structure feature vector of the user. The method can help public opinion departments take targeted measures quickly and has good application prospects.

Description

Topic propagation prediction method based on topic association
Technical Field
The invention belongs to the field of network public opinion analysis, and particularly relates to a topic propagation prediction method based on topic association.
Background
Generally, a topic is a hot issue that attracts the most public attention within a certain time and a certain scope. Topics of all kinds spread widely in social networks and carry a large volume of the speech and information behavior of network users. Topic data reflect user interests, behaviors, and social relations, and research on these data enables effective information recommendation. At the same time, in the real world the information contained in a topic mixes the true with the false, and when a topic spreads widely through a social group it affects people's cognition and social stability.
With the development of the internet, the way topics propagate has changed dramatically. On the one hand, flourishing social platforms such as microblogs, WeChat, and forums provide channels for topic propagation that span space, time, and region and reach the whole population. On the other hand, the spread of internet use to a broader population has diversified the composition of online groups, flattened the flow and spread of information, and made the development and derivation of topics more complex. Compared with traditional information dissemination, topics now propagate faster, reach further, and take more complex forms. Research on topic propagation therefore helps in understanding the characteristics of information propagation and is of great significance for preventing emergencies and for managing public opinion.
In recent years, scholars have studied topic propagation in social networks from multiple dimensions and have achieved notable results. With the continuous growth of data on social network platforms and the maturing of deep learning, topic propagation prediction based on neural networks and deep learning models has gained favor. However, many challenges remain in topic propagation: 1. The relevance and complexity of the derived topic feature space. A derived topic evolves from a native topic; compared with a single topic, the features of the derived and native topics are interwoven and continuously exchange information, so extracting topic features effectively is a challenge. 2. The complex associations among users during native-derived topic propagation. During propagation the native topic and the derived topic play a game against each other; how to quantify the user influence of each and mine the hidden relations among users is a problem that urgently needs solving. 3. The staged and time-sensitive nature of the dynamic evolution of derived topics. The evolution trend of a derived topic changes dynamically over time while being influenced by the native topic, and topic states evolve alternately; how to analyze the propagation of derived topics dynamically is a difficulty facing current research.
Therefore, the invention provides an information propagation prediction method based on topic association. By introducing derived topics, the method can effectively predict the propagation of topics and more faithfully reflect the association and game relationship between native and derived topics during propagation.
Disclosure of Invention
To address the defects in the prior art, the invention provides a topic propagation prediction method based on topic association, comprising: obtaining topic information and preprocessing it; inputting the preprocessed topic information into a topic propagation prediction model based on topic association to predict the users' propagation tendency toward topics; and managing the topic propagation trend according to the users' predicted propagation tendency;
the process by which the topic propagation prediction model based on topic association processes the preprocessed topic information comprises the following steps:
S1: extracting internal attributes and external attributes of the topic information; the internal attributes comprise user activity, user topic perception rate, and user historical forwarding rate, and the external attributes comprise user friend driving force, topic popularity, and a user topic content set;
S2: selecting user interest feature keywords and user cognitive feature keywords from the user topic content set, and representing the selected keywords as vectors to obtain a user historical behavior feature vector;
S3: quantifying the influence of topics on users using evolutionary game theory, according to all internal attributes and, among the external attributes, the user friend driving force and the topic popularity, to obtain a topic influence adjacency matrix;
S4: extracting network structure features according to the topic influence adjacency matrix to obtain a network structure feature vector of the user;
S5: inputting the user historical behavior feature vector and the user network structure feature vector into a DT-GCN model to obtain a topic propagation prediction result for the user, the prediction result comprising whether the user participates in topic propagation and the type of topic in which the user participates.
Preferably, the process of selecting the user interest feature keywords and the user cognitive feature keywords from the user topic content set by using the DTR2vec algorithm includes:
S21: the user topic content set comprises a native topic content set, a derived topic content set, and a user social content set; the three content sets are respectively input into an LDA topic identification model to obtain a native topic keyword set $key_{pre}$, a derived topic keyword set $key_{deri}$, and a user content keyword set $key_{user}$;
S22: calculating the degree of association between the native topic keyword set $key_{pre}$ and the derived topic keyword set $key_{deri}$; calculating the similarity between the native topic content set and the derived topic content set, and placing each pair of keywords whose degree of association is greater than the similarity into the topic-associated feature word set $key_{com}$;
S23: calculating the interest weight and the cognitive weight of the user content keyword set $key_{user}$ according to the native topic keyword set $key_{pre}$, the derived topic keyword set $key_{deri}$, and the topic-associated feature word set $key_{com}$;
S24: selecting the top-k keywords from the user content keyword set $key_{user}$ according to the interest weight and the cognitive weight respectively, as the user interest feature keywords and the user cognitive feature keywords.
Further, the BM25 algorithm is used to calculate the similarity between the native topic content set and the derived topic content set, with the formula:
$$Score(Q,d)=\sum_{i=1}^{n} W_i \cdot R(q_i,d)$$
wherein $Score(Q,d)$ represents the similarity score of the native topic content set and the derived topic content set, $Q$ represents the native topic content set, $d$ represents the derived topic content set, $W_i$ represents the word weight, $q_i$ represents the $i$-th word in the set $Q$, $n$ represents the total number of words in the native topic content set $Q$, and $R(q_i,d)$ represents the degree of correlation between the $i$-th word $q_i$ in the set $Q$ and the set $d$.
Further, the formulas for calculating the interest weight and the cognitive weight of the user content keyword set $key_{user}$ are:
Figure BDA0003949792240000041
Figure BDA0003949792240000042
wherein $w_{i,inter}$ represents the interest weight of the $i$-th word in the user content keyword set, $w_{i,cong}$ represents the cognitive weight of the $i$-th word in the user content keyword set, $sim(key_{i,u},key_{deri})$ represents the similarity between the $i$-th candidate keyword and the derived topic keyword set $key_{deri}$, $sim(key_{i,u},key_{com})$ represents the similarity between the $i$-th candidate keyword and the topic-associated feature word set $key_{com}$, $N$ represents the total number of words in the user content keyword set, $sim(key_{i,u},key_{pre})$ represents the similarity between the $i$-th candidate keyword and the native topic keyword set $key_{pre}$, $t$ represents the current time, $t'$ represents the time at which the initial topic was generated, and $w$ represents a regularization factor.
Preferably, the process of quantifying the influence of topics on users using evolutionary game theory comprises the following steps:
S31: calculating the internal influence according to the internal attributes, and calculating the external influence according to the user friend driving force and the topic popularity;
S32: calculating the native topic influence and the derived topic influence from the internal influence and the external influence using a multiple linear regression algorithm;
S33: defining two game strategies, calculating a first benefit according to the first game strategy and the native topic influence, and calculating a second benefit according to the second game strategy and the derived topic influence;
S34: calculating the topic propagation behavior influence among users in the native topic and the topic propagation behavior influence among users in the derived topic according to the first benefit and the second benefit;
S35: calculating the topic influence adjacency matrix according to the topic propagation behavior influence among users in the native topic and the topic propagation behavior influence among users in the derived topic.
Further, the formulas for calculating the internal influence and the external influence are:
$f_{in}(u_i)=Act(u_i)\times Ret(u_i)\times Pre(u_i)$
Figure BDA0003949792240000043
wherein $f_{in}(u_i)$ represents the internal influence of user $u_i$, $f_{out}(u_i,u_j)$ represents the external influence of user $u_j$ on user $u_i$, $Act(u_i)$ represents the activity of user $u_i$, $Ret(u_i)$ represents the historical forwarding rate of user $u_i$, $Pre(u_i)$ represents the topic perception rate of user $u_i$,
Figure BDA0003949792240000051
represents the friend driving force of user $u_j$ on user $u_i$, and $Hot(t)$ represents the topic popularity at the current time $t$.
Further, the formula for calculating the influence of the topic propagation behavior of the user in the original topic and the influence of the topic propagation behavior of the user in the derived topic is as follows:
Figure BDA0003949792240000052
Figure BDA0003949792240000053
wherein $Mut_P(u_i,u_j)$ represents the influence of user $u_j$ on user $u_i$ in the native topic, $Mut_D(u_i,u_j)$ represents the influence of user $u_j$ on user $u_i$ in the derived topic, $Pro_P(u_i,u_j)$ represents the first benefit, and $Pro_D(u_i,u_j)$ represents the second benefit.
Further, the topic influence adjacency matrix is represented as:
Figure BDA0003949792240000054
wherein,
Figure BDA0003949792240000055
represents the topic influence adjacency matrix,
Figure BDA0003949792240000056
Figure BDA0003949792240000057
Figure BDA0003949792240000058
represent the topic propagation behavior influence among users, $Mut_P(u_i,u_j)$ represents the influence of user $u_j$ on user $u_i$ in the native topic, and $Mut_D(u_i,u_j)$ represents the influence of user $u_j$ on user $u_i$ in the derived topic.
Preferably, the DT-GCN model processes the user historical behavior feature vector and the user network structure feature vector as follows: an adjacency matrix is obtained from the connection information between nodes in the hybrid propagation network of the native and derived topics; the user historical behavior feature vector and the user network structure feature vector are input into a CNN for convolution to obtain a feature matrix; the feature matrix and the adjacency matrix are input into a GCN to which a dropout layer is added; and a softmax function is finally applied to obtain the topic propagation prediction result of the user.
Preferably, the expression of the DT-GCN model is as follows:
Figure BDA0003949792240000061
wherein $Z$ represents the class probabilities output for the users,
Figure BDA0003949792240000062
represents the preprocessed adjacency matrix, $A$ represents the adjacency matrix, $H_0$ represents the input layer,
Figure BDA0003949792240000063
represents the random sampling of parameters with a certain probability by the added dropout layer, CNN_model() represents the CNN, and $W_i$ is the weight matrix of the $i$-th network layer.
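The graph-convolution part of this model can be illustrated with a minimal Python sketch. Because the model expression itself is not recoverable from the text, the sketch makes several assumptions that are not stated in the source: the adjacency preprocessing is taken to be the common symmetric normalization $D^{-1/2}(A+I)D^{-1/2}$, two propagation layers with a ReLU in between are used, the CNN feature extraction and the dropout layer are omitted, and three output classes (no participation, native topic, derived topic) are assumed. The function names and example values are hypothetical.

```python
import math

def normalize_adjacency(a):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 (assumed preprocessing)."""
    n = len(a)
    a_self = [[a[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_self]
    return [[a_self[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(x, y):
    """Plain matrix product of two lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]

def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def gcn_forward(a, h, w0, w1):
    """Two graph-convolution layers followed by a row-wise softmax."""
    a_hat = normalize_adjacency(a)
    hidden = [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w0)]  # ReLU
    logits = matmul(matmul(a_hat, hidden), w1)
    return [softmax(row) for row in logits]

# Hypothetical 3-user network, 2 input features per user, 3 output classes.
a = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w0 = [[0.5, -0.2], [0.1, 0.3]]
w1 = [[0.2, 0.4, -0.1], [0.3, -0.5, 0.6]]
z = gcn_forward(a, h, w0, w1)
```

Each row of `z` is a probability distribution over the assumed classes for one user; in a trained model the weight matrices would be learned rather than fixed by hand.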
The beneficial effects of the invention are as follows: the method uses the hidden relations between native and derived topics to quantify users' interest and cognitive processes, focuses on how the mutually promoting and inhibiting propagation of associated topics influences user behavior, and constructs a topic propagation prediction model based on topic association by combining topic features, user features, and a powerful neural network. By introducing associated topics, the method can effectively predict the propagation of topics and more faithfully reflect the association and game relationship between native and derived topics during propagation, so that the predictions better match actual conditions and achieve high accuracy.
Drawings
FIG. 1 is a schematic structural diagram of a topic propagation prediction method based on topic association in the present invention;
FIG. 2 is a schematic diagram of a process of extracting network structure feature vectors of users according to the present invention;
FIG. 3 is a schematic diagram of the DT-GCN model structure of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a topic propagation prediction method based on topic association. As shown in fig. 1, the method comprises: obtaining topic information and preprocessing it; inputting the preprocessed topic information into a topic propagation prediction model based on topic association to predict the users' propagation tendency toward topics; and managing the topic propagation trend according to the users' predicted propagation tendency.
The topic information can be obtained from public data websites or through the public APIs of mature social networks. It comprises the historical behavior information, topic participation information, and basic attribute information of all users who participate in the native-derived topics within their life cycle. The historical behavior information includes the users' historical forwards and comments; the topic participation information includes the times at which the native-derived topics were forwarded and commented on; and the basic attribute information includes the friend relationships of the participating users.
Preprocessing consists of simple data cleaning: most unstructured data are converted to structured form and abnormal or null values are removed, which reduces difficulties in subsequent computation.
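A minimal sketch of such a cleaning pass is given below. The record layout (the field names `user_id`, `topic_id`, `action`, `timestamp`) and the rule that a negative timestamp counts as abnormal are illustrative assumptions; the source does not specify the data schema or the cleaning rules.

```python
def clean_records(records, required=("user_id", "topic_id", "action", "timestamp")):
    """Keep only records whose required fields are present and plausible."""
    cleaned = []
    for rec in records:
        # Drop records with a missing or empty required field (null values).
        if any(rec.get(field) in (None, "") for field in required):
            continue
        # Drop records with an abnormal value (assumed rule: negative timestamp).
        if rec["timestamp"] < 0:
            continue
        cleaned.append(rec)
    return cleaned

records = [
    {"user_id": "u1", "topic_id": "native", "action": "forward", "timestamp": 10},
    {"user_id": "u2", "topic_id": "native", "action": None, "timestamp": 11},
    {"user_id": "u3", "topic_id": "derived", "action": "comment", "timestamp": -5},
]
cleaned = clean_records(records)
```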
The topic propagation prediction model based on topic association processes the preprocessed topic information as follows:
S1: extracting internal attributes and external attributes of the topic information; the internal attributes comprise user activity, user topic perception rate, and user historical forwarding rate, and the external attributes comprise user friend driving force, topic popularity, and a user topic content set.
User activity $Act(u_i)$:
User activity is how active a user's forwarding behavior is relative to other users; the more active the user, the more forwards the user makes within a period of time. The invention defines the activity of user $u_i$ as:
Figure BDA0003949792240000071
wherein $num[orig(u_i)]$ represents the number of original microblog posts by user $u_i$ in the time period $T$ before the outbreak of the derived topic, and $num[retw(u_i)]$ represents the number of microblog posts forwarded by user $u_i$ in the same period; because what is predicted is the probability that the user forwards the topic, a de-emphasis factor $\sigma\in[0,1]$ is introduced.
User topic perception rate $Pre(u_i)$:
The user topic perception rate reflects the probability that a user comes into contact with a new topic. It is measured by the number of accounts the user follows and reflects, to some extent, the probability that the user participates when a new topic appears. It is defined as:
Figure BDA0003949792240000081
wherein $Fol(u_i)$ represents the number of accounts followed by user $u_i$, and $Fol_{ave}(net)$ represents the average number of accounts followed by all users in the social network.
User historical forwarding rate $Ret(u_i)$:
The user historical forwarding rate reflects, to some extent, the user's tendency to forward. The main source from which a user acquires topics is information from friends. The invention therefore defines the historical forwarding rate of a user as:
Figure BDA0003949792240000082
wherein $retwnNum(u_i)$ represents the number of microblogs the user has historically forwarded, and $getRetNum(u_i)$ represents the number of all microblogs forwarded by the user from friends, a friend being a user whom the user follows.
User friend driving force
Figure BDA0003949792240000083
In a topic network, users usually participate in a topic under the influence of the propagation behavior of the users they follow; the higher the interaction frequency between users, the greater the driving force between them and the higher the probability that they forward each other's topics. Different friends exert different driving forces on a user, and the friend driving force is defined as:
Figure BDA0003949792240000084
wherein,
Figure BDA0003949792240000085
represents the friend driving force of user $u_j$ on user $u_i$,
Figure BDA0003949792240000086
represents the average number of friends' original microblogs forwarded by user $u_i$, and $count(Fri)$ represents the number of friends of the user; if
Figure BDA0003949792240000087
or user $u_j$ is not a friend of user $u_i$, then
Figure BDA0003949792240000088
Topic popularity $Hot(t)$:
Topic popularity is reflected in the forwards, comments, likes, and similar interactions a topic receives in the social network. It can rise rapidly in a short time but falls rapidly after peaking. Because this process resembles the half-life decay of an element, a half-life function
Figure BDA0003949792240000089
is introduced, and the topic popularity is defined as:
Figure BDA00039497922400000810
wherein $Hot(t)$ represents the popularity of the topic at the current time $t$, $RetNum(t)$ and $RetNum(t-1)$ respectively represent the forwarding volume of the topic up to the current time and up to the previous time, $t'$ represents the time at which the initial topic was generated, and $w$ represents a regularization factor.
User topic content set $TInfo(t)$:
During topic propagation, because every person is different, users facing the same topic hold different views and post different comments. Topic comments therefore reflect the attributes and characteristics of a topic, and these characteristics change as the topic spreads. The topic content set is expressed as:
$TInfo(t)=\{(u_i,info)\mid u_i\in U\}$
wherein $info$ represents the comments made by user $u_i$ in the topic propagation space within the time period $t$. The native topic content set is denoted $TInfo_p(t)$, and the derived topic content set is denoted $TInfo_d(t)$.
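As an illustration of this data structure, the following Python sketch groups comments by user within a time window. The tuple layout `(user, time, text)` and the function name `build_content_set` are hypothetical choices made for the example, not part of the source.

```python
from collections import defaultdict

def build_content_set(comments, t_start, t_end):
    """Map each user u_i to the comments that user made within [t_start, t_end]."""
    tinfo = defaultdict(list)
    for user, time, text in comments:
        if t_start <= time <= t_end:
            tinfo[user].append(text)
    return dict(tinfo)

# Hypothetical comments under one topic: (user, time, text).
comments = [("u1", 1, "a"), ("u2", 2, "b"), ("u1", 5, "c"), ("u3", 9, "d")]
tinfo = build_content_set(comments, t_start=1, t_end=5)
```

Running the same function separately on comments under the native topic and under the derived topic would give the two sets $TInfo_p(t)$ and $TInfo_d(t)$.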
S2: selecting user interest feature keywords and user cognitive feature keywords from the user topic content set using the DTR2vec algorithm (a derived topic representation learning algorithm based on topic association), and representing the selected keywords as vectors to obtain a user historical behavior feature vector.
The DTR2vec algorithm designed by the invention first uses an LDA (latent Dirichlet allocation) topic identification model to construct the association features of the native-derived topics and the user features, then extracts the user's cognitive accumulation and degree of interest according to the user's state transition between the native and derived topics, and finally uses representation learning to obtain a low-dimensional vector representation of each user.
S21: the user topic content set comprises a native topic content set, a derived topic content set, and a user social content set; the three content sets are respectively input into an LDA topic identification model to obtain a native topic keyword set $key_{pre}$, a derived topic keyword set $key_{deri}$, and a user content keyword set $key_{user}$.
The LDA model is used to extract the association features of the native and derived topics. Specifically, each microblog post published or forwarded by a user is treated as a paragraph, so that the content forms an article-like text, and the content is divided into a native topic content set, a derived topic content set, and a user social content set. Each of the three sets is treated as one article and processed separately by the LDA model; the optimal number of topics is determined through repeated clustering experiments with different numbers of topics.
The LDA model processing yields the native topic keyword set $key_{pre}$, the derived topic keyword set $key_{deri}$, and the user content keyword set $key_{user}$.
S22: calculating the degree of association between the native topic keyword set $key_{pre}$ and the derived topic keyword set $key_{deri}$; calculating the similarity between the native topic content set and the derived topic content set, and placing each pair of keywords whose degree of association is greater than the similarity into the topic-associated feature word set $key_{com}$.
A derived topic develops from a native topic, so the two are necessarily associated. The BM25 (best match) algorithm is used to obtain the similarity score of the native topic content set and the derived topic content set, which serves as the association threshold for the native-derived topics. The calculation formula is:
$$Score(Q,d)=\sum_{i=1}^{n} W_i \cdot R(q_i,d)$$
wherein $Score(Q,d)$ represents the similarity score of the native topic content set and the derived topic content set, $Q$ represents the native topic content set, $d$ represents the derived topic content set, $W_i$ represents the word weight, $q_i$ represents the $i$-th word in the set $Q$, $n$ represents the total number of words in the native topic content set $Q$, and $R(q_i,d)$ represents the degree of correlation between the $i$-th word $q_i$ in the set $Q$ and the set $d$.
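The scoring step can be sketched in Python as a weighted sum of per-word relevance terms, which is the form suggested by the symbol definitions above. The concrete forms of $W_i$ (an IDF-style weight) and $R(q_i,d)$ (a saturated term frequency) follow the commonly used Okapi BM25 variant and are assumptions, as are the parameters `k1` and `b`, the separate `corpus` argument used for document frequencies, and the toy word lists.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, corpus, k1=1.5, b=0.75):
    """Score(Q, d) = sum_i W_i * R(q_i, d), with Okapi-style W_i and R (assumed forms)."""
    n_docs = len(corpus)
    avg_len = sum(len(doc) for doc in corpus) / n_docs
    tf = Counter(doc_terms)
    score = 0.0
    for q in query_terms:
        df = sum(1 for doc in corpus if q in doc)  # number of documents containing q
        w = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)  # W_i: IDF-style word weight
        f = tf[q]  # frequency of q in d
        r = f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc_terms) / avg_len))  # R(q_i, d)
        score += w * r
    return score

# Hypothetical tokenized content: Q is the native set, d the derived set.
native = ["flood", "rescue", "city"]
derived = ["rescue", "donation", "city", "volunteers"]
corpus = [derived, ["weather", "forecast"], ["city", "traffic"]]
score = bm25_score(native, derived, corpus)
```

Words of $Q$ that never occur in $d$ contribute nothing, so the score grows with the number and rarity of the words the two content sets share.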
The degree of association between the native topic keyword set $key_{pre}$ and the derived topic keyword set $key_{deri}$ is calculated with the cosine formula:
$$\cos(key1,key2)=\frac{\sum_{i=1}^{m} X_i Y_i}{\sqrt{\sum_{i=1}^{m} X_i^{2}}\,\sqrt{\sum_{i=1}^{m} Y_i^{2}}}$$
wherein $X_i$ and $Y_i$ represent the components of the vectors of the keywords $key1$ and $key2$ respectively, and $m$ represents the dimension of the keyword vectors.
Each pair of keywords whose degree of association is greater than the similarity is placed into the topic-associated feature word set $key_{com}$.
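The selection of $key_{com}$ can be sketched as follows. The keyword vectors, the dictionary layout, and the threshold value 0.9 are illustrative stand-ins: in the method described above the threshold is the BM25 similarity score, and how that score is brought onto a scale comparable with a cosine value is not specified in the source.

```python
import math

def cosine(x, y):
    """Cosine similarity between two equal-length keyword vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def associated_words(key_pre, key_deri, threshold):
    """Collect both keywords of every (native, derived) pair whose cosine exceeds the threshold."""
    key_com = set()
    for w1, v1 in key_pre.items():
        for w2, v2 in key_deri.items():
            if cosine(v1, v2) > threshold:
                key_com.update((w1, w2))
    return key_com

# Hypothetical 2-dimensional keyword vectors.
key_pre = {"flood": [1.0, 0.0], "rescue": [0.6, 0.8]}
key_deri = {"donation": [0.0, 1.0], "relief": [0.8, 0.6]}
key_com = associated_words(key_pre, key_deri, threshold=0.9)
```

Only the pair ("rescue", "relief") has a cosine (0.96) above the assumed threshold, so both words enter `key_com`.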
S23: calculating the interest weight and the cognitive weight of the user content keyword set $key_{user}$ according to the degree of association between the native topic keyword set $key_{pre}$ and the derived topic keyword set $key_{deri}$ and according to the topic-associated feature word set $key_{com}$.
Whether a user forwards a topic is closely related to the user's interest in and cognition of the topic. After the native topic appears, the user accumulates some cognition of the topic while the user's interest in the features common to the native and derived topics weakens. Taking this into account, the invention uses cosine distance as the weight for selecting keywords, and the interest weight and cognitive weight are calculated as:
Figure BDA0003949792240000111
Figure BDA0003949792240000112
wherein $w_{i,inter}$ represents the interest weight of the $i$-th word in the user content keyword set, $w_{i,cong}$ represents the cognitive weight of the $i$-th word in the user content keyword set, $sim(key_{i,u},key_{deri})$ represents the similarity between the $i$-th candidate keyword and the derived topic keyword set $key_{deri}$, $N$ represents the total number of words in the user content keyword set, $sim(key_{i,u},key_{com})$ represents the similarity between the $i$-th candidate keyword and the topic-associated feature word set $key_{com}$, $sim(key_{i,u},key_{pre})$ represents the similarity between the $i$-th candidate keyword and the native topic keyword set $key_{pre}$, $t$ represents the current time, $t'$ represents the time at which the initial topic was generated, and $w$ represents a regularization factor.
S24: selecting the top-k keywords from the user content keyword set $key_{user}$ according to the interest weight and the cognitive weight respectively, as the user interest feature keywords and the user cognitive feature keywords.
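Once the two weights are available, the selection itself is a plain top-k ranking, as in the Python sketch below. The weight values are made-up placeholders (they are not produced here by the weight formulas), and the helper name `top_k_keywords` is hypothetical.

```python
import heapq

def top_k_keywords(weights, k):
    """Return the k keywords with the largest weights, highest first."""
    return heapq.nlargest(k, weights, key=weights.get)

# Placeholder weights for four candidate keywords of one user.
interest_w = {"rescue": 0.9, "traffic": 0.2, "donation": 0.7, "weather": 0.4}
cognition_w = {"rescue": 0.3, "traffic": 0.1, "donation": 0.8, "weather": 0.6}
interest_keys = top_k_keywords(interest_w, 2)
cognition_keys = top_k_keywords(cognition_w, 2)
```

The two rankings are computed independently, so a keyword such as "donation" can appear in both the interest list and the cognition list.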
The selected keywords (the user interest feature keywords and the user cognitive feature keywords) are represented as vectors using the Doc2vec algorithm, and the user historical behavior feature vector $D$ is output, expressed as:
$D=K\times F_a(u)$
wherein $K$ represents the number of users under the derived topic, and $F_a(u)$ represents the corresponding user interest feature vectors and user cognitive feature vectors.
S3: quantifying the influence of topics on users using evolutionary game theory, according to all internal attributes and, among the external attributes, the user friend driving force and the topic popularity, to obtain a topic influence adjacency matrix.
The native topic and the derived topic play a game against each other during propagation, so the influence of the interaction between topics should be considered when predicting the propagation behavior of users. The invention introduces evolutionary game theory to quantify the influence of the native-derived topics on users, as follows:
S31: calculating the internal influence according to the internal attributes, and calculating the external influence according to the user friend driving force and the topic popularity.
As shown in fig. 2, the internal influence is composed of the internal attributes and the external influence is composed of the external attributes. The calculation formulas are:
$f_{in}(u_i)=Act(u_i)\times Ret(u_i)\times Pre(u_i)$
Figure BDA0003949792240000121
wherein $f_{in}(u_i)$ represents the internal influence of user $u_i$, $f_{out}(u_i,u_j)$ represents the external influence of user $u_j$ on user $u_i$, $Act(u_i)$ represents the activity of user $u_i$, $Ret(u_i)$ represents the historical forwarding rate of user $u_i$, and $Pre(u_i)$ represents the topic perception rate of user $u_i$.
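A short worked example of the internal influence formula, $f_{in}(u_i)=Act(u_i)\times Ret(u_i)\times Pre(u_i)$, is given below in Python. The attribute values for the two users are invented for illustration only.

```python
def internal_influence(act, ret, pre):
    """f_in(u_i) = Act(u_i) * Ret(u_i) * Pre(u_i)."""
    return act * ret * pre

# Hypothetical (Act, Ret, Pre) values for two users.
users = {"u1": (0.8, 0.5, 0.5), "u2": (0.2, 0.9, 1.0)}
f_in = {user: internal_influence(*attrs) for user, attrs in users.items()}
```

Because the three attributes are multiplied, a user with one very low attribute (here the low activity of `u2`) ends up with a low internal influence even when the other two attributes are high.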
S32: and calculating the influence of the original topic and the influence of the derived topic by adopting a multiple linear regression algorithm according to the internal influence and the external influence.
And (3) integrating internal influence factors and external factors, and constructing topic influence functions of the original topic and the derived topic by using a multiple linear regression algorithm:
Figure BDA0003949792240000122
Figure BDA0003949792240000123
where $Inf_{pre}(u_i, u_j)$ denotes the native-topic influence of user $u_j$ on user $u_i$, $Inf_{dri}(u_i, u_j)$ denotes the derived-topic influence of user $u_j$ on user $u_i$,
Figure BDA0003949792240000124
denotes the external influence of user $u_j$ on user $u_i$ under the native topic,
Figure BDA0003949792240000125
denotes the external influence of user $u_j$ on user $u_i$ under the derived topic, and $\rho_0$, $\rho_1$, $\rho_2$ are the first, second, and third partial regression coefficients, respectively, obtained by training the multiple linear regression algorithm.
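The regression step can be sketched as follows. The exact influence functions are given only as figures, so the linear form Inf = rho0 + rho1 * f_in + rho2 * f_out used here is an assumption, as are the function names and the synthetic training data. The fit uses ordinary least squares via `numpy.linalg.lstsq`.

```python
import numpy as np


def fit_influence_regression(f_in, f_out, target):
    """Fit target ~ rho0 + rho1 * f_in + rho2 * f_out by least squares.

    Assumed model form; the source gives the formula only as an image.
    """
    f_in = np.asarray(f_in, dtype=float)
    f_out = np.asarray(f_out, dtype=float)
    target = np.asarray(target, dtype=float)
    # Design matrix with an intercept column for rho0.
    X = np.column_stack([np.ones_like(f_in), f_in, f_out])
    rho, *_ = np.linalg.lstsq(X, target, rcond=None)
    return rho  # array [rho0, rho1, rho2]


def topic_influence(rho, f_in, f_out):
    """Evaluate the fitted influence function for one user pair."""
    return rho[0] + rho[1] * f_in + rho[2] * f_out


# Synthetic, noise-free data generated from known coefficients.
rng = np.random.default_rng(0)
f_in_samples = rng.random(50)
f_out_samples = rng.random(50)
observed = 0.1 + 0.6 * f_in_samples + 0.3 * f_out_samples
rho = fit_influence_regression(f_in_samples, f_out_samples, observed)
```

In this sketch one such model would be fitted per topic type, once with the native-topic external influence and once with the derived-topic external influence.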
S33: Defining two game strategies, calculating a first benefit from the first game strategy and the native-topic influence, and calculating a second benefit from the second game strategy and the derived-topic influence.
Following the principles of game theory, the invention defines two game strategies. Strategy 1: "forward the native topic"; Strategy 2: "forward the derived topic". Let $P_1$ and $P_2$ denote the proportions of the target user's neighboring users who forward the native topic and the derived topic, respectively. The benefit functions of the two strategies are:
$Pro_P(u_i, u_j) = P_1 \times Inf_p(u_i, u_j)$
$Pro_D(u_i, u_j) = P_2 \times Inf_d(u_i, u_j)$
where $Pro_P(u_i, u_j)$ denotes the first benefit, i.e., the benefit the user obtains by forwarding the native topic, and $Pro_D(u_i, u_j)$ denotes the second benefit, i.e., the benefit the user obtains by forwarding the derived topic.
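A minimal sketch of the two benefit functions is shown below. It assumes that each neighbor's observed action is recorded as one of the hypothetical labels "native", "derived", or "none", and that the proportions are taken over all neighbors, including those who forward nothing. Neither detail is specified in the source.

```python
from collections import Counter


def strategy_proportions(neighbor_actions):
    """Return (P1, P2): shares of neighbors forwarding native / derived topics."""
    if not neighbor_actions:
        return 0.0, 0.0
    counts = Counter(neighbor_actions)
    total = len(neighbor_actions)  # assumption: all neighbors count
    return counts["native"] / total, counts["derived"] / total


def benefits(neighbor_actions, inf_native, inf_derived):
    """Benefits of the two strategies: (P1 * Inf_p, P2 * Inf_d)."""
    p1, p2 = strategy_proportions(neighbor_actions)
    return p1 * inf_native, p2 * inf_derived


# Hypothetical neighborhood of a target user.
actions = ["native", "derived", "native", "none"]
pro_p, pro_d = benefits(actions, inf_native=0.8, inf_derived=0.4)
```

Here half of the neighbors forward the native topic and a quarter forward the derived topic, so the native-topic strategy earns the larger benefit.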
S34: Calculating, from the first benefit and the second benefit, the user's topic propagation behavior influence under the native topic and under the derived topic.
Figure BDA0003949792240000131
Figure BDA0003949792240000132
where $Mut_P(u_i, u_j)$ and $Mut_D(u_i, u_j)$ denote the topic propagation behavior influence of user $u_j$ on user $u_i$ under the native topic and the derived topic, respectively, after the evolutionary game.
S35: Calculating the topic influence adjacency matrix from the user's topic propagation behavior influence under the native topic and under the derived topic.
Taking the competition between the native and derived topics into account, the final native-derived topic influence adjacency matrix is obtained:
Figure BDA0003949792240000133
Figure BDA0003949792240000134
where
Figure BDA0003949792240000135
Figure BDA0003949792240000136
denote the topic propagation behavior influence between users; if $i = j$, then
Figure BDA0003949792240000137
S4: Extracting network structure features from the topic influence adjacency matrix to obtain the user's network structure feature vector.
Considering the complexity of the social network structure, the invention uses SDNE (Structural Deep Network Embedding, a graph embedding algorithm) for network representation learning, taking the native-derived topic influence adjacency matrix
Figure BDA0003949792240000138
directly as input and outputting the users' network structure feature vectors, expressed as:
$S = K \times F_b$
where $K$ is the number of network nodes, i.e., the number of users under the derived topic, and $F_b$ is the social structure feature vector of the corresponding user.
S5: inputting the historical behavior feature vector of the user and the network structure feature vector of the user into a DT-GCN model to obtain a prediction result of the topic propagation of the user, wherein the prediction result comprises whether the user participates in the topic propagation and the type of the topic which the user participates in.
The prediction task is to predict whether a potential user node will take part in forwarding a related topic: whether the node forwards at all and, if it does, whether it forwards the native topic or the derived topic. The problem is therefore cast as a three-class classification task. Because directly concatenating the two types of features would make the model input too long, the invention designs, as shown in Fig. 3, a DT-GCN (graph convolutional network based on associated topics) model, which adds a CNN (convolutional neural network) layer in front of the GCN (graph convolutional network) model.
The DT-GCN model processes the user historical behavior feature vector and the user network structure feature vector as follows: a native-derived topic propagation hybrid network is constructed from the topic information, and an adjacency matrix is obtained from the connection information between its nodes; the user historical behavior feature vector and the user network structure feature vector are input into the CNN for convolution to obtain a feature matrix; the feature matrix and the adjacency matrix are input into the GCN, to which a dropout layer is added; finally, a softmax function converts the convolution output into the probabilities of each node belonging to each class, yielding the user's topic propagation prediction result.
In the GCN, the preprocessed adjacency matrix
Figure BDA0003949792240000141
is computed from the adjacency matrix as:
Figure BDA0003949792240000142
where $A$ denotes the original adjacency matrix obtained from the connection information between nodes in the native-derived topic propagation hybrid network, and $D$ denotes the degree matrix of the adjacency matrix.
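The preprocessing formula itself is available only as a figure. As an illustration, the sketch below applies the symmetric normalization with self-loops that is commonly used in GCNs, D^(-1/2) (A + I) D^(-1/2), where D is the degree matrix of A + I. Whether the patent uses exactly this variant is an assumption, and the function name is hypothetical.

```python
import numpy as np


def normalize_adjacency(A):
    """Symmetrically normalize an adjacency matrix after adding self-loops.

    Assumed variant: A_hat = D^(-1/2) (A + I) D^(-1/2), with D the degree
    matrix of A + I. Entries of A are assumed to be non-negative.
    """
    A = np.asarray(A, dtype=float)
    A_tilde = A + np.eye(A.shape[0])      # add self-loops
    degrees = A_tilde.sum(axis=1)         # >= 1 because of the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(degrees)
    # Scale row i and column j by d_i^(-1/2) and d_j^(-1/2).
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


# Two users connected by a single edge.
A = np.array([[0, 1],
              [1, 0]])
A_hat = normalize_adjacency(A)
```

The self-loops let each node keep its own features during graph convolution, and the scaling keeps feature magnitudes comparable across nodes of different degree.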
The expression of the DT-GCN model is as follows:
Figure BDA0003949792240000143
where $Z$ denotes the class probabilities output for the user; $ReLU(x) = \max(0, x)$ is the activation function;
Figure BDA0003949792240000144
denotes the added dropout layer, which samples randomly with a given probability $p$, i.e., the $j$-th neuron of the $i$-th network layer is sampled and its value discarded, to prevent the model from overfitting;
Figure BDA0003949792240000145
denotes the activation function,
Figure BDA0003949792240000146
denotes the preprocessed adjacency matrix, $A$ denotes the adjacency matrix, $H_l$ denotes the node feature attributes at each layer, the input layer $H_0$ being the user historical behavior feature vector and the user network structure feature vector, $CNN\_model()$ denotes the CNN network, and $W_i$ is the weight matrix of the $i$-th network layer.
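The overall data flow can be sketched as a forward pass in NumPy. Since the model expression is given only as a figure, everything here is an assumption made for illustration: a two-layer GCN, a single 1-D convolution standing in for the CNN front end, standard inverted dropout applied to activations, and all function names, shapes, and weights.

```python
import numpy as np


def cnn_model(H0, kernel):
    """Stand-in CNN: 1-D 'valid' convolution along each user's feature row."""
    return np.stack([np.convolve(row, kernel, mode="valid") for row in H0])


def relu(x):
    return np.maximum(0.0, x)


def dropout(x, p, rng, training):
    """Inverted dropout: zero entries with probability p, rescale the rest."""
    if not training or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)


def softmax(x):
    shifted = x - x.max(axis=1, keepdims=True)  # for numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


def dt_gcn_forward(A_hat, H0, kernel, W0, W1, p=0.5, rng=None, training=False):
    """Assumed flow: CNN -> graph conv + ReLU -> dropout -> graph conv -> softmax."""
    if rng is None:
        rng = np.random.default_rng()
    X = cnn_model(H0, kernel)            # shortened feature matrix
    H1 = relu(A_hat @ X @ W0)            # first graph convolution
    H1 = dropout(H1, p, rng, training)
    return softmax(A_hat @ H1 @ W1)      # one probability row per user


# Hypothetical sizes: 4 users, 10 concatenated features, 3 classes.
rng = np.random.default_rng(0)
n, f, k, hidden = 4, 10, 3, 8
A_hat = np.full((n, n), 1.0 / n)         # placeholder normalized adjacency
H0 = rng.random((n, f))
kernel = np.ones(k) / k
W0 = rng.normal(size=(f - k + 1, hidden))
W1 = rng.normal(size=(hidden, 3))
Z = dt_gcn_forward(A_hat, H0, kernel, W0, W1)
```

The convolution shortens each user's concatenated feature row before graph convolution, which mirrors the stated motivation for placing a CNN layer in front of the GCN.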
The invention addresses a three-class prediction problem, so the model output is $Z = P(o, \alpha, d \mid u_i)$, defined as follows:
Figure BDA0003949792240000151
where $P(o \mid u_i)$ denotes the probability that the user forwards the native topic, $P(\alpha \mid u_i)$ denotes the probability that the user does not participate in the topic, and $P(d \mid u_i)$ denotes the probability that the user forwards the derived topic. If the corresponding $Y = 1$, potential user $u_i$ is judged to forward the native topic in the next time period; if $Y = -1$, potential user $u_i$ is judged to forward the derived topic in the next time period; otherwise, potential user $u_i$ does not participate in forwarding the hot topic in the next time period.
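The decision rule can be sketched as below. The precise definition of Y is given only as a figure, so picking the most probable class is an assumption, as are the column order of Z (forward native, not participate, forward derived) and the use of 0 as the label for non-participation.

```python
import numpy as np

# Assumed column order of Z: forward native, not participate, forward derived.
# Assumed labels: 1 = native topic, 0 = no participation, -1 = derived topic.
LABELS = np.array([1, 0, -1])


def decide(Z):
    """Map each user's class probabilities to a label Y by taking the argmax."""
    Z = np.asarray(Z, dtype=float)
    return LABELS[np.argmax(Z, axis=1)]


# Hypothetical model output for three potential users.
Z = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6],
              [0.2, 0.5, 0.3]])
Y = decide(Z)
```

With these probabilities the first user is predicted to forward the native topic, the second to forward the derived topic, and the third not to participate.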
The topic propagation prediction method based on topic association provided by the invention first proposes the DTR2vec algorithm, which starts from the association between the features of native-derived topics at different evolution stages and learns representations of users' cognitive accumulation and degree of interest, thereby obtaining a low-dimensional vector representation of the topic feature space. Then, considering the antagonism and symbiosis of native-derived topics during propagation, evolutionary game theory is introduced to form a network structure feature representation of native-derived topic influence among users. Finally, the association and game relationships of native-derived topics during propagation are analyzed, and this information is combined to predict and analyze the topic propagation situation.
The invention predicts the propagation trend of a topic by predicting users' propagation behavior toward it. It can be applied to personalized recommendation and marketing, helping to mine users' forwarding preferences, target advertisements accurately, and plan publicity campaigns. In addition, the forwarding and propagation trends of monitored hot topics can be known in advance, which helps public opinion departments quickly take targeted measures against harmful information, thereby purifying the network environment and guiding the development of positive topics so as to foster sound values in society.
The embodiments above further illustrate the objects, technical solutions, and advantages of the present invention. It should be understood that they are only preferred embodiments and should not be construed as limiting the invention; any modification, equivalent replacement, or improvement made within the spirit and principles of the invention falls within its scope of protection.

Claims (10)

1. A topic propagation prediction method based on topic association is characterized by comprising the following steps: obtaining topic information and preprocessing the topic information; inputting the preprocessed topic information into a topic propagation prediction model based on topic association, and predicting the propagation trend of a user to topics; controlling the topic transmission trend according to the topic transmission trend of the user;
the process by which the topic propagation prediction model based on topic association processes the preprocessed topic information comprises the following steps:
S1: extracting internal attributes and external attributes of the topic information; the internal attributes comprise user activity, user topic perception rate and user historical forwarding rate, and the external attributes comprise user friend driving force, topic popularity and a user topic content set;
S2: selecting user interest feature keywords and user cognitive feature keywords from the user topic content set by adopting a DTR2vec algorithm, and performing vector representation on the selected keywords to obtain a user historical behavior feature vector;
S3: quantifying the influence of topics on users by adopting evolutionary game theory, according to all the internal attributes and the user friend driving force and topic popularity among the external attributes, to obtain a topic influence adjacency matrix;
S4: extracting network structure features according to the topic influence adjacency matrix to obtain a network structure feature vector of the user;
S5: inputting the historical behavior feature vector of the user and the network structure feature vector of the user into a DT-GCN model to obtain a prediction result of the topic propagation of the user, wherein the prediction result comprises whether the user participates in the topic propagation and the type of the topic which the user participates in.
2. The topic propagation prediction method based on topic association as claimed in claim 1 wherein the process of selecting user interest feature keywords and user cognition feature keywords from a user topic content set using a DTR2vec algorithm comprises:
S21: the user topic content set comprises a native topic content set, a derived topic content set and a user social content set; the three content sets are respectively input into an LDA topic identification model to obtain a native topic keyword set $key_{pre}$, a derived topic keyword set $key_{deri}$ and a user content keyword set $key_{user}$;
S22: calculating the degree of association between the native topic keyword set $key_{pre}$ and the derived topic keyword set $key_{deri}$; calculating the similarity between the native topic content set and the derived topic content set, and putting each pair of keywords whose degree of association is greater than the similarity into the topic association feature word set $key_{com}$;
S23: calculating the interest weight and the cognitive weight of the user content keyword set $key_{user}$ according to the native topic keyword set $key_{pre}$, the derived topic keyword set $key_{deri}$ and the topic association feature word set $key_{com}$;
S24: selecting the top-k keywords from the user content keyword set $key_{user}$ according to the interest weight and the cognitive weight, respectively, as the user interest feature keywords and the user cognitive feature keywords.
3. The topic propagation prediction method based on topic association as claimed in claim 2 wherein the BM25 algorithm is used to calculate the similarity between the original topic content set and the derived topic content set, and the formula is:
Figure FDA0003949792230000021
wherein $Score(Q, d)$ represents the similarity score between the native topic content set and the derived topic content set, $Q$ represents the native topic content set, $d$ represents the derived topic content set, $W_i$ represents the word weight, $q_i$ represents the $i$-th word in the set $Q$, $n$ represents the total number of words in the native topic content set $Q$, and $R(q_i, d)$ represents the degree of correlation between the $i$-th word $q_i$ in the set $Q$ and the set $d$.
4. The topic propagation prediction method based on topic association as claimed in claim 2 wherein the formulas for calculating the interest weight and the cognitive weight of the user content keyword set $key_{user}$ are:
Figure FDA0003949792230000022
Figure FDA0003949792230000023
wherein $w_{i,inter}$ represents the interest weight of the $i$-th word in the user content keyword set, $w_{i,cong}$ represents the cognitive weight of the $i$-th word in the user content keyword set, $sim(key_{i,u}, key_{deri})$ represents the similarity between the $i$-th candidate keyword and the derived topic keyword set $key_{deri}$, $sim(key_{i,u}, key_{com})$ represents the similarity between the $i$-th candidate keyword and the topic association feature word set $key_{com}$, $N$ represents the total number of words in the user content keyword set, $sim(key_{i,u}, key_{pre})$ represents the similarity between the $i$-th candidate keyword and the native topic keyword set $key_{pre}$, $t$ represents the current time, $t'$ represents the initial topic generation time, and $w$ represents a regularization factor.
5. The topic propagation prediction method based on topic association as claimed in claim 1 wherein the process of quantifying the influence of topics on users using evolutionary game theory comprises:
S31: calculating internal influence according to the internal attributes, and calculating external influence according to the friend driving force and topic popularity of the user;
S32: calculating the influence of the native topic and the influence of the derived topic by adopting a multiple linear regression algorithm according to the internal influence and the external influence;
S33: defining two game strategies, calculating a first benefit according to the first game strategy and the influence of the native topic, and calculating a second benefit according to the second game strategy and the influence of the derived topic;
S34: calculating the topic propagation behavior influence of the user in the native topic and the topic propagation behavior influence of the user in the derived topic according to the first benefit and the second benefit;
S35: calculating a topic influence adjacency matrix according to the topic propagation behavior influence of the user in the native topic and the topic propagation behavior influence of the user in the derived topic.
6. The topic propagation prediction method based on topic association as claimed in claim 5 wherein the formula for calculating the internal influence and the external influence is:
$f_{in}(u_i) = Act(u_i) \times Ret(u_i) \times Pre(u_i)$
Figure FDA0003949792230000031
wherein $f_{in}(u_i)$ represents the internal influence of user $u_i$, $f_{out}(u_i, u_j)$ represents the external influence of user $u_j$ on user $u_i$, $Act(u_i)$ represents the activity of user $u_i$, $Ret(u_i)$ represents the historical forwarding rate of user $u_i$, $Pre(u_i)$ represents the topic perception rate of user $u_i$,
Figure FDA0003949792230000032
represents the friend driving force of user $u_j$ on user $u_i$, and $Hot(t)$ represents the topic popularity at the current time $t$.
7. The topic propagation prediction method based on topic association as claimed in claim 5 wherein the formula for calculating the influence of topic propagation behavior of users in the original topic and the influence of topic propagation behavior of users in the derived topic is:
Figure FDA0003949792230000041
Figure FDA0003949792230000042
wherein $Mut_P(u_i, u_j)$ represents the influence of user $u_j$ on user $u_i$ in the native topic, $Mut_D(u_i, u_j)$ represents the influence of user $u_j$ on user $u_i$ in the derived topic, $Pro_P(u_i, u_j)$ represents the first benefit, and $Pro_D(u_i, u_j)$ represents the second benefit.
8. The topic propagation prediction method based on topic association as claimed in claim 5 wherein the topic influence adjacency matrix is expressed as:
Figure FDA0003949792230000043
wherein
Figure FDA0003949792230000044
represents the topic influence adjacency matrix,
Figure FDA0003949792230000045
Figure FDA0003949792230000046
Figure FDA0003949792230000047
represent the topic propagation behavior influence among users, $Mut_P(u_i, u_j)$ represents the influence of user $u_j$ on user $u_i$ in the native topic, and $Mut_D(u_i, u_j)$ represents the influence of user $u_j$ on user $u_i$ in the derived topic.
9. The topic propagation prediction method based on topic association as claimed in claim 1 wherein the processing procedure of the DT-GCN model on the user historical behavior feature vector and the user network structure feature vector comprises: obtaining an adjacency matrix according to connection information between nodes in a native topic and derived topic propagation hybrid network, and inputting the user historical behavior feature vector and the user network structure feature vector into a CNN network for convolution to obtain a feature matrix; and inputting the feature matrix and the adjacency matrix into a GCN to which a dropout layer is added, and finally processing by adopting a softmax function to obtain a topic propagation prediction result of the user.
10. The topic propagation prediction method based on topic association as claimed in claim 1 wherein the expression of the DT-GCN model is:
Figure FDA0003949792230000051
wherein $Z$ represents the class probabilities output for the user,
Figure FDA0003949792230000052
represents the preprocessed adjacency matrix, $A$ represents the adjacency matrix, $H_0$ represents the input layer,
Figure FDA0003949792230000053
represents the random sampling performed by the added dropout layer according to a given probability, $CNN\_model()$ represents the CNN network, and $W_i$ is the weight matrix of the $i$-th network layer.
CN202211444811.1A 2022-11-18 2022-11-18 Topic propagation prediction method based on topic association Pending CN115712772A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211444811.1A CN115712772A (en) 2022-11-18 2022-11-18 Topic propagation prediction method based on topic association

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211444811.1A CN115712772A (en) 2022-11-18 2022-11-18 Topic propagation prediction method based on topic association

Publications (1)

Publication Number Publication Date
CN115712772A true CN115712772A (en) 2023-02-24

Family

ID=85233873

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211444811.1A Pending CN115712772A (en) 2022-11-18 2022-11-18 Topic propagation prediction method based on topic association

Country Status (1)

Country Link
CN (1) CN115712772A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116628179A (en) * 2023-05-30 2023-08-22 道有道科技集团股份公司 User operation data visualization and man-machine interaction recommendation method
CN117635190A (en) * 2023-11-27 2024-03-01 河北数港科技有限公司 Log data analysis method and system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116628179A (en) * 2023-05-30 2023-08-22 道有道科技集团股份公司 User operation data visualization and man-machine interaction recommendation method
CN116628179B (en) * 2023-05-30 2023-12-22 道有道科技集团股份公司 User operation data visualization and man-machine interaction recommendation method
CN117635190A (en) * 2023-11-27 2024-03-01 河北数港科技有限公司 Log data analysis method and system
CN117635190B (en) * 2023-11-27 2024-05-14 河北数港科技有限公司 Log data analysis method and system

Similar Documents

Publication Publication Date Title
Yuvaraj et al. Automatic detection of cyberbullying using multi-feature based artificial intelligence with deep decision tree classification
Xiong et al. An emotional contagion model for heterogeneous social media with multiple behaviors
CN110795641B (en) Network rumor propagation control method based on representation learning
CN106651030B (en) Improved RBF neural network hot topic user participation behavior prediction method
CN115712772A (en) Topic propagation prediction method based on topic association
CN110807556B (en) Method and device for predicting propagation trend of microblog rumors or/and dagger topics
CN110909529B (en) User emotion analysis and prejudgment system of company image promotion system
CN103064917A (en) Specific-tendency high-influence user group discovering method orienting microblog
Ma et al. Mixed information flow for cross-domain sequential recommendations
Wang et al. A multidimensional network link prediction algorithm and its application for predicting social relationships
Li et al. Network embedding enhanced intelligent recommendation for online social networks
Xiao et al. User behavior prediction of social hotspots based on multimessage interaction and neural network
Huang et al. Information fusion oriented heterogeneous social network for friend recommendation via community detection
Yu et al. Collaborative group embedding and decision aggregation based on attentive influence of individual members: A group recommendation perspective
Wu et al. Unlocking author power: On the exploitation of auxiliary author-retweeter relations for predicting key retweeters
Zhang et al. LBCF: A link-based collaborative filtering for overfitting problem in recommender system
Liu et al. A reliable cross-site user generated content modeling method based on topic model
Thiriot et al. USING ASSOCIATIVE NETWORKS TO REPRESENT ADOPTERS' BELIEFS IN A MULTIAGENT MODEL OF INNOVATION DIFFUSION
CN113919440A (en) Social network rumor detection system integrating dual attention mechanism and graph convolution
Yang et al. A model for early rumor detection base on topic-derived domain compensation and multi-user association
CN115495671A (en) Cross-domain rumor propagation control method based on graph structure migration
CN112269945B (en) Information propagation prediction method based on rumor splitting rumor promotion and three-way cognitive game
Yan et al. Tackling the achilles heel of social networks: Influence propagation based language model smoothing
Yang et al. Topic-Aware Popularity and Retweeter Prediction Model for Cascade Study
Lim et al. Estimating domain-specific user expertise for answer retrieval in community question-answering platforms

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination