CN115712772A

CN115712772A - Topic propagation prediction method based on topic association

Info

Publication number: CN115712772A
Application number: CN202211444811.1A
Authority: CN
Inventors: 余翔; 周心明; 庞育才; 段思睿; 王蓉; 肖云鹏; 李暾
Original assignee: Chongqing University of Post and Telecommunications
Current assignee: Chongqing University of Post and Telecommunications
Priority date: 2022-11-18
Filing date: 2022-11-18
Publication date: 2023-02-24

Abstract

The invention belongs to the field of online public opinion analysis and relates in particular to a topic propagation prediction method based on topic association. The method comprises: obtaining topic information and extracting its internal attributes and external attributes; selecting user interest feature keywords and user cognitive feature keywords from a user topic content set with a DTR2vec algorithm, and representing the selected keywords as vectors to obtain a user historical behavior feature vector; quantifying the influence of topics on users with evolutionary game theory, based on all internal attributes and part of the external attributes, to obtain a topic influence adjacency matrix; and extracting network structure features from the topic influence adjacency matrix to obtain a network structure feature vector of the user. The method can help public opinion departments quickly take targeted measures and has good application prospects.

Description

Topic propagation prediction method based on topic association

Technical Field

The invention belongs to the field of network public opinion analysis, and particularly relates to a topic propagation prediction method based on topic association.

Background

Generally, a topic refers to a hot issue that attracts the most public attention within a certain time and a certain scope. Topics of all kinds spread widely in social networks and carry a large volume of the speech and information behavior of network users. Topic data reflects user interests, behaviors and social relations, and research on this data supports effective information recommendation. Meanwhile, in the real world the information contained in a topic mixes true and false content, and when a topic spreads widely in a social group it affects people's cognition and social stability.

With the development of the internet, the way topics propagate has changed dramatically. On one hand, flourishing social platforms such as microblogs, WeChat and forums provide channels for topic propagation that span space, time and region and cover the whole population. On the other hand, the spread of internet access to a broader population has made online groups more diverse, information flow flatter, and the development and derivation of topics more complicated. Compared with traditional information transmission, topics now spread faster, reach further and take more complex forms. Research on topic propagation therefore helps in understanding information propagation characteristics and is important for preventing emergencies and for public opinion management and control.

In recent years, scholars have studied topic propagation in social networks from multiple dimensions and achieved notable results. With the continuous growth of social platform data and the maturing of deep learning, topic propagation prediction based on neural networks and deep learning models has become favored. However, many challenges remain: 1. The relevance and complexity of the derived-topic feature space. A derived topic evolves from a native topic; compared with a single topic, the features of the derived and native topics are interwoven and exchange information dynamically, so extracting topic features effectively is a challenge. 2. The complex associations among users during native-derived topic propagation. The native topic and the derived topic game against each other during propagation; how to quantify the user influence of each and mine the hidden relations among users urgently needs to be solved. 3. The staged and time-sensitive nature of derived-topic evolution. The evolution of a derived topic changes over time while being influenced by the native topic, and topic states evolve alternately; dynamically analyzing the propagation of derived topics is a difficulty facing current research.

Therefore, the invention provides an information propagation prediction method based on topic association, and the derived topics are introduced, so that the propagation situation of the topics can be effectively predicted, and the association and game relation of the original topics and the derived topics in the propagation process can be more truly reflected.

Disclosure of Invention

Aiming at the defects in the prior art, the invention provides a topic propagation prediction method based on topic association, which comprises the following steps: obtaining topic information and preprocessing the topic information; inputting the preprocessed topic information into a topic propagation prediction model based on topic association, and predicting the propagation trend of a user to topics; controlling the topic transmission trend according to the topic transmission trend of the user;

the process of processing the preprocessed topic information based on the topic propagation prediction model of topic association comprises the following steps:

S1: extracting internal attributes and external attributes of the topic information; the internal attributes comprise user activity, user topic perception rate and user historical forwarding rate, and the external attributes comprise user friend driving force, topic popularity and a user topic content set;

S2: selecting user interest feature keywords and user cognitive feature keywords from the user topic content set, and performing vector representation on the selected keywords to obtain a user historical behavior feature vector;

S3: quantifying the influence of topics on users with evolutionary game theory, according to all internal attributes and the user friend driving force and topic popularity among the external attributes, to obtain a topic influence adjacency matrix;

S4: extracting network structure features according to the topic influence adjacency matrix to obtain a network structure feature vector of the user;

S5: inputting the user historical behavior feature vector and the user network structure feature vector into a DT-GCN model to obtain a prediction result of the user's topic propagation, wherein the prediction result comprises whether the user participates in topic propagation and the type of topic in which the user participates.

Preferably, the process of selecting the user interest feature keywords and the user cognitive feature keywords from the user topic content set by using the DTR2vec algorithm includes:

S21: the user topic content set comprises a native topic content set, a derived topic content set and a user social content set; the three content sets are input separately into an LDA topic identification model to obtain a native topic keyword set key_pre, a derived topic keyword set key_deri and a user content keyword set key_user;

S22: computing the degree of association between the native topic keyword set key_pre and the derived topic keyword set key_deri; computing the similarity between the native topic content set and the derived topic content set, and putting each pair of keywords whose degree of association is greater than the similarity into the topic association feature word set key_com;

S23: computing the interest weight and cognitive weight of the user content keyword set key_user according to the native topic keyword set key_pre, the derived topic keyword set key_deri and the topic association feature word set key_com;

S24: selecting the top-k keywords from the user content keyword set key_user according to the interest weight and the cognitive weight respectively, as the user interest feature keywords and the user cognitive feature keywords.

Further, the BM25 algorithm is used to calculate the similarity between the native topic content set and the derived topic content set, with the formula:

Score(Q, d) = Σ_{i=1}^{n} W_i · R(q_i, d)

wherein Score(Q, d) represents the similarity score of the native topic content set and the derived topic content set, Q represents the native topic content set, d represents the derived topic content set, W_i represents the word weight, q_i represents the i-th word in the set Q, n represents the total number of words in the native topic content set Q, and R(q_i, d) represents the degree of correlation between the i-th word q_i in the set Q and the set d.

Further, the formula for calculating the interest weight and cognitive weight of the user content keyword set key_user is:

wherein w_{i,inter} represents the interest weight of the i-th word in the user content keyword set, w_{i,cong} represents the cognitive weight of the i-th word in the user content keyword set, sim(key_{i,u}, key_deri) represents the similarity between the i-th candidate keyword and the derived topic keyword set key_deri, sim(key_{i,u}, key_com) represents the similarity between the i-th candidate keyword and the topic association feature word set key_com, N represents the total number of words in the user content keyword set, sim(key_{i,u}, key_pre) represents the similarity between the i-th candidate keyword and the native topic keyword set key_pre, t represents the current moment, t' represents the moment at which the initial topic was generated, and w represents a regularization factor.

Preferably, the process of quantifying the influence of the topic on the user by the evolutionary game theory comprises the following steps:

S31: calculating the internal influence according to the internal attributes, and calculating the external influence according to the user friend driving force and the topic popularity;

S32: calculating the influence of the native topic and the influence of the derived topic with a multiple linear regression algorithm according to the internal influence and the external influence;

S33: defining two game strategies, calculating a first benefit according to the first game strategy and the influence of the native topic, and calculating a second benefit according to the second game strategy and the influence of the derived topic;

S34: calculating the topic propagation behavior influence of the user in the native topic and the topic propagation behavior influence of the user in the derived topic according to the first benefit and the second benefit;

S35: calculating the topic influence adjacency matrix according to the topic propagation behavior influence of the users in the native topic and the topic propagation behavior influence of the users in the derived topic.

Further, the formula for calculating the internal influence and the external influence is as follows:

f_in(u_i) = Act(u_i) × Ret(u_i) × Pre(u_i)

wherein f_in(u_i) represents the internal influence of user u_i, f_out(u_i, u_j) represents the external influence of user u_j on user u_i, Act(u_i) represents the activity of user u_i, Ret(u_i) represents the historical forwarding rate of user u_i, Pre(u_i) represents the topic perception rate of user u_i,

represents the friend driving force of user u_j on user u_i, and Hot(t) represents the topic popularity at the current time t.

Further, the formula for calculating the influence of the topic propagation behavior of the user in the original topic and the influence of the topic propagation behavior of the user in the derived topic is as follows:

wherein Mut_P(u_i, u_j) represents the influence of user u_j on user u_i in the native topic, Mut_D(u_i, u_j) represents the influence of user u_j on user u_i in the derived topic, Pro_P(u_i, u_j) denotes the first benefit, and Pro_D(u_i, u_j) denotes the second benefit.

Further, the topic influence adjacency matrix is represented as:

wherein

represents the topic influence adjacency matrix,

represents the topic propagation behavior influence among users, Mut_P(u_i, u_j) represents the influence of user u_j on user u_i in the native topic, and Mut_D(u_i, u_j) represents the influence of user u_j on user u_i in the derived topic.

Preferably, the DT-GCN model processes the user historical behavior feature vector and the user network structure feature vector as follows: an adjacency matrix is obtained from the connection information between nodes in the native-derived topic propagation hybrid network; the user historical behavior feature vector and the user network structure feature vector are input into a CNN for convolution to obtain a feature matrix; the feature matrix and the adjacency matrix are input into a GCN to which a dropout layer is added; and the output is finally processed with a softmax function to obtain the user's topic propagation prediction result.

Preferably, the expression of the DT-GCN model is as follows:

wherein Z represents the category probabilities output for the users,

represents the preprocessed adjacency matrix, A represents the adjacency matrix, H^0 represents the input layer,

represents the random sampling of parameters, with a certain probability, performed by the added dropout layer, CNN_model() represents the CNN network, and W^i is the weight matrix of the i-th network layer.

The beneficial effects of the invention are as follows: the method uses the hidden relation between native and derived topics to quantify user interest and the cognitive process, focuses on how the mutual promotion and inhibition of associated topics during propagation affects user behavior, and builds a topic propagation prediction model based on topic association by combining topic features, user features and a powerful neural network. By introducing associated topics, the method can effectively predict the propagation situation of topics and more truly reflect the association and game relation between native and derived topics during propagation, so that predictions better match the actual situation and accuracy is high.

Drawings

FIG. 1 is a schematic structural diagram of a topic propagation prediction method based on topic association in the present invention;

FIG. 2 is a schematic diagram of a process of extracting network structure feature vectors of users according to the present invention;

FIG. 3 is a schematic diagram of the DT-GCN model structure of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The invention provides a topic propagation prediction method based on topic association, as shown in fig. 1, the method comprises the following steps: obtaining topic information and preprocessing the topic information; inputting the preprocessed topic information into a topic propagation prediction model based on topic association, and predicting the propagation trend of the user to the topic; controlling the topic propagation trend according to the topic propagation trend of the user;

The topic information can be obtained from a public data website or through a mature public social network API. It comprises the historical behavior information, topic participation information and basic attribute information of all users participating in the native-derived topics during the topics' life cycle. The user historical behavior information includes the user's historical forwards and comments; the topic participation information includes the times at which the native-derived topics were forwarded and commented on; and the user basic attribute information includes the friend relations of the participating users.

Preprocessing consists of simple data cleaning of the topic information: most unstructured data is structured, and abnormal values and null values are removed, which reduces inconvenience in subsequent calculation.
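As an illustration of the cleaning step, the following minimal sketch drops records with missing values and keeps only a fixed set of fields. The field names (`user_id`, `text`, `timestamp`) and the record layout are assumptions made for the example; the patent does not specify a schema.

```python
def clean_records(records, required=("user_id", "text", "timestamp")):
    """Keep only records whose required fields are all present and non-empty.

    The field names are hypothetical; the source does not define a schema.
    """
    cleaned = []
    for rec in records:
        values = [rec.get(field) for field in required]
        if any(v is None or v == "" for v in values):
            continue  # drop records with null or empty values
        # Keep a uniform structure containing only the required fields.
        cleaned.append(dict(zip(required, values)))
    return cleaned


raw = [
    {"user_id": "u1", "text": "flood rescue underway", "timestamp": 1668700000, "extra": "x"},
    {"user_id": "u2", "text": None, "timestamp": 1668700100},
    {"user_id": "u3", "text": "", "timestamp": 1668700200},
]
cleaned = clean_records(raw)
```

Only the first record survives, and its extra field is discarded so that every retained record has the same structure.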

The process of processing the preprocessed topic information based on the topic propagation prediction model of topic association comprises the following contents:

S1: extracting internal attributes and external attributes of the topic information; the internal attributes comprise user activity, user topic perception rate and user historical forwarding rate, and the external attributes comprise user friend driving force, topic popularity and a user topic content set.

User activity Act(u_i):

User activity is how active a user's forwarding behavior is relative to other users; the more active the user is in forwarding, the more times the user forwards within a period of time. The invention defines the activity of user u_i as:

wherein num[orig(u_i)] represents the number of original microblogs posted by user u_i during the time period T before the outbreak of the derived topic, and num[retw(u_i)] represents the number of microblogs forwarded by user u_i during the same period; since what is predicted is the probability that the user forwards the topic, σ ∈ [0, 1] represents a weakening factor.

User topic perception rate Pre(u_i):

The user topic perception rate reflects the probability that a user comes into contact with a new topic. It is reflected by the number of accounts the user follows and, to some extent, indicates the probability that the user participates when a new topic appears. It is defined as:

wherein Fol(u_i) represents the number of accounts followed by user u_i, and Fol_ave(net) represents the average number of accounts followed across all users in the social network.

User historical forwarding rate Ret(u_i):

The user historical forwarding rate reflects, to some extent, the user's tendency to forward. A user's main source of topics is information from friends. The invention therefore defines the historical forwarding rate of the user as:

wherein retwnNum(u_i) represents the number of microblogs the user has historically forwarded, and getRetNum(u_i) represents the number of all microblogs the user has forwarded from friends, a friend being a user whom the user follows.

User friend driving force:

In a topic network, users usually participate in a topic under the influence of the propagation behavior of the users they follow; the higher the interaction frequency between users, the greater the driving force between them and the higher the probability that they forward each other's topics. Different friends exert different driving forces on a user, and the friend driving force is defined as:

wherein

represents the friend driving force of user u_j on user u_i,

represents the average number of original microblogs of friend users forwarded by user u_i, and count(Fri) represents the number of the user's friends; if

or user u_j is not a friend of user u_i, then

Topic popularity Hot(t):

Topic popularity is reflected in the forwards, comments, likes and similar interactions a topic receives in the social network; it can rise rapidly in a short time but falls rapidly after it peaks. Because this process resembles the half-life decay of an element, a half-life function is introduced

and the topic popularity is defined as:

wherein Hot(t) represents the popularity of the topic at the current time t, RetNum(t) and RetNum(t-1) respectively represent the forwarding volume of the topic up to the current time and up to the previous time, t' represents the time at which the initial topic was generated, and w represents a regularization factor.

User topic content set TInfo(t):

During topic propagation, because every person is unique, users hold different views on the same topic and comment differently. Topic comments therefore reflect topic attributes and characteristics, and topic characteristics change as the topic spreads. The topic content set is represented as:

TInfo(t) = {(u_i, info) | u_i ∈ U}

wherein info(t) represents the comments made by user u_i in the topic propagation space within the time period t. The native topic content set is denoted TInfo_p(t), and the derived topic content set is denoted TInfo_d(t).

S2: selecting user interest characteristic keywords and user cognition characteristic keywords from a user topic content set by adopting a DTR2vec algorithm (a derived topic representation learning algorithm based on topic association), and carrying out vector representation on the selected keywords to obtain a user historical behavior characteristic vector.

The DTR2vec algorithm designed by the invention first uses an LDA (latent Dirichlet allocation) topic identification model to construct the association features of the native-derived topics and the user features, then extracts the user's cognitive accumulation and degree of interest according to the user's state transition across the native-derived topics, and finally uses representation learning to represent each user as a low-dimensional vector.

S21: the user topic content set comprises a native topic content set, a derived topic content set and a user social content set; the three content sets are input separately into an LDA topic identification model to obtain a native topic keyword set key_pre, a derived topic keyword set key_deri and a user content keyword set key_user.

The LDA model is used to extract the association features of the native topic and the derived topic. Specifically, each microblog posted or forwarded by a user is treated as a paragraph, and the paragraphs are assembled into article-form content, which is divided into a native topic content set, a derived topic content set and a user social content set. Each of the three content sets is treated as one article and processed separately with the LDA model; the optimal number of topics is found through repeated clustering experiments with different numbers of topics.

LDA processing yields the native topic keyword set key_pre, the derived topic keyword set key_deri and the user content keyword set key_user.

S22: computing the degree of association between the native topic keyword set key_pre and the derived topic keyword set key_deri; computing the similarity between the native topic content set and the derived topic content set, and putting each pair of keywords whose degree of association is greater than the similarity into the topic association feature word set key_com.

Derived topics develop and evolve from native topics, so the two are necessarily associated. The similarity score of the native topic content set and the derived topic content set is obtained with the BM25 (Best Match 25) algorithm and used as the association threshold for the native-derived topics. The calculation formula is:

Score(Q, d) = Σ_{i=1}^{n} W_i · R(q_i, d)

wherein Score(Q, d) represents the similarity score of the native topic content set and the derived topic content set, Q represents the native topic content set, d represents the derived topic content set, W_i represents the word weight, q_i represents the i-th word in the set Q, n represents the total number of words in the native topic content set Q, and R(q_i, d) represents the degree of correlation between the i-th word q_i in the set Q and the set d.
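The weighted-sum form of the score can be sketched as follows. The text does not spell out W_i or R(q_i, d), so this sketch assumes the standard Okapi BM25 choices: an inverse-document-frequency weight for W_i and a saturating term-frequency function for R, with the customary parameters `k1` and `b`. Treating each derived-topic post as one document is also an assumption made for the example.

```python
import math
from collections import Counter


def bm25_score(query, doc, corpus, k1=1.5, b=0.75):
    """Score(Q, d) = sum_i W_i * R(q_i, d), with standard Okapi BM25 terms (assumed)."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc)
    score = 0.0
    for q in query:
        df = sum(1 for d in corpus if q in d)  # number of documents containing q
        w = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)  # W_i: IDF weight
        f = tf[q]  # frequency of q in this document
        r = f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avg_len))  # R(q_i, d)
        score += w * r
    return score


# Hypothetical tokenized content: native-topic words and two derived-topic posts.
native_words = ["flood", "rescue", "city"]
derived_posts = [
    ["flood", "donation", "rescue", "volunteer"],
    ["concert", "ticket", "price"],
]
scores = [bm25_score(native_words, post, derived_posts) for post in derived_posts]
```

A post sharing no words with the native content scores exactly zero, while overlapping posts score higher as more query words match.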

The degree of association between keywords in the native topic keyword set key_pre and the derived topic keyword set key_deri is calculated with the cosine formula:

sim(key1, key2) = Σ_{i=1}^{m} X_i · Y_i / ( sqrt(Σ_{i=1}^{m} X_i²) · sqrt(Σ_{i=1}^{m} Y_i²) )

wherein X_i and Y_i represent the i-th components of the vectors of keywords key1 and key2 respectively, and m represents the keyword vector dimension.

Each pair of keywords whose degree of association is greater than the similarity is put into the topic association feature word set key_com.
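Step S22 can be sketched as a pairwise cosine comparison against a threshold. The keyword vectors and the threshold value below are made up for illustration, and the sketch assumes the threshold has already been scaled to be comparable with cosine values; the text does not say how the BM25 score is normalized for this comparison.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def associated_keywords(pre_vecs, deri_vecs, threshold):
    """Collect both keywords of every native/derived pair whose cosine exceeds threshold."""
    key_com = set()
    for k1, v1 in pre_vecs.items():
        for k2, v2 in deri_vecs.items():
            if cosine(v1, v2) > threshold:
                key_com.update((k1, k2))
    return key_com


# Hypothetical 2-dimensional keyword vectors.
key_pre = {"flood": [1.0, 0.0], "city": [0.0, 1.0]}
key_deri = {"rescue": [0.9, 0.1], "ticket": [-1.0, 0.0]}
key_com = associated_keywords(key_pre, key_deri, threshold=0.8)
```

Here only the pair ("flood", "rescue") is close enough to enter key_com.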

S23: computing the interest weight and cognitive weight of the user content keyword set key_user according to the native topic keyword set key_pre, the derived topic keyword set key_deri and the topic association feature word set key_com.

Whether a user forwards a topic is closely related to the user's interest in and cognition of the topic. Considering that, after the native topic appears, the user accumulates some cognition of the topic while interest in the features common to the native and derived topics weakens, the method selects keywords using cosine distance as the weight; the formulas for the interest weight and the cognitive weight are:

wherein w_{i,inter} represents the interest weight of the i-th word in the user content keyword set, w_{i,cong} represents the cognitive weight of the i-th word in the user content keyword set, sim(key_{i,u}, key_deri) represents the similarity between the i-th candidate keyword and the derived topic keyword set key_deri, N represents the total number of words in the user content keyword set, sim(key_{i,u}, key_com) represents the similarity between the i-th candidate keyword and the topic association feature word set key_com, sim(key_{i,u}, key_pre) represents the similarity between the i-th candidate keyword and the native topic keyword set key_pre, t represents the current moment, t' represents the moment at which the initial topic was generated, and w represents a regularization factor.

S24: selecting the top-k keywords from the user content keyword set key_user according to the interest weight and the cognitive weight respectively, as the user interest feature keywords and the user cognitive feature keywords.
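The top-k selection in S24 can be sketched as follows. The weight values are placeholders: the sketch takes already-computed interest and cognitive weights as input and does not attempt to implement the weight formulas themselves.

```python
import heapq


def top_k_keywords(weights, k):
    """Return the k keywords with the largest weights, highest first."""
    return heapq.nlargest(k, weights, key=weights.get)


# Hypothetical per-keyword weights for one user's content keyword set.
interest_weights = {"flood": 0.9, "rescue": 0.7, "ticket": 0.1, "city": 0.4}
cognitive_weights = {"flood": 0.3, "rescue": 0.8, "ticket": 0.2, "city": 0.6}

interest_keywords = top_k_keywords(interest_weights, 2)
cognitive_keywords = top_k_keywords(cognitive_weights, 2)
```

The same keyword set is ranked twice, once per weight, so a keyword may appear in both the interest and the cognitive selections.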

The selected keywords (the user interest feature keywords and the user cognitive feature keywords) are represented as vectors with the Doc2vec algorithm, and the user historical behavior feature vector D is output, expressed as:

D = K × F^a(u)

wherein K represents the number of users under the derived topic, and F^a(u) represents the user interest feature vector and user cognitive feature vector of the corresponding user.

S3: quantifying the influence of topics on users with evolutionary game theory, according to all internal attributes and the user friend driving force and topic popularity among the external attributes, to obtain a topic influence adjacency matrix.

The native topic and the derived topic game against each other during propagation, so the influence of the interaction between topics should be considered when predicting user propagation behavior. The invention introduces evolutionary game theory to quantify the influence of the native-derived topics on users, as follows:

S31: calculating the internal influence according to the internal attributes, and calculating the external influence according to the user friend driving force and the topic popularity.

As shown in fig. 2, the internal influence is composed of internal attributes, and the external influence is composed of external attributes, and the calculation formula is as follows:

f_in(u_i) = Act(u_i) × Ret(u_i) × Pre(u_i)

wherein f_in(u_i) represents the internal influence of user u_i, f_out(u_i, u_j) represents the external influence of user u_j on user u_i, Act(u_i) represents the activity of user u_i, Ret(u_i) represents the historical forwarding rate of user u_i, and Pre(u_i) represents the topic perception rate of user u_i.
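The internal influence is a plain product of the three internal attributes, which the following sketch makes explicit. The `InternalAttributes` container and the sample values are assumptions for illustration; the external influence is not sketched because its formula is not reproduced in the text.

```python
from dataclasses import dataclass


@dataclass
class InternalAttributes:
    """Hypothetical container for the three internal attributes of one user."""
    activity: float          # Act(u_i)
    forwarding_rate: float   # Ret(u_i)
    perception_rate: float   # Pre(u_i)


def internal_influence(attrs):
    """f_in(u_i) = Act(u_i) * Ret(u_i) * Pre(u_i)."""
    return attrs.activity * attrs.forwarding_rate * attrs.perception_rate


user = InternalAttributes(activity=0.5, forwarding_rate=0.4, perception_rate=2.0)
f_in = internal_influence(user)
```

Because the attributes are multiplied, a user with any attribute equal to zero has zero internal influence regardless of the other two.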

S32: and calculating the influence of the original topic and the influence of the derived topic by adopting a multiple linear regression algorithm according to the internal influence and the external influence.

Combining the internal and external influence factors, topic influence functions of the native topic and the derived topic are constructed with a multiple linear regression algorithm:

wherein Inf_Pre(u_i, u_j) represents the native-topic influence of user u_j on user u_i, Inf_dri(u_i, u_j) represents the derived-topic influence of user u_j on user u_i,

represents the native-topic external influence of user u_j on user u_i,

represents the derived-topic external influence of user u_j on user u_i, and ρ_0, ρ_1 and ρ_2 are the first, second and third partial regression coefficients, each obtained by training the multiple linear regression algorithm.
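How three regression coefficients might be fitted can be sketched with ordinary least squares. The sketch assumes the linear form Inf ≈ ρ_0 + ρ_1·f_in + ρ_2·f_out, which is a guess consistent with the three coefficients named above rather than the formula from the source, and it assumes a set of observed influence values (`observed`) is available as the regression target.

```python
def fit_influence_regression(f_in, f_out, target):
    """Least-squares fit of target ~ rho0 + rho1 * f_in + rho2 * f_out (assumed form)."""
    rows = [(1.0, a, b) for a, b in zip(f_in, f_out)]
    # Normal equations (X^T X) rho = X^T y, stored as an augmented 3x4 matrix.
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * y for r, y in zip(rows, target))]
        for i in range(3)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


# Synthetic data generated from known coefficients, used only to check the fit.
f_in = [0.1, 0.4, 0.2, 0.8]
f_out = [0.5, 0.1, 0.9, 0.3]
observed = [0.2 + 0.5 * a + 0.3 * b for a, b in zip(f_in, f_out)]
rho = fit_influence_regression(f_in, f_out, observed)
```

The solver raises `ZeroDivisionError` when the inputs are collinear, so a practical version would add a check or use a library routine.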

S33: two game strategies are defined, a first benefit is calculated according to the first game strategy and the influence of the primary topic, and a second benefit is calculated according to the second game strategy and the influence of the derived topic.

According to the principles of game theory, the invention defines two game strategies: strategy 1, "forward the native topic", and strategy 2, "forward the derived topic". Let P_1 and P_2 respectively denote the proportions of the target user's adjacent users who forward the native topic and the derived topic; the benefit functions of the two strategies are:

Pro_P(u_i, u_j) = P_1 × Inf_p(u_i, u_j)

Pro_D(u_i, u_j) = P_2 × Inf_d(u_i, u_j)

wherein Pro_P(u_i, u_j) represents the first benefit, i.e. the benefit the user obtains by forwarding the native topic, and Pro_D(u_i, u_j) represents the second benefit, i.e. the benefit the user obtains by forwarding the derived topic.
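The two benefit functions can be sketched directly from their definitions. The strategy labels (`"native"`, `"derived"`, `"none"`) and the sample influence values are assumptions for the example.

```python
def strategy_shares(neighbor_strategies):
    """P_1, P_2: shares of a user's neighbors forwarding the native / derived topic."""
    n = len(neighbor_strategies)
    if n == 0:
        return 0.0, 0.0
    p1 = sum(s == "native" for s in neighbor_strategies) / n
    p2 = sum(s == "derived" for s in neighbor_strategies) / n
    return p1, p2


def benefits(neighbor_strategies, inf_p, inf_d):
    """Return (Pro_P, Pro_D) = (P_1 * Inf_p, P_2 * Inf_d)."""
    p1, p2 = strategy_shares(neighbor_strategies)
    return p1 * inf_p, p2 * inf_d


# Hypothetical neighborhood: two neighbors forward the native topic, one the derived topic.
neighbors = ["native", "derived", "native", "none"]
pro_p, pro_d = benefits(neighbors, inf_p=0.8, inf_d=0.4)
```

Neighbors that forward neither topic still count in the denominator, so P_1 and P_2 need not sum to one.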

S34: and calculating the topic propagation behavior influence of the user in the original topic and the topic propagation behavior influence of the user in the derived topic according to the first income and the second income.

wherein Mut_P(u_i, u_j) and Mut_D(u_i, u_j) respectively represent the topic propagation behavior influence of user u_j on user u_i in the native topic and in the derived topic after the evolutionary game.

S35: calculating the topic influence adjacency matrix according to the topic propagation behavior influence of the user in the native topic and the topic propagation behavior influence of the user in the derived topic.

Considering the competition between the native and derived topics, the final native-derived topic influence adjacency matrix is obtained:

wherein

represents the topic propagation behavior influence among users; if i = j, the topic propagation behavior influence is

S4: Extract network structure features according to the topic influence adjacency matrix to obtain the network structure feature vector of the user.

Considering the complexity of the social network structure, the invention uses the SDNE (Structural Deep Network Embedding) method for network representation learning. The native-derived topic influence adjacency matrix is used directly as input, and the network structure feature vectors of the users are output, expressed as:

S = K × F^b

where K is the number of network nodes, i.e., the number of users under the derived topic, and F^b is the social structure feature vector of the corresponding user.

The prediction task is to predict whether a potential user node will participate in forwarding a related topic and, if it does, whether it forwards the native topic or the derived topic. The problem is therefore cast as a three-class classification task. Because directly concatenating the two types of features would make the model input too long, the invention designs a DT-GCN (graph convolutional network based on associated topics) model, shown in fig. 3, which adds a CNN (convolutional neural network) layer in front of the GCN (graph convolutional network) model.

The DT-GCN model processes the user historical behavior feature vector and the user network structure feature vector as follows. A native and derived topic propagation hybrid network is constructed from the topic information, and an adjacency matrix is obtained from the connection information between nodes in this hybrid network. The user historical behavior feature vector and the user network structure feature vector are input into the CNN network for convolution to obtain a feature matrix. The feature matrix and the adjacency matrix are then input into the GCN, to which a dropout layer is added, and the output is finally processed by a softmax function. The softmax function converts the convolution output into the probabilities of each node belonging to each class, giving the prediction result of the user's propagation of the topic.

The adjacency matrix is preprocessed in the GCN network; the preprocessed adjacency matrix is calculated from the original adjacency matrix as:

where A represents the original adjacency matrix obtained from the connection information between nodes in the native and derived topic propagation hybrid network, and D represents the degree matrix of the adjacency matrix.
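As an illustration of this kind of preprocessing, the sketch below applies the symmetric renormalization commonly used with GCNs, D^(-1/2) (A + I) D^(-1/2), where the degrees are taken from A + I. That this is the exact form intended here is an assumption, as are the function name `normalize_adjacency` and the toy matrix; the edge weights are assumed to be non-negative.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric renormalization D^(-1/2) (A + I) D^(-1/2).

    Assumption: this is the widely used GCN preprocessing; the degrees are
    computed from A + I, so every node has degree >= 1.
    """
    a_tilde = adj + np.eye(adj.shape[0])  # add self-loops
    deg = a_tilde.sum(axis=1)             # node degrees of A + I
    d_inv_sqrt = 1.0 / np.sqrt(deg)       # diagonal of D^(-1/2)
    # Scale row i by d_i^(-1/2) and column j by d_j^(-1/2).
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


# Toy 3-user chain u0 - u1 - u2 (made-up data).
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
A_hat = normalize_adjacency(A)
```

Adding self-loops lets each node keep its own features during aggregation, and the symmetric scaling keeps the feature magnitudes comparable across nodes with different degrees.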

The expression of the DT-GCN model is as follows:

where Z represents the class probabilities output for each user; ReLU(x) = max(0, x) is an activation function;

represents the added dropout layer, which randomly samples parameters with a given probability p, that is, it samples the j-th neuron of the i-th network layer and discards its value to prevent the model from overfitting;

represents the activation function;

represents the preprocessed adjacency matrix; A represents the adjacency matrix; H^l represents the node feature attributes of each layer, where the input layer H^0 is the user historical behavior feature vector together with the user network structure feature vector; CNN_model() represents the CNN network; and W^i is the weight matrix of the i-th network layer.
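The components listed above can be put together in a minimal NumPy forward pass. This is a sketch under stated assumptions, not the model of the invention: a two-layer GCN, a single 1-D convolution standing in for CNN_model(), the placement of dropout between the two graph convolutions, and all names, shapes and values (`dt_gcn_forward`, `kernel`, `w0`, `w1`, the toy data) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable


def relu(x):
    return np.maximum(0.0, x)


def softmax(x):
    # Row-wise softmax with max-subtraction for numerical stability.
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def cnn_model(h0, kernel):
    # Hypothetical stand-in for CNN_model(): one 1-D "valid" convolution
    # slid along each user's concatenated feature vector, followed by ReLU.
    return relu(np.stack([np.correlate(row, kernel, mode="valid") for row in h0]))


def dropout(x, p, training):
    # Inverted dropout: zero each activation with probability p while training.
    if not training or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)


def dt_gcn_forward(a_hat, h0, kernel, w0, w1, p=0.5, training=False):
    x = cnn_model(h0, kernel)        # shorten the concatenated input features
    h1 = relu(a_hat @ x @ w0)        # first graph convolution
    h1 = dropout(h1, p, training)    # dropout between the two GCN layers
    return softmax(a_hat @ h1 @ w1)  # Z: one row of 3 class probabilities per user


# Toy example: 4 users with 10 concatenated features each (made-up values).
K, F = 4, 10
a_hat = np.full((K, K), 1.0 / K)      # placeholder preprocessed adjacency matrix
h0 = rng.normal(size=(K, F))          # placeholder for the input layer H^0
kernel = rng.normal(size=3)           # 1-D convolution kernel
w0 = rng.normal(size=(F - 3 + 1, 5))  # first-layer weight matrix
w1 = rng.normal(size=(5, 3))          # second-layer weight matrix, 3 classes
Z = dt_gcn_forward(a_hat, h0, kernel, w0, w1)
```

Each row of `Z` sums to 1 and holds the three class probabilities for one user. Running the convolution before the GCN shortens the per-user feature vector, which matches the stated motivation for adding the CNN layer.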

The invention addresses a three-class prediction problem, so the model output is Z = P(o, a, d | u_i), specifically defined as follows:

where P(o | u_i) represents the probability that the user forwards the native topic, P(a | u_i) represents the probability that the user does not participate in the topic, and P(d | u_i) represents the probability that the user forwards the derived topic. If the corresponding Y = 1, the potential user u_i is judged to forward the native topic in the next time period; if Y = -1, the potential user u_i is judged to forward the derived topic in the next time period; otherwise, the potential user u_i does not participate in forwarding the hot topic in the next time period.
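The decision rule can be sketched as follows, assuming that Y is determined by the class with the highest probability. The function name `predict_label` and the tie-breaking order are assumptions made for illustration.

```python
def predict_label(p_native, p_none, p_derived):
    """Map the three class probabilities for user u_i to the label Y.

    Assumption: Y follows the most probable class. Ties are broken in the
    order native, derived, no participation (an arbitrary choice).
    Returns 1 (forward native topic), -1 (forward derived topic) or 0.
    """
    best = max(p_native, p_none, p_derived)
    if best == p_native:
        return 1
    if best == p_derived:
        return -1
    return 0


# Made-up probabilities P(o|u_i), P(a|u_i), P(d|u_i).
y = predict_label(p_native=0.2, p_none=0.1, p_derived=0.7)
```

Here the derived topic has the highest probability, so `y` is -1.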

The topic propagation prediction method based on topic association provided by the invention first proposes the DTR2vec algorithm, which starts from the association among the features of the native and derived topics at different evolution stages and performs representation learning on the user's cognitive accumulation and degree of interest, yielding a low-dimensional vector representation of the topic feature space. Then, considering the antagonism and symbiosis of the native and derived topics during propagation, evolutionary game theory is introduced to form a network structure feature representation of the influence of the native and derived topics among users. Finally, the association and game relations of the native and derived topics during propagation are analyzed, and this information is combined to predict and analyze the topic propagation situation.

The invention predicts the propagation trend of a topic by predicting users' propagation behavior toward the topic. It can be applied to personalized user recommendation and marketing, and helps mine users' forwarding preferences, target advertisements accurately and formulate publicity plans. Meanwhile, the forwarding and propagation trends of monitored hot topics can be known in advance, which helps public opinion departments quickly take targeted measures against harmful information, thereby purifying the network environment and guiding the development of positive topics so as to establish sound value orientation in society.

The above embodiments further illustrate the objects, technical solutions and advantages of the present invention. It should be understood that they are only preferred embodiments of the present invention and should not be construed as limiting it; any modification, equivalent substitution or improvement made within the spirit and principle of the present invention shall fall within its scope of protection.

Claims

1. A topic propagation prediction method based on topic association, characterized by comprising the following steps: obtaining topic information and preprocessing the topic information; inputting the preprocessed topic information into a topic propagation prediction model based on topic association, and predicting the propagation trend of a user toward topics; controlling the propagation trend of the topic according to the propagation trend of the user toward the topic;

S2: selecting user interest feature keywords and user cognitive feature keywords from a user topic content set by adopting a DTR2vec algorithm, and performing vector representation on the selected keywords to obtain a user historical behavior feature vector;

S3: according to all the internal attributes and, among the external attributes, the driving force of the user's friends and the topic popularity, quantifying the influence of topics on the user by adopting evolutionary game theory to obtain a topic influence adjacency matrix;

2. The topic propagation prediction method based on topic association as claimed in claim 1 wherein the process of selecting user interest feature keywords and user cognition feature keywords from a user topic content set using a DTR2vec algorithm comprises:

S24: selecting, according to the interest weight and the cognitive weight respectively, the top-k keywords from the user content keyword set key_user as the user interest feature keywords and the user cognitive feature keywords.

3. The topic propagation prediction method based on topic association as claimed in claim 2, wherein the BM25 algorithm is used to calculate the similarity between the native topic content set and the derived topic content set, and the formula is:

4. The topic propagation prediction method based on topic association as claimed in claim 2, wherein the formulas for calculating the interest weight and the cognitive weight of the user content keyword set key_user are:

where w_{i,inter} represents the interest weight of the i-th word in the user content keyword set, w_{i,cong} represents the cognitive weight of the i-th word in the user content keyword set, sim(key_{i,u}, key_deri) represents the similarity between the i-th candidate keyword and the derived topic keyword set key_deri, sim(key_{i,u}, key_com) represents the similarity between the i-th candidate keyword and the topic association feature word set key_com, N represents the total number of words in the user content keyword set, sim(key_{i,u}, key_pre) represents the similarity between the i-th candidate keyword and the native topic keyword set key_pre, t represents the current time, t' represents the time at which the topic was initially generated, and w represents a regularization factor.

5. The topic propagation prediction method based on topic association as claimed in claim 1 wherein the process of quantifying the influence of topics on users using evolutionary game theory comprises:

S34: calculating the topic propagation behavior influence of the user in the native topic and the topic propagation behavior influence of the user in the derived topic according to the first benefit and the second benefit;

6. The topic propagation prediction method based on topic association as claimed in claim 5 wherein the formula for calculating the internal influence and the external influence is:

f_in(u_i) = Act(u_i) × Ret(u_i) × Pre(u_i)

where f_in(u_i) represents the internal influence of user u_i, f_out(u_i, u_j) represents the external influence of user u_j on user u_i, Act(u_i) represents the activity of user u_i, Ret(u_i) represents the historical forwarding rate of user u_i, and Pre(u_i) represents the topic perception rate of user u_i,

7. The topic propagation prediction method based on topic association as claimed in claim 5, wherein the formula for calculating the topic propagation behavior influence of users in the native topic and the topic propagation behavior influence of users in the derived topic is:

where Mut_P(u_i, u_j) represents the influence of user u_j on user u_i in the native topic, Mut_D(u_i, u_j) represents the influence of user u_j on user u_i in the derived topic, Pro_P(u_i, u_j) represents the first benefit, and Pro_D(u_i, u_j) represents the second benefit.

8. The topic propagation prediction method based on topic association as claimed in claim 5 wherein the topic influence adjacency matrix is expressed as:

where

represents the topic influence adjacency matrix,

represents the topic propagation behavior influence between users, Mut_P(u_i, u_j) represents the influence of user u_j on user u_i in the native topic, and Mut_D(u_i, u_j) represents the influence of user u_j on user u_i in the derived topic.

9. The topic propagation prediction method based on topic association as claimed in claim 1, wherein the processing of the user historical behavior feature vector and the user network structure feature vector by the DT-GCN model comprises: obtaining an adjacency matrix according to the connection information between nodes in the native and derived topic propagation hybrid network, and inputting the user historical behavior feature vector and the user network structure feature vector into a CNN network for convolution to obtain a feature matrix; and inputting the feature matrix and the adjacency matrix into a GCN, adding a dropout layer to the GCN, and finally processing the output with a softmax function to obtain the topic propagation prediction result of the user.

10. The topic propagation prediction method based on topic association as claimed in claim 1 wherein the expression of the DT-GCN model is:

where Z represents the class probabilities output for each user,

represents the preprocessed adjacency matrix, A represents the adjacency matrix, H^0 represents the input layer,

represents the random sampling of parameters by the added dropout layer with a given probability, CNN_model() represents the CNN network, and W^i is the weight matrix of the i-th network layer.