CN110336700B - Microblog popularity prediction method based on time and user forwarding sequence - Google Patents

Microblog popularity prediction method based on time and user forwarding sequence

Info

Publication number
CN110336700B
Authority
CN
China
Prior art keywords
user
time
microblog
popularity
forwarding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910621977.8A
Other languages
Chinese (zh)
Other versions
CN110336700A (en)
Inventor
黄宏宇
刘海燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University
Priority to CN201910621977.8A
Publication of CN110336700A
Application granted
Publication of CN110336700B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01: Social networking
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14: Network analysis or design
    • H04L41/145: Network analysis or design involving simulating, designing, planning or modelling of a network
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14: Network analysis or design
    • H04L41/147: Network analysis or design for predicting network behaviour
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00: User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/52: User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail for supporting social networking services

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Computing Systems (AREA)
  • Tourism & Hospitality (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Primary Health Care (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a microblog popularity prediction model based on time and the user forwarding sequence, and belongs to the field of message popularity prediction in social networks. It comprises the following steps: S1: modeling the forwarding sequence of a microblog with a recurrent neural network to capture long-distance dependencies in the message propagation process; S2: passing the hidden-layer output through a nonlinear transformation network to learn the rate at each time step of the propagation process; S3: predicting the future popularity of the microblog from the early trend acceleration derived from the rate and the early popularity, optimized by user activity. The invention allows the future popularity trend of a message to be predicted more accurately at an early stage of its propagation; the model both exploits historical propagation information and characterizes the microblog propagation process well.

Figure 201910621977

Description

Microblog popularity prediction method based on time and user forwarding sequence
Technical Field
The invention belongs to the field of message popularity prediction in social networks, and relates to a microblog popularity prediction model based on time and the user forwarding sequence.
Background
The popularity and low cost of Web 2.0 services have changed the way content is generated and consumed online. Internet technology has developed rapidly in recent years, and with its rise and spread, daily life has become inseparable from the network. Through the network, content producers can reach audiences that would be unimaginable with traditional channels, and services for sharing video, photos, and music, weblogs, social bookmarking sites, collaboration portals, and news aggregators where content is submitted, browsed, rated, and discussed are available worldwide. Social networking services such as Facebook, Twitter, microblogs, and WeChat play an important role in spreading hot-spot events, and users rely on them for updates on personal and global news.
As social networks have emerged, people increasingly like to publish their own opinions and comment on events online. Social networks such as microblogs make it very convenient to acquire and share information. However, people are also harmed by them: false messages and defamation are spread online, and if such messages propagate rapidly they distort people's judgment and cause unpredictable losses. Therefore, if the popularity trend of an event can be predicted early, relevant government departments can better manage public opinion and companies can prepare for emergencies in advance. Popularity prediction is also valuable when a sudden hot spot overloads servers and causes downtime. It matters for network dimensioning (e.g., caching and replication), online marketing (e.g., recommendation systems and media advertising), real-world outcome prediction (e.g., economic trends), and emergency management, but it is a very difficult problem because of the structure of social networks and their large number of users.
Currently, the popularity prediction problem is generally approached in three ways. The first is feature-based machine learning, which uses a classification or regression model, so that the key to the problem becomes feature extraction. The second is based on point processes, which model the message propagation process, characterize it well, and learn the arrival process of messages. The third is based on epidemic models, which express the law of message propagation with kinetic equations. Classification- or regression-based models depend on feature extraction and do not characterize the propagation process. Point-process-based methods fall short in performance, cannot adapt to every social network given the diversity of social networks, and do not exploit supervision from historical messages. Based on this analysis, a microblog popularity prediction model based on time and the user forwarding sequence is proposed.
Disclosure of Invention
In view of the above, the invention provides a microblog popularity prediction model based on time and the user forwarding sequence. It models the forwarding sequence of a microblog with a recurrent neural network to capture long-distance dependencies in the message propagation process, passes the hidden-layer output through a nonlinear transformation network to learn the rate at each time step of propagation, and finally predicts the future popularity of the microblog from the early trend acceleration derived from the rate and the early popularity, optimized by user activity.
To achieve this purpose, the invention provides the following technical solution:
A microblog popularity prediction model based on time and the user forwarding sequence comprises the following steps:
S1: modeling the forwarding sequence of a microblog with a recurrent neural network to capture long-distance dependencies in the message propagation process;
S2: passing the hidden-layer output through a nonlinear transformation network to learn the rate at each time step of the propagation process;
S3: predicting the future popularity of the microblog from the early trend acceleration derived from the rate and the early popularity, optimized by user activity.
Further, step S1 includes the steps of:
S11: mapping the time vector: each time component unit is converted to a length according to its next-higher unit, and that length is allotted to it in the vector; then vectorizing user information: the historical microblog text of each user is collected and aggregated into a document representing that user, all user documents are aggregated into a document set, the topic-word distribution of each topic and the document-topic distribution of each user document are randomly initialized, the words in all documents are generated according to the document-topic and topic-word distributions, the model is trained iteratively by Gibbs sampling of the LDA topic model, and the resulting topic distribution of each user document is used as the interest vector of that user;
S12: concatenating the time vector and the user vector into a single input and performing an embedding operation according to a certain rule;
S13: inputting the result of step S12 into the recurrent neural network, passing it through an embedding layer into the underlying RNN for propagation training, using an LSTM as the recurrent neural network to solve the vanishing-gradient problem of standard recurrent neural networks, and finally obtaining the hidden-layer output of each time step through the forget gate, the input gate, and the output gate;
The forget gate formula is:
$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$,
where $x_t$ is the input at step $t$, $h_t$ is the hidden-layer information at the current time step, $h_{t-1}$ is the hidden-layer information at the previous time step, "$\cdot$" denotes multiplication, square brackets denote the concatenation of two vectors, $\sigma(\cdot)$ is the sigmoid activation function, $W_f$ is a weight matrix, and $b_f$ is a bias vector.
The input gate and network state update are:
$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$,
Figure BDA0002125794280000031
where $W_C$ and $b_C$ denote a weight matrix and a bias vector, respectively, and tanh is the hyperbolic tangent function;
The output gate is:
$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$, $h_t = o_t * \tanh(C_t)$
where $W_o$ and $b_o$ are the weight matrix and the bias parameter of the output gate, respectively.
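The gate equations above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the patent's implementation: the cell-state update appears in the source only as an equation image, so the block assumes the conventional LSTM form (candidate state $\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$ and $C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$). The names `lstm_step`, `affine`, and the `params` dictionary layout are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, the sigma(.) of the gate formulas."""
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Return W v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step.

    params maps "f", "i", "c", "o" to a (W, b) pair (hypothetical layout).
    The cell-state update is the conventional LSTM form, assumed here
    because the source shows that equation only as an image.
    """
    z = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]

    def gate(name, activation):
        W, b = params[name]
        return [activation(a) for a in affine(W, z, b)]

    f_t = gate("f", sigmoid)        # forget gate
    i_t = gate("i", sigmoid)        # input gate
    c_tilde = gate("c", math.tanh)  # candidate cell state (assumed form)
    o_t = gate("o", sigmoid)        # output gate
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t
```

Calling `lstm_step` once per element of the forwarding sequence, feeding each returned `h_t` and `c_t` into the next call, yields the hidden-layer output of every time step.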
Further, in step S2, the hidden-layer output of the recurrent neural network is obtained and then nonlinearly transformed to give the propagation rate of the microblog at each forwarding time. The forwarding process of the message is modeled as a random point process, and the calculation formula is:
$v_t = \exp(W_m h_t + b_m)$
where $W_m$ is a weight matrix, $b_m$ is a bias parameter, the influence of the history $H_t$ is reflected in the term $W_m h_t$, and $h_t$ is the hidden-layer information of the recurrent neural network, which also represents the historical information in the sequence data.
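As a small illustration of the rate formula, the sketch below maps each hidden state to a positive rate. It assumes the rate is a scalar, so the weight matrix reduces to a single row vector `w_m`; that simplification and the function name `propagation_rates` are assumptions for illustration only.

```python
import math

def propagation_rates(hidden_states, w_m, b_m):
    """Return v_t = exp(w_m . h_t + b_m) for each hidden state h_t.

    w_m is a single weight row (assumption: scalar rate per time step).
    The exponential keeps every rate strictly positive.
    """
    return [
        math.exp(sum(w * h for w, h in zip(w_m, h_t)) + b_m)
        for h_t in hidden_states
    ]
```

The exponential is a common choice for rate functions because it guarantees a positive value regardless of the sign of the linear term.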
Further, step S3 includes the steps of:
S31: the obtained rate function is used to calculate the propagation trend acceleration of the microblog within the observation time. The propagation trends of different types of microblogs differ greatly, and these differences affect future popularity, so a feature that indicates changes in the popularity trend needs to be found and fused into the model so that the future popularity of the microblog can be predicted more accurately. The calculation formula is:
Figure BDA0002125794280000032
where $T_{obs}$ denotes the observation time, $n$ the number of elements in the forwarding sequence, and $v_i$ the rate function at each forwarding time;
S32: user activity is quantified to obtain the user activity of each time period on the microblog platform. The specific quantization formula is:
Figure BDA0002125794280000033
where $N(t)$ denotes the average number of microblogs posted by users from the start of a day to the current time $t$, and $\eta$ denotes the average number of microblogs posted by users per unit time on the microblog platform; the unit time may be hours, minutes, or seconds.
S33: the trend acceleration of step S31 and the early popularity of the message are each divided by the user activity of step S32 to obtain the relative trend acceleration and the relative popularity, as follows:
Figure BDA0002125794280000041
The two are then combined to build a linear regression model, with the following calculation formula:
Figure BDA0002125794280000042
where $\beta_0$, $\beta_1$, $\beta_2$ are model parameters.
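The structure of step S33 can be sketched as follows. The exact regression formula is present in the source only as an equation image, so this block assumes a plain linear form, $\beta_0 + \beta_1 \cdot$ (relative trend acceleration) $+ \beta_2 \cdot$ (relative popularity); the patent's formula may differ (for example, by transforming the variables). The function names are hypothetical, and the trend acceleration and user activity are taken as already computed inputs.

```python
def relative_features(trend_acceleration, early_popularity, user_activity):
    """Divide both early features by user activity, as step S33 describes."""
    if user_activity <= 0:
        raise ValueError("user activity must be positive")
    return trend_acceleration / user_activity, early_popularity / user_activity

def predict_popularity(beta, trend_acceleration, early_popularity, user_activity):
    """Assumed linear form: beta0 + beta1 * a_rel + beta2 * p_rel."""
    a_rel, p_rel = relative_features(
        trend_acceleration, early_popularity, user_activity
    )
    beta0, beta1, beta2 = beta
    return beta0 + beta1 * a_rel + beta2 * p_rel
```

The parameters in `beta` could be estimated, for example, by ordinary least squares over training microblogs whose final popularity is known.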
The beneficial effects of the invention are as follows: the future popularity trend of a message is predicted more accurately at an early stage of message propagation, and the model both exploits historical propagation information and characterizes the microblog propagation process well.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a system diagram of a microblog popularity prediction model based on time and a user's forwarding sequence;
FIG. 2 shows the user vector generation process in a forwarding sequence;
FIG. 3 is a schematic diagram of how the LSTM operates on an input vector.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
The drawings are for illustrative purposes only, are schematic rather than pictorial, and are not to be construed as limiting the invention; to better illustrate the embodiments of the invention, some parts of the drawings may be omitted, enlarged, or reduced and do not represent the size of an actual product; those skilled in the art will understand that certain well-known structures and their descriptions may be omitted from the drawings.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components; in the description of the present invention, it should be understood that if there is an orientation or positional relationship indicated by terms such as "upper", "lower", "left", "right", "front", "rear", etc., based on the orientation or positional relationship shown in the drawings, it is only for convenience of description and simplification of description, but it is not an indication or suggestion that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and therefore, the terms describing the positional relationship in the drawings are only used for illustrative purposes, and are not to be construed as limiting the present invention, and the specific meaning of the terms may be understood by those skilled in the art according to specific situations.
Before introducing the solution, seven necessary concepts of the invention are presented.
Concept 1: message popularity prediction. A message is information generated in a social network, such as a microblog on Sina Weibo; popularity is the final outcome of a message's future propagation and can be measured by the number of times the microblog is forwarded. Message popularity prediction means predicting, early after publication, the specific number of times a message will be forwarded in the future.
Concept 2: recurrent neural network. A recurrent neural network is a neural network for processing sequence data; time-series data, for example, are data collected at different points in time that reflect how the state or degree of an object or phenomenon changes over time. The invention uses an LSTM network, whose core idea is to make good use of three gates. The first is the forget gate, which controls how much of the long-term cell state continues to be kept; the second is the input gate, which controls how much of the network's input at the current moment enters the long-term cell state; the third is the output gate, which controls whether the long-term cell state is used as the current LSTM output.
Concept 3: topic model. A topic model is a method for modeling text and learning the latent topic distribution in it; it overcomes shortcomings of document-similarity methods in traditional information retrieval and can automatically discover semantic topics among words in massive Internet data.
Concept 4: linear regression model. Linear regression learns a linear model whose aim is to predict the output for a given input as accurately as possible. In this model the dependent variable is continuous, and the independent variables may be continuous or discrete. If there is only one independent variable and one dependent variable and their relationship can be approximated by a straight line, the analysis is called simple (unary) linear regression. If there are two or more independent variables and the relationship between the dependent variable and the independent variables is linear, it is called multiple linear regression.
Concept 5: point process. The forwarding times in a microblog forwarding sequence are treated as non-negative random variables generated in chronological order; such a process is called a point process on the positive real line, and its defining formula is:
Figure BDA0002125794280000051
where $H_t$ denotes the propagation history up to forwarding time $t$. The formula expresses how the rate changes over time during microblog propagation; $H_t$ is included because the current forwarding action is considered to be influenced by the propagation history.
Concept 6: observation time. The time that has elapsed, from the publication of a message and through its propagation, until prediction begins.
Concept 7: the time at which the popularity of a message stabilizes and no longer grows.
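The defining formula of Concept 5 appears in the source only as an image reference. As a reading aid, the conventional textbook definition of a point-process rate (conditional intensity) given the history is shown below; the counting-process symbol $N_{\mathrm{fw}}(t)$, the number of forwards up to time $t$, is an assumption introduced here, and the patent's own notation may differ.

```latex
v(t \mid H_t) \;=\; \lim_{\Delta t \to 0}
  \frac{\mathbb{E}\!\left[\, N_{\mathrm{fw}}(t + \Delta t) - N_{\mathrm{fw}}(t) \;\middle|\; H_t \,\right]}{\Delta t}
```

Read this way, $v(t \mid H_t)\,\Delta t$ is approximately the expected number of forwards in a short interval after $t$, given everything that has happened before $t$.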
The invention provides a microblog popularity prediction model based on time and the user forwarding sequence. It takes source microblogs from Sina Weibo and their subsequent forwarded microblogs as the training set and, after training, can predict the future popularity of a microblog more accurately. The model is built on the forwarding sequence of the microblog with the final aim of predicting the future popularity of the message, and it is divided into three parts, as shown in FIG. 1. The first part models the forwarding sequence of the microblog with a recurrent neural network to capture long-distance dependencies in the message propagation process; the second part passes the hidden-layer output through a nonlinear transformation network to learn the rate at each time step of the propagation process; the third part predicts the future popularity of the microblog from the early trend acceleration derived from the rate and the early popularity, optimized by user activity.
1. The first part comprises the following three steps:
Step 1: Mapping of the time vector. Each time component unit is converted to a length according to its next-higher unit, and that length is allotted to it in the vector. For example, for the minute unit the next-higher unit is the hour, and one hour has 60 minutes, so the minute occupies 60 positions in the vector. Given any time, its minute value is taken modulo the unit length to obtain a number m; the m-th position of the minute segment of the time vector is set to 1 and the remaining positions to 0, so that the minute value is represented in the time vector. User information is then vectorized: the historical microblog text of each user is collected and aggregated into a document representing that user, and all user documents are aggregated into a document set. The topic-word distribution of each topic and the document-topic distribution of each user document are randomly initialized, the words in all documents are generated according to the document-topic and topic-word distributions, and the model is trained iteratively by Gibbs sampling of the LDA topic model. The resulting topic distribution of each user document is used as the interest vector of that user. The specific process is shown in FIG. 2.
Step 2: the time and the user vector are spliced together to be input as a whole, and embedding operation is carried out according to a certain rule.
Step 3: The result of step 2 is input into the recurrent neural network, passing through an embedding layer into the underlying RNN for propagation training. A standard recurrent neural network suffers from the vanishing-gradient problem; to solve it, an LSTM based on a gating mechanism can be adopted. In an LSTM, the hidden-layer output depends not only on the current input but also on the output of the previous time step, and it is obtained through the forget gate, the input gate, and the output gate; the specific process is shown in FIG. 3. The forget gate formula is $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$. The input gate and network state update are $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$,
Figure BDA0002125794280000061
The output gate is $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$, $h_t = o_t * \tanh(C_t)$.
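The time-vector mapping and the concatenation with the user vector described in steps 1 and 2 of this part can be sketched as below. The choice of units (hour, minute, second), the zero-based position index, and the names `time_vector` and `step_input` are assumptions made for illustration; the text only works through the minute example, and the interest vector is taken as given (for instance, a topic distribution from a separately trained LDA model).

```python
from datetime import datetime

# (unit name, length under its next-higher unit); the set of units is assumed.
UNIT_LENGTHS = (("hour", 24), ("minute", 60), ("second", 60))

def time_vector(ts):
    """One-hot encode each time unit of ts and concatenate the segments."""
    vec = []
    for unit, length in UNIT_LENGTHS:
        m = getattr(ts, unit) % length  # value modulo the unit length
        segment = [0.0] * length
        segment[m] = 1.0                # zero-based position m (assumption)
        vec.extend(segment)
    return vec

def step_input(ts, interest_vector):
    """Concatenate the time vector with a user's interest vector."""
    return time_vector(ts) + list(interest_vector)
```

For example, `step_input(datetime(2019, 7, 10, 13, 45, 7), [0.2, 0.8])` produces a vector with three 1-valued time positions followed by the two interest weights; this vector would then pass through the embedding layer before entering the LSTM.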
2. The second part comprises one step:
Step 1: The hidden-layer output of the recurrent neural network is obtained and then nonlinearly transformed to give the propagation rate of the microblog at each forwarding time. The forwarding process of the message is modeled as a random point process, and the specific calculation formula is:
$v_t = \exp(W_m h_t + b_m)$
where $W_m$ is a weight matrix, $b_m$ is a bias parameter, the influence of the history $H_t$ is reflected in the term $W_m h_t$, and $h_t$ is the hidden-layer information of the recurrent neural network, which also represents the historical information in the sequence data.
3. The third part comprises the following three steps:
Step 1: The obtained rate function is used to calculate the propagation trend acceleration of the microblog within the observation time. The propagation trends of different types of microblogs differ greatly, and these differences affect future popularity, so a feature that indicates changes in the popularity trend needs to be found and fused into the model so that the future popularity of the microblog can be predicted accurately. The calculation formula is:
Figure BDA0002125794280000071
where $T_{obs}$ denotes the observation time, $n$ the number of elements in the forwarding sequence, and $v_i$ the rate function at each forwarding time.
Step 2: User activity is quantified to obtain the user activity of each time period on the microblog platform. The specific quantization formula is:
Figure BDA0002125794280000072
where $N(t)$ denotes the average number of microblogs posted by users from the start of a day to the current time $t$, and $\eta$ denotes the average number of microblogs posted by users per unit time on the microblog platform; the unit time may be hours, minutes, or seconds.
Step 3: The trend acceleration of step 1 and the early popularity of the message are each divided by the user activity of step 2 to obtain the relative trend acceleration and the relative popularity, as follows:
Figure BDA0002125794280000073
The two are then combined to build a linear regression model, with the following calculation formula:
Figure BDA0002125794280000074
where $\beta_0$, $\beta_1$, $\beta_2$ are model parameters.
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.

Claims (3)

1.一种基于时间及用户的转发序列的微博流行度预测方法,其特征在于:包括以下步骤:1. a microblog popularity prediction method based on time and user's forwarding sequence, is characterized in that: comprise the following steps: S1:利用循环神经网络对微博的转发序列进行建模,用来捕获消息传播过程的长距离依赖;S1: Use recurrent neural network to model the forwarding sequence of Weibo to capture the long-distance dependence of the message propagation process; S2:获取循环神经网络的隐藏层输出,然后进行非线性变换,得到微博在每个转发时刻的传播速率;S2: Obtain the output of the hidden layer of the recurrent neural network, and then perform nonlinear transformation to obtain the propagation rate of the microblog at each forwarding moment; S3:利用速率得到的早期趋势加速度和早期的流行度,并在用户活跃度的优化下,对微博未来的流行度进行预测;包括以下步骤:S3: Using the early trend acceleration and early popularity obtained by the rate, and under the optimization of user activity, predict the future popularity of Weibo; including the following steps: S31:利用得到的速率函数计算微博到观察时间内的传播趋势加速度,计算公式如下:S31: Use the obtained rate function to calculate the propagation trend acceleration from the microblog to the observation time, and the calculation formula is as follows:
Figure FDA0003112459230000011
Figure FDA0003112459230000011
其中,Tobs表示观察时间,n表示转发序列中元素个数,vi表示每个转发时刻的速率函数;Among them, T obs represents the observation time, n represents the number of elements in the forwarding sequence, and vi represents the rate function at each forwarding moment; S32:将用户活跃度量化,得到微博平台上每个时间段的用户活跃度,具体的量化公式如下:S32: Quantify the user activity to obtain the user activity of each time period on the Weibo platform. The specific quantification formula is as follows:
Figure FDA0003112459230000012
Figure FDA0003112459230000012
其中,N(t)表示从一天的开始时间到当前时间t为止用户发布微博的平均数量,η表示微博平台上单位时间内的用户发布微博的平均数量;Wherein, N(t) represents the average number of microblogs posted by users from the start of the day to the current time t, and n represents the average number of microblogs posted by users within a unit time on the microblog platform; S33:将步骤S31的趋势加速度和消息早期的流行度分别除以步骤S32的用户活跃度得到相对趋势加速度和相对流行度,如下:S33: Divide the trend acceleration in step S31 and the early popularity of the message by the user activity in step S32 to obtain the relative trend acceleration and relative popularity, as follows:
Figure FDA0003112459230000013
Figure FDA0003112459230000013
然后联合两者建立线性回归模型,计算公式如下:Then combine the two to establish a linear regression model, the calculation formula is as follows:
Figure FDA0003112459230000014
where β_0, β_1 and β_2 are the model parameters.
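Steps S33 and the regression that follows it can be illustrated with a short sketch. The exact regression formula sits in a figure that is not reproduced in this text, so the plain linear form `b0 + b1 * rel_acc + b2 * rel_pop` used below is an assumption, as are all function names, the least-squares fitting procedure, and the synthetic data.

```python
import numpy as np

def relative_features(trend_acc, early_pop, activity):
    """Divide trend acceleration and early popularity by user activity (step S33)."""
    trend_acc = np.asarray(trend_acc, dtype=float)
    early_pop = np.asarray(early_pop, dtype=float)
    activity = np.asarray(activity, dtype=float)
    return trend_acc / activity, early_pop / activity

def fit_linear_model(rel_acc, rel_pop, target):
    """Least-squares fit of target = b0 + b1 * rel_acc + b2 * rel_pop.

    The plain linear form is an assumption; the claimed formula may differ.
    """
    design = np.column_stack([np.ones_like(rel_acc), rel_acc, rel_pop])
    beta, *_ = np.linalg.lstsq(design, np.asarray(target, dtype=float), rcond=None)
    return beta

def predict(beta, rel_acc, rel_pop):
    """Apply the fitted parameters to new relative features."""
    return beta[0] + beta[1] * rel_acc + beta[2] * rel_pop

# Synthetic, noise-free data purely for illustration (not from the patent).
rng = np.random.default_rng(0)
acc = rng.uniform(0.1, 2.0, size=50)    # hypothetical trend accelerations
pop = rng.uniform(10.0, 100.0, size=50) # hypothetical early popularities
act = rng.uniform(0.5, 1.5, size=50)    # hypothetical user activity values

rel_acc, rel_pop = relative_features(acc, pop, act)
true_beta = np.array([1.0, 2.0, 0.5])
target = predict(true_beta, rel_acc, rel_pop)
beta = fit_linear_model(rel_acc, rel_pop, target)
```

Because the synthetic targets are generated without noise from `true_beta`, the fit should recover those parameters up to floating-point error; on real data the parameters would be estimated from observed final popularity.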
2. The microblog popularity prediction method based on time and user forwarding sequences according to claim 1, characterized in that step S1 comprises the following steps:
S11: mapping the time vector: each time component unit is converted, according to its next-higher unit, into the length of that unit, and its length in the vector is then set; then vectorizing the user information: the historical microblog text of each user is collected and aggregated into a document representing that user, and all user documents are aggregated into a document set; the topic-word distribution of each topic and the document-topic distribution of each user's microblog document are randomly generated; the words in all documents are generated according to the document-topic distribution and the topic-word distribution; the model is trained iteratively by Gibbs sampling of the LDA topic model; and the topic distribution finally obtained for each user document is used as that user's interest vector;
S12: concatenating the time vector and the user vector into a single input and performing an embedding operation according to certain rules;
S13: feeding the result of step S12 into the recurrent neural network, where it passes through the embedding layer into the underlying RNN for propagation training; an LSTM is used as the recurrent neural network to address the vanishing-gradient problem of standard neural networks, and the hidden-layer output of each time step is finally obtained after the forget gate, the input gate and the output gate;
the forget gate is:
f_t = σ(W_f · [h_{t-1}, x_t] + b_f),
where x_t is the input of the t-th layer, h_t denotes the hidden-layer information of the current time step, h_{t-1} denotes the hidden-layer information of the previous time step, "·" denotes vector multiplication, square brackets denote the concatenation of two vectors, σ(·) is the sigmoid activation function, W_f is a weight matrix, and b_f is a bias vector;
the input gate and the network state update are:
i_t = σ(W_i · [h_{t-1}, x_t] + b_i),
Figure FDA0003112459230000021
where W_C and b_C denote a weight matrix and a bias vector, respectively, and tanh is the hyperbolic tangent function;
the output gate is:
o_t = σ(W_o · [h_{t-1}, x_t] + b_o), h_t = o_t * tanh(C_t)
where W_o and b_o are a weight matrix and a bias parameter, respectively.
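The gate equations of step S13 can be sketched as a single LSTM time step in NumPy. The forget, input and output gates follow the formulas stated in the claim. The candidate state and cell-state update are given only in a figure that is not reproduced in this text, so the sketch assumes the standard LSTM form for them. The dictionary keys, dimensions and random initialization are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, the activation written as sigma in the gate formulas."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step on the concatenated vector [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])      # forget gate
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])      # input gate
    # Assumed standard LSTM candidate state and cell update (not in the text).
    c_tilde = np.tanh(params["W_C"] @ z + params["b_C"])
    c_t = f_t * c_prev + i_t * c_tilde
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])      # output gate
    h_t = o_t * np.tanh(c_t)                              # hidden-layer output
    return h_t, c_t

def init_params(input_dim, hidden_dim, seed=0):
    """Small random weights and zero biases; purely illustrative."""
    rng = np.random.default_rng(seed)
    params = {}
    for name in ("f", "i", "C", "o"):
        params[f"W_{name}"] = rng.normal(
            scale=0.1, size=(hidden_dim, hidden_dim + input_dim))
        params[f"b_{name}"] = np.zeros(hidden_dim)
    return params

def run_sequence(xs, params, hidden_dim):
    """Run the LSTM over a sequence and collect the hidden state of each step."""
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    hidden_states = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, params)
        hidden_states.append(h)
    return np.stack(hidden_states)

# Five time steps of a 4-dimensional embedded input (hypothetical sizes).
xs = np.random.default_rng(1).normal(size=(5, 4))
params = init_params(input_dim=4, hidden_dim=3)
hidden_states = run_sequence(xs, params, hidden_dim=3)
```

Each row of `hidden_states` corresponds to the hidden-layer output h_t of one forwarding event; in practice a deep learning framework would supply a trained LSTM layer rather than this hand-written step.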
3. The microblog popularity prediction method based on time and user forwarding sequences according to claim 1, characterized in that in step S2 the hidden-layer output of the recurrent neural network is obtained and then nonlinearly transformed to obtain the propagation rate of the microblog at each forwarding moment, the forwarding process of the message being modeled as a stochastic point process, according to the following formula:
v_t = exp(W_m h_t + b_m)
where W_m is a weight matrix, b_m is a bias parameter, the influence of H_t is reflected in W_m h_t, and h_t is the hidden-layer information of the recurrent neural network, which also represents the historical information in the sequence data.
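As a small worked example of the rate formula v_t = exp(W_m h_t + b_m), the sketch below maps one hidden state to a propagation rate. The numeric values, the 1-by-3 shape of `W_m`, and the function name are assumptions chosen for illustration.

```python
import numpy as np

def propagation_rate(h_t, W_m, b_m):
    """Rate at a forwarding moment: v_t = exp(W_m h_t + b_m).

    The exponential keeps the rate strictly positive, as a point-process
    intensity must be.
    """
    return np.exp(W_m @ h_t + b_m)

# Hypothetical hidden state and parameters (not taken from the patent).
h_t = np.array([0.2, -0.1, 0.4])
W_m = np.array([[0.5, 1.0, -0.5]])  # maps a 3-dim hidden state to one rate
b_m = np.array([0.1])

v_t = propagation_rate(h_t, W_m, b_m)
```

Here W_m h_t = 0.1 - 0.1 - 0.2 = -0.2, so the exponent is -0.1 and the resulting rate is exp(-0.1), slightly below 1.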
CN201910621977.8A 2019-07-10 2019-07-10 Microblog popularity prediction method based on time and user forwarding sequence Active CN110336700B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910621977.8A CN110336700B (en) 2019-07-10 2019-07-10 Microblog popularity prediction method based on time and user forwarding sequence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910621977.8A CN110336700B (en) 2019-07-10 2019-07-10 Microblog popularity prediction method based on time and user forwarding sequence

Publications (2)

Publication Number Publication Date
CN110336700A CN110336700A (en) 2019-10-15
CN110336700B true CN110336700B (en) 2021-09-14

Family

ID=68146339

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910621977.8A Active CN110336700B (en) 2019-07-10 2019-07-10 Microblog popularity prediction method based on time and user forwarding sequence

Country Status (1)

Country Link
CN (1) CN110336700B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111241392B (en) * 2020-01-07 2024-01-26 腾讯科技(深圳)有限公司 Method, apparatus, device and readable storage medium for determining popularity of article
CN112580878B (en) * 2020-12-23 2023-10-10 河南广播电视台 Information popularity prediction method based on graph neural network
CN112580879B (en) * 2020-12-23 2023-10-10 河南广播电视台 An information popularity prediction method based on graph neural network
CN113190733B (en) * 2021-04-27 2023-09-12 中国科学院计算技术研究所 Network event popularity prediction method and system based on multiple platforms
CN113536144B (en) * 2021-06-17 2022-04-19 中国人民解放军国防科技大学 Social network information propagation scale prediction method and device
CN114912941B (en) * 2022-04-11 2023-08-11 四川大学 A shoe fashion trend prediction system and method based on big data
CN114997464B (en) * 2022-04-26 2024-08-06 北京交通大学 A popularity prediction method based on graph temporal information learning
CN115470994B (en) * 2022-09-15 2023-07-11 苏州大学 Information popularity prediction method and system based on explicit time and cascade attention
CN117134997B (en) * 2023-10-26 2024-03-01 中电科大数据研究院有限公司 Edge sensor energy consumption attack detection method, device and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105975504A (en) * 2016-04-28 2016-09-28 中国科学院计算技术研究所 Recurrent neural network-based social network message burst detection method and system
CN109063827A (en) * 2018-10-25 2018-12-21 电子科技大学 Method, system, storage medium and terminal for automatically picking up specific luggage in a confined space

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
An RNN-based prediction model for social message bursts; 笱程成 et al.; 《软件学报》 (Journal of Software); 2017-11-30; Abstract, Section 1.2, Sections 3.1-3.2 *

Also Published As

Publication number Publication date
CN110336700A (en) 2019-10-15

Similar Documents

Publication Publication Date Title
CN110336700B (en) Microblog popularity prediction method based on time and user forwarding sequence
Medvedev et al. The anatomy of Reddit: An overview of academic research
Xiong et al. An emotional contagion model for heterogeneous social media with multiple behaviors
CN106682770B (en) Dynamic microblog forwarding behavior prediction system and method based on friend circle
Kumar et al. Explainable artificial intelligence for sarcasm detection in dialogues
CN111885399B (en) Content distribution method, device, electronic equipment and storage medium
CN111339404A (en) Content popularity prediction method and device based on artificial intelligence and computer equipment
CN104933622A (en) Microblog popularity degree prediction method based on user and microblog theme and microblog popularity degree prediction system based on user and microblog theme
CN108399575A (en) A kind of five-factor model personality prediction technique based on social media text
CN102394798A (en) Multi-feature based prediction method of propagation behavior of microblog information and system thereof
Yang et al. Measuring topic network centrality for identifying technology and technological development in online communities
CN113536144B (en) Social network information propagation scale prediction method and device
CN107292390A (en) A kind of Information Propagation Model and its transmission method based on chaology
CN113590928A (en) Content recommendation method and device and computer-readable storage medium
CN108229731B (en) A user behavior prediction system and method for multi-message interaction under hot topics
Hernandez et al. Twitter analysis: Methods for data management and a word count dictionary to measure city-level job satisfaction
CN110995485A (en) Social message propagation range prediction method without topological structure
CN115470991A (en) Prediction Method of Internet Rumor Spread Based on User's Short-term Emotion and Evolutionary Game
WO2023087933A1 (en) Content recommendation method and apparatus, device, storage medium, and program product
Xiao et al. Social media emotional state classification prediction based on Arctic Puffin Algorithm (APO) optimization of Transformer mode
CN112464082A (en) Rumor-splitting game propagation control method based on sparse representation and tensor completion
CN115878902A (en) Automatic information key theme extraction system of media fusion platform based on neural network model
CN114218457B (en) False news detection method based on forwarding social media user characterization
CN107515854B (en) Time sequence community and topic detection method based on right-carrying time sequence text network
CN112269945A (en) Information dissemination prediction method based on rumors to refute rumors and promote rumors and tripartite cognitive game

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant