CN110083699B - News popularity prediction model training method based on deep neural network - Google Patents

News popularity prediction model training method based on deep neural network

Info

Publication number
CN110083699B
Authority
CN
China
Prior art keywords
news
popularity
training
prediction model
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910202638.6A
Other languages
Chinese (zh)
Other versions
CN110083699A (en
Inventor
刘春阳
王乾宇
张旭
何赛克
张翔宇
郑晓龙
曾大军
彭鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
National Computer Network and Information Security Management Center
Original Assignee
Institute of Automation of Chinese Academy of Science
National Computer Network and Information Security Management Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science and National Computer Network and Information Security Management Center
Priority to CN201910202638.6A
Publication of CN110083699A
Application granted
Publication of CN110083699B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a news popularity prediction model training method based on a deep neural network, which comprises the following steps: acquiring news article data of a specific theme in a set time period, cleaning the data by using Pandas, and then sequentially grouping according to a set time length to acquire a news popularity sequence arranged according to a time sequence; according to the news popularity sequence, sequentially taking a continuous sequence with the sampling length of w as an input sample from the first popularity, and sampling data of the next period as an output sample to construct a training sample set; randomly selecting training samples from the training sample set to train the LSTM network-based news popularity prediction model, performing relevance analysis by adopting Pearson correlation coefficients to delete bad training samples, and circulating the training process until the training is finished. The invention can obtain a news popularity prediction model for predicting trendless, seasonality-free and nonlinear news popularity with higher accuracy.

Description

News popularity prediction model training method based on deep neural network
Technical Field
The invention belongs to the field of deep learning, and particularly relates to a news popularity prediction model training method based on a deep neural network.
Background
With the growing influence of the internet on people's lives and the wide adoption of mobile terminal devices, the data produced by human production and daily life have grown rapidly in recent years. The amount of data newly generated each year nearly equals the total accumulated over thousands of years of history. As deep learning theory has improved and developed, the intrinsic value contained in big data can be mined continuously. This value has attracted close attention from governments, industry, and the scientific and technological communities of many countries.
In the media industry, besides traditional print media, various new media platforms such as microblogs, blogs, forums, and Twitter have developed. These emerging media are gradually changing how people obtain information, and the amount of data they generate each day is also very large. China is in a period of social transformation with rapid economic development, in which social events such as accidents and disasters, public health events, and public security events occur frequently. New media websites are becoming the main channel through which people learn of news events. Therefore, analyzing and studying news on the basis of new media, and comprehensively predicting and analyzing its development trend and direction, can improve the pertinence and foresight of event handling.
The conventional media industry usually chooses among three methods for time-series prediction of news popularity: Holt linear exponential smoothing (Holt double exponential smoothing), Holt-Winters seasonal exponential smoothing (Holt triple exponential smoothing), and ARIMA (Autoregressive Integrated Moving Average model). Holt linear exponential smoothing is suitable only for series with a trend; if news popularity does not follow a conventional trend, prediction accuracy is low. Holt-Winters seasonal exponential smoothing is better suited to seasonal series; if the time series of a news event is unrelated to seasons, accurate prediction is difficult. ARIMA is currently a very popular time-series prediction algorithm with a wide range of applications, but it has difficulty capturing the regularities of nonlinear, non-stationary data.
Disclosure of Invention
In order to solve the above problems in the prior art, that is, to solve the problem that the prediction accuracy of the current prediction model for trendless, seasonality-free and non-linear popularity of news is low, the invention provides a method for training a popularity prediction model based on a deep neural network, the method comprising the following steps:
step S10, obtaining news article data of a specific theme in a set time period as a first news article data set of the theme;
step S20, using Pandas to perform data cleaning on the first news article data set to obtain a second news article data set;
step S30, the second news article data set is grouped in sequence according to the set time length, the news popularity corresponding to each group is calculated, and the news popularity sequences are obtained by arranging according to the time sequence;
step S40, according to the news popularity sequence, starting from the first popularity value, sequentially taking a continuous sequence of sampling length w as an input sequence X within a time step and taking the data of the next period as Y, and constructing a training sample set with X as the input sample and Y as the output sample;
step S50, randomly selecting N training samples from the training sample set to train the news popularity prediction model based on the LSTM network; if the training end condition is reached, executing step S70, otherwise executing step S60;
step S60, calculating the Pearson correlation coefficient r between the prediction results and the corresponding output samples, and, when r is smaller than a first set threshold, removing the N training samples selected in this round of training from the training sample set; then executing step S50;
and step S70, obtaining a trained news popularity prediction model.
In some preferred embodiments, step S10 "obtaining news article data for a specific topic for a set time period" includes:
step S101, collecting news article data in a set time period;
step S102, regarding the collected news article data, taking a specific theme as an object, and performing relevance clustering on similar articles by adopting a SimHash algorithm to obtain the news article data of the specific theme in a set time period.
In some preferred embodiments, in step S101 "collecting news article data in a set time period", the collection sources include one or more of news sites, forums, blogs, and microblogs.
In some preferred embodiments, during the training of the news popularity prediction model, 10% is randomly extracted from the training sample set as the validation set, and 10-fold cross validation is performed based on the training sample set and the validation set.
In some preferred embodiments, the first set threshold is 0.6.
In some preferred embodiments, the sampling length w ∈ [10, 20].
In some preferred embodiments, the news popularity prediction model is trained by using RMSprop algorithm to perform iterative update of the weight parameters.
In some preferred embodiments, the initialization parameters of the news popularity prediction model before training are set as follows: the number of hidden-layer neurons is n, n ∈ [40, 60]; the number of recursions within a time step is k, k ∈ [10, 20]; the number of training rounds is q, q ∈ [1500, 2500]; and the training batch size is j, j ∈ [40, 60].
In another aspect of the present invention, a method for predicting popularity of news based on a deep neural network is provided, the method comprising the following steps:
step A10, obtaining news article data of a selected subject in a set time period;
step A20, acquiring a news popularity sequence of the selected theme by adopting the methods of step S20 and step S30 in the deep neural network-based news popularity prediction model training method;
step A30, selecting the w most recent news popularity values from the news popularity sequence as input data;
and step A40, predicting the news popularity of the next period from the input data, using the trained LSTM-based news popularity prediction model obtained by the deep neural network-based news popularity prediction model training method.
In some preferred embodiments, the method further comprises, after step A40:
step A50, if the number of predictions made is less than the set number of prediction periods, adding the predicted news popularity to the news popularity sequence, and executing step A30.
The invention has the beneficial effects that:
according to the method for training the news popularity prediction model based on the deep neural network, the news popularity prediction model for effectively predicting and analyzing trendless, seasonality-free and nonlinear news popularity can be obtained, news popularity prediction is carried out through the model, algorithm efficiency is improved, time complexity is reduced, and accuracy of prediction results is improved.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a schematic flow chart of a News popularity prediction model training method based on a deep neural network according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a method for predicting popularity of news based on a deep neural network according to an embodiment of the present invention;
FIG. 3 is a comparison of the predicted and actual news popularity of a certain trade war, with the prediction made by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
The invention builds a database from the news data of various social media channels, summarizes the data, and identifies the earliest published item in each channel. Articles on the same news topic are gathered by the SimHash algorithm, and the news popularity of each time period is calculated in units of one hour. The data are then cleaned using Pandas. Next, a deep LSTM neural network is used to analyze and predict news popularity, and during prediction part of the data undergoes secondary processing using the Pearson correlation coefficient. Finally, the results are cross-validated to obtain the news popularity for a future period, helping new media companies analyze and assess news and public opinion topics. Compared with traditional methods, this process improves algorithm efficiency, reduces time complexity, improves the accuracy of the prediction results, and predicts well for news whose popularity is trendless, non-seasonal, and nonlinear.
The invention discloses a news popularity prediction model training method based on a deep neural network, which comprises the following steps of:
step S10, obtaining news article data of a specific theme in a set time period as a first news article data set of the theme;
step S20, using Pandas to perform data cleaning on the first news article data set to obtain a second news article data set;
step S30, the second news article data set is grouped in sequence according to the set time length, the news popularity corresponding to each group is calculated, and the news popularity sequences are obtained by arranging according to the time sequence;
step S40, according to the news popularity sequence, starting from the first popularity value, sequentially taking a continuous sequence of sampling length w as an input sequence X within a time step and taking the data of the next period as Y, and constructing a training sample set with X as the input sample and Y as the output sample;
step S50, randomly selecting N training samples from the training sample set to train the news popularity prediction model based on the LSTM network; if the training end condition is reached, executing step S70, otherwise executing step S60;
step S60, calculating the Pearson correlation coefficient r between the prediction results and the corresponding output samples, and, when r is smaller than a first set threshold, removing the N training samples selected in this round of training from the training sample set; then executing step S50;
and step S70, obtaining a trained news popularity prediction model.
In order to more clearly describe the method for training the news popularity prediction model based on the deep neural network, the following describes the steps of the method in detail with reference to an embodiment.
The invention discloses a news popularity prediction model training method based on a deep neural network, which comprises the following steps:
in step S10, news article data of a specific topic in a set time period is obtained as a first news article data set of the topic.
In this embodiment, news article data may be obtained by:
step S101, collecting news article data in a set time period.
News content is collected from multiple channels. After news is published, it generally spreads through several channels, which typically include news sites, forums, blogs, and microblogs. News collection mainly consists of crawling and aggregating the data of these four channels separately with a web crawler, building databases, and storing the data of each channel in its corresponding database. At the same time, all data collected from the four channels are stored in a summary database for source tracing.
Step S102, regarding the collected news article data, taking a specific theme as an object, and performing relevance clustering on similar articles by adopting a SimHash algorithm to obtain the news article data of the specific theme in a set time period.
News on the same topic is then aggregated. In this process the SimHash algorithm is used to cluster similar articles by relevance; it can effectively classify and match articles of various types, merge similar news, and works well for organizing news.
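The SimHash step described above can be illustrated with a minimal sketch. The patent does not give its tokenization, fingerprint width, or distance threshold, so the whitespace tokenizer, the 64-bit width, the SHA-256 token hash, and the `max_distance` value below are all assumptions chosen for illustration; Chinese text would need a word segmenter in place of `split()`.

```python
import hashlib
from collections import Counter


def simhash(text: str, bits: int = 64) -> int:
    """Return a SimHash fingerprint built from whitespace tokens (assumed tokenizer)."""
    weights = [0] * bits
    for token, count in Counter(text.lower().split()).items():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            # Each token votes +count or -count on every bit position.
            weights[i] += count if (token_hash >> i) & 1 else -count
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def same_topic(text_a: str, text_b: str, max_distance: int = 3) -> bool:
    """Treat two articles as similar when their fingerprints are close.

    max_distance is a hypothetical threshold, not a value from the patent.
    """
    return hamming_distance(simhash(text_a), simhash(text_b)) <= max_distance
```

Articles whose fingerprints fall within the chosen Hamming distance would be placed in the same topic group; the threshold trades recall against the risk of merging unrelated articles.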
Step S20: data cleaning is performed on the first news article data set using Pandas to obtain a second news article data set.
The first news article data set is preprocessed using Pandas. Cases such as missing column headers, multiple parameters in one column, inconsistent units within a column, missing values, empty rows, duplicate data, non-ASCII characters, and column headers that are data values rather than column names are handled thoroughly, so that the data reach a clean state.
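A few of the cleaning cases listed above (empty rows, duplicate data, non-ASCII debris, inconsistent units, missing values) can be sketched with standard Pandas calls. The column names `published` and `views`, the "k" suffix for thousands, and the sample values are hypothetical; the patent does not describe the actual schema.

```python
import pandas as pd


def clean_articles(raw: pd.DataFrame) -> pd.DataFrame:
    """Clean a table with hypothetical 'published' and 'views' columns."""
    # Drop completely empty rows, then exact duplicate records.
    df = raw.dropna(how="all").drop_duplicates()

    # Strip non-ASCII debris from the view counts (e.g. a trailing unit character).
    views = (
        df["views"]
        .astype(str)
        .str.replace(r"[^\x00-\x7F]", "", regex=True)
        .str.strip()
    )

    # Unify units: an assumed "k" suffix means thousands ("1.5k" -> 1500).
    in_thousands = views.str.lower().str.endswith("k")
    numeric = pd.to_numeric(views.str.rstrip("kK"), errors="coerce")
    numeric = numeric.where(~in_thousands, numeric * 1000)

    df = df.assign(
        views=numeric,
        published=pd.to_datetime(df["published"], errors="coerce"),
    )
    # Rows whose values could not be parsed are treated as missing and dropped.
    return df.dropna(subset=["views", "published"]).reset_index(drop=True)
```

Whether unparseable rows should be dropped or imputed is a design choice; dropping is used here only to keep the sketch short.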
Step S30: the second news article data set is grouped sequentially according to a set time length, the news popularity corresponding to each group is calculated, and the values are arranged chronologically to obtain a news popularity sequence.
The set time length in this step may be one hour, and the news popularity values in the news popularity sequence are ordered chronologically. In this embodiment news popularity is calculated in units of one hour (for example, the news popularity of a certain trade war shown in FIG. 3), and the news popularity of a specific topic in period A is calculated as follows:
acquiring a specific subject news total browsing volume A2 in an A time period in the second news article data set, acquiring all news total browsing volumes A1 in the A time period in the first news article data set, and calculating the proportion B of A2 in A1;
obtaining a news search index of the specific topic in period A (which can be obtained from any search website such as Google or Baidu, or by weighting the indices obtained from several search websites), and normalizing the index to obtain C;
and B and C are weighted and summed to obtain the popularity of the news of the specific subject in the period A.
The above-described method for calculating the popularity of the news of the specific topic in the period a is only an example, and other existing methods for calculating the popularity of the news can be adopted, and are not described in detail here.
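The example calculation above (share B, normalized index C, weighted sum) can be written as a short sketch. The patent specifies neither the normalization nor the weights, so min-max normalization and the equal weights of 0.5 below are assumptions, as are all function and parameter names.

```python
def topic_share(topic_views: float, all_views: float) -> float:
    """B: proportion of the topic's views (A2) in all views (A1) for period A."""
    if all_views <= 0:
        return 0.0
    return topic_views / all_views


def normalize_index(value: float, index_min: float, index_max: float) -> float:
    """C: min-max normalization of the search index to [0, 1] (assumed scheme)."""
    if index_max <= index_min:
        return 0.0
    return (value - index_min) / (index_max - index_min)


def popularity(
    topic_views: float,
    all_views: float,
    search_index: float,
    index_min: float,
    index_max: float,
    weight_b: float = 0.5,  # hypothetical weight for B
    weight_c: float = 0.5,  # hypothetical weight for C
) -> float:
    """Weighted sum of B and C for one period."""
    b = topic_share(topic_views, all_views)
    c = normalize_index(search_index, index_min, index_max)
    return weight_b * b + weight_c * c
```

For instance, with 200 topic views out of 1000 total views and a search index of 30 on a 0 to 100 scale, B is 0.2, C is 0.3, and the equally weighted popularity is 0.25.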
Step S40: according to the news popularity sequence, starting from the first popularity value, a continuous sequence of sampling length w is sequentially taken as an input sequence X within a time step and the data of the next period is taken as Y, and a training sample set is constructed with X as the input sample and Y as the output sample.
To convert the original univariate time-series data into a form acceptable to the LSTM (an input sample X and an output sample Y, the true value), the news popularity sequence is sampled sequentially from the first value: each continuous sequence of length w (w ∈ [10, 20]; 12 in this embodiment) is taken as an input sequence X within a time step, the data of the following period is taken as Y, and the resulting pairs of X and corresponding Y form the training sample set.
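The sliding-window construction of step S40 can be sketched as follows. The function name `make_samples` is hypothetical, and a stride of one period between consecutive windows is assumed, since the text says only that windows are taken sequentially from the first value.

```python
from typing import List, Sequence, Tuple


def make_samples(
    series: Sequence[float], w: int = 12
) -> List[Tuple[List[float], float]]:
    """Build (X, Y) pairs: X is w consecutive values, Y is the value that follows."""
    samples = []
    for start in range(len(series) - w):
        x = list(series[start : start + w])  # input window of length w
        y = series[start + w]                # popularity of the next period
        samples.append((x, y))
    return samples
```

A sequence of length L therefore yields L - w training samples; a sequence no longer than w yields none.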
Step S50: N training samples are randomly selected from the training sample set to train the news popularity prediction model based on the LSTM network; if the training end condition is reached, step S70 is executed, otherwise step S60 is executed.
Before training a news popularity prediction model, a network structure needs to be determined, parameters need to be initialized, and then model training is carried out.
(1) Network architecture
The embodiment adopts a method for predicting the value of time series data by constructing a news popularity prediction model based on a deep LSTM (Long Short-Term Memory) network.
A conventional RNN model, when implementing a long-term memory function, needs to couple the calculation of the current hidden state with the previous n calculations, as shown in formula (1):
$$S_t = f(U X_t + W_1 S_{t-1} + W_2 S_{t-2} + \dots + W_n S_{t-n}) \quad (1)$$
where $X_t$ is the input of the input layer at the t-th calculation, $S_t$ is the hidden-layer state, and $W_1, \dots, W_n$ and $U$ are weights; for example, when n equals 1, the hidden-layer state is $S_t = f(U X_t + W S_{t-1})$. In this way the amount of calculation increases exponentially, which greatly increases model training time; the long-term memory calculation is therefore performed with a deep LSTM network model. Because the deep LSTM network model has a cell that judges whether information is useful, information that does not conform to the rules can be filtered out by its three gates. The algorithm therefore predicts long time-series data better.
The activation function of the fully connected artificial neural network that receives the LSTM output is set to Sigmoid; the dropout rate of the network nodes in each layer is set to 20%; the mean square error range is set to 20%; the RMSprop algorithm is used to iteratively update the weight parameters; and the epoch and batch size of model training are determined, with the epoch set to 10. The more layers the LSTM module has, the stronger its ability to learn high-level temporal representations; the number of layers is set to 3 in this embodiment. In addition, an ordinary neural network layer is added to reduce the dimensionality of the output.
(2) Parameter initialization
The number of hidden layers is set to m, where m is generally 1; the number of hidden-layer neurons is set to n, n ∈ [40, 60], with 50 used in this embodiment; the number of recursions within a time step is set to k, k ∈ [10, 20], with 15 used in this embodiment; the number of training rounds is set to i (i is less, worse and higher), i ∈ [1500, 2500], with 2000 used in this embodiment; the training batch size is set to j (meaning that j groups of sequence samples are drawn from the training set in each round), j ∈ [40, 60], with 50 used in this embodiment. These settings initialize the parameters.
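The initialization parameters above can be collected in a small configuration object that checks the stated ranges. This is only an illustrative sketch: the class name, field names, and the decision to reject out-of-range values are assumptions, while the defaults and ranges are the ones given for this embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LstmHyperparams:
    """Embodiment defaults; field names are hypothetical."""

    hidden_layers: int = 1       # m, "generally 1"
    hidden_units: int = 50       # n in [40, 60]
    time_steps: int = 15         # k in [10, 20]
    training_rounds: int = 2000  # i in [1500, 2500]
    batch_size: int = 50         # j in [40, 60]

    def __post_init__(self) -> None:
        ranges = {
            "hidden_units": (40, 60),
            "time_steps": (10, 20),
            "training_rounds": (1500, 2500),
            "batch_size": (40, 60),
        }
        for name, (low, high) in ranges.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")
```

Constructing `LstmHyperparams()` gives the embodiment's values, and a value outside a stated range raises `ValueError` rather than being silently accepted.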
(3) Model training
In each round of model training, N training samples are randomly selected from the training sample set to train the news popularity prediction model; whether training is finished is then judged against the preset training end condition: if so, step S70 is executed, otherwise step S60. The training end condition may be a set number of iterations or convergence of the loss function value.
Step S60: the Pearson correlation coefficient r between the prediction results and the corresponding output samples is calculated; when r is smaller than a first set threshold, the N training samples selected in the current round of training are removed from the training sample set, and step S50 is executed.
The Pearson correlation coefficient is used to perform correlation analysis on the output prediction results, as shown in formula (2):
$$r = \frac{N \sum_{i=1}^{N} x_i y_i - \sum_{i=1}^{N} x_i \sum_{i=1}^{N} y_i}{\sqrt{N \sum_{i=1}^{N} x_i^2 - \left(\sum_{i=1}^{N} x_i\right)^2}\,\sqrt{N \sum_{i=1}^{N} y_i^2 - \left(\sum_{i=1}^{N} y_i\right)^2}} \quad (2)$$
where r is the correlation coefficient, N is the total number of input samples during training, and $x_i$ and $y_i$ are respectively the predicted value and the true value of the later period for the i-th input sample.
The Pearson correlation coefficient is used to analyze the correlation between the predicted values and the true values: the closer r is to 1, the stronger the positive correlation between them; values of r near 0 indicate little linear correlation, and values near -1 indicate that the predictions move opposite to the true values. When r is lower than the set threshold (0.6 in this embodiment), the N training samples selected in the current round are removed from the training sample set, and step S50 is executed again for the next round of training.
And in the training process, the Pearson correlation coefficient is introduced to delete bad sample data again, so that the retained data is more favorable for training the news popularity prediction model, the training speed is improved, and the model parameters can be further optimized.
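The loop formed by steps S50 and S60 can be sketched in plain Python. The callables `train_step` and `predict` are hypothetical stand-ins for the LSTM model's training and inference, a fixed number of rounds is assumed as the end condition, and returning r = 0 for a constant series is a convention chosen here, not something stated in the patent.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[List[float], float]


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between predictions xs and true values ys."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long series with at least 2 values")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    var_x = max(n * sum(x * x for x in xs) - sum_x ** 2, 0.0)
    var_y = max(n * sum(y * y for y in ys) - sum_y ** 2, 0.0)
    denominator = math.sqrt(var_x) * math.sqrt(var_y)
    if denominator == 0:
        return 0.0  # assumed convention for a constant series
    return (n * sum_xy - sum_x * sum_y) / denominator


def train_with_pruning(
    samples: Sequence[Sample],
    train_step: Callable[[List[Sample]], None],  # hypothetical model update
    predict: Callable[[List[float]], float],     # hypothetical model inference
    batch_size: int,
    rounds: int,
    threshold: float = 0.6,
    seed: int = 0,
) -> List[Sample]:
    """Run S50/S60 for a fixed number of rounds and return the surviving samples."""
    rng = random.Random(seed)
    pool = list(samples)
    for _ in range(rounds):
        if len(pool) < batch_size:
            break
        chosen = rng.sample(range(len(pool)), batch_size)
        batch = [pool[i] for i in chosen]
        train_step(batch)                                   # S50: train on N samples
        predictions = [predict(x) for x, _ in batch]
        truths = [y for _, y in batch]
        if pearson_r(predictions, truths) < threshold:      # S60: prune bad batch
            removed = set(chosen)
            pool = [s for i, s in enumerate(pool) if i not in removed]
    return pool
```

Note that the rule removes the whole batch when its correlation is low, which is what the text describes; a batch can therefore lose good samples along with bad ones.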
Step S70: a trained news popularity prediction model is obtained.
In addition, during training of the news popularity prediction model, 10% of the training sample set is randomly extracted as a validation set (that is, the ratio of training set to validation set after random splitting is 9:1), and 10-fold cross-validation is performed on the training sample set and validation set to prevent overfitting.
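The 9:1 split with 10-fold cross-validation can be illustrated with a small generator. The function name and the use of a seeded shuffle are assumptions; the patent does not say how the folds are drawn.

```python
import random
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def ten_fold_splits(
    samples: Sequence[T], folds: int = 10, seed: int = 0
) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield (training, validation) lists; each fold validates on about 1/folds of the data."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    for fold in range(folds):
        validation_idx = set(indices[fold::folds])  # every folds-th shuffled index
        training = [samples[i] for i in indices if i not in validation_idx]
        validation = [samples[i] for i in indices if i in validation_idx]
        yield training, validation
```

Each sample appears in exactly one validation fold, so averaging the validation error over the ten folds uses every sample once for validation.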
In an embodiment of the invention, as shown in fig. 2, a news popularity prediction method based on a deep neural network includes the following steps:
step A10, obtaining news article data of a selected subject in a set time period;
step A20, acquiring a news popularity sequence of the selected theme by adopting the methods of step S20 and step S30 in the deep neural network-based news popularity prediction model training method;
step A30, selecting the w most recent news popularity values from the news popularity sequence as input data;
and step A40, predicting the news popularity of the next period from the input data, using the trained LSTM-based news popularity prediction model obtained by the deep neural network-based news popularity prediction model training method.
In practical use, to predict a next period that has not yet occurred with the trained news popularity prediction model, the most recent w steps are input directly to obtain the predicted value one step ahead. To obtain predicted values for more periods, predictions can be accumulated step by step, that is, each predicted value is treated as an actually observed value for the next prediction; the number of steps is chosen according to the specific period to be predicted.
Therefore, to obtain predicted values for more periods, step A50 may be added after step A40 of the method: if the number of predictions made is less than the set number of prediction periods, the predicted news popularity is appended to the news popularity sequence and step A30 is executed.
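The recursive use of steps A30 to A50 can be sketched as a short loop. The callable `predict_next` is a hypothetical stand-in for the trained LSTM model, and the function name `forecast` is likewise an assumption.

```python
from typing import Callable, List, Sequence


def forecast(
    series: Sequence[float],
    predict_next: Callable[[List[float]], float],  # hypothetical trained model
    w: int,
    periods: int,
) -> List[float]:
    """Predict `periods` future values, feeding each prediction back as input."""
    history = list(series)
    if len(history) < w:
        raise ValueError("need at least w observed values")
    predictions = []
    for _ in range(periods):
        window = history[-w:]            # A30: the w most recent values
        value = predict_next(window)     # A40: one-step-ahead prediction
        predictions.append(value)
        history.append(value)            # A50: treat the prediction as observed
    return predictions
```

Because each step consumes earlier predictions, errors can compound as the horizon grows, which is worth keeping in mind when choosing the number of periods.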
FIG. 3 shows the prediction of a trade war's news popularity using the trained news popularity prediction model in an embodiment of the present invention. In the figure, the abscissa is time and the ordinate is news popularity (normalized for ease of display, with values scaled to a specified interval, [0, 100] in this example); the gray curve is the true value and the black curve is the predicted value. The news popularity of the trade war over the most recent 3 months (July 24 to October 24) is counted. It can be seen that the news popularity reached its first peak on September 18 and its second peak on September 25, after which it dropped slightly. The predicted news popularity (black curve) closely matches the actual news popularity (gray curve) in both trend and value, showing that the method performs well.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the deep neural network-based news popularity prediction method described above may refer to the corresponding process in the embodiment of the deep neural network-based news popularity prediction model training method, and details are not repeated herein.
Those of skill in the art would appreciate that the various illustrative modules and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that programs corresponding to the software modules and method steps may be located in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (10)

1. A news popularity prediction model training method based on a deep neural network is characterized by comprising the following steps:
step S10, obtaining news article data of a specific topic in a set time period as a first news article data set of the topic;
step S20, using Pandas to perform data cleaning on the first news article data set to obtain a second news article data set;
step S30, grouping the second news article data set sequentially by a set time length, calculating the news popularity corresponding to each group, and arranging the results in chronological order to obtain a news popularity sequence;
step S40, starting from the first popularity value in the news popularity sequence and proceeding sequentially by time step, taking a continuous subsequence of sampling length w as an input sequence X and the popularity of the next period as Y, and constructing a training sample set with X as the input sample and Y as the output sample;
step S50, randomly selecting N training samples from the training sample set to train the news popularity prediction model based on the LSTM network; if the training end condition is reached, executing step S70; otherwise, executing step S60;
step S60, calculating the Pearson correlation coefficient r between the prediction result and the corresponding output sample, and, when r is smaller than a first set threshold, removing the N training samples selected in the current training round from the training sample set; then executing step S50;
step S70, obtaining a trained news popularity prediction model;
wherein the news popularity is calculated as follows:
acquiring, from the second news article data set, the total news browsing volume A2 of the specific topic in period A, acquiring, from the first news article data set, the total browsing volume A1 of all news in period A, and calculating the proportion B of A2 in A1;
obtaining the news search index of the specific topic in period A and normalizing the index to obtain C;
and calculating the weighted sum of B and C as the news popularity corresponding to period A.
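The popularity calculation and the sample construction of step S40 can be sketched as follows. This is a minimal illustration, not the patented implementation: the claim does not specify the normalization method or the weights, so min-max normalization and an equal weighting (`alpha = 0.5`) are assumptions, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale values to [0, 1]; a constant series maps to all zeros (assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def popularity_sequence(
    topic_views: Sequence[float],   # A2 for each period
    total_views: Sequence[float],   # A1 for each period
    search_index: Sequence[float],  # raw search index for each period
    alpha: float = 0.5,             # assumed weight of B; C gets 1 - alpha
) -> List[float]:
    """Weighted sum of the view share B and the normalized search index C."""
    shares = [a2 / a1 if a1 else 0.0 for a2, a1 in zip(topic_views, total_views)]
    normalized = min_max_normalize(search_index)
    return [alpha * b + (1.0 - alpha) * c for b, c in zip(shares, normalized)]


def sliding_windows(
    sequence: Sequence[float], w: int
) -> List[Tuple[List[float], float]]:
    """Pair each length-w window X with the value Y that follows it."""
    return [
        (list(sequence[i:i + w]), sequence[i + w])
        for i in range(len(sequence) - w)
    ]


popularity = popularity_sequence(
    topic_views=[10, 20, 30, 40],
    total_views=[100, 100, 100, 100],
    search_index=[0, 5, 10, 10],
)
samples = sliding_windows(popularity, w=2)
```

With four periods and w = 2, the sketch yields two (X, Y) pairs; a sequence of length L yields L - w pairs.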
2. The method for training a news popularity prediction model based on a deep neural network as claimed in claim 1, wherein the step S10 "obtaining news article data of a specific topic in a set time period" comprises the steps of:
step S101, collecting news article data in a set time period;
step S102, for the collected news article data, taking a specific topic as the object and clustering similar articles by relevance using the SimHash algorithm to obtain the news article data of the specific topic in the set time period.
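For readers unfamiliar with SimHash, the following is a minimal sketch of the fingerprinting idea that step S102 relies on. The tokenization, the use of MD5 as the per-token hash, the 64-bit fingerprint width, and the Hamming-distance threshold of 3 are all assumptions for illustration; the claim does not specify them.

```python
import hashlib
from typing import Iterable


def simhash(tokens: Iterable[str], bits: int = 64) -> int:
    """Compute a SimHash fingerprint from an iterable of tokens."""
    weights = [0] * bits
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_similar(tokens_a: Iterable[str], tokens_b: Iterable[str],
               max_distance: int = 3) -> bool:
    """Treat two articles as similar if their fingerprints are close."""
    return hamming_distance(simhash(tokens_a), simhash(tokens_b)) <= max_distance
```

Articles whose fingerprints fall within the distance threshold of one another would be grouped into the same topic cluster; in practice the tokens are often weighted, for example by term frequency.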
3. The deep neural network-based news popularity prediction model training method according to claim 2, wherein in step S101, "collecting news article data in a set time period", the data sources include one or more of news sites, forums, blogs, and microblogs.
4. The training method of the news popularity prediction model based on the deep neural network as claimed in claim 1, wherein 10% of training samples are randomly extracted as a validation set during the training process of the news popularity prediction model, and 10-fold cross validation is performed based on the training samples and the validation set.
5. The deep neural network-based news popularity prediction model training method of claim 1, wherein the first set threshold is 0.6.
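The filtering rule of step S60 with the threshold of claim 5 can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, and returning 0.0 when either series is constant (where the coefficient is undefined) is an assumption.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # assumption: treat an undefined correlation as 0
    return cov / (norm_x * norm_y)


def should_drop_batch(predictions: Sequence[float],
                      targets: Sequence[float],
                      threshold: float = 0.6) -> bool:
    """True if the batch used in this round should be removed (r < threshold)."""
    return pearson_r(predictions, targets) < threshold
```

A batch whose predictions track the targets closely (r near 1) stays in the training sample set; a batch with r below 0.6 is removed before the next round of step S50.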
6. The deep neural network-based news popularity prediction model training method of claim 1, wherein the sampling length w ∈ [10, 20].
7. The deep neural network-based news popularity prediction model training method of claim 1, wherein the news popularity prediction model is trained by using an RMSprop algorithm to perform iterative update of weight parameters.
8. The deep neural network-based news popularity prediction model training method according to any one of claims 1 to 7, wherein before training the news popularity prediction model is initialized with the following parameters: the number of hidden-layer neurons is n, n ∈ [40, 60]; the number of recursions within a time step is k, k ∈ [10, 20]; the number of training rounds is q, q ∈ [1500, 2500]; and the training batch size is j, j ∈ [40, 60].
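The parameter ranges of claim 8 could be captured in a small configuration object such as the one below. The class name, field names, and default values (the midpoint of each range) are assumptions for illustration; only the ranges themselves come from the claim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    hidden_units: int = 50  # n, claimed range [40, 60]
    recursions: int = 15    # k, claimed range [10, 20]
    epochs: int = 2000      # q, claimed range [1500, 2500]
    batch_size: int = 50    # j, claimed range [40, 60]

    def __post_init__(self) -> None:
        # Reject any value outside the ranges stated in claim 8.
        ranges = {
            "hidden_units": (40, 60),
            "recursions": (10, 20),
            "epochs": (1500, 2500),
            "batch_size": (40, 60),
        }
        for name, (low, high) in ranges.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")


config = TrainingConfig()
```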
9. A news popularity prediction method based on a deep neural network is characterized by comprising the following steps:
step A10, obtaining news article data of a selected topic in a set time period;
step A20, acquiring the news popularity sequence of the selected topic by the method of step S20 and step S30 in the deep neural network-based news popularity prediction model training method of any one of claims 1 to 8;
step A30, selecting the w most recent news popularity values from the news popularity sequence as input data;
step A40, predicting the news popularity of the next period from the input data by using the trained LSTM network-based news popularity prediction model obtained by the deep neural network-based news popularity prediction model training method of any one of claims 1 to 8.
10. The method for predicting popularity of news based on a deep neural network as claimed in claim 9, wherein the method further comprises, after step A40:
step A50, if the number of predictions made is less than the set prediction period, adding the predicted news popularity to the news popularity sequence and executing step A30.
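The loop formed by steps A30 to A50 is a recursive multi-step forecast: each prediction is appended to the sequence and becomes part of the next input window. A minimal sketch follows, in which `predict_next` is a hypothetical stand-in for the trained LSTM model and the function name and signature are assumptions.

```python
from typing import Callable, List, Sequence


def forecast(
    history: Sequence[float],
    predict_next: Callable[[List[float]], float],
    w: int,
    horizon: int,
) -> List[float]:
    """Predict `horizon` future values, feeding each prediction back in."""
    if w < 1 or len(history) < w:
        raise ValueError("history must contain at least w >= 1 values")
    sequence = list(history)
    predictions: List[float] = []
    for _ in range(horizon):
        window = sequence[-w:]            # step A30: the w most recent values
        next_value = predict_next(window)  # step A40: one-step prediction
        sequence.append(next_value)        # step A50: extend the sequence
        predictions.append(next_value)
    return predictions


# Usage with a placeholder predictor (the mean of the window), not an LSTM.
result = forecast([1.0, 2.0, 3.0], lambda xs: sum(xs) / len(xs), w=2, horizon=2)
```

Because later predictions depend on earlier ones, errors can accumulate as the horizon grows.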

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910202638.6A CN110083699B (en) 2019-03-18 2019-03-18 News popularity prediction model training method based on deep neural network

Publications (2)

Publication Number Publication Date
CN110083699A CN110083699A (en) 2019-08-02
CN110083699B true CN110083699B (en) 2021-01-12

Family

ID=67413261

Country Status (1)

Country Link
CN (1) CN110083699B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111476281B (en) * 2020-03-27 2020-12-22 北京微播易科技股份有限公司 Information popularity prediction method and device
CN112328779B (en) * 2020-11-04 2024-02-13 中国平安人寿保险股份有限公司 Training sample construction method, device, terminal equipment and storage medium
CN113485986A (en) * 2021-06-25 2021-10-08 国网江苏省电力有限公司信息通信分公司 Electric power data restoration method
CN116340639B (en) * 2023-03-31 2023-12-12 北京百度网讯科技有限公司 News recall method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102012929A (en) * 2010-11-26 2011-04-13 北京交通大学 Network consensus prediction method and system
CN102831234A (en) * 2012-08-31 2012-12-19 北京邮电大学 Personalized news recommendation device and method based on news content and theme feature
CN106570597A (en) * 2016-11-14 2017-04-19 广州大学 Content popularity prediction method based on depth learning under SDN architecture
CN108876044A (en) * 2018-06-25 2018-11-23 中国人民大学 Content popularity prediction technique on a kind of line of knowledge based strength neural network

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101394311A (en) * 2008-11-12 2009-03-25 北京交通大学 Network public opinion prediction method based on time sequence
US9047316B2 (en) * 2012-06-04 2015-06-02 Yellowpages.Com Llc Venue prediction based on ranking
US9147009B2 (en) * 2013-02-12 2015-09-29 National Taiwan University Method of temporal bipartite projection
CN109413694B (en) * 2018-09-10 2020-02-18 北京邮电大学 Small cell caching method and device based on content popularity prediction

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
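DeepHawkes: Bridging the Gap between Prediction and Understanding of Information Cascades; Qi Cao; 《Proceedings of the 2017 ACM on Conference on Information and Knowledge》; 20171130; full text *
Research on Popularity Evolution Analysis and Prediction Problems in Social Networks; 胡颖; 《China Doctoral Dissertations Full-text Database, Information Science and Technology》; 20190228 (No. 2); full text *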

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant