CN107992976B - Hot topic early development trend prediction system and prediction method - Google Patents

Hot topic early development trend prediction system and prediction method

Info

Publication number
CN107992976B
Authority
CN
China
Prior art keywords
topic
class
time sequence
neural network
layer
Prior art date
Legal status
Active
Application number
CN201711351709.6A
Other languages
Chinese (zh)
Other versions
CN107992976A (en)
Inventor
殷复莲
张贝贝
王颜颜
苏沛
Current Assignee
Communication University of China
Original Assignee
Communication University of China
Priority date
Filing date
Publication date
Application filed by Communication University of China
Priority to CN201711351709.6A
Publication of CN107992976A
Application granted
Publication of CN107992976B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/95 Retrieval from the web
    • G06F 16/953 Querying, e.g. by the use of web search engines
    • G06F 16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G06F 18/231 Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/12 Computing arrangements based on biological models using genetic models
    • G06N 3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q 50/01 Social networking

Abstract

The invention provides a hot topic early development trend prediction system and prediction method. Topic time series are collected, and for each series it is judged whether the topic has entered its decline period. If it has, the series is treated as a complete topic time series and classified by a clustering method to obtain topic classes, and the topic time series within each class are substituted into a prediction model for training to obtain an intra-class prediction model for that class. If it has not, the similarity between the new topic time series and each complete topic time series in a topic class is analysed, and the average is taken as the matching degree between the new topic and that class; a set number of topic classes are screened out in descending order of matching degree; the intra-class prediction models of the screened classes are called and the new topic time series is substituted into each of them, yielding the set number of predicted values; and the predicted values are given different weights and combined to obtain the predicted value of the new topic at the future time. The system and method can accurately predict the early development trend of a new topic.

Description

Hot topic early development trend prediction system and prediction method
Technical Field
The invention relates to the field of public opinion analysis, in particular to a hot topic early development trend prediction system and a hot topic early development trend prediction method.
Background
Hot topics are often hard to predict, highly influential, complex, highly sensitive and serious in their consequences; improper handling of a hot topic can cause many adverse reactions in society and affect social stability. Most existing hot topic development trend prediction algorithms predict the future development of a topic from its existing development, so the heat and change trend of a topic cannot be predicted when the topic has only just appeared. Although long-term prediction of topics has important application value, short-term prediction, especially heat prediction for new topics, is more meaningful because network public opinion has a short outbreak period. In fact, many widely followed public opinion topics take as little as a few days from emergence to wide attention.
Currently, researchers focus on long-term prediction of topics, using both traditional statistics-based prediction models and machine learning-based prediction models. Traditional statistical prediction models include the Logistic model, exponential smoothing, the ARIMA model, the moving average model and the like; these models require strong regularity in the prediction data to achieve a good fit. Prediction models based on machine learning mainly combine artificial intelligence techniques with time series prediction, drawing on neural networks, grey theory, Bayesian networks, fuzzy set theory and the like. These methods train on the early data of a single topic and use the later data as test data for the model. As is well known, topic development generally fluctuates strongly in the early stage and stabilizes later, so a prediction model tested only on the more stable later data cannot guarantee extensibility and universality. Moreover, a prediction model trained on early data can only accurately predict the middle and later development trend and cannot predict the heat and change trend of a topic when it first appears, so the model loses timeliness.
Disclosure of Invention
In view of the above problems, the present invention aims to provide a system and method for predicting the early development trend of microblog topics based on hot topic similarity, so as to predict the heat development trend of network hot topics and, in particular, to accurately predict the development trend of a new topic.
According to an aspect of the present invention, there is provided a hot topic early development trend prediction system, including: an acquisition part, which collects topics from the network and microblogs and constructs topic time series, a topic time series being a time series formed by the topic reading amounts at different collection times; a judging part, which judges whether a topic time series has entered its decline period, sends each topic time series that has entered the decline period to a knowledge storage part as a complete topic time series, and sends each topic time series that has not entered the decline period to a new topic prediction part as a new topic time series; the knowledge storage part, which comprises a clustering module, a prediction model building module and a prediction model library, wherein the clustering module classifies the complete topic time series by a clustering method to obtain corresponding topic classes, the prediction model building module substitutes the topic time series of each topic class into a prediction model for training to obtain an intra-class prediction model corresponding to each topic class, and the prediction model library stores the intra-class prediction models; and the new topic prediction part, which monitors the development trend of each new topic time series and comprises a similarity analysis module, a matching module and a trend prediction module, wherein the similarity analysis module calculates the similarity between the new topic time series and each complete topic time series of a topic class by a sequence similarity method and takes the average as the matching degree between the new topic and that topic class, the matching module screens out a set number of topic classes in descending order of matching degree with the new topic, and the trend prediction module predicts the development trend of the new topic according to the intra-class prediction models of the screened topic classes; the trend prediction module comprises a calling unit, which calls the intra-class prediction models corresponding to the screened set number of topic classes and substitutes the new topic time series into each of them for prediction, obtaining the set number of predicted values for the future time of the new topic, and an assignment unit, which assigns different weight values to the set number of predicted values of the different intra-class prediction models and combines them to obtain the predicted value of the new topic at the future time, wherein the sum of the weight values is 1 and the weight of the predicted value from the intra-class prediction model of a topic class with a higher matching degree is not less than that from a topic class with a lower matching degree.
According to another aspect of the present invention, there is provided a method for predicting the early development trend of a hot topic, including: collecting topics from the network and microblogs and constructing a topic time series for each topic, the topic time series being a time series formed by the topic reading amounts at different collection times; judging whether each topic time series has entered its decline period; if a topic time series has entered the decline period, classifying it as a complete topic time series by a clustering method to obtain different topic classes, and substituting the topic time series of each topic class into a prediction model for training to obtain an intra-class prediction model corresponding to each topic class; if a topic time series has not entered the decline period, monitoring its development trend as a new topic, the monitoring comprising: analyzing the similarity between the new topic time series and each complete topic time series of a topic class by a sequence similarity method and taking the average as the matching degree between the new topic and that topic class; screening out a set number of topic classes in descending order of matching degree with the new topic; calling the intra-class prediction models corresponding to the screened set number of topic classes and substituting the new topic time series into each of them for prediction, obtaining the set number of predicted values for the future time of the new topic; and assigning different weight values to the set number of predicted values of the different intra-class prediction models and combining them to obtain the predicted value of the new topic at the future time, wherein the sum of the weight values is 1 and the weight of the predicted value from the intra-class prediction model of a topic class with a higher matching degree is not less than that from a topic class with a lower matching degree.
According to the microblog topic early development trend prediction system and method based on hot topic similarity, time series clustering is performed on topics to obtain categories with different development trends, and the prediction model of each category is repeatedly trained using the similar topics within the category, making the model more robust and more universal. Thus, when a new topic arrives, it can be matched to the existing topic categories with only a small amount of data, and its trend can then be predicted with the existing prediction models at higher precision.
Drawings
Other objects and results of the present invention will become more apparent and more readily appreciated as the same becomes better understood by reference to the following description and appended claims, taken in conjunction with the accompanying drawings. In the drawings:
FIG. 1 is a schematic block diagram of the hot topic early development trend prediction system according to the present invention;
FIG. 2 is a block diagram of a predictive model building block according to the present invention;
FIG. 3 is a schematic diagram of the BP neural network structure according to the present invention;
FIG. 4 is a flowchart illustrating an early development trend prediction method for hot topics according to the present invention;
FIG. 5 is a flow chart of a method for performing initial assignment of various parameters of a BP neural network using a genetic algorithm according to the present invention;
FIG. 6 is a schematic diagram of the matching of the new topic time series with different complete topic time series in the same topic class according to the present invention;
FIG. 7 is a schematic illustration of determining the similarity between a new topic time series and a complete topic time series based on a sliding window according to the present invention;
FIG. 8 is a schematic illustration of determining the similarity between a new topic time series and a complete topic time series based on a variable window according to the present invention;
FIG. 9 is a schematic diagram of the trend of the time series curve of each complete topic in different topic classes according to the present invention;
FIG. 10 is a graphical representation of the accuracy of the sample matching to the topic class of the present invention;
FIG. 11 is a diagram illustrating the MSE quantiles obtained when different weight combinations are assigned to the predicted values of the different intra-class prediction models according to the present invention;
FIG. 12 is a graphical representation of the MSE and APE distributions of the present invention as a function of sample length;
fig. 13 is a schematic diagram comparing the MSE and APE distributions of the prediction method for the early development trend of the hot topic according to the present invention and the existing prediction method.
The same reference numbers in all figures indicate similar or corresponding features or functions.
Detailed Description
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of one or more embodiments. It may be evident, however, that such embodiment(s) may be practiced without these specific details. Specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Fig. 1 is a block diagram of the hot topic early development trend prediction system. As shown in Fig. 1, the system includes:
an acquisition part 1, which collects topics from the network and microblogs and constructs topic time series, a topic time series being a time series formed by the topic reading amounts at different collection times;
a determination part 2, which determines whether a topic time series has entered its decline period (for example, by selecting the time series of a recent period, fitting a straight line by the least square method and checking whether the slope falls within a certain range), sends each topic time series that has entered the decline period to the knowledge storage part 3 as a complete topic time series, and sends each topic time series that has not entered the decline period to the new topic prediction part 4 as a new topic time series;
the knowledge storage part 3, which comprises a clustering module 31, a prediction model construction module 32 and a prediction model library 33, wherein the clustering module 31 classifies the complete topic time series by a clustering method (such as K-means, hierarchical clustering or FCM) to obtain the corresponding topic classes; the prediction model construction module 32 substitutes the topic time series of each topic class into a prediction model (e.g., a Logistic model, an exponential smoothing model, an ARIMA model, a moving average model or the like) for training to obtain the intra-class prediction model corresponding to each topic class; and the prediction model library 33 stores the intra-class prediction models;
the new topic prediction part 4, which monitors the development trend of each new topic time series and comprises a similarity analysis module 41, a matching module 42 and a trend prediction module 43, wherein the similarity analysis module 41 calculates the similarity between a new topic time series and each complete topic time series of a topic class by a sequence similarity method (such as the Euclidean distance, the dynamic time warping distance or the edit distance) and takes the average as the matching degree between the new topic and that topic class; the matching module 42 screens out a set number of topic classes in descending order of matching degree with the new topic; and the trend prediction module 43 predicts the development trend of the new topic according to the intra-class prediction models of the screened topic classes. Preferably, the set number is in the range of 0.25 to 0.75 times the total number of topic classes produced by the clustering module 31.
The development trends of the topics within each topic class of the clustering result are highly similar. To make full use of this intra-class similarity when predicting the development trend of a new topic, the invention establishes an intra-class prediction model for each topic class of the clustering result and repeatedly trains each class's model with the development trend data of the similar topics in that class, so that the resulting intra-class prediction model is more robust and more universal, has stronger targeted prediction capability, and enables early prediction of a new topic.
In one embodiment of the present invention, the determining unit 2 includes:
a fitting unit 21, which normalizes the topic time series over a set time period (e.g., the last 24 hours) and fits the normalized topic time series by the least square method to obtain the slope of the topic's fitted line;
and a judging unit 22, which judges whether the slope lies in the range of -0.02 to 0; if the slope of the topic's fitted line lies in this range, the topic has entered its decline period.
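A minimal Python sketch of this decline-period test follows, assuming min-max normalization of the recent window, the 24-hour window with 4-hour sampling and the -0.02 to 0 slope range given above; the function and variable names are illustrative only:

```python
import numpy as np

def entered_decline_period(series, points_per_day=6, slope_range=(-0.02, 0.0)):
    """Judge whether a topic time series has entered its decline period.

    series: reading amounts at consecutive collection times (most recent last).
    points_per_day: number of collection points in the set time period
                    (6 points = last 24 hours at a 4-hour sampling period).
    """
    window = np.asarray(series[-points_per_day:], dtype=float)
    # Normalize the recent window so the slope threshold is scale-free.
    span = window.max() - window.min()
    normalized = (window - window.min()) / span if span > 0 else np.zeros_like(window)
    # Least-squares fit of a straight line; polyfit degree 1 returns (slope, intercept).
    slope, _ = np.polyfit(np.arange(len(normalized)), normalized, 1)
    lo, hi = slope_range
    return lo <= slope <= hi

# Example usage with illustrative reading-amount values:
print(entered_decline_period([120, 400, 950, 1300, 1320, 1310, 1305, 1302, 1301]))
```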
In one embodiment of the present invention, the knowledge storage 3 further includes:
the BP neural network structure determining module 34 is used for determining the number of nodes of an input layer, a hidden layer and an output layer in the BP neural network structure, wherein n is the number of nodes of the input layer, l is the number of nodes of the hidden layer, and m is the number of nodes of the output layer;
a sample preprocessing module 35, which divides each complete topic time series in each topic class into a number of subsequences according to the number of input layer nodes and the number of output layer nodes; let a complete topic time series be $Var = [Var_1, Var_2, \ldots, Var_t]$, then the converted subsequences form the matrix

$$\begin{bmatrix} Var_1 & Var_2 & \cdots & Var_n & Var_{n+1} & \cdots & Var_{n+m} \\ Var_2 & Var_3 & \cdots & Var_{n+1} & Var_{n+2} & \cdots & Var_{n+m+1} \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ Var_{t-n-m+1} & Var_{t-n-m+2} & \cdots & Var_{t-m} & Var_{t-m+1} & \cdots & Var_{t} \end{bmatrix}$$

Each row is one subsequence and serves as the input and output data of a future training or test set: the first n columns of each row are the input data and the last m columns are the output data. After this preprocessing, the subsequences of all complete topic time series in a topic class are stacked together by rows to form the sample set of that topic class.
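The following is a possible Python sketch of this subsequence construction (a plain sliding window with n input columns and m output columns; the function name and array layout are assumptions for illustration):

```python
import numpy as np

def to_training_matrix(var, n_in, m_out):
    """Convert one complete topic time series into rows of n_in inputs + m_out outputs.

    var: complete topic time series [Var_1, ..., Var_t].
    Returns an array with one subsequence per row; the first n_in columns are
    the input data and the last m_out columns the output data.
    """
    var = np.asarray(var, dtype=float)
    width = n_in + m_out
    rows = [var[i:i + width] for i in range(len(var) - width + 1)]
    return np.vstack(rows)

# A topic class's sample is formed by stacking the matrices of all its complete series.
series_in_class = [[1, 2, 3, 4, 5, 6, 7, 8, 9], [2, 4, 6, 8, 10, 12, 14, 16, 18]]
samples = np.vstack([to_training_matrix(s, n_in=6, m_out=1) for s in series_in_class])
print(samples.shape)  # (number of subsequences, 7)
```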
As shown in fig. 2, the prediction model building module 32 includes:
the BP neural network construction unit 321, which calls the neural network structure determined by the BP neural network structure determining module and constructs the model output by the hidden layer and the output layer of the neural network, i.e., the prediction model, according to the following formulas (1) and (2):

$$h_j = f\left(\sum_{i=1}^{n} w_{ij} x_i - a_j\right), \qquad j = 1, 2, \ldots, l \qquad (1)$$

$$O_k = \sum_{j=1}^{l} h_j w_{jk} - b_k, \qquad k = 1, 2, \ldots, m \qquad (2)$$

where $w_{ij}$ is the connection weight between the i-th node of the input layer and the j-th node of the hidden layer, $w_{jk}$ is the connection weight between the j-th node of the hidden layer and the k-th node of the output layer, $a_j$ is the threshold of the j-th node of the hidden layer, $b_k$ is the threshold of the k-th node of the output layer, n is the number of input layer nodes, l is the number of hidden layer nodes, m is the number of output layer nodes, $x_i$ is the variable of the i-th node of the input layer, $h_j$ is the output value of the j-th node of the hidden layer, $O_k$ is the output value of the k-th node of the output layer, and f is the excitation function

$$f(x) = \frac{1}{1 + e^{-x}}$$
An initialization unit 322, which performs initial assignment on parameters of the BP neural network, where the parameters include a connection weight of a hidden layer and an output layer, a connection weight of an input layer and a hidden layer, a hidden layer threshold, and an output layer threshold;
a training set and test set segmentation unit 323, which samples by rows the samples of each topic class produced by the sample preprocessing module according to a set proportion to pick out a training set, the remainder forming the test set; preferably, the proportion of the training set is 3/4 to 4/5;
a sample training unit 324, which trains the model output by the hidden layer and the output layer of the neural network on the training set samples by executing the following steps:
step 1, inputting a first sample of a training set;
step 2, substituting the input data of the sample into formulas (1) and (2), and calculating the output of each node of the hidden layer and the output of each node of the output layer;
step 3, calculating the error at each node of the output layer,

$$e_k = y_k - o_k$$

where $y_k$ is the actual value of the k-th node of the sample and $o_k$ is the predicted value of the k-th node of the sample;
step 4, updating the parameters of the BP neural network according to formulas (3) to (6) in sequence:

$$\omega_{ij}' = \omega_{ij} + \alpha h_j (1 - h_j) x_i \sum_{k=1}^{m} \omega_{jk} e_k \qquad (3)$$

$$\omega_{jk}' = \omega_{jk} + \alpha h_j e_k \qquad (4)$$

$$a_j' = a_j + \alpha h_j (1 - h_j) \sum_{k=1}^{m} \omega_{jk} e_k \qquad (5)$$

$$b_k' = b_k + e_k \qquad (6)$$

where $\omega_{ij}$, $\omega_{jk}$, $a_j$ and $b_k$ are the BP neural network parameters before updating, $\omega_{ij}'$, $\omega_{jk}'$, $a_j'$ and $b_k'$ are the updated parameters, and $\alpha$ is the learning rate;
step 5, training the next sample and looping through steps 2 to 5 until all training set samples have been trained;
step 6, calculating the test error: the input data of the test set are substituted into the BP neural network trained in the above steps to obtain the test error MSE of the trained network,

$$MSE = \frac{1}{N} \sum_{Z=1}^{N} \sum_{k=1}^{m} \left( o_k^{(Z)} - y_k^{(Z)} \right)^2$$

where N is the number of test set samples, $o_k^{(Z)}$ is the predicted value of test set sample Z at the k-th output node, and $y_k^{(Z)}$ is the corresponding actual value of the k-th node for test set sample Z;
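To make steps 1 to 6 above concrete, here is a minimal NumPy sketch of a single-hidden-layer network of the form of formulas (1) and (2) trained by gradient descent in the spirit of formulas (3) to (6); the threshold updates use the sign consistent with the "- a_j" and "- b_k" convention of (1) and (2), a learning rate is applied throughout, and all names and the random data are assumptions rather than the patent's definitive implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))          # excitation function f

def forward(x, w_ih, w_ho, a, b):
    """Hidden and output values in the form of formulas (1) and (2)."""
    h = sigmoid(x @ w_ih - a)                 # h_j = f(sum_i w_ij x_i - a_j)
    o = h @ w_ho - b                          # O_k = sum_j h_j w_jk - b_k
    return h, o

def train_one_sample(x, y, w_ih, w_ho, a, b, alpha=0.1):
    """One pass of steps 2-4: forward, output error, parameter update."""
    h, o = forward(x, w_ih, w_ho, a, b)
    e = y - o                                 # e_k = y_k - o_k
    back = h * (1.0 - h) * (w_ho @ e)         # h_j(1-h_j) * sum_k w_jk e_k
    w_ih += alpha * np.outer(x, back)         # input-to-hidden weights
    w_ho += alpha * np.outer(h, e)            # hidden-to-output weights
    a -= alpha * back                         # hidden thresholds (sign follows "- a_j" in (1))
    b -= alpha * e                            # output thresholds (sign follows "- b_k" in (2))
    return w_ih, w_ho, a, b

def test_mse(X_test, Y_test, w_ih, w_ho, a, b):
    """Step 6: mean squared test error over the N test samples."""
    preds = np.array([forward(x, w_ih, w_ho, a, b)[1] for x in X_test])
    return float(np.mean(np.sum((preds - Y_test) ** 2, axis=1)))

# 6-10-1 structure as in the embodiment described later; random data for illustration only.
rng = np.random.default_rng(0)
n, l, m = 6, 10, 1
w_ih = rng.uniform(-1, 1, (n, l)); w_ho = rng.uniform(-1, 1, (l, m))
a = rng.uniform(-1, 1, l); b = rng.uniform(-1, 1, m)
X = rng.random((20, n)); Y = rng.random((20, m))
for epoch in range(100):
    for x, y in zip(X, Y):
        w_ih, w_ho, a, b = train_one_sample(x, y, w_ih, w_ho, a, b)
print(test_mse(X, Y, w_ih, w_ho, a, b))
```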
the BP neural network optimization termination condition determining unit 325, which determines whether the BP neural network training satisfies a termination condition. If the termination condition is satisfied, it outputs the intra-class prediction model of the topic class, which includes the structure, weight and threshold information of the BP neural network satisfying the termination condition; if not, it returns the updated BP neural network parameters to the sample training unit to continue training the model. The termination condition includes one or both of a first termination condition and a second termination condition: the first is that the current iteration number exceeds a set maximum iteration number (generally 1000-5000), and the second is that the change in the test error of the BP neural network over multiple consecutive iterations is smaller than a set target value.
On the one hand, the BP neural network algorithm is essentially a gradient descent method, so its convergence speed is low; on the other hand, BP is a local search optimization method that adjusts the network weights gradually along a locally improving direction, so the weights tend to converge to a local minimum. In addition, the BP neural network is very sensitive to the initial network weights: initializing the network with different weights usually makes it converge to different local minima. The invention therefore introduces a genetic algorithm to determine the initial weights and thresholds of the BP neural network, which improves the convergence speed of the algorithm and avoids falling into local minima. The genetic algorithm, drawing on Darwinian natural selection and genetics, gives individuals the capability of globally searching for a solution through selection, crossover and mutation. Specifically, the initialization unit 322 includes:
the initial population setting subunit 3221, which randomly generates an initial population of P individuals, $G = (G_1, G_2, \ldots, G_P)^T$, where P is the population size; random real numbers from a symmetric interval $[-W, W]$ (e.g., $W < 3$) form a real-number vector of length S that is assigned to each individual $G_i = (g_1, g_2, \ldots, g_S)$, $i = 1, 2, \ldots, P$, where $S = n \cdot l + l \cdot m + l + m$ and $g_s$ is the s-th gene of individual $G_i$;
the individual evaluation subunit 3222, which determines the evaluation function of the individuals of the initial population: each gene of an individual is used as the initial assignment of, respectively, a connection weight between the input layer and the hidden layer, a connection weight between the hidden layer and the output layer, a hidden layer threshold and an output layer threshold of the initialization unit, and the fitness of each individual is obtained through the prediction model construction module. Taking the fitness value $F_i$ of individual $G_i$ in the initial population G as an example, specifically:

$$F_i = \frac{1}{\mathrm{MSE}_i}$$

where $\mathrm{MSE}_i$ is the mean square error when the parameters of the BP neural network are initially assigned by individual $G_i$;
the individual selecting subunit 3223, which selects individuals from the initial population using a roulette operator with a fitness-proportionate selection strategy, and specifically includes:
the selection probability calculating unit 32231, which calculates the probability $p_i$ that each individual $G_i$ is selected as

$$p_i = \frac{F_i}{\sum_{j=1}^{P} F_j}$$

where $F_i$ is the fitness value of individual $G_i$;
the cumulative probability calculating unit 32232, which calculates the cumulative probability of each individual as

$$q_j = \sum_{i=1}^{j} p_i$$

the pseudo-random number generation unit 32233, which generates P uniformly distributed pseudo-random numbers $(r_1, r_2, \ldots, r_i, \ldots, r_P)$ in the interval [0, 1], P being the population size;
the individual selecting unit 32234, which selects individuals from the initial population P times in turn according to the P random numbers to obtain the selected individuals; specifically, if $r_1 < q_1$, the individual selected the 1st time is $G_1$; otherwise, the individual selected the 1st time is $G_k$ such that $q_{k-1} < r_1 \le q_k$ holds. Assuming the individual selected the 1st time is $G_k$, at the 2nd selection it is determined whether $r_2$ is smaller than $q_k$ and a selection similar to the first is repeated; after P selections the selected individual $G_u$ is obtained.
The cross subunit 3224 performs cross update on the individuals selected by the individual selecting subunit, performs cross update on the updated individual, uses the maximum value of each updated gene as the last point of the gene, and performs cross update on each updated geneThe minimum value of each gene is used as the lower bound of the gene, and selected individuals G of the initial population G are used asuFor illustration, the cross operation is performed at r _ pick position by using each of the other individuals of the initial population G, and the v-th individual G of the initial population GvAnd selecting the individual GuSelecting individuals from the v-th cross
Figure BDA0001510387520000075
Figure BDA0001510387520000076
Wherein r _ pick is a random integer in the interval [0, S ], S is the chromosome length; selecting genes of individuals, wherein the maximum value corresponding to the crossed selected individuals is the upper limit of the genes, and the minimum value is the lower limit of the genes;
the mutation subunit 3225, which performs a mutation operation on the genes of the individuals produced by the cross subunit to obtain mutated individuals, substitutes the mutated individuals into the individual evaluation subunit, and thereby evolves the initial population. For example, a mutation operation is performed on the j-th gene $g_j$ of the selected individual $G_u$, and the selected individual $G_u$ is updated according to each mutated gene so as to evolve the initial population, where the mutated gene $g_j'$ is:

$$g_j' = \begin{cases} g_j + (g_j - g_{j\max}) \cdot r_p \left(1 - \dfrac{iter_{now}}{iter_{max}}\right)^2, & r > 0.5 \\[1ex] g_j + (g_{j\min} - g_j) \cdot r_p \left(1 - \dfrac{iter_{now}}{iter_{max}}\right)^2, & r \le 0.5 \end{cases}$$

where $g_j$ is the j-th gene of the selected individual $G_u$, $g_{j\max}$ and $g_{j\min}$ are the upper and lower bounds of gene $g_j$, $r_p$ is the pseudo-random number generated for $G_u$ by the pseudo-random number generation unit, r is a random number in [0, 1], $iter_{now}$ is the current evolution generation, and $iter_{max}$ is the set maximum evolution generation;
the genetic algorithm optimization termination condition determining subunit 3226, which determines whether the genetic algorithm satisfies an algorithm termination condition. If it does, the optimal population individual is output as the final initial values of the connection weights between the input layer and the hidden layer, the connection weights between the hidden layer and the output layer, the hidden layer thresholds and the output layer thresholds of the initialization unit; if not, control returns to the individual evaluation subunit. The algorithm termination condition includes a first termination condition and/or a second termination condition: the first is that the current evolution generation exceeds the set maximum evolution generation (generally 20-500), and the second is that the change in individual fitness values over multiple consecutive generations is smaller than a set target value.
The hot topic early development trend prediction system introduces a genetic algorithm (GA) to optimize the initial parameters of the neural network prediction model, yielding a genetic-algorithm-optimized neural network prediction model (GABP), which greatly improves the prediction accuracy of the traditional BP neural network when used for topic development trend prediction.
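As a concrete illustration of such a GA-optimized initialization, the following compact Python sketch encodes all BP parameters in one real-valued chromosome, uses the reciprocal of the network error as fitness, roulette-wheel selection, an arithmetic single-position crossover and a simplified non-uniform mutation that shrinks with the generation count, plus simple elitism for stability; these exact operator forms, the toy data and all names are assumptions for illustration, not the patent's definitive operators:

```python
import numpy as np

rng = np.random.default_rng(1)
n, l, m = 6, 10, 1                    # input, hidden, output node counts
S = n * l + l * m + l + m             # chromosome length
P, W = 20, 3.0                        # population size, gene range [-W, W]

def decode(g):
    """Split one chromosome into BP initial parameters (w_ih, w_ho, a, b)."""
    w_ih = g[:n * l].reshape(n, l)
    w_ho = g[n * l:n * l + l * m].reshape(l, m)
    a = g[n * l + l * m:n * l + l * m + l]
    b = g[n * l + l * m + l:]
    return w_ih, w_ho, a, b

def mse_of(g, X, Y):
    """Network error when the BP parameters take the values encoded by g."""
    w_ih, w_ho, a, b = decode(g)
    h = 1.0 / (1.0 + np.exp(-(X @ w_ih - a)))
    o = h @ w_ho - b
    return float(np.mean(np.sum((o - Y) ** 2, axis=1)))

def fitness(pop, X, Y):
    return np.array([1.0 / (mse_of(g, X, Y) + 1e-12) for g in pop])

def roulette_pick(fit):
    """Fitness-proportionate (roulette wheel) selection of one individual index."""
    q = np.cumsum(fit / fit.sum())            # cumulative probabilities q_j
    return int(np.searchsorted(q, rng.random()))

def crossover(gu, gv, r_pick, b_mix):
    """Arithmetic crossover of the genes at position r_pick."""
    cu, cv = gu.copy(), gv.copy()
    cu[r_pick] = gu[r_pick] * (1 - b_mix) + gv[r_pick] * b_mix
    cv[r_pick] = gv[r_pick] * (1 - b_mix) + gu[r_pick] * b_mix
    return cu, cv

def mutate(g, iter_now, iter_max, lo=-W, hi=W):
    """Non-uniform mutation of one random gene, shrinking with the generation count."""
    j = rng.integers(S)
    scale = rng.random() * (1 - iter_now / iter_max) ** 2
    if rng.random() > 0.5:
        g[j] = g[j] + (hi - g[j]) * scale
    else:
        g[j] = g[j] - (g[j] - lo) * scale
    return np.clip(g, lo, hi)

# Toy training data standing in for one topic class's samples.
X = rng.random((30, n)); Y = rng.random((30, m))
pop = rng.uniform(-W, W, (P, S))
iter_max = 50
for iter_now in range(iter_max):
    fit = fitness(pop, X, Y)
    new_pop = [pop[np.argmax(fit)].copy()]          # keep the best individual
    while len(new_pop) < P:
        gu, gv = pop[roulette_pick(fit)], pop[roulette_pick(fit)]
        cu, cv = crossover(gu, gv, rng.integers(S), rng.random())
        new_pop.append(mutate(cu, iter_now, iter_max))
        if len(new_pop) < P:
            new_pop.append(mutate(cv, iter_now, iter_max))
    pop = np.array(new_pop)

best = pop[np.argmax(fitness(pop, X, Y))]
w_ih0, w_ho0, a0, b0 = decode(best)                 # initial BP parameters handed to training
```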
In an embodiment of the present invention, the similarity analysis module 41 of the system for predicting the early development tendency of the hot topic includes:
a window setting unit 411 for setting the size of a window according to the length of the new topic time series;
the segmented sequence distance calculation unit 413, which calculates, within the set window, the distance between the new topic time series and each subsequence of window size taken from each complete topic time series in the topic class, the distance being the Euclidean distance or the dynamic time warping distance;
the similarity determining unit 414, which takes the minimum of the distances between the new topic time series and the several subsequences of a complete topic time series as the similarity between the new topic and that complete topic time series.
Preferably, the similarity analysis module 41 further includes a window scaling unit 412, which varies the size of the window set by the window setting unit within a set range.
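A minimal sketch of this windowed similarity follows, using the Euclidean distance for brevity (the module may equally use a dynamic time warping distance); smaller distances mean higher similarity, and all names are illustrative:

```python
import numpy as np

def window_similarity(new_series, complete_series, step=1):
    """Minimum distance between the new topic series and all same-length
    subsequences of one complete topic series (smaller = more similar)."""
    new = np.asarray(new_series, dtype=float)
    full = np.asarray(complete_series, dtype=float)
    n0 = len(new)                                    # window size = length of new series
    dists = [np.linalg.norm(new - full[i:i + n0])    # Euclidean distance inside the window
             for i in range(0, len(full) - n0 + 1, step)]
    return min(dists)

def class_matching_degree(new_series, class_series):
    """Average similarity over all complete series of one topic class."""
    return float(np.mean([window_similarity(new_series, s) for s in class_series]))

new_topic = [0, 3, 9, 20, 35, 42]
topic_class = [[0, 2, 8, 18, 30, 40, 38, 30, 20], [1, 5, 15, 33, 45, 44, 35, 22, 10]]
print(class_matching_degree(new_topic, topic_class))
```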
In an embodiment of the present invention, the trend prediction module 43 of the early development trend prediction system of the hot topic according to the present invention, as shown in fig. 1, includes:
the calling unit 431 is used for calling the intra-class prediction models corresponding to the screened topic classes with the set number, and substituting the new topic time sequences into the intra-class prediction models for prediction to obtain the predicted values of the set number corresponding to the future time of the new topic;
and the assigning unit 433 is configured to assign different weight values to the predicted values of the set number of different intra-class prediction models of different topic classes to combine the predicted values to obtain a predicted value of a new topic at a future time, wherein the sum of the weight values is 1, and the weight value of the predicted value of the intra-class prediction model of the topic class with the high matching degree is not less than the weight value of the predicted value of the intra-class prediction model of the topic class with the low matching degree.
Preferably, the trend prediction module 43 further comprises:
the eliminating unit 432 eliminates the topic class with the largest predicted value called by the calling unit, and sends the other topic classes to the assigning unit 433.
Fig. 4 is a flowchart of the method for predicting the early development trend of a hot topic. As shown in Fig. 4, the method includes:
step S1, collecting topics from a network and a microblog, and constructing a topic time sequence corresponding to each topic, wherein the topic time sequence is a time sequence formed by topic reading amounts corresponding to different collection times;
step S2, judging whether each topic time sequence enters a decline period;
if the topic time sequence enters the decline period, in step S3, classifying the topic time sequence as a complete topic time sequence by using a clustering method to obtain different topic classes, and substituting each topic time sequence in each topic class into a prediction model for training to obtain an intra-class prediction model corresponding to each topic class;
if the topic time series has not entered the decline period, its development trend is monitored as a new topic in step S4.
In an embodiment of the present invention, in step S1, when the occurrence of a topic is detected, the heat (topic reading amount) $x_{nj}$ of the topic is collected every set period T (for example, T = 4 hours), finally yielding the topic time series

$$X = \left[x_{1,1}, x_{1,2}, \ldots, x_{1,b_{st}},\; x_{2,1}, \ldots, x_{2,24/T},\; \ldots,\; x_{n,1}, \ldots, x_{n,l_{st}}\right]$$

where n is the number of days between the topic occurrence date and the collection date, j is the j-th collection point within a given day, $b_{st}$ is the sample length of day 1 after the topic occurred, $l_{st}$ is the sample length of the day on which the topic enters the decline period, $n \ge 1$, $1 \le j \le 24/T$, $b_{st} \le 24/T$ and $l_{st} \le 24/T$. Preferably, the sample data of the first day are padded with $24/T - b_{st}$ zeros before the first collected value, indicating at which time of day the topic occurred. Since the development rules of topics are not completely the same when they occur in different periods of the day, such as the morning, the afternoon and the evening, this zero-padding operation effectively retains that information.
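The sketch below illustrates this construction under the stated 4-hour sampling (6 collection points per day), padding the first day with zeros before the first collected value; the data layout and names are assumptions for illustration:

```python
def build_topic_series(daily_readings, points_per_day=6):
    """Assemble the topic time series from per-day reading amounts.

    daily_readings: list of lists, one per day, each holding the reading
    amounts actually collected that day (day 1 may start mid-day, and the
    final day ends when the topic enters its decline period).
    """
    series = []
    first_day = daily_readings[0]
    # Pad day 1 with zeros so the position encodes at what time of day the topic appeared.
    series.extend([0] * (points_per_day - len(first_day)))
    series.extend(first_day)
    for day in daily_readings[1:]:
        series.extend(day)
    return series

# Topic that appeared in the evening of day 1 (only 2 collection points that day).
print(build_topic_series([[120, 300], [500, 800, 950, 900, 700, 600], [400, 250]]))
# -> [0, 0, 0, 0, 120, 300, 500, 800, 950, 900, 700, 600, 400, 250]
```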
In an embodiment of the present invention, the step S2 includes:
normalizing the topic time series over a set time period (e.g., the last 24 hours) and fitting it by the least square method to obtain the slope of the topic's fitted line;
judging whether the slope lies in the range of -0.02 to 0; if the slope of the topic's fitted line lies in this range, the topic has entered its decline period.
In step S3, the complete topic time series are classified by a clustering method to obtain different topic classes; the clustering may be based on the Euclidean distance (Euc) or the dynamic time warping distance (DTW) and performed with K-means, FCM (fuzzy C-means), hierarchical clustering, or various improved algorithms based on these, such as K_SC (K-Spectral Centroid) and WKSC (Wavelet-based K_SC).
However, performing distance matching across days with the traditional Euclidean distance easily increases the distance or confuses information. Preferably, a segmented Euclidean distance is used: the time series is segmented by 'natural day', the Euclidean distance is calculated segment by segment, and the segment distances are then integrated (the first day of the two series is aligned by zero padding, and the last day of the shorter series and the days after it are aligned with the longer series by zero padding), which prevents the distance increase caused by matching across days.
The traditional DTW warps the time axis across days, confusing the information of a topic from different days; preferably, a segmented dynamic time warping distance (S-DTW) is therefore used. On the one hand, segmenting by 'natural day' effectively avoids the confusion caused by aligning information across different days; on the other hand, since the development rules of a topic at different moments within the same day are similar, the data within one day may properly be stretched or compressed on the time axis so that the sequences are matched at the minimum distance.
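A sketch of the segmented, 'natural day' distance idea follows, using a per-day Euclidean distance with zero padding for alignment; a per-day DTW could be substituted to obtain an S-DTW-like distance, and the 6-point day length and summation of day distances are assumptions for illustration:

```python
import numpy as np

def split_by_day(series, points_per_day=6):
    """Cut a topic time series into consecutive natural-day segments."""
    s = list(series)
    return [s[i:i + points_per_day] for i in range(0, len(s), points_per_day)]

def padded(day, points_per_day=6):
    """Right-pad a partial day with zeros so both segments have equal length."""
    return np.array(day + [0] * (points_per_day - len(day)), dtype=float)

def segmented_euclidean(series_a, series_b, points_per_day=6):
    """Sum of per-day Euclidean distances; days missing in the shorter series
    are compared against all-zero days."""
    days_a = split_by_day(series_a, points_per_day)
    days_b = split_by_day(series_b, points_per_day)
    n_days = max(len(days_a), len(days_b))
    days_a += [[]] * (n_days - len(days_a))
    days_b += [[]] * (n_days - len(days_b))
    return sum(np.linalg.norm(padded(a, points_per_day) - padded(b, points_per_day))
               for a, b in zip(days_a, days_b))

a = [0, 0, 10, 40, 90, 120, 100, 60, 30, 10, 5, 2]
b = [0, 5, 20, 60, 110, 130, 90, 50, 25]
print(segmented_euclidean(a, b))
```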
In addition, preferably, the complete topic time series are clustered with a hierarchical clustering algorithm, which includes:
treating each complete topic time series as one class, and measuring the distance between classes by the maximum S-DTW between the classes;
finding the two classes whose maximum inter-class S-DTW is smallest, i.e., the two closest classes, and merging them into one class, so that the total number of classes decreases by one;
calculating the silhouette (contour) coefficient of the resulting clustering: the silhouette coefficient $s_i$ of sample i is

$$s_i = \frac{n_i - m_i}{\max(m_i, n_i)}$$

where $m_i$ is the average distance between sample i and the other complete topic time series in its own class and $n_i$ is the minimum average distance between sample i and the complete topic time series of the other classes, the silhouette coefficient of a clustering being the value integrated over all samples;
repeating the above steps to obtain the curve of the silhouette coefficient as a function of the number of clusters, observing whether the curve has an extreme point, taking the cluster number corresponding to the maximum or a local maximum of the silhouette coefficient as the optimal number of clusters, and taking the corresponding clustering result as the classification result of the topics.
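A compact sketch of this procedure is given below, using SciPy's complete-linkage hierarchical clustering and scikit-learn's silhouette score as stand-ins for the maximum inter-class S-DTW linkage and the contour coefficient described above; both substitutions are simplifications, and the precomputed distance matrix could equally be filled with S-DTW values:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.metrics import silhouette_score

def best_clustering(dist_matrix, max_clusters=8):
    """Complete-linkage hierarchical clustering; pick the cluster count whose
    silhouette coefficient is largest."""
    condensed = squareform(dist_matrix)            # square matrix -> condensed form
    tree = linkage(condensed, method='complete')   # merge the two closest classes repeatedly
    best = (None, -1.0, None)
    for k in range(2, max_clusters + 1):
        labels = fcluster(tree, t=k, criterion='maxclust')
        if len(set(labels)) < 2:
            continue
        score = silhouette_score(dist_matrix, labels, metric='precomputed')
        if score > best[1]:
            best = (k, score, labels)
    return best

# Toy symmetric distance matrix standing in for pairwise S-DTW distances of 6 topics.
rng = np.random.default_rng(2)
pts = np.vstack([rng.normal(0, 0.2, (3, 4)), rng.normal(3, 0.2, (3, 4))])
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
k, score, labels = best_clustering(dist)
print(k, score, labels)
```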
In step S3, as shown in fig. 5, the method for obtaining the intra-class prediction model corresponding to each topic class by substituting each topic time series in each topic class into the prediction model and training includes:
step S31, determining the node numbers of an input layer, a hidden layer and an output layer in the BP neural network structure, wherein n is the node number of the input layer, l is the node number of the hidden layer, and m is the node number of the output layer;
step S32, constructing the model output by the hidden layer and the output layer of the neural network, i.e., the prediction model, according to the following formulas (1) and (2):

$$h_j = f\left(\sum_{i=1}^{n} w_{ij} x_i - a_j\right), \qquad j = 1, 2, \ldots, l \qquad (1)$$

$$O_k = \sum_{j=1}^{l} h_j w_{jk} - b_k, \qquad k = 1, 2, \ldots, m \qquad (2)$$

where $w_{ij}$ is the connection weight between the i-th node of the input layer and the j-th node of the hidden layer, $w_{jk}$ is the connection weight between the j-th node of the hidden layer and the k-th node of the output layer, $a_j$ is the threshold of the j-th node of the hidden layer, $b_k$ is the threshold of the k-th node of the output layer, n is the number of input layer nodes, l is the number of hidden layer nodes, m is the number of output layer nodes, $x_i$ is the variable of the i-th node of the input layer, $h_j$ is the output value of the j-th node of the hidden layer, $O_k$ is the output value of the k-th node of the output layer, and f is the excitation function

$$f(x) = \frac{1}{1 + e^{-x}}$$
Step S33, dividing each complete topic time sequence in each topic into a plurality of subsequences according to the number of nodes of the input layer and the number of nodes of the output layer, and setting a complete topic time sequenceColumn is Var ═ Var1,Var2,…,Vart]Then the converted multiple subsequences are listed as,
Figure BDA0001510387520000104
each row is a subsequence and is used as input and output data of a training set or a test set in the future. The first n columns in each row are input data, the last m columns are output data, and each complete topic time sequence in each topic is combined together according to the rows after the preprocessing, so that a sample of each topic class is formed;
step S34, sampling by rows the samples of each topic class according to a set proportion to pick out a training set, and taking the rest as the test set;
step S35, carrying out initial assignment on parameters of the BP neural network, wherein the parameters comprise a connection weight of a hidden layer and an output layer, a connection weight of an input layer and the hidden layer, a hidden layer threshold and an output layer threshold;
step S36, substituting the training set into the model output by the hidden layer and the output layer of the BP neural network for training, comprising: step 1, inputting a first sample of a training set;
step 2, substituting the input data of the sample into formulas (1) and (2), and calculating the output of each node of the hidden layer and the output of each node of the output layer;
step 3, calculating the error at each node of the output layer,

$$e_k = y_k - o_k$$

where $y_k$ is the actual value of the k-th node of the sample and $o_k$ is the predicted value of the k-th node of the sample;
step 4, updating the parameters of the BP neural network according to formulas (3) to (6) in sequence:

$$\omega_{ij}' = \omega_{ij} + \alpha h_j (1 - h_j) x_i \sum_{k=1}^{m} \omega_{jk} e_k \qquad (3)$$

$$\omega_{jk}' = \omega_{jk} + \alpha h_j e_k \qquad (4)$$

$$a_j' = a_j + \alpha h_j (1 - h_j) \sum_{k=1}^{m} \omega_{jk} e_k \qquad (5)$$

$$b_k' = b_k + e_k \qquad (6)$$

where $\omega_{ij}$, $\omega_{jk}$, $a_j$ and $b_k$ are the BP neural network parameters before updating, $\omega_{ij}'$, $\omega_{jk}'$, $a_j'$ and $b_k'$ are the updated parameters, and $\alpha$ is the learning rate;
step 5, training the next sample and looping through steps 2 to 5 until all training set samples have been trained;
step 6, calculating the test error: the input data of the test set are substituted into the BP neural network trained in the above steps to obtain the test error MSE of the trained network,

$$MSE = \frac{1}{N} \sum_{Z=1}^{N} \sum_{k=1}^{m} \left( o_k^{(Z)} - y_k^{(Z)} \right)^2$$

where N is the number of test set samples, $o_k^{(Z)}$ is the predicted value of test set sample Z at the k-th output node, and $y_k^{(Z)}$ is the corresponding actual value of the k-th node for test set sample Z;
step S37, judging whether the BP neural network training satisfies an end condition, the end condition including one or both of a first end condition and a second end condition, the first being that the current iteration number exceeds the set maximum iteration number and the second being that the change in the test error of the BP neural network over multiple consecutive iterations is smaller than the set target value;
if the end condition is satisfied, in step S38, outputting the intra-class prediction model of the topic class, the intra-class prediction model including the structure, weight and threshold information of the BP neural network satisfying the end condition;
if the end condition is not satisfied, returning to step S35, feeding the updated BP neural network parameters back to the initial assignment step, and continuing the training loop until the end condition is satisfied.
Preferably, in step S35, as shown in fig. 5, the method includes:
step S351, assuming the population size is P, randomly generating an initial population of P individuals, $G = (G_1, G_2, \ldots, G_P)^T$; random real numbers from a symmetric interval $[-W, W]$ form a real-number vector of length S assigned to each individual $G_i = (g_1, g_2, \ldots, g_S)$, $i = 1, 2, \ldots, P$, where $S = n \cdot l + l \cdot m + l + m$ and $g_s$ is the s-th gene of individual $G_i$;
step S352, using each gene of each individual as the initial assignment of, respectively, a connection weight between the input layer and the hidden layer, a connection weight between the hidden layer and the output layer, a hidden layer threshold and an output layer threshold of the BP neural network, substituting the samples belonging to each topic class into the prediction model for training, and obtaining the output of each output layer node for each sample, thereby obtaining the fitness of each individual:

$$F_i = \frac{1}{\mathrm{MSE}_i}, \qquad \mathrm{MSE}_i = \frac{1}{N} \sum_{Z=1}^{N} \sum_{k=1}^{m} \left( o_k^{(Z)} - y_k^{(Z)} \right)^2$$

where $\mathrm{MSE}_i$ is the network global error when the parameters of the BP neural network are initially assigned by individual $G_i$, $F_i$ is the fitness of individual $G_i$ in the initial population G, Z is the sample index, and N is the total number of samples;
step S353, selecting individuals from the initial population using a roulette operator with a fitness-proportionate selection strategy, obtaining the selected individual $G_u$;
step S354, performing crossover updating on the selected individual using a single-point crossover operator, and after the crossover update taking the maximum value of each updated gene as the upper bound of that gene and its minimum value as the lower bound;
step S355, performing a mutation operation on the crossover-updated selected individual to obtain a mutated individual, substituting the mutated individual into the individual evaluation step, and evolving the initial population, where the mutated gene $g_j'$ is:

$$g_j' = \begin{cases} g_j + (g_j - g_{j\max}) \cdot r_p \left(1 - \dfrac{iter_{now}}{iter_{max}}\right)^2, & r > 0.5 \\[1ex] g_j + (g_{j\min} - g_j) \cdot r_p \left(1 - \dfrac{iter_{now}}{iter_{max}}\right)^2, & r \le 0.5 \end{cases}$$

where $g_j$ is the j-th gene of the selected individual $G_u$, $g_{j\max}$ and $g_{j\min}$ are the upper and lower bounds of gene $g_j$, $r_p$ is the pseudo-random number generated for $G_u$ by the pseudo-random number generation step, r is a random number in [0, 1], $iter_{now}$ is the current evolution generation, and $iter_{max}$ is the set maximum evolution generation;
step S356, judging whether the genetic algorithm satisfies an algorithm termination condition, the condition including a first termination condition and/or a second termination condition, the first being that the current evolution generation exceeds the set maximum evolution generation and the second being that the change in individual fitness values over multiple consecutive generations is smaller than the set target value;
if the algorithm termination condition is not satisfied, returning to step S352 to evaluate the evolved initial population, and repeating the above steps until the termination condition is satisfied;
if the algorithm termination condition is satisfied, in step S357, outputting the optimal population individual as the final initial values of the connection weights between the input layer and the hidden layer, the connection weights between the hidden layer and the output layer, the hidden layer thresholds and the output layer thresholds.
In an embodiment of the present invention, the step S4 includes:
step S41, analyzing the similarity between the new topic time series and each complete topic time series in a topic class using a sequence similarity method, and taking the average as the matching degree between the new topic and that topic class. Let the i-th class $C_i$ contain lp samples, and let the similarity between the new topic time series and the k-th complete topic time series be $similarity_k$, $k = 1, 2, \ldots, lp$; then the average similarity between the new topic time series and the complete topic time series of the class, i.e., the matching degree between the new topic and topic class $C_i$, is

$$\frac{1}{lp} \sum_{k=1}^{lp} similarity_k$$
Step S42, screening a set number c of topic categories in descending order of matching degree with the new topic, wherein the set number c is preferably 0.25-0.75 times of the total number of topic categories determined in step S3;
step S43, calling the intra-class prediction models corresponding to the screened topic classes with the set number, and respectively substituting the new topic time sequences into the intra-class prediction models for prediction to obtain the predicted values of the set number corresponding to the future time of the new topic;
and step S44, the predicted values of the set number of different topic class intra-class prediction models are endowed with different weight values to be combined to obtain the predicted value of the new topic at the future time, the sum of the weight values is 1, and the weight value of the predicted value of the topic class intra-class prediction model with high matching degree is not less than that of the predicted value of the topic class intra-class prediction model with low matching degree.
In the method for predicting the early development trend of a hot topic according to the invention, the top c categories by matching degree, rather than only the single category with the highest similarity, are taken as the categories matched to the new topic. As shown in FIG. 6, topic A and topic B belong to different categories and both match topic X to a high degree, but the later development trend of topic X is unknown. By retaining the top c categories by matching degree, the matching method of the invention preserves, as far as possible, the various possible subsequent development trends of topic X.
In addition, the predicted value of the prediction model of a category with a higher matching degree should be closer to the true value, so the prediction method of the invention predicts the development trend of the new topic by weighting and combining the predicted values of the prediction models of the top c categories by matching degree, giving a higher weight to the predicted value of the prediction model of the category with the highest matching degree. After the new topic has been matched against the existing categories, the c categories with the highest matching degree are obtained, and the GABP prediction models of these c categories are used to predict the data at the points beyond the sample, yielding c predicted values. Let the predicted values of the prediction models of the categories ranked 1, 2, ..., c by matching degree be $pred_1, pred_2, \ldots, pred_c$, the set of predicted values be $Q = \{pred_1, \ldots, pred_c\}$, the weights corresponding to the predicted values be $wp_1, wp_2, \ldots, wp_c$, and the weight set be $WP = \{wp_1, wp_2, \ldots, wp_c\}$; the combined predicted value is then calculated as:

$$pred_{new} = \sum_{i=1}^{c} wp_i \cdot pred_i$$

The weighted combination has the problem that a predicted value carrying a higher weight is also more likely to be abnormal, so the adverse effect of an abnormal predicted value is preferably weakened by removing the maximum value in Q and weighting the remaining predicted values. Let Q' be the set Q with its maximum value removed and WP' the weight set WP with the corresponding weight removed; the combined predicted value is then calculated as:

$$pred_{new} = \frac{\sum_{pred_i \in Q'} wp_i \cdot pred_i}{\sum_{wp_i \in WP'} wp_i}$$
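The weighting scheme of the two formulas above can be sketched as follows; the particular weight vector is only an example satisfying the stated constraints (weights sum to 1 and do not increase as the matching degree decreases), and the renormalization after removing the maximum is an assumption:

```python
def combine_predictions(preds, weights, drop_max=True):
    """Weighted combination of the c intra-class predictions.

    preds:   predicted values of the models of the top-c matched classes,
             ordered from highest to lowest matching degree.
    weights: non-increasing weights summing to 1 (higher matching degree
             never gets a smaller weight).
    """
    assert abs(sum(weights) - 1.0) < 1e-9 and len(preds) == len(weights)
    if drop_max and len(preds) > 1:
        # Remove the largest prediction to weaken the effect of an abnormal value,
        # then renormalize the remaining weights.
        i_max = max(range(len(preds)), key=lambda i: preds[i])
        preds = [p for i, p in enumerate(preds) if i != i_max]
        weights = [w for i, w in enumerate(weights) if i != i_max]
        total = sum(weights)
        weights = [w / total for w in weights]
    return sum(w * p for w, p in zip(weights, preds))

# Predictions from the c = 3 best-matching classes, best match first.
print(combine_predictions([1250.0, 980.0, 3400.0], [0.5, 0.3, 0.2]))
```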
in one embodiment of the present invention, in step 41, the method for analyzing the similarity between the time series of the new topic and each complete topic time series in the topic class by using the sequence similarity method, as shown in fig. 7, includes:
The window size is set according to the length of the new topic time sequence: assuming that the length of the target sequence (the new topic time sequence) is N_0, the window length is set equal to the length N_0 of the new topic time sequence.
Calculating the distance between the new topic time sequence and the complete topic time sequence in the topic class within the set window, wherein the distance is a Euclidean distance or a dynamic time warping distance; the window length is kept unchanged, the window moves to the right on the complete topic time sequence M times, starting from position 0, in steps of unit length ΔN, so the subsequence distance within the window is actually calculated M+1 times; preferably, the distance is the S-DTW distance.
And taking the minimum value of the distances as the similarity between the new topic and the complete topic time sequence.
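For illustration only, a minimal sketch of this sliding-window similarity; the Euclidean distance is used here for brevity, whereas the text prefers the S-DTW distance, and all names are assumptions:

```python
import numpy as np

def sliding_window_similarity(new_seq, full_seq, step=1):
    """Slide a window of length len(new_seq) over the complete topic sequence
    in steps of `step` (ΔN) and return the minimum subsequence distance."""
    new_seq = np.asarray(new_seq, dtype=float)
    full_seq = np.asarray(full_seq, dtype=float)
    n0 = len(new_seq)
    best = np.inf
    for start in range(0, len(full_seq) - n0 + 1, step):  # M + 1 window positions
        window = full_seq[start:start + n0]
        best = min(best, float(np.linalg.norm(new_seq - window)))  # Euclidean distance
    return best
```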
The time complexity of calculating the similarity between the new topic time sequence and the complete topic time sequence with the sliding-window method is high. To make full use of the initial data of the new topic, the new topic should be aligned with the starting endpoint of the time-series data of the existing sample. In order to ensure that the new topic and the existing sample obtain the minimum distance within a certain range, preferably a scalable window of approximately the same length as the new topic time sequence is set on the complete topic time sequence, forming a subsequence similarity calculation method based on a variable-length window. That is, preferably, in step S41 the size of the window is scaled within a set range around the length of the new topic time sequence, and the minimum value of the distances between the new topic time sequence and the complete topic time sequence in the topic class over the window scaling range is used as the similarity between the new topic and the complete topic time sequence. As shown in FIG. 8, assuming that the length of the target sequence is N_0, the length of the variable-length window is scaled within the range N_0 − m·ΔN to N_0 + m·ΔN (m = 0–3, and m is far smaller than M).
Assume that the complexity of calculating the time-series distance once within a window is O(N_0) and that the storage space of one window index is V. The performance comparison of the two methods is shown in Table 1. When M is much larger than m, the computational complexity of the subsequence matching algorithm based on the variable-length window is much smaller than that of the subsequence matching algorithm based on the sliding window.
TABLE 1
Method | Computational complexity | Index storage space
Sliding-window-based subsequence matching | (M+1)·O(N_0) | (M+1)·V
Variable-window-based subsequence matching | m·O(N_0) | m·V
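For the variable-length window variant compared above, a minimal illustrative sketch follows; a plain DTW distance stands in for the S-DTW distance named in the text, the window is anchored at the start of the complete sequence, and all names are assumptions:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

def variable_window_similarity(new_seq, full_seq, m=3, delta_n=1):
    """Scale the window length around len(new_seq), i.e. N0 - m*ΔN .. N0 + m*ΔN,
    keep the window anchored at the start of the complete topic sequence, and
    return the minimum DTW distance over the scaling range."""
    n0 = len(new_seq)
    best = np.inf
    for length in range(max(1, n0 - m * delta_n),
                        min(len(full_seq), n0 + m * delta_n) + 1):
        best = min(best, dtw_distance(new_seq, full_seq[:length]))
    return best
```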
In a specific embodiment of the present invention, as shown in FIG. 9, after the complete topic time series data of hot topics are clustered, the sample curve trends within each class are obtained. The development of a hot topic usually takes 24 hours as a period, and the data sampling period of the present invention is 4 hours, so there are 6 reading-amount data points per day. A 3-layer BP neural network structure can therefore be adopted: the number of neurons in the input layer of the neural network is set to 6, the number of neurons in the output layer is 1, and the number of neurons in the hidden layer takes an empirical value of 10, i.e., the structure of the prediction model used in the present invention is 6-10-1. The clustered data of each class are cyclically organized into a 6-1 structure, and the sample numbers of each class are as follows,
TABLE 2
Topic class | 1 | 2 | 3 | 4 | 5 | 6
Number of samples | 996 | 7038 | 1590 | 648 | 4002 | 1980
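For illustration only, the sketch below shows one natural way to cut a clustered topic time sequence into 6-input/1-output rows for the 6-10-1 network; the sliding step of 1 and all names are assumptions rather than details taken from the patent:

```python
import numpy as np

def make_samples(series, n_in=6, n_out=1):
    """Cut one complete topic time sequence into (input, output) rows: each row
    holds n_in consecutive readings as input and the next n_out as target."""
    series = np.asarray(series, dtype=float)
    X, Y = [], []
    for start in range(len(series) - n_in - n_out + 1):
        X.append(series[start:start + n_in])
        Y.append(series[start + n_in:start + n_in + n_out])
    return np.array(X), np.array(Y)

# Example: 48 hours of 4-hourly reading counts -> 12 points -> 6 training rows
demo = np.arange(12, dtype=float)
X, Y = make_samples(demo)
print(X.shape, Y.shape)   # (6, 6) (6, 1)
```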
The two key steps of the prediction method are step S3, which trains the neural network prediction models of each category, and step S4, which matches new topics with the existing categories. Step S3 must ensure that the prediction model of a single category has strong targeted prediction capability and high prediction accuracy, and step S4 must ensure that when a new topic is matched with the existing categories, its true category can be accurately matched. These two conditions are the key factors on which the prediction method is built: only when a new topic is matched to its true category and each category model has strong targeted prediction capability can accurate prediction of the new topic be realized.
In step S3, the maximum number of training iterations of the prediction model is 1000, the training target error is 0.00001, the learning rate is 0.1, the population size is 20, the number of evolution generations is 50, the crossover rate is 0.8, and the mutation rate is 0.1. The data under the above 6 categories are divided into training sets and test sets, with the training sets accounting for 90%. The prior-art BP neural network and the GABP neural network (prediction model) of the present invention are trained respectively and tested on the same test set. The mean square error (MSE), the mean absolute percentage error (MAPE) and the absolute percentage error (APE) are selected as evaluation indexes to evaluate the performance of the different prediction models, as follows,
MSE = (1/N) · Σ_{i=1}^{N} (y_i − o_i)^2

MAPE = (1/N) · Σ_{i=1}^{N} |(y_i − o_i)/o_i| × 100%

APE = |(y_i − o_i)/o_i| × 100%

where y_i and o_i are the expected value and the model predicted value, respectively, and N is the number of samples.
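As a minimal illustrative sketch of the evaluation indexes defined above (the percentage-error denominator follows the APE definition given in the text; all names are illustrative):

```python
import numpy as np

def mse(y, o):
    """Mean square error between expected values y and predicted values o."""
    y, o = np.asarray(y, dtype=float), np.asarray(o, dtype=float)
    return float(np.mean((y - o) ** 2))

def mape(y, o):
    """Mean absolute percentage error, using the same ratio as APE below."""
    y, o = np.asarray(y, dtype=float), np.asarray(o, dtype=float)
    return float(np.mean(np.abs((y - o) / o)) * 100.0)

def ape(y_i, o_i):
    """Absolute percentage error of a single prediction."""
    return abs((y_i - o_i) / o_i) * 100.0
```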
The training and test results of the prior-art BP neural network and of the GABP neural network (prediction model) of the present invention on the data of the above 6 categories are shown in Table 3 below,
TABLE 3
[Table 3 (image in original): training and test performance comparison of the prior-art BP neural network and the GABP neural network on the 6 topic classes]
It is apparent from table 3 that the GABP neural network performance far surpasses the conventional BP neural network performance.
In step S4, the prediction method of the present invention uses the per-class prediction models obtained after clustering the topic time sequences of hot topics, and therefore has a strong targeted prediction capability when predicting the development trend of samples (new topics) belonging to a given topic class.
The topic time sequences of the hot topics in the 6 topic classes are predicted with the intra-class prediction models of every topic class in turn, and the amount by which the prediction error MSE_i of the intra-class prediction model of the i-th class is higher than the prediction error MSE_C of the model of the class to which the sample belongs is calculated according to the following formula:

(MSE_i − MSE_C) / MSE_C × 100%
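A minimal illustrative helper for the relative error increase above (names are hypothetical):

```python
def relative_error_increase(mse_i: float, mse_c: float) -> float:
    """Percentage by which the error of the i-th class model exceeds the error
    of the model of the class the sample actually belongs to."""
    return (mse_i - mse_c) / mse_c * 100.0

# Example: a non-matching class model with MSE 0.12 vs. the own-class MSE 0.05
print(relative_error_increase(0.12, 0.05))   # 140.0 -> more than 1x higher
```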
The resulting relative increases of the prediction errors of the other classes over the prediction error of the class to which each sample belongs are shown in Table 4 below,
TABLE 4
[Table 4 (image in original): relative increase of the prediction error when the samples of each topic class are predicted with the intra-class prediction models of the other classes]
As can be seen from the above table, most of the values are positive and most of the positive values are greater than 100%. This indicates that for most samples the prediction error is smallest when the sample is predicted with the intra-class prediction model of the class to which it belongs, and that when a sample is predicted with a model of a class it does not belong to, the prediction error is generally more than twice that of its own class. In other words, the prediction accuracy of a single sample's own class is higher than that of the other classes, and the targeted prediction capability of the intra-class prediction models is strong.
In step S4, initial topic data of different lengths are selected from each complete topic time sequence in the 6 topic classes as new topics and matched with each topic class, and the c categories with the highest matching degree for each new topic under different sample lengths are obtained, with c = 3. If the top-3 categories by matching degree include the category to which the new topic actually belongs, the topic is counted as matched to its own category, and the number of samples whose own category appears in the top 3 is compared with the total number of samples. The results are shown in FIG. 10: for 80% of the new topics, the 3 topic classes with the highest matching degree contain the category to which the new topic actually belongs, and this proportion basically keeps rising as the amount of new topic data (sequence length) increases. When the topic sequence length is 10, about 95% of the new topics are matched to the categories to which they actually belong, which is a very high accuracy.
In step S4, the prediction effects of different weight combinations of the predicted values of the topic classes are compared, and the weight combination method with the best effect is selected for computing the combined predicted value. For example, with c = 3 and a new topic sequence length of 4, the predicted values of the prediction models of the top-3 classes by matching degree obtained after the new topic is matched with the existing classes are pred_1, pred_2 and pred_3, with corresponding weights w_1, w_2 and w_3. The 3 predicted values are combined according to the different weight combination methods in the following table,
TABLE 5
[Table 5 (image in original): the ten weight combination methods, method 1 to method 10, used to combine the 3 predicted values]
The prediction error MSE is calculated for the predicted values of methods 1 to 10 above; the distribution of the MSE is shown in FIG. 11. In statistics, quartiles are the values obtained by arranging all values from small to large and dividing them into four equal parts at three division points; the smaller the three quartiles, the smaller the MSE of more samples and the higher the prediction accuracy, wherein:
the first quartile (Q1), also called the lower quartile, is equal to the 25th percentile of all values in the sample after they are arranged from small to large;
the second quartile (Q2), also called the median, is equal to the 50th percentile of all values in the sample after they are arranged from small to large;
the third quartile (Q3), also called the upper quartile, is equal to the 75th percentile of all values in the sample after they are arranged from small to large.
As can be seen from FIG. 11, for methods 4 to 10 the MSE falls in the interval (0, 0.05) for more than 3/4 of the samples and the distribution is very tight, indicating that these weight combination methods have high prediction accuracy. Among methods 1, 2 and 3, the quartiles of method 1 are smaller, indicating that the MSE of method 1 is smaller and its prediction accuracy higher, because method 1 takes the predicted value of the category with the highest matching degree as the sample predicted value. Compared with methods 4 to 9, the MSE of method 10 is reduced by about 4%-35% and its prediction accuracy is higher, so method 10 is selected as the weight combination method.
Subsequences of the complete topic time sequences are selected as new topic time sequences, namely the time sequences of the first 16, 24, 32, 40 and 48 hours after the topic occurs, to compare the effect of the topic early development trend prediction method provided by the invention in predicting new topics. Method 10 is used to weight-combine the predicted values of the intra-class prediction models of the top-3 topic classes by matching degree, and the development trend of each new topic is predicted under the different sample lengths. The distribution of the prediction error MSE is shown in FIG. 12: as the length of the new topic sequence increases, the overall distribution of the mean square error MSE approaches 0, indicating that the error has a decreasing trend; the relative error APE of about 75% of the new topics stays below 80%, and that of about 25% of the new topics stays between 80% and 100%. Combined with FIG. 10, the probability that a new topic is matched to its own category becomes larger as the effective length of the new topic increases, i.e., the prediction effect of the public opinion early trend prediction scheme provided by the invention becomes better and better as the sequence length of the new topic grows.
The greatest characteristic of the hot topic early development trend prediction system and prediction method provided by the invention is that a new topic is predicted based on the similarity of hot topic development trends: a set of hot topics with different development trends is first established, whereas the traditional topic trend prediction method builds a prediction model on the historical data of a single topic and predicts its later development trend. The sequences of the first 16 hours after topic occurrence of the complete topic time sequences in the 6 topic classes are selected as new topic time sequences, and the topic development trend 4 hours later is predicted with the prediction method of the invention and with traditional prediction methods (such as ARIMA, grey prediction and the BP neural network) respectively. The distribution of the prediction error MSE is shown in FIG. 13. It is evident from the figure that the MSE distribution and the APE distribution of the hot topic early development trend prediction method provided by the invention are far lower than those of the other traditional prediction methods: the MSE of the proposed prediction method is reduced by at least 90% and the APE by at least 24% compared with the traditional prediction methods, so the proposed prediction method has stronger prediction accuracy and timeliness for the prediction of the early trend of a topic.
In summary, the system and the method for predicting the early development tendency of the hot topic proposed by the present invention are described by way of example with reference to the accompanying drawings. However, it will be appreciated by those skilled in the art that various modifications could be made to the system and method of the present invention described above without departing from the spirit of the invention. Therefore, the scope of the present invention should be determined by the contents of the appended claims.

Claims (14)

1. A hot topic early development trend prediction system is characterized by comprising:
the system comprises an acquisition part, a search part and a search part, wherein the acquisition part is used for acquiring topics from a network and a microblog and constructing a topic time sequence, and the topic time sequence is a time sequence formed by topic reading amounts corresponding to different acquisition moments;
the judging part is used for judging whether the topic time sequence enters the decline period or not, sending the topic time sequence entering the decline period to the knowledge storage part as a complete topic time sequence, and sending each topic time sequence not entering the decline period to the new topic predicting part as each new topic time sequence, wherein the judging part selects the topic time sequence of a past period of time, and fits a linear slope by a least square method, and if the slope is within a certain range, the topic time sequence enters the decline period;
the knowledge storage part comprises a clustering module, a prediction model building module and a prediction model library, wherein the clustering module classifies the complete topic time sequence by adopting a clustering method to obtain corresponding topic classes, the prediction model building module substitutes each topic time sequence in each topic class into a prediction model to train to obtain an intra-class prediction model corresponding to each topic class, and the prediction model library stores the intra-class prediction models of each topic class;
the new topic prediction part is used for monitoring the development trend of each new topic time sequence and comprises a similarity analysis module, a matching module and a trend prediction module, wherein the similarity analysis module calculates the similarity between the new topic time sequence and each complete topic time sequence in the topic classes by adopting a sequence similarity method, and an average value is taken as the matching degree between the new topic and the topic class; the matching module screens a set number of topic classes in descending order of the matching degree between the new topic and the topic classes; the trend prediction module predicts the development trend of the new topic according to the intra-class prediction models of the screened topic classes,
the trend prediction module comprises a calling unit and an assignment unit, wherein the calling unit is used for calling the intra-class prediction models corresponding to the screened topic classes of the set number, and the new topic time sequences are respectively substituted into the intra-class prediction models for prediction to obtain the set number of predicted values corresponding to the future time of the new topic; and the assignment unit is used for assigning different weight values to the set number of predicted values of the different topic class intra-class prediction models and combining them to obtain the predicted value of the new topic at the future time, wherein the sum of the weight values is 1, and the weight value of the predicted value of the intra-class prediction model of a topic class with higher matching degree is not less than that of a topic class with lower matching degree.
2. The system for predicting the early-stage development tendency of the hot topic as claimed in claim 1, wherein the similarity analysis module comprises:
a window setting unit for setting the size of the window according to the length of the new topic time sequence;
the segmentation sequence distance calculation unit is used for calculating the distance between the time sequence of the new topic and the subsequence of each complete topic time sequence in the topic class corresponding to the window size in a set window, wherein the distance is the Euclidean distance or the dynamic time warping distance;
and the similarity determining unit is used for taking the minimum value of the distances between the subsequences with the sizes of the plurality of corresponding windows in each complete topic time sequence and the new topic time sequence as the similarity of the new topic and the complete topic time sequence.
3. The system for predicting the early development tendency of the hot topic as claimed in claim 2, wherein the similarity analysis module further comprises a window expansion unit for changing the size of the window set by the window setting unit within a set range.
4. The system for predicting the early-stage development tendency of the hot topic as claimed in claim 1, wherein the tendency prediction module further comprises:
and the eliminating unit is used for eliminating the topic class with the largest predicted value called by the calling unit and sending the rest topic classes to the assignment unit.
5. The system for predicting the early development tendency of the hot topics as claimed in claim 1, further comprising a BP neural network structure determination module for determining the number of nodes of the input layer, the hidden layer and the output layer in the BP neural network structure, wherein n is the number of nodes of the input layer, l is the number of nodes of the hidden layer, and m is the number of nodes of the output layer.
6. The system of claim 5, further comprising a sample preprocessing module, wherein each complete topic time sequence in each topic class is divided into a plurality of subsequences according to the number of nodes of the input layer and the number of nodes of the output layer; a complete topic time sequence is set as Var = [Var_1, Var_2, ..., Var_t], and the converted subsequences are listed as

[ Var_1           Var_2           ...  Var_n           Var_{n+1}       ...  Var_{n+m}     ]
[ Var_2           Var_3           ...  Var_{n+1}       Var_{n+2}       ...  Var_{n+m+1}   ]
[ ...                                                                                      ]
[ Var_{t-n-m+1}   Var_{t-n-m+2}   ...  Var_{t-m}       Var_{t-m+1}     ...  Var_t         ]

each row is a subsequence and is subsequently used as input and output data of a training set or a test set, the first n columns in each row are input data and the last m columns are output data, and after this preprocessing the subsequences of each complete topic time sequence in each topic class are combined together row by row to form the samples of that topic class.
7. The system for predicting the early development tendency of the hot topic as claimed in claim 6, wherein the prediction model building module comprises:
the BP neural network construction unit calls the neural network structure determined by the BP neural network structure, and constructs a model of the output of the hidden layer and the output layer of the neural network according to the following formulas (1) and (2), wherein:
h_j = f( Σ_{i=1}^{n} ω_ij · x_i − a_j ),  j = 1, 2, ..., l      (1)

O_k = Σ_{j=1}^{l} h_j · ω_jk − b_k,  k = 1, 2, ..., m      (2)

wherein ω_ij is the connection weight of the i-th node of the input layer and the j-th node of the hidden layer, ω_jk is the connection weight of the j-th node of the hidden layer and the k-th node of the output layer, a_j is the threshold of the j-th node of the hidden layer, b_k is the threshold of the k-th node of the output layer, n is the number of nodes of the input layer, l is the number of nodes of the hidden layer, m is the number of nodes of the output layer, x_i is the variable of the i-th node of the input layer, h_j is the output value of the j-th node of the hidden layer, O_k is the output value of the k-th node of the output layer, and f is the excitation function f(x) = 1/(1 + e^(−x)).
The initialization unit is used for carrying out initial assignment on parameters of the BP neural network, wherein the parameters comprise a connection weight of a hidden layer and an output layer, a connection weight of an input layer and the hidden layer, a threshold of the hidden layer and a threshold of the output layer;
a training set and test set segmentation unit, which samples the samples of each topic processed by the sample preprocessing module according to a set proportion and picks out the training set, and the rest are test sets;
the sample training unit trains the models output by the hidden layer and the output layer of the neural network according to the training set samples, and executes the following steps:
step 1, inputting a first sample of a training set;
step 2, substituting the input data of the sample into formulas (1) and (2), and calculating the output of each node of the hidden layer and the output of each node of the output layer;
step 3, calculating the error of each node of the output layer,
e_k = y_k − o_k

wherein y_k is the actual value of the k-th node of the sample, and o_k is the predicted value of the k-th node of the sample;
and 4, updating the parameters of the BP neural network according to the following formulas (3) to (6) in sequence, wherein:
ω_ij′ = ω_ij + α · h_j · (1 − h_j) · x_i      (3)

ω_jk′ = ω_jk + α · h_j · e_k      (4)

a_j′ = a_j + α · h_j · (1 − h_j)      (5)

b_k′ = b_k + e_k      (6)

wherein ω_ij, ω_jk, a_j and b_k are the BP neural network parameters before updating, and ω_ij′, ω_jk′, a_j′ and b_k′ are the updated BP neural network parameters;
step 5, starting to train the next sample, and circulating the steps 2-5 until the training of all the training set samples is finished;
step 6, calculating a test error: the input data of the test set are substituted into the BP neural network trained in the above steps to obtain the test error MSE of the trained BP neural network,

MSE = (1/N) · Σ_{Z=1}^{N} Σ_{k=1}^{m} (y_k^Z − o_k^Z)^2

wherein N is the number of test set samples, o_k^Z is the predicted value of test set sample Z at the k-th node of the output layer, and y_k^Z is the actual value of the k-th node corresponding to test set sample Z;
and the BP neural network optimization termination condition judgment unit is used for judging whether the BP neural network training meets the termination condition or not, if so, outputting an intra-class prediction model of the question class, wherein the intra-class prediction model comprises the structure, weight and threshold information of the BP neural network meeting the termination condition, and if not, returning the BP neural network parameters after the training and updating to the sample training unit to continue training the model, wherein the termination condition comprises one or two of a first termination condition and/or a second termination condition, the first termination condition is that the current iteration frequency is greater than the set maximum iteration frequency, and the second termination condition is that the test error change of the BP neural network is smaller than the set target value when the BP neural network is continuously iterated for multiple times.
8. The system for predicting the early development tendency of hot topics as claimed in claim 7, wherein,
the initialization unit comprises an initial population setting subunit, an individual evaluation subunit, an individual selection subunit, a cross subunit, a variation subunit and a genetic algorithm optimization termination condition judgment subunit, wherein,
the initial population setting subunit sets the population scale to P and randomly generates an initial population of P individuals G = (G_1, G_2, ..., G_P)^T, where random real numbers selected from a symmetric interval [−W, W] form, for each individual, a real-number vector of length S, G_i = (g_1, g_2, ..., g_S), i = 1, 2, ..., P, S = n*l + l*m + l + m, and g_S is the S-th gene of individual G_i;
the individual evaluation subunit determines an evaluation function of the individuals of the initial population, takes each gene of each individual as the initial assignment of the connection weight of the hidden layer and the output layer of the initialization unit, the connection weight of the initial input layer and the hidden layer, the initial hidden layer threshold and the initial output layer threshold, and obtains the fitness of each individual through the prediction model building module, wherein the fitness of an individual G_i in the initial population G is determined from the test mean square error obtained when G_i is used as the initial assignment of the BP neural network of the initialization unit;
the individual selection subunit selects the individuals in the initial population by adopting a roulette operator based on a selection strategy of fitness proportion to obtain a selected individual Gu
The cross subunit adopts a single-point cross operator to perform cross updating on the individuals selected by the individual selection subunit, the maximum value of each updated gene is used as the upper bound of the gene, and the minimum value of each updated gene is used as the lower bound of the gene;
the mutation subunit performs mutation operation on the genes in the individuals after passing through the cross subunit to obtain the mutated individuals, substitutes the mutated individuals into the individual evaluation subunit, and evolves the initial population, wherein:
Figure FDA0002554374160000044
Figure FDA0002554374160000045
wherein g_j is the j-th gene of the selected individual G_u, g_jmax and g_jmin are the upper bound and lower bound of gene g_j, r_p is the pseudo-random number generated the p-th time by the pseudo-random number generating unit for the selected individual G_u, iter_now is the current evolution generation, and iter_max is the set maximum evolution generation;
the genetic algorithm optimization termination condition judgment subunit judges whether the genetic algorithm meets an algorithm termination condition, if the genetic algorithm meets the algorithm termination condition, the optimal population individuals are output and used as the connection weight of a hidden layer and an output layer of the initialization unit, the connection weight of an input layer and the hidden layer, a hidden layer threshold and a final initial value of the output layer threshold, and if the genetic algorithm termination condition does not meet the algorithm termination condition, the optimal population individuals are returned to the individual evaluation subunit.
9. A method for predicting the early development trend of a hot topic is characterized by comprising the following steps:
collecting topics from a network and a microblog, and constructing a topic time sequence corresponding to each topic, wherein the topic time sequence is a time sequence formed by topic reading amounts corresponding to different collection times;
judging whether each topic time sequence enters a decline period or not, wherein the judgment comprises the following steps: selecting a topic time sequence of a past period of time, fitting a linear slope by using a least square method, and if the slope is within a certain range, entering a decline period by the topic time sequence;
if the topic time sequence enters a decline period, classifying the topic time sequence as a complete topic time sequence by adopting a clustering method to obtain different topic classes, substituting each topic time sequence in each topic class into a prediction model for training to obtain an intra-class prediction model corresponding to each topic class;
if the topic time series does not enter the decline period, monitoring the development trend of the topic as a new topic, wherein the monitoring comprises the following steps: analyzing the similarity between the time sequence of the new topic and each complete topic time sequence in the topic class by adopting a sequence similarity method, and taking an average value as the matching degree of the new topic and the topic class; screening out a set number of topic classes according to the sequence of high-to-low matching degree with the new topic; calling the intra-class prediction models corresponding to the screened topic classes with the set number, and respectively substituting the new topic time sequences into the intra-class prediction models for prediction to obtain the predicted values of the set number corresponding to the future time of the new topics; and giving different weight values to the predicted values of the set number of different topic class intra-class prediction models for combination to obtain a predicted value of a new topic at a future time, wherein the sum of the weight values is 1, and the weight value of the predicted value of the topic class intra-class prediction model with high matching degree is not less than that of the predicted value of the topic class intra-class prediction model with low matching degree.
10. The method for predicting the early development trend of the hot topic as claimed in claim 9, wherein the method for analyzing the similarity between the time series of the new topic and each complete topic time series in the topic class by using the sequence similarity method comprises:
setting the size of a window according to the length of the new topic time sequence;
calculating the distance between the time sequence of the new topic and the subsequence of each complete topic time sequence in the topic class corresponding to the window size in a set window, wherein the distance is a Euclidean distance or a dynamic time warping distance;
and taking the minimum value of the distances between the subsequences in each complete topic time sequence and the new topic time sequence as the similarity of the new topic and the complete topic time sequence.
11. The method for predicting the early development tendency of the hot topic as claimed in claim 10, further comprising: and the size of the window is expanded and contracted within the length setting range of the new topic time sequence, and the minimum value of the distances between the new topic time sequence in the expansion and contraction range of the window and a plurality of subsequences of the complete topic time sequence in the topic class is used as the similarity of the new topic and the complete topic time sequence.
12. The method for predicting the early development tendency of the hot topic as claimed in claim 9, further comprising:
and eliminating the topic class with the maximum called predicted value, and giving different weighted values to the predicted values of the other called topic classes for combination to obtain the predicted value of the new topic at the future time.
13. The method for predicting the early development trend of the hot topics as claimed in claim 9, wherein the step of substituting each topic time sequence in each topic class into the prediction model for training to obtain the intra-class prediction model corresponding to each topic class comprises:
determining the number of nodes of an input layer, a hidden layer and an output layer in a BP neural network structure, wherein n is the number of nodes of the input layer, l is the number of nodes of the hidden layer, and m is the number of nodes of the output layer;
constructing a model of the hidden layer and output layer outputs of the neural network according to equations (1) and (2) below, wherein:
h_j = f( Σ_{i=1}^{n} ω_ij · x_i − a_j ),  j = 1, 2, ..., l      (1)

O_k = Σ_{j=1}^{l} h_j · ω_jk − b_k,  k = 1, 2, ..., m      (2)

wherein ω_ij is the connection weight of the i-th node of the input layer and the j-th node of the hidden layer, ω_jk is the connection weight of the j-th node of the hidden layer and the k-th node of the output layer, a_j is the threshold of the j-th node of the hidden layer, b_k is the threshold of the k-th node of the output layer, n is the number of nodes of the input layer, l is the number of nodes of the hidden layer, m is the number of nodes of the output layer, x_i is the variable of the i-th node of the input layer, h_j is the output value of the j-th node of the hidden layer, O_k is the output value of the k-th node of the output layer, and f is the excitation function f(x) = 1/(1 + e^(−x));
Carrying out initial assignment on parameters of the BP neural network, wherein the parameters comprise a connection weight of a hidden layer and an output layer, a connection weight of an input layer and the hidden layer, a threshold of the hidden layer and a threshold of the output layer;
dividing each complete topic time sequence in each topic class into a plurality of subsequences according to the number of nodes of the input layer and the number of nodes of the output layer, wherein a complete topic time sequence is set as Var = [Var_1, Var_2, ..., Var_t], and the converted subsequences are listed as

[ Var_1           Var_2           ...  Var_n           Var_{n+1}       ...  Var_{n+m}     ]
[ Var_2           Var_3           ...  Var_{n+1}       Var_{n+2}       ...  Var_{n+m+1}   ]
[ ...                                                                                      ]
[ Var_{t-n-m+1}   Var_{t-n-m+2}   ...  Var_{t-m}       Var_{t-m+1}     ...  Var_t         ]

each row is a subsequence and is subsequently used as input and output data of a training set or a test set, the first n columns in each row are input data and the last m columns are output data, and after this preprocessing the subsequences of each complete topic time sequence in each topic class are combined together row by row to form the samples of that topic class;
sampling samples of each topic class according to a set proportion and picking out a training set, and taking the rest as a test set;
substituting the training set into a model output by a hidden layer and an output layer of the BP neural network for training, and the training comprises the following steps:
step 1, inputting a first sample of a training set;
step 2, substituting the input data of the sample into formulas (1) and (2), and calculating the output of each node of the hidden layer and the output of each node of the output layer;
step 3, calculating the error of each node of the output layer,
e_k = y_k − o_k

wherein y_k is the actual value of the k-th node of the sample, and o_k is the predicted value of the k-th node of the sample;
and 4, updating the parameters of the BP neural network according to the following formulas (3) to (6) in sequence, wherein:
ω_ij′ = ω_ij + α · h_j · (1 − h_j) · x_i      (3)

ω_jk′ = ω_jk + α · h_j · e_k      (4)

a_j′ = a_j + α · h_j · (1 − h_j)      (5)

b_k′ = b_k + e_k      (6)

wherein ω_ij, ω_jk, a_j and b_k are the BP neural network parameters before updating, and ω_ij′, ω_jk′, a_j′ and b_k′ are the updated BP neural network parameters;
step 5, starting to train the next sample, and circulating the steps 2-5 until the training of all the training set samples is finished;
step 6, calculating a test error: the input data of the test set are substituted into the BP neural network trained in the above steps to obtain the test error MSE of the trained BP neural network,

MSE = (1/N) · Σ_{Z=1}^{N} Σ_{k=1}^{m} (y_k^Z − o_k^Z)^2

wherein N is the number of test set samples, o_k^Z is the predicted value of test set sample Z at the k-th node of the output layer, and y_k^Z is the actual value of the k-th node corresponding to test set sample Z;
judging whether the BP neural network training meets an end condition or not, wherein the end condition comprises one or two of a first end condition and/or a second end condition, the first end condition is that the current iteration frequency is greater than a set maximum iteration frequency, and the second end condition is that the test error change of the BP neural network is less than a set target value when the BP neural network training is iterated for a plurality of times continuously;
if the ending condition is met, outputting an intra-class prediction model of the topic class, wherein the intra-class prediction model comprises the structure, weight and threshold information of the BP neural network meeting the ending condition;
and if the end condition is not met, returning the BP neural network parameters after the training update to the step of carrying out initial assignment on the parameters of the BP neural network for carrying out the circular training until the end condition is met.
14. The method for predicting the early development trend of the hot topic as claimed in claim 13, wherein the method for initially assigning the parameters of the BP neural network comprises:
assuming the population size is P, randomly generating an initial population of P individuals G = (G_1, G_2, ..., G_P)^T, wherein random real numbers selected from a symmetric interval [−W, W] form, for each individual, a real-number vector of length S, G_i = (g_1, g_2, ..., g_S), i = 1, 2, ..., P, S = n*l + l*m + l + m, and g_S is the S-th gene of individual G_i;
taking each gene of each individual as a connection weight of a hidden layer and an output layer of a BP (back propagation) neural network, a connection weight of an initial input layer and the hidden layer, an initial hidden layer threshold and an initial assignment of an initial output layer threshold, respectively substituting samples belonging to each topic class into a model output by the hidden layer and the output layer of the neural network, and training to obtain the output of each node of the output layer corresponding to each sample, thereby obtaining the fitness of each individual, wherein:
the fitness of an individual G_i in the initial population G is determined from MSE, the network global error over all N samples (Z = 1, 2, ..., N being the sample index and N the total number of samples) obtained when G_i is used as the initial assignment of the parameters of the BP neural network;
selecting individuals in the initial population by adopting a roulette operator and a selection strategy based on fitness proportion to obtain a selected individual G_u;
Adopting a single-point crossover operator to carry out crossover updating on the selected individuals, taking the maximum value of each updated gene as the upper bound of the gene, and taking the minimum value of each updated gene as the lower bound of the gene;
carrying out variation operation on the selected individuals after cross updating to obtain the varied individuals, substituting the varied individuals into an individual evaluation subunit, and evolving the initial population, wherein:
Figure FDA0002554374160000081
Figure FDA0002554374160000082
wherein g_j is the j-th gene of the selected individual G_u, g_jmax and g_jmin are the upper bound and lower bound of gene g_j, r_p is the pseudo-random number generated the p-th time by the pseudo-random number generating unit for the selected individual G_u, iter_now is the current evolution generation, and iter_max is the set maximum evolution generation;
judging whether the genetic algorithm meets an algorithm ending condition or not, wherein the algorithm ending condition comprises a first algorithm ending condition or/and a second algorithm ending condition, the first algorithm ending condition is that the current evolution algebra is larger than a set maximum evolution algebra, and the second algorithm ending condition is that the variation of the individual fitness value is smaller than a set target value when the evolution is continuously carried out for multiple times;
if the algorithm end condition is met, outputting the optimal population individuals as the connection weight of the hidden layer and the output layer, the connection weight of the input layer and the hidden layer, the threshold of the hidden layer and the final initial value of the threshold of the output layer;
and if the algorithm ending condition is not met, initially assigning values to the parameters of the BP neural network of the evolved initial seed group, and repeating the steps until the algorithm ending condition is met.
CN201711351709.6A 2017-12-15 2017-12-15 Hot topic early development trend prediction system and prediction method Active CN107992976B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711351709.6A CN107992976B (en) 2017-12-15 2017-12-15 Hot topic early development trend prediction system and prediction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711351709.6A CN107992976B (en) 2017-12-15 2017-12-15 Hot topic early development trend prediction system and prediction method

Publications (2)

Publication Number Publication Date
CN107992976A CN107992976A (en) 2018-05-04
CN107992976B true CN107992976B (en) 2020-09-29

Family

ID=62037689

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711351709.6A Active CN107992976B (en) 2017-12-15 2017-12-15 Hot topic early development trend prediction system and prediction method

Country Status (1)

Country Link
CN (1) CN107992976B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109214562A (en) * 2018-08-24 2019-01-15 国网山东省电力公司电力科学研究院 A kind of power grid scientific research hotspot prediction and method for pushing based on RNN
CN109710755A (en) * 2018-11-22 2019-05-03 合肥联宝信息技术有限公司 Training BP neural network model method and device and the method and apparatus that text classification is carried out based on BP neural network
CN110275939B (en) * 2019-06-10 2023-01-17 腾讯科技(深圳)有限公司 Method and device for determining conversation generation model, storage medium and electronic equipment
CN110399489B (en) * 2019-07-08 2022-06-17 厦门市美亚柏科信息股份有限公司 Chat data segmentation method, device and storage medium
CN110522446A (en) * 2019-07-19 2019-12-03 东华大学 A kind of electroencephalogramsignal signal analysis method that accuracy high practicability is strong
CN110580570B (en) * 2019-08-14 2021-01-15 平安国际智慧城市科技股份有限公司 Law enforcement analysis method, device and medium
CN112650847B (en) * 2019-10-11 2023-05-09 中国农业科学院农业信息研究所 Technological research hotspot theme prediction method
CN111832815B (en) * 2020-07-02 2023-12-05 国网山东省电力公司电力科学研究院 Scientific research hot spot prediction method and system
CN112651560B (en) * 2020-12-28 2023-04-25 华润电力技术研究院有限公司 Ultra-short-term wind power prediction method, device and equipment
CN114882333A (en) * 2021-05-31 2022-08-09 北京百度网讯科技有限公司 Training method and device of data processing model, electronic equipment and storage medium
CN113780569A (en) * 2021-07-19 2021-12-10 中国科学院计算技术研究所 Popularity prediction method and system based on similar topics

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102012929A (en) * 2010-11-26 2011-04-13 北京交通大学 Network consensus prediction method and system
JP2013257765A (en) * 2012-06-13 2013-12-26 Ntt Data Corp Term extraction device, term extraction method, and program


Also Published As

Publication number Publication date
CN107992976A (en) 2018-05-04


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant