CN107992976B

CN107992976B - Hot topic early development trend prediction system and prediction method

Info

Publication number: CN107992976B
Application number: CN201711351709.6A
Authority: CN
Inventors: 殷复莲; 张贝贝; 王颜颜; 苏沛
Original assignee: Communication University of China
Current assignee: Communication University of China
Priority date: 2017-12-15
Filing date: 2017-12-15
Publication date: 2020-09-29
Anticipated expiration: 2037-12-15
Also published as: CN107992976A

Abstract

The invention provides a hot topic early development trend prediction system and a prediction method, which collect a topic time sequence; judging whether the sequence enters a decline period or not; if the topic is entered, classifying the topic as a complete topic time sequence by adopting a clustering method to obtain each topic class, substituting each topic time sequence of each topic class into a prediction model for training to obtain an intra-class prediction model of each topic class; if not, analyzing the similarity between the new topic time sequence and each complete topic time sequence in the topic class, and taking the average value as the matching degree of the new topic and the topic class; screening out a set number of topic classes according to the sequence from high to low of the matching degree; calling an intra-class prediction model of the screened topic classes, and respectively substituting the intra-class prediction model into the new topic time sequence to obtain predicted values of the set number; and endowing the predicted value with different weight values to be combined to obtain the predicted value of the new topic at the future moment. The system and the method can accurately predict the early development trend of the new topic.

Description

Hot topic early development trend prediction system and prediction method

Technical Field

The invention relates to the field of public opinion analysis, in particular to a hot topic early development trend prediction system and a hot topic early development trend prediction method.

Background

The hot topics are often characterized by difficult prediction, great influence, complex situation, high sensitivity, serious consequences and the like, and improper treatment of the hot topics can cause a plurality of adverse reactions in the society and influence the social stability. Most of the existing hot topic development trend prediction algorithms predict the future development situation of the topics according to the existing development situations of the topics, so that the heat degree and the change trend of the topics cannot be predicted when the topics just appear. Although the long-term prediction of the topics has important application value, the short-term prediction, especially the heat prediction of new topics, is more meaningful because the network public opinion has the characteristic of short outbreak period. In fact, many public opinion topics of widespread interest, from topic emergence to wide attention, take as little as a few days.

Currently, researchers are focusing on long-term prediction of topics, including traditional statistics-based prediction models and intelligent machine algorithm-based prediction models. The traditional statistical prediction model generally comprises a Logistic model, an exponential smoothing model, an ARIMA model, a moving average model and the like. The model requires strong regularity of prediction data, and a good fitting effect can be obtained. A prediction model based on a machine learning algorithm mainly combines an artificial intelligence technology and time series prediction. The relevant theoretical basis mainly relates to a neural network, a grey theory, a Bayesian network, a fuzzy set theory and the like. The prediction methods are based on the early data of a single topic for training, and the later data is used as test data for model training. As is well known, topic development generally has large fluctuation at the early stage and is stable at the later stage, and the expansibility and universality of a model cannot be ensured based on a more stable data test prediction model. And the prediction model obtained based on early data training can only accurately predict the development trend of the middle and later periods, and can not predict the heat and the change trend of the topic when the topic appears, so that the timeliness of the model is lost.

Disclosure of Invention

In view of the above problems, the present invention aims to provide a system and a method for predicting an early development trend of a microblog topic based on hot topic similarity, so as to predict a topic heat development trend of a network hot topic, especially precisely predict a development trend of a new topic.

According to an aspect of the present invention, there is provided a hot topic early development tendency prediction system, including: the system comprises an acquisition part, a search part and a search part, wherein the acquisition part is used for acquiring topics from a network and a microblog and constructing a topic time sequence which is a time sequence formed by topic reading amounts corresponding to different acquisition times; the judging part is used for judging whether the topic time sequence enters the decline period or not, sending the topic time sequence entering the decline period to the knowledge storage part as a complete topic time sequence, and sending each topic time sequence not entering the decline period to the new topic prediction part as each new topic time sequence; the knowledge storage part comprises a clustering module, a prediction model building module and a prediction model library, wherein the clustering module classifies the complete topic time sequence by adopting a clustering method to obtain corresponding topic classes, the prediction model building module substitutes each topic time sequence in each topic class into a prediction model to train to obtain an intra-class prediction model corresponding to each topic class, and the prediction model library stores the intra-class prediction models; the new topic prediction part is used for monitoring the development trend of each new topic time sequence and comprises a similarity analysis module, a matching module and a trend prediction module, wherein the similarity analysis module calculates the similarity between the new topic time sequence and each complete topic time sequence in topic classes by adopting a sequence similarity method, an average value is taken as the matching degree between the new topic and the topic classes, and the matching module screens the topic classes with a set number according to the sequence of the matching degree between the new topic and the topic; the trend prediction module predicts the development trend of the new topic according to an intra-class prediction model of the screened topic classes, wherein the trend prediction module comprises a calling unit, calls the intra-class prediction models corresponding to the screened topic classes with the set number, and substitutes the time series of the new topic into the intra-class prediction models respectively for prediction to obtain the predicted value of the set number corresponding to the future time of the new topic; and the assignment unit is used for assigning the predicted values of the set number of different topic class intra-class prediction models to different weight values to be combined to obtain the predicted value of the new topic at the future time, the sum of the weight values is 1, and the weight value of the predicted value of the topic class intra-class prediction model with high matching degree is not less than that of the predicted value of the topic class intra-class prediction model with low matching degree.

According to another aspect of the present invention, there is provided a method for predicting an early development trend of a hot topic, including: collecting topics from a network and a microblog, and constructing a topic time sequence corresponding to each topic, wherein the topic time sequence is a time sequence formed by topic reading amounts corresponding to different collection times; judging whether each topic time sequence enters a decline period or not; if the topic time sequence enters a decline period, classifying the topic time sequence as a complete topic time sequence by adopting a clustering method to obtain different topic classes, substituting each topic time sequence in each topic class into a prediction model for training to obtain an intra-class prediction model corresponding to each topic class; if the topic time series does not enter the decline period, monitoring the development trend of the topic as a new topic, wherein the monitoring comprises the following steps: analyzing the similarity between the time sequence of the new topic and each complete topic time sequence in the topic class by adopting a sequence similarity method, and taking an average value as the matching degree of the new topic and the topic class; screening out a set number of topic classes according to the sequence of high-to-low matching degree with the new topic; calling the intra-class prediction models corresponding to the screened topic classes with the set number, and respectively substituting the new topic time sequences into the intra-class prediction models for prediction to obtain the predicted values of the set number corresponding to the future time of the new topics; and giving different weight values to the predicted values of the set number of different topic class intra-class prediction models for combination to obtain a predicted value of a new topic at a future time, wherein the sum of the weight values is 1, and the weight value of the predicted value of the topic class intra-class prediction model with high matching degree is not less than that of the predicted value of the topic class intra-class prediction model with low matching degree.

According to the microblog topic early development trend prediction system and method based on hot topic similarity, time series clustering is performed on topics to obtain categories with different development trends, and a prediction model is repeatedly trained by using the similarity of topics in the categories, so that the model is more robust and more universal, and thus when a new topic comes, the existing topic categories can be matched by using less data, and further, the topic trend prediction can be performed by using the existing prediction model, and the prediction has higher precision.

Drawings

Other objects and results of the present invention will become more apparent and more readily appreciated as the same becomes better understood by reference to the following description and appended claims, taken in conjunction with the accompanying drawings. In the drawings:

FIG. 1 is a schematic diagram of a block diagram of an early development trend prediction system for hot topics, according to the present invention;

FIG. 2 is a block diagram of a predictive model building block according to the present invention;

FIG. 3 is a schematic diagram of the BP neural network structure according to the present invention;

FIG. 4 is a flowchart illustrating an early development trend prediction method for hot topics according to the present invention;

FIG. 5 is a flow chart of a method for performing initial assignment of various parameters of a BP neural network using a genetic algorithm according to the present invention;

FIG. 6 is a schematic diagram of the matching of the new topic time series with different complete topic time series in the same topic class according to the present invention;

FIG. 7 is a schematic illustration of the present invention determining the similarity of a new topic time series to a completion time topic time series based on a sliding window;

FIG. 8 is a schematic illustration of the present invention determining similarity of a new topic time series to a completion time topic time series based on a variable window;

FIG. 9 is a schematic diagram of the trend of the time series curve of each complete topic in different topic classes according to the present invention;

FIG. 10 is a graphical representation of the accuracy of the sample matching to the topic class of the present invention;

FIG. 11 is a diagram illustrating MSE quantiles predicted by different combinations of weights assigned to predicted values of different classes of intra-prediction models according to the present invention;

FIG. 12 is a graphical representation of the MSE and APE distributions of the present invention as a function of sample length;

fig. 13 is a schematic diagram comparing the MSE and APE distributions of the prediction method for the early development trend of the hot topic according to the present invention and the existing prediction method.

The same reference numbers in all figures indicate similar or corresponding features or functions.

Detailed Description

In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of one or more embodiments. It may be evident, however, that such embodiment(s) may be practiced without these specific details. Specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

Specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

Fig. 1 is a block diagram of the system for predicting the early development trend of the hot topic, as shown in fig. 1, the system for predicting the early development trend of the hot topic includes:

the system comprises an acquisition part 1, a search part and a search part, wherein the acquisition part is used for acquiring topics from a network and a microblog and constructing a topic time sequence which is a time sequence formed by topic reading amounts corresponding to different acquisition times;

a determination unit 2 that determines whether or not the topic time series enters a decline period (for example, selects a time series of a past period of time, fits a slope of a straight line by a least square method, and if the slope is within a certain range, the topic time series enters the decline period), transmits the topic time series entering the decline period to the knowledge storage unit 3 as a complete topic time series, and transmits each topic time series that does not enter the decline period to the new topic prediction unit 4 as each new topic time series;

the knowledge storage part 3 comprises a clustering module 31, a prediction model construction module 32 and a prediction model library 33, wherein the clustering module 31 classifies the complete topic time sequence by adopting a clustering method (such as K-means, hierarchical clustering, FCM and the like) to obtain a corresponding topic class; the prediction model construction module 32 substitutes each topic time sequence in each topic class into a prediction model (e.g., a Logistic model, an exponential smoothing model, an ARIMA model, a moving average model, etc.) for training to obtain an intra-class prediction model corresponding to each topic class; the prediction model library 33 stores the intra-class prediction models;

the new topic prediction part 4 is used for monitoring the development trend of each new topic time sequence, and comprises a similarity analysis module 41, a matching module 42 and a trend prediction module 43, wherein the similarity analysis module 41 calculates the similarity between the new topic time sequence and each complete topic time sequence in topic classes by adopting a sequence similarity method (such as Euclidean distance, dynamic bending distance, editing distance and the like), and takes an average value as the matching degree between the new topic and the topic classes; the matching module 42 screens out a set number of topic classes according to the sequence of the matching degree with the new topic from high to low; the trend prediction module 43 predicts the development trend of the new topic according to the intra-class prediction model for screening out the topic classes with the set number, preferably, the set number is in the range of 0.25 to 0.75 times of the total number of the topic classes classified by the clustering module 31.

The development trends of topics in each topic class in the topic clustering result of the hot topic early development trend prediction system are highly similar, in order to fully utilize the similarity of topics in the class to predict the development trend of a new topic, the invention establishes an intra-class prediction model for each topic class of the topic clustering result, and repeatedly trains the intra-class prediction models of various classes by utilizing similar topic development trend data, so that the obtained intra-class prediction model is more robust, better in robustness and stronger in pertinence prediction capability, and can realize the early prediction of the new topic.

In one embodiment of the present invention, the determining unit 2 includes:

a fitting unit 21 for performing normalization processing on the topic time series in a set time period (last 24 hours) and fitting the normalized topic time series by using a least square method to obtain a slope of a fitting curve of the topic;

and the judging unit 22 is used for judging whether the slope is in a range of-0.02-0, and if the slope of the fitted curve of the topic is in the range, the topic enters a decline period.

In one embodiment of the present invention, the knowledge storage 3 further includes:

the BP neural network structure determining module 34 is used for determining the number of nodes of an input layer, a hidden layer and an output layer in the BP neural network structure, wherein n is the number of nodes of the input layer, l is the number of nodes of the hidden layer, and m is the number of nodes of the output layer;

a sample preprocessing module 35 for preprocessing each topicEach complete topic time sequence in the class is divided into a plurality of subsequences according to the number of nodes of an input layer and the number of nodes of an output layer, and one complete topic time sequence is set as Var ═ Var₁,Var₂,…,Var_t]Then the converted multiple subsequences are listed as,

each row is a subsequence and is used as input and output data of a training set or a test set in the future. The first n columns in each row are input data, the last m columns are output data, and each complete topic time sequence in each topic type is combined together according to rows after the preprocessing, so that a sample of each topic type is formed.

As shown in fig. 2, the prediction model building module 32 includes:

the BP neural network construction unit 321 calls the neural network structure determined by the BP neural network structure, and constructs a model, i.e., a prediction model, output by a hidden layer and an output layer of the neural network according to the following formulas (1) and (2), wherein:

wherein, w_ijIs the connection weight, w, of the ith node of the input layer and the jth node of the hidden layer_jkIs the connection weight of the jth node of the hidden layer and the kth node of the output layer, a_jThreshold for the jth node of the hidden layer, b_kIs the threshold of the kth node of the output layer, n is the number of nodes of the input layer, l is the number of nodes of the hidden layer, k is 1,2 … m, m is the number of nodes of the output layer, x_iIs a variable of the ith node of the input layer, h_jIs the output value of the jth node of the hidden layer, O_kIs the output value of the kth node of the output layer, and f is the excitation function

An initialization unit 322, which performs initial assignment on parameters of the BP neural network, where the parameters include a connection weight of a hidden layer and an output layer, a connection weight of an input layer and a hidden layer, a hidden layer threshold, and an output layer threshold;

a training set and test set segmentation unit 323, which samples the samples of each topic processed by the sample preprocessing module according to a set proportion and picks out a training set, and the rest are test sets, preferably, the proportion of the training set is 3/4-4/5;

a sample training unit 324 for training the model output by the hidden layer and the output layer of the neural network according to the training set samples, the sample training unit executing the following steps:

step 1, inputting a first sample of a training set;

step 2, substituting the input data of the sample into formulas (1) and (2), and calculating the output of each node of the hidden layer and the output of each node of the output layer;

step 3, calculating the error of each node of the output layer,

e_k＝y_k-o_k

wherein, y_kIs the actual value of the kth node of the sample, o_kIs the predicted value of the kth node of the sample;

and 4, updating the parameters of the BP neural network according to the following formulas (3) to (6) in sequence, wherein:

ω_ij＇＝ω_ij+αh_j(1-h_j)x_i(3)

ω_jk＇＝ω_jk+αh_je_j(4)

a_j＇＝a_j+αh_j(1-h_j)(5)

b_k＇＝b_k+e_k(6)

wherein, ω is_ij、ω_jk、a_jAnd b_kFor the BP neural network parameters before updating, omega_ij＇、ω_jk＇、a_j' and b_k' isUpdating the BP neural network parameters;

step 5, starting to train the next sample, and circulating the steps 2-5 until the training of all the training set samples is finished;

step 6, calculating a test error, substituting the input data of the test set into the BP neural network trained in the step to obtain a test error MSE of the trained BP neural network,

wherein N is the number of the test set samples,

to test the predicted value of the set sample Z at the kth node of the output layer,

the actual value of the kth node corresponding to the test set sample Z;

the BP neural network optimization termination condition determining unit 325 determines whether the BP neural network training satisfies a termination condition, and if the termination condition is satisfied, outputs an intra-class prediction model of the question class, where the intra-class prediction model includes structure, weight and threshold information of the BP neural network satisfying the termination condition, and if the termination condition is not satisfied, returns parameters of the BP neural network after the training update to the sample training unit to continue training the model, where the termination condition includes one or both of a first termination condition and/or a second termination condition, the first termination condition is that the current iteration number is greater than a set maximum iteration number (generally 1000-plus 5000), and the second termination condition is that the test error change of the BP neural network is smaller than a set target value during multiple iterations.

On one hand, the BP neural network algorithm is a gradient descent method essentially, so that the convergence speed of the algorithm is low, and on the other hand, the BP neural network is an optimization method of local search, and the weight of the network is gradually adjusted along the local improvement direction, so that the weight is converged to a local minimum value. In addition, BP neural networks are very sensitive to initial network weights, initializing the network with different weights, which tend to converge to different local minima. The invention introduces genetic algorithm into initial weight for determining BP neural network, improves convergence speed, avoids falling into local minimum, the genetic algorithm is from natural selection of Darwin and thought of genetics, through selecting, crossing and mutating individuals, the individuals have capability of global search solution, the genetic algorithm is introduced into initial weight and threshold value for determining BP neural network, can improve algorithm convergence speed, avoids falling into local minimum, specifically, the initialization unit 322 comprises:

the initial population setting subunit 3221 randomly generates an initial population of P individuals, where the population scale is P, and G ═ G (G)₁,G₂,…,G_p)^TSelecting a certain symmetry interval [ -W, W](e.g., W)<Random real numbers in 3) constitute a real number vector of length S, given to the individuals G in the population_i＝(g₁,g₂,…,g_s),i＝1,2,…,P,S＝n*l+l*m+l+m，g_sIs an individual G_iThe S gene of (1);

the individual evaluation subunit 3222 determines an evaluation function of the individual of the initial population, uses each gene of each individual as a connection weight of the hidden layer and the output layer of the initialization unit, a connection weight of the initial input layer and the hidden layer, an initial hidden layer threshold and an initial assignment of the initial output layer threshold, and obtains fitness of each individual through a prediction model construction module, so as to obtain fitness of each individual by using the individual G in the initial population G_iFitness value of

For illustration, specifically:

wherein the content of the first and second substances,

is an individual G_iFor BP nerveMean square error when the parameters of the network are initially assigned;

the individual selecting subunit 3223 selects, by using a roulette operator, individuals in the initial population based on a fitness proportion selecting strategy, which specifically includes:

the selection probability calculating unit 32231 calculates the probability that each individual is selected

Comprises the following steps:

wherein

The cumulative probability calculating unit 32232 calculates the individual cumulative probability q_jComprises the following steps:

pseudo-random number generation unit 32233 at [0, 1 ]]Generating P uniformly distributed pseudo random numbers (r) in the interval₁,r₂,…,r_i,…,r_p) P is the size of the population;

the individual selecting unit 32234 sequentially selects, P times, individuals in the initial population according to the P random numbers to obtain a selected individual, specifically: if r₁＜q₁Then, 1 st time, the selected individual is G₁Otherwise, the individual is selected as G for the 1 st time_kSo that q is_k-1＜r₁≤q_kIt is true, assume that the 1 st selected individual is G_kThen, r is determined at the 2 nd selection₂Whether or not less than q_kRepeating the selection similar to the first time, and obtaining the selected individual G after p times of selection_u；

The cross subunit 3224 performs cross update on the individuals selected by the individual selecting subunit, performs cross update on the updated individual, uses the maximum value of each updated gene as the last point of the gene, and performs cross update on each updated geneThe minimum value of each gene is used as the lower bound of the gene, and selected individuals G of the initial population G are used as_uFor illustration, the cross operation is performed at r _ pick position by using each of the other individuals of the initial population G, and the v-th individual G of the initial population G_vAnd selecting the individual G_uSelecting individuals from the v-th cross

Wherein r _ pick is a random integer in the interval [0, S ], S is the chromosome length; selecting genes of individuals, wherein the maximum value corresponding to the crossed selected individuals is the upper limit of the genes, and the minimum value is the lower limit of the genes;

the mutation subunit 3225 performs mutation operation on the genes in the individuals after the crossover subunit to obtain mutated individuals, substitutes the mutated individuals into the individual evaluation subunit, and evolves the initial population, for example, by selecting the individual G_uJ gene g of (2)_jPerforming mutation operation, and selecting individual G according to each gene pair after mutation_uUpdating to evolve the initial population, wherein the g after mutation_j' is:

wherein, g_jTo select an individual G_uThe jth gene of (1), g_jmaxAnd g_jminIs gene g_jUpper and lower bounds of r_pTo select an individual G_uPseudo random number generated P times by the pseudo random number generating unit, iter_nowIs the current evolutionary algebra, iter_maxIs the set maximum evolution algebra;

the genetic algorithm optimization termination condition determining subunit 3226 determines whether the genetic algorithm meets an algorithm termination condition, and if the genetic algorithm meets the algorithm termination condition, outputs an optimal population individual as a connection weight of the hidden layer and the output layer of the initialization unit, a connection weight of the input layer and the hidden layer, a threshold of the hidden layer, and a final initial value of the threshold of the output layer, and if the algorithm termination condition is not met, returns to the individual evaluation subunit, where the algorithm termination condition includes a first algorithm termination condition or/and a second algorithm termination condition, the first algorithm termination condition is that a current evolution algebra is greater than a set maximum evolution algebra (generally 20-500), and the second algorithm termination condition is that a change in individual fitness value is less than a set target value when evolution is performed for a plurality of times continuously.

The hot topic early development trend prediction system introduces Genetic Algorithm (GA) to optimize initial parameters of a prediction model of the neural network, optimizes the determination of the initial parameters of the BP neural network, and obtains the neural network prediction model (GABP) optimized by the Genetic Algorithm, so that the prediction accuracy of the traditional BP neural network used for topic development trend prediction is greatly improved.

In an embodiment of the present invention, the similarity analysis module 41 of the system for predicting the early development tendency of the hot topic includes:

a window setting unit 411 for setting the size of a window according to the length of the new topic time series;

the segmentation sequence distance calculation unit 413 is used for calculating the distance between the time sequence of the new topic and the subsequence of each completion time sequence in the topic class, which corresponds to the window size, in a set window, wherein the distance is the Euclidean distance or the dynamic bending distance;

the similarity determining unit 414 determines a minimum value of distances between the plurality of subsequences in each completion time sequence and the new topic time sequence as a similarity between the new topic and the completion time sequence.

Preferably, the similarity analysis module 41 further includes a window scaling unit 412, which changes the size of the window set by the window setting unit within the setting range.

In an embodiment of the present invention, the trend prediction module 43 of the early development trend prediction system of the hot topic according to the present invention, as shown in fig. 1, includes:

the calling unit 431 is used for calling the intra-class prediction models corresponding to the screened topic classes with the set number, and substituting the new topic time sequences into the intra-class prediction models for prediction to obtain the predicted values of the set number corresponding to the future time of the new topic;

and the assigning unit 433 is configured to assign different weight values to the predicted values of the set number of different intra-class prediction models of different topic classes to combine the predicted values to obtain a predicted value of a new topic at a future time, wherein the sum of the weight values is 1, and the weight value of the predicted value of the intra-class prediction model of the topic class with the high matching degree is not less than the weight value of the predicted value of the intra-class prediction model of the topic class with the low matching degree.

Preferably, the trend prediction module 43 further comprises:

the eliminating unit 432 eliminates the topic class with the largest predicted value called by the calling unit, and sends the other topic classes to the assigning unit 433.

Fig. 4 is a flowchart of the method for predicting the early development trend of the hot topic, as shown in fig. 4, the method for predicting the early development trend of the hot topic includes:

step S1, collecting topics from a network and a microblog, and constructing a topic time sequence corresponding to each topic, wherein the topic time sequence is a time sequence formed by topic reading amounts corresponding to different collection times;

step S2, judging whether each topic time sequence enters a decline period;

if the topic time sequence enters the decline period, in step S3, classifying the topic time sequence as a complete topic time sequence by using a clustering method to obtain different topic classes, and substituting each topic time sequence in each topic class into a prediction model for training to obtain an intra-class prediction model corresponding to each topic class;

if the topic time series does not enter the decline period, its trend is monitored as a new topic at step 4.

In an embodiment of the present invention, in step 1, when the occurrence of a topic is monitored, the heat (topic reading amount) x of the topic is collected every set period T (for example, T takes 4 hours)_njFinally, a topic time sequence is obtained

Wherein n represents the number of days from the topic collection date to the topic occurrence date, j represents the jth collection point in a certain day of the topic, b_stIndicates the sample length, l, of day 1 after the topic occurred_stThe length of the sample of the day when the topic enters the decline period is shown, n is more than or equal to 1, j is more than or equal to 1,

preferably, the sample data of the first day is supplemented before the topic time series occurs

0, indicating which time of day the topic was in just happening. Since the development rules of topics are not completely the same when the topics occur in different time periods of a day such as the morning, the afternoon and the evening, the information is effectively reserved through the operation of supplementing 0.

In an embodiment of the present invention, the step S2 includes:

normalizing the topic time sequence in a set time period (last 24 hours), and fitting by adopting a least square method to obtain the slope of a fitting curve of the topic;

judging whether the slope is in the range of-0.02-0, and if the slope of the fitted curve of the topic is in the range, enabling the topic to enter a decline period.

In step S3, the complete topic time sequence is classified by using a clustering method, and the method for obtaining different topic classes may be based on Euclidean distance (Euc) or Dynamic time warping Distance (DTW) and perform clustering by using K-means, FCM (fuzzy C mean), hierarchical clustering, and various improved algorithms based on basic algorithms, such as K _ SC (K-spectral central), WKSC (Wavelet-based K _ SC), and the like.

However, the distance matching is performed by the traditional euclidean distance across days, which easily increases the distance or causes information confusion, preferably, a segmented euclidean distance is adopted, the segmented euclidean distance segments the time sequence according to a 'natural day', then the segmented euclidean distance is calculated, and finally the segmented distance is integrated (the first day of the two sequences is aligned by adopting 0 filling, and the last day and the following days of the shorter sequence are aligned by adopting 0 filling and the longer sequence), so that the distance increase caused by the distance matching performed across days is prevented.

The traditional DTW carries out bending on a time axis in a day-crossing manner, information in different days after topics are disordered, and preferably, a segmentation dynamic time bending distance (S-DTW) is obtained. On one hand, the disorder caused by information alignment in different days is effectively avoided by segmenting according to 'natural days', on the other hand, the development rules of topics at different moments in the same day are similar, so that data in the same day can be properly subjected to telescopic transformation on a time axis to enable the sequence to be matched to the minimum distance.

In addition, preferably, the complete topic time series is clustered by adopting a hierarchical clustering algorithm, which includes:

classifying each complete topic time sequence into one class, and measuring the distance between classes by adopting the maximum S-DTW between classes;

respectively finding two closest classes of the maximum S-DTW between the classes, and combining the two closest classes into one class, wherein the total number of the classes is reduced by one;

calculating the contour coefficient of the sub-cluster, contour coefficient s_iThe integrated contour coefficients for all samples for each clustering, for example:

m_iand n_iRespectively elements in different complete topic time sequences;

repeating the steps to obtain a curve of the profile coefficient changing along with the number of clusters, observing whether the curve has an extreme point, taking the number of clusters corresponding to the maximum value or the maximum value of the profile coefficient as the optimal number of clusters, and taking the clustering result corresponding to the maximum value or the maximum value of the profile coefficient as the classification result of each topic.

In step S3, as shown in fig. 5, the method for obtaining the intra-class prediction model corresponding to each topic class by substituting each topic time series in each topic class into the prediction model and training includes:

step S31, determining the node numbers of an input layer, a hidden layer and an output layer in the BP neural network structure, wherein n is the node number of the input layer, l is the node number of the hidden layer, and m is the node number of the output layer;

step 32, constructing a model, namely a prediction model, of the hidden layer and the output layer output of the neural network according to the following formulas (1) and (2), wherein:

Step S33, dividing each complete topic time sequence in each topic into a plurality of subsequences according to the number of nodes of the input layer and the number of nodes of the output layer, and setting a complete topic time sequenceColumn is Var ═ Var₁,Var₂,…,Var_t]Then the converted multiple subsequences are listed as,

each row is a subsequence and is used as input and output data of a training set or a test set in the future. The first n columns in each row are input data, the last m columns are output data, and each complete topic time sequence in each topic is combined together according to the rows after the preprocessing, so that a sample of each topic class is formed;

step S34, sampling the samples of each topic in line according to a set proportion to pick out a training set, and taking the rest as a test set;

step S35, carrying out initial assignment on parameters of the BP neural network, wherein the parameters comprise a connection weight of a hidden layer and an output layer, a connection weight of an input layer and the hidden layer, a hidden layer threshold and an output layer threshold;

step S36, substituting the training set into the model output by the hidden layer and the output layer of the BP neural network for training, comprising: step 1, inputting a first sample of a training set;

step 3, calculating the error of each node of the output layer,

e_k＝y_k-o_k

ω_ij＇＝ω_ij+αh_j(1-h_j)x_i(3)

ω_jk＇＝ω_jk+αh_je_k(4)

a_j＇＝a_j+αh_j(1-h_j)(5)

b_k＇＝b_k+e_k(6)

wherein, ω is_ij、ω_jk、a_jAnd b_kFor the BP neural network parameters before updating, omega_ij＇、ω_jk＇、a_j' and b_k' is the updated BP neural network parameter;

wherein N is the number of the test set samples,

the actual value of the kth node corresponding to the test set sample Z;

step S37, judging whether the BP neural network training meets an end condition, wherein the end condition comprises one or two of a first end condition and/or a second end condition, the first end condition is that the current iteration time is more than the set maximum iteration time, and the second end condition is that the test error change of the BP neural network is less than the set target value when the BP neural network training is continuously iterated for multiple times;

if the end condition is satisfied, in step S38, outputting an intra-class prediction model of the topic class, where the intra-class prediction model includes the structure, weight, and threshold information of the BP neural network that satisfies the end condition;

and if the end condition is not met, returning to the step S35, and returning the updated parameters of the BP neural network after training to the step of carrying out initial assignment on the parameters of the BP neural network for circular training until the end condition is met.

Preferably, in step S35, as shown in fig. 5, the method includes:

step S351, assuming that the population size is P, randomly generates an initial population of P individuals, where G ═ G (G)₁,G₂,…,G_p)^TSelecting a certain symmetry interval [ -W, W]The random real numbers in the population form a real number vector with the length S_i＝(g₁,g₂,…,g_s),＝1,2,…,P,S＝n*l+l*m+l+m，g_sIs an individual G_iThe S gene of (1);

step S352, using each gene of each individual as a connection weight of a hidden layer and an output layer of the BP neural network, a connection weight of an initial input layer and a hidden layer, an initial hidden layer threshold, and an initial assignment of an initial output layer threshold, respectively, substituting a sample belonging to each topic class into a prediction model for training, and obtaining an output of each node of the output layer corresponding to each sample, thereby obtaining a fitness of each individual, wherein:

wherein MSE is the individual G_iThe network global error when the parameters of the BP neural network are initially assigned is

Is an individual G in an initial population G_iThe fitness of (1), Z is the sample index, and N is the total number of samples;

step S323, selecting individuals in the initial population by adopting a roulette operator and a selection strategy based on fitness proportion to obtain selected individuals G_u；

Step S324, adopting a single-point crossover operator to perform crossover updating on the selected individuals, performing crossover updating, taking the maximum value of each updated gene as the upper bound of the gene, and taking the minimum value of each updated gene as the lower bound of the gene;

step S325, performing mutation operation on the selected individuals subjected to cross update to obtain mutated individuals, substituting the mutated individuals into the individual evaluation subunit, and evolving the initial population, wherein:

step S326, judging whether the genetic algorithm meets an algorithm ending condition, wherein the algorithm ending condition comprises a first algorithm ending condition or/and a second algorithm ending condition, the first algorithm ending condition is that the current evolution algebra is larger than a set maximum evolution algebra, and the second algorithm ending condition is that the variation of the individual fitness value is smaller than a set target value when the evolution is carried out for multiple times continuously;

if the algorithm ending condition is met, returning to the step S321, performing initial assignment on the parameters of the BP neural network of the evolved initial seed group, and repeating the steps until the algorithm ending condition is met;

if the algorithm end condition is satisfied, in step S327, the optimal population is output as the final initial values of the connection weight of the hidden layer and the output layer, the connection weight of the input layer and the hidden layer, the threshold of the hidden layer, and the threshold of the output layer.

In an embodiment of the present invention, the step S4 includes:

step S41, analyzing the time sequence of the new topic and the time sequence of each complete topic in the topic class by adopting a sequence similarity methodThe similarity of the new topic is taken as the matching degree of the new topic and the topic class, and the ith class C is set_iWherein, there are lp samples, the similarity between the new topic time sequence and each completed topic similarity sequence is similarity_kAnd k is 1,2, …, lp, the average similarity between the new topic time series and each completed topic similarity series (the new topic time series and the topic class C)_iIs a degree of matching) of

Step S42, screening a set number c of topic categories in descending order of matching degree with the new topic, wherein the set number c is preferably 0.25-0.75 times of the total number of topic categories determined in step S3;

step S43, calling the intra-class prediction models corresponding to the screened topic classes with the set number, and respectively substituting the new topic time sequences into the intra-class prediction models for prediction to obtain the predicted values of the set number corresponding to the future time of the new topic;

and step S44, the predicted values of the set number of different topic class intra-class prediction models are endowed with different weight values to be combined to obtain the predicted value of the new topic at the future time, the sum of the weight values is 1, and the weight value of the predicted value of the topic class intra-class prediction model with high matching degree is not less than that of the predicted value of the topic class intra-class prediction model with low matching degree.

According to the method for predicting the early development trend of the hot topic, the category c before the similarity replaces the category with the highest similarity to serve as the category matched with the new topic, as shown in FIG. 6, the topic A and the topic B belong to different categories, the matching degrees of the topic X and the topics A and B are high, but the later development trend of the topic X is unknown. The matching method of the method reserves the category of c before the matching degree, namely reserves various subsequent possible development trends of the topic X as far as possible.

In addition, the prediction value of the prediction model of the category with higher matching degree should be closer to trueThe prediction method of the invention adopts a method of weighting and combining the predicted values of the prediction models of the categories c before the matching degree to predict the development trend of the new topic, gives higher weight to the predicted value of the prediction model of the category with the highest matching degree, obtains c categories with the highest matching degree after the new topic is matched with the existing category, predicts the data of the points under the sample by using the GABP prediction models of the c categories to obtain c predicted values, and sets the predicted values of the prediction models of the

Claims

1. A hot topic early development trend prediction system is characterized by comprising:

the system comprises an acquisition part, a search part and a search part, wherein the acquisition part is used for acquiring topics from a network and a microblog and constructing a topic time sequence, and the topic time sequence is a time sequence formed by topic reading amounts corresponding to different acquisition moments;

the judging part is used for judging whether the topic time sequence enters the decline period or not, sending the topic time sequence entering the decline period to the knowledge storage part as a complete topic time sequence, and sending each topic time sequence not entering the decline period to the new topic predicting part as each new topic time sequence, wherein the judging part selects the topic time sequence of a past period of time, and fits a linear slope by a least square method, and if the slope is within a certain range, the topic time sequence enters the decline period;

the knowledge storage part comprises a clustering module, a prediction model building module and a prediction model library, wherein the clustering module classifies the complete topic time sequence by adopting a clustering method to obtain corresponding topic classes, the prediction model building module substitutes each topic time sequence in each topic class into a prediction model to train to obtain an intra-class prediction model corresponding to each topic class, and the prediction model library stores the intra-class prediction models of each topic class;

the new topic prediction part is used for monitoring the development trend of each new topic time sequence and comprises a similarity analysis module, a matching module and a trend prediction module, wherein the similarity analysis module calculates the similarity between the new topic time sequence and each complete topic time sequence in topic classes by adopting a sequence similarity method, an average value is taken as the matching degree between the new topic and the topic classes, and the matching module screens the topic classes with a set number according to the sequence of the matching degree between the new topic and the topic; the trend prediction module predicts the development trend of the new topic according to an intra-class prediction model of the screened topic class,

the trend prediction module comprises a calling unit, a prediction unit and a trend prediction unit, wherein the calling unit is used for calling the intra-class prediction models corresponding to the screened topic classes with the set number, and the new topic time sequences are respectively substituted into the intra-class prediction models for prediction to obtain the predicted values of the set number corresponding to the future time of the new topic; and the assignment unit is used for assigning the predicted values of the set number of different topic class intra-class prediction models to different weight values to be combined to obtain the predicted value of the new topic at the future time, the sum of the weight values is 1, and the weight value of the predicted value of the topic class intra-class prediction model with high matching degree is not less than that of the predicted value of the topic class intra-class prediction model with low matching degree.

2. The system for predicting the early-stage development tendency of the hot topic as claimed in claim 1, wherein the similarity analysis module comprises:

a window setting unit for setting the size of the window according to the length of the new topic time sequence;

the segmentation sequence distance calculation unit is used for calculating the distance between the time sequence of the new topic and the subsequence of each complete topic time sequence in the topic class corresponding to the window size in a set window, wherein the distance is the Euclidean distance or the dynamic bending distance;

and the similarity determining unit is used for taking the minimum value of the distances between the subsequences with the sizes of the plurality of corresponding windows in each complete topic time sequence and the new topic time sequence as the similarity of the new topic and the complete topic time sequence.

3. The system for predicting the early development tendency of the hot topic as claimed in claim 2, wherein the similarity analysis module further comprises a window expansion unit for changing the size of the window set by the window setting unit within a set range.

4. The system for predicting the early-stage development tendency of the hot topic as claimed in claim 1, wherein the tendency prediction module further comprises:

and the eliminating unit is used for eliminating the topic class with the largest predicted value called by the calling unit and sending the rest topic classes to the assignment unit.

5. The system for predicting the early development tendency of the hot topics as claimed in claim 1, further comprising a BP neural network structure determination module for determining the number of nodes of the input layer, the hidden layer and the output layer in the BP neural network structure, wherein n is the number of nodes of the input layer, l is the number of nodes of the hidden layer, and m is the number of nodes of the output layer.

6. The system of claim 5, further comprising a sample preprocessing module, wherein each complete topic time sequence in each topic class is divided into a plurality of subsequences according to the number of nodes in the input layer and the number of nodes in the output layer, and one complete topic time sequence is set to Var ═ Var [ Var ═ r₁，Var₂，...，Var_t]Then the converted multiple subsequences are listed as,

each row is a subsequence and is used as input and output data of a training set or a test set in the future, the first n columns in each row are input data, the last m columns are output data, and each complete topic time sequence in each topic is combined together according to the rows after the preprocessing, so that a sample of each topic class is formed.

7. The system for predicting the early development tendency of the hot topic as claimed in claim 6, wherein the prediction model building module comprises:

the BP neural network construction unit calls the neural network structure determined by the BP neural network structure, and constructs a model of the output of the hidden layer and the output layer of the neural network according to the following formulas (1) and (2), wherein:

The initialization unit is used for carrying out initial assignment on parameters of the BP neural network, wherein the parameters comprise a connection weight of a hidden layer and an output layer, a connection weight of an input layer and the hidden layer, a threshold of the hidden layer and a threshold of the output layer;

a training set and test set segmentation unit, which samples the samples of each topic processed by the sample preprocessing module according to a set proportion and picks out the training set, and the rest are test sets;

the sample training unit trains the models output by the hidden layer and the output layer of the neural network according to the training set samples, and executes the following steps:

step 1, inputting a first sample of a training set;

step 3, calculating the error of each node of the output layer,

e_k＝y_k-o_k

ω_ij′＝ω_ij+αh_j(1-h_j)x_i(3)

ω_jk′＝ω_jk+αh_je_k(4)

a_j′＝a_j+αh_j(1-h_j) (5)

b_k′＝b_k+e_k(6)

wherein, ω is_ij、ω_jk、a_jAnd b_kFor the BP neural network parameters before updating, omega_ij′、ω_jk′、a_j' and b_k' is the updated BP neural network parameter;

wherein N is the number of the test set samples,

the actual value of the kth node corresponding to the test set sample Z;

and the BP neural network optimization termination condition judgment unit is used for judging whether the BP neural network training meets the termination condition or not, if so, outputting an intra-class prediction model of the question class, wherein the intra-class prediction model comprises the structure, weight and threshold information of the BP neural network meeting the termination condition, and if not, returning the BP neural network parameters after the training and updating to the sample training unit to continue training the model, wherein the termination condition comprises one or two of a first termination condition and/or a second termination condition, the first termination condition is that the current iteration frequency is greater than the set maximum iteration frequency, and the second termination condition is that the test error change of the BP neural network is smaller than the set target value when the BP neural network is continuously iterated for multiple times.

8. The system for predicting the early development tendency of hot topics as claimed in claim 7, wherein,

the initialization unit comprises an initial population setting subunit, an individual evaluation subunit, an individual selection subunit, a cross subunit, a variation subunit and a genetic algorithm optimization termination condition judgment subunit, wherein,

the initial population setting subunit sets the population scale to be P, randomly generates an initial population of P individuals, and generates G ═ G₁，G₂，...，G_p)^TSelecting a certain symmetry interval [ -W, W]The random real numbers in the population form a real number vector with the length S_i＝(g₁，g₂，...，g_s)，i＝1，2，...，P，S＝n*l+l*m+l+m，g_sIs an individual G_iThe S gene of (1);

the individual evaluation subunit determines an evaluation function of the individual of the initial population, takes each gene of each individual as the initial assignment of the connection weight of the hidden layer and the output layer of the initialization unit, the connection weight of the initial input layer and the hidden layer, the initial hidden layer threshold and the initial output layer threshold, and obtains the fitness of each individual through a prediction model building module, wherein:

wherein the content of the first and second substances,

is an individual G_iThe mean square error of the test when the initial assignment is made to the BP neural network of the initialization unit,

is an individual G in an initial population G_iA fitness value of;

the individual selection subunit selects the individuals in the initial population by adopting a roulette operator based on a selection strategy of fitness proportion to obtain a selected individual G_u；

The cross subunit adopts a single-point cross operator to perform cross updating on the individuals selected by the individual selection subunit, the maximum value of each updated gene is used as the upper bound of the gene, and the minimum value of each updated gene is used as the lower bound of the gene;

the mutation subunit performs mutation operation on the genes in the individuals after passing through the cross subunit to obtain the mutated individuals, substitutes the mutated individuals into the individual evaluation subunit, and evolves the initial population, wherein:

the genetic algorithm optimization termination condition judgment subunit judges whether the genetic algorithm meets an algorithm termination condition, if the genetic algorithm meets the algorithm termination condition, the optimal population individuals are output and used as the connection weight of a hidden layer and an output layer of the initialization unit, the connection weight of an input layer and the hidden layer, a hidden layer threshold and a final initial value of the output layer threshold, and if the genetic algorithm termination condition does not meet the algorithm termination condition, the optimal population individuals are returned to the individual evaluation subunit.

9. A method for predicting the early development trend of a hot topic is characterized by comprising the following steps:

collecting topics from a network and a microblog, and constructing a topic time sequence corresponding to each topic, wherein the topic time sequence is a time sequence formed by topic reading amounts corresponding to different collection times;

judging whether each topic time sequence enters a decline period or not, wherein the judgment comprises the following steps: selecting a topic time sequence of a past period of time, fitting a linear slope by using a least square method, and if the slope is within a certain range, entering a decline period by the topic time sequence;

if the topic time sequence enters a decline period, classifying the topic time sequence as a complete topic time sequence by adopting a clustering method to obtain different topic classes, substituting each topic time sequence in each topic class into a prediction model for training to obtain an intra-class prediction model corresponding to each topic class;

if the topic time series does not enter the decline period, monitoring the development trend of the topic as a new topic, wherein the monitoring comprises the following steps: analyzing the similarity between the time sequence of the new topic and each complete topic time sequence in the topic class by adopting a sequence similarity method, and taking an average value as the matching degree of the new topic and the topic class; screening out a set number of topic classes according to the sequence of high-to-low matching degree with the new topic; calling the intra-class prediction models corresponding to the screened topic classes with the set number, and respectively substituting the new topic time sequences into the intra-class prediction models for prediction to obtain the predicted values of the set number corresponding to the future time of the new topics; and giving different weight values to the predicted values of the set number of different topic class intra-class prediction models for combination to obtain a predicted value of a new topic at a future time, wherein the sum of the weight values is 1, and the weight value of the predicted value of the topic class intra-class prediction model with high matching degree is not less than that of the predicted value of the topic class intra-class prediction model with low matching degree.

10. The method for predicting the early development trend of the hot topic as claimed in claim 9, wherein the method for analyzing the similarity between the time series of the new topic and each complete topic time series in the topic class by using the sequence similarity method comprises:

setting the size of a window according to the length of the new topic time sequence;

calculating the distance between the time sequence of the new topic and the subsequence of each complete topic time sequence in the topic class corresponding to the window size in a set window, wherein the distance is an Euclidean distance or a dynamic bending distance;

and taking the minimum value of the distances between the subsequences in each complete topic time sequence and the new topic time sequence as the similarity of the new topic and the complete topic time sequence.

11. The method for predicting the early development tendency of the hot topic as claimed in claim 10, further comprising: and the size of the window is expanded and contracted within the length setting range of the new topic time sequence, and the minimum value of the distances between the new topic time sequence in the expansion and contraction range of the window and a plurality of subsequences of the complete topic time sequence in the topic class is used as the similarity of the new topic and the complete topic time sequence.

12. The method for predicting the early development tendency of the hot topic as claimed in claim 9, further comprising:

and eliminating the topic class with the maximum called predicted value, and giving different weighted values to the predicted values of the other called topic classes for combination to obtain the predicted value of the new topic at the future time.

13. The method for predicting the early development trend of the hot topics as claimed in claim 9, wherein the step of substituting each topic time sequence in each topic class into the prediction model for training to obtain the intra-class prediction model corresponding to each topic class comprises:

determining the number of nodes of an input layer, a hidden layer and an output layer in a BP neural network structure, wherein n is the number of nodes of the input layer, l is the number of nodes of the hidden layer, and m is the number of nodes of the output layer;

constructing a model of the hidden layer and output layer outputs of the neural network according to equations (1) and (2) below, wherein:

Carrying out initial assignment on parameters of the BP neural network, wherein the parameters comprise a connection weight of a hidden layer and an output layer, a connection weight of an input layer and the hidden layer, a threshold of the hidden layer and a threshold of the output layer;

dividing each complete topic time sequence in each topic class into a plurality of subsequences according to the number of nodes of an input layer and the number of nodes of an output layer, and setting a complete topic time sequence as Var ═ Var₁，Var₂，...，Var_t]Then the converted multiple subsequences are listed as,

each row is a subsequence and is used as input and output data of a training set or a test set in the future, the first n columns in each row are input data, the last m columns are output data, and each complete topic time sequence in each topic is combined together according to the row after being preprocessed to form a sample of each topic;

sampling samples of each topic class according to a set proportion and picking out a training set, and taking the rest as a test set;

substituting the training set into a model output by a hidden layer and an output layer of the BP neural network for training, and the training comprises the following steps:

step 1, inputting a first sample of a training set;

step 3, calculating the error of each node of the output layer,

e_k＝y_k-o_k

ω_ij′＝ω_ij+αh_j(1-h_j)x_i(3)

ω_jk′＝ω_jk+αh_je_k(4)

a_j′＝a_j+αh_j(1-h_j) (5)

b_k′＝b_k+e_k(6)

wherein N is the number of the test set samples,

the actual value of the kth node corresponding to the test set sample Z;

judging whether the BP neural network training meets an end condition or not, wherein the end condition comprises one or two of a first end condition and/or a second end condition, the first end condition is that the current iteration frequency is greater than a set maximum iteration frequency, and the second end condition is that the test error change of the BP neural network is less than a set target value when the BP neural network training is iterated for a plurality of times continuously;

if the ending condition is met, outputting an intra-class prediction model of the topic class, wherein the intra-class prediction model comprises the structure, weight and threshold information of the BP neural network meeting the ending condition;

and if the end condition is not met, returning the BP neural network parameters after the training update to the step of carrying out initial assignment on the parameters of the BP neural network for carrying out the circular training until the end condition is met.

14. The method for predicting the early development trend of the hot topic as claimed in claim 13, wherein the method for initially assigning the parameters of the BP neural network comprises:

assuming the population size is P, randomly generating an initial population of P individuals, and G ═ G₁，G₂，...，G_p)^TSelecting a symmetric interval [ -W, W]The random real numbers in the group constitute a real number vector of length S, and the individual G _ i ═ (G _1, G _ 2., G _ S), i ═ 1, 2., P, S ═ n × l × m + l + m, G ═ m_sIs an individual G_iThe S gene of (1);

taking each gene of each individual as a connection weight of a hidden layer and an output layer of a BP (back propagation) neural network, a connection weight of an initial input layer and the hidden layer, an initial hidden layer threshold and an initial assignment of an initial output layer threshold, respectively substituting samples belonging to each topic class into a model output by the hidden layer and the output layer of the neural network, and training to obtain the output of each node of the output layer corresponding to each sample, thereby obtaining the fitness of each individual, wherein:

selecting individuals in the initial population by adopting a roulette operator and a selection strategy based on fitness proportion to obtain selected individuals G_u；

Adopting a single-point crossover operator to carry out crossover updating on the selected individuals, taking the maximum value of each updated gene as the upper bound of the gene, and taking the minimum value of each updated gene as the lower bound of the gene;

carrying out variation operation on the selected individuals after cross updating to obtain the varied individuals, substituting the varied individuals into an individual evaluation subunit, and evolving the initial population, wherein:

judging whether the genetic algorithm meets an algorithm ending condition or not, wherein the algorithm ending condition comprises a first algorithm ending condition or/and a second algorithm ending condition, the first algorithm ending condition is that the current evolution algebra is larger than a set maximum evolution algebra, and the second algorithm ending condition is that the variation of the individual fitness value is smaller than a set target value when the evolution is continuously carried out for multiple times;

if the algorithm end condition is met, outputting the optimal population individuals as the connection weight of the hidden layer and the output layer, the connection weight of the input layer and the hidden layer, the threshold of the hidden layer and the final initial value of the threshold of the output layer;

and if the algorithm ending condition is not met, initially assigning values to the parameters of the BP neural network of the evolved initial seed group, and repeating the steps until the algorithm ending condition is met.