CN109829504B - Prediction method and system for analyzing user forwarding behavior based on ICS-SVM - Google Patents

Info

Publication number
CN109829504B
Authority
CN
China
Prior art keywords
user
bird nest
svm
algorithm
optimal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910114885.0A
Other languages
Chinese (zh)
Other versions
CN109829504A (en
Inventor
梁霞
肖云鹏
杜江
刘宴兵
谢小秋
朱耀堃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN201910114885.0A priority Critical patent/CN109829504B/en
Publication of CN109829504A publication Critical patent/CN109829504A/en
Application granted granted Critical
Publication of CN109829504B publication Critical patent/CN109829504B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a prediction method and system for analyzing user forwarding behavior based on ICS-SVM, belonging to the field of social network analysis. First, a data set is acquired. Second, the influencing factors are defined: the user's internal and external influence mechanisms are extracted from data collected from a real social network, Tencent microblog. Third, the cuckoo search (CS) algorithm is improved so that, according to a derived formula, the search step size is adjusted adaptively and dynamically. Finally, because user forwarding behavior changes over time, the method applies time slicing and uses the ICS-SVM model to predict forwarding behavior. This overcomes the shortcomings of optimizing SVM parameters with the traditional cuckoo algorithm and improves on the prediction accuracy of the traditional support vector machine. The invention predicts user forwarding behavior more accurately and supports analysis of the propagation trend of hot topics.

Description

Prediction method and system for analyzing user forwarding behavior based on ICS-SVM
Technical Field
The invention belongs to the field of social networks. It relates to optimizing SVM parameters with a cuckoo search algorithm and provides a method for predicting user forwarding behavior more accurately.
Background
In recent years, with the growing popularity of social networks such as Twitter, Facebook, and microblogs, social networks have become a main platform on which scholars at home and abroad study patterns of user behavior and analyze trends in topic popularity. Follow relationships among users, together with users' forwarding of and commenting on hot topics, drive the propagation of those topics, and the resulting mass of user behavior data is of great value. Analyzing user forwarding behavior is therefore significant for public opinion management, online marketing, topic detection, and similar tasks, and has attracted wide attention in academia and industry.
Current work on predicting user behavior around hot topics focuses on three kinds of features: topic features, network structure features, and user features.
Prediction methods based on topic features mainly target topics with high forwarding volume and predict user forwarding behavior from the features of the topic. Yang et al. propose a factor graph prediction model based on the characteristics of forwarding on Twitter. Other work simulates interest-driven information dissemination in an online social network and analyzes whether a user joins a topic discussion according to the user's own interests. However, such methods tend to overlook the influence of individual user interest on forwarding behavior. In a real social network forwarding behavior is influenced by many factors, so predicting it from a single kind of feature is not accurate enough.
Prediction methods based on network structure features generally use a dynamical model. Wang et al. propose a novel SIR model and study its dynamics in homogeneous and heterogeneous network models using mean-field theory; their experiments show that intermediaries in the network have a certain influence on information propagation. However, constructing such a network structure is computationally expensive, and it is difficult to obtain the complete forwarding network and node states.
Prediction methods based on user features predict behavior from attributes such as the number of followers, whether the user is active, and the number of microblogs posted. Some evaluate the influence of a topic with time series, naive Bayes, or similar methods and predict user behavior from it, but the results are not accurate enough. Most scholars instead predict user behavior with an SVM whose parameters are tuned by an optimization algorithm. Most such algorithms are based on population iteration; they are easy to implement and can solve the SVM parameter optimization problem, but they converge slowly and easily fall into local optima, which makes the prediction results unstable.
Disclosure of Invention
The present invention aims to solve the above problems of the prior art. It provides a prediction method and system for analyzing user forwarding behavior based on ICS-SVM, which improves the accuracy of predicting user forwarding behavior and makes it possible to sense the changing trend of hot topics. The technical scheme of the invention is as follows:
A prediction method for analyzing user forwarding behavior based on ICS-SVM comprises the steps of acquiring data, defining influencing factors, improving the CS algorithm, and constructing the ICS-SVM model, specifically:
S1: Acquiring data: the data is downloaded from a web-based research recommender system or acquired through the API of a mature social platform, and is then preprocessed, for example by cleaning and de-duplication;
S2: Defining influencing factors: three attributes, namely the user interest tag, the user historical forwarding rate, and the external influence, are extracted from the data acquired in step S1, and the influence is defined by multiple linear regression;
S3: Improving the CS algorithm: on the basis of the traditional cuckoo search (CS) algorithm, the step size is generated by a derived formula, so that the search step size can be adjusted adaptively and dynamically;
S4: Constructing the ICS-SVM model: the improved cuckoo algorithm of step S3 is combined with a support vector machine (SVM); the improved cuckoo algorithm optimizes the SVM parameters, a prediction model is trained with the optimal parameters, user forwarding behavior is predicted using time slicing, and the propagation trend of hot topics is analyzed.
Further, step S2, in which the three attributes (user interest tag, user historical forwarding rate, and external influence) are extracted from the data acquired in step S1 and the factors affecting user forwarding are defined, comprises:
S21: Extracting the user's internal attributes: whether a user forwards a hot topic is influenced by the user's own factors, which are divided into two attributes, the user interest tag and the user historical forwarding rate. The definitions of these attributes may be modified to suit the characteristics of the data. Specifically:
User interest tag Interesttag($v_i$):
Figure BDA0001969772300000031
First, the interactions between each user and the accounts the user follows during the month before the topic was published are captured, the resulting tags are sorted in descending order, and the top 8 tags are taken. Second, keywords are extracted from the topic content and the Jaccard coefficient (formula (1)) is computed; the higher the result, the more interested the user is in the topic.
User historical forwarding rate ForwardingRate($v_i$):
The user's historical forwarding rate is the ratio of the number of microblogs the user forwarded in the month before the topic was published to the total number of microblogs the user published in that month, namely
$$\mathrm{ForwardingRate}(v_i) = \frac{\mathrm{retweetNum}(v_i)}{\mathrm{wholeNum}(v_i)} \tag{2}$$
where retweetNum($v_i$) is the number of microblogs forwarded by the user within one month before the topic, and wholeNum($v_i$) is the total number of microblogs published by the user in that month.
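As an illustration only, the two internal attributes of step S21 can be sketched in Python as follows. The function names, the example tags, and the handling of empty tag sets and zero totals are assumptions made for this sketch; the invention does not specify them.

```python
def interest_tag(user_tags, topic_keywords):
    """Jaccard coefficient between a user's top tags and the topic keywords."""
    tags, keywords = set(user_tags), set(topic_keywords)
    union = tags | keywords
    if not union:
        # Assumption: no tags and no keywords means no measurable interest.
        return 0.0
    return len(tags & keywords) / len(union)


def forwarding_rate(retweet_num, whole_num):
    """Share of the user's microblogs in the month that were forwards."""
    if whole_num == 0:
        # Assumption: a user who posted nothing gets a rate of zero.
        return 0.0
    return retweet_num / whole_num


# Hypothetical user with four tags and a topic with three keywords.
tags = ["sports", "music", "film", "travel"]
keywords = ["film", "music", "award"]
score = interest_tag(tags, keywords)   # 2 shared items out of 5 distinct items
rate = forwarding_rate(12, 48)         # 12 forwards out of 48 microblogs
```

Both values lie in [0, 1], so they can be used directly as features alongside the external influence attribute.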
S22: Extracting the user's external attributes: whether a user forwards a hot topic is influenced not only by the user's own factors but also by family and surrounding friends, so one externally influenced attribute is extracted from the captured data. Specifically:
External influence ExternalInfluence($v_i$):
The external influence comprises the influence of the user's friends and the influence of topic popularity:
Figure BDA0001969772300000033
where α and β are coefficients with α = 0.4 and β = 0.6, N is the total number of the user's friends, If($v_i$) is the amount of interaction between friends, and leadernum($v_i$) is the number of opinion leaders. An opinion leader, opinionleader($v_i$), is a user with strong influence in the social network who plays an important intermediary role; opinion leaders are identified with the PageRank algorithm. φ is an adjustable parameter. opinionleader($v_i$) is defined as follows:
Figure BDA0001969772300000041
where If denotes the amount of interaction between friends, which is normalized.
Further, improving the CS algorithm in step S3 specifically comprises:
Let $x_i^{(t)}$ denote the position of the i-th nest in generation t, and let $L(\lambda)$ denote a random search path. The iterative formula by which the cuckoo updates the nest position is:
$$x_i^{(t+1)} = x_i^{(t)} + \alpha \oplus L(\lambda) \tag{5}$$
where $\alpha$ is the step-size control quantity and $\oplus$ denotes point-to-point multiplication. The above is the traditional cuckoo search algorithm. The improved cuckoo search algorithm adjusts the step size adaptively and dynamically according to:
$$step_i = step_{\min} + (step_{\max} - step_{\min})\, d_i \tag{6}$$
where $step_{\max}$ is the maximum step size in CS, $step_{\min}$ is the minimum step size in CS, and $d_i$ is the normalized distance of the i-th nest from the optimal nest, defined as:
$$d_i = \frac{\lVert n_i - n_{best} \rVert}{d_{\max}} \tag{7}$$
where $n_i$ is the position of the i-th nest, $n_{best}$ is the optimal nest position, and $d_{\max}$ is the maximum distance between the optimal nest position and the other nest positions;
the improved cuckoo search algorithm adjusts the step size according to formula (6): if a nest is currently close to the optimal position, its step size is reduced; conversely, if it is far from the optimal position, its step size is increased.
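The adaptive step-size rule described above can be sketched in Python. This is a minimal sketch under stated assumptions: it assumes the step grows linearly from the minimum to the maximum step size with the normalized Euclidean distance to the best nest, and the function name, the example nests, and the zero-distance fallback are hypothetical.

```python
import math


def adaptive_steps(nests, best, step_min, step_max):
    """Per-nest step size that grows with distance from the best nest."""
    dists = [math.dist(nest, best) for nest in nests]
    d_max = max(dists)
    if d_max == 0:
        # Assumption: if every nest sits on the best position, use the minimum step.
        return [step_min for _ in nests]
    # d / d_max is the normalized distance d_i in [0, 1].
    return [step_min + (step_max - step_min) * d / d_max for d in dists]


# Hypothetical 2-D nests at distances 0, 5, and 10 from the best nest.
nests = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
best = (0.0, 0.0)
steps = adaptive_steps(nests, best, step_min=0.01, step_max=1.0)
```

The nest on the best position receives the minimum step, the farthest nest receives the maximum step, and the nest halfway between receives a step halfway between the two, which matches the behavior stated for formula (6).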
Further, the specific steps of constructing the ICS-SVM model (a support vector machine optimized by the improved cuckoo search algorithm) in step S4 are as follows:
① Capture the relevant data of three hot topics on Tencent microblog, preliminarily clean and organize the data, extract the relevant attributes, and divide the data into a training set and a test set;
② Initialize the penalty parameter C and the kernel function parameter σ of the SVM, and the minimum step size $step_{\min}$, the maximum step size $step_{\max}$, and the maximum number of iterations N of CS;
③ Let the probability that a nest owner discovers a cuckoo egg in its nest be $p_a \in [0,1]$, with initial value $p_a = 0.75$. Randomly generate n nest positions, train with the training set, compute the errors, and find and retain the optimal nest position;
④ Update the nest positions and the $p_a$ parameter using formulas (6) and (7), compare the errors with those of the old nests, and retain the better nest positions and their corresponding parameters;
⑤ Compare a random number r with $p_a$: among the better nest positions from the previous step, keep the nests with the smaller discovery probability and change the positions of the nests with the larger discovery probability; then compare errors to obtain a new set of better nest positions;
⑥ Find the optimal nest position and compare its error with the required precision. If the precision requirement is met, the optimal SVM parameters C and σ have been found; otherwise return to step ④ and continue iterating, stopping when a nest position meeting the precision requirement is found or the maximum number of iterations is exceeded;
⑦ Use the obtained optimal parameters C and σ as the parameter values of the SVM, train the SVM again with the training set to obtain the prediction model, and test the accuracy of the model with the test set.
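The search loop in the steps above can be sketched in Python as follows. This is a sketch under several assumptions rather than the invention's implementation: the error function is passed in by the caller (in practice it would train an SVM with the candidate C and σ and return its error, which is not shown here); a Gaussian draw stands in for the random search path; a nest is re-drawn at random when a random number r exceeds $p_a$; and a candidate replaces a nest only if its error is lower. All names, bounds, and the toy error function are hypothetical.

```python
import math
import random


def improved_cuckoo_search(error_fn, bounds, n_nests=15, pa=0.75,
                           step_min=0.01, step_max=1.0,
                           max_iter=100, tol=1e-6, seed=0):
    rng = random.Random(seed)

    def clip(pos):
        # Keep every coordinate inside its (low, high) bound.
        return [min(max(x, lo), hi) for x, (lo, hi) in zip(pos, bounds)]

    def random_nest():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    nests = [random_nest() for _ in range(n_nests)]
    errors = [error_fn(n) for n in nests]
    history = []  # best error seen at the start of each iteration

    for _ in range(max_iter):
        best_i = min(range(n_nests), key=errors.__getitem__)
        best = nests[best_i]
        history.append(errors[best_i])
        if errors[best_i] <= tol:
            break  # precision requirement met
        dists = [math.dist(n, best) for n in nests]
        d_max = max(dists) or 1.0
        for i in range(n_nests):
            if i == best_i:
                continue  # retain the current best nest unchanged
            # Adaptive step: small near the best nest, large far from it.
            step = step_min + (step_max - step_min) * dists[i] / d_max
            # Gaussian stand-in for the random search path (an assumption).
            cand = clip([x + step * rng.gauss(0.0, 1.0) for x in nests[i]])
            cand_err = error_fn(cand)
            if cand_err < errors[i]:
                nests[i], errors[i] = cand, cand_err
            # Discovery step: re-draw the nest when r exceeds pa.
            if rng.random() > pa:
                cand = random_nest()
                cand_err = error_fn(cand)
                if cand_err < errors[i]:
                    nests[i], errors[i] = cand, cand_err

    best_i = min(range(n_nests), key=errors.__getitem__)
    return nests[best_i], errors[best_i], history


def toy_error(params):
    # Hypothetical stand-in for the SVM error as a function of (C, sigma).
    c, sigma = params
    return (c - 10.0) ** 2 + (sigma - 1.0) ** 2


bounds = [(0.1, 100.0), (0.01, 10.0)]
best_params, best_err, history = improved_cuckoo_search(toy_error, bounds)
```

Because a nest is only ever replaced by a candidate with lower error, the best error recorded in `history` never increases from one iteration to the next. The returned `best_params` would then be used as C and σ when the SVM is retrained in step ⑦.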
A prediction system for analyzing user forwarding behavior based on ICS-SVM comprises a data acquisition module, an influencing factor definition module, an improved CS algorithm module, and an ICS-SVM model construction module, specifically:
The data acquisition module: downloads the data from a web-based research recommender system or acquires it through the API of a mature social platform, and preprocesses the data, for example by cleaning and de-duplication;
The influencing factor definition module: extracts three attributes, namely the user interest tag, the user historical forwarding rate, and the external influence, from the acquired data, and defines the influence by multiple linear regression;
The improved CS algorithm module: on the basis of the traditional cuckoo search (CS) algorithm, generates the step size by a derived formula, so that the search step size can be adjusted adaptively and dynamically;
The ICS-SVM model construction module: combines the improved cuckoo algorithm of the improved CS algorithm module with a support vector machine (SVM); the improved cuckoo algorithm optimizes the SVM parameters, a prediction model is trained with the optimal parameters, user forwarding behavior is predicted using time slicing, and the propagation trend of hot topics is analyzed.
Further, the influencing factor definition module, which extracts the three attributes (user interest tag, user historical forwarding rate, and external influence) from the acquired data and defines the factors affecting user forwarding, comprises:
S21: Extracting the user's internal attributes: whether a user forwards a hot topic is influenced by the user's own factors, which are divided into two attributes, the user interest tag and the user historical forwarding rate. The definitions of these attributes may be modified to suit the characteristics of the data. Specifically:
User interest tag Interesttag($v_i$):
Figure BDA0001969772300000061
First, the interactions between each user and the accounts the user follows during the month before the topic was published are captured, the resulting tags are sorted in descending order, and the top 8 tags are taken. Second, keywords are extracted from the topic content and the Jaccard coefficient (formula (1)) is computed; the higher the result, the more interested the user is in the topic.
User historical forwarding rate ForwardingRate($v_i$):
The user's historical forwarding rate is the ratio of the number of microblogs the user forwarded in the month before the topic was published to the total number of microblogs the user published in that month, namely
$$\mathrm{ForwardingRate}(v_i) = \frac{\mathrm{retweetNum}(v_i)}{\mathrm{wholeNum}(v_i)} \tag{2}$$
where retweetNum($v_i$) is the number of microblogs forwarded by the user within one month before the topic, and wholeNum($v_i$) is the total number of microblogs published by the user in that month.
S22: Extracting the user's external attributes: whether a user forwards a hot topic is influenced not only by the user's own factors but also by family and surrounding friends, so one externally influenced attribute is extracted from the captured data. Specifically:
External influence ExternalInfluence($v_i$):
The external influence comprises the influence of the user's friends and the influence of topic popularity:
Figure BDA0001969772300000063
where α and β are coefficients with α = 0.4 and β = 0.6, N is the total number of the user's friends, If($v_i$) is the amount of interaction between friends, and leadernum($v_i$) is the number of opinion leaders. An opinion leader, opinionleader($v_i$), is a user with strong influence in the social network who plays an important intermediary role; opinion leaders are identified with the PageRank algorithm. φ is an adjustable parameter. opinionleader($v_i$) is defined as follows:
Figure BDA0001969772300000071
where If denotes the amount of interaction between friends, which is normalized.
Further, the improved CS algorithm module specifically comprises:
Let $x_i^{(t)}$ denote the position of the i-th nest in generation t, and let $L(\lambda)$ denote a random search path. The iterative formula by which the cuckoo updates the nest position is:
$$x_i^{(t+1)} = x_i^{(t)} + \alpha \oplus L(\lambda) \tag{5}$$
where $\alpha$ is the step-size control quantity and $\oplus$ denotes point-to-point multiplication. The above is the traditional cuckoo search algorithm. The improved cuckoo search algorithm adjusts the step size adaptively and dynamically according to:
$$step_i = step_{\min} + (step_{\max} - step_{\min})\, d_i \tag{6}$$
where $step_{\max}$ is the maximum step size in CS, $step_{\min}$ is the minimum step size in CS, and $d_i$ is the normalized distance of the i-th nest from the optimal nest, defined as:
$$d_i = \frac{\lVert n_i - n_{best} \rVert}{d_{\max}} \tag{7}$$
where $n_i$ is the position of the i-th nest, $n_{best}$ is the optimal nest position, and $d_{\max}$ is the maximum distance between the optimal nest position and the other nest positions;
the improved cuckoo search algorithm adjusts the step size according to formula (6): if a nest is currently close to the optimal position, its step size is reduced; conversely, if it is far from the optimal position, its step size is increased.
Further, the ICS-SVM model construction module (a support vector machine optimized by the improved cuckoo search algorithm) specifically performs the following:
① Capture the relevant data of three hot topics on Tencent microblog, preliminarily clean and organize the data, extract the relevant attributes, and divide the data into a training set and a test set;
② Initialize the penalty parameter C and the kernel function parameter σ of the SVM, and the minimum step size $step_{\min}$, the maximum step size $step_{\max}$, and the maximum number of iterations N of CS;
③ Let the probability that a nest owner discovers a cuckoo egg in its nest be $p_a \in [0,1]$, with initial value $p_a = 0.75$. Randomly generate n nest positions, train with the training set, compute the errors, and find and retain the optimal nest position;
④ Update the nest positions and the $p_a$ parameter using formulas (6) and (7), compare the errors with those of the old nests, and retain the better nest positions and their corresponding parameters;
⑤ Compare a random number r with $p_a$: among the better nest positions from the previous step, keep the nests with the smaller discovery probability and change the positions of the nests with the larger discovery probability; then compare errors to obtain a new set of better nest positions;
⑥ Find the optimal nest position and compare its error with the required precision. If the precision requirement is met, the optimal SVM parameters C and σ have been found; otherwise return to step ④ and continue iterating, stopping when a nest position meeting the precision requirement is found or the maximum number of iterations is exceeded;
⑦ Use the obtained optimal parameters C and σ as the parameter values of the SVM, train the SVM again with the training set to obtain the prediction model, and test the accuracy of the model with the test set.
The invention has the following advantages and beneficial effects:
First, the invention optimizes the support vector machine with the improved cuckoo algorithm. This avoids the slow convergence and the tendency to fall into local optima that arise when the traditional cuckoo algorithm optimizes SVM parameters, and provides a sound basis for accurately predicting user forwarding behavior. Second, by extracting the user's internal driving mechanism and external influence mechanism, the invention analyzes the factors that influence forwarding behavior; this remedies the failure to consider user interest and outside influence, and better matches forwarding behavior in real life. Finally, by taking into account that forwarding behavior changes over time, the invention uses time slicing to improve the accuracy of predicting forwarding behavior and to sense the changing trend of hot topics.
Drawings
FIG. 1 is an overall flow diagram of the preferred embodiment of the present invention.
FIG. 2 is a diagram illustrating the definition of influencing factors according to the present invention.
Fig. 3 is a schematic diagram of the improved cuckoo algorithm of the present invention.
FIG. 4 is a schematic flow chart of ICS-SVM of the present invention.
Fig. 5 is a schematic diagram of the predicted user forwarding behavior of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described in detail and clearly with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical scheme for solving the technical problems is as follows:
FIG. 1 is a general flow chart of the present invention, which includes data acquisition, defining influencing factors, optimizing CS algorithm, and constructing ICS-SVM model. Specifically, the detailed implementation process of the invention comprises the following four steps:
S1: Acquire the data source. The data can be downloaded directly from a web-based research recommender system or acquired through the API of a mature social platform, and is then preprocessed.
S2: Extract the relevant attributes and define the influencing factors. Three attributes, namely the user interest tag, the user historical forwarding rate, and the external influence, are extracted from the existing data and the factors affecting user forwarding are defined, so that the user's forwarding of hot topics can be analyzed more comprehensively.
S3: Optimize the CS algorithm. CS has advantages such as a high convergence rate and high stability. However, the traditional cuckoo search algorithm generates its step size by Lévy flight, which is random, so convergence is slow. With the derived formula, the search step size is adjusted adaptively and dynamically, which speeds up convergence, makes the result more stable, and thereby optimizes CS.
S4: Construct the ICS-SVM model. The improved cuckoo algorithm is combined with the SVM; the improved cuckoo algorithm optimizes the SVM parameters, a prediction model is trained with the optimal parameters, user forwarding behavior is predicted using time slicing, and the propagation trend of hot topics is analyzed.
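The invention does not spell out how time slicing is performed. As one possible reading, forwarding events can be grouped into fixed-width time windows so that behavior is predicted slice by slice. The following Python sketch illustrates only that grouping; the function name, the event layout of (timestamp, user), and the 60-unit window are assumptions made for the example.

```python
from collections import defaultdict


def slice_by_time(events, start, width):
    """Group (timestamp, user) events into consecutive windows of equal width."""
    slices = defaultdict(list)
    for timestamp, user in events:
        if timestamp < start:
            continue  # ignore events before the observation period
        index = int((timestamp - start) // width)
        slices[index].append(user)
    return dict(slices)


# Hypothetical forwarding events: (seconds since topic publication, user id).
events = [(0, "a"), (30, "b"), (65, "c"), (119, "a"), (120, "d")]
slices = slice_by_time(events, start=0, width=60)
```

Each slice can then supply training samples for predicting forwarding in the following slice, and the number of forwards per slice gives a simple view of how the topic's propagation changes over time.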
Step S1 above acquires the data source from which the relevant attributes are extracted. It consists of the following step.
S11: Acquire data. Capture information about the users who forwarded or commented under three chosen hot topics, including follower information for the users who took part in the topic discussion.
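The cleaning and de-duplication mentioned for the captured data can be sketched in Python as follows. The record fields (`user_id`, `topic_id`, `text`) and the rule that a record without a user or text is dropped are assumptions made for this example; the invention does not define the record format.

```python
def preprocess(records):
    """Drop incomplete records and exact duplicates, keeping first occurrences."""
    seen = set()
    cleaned = []
    for rec in records:
        user = (rec.get("user_id") or "").strip()
        text = (rec.get("text") or "").strip()
        if not user or not text:
            continue  # cleaning: skip records missing a user or content
        key = (user, rec.get("topic_id"), text)
        if key in seen:
            continue  # de-duplication: same user, topic, and text
        seen.add(key)
        cleaned.append({**rec, "user_id": user, "text": text})
    return cleaned


# Hypothetical captured records.
raw = [
    {"user_id": "u1", "topic_id": "t1", "text": "great news "},
    {"user_id": "u1", "topic_id": "t1", "text": "great news"},   # duplicate after trimming
    {"user_id": "", "topic_id": "t1", "text": "orphan record"},  # no user id
    {"user_id": "u2", "topic_id": "t1", "text": "great news"},
]
cleaned = preprocess(raw)
```

Only the first record from `u1` and the record from `u2` survive, since the same text forwarded by a different user counts as a separate forwarding action.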
The definition of influencing factors in step S2 is illustrated in fig. 2 and is divided into the following 2 steps.
S21: Extract the user's internal attributes. Whether a user forwards a hot topic is influenced by the user's own factors, which are divided into two attributes, the user interest tag and the user historical forwarding rate. The definitions of these attributes may be modified to suit the characteristics of the data. Specifically:
1. User interest tag Interesttag($v_i$):
Figure BDA0001969772300000101
First, the interactions between each user and the accounts the user follows during the month before the topic was published are captured, the resulting tags are sorted in descending order, and the top 8 tags are taken. Second, keywords are extracted from the topic content and the Jaccard coefficient is computed; the higher the result, the more interested the user is in the topic.
2. User historical forwarding rate ForwardingRate($v_i$):
The user's historical forwarding rate is the ratio of the number of microblogs the user forwarded in the month before the topic was published to the total number of microblogs the user published in that month, namely
$$\mathrm{ForwardingRate}(v_i) = \frac{\mathrm{retweetNum}(v_i)}{\mathrm{wholeNum}(v_i)} \tag{2}$$
S22: Extract the user's external attributes. Whether a user forwards a hot topic is influenced not only by the user's own factors but also by family and surrounding friends, so the invention extracts one externally influenced attribute from the captured data. The definition of this attribute may be modified to suit the characteristics of the data. Specifically:
3. External influence ExternalInfluence($v_i$):
The external influence comprises the influence of the user's friends and the influence of topic popularity. If many of the friends a user follows are highly influential and the topic is very popular, the user is likely to forward the topic.
Figure BDA0001969772300000103
Here α and β are coefficients; α = 0.4 and β = 0.6 were determined experimentally. leadernum($v_i$) is the number of opinion leaders. An opinion leader, opinionleader($v_i$), is a user with strong influence in the social network who plays an important intermediary role; opinion leaders are identified with the PageRank algorithm. φ is an adjustable parameter. opinionleader($v_i$) is defined as follows:
Figure BDA0001969772300000112
where If denotes the amount of interaction between friends; it is normalized to simplify calculation and make the result more accurate.
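The text states that the interaction amount is normalized but does not name the scheme. A minimal Python sketch, assuming min-max scaling to [0, 1] (an assumption, as are the function name and the sample counts), is:

```python
def min_max_normalize(values):
    """Scale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: identical values carry no ranking information.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical interaction counts between a user and three friends.
interactions = [2, 10, 6]
normalized = min_max_normalize(interactions)
```

Scaling the interaction counts this way keeps them on the same [0, 1] range as the interest tag and forwarding rate attributes.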
The flow of optimizing the CS algorithm in step S3 is shown in fig. 3 and proceeds as follows:
Let $x_i^{(t)}$ denote the position of the i-th nest in generation t, and let $L(\lambda)$ denote a random search path. The iterative formula by which the cuckoo updates the nest position is:
$$x_i^{(t+1)} = x_i^{(t)} + \alpha \oplus L(\lambda) \tag{5}$$
where $\alpha$ is the step-size control quantity and $\oplus$ denotes point-to-point multiplication. The above is the traditional cuckoo search algorithm. The improved cuckoo search algorithm adjusts the step size adaptively and dynamically according to:
$$step_i = step_{\min} + (step_{\max} - step_{\min})\, d_i \tag{6}$$
where $step_{\max}$ and $step_{\min}$ are the maximum and minimum step sizes in CS, and $d_i$ is defined as follows:
$$d_i = \frac{\lVert n_i - n_{best} \rVert}{d_{\max}} \tag{7}$$
where $n_i$ is the position of the i-th nest, $n_{best}$ is the optimal nest position, and $d_{\max}$ is the maximum distance between the optimal nest position and the other nest positions.
The improved cuckoo search algorithm adjusts the step size according to formula (6): if a nest is currently close to the optimal position, its step size is reduced; conversely, if it is far from the optimal position, its step size is increased.
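For the traditional update, the random search path L(λ) is generated by Lévy flight. The invention does not say how the Lévy steps are sampled; a common choice is Mantegna's algorithm, sketched below in Python as an assumption. The function names, the exponent value 1.5, and the step-size control value 0.01 are illustrative only.

```python
import math
import random


def mantegna_sigma(lam):
    """Scale of the numerator Gaussian in Mantegna's algorithm."""
    num = math.gamma(1 + lam) * math.sin(math.pi * lam / 2)
    den = math.gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2)
    return (num / den) ** (1 / lam)


def levy_step(rng, lam=1.5):
    """Draw one heavy-tailed step length with Levy exponent lam."""
    u = rng.gauss(0.0, mantegna_sigma(lam))
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # guard against division by zero
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / lam)


def update_nest(position, alpha, rng, lam=1.5):
    """Traditional CS update: add a scaled Levy step to each coordinate."""
    return [x + alpha * levy_step(rng, lam) for x in position]


rng = random.Random(42)
new_position = update_nest([1.0, 2.0], alpha=0.01, rng=rng)
```

Most draws are small, but the heavy tail occasionally produces a very long jump. This is the randomness in step length that the adaptive step size is intended to bring under control.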
The flow of constructing the ICS-SVM model in step S4 is shown in Fig. 4, and the specific steps are as follows:
First, capturing relevant data of three hot topics from Tencent microblog, preliminarily cleaning and sorting the data, extracting the relevant attributes, and dividing the data into a training set and a test set.
Second, with reference to books and the literature, initializing the penalty parameter C and the kernel function parameter σ of the SVM, the minimum step size in the CS
Figure BDA0001969772300000121
the maximum step size
Figure BDA0001969772300000122
and the maximum number of iterations N.
Third, setting the probability p_a ∈ [0,1] that a bird nest owner discovers a cuckoo egg in its nest, with initial value p_a = 0.75; randomly generating n bird nest positions, training with the training set, calculating the errors, and finding and retaining the optimal bird nest position.
Fourth, using formula (6) and formula (7) to update the bird nest positions and parameters such as p_a, comparing the errors with those of the old bird nests, and retaining the better bird nest positions and their corresponding parameters.
Fifth, comparing a random number r with p_a; among the better bird nest positions from the previous step, retaining the bird nest positions with the smaller p_a and changing the bird nest positions with the larger p_a; then obtaining a new set of better bird nest positions by comparing the errors.
Sixth, finding the optimal bird nest position among those obtained in the fifth step and comparing its error with the required precision. If the precision requirement is met, the optimal SVM parameters C and σ have been found; otherwise, returning to the fourth step and continuing the iteration until a bird nest position meeting the precision requirement is found or the maximum number of iterations is exceeded, at which point the iteration stops.
Seventh, using the obtained optimal parameters C and σ as the parameter values of the SVM, training the SVM again with the training set to obtain the prediction model, and testing the accuracy of the model with the test set.
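The seven steps above can be sketched as a small search loop in Python. This is only an illustration under stated assumptions, not the patented implementation: the adaptive step uses the assumed linear form in the normalized distance to the best nest, the Lévy step is drawn with Mantegna's algorithm, the abandonment step uses the conventional rule "rebuild a nest when r < p_a and keep it only if the error improves", and `toy_error` is a hypothetical stand-in for the SVM training error at a given (C, σ). All function names, bounds, and default values other than p_a = 0.75 are assumptions.

```python
import math
import random

def levy_step(rng, lam=1.5):
    """Draw one Levy-distributed step length (Mantegna's algorithm)."""
    sigma_u = (math.gamma(1 + lam) * math.sin(math.pi * lam / 2)
               / (math.gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0) or 1e-12  # guard against division by zero
    return u / abs(v) ** (1 / lam)

def ics_search(error_fn, bounds, n_nests=15, pa=0.75, step_min=0.01,
               step_max=1.0, max_iter=100, tol=1e-3, seed=0):
    """Search for the parameter vector (e.g. [C, sigma]) that minimizes error_fn."""
    rng = random.Random(seed)

    def random_nest():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    # Step 3: random initial nests and their errors.
    nests = [random_nest() for _ in range(n_nests)]
    errors = [error_fn(n) for n in nests]

    for _ in range(max_iter):
        b = min(range(n_nests), key=errors.__getitem__)
        best = nests[b][:]
        if errors[b] <= tol:          # Step 6: precision requirement met
            break
        # Step 4: adaptive step size (assumed form) plus a Levy flight move.
        dists = [math.dist(n, best) for n in nests]
        d_max = max(dists) or 1.0
        for i in range(n_nests):
            step = step_min + (step_max - step_min) * dists[i] / d_max
            cand = [max(lo, min(hi, x + step * levy_step(rng)))
                    for x, (lo, hi) in zip(nests[i], bounds)]
            err = error_fn(cand)
            if err < errors[i]:       # keep the better of old and new nest
                nests[i], errors[i] = cand, err
        # Step 5: discovery with probability pa, kept only if it improves.
        for i in range(n_nests):
            if rng.random() < pa:
                cand = random_nest()
                err = error_fn(cand)
                if err < errors[i]:
                    nests[i], errors[i] = cand, err

    b = min(range(n_nests), key=errors.__getitem__)
    return nests[b], errors[b]

def toy_error(params):
    """Hypothetical stand-in for the SVM error at (C, sigma); minimum at (10, 0.5)."""
    c, sigma = params
    return (c - 10.0) ** 2 + (sigma - 0.5) ** 2

bounds = [(0.1, 100.0), (0.01, 10.0)]   # assumed search ranges for C and sigma
best, err = ics_search(toy_error, bounds)
```

In a real pipeline, `error_fn` would train an SVM with the candidate C and σ on the training set and return its error; the final C and σ would then be used for the retraining and testing described in the seventh step.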
The above examples are to be construed as merely illustrative and not limitative of the remainder of the disclosure. After reading the description of the present invention, the skilled person can make various changes or modifications to the invention, and these equivalent changes and modifications also fall into the scope of the invention defined by the claims.

Claims (2)

1. A prediction method for analyzing user forwarding behaviors based on ICS-SVM is characterized by comprising the steps of data acquisition, influence factor definition, CS algorithm improvement and ICS-SVM model construction, and specifically comprises the following steps:
S1: data acquisition: downloading the data from a research-oriented website recommendation system, or acquiring the data through the API of a mature social platform, and preprocessing the data, including cleaning and duplicate checking;
S2: defining influence factors: extracting three attributes, namely the user interest tag, the user historical forwarding rate, and the external influence, from the existing data acquired in step S1, and defining the influence by a multiple linear regression method;
S3: improving the CS algorithm: on the basis of the traditional cuckoo search (CS) algorithm, a derived formula is adopted to generate the step size, so that the search step size can be adjusted adaptively and dynamically;
S4: constructing the ICS-SVM model: combining the improved cuckoo algorithm of step S3 with an SVM (support vector machine), optimizing the parameters of the SVM with the improved cuckoo algorithm, training a prediction model with the optimal parameters as the parameters of the SVM, predicting user forwarding behaviors by a time slicing method, and analyzing the propagation trend of hot topics;
the step S2 extracts three attributes of the user interest tag, the user historical forwarding rate, and the external influence from the existing data obtained in the step S1, and defines factors affecting the user forwarding, including:
S21: extracting internal attributes of the user: since whether a user forwards a hot topic is influenced by the user's own factors, these factors are divided into two attributes, namely the user interest tag and the user historical forwarding rate; the definitions of the attributes may be modified as appropriate according to the characteristics of the data, specifically as follows:
user interest tag Interesttag(v_i):
Figure FDA0003621340180000011
first, capturing all interaction behaviors between the user and the users they follow in the month before the topic is published, sorting them in descending order, and taking the top 8 tags; second, extracting the keywords of the topic content and calculating with formula (1), where a higher result indicates that the user is more interested in the topic;
user historical forwarding rate ForwardingRate(v_i):
the historical forwarding rate of the user is the ratio of the number of microblogs the user forwarded in the month before the topic was published to the total number of microblogs the user published in that month, namely
Figure FDA0003621340180000021
where retweetNum(v_i) denotes the number of microblogs the user forwarded in the month before the topic, and whoeNum(v_i) denotes the total number of microblogs the user published in the month before the topic;
S22: extracting external attributes of the user: whether a user forwards a hot topic is influenced not only by the user's own factors but also by family and surrounding friends, so an external influence attribute is extracted from the captured data, specifically as follows:
external influence ExternalInfluence(v_i):
the external influence comprises the influence of the user's friends and the influence of topic popularity,
Figure FDA0003621340180000022
where α and β are coefficients, with α = 0.4 and β = 0.6; N is the total number of the user's friends; If(v_i) is the amount of interaction between friends; leadernum(v_i) is the number of opinion leaders; an opinion leader, opinion leader(v_i), is defined as a user who has strong influence in the social network and plays an important intermediary role, and opinion leaders are calculated with the PageRank algorithm; φ is an adjustable parameter; opinion leader(v_i) is defined as follows:
Figure FDA0003621340180000023
where If denotes the amount of interaction between friends, to which normalization is applied;
the step S3 of improving the CS algorithm specifically includes:
suppose that
Figure FDA0003621340180000024
denotes the position of the i-th bird nest in the t-th generation and L(λ) denotes a random search path; the iterative formula by which the cuckoo updates the bird nest position is then:
Figure FDA0003621340180000025
wherein
Figure FDA0003621340180000026
denotes the step-size control amount, and
Figure FDA0003621340180000027
denotes point-to-point multiplication; the above is the traditional cuckoo search algorithm, whereas the improved cuckoo search algorithm adjusts the step size adaptively and dynamically according to the following formula:
Figure FDA0003621340180000028
wherein the maximum step size in the CS is
Figure FDA0003621340180000031
the minimum step size in the CS is
Figure FDA0003621340180000032
and d_i, which relates to the position of the i-th bird nest, is defined as follows:
Figure FDA0003621340180000033
wherein n_i denotes the position of the i-th bird nest, n_best denotes the optimal bird nest position, and d_max denotes the maximum distance between the optimal bird nest position and the other bird nest positions;
the improved cuckoo search algorithm adjusts the step size according to formula (6): if a bird nest position is closer to the optimal position, the step size is reduced; conversely, if it is farther from the optimal position, the step size is increased;
the specific steps of constructing the ICS-SVM model (a support vector machine optimized by the improved cuckoo search algorithm) in step S4 are as follows:
① capturing relevant data of three hot topics from Tencent microblog, preliminarily cleaning and sorting the data, extracting the relevant attributes, and dividing the data into a training set and a test set;
② initializing the penalty parameter C and the kernel function parameter σ of the SVM, the minimum step size in the CS
Figure FDA0003621340180000034
the maximum step size
Figure FDA0003621340180000035
and the maximum number of iterations N;
③ the probability that a bird nest owner discovers a cuckoo egg in its nest being p_a ∈ [0,1], setting an initial value for p_a, randomly generating n bird nest positions, training with the training set, calculating the errors, and finding and retaining the optimal bird nest position;
④ using formula (6) and formula (7) to update the bird nest positions and parameters such as p_a, comparing the errors with those of the old bird nests, and retaining the better bird nest positions and the corresponding
Figure FDA0003621340180000036
and the like;
⑤ comparing a random number r with p_a; among the better bird nest positions from the previous step, retaining the bird nest positions with the smaller p_a and changing the bird nest positions with the larger p_a; and obtaining a new set of better bird nest positions by comparing the errors;
⑥ finding the optimal bird nest position among those obtained in the fifth step and comparing its error with the required precision; if the precision requirement is met, the optimal SVM parameters C and σ have been found; otherwise, returning to the fourth step and continuing the iteration until a bird nest position meeting the precision requirement is found or the maximum number of iterations is exceeded, and then stopping the iteration;
⑦ using the obtained optimal parameters C and σ as the parameter values of the SVM, training the SVM again with the training set to obtain the prediction model, and testing the accuracy of the model with the test set.
2. A prediction system for analyzing user forwarding behaviors based on ICS-SVM is characterized by comprising a data acquisition module, an influence factor defining module, a CS algorithm improving module and an ICS-SVM model constructing module, and specifically comprises the following modules:
a data acquisition module: downloading the data from a research-oriented website recommendation system, or acquiring the data through the API of a mature social platform, and preprocessing the data, including cleaning and duplicate checking;
an influence factor defining module: extracting three attributes, namely the user interest tag, the user historical forwarding rate, and the external influence, from the acquired existing data, and defining the influence by a multiple linear regression method;
an improved CS algorithm module: on the basis of the traditional cuckoo search (CS) algorithm, a derived formula is adopted to generate the step size, so that the search step size can be adjusted adaptively and dynamically;
an ICS-SVM model constructing module: combining the improved cuckoo algorithm of step S3 with an SVM (support vector machine), optimizing the parameters of the SVM with the improved cuckoo algorithm, training a prediction model with the optimal parameters as the parameters of the SVM, predicting user forwarding behaviors by a time slicing method, and analyzing the propagation trend of hot topics;
the influence factor defining module extracts three attributes of the user interest tag, the user historical forwarding rate and the external influence from the acquired existing data, and defines factors influencing user forwarding, and the factors comprise:
S21: extracting internal attributes of the user: since whether a user forwards a hot topic is influenced by the user's own factors, these factors are divided into two attributes, namely the user interest tag and the user historical forwarding rate; the definitions of the attributes may be modified as appropriate according to the characteristics of the data, specifically as follows:
user interest tag Interesttag(v_i):
Figure FDA0003621340180000041
first, capturing all interaction behaviors between the user and the users they follow in the month before the topic is published, sorting them in descending order, and taking the top 8 tags; second, extracting the keywords of the topic content and calculating with the Jaccard coefficient formula (1), where a higher result indicates that the user is more interested in the topic;
user historical forwarding rate ForwardingRate(v_i):
the historical forwarding rate of the user is the ratio of the number of microblogs the user forwarded in the month before the topic was published to the total number of microblogs the user published in that month, namely
Figure FDA0003621340180000051
where retweetNum(v_i) denotes the number of microblogs the user forwarded in the month before the topic, and whoeNum(v_i) denotes the total number of microblogs the user published in the month before the topic;
S22: extracting external attributes of the user: whether a user forwards a hot topic is influenced not only by the user's own factors but also by family and surrounding friends, so an external influence attribute is extracted from the captured data, specifically as follows:
external influence ExternalInfluence(v_i):
the external influence comprises the influence of the user's friends and the influence of topic popularity,
Figure FDA0003621340180000052
where α and β are coefficients, with α = 0.4 and β = 0.6; N is the total number of the user's friends; If(v_i) is the amount of interaction between friends; leadernum(v_i) is the number of opinion leaders; an opinion leader, opinion leader(v_i), is defined as a user who has strong influence in the social network and plays an important intermediary role, and opinion leaders are calculated with the PageRank algorithm; φ is an adjustable parameter; opinion leader(v_i) is defined as follows:
Figure FDA0003621340180000053
where If denotes the amount of interaction between friends, to which normalization is applied;
the improved CS algorithm module specifically includes:
suppose that
Figure FDA0003621340180000054
denotes the position of the i-th bird nest in the t-th generation and L(λ) denotes a random search path; the iterative formula by which the cuckoo updates the bird nest position is then:
Figure FDA0003621340180000055
wherein
Figure FDA0003621340180000061
denotes the step-size control amount, and
Figure FDA0003621340180000062
denotes point-to-point multiplication; the above is the traditional cuckoo search algorithm, whereas the improved cuckoo search algorithm adjusts the step size adaptively and dynamically according to the following formula:
Figure FDA0003621340180000063
wherein the maximum step size in the CS is
Figure FDA0003621340180000064
the minimum step size in the CS is
Figure FDA0003621340180000065
and d_i, which relates to the position of the i-th bird nest, is defined as follows:
Figure FDA0003621340180000066
wherein n_i denotes the position of the i-th bird nest, n_best denotes the optimal bird nest position, and d_max denotes the maximum distance between the optimal bird nest position and the other bird nest positions;
the improved cuckoo search algorithm adjusts the step size according to formula (6): if a bird nest position is closer to the optimal position, the step size is reduced; conversely, if it is farther from the optimal position, the step size is increased;
the ICS-SVM (support vector machine optimized by the improved cuckoo search algorithm) model constructing module specifically comprises:
① capturing relevant data of three hot topics from Tencent microblog, preliminarily cleaning and sorting the data, extracting the relevant attributes, and dividing the data into a training set and a test set;
② initializing the penalty parameter C and the kernel function parameter σ of the SVM, the minimum step size in the CS
Figure FDA0003621340180000067
the maximum step size
Figure FDA0003621340180000068
and the maximum number of iterations N;
③ the probability that a bird nest owner discovers a cuckoo egg in its nest being p_a ∈ [0,1], setting an initial value for p_a, randomly generating n bird nest positions, training with the training set, calculating the errors, and finding and retaining the optimal bird nest position;
④ using formula (6) and formula (7) to update the bird nest positions and parameters such as p_a, comparing the errors with those of the old bird nests, and retaining the better bird nest positions and the corresponding
Figure FDA0003621340180000069
and the like;
⑤ comparing a random number r with p_a; among the better bird nest positions from the previous step, retaining the bird nest positions with the smaller p_a and changing the bird nest positions with the larger p_a; and obtaining a new set of better bird nest positions by comparing the errors;
⑥ finding the optimal bird nest position among those obtained in the fifth step and comparing its error with the required precision; if the precision requirement is met, the optimal SVM parameters C and σ have been found; otherwise, returning to the fourth step and continuing the iteration until a bird nest position meeting the precision requirement is found or the maximum number of iterations is exceeded, and then stopping the iteration;
⑦ using the obtained optimal parameters C and σ as the parameter values of the SVM, training the SVM again with the training set to obtain the prediction model, and testing the accuracy of the model with the test set.
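As an illustration of the two internal attributes defined in S21 of the claims, the following Python sketch computes a Jaccard coefficient between a user's top tags and a topic's keywords, and a historical forwarding rate as a simple ratio. The formula images are not reproduced in this text, so the exact forms are assumptions based on the wording of the claims; the function names and sample data are hypothetical.

```python
def interest_tag(user_tags, topic_keywords):
    """Jaccard coefficient between the user's top tags and the topic keywords."""
    a, b = set(user_tags), set(topic_keywords)
    union = a | b
    # Empty inputs give no evidence of interest, so return 0.0 (an assumption).
    return len(a & b) / len(union) if union else 0.0

def forwarding_rate(retweet_num, whole_num):
    """Share of forwarded microblogs among all microblogs published in the month."""
    # A user with no microblogs in the period gets rate 0.0 (an assumption).
    return retweet_num / whole_num if whole_num else 0.0

# Hypothetical sample: the user's top tags and the keywords of one topic.
tags = ["sports", "music", "film"]
keywords = ["music", "film", "news"]
interest = interest_tag(tags, keywords)   # 2 shared items out of 4 distinct items
rate = forwarding_rate(30, 120)           # 30 forwarded out of 120 published
```

A higher `interest` value corresponds to the claim's statement that a higher result of formula (1) indicates greater interest in the topic.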
CN201910114885.0A 2019-02-14 2019-02-14 Prediction method and system for analyzing user forwarding behavior based on ICS-SVM Active CN109829504B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910114885.0A CN109829504B (en) 2019-02-14 2019-02-14 Prediction method and system for analyzing user forwarding behavior based on ICS-SVM

Publications (2)

Publication Number Publication Date
CN109829504A CN109829504A (en) 2019-05-31
CN109829504B true CN109829504B (en) 2022-07-01

Family

ID=66862105

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910114885.0A Active CN109829504B (en) 2019-02-14 2019-02-14 Prediction method and system for analyzing user forwarding behavior based on ICS-SVM

Country Status (1)

Country Link
CN (1) CN109829504B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111143566A (en) * 2019-12-27 2020-05-12 北京工业大学 Method for predicting hot event outbreak aiming at twitter
CN112800336B (en) * 2021-02-07 2022-06-17 东北大学 Online social network user behavior prediction method based on simple harmonic vibration theory

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105005583A (en) * 2015-06-17 2015-10-28 清华大学 Method and system for predicting information forwarding increment in social network
CN105426915A (en) * 2015-11-20 2016-03-23 北京大学深圳研究生院 Support vector machine-based prediction method and system
CN106547901A (en) * 2016-11-08 2017-03-29 周口师范学院 It is a kind of to forward behavior prediction method based on energy-optimised microblog users
CN106682770A (en) * 2016-12-14 2017-05-17 重庆邮电大学 Friend circle-based dynamic microblog forwarding behavior prediction system and method
CN107330562A (en) * 2017-07-03 2017-11-07 扬州大学 Information dissemination method based on individual consumer's feature
CN108596205A (en) * 2018-03-20 2018-09-28 重庆邮电大学 Behavior prediction method is forwarded based on the microblogging of region correlation factor and rarefaction representation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11080475B2 (en) * 2017-01-17 2021-08-03 Microsoft Technology Licensing, Llc Predicting spreadsheet properties

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
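Application of SVM regression in HAGC system; Li Wei, et al.; The 27th Chinese Control and Decision Conference; 20150720; full text *
SVM-based method for predicting microblog forwarding scale (基于SVM的微博转发规模预测方法); Li Yingle (李英乐), et al.; Application Research of Computers (《计算机应用研究》); 20130515; full text *
Microblog forwarding prediction method based on hybrid feature learning (基于混合特征学习的微博转发预测方法); Ma Xiaofeng (马晓峰), et al.; Computer Applications and Software (《计算机应用与软件》); 20161115; full text *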

Also Published As

Publication number Publication date
CN109829504A (en) 2019-05-31

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant