CN104281882A - Method and system for predicting social network information popularity on basis of user characteristics - Google Patents

Publication number
CN104281882A
Authority
CN
China
Prior art keywords: user, data, information, user data, classification
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410472689.8A
Other languages
Chinese (zh)
Other versions
CN104281882B (en)
Inventor
李歌
胡玥
于延宇
李丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS
Priority to CN201410472689.8A
Publication of CN104281882A
Application granted
Publication of CN104281882B
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Abstract

The invention provides a method for predicting social network information popularity on the basis of user characteristics. The method includes the steps of obtaining user data and information data in a social network, extracting part of the user attribute characteristics and the user behavior characteristics from the user data, classifying the user data according to the user attribute characteristics and the user behavior characteristics, obtaining the user propagation characteristics corresponding to the information data according to the information data and the classification of the users, obtaining a social network information popularity prediction model according to the user propagation characteristics, and predicting the information popularity through the prediction model. The invention also provides a system for predicting social network information popularity on the basis of user characteristics. The system comprises an obtaining module, a characteristic extraction module, a classification module, a processing module, a prediction model module and a prediction module. By taking user behavior characteristics into account, information propagation in the social network is predicted more accurately, and the problems that hot-spot discovery lags and that the real-time performance of information pushing and online public opinion monitoring can hardly be ensured are addressed.

Description

Method and system for predicting social network information popularity based on user characteristics
Technical field
The present invention relates to the technical field of network security, and in particular to a method and system for predicting social network information popularity based on user characteristics.
Background
At present, networks have become an important channel for obtaining information. With the rapid rise of social networking sites in particular, acquiring and spreading information has become faster and easier. Social networks have formed huge online social groups and built close online interpersonal relationships. Information propagation on social networks differs from traditional channels such as mail, word of mouth and newspapers, and has three prominent characteristics. First, it is strongly real-time: thanks to technological progress, a sender can publish a major event he or she has witnessed almost immediately. Second, it has a strong group effect: publishing on social networks is unconstrained, different people can publish inflammatory information for their own purposes, and the wide spread of such information can trigger group effects. Third, the update cycle of information shortens: because large amounts of information are published and its sources keep widening, information in circulation is gradually replaced by new information and its life cycle becomes shorter.
Popularity prediction of information propagation, combined with the characteristics of propagation on social networks, can effectively solve many problems. Detecting changes in propagation early and predicting the popularity of information as soon as possible are an essential part of real-time information pushing and of public opinion monitoring on social networks. At present, both information pushing and public opinion monitoring rely on threshold-based monitoring: a threshold is set, and when certain parameters of a piece of information exceed it, the information is marked as information to be pushed or as public opinion information. These methods are relatively coarse and can hardly guarantee real-time performance.
Summary of the invention
In view of the defects of the prior art, the present invention provides a method for predicting social network information popularity based on user characteristics. By taking user behavior features into account, it predicts information propagation on social networks more accurately and addresses the problems that hot-spot discovery lags and that the real-time performance of information pushing and online public opinion monitoring is hard to guarantee.
In a first aspect, the invention provides a method for predicting social network information popularity based on user characteristics, the method comprising:
Obtaining information data in a social network within a preset time and user data corresponding to the information data, the user data comprising multiple user attribute features;
Extracting part of the user attribute features from the user data, and obtaining the user behavior features of the user data according to the user data;
Classifying the user data according to the user attribute features and the user behavior features to obtain the class of each user in the user data;
Obtaining the user propagation features corresponding to the information data according to the information data and the classes of the users in the user data;
Determining a prediction model of social network information popularity according to the user propagation features;
Using the prediction model to analyze the information data produced within a period of time and predict information popularity.
Preferably, after the step of obtaining the information data in the social network within the preset time and the user data corresponding to the information data, the method further comprises:
Storing the user data and the information data in a database.
Preferably, obtaining the information data in the social network within the preset time and the user data corresponding to the information data comprises:
Using a web crawler to obtain the user data and information data of forum-type social networks;
Using an application programming interface (API) to obtain the user data and information data of microblog-type social networks;
Using a web crawler to obtain the user data of community-type social networks, and using the users' clipboards to obtain the information data of community-type social networks.
Preferably, classifying the user data according to the user attribute features and the user behavior features to obtain the class of each user in the user data comprises:
Normalizing the user attribute features and the user behavior features to obtain user features;
Classifying the user data with a clustering algorithm according to the user features to obtain the class of each user in the user data.
Preferably, classifying the user data with the clustering algorithm comprises:
Dividing the user data into two classes and calculating the distance between the class centers; if the distance between the class centers is smaller than a preset value, fusing the two classes into one class;
Continuing to divide the user data of each class and calculating the distances between the class centers, and stopping the division when the user data of three classes is fused into one class, thereby obtaining the classes of the users.
Preferably, determining the prediction model of social network information popularity according to the user propagation features comprises:
Establishing a multivariate linear model based on user characteristics;
Training the linear model with the user propagation features as the training set to obtain the social network information popularity prediction model.
In a second aspect, the invention provides a system for predicting social network information popularity based on user characteristics, the system comprising:
An acquisition module, for obtaining information data in a social network within a preset time and user data corresponding to the information data, the user data comprising multiple user attribute features;
A feature extraction module, for extracting part of the user attribute features from the user data and obtaining the user behavior features of the user data according to the user data;
A classification module, for classifying the user data according to the user attribute features and the user behavior features to obtain the class of each user in the user data;
A processing module, for obtaining the user propagation features corresponding to the information data according to the information data and the classes of the users in the user data;
A prediction model module, for determining the prediction model of social network information popularity according to the user propagation features;
A prediction module, for using the prediction model to analyze the information data produced within a period of time and predict information popularity.
Preferably, the system further comprises:
A storage module, for storing the user data and the information data in a database.
Preferably, the classification module comprises:
A normalization submodule, for normalizing the user attribute features and the user behavior features to obtain user features;
A division submodule, for classifying the user data with a clustering algorithm according to the user features to obtain the class of each user in the user data.
Preferably, the prediction model module comprises:
A model building submodule, for establishing a multivariate linear model based on user characteristics;
A training submodule, for training the linear model with the user propagation features as the training set to obtain the social network information popularity prediction model.
Based on the above technical solution, the method provided by the invention fully considers the real-time nature of social network information and the influence of user characteristics on information propagation, and describes information propagation through popularity prediction. It can predict information propagation as early as possible, reduces the lag of traditional methods, and helps with timely information pushing and timely control of public opinion on social networks. In addition, the system of the invention has low memory overhead at run time, is efficient, and is self-contained and portable. In summary, the invention can predict the popularity of information propagation early, which greatly helps both timely information pushing and timely control of online public opinion.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of the method for predicting social network information popularity based on user characteristics provided by one embodiment of the invention;
Fig. 2 is a schematic flowchart of the method for obtaining user data and information data provided by another embodiment of the invention;
Fig. 3 is a structural diagram of the system for predicting social network information popularity based on user characteristics provided by one embodiment of the invention;
Fig. 4 is a structural diagram of the classification module provided by another embodiment of the invention;
Fig. 5 is a structural diagram of the prediction model module provided by another embodiment of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are obviously only some of the embodiments of the invention, not all of them. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.
Fig. 1 shows the method for predicting social network information popularity based on user characteristics provided by one embodiment of the invention. The method comprises the following steps:
Step 101: obtain the information data in a social network within a preset time and the user data corresponding to the information data. The user data comprises multiple user attribute features.
In this embodiment, the user data and information data obtained from the social network are stored in a database.
Different data acquisition methods are used for different types of social networks. Forum-type social networking sites use posts as the carrier of information, so a web crawler is suitable for obtaining post data.
Microblog-type social networks spread messages as short-text microblogs, so the application programming interface (API) provided by the microblog platform can be used to obtain the information data and user data.
For community-type social networks, the user data can be obtained with a web crawler, and the information data can be obtained from the clipboards of these users.
Step 102: extract part of the user attribute features from the user data, and obtain the user behavior features of the user data according to the user data.
Specifically, the obtained user data is divided into user attribute features and user behavior features.
User attribute features are the information a user provides when registering a social network account, for example name, age and sex. Among the attribute features, the valid features that may affect information propagation are retained, and the invalid features that cannot affect it, for example telephone number and postcode, are removed.
User behavior features are the features produced when a user is active on the social network, for example the number of friends and the number of replies. However, some user behavior data cannot be obtained directly through the API or the web crawler and must be derived by calculation, for example the length of time the user has used the social network and the clustering coefficient. The user features referred to below consist of the valid user attribute features together with the complete set of user behavior features.
Step 103: classify the user data according to the user attribute features and the user behavior features to obtain the class of each user in the user data.
In this embodiment, this step comprises:
Normalizing the user attribute features and the user behavior features to obtain user features;
Classifying the user data with a clustering algorithm according to the user features to obtain the class of each user in the user data.
Specifically, the CLA algorithm is a clustering algorithm that does not require the number of classes to be specified in advance; instead it finds a suitable number of classes according to a stopping condition. The CLA algorithm first divides the users into two classes and calculates the distance between the class centers. When this distance is smaller than a given value, the two classes are considered to belong together and are fused into one class. The number of classes is then increased and the users are classified again in the same way, until three classes of users are fused into one class for the first time, at which point the algorithm stops. The users are thus divided into a suitable number of classes.
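As an illustration of the normalization sub-step of step 103, the following is a minimal sketch. The source does not say which normalization is used; min-max scaling to [0, 1] is an assumption made here, and the function name `min_max_normalize` and the sample values are hypothetical.

```python
def min_max_normalize(rows):
    """Scale every feature column of `rows` to the range [0, 1].

    Assumption: min-max scaling. The source only says the features
    are "normalized" without naming a method.
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        normalized.append([
            # A constant column carries no information; map it to 0.0.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return normalized


# Hypothetical users described by two raw features each.
users = [[10, 200], [20, 400], [30, 300]]
features = min_max_normalize(users)
```

Scaling every feature to the same range keeps a feature with large raw values (for example a click count) from dominating the distance used by the clustering step.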
Step 104: obtain the user propagation features corresponding to the information data according to the information data and the classes of the users in the user data.
Using the user classes obtained above, the users who took part in propagating each piece of information in the database are counted by class. The number of users in each class is the user propagation feature of that piece of information.
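The counting in step 104 can be sketched as follows. This is only an illustration: the function name, the dictionary layout and the choice to count each participating user once are assumptions, not details given in the source.

```python
from collections import Counter


def propagation_features(participant_ids, user_type, num_classes):
    """Count the distinct participating users in each user class.

    participant_ids: IDs of users who propagated one piece of information.
    user_type: mapping from user ID to class index (0 .. num_classes - 1).
    Users without a known class are skipped (an assumption).
    """
    distinct = set(participant_ids)
    counts = Counter(user_type[uid] for uid in distinct if uid in user_type)
    return [counts.get(c, 0) for c in range(num_classes)]


# Hypothetical data: four classified users, one of whom took part twice.
user_type = {"u1": 0, "u2": 1, "u3": 1, "u4": 2}
feature_vector = propagation_features(["u1", "u2", "u3", "u2"], user_type, 3)
```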
Step 105: determine the prediction model of social network information popularity according to the user propagation features.
Specifically, this step comprises: establishing a multivariate linear model based on user characteristics; and training the linear model with the user propagation features as the training set to obtain the social network information popularity prediction model.
In this embodiment, the user propagation features of the information are used as the training set, and linear regression is used to obtain a model that predicts information popularity. Gradient descent can be used in the computation to quickly obtain the weight with which the users of each class influence information propagation.
Step 106: use the obtained prediction model to analyze the information data produced within a period of time and predict information popularity.
The method provided by this embodiment fully considers the real-time nature of social network information and the influence of user characteristics on information propagation, and describes information propagation through popularity prediction. It can predict information propagation as early as possible, reduces the lag of traditional methods, and helps with timely information pushing and timely control of public opinion on social networks. In addition, the system of the invention has low memory overhead at run time, is efficient, and is self-contained and portable. In summary, the invention can predict the popularity of information propagation early, which greatly helps both timely information pushing and timely control of online public opinion.
Below, another embodiment of the invention takes the Tianya forum as an example to illustrate the method for predicting social network information popularity based on user characteristics. The method comprises:
Step 1: obtain the information data and user data.
The detailed flow of this step is as follows:
Because the Tianya forum does not provide an API for obtaining data effectively, this embodiment obtains the information data and user data by writing web crawlers.
Platform environment: Microsoft SQL Server 2008 is installed and configured on a 32-bit Windows 7 platform, and the web crawler PostCrawler is written with Microsoft Visual Studio 2010. Fig. 2 is a flowchart of the method for obtaining user data and information data in this embodiment; how the programs run is described below for the crawlers PostCrawler and UserCrawler.
1) The crawler PostCrawler is run on the host with a pool of uniform resource locators (URLs). Each post has a unique post ID from which its URL can be derived, so a URL pool can be built from consecutive post IDs, which allows information data and user data to be fetched continuously. Because some posts have been deleted by the site and their URLs do not return post content, a regular expression is first used to screen for valid information data before the data is collected.
PostCrawler is defined as follows:
2) The crawler UserCrawler obtains user data from the reply user IDs of a post. Each user has a user ID from which the URL of the corresponding user page can be found. Visiting the user page through this URL gives the user's basic information and activity history, which are stored in the database.
UserCrawler is defined as follows:
3) Design of the SQL Server 2008 database. The fields of the information data and user data are as follows:
Information data: ID (post ID), hostID (ID of the posting user), click (click count), reply (reply count), time (posting time), userIDList (list of replying user IDs)
Replying user ID list: userID (replying user ID), replyTime (reply time)
User data: ID (user ID), fans (number of followers), follows (number of users followed), posts (number of posts), replyPosts (number of replies), registerDate (registration date), lastLoginDate (last login time), score (community points), logins (number of logins), topic (number of boards participated in), age (length of time using the Tianya forum), clusteringCoefficient (clustering coefficient), reciprocity (reciprocity coefficient), userType (user class).
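The field lists above can be sketched as a schema. The embodiment uses SQL Server 2008; the example below uses Python's built-in sqlite3 module purely so that it runs anywhere. The table names, the column types and the decision to store userIDList as a separate reply table are all assumptions, and only the column names come from the text.

```python
import sqlite3

# Hypothetical table names and column types; column names follow the text.
SCHEMA = """
CREATE TABLE post (
    ID INTEGER PRIMARY KEY,      -- post ID
    hostID INTEGER,              -- ID of the posting user
    click INTEGER,               -- click count
    reply INTEGER,               -- reply count
    time TEXT                    -- posting time
);
CREATE TABLE post_reply (        -- stands in for userIDList (assumption)
    postID INTEGER REFERENCES post(ID),
    userID INTEGER,              -- replying user ID
    replyTime TEXT               -- reply time
);
CREATE TABLE forum_user (
    ID INTEGER PRIMARY KEY,
    fans INTEGER, follows INTEGER, posts INTEGER, replyPosts INTEGER,
    registerDate TEXT, lastLoginDate TEXT, score INTEGER, logins INTEGER,
    topic INTEGER, age INTEGER, clusteringCoefficient REAL,
    reciprocity REAL, userType INTEGER
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up sample rows, only to show how the reply list is read back.
conn.execute("INSERT INTO post VALUES (1, 42, 100, 2, '2014-01-01 08:00:00')")
conn.executemany(
    "INSERT INTO post_reply VALUES (?, ?, ?)",
    [(1, 7, "2014-01-01 09:00:00"), (1, 8, "2014-01-01 10:00:00")],
)
reply_user_ids = [
    row[0]
    for row in conn.execute(
        "SELECT userID FROM post_reply WHERE postID = 1 ORDER BY replyTime"
    )
]
```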
Here, the functions PostCrawler() and UserCrawler() may be implemented in any existing programming language on any existing operating system platform, as long as they obtain the information data and user data.
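The original definitions of PostCrawler and UserCrawler are not reproduced in this text, and the following is not a reconstruction of them. It is only a sketch of the two ideas described in item 1): building a URL pool from consecutive post IDs and screening out deleted posts with a regular expression. The URL template, the deleted-post pattern and the sample pages are hypothetical, and no network access is performed.

```python
import re

# Hypothetical URL template and deleted-post marker; the real forum's URL
# layout and page markup are not given in the source.
POST_URL_TEMPLATE = "http://forum.example.com/post-{post_id}.html"
DELETED_MARKER = re.compile(r"post (has been|was) deleted", re.IGNORECASE)


def build_url_pool(first_id, last_id):
    """Build the URL pool from a run of consecutive post IDs."""
    return [
        POST_URL_TEMPLATE.format(post_id=post_id)
        for post_id in range(first_id, last_id + 1)
    ]


def is_valid_post(html):
    """Keep a page only if it is non-empty and not marked as deleted."""
    return bool(html.strip()) and DELETED_MARKER.search(html) is None


pool = build_url_pool(1001, 1003)

# Stand-in for fetched pages, so the sketch runs without a network.
pages = {
    pool[0]: "<html>normal post body</html>",
    pool[1]: "<html>This post was deleted</html>",
    pool[2]: "",
}
valid_urls = [url for url in pool if is_valid_post(pages[url])]
```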
Step 2: extract the valid user attribute features from the user data and calculate the user behavior features.
User attribute features are the basic user information that a user may be required to fill in when registering an account; some of this information can serve as user features for classifying users. The Tianya forum does not require this information at registration, but for other social networks it can be obtained through the API or a web crawler. Invalid user attribute features can be deleted before the data is stored in the database; alternatively, all user attribute features can be stored and only the valid ones selected before the classification step.
User behavior features are the features produced when a user is active on the social network. Some of them can be obtained directly through the web crawler or the API, while others must be calculated.
For Tianya forum users, the following can be calculated: age (length of time using the Tianya forum), clusteringCoefficient (clustering coefficient) and reciprocity (reciprocity coefficient).
age is the user behavior feature describing how long the user has been active on the Tianya forum, that is, the span from registration to the most recent login. It is calculated as:
age = lastLoginDate - registerDate
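A minimal sketch of the age calculation follows, taking age as the span from registration to the most recent login as the text describes. Measuring it in days is an assumption, since the source does not state a unit, and the dates are made up.

```python
from datetime import date


def forum_age_days(register_date, last_login_date):
    """Days between registration and the most recent login."""
    return (last_login_date - register_date).days


# Hypothetical user: registered 2012-03-01, last logged in 2014-03-01.
age = forum_age_days(date(2012, 3, 1), date(2014, 3, 1))
```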
clusteringCoefficient is the user behavior feature that measures how interconnected a user's neighbors are. If user A follows user B and user C, clusteringCoefficient is the probability that a follow relationship exists between user B and user C. Let $C$ denote clusteringCoefficient, $G_\Delta$ the number of triples in which follow relationships exist among user A, user B and user C, and $G_\Lambda$ the number of triples in which follow relationships exist only between user A and user B and between user A and user C. This gives the formula:
$$C = \frac{3 \times G_\Delta}{3 \times G_\Delta + G_\Lambda}$$
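The clustering coefficient formula can be sketched as below. The counting of closed and open triples centred on one user, and the rule that a follow in either direction counts as a relationship between two neighbours, are assumptions made for this example. The follow graph is made up.

```python
from itertools import combinations


def triple_counts(user, follows):
    """Count closed (G_delta) and open (G_lambda) triples centred on `user`.

    follows: mapping from a user to the set of users he or she follows.
    Assumption: a follow in either direction links two neighbours.
    """
    neighbours = sorted(follows.get(user, set()))
    closed = 0
    open_ = 0
    for b, c in combinations(neighbours, 2):
        linked = c in follows.get(b, set()) or b in follows.get(c, set())
        if linked:
            closed += 1
        else:
            open_ += 1
    return closed, open_


def clustering_coefficient(g_delta, g_lambda):
    """C = 3 * G_delta / (3 * G_delta + G_lambda); 0.0 when undefined."""
    denominator = 3 * g_delta + g_lambda
    return 3 * g_delta / denominator if denominator else 0.0


# Hypothetical graph: A follows B, C and D; only B and C are linked.
follows = {"A": {"B", "C", "D"}, "B": {"C"}, "C": set(), "D": set()}
g_delta, g_lambda = triple_counts("A", follows)
coefficient = clustering_coefficient(g_delta, g_lambda)
```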
reciprocity represents the probability of mutual following between users, that is, the ratio of the number of users who mutually follow user i to the number of all users that user i follows. Let $R$ denote reciprocity, $A$ the number of users who mutually follow user i, and $B$ the number of all users that user i follows. This gives the formula:
$$R = A / B$$
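The reciprocity ratio R = A/B can be sketched as follows. The dictionary layout of the follow graph and the choice to return 0.0 for a user who follows nobody are assumptions, and the data is made up.

```python
def reciprocity(user, follows):
    """Share of the users followed by `user` who follow `user` back.

    follows: mapping from a user to the set of users he or she follows.
    Returns 0.0 when `user` follows nobody (an assumption; the ratio is
    undefined in that case).
    """
    followed = follows.get(user, set())
    if not followed:
        return 0.0
    mutual = sum(1 for other in followed if user in follows.get(other, set()))
    return mutual / len(followed)


# Hypothetical graph: user i follows four users, two of whom follow back.
follows = {
    "i": {"a", "b", "c", "d"},
    "a": {"i"},
    "b": {"i", "c"},
    "c": set(),
    "d": {"a"},
}
r = reciprocity("i", follows)
```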
All user features have now been obtained, and they are stored in the database.
Step 3: classify the users according to the user features with the CLA algorithm, which is described as follows:
Within the CLA algorithm, K-means, the most widely used partition-based clustering algorithm, classifies users by calculating the Euclidean distance between user features. $D(C_1, C_2)$ denotes the distance between the cluster center $C_1$ of class 1 and the cluster center $C_2$ of class 2, namely $D(C_1, C_2) = \sqrt{\sum_{f=1}^{F} (C_{1,f} - C_{2,f})^2}$, where $F$ is the number of selected user features. $T$ is a threshold parameter that must be set in the algorithm: if the distance between two cluster centers is smaller than this threshold, the classes of these two cluster centers are merged.
With the CLA algorithm, a suitable number of classes K and the K cluster centers can be found. Using these cluster centers, the users can be divided into K classes according to their user features.
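The original listing of the CLA algorithm is not reproduced in this text, so the following is only one possible reading of the description: run K-means for K = 2, 3, ..., fuse cluster centers that lie closer than the threshold T, and stop the first time three or more centers fuse into one class. The deterministic initialisation, the use of the result from the previous K, the unweighted averaging of fused centers and all function names are assumptions.

```python
import math


def distance(p, q):
    """Euclidean distance D between two F-dimensional feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean(points):
    """Component-wise mean of a list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, iterations=50):
    """Plain K-means with deterministic farthest-point initialisation."""
    centres = [list(points[0])]
    while len(centres) < k:
        farthest = max(points, key=lambda p: min(distance(p, c) for c in centres))
        centres.append(list(farthest))
    for _ in range(iterations):
        groups = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda j: distance(p, centres[j]))
            groups[nearest].append(p)
        updated = [mean(g) if g else c for g, c in zip(groups, centres)]
        if updated == centres:
            break
        centres = updated
    return centres


def merge_close_centres(centres, threshold):
    """Group centres that are linked by a distance below `threshold`."""
    groups = []
    for centre in centres:
        touching = [g for g in groups if any(distance(centre, c) < threshold for c in g)]
        merged = [centre] + [c for g in touching for c in g]
        groups = [g for g in groups if not any(g is t for t in touching)] + [merged]
    return groups


def cla(points, threshold, max_k=10):
    """Grow K until three or more centres first fuse into one class.

    Returns the fused centres found for the last K before that happens
    (unweighted mean of the fused centres, a simplification).
    """
    best = [mean(points)]
    for k in range(2, min(max_k, len(points)) + 1):
        groups = merge_close_centres(kmeans(points, k), threshold)
        if any(len(g) >= 3 for g in groups):
            break
        best = [mean(g) for g in groups]
    return best


# Hypothetical users with two features, forming three well-separated groups.
points = [[0, 0], [0, 1], [1, 0], [1, 1],
          [10, 10], [10, 11], [11, 10], [11, 11],
          [20, 0], [20, 1], [21, 0], [21, 1]]
centres = cla(points, threshold=3.0)
```

On this toy data the sketch settles on three classes without K being given in advance, which is the property the text attributes to CLA. The result depends on the threshold T, which has to be tuned for real feature scales.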
Step 4: on the Tianya forum, the users who take part in information propagation are the users who reply to a post. From the reply user ID list of each post in the database, the class of every replying user can be obtained:
$$\mathrm{userType} := \arg\min_j \lVert \mathrm{user}_i - C_j \rVert^2 \qquad (1)$$
where $\mathrm{user}_i$ denotes the feature vector of user i and $C_j$ denotes the cluster center of user class j. The cluster center with the shortest distance to the user's features determines the user's userType.
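Formula (1) can be sketched as a nearest-center lookup. The function name and the sample centers are hypothetical.

```python
def assign_user_type(user_features, centres):
    """Return the index j of the cluster centre closest to the user."""
    def squared_distance(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    return min(
        range(len(centres)),
        key=lambda j: squared_distance(user_features, centres[j]),
    )


# Hypothetical cluster centres for three user classes, two features each.
centres = [[0.1, 0.2], [0.8, 0.9], [0.5, 0.1]]
user_type = assign_user_type([0.7, 0.8], centres)
```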
The number of replying users in each class is counted, and these values are used as the user propagation features of the post.
Step 5: first define the model. The model predicts the total reply volume of a post v on day $t_t$ ($t_t > t_r$) from the history of user replies to v within the first $t_r$ days. The new model considers not only the reply volume within $t_r$ days but also which users made those replies. Using the previous step, all users who replied to the post can be clustered and their class labels calculated. The multivariate linear model based on user characteristics reflects both the strong linear relationship between early and later reply volumes and the influence of user behavior on the reply volume of a post. Define $x_i(v, t_r)$ as the total number of replies within $t_r$ days by users whose class label userType is i. This gives the feature vector $X_k(v, t_r) = (x_1(v, t_r), x_2(v, t_r), \ldots, x_k(v, t_r))$. The prediction model for the total reply volume of the post within $t_t$ days is then:
$$\hat{N}(v, t_r, t_t) = \Gamma(k, t_r, t_t) \cdot X_k(v, t_r) \qquad (2)$$
where $\Gamma(k, t_r, t_t) = (\gamma_1, \gamma_2, \ldots, \gamma_k)$ is the parameter vector of the model and $\gamma_i$ is the influence parameter of the replies of class-i users on the prediction. This parameter vector depends on $t_r$, $t_t$ and $k$.
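Formula (2) is a dot product between the parameter vector and the per-class reply counts. A minimal sketch with made-up numbers:

```python
def predict_reply_volume(gamma, x):
    """Formula (2): dot product of the parameter vector and the feature vector."""
    if len(gamma) != len(x):
        raise ValueError("gamma and x must have the same length k")
    return sum(g * xi for g, xi in zip(gamma, x))


# Hypothetical values for k = 3 user classes.
gamma = [2.0, 0.5, 1.5]   # influence parameter of each class
x = [10, 4, 2]            # replies within t_r days, per class
predicted = predict_reply_volume(gamma, x)
```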
To compute the model parameter vector $\Gamma(k, t_r, t_t)$, the mean relative squared error (mRSE) is introduced. It is an important indicator for evaluating a prediction model: the smaller the mRSE, the better the model performs. mRSE is expressed as:
$$\mathrm{mRSE} = \frac{1}{|C|} \sum_{v \in C} \left( \frac{\hat{N}(v, t_r, t_t)}{N(v, t_t)} - 1 \right)^2 \qquad (3)$$
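Formula (3) can be computed as in the sketch below, where `predicted` holds the predicted reply volumes and `actual` the true reply volumes $N(v, t_t)$ of the posts in the sample set. The numbers are made up.

```python
def mrse(predicted, actual):
    """Mean relative squared error between predicted and true reply volumes."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    return sum((p / a - 1) ** 2 for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical posts: one under-predicted by 10 %, one over-predicted by 20 %.
error = mrse([90.0, 120.0], [100.0, 100.0])
```

Because each error is divided by the true volume, a post with 1,000 replies and a post with 10 replies contribute on the same relative scale.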
Substituting formula (2) into formula (3) gives:
$$\mathrm{mRSE} = \frac{1}{|C|} \sum_{v \in C} \left( \frac{\Gamma(k, t_r, t_t) \cdot X_k(v, t_r)}{N(v, t_t)} - 1 \right)^2 \qquad (4)$$
Given a training sample set $C$ and chosen values of $t_r$ and $t_t$, gradient descent can be used to train the value of the parameter vector $\Gamma(k, t_r, t_t)$ at which mRSE reaches its minimum. The problem thus becomes:
$$\arg\min_{\Gamma(k, t_r, t_t)} \frac{1}{|C|} \sum_{v \in C} \left( \frac{\Gamma(k, t_r, t_t) \cdot X_k(v, t_r)}{N(v, t_t)} - 1 \right)^2 \qquad (5)$$
The trained parameter vector $\Gamma(k, t_r, t_t)$ can then be substituted into the prediction model described by formula (2) to predict the future reply volume of a post.
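The minimisation in formula (5) can be sketched with batch gradient descent. The learning rate, the number of epochs, the zero initialisation and the toy training data are assumptions chosen for this example and would need tuning on real data.

```python
def train_gamma(samples, targets, learning_rate=0.5, epochs=2000):
    """Minimise mRSE over the parameter vector with batch gradient descent.

    samples: feature vectors X_k(v, t_r), one per training post.
    targets: true reply volumes N(v, t_t), all non-zero.
    """
    k = len(samples[0])
    gamma = [0.0] * k
    # Dividing each X by its N turns every term of (4) into (gamma . z - 1)^2.
    scaled = [[xi / n for xi in x] for x, n in zip(samples, targets)]
    for _ in range(epochs):
        gradient = [0.0] * k
        for z in scaled:
            residual = sum(g * zi for g, zi in zip(gamma, z)) - 1.0
            for i, zi in enumerate(z):
                gradient[i] += 2.0 * residual * zi / len(scaled)
        gamma = [g - learning_rate * gi for g, gi in zip(gamma, gradient)]
    return gamma


# Toy data generated from the made-up parameters (2.0, 3.0): N = 2*x1 + 3*x2.
samples = [[10, 0], [0, 10], [5, 5], [8, 2]]
targets = [20.0, 30.0, 25.0, 22.0]
gamma = train_gamma(samples, targets)
```

On this noise-free toy set the trained vector approaches the parameters used to generate the targets. With real forum data the minimum mRSE would be greater than zero.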
Figure 3 is a structural diagram of a system for predicting social network information popularity based on user characteristics, provided by an embodiment of the present invention. The system comprises: an acquisition module 301, a feature extraction module 302, a classification module 303, a processing module 304, a prediction model module 305 and a prediction module 306.
The acquisition module 301 is configured to obtain information data in a social network within a preset time and the user data corresponding to the information data, the user data comprising multiple user attribute features.
The feature extraction module 302 is configured to extract some of the user attribute features from the user data and, according to the user data, to obtain the user behavior features of the user data.
The classification module 303 is configured to classify the user data according to the user attribute features and the user behavior features, obtaining the categories of the users in the user data.
The processing module 304 is configured to obtain, according to the information data and the categories of the users in the user data, the user propagation features corresponding to the information data.
The prediction model module 305 is configured to determine a prediction model of social network information popularity according to the user propagation features.
The prediction module 306 is configured to analyze the information data produced within a period of time using the prediction model, and to predict the information popularity.
Further, the system also comprises a storage module configured to store the user data and the information data in a database.
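The work of the processing module can be illustrated with a short Python sketch. It rests on an assumption that this section does not state outright: that the user propagation feature of a message is a vector with one component per user category, counting the replies posted by users of that category up to an observation time t_r. The function name `propagation_features`, the reply tuple layout and the sample data are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# A reply is (user_id, time elapsed since the message was posted).
Reply = Tuple[str, float]


def propagation_features(replies: Iterable[Reply],
                         user_category: Dict[str, int],
                         k: int, t_r: float) -> List[float]:
    """Count replies per user category up to time t_r (assumed feature definition)."""
    counts = [0.0] * k
    for user_id, t in replies:
        # Ignore replies after the observation time and users without a category.
        if t <= t_r and user_id in user_category:
            counts[user_category[user_id]] += 1.0
    return counts


# Hypothetical data: three users in k = 2 categories, observed up to t_r = 3.
categories = {"a": 0, "b": 1, "c": 1}
replies = [("a", 1.0), ("b", 2.0), ("a", 5.0), ("c", 3.0)]
features = propagation_features(replies, categories, k=2, t_r=3.0)
print(features)
```

A vector built this way has the same length k as the parameter vector of the prediction model, so the two can be combined component by component.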
Specifically, as shown in Figure 4, the classification module 303 comprises a normalization submodule 401 and a division submodule 402.
The normalization submodule 401 is configured to normalize the user attribute features and the user behavior features, obtaining user features.
The division submodule 402 is configured to classify the user data with a clustering algorithm according to the user features, obtaining the categories of the users in the user data.
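The two submodules can be sketched as follows. This is one possible reading, not the patented algorithm: min-max scaling stands in for the unspecified normalization, a small 2-means loop stands in for the unspecified way of dividing data into two categories, and the stop rule (stop splitting after three merges) is an assumed interpretation of the merge-based stopping condition in claim 5. All function names, the threshold and the sample rows are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def min_max_normalize(rows: Sequence[Sequence[float]]) -> List[Vector]:
    """Scale every feature column to [0, 1] (assumed normalization)."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in rows]


def centroid(points: Sequence[Vector]) -> Vector:
    return [sum(c) / len(points) for c in zip(*points)]


def two_means(points: Sequence[Vector], iters: int = 20) -> Tuple[List[Vector], List[Vector]]:
    """Split points into two groups, seeded by the first point and its farthest point."""
    c0 = points[0]
    c1 = max(points, key=lambda p: math.dist(p, c0))
    g0, g1 = list(points), []
    for _ in range(iters):
        g0 = [p for p in points if math.dist(p, c0) <= math.dist(p, c1)]
        g1 = [p for p in points if math.dist(p, c0) > math.dist(p, c1)]
        if not g1:
            break
        c0, c1 = centroid(g0), centroid(g1)
    return g0, g1


def cluster(points: Sequence[Vector], min_center_dist: float,
            max_merges: int = 3) -> List[List[Vector]]:
    """Keep splitting groups in two; merge a split back if its centers are too close."""
    clusters: List[List[Vector]] = []
    queue: List[List[Vector]] = [list(points)]
    merges = 0
    while queue:
        group = queue.pop(0)
        if merges >= max_merges or len(group) < 2:
            clusters.append(group)
            continue
        g0, g1 = two_means(group)
        if not g1 or math.dist(centroid(g0), centroid(g1)) < min_center_dist:
            merges += 1             # the two halves are fused into one category
            clusters.append(group)
        else:
            queue.extend([g0, g1])  # accept the split and keep classifying
    return clusters


# Hypothetical user rows: (an attribute feature, a behavior feature).
rows = [[1.0, 10.0], [2.0, 12.0], [9.0, 48.0], [10.0, 50.0]]
normalized = min_max_normalize(rows)
groups = cluster(normalized, min_center_dist=0.5)
print(len(groups), [len(g) for g in groups])
```

Normalizing first keeps a feature with a large numeric range from dominating the distance between category centers, which is what the merge threshold is compared against.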
Specifically, as shown in Figure 5, the prediction model module 305 comprises a model building submodule 501 and a training submodule 502.
The model building submodule 501 is configured to build a multivariate linear model based on user features.
The training submodule 502 is configured to train the linear model using the user propagation information as the training set, obtaining the social network information popularity prediction model.
The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for predicting social network information popularity based on user characteristics, characterized in that the method comprises:
Obtaining information data in a social network within a preset time and the user data corresponding to the information data, the user data comprising multiple user attribute features;
Extracting some of the user attribute features from the user data and, according to the user data, obtaining the user behavior features of the user data;
Classifying the user data according to the user attribute features and the user behavior features, obtaining the categories of the users in the user data;
Obtaining, according to the information data and the categories of the users in the user data, the user propagation features corresponding to the information data;
Determining a prediction model of social network information popularity according to the user propagation features;
Analyzing the information data produced within a period of time using the prediction model, and predicting the information popularity.
2. The method according to claim 1, characterized in that, after the step of obtaining the information data in the social network within the preset time and the user data corresponding to the information data, the method further comprises:
Storing the user data and the information data in a database.
3. The method according to claim 1, characterized in that obtaining the information data in the social network within the preset time and the user data corresponding to the information data comprises:
Using a web crawler to obtain the user data and the information data of forum-type social networks;
Using an application programming interface (API) to obtain the user data and the information data of microblog-type social networks;
Using a web crawler to obtain the user data of community-type social networks, and using the user's clipbook to obtain the information data of community-type social networks.
4. The method according to claim 1, characterized in that classifying the user data according to the user attribute features and the user behavior features to obtain the categories of the users in the user data comprises:
Normalizing the user attribute features and the user behavior features, obtaining user features;
Classifying the user data with a clustering algorithm according to the user features, obtaining the categories of the users in the user data.
5. The method according to claim 4, characterized in that classifying the user data with the clustering algorithm comprises:
Dividing the user data into two categories and calculating the distance between the category centers; if the distance between the category centers is less than a preset value, merging the two categories into one category;
Continuing to classify the user data of each category and calculating the distances between the centers of the categories, and stopping the classification when the user data of three categories is merged into one category, obtaining the categories of the users.
6. The method according to claim 1, characterized in that determining the prediction model of social network information popularity according to the user propagation features comprises:
Building a multivariate linear model based on user features;
Training the linear model using the user propagation information as the training set, obtaining the social network information popularity prediction model.
7. A system for predicting social network information popularity based on user characteristics, characterized in that the system comprises:
An acquisition module, configured to obtain information data in a social network within a preset time and the user data corresponding to the information data, the user data comprising multiple user attribute features;
A feature extraction module, configured to extract some of the user attribute features from the user data and, according to the user data, to obtain the user behavior features of the user data;
A classification module, configured to classify the user data according to the user attribute features and the user behavior features, obtaining the categories of the users in the user data;
A processing module, configured to obtain, according to the information data and the categories of the users in the user data, the user propagation features corresponding to the information data;
A prediction model module, configured to determine a prediction model of social network information popularity according to the user propagation features;
A prediction module, configured to analyze the information data produced within a period of time using the prediction model, and to predict the information popularity.
8. The system according to claim 7, characterized in that the system further comprises:
A storage module, configured to store the user data and the information data in a database.
9. The system according to claim 7, characterized in that the classification module comprises:
A normalization submodule, configured to normalize the user attribute features and the user behavior features, obtaining user features;
A division submodule, configured to classify the user data with a clustering algorithm according to the user features, obtaining the categories of the users in the user data.
10. The system according to claim 7, characterized in that the prediction model module comprises:
A model building submodule, configured to build a multivariate linear model based on user features;
A training submodule, configured to train the linear model using the user propagation information as the training set, obtaining the social network information popularity prediction model.
CN201410472689.8A 2014-09-16 2014-09-16 Method and system for predicting social network information popularity on the basis of user characteristics Expired - Fee Related CN104281882B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410472689.8A CN104281882B (en) 2014-09-16 2014-09-16 Method and system for predicting social network information popularity on the basis of user characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410472689.8A CN104281882B (en) 2014-09-16 2014-09-16 Method and system for predicting social network information popularity on the basis of user characteristics

Publications (2)

Publication Number Publication Date
CN104281882A true CN104281882A (en) 2015-01-14
CN104281882B CN104281882B (en) 2017-09-15

Family

ID=52256742

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410472689.8A Expired - Fee Related CN104281882B (en) 2014-09-16 2014-09-16 Method and system for predicting social network information popularity on the basis of user characteristics

Country Status (1)

Country Link
CN (1) CN104281882B (en)

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104933622A (en) * 2015-03-12 2015-09-23 中国科学院计算技术研究所 Microblog popularity degree prediction method based on user and microblog theme and microblog popularity degree prediction system based on user and microblog theme
CN105337773A (en) * 2015-11-19 2016-02-17 南京邮电大学 ReciprocityRank algorithm based microblogging network influence node discovering method
CN105488599A (en) * 2015-12-29 2016-04-13 杭州数梦工场科技有限公司 Method and device of prediction of article popularity
CN105869022A (en) * 2016-04-07 2016-08-17 腾讯科技(深圳)有限公司 Application popularity prediction method and apparatus
CN105930540A (en) * 2016-03-23 2016-09-07 四川长虹电器股份有限公司 Data processing system
CN106127521A (en) * 2016-03-23 2016-11-16 四川长虹电器股份有限公司 A kind of information processing method and data handling system
CN106204101A (en) * 2016-03-23 2016-12-07 四川长虹电器股份有限公司 A kind of collecting method and data handling system
CN106202218A (en) * 2016-03-23 2016-12-07 四川长虹电器股份有限公司 A kind of data processing method and data handling system
CN106294508A (en) * 2015-06-10 2017-01-04 深圳市腾讯计算机系统有限公司 A kind of brush amount tool detection method and device
CN106294601A (en) * 2016-07-28 2017-01-04 腾讯科技(深圳)有限公司 Data processing method and device
CN106295844A (en) * 2015-06-12 2017-01-04 华为技术有限公司 A kind of data processing method, device, system and electronic equipment
CN106407455A (en) * 2016-09-30 2017-02-15 深圳市华傲数据技术有限公司 Data processing method and device based on graph data mining
CN106411711A (en) * 2016-10-20 2017-02-15 宁波江东大金佰汇信息技术有限公司 Improved temporary social network determination system based on computer big data
CN106446191A (en) * 2016-09-30 2017-02-22 浙江工业大学 Logistic regression based multi-feature network popular tag prediction method
CN106453495A (en) * 2016-08-31 2017-02-22 北京邮电大学 Information centric networking caching method based on content popularity prediction
CN106651605A (en) * 2016-10-20 2017-05-10 宁波江东大金佰汇信息技术有限公司 Computer big data-based temporary social network determining system
CN107403389A (en) * 2017-07-17 2017-11-28 广州特道信息科技有限公司 The method for digging and device of the potential feature of microblog users
CN107506870A (en) * 2017-09-06 2017-12-22 国家电网公司 A kind of electric service hotspot prediction method based on hot word
CN107688966A (en) * 2017-08-22 2018-02-13 北京京东尚科信息技术有限公司 Data processing method and its system and non-volatile memory medium
CN108228595A (en) * 2016-12-14 2018-06-29 中国电信股份有限公司 Speculate the method and system for obtaining user property
CN108776840A (en) * 2018-04-28 2018-11-09 拉卡拉支付股份有限公司 Information flow method for pushing, device, electronic equipment and computer readable storage medium
CN109086932A (en) * 2018-08-02 2018-12-25 广东工业大学 A kind of prediction technique, system and the device of media information prevalence degree
CN109242552A (en) * 2018-08-22 2019-01-18 重庆邮电大学 A kind of retail shop's localization method based on big data
WO2019019348A1 (en) * 2017-07-27 2019-01-31 上海壹账通金融科技有限公司 Product information pushing method and apparatus, storage medium, and computer device
CN109788056A (en) * 2019-01-10 2019-05-21 四川新网银行股份有限公司 User's theme message method for pushing and system based on clustering
CN109951317A (en) * 2019-02-18 2019-06-28 大连大学 A kind of buffer replacing method of the popularity sensor model based on user's driving
CN110168534A (en) * 2017-04-22 2019-08-23 维思有限公司 Method and system for Test driver bilayer graph model
CN110704754A (en) * 2019-10-18 2020-01-17 支付宝(杭州)信息技术有限公司 Push model optimization method and device executed by user terminal
CN111159569A (en) * 2019-12-13 2020-05-15 西安交通大学 Social network user behavior prediction method based on user personalized features
CN112036659A (en) * 2020-09-09 2020-12-04 中国科学技术大学 Social network media information popularity prediction method based on combination strategy
CN112330016A (en) * 2020-11-04 2021-02-05 广东工业大学 Social network user behavior prediction method based on ensemble learning
CN113127576A (en) * 2021-04-15 2021-07-16 微梦创科网络科技(中国)有限公司 Hotspot discovery method and system based on user content consumption analysis

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102394798A (en) * 2011-11-16 2012-03-28 北京交通大学 Multi-feature based prediction method of propagation behavior of microblog information and system thereof
CN103258248A (en) * 2013-05-21 2013-08-21 中国科学院计算技术研究所 Method, device and system for predicting microblog fashion trend
CN104008150A (en) * 2014-05-20 2014-08-27 中国科学院信息工程研究所 Method and system for predicting social network information transmission trend

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Maia, M. et al.: "Identifying user behavior in online social networks", in Proceedings of the 1st Workshop on Social Network Systems, ACM, April 2008 *
Pinto, H. et al.: "Using early view patterns to predict the popularity of YouTube videos", in Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, ACM, February 2013 *
Zhao Zheng: "Research on Sina Weibo user information based on two improved clustering algorithms", China Master's Theses Full-text Database, Social Sciences II *

Cited By (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104933622A (en) * 2015-03-12 2015-09-23 中国科学院计算技术研究所 Microblog popularity degree prediction method based on user and microblog theme and microblog popularity degree prediction system based on user and microblog theme
CN106294508A (en) * 2015-06-10 2017-01-04 深圳市腾讯计算机系统有限公司 A kind of brush amount tool detection method and device
CN106294508B (en) * 2015-06-10 2020-02-11 深圳市腾讯计算机系统有限公司 Brushing amount tool detection method and device
CN106295844A (en) * 2015-06-12 2017-01-04 华为技术有限公司 A kind of data processing method, device, system and electronic equipment
CN105337773A (en) * 2015-11-19 2016-02-17 南京邮电大学 ReciprocityRank algorithm based microblogging network influence node discovering method
CN105337773B (en) * 2015-11-19 2018-06-05 南京邮电大学 Micro blog network influence power node discovery method based on ReciprocityRank algorithms
CN105488599A (en) * 2015-12-29 2016-04-13 杭州数梦工场科技有限公司 Method and device of prediction of article popularity
CN105488599B (en) * 2015-12-29 2020-03-06 杭州数梦工场科技有限公司 Method and device for predicting article popularity
CN105930540A (en) * 2016-03-23 2016-09-07 四川长虹电器股份有限公司 Data processing system
CN106202218A (en) * 2016-03-23 2016-12-07 四川长虹电器股份有限公司 A kind of data processing method and data handling system
CN106204101A (en) * 2016-03-23 2016-12-07 四川长虹电器股份有限公司 A kind of collecting method and data handling system
CN106127521A (en) * 2016-03-23 2016-11-16 四川长虹电器股份有限公司 A kind of information processing method and data handling system
CN105869022B (en) * 2016-04-07 2020-10-23 腾讯科技(深圳)有限公司 Application popularity prediction method and device
CN105869022A (en) * 2016-04-07 2016-08-17 腾讯科技(深圳)有限公司 Application popularity prediction method and apparatus
CN106294601A (en) * 2016-07-28 2017-01-04 腾讯科技(深圳)有限公司 Data processing method and device
CN106294601B (en) * 2016-07-28 2020-11-10 腾讯科技(深圳)有限公司 Data processing method and device
CN106453495A (en) * 2016-08-31 2017-02-22 北京邮电大学 Information centric networking caching method based on content popularity prediction
CN106453495B (en) * 2016-08-31 2019-02-19 北京邮电大学 A kind of information centre's network-caching method based on content popularit prediction
CN106407455A (en) * 2016-09-30 2017-02-15 深圳市华傲数据技术有限公司 Data processing method and device based on graph data mining
CN106446191A (en) * 2016-09-30 2017-02-22 浙江工业大学 Logistic regression based multi-feature network popular tag prediction method
CN106446191B (en) * 2016-09-30 2019-11-05 浙江工业大学 A kind of multiple features network flow row label prediction technique returned based on Logistic
CN106651605B (en) * 2016-10-20 2019-11-15 福州盛世凌云环保科技有限公司 A kind of temporary social network based on computer big data determines system
CN106411711A (en) * 2016-10-20 2017-02-15 宁波江东大金佰汇信息技术有限公司 Improved temporary social network determination system based on computer big data
CN106651605A (en) * 2016-10-20 2017-05-10 宁波江东大金佰汇信息技术有限公司 Computer big data-based temporary social network determining system
CN108228595A (en) * 2016-12-14 2018-06-29 中国电信股份有限公司 Speculate the method and system for obtaining user property
CN110168534B (en) * 2017-04-22 2023-06-30 维思有限公司 Method and system for testing a driven bilayer graphics model
CN110168534A (en) * 2017-04-22 2019-08-23 维思有限公司 Method and system for Test driver bilayer graph model
CN107403389A (en) * 2017-07-17 2017-11-28 广州特道信息科技有限公司 The method for digging and device of the potential feature of microblog users
WO2019019348A1 (en) * 2017-07-27 2019-01-31 上海壹账通金融科技有限公司 Product information pushing method and apparatus, storage medium, and computer device
CN107688966A (en) * 2017-08-22 2018-02-13 北京京东尚科信息技术有限公司 Data processing method and its system and non-volatile memory medium
CN107506870A (en) * 2017-09-06 2017-12-22 国家电网公司 A kind of electric service hotspot prediction method based on hot word
CN108776840B (en) * 2018-04-28 2024-04-02 拉卡拉支付股份有限公司 Information stream pushing method and device, electronic equipment and computer readable storage medium
CN108776840A (en) * 2018-04-28 2018-11-09 拉卡拉支付股份有限公司 Information flow method for pushing, device, electronic equipment and computer readable storage medium
CN109086932A (en) * 2018-08-02 2018-12-25 广东工业大学 A kind of prediction technique, system and the device of media information prevalence degree
CN109242552B (en) * 2018-08-22 2020-09-29 重庆邮电大学 Shop positioning method based on big data
CN109242552A (en) * 2018-08-22 2019-01-18 重庆邮电大学 A kind of retail shop's localization method based on big data
CN109788056A (en) * 2019-01-10 2019-05-21 四川新网银行股份有限公司 User's theme message method for pushing and system based on clustering
CN109951317B (en) * 2019-02-18 2022-04-05 大连大学 User-driven popularity perception model-based cache replacement method
CN109951317A (en) * 2019-02-18 2019-06-28 大连大学 A kind of buffer replacing method of the popularity sensor model based on user's driving
CN110704754A (en) * 2019-10-18 2020-01-17 支付宝(杭州)信息技术有限公司 Push model optimization method and device executed by user terminal
CN110704754B (en) * 2019-10-18 2023-03-28 支付宝(杭州)信息技术有限公司 Push model optimization method and device executed by user terminal
CN111159569A (en) * 2019-12-13 2020-05-15 西安交通大学 Social network user behavior prediction method based on user personalized features
CN112036659A (en) * 2020-09-09 2020-12-04 中国科学技术大学 Social network media information popularity prediction method based on combination strategy
CN112330016A (en) * 2020-11-04 2021-02-05 广东工业大学 Social network user behavior prediction method based on ensemble learning
CN113127576A (en) * 2021-04-15 2021-07-16 微梦创科网络科技(中国)有限公司 Hotspot discovery method and system based on user content consumption analysis

Also Published As

Publication number Publication date
CN104281882B (en) 2017-09-15

Similar Documents

Publication Publication Date Title
CN104281882A (en) Method and system for predicting social network information popularity on basis of user characteristics
US11659050B2 (en) Discovering signature of electronic social networks
Dahal et al. Topic modeling and sentiment analysis of global climate change tweets
CN104866969A (en) Personal credit data processing method and device
CN111125453B (en) Opinion leader role identification method in social network based on subgraph isomorphism and storage medium
CN103795613B (en) Method for predicting friend relationships in online social network
WO2016161976A1 (en) Method and device for selecting data content to be pushed to terminals
CN109906451A (en) Use the similarity searching of polyphone
CN113626719A (en) Information recommendation method, device, equipment, storage medium and computer program product
CN106682686A (en) User gender prediction method based on mobile phone Internet-surfing behavior
CN111275503B (en) Data processing method and device for obtaining recall success rate of lost user
US20140188994A1 (en) Social Neighborhood Determination
CN112199608A (en) Social media rumor detection method based on network information propagation graph modeling
CN107146112A (en) A kind of mobile Internet advertisement placement method
US20150324844A1 (en) Advertising marketplace systems and methods
Wismans et al. Improving a priori demand estimates transport models using mobile phone data: a Rotterdam-region case
CN111797320B (en) Data processing method, device, equipment and storage medium
US20190057404A1 (en) Jobs forecasting
CN110766438A (en) Method for analyzing user behaviors of power grid users through artificial intelligence
CN107633257B (en) Data quality evaluation method and device, computer readable storage medium and terminal
CN112365007A (en) Model parameter determination method, device, equipment and storage medium
CN107784511A (en) A kind of customer loss Forecasting Methodology and device
CN110555713A (en) method and device for determining sales prediction model
CN111552882B (en) News influence calculation method and device, computer equipment and storage medium
CN111882224A (en) Method and device for classifying consumption scenes

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170915

Termination date: 20180916
