CN109242710A - Social network node influence ranking method and system - Google Patents
- Publication number
- CN109242710A, CN201810931729.9A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
Abstract
The present invention provides a social network node influence ranking method and system, relating to the field of digital information processing. The method first collects user personal homepage information, user posting information, and user-pair information, and preprocesses them to form a training set and a test set. It then establishes a transfer matrix model of post forwarding from the training set and simulates the model to obtain the optimal training parameter. Finally, it establishes a test transfer matrix of post forwarding from the test set in combination with the optimal training parameter and calculates the test transfer matrix to obtain the social network node influence ranking results. The invention can discover the likelihood of follow relations at hidden nodes, so that influence ranking analysis can be performed on data networks whose dynamic information is incomplete and seriously missing, and social network node influence can be analyzed more accurately when social network data are missing.
Description
Technical field
The present invention relates to the field of digital information processing, and in particular to a social network node influence ranking method and system.
Background art
In the information age, analyzing interpersonal relationships is one of the cornerstones of measuring individual value, promoting related products, monitoring public opinion, and planning related development. According to statistics, as of the end of March 2017 the Android and Apple app markets together held about 5 million apps. How to recommend apps to users more effectively is a key problem for improving user experience and increasing enterprise revenue, yet existing app recommendation strategies are based mainly on users' personal information and do not consider the influence that user groups in a social network exert on the user. Users are nevertheless inevitably influenced by their social network friends when downloading or buying apps. How to incorporate the social influence between network users into personalized recommendation algorithms, and what benefit doing so brings, are therefore questions of concern in social network analysis. A social network includes both the offline network of personal acquaintances and the social networks established online by social applications; the latter can be further divided into weakly linked networks of one-way following and strongly linked networks of two-way friendship.
Methods for studying interpersonal relationships have gradually spread from traditional sociology and psychology into the field of the Internet, where large-scale information collection, data mining algorithms, and relevance ranking algorithms have made the analysis of online social networks possible.
Influence between users online shows in how users interact: a network user's actions and thinking change under the influence of other people's actions and thinking. Users with greater influence play a key role in many stages, such as building and expanding the network and forwarding posts. Evaluating the influence of social network users, ranking the users in a social network, and identifying the highly influential user nodes is therefore the most basic problem in studying individual influence in online networks. The influence and ranking of nodes in a social network are often the basis for subsequent social network studies.
Early methods for analyzing and ranking node influence relied mainly on non-networked, non-digital means, such as questionnaires and telephone polls. Such methods yield little data, suffer long delays, and have many other problems.
With the rapid development of Internet and personal mobile network technology, large volumes of online social network data now serve as data support. They are used mainly to analyze the structure of the follow-relationship network, forwarding records, and user activity and content semantics, in order to extract the likelihood that a user forwards a message. The number of successful propagations between a user pair is counted to learn the influence between users, and the propagation probability between users is estimated as influence using the Bernoulli model and the Jaccard index model.
During this period many excellent algorithms were proposed and applied. The PageRank algorithm, for example, gives every node the same initial value and then iterates for several rounds; once the values are essentially unchanged by an iteration, each node's value is the basis for the final ranking, and the larger the value, the greater the node's influence. Because PageRank applies a correction to deal with problems such as non-unique rankings, which considerably distorts the matrix structure, the improved LeaderRank algorithm was proposed; it better reduces the effect of the correction and keeps the result reliable. Social network analysis is a complex problem that no single method can solve; various factors must be considered together and combined optimally to identify the final social roles and influence.
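The PageRank iteration described above can be illustrated with a short Python sketch. This is a generic textbook form rather than anything specified in this text: the function name `pagerank`, the damping factor of 0.85, the uniform redistribution of rank from nodes without out-links, and the toy graph are all assumptions made for illustration.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=200):
    """Power iteration for PageRank.

    out_links maps every node to the list of nodes it links to; every
    linked node must also appear as a key. The damping factor 0.85 is a
    conventional choice, not a value taken from the patent text.
    """
    nodes = sorted(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # identical initial value for every node
    for _ in range(max_iter):
        # Rank held by nodes without out-links is spread uniformly (assumption).
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                new_rank[w] += damping * rank[v] / len(out_links[v])
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:  # values essentially unchanged: stop iterating
            break
    return rank


# Hypothetical toy graph: an edge u -> w means u links to (or follows) w.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(graph)
ranking = sorted(scores, key=scores.get, reverse=True)
```

The teleport term `(1 - damping) / n` is the kind of correction the paragraph refers to: it guarantees a unique ranking at the cost of altering the original link matrix.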
As network security and the risk of information leakage receive increasing attention, earlier algorithms have run into problems in information collection. Take crawling Sina Weibo as an example: to protect user information and avoid leakage, Weibo currently does not allow followee and follower information to be crawled by conventional means, so collecting and establishing users' follow relations is difficult. Protection of users' historical posts is also increasingly standardized and strict, so the posting and forwarding information of many users may be unobtainable. Yet this information is often essential data for classical influence analysis algorithms.
Summary of the invention
The purpose of the present invention is to provide a method and system that can integrate social network user information and accurately judge and rank the influence of social user nodes, so as to solve the technical problem noted in the background above: existing methods for analyzing social network node influence consider too few factors and give inaccurate results.
To achieve the above purpose, the present invention adopts the following technical solutions.
In one aspect, the present invention provides a social network node influence ranking method comprising the following steps:
Step S110: collect user personal homepage information, user posting information, and user-pair information; preprocess the personal homepage information, the user posting information, and the user-pair information to form a training set and a test set;
Step S120: establish a transfer matrix model of post forwarding from the training set, and simulate the transfer matrix model to obtain the optimal training parameter;
Step S130: establish a test transfer matrix of post forwarding from the test set in combination with the optimal training parameter, and calculate the test transfer matrix to obtain the social network node influence ranking results.
Further, step S110 specifically includes:
collecting personal homepage information, user posting information, and user-pair information to form a data set, wherein the personal homepage information includes at least the user ID, the user's total number of posts, the user's active duration, the user's number of followers, and the number of users the user follows;
the user posting information includes at least the number of times each post is forwarded and the number of comments it receives;
the user-pair information includes the follow relations between users;
the data set is split on demand into a training set and a test set; the training set includes training-set user personal information and training-set user-pair information, and the test set includes test-set user personal information and test-set user-pair information.
Further, in step S120, establishing the transfer matrix model of post forwarding from the training set specifically includes:
Step S121: determine the user-pair influence factor $f_1$ of post forwarding in the training set from $I_U$ and $S_V$, where $I_U$ denotes the number of followers of user U and $S_V$ denotes the number of users that user V follows;
Step S122: determine the user self-influence factor $f_2$ of post forwarding in the training set, where $X$ denotes the trade-off between the importance of forwarding and of commenting for user U in the social network, $M_U$ denotes the total number of posts by user U, $T_U$ denotes the active duration of user U, $Z_U$ denotes the number of times user U's posts are forwarded, and $P_U$ denotes the number of comments on user U's posts;
Step S123: determine the total influence factor $f_{uv}$ of post forwarding in the training set:
$f_{uv} = 1 - \exp\left(-(f_1)^{m} \times (f_2)^{1-m}\right)$, where $m$ denotes the training parameter, i.e. the trade-off parameter between $f_1$ and $f_2$;
Step S124: using the K-shell decomposition algorithm, obtain the probability $p_{uu}$ that user U in the training set forwards his or her own post, where $n$ denotes the number of user nodes and $Ks_u$ denotes the K-shell value of user U;
Step S125: determine the probability $P_{uv}$ that a post of user U in the training set is forwarded;
Step S126: from $p_{uu}$ and $P_{uv}$, obtain the training transfer matrix $P$.
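The total influence factor of step S123 is the one expression in this passage that is given in closed form, so it can be sketched directly. The Python function below is a minimal illustration: the name `total_influence` and the sample inputs are hypothetical, and `f1` and `f2` are assumed to have been computed elsewhere according to steps S121 and S122, whose formulas are not reproduced in this text.

```python
import math


def total_influence(f1, f2, m):
    """Total influence factor f_uv = 1 - exp(-(f1 ** m) * (f2 ** (1 - m))).

    f1: user-pair influence factor (assumed non-negative).
    f2: user self-influence factor (assumed non-negative).
    m:  trade-off parameter; the range 0 <= m <= 1 is an assumption here.
    """
    if f1 < 0 or f2 < 0:
        raise ValueError("f1 and f2 are assumed to be non-negative")
    return 1.0 - math.exp(-(f1 ** m) * (f2 ** (1.0 - m)))


# Hypothetical inputs: how the weighting shifts between f1 and f2 as m varies.
for m in (0.0, 0.5, 1.0):
    print(m, round(total_influence(0.8, 2.5, m), 4))
```

With non-negative inputs the result always lies in the interval [0, 1), which is what allows it to be used when building forwarding probabilities. At m = 1 only `f1` matters and at m = 0 only `f2` matters.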
Further, in step S120, simulating the transfer matrix model to obtain the optimal training parameter specifically includes:
selecting C users from the training set in order of their ranking by average number of post forwards, together with their true forward counts $M_c$; multiple different values of $m$ correspond to multiple different training transfer matrices $P$; running a propagation simulation on each $P$ with the independent cascade model to obtain the expected average forward counts $F_c$ of the C users;
determining the error value MAPE over the selected users, where $c \in \{1, \ldots, C\}$;
selecting the training parameter corresponding to the $P$ with the smallest MAPE value as the optimal training parameter.
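The parameter search above can be sketched in Python as follows. This is only an illustration under stated assumptions: the MAPE used is the standard mean absolute percentage error (the patent's own formula is not reproduced in this text), the diagonal self-forwarding entries of the matrix are ignored during the cascade, and the names `simulate_forwards`, `mape`, and `pick_best_m` are hypothetical.

```python
import random


def simulate_forwards(P, seed, runs=200, rng=None):
    """Average number of users reached from `seed` under independent cascade.

    P[u][v] is read as the probability that v forwards a post of u. Each
    newly activated user gets exactly one chance to activate each inactive
    user. Diagonal entries are never used (an assumption of this sketch).
    """
    rng = rng or random.Random(0)
    n = len(P)
    total = 0
    for _ in range(runs):
        active = {seed}
        frontier = [seed]
        while frontier:
            next_frontier = []
            for u in frontier:
                for v in range(n):
                    if v not in active and rng.random() < P[u][v]:
                        active.add(v)
                        next_frontier.append(v)
            frontier = next_frontier
        total += len(active) - 1  # forwards exclude the seed itself
    return total / runs


def mape(true_counts, expected_counts):
    """Standard mean absolute percentage error; true counts must be non-zero."""
    pairs = list(zip(true_counts, expected_counts))
    return sum(abs(m_c - f_c) / m_c for m_c, f_c in pairs) / len(pairs)


def pick_best_m(matrices_by_m, users, true_counts, runs=200):
    """Return (m, error) for the candidate matrix with the smallest MAPE."""
    best_m, best_err = None, float("inf")
    for m, P in matrices_by_m.items():
        expected = [simulate_forwards(P, u, runs, random.Random(0)) for u in users]
        err = mape(true_counts, expected)
        if err < best_err:
            best_m, best_err = m, err
    return best_m, best_err
```

A caller would build one training transfer matrix per candidate value of m, pass them as a dictionary keyed by m, and supply the indices and true forward counts of the C selected users. Fixing the random seed per candidate keeps the comparison between matrices repeatable.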
Further, in step S130, establishing the test transfer matrix of post forwarding from the test set in combination with the optimal training parameter specifically includes:
on the test set, taking the optimal training parameter as the trade-off parameter between $f_1$ and $f_2$, and establishing the test transfer matrix by the method of steps S121 to S126.
Further, in step S130, calculating the test transfer matrix to obtain the social network node influence ranking results specifically includes:
setting the initial value of every entry of the user value vector $S_t$ to 1 and obtaining a stable converged value by Markov iteration, calculated as:
$S_t = (1, \ldots, 1)_{1 \times n} \times P_m$,
repeating the following step, and stopping the iteration when the Euclidean norm of the difference between two successive user value vectors is less than a predetermined precision, to obtain the stable converged value $S$:
$S = (S_t)_{1 \times n} \times P_m$,
taking each entry of the stable converged value $S$ as the score of the corresponding user and comparing the scores to obtain the social network node influence ranking results.
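The Markov iteration above can be sketched in plain Python. The function names, the tolerance, and the small example matrix are hypothetical, and the sketch assumes a row-stochastic transfer matrix so that the iteration settles; the text does not state how the matrix is normalized.

```python
import math


def stationary_scores(P, tol=1e-9, max_iter=1000):
    """Iterate S <- S x P from the all-ones vector until S stops changing.

    P is a square matrix given as a list of rows. Iteration stops when the
    Euclidean norm of the change between successive vectors is below `tol`.
    """
    n = len(P)
    s = [1.0] * n  # every user starts with value 1
    for _ in range(max_iter):
        s_next = [sum(s[u] * P[u][v] for u in range(n)) for v in range(n)]
        err = math.sqrt(sum((a - b) ** 2 for a, b in zip(s_next, s)))
        s = s_next
        if err < tol:
            break
    return s


def rank_users(scores):
    """User indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical row-stochastic matrix for three users.
P_example = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
scores = stationary_scores(P_example)
order = rank_users(scores)
```

Because the vector is multiplied from the left, a user's score grows with the probability mass flowing into that user's column, so in this toy matrix the middle user ranks first.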
In another aspect, the present invention provides a social network node influence ranking system, comprising:
a data preprocessing module, for collecting user personal homepage information, user posting information, and user-pair information, and preprocessing the personal homepage information and the user posting information to form a training set and a test set;
a training module, for establishing a transfer matrix model of post forwarding from the training set and simulating the transfer matrix model to obtain the optimal training parameter;
a test module, for establishing a test transfer matrix of post forwarding from the test set in combination with the optimal training parameter, and calculating the test transfer matrix to obtain the social network node influence ranking results.
Further, the data preprocessing module specifically performs the following:
collecting personal homepage information, user posting information, and user-pair information to form a data set, wherein the personal homepage information includes at least the user ID, the user's total number of posts, the user's active duration, the user's number of followers, and the number of users the user follows;
the user posting information includes at least the number of times each post is forwarded and the number of comments it receives;
the user-pair information includes the follow relations between users;
the data set is split on demand into a training set and a test set; the training set includes training-set user personal information and training-set user-pair information, and the test set includes test-set user personal information and test-set user-pair information.
Further, the training module includes:
a user-pair influence factor determining unit, for determining the user-pair influence factor of post forwarding from the number of followers of one user in the user pair and the number of users followed by the other user;
a user self-influence factor determining unit, for determining the user self-influence factor of post forwarding from the one user's total number of posts, active duration, number of post forwards, and number of post comments;
a total influence factor determining unit, for determining the total influence factor of post forwarding from the trade-off parameter between the user-pair influence factor and the user self-influence factor;
a user self-forwarding probability determining unit, for obtaining, by the K-shell decomposition algorithm, the probability that the one user forwards his or her own post;
a post forwarded-probability determining unit, for determining the probability that the one user's post is forwarded by other users;
a transfer matrix model establishing unit, for establishing the transfer matrix model of post forwarding from the probability that the user forwards his or her own post and the probability that the post is forwarded;
an optimal training parameter establishing unit, for running propagation simulations on the transfer matrix models with the independent cascade model to obtain the expected average forward counts, determining the error value MAPE, and selecting the training parameter corresponding to the transfer matrix with the smallest MAPE value as the optimal training parameter.
Further, the test module includes:
a test transfer matrix establishing unit, for establishing the test transfer matrix from the test set in combination with the optimal training parameter;
an influence ranking establishing unit, for calculating the test transfer matrix to obtain the social network node influence ranking results.
The beneficial effects of the present invention are as follows: it can discover the likelihood of follow relations at hidden nodes, so that influence ranking analysis can be performed on data networks whose dynamic information is incomplete and seriously missing; it provides a general algorithmic completion scheme for cases that could not otherwise be analyzed because of missing data, and it analyzes social network node influence more accurately.
Additional aspects and advantages of the present invention will be set forth in part in the following description, and will become apparent from the description or be learned through practice of the invention.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below are evidently only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a functional block diagram of the social network node influence ranking system described in Embodiment 1 of the present invention.
Fig. 2 is a flow diagram of the social network node influence ranking method described in Embodiment 2 of the present invention.
Fig. 3 is a functional block diagram of the social network node influence ranking system described in Embodiment 3 of the present invention.
Fig. 4 is a schematic diagram of training-set K-shell values against the forwarded-post counts of the corresponding users, as described in Embodiment 4 of the present invention.
Fig. 5 shows the calculated result for the optimal transfer matrix parameter m described in Embodiment 4 of the present invention.
Fig. 6 is a schematic diagram of test-set K-shell values against the forwarded-post counts of the corresponding users, as described in Embodiment 4 of the present invention.
Fig. 7 is the comparative graphical verification of effectiveness described in Embodiment 5 of the present invention.
Fig. 8 is the comparative Kendall test verification diagram described in Embodiment 5 of the present invention.
Fig. 9 is the comparative verification diagram of specific rankings described in Embodiment 5 of the present invention.
Fig. 10 is a flow diagram of the social network node influence ranking method described in Embodiment 4 of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the drawings, in which the same or similar reference numerals denote the same or similar elements, or modules with the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the present invention and are not to be construed as limiting the claims.
Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by a person of ordinary skill in the field of the present invention. It should also be understood that terms such as those defined in general dictionaries should be understood to have a meaning consistent with their meaning in the context of the prior art and, unless defined as herein, are not to be interpreted in an idealized or overly formal sense.
To facilitate understanding of the embodiments of the present invention, specific embodiments are further explained below with reference to the drawings; these embodiments do not limit the embodiments of the present invention.
A person of ordinary skill in the art should understand that the drawings are schematic diagrams of one embodiment, and that the components or devices in the drawings are not necessarily required to implement the present invention.
Embodiment 1
As shown in Fig. 1, Embodiment 1 of the present invention provides a social network node influence ranking system, comprising:
a data preprocessing module, for collecting user personal homepage information, user posting information, and user-pair information, and preprocessing the personal homepage information and the user posting information to form a training set and a test set;
a training module, for establishing a transfer matrix model of post forwarding from the training set and simulating the transfer matrix model to obtain the optimal training parameter;
a test module, for establishing a test transfer matrix of post forwarding from the test set in combination with the optimal training parameter, and calculating the test transfer matrix to obtain the social network node influence ranking results.
In Embodiment 1 of the present invention, the data preprocessing module specifically performs the following:
collecting personal homepage information, user posting information, and user-pair information to form a data set, wherein the personal homepage information includes at least the user ID, the user's total number of posts, the user's active duration, the user's number of followers, and the number of users the user follows;
the user posting information includes at least the number of times each post is forwarded and the number of comments it receives;
the user-pair information includes the follow relations between users;
the data set is split on demand into a training set and a test set; the training set includes training-set user personal information and training-set user-pair information, and the test set includes test-set user personal information and test-set user-pair information.
In practical application, the data preprocessing module of Embodiment 1 is mainly used to obtain the data set. The data comprise three classes. The first class is personal homepage information, which includes at least the user ID, the user's total number of posts, the user level or active duration, the user's number of followers (if user A follows user B, A is called B's follower), and the number of users the user follows (if user A follows user B, B is called A's followee). The second class is user posting information, which includes at least the forward counts and comment counts of some of the posts. The third class is the follow relations between users, which include at least the follow relations among some of the users.
The data set is given simple preprocessing to produce the required tables. The data set undergoes simple cleaning such as advertisement filtering and is split on demand into a training set and a test set, and the following tables are then generated for each. The personal homepage statistics table is processed by merging the third class of data with the first class and adding two columns to the table: the user's average forward count and average comment count. The user follow-relation table is cleaned so that, for every follow pair, both the follower and the followee appear in the user personal information table.
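The two table-building steps described above can be sketched in Python. The record layout is entirely hypothetical: the field names `user_id`, `forwards`, and `comments`, the dictionary-based tables, and the function names are assumptions made for illustration, and the sketch derives the averages from per-post records.

```python
def add_post_averages(users, posts):
    """Return a copy of the user table with average forward/comment columns.

    users: {user_id: profile dict}; posts: list of dicts with the
    hypothetical keys "user_id", "forwards" and "comments". Users without
    any collected post get averages of 0.0.
    """
    totals = {}
    for post in posts:
        t = totals.setdefault(post["user_id"], [0, 0, 0])
        t[0] += post["forwards"]
        t[1] += post["comments"]
        t[2] += 1  # number of posts seen for this user
    table = {}
    for uid, profile in users.items():
        forwards, comments, count = totals.get(uid, (0, 0, 0))
        row = dict(profile)
        row["avg_forwards"] = forwards / count if count else 0.0
        row["avg_comments"] = comments / count if count else 0.0
        table[uid] = row
    return table


def clean_follow_pairs(pairs, users):
    """Keep only (follower, followee) pairs whose both ends are in the user table."""
    return [(a, b) for a, b in pairs if a in users and b in users]


# Hypothetical sample data.
users = {"u1": {"followers": 10}, "u2": {"followers": 3}}
posts = [
    {"user_id": "u1", "forwards": 4, "comments": 2},
    {"user_id": "u1", "forwards": 6, "comments": 0},
]
table = add_post_averages(users, posts)
pairs = clean_follow_pairs([("u1", "u2"), ("u1", "u9")], users)
```

Filtering the follow pairs against the user table is what guarantees that every row and column of the later transfer matrix refers to a user whose profile statistics are available.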
In Embodiment 1 of the present invention, the training module includes:
a user-pair influence factor determining unit, for determining the user-pair influence factor of post forwarding from the number of followers of one user in the user pair and the number of users followed by the other user;
a user self-influence factor determining unit, for determining the user self-influence factor of post forwarding from the one user's total number of posts, active duration, number of post forwards, and number of post comments;
a total influence factor determining unit, for determining the total influence factor of post forwarding from the trade-off parameter between the user-pair influence factor and the user self-influence factor;
a user self-forwarding probability determining unit, for obtaining, by the K-shell decomposition algorithm, the probability that the one user forwards his or her own post;
a post forwarded-probability determining unit, for determining the probability that the one user's post is forwarded by other users;
a transfer matrix model establishing unit, for establishing the transfer matrix model of post forwarding from the probability that the user forwards his or her own post and the probability that the post is forwarded;
an optimal training parameter establishing unit, for running propagation simulations on the transfer matrix models with the independent cascade model to obtain the expected average forward counts, determining the error value MAPE, and selecting the training parameter corresponding to the transfer matrix with the smallest MAPE value as the optimal training parameter.
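The K-shell decomposition used by the self-forwarding probability unit can be sketched as follows. This is the generic peeling algorithm, not a reproduction of the patent's procedure: it treats the follow graph as undirected (an assumption), the function name `k_shell` and the toy graph are hypothetical, and the conversion from a K-shell value to a probability is not shown because that formula is not reproduced in this text.

```python
def k_shell(adjacency):
    """K-shell index of every node in an undirected graph.

    adjacency maps each node to the set of its neighbors. Nodes are peeled
    in rounds: at level k, every node whose remaining degree is at most k
    is removed repeatedly until none is left, and receives shell value k.
    """
    degree = {v: len(nbrs) for v, nbrs in adjacency.items()}
    remaining = set(adjacency)
    shell = {}
    k = 0
    while remaining:
        # Jump straight to the smallest remaining degree if it exceeds k.
        k = max(k, min(degree[v] for v in remaining))
        peel = [v for v in remaining if degree[v] <= k]
        while peel:
            v = peel.pop()
            if v not in remaining:
                continue  # already removed through another path
            remaining.discard(v)
            shell[v] = k
            for w in adjacency[v]:
                if w in remaining:
                    degree[w] -= 1
                    if degree[w] <= k:
                        peel.append(w)
    return shell


# Hypothetical toy graph: a triangle a-b-c, a pendant node d, an isolated node e.
toy = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
    "e": set(),
}
shells = k_shell(toy)
```

Nodes in the dense core of the network end up with the highest shell values, which is why the K-shell value serves as a measure of how central a user is.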
In Embodiment 1 of the present invention, the test module includes:
a test transfer matrix establishing unit, for establishing the test transfer matrix from the test set in combination with the optimal training parameter;
an influence ranking establishing unit, for calculating the test transfer matrix to obtain the social network node influence ranking results.
Embodiment Two
As shown in Fig. 2, Embodiment Two of the present invention provides a method of ranking social network node influence using the system described in Embodiment One. The method includes the following steps:
Step S110: collecting user personal homepage information, user posting information and user-pair information, and preprocessing the personal homepage information and the user posting information to form a training set and a test set;
Step S120: establishing a transfer matrix model of posts according to the training set, and performing simulation calculation on the transfer matrix model to obtain an optimal training parameter;
Step S130: establishing a test transfer matrix of post forwarding according to the test set in combination with the optimal training parameter, and performing calculation on the test transfer matrix to obtain the social network node influence ranking result.
In Embodiment Two of the present invention, step S110 specifically includes:
collecting personal homepage information, user posting information and user-pair information to form a data set, wherein the personal homepage information includes at least the user ID, the user's total number of posts, the user's active duration, the user's follower count and the user's following count;
the user posting information includes at least the repost count and the comment count of each post;
the user-pair information includes the follow relationships between users;
splitting the data set into a training set and a test set as required, wherein the training set includes training-set user personal information and training-set user-pair information, and the test set includes test-set user personal information and test-set user-pair information.
In Embodiment Two of the present invention, in step S120, establishing the transfer matrix model of posts according to the training set specifically includes:
Step S121: determining the user-pair impact factor f1 of post forwarding in the training set:
where I_U denotes the follower count of user U and S_V denotes the following count of user V;
Step S122: determining the user self-impact factor f2 of post forwarding in the training set:
where X denotes the social network's trade-off between the importance of reposts and of comments for user U, M_U denotes the total number of posts of user U, T_U denotes the active duration of user U, Z_U denotes the number of times the posts of user U were reposted, and P_U denotes the number of times the posts of user U were commented on;
Step S123: determining the total impact factor fuv of post forwarding in the training set:
fuv = 1 - exp(-(f1)^m × (f2)^(1-m)), where m denotes the training parameter, i.e., the trade-off parameter between f1 and f2;
Step S124: obtaining, using the K-shell decomposition algorithm, the probability puu that user U in the training set forwards his or her own post:
where n denotes the number of user nodes and Ks_u denotes the K-shell value of user U;
Step S125: determining the probability Puv that a post of user U in the training set is forwarded:
Step S126: obtaining the training transfer matrix P according to puu and Puv:
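The total impact factor of step S123 can be sketched in Python as follows. This is a minimal illustration only: the function name and the sample values of f1 and f2 are assumptions made for the example, and the expressions for f1 and f2 themselves are not reproduced in this text, so they are simply passed in as numbers.

```python
import math

def total_impact_factor(f1: float, f2: float, m: float) -> float:
    """fuv = 1 - exp(-(f1**m) * (f2**(1 - m))), with trade-off parameter m in [0, 1]."""
    if not 0.0 <= m <= 1.0:
        raise ValueError("m must lie in [0, 1]")
    return 1.0 - math.exp(-(f1 ** m) * (f2 ** (1.0 - m)))

# Hypothetical factor values, chosen only for illustration.
example_f1, example_f2 = 0.4, 0.9
for m in (0.0, 0.5, 1.0):
    print(m, total_impact_factor(example_f1, example_f2, m))
```

With m = 1 only f1 contributes and with m = 0 only f2 contributes, which is why m acts as the trade-off between the two factors.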
In Embodiment Two of the present invention, in step S120, simulating the transfer matrix model to obtain the optimal training parameter specifically includes:
selecting, in order of average repost count, C users in the training set and their corresponding true repost counts Mc; multiple different values of m correspond to multiple different training transfer matrices P; running a propagation simulation experiment on each P using the independent cascade model to obtain the expected average repost counts Fc of the C users;
determining the MAPE error value:
where c = {1, ..., C};
selecting the training parameter corresponding to the P with the smallest MAPE value as the optimal training parameter.
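The selection of the optimal training parameter can be illustrated with a short Python sketch. The MAPE expression is not reproduced in this text, so the sketch assumes the standard definition, the mean over the C users of |Fc - Mc| / Mc; the function names and all numbers below are hypothetical.

```python
def mape(expected, actual):
    """Mean absolute percentage error between simulated (Fc) and true (Mc) repost counts.

    Assumes the standard MAPE definition and strictly positive true counts.
    """
    if len(expected) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(f - t) / t for f, t in zip(expected, actual)) / len(actual)

def select_optimal_m(simulated_by_m, actual):
    """Return the candidate m whose simulated counts give the smallest MAPE."""
    return min(simulated_by_m, key=lambda m: mape(simulated_by_m[m], actual))

# Hypothetical counts for three users and three candidate values of m.
true_counts = [10.0, 20.0, 40.0]
simulated = {
    0.0: [5.0, 10.0, 20.0],
    0.5: [9.0, 22.0, 38.0],
    1.0: [20.0, 40.0, 80.0],
}
best_m = select_optimal_m(simulated, true_counts)
print(best_m)
```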
In Embodiment Two of the present invention, in step S130, establishing the test transfer matrix of post forwarding according to the test set in combination with the optimal training parameter specifically includes:
according to the test set, selecting the optimal training parameter as the trade-off parameter between f1 and f2, and establishing the test transfer matrix according to steps S121 to S126.
In Embodiment Two of the present invention, in step S130, performing calculation on the test transfer matrix to obtain the social network node influence ranking result specifically includes:
setting the initial value of each user's value vector St to 1 and obtaining a stable converged value by Markov iteration, the calculation being:
St = (1, ..., 1)_{1×n} × Pm,
repeating the following process, and stopping the iteration when the Euclidean-norm error between two successive user value vectors is smaller than a predetermined precision, to obtain the stable converged algorithm value S:
S = St_{1×n} × Pm,
taking each entry of the stable converged algorithm value S as the algorithm value of the corresponding user and comparing the magnitudes to obtain the social network node influence ranking result.
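The Markov iteration described above can be sketched in plain Python. The function name, the tolerance, the iteration cap and the 3-user matrix are assumptions made for illustration; the sketch only shows the repeated multiplication of the row vector by the transfer matrix and the Euclidean-norm stopping test.

```python
import math

def rank_by_markov_iteration(P, tol=1e-10, max_iter=1000):
    """Iterate S <- S * P from the all-ones row vector until the Euclidean norm
    of the change between two successive vectors falls below tol."""
    n = len(P)
    s = [1.0] * n
    for _ in range(max_iter):
        nxt = [sum(s[i] * P[i][j] for i in range(n)) for j in range(n)]
        delta = math.sqrt(sum((a - b) ** 2 for a, b in zip(nxt, s)))
        s = nxt
        if delta < tol:
            break
    # User indices sorted by descending algorithm value.
    order = sorted(range(n), key=lambda j: s[j], reverse=True)
    return s, order

# Hypothetical 3-user row-stochastic transfer matrix.
P = [
    [0.2, 0.5, 0.3],
    [0.1, 0.6, 0.3],
    [0.3, 0.4, 0.3],
]
scores, ranking = rank_by_markov_iteration(P)
print(scores, ranking)
```

Because every row of P sums to 1, the entries of the value vector always add up to n, so only their relative sizes change from one iteration to the next.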
Embodiment Three
As shown in Fig. 3, Embodiment Three of the present invention provides a social network node influence ranking system, the system including:
a data preprocessing unit 21, which obtains the Sina Weibo data set MicroblogPCU from the network and performs simple preprocessing to generate the training set and test set in the form required by the design, 4 tables in total: the training-set user personal information table, the training-set user follow information table, the test-set user personal information table and the test-set user follow information table;
a first training unit 22, which generates the transfer matrix Pm of post forwarding according to the training set, where m takes 11 equally spaced sample values from 0 to 1 at intervals of 0.1, each generating a different transfer matrix Pm;
a second training unit 23, which carries out post propagation simulation experiments of "posts being reposted" on the 11 different transfer matrices Pm, screens the optimal transfer matrix Pm according to the MAPE value, and obtains the trained value m;
a first test unit 24, which generates the transfer matrix P according to the test set and the training result m, and generates the ranking of the present algorithm;
a second test unit 25, which generates the rankings of the other algorithms according to the test set and the other algorithms;
a third test unit 26, which performs consistency checks on the ranking result of the present algorithm and the ranking results of the other algorithms, to demonstrate the superiority of the ranking result of the present algorithm;
The data preprocessing unit 21 specifically includes:
a data set acquisition subunit 211, which obtains the Sina Weibo data set MicroblogPCU; this data set was collected from Sina Weibo by Jun Liu et al. on 2015.3.17. The data set mainly includes 4 files: weibo_user.csv (user personal information), followe-followee.csv (user follow information), user_post.csv (posting content information) and post.csv (posting content information). weibo_user.csv contains, for 700+ users, the user ID, name, gender, level, personal information, postal code, follower count, followee count and other information; followe-followee.csv contains about 140,000 follower-followee relation pairs, including users not recorded in weibo_user; user_post.csv and post.csv record the content posted by these users, the post ID, the poster ID, the repost count, the comment count and so on. The data set has undergone simple cleaning that removed interfering content such as zombie accounts and secondary accounts, but some missing values remain and need to be removed manually;
a data set preprocessing subunit 212, which performs simple preprocessing and generates the training set and the test set: it processes the user personal information table, adding the average repost count and the average comment count, and ensures that each user has the following fields: ID, name, user level, number of posts, follower count, following count, and average repost/comment count of the user's posts; it cleans the user follow information table, ensuring that, for every follow pair, both the follower and the followee appear in the user personal information table. At this point 4 tables have been generated: the training-set user personal information table, the training-set user follow information table, the test-set user personal information table and the test-set user follow information table.
The first training unit 22 specifically includes:
The transfer matrix Pm describes the repost probability that may exist for a post; when the repost probability is greater than 0, a repost remains possible in the future even if no repost record or follow relationship between the users has currently been observed;
a kshell value calculation subunit 221, which calculates the kshell values of the training set. The kshell value is calculated as follows: first, the nodes of degree 0 in the network are selected and peeled off; then all nodes judged to have degree 1 are selected and peeled off, after which the degrees of some users in the newly generated network change, so all nodes judged to have degree 1 are again selected and peeled off, and this is repeated until the newly generated network has no more nodes that can be peeled; all nodes peeled off at degree 1 are called the 1-shell. The above steps are repeated to obtain the 2-shell, ..., k-shell, until all nodes have been peeled off, so that each node has its own integer kshell value;
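The peeling procedure can be sketched in Python as follows. The sketch assumes the follow network is treated as an undirected graph stored as a dictionary of neighbour sets; the function name and the small example graph are hypothetical.

```python
def k_shell(adjacency):
    """Return {node: kshell value} for an undirected graph {node: set(neighbours)},
    by repeatedly peeling off the nodes whose remaining degree is at most k."""
    degree = {u: len(vs) for u, vs in adjacency.items()}
    remaining = set(adjacency)
    shell = {}
    k = 0
    while remaining:
        peel = [u for u in remaining if degree[u] <= k]
        if not peel:
            k += 1  # nothing left to peel at this level, move to the next shell
            continue
        while peel:
            u = peel.pop()
            if u not in remaining:
                continue
            remaining.discard(u)
            shell[u] = k
            for v in adjacency[u]:
                if v in remaining:
                    degree[v] -= 1
                    if degree[v] <= k:
                        peel.append(v)  # v became peelable after losing a neighbour
    return shell

# Hypothetical graph: a triangle a-b-c, a pendant node d attached to a, an isolated node e.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
    "e": set(),
}
shells = k_shell(graph)
print(shells)
```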
As shown in Fig. 4, which plots the training-set kshell values of the present invention against the corresponding users' repost counts, the kshell value shows no obvious correlation with the repost count; it performs poorly and cannot be used on its own as a method for ranking this kind of network.
a training parameter setting subunit 222, which takes equally spaced values of m between 0 and 1, for example at intervals of 0.1, so that the following process is completed 11 times;
an fuv and Puu calculation subunit 223, which calculates fuv and Puu;
Definition 1, user-pair impact factor f1:
where Iu denotes the follower count of user u and Sv denotes the following count of user v;
Definition 2, user self-impact factor f2:
where x is a parameter between 0 and 1 representing the trade-off the social application places between reposts and comments; without loss of generality it can be set to 0.5;
Definition 3, transition probability fuv:
fuv = 1 - exp{-(f1)^m * (f2)^(1-m)}
where m is a parameter sampled from 0 to 1 at intervals of 0.1, representing the trade-off between f1 and f2; 11 training runs are required in total;
Definition 4, transition probability Puu:
where n is the number of user nodes and ksu is the kshell value.
a Puv calculation subunit 224, which calculates Puv;
Definition 5, forwarded probability Puv:
a transfer matrix P generation subunit 225, which generates the transition probability matrix P, written Pm for each different m;
the transfer matrix Pm is obtained from Puv and Puu:
The second training unit 23 specifically includes:
a data extraction subunit 231, which extracts from the data set the "top 20 users ranked by repost count", obtaining the user list and the corresponding repost counts;
a propagation experiment subunit 232: different values of m give different transfer matrices P; for each transfer matrix Pm, an independent cascade propagation experiment is run with each of the 20 users as the single starting point, each experiment is repeated 10 times, and the average value Fc is obtained;
a MAPE value calculation subunit 233, which calculates the MAPE value from Fc and Mc for each Pm;
where C is the number of users, c is a specific user, and MAPE expresses the error between the predicted data and the true data;
a trained value selection subunit 234, which selects the minimum MAPE value; the corresponding m is optimal and is the training result;
As shown in Fig. 5, the optimal-m calculation result of the trained values of Embodiment Two of the present invention, m takes 11 sample values from 0 to 1 at intervals of 0.1, giving 11 transition probability matrices Pm for m = 0, 0.1, ..., 1; the MAPE value is calculated for each and the minimum is taken to obtain the corresponding optimal m. The optimal m obtained in this embodiment is 0.5.
The first test unit 24 specifically includes:
a kshell value calculation subunit 241, which calculates the kshell values of the test set;
As shown in Fig. 6, which plots the test-set kshell values of Embodiment Two of the present invention against the corresponding users' repost counts, the kshell value performs poorly and cannot be used on its own as a method for ranking this kind of network.
a transfer matrix calculation subunit 242, which obtains the test-set transfer matrix P using the trained value m, the test-set data and the calculation formulas of the transfer matrix model;
a ranking calculation subunit 243, which sets the initial value vector to the all-ones vector, multiplies it by the transition probability matrix P and iterates until convergence to obtain the algorithm values and the top-10 ranking. Markov iteration can be used to obtain the stable converged value. With the initial value of the iteration set to 1, calculate:
St = (1, ..., 1)_{1×n} * P_{n×n}
The following process is repeated until the error Δ meets the precision requirement, giving the stable converged algorithm value vector S:
St = St_{1×n} * P_{n×n}
The 2-norm (Euclidean norm) can be used to calculate Δ: the iteration is considered to have converged when the length of the difference vector between two successive St meets the requirement.
The second test unit 25 specifically includes:
degree centrality serves as the representative reference for local influence algorithms, while betweenness centrality and closeness centrality serve as the representative references for global influence algorithms; Embodiment Two therefore applies these three indices to the test-set data, obtaining a top-10 ranking for each for subsequent comparison.
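Two of these reference indices can be illustrated with a small Python sketch. It is a simplified illustration under stated assumptions: in-degree centrality is computed from a hypothetical follower dictionary, closeness centrality is computed by breadth-first search on an undirected neighbour dictionary, and betweenness centrality is omitted for brevity. The function names and example data are not from the original text.

```python
from collections import deque

def in_degree_centrality(followers):
    """followers: {user: set of users following that user}; normalised by n - 1."""
    n = len(followers)
    return {u: len(fs) / (n - 1) for u, fs in followers.items()}

def closeness_centrality(neighbours):
    """neighbours: undirected adjacency {user: set(users)}.
    Value = (number of reachable users) / (sum of shortest-path distances)."""
    result = {}
    for source in neighbours:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in neighbours[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result

# Hypothetical data: a has two followers, b has one, c has none.
followers = {"a": {"b", "c"}, "b": {"a"}, "c": set()}
degree_scores = in_degree_centrality(followers)

# Hypothetical undirected path a - b - c.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
closeness_scores = closeness_centrality(path)
print(degree_scores, closeness_scores)
```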
The third test unit 26 specifically includes:
a comparison-image effect verification subunit 261, which draws verification images of the different algorithms against the true data and compares them. First, after the present algorithm has been applied to the test set, the top 10 ranked users are taken and the average repost counts of their posts are obtained from the data set; a plot is then made with the users' algorithm values on the horizontal axis and the corresponding average repost counts on the vertical axis, to observe whether the two are positively correlated or consistent. For the other algorithms, such as degree centrality, betweenness centrality and closeness centrality, the same consistency plot can be made with the algorithm or index value on the horizontal axis and the corresponding users' repost counts on the vertical axis, to observe whether these algorithms show a positive correlation or consistency. The plots show whether the present method is consistent with the true data and whether, judged from the images, it outperforms the other methods;
As shown in Fig. 7, the comparison-image effect verification figure of Embodiment Two of the present invention, the results and comparison images cover the consistency with the true repost data of the top 10 of the ranking of the present invention (res.myAlgo), of the top 10 by degree centrality (res.inDgCent), of the top 10 by closeness centrality (res.closeCent) and of the top 10 by betweenness centrality (res.betweenCent). The image comparison shows directly that the closeness centrality index performs very poorly, with no obvious consistency. Betweenness centrality and degree centrality perform acceptably: as the value increases the repost count does not decrease, so the relation is monotonic; however, degree centrality lacks discrimination, and betweenness centrality, owing to its higher complexity, has an obvious drawback in engineering application cost. The algorithm values of the present invention are consistent with the repost counts, and high-influence users are not hidden among the users with low algorithm values (regarded as having a low influence ranking) or among the users with middling algorithm values (regarded as having a middling influence ranking); the present algorithm can therefore narrow the screening range for users with strong influence to the region where the algorithm values are large, achieving a good ranking effect;
a Kendall comparison test verification subunit 262, which calculates the Kendall test between the different algorithms and the true data and examines the test results and values. (The Kendall consistency test ranks the same sample by different methods and then computes the similarity between each pair of rankings. It mainly counts concordant and discordant pairs: if a first user ranks above a second user under method A and also ranks above the second user under method B, the pair is concordant and has a positive sign; otherwise the pair is discordant and has a negative sign. The signed counts of concordant and discordant pairs are summed; a larger value means more concordant pairs and closer rankings. The larger the Kendall test value between a method and the true ranking, the more accurate the ranking given by that method.) For the top-10 ranking obtained by applying the present algorithm to the test set, the top-10 rankings obtained by the other algorithms, and the top-10 ranking of the true data (the 10 users with the highest repost counts), the Kendall consistency test yields the Kendall test results. The value expresses the consistency between two vectors: the larger the Kendall value, the more consistent the order of the two vectors; the larger the Kendall coefficient calculated against the true data, the more consistent the ranking produced by the algorithm is with the true data, so the effectiveness of a method can be seen clearly from the value;
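The pair-counting idea behind the Kendall test can be sketched in Python. The sketch assumes both rankings score the same set of users and normalises the signed sum by the total number of pairs (the tau-a variant); the function name, the normalisation choice and the example scores are assumptions, since the text does not specify them.

```python
from itertools import combinations

def kendall_tau(scores_a, scores_b):
    """Kendall tau-a between two score dictionaries over the same users:
    (concordant pairs - discordant pairs) / (total number of pairs)."""
    users = sorted(scores_a)
    if set(users) != set(scores_b) or len(users) < 2:
        raise ValueError("both inputs must score the same two or more users")
    signed_sum = 0
    for u, v in combinations(users, 2):
        product = (scores_a[u] - scores_a[v]) * (scores_b[u] - scores_b[v])
        signed_sum += (product > 0) - (product < 0)  # +1 concordant, -1 discordant, 0 tie
    pairs = len(users) * (len(users) - 1) // 2
    return signed_sum / pairs

# Hypothetical algorithm values and true average repost counts for four users.
algorithm_values = {"u1": 0.9, "u2": 0.7, "u3": 0.4, "u4": 0.1}
true_reposts = {"u1": 120, "u2": 80, "u3": 95, "u4": 10}
tau = kendall_tau(algorithm_values, true_reposts)
print(tau)
```

In this example only the pair (u2, u3) is ordered differently by the two score sets, so five of the six pairs are concordant and one is discordant.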
As shown in Fig. 8, the Kendall comparison test verification figure of Embodiment Two of the present invention, closeness centrality (clCent) is very poorly consistent with the true repost ranking data (realRepo), and a correct ranking result cannot be obtained at all using closeness centrality; betweenness centrality (bwCent), owing to its higher complexity, has the drawbacks of being difficult to implement in engineering and of high cost; the present algorithm differs little from degree centrality (dgCent) and yields a good correlation trend, so the screening range for high-influence users can be narrowed.
a specific-ranking comparison verification subunit 263, which outputs the top 10 users of the different algorithms and of the true data together with the algorithm/true values for a detailed comparison. The top-10 ranking obtained by applying the present algorithm to the test set, and the top 10 of the other algorithms and of the true data, are listed; the listed items include the user IDs in each algorithm ranking/true data ranking and the algorithm values of each algorithm/the true average repost counts, and the algorithm effect is analysed from the specific ranking results and algorithm values.
As shown in Fig. 9, the specific-ranking comparison verification figure of Embodiment Two of the present invention, analysis of the figure shows that the top 3 users in the true data ranking are also the top 3 in the ranking of the algorithm of the present invention, which shows that high-influence users have high algorithm values and that the present invention is effective; in addition, the algorithm of the present invention has higher resolution, with relatively steady differences between the algorithm values of different users, and has a clear advantage over the results of the other algorithms.
a running-time statistics subunit 264, which outputs the total running time of the program so that it can be judged whether the time is acceptable; timestamps are taken at the start and end of the run of the embodiment and the time difference is calculated. As can be seen from Fig. 8, for a network analysis involving 700+ interconnected users, Embodiment Two ran in 42 seconds, which is consistent with the O(n^2) complexity of the theoretical analysis; since Embodiment Two uses Python, a language with relatively slow running time, and spends additional time outputting the comparison test images, the running time can still be further optimized and reduced.
Embodiment Four
As shown in Fig. 10, Embodiment Four of the present invention provides a method of ranking social network node influence using the system described in Embodiment Three. The method mainly includes the following steps:
Step 11: obtaining a data set and performing simple preprocessing to generate the training set and test set in the form required by the design;
Step 12: generating the transfer matrix Pm of post forwarding according to the training set, where different training parameters m generate different transfer matrices Pm;
Step 13: carrying out post propagation simulation experiments on the different transfer matrices Pm and screening the optimal transfer matrix Pm;
Step 14: generating the transfer matrix P according to the test set and the training result m, and generating the ranking of the present algorithm.
Step 11 includes:
Step 111: obtaining a data set, the data comprising 3 classes: the first class is personal homepage information, including at least the user ID, the user's total number of posts, the user level or active duration, the user's follower count (if a first user follows a second user, the first user is called a follower of the second user) and the user's followee count (if a first user follows a second user, the second user is called a followee of the first user); the second class is user posting information, including at least the repost counts and comment counts of some posts; the third class is the follow relationships between users, including at least the follow relationships between some users;
Step 112: performing simple preprocessing on the data set to generate the required form: carrying out simple cleaning such as advertisement filtering on the data set, splitting it into a training set and a test set as required, and then generating the following required tables for each: processing the personal homepage statistics table by merging the third-class data with the first-class data and adding two fields to the table, the user's average repost count and average comment count; and cleaning the user follow-relation table to ensure that, for every follow pair, both the follower and the followee appear in the user personal information table.
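The two preprocessing operations of step 112 can be sketched in Python with plain dictionaries and lists. The data layout, field names and function names below are assumptions chosen for illustration and do not reflect the actual file formats of any data set.

```python
from collections import defaultdict

def add_average_counts(users, posts):
    """users: {user_id: dict of fields}; posts: iterable of (user_id, reposts, comments).
    Adds 'avg_reposts' and 'avg_comments' to every user (0.0 if no posts were crawled)."""
    totals = defaultdict(lambda: [0, 0, 0])  # reposts, comments, number of posts
    for user_id, reposts, comments in posts:
        entry = totals[user_id]
        entry[0] += reposts
        entry[1] += comments
        entry[2] += 1
    for user_id, info in users.items():
        reposts, comments, count = totals.get(user_id, (0, 0, 0))
        info["avg_reposts"] = reposts / count if count else 0.0
        info["avg_comments"] = comments / count if count else 0.0
    return users

def clean_follow_pairs(pairs, users):
    """Keep only (follower, followee) pairs in which both users are in the user table."""
    return [(a, b) for a, b in pairs if a in users and b in users]

# Hypothetical user table, post records and follow pairs.
users = {"u1": {}, "u2": {}, "u3": {}}
posts = [("u1", 10, 4), ("u1", 30, 6), ("u2", 5, 1)]
pairs = [("u2", "u1"), ("u3", "u1"), ("u9", "u1"), ("u1", "u8")]
users = add_average_counts(users, posts)
kept_pairs = clean_follow_pairs(pairs, users)
print(users, kept_pairs)
```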
Step 12 is specifically as follows:
The transfer matrix Pm describes the repost probability that may exist for a post; when the repost probability is greater than 0, a repost remains possible in the future even if no repost record or follow relationship between the users has currently been observed;
The transfer matrix Pm is generated as follows:
Definition 1, user-pair impact factor f1: the more followers I a user has, the stronger the user's influence; the more users S a user follows, the more susceptible the user is, but the larger S is, the more the influence of any single node on that user is diluted; the parameter is therefore set as:
where Iu denotes the follower count of user u and Sv denotes the following count of user v; f1 describes the user-pair influence of user U on user V, and takes into account the division-by-zero problem and the accidental problems that f1 being 0 may cause;
Definition 2, user self-impact factor f2: on the one hand, the activity of a node can be taken as the number of posts divided by the total active time, and the total active time can be reflected by the user level; on the other hand, the quality of posts can be reflected by the repost count and the comment count; the parameter is therefore set as:
where x is a parameter between 0 and 1 representing the trade-off the social application places between reposts and comments; without loss of generality it can be set to 0.5. If several posts of a user have been crawled, the average repost count and the average comment count are used, and the division-by-zero problem and the accidental problems that f2 being 0 may cause are taken into account;
Definition 3, transition probability fuv: expresses the influence of user U on a different user V, i.e., the probability that V reposts a post of U:
fuv = 1 - exp{-(f1)^m * (f2)^(1-m)}
The exponential form is used because it makes fuv increase as f1 and f2 increase; m is the key training parameter, taking values from 0 to 1, which allocates the proportions of the user-pair impact factor and the user self-impact factor in the propagation process, i.e., the trade-off between f1 and f2;
Definition 4, transition probability Puu: the probability that a user points to itself. The higher the probability that a user is reposted by the other users, the lower the probability that the user points to itself; such a user is usually a core user of the network, i.e., a node with a larger kshell decomposition value; therefore:
where n is the number of user nodes and Ksu is the kshell value; the larger the kshell value, the closer the node is to the centre of the network and the less likely it is to point to itself;
Definition 5, forwarded probability Puv: the matrix form can be obtained from Definition 3 and Definition 4, but, in view of the definition of a probability forwarding matrix, it must satisfy
∑Puv + Puu = 1
and therefore
Considering that Puv is then generally too small, which is unfavourable for subsequent training, fuv is set to its minimum value when Sv < average(Sv) (the mean of Sv).
The transfer matrix Pm is obtained from Puv and Puu:
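One way to assemble a matrix that satisfies the row constraint ∑Puv + Puu = 1 is sketched below in Python. The expressions for Puu and Puv are not reproduced in this text, so the sketch makes a loud assumption: the self-transition probabilities are given as inputs, and the off-diagonal entries are the fuv values rescaled so that each row sums to 1. The function name and the numbers are hypothetical, and the rescaling rule should not be read as the formula of the original.

```python
def build_transfer_matrix(f, p_self):
    """f[u][v]: total impact factor for u != v; p_self[u]: self-transition probability.

    ASSUMED normalisation: P[u][v] = (1 - p_self[u]) * f[u][v] / sum over w != u of f[u][w],
    which guarantees that every row of P sums to 1.
    """
    n = len(f)
    P = [[0.0] * n for _ in range(n)]
    for u in range(n):
        row_total = sum(f[u][v] for v in range(n) if v != u)
        if row_total <= 0:
            raise ValueError("each user needs at least one positive fuv")
        for v in range(n):
            if v == u:
                P[u][v] = p_self[u]
            else:
                P[u][v] = (1.0 - p_self[u]) * f[u][v] / row_total
    return P

# Hypothetical impact factors and self-transition probabilities for three users.
f = [
    [0.0, 0.3, 0.1],
    [0.2, 0.0, 0.2],
    [0.4, 0.4, 0.0],
]
p_self = [0.2, 0.5, 0.1]
P = build_transfer_matrix(f, p_self)
print(P)
```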
Step 13 includes:
carrying out post propagation simulation experiments and screening the optimal transfer matrix Pm. The propagation simulation uses the independent cascade model; the "top 20 users ranked by average post repost count" are extracted from the corresponding data set to obtain the user list and the corresponding true repost counts Mc;
A brief introduction to the independent cascade model: in the independent cascade model each node has two states, activated and inactive, where activated means that the node has received or propagated certain information (such as forwarding or liking on a microblog) [Li Guoliang, Chu Yaping, Feng Jianhua, Xu Yaoqiang. Influence maximization analysis on multiple social networks [J]. Chinese Journal of Computers, 2016, 39(04): 643-656]. The independent cascade model models the following situation: when a node u is infected, it attempts to infect each neighbour node v with probability Puv; each such attempt is made only once in a given direction between a pair of users; the attempts of u on its different neighbours v do not interfere with one another, and likewise the attempts of different users on v do not interfere with one another; once u has tried to infect all of its neighbours v, the infected users v proceed in turn in the same way. An activated node cannot be activated again, that is, the same information cannot be reposted a second time by the same user. Message reposting under the independent cascade model proceeds as follows:
I. One or more initial users are given, which are infected at the outset and become the starting points. If user u is infected, u may infect each of its friends, with exactly one chance for each; each attempt succeeds with probability Puv and is independent of the others. The larger Puv is, the more likely u is to infect v;
II. If node w is not infected at time t, all infected neighbours of w attempt to infect w, excluding attempts that have already been made; if w is infected, node w moves to the infected state at time t+1;
III. The above process is repeated until all possible infection attempts have been completed, i.e., the maximum infection range has been reached; the infection range at this point is the maximum propagation range of the information from the starting node, and the average is taken;
Different values of m give different transfer matrices Pm; the independent cascade propagation experiment is run 10 times for each, giving the average repost count Fc;
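The propagation experiment can be sketched in Python as follows. The sketch assumes that every other user is a potential neighbour (since the transfer matrix is dense), uses the number of users activated besides the starting user as the simulated repost count, and fixes a random seed for repeatability; these choices and all names are assumptions made for illustration.

```python
import random

def independent_cascade(P, start_user, rng):
    """One independent-cascade run: each newly activated user u gets a single chance
    to activate each still-inactive user v, succeeding with probability P[u][v].
    Returns the number of users activated besides the starting user."""
    active = {start_user}
    frontier = [start_user]
    while frontier:
        new_frontier = []
        for u in frontier:
            for v in range(len(P)):
                if v not in active and rng.random() < P[u][v]:
                    active.add(v)
                    new_frontier.append(v)
        frontier = new_frontier
    return len(active) - 1

def expected_reposts(P, start_user, runs=10, seed=0):
    """Average the cascade size over several runs (10 by default, as in the text)."""
    rng = random.Random(seed)
    return sum(independent_cascade(P, start_user, rng) for _ in range(runs)) / runs

# Two hypothetical extreme cases for four users: certain spread and no spread.
always = [[0.0 if i == j else 1.0 for j in range(4)] for i in range(4)]
never = [[0.0] * 4 for _ in range(4)]
print(expected_reposts(always, 0), expected_reposts(never, 0))
```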
Definition 6, error MAPE value: expresses the error between the predicted data and the true data; the smaller the calculated result, the smaller the error, which means the corresponding P and m are better and more acceptable;
where C is the number of users and c is a specific user;
Fc and MAPE are calculated for the different values of m, and the m with the smallest MAPE is selected as the optimal trained value.
Step 14 includes:
Step 141: generating the transfer matrix P from the test-set data and the trained optimal value of m, by the method of step 12;
Step 142: generating the ranking of the present algorithm: setting the initial value of each user's value vector St to 1 and obtaining a stable converged value by Markov iteration, the calculation being:
St = (1, ..., 1)_{1×n} * P_{n×n}
The following process is repeated until the error Δ meets the precision requirement, giving the stable converged algorithm value vector S:
St = St_{1×n} * P_{n×n}
The Euclidean-norm error Δ between two successive user value vectors is compared; when Δ is smaller than the predetermined precision the iteration stops, each entry of the resulting user value vector St is taken as the algorithm value of the corresponding user, and the magnitudes are compared to obtain the user ranking of the present algorithm;
The convergence of this process can be proved. For the iteration with transfer matrix P to converge, 3 conditions must be satisfied:
P is a stochastic matrix;
P is irreducible;
P is aperiodic;
For the first condition, a stochastic matrix: let Pij be the entry in row i and column j of P; for any i = 1, 2, ..., n and j = 1, 2, ..., n, Pij >= 0, and for any i = 1, 2, ..., n the sum of Pij over j is 1. Clearly the matrix P is non-negative and every row sums to 1;
For the second condition, the matrix P is irreducible if and only if the directed graph of the network corresponding to P is strongly connected (any two nodes are reachable from each other), i.e., a path can be found between any two nodes of the graph. Since all elements of the transfer matrix P in this algorithm are strictly positive, such a path certainly exists, so P satisfies the irreducibility condition;
For the third condition, periodicity means that the iterated values change in a regularly repeating way. It is known from the relevant theory that aperiodicity is equivalent to the matrix being primitive, and a primitive matrix is one that has some power that is a positive matrix; since all elements of P are positive, P necessarily satisfies this equivalent condition, that is, the third condition of aperiodicity is satisfied;
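The conditions above can be checked numerically for a given matrix with a short Python sketch. It tests that the matrix is row-stochastic and that every entry is strictly positive, which, as argued above, is sufficient for irreducibility and aperiodicity; the function names, the tolerance and the example matrices are assumptions made for illustration.

```python
def is_row_stochastic(P, tol=1e-9):
    """True if every entry is non-negative and every row sums to 1 (within tol)."""
    return all(
        all(x >= 0.0 for x in row) and abs(sum(row) - 1.0) <= tol
        for row in P
    )

def is_strictly_positive(P):
    """True if every entry is > 0; such a matrix is already a positive matrix,
    hence primitive, hence irreducible and aperiodic."""
    return all(x > 0.0 for row in P for x in row)

# Hypothetical matrices: the first meets both checks, the second has a zero entry.
P_ok = [[0.2, 0.5, 0.3], [0.1, 0.6, 0.3], [0.3, 0.4, 0.3]]
P_zero = [[0.0, 1.0], [1.0, 0.0]]
print(is_row_stochastic(P_ok), is_strictly_positive(P_ok))
print(is_row_stochastic(P_zero), is_strictly_positive(P_zero))
```

The second matrix is row-stochastic but swaps the two states on every step, so an iteration with it need not converge, which shows why positivity is checked in addition to the row sums.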
The running time of this method is mainly spent on three parts: transfer matrix computation, independent cascade model computation, and Markov iteration, which together take O(n^2) time. In the algorithm, every element of the transition probability matrix is computed independently, and the computation of each element is relatively simple: the data can be read directly from the table and only simple addition, division, and exponent operations are required, so the time complexity of generating the transfer matrix is O(n^2). In the independent cascade model, the worst case is that a single user is infected at a time until all users are infected; the time consumed in that case is also of order O(n^2), so the time complexity is O(n^2). Although the Markov iteration takes relatively long, the process in this method is somewhat similar to that of the classic PageRank transfer matrix: in the PageRank algorithm, the transition probabilities of a given user are also approximately evenly divided, and research has shown that PageRank generally converges within 50-75 iterations. In summary, the total time complexity of the present invention is O(n^2), which is an acceptable time complexity.
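To make the O(n^2) bound for the independent cascade step concrete, a minimal simulation sketch follows. It assumes that entry `P[u][v]` is used as the probability that an active user u activates an inactive user v. That reading of the matrix, along with the function names, `runs`, and `seed`, is an assumption for illustration and is not confirmed by the text.

```python
import random

def independent_cascade(P, seed_user, rng):
    """One run of the independent cascade model. Each newly activated user
    gets a single chance to activate every still-inactive user."""
    n = len(P)
    active = {seed_user}
    frontier = [seed_user]
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in range(n):  # O(n) work per activated user
                if v not in active and rng.random() < P[u][v]:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active) - 1  # users reached, excluding the seed

def expected_spread(P, seed_user, runs=1000, seed=0):
    """Average spread over repeated runs (Monte Carlo estimate)."""
    rng = random.Random(seed)
    total = sum(independent_cascade(P, seed_user, rng) for _ in range(runs))
    return total / runs
```

Each user enters the frontier at most once and scans at most n candidates, so a single run costs at most on the order of n^2 steps, which matches the worst case described above where users are infected one at a time.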
In conclusion a kind of social networks node influence power sequence that present example described in the embodiment of the present invention provides
System is suitable for the incomplete social networks of multidate information.It include social network user personal information, post information by obtaining
With the data set of concern relation information, and data cleansing is carried out, simple pretreatment, the form needed for generating this system such as project merges
Training set and test set;According to training set, the generation of social networks model transfer matrix and screening model are established, use is successively considered
Model transition probability between family personal network position, user partial network influence, user itself model transition probability and user, it is raw
The transfer matrix forwarded at model;According to different transfer matrixes, the emulation experiment of model propagation is carried out respectively, is compared model and is passed
The actual value of range and the relative error of simulation value are broadcast, selects the smallest error to filter out optimal transfer matrix and corresponding instruction
Practice parameter;According to test set and training parameter, transfer matrix is generated using identical modeling method, finally obtains stable influence
Power sequence.
As can be seen from the above description of the embodiments, those skilled in the art can clearly understand that the present invention can be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention or in certain parts of the embodiments.
The apparatus and system embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
The foregoing is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.
Claims (10)
1. A social network node influence ranking method, characterized in that the method comprises the following steps:
Step S110: collecting user personal homepage information, user post information, and user-pair information, and preprocessing the personal homepage information, the user post information, and the user-pair information to form a training set and a test set;
Step S120: establishing a post transfer matrix model according to the training set, and performing simulation calculation on the transfer matrix model to obtain an optimal training parameter;
Step S130: establishing a test transfer matrix of post forwarding according to the test set in combination with the optimal training parameter, and performing calculation on the test transfer matrix to obtain the social network node influence ranking result.
2. The method according to claim 1, characterized in that step S110 specifically comprises:
collecting personal homepage information, user post information, and user-pair information to form a data set; wherein the personal homepage information includes at least the user ID, the user's total number of posts, the user's active duration, the number of followers of the user, and the number of users the user follows;
the user post information includes at least the number of times the posts are forwarded and the number of times the posts are commented on;
the user-pair information includes the follow relationships between users;
splitting the data set into a training set and a test set as required, wherein the training set includes training-set user personal information and training-set user-pair information, and the test set includes test-set user personal information and test-set user-pair information.
3. The method according to claim 2, characterized in that, in step S120, establishing the post transfer matrix model according to the training set specifically comprises:
Step S121: determining the user-pair influence factor $f_1$ of post forwarding in the training set:
wherein $I_U$ denotes the number of followers of user U, and $S_V$ denotes the number of users that user V follows;
Step S122: determining the user self influence factor $f_2$ of post forwarding in the training set:
wherein X denotes the trade-off between the importance of forwards and comments to user U in the social network, $M_U$ denotes the total number of posts of user U, $T_U$ denotes the active duration of user U, $Z_U$ denotes the number of times the posts of user U are forwarded, and $P_U$ denotes the number of times the posts of user U are commented on;
Step S123: determining the total influence factor $f_{uv}$ of post forwarding in the training set:
$f_{uv} = 1 - \exp\left(-(f_1)^{m} \times (f_2)^{1-m}\right)$, wherein m denotes the training parameter, that is, the trade-off parameter between $f_1$ and $f_2$;
Step S124: using the K-shell decomposition algorithm, obtaining the probability $p_{uu}$ that user U in the training set forwards his or her own post:
wherein n denotes the number of user nodes, and $Ks_u$ denotes the K-shell value of user U;
Step S125: determining the probability $P_{uv}$ that a post of user U in the training set is forwarded:
Step S126: obtaining the training transfer matrix P according to $p_{uu}$ and $P_{uv}$:
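As an illustration of the one formula in this claim that is stated in full, step S123, the following sketch evaluates the total influence factor from given values of the two factors. The function name, the input checks, and the assumption that m lies in [0, 1] are not taken from the claim; the formulas for the two input factors are not reproduced here, so they are simply passed in as numbers.

```python
import math

def total_influence_factor(f1, f2, m):
    """Step S123: f_uv = 1 - exp(-(f1)^m * (f2)^(1-m)).

    f1: user-pair influence factor, f2: user self influence factor,
    m: trade-off (training) parameter.
    Assumption: f1, f2 >= 0 and 0 <= m <= 1.
    """
    if f1 < 0 or f2 < 0:
        raise ValueError("influence factors are assumed to be non-negative")
    if not 0.0 <= m <= 1.0:
        raise ValueError("m is assumed to lie in [0, 1]")
    return 1.0 - math.exp(-(f1 ** m) * (f2 ** (1.0 - m)))

# Hypothetical inputs: with f1 = f2 = 1 the result is 1 - 1/e for any m.
print(round(total_influence_factor(1.0, 1.0, 0.5), 4))  # 0.6321
```

The exponent pair m and 1-m forms a weighted geometric mean of the two factors, and the outer 1 - exp(-x) maps that non-negative value into the interval [0, 1).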
4. The method according to claim 3, characterized in that, in step S120, performing simulation on the transfer matrix model to obtain the optimal training parameter specifically comprises:
selecting, in order of the ranking by average number of post forwards, C users in the training set and their corresponding actual forwarded counts $M_c$; a plurality of different values of m correspond to a plurality of different training transfer matrices P; performing a propagation simulation experiment on each P using the independent cascade model to obtain the expected average forwarded counts $F_c$ of the C users;
determining the error MAPE value:
wherein c = {1, ..., C};
selecting the training parameter corresponding to the P with the smallest MAPE value as the optimal training parameter.
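The selection step in this claim can be sketched as below. The MAPE formula itself is not reproduced in the text, so the sketch assumes the standard definition of mean absolute percentage error, (1/C) Σ |M_c − F_c| / M_c. The function names and all numbers are hypothetical and serve only to show the mechanics.

```python
def mape(actual, simulated):
    """Mean absolute percentage error between actual counts M_c and
    simulated counts F_c (standard definition, assumed here)."""
    if not actual or len(actual) != len(simulated):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - s) / a for a, s in zip(actual, simulated)) / len(actual)

def select_best_m(actual, simulated_by_m):
    """Return the training parameter m whose simulated counts give the
    smallest MAPE against the actual counts."""
    return min(simulated_by_m, key=lambda m: mape(actual, simulated_by_m[m]))

# Made-up illustrative numbers: actual forwarded counts for C = 3 users and
# simulated expected counts for three candidate values of m.
actual = [100, 50, 20]
simulated_by_m = {
    0.2: [80, 60, 25],
    0.5: [95, 52, 19],
    0.8: [120, 30, 10],
}
best_m = select_best_m(actual, simulated_by_m)
print(best_m)  # 0.5
```

In this toy input the candidate m = 0.5 has an error of about 4.7 percent, against roughly 21.7 percent and 36.7 percent for the other two, so it is the one selected.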
5. The method according to claim 4, characterized in that, in step S130, establishing the test transfer matrix of post forwarding according to the test set in combination with the optimal training parameter specifically comprises:
according to the test set, selecting the optimal training parameter as the trade-off parameter between $f_1$ and $f_2$, and establishing the test transfer matrix according to the method of steps S121 to S126.
6. The method according to claim 5, characterized in that, in step S130, performing calculation on the test transfer matrix to obtain the social network node influence ranking result specifically comprises:
setting the initial value of each entry of the user value vector $S_t$ to 1, and obtaining a stable convergence value using Markov iteration, the calculation process being:
$S_t = (1 \ldots 1)_{1\times n} \times P_m$,
repeating the following process, and stopping the iteration when the Euclidean norm error between the user value vectors of two successive iterations is less than the predetermined accuracy, to obtain the stable convergence algorithm value S:
$S = (S_t)_{1\times n} \times P_m$,
taking each entry of the resulting stable convergence algorithm value S as the algorithm value of the corresponding user, and comparing their sizes to obtain the social network node influence ranking result.
7. A social network node influence ranking system, characterized in that the system comprises:
a data preprocessing module, configured to collect user personal homepage information, user post information, and user-pair information, and to preprocess the personal homepage information and the user post information to form a training set and a test set;
a training module, configured to establish a post transfer matrix model according to the training set, and to perform simulation calculation on the transfer matrix model to obtain an optimal training parameter;
a test module, configured to establish a test transfer matrix of post forwarding according to the test set in combination with the optimal training parameter, and to perform calculation on the test transfer matrix to obtain the social network node influence ranking result.
8. The system according to claim 7, characterized in that the data preprocessing module specifically comprises:
collecting personal homepage information, user post information, and user-pair information to form a data set; wherein the personal homepage information includes at least the user ID, the user's total number of posts, the user's active duration, the number of followers of the user, and the number of users the user follows;
the user post information includes at least the number of times the posts are forwarded and the number of times the posts are commented on;
the user-pair information includes the follow relationships between users;
splitting the data set into a training set and a test set as required, wherein the training set includes training-set user personal information and training-set user-pair information, and the test set includes test-set user personal information and test-set user-pair information.
9. The system according to claim 8, characterized in that the training module comprises:
a user-pair influence factor determining unit, configured to determine the user-pair influence factor of post forwarding according to the number of followers of one user in the user pair and the number of users that the other user follows;
a user self influence factor determining unit, configured to determine the user self influence factor of post forwarding according to the one user's total number of posts, active duration, number of times the posts are forwarded, and number of times the posts are commented on;
a total influence factor determining unit, configured to determine the total influence factor of post forwarding according to the trade-off parameter between the user-pair influence factor and the user self influence factor;
a user self-forwarding probability determining unit, configured to obtain, using the K-shell decomposition algorithm, the probability that the one user forwards his or her own post;
a post forwarded probability determining unit, configured to determine the probability that a post of the one user is forwarded by other users;
a transfer matrix model establishing unit, configured to establish the post transfer matrix model according to the probability that the user forwards his or her own post and the probability that the post is forwarded;
an optimal training parameter establishing unit, configured to perform propagation simulation experiments on the transfer matrix models respectively using the independent cascade model to obtain the expected average forwarded counts, determine the error MAPE value, and select the training parameter corresponding to the transfer matrix with the smallest MAPE value as the optimal training parameter.
10. The system according to claim 9, characterized in that the test module comprises:
a test transfer matrix establishing unit, configured to establish the test transfer matrix according to the test set in combination with the optimal training parameter;
an influence ranking establishing unit, configured to perform calculation on the test transfer matrix to obtain the social network node influence ranking result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810931729.9A CN109242710B (en) | 2018-08-16 | 2018-08-16 | Social network node influence ordering method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810931729.9A CN109242710B (en) | 2018-08-16 | 2018-08-16 | Social network node influence ordering method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109242710A true CN109242710A (en) | 2019-01-18 |
CN109242710B CN109242710B (en) | 2022-03-11 |
Family
ID=65070531
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810931729.9A Active CN109242710B (en) | 2018-08-16 | 2018-08-16 | Social network node influence ordering method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109242710B (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110391013A (en) * | 2019-07-17 | 2019-10-29 | 北京智能工场科技有限公司 | A kind of system and device based on semantic vector building neural network prediction mental health |
CN110942345A (en) * | 2019-11-25 | 2020-03-31 | 北京三快在线科技有限公司 | Seed user selection method, device, equipment and storage medium |
CN111062808A (en) * | 2019-12-24 | 2020-04-24 | 深圳市信联征信有限公司 | Credit card limit evaluation method, device, computer equipment and storage medium |
CN111192153A (en) * | 2019-12-19 | 2020-05-22 | 浙江大搜车软件技术有限公司 | Crowd relation network construction method and device, computer equipment and storage medium |
CN111639267A (en) * | 2020-05-28 | 2020-09-08 | 郭海萍 | Method for quickly calculating first screen attention posts and related product |
CN111932109A (en) * | 2020-08-06 | 2020-11-13 | 国家计算机网络与信息安全管理中心 | User influence evaluation system for mobile short video application |
CN112612968A (en) * | 2020-12-17 | 2021-04-06 | 北京理工大学 | Link recommendation method in dynamic social network based on long-term income |
CN113158072A (en) * | 2021-03-24 | 2021-07-23 | 马琦伟 | Method, device, equipment and medium for measuring influence of multi-attribute heterogeneous network node |
CN117217808A (en) * | 2023-07-21 | 2023-12-12 | 广州有机云计算有限责任公司 | Intelligent analysis and prediction method for activity invitation capability |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150120717A1 (en) * | 2013-10-25 | 2015-04-30 | Marketwire L.P. | Systems and methods for determining influencers in a social data network and ranking data objects based on influencers |
CN106952166A (en) * | 2016-01-07 | 2017-07-14 | 腾讯科技(深圳)有限公司 | The user force evaluation method and device of a kind of social platform |
CN107818514A (en) * | 2016-09-12 | 2018-03-20 | 腾讯科技(深圳)有限公司 | A kind of method, apparatus and terminal that control online social network information to propagate |
CN108305181A (en) * | 2017-08-31 | 2018-07-20 | 腾讯科技(深圳)有限公司 | The determination of social influence power, information distribution method and device, equipment and storage medium |
Non-Patent Citations (1)
Title |
---|
苑卫国 等: ""微博双向"关注"网络节点中心性及传播影响力的分析"", 《物理学报》 * |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110391013B (en) * | 2019-07-17 | 2020-08-14 | 北京智能工场科技有限公司 | System and device for predicting mental health by building neural network based on semantic vector |
CN110391013A (en) * | 2019-07-17 | 2019-10-29 | 北京智能工场科技有限公司 | A kind of system and device based on semantic vector building neural network prediction mental health |
CN110942345A (en) * | 2019-11-25 | 2020-03-31 | 北京三快在线科技有限公司 | Seed user selection method, device, equipment and storage medium |
CN110942345B (en) * | 2019-11-25 | 2022-02-15 | 北京三快在线科技有限公司 | Seed user selection method, device, equipment and storage medium |
CN111192153B (en) * | 2019-12-19 | 2023-08-29 | 浙江大搜车软件技术有限公司 | Crowd relation network construction method, device, computer equipment and storage medium |
CN111192153A (en) * | 2019-12-19 | 2020-05-22 | 浙江大搜车软件技术有限公司 | Crowd relation network construction method and device, computer equipment and storage medium |
CN111062808B (en) * | 2019-12-24 | 2023-06-09 | 深圳市信联征信有限公司 | Credit card limit evaluation method, credit card limit evaluation device, computer equipment and storage medium |
CN111062808A (en) * | 2019-12-24 | 2020-04-24 | 深圳市信联征信有限公司 | Credit card limit evaluation method, device, computer equipment and storage medium |
CN111639267A (en) * | 2020-05-28 | 2020-09-08 | 郭海萍 | Method for quickly calculating first screen attention posts and related product |
CN111932109A (en) * | 2020-08-06 | 2020-11-13 | 国家计算机网络与信息安全管理中心 | User influence evaluation system for mobile short video application |
CN111932109B (en) * | 2020-08-06 | 2023-04-07 | 国家计算机网络与信息安全管理中心 | User influence evaluation system for mobile short video application |
CN112612968A (en) * | 2020-12-17 | 2021-04-06 | 北京理工大学 | Link recommendation method in dynamic social network based on long-term income |
CN112612968B (en) * | 2020-12-17 | 2024-04-09 | 北京理工大学 | Link recommendation method in dynamic social network based on long-term benefits |
CN113158072A (en) * | 2021-03-24 | 2021-07-23 | 马琦伟 | Method, device, equipment and medium for measuring influence of multi-attribute heterogeneous network node |
CN113158072B (en) * | 2021-03-24 | 2024-03-22 | 马琦伟 | Multi-attribute heterogeneous network node influence measurement method, device, equipment and medium |
CN117217808A (en) * | 2023-07-21 | 2023-12-12 | 广州有机云计算有限责任公司 | Intelligent analysis and prediction method for activity invitation capability |
CN117217808B (en) * | 2023-07-21 | 2024-04-05 | 广州有机云计算有限责任公司 | Intelligent analysis and prediction method for activity invitation capability |
Also Published As
Publication number | Publication date |
---|---|
CN109242710B (en) | 2022-03-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||