CN107633260B - Social network opinion leader mining method based on clustering - Google Patents

Publication number
CN107633260B
Authority
CN
China
Prior art keywords
user
information
users
total number
social network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710729792.XA
Other languages
Chinese (zh)
Other versions
CN107633260A (en
Inventor
张波
张倩
李美子
潘建国
赵勤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Normal University
Original Assignee
Shanghai Normal University
Application filed by Shanghai Normal University filed Critical Shanghai Normal University
Priority to CN201710729792.XA priority Critical patent/CN107633260B/en
Publication of CN107633260A publication Critical patent/CN107633260A/en
Application granted granted Critical
Publication of CN107633260B publication Critical patent/CN107633260B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention relates to a clustering-based method for mining opinion leaders in a social network, used to identify the opinion leaders among social network users, comprising the following steps: 1) establishing a social network model, and obtaining the in-degree, the betweenness centrality and the in-degree group clustering coefficient of each user in the model; 2) clustering the users with a K-means clustering algorithm according to their in-degree, betweenness centrality and in-degree group clustering coefficient, and obtaining an opinion leader candidate user set L from the clustering result; 3) calculating the user activity and the user influence of the users in the candidate set L, and calculating the user leadership from the user activity and the user influence; 4) obtaining the opinion leaders from the candidate set L according to the user leadership. Compared with the prior art, the method offers comprehensive consideration, accurate evaluation and accurate calculation.

Description

Social network opinion leader mining method based on clustering
Technical Field
The invention relates to the technical field of social networks, in particular to a social network opinion leader mining method based on clustering.
Background
Social network opinion leaders have a significant impact on people's thoughts, experience and actions. Because social networks are open, opinion leaders have more influence on information dissemination than ordinary users. Research on opinion leaders is therefore one of the most important topics in social network user analysis, and is widely applied to the analysis and prediction of information dissemination, public opinion guidance and supervision, and the commercial development of social networks.
Handling large data in opinion leader mining remains a challenge. Most opinion leader mining algorithms evaluate the influence of every user in the whole network without distinction, so the more users the social network has, the higher the time complexity of the calculation. Cha M et al. analyze and mine opinion leaders using values such as user degree, user mentions, and the number of times posts are forwarded or quoted; Xushiring et al. add the emotional tendency and activity of users to LeaderRank for opinion leader mining; Wu Xianhui et al. construct a topic-related weighted microblog graph model and use a random-walk approach to find the central points of the graph model, thereby mining opinion leaders in microblogs; Cao Jiuxin et al. first identify topic communities, then measure user influence along the three dimensions of structure, behavior and emotion, and propose the MFP algorithm to mine opinion leaders; Chen Yuan et al. identify opinion leaders according to structural-hole position, centrality position and edge position in the social network; other work calculates user leadership by jointly considering user activity and user influence, and mines opinion leaders in the social network in combination with user centrality; Wu Chao et al. construct a reply relationship graph from the post-reply relationships under a specific topic, and identify opinion leaders by jointly considering reply emotional tendency, reply path clustering and reply text similarity. None of these techniques classifies the users in the social network to screen out those who may become opinion leaders, but they provide a foundation for mining social network opinion leaders from a clustering perspective.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and to provide a clustering-based social network opinion leader mining method that is comprehensive in consideration, accurate in evaluation and accurate in calculation.
The purpose of the invention can be achieved by the following technical scheme:
A clustering-based social network opinion leader mining method, used for obtaining the opinion leaders among social network users, comprises the following steps:
1) establishing a social network model, and obtaining the in-degree, the betweenness centrality and the in-degree group clustering coefficient of each user in the social network model;
2) clustering the users with a K-means clustering algorithm according to their in-degree, betweenness centrality and in-degree group clustering coefficient, and obtaining an opinion leader candidate user set L from the clustering result;
3) calculating the user activity and the user influence of the users in the opinion leader candidate user set L, and calculating the user leadership from the user activity and the user influence;
4) obtaining the opinion leaders from the opinion leader candidate user set L according to the user leadership.
In step 1), the in-degree of a user is calculated as:
Figure GDA0002582752930000021
wherein $D_I(u)$ is the in-degree of user u; the indicator for the pair (v, u) equals 1 when user v is a follower of user u and 0 when user v is not a follower of user u; and V is the set of users in the social network.
In step 1), the in-degree group clustering coefficient is calculated as:
Figure GDA0002582752930000022
wherein $C_I(u)$ is the in-degree group clustering coefficient of user u, n is the total number of users in the in-degree group of user u, P is the in-degree group of user u, $M(v)$ is the total number of directed edges that actually exist between the users having a direct edge relationship with user v, and $N(v)$ is the total number of users having a direct edge relationship with user v.
In step 1), the betweenness centrality of a user is calculated as:
Figure GDA0002582752930000023
wherein $B_I(u)$ is the betweenness centrality of user u, $\sigma_{mn}(u)$ is the number of shortest paths between user m and user n that pass through user u, and $\sigma_{mn}$ is the total number of shortest paths between user m and user n.
In step 2), from the clusters obtained by clustering, the cluster whose center simultaneously has the largest in-degree, the largest in-degree group clustering coefficient and the largest betweenness centrality is selected, and the users in that cluster are added to the opinion leader candidate user set L.
When no cluster satisfies all three conditions simultaneously, a cluster that satisfies any two of them is selected, and the users in that cluster are added to the opinion leader candidate user set L.
In step 3), the user activity is calculated as:
$UA(u) = \alpha_1 FP'(u) + \alpha_2 FF'(u) + \alpha_3 FE'(u)$
$\alpha_1 + \alpha_2 + \alpha_3 = 1$
Figure GDA0002582752930000031
$\Delta T_p = T_{now} - T_{firstpublish}$
Figure GDA0002582752930000032
$\Delta T_f = T_{now} - T_{firstforward}$
Figure GDA0002582752930000033
$\Delta T_e = T_{now} - T_{firstevaluate}$
wherein $UA(u)$ is the user activity of user u; $FP(u)$, $FF(u)$ and $FE(u)$ are the frequencies at which user u publishes, forwards and comments on information; $FP'(u)$, $FF'(u)$ and $FE'(u)$ are the values of $FP(u)$, $FF(u)$ and $FE(u)$ after min-max normalization; $\Delta T_p$ is the interval between the data acquisition time $T_{now}$ and the time $T_{firstpublish}$ at which user u first published information;
Figure GDA0002582752930000034
is the total number of messages published by user u within $\Delta T_p$; $\Delta T_f$ is the interval between the data acquisition time $T_{now}$ and the time $T_{firstforward}$ at which user u first forwarded information;
Figure GDA0002582752930000035
is the total number of messages forwarded by user u within $\Delta T_f$; $\Delta T_e$ is the interval between the data acquisition time $T_{now}$ and the time $T_{firstevaluate}$ at which user u first commented on information;
Figure GDA0002582752930000036
is the total number of messages commented on by user u within $\Delta T_e$; and $\alpha_1$, $\alpha_2$, $\alpha_3$ are the weights assigned by the analytic hierarchy process.
In step 3), the user influence is calculated as:
Figure GDA0002582752930000037
Figure GDA0002582752930000041
Figure GDA0002582752930000042
Figure GDA0002582752930000043
Figure GDA0002582752930000044
wherein $UI(u)$ is the user influence of user u; $UI(v)$ is the user influence of user v; $AD_{vu}$ is the attention degree of user v to user u; $fans(u)$ is the set of followers of user u; $|V|$ is the total number of users in the social network; $D(v)$ is the degree of user v; $CR(u)$ is the information coverage of user u; $CRI(i)$ is the information coverage of information i; $pub(u) \cup for(u)$ is the set of information published or forwarded by user u; $|for(u)|$ is the total number of messages forwarded by user u; $|pub(u)|$ is the total number of messages published by user u; the coverage indicator for the pair (v, i) equals 1 when user v forwards or comments on information i of user u, so that information i covers user v, and 0 when user v does not forward or comment on information i of user u, so that information i does not cover user v; $|for(v)|$ is the total number of other users' messages forwarded by user v; $|eva(v)|$ is the total number of comments by user v on other users' information; $|for(v.source = u)|$ is the total number of messages of user u forwarded by user v; $|eva(v.source = u)|$ is the total number of comments by user v on information of user u; $|for(k)|$ is the total number of other users' messages forwarded by user k; $|pub(k)|$ is the total number of messages published by user k; and $focus(v)$ is the set of followees of user v.
In step 3), the user leadership is calculated as:
$ULD(u) = UI(u) \times UA(u)$
wherein $ULD(u)$ is the user leadership of user u.
Step 4) specifically comprises:
sorting all users in the opinion leader candidate user set L by user leadership in descending order, and selecting the first K users as opinion leaders.
Compared with the prior art, the invention has the following advantages:
First, comprehensive consideration: the opinion leader candidate user set is screened from the perspective of topological attributes, and the opinion leaders are then determined from the perspective of user attributes. Considering both topological attributes and user attributes avoids the one-sided analysis results that arise from using only some of the attributes.
Second, accurate evaluation: the invention defines the in-degree group clustering coefficient of a user and a method for calculating it. Followers having a direct or indirect edge relationship with the user are taken as the user's in-degree group, and the in-degree group clustering coefficient evaluates how closely the members of this group are related to one another; the closer the relationship, the more likely the information spread by the user is to diffuse. Using the in-degree group clustering coefficient as a feature in the cluster analysis improves the quality of the clustering, so that the members of the resulting candidate user set have sufficient opinion leader characteristics.
Third, accurate calculation: user leadership is calculated from user activity and user influence. User activity ensures that the mined opinion leaders are active in the social network. In addition, the sources of user influence are considered together, namely the user's own contribution (information coverage) and the contribution of the user's followers, which makes the calculation of user influence more accurate.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a diagram of an example network node of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments.
Examples
The invention first defines the social network:
Social network model: the social network is formalized as a graph G = (V, E, R), where V is the set of users in the social network, E is the set of relationships between users, and R is an N × N matrix representing the relationships between users.
Topological attributes: topological attributes are a set of functions of the relationships between points and points, edges and edges, and points and edges in the directed graph G.
User attributes: user attributes are quantitative relationships between the various actions of a user in the social network.
User leadership: opinion leaders are people in a social network who have the ability to disseminate information, and user leadership is a quantification of this ability. User leadership mainly depends on the activity and influence of the user.
Followee and follower: followees and followers appear in pairs. If a directed edge exists between user u and user v in the social network, with source u and end point v, then user u is a follower of user v and user v is a followee of user u.
Opinion leaders: the opinion leaders are the K users with the largest leadership in the social network, denoted by O; the users who are not opinion leaders are ordinary users, denoted by C.
Opinion leader candidate user set: the opinion leader candidate user set L is the set of users considered most likely to be opinion leaders before the user leadership is calculated precisely.
As shown in FIG. 1, the method of the present invention for mining social network opinion leaders from a clustering perspective comprises the following specific steps:
A. Defining the concepts related to opinion leader mining and modeling the social network.
B. (1) Calculating the in-degree and the betweenness centrality of each user from the directed-edge relationships between users; forming each user's in-degree group from the users who follow that user through one edge or two edges, and calculating the in-degree group clustering coefficient; (2) clustering the users on their in-degree, betweenness centrality and in-degree group clustering coefficient using the K-means algorithm improved with K-means++; (3) among the resulting clusters, adding the users in the cluster with the larger in-degree, betweenness centrality and in-degree group clustering coefficient to the opinion leader candidate user set.
C. (1) Counting, for each user, the number of messages published, forwarded and commented on, and the intervals between the data acquisition time and the times at which the user first published, first forwarded and first commented on information; calculating the average number of messages published, forwarded and commented on per unit time; and calculating the user activity from these values; (2) counting the total number of messages received by the user, the total number of messages forwarded and commented on, and the number of messages the user forwarded and commented on from each followee, and calculating the user's attention degree to each followee; (3) counting the number of people who forwarded or commented on the information published and forwarded by the user to obtain the user's information coverage; (4) treating the attention degree of each follower to the user as the weight of that follower's contribution to the user's influence, calculating the influence the user obtains from followers, and combining it with the information coverage to calculate the user's influence.
D. Calculating the user leadership from the calculated user activity and user influence; sorting the users by leadership in descending order, and taking the K users with the largest leadership as opinion leaders.
(1) Selection of opinion leader candidate user set
When a user is an opinion leader in the social network, many users choose to follow that user; in graph G, the user is the end point of many edges. The in-degree of a user in the social network G is expressed as:
Figure GDA0002582752930000061
wherein the indicator for the pair (v, u) equals 1 when user v is a follower of user u and 0 when user v is not a follower of user u. The in-degree reflects, through the number of followers, whether a user can become an opinion leader: a user with a high in-degree is more likely to become an opinion leader than a user with a low in-degree. The in-degree group clustering coefficient considers the possibility of a user becoming an opinion leader in terms of how closely the user's followers are related. When a directed edge points from v to u, user v is a member of the in-degree group of user u; when another directed edge points from w to v, user w is also a member of the in-degree group of user u. The in-degree group clustering coefficient is expressed as:
Figure GDA0002582752930000071
wherein P is the in-degree group of user u, n is the total number of users in the in-degree group of user u, $N(v)$ is the total number of users having a direct edge relationship with user v, and $M(v)$ is the total number of directed edges that actually exist between the users having a direct edge relationship with user v. In addition, the betweenness centrality reflects the user's ability to carry information across the whole social network G, and is expressed as:
Figure GDA0002582752930000072
wherein $\sigma_{mn}$ is the total number of shortest paths between user m and user n, and $\sigma_{mn}(u)$ is the number of those shortest paths that pass through user u. $B_I(u)$ measures the probability that the shortest path between any two users in the social network passes through user u.
The users are clustered using the K-means algorithm improved with K-means++. From the resulting clusters, the cluster whose center has the largest in-degree, the largest in-degree group clustering coefficient and the largest betweenness centrality (the three maximum conditions) is selected. If no cluster satisfies all three maximum conditions, a cluster satisfying two of them may be selected. The users in the selected cluster are added to the opinion leader candidate user set L.
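The cluster selection rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `select_candidate_cluster`, the tuple layout of a cluster center, and the sample center values are all hypothetical and do not come from the patent.

```python
def select_candidate_cluster(centers):
    """Pick the cluster whose center is largest on all three features,
    falling back to a cluster that is largest on two of them.

    centers: one (in_degree, betweenness, group_clustering) tuple per
    cluster center (hypothetical layout). Returns a cluster index or None.
    """
    n_features = len(centers[0])
    wins = [0] * len(centers)
    for f in range(n_features):
        # Index of the cluster center holding the maximum of feature f.
        best = max(range(len(centers)), key=lambda c: centers[c][f])
        wins[best] += 1
    # First look for a cluster that is maximal on every feature,
    # then for one that is maximal on all but one feature.
    for required in (n_features, n_features - 1):
        for cluster, count in enumerate(wins):
            if count >= required:
                return cluster
    return None


# Hypothetical centers: cluster 1 is largest on in-degree and betweenness only.
chosen = select_candidate_cluster([(0.5, 0.0, 0.4), (1.0, 0.93, 0.3)])
```

With three features, at most one cluster can be largest on two or more of them, so the fallback never has to break a tie between clusters.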
(2) Calculation of user activity
A user does not only receive information in the social network; the user's activity on the social network can be calculated from the number of messages the user publishes, forwards and comments on per unit time. The frequency at which user u publishes information is defined as:
Figure GDA0002582752930000073
$\Delta T_p = T_{now} - T_{firstpublish}$
wherein $T_{now}$ is the time at which the data is acquired, $T_{firstpublish}$ is the time at which user u first published information in the acquired data,
Figure GDA0002582752930000074
is the set of information published by user u within $\Delta T_p$, and
Figure GDA0002582752930000075
is the total number of messages published by user u within $\Delta T_p$. The larger the value of $FP(u)$, the more actively user u publishes information. The frequency at which user u forwards information is defined as:
Figure GDA0002582752930000076
$\Delta T_f = T_{now} - T_{firstforward}$
wherein $T_{firstforward}$ is the time at which user u first forwarded information in the acquired data,
Figure GDA0002582752930000081
is the set of information forwarded by user u within $\Delta T_f$, and
Figure GDA0002582752930000082
is the total number of messages forwarded by user u within $\Delta T_f$. The larger the value of $FF(u)$, the more actively user u forwards information. The frequency at which user u comments on information is defined as:
Figure GDA0002582752930000083
wherein $T_{firstevaluate}$ is the time at which user u first commented on information in the acquired data,
Figure GDA0002582752930000084
is the set of information commented on by user u within $\Delta T_e$, and
Figure GDA0002582752930000085
is the total number of messages commented on by user u within $\Delta T_e$. The larger the value of $FE(u)$, the more actively user u comments on other users' information.
The frequencies of publishing, forwarding and commenting are min-max normalized:
Figure GDA0002582752930000086
Figure GDA0002582752930000087
wherein
Figure GDA0002582752930000088
and
Figure GDA0002582752930000089
are the minimum publishing frequency, the minimum forwarding frequency and the minimum commenting frequency of the users in the opinion leader candidate user set L, respectively, and
Figure GDA00025827529300000810
and
Figure GDA00025827529300000811
are the maximum publishing frequency, the maximum forwarding frequency and the maximum commenting frequency of the users in the opinion leader candidate user set L, respectively.
The user activity of the users in the opinion leader candidate user set is evaluated from the publishing, forwarding and commenting frequencies, and is defined as:
$UA(u) = \alpha_1 FP'(u) + \alpha_2 FF'(u) + \alpha_3 FE'(u)$
wherein $\alpha_1 + \alpha_2 + \alpha_3 = 1$, and the values of $\alpha_1$, $\alpha_2$ and $\alpha_3$ are assigned using the Analytic Hierarchy Process (AHP).
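As a rough illustration of how AHP can turn pairwise judgments into the three weights, the Python sketch below uses the row geometric-mean approximation. The judgment matrix here is an assumption: the patent's own matrix survives only as a figure, and this hypothetical one (publishing judged 3 times as important as forwarding and 5 times as important as commenting) was chosen because it yields weights close to the 0.636985, 0.2582850 and 0.1047294 used in the worked example. The function name `ahp_weights` is likewise illustrative.

```python
import math


def ahp_weights(matrix):
    """Approximate AHP priority weights by normalizing the geometric
    mean of each row of a pairwise comparison matrix."""
    n = len(matrix)
    row_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(row_means)
    return [g / total for g in row_means]


# Hypothetical judgment matrix (rows/columns: publish, forward, comment).
judgment = [
    [1.0, 3.0, 5.0],
    [1.0 / 3.0, 1.0, 3.0],
    [1.0 / 5.0, 1.0 / 3.0, 1.0],
]
alpha = ahp_weights(judgment)  # weights sum to 1
```

A full AHP application would also check the consistency ratio of the judgment matrix before accepting the weights; that step is omitted here.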
(3) Calculation of user influence
1) Calculation of a user's attention degree to a followee
A user may have multiple followees, but does not necessarily forward or comment on the information of every followee. The attention degree of user v to user u is defined as:
Figure GDA0002582752930000091
wherein $focus(v)$ is the set of followees of user v, $|for(u)|$ is the total number of messages forwarded by user u, $|pub(u)|$ is the total number of messages published by user u, $|eva(v)|$ is the total number of comments by user v on other users' information, $|for(v.source = u)|$ is the total number of messages of user u forwarded by user v, and $|eva(v.source = u)|$ is the total number of comments by user v on information of user u. The larger the value of $AD_{vu}$, the more attention user v pays to user u.
2) Calculation of user information coverage
For each user in the social network, the information coverage of the user reflects the user's influence to some extent, and is defined as:
Figure GDA0002582752930000092
Figure GDA0002582752930000093
wherein a coverage indicator for the pair (v, i) equal to 1 indicates that user v has forwarded or commented on information i of user u, so information i covers user v; an indicator equal to 0 indicates that user v has not forwarded or commented on information i of user u, so information i does not cover user v. $|V|$ is the total number of users in the social network, and $pub(u) \cup for(u)$ is the set of information published or forwarded by user u; information published or forwarded by user u counts as information of user u.
3) Calculation of user influence
The user's influence is obtained by combining the user's information coverage, the attention degree of the user's followers to the user, and the influence of those followers, and is defined as:
Figure GDA0002582752930000094
Figure GDA0002582752930000095
wherein $fans(u)$ is the set of followers of user u. When a user belongs to the opinion leader candidate user set, the user's initial influence is 1; otherwise, the user's initial influence is the ratio of the user's degree to the total number of users in the social network.
4) Selection of opinion leaders
User leadership is obtained by combining user activity and user influence, and is defined as:
$ULD(u) = UI(u) \times UA(u)$
The users in the opinion leader candidate user set L are sorted by user leadership in descending order, and the first K users are selected as opinion leaders.
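The ranking step is simple enough to sketch directly. In the Python below, the activity values are the UA results reported in this document's worked example, while the influence values are purely hypothetical placeholders, since the influence figures are not recoverable from the text; the function name `top_k_leaders` is also an assumption.

```python
def top_k_leaders(activity, influence, k):
    """Rank candidate users by ULD(u) = UI(u) * UA(u) and keep the top k."""
    leadership = {u: influence[u] * activity[u] for u in activity}
    ranked = sorted(leadership, key=leadership.get, reverse=True)
    return ranked[:k], leadership


# UA values from the worked example; UI values are hypothetical placeholders.
activity = {"V2": 0.70, "V3": 0.45, "V5": 0.26}
influence = {"V2": 0.5, "V3": 0.6, "V5": 0.9}
leaders, uld = top_k_leaders(activity, influence, k=2)
```

Because leadership is a product, a user with high influence but low activity (V5 in this made-up data) can rank below a more active user with lower influence.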
Example:
The following example illustrates the technique for mining opinion leaders in a social network from a clustering perspective (as shown in FIG. 2).
1. Selection of opinion leader candidate user set
1) In-degree calculation
In FIG. 2, the in-degrees of the users are $D_I(V1) = 1$, $D_I(V2) = 2$, $D_I(V3) = 2$, $D_I(V4) = 2$ and $D_I(V5) = 2$.
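These in-degrees can be checked with a short Python sketch. FIG. 2 itself is not reproduced here, so the edge list below is an assumption: it is inferred from the follower relations that this example states for each user (for instance, that the users with edges pointing to V2 are V1 and V3). Each edge is written as (follower, followee).

```python
# Directed edges (follower, followee), inferred from the example text.
EDGES = [
    ("V2", "V1"), ("V1", "V2"), ("V3", "V2"),
    ("V1", "V3"), ("V5", "V3"), ("V3", "V4"),
    ("V5", "V4"), ("V2", "V5"), ("V4", "V5"),
]


def in_degrees(edges):
    """Count, for every user, how many edges end at that user."""
    counts = {}
    for follower, followee in edges:
        counts.setdefault(follower, 0)  # users with no followers still appear
        counts[followee] = counts.get(followee, 0) + 1
    return counts


degrees = in_degrees(EDGES)
```

Under this inferred edge list the counts agree with the in-degrees stated above.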
2) Calculation of the in-degree group clustering coefficient
For user V1, the user with an edge pointing to V1 is V2, and the users with edges pointing to V2 are V1 and V3, so the in-degree group of user V1 is (V1, V2, V3),
Figure GDA0002582752930000101
For user V2, the users with edges pointing to V2 are V1 and V3, and V5 has an edge pointing to V3, so the in-degree group of user V2 is (V1, V2, V3, V5),
Figure GDA0002582752930000102
For user V3, the users with edges pointing to V3 are V1 and V5, V2 has an edge pointing to V1, and V4 has an edge pointing to V5, so the in-degree group of user V3 is (V1, V2, V3, V4, V5),
Figure GDA0002582752930000103
For user V4, the users with edges pointing to V4 are V3 and V5, V1 has an edge pointing to V3, and V2 has an edge pointing to V5, so the in-degree group of user V4 is (V1, V2, V3, V4, V5),
Figure GDA0002582752930000104
For user V5, the users with edges pointing to V5 are V2 and V4, V3 has an edge pointing to V4, and V1 has an edge pointing to V2, so the in-degree group of user V5 is (V1, V2, V3, V4, V5),
Figure GDA0002582752930000105
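The group membership listed above can be reproduced with the following Python sketch. Two assumptions are involved: the edge list is inferred from the follower relations stated in this example rather than read from FIG. 2, and the user is placed in its own group because every group listed in the example contains the user itself. The coefficient values are not computed, since their formula is available only as a figure.

```python
# Directed edges (follower, followee), inferred from the example text.
EDGES = [
    ("V2", "V1"), ("V1", "V2"), ("V3", "V2"),
    ("V1", "V3"), ("V5", "V3"), ("V3", "V4"),
    ("V5", "V4"), ("V2", "V5"), ("V4", "V5"),
]


def followers_of(edges):
    """Map each user to the set of users that follow them."""
    result = {}
    for follower, followee in edges:
        result.setdefault(followee, set()).add(follower)
        result.setdefault(follower, set())
    return result


def in_degree_group(user, edges):
    """Users reaching `user` over one or two follow edges, plus the user
    itself (an assumption matching the groups listed in the example)."""
    followers = followers_of(edges)
    group = {user}
    for v in followers[user]:
        group.add(v)                # one edge away
        group.update(followers[v])  # two edges away
    return group


group_v2 = in_degree_group("V2", EDGES)
```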
3) Betweenness centrality calculation
The shortest paths that exist in the graph are:
Figure GDA0002582752930000111
Among the shortest paths, 1 passes through V1: V2 → V1 → V3. The total number of shortest paths from V2 to V3 is 2, so the betweenness centrality of V1 is:
Figure GDA0002582752930000112
Among the shortest paths, 5 pass through V2: V1 → V2 → V5, V3 → V2 → V1, V3 → V2 → V5, V4 → V5 → V3 → V2 → V1 and V5 → V3 → V2 → V1. The total number of shortest paths from V3 to V5 is 2, so the betweenness centrality of V2 is:
Figure GDA0002582752930000113
Among the shortest paths, 5 pass through V3: V1 → V3 → V4, V4 → V5 → V3 → V2 → V1, V4 → V5 → V3 → V2, V5 → V3 → V2 → V1 and V5 → V3 → V2, so the betweenness centrality of V3 is: $B_I(V3) = 1 + 1 + 1 + 1 + 1 = 5$
Among the shortest paths, 1 passes through V4: V3 → V4 → V5. The total number of shortest paths from V3 to V5 is 2, so the betweenness centrality of V4 is:
Figure GDA0002582752930000114
Among the shortest paths, 5 pass through V5: V2 → V5 → V3, V2 → V5 → V4, V4 → V5 → V3 → V2 → V1, V4 → V5 → V3 → V2 and V4 → V5 → V3. The total number of shortest paths from V2 to V3 is 2, so the betweenness centrality of V5 is:
Figure GDA0002582752930000115
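The path counting above can be cross-checked with a small Python sketch that sums, over every ordered pair (m, n), the fraction of shortest paths passing through u. The edge list is again an assumption inferred from the follower relations stated in this example, and the values are left unnormalized, which is what the stated result for V3 suggests.

```python
from collections import deque

# Directed edges (follower, followee), inferred from the example text.
EDGES = [
    ("V2", "V1"), ("V1", "V2"), ("V3", "V2"),
    ("V1", "V3"), ("V5", "V3"), ("V3", "V4"),
    ("V5", "V4"), ("V2", "V5"), ("V4", "V5"),
]


def bfs_counts(adj, source):
    """Return BFS distances and shortest-path counts from `source`."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                sigma[y] = 0
                queue.append(y)
            if dist[y] == dist[x] + 1:
                sigma[y] += sigma[x]  # every shortest path to x extends to y
    return dist, sigma


def betweenness(edges):
    """Unnormalized directed betweenness: sum of sigma_mn(u) / sigma_mn."""
    nodes = sorted({n for edge in edges for n in edge})
    adj = {n: [] for n in nodes}
    for a, b in edges:
        adj[a].append(b)
    info = {n: bfs_counts(adj, n) for n in nodes}
    result = {n: 0.0 for n in nodes}
    for m in nodes:
        dist_m, sigma_m = info[m]
        for n in nodes:
            if n == m or n not in dist_m:
                continue
            for u in nodes:
                if u in (m, n) or u not in dist_m:
                    continue
                dist_u, sigma_u = info[u]
                # u lies on a shortest m-n path iff the distances add up.
                if n in dist_u and dist_m[u] + dist_u[n] == dist_m[n]:
                    result[u] += sigma_m[u] * sigma_u[n] / sigma_m[n]
    return result


centrality = betweenness(EDGES)
```

For this inferred graph the sketch gives 5 for V3, matching the stated sum, and one half for V1, matching the single path through V1 out of the two shortest paths from V2 to V3.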
The topological attribute values $D_I(u)$, $C_I(u)$ and $B_I(u)$ of each user u are min-max normalized and then clustered using the K-means algorithm improved with K-means++. In this example the users are divided into two clusters, giving cluster 1 (V1, V4) and cluster 2 (V2, V3, V5). Because the center of cluster 2 is larger than the center of cluster 1 in both in-degree and betweenness centrality, cluster 2 is selected as the opinion leader candidate user set, so L = (V2, V3, V5).
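A minimal pure-Python sketch of this clustering step follows. It is not the patent's implementation: the group clustering coefficients are given only in figures, so the feature rows use just two features (in-degree and an unnormalized betweenness computed from an edge list inferred from the example text), and the restart count, seed and helper names are arbitrary assumptions. K-means++ seeding picks each new center with probability proportional to its squared distance from the nearest existing center.

```python
import random


def min_max(rows):
    """Min-max normalize each feature column to [0, 1]."""
    cols = list(zip(*rows))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [tuple((x - l) / (h - l) if h > l else 0.0
                  for x, l, h in zip(row, lo, hi)) for row in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """K-means++ seeding: sample centers proportionally to squared distance."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:  # all points coincide with existing centers
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def kmeans(points, k, rng, max_iter=100):
    centers = kmeans_pp_init(points, k, rng)
    labels = None
    for _ in range(max_iter):
        new = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new == labels:
            break
        labels = new
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an emptied cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    sse = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    return labels, centers, sse


def best_of(points, k, restarts=50, seed=0):
    """Run several seeded restarts and keep the lowest within-cluster error."""
    rng = random.Random(seed)
    return min((kmeans(points, k, rng) for _ in range(restarts)), key=lambda r: r[2])


users = ["V1", "V2", "V3", "V4", "V5"]
features = [(1, 0.5), (2, 4.5), (2, 5.0), (2, 0.5), (2, 4.5)]  # assumed values
labels, centers, _ = best_of(min_max(features), k=2)
clusters = {}
for user, label in zip(users, labels):
    clusters.setdefault(label, set()).add(user)
```

Restarts matter here because K-means can settle in a poorer local optimum depending on the initial centers; keeping the run with the lowest error makes the result less sensitive to seeding.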
2. User leadership calculation
1) Calculation of user activity
a) Calculation of publishing, forwarding and commenting frequencies
As can be seen from Table 1, for user V2, $T_{now}$ = 2017-4-22, the number of published messages is 20, and $T_{firstpublish}$ = 2017-4-1, which gives $\Delta T_p = 21$,
Figure GDA0002582752930000121
The number of forwarded messages is 6 and $T_{firstforward}$ = 2017-4-2, which gives $\Delta T_f = 20$,
Figure GDA0002582752930000122
The number of comments is 10 and $T_{firstevaluate}$ = 2017-4-3, which gives $\Delta T_e = 19$,
Figure GDA0002582752930000123
For user V3, $T_{now}$ = 2017-4-22, the number of published messages is 15, and $T_{firstpublish}$ = 2017-3-20, which gives $\Delta T_p = 33$,
Figure GDA0002582752930000124
The number of forwarded messages is 10 and $T_{firstforward}$ = 2017-4-5, which gives $\Delta T_f = 17$,
Figure GDA0002582752930000125
The number of comments is 5 and $T_{firstevaluate}$ = 2017-4-15, which gives $\Delta T_e = 7$,
Figure GDA0002582752930000126
For user V5, $T_{now}$ = 2017-4-22, the number of published messages is 5, and $T_{firstpublish}$ = 2017-3-1, which gives $\Delta T_p = 52$,
Figure GDA0002582752930000127
The number of forwarded messages is 15 and $T_{firstforward}$ = 2017-4-10, which gives $\Delta T_f = 12$,
Figure GDA0002582752930000128
The number of comments is 5 and $T_{firstevaluate}$ = 2017-4-1, which gives $\Delta T_e = 21$,
Figure GDA0002582752930000129
Table 1. User-related information

| User | V1 | V2 | V3 | V4 | V5 |
|---|---|---|---|---|---|
| Number of published messages | 1 | 20 | 15 | 3 | 5 |
| Time of first published message | 2017-3-1 | 2017-4-1 | 2017-3-20 | 2017-2-18 | 2017-3-1 |
| Number of forwarded messages | 3 | 6 | 10 | 5 | 15 |
| Time of first forwarded message | 2017-3-9 | 2017-4-2 | 2017-4-5 | 2017-4-7 | 2017-4-10 |
| Number of comments | 3 | 10 | 5 | 4 | 5 |
| Time of first comment | 2017-3-3 | 2017-4-3 | 2017-4-15 | 2017-1-1 | 2017-4-1 |
| Data acquisition time | 2017-4-22 | 2017-4-22 | 2017-4-22 | 2017-4-22 | 2017-4-22 |
b) Standardization of the publishing, forwarding and commenting frequencies
The publishing, forwarding and commenting frequencies are min-max standardized to obtain the following table:
Figure GDA00025827529300001210
Figure GDA0002582752930000131
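A minimal sketch of the min-max step is given below. It assumes the standardization is taken over the three candidate users only, and it recomputes the raw frequencies from the counts and intervals of Table 1; the helper name `min_max` is illustrative. The unrounded results differ from the two-decimal values used in the activity computation by at most about 0.015, which is compatible with the frequencies having been rounded before standardization.

```python
from typing import Dict


def min_max(values: Dict[str, float]) -> Dict[str, float]:
    """Scale values linearly so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # Degenerate case: all users have the same frequency.
        return {key: 0.0 for key in values}
    return {key: (value - lo) / (hi - lo) for key, value in values.items()}


# Raw frequencies: count divided by interval in days (Table 1).
fp = {"V2": 20 / 21, "V3": 15 / 33, "V5": 5 / 52}   # publishing
ff = {"V2": 6 / 20, "V3": 10 / 17, "V5": 15 / 12}   # forwarding
fe = {"V2": 10 / 19, "V3": 5 / 7, "V5": 5 / 21}     # commenting

fp_norm, ff_norm, fe_norm = min_max(fp), min_max(ff), min_max(fe)
```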
c) Calculation of user activity
The weights of the publishing, forwarding and commenting frequencies in the user activity are determined by the analytic hierarchy process. The judgment matrix is constructed as
Figure GDA0002582752930000132
The resulting weights are α1 = 0.636985, α2 = 0.2582850, α3 = 0.1047294, from which the user activity is calculated:
UA(V2)=0.636985×1+0.2582850×0+0.1047294×0.62=0.70
UA(V3)=0.636985×0.41+0.2582850×0.31+0.1047294×1=0.45
UA(V5)=0.636985×0+0.2582850×1+0.1047294×0=0.26
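The weighting step can be illustrated with the sketch below. The judgment matrix appears only as an image in this text, so the matrix `JUDGMENT` is an assumption, chosen because it reproduces the stated weights; the row geometric-mean method is likewise one common way of deriving AHP weights and is not confirmed as the method used here. The normalized frequencies are the two-decimal values used in the UA formulas above.

```python
import math
from typing import List, Sequence

# Assumed pairwise-comparison matrix (rows and columns: publish, forward, comment).
JUDGMENT = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 3.0],
    [1 / 5, 1 / 3, 1.0],
]


def ahp_weights(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Priority vector from normalized row geometric means."""
    n = len(matrix)
    geometric_means = [math.prod(row) ** (1 / n) for row in matrix]
    total = sum(geometric_means)
    return [g / total for g in geometric_means]


weights = ahp_weights(JUDGMENT)

# (FP', FF', FE') per candidate user, as used in the UA computation in the text.
NORMALIZED = {
    "V2": (1.0, 0.0, 0.62),
    "V3": (0.41, 0.31, 1.0),
    "V5": (0.0, 1.0, 0.0),
}


def user_activity(freqs: Sequence[float], w: Sequence[float]) -> float:
    """UA(u) = a1*FP'(u) + a2*FF'(u) + a3*FE'(u)."""
    return sum(a * f for a, f in zip(w, freqs))


activity = {user: user_activity(f, weights) for user, f in NORMALIZED.items()}
```

Rounded to two decimals, `activity` gives 0.70, 0.45 and 0.26 for V2, V3 and V5.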
2) Calculation of user influence
The following results are obtained from an analysis of the contents of Table 2:
TABLE 2. Sources of the information forwarded and commented on by each user
User | Source users of forwarded information | Source users of commented information
V1 | V2:1, V3:2 | V2:1, V3:2
V2 | V5:3, V4:2, V1:1 | V5:7, V1:2, V3:1
V3 | V2:4, V4:5, V5:1 | V2:2, V4:3
V4 | V5:3, V3:2 | V5:4
V5 | V3:10, V4:5 | V4:5
Note: V:n indicates that n messages originate from user V.
a) Calculation of the user attention degree
For user V2, the followers are user V1 and user V3:
User V1 follows user V2 and user V3, so the degree of attention of user V1 to user V2 is:
Figure GDA0002582752930000133
User V3 follows user V2 and user V4, so the degree of attention of user V3 to user V2 is:
Figure GDA0002582752930000134
For user V3, the followers are user V1 and user V5:
User V1 follows user V2 and user V3, so the degree of attention of user V1 to user V3 is:
Figure GDA0002582752930000141
User V5 follows user V3 and user V4, so the degree of attention of user V5 to user V3 is:
Figure GDA0002582752930000142
For user V5, the followers are user V2 and user V4:
User V2 follows user V1 and user V5, so the degree of attention of user V2 to user V5 is:
Figure GDA0002582752930000143
User V4 follows only user V5, so the degree of attention of user V4 to user V5 is:
Figure GDA0002582752930000144
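The attention-degree formula itself is available here only as an image. The sketch below uses an inferred form, built from the quantities named in claim 7: the share of v's forwards and comments that originate from u, multiplied by u's share of the messages published or forwarded by all users that v follows. This form is an assumption; it is shown because it reproduces the six attention values used later in the influence computation (0.17, 0.31, 0.33, 0.38, 0.52, 0.78). All identifiers are illustrative.

```python
# Sources of forwarded and commented information (Table 2).
FORWARD_SRC = {
    "V1": {"V2": 1, "V3": 2},
    "V2": {"V5": 3, "V4": 2, "V1": 1},
    "V3": {"V2": 4, "V4": 5, "V5": 1},
    "V4": {"V5": 3, "V3": 2},
    "V5": {"V3": 10, "V4": 5},
}
COMMENT_SRC = {
    "V1": {"V2": 1, "V3": 2},
    "V2": {"V5": 7, "V1": 2, "V3": 1},
    "V3": {"V2": 2, "V4": 3},
    "V4": {"V5": 4},
    "V5": {"V4": 5},
}
PUBLISHED = {"V1": 1, "V2": 20, "V3": 15, "V4": 3, "V5": 5}  # Table 1
# focus(v): the users that v follows, as listed in the text.
FOLLOWS = {
    "V1": ["V2", "V3"],
    "V2": ["V1", "V5"],
    "V3": ["V2", "V4"],
    "V4": ["V5"],
    "V5": ["V3", "V4"],
}


def published_or_forwarded(user: str) -> int:
    """|pub(user)| + |for(user)|."""
    return PUBLISHED[user] + sum(FORWARD_SRC[user].values())


def attention(v: str, u: str) -> float:
    """Inferred attention degree of follower v toward user u."""
    interactions_with_u = FORWARD_SRC[v].get(u, 0) + COMMENT_SRC[v].get(u, 0)
    all_interactions = sum(FORWARD_SRC[v].values()) + sum(COMMENT_SRC[v].values())
    output_share = published_or_forwarded(u) / sum(
        published_or_forwarded(k) for k in FOLLOWS[v]
    )
    return interactions_with_u / all_interactions * output_share
```

For instance, `attention("V1", "V2")` evaluates to (2/6) × (26/51), which rounds to 0.17.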
b) Calculation of user information coverage
The analysis is based on the contents of Tables 3, 4 and 5.
TABLE 3. Information coverage of user V2
Information number of user V2 | Number of users covered
2 | 3
4 | 2
15 | 4
20 | 2
26 | 1
Note: all other information items cover 0 users.
TABLE 4. Information coverage of user V3
Information number of user V3 | Number of users covered
6 | 3
9 | 1
25 | 4
Note: all other information items cover 0 users.
TABLE 5. Information coverage of user V5
Information number of user V5 | Number of users covered
1 | 2
8 | 2
13 | 4
19 | 3
20 | 1
Note: all other information items cover 0 users.
the information coverage of user V2 is calculated as:
Figure GDA0002582752930000151
the information coverage of user V3 is calculated as:
Figure GDA0002582752930000152
the information coverage of user V5 is calculated as:
Figure GDA0002582752930000153
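The coverage values can be checked with the following sketch. It assumes that the coverage of a single item is the number of covered users divided by |V| = 5, and that a user's coverage is the mean of these item coverages over all items the user published or forwarded (26, 25 and 20 items for V2, V3 and V5 according to Table 1). The formulas are shown only as images here, so this reading is an inference; it yields the values 0.09, 0.06 and 0.12 that appear in the influence computation.

```python
N_USERS = 5  # |V|, the total number of users in the example network

# Number of users covered by each item with non-zero coverage (Tables 3 to 5).
COVERED = {
    "V2": [3, 2, 4, 2, 1],
    "V3": [3, 1, 4],
    "V5": [2, 2, 4, 3, 1],
}
# Published plus forwarded messages per user (Table 1): 20+6, 15+10, 5+15.
TOTAL_ITEMS = {"V2": 26, "V3": 25, "V5": 20}


def item_coverage(covered_users: int) -> float:
    """Coverage of one information item: fraction of all users it reached."""
    return covered_users / N_USERS


def user_coverage(user: str) -> float:
    """Mean item coverage over every item the user published or forwarded.
    Items not listed in COVERED contribute zero."""
    return sum(item_coverage(c) for c in COVERED[user]) / TOTAL_ITEMS[user]


coverage = {user: user_coverage(user) for user in COVERED}
```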
c) Calculation of user influence
The user influence is obtained by combining the user attention degree and the user information coverage.
In the user set V, V1,
Figure GDA0002582752930000155
thus initially
Figure GDA0002582752930000154
V2, V3, V5 ∈ L, so initially UI(V2) = UI(V3) = UI(V5) = 1.
The followers of user V2 are V1 and V3, so UI(V2) = CR(V2) + AD_(V1)(V2) × UI(V1) + AD_(V3)(V2) × UI(V3) = 0.09 + 0.17 × 0.6 + 0.31 × 1 = 0.502.
The followers of user V3 are V1 and V5, so UI(V3) = CR(V3) + AD_(V1)(V3) × UI(V1) + AD_(V5)(V3) × UI(V5) = 0.06 + 0.33 × 0.6 + 0.38 × 1 = 0.638.
The followers of user V5 are V2 and V4, so UI(V5) = CR(V5) + AD_(V2)(V5) × UI(V2) + AD_(V4)(V5) × UI(V4) = 0.12 + 0.52 × 1 + 0.78 × 0.6 = 1.108.
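The influence update can be written as a short sketch using the rounded values from the text. It performs a single update from the initial influences (0.6 for V1 and V4, 1 for the candidates), exactly as in the arithmetic above; whether the method iterates further is not stated in this example, so no iteration is shown. The identifiers are illustrative.

```python
# Information coverage CR(u) and attention degrees AD_(v)(u), rounded as in the text.
CR = {"V2": 0.09, "V3": 0.06, "V5": 0.12}
AD = {
    ("V1", "V2"): 0.17, ("V3", "V2"): 0.31,
    ("V1", "V3"): 0.33, ("V5", "V3"): 0.38,
    ("V2", "V5"): 0.52, ("V4", "V5"): 0.78,
}
# Initial influence: 0.6 for the non-candidates V1 and V4, 1 for members of L.
UI_INIT = {"V1": 0.6, "V4": 0.6, "V2": 1.0, "V3": 1.0, "V5": 1.0}
# fans(u): the followers of each candidate user.
FANS = {"V2": ["V1", "V3"], "V3": ["V1", "V5"], "V5": ["V2", "V4"]}


def influence(u: str) -> float:
    """UI(u) = CR(u) + sum over followers v of AD_(v)(u) * UI(v)."""
    return CR[u] + sum(AD[(v, u)] * UI_INIT[v] for v in FANS[u])


ui = {u: influence(u) for u in FANS}
```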
3) Calculation of user leadership
The user leadership is obtained by combining the user activity and the user influence:
ULD(V2) = UI(V2) × UA(V2) = 0.502 × 0.70 = 0.3514
ULD(V3) = UI(V3) × UA(V3) = 0.638 × 0.45 = 0.2871
ULD(V5) = UI(V5) × UA(V5) = 1.108 × 0.26 = 0.28808
Sorting the users in L in descending order of leadership gives:
ULD(V2) > ULD(V5) > ULD(V3)
If one opinion leader is to be mined, user V2 is the opinion leader; if two opinion leaders are to be mined, user V2 and user V5 are the opinion leaders.
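The final ranking step amounts to a product and a sort, sketched below with the values computed in this example. The function name `top_k` is illustrative.

```python
from typing import List

UI = {"V2": 0.502, "V3": 0.638, "V5": 1.108}  # user influence
UA = {"V2": 0.70, "V3": 0.45, "V5": 0.26}     # user activity

# ULD(u) = UI(u) * UA(u)
uld = {user: UI[user] * UA[user] for user in UI}


def top_k(k: int) -> List[str]:
    """Return the k candidate users with the highest leadership, best first."""
    ranked = sorted(uld, key=lambda user: uld[user], reverse=True)
    return ranked[:k]
```

Calling `top_k(1)` returns V2 alone, and `top_k(2)` returns V2 followed by V5, matching the ordering stated above.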
In summary, the present invention provides a technique for mining the opinion leaders of a social network from a clustering perspective, based on the topological attributes of the social network and making full use of the user attributes of its users. Existing opinion leader mining techniques rarely take into account that only some of the users in a social network are in a position to become opinion leaders. To address this problem, the invention clusters the users according to their topological attributes in the social network, screens out the set of opinion leader candidate users who are in a position to become opinion leaders, analyzes the leadership of the users in that set, and mines opinion leaders that have both activity and influence.

Claims (9)

1. A clustering-based social network opinion leader mining method for obtaining opinion leaders among social network users, characterized by comprising the following steps:
1) establishing a social network model, and acquiring the in-degree, the betweenness centrality and the clustering coefficient of the in-degree group of each user in the social network model;
2) clustering the users with a K-means clustering algorithm according to their in-degree, betweenness centrality and clustering coefficient of the in-degree group, and obtaining an opinion leader candidate user set L from the clustering result;
3) calculating the user activity and the user influence of the users in the opinion leader candidate user set L, and calculating the user leadership according to the user activity and the user influence, wherein the calculation formula of the user activity is as follows:
UA(u)=α1FP'(u)+α2FF'(u)+α3FE'(u)
α1 + α2 + α3 = 1
Figure FDA0002582752920000011
ΔT_p = T_now - T_firstpublish
Figure FDA0002582752920000012
ΔT_f = T_now - T_firstforward
Figure FDA0002582752920000013
ΔT_e = T_now - T_firstevaluate
wherein UA(u) is the user activity of user u; FP(u) is the frequency at which user u publishes information, FF(u) is the frequency at which user u forwards information, and FE(u) is the frequency at which user u comments on information; FP'(u), FF'(u) and FE'(u) are the values of FP(u), FF(u) and FE(u) after min-max standardization; ΔT_p is the interval between the data acquisition time T_now and the time T_firstpublish at which user u first published information,
Figure FDA0002582752920000014
is the total number of messages published by user u within ΔT_p; ΔT_f is the interval between the data acquisition time T_now and the time T_firstforward at which user u first forwarded information,
Figure FDA0002582752920000015
is the total number of messages forwarded by user u within ΔT_f; ΔT_e is the interval between the data acquisition time T_now and the time T_firstevaluate at which user u first commented on information,
Figure FDA0002582752920000016
is the total number of messages commented on by user u within ΔT_e; and α1, α2 and α3 are weights assigned by the analytic hierarchy process;
4) obtaining the opinion leaders from the opinion leader candidate user set L according to the user leadership.
2. The method as claimed in claim 1, wherein the in-degree of a user in step 1) is calculated by the following formula:
Figure FDA0002582752920000021
wherein D_I(u) is the in-degree of user u; the indicator term with subscript vu equals 1 when user v is a follower of user u and equals 0 when user v is not a follower of user u; and V is the set of users in the social network.
3. The method as claimed in claim 1, wherein the clustering coefficient of the in-degree group in step 1) is calculated as:
Figure FDA0002582752920000022
wherein C_I(u) is the clustering coefficient of the in-degree group of user u, n is the total number of users in the in-degree group of user u, P is the set of users forming the in-degree group of user u, M(v) is the total number of directed edges that actually exist among the users directly connected to user v, and N(v) is the total number of users directly connected to user v.
4. The method as claimed in claim 1, wherein the betweenness centrality of a user in step 1) is calculated by the following formula:
Figure FDA0002582752920000023
wherein B_I(u) is the betweenness centrality of user u, σ_mn(u) is the number of shortest paths between user m and user n that pass through user u, σ_mn is the total number of shortest paths between user m and user n, and V is the set of users in the social network.
5. The method as claimed in claim 1, wherein in step 2) the users in the cluster whose cluster center simultaneously has the largest in-degree, the largest clustering coefficient of the in-degree group and the largest betweenness centrality are added to the opinion leader candidate user set L.
6. The method as claimed in claim 5, wherein when no cluster center simultaneously satisfies the three conditions of largest in-degree, largest clustering coefficient of the in-degree group and largest betweenness centrality, a cluster whose center simultaneously satisfies any two of the conditions is selected, and its users are added to the opinion leader candidate user set L.
7. The method as claimed in claim 1, wherein the user influence in step 3) is calculated by the following formula:
Figure FDA0002582752920000031
Figure FDA0002582752920000032
Figure FDA0002582752920000033
Figure FDA0002582752920000034
Figure FDA0002582752920000035
wherein UI(u) is the user influence of user u and UI(v) is the user influence of user v; AD_vu is the degree of attention of user v to user u; fans(u) is the set of followers of user u; |V| is the total number of users in the social network; d(v) is the total degree of user v; CR(u) is the information coverage of user u; CRI(i) is the information coverage of information i; pub(u) ∪ for(u) is the set of information published or forwarded by user u; |for(u)| is the total number of messages forwarded by user u; |pub(u)| is the total number of messages published by user u; the indicator term with subscript vi equals 1 when user v forwards or comments on information i of user u, i.e. information i covers user v, and equals 0 when user v neither forwards nor comments on information i of user u, i.e. information i does not cover user v; |for(v)| is the total number of messages of other users forwarded by user v; |eva(v)| is the total number of messages of other users commented on by user v; |for(v.source = u)| is the total number of messages of user u forwarded by user v; |eva(v.source = u)| is the total number of messages of user u commented on by user v; |for(k)| is the total number of messages of other users forwarded by user k; |pub(k)| is the total number of messages published by user k; and focus(v) is the set of users followed by user v.
8. The method as claimed in claim 7, wherein the user leadership is calculated in step 3) according to the following formula:
ULD(u)=UI(u)×UA(u)
wherein ULD (u) is the user leadership of user u.
9. The clustering-based social network opinion leader mining method according to claim 6, wherein the step 4) specifically comprises the steps of:
sorting all users in the opinion leader candidate user set L in descending order of user leadership, and selecting the top K users as opinion leaders.
CN201710729792.XA 2017-08-23 2017-08-23 Social network opinion leader mining method based on clustering Active CN107633260B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710729792.XA CN107633260B (en) 2017-08-23 2017-08-23 Social network opinion leader mining method based on clustering

Publications (2)

Publication Number Publication Date
CN107633260A CN107633260A (en) 2018-01-26
CN107633260B true CN107633260B (en) 2020-10-16

Family

ID=61099724

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710729792.XA Active CN107633260B (en) 2017-08-23 2017-08-23 Social network opinion leader mining method based on clustering

Country Status (1)

Country Link
CN (1) CN107633260B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Pan Jianguo

Inventor after: Zhang Bo

Inventor after: Zhang Qian

Inventor after: Li Meizi

Inventor after: Zhao Qin

Inventor before: Zhang Bo

Inventor before: Zhang Qian

Inventor before: Li Meizi

Inventor before: Pan Jianguo

Inventor before: Zhao Qin