CN107633260B - Social network opinion leader mining method based on clustering - Google Patents
- Publication number
- CN107633260B (application number CN201710729792.XA)
- Authority
- CN
- China
- Prior art keywords
- user
- information
- users
- total number
- social network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The invention relates to a clustering-based method for mining opinion leaders in a social network, which is used to identify the opinion leaders among social network users and comprises the following steps: 1) establishing a social network model and obtaining, for each user in the model, the in-degree, the betweenness centrality and the in-degree group clustering coefficient; 2) clustering the users with the K-means clustering algorithm according to their in-degree, betweenness centrality and in-degree group clustering coefficient, and obtaining an opinion leader candidate user set L from the clustering result; 3) calculating the user activity and the user influence of the users in the opinion leader candidate user set L, and calculating the user leadership from the user activity and the user influence; 4) selecting the opinion leaders from the opinion leader candidate user set L according to user leadership. Compared with the prior art, the method has the advantages of comprehensive consideration, accurate evaluation and accurate calculation.
Description
Technical Field
The invention relates to the technical field of social networks, in particular to a social network opinion leader mining method based on clustering.
Background
Opinion leaders in social networks have a significant influence on people's thinking, experience and actions. Because social networks are open, opinion leaders are more influential in information dissemination than ordinary users. Research on opinion leaders is therefore one of the most important topics in social network user analysis, and it is widely applied to the analysis and prediction of information dissemination, to public opinion guidance and supervision, and to the commercial development of social networks.
Handling large volumes of data remains a challenge in opinion leader mining. Most opinion leader mining algorithms evaluate the influence of every user in the whole network without first distinguishing among users, so the more users the social network has, the higher the time complexity of the calculation. Cha M et al. analyze and mine opinion leaders using numerical values such as user degree, user mentions, and the number of times publications are forwarded or quoted. Building on LeaderRank, Xushiring et al. add the emotional tendency and the activity of users to mine opinion leaders. Wu et al. construct topic-related weighted microblog graph models and use a random walk to search for the central points of the graph models, thereby mining opinion leaders in microblogs. Cao Jiuxin et al. first identify topic communities, then measure user influence along three dimensions (structure, behavior and emotion) and propose the MFP algorithm to mine opinion leaders. Chen Yuan et al. identify opinion leaders according to structural hole position, centrality position and edge position in the social network. Another approach calculates user leadership by jointly considering user activity and user influence, and mines leading users in a social network by combining it with user centrality. Wu Chao et al. construct a reply relationship graph from the post reply relationships under a specific topic, and identify opinion leaders by jointly considering reply emotional tendency, reply path clustering and reply text similarity. These techniques do not classify the users of a social network in order to screen out those who could become opinion leaders, but they provide a foundation for mining social network opinion leaders from a clustering perspective.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a clustering-based social network opinion leader mining method which is comprehensive in consideration, accurate in evaluation and accurate in calculation.
The purpose of the invention can be realized by the following technical scheme:
A clustering-based social network opinion leader mining method, used to identify the opinion leaders among social network users, comprises the following steps:
1) establishing a social network model and obtaining, for each user in the model, the in-degree, the betweenness centrality and the in-degree group clustering coefficient;
2) clustering the users with the K-means clustering algorithm according to their in-degree, betweenness centrality and in-degree group clustering coefficient, and obtaining an opinion leader candidate user set L from the clustering result;
3) calculating the user activity and the user influence of the users in the opinion leader candidate user set L, and calculating the user leadership from the user activity and the user influence;
4) selecting the opinion leaders from the opinion leader candidate user set L according to user leadership.
In step 1), the in-degree of a user is calculated as:
$$D_I(u) = \sum_{v \in V} e_{vu}$$
where $D_I(u)$ is the in-degree of user u; $e_{vu} = 1$ when user v is a follower of user u and $e_{vu} = 0$ when user v is not a follower of user u; and V is the set of users in the social network.
In step 1), the in-degree group clustering coefficient is calculated as:
$$C_I(u) = \frac{1}{n} \sum_{v \in P} \frac{M(v)}{N(v)\,(N(v) - 1)}$$
where $C_I(u)$ is the in-degree group clustering coefficient of user u, n is the total number of users in the in-degree group of user u, P is the set of users in the in-degree group of user u, M(v) is the total number of directed edges that actually exist between the users having a direct edge relationship with user v, and N(v) is the total number of users having a direct edge relationship with user v.
In step 1), the betweenness centrality of a user is calculated as:
$$B_I(u) = \sum_{m \neq u \neq n} \frac{\sigma_{mn}(u)}{\sigma_{mn}}$$
where $B_I(u)$ is the betweenness centrality of user u, $\sigma_{mn}(u)$ is the number of shortest paths between user m and user n that pass through user u, and $\sigma_{mn}$ is the total number of shortest paths between user m and user n.
In step 2), from the clusters obtained by clustering, the cluster whose center simultaneously has the largest in-degree component, the largest in-degree group clustering coefficient component and the largest betweenness centrality component is selected, and the users in that cluster are added to the opinion leader candidate user set L.
When no cluster satisfies all three conditions simultaneously, a cluster that satisfies any two of them is selected, and the users in that cluster are added to the opinion leader candidate user set L.
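The selection rule above can be sketched as follows. This is only a minimal illustration, assuming the cluster centers are already available as (in-degree, in-degree group clustering coefficient, betweenness centrality) triples. The function name `select_candidate_cluster` and the sample centers are hypothetical, and taking the first qualifying cluster when several qualify is an assumption the source does not specify.

```python
from typing import Optional, Sequence


def select_candidate_cluster(centers: Sequence[Sequence[float]]) -> Optional[int]:
    """Return the index of the cluster whose center is maximal in all three
    features, or failing that in any two of them; None if no cluster qualifies."""
    n_features = len(centers[0])
    # Largest value of each feature over all cluster centers.
    maxima = [max(c[j] for c in centers) for j in range(n_features)]
    # For each cluster, count the features in which its center attains the maximum.
    hits = [sum(1 for j in range(n_features) if c[j] == maxima[j]) for c in centers]
    # Try "all three maxima" first, then fall back to "any two maxima".
    for required in (n_features, n_features - 1):
        for idx, h in enumerate(hits):
            if h >= required:
                return idx  # assumption: the first qualifying cluster wins
    return None


# Hypothetical normalized centers: (in-degree, group clustering coeff., betweenness).
example_centers = [(0.5, 0.9, 0.0), (1.0, 0.7, 0.9)]
chosen = select_candidate_cluster(example_centers)
```

With these hypothetical centers no cluster is maximal in all three features, so the fallback picks the second cluster, which is maximal in two of them.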
In step 3), the user activity is calculated as:
$$UA(u) = \alpha_1 FP'(u) + \alpha_2 FF'(u) + \alpha_3 FE'(u), \qquad \alpha_1 + \alpha_2 + \alpha_3 = 1$$
$$FP(u) = \frac{|pub_{\Delta T_p}(u)|}{\Delta T_p}, \qquad \Delta T_p = T_{now} - T_{firstpublish}$$
$$FF(u) = \frac{|for_{\Delta T_f}(u)|}{\Delta T_f}, \qquad \Delta T_f = T_{now} - T_{firstforward}$$
$$FE(u) = \frac{|eva_{\Delta T_e}(u)|}{\Delta T_e}, \qquad \Delta T_e = T_{now} - T_{firstevaluate}$$
where UA(u) is the user activity of user u; FP(u), FF(u) and FE(u) are the frequencies with which user u publishes, forwards and comments on information; FP'(u), FF'(u) and FE'(u) are the values of FP(u), FF(u) and FE(u) after min-max normalization; $\Delta T_p$ is the interval between the data acquisition time $T_{now}$ and the time $T_{firstpublish}$ of the earliest information published by user u, and $|pub_{\Delta T_p}(u)|$ is the total number of messages published by user u within $\Delta T_p$; $\Delta T_f$ is the interval between $T_{now}$ and the time $T_{firstforward}$ of the earliest information forwarded by user u, and $|for_{\Delta T_f}(u)|$ is the total number of messages forwarded by user u within $\Delta T_f$; $\Delta T_e$ is the interval between $T_{now}$ and the time $T_{firstevaluate}$ of the earliest comment made by user u, and $|eva_{\Delta T_e}(u)|$ is the total number of comments made by user u within $\Delta T_e$; and $\alpha_1$, $\alpha_2$ and $\alpha_3$ are weights assigned by the analytic hierarchy process.
In step 3), the user influence is calculated as:
$$UI(u) = CR(u) + \sum_{v \in fans(u)} AD_{vu} \cdot UI(v)$$
$$CR(u) = \frac{\sum_{i \in pub(u) \cup for(u)} CRI(i)}{|pub(u)| + |for(u)|}, \qquad CRI(i) = \frac{\sum_{v \in V} c_{vi}}{|V|}$$
$$AD_{vu} = \frac{|for(v.source = u)| + |eva(v.source = u)|}{|for(v)| + |eva(v)|} \times \frac{|for(u)| + |pub(u)|}{\sum_{k \in focus(v)} \left(|for(k)| + |pub(k)|\right)}$$
where UI(u) is the user influence of user u and UI(v) is the user influence of user v, whose initial value is 1 for a user in the opinion leader candidate user set L and D(v)/|V| otherwise; $AD_{vu}$ is the degree of attention of user v to user u; fans(u) is the set of followers of user u; |V| is the total number of users in the social network; D(v) is the total degree of user v; CR(u) is the information coverage of user u; CRI(i) is the information coverage of information i; pub(u) ∪ for(u) is the set of information published or forwarded by user u; |for(u)| is the total number of messages forwarded by user u and |pub(u)| is the total number of messages published by user u; $c_{vi} = 1$ when user v forwards or comments on information i of user u, that is, information i covers user v, and $c_{vi} = 0$ when user v does not forward or comment on information i of user u, that is, information i does not cover user v; |for(v)| is the total number of messages of other users forwarded by user v; |eva(v)| is the total number of comments made by user v on the information of other users; |for(v.source = u)| is the total number of messages of user u forwarded by user v; |eva(v.source = u)| is the total number of comments made by user v on the information of user u; |for(k)| is the total number of messages forwarded by user k and |pub(k)| is the total number of messages published by user k; and focus(v) is the set of users followed by user v.
In the step 3), the calculation formula of the user leadership is as follows:
ULD(u)=UI(u)×UA(u)
wherein ULD (u) is the user leadership of user u.
Step 4) specifically comprises:
sorting all users in the opinion leader candidate user set L in descending order of user leadership, and selecting the first K users as the opinion leaders.
Compared with the prior art, the invention has the following advantages:
First, comprehensive consideration: the opinion leader candidate user set is screened from the perspective of topological attributes, and the opinion leaders are then determined from the perspective of user attributes. Considering both topological attributes and user attributes avoids the one-sided analysis results caused by using only some of the attributes.
Second, accurate evaluation: the invention defines the in-degree group clustering coefficient of a user and a method for calculating it. Followers having a direct or an indirect edge relationship with the user are treated as the user's in-degree group. The in-degree group clustering coefficient evaluates more accurately how closely the members of the user's in-degree group are related; the closer the relationship, the more likely the information spread by the user is to diffuse. Using the in-degree group clustering coefficient as a feature in the cluster analysis effectively improves the quality of the cluster analysis, so that the members of the resulting opinion leader candidate user set have sufficient opinion leader characteristics.
Third, accurate calculation: user leadership is calculated from the user activity and the user influence. The user activity ensures that the mined opinion leaders are active in the social network. In addition, several sources of user influence are considered together, namely the user's own contribution (information coverage) and the contribution of the user's followers, which makes the calculation of user influence more accurate.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a diagram of an example network node of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments.
Examples
The invention first defines the social network:
Social network model: the social network is formalized as a graph G = (V, E, R), where V represents the set of users in the social network, E describes the set of relationships between users, and R is an N × N matrix representing the relationships between users.
Topological property: the topological attributes are a set of functions of relationships between points and points, edges and edges, and points and edges in the directed graph G.
User attributes: user attributes are quantitative relationships between various actions of a user in a social network.
User leadership: opinion leaders are people in social networks who have the ability to disseminate information, and user leadership is a quantification of this ability. User leadership mainly depends on the activity and the influence of the user.
Followee (attendee) and follower: followees and followers appear in pairs. When a directed edge exists between user u and user v in the social network, with source point u and end point v, user u is a follower of user v and user v is a followee of user u.
Opinion leader: the opinion leaders are the K users with the largest leadership in the social network and are denoted by O; the users in the social network who are not opinion leaders are ordinary users and are denoted by C.
Opinion leader candidate user set: the opinion leader candidate user set L is the set of users considered most likely to be opinion leaders before their user leadership is calculated precisely.
FIG. 1 shows the flow of the technique for mining social network opinion leaders from a clustering perspective according to the present invention.
The specific steps of the technique are as follows:
A. Defining the concepts related to opinion leader mining and modeling the social network.
B. (1) Calculating the in-degree and the betweenness centrality of each user from the directed edges between users; collecting the users who can reach the user through one or two edges to form the user's in-degree group, and calculating the in-degree group clustering coefficient; (2) clustering the users with the K-means clustering algorithm improved by K-means++, using the in-degree, betweenness centrality and in-degree group clustering coefficient of each user; (3) among the clusters obtained, adding the users of the cluster with the larger in-degree, betweenness centrality and in-degree group clustering coefficient to the opinion leader candidate user set.
C. (1) Counting, for each user, the number of messages published, forwarded and commented on, together with the intervals between the data acquisition time and the times of the first publication, first forwarding and first comment; calculating the average number of messages published, forwarded and commented on per unit time; and calculating the user activity; (2) counting the total number of messages received by a user, the total number of messages forwarded and commented on, and the number of messages of each followee that the user forwarded and commented on, and calculating the degree of attention the user pays to each followee; (3) calculating the number of people who forward or comment on the information published and forwarded by the user to obtain the information coverage of the user; (4) treating the degree of attention each follower pays to the user as the weight of that follower's contribution to the user's influence, calculating the influence the user obtains from the followers, and combining it with the information coverage to obtain the user influence.
D. Calculating the user leadership from the calculated user activity and user influence; sorting the users in descending order of user leadership, and taking the K users with the largest user leadership as the opinion leaders.
(1) Selection of opinion leader candidate user set
When a user is an opinion leader in the social network, many users choose to follow that user; on the graph G this appears as many edges whose end point is that user. The in-degree of a user on the social network G is expressed as:
$$D_I(u) = \sum_{v \in V} e_{vu}$$
where $e_{vu} = 1$ when user v is a follower of user u, and $e_{vu} = 0$ when user v is not a follower of user u. The in-degree reflects, through the number of followers, whether a user can become an opinion leader: a user with a high in-degree is more likely to become an opinion leader than a user with a low in-degree. The in-degree group clustering coefficient considers the possibility of a user becoming an opinion leader in terms of how closely the user's followers are related to one another. When a directed edge points from v to u, user v is a member of the in-degree group of user u; when another directed edge points from w to v, user w is also a member of the in-degree group of user u. The in-degree group clustering coefficient is expressed as:
$$C_I(u) = \frac{1}{n} \sum_{v \in P} \frac{M(v)}{N(v)\,(N(v) - 1)}$$
where P is the set of users in the in-degree group of user u, n is the total number of users in the in-degree group of user u, N(v) is the total number of users having a direct edge relationship with user v, and M(v) is the total number of directed edges that actually exist between the users having a direct edge relationship with user v. In addition, the betweenness centrality reflects a user's ability to carry information across the whole social network G, and is expressed as:
$$B_I(u) = \sum_{m \neq u \neq n} \frac{\sigma_{mn}(u)}{\sigma_{mn}}$$
where $\sigma_{mn}$ is the total number of shortest paths between user m and user n, and $\sigma_{mn}(u)$ is the number of those shortest paths that pass through user u. $B_I(u)$ measures how likely the shortest path between any two users in the social network is to pass through user u.
The users are clustered with the K-means algorithm improved by K-means++. From the clusters obtained, the cluster whose center has the largest in-degree, the largest in-degree group clustering coefficient and the largest betweenness centrality (the three maximum conditions) is selected. If no cluster satisfies all three maximum conditions, a cluster satisfying two of them may be selected. The users in the selected cluster are added to the opinion leader candidate user set L.
(2) Calculation of user activity
Users do not merely receive information in a social network; the activity of a user on the social network can be calculated from the number of messages the user publishes, forwards and comments on per unit time. The frequency with which user u publishes information is defined as:
$$FP(u) = \frac{|pub_{\Delta T_p}(u)|}{\Delta T_p}$$
$$\Delta T_p = T_{now} - T_{firstpublish}$$
where $T_{now}$ is the time at which the data were acquired, $T_{firstpublish}$ is the time at which user u first published information in the acquired data, $pub_{\Delta T_p}(u)$ is the set of information published by user u within $\Delta T_p$, and $|pub_{\Delta T_p}(u)|$ is the total number of messages published by user u within $\Delta T_p$. The larger the value of FP(u), the more actively user u publishes information. The frequency with which a user forwards information is defined as:
$$FF(u) = \frac{|for_{\Delta T_f}(u)|}{\Delta T_f}$$
$$\Delta T_f = T_{now} - T_{firstforward}$$
where $T_{firstforward}$ is the time at which user u first forwarded information in the acquired data, $for_{\Delta T_f}(u)$ is the set of information forwarded by user u within $\Delta T_f$, and $|for_{\Delta T_f}(u)|$ is the total number of messages forwarded by user u within $\Delta T_f$. The larger the value of FF(u), the more actively user u forwards information. The frequency with which a user comments on information is defined as:
$$FE(u) = \frac{|eva_{\Delta T_e}(u)|}{\Delta T_e}$$
$$\Delta T_e = T_{now} - T_{firstevaluate}$$
where $T_{firstevaluate}$ is the time at which user u first commented on information in the acquired data, $eva_{\Delta T_e}(u)$ is the set of comments made by user u within $\Delta T_e$, and $|eva_{\Delta T_e}(u)|$ is the total number of comments made by user u within $\Delta T_e$. The larger the value of FE(u), the more actively user u comments on the information of others.
Min-max normalization is applied to the publishing frequency, the forwarding frequency and the commenting frequency:
$$FP'(u) = \frac{FP(u) - FP_{min}}{FP_{max} - FP_{min}}, \qquad FF'(u) = \frac{FF(u) - FF_{min}}{FF_{max} - FF_{min}}, \qquad FE'(u) = \frac{FE(u) - FE_{min}}{FE_{max} - FE_{min}}$$
where $FP_{min}$, $FF_{min}$ and $FE_{min}$ are respectively the minimum publishing, forwarding and commenting frequencies of the users in the opinion leader candidate user set L, and $FP_{max}$, $FF_{max}$ and $FE_{max}$ are respectively the maximum publishing, forwarding and commenting frequencies of the users in the opinion leader candidate user set L.
The user activity of the users in the opinion leader candidate user set is evaluated from the publishing frequency, the forwarding frequency and the commenting frequency, and is defined as:
$$UA(u) = \alpha_1 FP'(u) + \alpha_2 FF'(u) + \alpha_3 FE'(u)$$
where $\alpha_1 + \alpha_2 + \alpha_3 = 1$, and the values of $\alpha_1$, $\alpha_2$ and $\alpha_3$ are assigned using the Analytic Hierarchy Process (AHP).
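The AHP weighting can be sketched as below. The pairwise-comparison matrix is not reproduced in the text, so the matrix here is an assumption, chosen because it reproduces the weights reported in the worked example (0.636985, 0.2582850, 0.1047294). The helper name `ahp_weights` is hypothetical, and the geometric-mean (row) method is used as one common way of deriving AHP weights, not necessarily the one used in the source.

```python
import math

# Assumed pairwise-comparison matrix for (publishing, forwarding, commenting).
# It is NOT given in the source; it is chosen because it yields the stated weights.
judgment = [
    [1.0, 3.0, 5.0],
    [1.0 / 3.0, 1.0, 3.0],
    [1.0 / 5.0, 1.0 / 3.0, 1.0],
]


def ahp_weights(matrix):
    """Geometric-mean method: take the n-th root of each row product,
    then normalize the roots so that they sum to 1."""
    n = len(matrix)
    roots = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(roots)
    return [r / total for r in roots]


alpha = ahp_weights(judgment)  # [alpha_1, alpha_2, alpha_3]
```

For a 3 × 3 reciprocal matrix the geometric-mean weights coincide with the principal-eigenvector weights, so no eigenvalue solver is needed in this sketch.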
(3) Calculation of user influence
1) Calculation of the degree of attention a user pays to a followee
A user may follow several users, but does not necessarily forward or comment on the information of each of them. The degree of attention of user v to user u is defined as:
$$AD_{vu} = \frac{|for(v.source = u)| + |eva(v.source = u)|}{|for(v)| + |eva(v)|} \times \frac{|for(u)| + |pub(u)|}{\sum_{k \in focus(v)} \left(|for(k)| + |pub(k)|\right)}$$
where focus(v) is the set of users followed by user v, |for(u)| is the total number of messages forwarded by user u, |pub(u)| is the total number of messages published by user u, |for(v)| is the total number of messages of other users forwarded by user v, |eva(v)| is the total number of comments made by user v on the information of other users, |for(v.source = u)| is the total number of messages of user u forwarded by user v, and |eva(v.source = u)| is the total number of comments made by user v on the information of user u. The larger the value of $AD_{vu}$, the more attention user v pays to user u.
2) Calculation of user information coverage
For each user in the social network, the information coverage of the user reflects the influence of the user to some extent, and is defined as:
$$CR(u) = \frac{\sum_{i \in pub(u) \cup for(u)} CRI(i)}{|pub(u)| + |for(u)|}, \qquad CRI(i) = \frac{\sum_{v \in V} c_{vi}}{|V|}$$
where $c_{vi} = 1$ indicates that user v forwarded or commented on information i of user u, so that information i covers user v, and $c_{vi} = 0$ indicates that user v did not forward or comment on information i of user u, so that information i does not cover user v. |V| is the total number of users in the social network, and pub(u) ∪ for(u) is the set of information published or forwarded by user u; all information published or forwarded by user u counts as information of user u.
3) Calculation of user influence
The user influence is obtained by combining the user's information coverage, the degree of attention that the user's followers pay to the user, and the influence of those followers. It is defined as:
$$UI(u) = CR(u) + \sum_{v \in fans(u)} AD_{vu} \cdot UI(v)$$
where fans(u) is the set of followers of user u. When a user belongs to the opinion leader candidate user set, the initial influence of the user is 1; otherwise the initial influence of the user is the ratio of the user's degree to the total number of users in the social network.
(4) Selection of opinion leaders
The user leadership is obtained by combining the user activity and the user influence, and is defined as:
ULD(u)=UI(u)×UA(u)
The users in the opinion leader candidate user set L are sorted in descending order of user leadership, and the first K users in this ranking are selected as the opinion leaders.
Example:
The following example illustrates the technique for mining opinion leaders in a social network from a clustering perspective (as shown in FIG. 2).
1. Selection of opinion leader candidate user set
1) Calculation of in-degree
In FIG. 2, the in-degree of user V1 is $D_I(V1) = 1$, the in-degree of user V2 is $D_I(V2) = 2$, the in-degree of user V3 is $D_I(V3) = 2$, the in-degree of user V4 is $D_I(V4) = 2$, and the in-degree of user V5 is $D_I(V5) = 2$.
2) Calculation of the in-degree group clustering coefficient
For user V1, the user with an edge pointing to V1 is V2, and the users with edges pointing to V2 are V1 and V3, so the in-degree group of user V1 is (V1, V2, V3).
For user V2, the users with edges pointing to V2 are V1 and V3, and V5 has an edge pointing to V3, so the in-degree group of user V2 is (V1, V2, V3, V5).
For user V3, the users with edges pointing to V3 are V1 and V5, while V2 has an edge pointing to V1 and V4 has an edge pointing to V5, so the in-degree group of user V3 is (V1, V2, V3, V4, V5).
For user V4, the users with edges pointing to V4 are V3 and V5, while V1 has an edge pointing to V3 and V2 has an edge pointing to V5, so the in-degree group of user V4 is (V1, V2, V3, V4, V5).
For user V5, the users with edges pointing to V5 are V2 and V4, while V3 has an edge pointing to V4 and V1 has an edge pointing to V2, so the in-degree group of user V5 is (V1, V2, V3, V4, V5).
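The in-degrees and in-degree groups listed above can be reproduced with a short sketch. The edge list is read off the descriptions of FIG. 2 in the text (an edge goes from follower to followee). Including the user itself in its own group is an assumption inferred from the listed groups (V3 appears in its own group although no two-step follower path leads back to it). The names `EDGES`, `followers` and `in_degree_group` are illustrative.

```python
# Directed edges (follower -> followee) of the FIG. 2 example, read off the text.
EDGES = [
    ("V2", "V1"), ("V1", "V2"), ("V3", "V2"), ("V1", "V3"), ("V5", "V3"),
    ("V3", "V4"), ("V5", "V4"), ("V2", "V5"), ("V4", "V5"),
]
USERS = ["V1", "V2", "V3", "V4", "V5"]

# followers[u] = set of users that have an edge pointing to u.
followers = {u: set() for u in USERS}
for src, dst in EDGES:
    followers[dst].add(src)

# In-degree D_I(u) = number of followers of u.
in_degree = {u: len(followers[u]) for u in USERS}


def in_degree_group(u):
    """Followers of u, the followers of those followers, and (by assumption) u itself."""
    group = {u} | followers[u]
    for v in followers[u]:
        group |= followers[v]
    return sorted(group)


groups = {u: in_degree_group(u) for u in USERS}
```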
3) Calculation of betweenness centrality
The shortest path that exists in the figure is:
There is 1 shortest path passing through V1, namely V2 → V1 → V3, and the total number of shortest paths from V2 to V3 is 2, so the betweenness centrality of V1 is:
$$B_I(V1) = \frac{1}{2} = 0.5$$
There are 5 shortest paths passing through V2, namely V1 → V2 → V5, V3 → V2 → V1, V3 → V2 → V5, V4 → V5 → V3 → V2 → V1 and V5 → V3 → V2 → V1, and the total number of shortest paths from V3 to V5 is 2, so the betweenness centrality of V2 is:
$$B_I(V2) = 1 + 1 + \frac{1}{2} + 1 + 1 = 4.5$$
There are 5 shortest paths passing through V3, namely V1 → V3 → V4, V4 → V5 → V3 → V2 → V1, V4 → V5 → V3 → V2, V5 → V3 → V2 → V1 and V5 → V3 → V2, so the betweenness centrality of V3 is:
$$B_I(V3) = 1 + 1 + 1 + 1 + 1 = 5$$
There is 1 shortest path passing through V4, namely V3 → V4 → V5, and the total number of shortest paths from V3 to V5 is 2, so the betweenness centrality of V4 is:
$$B_I(V4) = \frac{1}{2} = 0.5$$
There are 5 shortest paths passing through V5, namely V2 → V5 → V3, V2 → V5 → V4, V4 → V5 → V3 → V2 → V1, V4 → V5 → V3 → V2 and V4 → V5 → V3, and the total number of shortest paths from V2 to V3 is 2, so the betweenness centrality of V5 is:
$$B_I(V5) = \frac{1}{2} + 1 + 1 + 1 + 1 = 4.5$$
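The shortest-path counting above can be checked with a small sketch. It assumes the same FIG. 2 edge list read off the text and computes the unnormalized sum of $\sigma_{mn}(u)/\sigma_{mn}$ over ordered pairs (m, n) using breadth-first search. The function names are illustrative, and this straightforward all-pairs approach is chosen for readability rather than efficiency.

```python
from collections import deque

# Directed edges (follower -> followee) of the FIG. 2 example, read off the text.
EDGES = [
    ("V2", "V1"), ("V1", "V2"), ("V3", "V2"), ("V1", "V3"), ("V5", "V3"),
    ("V3", "V4"), ("V5", "V4"), ("V2", "V5"), ("V4", "V5"),
]
USERS = ["V1", "V2", "V3", "V4", "V5"]

successors = {u: [] for u in USERS}
for src, dst in EDGES:
    successors[src].append(dst)


def bfs_counts(source):
    """Return (dist, sigma): hop distance and number of shortest paths from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in successors[x]:
            if y not in dist:  # first time y is reached
                dist[y] = dist[x] + 1
                sigma[y] = 0
                queue.append(y)
            if dist[y] == dist[x] + 1:  # x lies on a shortest path to y
                sigma[y] += sigma[x]
    return dist, sigma


info = {s: bfs_counts(s) for s in USERS}


def betweenness(u):
    """Sum over ordered pairs (m, n), both different from u, of sigma_mn(u) / sigma_mn."""
    dist_u, sigma_u = info[u]
    total = 0.0
    for m in USERS:
        dist_m, sigma_m = info[m]
        for n in USERS:
            if len({m, n, u}) < 3:
                continue
            if u not in dist_m or n not in dist_m or n not in dist_u:
                continue  # some node is unreachable
            if dist_m[u] + dist_u[n] == dist_m[n]:
                # Shortest m->n paths through u = (paths m->u) * (paths u->n).
                total += sigma_m[u] * sigma_u[n] / sigma_m[n]
    return total


B = {u: betweenness(u) for u in USERS}
```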
The topological attribute values $D_I(u)$, $C_I(u)$ and $B_I(u)$ of each user u are min-max normalized and then clustered with the K-means algorithm improved by K-means++. In this example the users are divided into two clusters, giving cluster 1 (V1, V4) and cluster 2 (V2, V3, V5). Because the center of cluster 2 is larger than the center of cluster 1 in both in-degree and betweenness centrality, cluster 2 (V2, V3, V5) is selected as the opinion leader candidate user set, and L = (V2, V3, V5).
2. User leadership calculation
1) Calculation of user activity
a) Calculation of the publishing, forwarding and commenting frequencies
As can be seen from Table 1, for user V2, $T_{now}$ = 2017-4-22. The number of published messages is 20 and $T_{firstpublish}$ = 2017-4-1, so $\Delta T_p = 21$ (days) and $FP(V2) = 20/21 \approx 0.95$. The number of forwarded messages is 6 and $T_{firstforward}$ = 2017-4-2, so $\Delta T_f = 20$ and $FF(V2) = 6/20 = 0.30$. The number of comments is 10 and $T_{firstevaluate}$ = 2017-4-3, so $\Delta T_e = 19$ and $FE(V2) = 10/19 \approx 0.53$.
For user V3, $T_{now}$ = 2017-4-22. The number of published messages is 15 and $T_{firstpublish}$ = 2017-3-20, so $\Delta T_p = 33$ and $FP(V3) = 15/33 \approx 0.45$. The number of forwarded messages is 10 and $T_{firstforward}$ = 2017-4-5, so $\Delta T_f = 17$ and $FF(V3) = 10/17 \approx 0.59$. The number of comments is 5 and $T_{firstevaluate}$ = 2017-4-15, so $\Delta T_e = 7$ and $FE(V3) = 5/7 \approx 0.71$.
For user V5, $T_{now}$ = 2017-4-22. The number of published messages is 5 and $T_{firstpublish}$ = 2017-3-1, so $\Delta T_p = 52$ and $FP(V5) = 5/52 \approx 0.10$. The number of forwarded messages is 15 and $T_{firstforward}$ = 2017-4-10, so $\Delta T_f = 12$ and $FF(V5) = 15/12 = 1.25$. The number of comments is 5 and $T_{firstevaluate}$ = 2017-4-1, so $\Delta T_e = 21$ and $FE(V5) = 5/21 \approx 0.24$.
TABLE 1 User-related information
User | V1 | V2 | V3 | V4 | V5 |
Number of published messages | 1 | 20 | 15 | 3 | 5 |
Time of first published message | 2017-3-1 | 2017-4-1 | 2017-3-20 | 2017-2-18 | 2017-3-1 |
Number of forwarded messages | 3 | 6 | 10 | 5 | 15 |
Time of first forwarded message | 2017-3-9 | 2017-4-2 | 2017-4-5 | 2017-4-7 | 2017-4-10 |
Number of comments | 3 | 10 | 5 | 4 | 5 |
Time of first comment | 2017-3-3 | 2017-4-3 | 2017-4-15 | 2017-1-1 | 2017-4-1 |
Data acquisition time | 2017-4-22 | 2017-4-22 | 2017-4-22 | 2017-4-22 | 2017-4-22 |
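The intervals and frequencies for the three candidate users can be recomputed from Table 1 with a sketch like the following. It assumes the unit of time is one day, which is consistent with the interval values used in the example. The dictionary layout and the helper name `frequency` are illustrative.

```python
from datetime import date

T_NOW = date(2017, 4, 22)  # data acquisition time from Table 1

# (count, date of first action) per activity type, copied from Table 1.
TABLE_1 = {
    "V2": {"publish": (20, date(2017, 4, 1)),
           "forward": (6, date(2017, 4, 2)),
           "comment": (10, date(2017, 4, 3))},
    "V3": {"publish": (15, date(2017, 3, 20)),
           "forward": (10, date(2017, 4, 5)),
           "comment": (5, date(2017, 4, 15))},
    "V5": {"publish": (5, date(2017, 3, 1)),
           "forward": (15, date(2017, 4, 10)),
           "comment": (5, date(2017, 4, 1))},
}


def frequency(count, first, now=T_NOW):
    """Messages per day between the first action and the acquisition time."""
    interval_days = (now - first).days  # assumed unit: days
    return count / interval_days


freq = {
    user: {kind: frequency(count, first) for kind, (count, first) in acts.items()}
    for user, acts in TABLE_1.items()
}
```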
b) Normalization of the publishing, forwarding and commenting frequencies
The publishing, forwarding and commenting frequencies are min-max normalized, giving the following table:
User | V2 | V3 | V5 |
FP'(u) | 1 | 0.41 | 0 |
FF'(u) | 0 | 0.31 | 1 |
FE'(u) | 0.62 | 1 | 0 |
c) Calculation of user activity
The weights of the publishing, forwarding and commenting frequencies in the user activity are determined with the analytic hierarchy process by constructing a judgment matrix.
The resulting weights are α1 = 0.636985, α2 = 0.2582850 and α3 = 0.1047294, which give:
UA(V2)=0.636985×1+0.2582850×0+0.1047294×0.62=0.70
UA(V3)=0.636985×0.41+0.2582850×0.31+0.1047294×1=0.45
UA(V5)=0.636985×0+0.2582850×1+0.1047294×0=0.26
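The normalization and weighting steps can be sketched as follows. The sketch starts from the unrounded frequencies implied by Table 1 and uses the weights stated above. Because the example appears to round intermediate values earlier, the normalized values here can differ from the example in the second decimal, while the resulting activities agree to two decimals. The names `FREQ`, `ALPHA` and `min_max` are illustrative.

```python
# Raw frequencies (messages per day) of the candidate users, from Table 1.
FREQ = {
    "V2": {"FP": 20 / 21, "FF": 6 / 20, "FE": 10 / 19},
    "V3": {"FP": 15 / 33, "FF": 10 / 17, "FE": 5 / 7},
    "V5": {"FP": 5 / 52, "FF": 15 / 12, "FE": 5 / 21},
}
# AHP weights stated in the example.
ALPHA = {"FP": 0.636985, "FF": 0.2582850, "FE": 0.1047294}


def min_max(values):
    """Min-max normalize a {user: value} mapping (assumes max != min)."""
    lo, hi = min(values.values()), max(values.values())
    return {user: (x - lo) / (hi - lo) for user, x in values.items()}


# Normalize each frequency type over the candidate set L.
normalized = {k: min_max({u: FREQ[u][k] for u in FREQ}) for k in ALPHA}

# UA(u) = alpha_1 * FP'(u) + alpha_2 * FF'(u) + alpha_3 * FE'(u)
activity = {u: sum(ALPHA[k] * normalized[k][u] for k in ALPHA) for u in FREQ}
```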
2) Calculation of user influence
The following is obtained by analyzing the contents of Table 2:
TABLE 2 Sources of the information forwarded and commented on by each user
User | Source users of forwarded information | Source users of commented information |
V1 | V2:1,V3:2 | V2:1,V3:2 |
V2 | V5:3,V4:2,V1:1 | V5:7,V1:2,V3:1 |
V3 | V2:4,V4:5,V5:1 | V2:2,V4:3 |
V4 | V5:3,V3:2 | V5:4 |
V5 | V3:10,V4:5 | V4:5 |
Note: V: n indicates that n pieces of information originate from user V.
a) Calculation of the degree of attention
For user V2, the followers are user V1 and user V3.
The followees of V1 are user V2 and user V3, so the degree of attention of user V1 to user V2 is:
$$AD_{(V1)(V2)} = \frac{1 + 1}{3 + 3} \times \frac{6 + 20}{(6 + 20) + (10 + 15)} \approx 0.17$$
The followees of V3 are user V2 and user V4, so the degree of attention of user V3 to user V2 is:
$$AD_{(V3)(V2)} = \frac{4 + 2}{10 + 5} \times \frac{6 + 20}{(6 + 20) + (5 + 3)} \approx 0.31$$
For user V3, the followers are user V1 and user V5.
The followees of V1 are user V2 and user V3, so the degree of attention of user V1 to user V3 is:
$$AD_{(V1)(V3)} = \frac{2 + 2}{3 + 3} \times \frac{10 + 15}{(6 + 20) + (10 + 15)} \approx 0.33$$
The followees of V5 are user V3 and user V4, so the degree of attention of user V5 to user V3 is:
$$AD_{(V5)(V3)} = \frac{10 + 0}{15 + 5} \times \frac{10 + 15}{(10 + 15) + (5 + 3)} \approx 0.38$$
For user V5, the followers are user V2 and user V4.
The followees of V2 are user V1 and user V5, so the degree of attention of user V2 to user V5 is:
$$AD_{(V2)(V5)} = \frac{3 + 7}{6 + 10} \times \frac{15 + 5}{(3 + 1) + (15 + 5)} \approx 0.52$$
The followee of V4 is user V5, so the degree of attention of user V4 to user V5 is:
$$AD_{(V4)(V5)} = \frac{3 + 4}{5 + 4} \times \frac{15 + 5}{15 + 5} \approx 0.78$$
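The degree-of-attention values used later in the example (0.17, 0.31, 0.33, 0.38, 0.52, 0.78) can be reproduced from Table 1 and Table 2 with the sketch below. It assumes that the attention is the product of the share of v's forwards and comments that come from u and the share of u's published and forwarded messages among all followees of v, which is the form that matches those values. The followee lists are read off FIG. 2, and all names are illustrative.

```python
# Table 1 counts for each user.
PUBLISHED = {"V1": 1, "V2": 20, "V3": 15, "V4": 3, "V5": 5}
FORWARDED = {"V1": 3, "V2": 6, "V3": 10, "V4": 5, "V5": 15}
COMMENTED = {"V1": 3, "V2": 10, "V3": 5, "V4": 4, "V5": 5}

# Table 2: per-source counts of the forwards and comments made by each user.
FORWARD_SOURCES = {
    "V1": {"V2": 1, "V3": 2},
    "V2": {"V5": 3, "V4": 2, "V1": 1},
    "V3": {"V2": 4, "V4": 5, "V5": 1},
    "V4": {"V5": 3, "V3": 2},
    "V5": {"V3": 10, "V4": 5},
}
COMMENT_SOURCES = {
    "V1": {"V2": 1, "V3": 2},
    "V2": {"V5": 7, "V1": 2, "V3": 1},
    "V3": {"V2": 2, "V4": 3},
    "V4": {"V5": 4},
    "V5": {"V4": 5},
}

# focus(v): the users that v follows in FIG. 2.
FOCUS = {
    "V1": ["V2", "V3"], "V2": ["V1", "V5"], "V3": ["V2", "V4"],
    "V4": ["V5"], "V5": ["V3", "V4"],
}


def attention(v, u):
    """Degree of attention AD_vu that follower v pays to followee u."""
    # Share of v's forwards and comments whose source is u.
    interaction = (
        FORWARD_SOURCES[v].get(u, 0) + COMMENT_SOURCES[v].get(u, 0)
    ) / (FORWARDED[v] + COMMENTED[v])
    # Share of u's output among the output of all followees of v.
    output_share = (FORWARDED[u] + PUBLISHED[u]) / sum(
        FORWARDED[k] + PUBLISHED[k] for k in FOCUS[v]
    )
    return interaction * output_share
```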
b) Calculation of user information coverage
The analysis is based on the contents of Table 3, Table 4 and Table 5.
TABLE 3 Information coverage of user V2
Message number of user V2 | Number of users covered |
2 | 3 |
4 | 2 |
15 | 4 |
20 | 2 |
26 | 1 |
Note: all other messages of user V2 cover 0 users.
TABLE 4 Information coverage of user V3
Message number of user V3 | Number of users covered |
6 | 3 |
9 | 1 |
25 | 4 |
Note: every other piece of information covers 0 people.
Table 5. Information coverage of user V5
Information number of user V5 | Number of covered people |
---|---|
1 | 2 |
8 | 2 |
13 | 4 |
19 | 3 |
20 | 1 |
Note: every other piece of information covers 0 people.
The information coverage of user V2 is calculated as CR(V2) = 0.09.
The information coverage of user V3 is calculated as CR(V3) = 0.06.
The information coverage of user V5 is calculated as CR(V5) = 0.12.
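The coverage formula itself is not reproduced in the text above, so the following is only a sketch under an assumed reading: the coverage of one piece of information is the number of people it covers divided by the total number of users, and a user's coverage is the average of that value over all information the user published or forwarded. The network size of 5 and the `total_items` values are hypothetical placeholders that are not given in Tables 3 to 5; the function name is likewise illustrative.

```python
def information_coverage(covered_counts, total_items, num_users):
    """Average per-item coverage; items not listed are treated as covering nobody."""
    per_item = [count / num_users for count in covered_counts.values()]
    return sum(per_item) / total_items

NUM_USERS = 5  # assumption: the example network consists of V1..V5

# Number of covered people per information number, copied from Tables 3 to 5.
covered = {
    "V2": {2: 3, 4: 2, 15: 4, 20: 2, 26: 1},
    "V3": {6: 3, 9: 1, 25: 4},
    "V5": {1: 2, 8: 2, 13: 4, 19: 3, 20: 1},
}

# Hypothetical totals of published-or-forwarded items per user (not stated above);
# the largest information number in each table is used as a stand-in.
total_items = {"V2": 26, "V3": 25, "V5": 20}

coverage = {
    user: round(information_coverage(counts, total_items[user], NUM_USERS), 2)
    for user, counts in covered.items()
}
print(coverage)
```

Under these assumed totals the results round to 0.09, 0.06, and 0.12, the CR values used in the influence calculation; with different totals the figures would differ.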
c) Calculation of user influence
The user influence is obtained by combining the user attention degree and the user information coverage.
The followers of user V2 are V1 and V3, so
UI(V2) = CR(V2) + AD_{V1,V2} × UI(V1) + AD_{V3,V2} × UI(V3) = 0.09 + 0.17 × 0.6 + 0.31 × 1 = 0.502
The followers of user V3 are V1 and V5, so
UI(V3) = CR(V3) + AD_{V1,V3} × UI(V1) + AD_{V5,V3} × UI(V5) = 0.06 + 0.33 × 0.6 + 0.38 × 1 = 0.638
The followers of user V5 are V2 and V4, so
UI(V5) = CR(V5) + AD_{V2,V5} × UI(V2) + AD_{V4,V5} × UI(V4) = 0.12 + 0.52 × 1 + 0.78 × 0.6 = 1.108
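The three influence values can be reproduced with a few lines of code. This is a minimal sketch: the coverage, attention, and follower influence values are copied from the calculation above, and the function name `user_influence` and the data layout are assumptions made for illustration.

```python
def user_influence(coverage, follower_terms):
    """UI(u) = CR(u) + sum over followers v of AD(v to u) * UI(v)."""
    return coverage + sum(attention * influence for attention, influence in follower_terms)

# For each candidate: CR(u), then one (attention degree, follower influence) pair
# per follower, copied from the worked example above.
examples = {
    "V2": (0.09, [(0.17, 0.6), (0.31, 1.0)]),  # followers V1 and V3
    "V3": (0.06, [(0.33, 0.6), (0.38, 1.0)]),  # followers V1 and V5
    "V5": (0.12, [(0.52, 1.0), (0.78, 0.6)]),  # followers V2 and V4
}

influence = {
    user: round(user_influence(cr, terms), 3)
    for user, (cr, terms) in examples.items()
}
print(influence)
```

The follower influence values (0.6 and 1) are taken as given here; the sketch does not attempt to derive them.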
3. Calculation of user leadership
The user leadership is obtained by combining the user activity and the user influence:
ULD(V2)=UI(V2)×UA(V2)=0.502×0.70=0.3514
ULD(V3)=UI(V3)×UA(V3)=0.638×0.45=0.2871
ULD(V5)=UI(V5)×UA(V5)=1.108×0.26=0.28808
Sorting the users in L by leadership in descending order gives:
ULD(V2)>ULD(V5)>ULD(V3)
If one opinion leader is to be mined, user V2 is the opinion leader; if two opinion leaders are to be mined, users V2 and V5 are the opinion leaders.
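The leadership product and the top-K selection can be sketched as follows. The influence and activity values are those of the worked example; the function name `rank_by_leadership` and its return shape are illustrative assumptions.

```python
def rank_by_leadership(influence, activity, k):
    """Return the top-k users by ULD(u) = UI(u) * UA(u), highest first."""
    leadership = {user: influence[user] * activity[user] for user in influence}
    ranked = sorted(leadership, key=leadership.get, reverse=True)
    return ranked[:k], leadership

# UI and UA values of the candidate users, from the worked example.
influence = {"V2": 0.502, "V3": 0.638, "V5": 1.108}
activity = {"V2": 0.70, "V3": 0.45, "V5": 0.26}

top_two, leadership = rank_by_leadership(influence, activity, k=2)
print(top_two)
```

With `k=1` the same call returns only V2, matching the single-leader case described above.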
In summary, the present invention provides a technique that mines social network opinion leaders from a clustering perspective, based on the topological attributes of the social network and making full use of the attributes of its users. Existing opinion leader mining techniques rarely take into account that only some users in a social network are in a position to become opinion leaders. To address this problem, the invention clusters users by their topological attributes in the social network, screens out the candidate set of users who are in a position to become opinion leaders, analyzes the leadership of the users in that set, and mines opinion leaders that have both activity and influence.
Claims (9)
1. A clustering-based social network opinion leader mining method for obtaining opinion leaders among social network users, characterized by comprising the following steps:
1) establishing a social network model, and obtaining the in-degree, the betweenness centrality, and the in-degree group clustering coefficient of each user in the social network model;
2) clustering the users with the K-means clustering algorithm according to their in-degree, betweenness centrality, and in-degree group clustering coefficient, and obtaining an opinion leader candidate user set L from the clustering result;
3) calculating the user activity and the user influence of the users in the opinion leader candidate user set L, and calculating the user leadership from the user activity and the user influence, wherein the user activity is calculated as follows:
UA(u)=α1FP'(u)+α2FF'(u)+α3FE'(u)
α1+α2+α3=1
ΔT_p = T_now - T_firstpublish
ΔT_f = T_now - T_firstforward
ΔT_e = T_now - T_firstevaluate
wherein UA(u) is the user activity of user u; FP(u), FF(u), and FE(u) are the frequencies with which user u publishes, forwards, and comments on information, respectively; FP'(u), FF'(u), and FE'(u) are the values of FP(u), FF(u), and FE(u) after min-max normalization; ΔT_p is the interval between the data acquisition time T_now and the time T_firstpublish at which user u first published information, the corresponding count being the total number of pieces of information published by user u within ΔT_p; ΔT_f is the interval between the data acquisition time T_now and the time T_firstforward at which user u first forwarded information, the corresponding count being the total number of pieces of information forwarded by user u within ΔT_f; ΔT_e is the interval between the data acquisition time T_now and the time T_firstevaluate at which user u first commented on information, the corresponding count being the total number of pieces of information commented on by user u within ΔT_e; and α1, α2, and α3 are weights assigned by the analytic hierarchy process;
4) obtaining the opinion leaders from the opinion leader candidate user set L according to the user leadership.
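As an illustration only, the activity computation of step 3) can be sketched as below. The sketch assumes that each frequency is the action count divided by the corresponding interval ΔT, which is the natural reading of the definitions but is not spelled out in the text; the sample counts, the day-based time unit, the weights, and the handling of a constant column in `min_max` are all hypothetical.

```python
def frequency(count, t_now, t_first):
    """Number of actions per unit time since the user's first such action."""
    interval = t_now - t_first
    return count / interval if interval > 0 else 0.0

def min_max(values):
    """Min-max normalize a {user: value} mapping into [0, 1]."""
    low, high = min(values.values()), max(values.values())
    if high == low:  # assumption: a constant column normalizes to 0
        return {user: 0.0 for user in values}
    return {user: (v - low) / (high - low) for user, v in values.items()}

def user_activity(publish, forward, comment, t_now, weights):
    """publish/forward/comment map user -> (count, time of first action)."""
    a1, a2, a3 = weights
    fp = min_max({u: frequency(c, t_now, t) for u, (c, t) in publish.items()})
    ff = min_max({u: frequency(c, t_now, t) for u, (c, t) in forward.items()})
    fe = min_max({u: frequency(c, t_now, t) for u, (c, t) in comment.items()})
    return {u: a1 * fp[u] + a2 * ff[u] + a3 * fe[u] for u in publish}

# Hypothetical data: (count, day of first action); data collected on day 100.
publish = {"A": (30, 40), "B": (10, 0), "C": (27, 10)}
forward = {"A": (6, 40), "B": (50, 0), "C": (9, 10)}
comment = {"A": (12, 40), "B": (20, 0), "C": (18, 10)}

# Hypothetical weights; the method assigns them by the analytic hierarchy process.
ua = user_activity(publish, forward, comment, t_now=100, weights=(0.5, 0.3, 0.2))
print(ua)
```

Normalizing each frequency across the candidate users before weighting keeps a single very active channel from dominating the sum.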
2. The method as claimed in claim 1, wherein the in-degree of a user in step 1) is calculated by the following formula:
wherein D_I(u) is the in-degree of user u; the indicator with subscript vu equals 1 when user v is a follower of user u and 0 when user v is not a follower of user u; and V is the set of users in the social network.
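As an illustration only, the in-degree can be computed by counting, for each user u, the users v whose follower indicator is 1. The edge list below uses the follow relations listed in the worked example of the description, read as (follower, followed) pairs; that reading, the function name `in_degree`, and the data layout are assumptions.

```python
def in_degree(follow_edges, users):
    """In-degree of each user: the number of distinct users that follow them."""
    followers = {user: set() for user in users}
    for follower, followed in follow_edges:
        followers[followed].add(follower)
    return {user: len(group) for user, group in followers.items()}

users = ["V1", "V2", "V3", "V4", "V5"]

# (follower, followed) pairs, as read from the worked example.
follow_edges = [
    ("V1", "V2"), ("V1", "V3"),
    ("V2", "V1"), ("V2", "V5"),
    ("V3", "V2"), ("V3", "V4"),
    ("V4", "V5"),
    ("V5", "V3"), ("V5", "V4"),
]

degrees = in_degree(follow_edges, users)
print(degrees)
```

Storing followers in a set means a repeated edge is counted once, which matches an indicator that can only be 0 or 1.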
3. The method as claimed in claim 1, wherein the in-degree group clustering coefficient in step 1) is calculated as:
wherein C_I(u) is the in-degree group clustering coefficient of user u, n is the total number of users in the in-degree group of user u, P is the set of users in the in-degree group of user u, M(v) is the total number of directed edges that actually exist among the users having a direct edge relationship with user v, and N(v) is the total number of users having a direct edge relationship with user v.
4. The method as claimed in claim 1, wherein the betweenness centrality of a user in step 1) is calculated by the following formula:
wherein B_I(u) is the betweenness centrality of user u, σ_mn(u) is the number of shortest paths between user m and user n that pass through user u, σ_mn is the total number of shortest paths between user m and user n, and V is the set of users in the social network.
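As an illustration only, the ratio σ_mn(u)/σ_mn can be accumulated with breadth-first search. The sketch assumes an unweighted directed graph, sums over ordered pairs (m, n) with m, n, and u all distinct, and applies no normalization; those choices, the sample graph, and the function names are assumptions, since the formula itself is not reproduced in the text.

```python
from collections import deque

def shortest_path_counts(adj, source):
    """BFS from source: distance and number of shortest paths to each reachable node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj.get(x, ()):
            if y not in dist:            # first time y is reached
                dist[y] = dist[x] + 1
                sigma[y] = 0
                queue.append(y)
            if dist[y] == dist[x] + 1:   # x precedes y on a shortest path
                sigma[y] += sigma[x]
    return dist, sigma

def betweenness(adj, users):
    """Sum over ordered pairs (m, n) of sigma_mn(u) / sigma_mn for each user u."""
    bfs = {s: shortest_path_counts(adj, s) for s in users}
    score = {u: 0.0 for u in users}
    for m in users:
        dist_m, sigma_m = bfs[m]
        for n in users:
            if n == m or n not in dist_m:
                continue                 # no path from m to n
            for u in users:
                if u == m or u == n or u not in dist_m:
                    continue
                dist_u, sigma_u = bfs[u]
                # u lies on a shortest m-to-n path exactly when the distances add up
                if n in dist_u and dist_m[u] + dist_u[n] == dist_m[n]:
                    score[u] += sigma_m[u] * sigma_u[n] / sigma_m[n]
    return score

# Hypothetical directed graph: a reaches d through either b or c.
users = ["a", "b", "c", "d"]
adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
scores = betweenness(adj, users)
print(scores)
```

The pairwise loop is cubic in the number of users, which is acceptable for a small example; Brandes' algorithm is the usual choice for larger networks.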
5. The method as claimed in claim 1, wherein in step 2), the users in the cluster whose cluster center has the largest in-degree, the largest in-degree group clustering coefficient, and the largest betweenness centrality are selected and added to the opinion leader candidate user set L.
6. The method as claimed in claim 5, wherein when no cluster center is largest in all three of in-degree, in-degree group clustering coefficient, and betweenness centrality, the cluster whose center satisfies any two of these conditions at the same time is selected, and its users are added to the opinion leader candidate user set L.
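As an illustration only, the selection rule of claims 5 and 6 can be sketched as a comparison of cluster centers. The sketch assumes K-means has already produced the centers; the center values, the function name `select_candidate_cluster`, and the tie-breaking rule are hypothetical.

```python
def select_candidate_cluster(centers):
    """Pick the cluster whose center is largest in all three attributes,
    falling back to a cluster whose center is largest in any two.

    centers: list of (in_degree, clustering_coefficient, betweenness) tuples.
    Returns the cluster index, or None if no cluster qualifies.
    """
    best = [max(center[d] for center in centers) for d in range(3)]
    hits = [sum(center[d] == best[d] for d in range(3)) for center in centers]
    for required in (3, 2):
        qualifying = [i for i, h in enumerate(hits) if h >= required]
        if qualifying:
            return qualifying[0]  # assumption: ties go to the lowest index
    return None

# Hypothetical cluster centers produced by K-means with K = 3.
centers = [
    (1.2, 0.10, 0.02),
    (6.5, 0.35, 0.40),
    (3.0, 0.60, 0.15),
]
chosen = select_candidate_cluster(centers)
print(chosen)
```

Here no center is largest in all three attributes, so the cluster at index 1, which is largest in in-degree and betweenness centrality, is chosen by the two-condition fallback.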
7. The method as claimed in claim 1, wherein the user influence in step 3) is calculated by the following formula:
wherein UI(u) is the user influence of user u; UI(v) is the user influence of user v; AD_vu is the degree of attention of user v to user u; fans(u) is the set of followers of user u; |V| is the total number of users in the social network; d(v) is the total degree of user v; CR(u) is the information coverage of user u; CR_i(u) is the information coverage of information i; pub(u) ∪ for(u) is the set of information published or forwarded by user u; |for(u)| is the total number of pieces of information forwarded by user u; |pub(u)| is the total number of pieces of information published by user u; the indicator with subscript vi equals 1 when user v forwards or comments on information i of user u, so that information i covers user v, and equals 0 when user v neither forwards nor comments on information i of user u, so that information i does not cover user v; |for(v)| is the total number of pieces of other users' information forwarded by user v; |eva(v)| is the total number of pieces of other users' information commented on by user v; |for(v.source = u)| is the total number of pieces of user u's information forwarded by user v; |eva(v.source = u)| is the total number of pieces of user u's information commented on by user v; |for(k)| is the total number of pieces of other users' information forwarded by user k; |pub(k)| is the total number of pieces of information published by user k; and focus(v) is the set of users followed by user v.
8. The method as claimed in claim 7, wherein the user leadership is calculated in step 3) according to the following formula:
ULD(u)=UI(u)×UA(u)
wherein ULD (u) is the user leadership of user u.
9. The clustering-based social network opinion leader mining method according to claim 6, wherein step 4) specifically comprises:
sorting all users in the opinion leader candidate user set L in descending order of user leadership, and selecting the first K users as opinion leaders.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710729792.XA CN107633260B (en) | 2017-08-23 | 2017-08-23 | Social network opinion leader mining method based on clustering |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710729792.XA CN107633260B (en) | 2017-08-23 | 2017-08-23 | Social network opinion leader mining method based on clustering |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107633260A (en) | 2018-01-26 |
CN107633260B (en) | 2020-10-16 |
Family
ID=61099724
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710729792.XA Active CN107633260B (en) | 2017-08-23 | 2017-08-23 | Social network opinion leader mining method based on clustering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107633260B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN107633260A (en) | 2018-01-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CB03 | Change of inventor or designer information | ||
Inventor after: Pan Jianguo Inventor after: Zhang Bo Inventor after: Zhang Qian Inventor after: Li Meizi Inventor after: Zhao Qin Inventor before: Zhang Bo Inventor before: Zhang Qian Inventor before: Li Meizi Inventor before: Pan Jianguo Inventor before: Zhao Qin |