CN107633260B - Social network opinion leader mining method based on clustering

Info

Publication number: CN107633260B
Application number: CN201710729792.XA
Authority: CN
Inventors: 张波; 张倩; 李美子; 潘建国; 赵勤
Original assignee: Shanghai Normal University
Current assignee: Shanghai Normal University
Priority date: 2017-08-23
Filing date: 2017-08-23
Publication date: 2020-10-16
Anticipated expiration: 2037-08-23
Also published as: CN107633260A

Abstract

The invention relates to a clustering-based method for mining opinion leaders in a social network, used to identify opinion leaders among social network users, comprising the following steps: 1) establishing a social network model, and obtaining each user's in-degree, betweenness centrality, and in-degree group clustering coefficient in the model; 2) clustering the users with the K-means clustering algorithm according to their in-degree, betweenness centrality, and in-degree group clustering coefficient, and obtaining an opinion leader candidate user set L from the clustering result; 3) calculating the user activity and user influence of the users in the opinion leader candidate user set L, and calculating user leadership from the user activity and user influence; 4) selecting the opinion leaders from the opinion leader candidate user set L according to user leadership. Compared with the prior art, the method has the advantages of comprehensive consideration, accurate evaluation, and accurate calculation.

Description

Social network opinion leader mining method based on clustering

Technical Field

The invention relates to the technical field of social networks, in particular to a social network opinion leader mining method based on clustering.

Background

Opinion leaders in social networks have a significant impact on people's thinking, experience, and actions. Because social networks are open, opinion leaders have more influence on information dissemination than ordinary users. Research on opinion leaders is therefore one of the most important topics in social network user analysis, and it is widely applied to the analysis and prediction of information dissemination, the guidance and supervision of public opinion, and the commercial development of social networks.

Handling large-scale data remains a challenge in opinion leader mining. Most opinion leader mining algorithms evaluate the influence of every user in the network without first distinguishing among users, so the more users a social network has, the higher the time complexity of the calculation. Cha M. et al. analyze and mine opinion leaders using numerical values such as user degree, the number of times a user is mentioned, and the number of times a user's posts are forwarded or quoted. Building on LeaderRank, Xu et al. add users' sentiment tendency and activity to mine opinion leaders. Wu et al. construct a topic-related weighted microblog graph model and use a random walk to find the central nodes of the graph, thereby mining opinion leaders in microblogs. Cao Jiuxin et al. first identify topic communities, then measure user influence along three dimensions (structure, behavior, and sentiment) and propose the MFP algorithm to mine opinion leaders. Chen Yuan et al. identify opinion leaders according to structural hole position, centrality position, and edge position in the social network. Another approach calculates user leadership by jointly considering user activity and user influence, and mines opinion leaders in a social network in combination with user centrality. Wu Chao et al. construct a reply relationship graph from the post-reply relationships under a specific topic, and identify opinion leaders by jointly considering reply sentiment tendency, reply path clustering, and reply text similarity. None of these techniques classifies the users in a social network to screen out those who could become opinion leaders, but together they provide a foundation for mining social network opinion leaders from a clustering perspective.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a clustering-based social network opinion leader mining method which is comprehensive in consideration, accurate in evaluation and accurate in calculation.

The purpose of the invention can be realized by the following technical scheme:

A clustering-based social network opinion leader mining method, used to identify opinion leaders among social network users, comprises the following steps:

1) establishing a social network model, and obtaining each user's in-degree, betweenness centrality, and in-degree group clustering coefficient in the model;

2) clustering the users with the K-means clustering algorithm according to their in-degree, betweenness centrality, and in-degree group clustering coefficient, and obtaining an opinion leader candidate user set L from the clustering result;

3) calculating the user activity and user influence of the users in the opinion leader candidate user set L, and calculating user leadership from the user activity and user influence;

4) selecting the opinion leaders from the opinion leader candidate user set L according to user leadership.

In step 1), the in-degree of a user is calculated as:

D^I(u) ＝ Σ_{v∈V} δ_vu

wherein D^I(u) is the in-degree of user u; the indicator (written here as δ_vu) is defined so that δ_vu ＝ 1 when user v is a follower of user u and δ_vu ＝ 0 when user v is not a follower of user u; and V is the set of users in the social network.
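The in-degree count can be sketched in a few lines of Python. The edge list below is an assumption: it is inferred from the follower relationships described for FIG. 2 in the worked example, since the figure itself is not reproduced in the text. The function name `in_degree` is hypothetical.

```python
from collections import Counter

# Directed edges (follower, followee). Assumed data, inferred from the
# relationships described for FIG. 2 in the worked example.
EDGES = [("V2", "V1"), ("V1", "V2"), ("V3", "V2"), ("V1", "V3"), ("V5", "V3"),
         ("V3", "V4"), ("V5", "V4"), ("V2", "V5"), ("V4", "V5")]

def in_degree(edges):
    """Return {user: number of edges ending at that user}."""
    users = {u for edge in edges for u in edge}
    counts = Counter(followee for _, followee in edges)
    return {u: counts.get(u, 0) for u in sorted(users)}

print(in_degree(EDGES))
```

With this edge list the counts agree with the in-degrees quoted in the worked example (1 for V1 and 2 for every other user).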

In step 1), the in-degree group clustering coefficient is calculated as follows:

wherein C^I(u) is the in-degree group clustering coefficient of user u, n is the total number of users in the in-degree group of user u, P is the in-degree group (set of users) of user u, M(v) is the total number of directed edges that actually exist among the users having a direct edge relationship with user v, and N(v) is the total number of users having a direct edge relationship with user v.

In step 1), the betweenness centrality of a user is calculated as:

B^I(u) ＝ Σ_{m≠u≠n} σ_mn(u) / σ_mn

wherein B^I(u) is the betweenness centrality of user u, σ_mn(u) is the number of shortest paths between user m and user n that pass through user u, and σ_mn is the total number of shortest paths between user m and user n.

In step 2), among the clusters obtained from the clustering result, the cluster whose center simultaneously has the largest in-degree, the largest in-degree group clustering coefficient, and the largest betweenness centrality is selected, and the users in that cluster are added to the opinion leader candidate user set L.

When no cluster satisfies all three conditions simultaneously, a cluster that satisfies any two of them is selected, and the users in that cluster are added to the opinion leader candidate user set L.
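The selection rule above can be expressed as a small function over the cluster centers. This is only a sketch: the function name `pick_candidate_cluster` and the tuple layout (in-degree, in-degree group clustering coefficient, betweenness centrality) are assumptions, and ties are resolved by taking the first matching cluster, which the text does not specify.

```python
def pick_candidate_cluster(centers):
    """centers: list of (in_degree, group_coeff, betweenness) cluster centers.

    Return the index of the cluster whose center is largest in all three
    features; failing that, one that is largest in any two; else None.
    """
    n_features = 3
    best = [max(c[j] for c in centers) for j in range(n_features)]
    # For each cluster, count in how many features its center is the maximum.
    wins = [sum(c[j] == best[j] for j in range(n_features)) for c in centers]
    for needed in (3, 2):
        for idx, w in enumerate(wins):
            if w >= needed:
                return idx
    return None

# Made-up centers: cluster 1 is largest in in-degree and betweenness only.
print(pick_candidate_cluster([(0.2, 0.9, 0.1), (0.8, 0.5, 0.7)]))
```

The users assigned to the returned cluster would then form the candidate set L.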

In the step 3), the calculation formula of the user activity degree is as follows:

UA(u)＝α₁FP'(u)+α₂FF'(u)+α₃FE'(u)

α₁+α₂+α₃＝1

ΔT_p＝T_now-T_firstpublish

ΔT_f＝T_now-T_firstforward

ΔT_e＝T_now-T_{firstevaluate}

wherein UA(u) is the user activity of user u; FP(u), FF(u), and FE(u) are the frequencies with which user u publishes, forwards, and comments on information, respectively; FP'(u), FF'(u), and FE'(u) are the values of FP(u), FF(u), and FE(u) after min-max normalization; ΔT_p is the interval between the data acquisition time T_now and the time T_firstpublish at which user u first published information, and FP(u) is the total number of messages user u published within ΔT_p, averaged over ΔT_p; ΔT_f is the interval between T_now and the time T_firstforward at which user u first forwarded information, and FF(u) is the total number of messages user u forwarded within ΔT_f, averaged over ΔT_f; ΔT_e is the interval between T_now and the time T_{firstevaluate} at which user u first commented on information, and FE(u) is the total number of comments user u made within ΔT_e, averaged over ΔT_e; and α₁, α₂, and α₃ are weights assigned by the analytic hierarchy process.

In step 3), the user influence is calculated as:

UI(u) ＝ CR(u) + Σ_{v∈fans(u)} AD_vu × UI(v)

CR(u) ＝ Σ_{i∈pub(u)∪for(u)} CR_i(u) / (|pub(u)| + |for(u)|)

CR_i(u) ＝ Σ_{v∈V} δ_vi / |V|

AD_vu ＝ [(|for(v.source＝u)| + |eva(v.source＝u)|) / (|for(v)| + |eva(v)|)] × [(|for(u)| + |pub(u)|) / Σ_{k∈focus(v)} (|for(k)| + |pub(k)|)]

wherein UI(u) and UI(v) are the user influence of users u and v; AD_vu is the attention of user v to user u; fans(u) is the set of followers of user u; |V| is the total number of users in the social network and d(v) is the degree of user v, the initial influence of a user being 1 if the user belongs to L and d(v)/|V| otherwise; CR(u) is the information coverage of user u and CR_i(u) is the coverage of information i; pub(u) ∪ for(u) is the set of information published or forwarded by user u, |for(u)| is the total number of messages forwarded by user u, and |pub(u)| is the total number of messages published by user u; the indicator (written here as δ_vi) equals 1 when user v forwards or comments on information i of user u, so that information i covers user v, and equals 0 when user v neither forwards nor comments on information i of user u, so that information i does not cover user v; |for(v)| is the total number of messages of others forwarded by user v, and |eva(v)| is the total number of comments user v made on others' information; |for(v.source＝u)| is the number of messages of user u forwarded by user v, and |eva(v.source＝u)| is the number of comments user v made on information of user u; |for(k)| and |pub(k)| are the total numbers of messages forwarded and published by user k; and focus(v) is the set of followees of user v.

In the step 3), the calculation formula of the user leadership is as follows:

ULD(u)＝UI(u)×UA(u)

wherein ULD (u) is the user leadership of user u.

Step 4) specifically comprises the following:

sorting all users in the opinion leader candidate user set L in descending order of user leadership, and selecting the first K users as opinion leaders.
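The leadership product and the top-K selection can be sketched as follows. The influence and activity values are the ones quoted in the worked example later in the description; the function name `top_k_leaders` is hypothetical.

```python
def top_k_leaders(influence, activity, k):
    """Rank candidates by ULD(u) = UI(u) * UA(u) and return the first k."""
    leadership = {u: influence[u] * activity[u] for u in influence}
    ranked = sorted(leadership, key=leadership.get, reverse=True)
    return ranked[:k], leadership

# Values quoted in the worked example for the candidate set L = (V2, V3, V5).
UI = {"V2": 0.502, "V3": 0.638, "V5": 1.108}
UA = {"V2": 0.70, "V3": 0.45, "V5": 0.26}

leaders, uld = top_k_leaders(UI, UA, k=2)
print(leaders)
```

Note that a highly influential but rarely active user (V5 here) can rank below a less influential but very active one, which is the intended effect of multiplying the two scores.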

Compared with the prior art, the invention has the following advantages:

First, comprehensive consideration: the technical framework screens the opinion leader candidate user set from the perspective of topological attributes and determines the opinion leaders from the perspective of user attributes. Considering topological attributes and user attributes together avoids the one-sided analysis results that come from using only some of the attributes.

Second, accurate evaluation: the invention defines the user in-degree group clustering coefficient and a method for calculating it. The followers having a direct or indirect edge relationship with a user are taken as the user's in-degree group, and the in-degree group clustering coefficient evaluates more accurately how closely the members of that group are connected; the closer the connections, the more likely the information spread by the user is to diffuse. Using the in-degree group clustering coefficient as a feature for cluster analysis improves the quality of the clustering, so that the members of the resulting opinion leader candidate user set have sufficient opinion leader characteristics.

Third, accurate calculation: user leadership is calculated from user activity and user influence. User activity ensures that the mined opinion leaders are active in the social network. The sources of user influence are also considered comprehensively, namely the user's own contribution (information coverage) and the contribution of the user's followers, which makes the calculated user influence more accurate.

Drawings

FIG. 1 is a flow chart of the method of the present invention.

FIG. 2 is a diagram of the network nodes used in an example of the present invention.

Detailed Description

The invention is described in detail below with reference to the figures and specific embodiments.

Examples

The invention first defines the following concepts:

Social network model: the social network is formalized as a graph G ＝ (V, E, R), where V is the set of users in the social network, E is the set of relationships between users, and R is an N × N matrix representing the relationships between users.

Topological attributes: a set of functions of the relationships between nodes, between edges, and between nodes and edges in the directed graph G.

User attributes: quantitative relationships among the various behaviors of a user in the social network.

User leadership: opinion leaders are the people in a social network with the ability to disseminate information, and user leadership is a quantification of this ability. User leadership depends mainly on user activity and user influence.

Followee and follower: followees and followers appear in pairs. When a directed edge exists between user u and user v in the social network with source u and endpoint v, user u is a follower of user v, and user v is a followee of user u.

Opinion leaders: the K users with the greatest user leadership in the social network, denoted O; the remaining users in the social network are ordinary users, denoted C.

Opinion leader candidate user set: the set L of users considered most likely to be opinion leaders, determined before user leadership is calculated precisely.

As shown in FIG. 1, the method of the present invention for mining social network opinion leaders from a clustering perspective comprises the following specific steps:

A. Define the concepts related to opinion leader mining and model the social network.

B. (1) Calculate each user's in-degree and betweenness centrality from the directed-edge relationships among users; collect the users who can reach a user through one edge or two edges to form that user's in-degree group, and calculate the in-degree group clustering coefficient. (2) Cluster the users on in-degree, betweenness centrality, and in-degree group clustering coefficient using the K-means algorithm improved with K-means++. (3) Among the resulting clusters, add the users in the cluster with larger in-degree, betweenness centrality, and in-degree group clustering coefficient to the opinion leader candidate user set.

C. (1) Count the number of messages each user publishes, forwards, and comments on, together with the intervals between the data acquisition time and the user's first publication, first forward, and first comment; calculate the average number of messages published, forwarded, and commented on per unit time; and obtain the user activity. (2) Count the total number of messages a user receives, the total number the user forwards and comments on, and the number forwarded from and commented on for each followee, and calculate the user's attention to each followee. (3) Count the number of people who forward or comment on the information the user publishes or forwards, to obtain the user's information coverage. (4) Treat each follower's attention to the user as the weight of that follower's contribution to the user's influence, calculate the influence the user obtains from followers, and combine it with the information coverage to obtain the user influence.

D. Calculate user leadership from the user activity and user influence; sort the users by leadership in descending order and take the K users with the greatest leadership as opinion leaders.

(1) Selection of opinion leader candidate user set

When a user is an opinion leader in the social network, many users choose to follow that user; on graph G this appears as many edges whose endpoint is that user. The in-degree of a user on the social network G is expressed as:

D^I(u) ＝ Σ_{v∈V} δ_vu

wherein the indicator (written here as δ_vu) equals 1 when user v is a follower of user u and 0 when user v is not a follower of user u. The in-degree reflects, through the number of followers, whether a user can become an opinion leader: a user with high in-degree is more likely to become an opinion leader than a user with low in-degree. The in-degree group clustering coefficient considers the likelihood of a user becoming an opinion leader in terms of how closely the user's followers are connected. When a directed edge points from v to u, user v is a member of the in-degree group of user u; when another directed edge points from w to v, user w is also a member of the in-degree group of user u. The in-degree group clustering coefficient is expressed as:

wherein P is the in-degree group of user u, n is the total number of users in the in-degree group of user u, N(v) is the total number of users having a direct edge relationship with user v, and M(v) is the total number of directed edges that actually exist among the users having a direct edge relationship with user v. In addition, betweenness centrality reflects a user's ability to carry information across the whole social network G, and is expressed as:

B^I(u) ＝ Σ_{m≠u≠n} σ_mn(u) / σ_mn

wherein σ_mn is the total number of shortest paths between user m and user n, and σ_mn(u) is the number of those shortest paths that pass through user u. B^I(u) measures how likely a shortest path between any two users in the social network is to pass through user u.

The users are clustered with the K-means algorithm improved with K-means++. From the resulting clusters, the cluster that satisfies the following conditions is selected: its center has the largest in-degree, the largest in-degree group clustering coefficient, and the largest betweenness centrality (the three maximum conditions). If no cluster satisfies all three maximum conditions, a cluster that satisfies two of them may be selected. The users in the selected cluster are added to the opinion leader candidate user set L.
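For readers unfamiliar with K-means++ seeding, the following is a minimal pure-Python sketch of K-means with that initialization. It is not the patent's implementation: the function names are hypothetical, and the feature vectors are made-up, already normalized triples (in-degree, in-degree group clustering coefficient, betweenness centrality) rather than data from the example.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_init(points, k, rng):
    """K-means++ seeding: each new center is drawn with probability
    proportional to its squared distance from the nearest chosen center."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:  # all points coincide with existing centers
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers

def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: mean of each cluster (keep the old center if empty).
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Made-up normalized feature vectors for five users.
POINTS = [(0.0, 0.0, 0.1), (0.1, 0.0, 0.0), (0.9, 1.0, 1.0), (1.0, 0.9, 1.0), (1.0, 1.0, 0.9)]
labels, centers = kmeans(POINTS, k=2, seed=1)
print(labels)
```

The returned centers are what the three maximum conditions would be checked against when choosing the candidate cluster.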

(2) Calculation of user activity

A user in a social network does not merely receive information; the user's activity can be calculated from the number of messages the user publishes, forwards, and comments on per unit time. The frequency with which user u publishes information, FP(u), is the total number of messages (the size of the set of messages) that user u published within ΔT_p, divided by ΔT_p, where

ΔT_p＝T_now-T_firstpublish

wherein T_now is the data acquisition time and T_firstpublish is the time at which user u first published information in the acquired data. The larger FP(u) is, the more actively user u publishes information. The frequency with which user u forwards information, FF(u), is the total number of messages that user u forwarded within ΔT_f, divided by ΔT_f, where

ΔT_f＝T_now-T_firstforward

wherein T_firstforward is the time at which user u first forwarded information in the acquired data. The larger FF(u) is, the more actively user u forwards information. The frequency with which user u comments on information, FE(u), is the total number of comments that user u made within ΔT_e, divided by ΔT_e, where

ΔT_e＝T_now-T_{firstevaluate}

wherein T_{firstevaluate} is the time at which user u first commented on information in the acquired data. The larger FE(u) is, the more actively user u comments on others' information.

The frequencies with which users publish, forward, and comment on information are min-max normalized:

FP'(u) ＝ (FP(u) - FP_min) / (FP_max - FP_min)

FF'(u) ＝ (FF(u) - FF_min) / (FF_max - FF_min)

FE'(u) ＝ (FE(u) - FE_min) / (FE_max - FE_min)

wherein FP_min, FF_min, and FE_min are respectively the minimum publishing, forwarding, and commenting frequencies among the users in the opinion leader candidate user set L, and FP_max, FF_max, and FE_max are respectively the maximum publishing, forwarding, and commenting frequencies among the users in the opinion leader candidate user set L.

The user activity of each user in the opinion leader candidate user set is evaluated from the frequencies with which the user publishes, forwards, and comments on information, and is defined as:

UA(u)＝α₁FP'(u)+α₂FF'(u)+α₃FE'(u)

wherein α₁+α₂+α₃＝1, and the values of α₁, α₂, and α₃ are assigned using the analytic hierarchy process (AHP).
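As an illustration of how AHP turns pairwise judgments into weights, the sketch below uses the row geometric-mean method, one common way to approximate the AHP priority vector. The judgment matrix is not preserved in the text, so the matrix here is purely an assumption: a hypothetical set of judgments (publishing is 3 times as important as forwarding and 5 times as important as commenting; forwarding is 3 times as important as commenting) that happens to yield weights close to those quoted in the worked example.

```python
import math

def ahp_weights(matrix):
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]

# Hypothetical pairwise judgment matrix (assumption, not from the patent text).
# Rows and columns: publishing, forwarding, commenting.
JUDGMENT = [
    [1.0, 3.0, 5.0],
    [1.0 / 3.0, 1.0, 3.0],
    [1.0 / 5.0, 1.0 / 3.0, 1.0],
]

weights = ahp_weights(JUDGMENT)
print([round(w, 6) for w in weights])
```

A full AHP workflow would also check the consistency ratio of the judgment matrix before accepting the weights; that step is omitted here.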

(3) Calculation of user influence

1) Calculation of a user's attention to a followee

A user may have several followees but does not necessarily forward or comment on the information of each of them. The attention of user v to user u is defined as:

AD_vu ＝ [(|for(v.source＝u)| + |eva(v.source＝u)|) / (|for(v)| + |eva(v)|)] × [(|for(u)| + |pub(u)|) / Σ_{k∈focus(v)} (|for(k)| + |pub(k)|)]

wherein focus(v) is the set of followees of user v, |for(u)| is the total number of messages forwarded by user u, |pub(u)| is the total number of messages published by user u, |for(v)| is the total number of messages of others forwarded by user v, |eva(v)| is the total number of comments user v made on others' information, |for(v.source＝u)| is the number of messages of user u forwarded by user v, and |eva(v.source＝u)| is the number of comments user v made on information of user u. The larger the value of AD_vu, the more attention user v pays to user u.
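The attention score can be sketched as below. The formula image is not preserved in the text, so the product form used here is an assumption: the share of v's forwards and comments that target u, multiplied by u's share of the messages produced by all of v's followees. It was chosen because it reproduces the six attention values used in the worked example. The data dictionaries are transcribed from Tables 1 and 2, and all names are hypothetical.

```python
# (published, forwarded) counts per user, from Table 1.
OUTPUT = {"V1": (1, 3), "V2": (20, 6), "V3": (15, 10), "V4": (3, 5), "V5": (5, 15)}
# FORWARDS[v][u]: messages of u forwarded by v; COMMENTS likewise (Table 2).
FORWARDS = {"V1": {"V2": 1, "V3": 2}, "V2": {"V5": 3, "V4": 2, "V1": 1},
            "V3": {"V2": 4, "V4": 5, "V5": 1}, "V4": {"V5": 3, "V3": 2},
            "V5": {"V3": 10, "V4": 5}}
COMMENTS = {"V1": {"V2": 1, "V3": 2}, "V2": {"V5": 7, "V1": 2, "V3": 1},
            "V3": {"V2": 2, "V4": 3}, "V4": {"V5": 4}, "V5": {"V4": 5}}
# focus(v): followees of v, as listed in the worked example.
FOCUS = {"V1": ["V2", "V3"], "V2": ["V1", "V5"], "V3": ["V2", "V4"],
         "V4": ["V5"], "V5": ["V3", "V4"]}

def attention(v, u):
    """Assumed form of AD_vu: interaction share times followee output share."""
    interactions = FORWARDS[v].get(u, 0) + COMMENTS[v].get(u, 0)
    all_interactions = sum(FORWARDS[v].values()) + sum(COMMENTS[v].values())
    output_share = sum(OUTPUT[u]) / sum(sum(OUTPUT[k]) for k in FOCUS[v])
    return interactions / all_interactions * output_share

for v, u in [("V1", "V2"), ("V3", "V2"), ("V1", "V3"),
             ("V5", "V3"), ("V2", "V5"), ("V4", "V5")]:
    print(v, u, round(attention(v, u), 2))
```

Under this form, a followee who accounts for most of a user's interactions but produces few messages relative to the user's other followees receives a reduced score.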

2) Calculation of user information coverage

For each user in the social network, the user's information coverage reflects the user's influence to some extent, and is defined as:

CR(u) ＝ Σ_{i∈pub(u)∪for(u)} CR_i(u) / (|pub(u)| + |for(u)|)

CR_i(u) ＝ Σ_{v∈V} δ_vi / |V|

wherein the indicator (written here as δ_vi) equal to 1 indicates that user v forwarded or commented on information i of user u, so that information i covers user v; δ_vi equal to 0 indicates that user v did not forward or comment on information i of user u, so that information i does not cover user v; |V| is the total number of users in the social network; and pub(u) ∪ for(u) is the set of information published or forwarded by user u, all of which counts as information of user u.
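A small sketch of the coverage calculation follows, under the assumption that a user's coverage is the average, over all messages the user published or forwarded, of the fraction of network users each message covers. The function name and argument layout are hypothetical; the numbers are taken from Tables 1 and 3 to 5 of the worked example.

```python
def coverage(covered_counts, n_messages, n_users):
    """Average per-message coverage.

    covered_counts: covered-user counts for the messages that cover anyone
    (messages covering nobody contribute zero and need not be listed).
    n_messages: total messages published or forwarded by the user.
    n_users: total number of users in the network.
    """
    return sum(count / n_users for count in covered_counts) / n_messages

# V2 published 20 and forwarded 6 messages; five of them cover other users.
cr_v2 = coverage([3, 2, 4, 2, 1], n_messages=26, n_users=5)
cr_v3 = coverage([3, 1, 4], n_messages=25, n_users=5)
cr_v5 = coverage([2, 2, 4, 3, 1], n_messages=20, n_users=5)
print(round(cr_v2, 2), round(cr_v3, 2), round(cr_v5, 2))
```

Rounded to two decimals, these agree with the coverage values used later in the worked influence calculation.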

3) Calculation of user influence

The user's information coverage, the attention of the user's followers to the user, and the influence of those followers are combined to obtain the user influence, defined as:

UI(u) ＝ CR(u) + Σ_{v∈fans(u)} AD_vu × UI(v)

wherein fans(u) is the set of followers of user u. When a user belongs to the opinion leader candidate user set, the user's initial influence is 1; otherwise the user's initial influence is the ratio of the user's degree to the total number of users in the social network.
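The influence update can be sketched as a single pass over the candidates, which is what the worked example does; whether the method iterates further is not stated, so the single pass is an assumption. The coverage and attention values are those quoted in the example, the degrees are inferred from the follower relationships it describes, and the function names are hypothetical.

```python
def initial_influence(user, candidates, degree, n_users):
    """1 for candidate users, degree / |V| for everyone else."""
    return 1.0 if user in candidates else degree[user] / n_users

def influence(user, coverage, attention, followers, ui):
    """One update: CR(u) plus each follower's influence weighted by attention."""
    return coverage[user] + sum(attention[(v, user)] * ui[v] for v in followers[user])

CANDIDATES = {"V2", "V3", "V5"}
# Total degree (in + out) per user; assumed, inferred from the example's edges.
DEGREE = {"V1": 3, "V2": 4, "V3": 4, "V4": 3, "V5": 4}
COVERAGE = {"V2": 0.09, "V3": 0.06, "V5": 0.12}
# ATTENTION[(v, u)]: attention of follower v to user u, as quoted in the example.
ATTENTION = {("V1", "V2"): 0.17, ("V3", "V2"): 0.31, ("V1", "V3"): 0.33,
             ("V5", "V3"): 0.38, ("V2", "V5"): 0.52, ("V4", "V5"): 0.78}
FOLLOWERS = {"V2": ["V1", "V3"], "V3": ["V1", "V5"], "V5": ["V2", "V4"]}

ui0 = {u: initial_influence(u, CANDIDATES, DEGREE, n_users=5) for u in DEGREE}
ui1 = {u: influence(u, COVERAGE, ATTENTION, FOLLOWERS, ui0) for u in FOLLOWERS}
print({u: round(x, 3) for u, x in ui1.items()})
```

Because each follower's contribution is scaled by that follower's own influence, a user followed by candidate users gains more than one followed only by ordinary users.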

(4) Selection of opinion leaders

The user leadership is obtained by combining the user activity and the user influence, and is defined as:

ULD(u)＝UI(u)×UA(u)

The users in the opinion leader candidate user set L are sorted in descending order of user leadership, and the first K users are selected as opinion leaders.

Example:

The following example, based on the network shown in FIG. 2, illustrates the technique for mining opinion leaders in a social network from a clustering perspective.

1. Selection of opinion leader candidate user set

1) In-degree calculation

In FIG. 2, the in-degree of user V1 is D^I(V1)＝1, of user V2 is D^I(V2)＝2, of user V3 is D^I(V3)＝2, of user V4 is D^I(V4)＝2, and of user V5 is D^I(V5)＝2.

2) Calculation of in-degree group clustering coefficients

For user V1, the user with an edge pointing to V1 is V2, and the users with edges pointing to V2 are V1 and V3, so the in-degree group of user V1 is (V1, V2, V3).

For user V2, the users with edges pointing to V2 are V1 and V3, and V5 has an edge pointing to V3, so the in-degree group of user V2 is (V1, V2, V3, V5).

For user V3, the users with edges pointing to V3 are V1 and V5, while V2 has an edge pointing to V1 and V4 has an edge pointing to V5, so the in-degree group of user V3 is (V1, V2, V3, V4, V5).

For user V4, the users with edges pointing to V4 are V3 and V5, while V1 has an edge pointing to V3 and V2 has an edge pointing to V5, so the in-degree group of user V4 is (V1, V2, V3, V4, V5).

For user V5, the users with edges pointing to V5 are V2 and V4, while V3 has an edge pointing to V4 and V1 has an edge pointing to V2, so the in-degree group of user V5 is (V1, V2, V3, V4, V5).
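The in-degree groups listed above can be reproduced with a short sketch that collects one-hop and two-hop predecessors. Two points are assumptions: the edge list is inferred from the relationships described for FIG. 2, and the user is included in its own group because the listed groups always contain the user itself. The function name is hypothetical, and the clustering coefficient is not computed because its formula is not preserved in the text.

```python
# Directed edges (follower, followee); assumed, inferred from the example.
EDGES = [("V2", "V1"), ("V1", "V2"), ("V3", "V2"), ("V1", "V3"), ("V5", "V3"),
         ("V3", "V4"), ("V5", "V4"), ("V2", "V5"), ("V4", "V5")]

def in_degree_group(edges, user):
    """Users reaching `user` through one or two directed edges, plus `user`."""
    preds = {}
    for src, dst in edges:
        preds.setdefault(dst, set()).add(src)
    one_hop = preds.get(user, set())
    two_hop = set().union(*(preds.get(v, set()) for v in one_hop))
    return {user} | one_hop | two_hop

for u in ["V1", "V2", "V3", "V4", "V5"]:
    print(u, sorted(in_degree_group(EDGES, u)))
```

On larger graphs the predecessor map would be built once and reused rather than rebuilt on every call.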

3) Betweenness centrality calculation

Based on the shortest paths that exist in the figure:

One shortest path passes through V1, namely V2 → V1 → V3, and the total number of shortest paths from V2 to V3 is 2, so the betweenness centrality of V1 is: B^I(V1)＝1/2＝0.5

Five shortest paths pass through V2, namely V1 → V2 → V5, V3 → V2 → V1, V3 → V2 → V5, V4 → V5 → V3 → V2 → V1, and V5 → V3 → V2 → V1, and the total number of shortest paths from V3 to V5 is 2, so the betweenness centrality of V2 is: B^I(V2)＝1+1+1/2+1+1＝4.5

Five shortest paths pass through V3, namely V1 → V3 → V4, V4 → V5 → V3 → V2 → V1, V4 → V5 → V3 → V2, V5 → V3 → V2 → V1, and V5 → V3 → V2, so the betweenness centrality of V3 is: B^I(V3)＝1+1+1+1+1＝5

One shortest path passes through V4, namely V3 → V4 → V5, and the total number of shortest paths from V3 to V5 is 2, so the betweenness centrality of V4 is: B^I(V4)＝1/2＝0.5

Five shortest paths pass through V5, namely V2 → V5 → V3, V2 → V5 → V4, V4 → V5 → V3 → V2 → V1, V4 → V5 → V3 → V2, and V4 → V5 → V3, and the total number of shortest paths from V2 to V3 is 2, so the betweenness centrality of V5 is: B^I(V5)＝1/2+1+1+1+1＝4.5
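The path counting above can be checked mechanically. The sketch below computes unnormalized betweenness on a directed graph by counting shortest paths with breadth-first search; it is a straightforward illustration rather than an optimized algorithm such as Brandes'. The edge list is an assumption inferred from the relationships described for FIG. 2, and the function names are hypothetical.

```python
from collections import deque

# Directed edges (follower, followee); assumed, inferred from the example.
EDGES = [("V2", "V1"), ("V1", "V2"), ("V3", "V2"), ("V1", "V3"), ("V5", "V3"),
         ("V3", "V4"), ("V5", "V4"), ("V2", "V5"), ("V4", "V5")]

def bfs_counts(adj, source):
    """Return (dist, sigma): distances and shortest-path counts from source."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj.get(x, ()):
            if y not in dist:
                dist[y] = dist[x] + 1
                sigma[y] = 0
                queue.append(y)
            if dist[y] == dist[x] + 1:
                sigma[y] += sigma[x]
    return dist, sigma

def betweenness(edges):
    nodes = sorted({u for e in edges for u in e})
    adj = {}
    for s, t in edges:
        adj.setdefault(s, []).append(t)
    info = {s: bfs_counts(adj, s) for s in nodes}
    result = {u: 0.0 for u in nodes}
    for m in nodes:
        dist_m, sigma_m = info[m]
        for n in nodes:
            if n == m or n not in dist_m:
                continue
            for u in nodes:
                if u in (m, n):
                    continue
                dist_u, sigma_u = info[u]
                # u is on a shortest m->n path iff d(m,u) + d(u,n) == d(m,n).
                if u in dist_m and n in dist_u and dist_m[u] + dist_u[n] == dist_m[n]:
                    result[u] += sigma_m[u] * sigma_u[n] / sigma_m[n]
    return result

print(betweenness(EDGES))
```

For this five-node graph the triple loop is fine; for large networks a linear-time accumulation per source would be preferable.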

For each user u, the topological attribute values D^I(u), C^I(u), and B^I(u) are min-max normalized and then clustered using the K-means algorithm improved with K-means++. In this example the users are divided into two clusters, giving cluster 1 (V1, V4) and cluster 2 (V2, V3, V5). Because the center of cluster 2 is larger than the center of cluster 1 in both in-degree and betweenness centrality, cluster 2 is selected as the opinion leader candidate user set, and L ＝ (V2, V3, V5).

2. User leadership calculation

1) Calculation of user activity

a) Calculation of the frequencies of publishing, forwarding, and commenting

As can be seen from Table 1:

For user V2, T_now＝2017-4-22. The user published 20 messages with T_firstpublish＝2017-4-1, giving ΔT_p＝21; forwarded 6 messages with T_firstforward＝2017-4-2, giving ΔT_f＝20; and made 10 comments with T_{firstevaluate}＝2017-4-3, giving ΔT_e＝19.

For user V3, T_now＝2017-4-22. The user published 15 messages with T_firstpublish＝2017-3-20, giving ΔT_p＝33; forwarded 10 messages with T_firstforward＝2017-4-5, giving ΔT_f＝17; and made 5 comments with T_{firstevaluate}＝2017-4-15, giving ΔT_e＝7.

For user V5, T_now＝2017-4-22. The user published 5 messages with T_firstpublish＝2017-3-1, giving ΔT_p＝52; forwarded 15 messages with T_firstforward＝2017-4-10, giving ΔT_f＝12; and made 5 comments with T_{firstevaluate}＝2017-4-1, giving ΔT_e＝21.

TABLE 1 User-related information

User	V1	V2	V3	V4	V5
Number of messages published	1	20	15	3	5
Time of first published message	2017-3-1	2017-4-1	2017-3-20	2017-2-18	2017-3-1
Number of messages forwarded	3	6	10	5	15
Time of first forwarded message	2017-3-9	2017-4-2	2017-4-5	2017-4-7	2017-4-10
Number of comments	3	10	5	4	5
Time of first comment	2017-3-3	2017-4-3	2017-4-15	2017-1-1	2017-4-1
Data acquisition time	2017-4-22	2017-4-22	2017-4-22	2017-4-22	2017-4-22

b) Normalization of the publishing, forwarding, and commenting frequencies

The publishing, forwarding, and commenting frequencies are min-max normalized to obtain the following table:

User	FP'	FF'	FE'
V2	1	0	0.62
V3	0.41	0.31	1
V5	0	1	0

c) User activity calculation

The analytic hierarchy process is used to determine the weights of the publishing, forwarding, and commenting frequencies in user activity. A judgment matrix is constructed, from which the weights α₁＝0.636985, α₂＝0.2582850, α₃＝0.1047294 are obtained, giving:

UA(V2)＝0.636985×1+0.2582850×0+0.1047294×0.62＝0.70

UA(V3)＝0.636985×0.41+0.2582850×0.31+0.1047294×1＝0.45

UA(V5)＝0.636985×0+0.2582850×1+0.1047294×0＝0.26
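The whole activity calculation for the three candidates can be reproduced from Table 1 with the sketch below. The function names are hypothetical, and the weights are the ones quoted above. The code keeps unrounded intermediate frequencies, so individual normalized values can differ slightly in the second decimal from those in the text, which appear to have been computed from rounded frequencies.

```python
from datetime import date

T_NOW = date(2017, 4, 22)
# (count, first date) for publishing, forwarding, commenting; from Table 1.
RECORDS = {
    "V2": [(20, date(2017, 4, 1)), (6, date(2017, 4, 2)), (10, date(2017, 4, 3))],
    "V3": [(15, date(2017, 3, 20)), (10, date(2017, 4, 5)), (5, date(2017, 4, 15))],
    "V5": [(5, date(2017, 3, 1)), (15, date(2017, 4, 10)), (5, date(2017, 4, 1))],
}
WEIGHTS = (0.636985, 0.2582850, 0.1047294)

def frequencies(records):
    """Messages per day since the first publish / forward / comment."""
    return [count / (T_NOW - first).days for count, first in records]

def min_max(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

def user_activity(all_records, weights):
    users = list(all_records)
    freq = {u: frequencies(all_records[u]) for u in users}
    normalized = {u: [] for u in users}
    for j in range(3):  # normalize each frequency across the candidates
        column = min_max([freq[u][j] for u in users])
        for u, value in zip(users, column):
            normalized[u].append(value)
    return {u: sum(w * x for w, x in zip(weights, normalized[u])) for u in users}

ua = user_activity(RECORDS, WEIGHTS)
print({u: round(x, 2) for u, x in ua.items()})
```

The normalization runs over the candidate set only, as the description specifies, so adding or removing a candidate changes every normalized value.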

2) Calculation of user influence

The following results are obtained by analyzing the contents of Table 2:

TABLE 2 Sources of the information each user forwarded and commented on

User	Source users of forwarded information	Source users of commented information
V1	V2:1, V3:2	V2:1, V3:2
V2	V5:3, V4:2, V1:1	V5:7, V1:2, V3:1
V3	V2:4, V4:5, V5:1	V2:2, V4:3
V4	V5:3, V3:2	V5:4
V5	V3:10, V4:5	V4:5

Note: V:n indicates that n messages originate from user V.

a) User attention

For user V2, the followers are user V1 and user V3.

The followees of V1 are user V2 and user V3, so the attention of user V1 to user V2 is: AD_{V1V2}＝(1+1)/(3+3)×26/(26+25)≈0.17

The followees of V3 are user V2 and user V4, so the attention of user V3 to user V2 is: AD_{V3V2}＝(4+2)/(10+5)×26/(26+8)≈0.31

For user V3, the followers are user V1 and user V5.

The followees of V1 are user V2 and user V3, so the attention of user V1 to user V3 is: AD_{V1V3}＝(2+2)/(3+3)×25/(26+25)≈0.33

The followees of V5 are user V3 and user V4, so the attention of user V5 to user V3 is: AD_{V5V3}＝(10+0)/(15+5)×25/(25+8)≈0.38

For user V5, the followers are user V2 and user V4.

The followees of V2 are user V1 and user V5, so the attention of user V2 to user V5 is: AD_{V2V5}＝(3+7)/(6+10)×20/(4+20)≈0.52

The followee of V4 is user V5, so the attention of user V4 to user V5 is: AD_{V4V5}＝(3+4)/(5+4)×20/20≈0.78

b) Calculation of user information coverage

The analysis is based on the contents of Tables 3, 4, and 5.

TABLE 3 Information coverage of user V2

Message number of user V2	Number of users covered
2	3
4	2
15	4
20	2
26	1

Note: all other messages cover 0 users.

TABLE 4 Information coverage of user V3

Message number of user V3	Number of users covered
6	3
9	1
25	4

Note: all other messages cover 0 users.

TABLE 5 Information coverage of user V5

Message number of user V5	Number of users covered
1	2
8	2
13	4
19	3
20	1

Note: all other messages cover 0 users.

the information coverage of user V2 is calculated as:

the information coverage of user V3 is calculated as:

the information coverage of user V5 is calculated as:
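The coverage calculation can be sketched in Python under stated assumptions. The covered-people counts are taken from Tables 3 to 5. Everything else is an assumption made for illustration: the coverage of one piece of information is taken as the covered fraction of the |V| = 5 users, the coverage of a user is taken as the average over all of that user's published or forwarded information, and the per-user totals (26, 25 and 20) are not stated in this section; they are chosen because they match the highest information numbers in the tables and reproduce the values CR(V2) = 0.09, CR(V3) = 0.06 and CR(V5) = 0.12 used in the influence calculation. The names `COVERED`, `TOTAL_MESSAGES` and `information_coverage` are hypothetical.

```python
# Number of covered people per information number, from Tables 3-5.
COVERED = {
    "V2": {2: 3, 4: 2, 15: 4, 20: 2, 26: 1},
    "V3": {6: 3, 9: 1, 25: 4},
    "V5": {1: 2, 8: 2, 13: 4, 19: 3, 20: 1},
}

# ASSUMPTION: total pieces of information published or forwarded by each user.
# These totals are not given in this section; they are illustrative values.
TOTAL_MESSAGES = {"V2": 26, "V3": 25, "V5": 20}

# ASSUMPTION: the example network consists of the five users V1..V5.
NUM_USERS = 5


def information_coverage(user):
    """Average, over all of the user's information, of the covered fraction of users."""
    # Information not listed in the tables covers nobody and contributes 0.
    per_message = [covered / NUM_USERS for covered in COVERED[user].values()]
    return sum(per_message) / TOTAL_MESSAGES[user]


coverage = {user: round(information_coverage(user), 2) for user in COVERED}
print(coverage)
```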

c) Calculation of user influence

The user influence is calculated by combining the user attention degree and the user information coverage.

In the user set V, users V1 and V4 are not in L, and the calculation below uses UI(V1)＝UI(V4)＝0.6 for them.

V2, V3, V5 ∈ L, so initially UI(V2)＝UI(V3)＝UI(V5)＝1.

The followers of user V2 are V1 and V3, so

UI(V2)＝CR(V2)+AD_V1(V2)×UI(V1)+AD_V3(V2)×UI(V3)＝0.09+0.17×0.6+0.31×1＝0.502

The followers of user V3 are V1 and V5, so

UI(V3)＝CR(V3)+AD_V1(V3)×UI(V1)+AD_V5(V3)×UI(V5)＝0.06+0.33×0.6+0.38×1＝0.638

The followers of user V5 are V2 and V4, so

UI(V5)＝CR(V5)+AD_V2(V5)×UI(V2)+AD_V4(V5)×UI(V4)＝0.12+0.52×1+0.78×0.6＝1.108
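The three influence values can be checked with a short Python sketch. It performs a single update from the initial influence values, as the worked example does, and is not meant as a full implementation of the method. The dictionary and function names are hypothetical; the numbers (coverage CR, attention degrees AD and initial influence values) are the ones that appear in the calculation above.

```python
# Information coverage of each candidate user, as used in the worked example.
CR = {"V2": 0.09, "V3": 0.06, "V5": 0.12}

# AD[(v, u)]: degree of attention of follower v to user u.
AD = {
    ("V1", "V2"): 0.17, ("V3", "V2"): 0.31,
    ("V1", "V3"): 0.33, ("V5", "V3"): 0.38,
    ("V2", "V5"): 0.52, ("V4", "V5"): 0.78,
}

# Initial influence values used in the worked example.
UI_INIT = {"V1": 0.6, "V2": 1.0, "V3": 1.0, "V4": 0.6, "V5": 1.0}


def user_influence(u):
    """One update step: UI(u) = CR(u) + sum over followers v of AD_v(u) * UI(v)."""
    from_followers = sum(
        attention * UI_INIT[v]
        for (v, target), attention in AD.items()
        if target == u
    )
    return CR[u] + from_followers


influence = {u: round(user_influence(u), 3) for u in CR}
print(influence)
```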

3. Calculation of user leadership

The user leadership is calculated by combining the user activity and the user influence:

ULD(V2)＝UI(V2)×UA(V2)＝0.502×0.70＝0.3514

ULD(V3)＝UI(V3)×UA(V3)＝0.638×0.45＝0.2871

ULD(V5)＝UI(V5)×UA(V5)＝1.108×0.26＝0.28808

Sorting the leadership of the users in L in descending order gives:

ULD(V2)＞ULD(V5)＞ULD(V3)

If one opinion leader needs to be mined, user V2 is the opinion leader; if two opinion leaders need to be mined, user V2 and user V5 are the opinion leaders.
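The ranking step can be sketched in Python as follows. The helper name `top_opinion_leaders` is hypothetical; the activity and influence values are those of the worked example, and the product ULD(u) = UI(u) × UA(u) follows the leadership formula used above.

```python
# User activity and user influence of the candidate users in L (worked example).
UA = {"V2": 0.70, "V3": 0.45, "V5": 0.26}
UI = {"V2": 0.502, "V3": 0.638, "V5": 1.108}


def top_opinion_leaders(k):
    """Return the k candidates with the highest leadership ULD(u) = UI(u) * UA(u)."""
    uld = {u: UI[u] * UA[u] for u in UI}
    # Sort candidate users by leadership, highest first.
    ranked = sorted(uld, key=uld.get, reverse=True)
    return ranked[:k]


print(top_opinion_leaders(1))
print(top_opinion_leaders(2))
```

With these values the ranking is V2, V5, V3, so requesting one leader yields V2 and requesting two yields V2 and V5.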

In summary, the present invention provides a technique for mining opinion leaders in a social network from a clustering perspective, based on the topological attributes of the social network and making full use of the attributes of its users. Existing opinion leader mining techniques rarely take into account that only some of the users in a social network meet the conditions for becoming an opinion leader. To address this problem, the invention clusters users according to their topological attributes in the social network, screens out the set of opinion leader candidate users who meet those conditions, analyzes the leadership of the users in that set, and thereby mines opinion leaders who are both active and influential.

Claims

1. A clustering-based social network opinion leader mining method for obtaining opinion leaders among social network users, characterized by comprising the following steps:

3) calculating the user activity and the user influence of the users in the opinion leader candidate user set L, and calculating the user leadership according to the user activity and the user influence, wherein the calculation formula of the user activity is as follows:

UA(u)＝α₁FP'(u)+α₂FF'(u)+α₃FE'(u)

α₁+α₂+α₃＝1

ΔT_p＝T_now-T_firstpublish

ΔT_f＝T_now-T_firstforward

ΔT_e＝T_now-T_firstevaluate

wherein the formula uses the total number of pieces of information published by user u within the time ΔT_p; ΔT_f is the interval between the time T_now at which the data is acquired and the time T_firstforward at which user u first forwarded information; the formula likewise uses the total number of pieces of information forwarded by user u within the time ΔT_e; and α₁, α₂, α₃ are weights assigned through the analytic hierarchy process;

2. The method as claimed in claim 1, wherein the in-degree of the user in step 1) is calculated by the following formula:

3. The method as claimed in claim 1, wherein the clustering coefficient of the in-degree group in step 1) is calculated by the following formula:

4. The method as claimed in claim 1, wherein the betweenness centrality of the user in step 1) is calculated by the following formula:

wherein B^I(u) is the betweenness centrality of user u, σ_mn(u) is the number of shortest paths between user m and user n that pass through user u, σ_mn is the total number of shortest paths between user m and user n, and V is the set of users in the social network.

5. The method as claimed in claim 1, wherein in step 2), the users in the cluster whose cluster center has the largest in-degree, the largest clustering coefficient of the in-degree group and the largest betweenness centrality are selected to join the opinion leader candidate user set L.

6. The method as claimed in claim 5, wherein when there is no cluster whose cluster center satisfies all three conditions of largest in-degree, largest clustering coefficient of the in-degree group and largest betweenness centrality, a cluster whose cluster center satisfies any two of the conditions at the same time is selected, and the users in that cluster are added to the opinion leader candidate user set L.

7. The method as claimed in claim 1, wherein the user influence in step 3) is calculated by the following formula:

wherein UI(u) is the user influence of user u; UI(v) is the user influence of user v; AD_vu is the degree of attention of user v to user u; fans(u) is the set of followers of user u; |V| is the total number of users in the social network; d(v) is the total degree of user v; CR(u) is the information coverage of user u; CR_i(u) is the coverage of information i; pub(u) ∪ for(u) is the set of information published or forwarded by user u; |for(u)| is the total number of pieces of information forwarded by user u; |pub(u)| is the total number of pieces of information published by user u; the coverage indicator subscripted vi is defined so that it equals 1 when user v forwards or comments on information i of user u, in which case information i covers user v, and equals 0 when user v neither forwards nor comments on information i of user u, in which case information i does not cover user v; |for(v)| is the total number of pieces of other users' information forwarded by user v; |eva(v)| is the total number of comments made by user v on other users' information; |for(v.source＝u)| is the total number of pieces of user u's information forwarded by user v; |eva(v.source＝u)| is the total number of comments made by user v on user u's information; |for(k)| is the total number of pieces of other users' information forwarded by user k; |pub(k)| is the total number of pieces of information published by user k; and focus(v) is the set of users followed by user v.

8. The method as claimed in claim 7, wherein the user leadership is calculated in step 3) according to the following formula:

ULD(u)＝UI(u)×UA(u)

wherein ULD (u) is the user leadership of user u.

9. The clustering-based social network opinion leader mining method according to claim 6, wherein the step 4) specifically comprises the steps of: