CN110347933B - Ego network social circle recognition method - Google Patents


Info

Publication number
CN110347933B
Authority
CN
China
Prior art keywords
similarity
social circle
edge
social
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910507062.4A
Other languages
Chinese (zh)
Other versions
CN110347933A (en)
Inventor
王晨旭
郝崇孝
管晓宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University
Priority to CN201910507062.4A
Publication of CN110347933A
Application granted
Publication of CN110347933B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An ego network social circle recognition method abstracts each user as a node in the ego network and each relationship between users as an edge in the ego network, thereby constructing the ego network; combines the ego network structure information and the node attribute information to model the similarity between edges; initializes each edge as a social circle and clusters the social circles with an average-linkage hierarchical clustering method according to the similarity between social circles, obtaining the dendrogram of the hierarchical clustering; selects a suitable threshold to cut the dendrogram, obtaining the corresponding edge social circles, and converts the edge social circles into node social circles; and extracts the attribute features and structural features of the social circles and then trains a classifier to identify the type of each social circle.

Description

Ego network social circle recognition method
Technical Field
The invention belongs to the field of community division and relates to an ego network social circle identification method.
Background
Social circle identification helps users group their friends into social circles so that the information they receive can be managed effectively, and it has important applications in fields such as link prediction and friend recommendation. Classical social circle recognition methods mainly use two types of information: network structure information and user attribute information. Existing methods fall into two categories: node-based methods and edge-based methods. Node-based methods partition the nodes of the ego network into different circles, while edge-based methods partition the edges of the ego network into different circles.
The McAuley method (J. Leskovec, J. J. McAuley, Learning to discover social circles in ego networks, in: Advances in Neural Information Processing Systems, 2012, pp. 539-547) is a social circle recognition method based on network structure and user attribute information. For each social circle, the algorithm learns its members based on the similarity of the user attribute information. It models node membership in circles so that overlapping and hierarchically nested circles can be identified. However, this method cannot make effective use of the structural information, which leaves considerable room for improvement in recognition accuracy.
The Wang method (M. Wang, W. Zuo, Y. Wang, An improved density peaks-based clustering method for social circle discovery in social networks, Neurocomputing 179 (2016) 219-227) proposes an improved clustering method based on density peaks and applies it to identifying overlapping social circles from user attribute information and network topology. First, the method measures the local density of each node in the ego network using a Gaussian kernel function. Second, it calculates the distance from each node to a node of higher density. The circle centers are then selected according to a decision graph, in which the x-axis represents the local density and the y-axis represents the distance to the higher-density node. The method aggregates social circles recursively, in each iteration assigning each remaining user to the circle that contains its nearest neighbor of higher density. However, the performance of this approach depends to a large extent on the choice of the initial cluster centers.
The Ahn method (Y.-Y. Ahn, J. P. Bagrow, S. Lehmann, Link communities reveal multiscale complexity in networks, Nature 466 (7307) (2010) 761) proposes a hierarchical edge clustering method that builds a dendrogram based on edge similarity, together with a density-based method for determining the appropriate cut threshold of the dendrogram. This approach is superior to node-based approaches in revealing the overlapping structure of social circles. However, it ignores the attribute information of the nodes, which reduces the accuracy of the identified circles.
Disclosure of Invention
To overcome the problems in the prior art, the invention aims to provide an ego network social circle identification method.
In order to achieve the purpose, the technical scheme of the invention is as follows:
an ego network social circle identification method, comprising the following steps:
step 1: building the ego network of a user: each user is abstracted as a node in the ego network, and each relationship between users is abstracted as an edge in the ego network;
step 2: combining the ego network structure information and the node attribute information to model the similarity between edges;
step 3: initializing each edge as a social circle, and then clustering with a hierarchical clustering method according to the similarity between social circles to obtain a dendrogram;
step 4: selecting a threshold to cut the dendrogram, obtaining edge social circles, and converting the edge social circles into node social circles;
step 5: extracting the attribute features and structural features of the node social circles, and then training a classifier to identify the category of each social circle.
A further improvement of the invention is that the ego network in step 1) is defined as follows:
in the ego network, the nodes consist of a central node (the ego) and its neighbor nodes, and the edges comprise the edges between the ego and its neighbor nodes and the edges between neighbor nodes.
A further improvement of the invention is that in step 2) the similarity between edges is measured by a similarity S, defined as follows:
$$S = a \times S_s + (1 - a) \times S_A$$
where $S_s$ is the structural similarity, $S_A$ is the attribute similarity, and $a$ is the trade-off coefficient between the structural similarity and the attribute similarity; the structural similarity $S_s$ of two edges $e_{ik}$ and $e_{jk}$ is defined as follows:
$$S_s(e_{ik}, e_{jk}) = \frac{|n_+(i) \cap n_+(j)|}{|n_+(i) \cup n_+(j)|}$$
the attribute similarity $S_A$ of two edges $e_{ik}$ and $e_{jk}$ is defined as follows:
$$S_A(e_{ik}, e_{jk}) = \frac{|A(e_{ik}) \cap A(e_{jk})|}{|A(e_{ik}) \cup A(e_{jk})|}, \qquad A(e_{ij}) = A(i) \cup A(j)$$
where $A(i)$ is the attribute set of node $i$ and $A(j)$ is the attribute set of node $j$.
A further improvement of the invention is that the specific process of step 3) is as follows: initially, each edge is regarded as a separate edge social circle; then, at each merging step, the two social circles with the greatest similarity are merged; the merging process is repeated until all edges have been merged into a single edge social circle, which yields the dendrogram corresponding to the hierarchical clustering algorithm.
A further improvement of the invention is that in step 4) the threshold for cutting the dendrogram is selected either by a method based on partition density or by a method based on the similarity distribution.
A further improvement of the invention is that the specific process of selecting the threshold for cutting the dendrogram with the partition-density-based method is as follows: the partition density of the edge social circle partition at each level of the dendrogram is calculated, the edge social circles corresponding to the maximum partition density are selected as the final result, and the edge social circles are converted into node social circles; the partition density D is defined as follows:
$$D = \frac{1}{M} \sum_{c} M_c D_c$$
where $M$ is the total number of edges, $M_c$ is the number of edges in social circle $c$, and $D_c$ is the edge density of social circle $c$, defined as follows:
$$D_c = \frac{M_c - (n_c - 1)}{n_c (n_c - 1)/2 - (n_c - 1)}$$
where $n_c$ is the number of nodes in social circle $c$.
A further improvement of the invention is that the specific process of selecting the threshold for cutting the dendrogram with the similarity-distribution-based method is as follows: first, the similarity range is divided into N intervals of equal length and the number of similarity values falling in each interval is counted; then, with the similarity value on the horizontal x-axis and the number of similarity values on the vertical y-axis, the empirical distribution of the similarity is fitted with a polynomial:
$$f_n(x) = \sum_{k=0}^{n} a_k x^k$$
where $n \ge 3$ and $a_0, a_1, \dots, a_n$ are the parameters to be estimated, the centers of the intervals are $x_1, x_2, \dots, x_N$, and the counts in the intervals are $y_1, y_2, \dots, y_N$ respectively; for each n, the polynomial is fitted by minimizing the sum of squared residuals between the true and predicted values:
$$\min_{a_0, \dots, a_n} \sum_{i=1}^{N} \left( y_i - f_n(x_i) \right)^2$$
then, the fitted polynomials are evaluated by the mean square error (MSE), and the order-n polynomial with the smallest MSE is selected as the fitted curve; the mean square error MSE is defined as follows:
$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2$$
where $\hat{y}_i$ is the predicted value of the i-th example and $y_i$ is the corresponding true value; finally, the inflection point of the fitted curve $f_n(x)$ is calculated by solving the following equation:
$$\frac{d^2 f_n(x)}{dx^2} = 0$$
the threshold is set to the minimum x that satisfies the above equation.
A further improvement of the invention is that the specific process of step 5) is as follows: first, the category $l_c$ of each social circle in the Facebook and Google+ datasets is labeled manually; then, for each social circle, its network structure features and attribute features are extracted to form a feature vector
$$\mathbf{x} = (x_1, x_2, \dots, x_p)$$
where p is the number of features and $x_i$ is the value of the i-th feature; each training instance consists of a feature vector $\mathbf{x}$ and the corresponding circle category $l_c$; the social circles obtained in step 4) are then classified with a classification method.
Compared with the prior art, the invention has the following beneficial effects: compared with the McAuley method, the Wang method and the Ahn method, the invention provides a method for calculating the similarity between edges that share a common node, combining structural similarity and attribute similarity, so that more accurate social circles are obtained; the invention trains a classifier based on structural features and attribute features to identify the categories of the obtained social circles, which helps users manage their social circles more effectively.
Furthermore, the invention selects the threshold for cutting the dendrogram either by the partition-density-based method or by the similarity-distribution-based method; both methods naturally reveal the hierarchical structure of the social circles and give users a practical way to aggregate social circles at different levels.
Drawings
Fig. 1 is an example of structural similarity.
Fig. 2 is an example of partition density.
Fig. 3 is an example of a similarity distribution and its fitted curve.
Fig. 4(a) is an example of a social circle corresponding to a threshold method based on partition density.
Fig. 4(b) is an example of a social circle corresponding to a threshold method based on similarity distribution.
Fig. 5 shows the classification accuracy of four different classification methods.
Fig. 6(a) compares the method of the invention with the McAuley, Wang and Ahn methods on the Facebook, Twitter and Google+ datasets using the NMI evaluation metric.
Fig. 6(b) compares the method of the invention with the McAuley, Wang and Ahn methods on the Facebook, Twitter and Google+ datasets using the average F1-score evaluation metric.
Fig. 6(c) compares the method of the invention with the McAuley, Wang and Ahn methods on the Facebook, Twitter and Google+ datasets using the EQ evaluation metric.
Fig. 6(d) compares the method of the invention with the McAuley, Wang and Ahn methods on the Facebook, Twitter and Google+ datasets using the Omega Index evaluation metric.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings.
The invention provides an ego network social circle recognition method that first constructs the ego network of a user, then models the similarity of user relationships by combining structure and attribute information, then clusters the initialized social circles with a clustering algorithm, and finally extracts features of the social circles to classify them, thereby recognizing the social circles. The method specifically comprises the following steps:
1) Building the ego network of a user: each user is abstracted as a node in the ego network, and each relationship between users is abstracted as an edge in the ego network;
the ego network is defined as follows:
in the ego network, the nodes consist of a central node (ego) and its neighbor nodes (peers), and the edges comprise the edges between the ego and its neighbor nodes and the edges between neighbor nodes.
2) Combining ego network structure information and node attribute information to model similarities between edges;
in real life, the nodes of the ego network correspond to users and the edges correspond to relationships between users. The users may have relations of classmates, colleagues, family members, etc., and the same relations constitute a social circle. The degree of similarity between these relationships can be used to divide the circles. Because the similarity between relationships with common users is higher, the similarity between relationships is modeled by the similarity S between edges with common nodes, and the similarity S is defined as follows:
$$S = a \times S_s + (1 - a) \times S_A$$
where $S_s$ is the structural similarity, $S_A$ is the attribute similarity, and $a$ is the trade-off coefficient between the structural similarity and the attribute similarity. The structural similarity $S_s$ of two edges $e_{ik}$ and $e_{jk}$ is defined as follows:
$$S_s(e_{ik}, e_{jk}) = \frac{|n_+(i) \cap n_+(j)|}{|n_+(i) \cup n_+(j)|}$$
The attribute similarity $S_A$ of two edges $e_{ik}$ and $e_{jk}$ is defined as follows:
$$S_A(e_{ik}, e_{jk}) = \frac{|A(e_{ik}) \cap A(e_{jk})|}{|A(e_{ik}) \cup A(e_{jk})|}, \qquad A(e_{ij}) = A(i) \cup A(j)$$
where $A(i)$ is the attribute set of node $i$ and $A(j)$ is the attribute set of node $j$.
Fig. 1 shows an example of calculating structural similarity. For edges $e_{(4,2)}$ and $e_{(3,2)}$: $|n_+(4) \cup n_+(3)| = 7$ and $|n_+(4) \cap n_+(3)| = 2$, so $S_s = 2/7$.
Table 1 shows the attribute sets of nodes 2, 3 and 4. The attribute similarity of edges $e_{(4,2)}$ and $e_{(3,2)}$ is: $S_A = |A(2,4) \cap A(2,3)| / |A(2,4) \cup A(2,3)| = 0.8$.
TABLE 1 Attribute set for nodes 2,3,4
Figure BDA0002092179400000072
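The edge similarity of step 2) can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: it assumes that $n_+(i)$ means node $i$ together with its neighbors (as in the Ahn method), and the names `adj`, `attrs` and `edge_similarity`, as well as the toy graph and attribute values, are hypothetical.

```python
def inclusive_neighbors(adj, node):
    """Assumed meaning of n_+(node): the node itself plus its neighbors."""
    return adj[node] | {node}


def jaccard(a, b):
    """Jaccard index of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def edge_similarity(adj, attrs, i, j, k, a=0.5):
    """Similarity S = a*S_s + (1-a)*S_A of edges e_ik and e_jk sharing node k."""
    s_struct = jaccard(inclusive_neighbors(adj, i), inclusive_neighbors(adj, j))
    # A(e_ik) = A(i) | A(k) and A(e_jk) = A(j) | A(k)
    s_attr = jaccard(attrs[i] | attrs[k], attrs[j] | attrs[k])
    return a * s_struct + (1 - a) * s_attr


# Hypothetical toy ego network (not the graph of Fig. 1) and attribute sets.
adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2}, 4: {2}}
attrs = {1: {"x"}, 2: {"a", "b"}, 3: {"a", "c"}, 4: {"b", "d"}}

# Edges e(4,2) and e(3,2) share node 2: S_s = 1/4, S_A = 2/4.
s = edge_similarity(adj, attrs, 4, 3, 2, a=0.5)
```

Because both terms are Jaccard indices and `a` lies in [0, 1], the resulting similarity always falls in [0, 1], which the later threshold selection can rely on.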
3) Initializing each edge as a social circle, and clustering with an average-linkage hierarchical clustering method according to the similarity between social circles to obtain the dendrogram corresponding to the hierarchical clustering algorithm;
initially, each edge is regarded as a separate edge social circle; then, at each merging step, the two social circles with the greatest similarity are merged; the merging process is repeated until all edges have been merged into a single edge social circle, which yields the dendrogram corresponding to the hierarchical clustering algorithm.
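The merging procedure just described can be sketched as a naive average-linkage clustering over edges. This is only an illustrative sketch: the function names are hypothetical, the pairwise similarity `sim` is supplied by the caller, and the toy similarity used in the example (1 if two edges share a node, 0 otherwise) stands in for the similarity S of step 2). The helpers `cut` and `to_node_circles` show, under the same assumptions, how a dendrogram could be cut at a threshold and how edge circles map to node circles.

```python
from itertools import combinations


def average_linkage(edges, sim):
    """Merge edge circles greedily; return the merge list (the dendrogram)."""
    clusters = {idx: [e] for idx, e in enumerate(edges)}
    merges, next_id = [], len(edges)
    while len(clusters) > 1:
        best = None
        for (ida, ca), (idb, cb) in combinations(clusters.items(), 2):
            # Average linkage: mean pairwise similarity between two circles.
            avg = sum(sim(x, y) for x in ca for y in cb) / (len(ca) * len(cb))
            if best is None or avg > best[0]:
                best = (avg, ida, idb)
        avg, ida, idb = best
        merges.append((avg, clusters[ida], clusters[idb]))
        clusters[next_id] = clusters.pop(ida) + clusters.pop(idb)
        next_id += 1
    return merges


def cut(edges, merges, threshold):
    """Replay merges whose similarity is at least `threshold`."""
    circle_of = {e: frozenset([e]) for e in edges}
    for s, ca, cb in merges:
        if s < threshold:
            break  # merge similarities are non-increasing for average linkage
        merged = frozenset(ca) | frozenset(cb)
        for e in merged:
            circle_of[e] = merged
    return set(circle_of.values())


def to_node_circles(edge_circles):
    """A node circle is the set of endpoints of the edges in an edge circle."""
    return [set().union(*circle) for circle in edge_circles]


# Toy example: two disjoint triangles and a stand-in similarity function.
edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6)]
shares_node = lambda x, y: 1.0 if set(x) & set(y) else 0.0
merges = average_linkage(edges, shares_node)
node_circles = to_node_circles(cut(edges, merges, threshold=0.5))
```

The loop recomputes all pairwise averages at every step, which is cubic or worse in the number of edges; it is meant to mirror the description, not to be efficient.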
4) Selecting a suitable threshold to cut the dendrogram corresponding to the hierarchical clustering algorithm so as to obtain the corresponding edge social circles, and converting the edge social circles into node social circles;
there are two methods for selecting a suitable threshold to cut the dendrogram: a partition-density-based method and a similarity-distribution-based method. The specific process of the partition-density-based method is as follows: the partition density of the edge social circle partition at each level of the dendrogram is calculated, the edge social circles corresponding to the maximum partition density are selected as the final result, and the edge social circles are converted into node social circles. The partition density D is defined as follows:
$$D = \frac{1}{M} \sum_{c} M_c D_c$$
where $M$ is the total number of edges, $M_c$ is the number of edges in social circle $c$, and $D_c$ is the edge density of social circle $c$, defined as follows:
$$D_c = \frac{M_c - (n_c - 1)}{n_c (n_c - 1)/2 - (n_c - 1)}$$
where $n_c$ is the number of nodes in social circle $c$.
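As a rough illustration of the partition-density criterion, the sketch below scores one candidate partition of edges into circles. It assumes the form used by the Ahn method, $D = \frac{1}{M}\sum_c M_c D_c$ with $D_c = \frac{M_c - (n_c - 1)}{n_c(n_c - 1)/2 - (n_c - 1)}$; since the original equation images are not available here, that form is an assumption, and the function names are hypothetical.

```python
def circle_density(m_c, n_c):
    """Assumed edge density D_c of a circle with m_c edges and n_c nodes."""
    if n_c <= 2:
        return 0.0  # a single edge carries no density information
    return (m_c - (n_c - 1)) / (n_c * (n_c - 1) / 2 - (n_c - 1))


def partition_density(edge_circles):
    """Assumed partition density D = (1/M) * sum_c M_c * D_c."""
    total_edges = sum(len(circle) for circle in edge_circles)
    if total_edges == 0:
        return 0.0
    acc = 0.0
    for circle in edge_circles:
        nodes = set().union(*circle)  # endpoints of all edges in the circle
        acc += len(circle) * circle_density(len(circle), len(nodes))
    return acc / total_edges


# Two triangles are maximally dense; a two-edge path is tree-like.
triangles = [{(1, 2), (1, 3), (2, 3)}, {(4, 5), (4, 6), (5, 6)}]
path = [{(1, 2), (2, 3)}]
d_triangles = partition_density(triangles)
d_path = partition_density(path)
```

To pick the cut, one would evaluate `partition_density` on the partition at each level of the dendrogram and keep the level with the largest value.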
The specific process of selecting the threshold for cutting the dendrogram with the similarity-distribution-based method is as follows: first, the similarity range is divided into N intervals of equal length, and the number of similarity values falling in each interval is counted. The similarity value is then taken as the horizontal x-axis and the number of similarity values as the vertical y-axis, and the empirical distribution is fitted with a polynomial:
$$f_n(x) = \sum_{k=0}^{n} a_k x^k$$
where $n \ge 3$ and $a_0, a_1, \dots, a_n$ are the parameters to be estimated, the centers of the intervals are $x_1, x_2, \dots, x_N$, and the counts in the intervals are $y_1, y_2, \dots, y_N$ respectively. For each n, the polynomial is fitted by minimizing the sum of squared residuals between the true and predicted values:
$$\min_{a_0, \dots, a_n} \sum_{i=1}^{N} \left( y_i - f_n(x_i) \right)^2$$
The fitted polynomials are then evaluated by the mean square error (MSE), and the order-n polynomial with the smallest MSE is selected as the fitted curve. The mean square error (MSE) is defined as follows:
$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2$$
where $\hat{y}_i$ is the predicted value of the i-th example and $y_i$ is the corresponding true value. Finally, the inflection point of the fitted curve $f_n(x)$ is calculated by solving the following equation:
$$\frac{d^2 f_n(x)}{dx^2} = 0$$
the threshold is set to the minimum x that satisfies the above equation.
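The similarity-distribution procedure can be sketched in plain Python as follows. This is a simplified sketch under several assumptions: similarity values lie in [0, 1], the polynomial is written as $\sum_k a_k x^k$, the candidate orders (3, 4, 5) and the bin count are arbitrary choices, and all function names are hypothetical. The least-squares fit uses the normal equations with Gaussian elimination, and the smallest root of the second derivative is located by a grid scan followed by bisection, so a root where the second derivative touches zero without changing sign may be missed.

```python
def histogram(values, n_bins):
    """Count values in n_bins equal intervals of [0, 1]; return centers, counts."""
    width = 1.0 / n_bins
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v / width), n_bins - 1)] += 1
    return [(b + 0.5) * width for b in range(n_bins)], counts


def solve(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def polyfit(xs, ys, order):
    """Least-squares coefficients a_0..a_order via the normal equations."""
    size = order + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    return solve(ata, aty)


def polyval(coeffs, x):
    return sum(a * x ** k for k, a in enumerate(coeffs))


def mse(coeffs, xs, ys):
    return sum((y - polyval(coeffs, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def smallest_second_derivative_root(coeffs, lo=0.0, hi=1.0, steps=1000):
    """Smallest x in [lo, hi] with f''(x) = 0, or None if no root is found."""
    d2 = [k * (k - 1) * a for k, a in enumerate(coeffs)][2:]
    prev_x, prev_v = lo, polyval(d2, lo)
    if prev_v == 0.0:
        return lo
    for s in range(1, steps + 1):
        x = lo + (hi - lo) * s / steps
        v = polyval(d2, x)
        if v == 0.0:
            return x
        if prev_v * v < 0:
            a, b = prev_x, x
            for _ in range(60):  # bisection on the bracketing interval
                mid = (a + b) / 2
                if polyval(d2, a) * polyval(d2, mid) <= 0:
                    b = mid
                else:
                    a = mid
            return (a + b) / 2
        prev_x, prev_v = x, v
    return None


def select_threshold(similarities, n_bins=20, orders=(3, 4, 5)):
    """Fit each candidate order, keep the lowest-MSE fit, return the threshold."""
    xs, ys = histogram(similarities, n_bins)
    best = min((polyfit(xs, ys, n) for n in orders), key=lambda c: mse(c, xs, ys))
    return smallest_second_derivative_root(best)
```

Note that the MSE here is measured on the same bins used for fitting, as the text describes, so higher orders will tend to win; restricting `orders` to a small range is one way to keep the fitted curve smooth.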
FIG. 2 shows an example of partition density. Fig. 3 shows an example of a similarity distribution and its fitted curve. Fig. 4(a) and 4(b) show examples of social circles corresponding to two threshold methods.
5) Extracting the attribute features and structural features of the node social circles, and then training a classifier to identify the type of each social circle;
first, the category $l_c$ of each social circle in the Facebook and Google+ datasets is labeled manually. Then, for each circle, its network structure features and attribute features are extracted to form a feature vector
$$\mathbf{x} = (x_1, x_2, \dots, x_p)$$
where p is the number of features, $x_i$ is the value of the i-th feature, and $x_p$ is the value of the p-th feature. Each training instance consists of a feature vector $\mathbf{x}$ and the corresponding circle category $l_c$. The invention then classifies the social circles obtained in step 4) with four different classification methods: naive Bayes, SVM (support vector machine), the AdaBoost algorithm and the Gradient Boosting algorithm.
Table 2 shows the network structure features and attribute features of the social circles extracted by the present invention. Fig. 4 shows the social circle for two threshold methods. Fig. 5 shows the classification accuracy of four different classification methods.
TABLE 2 network Structure and Attribute features of social circles
Figure BDA0002092179400000093
The invention has the beneficial effects that:
compared with the existing classical social circle recognition methods, the invention provides a method for calculating the similarity between edges that share a common node, combining structural similarity and attribute similarity, so that more accurate social circles are obtained; the invention provides two methods for determining the cut threshold of the dendrogram, both of which naturally reveal the hierarchical structure of the social circles and give users a practical way to aggregate social circles at different levels; the invention trains a classifier based on structural features and attribute features to identify the categories of the obtained social circles, which helps users manage their social circles more effectively.
The algorithm of the invention was verified with experimental data. Three different datasets were selected: the Facebook dataset, the Twitter dataset and the Google+ dataset. A summary of the datasets is given in Table 3:
TABLE 3 Summary of the datasets
Figure BDA0002092179400000101
To compare the accuracy of social circle recognition quantitatively, four different evaluation metrics are used: NMI, average F1-score, Omega Index and EQ.
NMI is an important evaluation metric in community detection and is defined as follows:
Figure BDA0002092179400000102
where C* is the set of identified social circles and
Figure BDA0002092179400000103
is the set of ground-truth social circles.
Figure BDA0002092179400000104
is the mutual information between C* and
Figure BDA0002092179400000105
defined as:
Figure BDA0002092179400000106
H(C*) is the information entropy of C*:
Figure BDA0002092179400000107
where
Figure BDA0002092179400000108
is a ring
Figure BDA0002092179400000109
N represents the total number of nodes.
Figure BDA00020921794000001010
is defined as:
Figure BDA00020921794000001011
where
Figure BDA00020921794000001012
where
Figure BDA00020921794000001013
is defined as:
Figure BDA00020921794000001014
where
Figure BDA0002092179400000111
Figure BDA0002092179400000112
the average F1-score is defined as follows:
Figure BDA0002092179400000113
wherein g and g' are defined as:
Figure BDA0002092179400000114
where
Figure BDA0002092179400000115
is defined as:
Figure BDA0002092179400000116
where
Figure BDA0002092179400000117
the Omega Index is used to measure the accuracy of the number of social circles shared by each pair of nodes, and is defined as follows:
Figure BDA0002092179400000118
where $C_{uv}$ is the set of identified social circles shared by nodes u and v, and
Figure BDA0002092179400000119
is the set of ground-truth social circles shared by nodes u and v.
EQ is used to measure the quality of a community and is defined as follows:
Figure BDA00020921794000001110
where m is the number of edges in the ego network and $A_{u,v}$ is the weight of the edge: $A_{u,v}$ equals 1 if nodes u and v are connected and 0 otherwise. $k_u$ and $k_v$ are the degrees of nodes u and v. $O_u$ is the number of social circles to which node u belongs.
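For readers who want to compute EQ from these quantities, the following sketch assumes the commonly used extended-modularity form $EQ = \frac{1}{2m}\sum_c \sum_{u,v \in c} \frac{1}{O_u O_v}\left[A_{u,v} - \frac{k_u k_v}{2m}\right]$. The equation image is not available here, so this form is an assumption based on the symbols defined above, and the function name is hypothetical.

```python
def extended_modularity(edges, circles):
    """Assumed EQ for possibly overlapping circles on an undirected graph."""
    m = len(edges)
    connected = set()
    degree = {}
    for u, v in edges:
        connected.add((u, v))
        connected.add((v, u))
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # O_u: number of circles each node belongs to.
    membership = {}
    for circle in circles:
        for u in circle:
            membership[u] = membership.get(u, 0) + 1
    total = 0.0
    for circle in circles:
        for u in circle:
            for v in circle:  # ordered pairs, including u == v
                a_uv = 1.0 if (u, v) in connected else 0.0
                expected = degree.get(u, 0) * degree.get(v, 0) / (2 * m)
                total += (a_uv - expected) / (membership[u] * membership[v])
    return total / (2 * m)


# Two disjoint edges, each its own circle: EQ equals ordinary modularity 0.5.
eq_split = extended_modularity([(0, 1), (2, 3)], [{0, 1}, {2, 3}])
# One circle covering the whole graph gives EQ = 0.
eq_whole = extended_modularity([(0, 1)], [{0, 1}])
```

When no node belongs to more than one circle, every $O_u$ is 1 and the expression reduces to ordinary modularity, which is what the two small checks above exercise.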
Fig. 6(a), 6(b), 6(c) and 6(d) show the results of comparing the method of the invention (Slop) with the McAuley, Wang and Ahn methods on the Facebook, Twitter and Google+ datasets under the different evaluation metrics. The comparison shows that the algorithm of the invention achieves higher recognition accuracy than the other three algorithms on most evaluation metrics.
The method abstracts each user as a node in the ego network and each relationship between users as an edge in the ego network, thereby constructing the ego network; combines the ego network structure information and the node attribute information to model the similarity between edges; initializes each edge as a social circle and clusters the social circles with an average-linkage hierarchical clustering method according to the similarity between social circles, obtaining the dendrogram of the hierarchical clustering; selects a suitable threshold to cut the dendrogram, obtaining the corresponding edge social circles, and converts the edge social circles into node social circles; and extracts the attribute features and structural features of the social circles and then trains a classifier to identify the category of each social circle. The method is simple and identifies social circles accurately.

Claims (3)

1. An ego network social circle identification method, characterized by comprising the following steps:
step 1: building the ego network of a user: each user is abstracted as a node in the ego network, and each relationship between users is abstracted as an edge in the ego network;
step 2: combining the ego network structure information and the node attribute information to model the similarity between edges; the similarity between edges is measured by a similarity S, defined as follows:
$$S = a \times S_s + (1 - a) \times S_A$$
where $S_s$ is the structural similarity, $S_A$ is the attribute similarity, and $a$ is the trade-off coefficient between the structural similarity and the attribute similarity;
the structural similarity $S_s$ of two edges $e_{ik}$ and $e_{jk}$ is defined as follows:
$$S_s(e_{ik}, e_{jk}) = \frac{|n_+(i) \cap n_+(j)|}{|n_+(i) \cup n_+(j)|}$$
the attribute similarity $S_A$ of two edges $e_{ik}$ and $e_{jk}$ is defined as follows:
$$S_A(e_{ik}, e_{jk}) = \frac{|A(e_{ik}) \cap A(e_{jk})|}{|A(e_{ik}) \cup A(e_{jk})|}$$
$$A(e_{ij}) = A(i) \cup A(j)$$
where $A(i)$ is the attribute set of node $i$ and $A(j)$ is the attribute set of node $j$;
step 3: initializing each edge as a social circle, and then clustering with a hierarchical clustering method according to the similarity between social circles to obtain a dendrogram;
step 4: selecting a threshold to cut the dendrogram, obtaining edge social circles, and converting the edge social circles into node social circles; the threshold for cutting the dendrogram is selected either by a method based on partition density or by a method based on the similarity distribution; the specific process of selecting the threshold with the partition-density-based method is as follows: the partition density of the edge social circle partition at each level of the dendrogram is calculated, the edge social circles corresponding to the maximum partition density are selected as the final result, and the edge social circles are converted into node social circles; the partition density D is defined as follows:
$$D = \frac{1}{M} \sum_{c} M_c D_c$$
where $M$ is the total number of edges, $M_c$ is the number of edges in social circle $c$, and $D_c$ is the edge density of social circle $c$, defined as follows:
$$D_c = \frac{M_c - (n_c - 1)}{n_c (n_c - 1)/2 - (n_c - 1)}$$
where $n_c$ is the number of nodes in social circle $c$;
the specific process of selecting the threshold for cutting the dendrogram with the similarity-distribution-based method is as follows: first, the similarity range is divided into N intervals of equal length and the number of similarity values falling in each interval is counted; then, with the similarity value on the horizontal x-axis and the number of similarity values on the vertical y-axis, the empirical distribution of the similarity is fitted with a polynomial:
$$f_n(x) = \sum_{k=0}^{n} a_k x^k$$
where $n \ge 3$ and $a_0, a_1, \dots, a_n$ are the parameters to be estimated, the centers of the intervals are $x_1, x_2, \dots, x_N$, and the counts in the intervals are $y_1, y_2, \dots, y_N$ respectively; for each n, the polynomial is fitted by minimizing the sum of squared residuals between the true and predicted values:
$$\min_{a_0, \dots, a_n} \sum_{i=1}^{N} \left( y_i - f_n(x_i) \right)^2$$
then, the fitted polynomials are evaluated by the mean square error (MSE), and the order-n polynomial with the smallest MSE is selected as the fitted curve; the mean square error MSE is defined as follows:
$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2$$
where $\hat{y}_i$ is the predicted value of the i-th example and $y_i$ is the corresponding true value; finally, the inflection point of the fitted curve $f_n(x)$ is calculated by solving the following equation:
$$\frac{d^2 f_n(x)}{dx^2} = 0$$
the threshold is set to the minimum x that satisfies the above equation;
Step 5: extracting the attribute features and structural features of the node social circles, and then training a classifier to identify the type of each social circle;
first, the class $l_c$ of each social circle in the Facebook and Google+ datasets is manually labeled; then, for each social circle, its network structure features and attribute features are extracted to form a feature vector $\mathbf{x}_c = (x_1, x_2, \ldots, x_p)$, where p is the number of features and $x_i$ is the value of the i-th feature; each training instance includes a feature vector $\mathbf{x}_c$ and the corresponding circle type $l_c$; the obtained social circles are then classified using a classification method; the classification methods include naive Bayes, a support vector machine, the AdaBoost algorithm, and the GradientBoosting algorithm.
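The classification step can be sketched as follows, assuming scikit-learn is used for the four named classifier families (the claim does not name a library). The two features, the toy feature values, and the labels "family" and "colleague" are invented purely for illustration and are not taken from the Facebook or Google+ datasets.

```python
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Toy, made-up feature vectors with p = 2 hypothetical features per circle:
# [edge density of the circle, attribute homogeneity of its members].
X = [[0.20, 0.80], [0.25, 0.85], [0.15, 0.75], [0.22, 0.78],
     [0.80, 0.20], [0.85, 0.25], [0.75, 0.15], [0.78, 0.22]]
# Manually assigned circle types l_c (hypothetical labels).
y = ["family"] * 4 + ["colleague"] * 4

# One estimator per classifier family named in the claim.
classifiers = {
    "naive_bayes": GaussianNB(),
    "svm": SVC(),
    "adaboost": AdaBoostClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}


def train_and_predict(X_train, y_train, X_new):
    """Fit every classifier on the labeled circles and predict new circle types."""
    predictions = {}
    for name, clf in classifiers.items():
        clf.fit(X_train, y_train)
        predictions[name] = list(clf.predict(X_new))
    return predictions


# Predict the type of an unlabeled circle with a family-like feature vector.
predictions = train_and_predict(X, y, [[0.21, 0.79]])
```

In practice the classifiers would be compared on held-out labeled circles rather than on the training set.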
2. The ego network social circle recognition method of claim 1, wherein the ego network in step 1) is defined as follows:
in the ego network, the nodes consist of the central node ego and the neighbor nodes of ego, and the edges include the edges between the central node ego and its neighbor nodes and the edges between the neighbor nodes.
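A minimal sketch of this definition, assuming the full network is given as an undirected edge list (the function name `ego_network` and the input format are assumptions for the example):

```python
def ego_network(edges, ego):
    """Return (nodes, edges) of the ego network around `ego`.

    Nodes: ego plus its direct neighbors.
    Edges: ego-neighbor edges and neighbor-neighbor edges.
    """
    neighbors = set()
    for u, v in edges:
        if u == ego and v != ego:
            neighbors.add(v)
        elif v == ego and u != ego:
            neighbors.add(u)

    nodes = neighbors | {ego}
    # Keep every edge whose two endpoints both belong to the ego network;
    # frozenset makes the edge undirected and drops duplicate orientations.
    ego_edges = {
        frozenset((u, v))
        for u, v in edges
        if u != v and u in nodes and v in nodes
    }
    return nodes, ego_edges


full_edges = [("e", "a"), ("e", "b"), ("a", "b"), ("b", "c"), ("c", "d")]
nodes, kept = ego_network(full_edges, "e")
```

Here nodes `c` and `d` are excluded because they are not direct neighbors of `e`, so the edges `b-c` and `c-d` are dropped.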
3. The ego network social circle recognition method of claim 1, wherein the specific process of step 3) is as follows: initially, each edge is regarded as a separate edge social circle; then, at each merge, the two edge social circles with the maximum similarity are selected and merged; the merging process is repeated until all edges have been merged into a single edge social circle, yielding the dendrogram of the hierarchical clustering algorithm.
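The merging loop can be sketched as below. The edge similarity used by the method is defined elsewhere in the claims, so this example takes it as a caller-supplied function; the single-linkage rule (circle similarity equals the maximum similarity over edge pairs) and the toy Jaccard similarity over edge endpoints are assumptions for illustration only.

```python
from itertools import combinations


def build_dendrogram(edges, edge_similarity):
    """Agglomerative clustering of edges; returns the merge history.

    Each entry is (similarity, circle_a, circle_b), recorded in merge order,
    which is the information a dendrogram encodes.
    """
    # Initially, every edge is its own edge social circle.
    circles = [frozenset([e]) for e in edges]
    merges = []

    def circle_similarity(a, b):
        # Assumed single linkage: best similarity over all edge pairs.
        return max(edge_similarity(e1, e2) for e1 in a for e2 in b)

    while len(circles) > 1:
        # Select the two circles with the maximum similarity and merge them.
        a, b = max(combinations(circles, 2), key=lambda p: circle_similarity(*p))
        merges.append((circle_similarity(a, b), a, b))
        circles = [c for c in circles if c not in (a, b)] + [a | b]
    return merges


def toy_similarity(e1, e2):
    """Hypothetical similarity: Jaccard index of the two edges' endpoints."""
    s1, s2 = set(e1), set(e2)
    return len(s1 & s2) / len(s1 | s2)


merges = build_dendrogram([("a", "b"), ("b", "c"), ("x", "y")], toy_similarity)
```

Cutting the merge history at a similarity threshold then yields the edge social circles: every merge recorded with a similarity below the threshold is undone.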
CN201910507062.4A 2019-06-12 2019-06-12 Ego network social circle recognition method Active CN110347933B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910507062.4A CN110347933B (en) 2019-06-12 2019-06-12 Ego network social circle recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910507062.4A CN110347933B (en) 2019-06-12 2019-06-12 Ego network social circle recognition method

Publications (2)

Publication Number Publication Date
CN110347933A CN110347933A (en) 2019-10-18
CN110347933B true CN110347933B (en) 2022-04-22

Family

ID=68181854

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910507062.4A Active CN110347933B (en) 2019-06-12 2019-06-12 Ego network social circle recognition method

Country Status (1)

Country Link
CN (1) CN110347933B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113343114B (en) * 2021-07-05 2022-10-28 云南大学 Multi-feature fusion social network friend recommendation method and device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9749406B1 (en) * 2013-03-13 2017-08-29 Hrl Laboratories, Llc System and methods for automated community discovery in networks with multiple relational types

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107153713B (en) * 2017-05-27 2018-02-23 合肥工业大学 Overlapping community detection method and system based on similitude between node in social networks
CN108920678A (en) * 2018-07-10 2018-11-30 福州大学 A kind of overlapping community discovery method based on spectral clustering with fuzzy set
CN109117875A (en) * 2018-07-26 2019-01-01 福州大学 A kind of overlapping community discovery method based on side Density Clustering
CN109344326B (en) * 2018-09-11 2021-09-24 创新先进技术有限公司 Social circle mining method and device
CN109446836B (en) * 2018-10-09 2022-02-15 上海交通大学 Social network personal information propagation access control method

Also Published As

Publication number Publication date
CN110347933A (en) 2019-10-18

Similar Documents

Publication Publication Date Title
Jadhav et al. Comparative study of K-NN, naive Bayes and decision tree classification techniques
Mukhopadhyay et al. A survey of multiobjective evolutionary clustering
Ayed et al. Survey on clustering methods: Towards fuzzy clustering for big data
Kesavaraj et al. A study on classification techniques in data mining
CN110460605B (en) Abnormal network flow detection method based on automatic coding
Effendy et al. Classification of intrusion detection system (IDS) based on computer network
Chander et al. Outlier detection strategies for WSNs: A survey
CN112906770A (en) Cross-modal fusion-based deep clustering method and system
Satyanarayana et al. Survey of classification techniques in data mining
CN112767186B (en) Social network link prediction method based on 7-subgraph topological structure
CN107292097A (en) The feature selection approach of feature based group and traditional Chinese medical science primary symptom system of selection
CN111985623A (en) Attribute graph group discovery method based on maximized mutual information and graph neural network
Rasyid et al. Review on clustering algorithms based on data type: towards the method for data combined of numeric-fuzzy linguistics
CN110347933B (en) Ego network social circle recognition method
CN115114484A (en) Abnormal event detection method and device, computer equipment and storage medium
Yu et al. A novel three-way clustering algorithm for mixed-type data
De Silva et al. Recursive hierarchical clustering algorithm
KR100869554B1 (en) Domain density description based incremental pattern classification method
Yu et al. A hybrid incremental regression neural network for uncertain data streams
CN117009613A (en) Picture data classification method, system, device and medium
Qu et al. A robust fuzzy time series forecasting method based on multi‐partition and outlier detection
Bahrbegi et al. A new system to evaluate GA-based clustering algorithms in Intrusion Detection alert management system
Helal et al. Leader‐based community detection algorithm for social networks
Devi et al. Community Detection by Node Betweenness Using Optimized Girvan-Newman Cuckoo Search Algorithm
Mehrotra et al. Data clustering and various clustering approaches

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant