CN110163288A

CN110163288A - A social network user group classification method based on capturing the depth and breadth of network nodes

Info

Publication number: CN110163288A
Application number: CN201910441152.8A
Authority: CN
Inventors: Not disclosed
Original assignee: Zhongsen Yunchain (chengdu) Technology Co Ltd
Current assignee: Zhongsen Yunchain (chengdu) Technology Co Ltd
Priority date: 2019-05-24
Filing date: 2019-05-24
Publication date: 2019-08-23

Abstract

The invention discloses a social network user group classification method based on capturing the depth and breadth of network nodes. It addresses the problem of preserving global structural features in network representation learning: by learning the depth and breadth features of nodes, it greatly improves the ability of network representation learning to capture the global structural features of nodes. The invention first captures the similarity between neighboring nodes in the network with a deep learning method. Next, it obtains the depth information of each node from powers of the adjacency matrix and its breadth information from node degrees. It then measures the depth and breadth similarity between nodes with Laplacian Eigenmaps, a method from manifold learning. Finally, the learned representations are used to classify social network user groups.

Description

A social network user group classification method based on capturing the depth and breadth of network nodes

Technical field

The invention belongs to the field of network representation learning, and specifically relates to a network representation learning method that takes the global structural features of network nodes into account.

Background art

According to Wikipedia, a network is used to represent symmetric or asymmetric relations between discrete objects. In computer science, a network is usually represented as a graph consisting of nodes and edges. Network-structured data naturally represent the relationships between different objects, and network structures of all kinds are very common in daily life. For example, following and friendship relations between people on social media platforms form typical social networks; citation relations between papers form academic citation networks; and hyperlinks between Web pages form the hyperlink network of the Internet.

With the development of the Internet, large-scale social media platforms keep emerging. Representative platforms include Sina Weibo, WeChat, and Zhihu in China, and Facebook, Twitter, Instagram, and LinkedIn abroad. These platforms attract huge numbers of users, and the following and friendship relations between users form typical social networks. Compared with traditional networks, these large-scale social networks have the following characteristics:

Social networks are larger and sparser than traditional networks. According to the statistics website Statista, as of January 2018 the monthly active users of Facebook, the world's largest social media platform, reached 2.167 billion, and those of WeChat, China's largest social platform, reached 980 million. While these social networks contain massive numbers of user nodes, they have also become sparser: most user nodes have only a limited number of neighbors, from a few dozen to a few hundred. This combination of large scale and sparsity poses major challenges for network analysis and social computing tasks on these networks.

Besides the network structure between users, large-scale social networks also contain rich user behavior information: for example, the text, images, videos, and other content that users post or repost on these platforms; personal information such as users' self-descriptions and tags; and users' likes and shares of other content. This massive heterogeneous information reflects important facts such as users' interests and personal attributes, and is valuable for application services built on social media.

These large-scale social media support a wide range of applications. For example, user behavior information can be used to build profiles of social media users, inferring attributes such as gender, age, and occupation as well as their interests and hobbies. Based on these profiles, personalized recommendations can be made, such as friends the users may know or news and products they may be interested in.

In recent years, research on and applications of such large-scale social networks have become popular topics in computational social science and artificial intelligence. How to carry out network analysis tasks such as node classification, clustering, link prediction, and community discovery efficiently on these large-scale social networks has long been a foundation and focus of research in the field. To carry out these tasks, the central problem is how to use the structural and heterogeneous information in the network to obtain effective feature representations of the nodes, that is, how to perform network representation. The quality of the network representation is crucial for the subsequent network analysis tasks.

Feature representation of network nodes has always been crucial in data mining and social network analysis. With the emergence of large-scale social networks, traditional network representation methods face problems of computational efficiency and interpretability. In addition, these social networks often contain rich heterogeneous information, and these characteristics prevent existing network representation methods from handling them well. Network representation learning, also known as network embedding, aims to learn a low-dimensional, real-valued vector representation for each node in the network. The representation vector of each node encodes the node's network structure information and other heterogeneous information; these vectors are generally used as feature vectors for further network analysis tasks such as node classification, link prediction, and community discovery.

Summary of the invention

The purpose of the invention is to address the problems in the above network analysis tasks by providing a network representation learning method based on capturing the depth and breadth of network nodes. The invention combines the similarity of node depth information and the similarity of node breadth information with the similarity of local node information, and maps nodes into a lower-dimensional feature space by embedding. The embedded node representations can then support network analysis tasks.

To achieve this purpose, the invention proposes a network representation learning method based on capturing the depth and breadth of network nodes. The depth-information similarity and the breadth-information similarity of nodes are obtained through the fully connected layers of two separate neural networks, and node information is then fused according to the local neighborhood information of each node. The invention comprises the following steps:

Step 1: collect network data from the Internet, preprocess it, and store it in a local file;

Step 2: construct the adjacency matrix A from the data;

Step 3: one-hot encode all nodes in A;

Step 4: embed each node into a depth space and a breadth space;

Step 5: raise A to powers up to the N-th power and use the result as the measure of node depth;

Step 6: compute the degree of each node in A and use it as the measure of node breadth;

Step 7: capture the depth similarity between nodes with Laplacian Eigenmaps and embed it into the depth space;

Step 8: capture the breadth similarity between nodes with Laplacian Eigenmaps and embed it into the breadth space;

Step 9: concatenate each node's two embeddings as the input of the final embedding space, and capture the similarity between nodes by negative sampling;

Step 10: use the trained network as the node embedding model and apply it to the node classification task.

The data collected in Step 1 include at least a unique ID for each network node and the link information between nodes.

The adjacency matrix A in Step 2 has dimension N×N, where N is the number of nodes. A[i, j] indicates whether a link exists between nodes i and j: A[i, j] = 1 if it exists, and 0 otherwise.
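Steps 1 and 2 can be illustrated with a minimal Python/NumPy sketch, assuming the collected data have already been reduced to a list of (node ID, node ID) link pairs and that links are undirected. The helper name `build_adjacency` is hypothetical, and the dense array is only suitable for small examples; at social-network scale a sparse matrix, as claim 1 specifies, would be used instead.

```python
import numpy as np

def build_adjacency(edges):
    """Build an N x N 0/1 adjacency matrix from (id, id) link pairs.

    Hypothetical helper: node IDs are mapped to row/column indices in sorted order,
    and the network is assumed undirected, so every link is stored in both directions.
    """
    ids = sorted({node for edge in edges for node in edge})
    index = {node_id: i for i, node_id in enumerate(ids)}
    A = np.zeros((len(ids), len(ids)), dtype=np.int64)
    for u, v in edges:
        i, j = index[u], index[v]
        if i != j:        # self-loops are ignored
            A[i, j] = 1   # A[i, j] = 1: a link exists between i and j
            A[j, i] = 1
    return A, index

A, index = build_adjacency([("u1", "u2"), ("u2", "u3"), ("u3", "u4")])
```

The returned `index` keeps the mapping from the unique node IDs of Step 1 to matrix rows, which is needed later to attach embeddings back to users.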

The dimension of each node's one-hot encoding in Step 3 equals the number of nodes in the network.
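The one-hot input of Steps 3 and 4 can be sketched as follows. Assumption: each of the two embedding branches is modeled as a single bias-free fully connected layer with hypothetical random weights `W_depth` and `W_breadth`. Multiplying a one-hot row by such a weight matrix simply selects one row, which is why embedding layers are usually implemented as table lookups.

```python
import numpy as np

def one_hot(num_nodes):
    # Row i is the one-hot code of node i; its length equals the number of nodes.
    return np.eye(num_nodes)

rng = np.random.default_rng(0)
num_nodes, dim = 5, 3
X = one_hot(num_nodes)
W_depth = rng.normal(size=(num_nodes, dim))    # hypothetical depth-branch layer weights
W_breadth = rng.normal(size=(num_nodes, dim))  # hypothetical breadth-branch layer weights

depth_emb = X @ W_depth      # Step 4: each node embedded into the depth space
breadth_emb = X @ W_breadth  # Step 4: each node embedded into the breadth space
# Because X is the identity, row i of depth_emb is exactly row i of W_depth.
```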

Step 5 raises the adjacency matrix A to the k-th power; an element equal to 1 in the k-th power of the adjacency matrix marks a neighbor that the node can reach in k steps.
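Step 5 can be sketched as repeated Boolean matrix products. The entries of A^k count walks of length k and can therefore exceed 1, so this sketch treats any nonzero entry as "reachable in k steps", which is one interpretation of the text's "element equal to 1". The text also does not say how a single depth value per node is derived, so `node_depth` below uses a hypothetical definition: the farthest hop, up to the hyperparameter N, at which the node still reaches a node it had not reached before.

```python
import numpy as np

def reachability_powers(A, N):
    """R[k-1][i, j] is True when node j can be reached from node i by a walk of exactly k steps (k = 1..N)."""
    A = np.asarray(A, dtype=np.int64)
    P = np.eye(A.shape[0], dtype=np.int64)
    out = []
    for _ in range(N):
        # A^k counts walks; clipping to 0/1 after each product keeps only
        # "reachable or not" and avoids integer overflow for large k.
        P = (P @ A > 0).astype(np.int64)
        out.append(P.astype(bool))
    return out

def node_depth(A, N):
    """Hypothetical depth: the largest hop k <= N at which a node first reaches some other node.

    This equals the node's longest shortest-path distance to any reachable node, capped at N.
    """
    n = np.asarray(A).shape[0]
    seen = np.eye(n, dtype=bool)
    depth = np.zeros(n, dtype=np.int64)
    for k, R in enumerate(reachability_powers(A, N), start=1):
        newly_reached = R & ~seen
        depth[newly_reached.any(axis=1)] = k
        seen |= R
    return depth
```

On a four-node path 0-1-2-3 with N = 3, the two end nodes get depth 3 and the two inner nodes depth 2.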

Step 7 captures the depth similarity between nodes using Laplacian Eigenmaps, computed as follows:

where l_m and l_n denote the depths of any two nodes in the network, min|l_m - l_n| denotes the minimum depth difference between nodes, and max|l_m - l_n| denotes the maximum depth difference between nodes.
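The text names only the ingredients of the depth-similarity computation: the depths l_m, l_n and the minimum and maximum depth differences. The sketch below therefore assumes a min-max normalized weight w_mn = 1 - (|l_m - l_n| - min|l_m - l_n|) / (max|l_m - l_n| - min|l_m - l_n|) and the standard Laplacian Eigenmaps objective sum over m, n of w_mn ||f(m) - f(n)||^2, which penalizes distant embeddings for nodes of similar depth. Both choices are assumptions, not the patent's formula, and the function names are hypothetical.

```python
import numpy as np

def depth_similarity(depth):
    """Assumed min-max normalized similarity: 1 for the closest depth pair, 0 for the farthest."""
    l = np.asarray(depth, dtype=float)
    diff = np.abs(l[:, None] - l[None, :])
    off = ~np.eye(len(l), dtype=bool)   # exclude self-pairs from min/max
    lo, hi = diff[off].min(), diff[off].max()
    if hi == lo:                        # all depth gaps equal: treat all pairs as equally similar
        W = np.ones_like(diff)
    else:
        W = 1.0 - (diff - lo) / (hi - lo)
    W[~off] = 0.0
    return W

def laplacian_eigenmap_loss(F, W):
    """sum_{m,n} W[m, n] * ||F[m] - F[n]||^2, computed as 2 * trace(F^T L F) with L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    return 2.0 * np.trace(F.T @ L @ F)
```

Here F would be the depth-space embeddings (one row per node). The trace form is the usual way to evaluate this objective and equals the pairwise sum whenever W is symmetric.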

Step 8 captures the breadth similarity between nodes using Laplacian Eigenmaps, computed as follows:
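Since Step 6 measures breadth by node degree, the classical closed-form Laplacian Eigenmaps procedure can be illustrated on a degree-based similarity graph. This is a swapped-in illustration, not the patent's Step 8 formula: it assumes the heat-kernel weight exp(-(d_m - d_n)^2 / t) from the original Laplacian Eigenmaps method and solves the generalized eigenproblem L y = λ D y, whereas the invention itself learns the breadth space with a neural network. The function name and the parameter `t` are hypothetical.

```python
import numpy as np

def breadth_embedding(A, dim, t=1.0):
    """Classical Laplacian Eigenmaps on a degree-based (breadth) similarity graph."""
    deg = np.asarray(A, dtype=float).sum(axis=1)         # Step 6: degree as the breadth measure
    W = np.exp(-(deg[:, None] - deg[None, :]) ** 2 / t)  # assumed heat-kernel similarity
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1)
    L = np.diag(d) - W
    # Solve L y = lambda D y through the symmetric form D^{-1/2} L D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L_sym)                   # eigenvalues in ascending order
    Y = d_inv_sqrt[:, None] * vecs                       # map back to generalized eigenvectors
    return Y[:, 1:dim + 1], vals                         # drop the trivial constant solution
```

Nodes with equal degree receive identical coordinates, so in a star graph all leaves coincide while the hub is placed on the opposite side.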

Step 9 captures the local similarity of nodes with negative sampling; specifically, it uses first-order and second-order proximity:

First-order proximity: directly adjacent nodes (1-hop neighbors) should have close low-dimensional representations;

Second-order proximity: nodes that share common neighbors (2-hop neighbors) should also have close low-dimensional representations;
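The first- and second-order neighbors used in Step 9 (and in claim 7) can be read off A and A^2. Since (A^2)[i, k] counts the common neighbors of nodes i and k, this sketch treats any nonzero off-diagonal entry as a second-order neighbor; the text says "elements equal to 1", so the nonzero test is an interpretive assumption. The function names are hypothetical.

```python
import numpy as np

def first_order_neighbors(A, i):
    # Columns k with A[i, k] = 1 are the 1-hop neighbors of node i.
    return set(np.flatnonzero(A[i]).tolist())

def second_order_neighbors(A, i):
    # (A @ A)[i, k] counts the common neighbors of i and k; a nonzero
    # off-diagonal entry means i and k share at least one neighbor.
    A2 = A @ A
    return {int(k) for k in np.flatnonzero(A2[i]) if k != i}
```

In a triangle, every node is both a first-order and a second-order neighbor of the others, because each pair is adjacent and also shares the third node.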

Non-neighboring nodes should have representations that are far apart. Non-neighbor node pairs are therefore chosen by sampling, which is called negative sampling: for each pair of neighboring nodes, a small number (K pairs) of non-neighbor nodes are chosen as negative samples;

Here |V| is the total number of nodes; the formula also uses the number of first-order neighbors of node i, the number of second-order neighbors of node i, and the degree d_v of node v.
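A sketch of the negative sampling in Step 9, following claim 7: for a positive pair (v_i, v_j), K negatives are drawn from all nodes except v_j with probability proportional to degree^0.75. The loss is a skip-gram/LINE-style objective, which is an assumption since the text does not state the loss, and the function names are hypothetical. Z stands for the concatenation of each node's depth and breadth embeddings, for example `np.concatenate([depth_emb, breadth_emb], axis=1)`.

```python
import numpy as np

def sample_negatives(A, i, j, K, rng):
    """Draw K negatives for the positive pair (i, j) with P(v) proportional to deg(v) ** 0.75, excluding j."""
    deg = np.asarray(A, dtype=float).sum(axis=1)
    p = deg ** 0.75
    p[j] = 0.0          # claim 7: sample from all nodes except v_j
    p = p / p.sum()
    return rng.choice(len(p), size=K, replace=True, p=p)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def negative_sampling_loss(Z, i, j, negatives):
    """Assumed skip-gram-style loss: pull the pair (i, j) together, push (i, k) apart for each negative k."""
    pos = -np.log(sigmoid(Z[i] @ Z[j]))
    neg = -np.sum(np.log(sigmoid(-(Z[negatives] @ Z[i]))))
    return pos + neg
```

As in the claim wording, the sampler does not exclude v_i itself or its other neighbors; implementations often filter those out as well.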

The effectiveness of node classification in Step 10 is measured by Micro-F1 and Macro-F1:

Micro-F1: compute the overall Precision and Recall over all classes, then compute F1 from them;

Macro-F1: compute Precision and Recall for each class, compute each class's F1, and finally average the F1 scores.
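The two metrics of Step 10 can be computed directly from their definitions. This from-scratch sketch assumes single-label classification and uses a hypothetical function name. For single-label data Micro-F1 equals accuracy, while Macro-F1 weights every class equally regardless of its size.

```python
import numpy as np

def micro_macro_f1(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes = np.union1d(y_true, y_pred)
    # Per-class true positives, false positives, and false negatives.
    tp = np.array([np.sum((y_pred == c) & (y_true == c)) for c in classes], dtype=float)
    fp = np.array([np.sum((y_pred == c) & (y_true != c)) for c in classes], dtype=float)
    fn = np.array([np.sum((y_pred != c) & (y_true == c)) for c in classes], dtype=float)

    def f1(tp, fp, fn):
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        return 2 * p * r / (p + r) if p + r else 0.0

    micro = f1(tp.sum(), fp.sum(), fn.sum())                 # pool counts over all classes first
    macro = float(np.mean([f1(*c) for c in zip(tp, fp, fn)]))  # average the per-class F1 scores
    return micro, macro
```

For labels [0, 0, 1, 1, 2] predicted as [0, 1, 1, 1, 2], Micro-F1 is 0.8 (4 of 5 correct) and Macro-F1 is the mean of the per-class scores 2/3, 0.8, and 1.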

Description of the drawings

Fig. 1 is a flowchart of the item recommendation method of the invention.

Detailed description of the embodiments

Step 2: construct the adjacency matrix A from the data;

Step 3: one-hot encode all nodes in A;

Step 4: embed each node into a depth space and a breadth space;

The dimension of each node's one-hot encoding in Step 3 equals the number of nodes in the network.

Non-neighboring nodes should have representations that are far apart. Non-neighbor node pairs are chosen by sampling, which is called negative sampling: for each pair of neighboring nodes, a small number (K pairs) of non-neighbor nodes are chosen as negative samples;

Macro-F1: compute Precision and Recall for each class, compute each class's F1, and finally average the F1 scores.

For the working principles of the relevant functional modules of the system in this embodiment, refer to the corresponding description of the method embodiment; they are not repeated here.

The implementation in this embodiment has the following beneficial effects: (1) the depth and breadth information of nodes are learned separately; (2) fusing depth, breadth, and local information yields better representations of network nodes.

The network representation learning method based on capturing the depth and breadth of network nodes provided by the invention has been described in detail above, and its principle and implementation have been explained herein. The description of the above embodiments is only intended to help understand the method of the invention and its core idea. Meanwhile, those skilled in the art may, following the idea of the invention, make changes to the specific implementation and scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Claims

1. A social network user group classification method based on capturing the depth and breadth of network nodes, characterized by comprising the following steps:

Step 2: construct the adjacency matrix A from the network nodes, where A is a sparse matrix with |V| rows and |V| columns, V is the set of nodes in the network, |V| is the total number of nodes, and A[i, j] indicates whether a link exists between node i and node j: A[i, j] = 1 if a link exists, and there is no link otherwise;

Step 3: one-hot encode all nodes in A;

Step 4: embed each node into a depth space and a breadth space;

Step 5: raise A to powers up to the N-th power, where N is a manually specified hyperparameter giving the farthest reach distance considered for a node, and use the result as the measure of node depth;

Step 7: capture the depth similarity between nodes with Laplacian Eigenmaps and embed it into the depth space;

Step 8: capture the breadth similarity between nodes with Laplacian Eigenmaps and embed it into the breadth space;

Step 9: concatenate each node's two embeddings as the input of the final embedding space, and capture the similarity between nodes by negative sampling;

Step 10: use the trained neural network as the social network node embedding model to output low-dimensional representations of the social network nodes, i.e., of the users in the social network, and use them for user group classification in the social network.

2. The network representation learning method based on capturing the depth and breadth of network nodes according to claim 1, characterized in that the data collected in Step 1 include at least a unique ID for each network node and the link information between nodes.

3. The network representation learning method based on capturing the depth and breadth of network nodes according to claim 1, characterized in that the dimension of each node's one-hot encoding in Step 3 equals the number of nodes in the network.

4. The network representation learning method based on capturing the depth and breadth of network nodes according to claim 1, characterized in that Step 5 raises the adjacency matrix A to the k-th power, and an element equal to 1 in the k-th power of the adjacency matrix marks a neighbor that the node can reach in k steps.

5. The network representation learning method based on capturing the depth and breadth of network nodes according to claim 1, characterized in that the Laplacian Eigenmaps in Step 7 capture the depth similarity between nodes, computed as follows:

where l_m and l_n denote the depths of any two nodes in the network; |V| denotes the total number of nodes; the sum of a node's numbers of first-order and second-order neighbors is also used; d_v denotes the degree of node v; f^(depth) denotes the depth mapping of a node; min|l_m - l_n| denotes the minimum depth difference between nodes; and max|l_m - l_n| denotes the maximum depth difference between nodes.

6. The network representation learning method based on capturing the depth and breadth of network nodes according to claim 1, characterized in that the Laplacian Eigenmaps in Step 8 capture the breadth similarity between nodes, computed as follows:

where f^(breadth) denotes the breadth mapping of a node.

7. The social network user group classification method based on capturing the depth and breadth of network nodes according to claim 1, characterized in that the negative sampling in Step 9 captures the local similarity of nodes: it uses the first-order and second-order proximity of nodes, i.e., their 1-hop and 2-hop neighbors, to capture the relationship between a node and its neighbors. For first-order neighbors, select the elements equal to 1 in the row of target node v_i in the adjacency matrix; each corresponding column k gives a first-order neighbor v_k of node v_i. For second-order neighbors, select the elements equal to 1 in the row of target node v_i in the second power of the adjacency matrix; each corresponding column k gives a second-order neighbor of node i.

Non-neighboring nodes should be far apart. Using negative sampling, a small number (K pairs) of non-neighbor nodes are chosen as negative samples for each pair of neighboring nodes (v_i, v_j): with node v_i fixed, nodes are randomly sampled from all nodes except v_j with probability proportional to node degree raised to the power 0.75.

Here |V| is the total number of nodes; the number of first-order neighbors of node i and the number of second-order neighbors of node i are also used; d_v denotes the degree of node v.

8. The network representation learning method based on capturing the depth and breadth of network nodes according to claim 1, characterized in that the effectiveness of node classification in Step 10 is measured by Micro-F1 and Macro-F1, where F1 is computed as follows:

Macro-F1: compute Precision and Recall for each class, compute each class's F1, and finally average the F1 scores.