CN110163288A - A kind of social network user group classification method captured based on network node extensiveness and intensiveness - Google Patents

A kind of social network user group classification method captured based on network node extensiveness and intensiveness

Info

Publication number
CN110163288A
Authority
CN
China
Prior art keywords
node
network
extensiveness
intensiveness
depth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910441152.8A
Other languages
Chinese (zh)
Inventor
不公告发明人
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongsen Yunchain (chengdu) Technology Co Ltd
Original Assignee
Zhongsen Yunchain (chengdu) Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongsen Yunchain (chengdu) Technology Co Ltd
Priority to CN201910441152.8A
Publication of CN110163288A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Computational Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • Databases & Information Systems (AREA)
  • Algebra (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Software Systems (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a social network user group classification method based on capturing the breadth and depth ("extensiveness and intensiveness") of network nodes. It addresses the problem of preserving global structural features in network representation learning: by learning the breadth and depth features of nodes, it can substantially improve the ability of network representation learning to capture the global structural features of nodes. The invention first captures the similarity of neighbor nodes in the network using a deep learning method, then obtains the depth and breadth information of each node from powers of the adjacency matrix and from node degree, and then measures the depth and breadth similarity between nodes using Laplacian eigenmaps, a method from manifold learning. Finally, the resulting representations are used to classify social network user groups.

Description

A kind of social network user group classification method captured based on network node extensiveness and intensiveness
Technical field
The invention belongs to the field of network representation learning. It is a network representation learning method that takes the global structural features of network nodes into account.
Background art
According to Wikipedia's definition, a network represents symmetric or asymmetric relations between discrete objects. In computer science, a network can usually be expressed as a graph consisting of nodes and edges. Network-structured data naturally represents relationships between different objects, and a wide variety of network structures are common in daily life. For example, follow and friend relationships between people on social media platforms form typical social networks; citation relationships between papers form academic citation networks; and hyperlinks between web pages form the web page link network of the Internet.
With the development of the Internet, large-scale social media platforms continue to emerge. Representative platforms include Sina Weibo, WeChat, and Zhihu in China, and Facebook, Twitter, Instagram, and LinkedIn abroad. These platforms have attracted massive numbers of users, and the follow and friend relationships between users on them form typical social networks. Compared with traditional networks, these large-scale social networks have the following characteristics:
Social networks are larger and sparser than traditional networks. According to the statistics website Statista, as of January 2018 the world's largest social media platform, Facebook, had 2.167 billion monthly active users, and China's largest social platform, WeChat, had 980 million monthly active users. While these social networks contain massive numbers of user nodes, they are also sparser: most user nodes have only a limited number of neighbors, from tens to a few hundred. Large scale and sparsity pose a huge challenge for network analysis and social computing tasks on these networks.
In large-scale social networks, besides the network structure between users, there is also rich user behavior information: for example, the text, pictures, videos, and other content that users post or forward on these platforms; personal information such as users' self-introductions and tags; and users' likes and shares of other content. This massive heterogeneous information reflects important information such as users' interests and personal attributes, and is of great value to application services built on social media.
The application scenarios for these large-scale social media platforms are very rich. For example, user behavior information can be used to build user profiles that infer attributes such as gender, age, and occupation, as well as interests and hobbies; based on the profiling results, personalized recommendation can suggest friends the user may know, or news and products the user may be interested in.
In recent years, research on and applications of such large-scale social networks have become a popular field in computational social science and artificial intelligence. How to carry out network analysis tasks efficiently on these networks, such as node classification, clustering, link prediction, and community detection, has always been the foundation and focus of research in this field. To carry out these tasks, the most important question is how to use the structural and heterogeneous information in the network to obtain effective feature representations of its nodes, that is, how to perform network representation. The quality of the network representation is crucial for subsequent network analysis tasks.
Feature representation of network nodes has always been crucial in data mining and social network analysis. With the emergence of large-scale social networks, traditional network representation methods face problems of computational efficiency and interpretability. In addition, these social networks often contain rich heterogeneous information, which prevents existing network representation methods from handling them well. Network representation learning, also known as network embedding, aims to learn a low-dimensional real-valued vector representation for each node in the network. The vector of each node encodes that node's network structure information and other heterogeneous information, and these vectors are generally used as feature vectors for further network analysis tasks such as node classification, link prediction, and community detection.
Summary of the invention
The object of the invention is to address the problems in the network analysis tasks described above by providing a network representation learning method based on capturing the breadth and depth of network nodes. The invention uses the similarity of node depth information and the similarity of node breadth information, combined with the similarity of node local information, to map nodes into a lower-dimensional feature space by embedding. The resulting node representations can then support network analysis tasks.
To achieve this object, the invention proposes a network representation learning method based on capturing network node breadth and depth, in which the node depth similarity and breadth similarity are obtained through two fully connected layers of different neural networks, and the node information is then fused according to the local neighbor information of each node. The invention comprises the following steps:
Step 1: Collect network data from the Internet, preprocess it, and store it in a local file;
Step 2: Construct the adjacency matrix A from the data;
Step 3: One-hot encode all nodes in A;
Step 4: Embed the nodes into a depth space and a breadth space, respectively;
Step 5: Raise A to the n-th power and use the result as the measure of node depth;
Step 6: Count the degree of each node in A and use it as the measure of node breadth;
Step 7: Capture the depth similarity between nodes through Laplacian eigenmaps and embed it in the depth space;
Step 8: Capture the breadth similarity between nodes through Laplacian eigenmaps and embed it in the breadth space;
Step 9: Concatenate the two embeddings of each node as the input to the final embedding space, and capture the similarity between nodes through negative sampling;
Step 10: Use the network as the embedding model for nodes and apply it to the node classification task.
The data collected in step 1 include at least the unique ID of each network node and the link information between nodes.
The adjacency matrix A in step 2 has dimension N×N, where N is the number of nodes. A[i, j] indicates whether a link exists between nodes i and j: A[i, j] = 1 if a link exists, and A[i, j] = 0 otherwise.
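As an illustration of step 2, the following minimal sketch builds a 0/1 adjacency matrix from a list of links. The function name `build_adjacency`, the integer node IDs, the toy edge list, and the choice of an undirected network are assumptions made for this example; the patent does not specify them.

```python
def build_adjacency(num_nodes, edges, directed=False):
    """Return an N x N 0/1 adjacency matrix as a list of lists."""
    A = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        A[i][j] = 1          # A[i][j] = 1 means a link between i and j exists
        if not directed:
            A[j][i] = 1      # mirror the link for an undirected network (assumption)
    return A

# Hypothetical toy network: a path 0-1-2-3 plus the extra link 1-3
edges = [(0, 1), (1, 2), (2, 3), (1, 3)]
A = build_adjacency(4, edges)
for row in A:
    print(row)
```

A dense list of lists is used only to keep the sketch dependency-free; for a real social network a sparse representation would be the natural choice.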
The dimension of the node one-hot encoding in step 3 equals the number of nodes in the network.
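One consequence of this encoding can be sketched as follows: feeding a one-hot vector into a fully connected layer simply selects one row of the layer's weight matrix, so the weight matrix acts as the embedding table of step 4. The layer below has no bias or activation, and the weight values are made up for illustration; both are assumptions, not details from the patent.

```python
def one_hot(index, num_nodes):
    """One-hot vector whose length equals the number of nodes."""
    vec = [0.0] * num_nodes
    vec[index] = 1.0
    return vec

def fully_connected(x, W):
    """Multiply x (length N) by W (N x d): a bias-free, activation-free layer (assumption)."""
    d = len(W[0])
    return [sum(x[i] * W[i][c] for i in range(len(x))) for c in range(d)]

# Hypothetical weights for N = 3 nodes and embedding dimension d = 2
W = [[0.1, 0.2],
     [0.3, 0.4],
     [0.5, 0.6]]

x = one_hot(1, 3)
embedding = fully_connected(x, W)
print(embedding)  # the same values as row 1 of W
```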
In step 5, the adjacency matrix A is raised to the k-th power; an element equal to 1 in the k-th power of the adjacency matrix marks a neighbor that the node can reach in k steps.
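A small sketch of this step is given below. Note that an entry of the k-th power of a 0/1 adjacency matrix counts the walks of length k between two nodes, so it can exceed 1; the sketch therefore binarizes the result, which is an assumption about how "elements equal to 1" is meant. The helper names are hypothetical.

```python
def matmul(X, Y):
    """Plain matrix product of two lists of lists."""
    n, m, p = len(X), len(Y), len(Y[0])
    return [[sum(X[i][t] * Y[t][j] for t in range(m)) for j in range(p)]
            for i in range(n)]

def matrix_power(A, k):
    """A^k for k >= 1 by repeated multiplication."""
    result = A
    for _ in range(k - 1):
        result = matmul(result, A)
    return result

def k_step_reachable(A, k):
    """Binarize A^k: 1 where at least one walk of length k exists (assumption)."""
    return [[1 if v > 0 else 0 for v in row] for row in matrix_power(A, k)]

# Hypothetical toy network: the path 0-1-2-3
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

R2 = k_step_reachable(A, 2)
for row in R2:
    print(row)
```

In this toy network the diagonal of `R2` is non-zero because a walk of length 2 can return to its starting node; whether such entries should be kept is left open here.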
Step 7 captures node depth similarity using Laplacian eigenmaps, calculated as follows:
[formula not reproduced in the source text]
Here l_m and l_n denote the depths of arbitrary nodes in the network, min|l_m - l_n| denotes the minimum difference in node depth in the network, and max|l_m - l_n| denotes the maximum difference in node depth in the network.
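For orientation only, a generic Laplacian eigenmaps objective has the shape sketched below: pairs of nodes with a large weight are pulled close together in the embedding space. The weight w_mn, built here by min-max normalizing the depth difference, is an assumption that merely reuses the symbols defined above; it should not be read as the patent's own formula.

```latex
\mathcal{L}^{(depth)}
  = \sum_{m=1}^{|V|} \sum_{n=1}^{|V|}
    w_{mn}\,\bigl\| f^{(depth)}(v_m) - f^{(depth)}(v_n) \bigr\|_2^2,
\qquad
w_{mn} = 1 - \frac{|l_m - l_n| - \min|l_m - l_n|}
                  {\max|l_m - l_n| - \min|l_m - l_n|}
```

Under this assumed weighting, two nodes with the smallest depth difference get w_mn = 1 and two nodes with the largest depth difference get w_mn = 0.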
Step 8 captures node breadth similarity using Laplacian eigenmaps, calculated as follows:
[formula not reproduced in the source text]
Step 9 captures the local similarity of nodes using negative sampling; specifically, it uses first-order and second-order proximity:
First-order proximity refers to directly adjacent nodes (1-hop neighbors), whose low-dimensional representations should be close;
Second-order proximity refers to nodes that share common neighbors (2-hop neighbors), whose low-dimensional representations should also be close;
The representations of non-neighbor nodes are required to be far apart. Non-neighbor node pairs are chosen by sampling, which is called negative sampling: for each pair of neighbor nodes, a small number (K pairs) of non-neighbor nodes are chosen as negative samples;
[formula not reproduced in the source text]
|V| is the total number of nodes; the other symbols of the formula denote the number of first-order neighbors of node i, the number of second-order neighbors of node i, and the degree d_v of node v.
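As a point of reference, a commonly used negative sampling objective for one neighbor pair (v_i, v_j) is sketched below. It is the standard skip-gram style loss, not a reconstruction of the patent's formula; the embedding vectors u and the sigmoid function σ are symbols introduced here for illustration, while the degree-based noise distribution with exponent 0.75 follows the sampling rule stated in claim 7.

```latex
\mathcal{L}_{ns}(v_i, v_j)
  = -\log \sigma\!\bigl(\mathbf{u}_j^{\top}\mathbf{u}_i\bigr)
    - \sum_{k=1}^{K} \mathbb{E}_{v_k \sim P_n(v)}
      \Bigl[\log \sigma\!\bigl(-\mathbf{u}_k^{\top}\mathbf{u}_i\bigr)\Bigr],
\qquad
P_n(v) \propto d_v^{0.75}
```

The first term pulls the representations of neighbors together; the second pushes the K sampled non-neighbors away.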
The quality of the node classification in step 10 is measured with Micro-F1 and Macro-F1:
Micro-F1: compute the overall Precision and Recall across all classes, then compute F1;
Macro-F1: compute the Precision and Recall of each class, compute F1 for each class, and finally average the F1 scores.
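Written out with the usual per-class counts of true positives (TP_c), false positives (FP_c), and false negatives (FN_c) over C classes, the two metrics take the following standard form; the count symbols are introduced here and do not appear in the original text.

```latex
P_{micro} = \frac{\sum_{c=1}^{C} TP_c}{\sum_{c=1}^{C} (TP_c + FP_c)},
\quad
R_{micro} = \frac{\sum_{c=1}^{C} TP_c}{\sum_{c=1}^{C} (TP_c + FN_c)},
\quad
\text{Micro-F1} = \frac{2\,P_{micro}\,R_{micro}}{P_{micro} + R_{micro}}

P_c = \frac{TP_c}{TP_c + FP_c},
\quad
R_c = \frac{TP_c}{TP_c + FN_c},
\quad
F1_c = \frac{2\,P_c\,R_c}{P_c + R_c},
\quad
\text{Macro-F1} = \frac{1}{C} \sum_{c=1}^{C} F1_c
```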
Brief description of the drawings
Fig. 1 is a flow chart of the method of the present invention.
Detailed description of embodiments
To achieve the object of the invention, the invention proposes a network representation learning method based on capturing network node breadth and depth, in which the node depth similarity and breadth similarity are obtained through two fully connected layers of different neural networks, and the node information is then fused according to the local neighbor information of each node. The invention comprises the following steps:
Step 1: Collect network data from the Internet, preprocess it, and store it in a local file;
Step 2: Construct the adjacency matrix A from the data;
Step 3: One-hot encode all nodes in A;
Step 4: Embed the nodes into a depth space and a breadth space, respectively;
Step 5: Raise A to the n-th power and use the result as the measure of node depth;
Step 6: Count the degree of each node in A and use it as the measure of node breadth;
Step 7: Capture the depth similarity between nodes through Laplacian eigenmaps and embed it in the depth space;
Step 8: Capture the breadth similarity between nodes through Laplacian eigenmaps and embed it in the breadth space;
Step 9: Concatenate the two embeddings of each node as the input to the final embedding space, and capture the similarity between nodes through negative sampling;
Step 10: Use the network as the embedding model for nodes and apply it to the node classification task.
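Step 6 in the list above can be sketched in a few lines: the degree of a node is the row sum of the 0/1 adjacency matrix. The second helper turns such a per-node measure into pairwise similarity weights by min-max normalizing the absolute differences; that weighting is a hypothetical choice for illustration, and the function names are not from the patent.

```python
def node_degrees(A):
    """Degree of each node = row sum of the 0/1 adjacency matrix."""
    return [sum(row) for row in A]

def similarity_weights(values):
    """Hypothetical pairwise weights in [0, 1]: 1 for the most similar pair,
    0 for the least similar pair, via min-max normalization of |x_m - x_n|."""
    n = len(values)
    diffs = [abs(values[m] - values[p])
             for m in range(n) for p in range(n) if m != p]
    lo, hi = min(diffs), max(diffs)
    span = (hi - lo) or 1  # avoid dividing by zero when all differences are equal
    return [[0.0 if m == p else 1 - (abs(values[m] - values[p]) - lo) / span
             for p in range(n)] for m in range(n)]

# Hypothetical toy network: node 0 linked to 1, 2, 3, plus the link 2-3
A = [[0, 1, 1, 1],
     [1, 0, 0, 0],
     [1, 0, 0, 1],
     [1, 0, 1, 0]]

deg = node_degrees(A)          # breadth measure of step 6
W = similarity_weights(deg)    # assumed breadth similarity between node pairs
print(deg)
print(W[2][3], W[0][1])
```

Nodes 2 and 3 have the same degree and therefore the highest weight, while nodes 0 and 1 differ the most and get weight 0.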
The data collected in step 1 include at least the unique ID of each network node and the link information between nodes.
The adjacency matrix A in step 2 has dimension N×N, where N is the number of nodes. A[i, j] indicates whether a link exists between nodes i and j: A[i, j] = 1 if a link exists, and A[i, j] = 0 otherwise.
The dimension of the node one-hot encoding in step 3 equals the number of nodes in the network.
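Because the one-hot input has one entry per node, each of the two embedding spaces can be viewed as a table with one row per node, and the splicing in step 9 is a concatenation of the two rows belonging to the same node. The sketch below illustrates this under that reading; the table contents and dimensions are invented for the example.

```python
# Hypothetical embedding tables with one row per node (|V| = 3)
depth_table = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]   # |V| x d_depth
breadth_table = [[1.0], [2.0], [3.0]]                # |V| x d_breadth

def node_representation(i):
    """Concatenate the depth and breadth embeddings of node i."""
    return depth_table[i] + breadth_table[i]

rep = node_representation(2)
print(rep)
```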
In step 5, the adjacency matrix A is raised to the k-th power; an element equal to 1 in the k-th power of the adjacency matrix marks a neighbor that the node can reach in k steps.
Step 7 captures node depth similarity using Laplacian eigenmaps, calculated as follows:
[formula not reproduced in the source text]
Here l_m and l_n denote the depths of arbitrary nodes in the network, min|l_m - l_n| denotes the minimum difference in node depth in the network, and max|l_m - l_n| denotes the maximum difference in node depth in the network.
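To make the idea concrete, the sketch below evaluates a generic Laplacian eigenmaps loss, the weighted sum of squared distances between node embeddings. It assumes a pairwise weight matrix is already available; the weights and embeddings are invented values, and the loss is the textbook form rather than the patent's exact formula.

```python
def laplacian_eigenmaps_loss(embeddings, weights):
    """Sum over ordered node pairs of w_mn * ||f_m - f_n||^2."""
    n = len(embeddings)
    total = 0.0
    for m in range(n):
        for p in range(n):
            sq_dist = sum((a - b) ** 2
                          for a, b in zip(embeddings[m], embeddings[p]))
            total += weights[m][p] * sq_dist
    return total

# Hypothetical similarity weights: nodes 0 and 1 are very similar,
# nodes 1 and 2 somewhat similar, nodes 0 and 2 not similar
weights = [[0.0, 1.0, 0.0],
           [1.0, 0.0, 0.5],
           [0.0, 0.5, 0.0]]

close = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0]]  # similar nodes 0 and 1 coincide
far = [[0.0, 0.0], [2.0, 0.0], [2.0, 0.0]]    # similar nodes 0 and 1 are apart

loss_close = laplacian_eigenmaps_loss(close, weights)
loss_far = laplacian_eigenmaps_loss(far, weights)
print(loss_close, loss_far)
```

The embedding that places highly weighted pairs close together yields the lower loss, which is the behavior a training procedure would push toward.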
Step 8 captures node breadth similarity using Laplacian eigenmaps, calculated as follows:
[formula not reproduced in the source text]
Step 9 captures the local similarity of nodes using negative sampling; specifically, it uses first-order and second-order proximity:
First-order proximity refers to directly adjacent nodes (1-hop neighbors), whose low-dimensional representations should be close;
Second-order proximity refers to nodes that share common neighbors (2-hop neighbors), whose low-dimensional representations should also be close;
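The two neighbor sets can be read off the adjacency matrix and its second power, as claim 7 describes. The sketch below follows that description, with two assumptions of its own: entries of the second power are treated as non-zero rather than exactly 1, and a node is not counted as its own second-order neighbor. The function names are hypothetical.

```python
def first_order_neighbors(A, i):
    """Columns holding a 1 in row i of the adjacency matrix."""
    return [k for k, v in enumerate(A[i]) if v == 1]

def second_order_neighbors(A, i):
    """Columns with a non-zero entry in row i of A^2, excluding i itself (assumption)."""
    n = len(A)
    row = [sum(A[i][t] * A[t][k] for t in range(n)) for k in range(n)]
    return [k for k, v in enumerate(row) if v > 0 and k != i]

# Hypothetical toy network: the path 0-1-2-3
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

print(first_order_neighbors(A, 1))   # nodes adjacent to node 1
print(second_order_neighbors(A, 1))  # nodes sharing a neighbor with node 1
```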
The representations of non-neighbor nodes are required to be far apart. Non-neighbor node pairs are chosen by sampling, which is called negative sampling: for each pair of neighbor nodes, a small number (K pairs) of non-neighbor nodes are chosen as negative samples;
[formula not reproduced in the source text]
|V| is the total number of nodes; the other symbols of the formula denote the number of first-order neighbors of node i, the number of second-order neighbors of node i, and the degree d_v of node v.
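The sampling itself can be sketched as below. The weighting by the 0.75 power of node degree is taken from claim 7; restricting the candidates to nodes that are neither v_i nor neighbors of v_i is one reading of "non-neighbor" and is an assumption, as are the function name and the toy network.

```python
import random

def sample_negatives(A, i, j, K, rng):
    """Draw K negative nodes for the neighbor pair (i, j), with probability
    proportional to degree ** 0.75, from nodes that are not neighbors of i."""
    degrees = [sum(row) for row in A]
    # Candidates: every node except i, j, and the other neighbors of i (assumption)
    candidates = [v for v in range(len(A))
                  if v != i and v != j and A[i][v] == 0]
    weights = [degrees[v] ** 0.75 for v in candidates]
    return rng.choices(candidates, weights=weights, k=K)

# Hypothetical toy network with links 0-1, 1-2, 2-3, 3-4, 1-3
A = [[0, 1, 0, 0, 0],
     [1, 0, 1, 1, 0],
     [0, 1, 0, 1, 0],
     [0, 1, 1, 0, 1],
     [0, 0, 0, 1, 0]]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
negatives = sample_negatives(A, 0, 1, 3, rng)
print(negatives)
```

Sampling is done with replacement here, so the same node may appear more than once among the K negatives.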
The quality of the node classification in step 10 is measured with Micro-F1 and Macro-F1:
Micro-F1: compute the overall Precision and Recall across all classes, then compute F1;
Macro-F1: compute the Precision and Recall of each class, compute F1 for each class, and finally average the F1 scores.
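The two metrics can be computed as in the following sketch for single-label classification. The function names and the example labels are made up for illustration; classes with no predictions or no true instances are given a precision or recall of 0, which is a convention chosen here.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def safe_div(a, b):
    return a / b if b else 0.0

def micro_macro_f1(y_true, y_pred):
    labels = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in labels}
    fp = {c: 0 for c in labels}
    fn = {c: 0 for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted class p gains a false positive
            fn[t] += 1   # true class t gains a false negative
    # Micro-F1: pool the counts of all classes, then compute one F1
    TP, FP, FN = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro = f1(safe_div(TP, TP + FP), safe_div(TP, TP + FN))
    # Macro-F1: compute F1 per class, then average
    per_class = [f1(safe_div(tp[c], tp[c] + fp[c]),
                    safe_div(tp[c], tp[c] + fn[c])) for c in labels]
    macro = sum(per_class) / len(per_class)
    return micro, macro

# Hypothetical true and predicted group labels for six users
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
micro, macro = micro_macro_f1(y_true, y_pred)
print(round(micro, 4), round(macro, 4))
```

Micro-F1 weights every user equally, while Macro-F1 weights every class equally, so the two diverge when class sizes or per-class accuracies differ.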
Specifically, for the working principle of the relevant functional modules of the system in this embodiment, refer to the corresponding description of the method embodiment; it is not repeated here.
The implementation described in the embodiments of the invention has the following beneficial effects: (1) the depth and breadth information of nodes is learned separately; (2) by fusing breadth, depth, and local information, a better representation of network nodes is learned.
The network representation learning method based on capturing network node breadth and depth provided by the invention has been described in detail above. The principle and implementations of the invention are set out herein, and the description of the above implementations is intended only to help in understanding the method and its core idea. For those skilled in the art, the specific implementation and scope of application may vary in accordance with the idea of the invention. In conclusion, the content of this specification should not be construed as limiting the invention.

Claims (8)

1. A social network user group classification method based on capturing network node breadth and depth, characterized by comprising the following steps:
Step 1: Collect network data from the Internet, preprocess it, and store it in a local file;
Step 2: Construct the adjacency matrix A from the network nodes, where A is a sparse matrix with |V| rows and |V| columns, V is the set of nodes in the network, |V| is the total number of nodes in the network, and A[i, j] indicates whether a link exists between node i and node j: A[i, j] = 1 if a link exists, and there is no link otherwise;
Step 3: One-hot encode all nodes in A;
Step 4: Embed the nodes into a depth space and a breadth space, respectively;
Step 5: Raise A to the N-th power, where N is a manually specified hyperparameter giving the farthest reachable distance considered for a node, and use the result as the measure of node depth;
Step 6: Count the degree of each node in A and use it as the measure of node breadth;
Step 7: Capture the depth similarity between nodes through Laplacian eigenmaps and embed it in the depth space;
Step 8: Capture the breadth similarity between nodes through Laplacian eigenmaps and embed it in the breadth space;
Step 9: Concatenate the two embeddings of each node as the input to the final embedding space, and capture the similarity between nodes through negative sampling;
Step 10: Use the learned neural network as the embedding model for social network nodes, output the low-dimensional representations of the social network nodes, that is, of the users in the social network, and use them for the user group classification task in the social network.
2. The network representation learning method based on capturing network node breadth and depth according to claim 1, characterized in that the data collected in step 1 include at least the unique ID of each network node and the link information between nodes.
3. The network representation learning method based on capturing network node breadth and depth according to claim 1, characterized in that the dimension of the node one-hot encoding in step 3 equals the number of nodes in the network.
4. The network representation learning method based on capturing network node breadth and depth according to claim 1, characterized in that in step 5 the adjacency matrix A is raised to the k-th power, and an element equal to 1 in the k-th power of the adjacency matrix marks a neighbor that the node can reach in k steps.
5. The network representation learning method based on capturing network node breadth and depth according to claim 1, characterized in that node depth similarity is captured using Laplacian eigenmaps in step 7, calculated as follows:
[formula not reproduced in the source text]
where l_m and l_n denote the depths of arbitrary nodes in the network; |V| denotes the total number of nodes in the network; a further symbol of the formula denotes the sum of a node's numbers of first-order and second-order neighbors; d_v denotes the degree of node v; f^(depth) denotes the depth mapping of a node; min|l_m - l_n| denotes the minimum difference in node depth in the network; and max|l_m - l_n| denotes the maximum difference in node depth in the network.
6. The network representation learning method based on capturing network node breadth and depth according to claim 1, characterized in that node breadth similarity is captured using Laplacian eigenmaps in step 8, calculated as follows:
[formula not reproduced in the source text]
where f^(breadth) denotes the breadth mapping of a node.
7. The social network user group classification method based on capturing network node breadth and depth according to claim 1, characterized in that the negative sampling method in step 9 captures the local similarity of nodes, using the first-order and second-order proximity of nodes, that is, the 1-hop and 2-hop neighbors of a node, to capture the relationship between a node and its neighbors. For first-order neighbors, the elements equal to 1 in the row of the adjacency matrix corresponding to target node v_i are selected, and each corresponding column k is a first-order neighbor node v_k of v_i; for second-order neighbors, the elements equal to 1 in the row of the second power of the adjacency matrix corresponding to target node v_i are selected, and each corresponding column k is a second-order neighbor node of v_i.
Non-neighbor nodes are required to be far apart. Using negative sampling, a small number (K pairs) of non-neighbor nodes are chosen as negative samples for each pair of neighbor nodes: for each pair of neighbor nodes (v_i, v_j), node v_i is fixed and nodes are sampled at random from all nodes (except v_j) according to the 0.75 power of node degree.
[formula not reproduced in the source text]
|V| is the total number of nodes; the other symbols of the formula denote the number of first-order neighbors of node i and the number of second-order neighbors of node i; d_v denotes the degree of node v.
8. The network representation learning method based on capturing network node breadth and depth according to claim 1, characterized in that the quality of the node classification in step 10 is measured with Micro-F1 and Macro-F1, where F1 is calculated as follows:
Micro-F1: compute the overall Precision and Recall across all classes, then compute F1;
Macro-F1: compute the Precision and Recall of each class, compute F1 for each class, and finally average the F1 scores.
CN201910441152.8A 2019-05-24 2019-05-24 A kind of social network user group classification method captured based on network node extensiveness and intensiveness Pending CN110163288A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910441152.8A CN110163288A (en) 2019-05-24 2019-05-24 A kind of social network user group classification method captured based on network node extensiveness and intensiveness

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910441152.8A CN110163288A (en) 2019-05-24 2019-05-24 A kind of social network user group classification method captured based on network node extensiveness and intensiveness

Publications (1)

Publication Number Publication Date
CN110163288A true CN110163288A (en) 2019-08-23

Family

ID=67632710

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910441152.8A Pending CN110163288A (en) 2019-05-24 2019-05-24 A kind of social network user group classification method captured based on network node extensiveness and intensiveness

Country Status (1)

Country Link
CN (1) CN110163288A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046907A (en) * 2019-11-02 2020-04-21 国网天津市电力公司 Semi-supervised convolutional network embedding method based on multi-head attention mechanism
CN111046907B (en) * 2019-11-02 2023-10-27 国网天津市电力公司 Semi-supervised convolutional network embedding method based on multi-head attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190823