CN109767008A - A meta-schema-based polymorphic feature learning method for highly heterogeneous networks - Google Patents


Publication number
CN109767008A
Authority
CN
China
Prior art keywords
node
meta schema
network
type
learning method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910017697.6A
Other languages
Chinese (zh)
Inventor
陈军
高熙越
朱文谦
詹泽行
杨帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-01-07
Filing date
2019-01-07
Publication date
2019-05-17
Application filed by Wuhan University WHU
Priority to CN201910017697.6A
Publication of CN109767008A
Current legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a meta-schema-based polymorphic feature learning method for highly heterogeneous networks. The method rests on the principle that nodes that are similar in the original network should also be similar in the embedding space, and comprises the following steps. First, random walk paths are extracted from the heterogeneous network using a meta-schema-based random walk. Then, a sliding window extracts the target similar node pairs from each path. Finally, polymorphic embeddings of the heterogeneous nodes are learned from the network influence matrix and a weighted skip-gram model. The resulting node embeddings can be fed into a supervised learning model to make predictions for tasks such as node classification. The invention learns node features from the connections of a heterogeneous network, and has the technical effects of simple computation, low time complexity, and diverse embedding representations.

Description

A meta-schema-based polymorphic feature learning method for highly heterogeneous networks
Technical field
The invention belongs to the field of network technology, and in particular relates to a meta-schema-based polymorphic feature learning method for heterogeneous networks.
Background art
With the rapid development of the Internet and the successive emergence of social networking sites, network science has become a subject of growing interest and plays an important role in big data research. Networks are everywhere in daily life: the World Wide Web in computing, power grids in the energy sector, airline networks in transportation, online friendship networks in the social domain, and so on. Solving traditional network problems such as node classification, link prediction, and clustering requires features of nodes, edges, communities, or other network elements. Network embedding algorithms provide a way to extract feature representations automatically from the relationships and attributes in a network. A heterogeneous network is the general term for a network that contains multiple types of nodes, and possibly multiple types of edges. How to extract features from heterogeneous networks has become an urgent problem.
Current network embedding techniques fall into three classes according to how they extract data: node-to-neighbors, node-to-node, and random walk. Methods of the first class ([document 1]) assume that the embedding of each node is a linear combination of the embeddings of its neighbor nodes. Methods of the second class ([document 2]) try to bring the embeddings of two nodes closer together the larger the weight between them. Methods of the third class ([document 3, 4]) extract similar node pairs by random walk and then learn node embeddings with the skip-gram algorithm. However, the efficiency and expressive power of these methods are limited on highly heterogeneous networks, and a flexible method for extracting features from highly heterogeneous networks is urgently needed.
[document 1] S. T. Roweis, Nonlinear dimensionality reduction by locally linear embedding, Science 290 (5500) (2000) 2323-2326. doi:10.1126/science.290.5500.2323
[document 2] M. Belkin, P. Niyogi, Laplacian eigenmaps and spectral techniques for embedding and clustering, in: Advances in Neural Information Processing Systems, 2002, pp. 585-591.
[document 3] A. Grover, J. Leskovec, node2vec: Scalable feature learning for networks, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '16, ACM, 2016, pp. 855-864. doi:10.1145/2939672.2939754
[document 4] Y. Dong, N. V. Chawla, A. Swami, metapath2vec: Scalable representation learning for heterogeneous networks, in: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '17, ACM, 2017, pp. 135-144. doi:10.1145/3097983.3098036
Summary of the invention
In view of the deficiencies of the prior art, the invention proposes a meta-schema-based polymorphic feature learning method for heterogeneous networks.
The technical solution adopted by the invention is a meta-schema-based polymorphic feature learning method for highly heterogeneous networks, characterized by comprising the following steps:
Step 1: Given a heterogeneous network $G = \{V, E, \phi_v, \phi_e\}$, where $V$ is the node set, $E$ is the edge set, $\phi_v$ is the node-type mapping function from $V$ to $T_v$, and $\phi_e$ is the edge-type mapping function from $E$ to $T_e$, with $T_v$ and $T_e$ being the sets of node types and edge types respectively; high heterogeneity requires $|T_v| + |T_e| \gg 1$. Initialize the sampling meta-schema $S$ according to the heterogeneous network $G$, and initialize the influence weight matrix $\alpha$;
Step 2: Starting from each node of the heterogeneous network, obtain $k$ random walk paths using the meta-schema-based random walk;
Step 3: Sample each random walk path with a sliding window of length $l$; the center node of every window is paired with each of the other nodes in that window, and these pairs are extracted as similar node pairs, the nodes within one window being regarded as similar;
Step 4: According to the influence weight matrix $\alpha$ and the set of similar node pairs snp, optimize the following objective function with the weighted skip-gram algorithm. The objective function reduces the distance between similar node pairs while increasing the distance between the remaining node pairs, and finally yields the network node features $X \in \mathbb{R}^{|V| \times d}$, where $|V|$ is the number of nodes in the heterogeneous network and $d$ is the dimension of the network node features, satisfying $d \ll |V|$.
[Objective function: equation not reproduced in the source text.]
Here $u_p$ is a node of a similar node pair, $u_n$ is a node of a remaining node pair, $t(u_p)$ and $t(v)$ denote the types of node $u_p$ and node $v$ respectively, neg denotes the remaining node pairs obtained by negative sampling, the entry of $\alpha$ indexed by $t(u_p)$ and $t(v)$ is the influence weight from node type $t(u_p)$ to node type $t(v)$, and $p(u \mid v)$ denotes the probability of observing node $v$ from node $u$;
Step 5: Feed the node embeddings into a subsequent supervised classifier to solve the practical task.
Compared with existing network embedding techniques and systems, the invention has the following advantages and beneficial effects:
1) Compared with the prior art, the invention proposes a new meta-schema-based similar node extraction technique for heterogeneous networks;
2) Compared with the prior art, by setting the influence matrix the invention can obtain multiple feature representations of a heterogeneous network, comprehensively capturing the similarities and differences between heterogeneous network nodes.
Brief description of the drawings
Fig. 1 is a flow chart of an embodiment of the invention;
Fig. 2 is a schematic diagram of the meta-schema in an embodiment of the invention.
Detailed description of the embodiments
To help those of ordinary skill in the art understand and implement the invention, the invention is described in further detail below with reference to the accompanying drawings. It should be understood that the embodiment described here serves only to illustrate and explain the invention and is not intended to limit it.
Referring to Fig. 1, the meta-schema-based polymorphic feature learning method for highly heterogeneous networks provided by the invention comprises the following steps:
Step 1: Given a heterogeneous network $G = \{V, E, \phi_v, \phi_e\}$, where $V$ is the node set, $E$ is the edge set, $\phi_v$ is the node-type mapping function from $V$ to $T_v$, and $\phi_e$ is the edge-type mapping function from $E$ to $T_e$, with $T_v$ and $T_e$ being the sets of node types and edge types respectively; high heterogeneity requires $|T_v| + |T_e| \gg 1$. Initialize the sampling meta-schema $S$ according to the heterogeneous network $G$, and initialize the influence weight matrix $\alpha$;
See Fig. 2. The meta-schema $S$ of this embodiment is a sub-network of the network schema in which the weighted out-degree of each node is specified to be 1.
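The out-degree constraint on the meta-schema can be illustrated with a minimal sketch. The representation below (a dictionary from node type to weighted outgoing edges) and the author/paper/venue types are hypothetical choices made only for illustration; the source does not prescribe a data structure or name any node types.

```python
# Hypothetical toy meta-schema (illustrative types, not from the source):
# node type -> {(edge type, target node type): transition weight}.
META_SCHEMA = {
    "author": {("writes", "paper"): 1.0},
    "paper": {("written_by", "author"): 0.5, ("published_in", "venue"): 0.5},
    "venue": {("publishes", "paper"): 1.0},
}


def weighted_out_degree(schema, node_type):
    """Sum of the weights on all edges leaving `node_type` in the schema."""
    return sum(schema[node_type].values())


def has_unit_out_degree(schema, tol=1e-9):
    """Check that every node type has weighted out-degree 1."""
    return all(abs(weighted_out_degree(schema, t) - 1.0) <= tol for t in schema)
```

A schema that fails this check could be rescaled per node type before use; whether the source intends such rescaling is not stated.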
Step 2: Starting from each node of the heterogeneous network, obtain $k$ random walk paths using the meta-schema-based random walk;
In this embodiment, obtaining the $k$ random walk paths with the meta-schema-based random walk comprises the following sub-steps:
Step 2.1: Find the neighbor node set ngb, in the heterogeneous network, of the node at the end of the random walk path;
Suppose the node at the end of the random walk path is $v$. First look up, in the meta-schema, the node types connected to the type of $v$ and the connecting edge types $t$; then find, in the actual network, the neighbor node set ngb of $v$ that satisfies $t$;
Step 2.2: According to the meta-schema, compute the transition probability from the node to the neighbor node set ngb. The specific formula for the transition probability is:
[Transition probability formula: equation not reproduced in the source text.]
The symbols of the formula denote, respectively, the transition probability in the meta-schema $S$ from node type $t(v_s)$ to edge type $t(e_{st})$, the set of all node types in the meta-schema $S$, and the set of all edge types leaving node type $t(v_s)$ in the meta-schema.
The transition probability is a value between 0 and 1 set in the meta-schema, and the target transition probability matrix at each node during the walk is an unnormalized matrix;
Step 2.3: Using the alias sampling method, sample the target transition node from the node's set of transition probabilities;
Step 2.4: Append the target transition node to the walk path, then return to Step 2.1; repeat until L in the objective function is less than a set critical value.
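Sub-steps 2.1 to 2.4 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the transition formula is not reproduced in the source, so each admissible neighbor is simply given the meta-schema weight of its (edge type, node type) pair, left unnormalized; the walk stops at a fixed `max_len`, which stands in for the stopping criterion of Step 2.4; and the toy graph, type names, and function names are hypothetical.

```python
import random


def build_alias(weights):
    """Build alias tables (Vose's method) for unnormalized positive weights."""
    n = len(weights)
    total = float(sum(weights))
    scaled = [w * n / total for w in weights]
    prob, alias = [0.0] * n, [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:  # leftovers equal 1 up to rounding error
        prob[i] = 1.0
    return prob, alias


def alias_draw(prob, alias, rng):
    """Draw one index in O(1) from the alias tables."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


def schema_walk(graph, node_type, schema, start, max_len, rng):
    """Random walk that only follows transitions allowed by the meta-schema."""
    path = [start]
    while len(path) < max_len:
        v = path[-1]
        allowed = schema.get(node_type[v], {})
        # Step 2.1: neighbors whose (edge type, node type) appears in the schema.
        ngb, weights = [], []
        for edge_type, u in graph.get(v, []):
            w = allowed.get((edge_type, node_type[u]))
            if w:
                ngb.append(u)
                weights.append(w)  # Step 2.2 (assumed): unnormalized schema weight
        if not ngb:
            break
        prob, alias = build_alias(weights)              # Step 2.3: alias sampling
        path.append(ngb[alias_draw(prob, alias, rng)])  # Step 2.4: extend the path
    return path


# Hypothetical toy data: node -> list of (edge type, neighbor).
SCHEMA = {
    "author": {("writes", "paper"): 1.0},
    "paper": {("written_by", "author"): 0.5, ("published_in", "venue"): 0.5},
    "venue": {("publishes", "paper"): 1.0},
}
NODE_TYPE = {"a1": "author", "a2": "author", "p1": "paper", "p2": "paper", "v1": "venue"}
GRAPH = {
    "a1": [("writes", "p1")],
    "a2": [("writes", "p1"), ("writes", "p2")],
    "p1": [("written_by", "a1"), ("written_by", "a2"), ("published_in", "v1")],
    "p2": [("written_by", "a2"), ("published_in", "v1")],
    "v1": [("publishes", "p1"), ("publishes", "p2")],
}
```

Calling `schema_walk(GRAPH, NODE_TYPE, SCHEMA, "a1", 10, random.Random(0))` returns one path; repeating the call k times from every node would give the k paths of Step 2. In a real implementation the alias tables would normally be cached per node rather than rebuilt at every step.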
Step 3: Sample each random walk path with a sliding window of length $l$; the center node of every window is paired with each of the other nodes in that window, and these pairs are extracted as similar node pairs, the nodes within one window being regarded as similar;
In this embodiment, the sliding window extracts only the target node pairs on the path, not all the node pairs in the meta-schema.
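The window extraction of Step 3 can be sketched as below. The sketch assumes a window centered on each node (so the window length is effectively odd) that is truncated at the ends of the path, and it models the "target node pairs" as an optional set of allowed (center type, other type) pairs; all three choices, and the function name `extract_pairs`, are assumptions made for illustration.

```python
def extract_pairs(path, window, node_type=None, target_pairs=None):
    """Pair each window center with the other nodes in its window.

    If `target_pairs` is given, keep only pairs whose
    (center type, other type) belongs to that set.
    """
    half = window // 2
    pairs = []
    for i, center in enumerate(path):
        lo, hi = max(0, i - half), min(len(path), i + half + 1)
        for j in range(lo, hi):
            if j == i:
                continue
            other = path[j]
            if target_pairs is not None:
                if (node_type[center], node_type[other]) not in target_pairs:
                    continue
            pairs.append((center, other))
    return pairs
```

For example, with the path `["a", "b", "c", "d"]` and a window of length 3, node `b` is paired with `a` and `c`, while the end nodes `a` and `d` each contribute a single pair.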
Step 4: According to the influence weight matrix $\alpha$ and the set of similar node pairs snp, optimize the following objective function with the weighted skip-gram algorithm. The objective function reduces the distance between similar node pairs while increasing the distance between the remaining node pairs, and finally yields the network node features $X \in \mathbb{R}^{|V| \times d}$, where $|V|$ is the number of nodes in the heterogeneous network and $d$ is the dimension of the network node features, satisfying $d \ll |V|$.
[Objective function: equation not reproduced in the source text.]
Here $u_p$ is a node of a similar node pair, $u_n$ is a node of a remaining node pair, $t(u_p)$ and $t(v)$ denote the types of node $u_p$ and node $v$ respectively, neg denotes the remaining node pairs obtained by negative sampling, the entry of $\alpha$ indexed by $t(u_p)$ and $t(v)$ is the influence weight from node type $t(u_p)$ to node type $t(v)$, and $p(u \mid v)$ denotes the probability of observing node $v$ from node $u$;
In this embodiment, the influence weight takes values between 0 and 1: 0 indicates small influence between the node types and greater separation, while 1 indicates large influence between the node types and less separation. The objective function of the weighted skip-gram algorithm sets the similarity target of a node pair to the influence weight of the pair's node types.
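Because the objective function itself is not reproduced in this text, the following is only one plausible reading of it, offered as a hedged sketch. It assumes $p(u \mid v) = \sigma(X(u)^T \theta(v))$ as stated in claim 7, treats the influence weight as the target of a cross-entropy term for each similar pair, and gives each negatively sampled pair a target of 0. The exact form used by the patent may differ, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pair_loss(x_up, theta_v, neg_xs, alpha):
    """Assumed loss for one similar pair (u_p, v) plus its negative samples.

    x_up    : embedding X(u_p) of the positive node
    theta_v : auxiliary embedding theta(v) of the context node
    neg_xs  : embeddings X(u_n) of negatively sampled nodes
    alpha   : influence weight for the type pair (t(u_p), t(v)), in [0, 1]
    """
    p = sigmoid(dot(x_up, theta_v))
    # Cross-entropy with the influence weight as the similarity target.
    loss = -(alpha * math.log(p) + (1.0 - alpha) * math.log(1.0 - p))
    # Negative samples are pushed toward probability 0.
    for x_un in neg_xs:
        loss -= math.log(1.0 - sigmoid(dot(x_un, theta_v)))
    return loss
```

Under this reading, a weight of 1 pulls the two embeddings together, as in ordinary skip-gram with negative sampling, while a weight of 0 pushes them apart, which matches the description of 0 as "greater separation". Summing `pair_loss` over all similar pairs and minimizing by stochastic gradient descent would give the features $X$.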
Step 5: Feed the node embeddings into a subsequent supervised classifier to solve practical tasks such as node classification, link prediction, and clustering;
In this embodiment, the classifier is one of logistic regression, a support vector machine, or a neural network;
Node classification predicts the labels of nodes whose labels are missing from the nodes whose labels are known; the node features learned by skip-gram are fed into the supervised classifier as input features;
Link prediction feeds the features of node pairs into the supervised classifier as input features; the feature $f(u, v)$ of a node pair $(u, v)$ can be fused in one of the following ways:
(1) Mean fusion: $f(u, v)_i = (f(u)_i + f(v)_i) / 2$;
(2) Hadamard fusion: $f(u, v)_i = f(u)_i \cdot f(v)_i$;
(3) Weighted-L1 fusion: $f(u, v)_i = |f(u)_i - f(v)_i|$;
(4) Weighted-L2 fusion: $f(u, v)_i = |f(u)_i - f(v)_i|^2$;
(5) Concatenation fusion: $f(u, v) = \mathrm{concatenate}(f(u), f(v))$;
Here $u$ and $v$ denote two nodes in the heterogeneous network, $f(u)$ and $f(v)$ denote the feature embeddings of node $u$ and node $v$, $f(u)_i$ and $f(v)_i$ denote the $i$-th elements of the two embedding vectors, and the concatenate function joins the two vectors $f(u)$ and $f(v)$.
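The five fusion operators can be written out directly. This is a small sketch on plain Python lists; the function names are hypothetical, and weighted-L2 is taken to be the squared element-wise difference, following the operator of the same name in node2vec ([document 3]).

```python
def mean_fusion(fu, fv):
    return [(a + b) / 2 for a, b in zip(fu, fv)]


def hadamard_fusion(fu, fv):
    return [a * b for a, b in zip(fu, fv)]


def weighted_l1_fusion(fu, fv):
    return [abs(a - b) for a, b in zip(fu, fv)]


def weighted_l2_fusion(fu, fv):
    # Squared difference, as in node2vec (assumed here).
    return [abs(a - b) ** 2 for a, b in zip(fu, fv)]


def concat_fusion(fu, fv):
    # Output has length 2d; the other four operators keep length d.
    return list(fu) + list(fv)
```

The first four operators are symmetric in $u$ and $v$, so they suit undirected link prediction; concatenation depends on the order of the pair and doubles the input dimension of the classifier.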
It should be understood that the parts not elaborated in this specification belong to the prior art.
It should be understood that the above description of the preferred embodiment is relatively detailed and should not therefore be regarded as limiting the scope of patent protection of the invention. Those of ordinary skill in the art, under the teaching of the invention and without departing from the scope protected by the claims, may make substitutions or variations, all of which fall within the protection scope of the invention. The scope of protection claimed shall be determined by the appended claims.

Claims (9)

1. A meta-schema-based polymorphic feature learning method for highly heterogeneous networks, characterized by comprising the following steps:
Step 1: Given a heterogeneous network $G = \{V, E, \phi_v, \phi_e\}$, where $V$ is the node set, $E$ is the edge set, $\phi_v$ is the node-type mapping function from $V$ to $T_v$, and $\phi_e$ is the edge-type mapping function from $E$ to $T_e$, with $T_v$ and $T_e$ being the sets of node types and edge types respectively; high heterogeneity requires $|T_v| + |T_e| \gg 1$; initialize the sampling meta-schema $S$ according to the heterogeneous network $G$, and initialize the influence weight matrix $\alpha$;
Step 2: Starting from each node of the heterogeneous network, obtain $k$ random walk paths using the meta-schema-based random walk;
Step 3: Sample each random walk path with a sliding window of length $l$; the center node of every window is paired with each of the other nodes in that window, and these pairs are extracted as similar node pairs, the nodes within one window being regarded as similar;
Step 4: According to the influence weight matrix $\alpha$ and the set of similar node pairs snp, optimize the following objective function with the weighted skip-gram algorithm, the objective function reducing the distance between similar node pairs while increasing the distance between the remaining node pairs, finally yielding the network node features $X \in \mathbb{R}^{|V| \times d}$, where $|V|$ is the number of nodes in the heterogeneous network and $d$ is the dimension of the network node features, satisfying $d \ll |V|$;
[Objective function: equation not reproduced in the source text.]
Here $u_p$ is a node of a similar node pair, $u_n$ is a node of a remaining node pair, $t(u_p)$ and $t(v)$ denote the types of node $u_p$ and node $v$ respectively, neg denotes the remaining node pairs obtained by negative sampling, the entry of $\alpha$ indexed by $t(u_p)$ and $t(v)$ is the influence weight from node type $t(u_p)$ to node type $t(v)$, and $p(u \mid v)$ denotes the probability of observing node $v$ from node $u$;
Step 5: Feed the node embeddings into a subsequent supervised classifier to solve the practical task.
2. The meta-schema-based polymorphic feature learning method for highly heterogeneous networks according to claim 1, characterized in that: in Step 1, the meta-schema $S$ is a sub-network of the network schema in which the weighted out-degree of each node is specified to be 1.
3. The meta-schema-based polymorphic feature learning method for highly heterogeneous networks according to claim 1, characterized in that: in Step 2, obtaining the $k$ random walk paths using the meta-schema-based random walk comprises the following sub-steps:
Step 2.1: Find the neighbor node set ngb, in the heterogeneous network, of the node at the end of the random walk path;
Step 2.2: According to the meta-schema, compute the transition probability from the node to the neighbor node set ngb, the specific formula for the transition probability being:
[Transition probability formula: equation not reproduced in the source text.]
The symbols of the formula denote, respectively, the transition probability in the meta-schema $S$ from node type $t(v_s)$ to edge type $t(e_{st})$, the set of all node types in the meta-schema $S$, and the set of all edge types leaving node type $t(v_s)$ in the meta-schema;
The transition probability is a value between 0 and 1 set in the meta-schema, and the target transition probability matrix at each node during the walk is an unnormalized matrix;
Step 2.3: Using the alias sampling method, sample the target transition node from the node's set of transition probabilities;
Step 2.4: Append the target transition node to the walk path, then return to Step 2.1; repeat until L in the objective function is less than a set critical value.
4. The meta-schema-based polymorphic feature learning method for highly heterogeneous networks according to claim 1, characterized in that: in Step 3, the sliding window extracts only the target node pairs on the path, not all the node pairs in the meta-schema.
5. The meta-schema-based polymorphic feature learning method for highly heterogeneous networks according to claim 1, characterized in that: in Step 4, the influence weight takes values between 0 and 1; 0 indicates small influence between the node types and greater separation; 1 indicates large influence between the node types and less separation.
6. The meta-schema-based polymorphic feature learning method for highly heterogeneous networks according to claim 1, characterized in that: in Step 4, the objective function of the weighted skip-gram algorithm sets the similarity target of a node pair to the influence weight of the pair's node types.
7. The meta-schema-based polymorphic feature learning method for highly heterogeneous networks according to claim 1, characterized in that: in Step 4, the probability of observing node $v$ from node $u$ is $p(u \mid v) = \sigma(X(u)^T \theta(v))$, where $X(u)$ denotes the feature embedding of node $u$, $\theta(v)$ denotes the auxiliary feature embedding of node $v$, and $\sigma$ is the sigmoid function.
8. The meta-schema-based polymorphic feature learning method for highly heterogeneous networks according to claim 1, characterized in that: in Step 5, the classifier is one of logistic regression, a support vector machine, or a neural network.
9. The meta-schema-based polymorphic feature learning method for highly heterogeneous networks according to any one of claims 1 to 8, characterized in that: in Step 5, the practical tasks include node classification, link prediction, and clustering;
Node classification predicts the labels of nodes whose labels are missing from the nodes whose labels are known; the node features learned by skip-gram are fed into the supervised classifier as input features;
Link prediction feeds the features of node pairs into the supervised classifier as input features; the feature $f(u, v)$ of a node pair $(u, v)$ can be fused in one of the following ways:
(1) Mean fusion: $f(u, v)_i = (f(u)_i + f(v)_i) / 2$;
(2) Hadamard fusion: $f(u, v)_i = f(u)_i \cdot f(v)_i$;
(3) Weighted-L1 fusion: $f(u, v)_i = |f(u)_i - f(v)_i|$;
(4) Weighted-L2 fusion: $f(u, v)_i = |f(u)_i - f(v)_i|^2$;
(5) Concatenation fusion: $f(u, v) = \mathrm{concatenate}(f(u), f(v))$;
Here $u$ and $v$ denote two nodes in the heterogeneous network, $f(u)$ and $f(v)$ denote the feature embeddings of node $u$ and node $v$, $f(u)_i$ and $f(v)_i$ denote the $i$-th elements of the two embedding vectors, and the concatenate function joins the two vectors $f(u)$ and $f(v)$.
CN201910017697.6A 2019-01-07 2019-01-07 A meta-schema-based polymorphic feature learning method for highly heterogeneous networks Pending CN109767008A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910017697.6A CN109767008A (en) 2019-01-07 2019-01-07 A meta-schema-based polymorphic feature learning method for highly heterogeneous networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910017697.6A CN109767008A (en) 2019-01-07 2019-01-07 A meta-schema-based polymorphic feature learning method for highly heterogeneous networks

Publications (1)

Publication Number Publication Date
CN109767008A true CN109767008A (en) 2019-05-17

Family

ID=66453516

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910017697.6A Pending CN109767008A (en) 2019-01-07 2019-01-07 A meta-schema-based polymorphic feature learning method for highly heterogeneous networks

Country Status (1)

Country Link
CN (1) CN109767008A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507244A (en) * 2019-09-16 2021-03-16 腾讯科技(深圳)有限公司 Social data recommendation method and device, distributed computing cluster and storage medium
CN112507244B (en) * 2019-09-16 2023-09-26 腾讯科技(深圳)有限公司 Social data recommendation method and device, distributed computing cluster and storage medium
CN111325326A (en) * 2020-02-21 2020-06-23 北京工业大学 Link prediction method based on heterogeneous network representation learning
CN111400560A (en) * 2020-03-10 2020-07-10 支付宝(杭州)信息技术有限公司 Method and system for predicting based on heterogeneous graph neural network model
CN111581488A (en) * 2020-05-14 2020-08-25 上海商汤智能科技有限公司 Data processing method and device, electronic equipment and storage medium
CN111581488B (en) * 2020-05-14 2023-08-04 上海商汤智能科技有限公司 Data processing method and device, electronic equipment and storage medium
CN112561688A (en) * 2020-12-21 2021-03-26 第四范式(北京)技术有限公司 Credit card overdue prediction method and device based on graph embedding and electronic equipment

Similar Documents

Publication Publication Date Title
CN109767008A (en) A meta-schema-based polymorphic feature learning method for highly heterogeneous networks
Bansal et al. Zero-shot object detection
CN112257066B (en) Malicious behavior identification method and system for weighted heterogeneous graph and storage medium
CN101950284B (en) Chinese word segmentation method and system
CN108595708A (en) A kind of exception information file classification method of knowledge based collection of illustrative plates
CN108388651A (en) A kind of file classification method based on the kernel of graph and convolutional neural networks
CN103425996B (en) A kind of large-scale image recognition methods of parallel distributed
CN108628970A (en) A kind of biomedical event joint abstracting method based on new marking mode
CN106815310A (en) A kind of hierarchy clustering method and system to magnanimity document sets
Meena et al. Image-based sentiment analysis using InceptionV3 transfer learning approach
Agrawal et al. Scalable, semi-supervised extraction of structured information from scientific literature
CN106127260A (en) A kind of multi-source data fuzzy clustering algorithm of novelty
Petkos et al. Graph-based multimodal clustering for social multimedia
CN105656692B (en) Area monitoring method based on more example Multi-label learnings in wireless sensor network
CN109033304B (en) Multi-modal retrieval method based on online deep topic model
CN105337842B (en) A kind of rubbish mail filtering method unrelated with content
Boomija et al. Comparison of partition based clustering algorithms
CN109002561A (en) Automatic document classification method, system and medium based on sample keyword learning
Li et al. Inferring user profiles in online social networks based on convolutional neural network
Castellano et al. Classification of data streams by incremental semi-supervised fuzzy clustering
Gao et al. Semi-supervised graph embedding for multi-label graph node classification
Yu et al. Social group suggestion from user image collections
Rehman et al. Building multi-resolution event-enriched maps from social data
Wang et al. IdeaGraph: turning data into human insights for collective intelligence
Zhang et al. Federated model decomposition with private vocabulary for text classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190517