CN109767008A - A meta-schema-based polymorphic feature learning method for highly heterogeneous networks - Google Patents
- Publication number
- CN109767008A (application CN201910017697.6A)
- Authority
- CN
- China
- Prior art keywords
- node
- meta schema
- network
- type
- learning method
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a meta-schema-based polymorphic feature learning method for highly heterogeneous networks. It rests on the principle that nodes that are similar in the original network should also be similar in the embedding space, and comprises the following steps. First, random-walk paths are extracted from the heterogeneous network using a meta-schema-based random walk. Next, a sliding window extracts target-specific similar node pairs from the paths. Finally, polymorphic embeddings of the heterogeneous nodes are learned from the network influence matrix and a weighted skip-gram model. The resulting node embeddings can be fed into a supervised learning model to make predictions for tasks such as node classification. The invention learns node features from the connections of a heterogeneous network, and offers simple computation, low time complexity, and diverse embedding representations.
Description
Technical field
The invention belongs to the field of network technology, and in particular relates to a meta-schema-based polymorphic feature learning method for heterogeneous networks.
Background technique
With the rapid development of the Internet and the successive emergence of social networking sites, network science has become a subject of growing interest and plays an important role in big-data research. Networks are everywhere in daily life: the World Wide Web in computing, power grids in the energy sector, airline networks in transportation, online friendship networks in social media, and so on. When people try to solve traditional network problems such as node classification, link prediction, and clustering, they urgently need features of network nodes, edges, communities, or other network elements. Network embedding algorithms provide a way to extract feature representations automatically from the relations and attributes in a network. A heterogeneous network is the general term for a network that contains multiple types of nodes, and possibly multiple types of edges. How to extract features from heterogeneous networks has become a pressing problem.
Current network embedding techniques fall into three classes according to how they extract data: node-to-neighbors, node-to-node, and random walk. Methods of the first class ([document 1]) assume that the embedding of each node is a linear combination of the embeddings of its neighbors. Methods of the second class ([document 2]) try to bring the embeddings of two nodes closer together the larger the weight between them. Methods of the third class ([document 3,4]) extract similar node pairs by random walk and then learn node embeddings with the skip-gram algorithm. However, the efficiency and expressive power of these methods are limited on highly heterogeneous networks, and a flexible method for extracting features from such networks is urgently needed.
[document 1] S. T. Roweis, Nonlinear dimensionality reduction by locally linear embedding, Science 290 (5500) (2000) 2323-2326. doi:10.1126/science.290.5500.2323
[document 2] M. Belkin, P. Niyogi, Laplacian eigenmaps and spectral techniques for embedding and clustering, in: Advances in Neural Information Processing Systems, 2002, pp. 585-591
[document 3] A. Grover, J. Leskovec, node2vec: Scalable feature learning for networks, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '16, ACM, 2016, pp. 855-864. doi:10.1145/2939672.2939754
[document 4] Y. Dong, N. V. Chawla, A. Swami, metapath2vec: Scalable representation learning for heterogeneous networks, in: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '17, ACM, 2017, pp. 135-144. doi:10.1145/3097983.3098036
Summary of the invention
In view of the deficiencies of the prior art, the invention proposes a meta-schema-based polymorphic feature learning method for heterogeneous networks.
The technical solution adopted by the invention is a meta-schema-based polymorphic feature learning method for highly heterogeneous networks, characterized by comprising the following steps:
Step 1: Given a heterogeneous network G = {V, E, φ_v, φ_e}, where V is the node set, E is the edge set, φ_v is the node-type mapping function from V to T_v, φ_e is the edge-type mapping function from E to T_e, and T_v and T_e are the sets of node types and edge types respectively; high heterogeneity requires |T_v| + |T_e| ≫ 1. Initialize the sampling meta-schema S according to the heterogeneous network G, and initialize the influence weight matrix α;
Step 2: Starting from each node of the heterogeneous network, obtain k random-walk paths using the meta-schema-based random walk;
Step 3: Sample the random-walk paths with a sliding window of length l; the center node of each window is paired with each of the other nodes in the window and the pairs are extracted as similar node pairs, so that all nodes within one window are regarded as similar;
Step 4: According to the influence weight matrix α and the similar node pairs snp, optimize the following objective function with the weighted skip-gram algorithm; the objective function reduces the distance between similar node pairs while increasing the distance between the remaining node pairs, and finally yields the network node features X ∈ R^{|V|×d}, where |V| is the number of nodes in the heterogeneous network and d is the dimension of the node features, satisfying d ≪ |V|.
Here u_p is a node of a similar node pair, u_n is a node of a remaining node pair, t(u_p) and t(v) denote the types of node u_p and node v respectively, neg denotes the remaining node pairs obtained by negative sampling, the entry of α for the type pair (t(u_p), t(v)) gives the influence weight from node type t(u_p) to node type t(v), and p(u | v) denotes the probability of observing node v from node u;
Step 5: Feed the node embeddings into a subsequent supervised classifier to solve the practical task.
Compared with existing network embedding techniques and systems, the invention has the following advantages and beneficial effects:
1) compared with the prior art, the invention proposes a new meta-schema-based technique for extracting similar nodes from heterogeneous networks;
2) compared with the prior art, by setting the influence matrix the invention can obtain multiple feature representations of a heterogeneous network, comprehensively capturing both the similarities and the differences between heterogeneous network nodes.
Description of the drawings
Fig. 1 is a flow chart of an embodiment of the invention;
Fig. 2 is a schematic diagram of the meta-schema in an embodiment of the invention.
Specific embodiment
To help those of ordinary skill in the art understand and implement the invention, it is described in further detail below with reference to the drawings. It should be understood that the embodiments described here serve only to illustrate and explain the invention and are not intended to limit it.
Referring to Fig. 1, the meta-schema-based polymorphic feature learning method for highly heterogeneous networks provided by the invention comprises the following steps:
Step 1: Given a heterogeneous network G = {V, E, φ_v, φ_e}, where V is the node set, E is the edge set, φ_v is the node-type mapping function from V to T_v, φ_e is the edge-type mapping function from E to T_e, and T_v and T_e are the sets of node types and edge types respectively; high heterogeneity requires |T_v| + |T_e| ≫ 1. Initialize the sampling meta-schema S according to the heterogeneous network G, and initialize the influence weight matrix α;
Referring to Fig. 2, the meta-schema S of this embodiment is a sub-network of the network schema, and the weighted out-degree of each node is specified to be 1.
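The inputs of Step 1 can be sketched as plain dictionaries. The following is a minimal illustration, not the patent's implementation: the toy node and edge names, the helper names `build_adjacency` and `normalize_meta_schema`, the undirected treatment of edges, and the all-ones initial value of the influence weights are assumptions made for the example. It shows one way to enforce the requirement that the weighted out-degree of every meta-schema node equals 1.

```python
from collections import defaultdict

# Hypothetical toy network: node -> type (phi_v) and typed edges (phi_e).
node_type = {"a1": "author", "a2": "author",
             "p1": "paper", "p2": "paper", "v1": "venue"}
edges = [("a1", "p1", "writes"), ("a2", "p1", "writes"),
         ("a2", "p2", "writes"), ("p1", "v1", "published_in"),
         ("p2", "v1", "published_in")]


def build_adjacency(edge_list):
    """Index neighbors by (node, edge type); edges are assumed undirected."""
    adj = defaultdict(list)
    for u, v, etype in edge_list:
        adj[(u, etype)].append(v)
        adj[(v, etype)].append(u)
    return adj


def normalize_meta_schema(raw):
    """Scale each node type's outgoing weights so that they sum to 1."""
    schema = {}
    for ntype, out in raw.items():
        total = sum(out.values())
        if total <= 0:
            raise ValueError(f"node type {ntype!r} has no positive weight")
        schema[ntype] = {etype: w / total for etype, w in out.items()}
    return schema


# Hypothetical raw weights: node type -> {edge type: weight}.
raw_schema = {"author": {"writes": 1.0},
              "paper": {"writes": 1.0, "published_in": 3.0},
              "venue": {"published_in": 2.0}}

adjacency = build_adjacency(edges)
meta_schema = normalize_meta_schema(raw_schema)

# Influence weight matrix alpha over node-type pairs; the initial value 1.0
# is an assumption, since the text only says that alpha is initialized.
types = sorted(meta_schema)
alpha = {(s, t): 1.0 for s in types for t in types}
```

Storing the meta-schema per node type keeps the lookup in the later walk step to a single dictionary access.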
Step 2: Starting from each node of the heterogeneous network, obtain k random-walk paths using the meta-schema-based random walk;
In this embodiment, obtaining k random-walk paths using the meta-schema-based random walk comprises the following sub-steps:
Step 2.1: Find the neighbor node set ngb, in the heterogeneous network, of the node at the end of the random-walk path;
Suppose the end node of the random-walk path is v. First look up in the meta-schema the node types and edge types t connected to the type of v, then find in the actual network the neighbor node set ngb of v that satisfies t;
Step 2.2: Compute, according to the meta-schema, the transition probability from the node to the neighbor node set ngb; the specific formula for the transition probability is:
Here the formula uses the transition probability in meta-schema S from node type t(v_s) to edge type t(e_st), the set of all node types in meta-schema S, and the set of all edge types leaving node type t(v_s) in the meta-schema.
The transition probability is a preset value between 0 and 1 in the meta-schema, and the target transition probability matrix at each node during the walk is an unnormalized matrix;
Step 2.3: Using the alias sampling method, sample the target transition node from the node's transition probability set;
Step 2.4: Append the target transition node to the walk path, then return to Step 2.1 and repeat, stopping when L in the objective function falls below a preset threshold.
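Sub-steps 2.1 to 2.4 can be sketched as follows. This is an illustration under stated assumptions rather than the patent's procedure: the transition formula is not reproduced in the text, so each candidate neighbor simply receives the meta-schema weight of the edge type that reaches it, and the walk stops at a fixed number of nodes instead of the threshold criterion of Step 2.4. The alias tables follow the standard Vose construction; all names and the toy data are hypothetical.

```python
import random


def build_alias_table(weights):
    """Vose's alias method: O(n) setup for O(1) draws from unnormalized weights."""
    n = len(weights)
    total = sum(weights)
    scaled = [w * n / total for w in weights]
    prob, alias = [1.0] * n, list(range(n))
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        lo, hi = small.pop(), large.pop()
        prob[lo], alias[lo] = scaled[lo], hi
        scaled[hi] -= 1.0 - scaled[lo]      # hi donates mass to fill lo's slot
        (small if scaled[hi] < 1.0 else large).append(hi)
    return prob, alias


def alias_draw(prob, alias, rng):
    """Draw one index according to the weights encoded in the alias table."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


def schema_walk(start, length, node_type, adjacency, meta_schema, rng):
    """Walk until `length` nodes are collected or no neighbor is allowed."""
    path = [start]
    while len(path) < length:
        v = path[-1]
        candidates, weights = [], []
        # Step 2.1: neighbors of v reachable through schema-allowed edge types.
        for etype, w in meta_schema.get(node_type[v], {}).items():
            for nb in adjacency.get((v, etype), []):
                candidates.append(nb)
                weights.append(w)  # Step 2.2 (assumed form): unnormalized weight
        if not candidates:
            break
        prob, alias = build_alias_table(weights)                # Step 2.3
        path.append(candidates[alias_draw(prob, alias, rng)])   # Step 2.4
    return path


# Hypothetical toy data.
node_type = {"a1": "author", "p1": "paper", "v1": "venue"}
adjacency = {("a1", "writes"): ["p1"], ("p1", "writes"): ["a1"],
             ("p1", "published_in"): ["v1"], ("v1", "published_in"): ["p1"]}
meta_schema = {"author": {"writes": 1.0},
               "paper": {"writes": 0.25, "published_in": 0.75},
               "venue": {"published_in": 1.0}}

walk = schema_walk("a1", 6, node_type, adjacency, meta_schema, random.Random(0))
```

In practice the alias table of each node would be built once and cached, since it depends only on the node and the meta-schema; the sketch rebuilds it at every step for brevity.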
Step 3: Sample the random-walk paths with a sliding window of length l; the center node of each window is paired with each of the other nodes in the window and the pairs are extracted as similar node pairs, so that all nodes within one window are regarded as similar;
In this embodiment, the sliding window extracts only the target node pairs on the path, not all node pairs in the meta-schema.
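The window extraction of Step 3 can be illustrated with a short sketch. It assumes a window of `window` nodes centered on each path position and truncated at the ends of the path; the optional `keep` predicate is a hypothetical way to restrict output to target node pairs, since the text does not spell out how targets are selected.

```python
def window_pairs(path, window, keep=None):
    """Pair each center node with the other nodes in its sliding window.

    path:   list of node ids from one random walk
    window: window length in nodes (the l of Step 3), assumed odd
    keep:   optional predicate (center, context) -> bool selecting target pairs
    """
    half = window // 2
    pairs = []
    for i, center in enumerate(path):
        lo, hi = max(0, i - half), min(len(path), i + half + 1)
        for j in range(lo, hi):
            if j != i and (keep is None or keep(center, path[j])):
                pairs.append((center, path[j]))
    return pairs


# Hypothetical walk over authors (a*), papers (p*) and a venue (v1).
path = ["a1", "p1", "v1", "p2", "a2"]
all_pairs = window_pairs(path, 3)
author_paper = window_pairs(
    path, 3, keep=lambda c, x: c.startswith("a") and x.startswith("p"))
```

With a window of three nodes, each interior node contributes two pairs and each end node contributes one.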
Step 4: According to the influence weight matrix α and the similar node pairs snp, optimize the following objective function with the weighted skip-gram algorithm; the objective function reduces the distance between similar node pairs while increasing the distance between the remaining node pairs, and finally yields the network node features X ∈ R^{|V|×d}, where |V| is the number of nodes in the heterogeneous network and d is the dimension of the node features, satisfying d ≪ |V|.
Here u_p is a node of a similar node pair, u_n is a node of a remaining node pair, t(u_p) and t(v) denote the types of node u_p and node v respectively, neg denotes the remaining node pairs obtained by negative sampling, the entry of α for the type pair (t(u_p), t(v)) gives the influence weight from node type t(u_p) to node type t(v), and p(u | v) denotes the probability of observing node v from node u;
In this embodiment, the influence weight ranges from 0 to 1: 0 indicates low influence between the node types and therefore strong separation; 1 indicates high influence between the node types and therefore weak separation. The objective function of the weighted skip-gram algorithm sets the similarity target of a node pair to the influence weight of the pair's node types.
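Because the objective function itself is not reproduced in this text, the following sketch shows only one plausible reading of Step 4, and should not be taken as the patent's formula. It assumes p(u | v) = sigmoid(X(u) · θ(v)), as stated in claim 7, and uses a cross-entropy loss whose target is the influence weight for a similar pair and 0 for a negatively sampled pair. The function names, the toy vectors, and the learning rate are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pair_loss(x_u, theta_v, target, eps=1e-12):
    """Cross-entropy between sigmoid(x_u . theta_v) and a target in [0, 1]."""
    p = sigmoid(dot(x_u, theta_v))
    return -(target * math.log(p + eps)
             + (1.0 - target) * math.log(1.0 - p + eps))


def weighted_skipgram_loss(u_p, v, negatives, X, theta, node_type, alpha):
    """Loss of one similar pair (u_p, v) plus its negative samples (assumed form)."""
    target = alpha[(node_type[u_p], node_type[v])]  # influence weight as target
    loss = pair_loss(X[u_p], theta[v], target)
    for u_n in negatives:
        loss += pair_loss(X[u_n], theta[v], 0.0)    # negatives are pushed apart
    return loss


def sgd_step(u, v, target, X, theta, lr=0.1):
    """One gradient step on pair_loss; d(loss)/d(logit) is p - target."""
    g = sigmoid(dot(X[u], theta[v])) - target
    x_old = list(X[u])
    X[u] = [x - lr * g * t for x, t in zip(X[u], theta[v])]
    theta[v] = [t - lr * g * x for t, x in zip(theta[v], x_old)]


# Hypothetical two-dimensional embeddings and type-pair weights.
node_type = {"a1": "author", "a2": "author", "p1": "paper"}
alpha = {("author", "paper"): 0.9}
X = {"a1": [0.1, -0.2], "a2": [0.05, 0.3]}
theta = {"p1": [0.3, 0.4]}

before = weighted_skipgram_loss("a1", "p1", ["a2"], X, theta, node_type, alpha)
sgd_step("a1", "p1", alpha[("author", "paper")], X, theta)
after = pair_loss(X["a1"], theta["p1"], 0.9)
```

Under this reading, an influence weight close to 1 pulls the two embeddings together, while a weight close to 0 makes a similar pair behave like a negative sample, which matches the description of strong separation at 0.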
Step 5: Feed the node embeddings into a subsequent supervised classifier to solve practical tasks such as node classification, link prediction, and clustering;
In this embodiment, the classifier is one of logistic regression, a support vector machine, or a neural network;
Node classification predicts the labels of unlabeled nodes from the nodes whose labels are known; the node features learned by skip-gram are fed into the supervised classifier as input features;
Link prediction feeds the features of node pairs into the supervised classifier as input features; the feature f(u, v) of a node pair (u, v) can be fused in one of the following ways:
(1) Mean fusion: f(u, v)_i = (f(u)_i + f(v)_i) / 2;
(2) Hadamard fusion: f(u, v)_i = f(u)_i * f(v)_i;
(3) Weighted-L1 fusion: f(u, v)_i = |f(u)_i - f(v)_i|;
(4) Weighted-L2 fusion: f(u, v)_i = |f(u)_i - f(v)_i|^2;
(5) Concatenation fusion: f(u, v) = concatenate(f(u), f(v));
where u and v denote two nodes in the heterogeneous network, f(u) and f(v) denote the feature embeddings of node u and node v, f(u)_i and f(v)_i denote the i-th elements of the two embedding vectors, and the concatenate function joins the two vectors f(u) and f(v).
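The five fusion operators for link prediction can be written in a few lines. This is a minimal sketch with a hypothetical function name `fuse`; the weighted-L2 operator is implemented as the squared element-wise difference, the form used in node2vec ([document 3]), which is an assumption about the intended formula.

```python
def fuse(f_u, f_v, mode):
    """Combine two node embeddings into one edge feature vector."""
    if len(f_u) != len(f_v):
        raise ValueError("embeddings must have the same dimension")
    if mode == "mean":
        return [(a + b) / 2 for a, b in zip(f_u, f_v)]
    if mode == "hadamard":
        return [a * b for a, b in zip(f_u, f_v)]
    if mode == "l1":
        return [abs(a - b) for a, b in zip(f_u, f_v)]
    if mode == "l2":
        return [(a - b) ** 2 for a, b in zip(f_u, f_v)]  # assumed squared form
    if mode == "concat":
        return list(f_u) + list(f_v)
    raise ValueError(f"unknown fusion mode: {mode!r}")


# Hypothetical embeddings of two nodes.
f_u, f_v = [1.0, 2.0], [3.0, 6.0]
edge_features = {m: fuse(f_u, f_v, m)
                 for m in ("mean", "hadamard", "l1", "l2", "concat")}
```

All operators except concatenation are symmetric in u and v and keep the dimension d, so they suit undirected link prediction; concatenation doubles the dimension and depends on the order of the pair.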
It should be understood that the parts not elaborated in this specification belong to the prior art.
It should be understood that the above description of the preferred embodiment is relatively detailed and must not therefore be regarded as limiting the scope of patent protection of the invention. Those of ordinary skill in the art, inspired by the invention, may make substitutions or modifications without departing from the scope protected by the claims, and all of these fall within the scope of protection of the invention. The scope of protection claimed by the invention is determined by the appended claims.
Claims (9)
1. A meta-schema-based polymorphic feature learning method for highly heterogeneous networks, characterized by comprising the following steps:
Step 1: Given a heterogeneous network G = {V, E, φ_v, φ_e}, where V is the node set, E is the edge set, φ_v is the node-type mapping function from V to T_v, φ_e is the edge-type mapping function from E to T_e, and T_v and T_e are the sets of node types and edge types respectively; high heterogeneity requires |T_v| + |T_e| ≫ 1; initialize the sampling meta-schema S according to the heterogeneous network G, and initialize the influence weight matrix α;
Step 2: Starting from each node of the heterogeneous network, obtain k random-walk paths using the meta-schema-based random walk;
Step 3: Sample the random-walk paths with a sliding window of length l; the center node of each window is paired with each of the other nodes in the window and the pairs are extracted as similar node pairs, so that all nodes within one window are regarded as similar;
Step 4: According to the influence weight matrix α and the similar node pairs snp, optimize the following objective function with the weighted skip-gram algorithm; the objective function reduces the distance between similar node pairs while increasing the distance between the remaining node pairs, and finally yields the network node features X ∈ R^{|V|×d}, where |V| is the number of nodes in the heterogeneous network and d is the dimension of the node features, satisfying d ≪ |V|;
Here u_p is a node of a similar node pair, u_n is a node of a remaining node pair, t(u_p) and t(v) denote the types of node u_p and node v respectively, neg denotes the remaining node pairs obtained by negative sampling, the entry of α for the type pair (t(u_p), t(v)) gives the influence weight from node type t(u_p) to node type t(v), and p(u | v) denotes the probability of observing node v from node u;
Step 5: Feed the node embeddings into a subsequent supervised classifier to solve the practical task.
2. The meta-schema-based polymorphic feature learning method for highly heterogeneous networks according to claim 1, characterized in that: in Step 1, the meta-schema S is a sub-network of the network schema, and the weighted out-degree of each node is specified to be 1.
3. The meta-schema-based polymorphic feature learning method for highly heterogeneous networks according to claim 1, characterized in that: in Step 2, obtaining k random-walk paths using the meta-schema-based random walk comprises the following sub-steps:
Step 2.1: Find the neighbor node set ngb, in the heterogeneous network, of the node at the end of the random-walk path;
Step 2.2: Compute, according to the meta-schema, the transition probability from the node to the neighbor node set ngb; the specific formula for the transition probability is:
Here the formula uses the transition probability in meta-schema S from node type t(v_s) to edge type t(e_st), the set of all node types in meta-schema S, and the set of all edge types leaving node type t(v_s) in the meta-schema;
The transition probability is a preset value between 0 and 1 in the meta-schema, and the target transition probability matrix at each node during the walk is an unnormalized matrix;
Step 2.3: Using the alias sampling method, sample the target transition node from the node's transition probability set;
Step 2.4: Append the target transition node to the walk path, then return to Step 2.1 and repeat, stopping when L in the objective function falls below a preset threshold.
4. The meta-schema-based polymorphic feature learning method for highly heterogeneous networks according to claim 1, characterized in that: in Step 3, the sliding window extracts only the target node pairs on the path, not all node pairs in the meta-schema.
5. The meta-schema-based polymorphic feature learning method for highly heterogeneous networks according to claim 1, characterized in that: in Step 4, the influence weight ranges from 0 to 1; 0 indicates low influence between the node types and therefore strong separation; 1 indicates high influence between the node types and therefore weak separation.
6. The meta-schema-based polymorphic feature learning method for highly heterogeneous networks according to claim 1, characterized in that: in Step 4, the objective function of the weighted skip-gram algorithm sets the similarity target of a node pair to the influence weight of the pair's node types.
7. The meta-schema-based polymorphic feature learning method for highly heterogeneous networks according to claim 1, characterized in that: in Step 4, the probability of observing node v from node u is p(u | v) = σ(X(u)^T θ(v)), where X(u) denotes the feature embedding of node u, θ(v) denotes the auxiliary feature embedding of node v, and σ is the sigmoid function.
8. The meta-schema-based polymorphic feature learning method for highly heterogeneous networks according to claim 1, characterized in that: in Step 5, the classifier is one of logistic regression, a support vector machine, or a neural network.
9. The meta-schema-based polymorphic feature learning method for highly heterogeneous networks according to any one of claims 1 to 8, characterized in that: in Step 5, the practical tasks include node classification, link prediction, and clustering;
Node classification predicts the labels of unlabeled nodes from the nodes whose labels are known; the node features learned by skip-gram are fed into the supervised classifier as input features;
Link prediction feeds the features of node pairs into the supervised classifier as input features; the feature f(u, v) of a node pair (u, v) can be fused in one of the following ways:
(1) Mean fusion: f(u, v)_i = (f(u)_i + f(v)_i) / 2;
(2) Hadamard fusion: f(u, v)_i = f(u)_i * f(v)_i;
(3) Weighted-L1 fusion: f(u, v)_i = |f(u)_i - f(v)_i|;
(4) Weighted-L2 fusion: f(u, v)_i = |f(u)_i - f(v)_i|^2;
(5) Concatenation fusion: f(u, v) = concatenate(f(u), f(v));
where u and v denote two nodes in the heterogeneous network, f(u) and f(v) denote the feature embeddings of node u and node v, f(u)_i and f(v)_i denote the i-th elements of the two embedding vectors, and the concatenate function joins the two vectors f(u) and f(v).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910017697.6A CN109767008A (en) | 2019-01-07 | 2019-01-07 | A meta-schema-based polymorphic feature learning method for highly heterogeneous networks |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109767008A true CN109767008A (en) | 2019-05-17 |
Family
ID=66453516
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910017697.6A Pending CN109767008A (en) | A meta-schema-based polymorphic feature learning method for highly heterogeneous networks | 2019-01-07 | 2019-01-07 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109767008A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111325326A (en) * | 2020-02-21 | 2020-06-23 | 北京工业大学 | Link prediction method based on heterogeneous network representation learning |
CN111400560A (en) * | 2020-03-10 | 2020-07-10 | 支付宝(杭州)信息技术有限公司 | Method and system for predicting based on heterogeneous graph neural network model |
CN111581488A (en) * | 2020-05-14 | 2020-08-25 | 上海商汤智能科技有限公司 | Data processing method and device, electronic equipment and storage medium |
CN112507244A (en) * | 2019-09-16 | 2021-03-16 | 腾讯科技(深圳)有限公司 | Social data recommendation method and device, distributed computing cluster and storage medium |
CN112561688A (en) * | 2020-12-21 | 2021-03-26 | 第四范式(北京)技术有限公司 | Credit card overdue prediction method and device based on graph embedding and electronic equipment |
-
2019
- 2019-01-07 CN CN201910017697.6A patent/CN109767008A/en active Pending
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112507244A (en) * | 2019-09-16 | 2021-03-16 | 腾讯科技(深圳)有限公司 | Social data recommendation method and device, distributed computing cluster and storage medium |
CN112507244B (en) * | 2019-09-16 | 2023-09-26 | 腾讯科技(深圳)有限公司 | Social data recommendation method and device, distributed computing cluster and storage medium |
CN111325326A (en) * | 2020-02-21 | 2020-06-23 | 北京工业大学 | Link prediction method based on heterogeneous network representation learning |
CN111400560A (en) * | 2020-03-10 | 2020-07-10 | 支付宝(杭州)信息技术有限公司 | Method and system for predicting based on heterogeneous graph neural network model |
CN111581488A (en) * | 2020-05-14 | 2020-08-25 | 上海商汤智能科技有限公司 | Data processing method and device, electronic equipment and storage medium |
CN111581488B (en) * | 2020-05-14 | 2023-08-04 | 上海商汤智能科技有限公司 | Data processing method and device, electronic equipment and storage medium |
CN112561688A (en) * | 2020-12-21 | 2021-03-26 | 第四范式(北京)技术有限公司 | Credit card overdue prediction method and device based on graph embedding and electronic equipment |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190517 |