CN109800232A - Heterogeneous information network embedding method, apparatus, electronic device and storage medium - Google Patents

Heterogeneous information network embedding method, apparatus, electronic device and storage medium

Info

Publication number
CN109800232A
Authority
CN
China
Prior art keywords
node
tuple
membership
nodes
heterogeneous information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910052260.6A
Other languages
Chinese (zh)
Other versions
CN109800232B (en)
Inventor
石川
陆元福
胡琳梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN201910052260.6A
Publication of CN109800232A
Application granted
Publication of CN109800232B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present invention provide a heterogeneous information network embedding method, apparatus, electronic device and storage medium. The method comprises: determining the target relation corresponding to each node-relation tuple; for each node-relation tuple of the affiliation relation, inputting the initial embedding vector of the connection relation between the nodes in the tuple and the initial embedding vectors of the nodes in the tuple into the affiliation relation model of a predetermined heterogeneous information network embedding model; for each node-relation tuple of the interaction relation, inputting the initial embedding vector of the connection relation between the nodes in the tuple and the initial embedding vectors of the nodes in the tuple into the interaction relation model of the predetermined heterogeneous information network embedding model; and, when the value of the heterogeneous information network embedding model is minimal, outputting the target embedding vector of each node in the heterogeneous information network to be processed. The invention thereby performs a targeted network embedding analysis of each node in a heterogeneous information network.

Description

Heterogeneous information network embedding method, apparatus, electronic device and storage medium
Technical field
The present invention relates to the field of information technology, and in particular to a heterogeneous information network embedding method, apparatus, electronic device and storage medium.
Background
Heterogeneous information network (HIN) embedding aims to embed nodes of multiple types into a low-dimensional vector space. Because network embedding can effectively learn latent features that capture the intrinsic characteristics of a network, it offers a new perspective for network analysis.
Existing heterogeneous information network embedding methods take an arbitrary node in the heterogeneous information network as a starting point and perform a random walk along any meta-path connected to that node, generating a node sequence. The similarity of adjacent nodes in the node sequence is then maximized to obtain the embedding vector of each node in the sequence.
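The meta-path guided random walk described above can be sketched as follows. This is a minimal illustration rather than the prior-art implementation: the function name `metapath_walk`, the toy graph and the cyclic repetition of the meta-path are assumptions made for the example.

```python
import random
from collections import defaultdict

def metapath_walk(adj, node_type, start, metapath, walk_length, rng=None):
    """Random walk that only steps to neighbours whose type matches the next
    symbol of a symmetric meta-path such as "APA", repeated cyclically."""
    rng = rng or random.Random(0)           # fixed seed: an assumption, for repeatability
    assert node_type[start] == metapath[0]
    pattern = metapath[:-1]                 # "APA" -> repeat "AP"
    walk = [start]
    for step in range(1, walk_length):
        wanted = pattern[step % len(pattern)]
        candidates = [n for n in adj[walk[-1]] if node_type[n] == wanted]
        if not candidates:                  # dead end: stop the walk early
            break
        walk.append(rng.choice(candidates))
    return walk

# Hypothetical toy author/paper graph.
edges = [("a1", "p1"), ("a2", "p1"), ("a2", "p2")]
node_type = {"a1": "A", "a2": "A", "p1": "P", "p2": "P"}
adj = defaultdict(list)
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

walk = metapath_walk(adj, node_type, "a1", "APA", 5)
```

The resulting node sequence would then be fed to a skip-gram style objective that pulls adjacent nodes together, which is the step that treats every relation identically.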
However, the inventors found that existing heterogeneous information network embedding methods process all nodes and edges in the same way when computing similarity. In real scenarios, a heterogeneous information network contains multiple types of nodes and edges. Take the DBLP (DataBase systems and Logic Programming) academic network as an example: the heterogeneous information network topology of an embodiment of the present invention shown in Fig. 1 contains four types of nodes: author (Author, A), paper (Paper, P), conference (Conference, C) and keyword (Term, T). The network also contains multiple types of relations, such as the writing/written relation and the publish/published relation. There are, in addition, composite relations represented by meta-paths, such as APA (co-authorship) and APC (an author publishes a paper at a conference). Computing similarity for all nodes and edges in the same way to obtain the embedding vector of each node necessarily ignores the characteristics of the nodes themselves, so the resulting heterogeneous information network embedding cannot satisfy the needs of downstream applications well.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a heterogeneous information network embedding method, apparatus, electronic device and storage medium that perform a targeted network embedding analysis of each node in a heterogeneous information network, so that the resulting embedding better satisfies the needs of downstream applications. The specific technical solutions are as follows:
In a first aspect, an embodiment of the present invention discloses a heterogeneous information network embedding method, the method comprising:
obtaining the node type of each node contained in a heterogeneous information network to be processed, and the connection relations between the nodes;
determining, according to the node type of each node and the connection relations between the nodes, the target relation corresponding to each node-relation tuple formed by a connection relation between nodes and its corresponding nodes, the target relation being an affiliation relation or an interaction relation;
for each node-relation tuple of the affiliation relation, inputting the initial embedding vector of the connection relation between the nodes in the tuple and the initial embedding vectors of the nodes in the tuple into the affiliation relation model of a predetermined heterogeneous information network embedding model;
for each node-relation tuple of the interaction relation, inputting the initial embedding vector of the connection relation between the nodes in the tuple and the initial embedding vectors of the nodes in the tuple into the interaction relation model of the predetermined heterogeneous information network embedding model;
when the value of the heterogeneous information network embedding model is minimal, outputting the target embedding vector of each node in the heterogeneous information network to be processed.
Optionally, determining, according to the node type of each node and the connection relations between the nodes, the target relation corresponding to each node-relation tuple formed by a connection relation between nodes and its corresponding nodes comprises:
calculating, according to the node type of each node and the connection relations between the nodes, the average degree of the nodes corresponding to each connection relation between nodes;
determining, according to the magnitude of the average degree of the nodes corresponding to each connection relation between nodes, the target relation corresponding to each node-relation tuple formed by a connection relation between nodes and its corresponding nodes.
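The degree-based determination above can be illustrated with a short sketch. The text here only says that the decision depends on the magnitude of the average degrees; comparing the two endpoint types by their ratio, and the cut-off `ratio_threshold`, are assumptions introduced for the example, as are the function names and the toy data.

```python
def average_degrees(pairs):
    """pairs: (u, v) node pairs connected by one relation r.
    Returns the average degree of the head-side and tail-side nodes in r."""
    pairs = set(pairs)
    heads = {u for u, _ in pairs}
    tails = {v for _, v in pairs}
    return len(pairs) / len(heads), len(pairs) / len(tails)

def classify_by_degree(pairs, ratio_threshold=2.0):
    """Assumed rule: strongly unbalanced average degrees suggest a
    one-centered-by-another (affiliation) relation, balanced ones a
    peer-to-peer (interaction) relation."""
    d_u, d_v = average_degrees(pairs)
    ratio = max(d_u, d_v) / min(d_u, d_v)
    return "affiliation" if ratio >= ratio_threshold else "interaction"

# Hypothetical toy data: four papers at one conference, three authors on two papers.
pc = [("p1", "c1"), ("p2", "c1"), ("p3", "c1"), ("p4", "c1")]
ap = [("a1", "p1"), ("a2", "p1"), ("a2", "p2"), ("a3", "p2")]
```

With this toy data the PC relation has average degrees 1 and 4 and is labelled an affiliation relation, while AP has roughly balanced degrees and is labelled an interaction relation.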
Optionally, determining, according to the node type of each node and the connection relations between the nodes, the target relation corresponding to each node-relation tuple formed by a connection relation between nodes and its corresponding nodes comprises:
classifying the node-relation tuples formed by the connection relations between nodes and their corresponding nodes according to the connection relation between nodes;
for each type of node-relation tuple, calculating the sparsity of that type of node-relation tuple;
determining, according to the magnitude of the sparsity, the target relation of each type of node-relation tuple.
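The grouping and sparsity steps above can be sketched as below. The sparsity formula is not given in this part of the text, so the sketch assumes one common definition: the number of relation instances divided by the number of possible head/tail pairs, counting only nodes that occur in the relation. The threshold that maps a sparsity value to a target relation is not stated here either, so the sketch stops at the value.

```python
from collections import defaultdict

def sparsity_by_relation(triples):
    """triples: iterable of (u, r, v). Groups tuples by relation r and returns
    {r: N_r / (N_heads * N_tails)} under the assumed definition of sparsity."""
    groups = defaultdict(set)
    for u, r, v in triples:
        groups[r].add((u, v))
    result = {}
    for r, pairs in groups.items():
        heads = {u for u, _ in pairs}
        tails = {v for _, v in pairs}
        result[r] = len(pairs) / (len(heads) * len(tails))
    return result

# Hypothetical toy tuples.
triples = [
    ("a1", "AP", "p1"), ("a2", "AP", "p1"), ("a2", "AP", "p2"), ("a3", "AP", "p2"),
    ("p1", "PC", "c1"), ("p2", "PC", "c1"),
]
sparsity = sparsity_by_relation(triples)
```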
Optionally, the step of predetermining the heterogeneous information network embedding model comprises:
obtaining positive samples of the affiliation relation, positive samples of the interaction relation, and preset negative samples; a positive sample is a node-relation tuple for which a connection relation between nodes exists in a sample heterogeneous information network; a negative sample is a node-relation tuple for which no connection relation between nodes exists in the sample heterogeneous information network;
for the affiliation relation, determining the affiliation similarity function of each positive sample of the affiliation relation, and determining the affiliation similarity function of each negative sample;
determining the affiliation relation loss function from the affiliation similarity functions of the positive samples of the affiliation relation and the affiliation similarity functions of the negative samples;
for the interaction relation, determining the interaction similarity function of each positive sample of the interaction relation, and determining the interaction similarity function of each negative sample;
determining the interaction relation loss function from the interaction similarity functions of the positive samples of the interaction relation and the interaction similarity functions of the negative samples;
summing the affiliation relation loss function and the interaction relation loss function to obtain the heterogeneous information network embedding model.
Optionally, the affiliation similarity function is expressed as:
where $f(p, q)$ denotes the affiliation similarity function between the nodes of any affiliation relation positive sample; $X_p$ denotes the initial embedding vector of node $p$ in the affiliation relation positive sample; $X_q$ denotes the initial embedding vector of node $q$ in the affiliation relation positive sample; $w_{pq}$ denotes the weight of the connection relation between node $p$ and node $q$;
The affiliation relation loss function is expressed as:
where $L_{EuAR}$ denotes the affiliation relation loss function; $s \in R_{AR}$ indicates that the connection relation $s$ between the nodes of a node-relation tuple belongs to the affiliation relations; $\langle p, s, q \rangle \in P_{AR}$ denotes an affiliation relation positive sample; $\langle p', s, q' \rangle \in P'_{AR}$ denotes any negative sample; $\gamma$ denotes the margin hyperparameter, $\gamma > 0$; $f(p, q)$ denotes the affiliation similarity function of any affiliation relation positive sample; $f(p', q')$ denotes the affiliation similarity function of any negative sample;
The interaction similarity function is expressed as:
$g(u, v) = w_{u,v}\,\lVert X_u + Y_r - X_v \rVert$
where $g(u, v)$ denotes the interaction similarity function between the nodes of any interaction relation positive sample; $X_u$ denotes the initial embedding vector of node $u$ in the interaction relation positive sample; $X_v$ denotes the initial embedding vector of node $v$ in the interaction relation positive sample; $Y_r$ denotes the initial embedding vector of the connection relation $r$ between the nodes of the interaction relation positive sample; $w_{u,v}$ denotes the weight of the connection relation between node $u$ and node $v$;
The interaction relation loss function is expressed as:
where $L_{TrIR}$ denotes the interaction relation loss function; $r \in R_{IR}$ indicates that the connection relation $r$ between the nodes of a node-relation tuple belongs to the interaction relations; $\langle u, r, v \rangle \in P_{IR}$ denotes an interaction relation positive sample; $\langle u', r, v' \rangle \in P'_{IR}$ denotes any negative sample; $\gamma$ denotes the margin hyperparameter, $\gamma > 0$; $g(u, v)$ denotes the interaction similarity function of any interaction relation positive sample; $g(u', v')$ denotes the interaction similarity function of any negative sample.
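The expressions for the affiliation similarity function and for the two loss functions are not reproduced in this text. One form consistent with the symbol definitions above is given below as a reading aid. It assumes a weighted squared Euclidean distance for the affiliation relation and a margin-based ranking loss over positive and negative samples; both are assumptions inferred from the listed symbols and from the margin hyperparameter, not formulas quoted from the patent.

```latex
\begin{aligned}
f(p,q) &= w_{pq}\,\lVert X_p - X_q \rVert_2^2 \\
L_{EuAR} &= \sum_{s \in R_{AR}} \;\sum_{\langle p,s,q\rangle \in P_{AR}} \;\sum_{\langle p',s,q'\rangle \in P'_{AR}}
            \max\bigl[0,\; \gamma + f(p,q) - f(p',q')\bigr] \\
L_{TrIR} &= \sum_{r \in R_{IR}} \;\sum_{\langle u,r,v\rangle \in P_{IR}} \;\sum_{\langle u',r,v'\rangle \in P'_{IR}}
            \max\bigl[0,\; \gamma + g(u,v) - g(u',v')\bigr]
\end{aligned}
```

Under this reading, a smaller value of the similarity function means the nodes are closer, and each loss term is zero once a positive sample scores at least the margin lower than the negative sample it is compared with.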
In a second aspect, an embodiment of the present invention discloses a heterogeneous information network embedding apparatus, the apparatus comprising:
a node information obtaining module, configured to obtain the node type of each node contained in a heterogeneous information network to be processed, and the connection relations between the nodes;
a target relation determining module, configured to determine, according to the node type of each node and the connection relations between the nodes, the target relation corresponding to each node-relation tuple formed by a connection relation between nodes and its corresponding nodes, the target relation being an affiliation relation or an interaction relation;
an affiliation relation node-relation tuple input module, configured to, for each node-relation tuple of the affiliation relation, input the initial embedding vector of the connection relation between the nodes in the tuple and the initial embedding vectors of the nodes in the tuple into the affiliation relation model of a predetermined heterogeneous information network embedding model;
an interaction relation node-relation tuple input module, configured to, for each node-relation tuple of the interaction relation, input the initial embedding vector of the connection relation between the nodes in the tuple and the initial embedding vectors of the nodes in the tuple into the interaction relation model of the predetermined heterogeneous information network embedding model;
a target embedding vector output module, configured to output the target embedding vector of each node in the heterogeneous information network to be processed when the value of the heterogeneous information network embedding model is minimal.
Optionally, the target relation determining module comprises:
an average degree calculating submodule, configured to calculate, according to the node type of each node and the connection relations between the nodes, the average degree of the nodes corresponding to each connection relation between nodes;
a first target relation determining submodule, configured to determine, according to the magnitude of the average degree of the nodes corresponding to each connection relation between nodes, the target relation corresponding to each node-relation tuple formed by a connection relation between nodes and its corresponding nodes.
Optionally, the target relation determining module comprises:
a node-relation tuple classifying submodule, configured to classify the node-relation tuples formed by the connection relations between nodes and their corresponding nodes according to the connection relation between nodes;
a sparsity calculating submodule, configured to calculate, for each type of node-relation tuple, the sparsity of that type of node-relation tuple;
a second target relation determining submodule, configured to determine, according to the magnitude of the sparsity, the target relation of each type of node-relation tuple.
Optionally, the apparatus further comprises:
a sample obtaining module, configured to obtain positive samples of the affiliation relation, positive samples of the interaction relation, and preset negative samples; a positive sample is a node-relation tuple for which a connection relation between nodes exists in a sample heterogeneous information network; a negative sample is a node-relation tuple for which no connection relation between nodes exists in the sample heterogeneous information network;
an affiliation similarity function determining module, configured to, for the affiliation relation, determine the affiliation similarity function of each positive sample of the affiliation relation and the affiliation similarity function of each negative sample;
an affiliation relation loss function determining module, configured to determine the affiliation relation loss function from the affiliation similarity functions of the positive samples of the affiliation relation and the affiliation similarity functions of the negative samples;
an interaction similarity function determining module, configured to, for the interaction relation, determine the interaction similarity function of each positive sample of the interaction relation and the interaction similarity function of each negative sample;
an interaction relation loss function determining module, configured to determine the interaction relation loss function from the interaction similarity functions of the positive samples of the interaction relation and the interaction similarity functions of the negative samples;
a heterogeneous information network embedding model determining module, configured to sum the affiliation relation loss function and the interaction relation loss function to obtain the heterogeneous information network embedding model.
Optionally, the affiliation similarity function is expressed as:
where $f(p, q)$ denotes the affiliation similarity function between the nodes of any affiliation relation positive sample; $X_p$ denotes the initial embedding vector of node $p$ in the affiliation relation positive sample; $X_q$ denotes the initial embedding vector of node $q$ in the affiliation relation positive sample; $w_{pq}$ denotes the weight of the connection relation between node $p$ and node $q$;
The affiliation relation loss function is expressed as:
where $L_{EuAR}$ denotes the affiliation relation loss function; $s \in R_{AR}$ indicates that the connection relation $s$ between the nodes of a node-relation tuple belongs to the affiliation relations; $\langle p, s, q \rangle \in P_{AR}$ denotes an affiliation relation positive sample; $\langle p', s, q' \rangle \in P'_{AR}$ denotes any negative sample; $\gamma$ denotes the margin hyperparameter, $\gamma > 0$; $f(p, q)$ denotes the affiliation similarity function of any affiliation relation positive sample; $f(p', q')$ denotes the affiliation similarity function of any negative sample;
The interaction similarity function is expressed as:
$g(u, v) = w_{u,v}\,\lVert X_u + Y_r - X_v \rVert$
where $g(u, v)$ denotes the interaction similarity function between the nodes of any interaction relation positive sample; $X_u$ denotes the initial embedding vector of node $u$ in the interaction relation positive sample; $X_v$ denotes the initial embedding vector of node $v$ in the interaction relation positive sample; $Y_r$ denotes the initial embedding vector of the connection relation $r$ between the nodes of the interaction relation positive sample; $w_{u,v}$ denotes the weight of the connection relation between node $u$ and node $v$;
The interaction relation loss function is expressed as:
where $L_{TrIR}$ denotes the interaction relation loss function; $r \in R_{IR}$ indicates that the connection relation $r$ between the nodes of a node-relation tuple belongs to the interaction relations; $\langle u, r, v \rangle \in P_{IR}$ denotes an interaction relation positive sample; $\langle u', r, v' \rangle \in P'_{IR}$ denotes any negative sample; $\gamma$ denotes the margin hyperparameter, $\gamma > 0$; $g(u, v)$ denotes the interaction similarity function of any interaction relation positive sample; $g(u', v')$ denotes the interaction similarity function of any negative sample.
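How the similarity and loss modules fit together can be sketched in a few lines of plain Python. The interaction function follows the expression given in the text; the squared Euclidean form of the affiliation function and the hinge form of the loss are assumptions, since those formulas are not reproduced here. All function names are hypothetical.

```python
import math

def f_affiliation(x_p, x_q, w=1.0):
    """Weighted squared Euclidean distance between two node vectors
    (the squaring is an assumption)."""
    return w * sum((a - b) ** 2 for a, b in zip(x_p, x_q))

def g_interaction(x_u, y_r, x_v, w=1.0):
    """Weighted norm of X_u + Y_r - X_v: the relation vector acts as a
    translation from node u to node v."""
    return w * math.sqrt(sum((a + t - b) ** 2 for a, t, b in zip(x_u, y_r, x_v)))

def margin_loss(pos_scores, neg_scores, gamma=1.0):
    """Assumed hinge loss: every positive score should be at least gamma
    lower than every negative score it is paired with."""
    return sum(max(0.0, gamma + p - n) for p in pos_scores for n in neg_scores)

def total_loss(ar_pos, ar_neg, ir_pos, ir_neg, gamma=1.0):
    """The embedding model is the sum of the two relation-specific losses."""
    return margin_loss(ar_pos, ar_neg, gamma) + margin_loss(ir_pos, ir_neg, gamma)
```

Pairing every positive sample with every negative sample mirrors a literal triple sum. A practical trainer would more likely draw a few negatives per positive and minimise the total with stochastic gradient descent.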
In a third aspect, an embodiment of the present invention discloses an electronic device comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another via the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the steps of any of the above heterogeneous information network embedding methods when executing the program stored in the memory.
In another aspect, an embodiment of the present invention discloses a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the above heterogeneous information network embedding methods.
In yet another aspect, an embodiment of the present invention discloses a computer program product comprising instructions which, when run on a computer, cause the computer to perform the steps of any of the above heterogeneous information network embedding methods.
In the heterogeneous information network embedding method, apparatus, electronic device and storage medium of the embodiments of the present invention, the structural characteristics of the relations in the heterogeneous network are analyzed, and the node-relation tuples formed by the connection relations between nodes and their corresponding nodes are divided into affiliation relations and interaction relations. Because nodes in an affiliation relation share similar properties, nodes connected by an affiliation relation can be placed directly close to each other, and the invention provides an affiliation relation model for this purpose. Nodes in an interaction relation exhibit a strong interaction, and the invention provides an interaction relation model for this purpose. The heterogeneous information network embedding model is obtained by combining the affiliation relation model and the interaction relation model, and solving for the minimum of this model yields the target embedding vector of each node in the heterogeneous information network to be processed. Building on the structural characteristics of the heterogeneous information network itself, the invention performs a targeted network embedding analysis of each node in the network, so that the resulting embedding better satisfies the needs of downstream applications.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 shows a heterogeneous information network topology according to an embodiment of the present invention;
Fig. 2 is a structure diagram of a prior-art method that obtains node embedding vectors using a single model;
Fig. 3 is a structure diagram of the method of an embodiment of the present invention that obtains node embedding vectors using multiple targeted models;
Fig. 4 is a flowchart of a heterogeneous information network embedding method according to an embodiment of the present invention;
Fig. 5 is a heterogeneous information network data table used in a heterogeneous information network embedding method according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a heterogeneous information network embedding apparatus according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Because network embedding can effectively learn latent features that capture the intrinsic characteristics of a network, it offers a new perspective for network analysis. The heterogeneous information network topology of an embodiment of the present invention shown in Fig. 1 contains four types of nodes: author (Author, A), paper (Paper, P), conference (Conference, C) and keyword (Term, T), together with the connection relations between nodes, such as the writing/written relation and the publish/published relation.
To model the heterogeneity of the network, existing heterogeneous information network embedding methods take an arbitrary node in the heterogeneous information network as a starting point and perform a random walk along any meta-path connected to that node, generating a node sequence. The similarity of adjacent nodes in the node sequence is then maximized to obtain the embedding vector of each node in the sequence. There are also neural-network-based methods, which typically use nonlinear mapping functions for network embedding to obtain the embedding vector of each node. Although these methods take the heterogeneity of the network into account, they usually rest on one assumption: that a single model can handle all relations and nodes by keeping the representations of two connected nodes close to each other. Fig. 2 shows the structure of such a prior-art method, which obtains node embedding vectors using a single model.
As shown in Fig. 1, the heterogeneous information network contains atomic relations (e.g., AP and PC) and composite relations (e.g., APA and APC). Clearly, the AP relation and the PC relation exhibit markedly different structural characteristics. In the AP relation, some authors write some papers, which shows a peer-to-peer structure. In the PC relation, many papers are published at the same conference, which reflects a one-centered-by-another structure. Similarly, APA and APC exhibit the peer-to-peer and one-centered-by-another structures respectively. Computing the similarity of connected nodes directly with the single model shown in Fig. 2 necessarily ignores the characteristics of the nodes themselves, so the resulting heterogeneous information network embedding cannot satisfy the needs of downstream applications well.
Therefore, the embodiments of the present invention first explore the structural characteristics of the relations in a heterogeneous information network through thorough mathematical analysis and propose two structure-related measures. These two measures consistently divide the diverse relations into two classes: affiliation relations (Affiliation Relations, ARs), which indicate a one-centered-by-another structure, and interaction relations (Interaction Relations, IRs), which indicate a peer-to-peer structure. To capture the distinct structural characteristics of the relations, the embodiments of the present invention provide an affiliation relation model for node-relation tuples of the affiliation relation; this model computes the Euclidean distance between the nodes in a node-relation tuple, which keeps the nodes directly close to each other in the latent space. For interaction relations, which bridge two peer nodes, an interaction relation model is provided; this model treats the node-relation tuples of such relations as translations between nodes. Because the affiliation relation model and the interaction relation model have consistent mathematical forms, they can be optimized jointly in a unified manner, giving the heterogeneous information network embedding model of the embodiments of the present invention. Finally, the heterogeneous information network embedding model is optimized, and the target embedding vector of each node in the heterogeneous information network to be processed is the one corresponding to the minimum of the model. Fig. 3 shows the structure of the method of the embodiments of the present invention, which obtains node embedding vectors using multiple targeted models. Specific embodiments are as follows:
In a first aspect, an embodiment of the present invention discloses a heterogeneous information network embedding method, as shown in Fig. 4. Fig. 4 is a flowchart of a heterogeneous information network embedding method according to an embodiment of the present invention, and the method comprises:
S401: obtain the node type of each node contained in the heterogeneous information network to be processed, and the connection relations between the nodes.
A heterogeneous information network is generally defined as a graph $G = (V, E, T, \phi, \psi)$, where $V$ and $E$ are the sets of nodes and edges respectively. Each node $v$ and each edge $e$ has its own type mapping function, $\phi: V \to T_V$ and $\psi: E \to T_E$, where $T_V$ and $T_E$ denote the sets of node types and edge types, $|T_V| + |T_E| > 2$ and $T = T_V \cup T_E$. Heterogeneous information network embedding means that, given a heterogeneous information network $G$, a mapping function $f: V \to \mathbb{R}^d$ is learned that maps each node $v \in V$ to a low-dimensional vector space $\mathbb{R}^d$, where $d \ll |V|$.
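The definition above can be mirrored in a small data structure. This is only an illustrative sketch: the class name `HIN`, its methods and the dictionary-based representation of the two type mapping functions are assumptions, not part of the patent.

```python
from dataclasses import dataclass, field

@dataclass
class HIN:
    """Toy heterogeneous information network: two type mappings over nodes and edges."""
    node_type: dict = field(default_factory=dict)  # node -> node type
    edge_type: dict = field(default_factory=dict)  # (u, v) -> edge type

    def add_node(self, v, t):
        self.node_type[v] = t

    def add_edge(self, u, v, t):
        self.edge_type[(u, v)] = t

    def is_heterogeneous(self):
        # Condition from the definition: |T_V| + |T_E| > 2.
        n_node_types = len(set(self.node_type.values()))
        n_edge_types = len(set(self.edge_type.values()))
        return n_node_types + n_edge_types > 2

# An author/paper network has two node types and one edge type.
hin = HIN()
hin.add_node("a1", "A")
hin.add_node("p1", "P")
hin.add_edge("a1", "p1", "AP")

# A network with one node type and one edge type is homogeneous.
homogeneous = HIN()
homogeneous.add_node("x", "A")
homogeneous.add_node("y", "A")
homogeneous.add_edge("x", "y", "AA")
```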
In this step, the node type of each node contained in the topology and the connection relations between the nodes are obtained from the topology of the heterogeneous information network to be processed. Alternatively, the node type of each node contained in a data table and the connection relations between the nodes are obtained from the data table of the heterogeneous information network to be processed.
Fig. 5 shows such a heterogeneous information network data table used in a heterogeneous information network embedding method according to an embodiment of the present invention. The data table contains the academic heterogeneous information network DBLP, the social heterogeneous information network Yelp and the academic heterogeneous information network AMiner. The node types of DBLP are author (Author, A), paper (Paper, P), conference (Conference, C) and keyword (Term, T), and its connection relations between nodes are {AP, PC, PT, APC, APT}. The node types of Yelp are user (User, U), business (Business, B), reservation type (Reservation, R), service type (Service, S) and star level (Star Level, L), and its connection relations between nodes are {BR, BS, BL, UB, BUB}. The node types of AMiner are author (Author, A), paper (Paper, P), conference (Conference, C) and reference (Reference, R), and its connection relations between nodes are {AP, PC, PR, APC, APR}.
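The node types and relations listed for the three networks can be written down as a small schema table, which makes it easy to check that every relation only uses declared node types. The dictionary layout and the helper names are assumptions made for this sketch; the type symbols and relation names are those given in the text.

```python
SCHEMAS = {
    "DBLP": {
        "node_types": {"A": "Author", "P": "Paper", "C": "Conference", "T": "Term"},
        "relations": ["AP", "PC", "PT", "APC", "APT"],
    },
    "Yelp": {
        "node_types": {"U": "User", "B": "Business", "R": "Reservation",
                       "S": "Service", "L": "Star Level"},
        "relations": ["BR", "BS", "BL", "UB", "BUB"],
    },
    "AMiner": {
        "node_types": {"A": "Author", "P": "Paper", "C": "Conference", "R": "Reference"},
        "relations": ["AP", "PC", "PR", "APC", "APR"],
    },
}

def check_schema(schema):
    """Every symbol in every relation name must be a declared node type."""
    return all(sym in schema["node_types"]
               for rel in schema["relations"] for sym in rel)

def endpoint_types(relation):
    """Head and tail node types of a relation such as 'APC'."""
    return relation[0], relation[-1]
```

For example, `endpoint_types("APC")` gives the author and conference types that a node-relation tuple of the APC relation connects.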
In this step, the node types of DBLP in the heterogeneous information network data table, namely author (Author, A), paper (Paper, P), conference (Conference, C) and keyword (Term, T), can be obtained together with the connection relations between nodes {AP, PC, PT, APC, APT}. Here, AP indicates that author A writes paper P (equivalently, paper P is written by author A); PC indicates that paper P is published at conference C; PT indicates that paper P contains keyword T; APC indicates that a paper P written by author A is published at conference C; and APT indicates that a paper P written by author A contains keyword T.
S402, according to the node type of each node and each connection relation between nodes, determine the target relation corresponding to the node relation tuple formed by each connection relation and its corresponding nodes; the target relation is a membership relation or an interaction relation.
In the embodiment of the present invention, a connection relation between nodes R in a heterogeneous information network includes atomic relations (for example, links) and composite relations (for example, meta-paths). A meta-path is defined as a sequence of node types $t_{v_i} \in T_V$ or edge types $t_{e_j} \in T_E$ of the form $t_{v_1} \xrightarrow{t_{e_1}} \cdots t_{v_i} \xrightarrow{t_{e_i}} \cdots \xrightarrow{t_{e_l}} t_{v_{l+1}}$, abbreviated as $t_{v_1} t_{v_2} \cdots t_{v_{l+1}}$; it describes a composite relation between nodes $v_1$ and $v_{l+1}$. A node relation tuple can be written as <u, r, v>, stating that nodes u and v are connected by the relation r; <u, r, v> ∈ P, where P denotes the set of node relation tuples. For example, <a2, APC, c2> in Fig. 1 is a node relation tuple indicating that a2 has written a paper that is published at c2.
In this step, the node relation tuples formed by each connection relation and its corresponding nodes are analyzed individually to determine whether each node relation tuple is a membership relation or an interaction relation.
Optionally, in the above S402, determining, according to the node type of each node and each connection relation between nodes, the target relation corresponding to the node relation tuple formed by each connection relation and its corresponding nodes comprises:
Step 1, according to the node type of each node and each connection relation between nodes, calculate the average degree value of the nodes corresponding to each connection relation;
Because node degree reflects the structure of a network well, the present invention defines a degree-based measure D(r) to study how relations differ within a heterogeneous information network. Specifically, the average degrees of the two types of nodes connected by a connection relation r are calculated.
Formally, given a connection relation r and nodes u and v (that is, a node relation tuple <u, r, v>), let $t_u$ and $t_v$ be the node types of node u and node v, respectively. The average degree value D(r) of the nodes corresponding to the connection relation can then be calculated in this step as:
$$D(r) = \frac{\max[\bar{d}_{t_u}, \bar{d}_{t_v}]}{\min[\bar{d}_{t_u}, \bar{d}_{t_v}]}$$
where $\bar{d}_{t_u}$ is the average degree value of the nodes u of type $t_u$, and $\bar{d}_{t_v}$ is the average degree value of the nodes v of type $t_v$.
Step 2, according to the magnitude of the average degree value of the nodes corresponding to each connection relation, determine the target relation corresponding to the node relation tuple formed by each connection relation and its corresponding nodes.
A large value of D(r) indicates a markedly unequal structure (one-centered-by-another) between the nodes u and v connected by the relation r, whereas a small value of D(r) indicates a peer-to-peer structure. In other words, a relation with a large D(r) exhibits a strong membership relation, and nodes connected by such a relation usually share more similar characteristics; a relation with a small D(r) exhibits a strong interaction relation.
Therefore, in this step, each node relation tuple can be classified, according to the magnitude of the average degree value of the nodes corresponding to its connection relation, as a membership relation (Affiliation Relations, AR) or an interaction relation (Interaction Relations, IR).
To better understand the structural differences among relations, take DBLP in Fig. 5 as an example. As shown in Fig. 5, for the PC relation, D(PC) = 718.8: the average degree of nodes of type P is 1.0, while the average degree of nodes of type C is 718.8. This shows that papers and conferences are structurally unequal, with papers centered around conferences. By contrast, D(AP) = 1.0 indicates a peer-to-peer structure between authors and papers, which is consistent with common sense. Semantically, the PC relation means "a paper is published at a conference" and implies a membership relation AR, whereas the AP relation means "an author writes a paper" and clearly describes an interaction relation IR.
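The degree-based measure can be sketched in a few lines. The sketch assumes that D(r) is the ratio of the larger to the smaller of the two average degrees, which is consistent with the DBLP figures above (718.8 / 1.0 = 718.8), and that each average is taken over the nodes that actually occur in the relation. The function name `degree_ratio` and the toy edge list are hypothetical.

```python
def degree_ratio(edges):
    """Ratio of the larger to the smaller average degree for one relation.

    edges: list of distinct (head, tail) pairs connected by the relation.
    Assumption: each average degree is taken over the nodes that occur
    in these edges.
    """
    heads = {u for u, _ in edges}
    tails = {v for _, v in edges}
    avg_head_degree = len(edges) / len(heads)
    avg_tail_degree = len(edges) / len(tails)
    larger = max(avg_head_degree, avg_tail_degree)
    smaller = min(avg_head_degree, avg_tail_degree)
    return larger / smaller


# Hypothetical toy PC relation: six papers spread over two conferences.
pc_edges = [("p1", "c1"), ("p2", "c1"), ("p3", "c1"),
            ("p4", "c1"), ("p5", "c2"), ("p6", "c2")]
# Every paper has degree 1; the two conferences have average degree 3.
d_pc = degree_ratio(pc_edges)
```

The text does not state a numeric threshold separating "large" from "small" values, so none is assumed here; the sketch only produces the value to be compared across relations.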
Optionally, in the above S402, determining, according to the node type of each node and each connection relation between nodes, the target relation corresponding to the node relation tuple formed by each connection relation and its corresponding nodes comprises:
Step a, classify the node relation tuples formed by each connection relation and its corresponding nodes according to their connection relation;
In addition, in the embodiment of the present invention, the structure of the network can also be analyzed through the sparsity value of the node relation tuples.
In this step, the node relation tuples formed by each connection relation and its corresponding nodes are classified according to their connection relation.
For example, suppose the heterogeneous information network topology shown in Fig. 1 contains the node relation tuples <a1, AP, p1>, <a2, AP, p4>, <a1, APC, c1> and <a3, APC, c2>. Classifying them by connection relation gives AP: <a1, AP, p1>, <a2, AP, p4>; APC: <a1, APC, c1>, <a3, APC, c2>.
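The classification in this example amounts to grouping tuples by their relation label. A minimal sketch follows; the function name `group_by_relation` is an assumption made for illustration, and the tuples are the four listed in the example.

```python
from collections import defaultdict


def group_by_relation(tuples):
    """Group (head, relation, tail) tuples by their relation label."""
    groups = defaultdict(list)
    for head, relation, tail in tuples:
        groups[relation].append((head, relation, tail))
    return dict(groups)


# The four node relation tuples from the example.
tuples = [("a1", "AP", "p1"), ("a2", "AP", "p4"),
          ("a1", "APC", "c1"), ("a3", "APC", "c2")]
grouped = group_by_relation(tuples)
```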
Step b, for each type of node relation tuple, calculate the sparsity value of that type of node relation tuple;
In this step, the sparsity value S(r) of a type of node relation tuple can be calculated by the following formula:
$$S(r) = \frac{N_r}{N^{t_u} \times N^{t_v}}$$
where $N_r$ denotes the number of node relation tuples of that type, $N^{t_u}$ denotes the number of nodes of type $t_u$, and $N^{t_v}$ denotes the number of nodes of type $t_v$.
The sparsity value of each type of node relation tuple can be calculated with the above formula.
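A minimal sketch of the sparsity computation is given below. It assumes that the sparsity is the number of tuples of a relation divided by the product of the two node counts, a form consistent with the three quantities defined above. The function name `sparsity` and the toy counts are hypothetical and are not taken from Fig. 5.

```python
def sparsity(num_tuples, num_head_nodes, num_tail_nodes):
    """Number of tuples divided by the number of possible head-tail pairs."""
    return num_tuples / (num_head_nodes * num_tail_nodes)


# Hypothetical toy counts: 6 PC tuples over 6 papers and 2 conferences,
# and 8 AP tuples over 4 authors and 6 papers.
s_pc = sparsity(6, 6, 2)
s_ap = sparsity(8, 4, 6)
```

In this toy setting the PC relation is denser than the AP relation, which matches the direction of the DBLP values reported in the text.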
Step c, determine the target relation of each type of node relation tuple according to the magnitude of the sparsity value.
Taking DBLP in Fig. 5 as an example, S(PC) = 0.05 and S(AP) = 0.0002. Semantically, the PC relation means "a paper is published at a conference" and implies a membership relation AR, whereas the AP relation means "an author writes a paper" and clearly describes an interaction relation IR.
Clearly, membership relations (AR) and interaction relations (IR) exhibit markedly different characteristics: (1) AR has a one-centered-by-another structure, in which the average degrees of the two types of nodes differ greatly and the sparsity value is relatively large; (2) IR has a peer-to-peer structure, in which the average degrees of the two types of nodes are comparable and the sparsity value is relatively small.
S403, for each node relation tuple of the membership relation, input the initial embedding vector of the connection relation in the node relation tuple and the initial embedding vector of each node in the node relation tuple into the membership relation model of the predetermined heterogeneous information network embedding model.
In the above S402, the node relation tuples contained in the heterogeneous information network to be processed are divided into membership relations AR and interaction relations IR. AR reflects an affiliation structure between nodes, which shows that nodes connected by this kind of relation share similar characteristics. Therefore, the embodiment of the present invention provides a membership relation model for AR, in which nodes connected by AR are brought directly close to each other in the representation vector space, consistent with optimizing Euclidean distance. IR reflects a strong interaction between peer nodes, and the relation itself carries important structural information about the nodes. Therefore, the embodiment of the present invention provides an interaction relation model for IR, in which an IR relation is modeled as a translation operation between nodes.
In addition, translation-based distance and Euclidean distance are consistent in mathematical form, so they can be readily combined and optimized jointly, yielding the heterogeneous information network embedding model of the embodiment of the present invention. The method for pre-establishing the heterogeneous information network embedding model is described in detail in the following embodiments.
In this step, for each node relation tuple of the membership relation, the initial embedding vector of the connection relation in the node relation tuple and the initial embedding vector of each node in the node relation tuple are input into the membership relation model of the predetermined heterogeneous information network embedding model.
S404, for each node relation tuple of the interaction relation, input the initial embedding vector of the connection relation in the node relation tuple and the initial embedding vector of each node in the node relation tuple into the interaction relation model of the predetermined heterogeneous information network embedding model.
S405, when the value of the heterogeneous information network embedding model is minimal, output the target embedding vector of each node in the heterogeneous information network to be processed.
In the heterogeneous information network embedding method of the embodiment of the present invention, the structural characteristics of relations in the heterogeneous network are analyzed, and the node relation tuples formed by each connection relation and its corresponding nodes are divided into membership relations and interaction relations. Because nodes in a membership relation share similar characteristics, nodes connected by a membership relation can be brought directly close to each other, and the present invention provides a membership relation model for this purpose; nodes in an interaction relation interact strongly, and the present invention provides an interaction relation model for this purpose. The heterogeneous information network embedding model is obtained by combining the membership relation model and the interaction relation model, and its minimum is then solved to obtain the target embedding vector of each node in the heterogeneous information network to be processed. Based on the structural characteristics of the heterogeneous information network itself, the present invention performs targeted network embedding analysis on each node in the heterogeneous information network, so that the resulting heterogeneous information network embedding better meets the needs of subsequent applications.
Optionally, in an embodiment of the heterogeneous information network embedding method of the present invention, the step of predetermining the heterogeneous information network embedding model comprises:
Step A, obtain positive samples of the membership relation, positive samples of the interaction relation, and preset negative samples; a positive sample is a node relation tuple whose connection relation exists in the sample heterogeneous information network; a negative sample is a node relation tuple whose connection relation does not exist in the sample heterogeneous information network;
As shown in Table 1, the distributions of AR and IR are quite unbalanced, and within each of the two classes the connection relations contained in the node relation tuples are also unevenly distributed. Traditional edge sampling may over-sample relations with few edges and under-sample relations with many edges. To solve this problem, the present invention samples positive samples according to the probability distribution of the connection relations in the node relation tuples; a positive sample is a node relation tuple whose connection relation exists in the sample heterogeneous information network.
For negative samples, a set of negative node relation tuples $P'_{(u,r,v)} = \{(u', r, v) \mid u' \in V\} \cup \{(u, r, v') \mid v' \in V\}$ can be established in advance by randomly replacing either the head node or the tail node of a node relation tuple, but not both at once, thereby obtaining the preset negative samples; a negative sample is a node relation tuple whose connection relation does not exist in the sample heterogeneous information network.
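The negative sampling rule can be sketched as follows: replace either the head or the tail of a positive tuple with a random node, never both, and reject any candidate that is itself a positive tuple. The function name `corrupt`, the 50/50 choice between head and tail, and the toy data are assumptions made for this illustration.

```python
import random


def corrupt(positive, nodes, positives, rng):
    """Draw one negative tuple by replacing the head or the tail, not both.

    positive: a (head, relation, tail) tuple present in the network.
    nodes: list of candidate replacement nodes (the node set V).
    positives: set of all positive tuples, used to reject false negatives.
    rng: a random.Random instance, passed in for reproducibility.
    Assumption: at least one valid negative exists, otherwise this loops.
    """
    head, relation, tail = positive
    while True:
        candidate = rng.choice(nodes)
        if rng.random() < 0.5:
            negative = (candidate, relation, tail)   # replace the head
        else:
            negative = (head, relation, candidate)   # replace the tail
        if negative not in positives:
            return negative


# Hypothetical toy data.
nodes = ["a1", "a2", "p1", "p2"]
positives = {("a1", "AP", "p1"), ("a2", "AP", "p2")}
negative = corrupt(("a1", "AP", "p1"), nodes, positives, random.Random(0))
```

Rejecting candidates that appear in the positive set keeps the sampled tuple consistent with the definition of a negative sample as a tuple whose connection relation does not exist in the network.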
Step B, for the membership relation, determine the membership similarity function of each positive sample of the membership relation and the membership similarity function of each negative sample, respectively:
Nodes connected by a membership relation AR share similar characteristics, so they can be brought directly close to each other in the representation vector space. Therefore, in the embodiment of the present invention, Euclidean distance can be used as the basis of the membership similarity function that measures the proximity between nodes.
Optionally, the membership similarity function is expressed as:
$$f(p, q) = w_{pq} \, \|X_p - X_q\|_2^2$$
where f(p, q) denotes the membership similarity function between the nodes of any positive sample of the membership relation; $X_p$ denotes the initial embedding vector of node p in the positive sample; $X_q$ denotes the initial embedding vector of node q in the positive sample; and $w_{pq}$ denotes the weight of the connection relation between node p and node q;
The above formula gives the membership similarity function between the nodes of every positive sample of the membership relation, and likewise between the nodes of every negative sample.
Step C, determine the membership relation loss function from the membership similarity function of each positive sample of the membership relation and the membership similarity function of each negative sample;
Optionally, the membership relation loss function is expressed as:
$$L_{EuAR} = \sum_{s \in R_{AR}} \sum_{\langle p,s,q \rangle \in P_{AR}} \sum_{\langle p',s,q' \rangle \in P'_{AR}} \max[0, \gamma + f(p, q) - f(p', q')]$$
where $L_{EuAR}$ denotes the membership relation loss function; $s \in R_{AR}$ indicates that the connection relation s in the node relation tuple belongs to the membership relation; $\langle p,s,q \rangle \in P_{AR}$ denotes a positive sample of the membership relation; $\langle p',s,q' \rangle \in P'_{AR}$ denotes any negative sample; γ denotes the margin hyperparameter, γ > 0; f(p, q) denotes the membership similarity function of any positive sample of the membership relation; and f(p', q') denotes the membership similarity function of any negative sample.
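For a single pair of one positive and one negative sample, the membership term can be sketched as below. The sketch assumes a weighted squared Euclidean distance for f and a margin-based hinge form max(0, γ + f(p, q) − f(p', q')); the function names `membership_distance` and `hinge`, and the toy vectors, are hypothetical.

```python
def membership_distance(x_p, x_q, weight=1.0):
    """Weighted squared Euclidean distance between two embedding vectors."""
    return weight * sum((a - b) ** 2 for a, b in zip(x_p, x_q))


def hinge(positive_score, negative_score, gamma):
    """Margin loss: zero once the negative is at least gamma farther away."""
    return max(0.0, gamma + positive_score - negative_score)


# Hypothetical 2-dimensional embeddings.
pos = membership_distance([0.0, 0.0], [1.0, 0.0])  # connected pair
neg = membership_distance([0.0, 0.0], [1.0, 1.0])  # corrupted pair
loss_small_margin = hinge(pos, neg, gamma=1.0)
loss_large_margin = hinge(pos, neg, gamma=2.0)
```

Summing this quantity over all relations, positive samples and negative samples would give the full loss; the sketch stops at one pair to keep the arithmetic visible.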
Step D, for the interaction relation, determine the interaction similarity function of each positive sample of the interaction relation and the interaction similarity function of each negative sample, respectively.
Optionally, the interaction similarity function is expressed as:
$$g(u, v) = w_{u,v} \, \|X_u + Y_r - X_v\|$$
where g(u, v) denotes the interaction similarity function between the nodes of any positive sample of the interaction relation; $X_u$ denotes the initial embedding vector of node u in the positive sample; $X_v$ denotes the initial embedding vector of node v in the positive sample; $Y_r$ denotes the initial embedding vector of the connection relation r in the positive sample; and $w_{u,v}$ denotes the weight of the connection relation between node u and node v.
The above formula gives the interaction similarity function between the nodes of every positive sample of the interaction relation, and likewise between the nodes of every negative sample.
Step E, determine the interaction relation loss function from the interaction similarity function of each positive sample of the interaction relation and the interaction similarity function of each negative sample.
Optionally, the interaction relation loss function is expressed as:
$$L_{TrIR} = \sum_{r \in R_{IR}} \sum_{\langle u,r,v \rangle \in P_{IR}} \sum_{\langle u',r,v' \rangle \in P'_{IR}} \max[0, \gamma + g(u, v) - g(u', v')]$$
where $L_{TrIR}$ denotes the interaction relation loss function; $r \in R_{IR}$ indicates that the connection relation r in the node relation tuple belongs to the interaction relation; $\langle u,r,v \rangle \in P_{IR}$ denotes a positive sample of the interaction relation; $\langle u',r,v' \rangle \in P'_{IR}$ denotes any negative sample; γ denotes the margin hyperparameter, γ > 0; g(u, v) denotes the interaction similarity function of any positive sample of the interaction relation; and g(u', v') denotes the interaction similarity function of any negative sample.
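The translation-based term can be sketched in the same way for one positive and one negative sample. The text does not specify which norm is used in g, so the L2 norm below is an assumption, as is the hinge form max(0, γ + g(u, v) − g(u', v')); the function names and toy vectors are hypothetical.

```python
import math


def interaction_distance(x_u, y_r, x_v, weight=1.0):
    """Weighted norm of X_u + Y_r - X_v (L2 norm assumed here)."""
    squared = sum((a + r - b) ** 2 for a, r, b in zip(x_u, y_r, x_v))
    return weight * math.sqrt(squared)


def hinge(positive_score, negative_score, gamma):
    """Margin loss: zero once the negative is at least gamma farther away."""
    return max(0.0, gamma + positive_score - negative_score)


# Hypothetical 2-dimensional embeddings: the relation vector translates
# the head exactly onto the true tail, and far from the corrupted tail.
x_u, y_r = [1.0, 0.0], [0.0, 1.0]
pos = interaction_distance(x_u, y_r, [1.0, 1.0])
neg = interaction_distance(x_u, y_r, [4.0, 5.0])
loss_small_margin = hinge(pos, neg, gamma=1.0)
loss_large_margin = hinge(pos, neg, gamma=6.0)
```

A positive score of zero means the translation X_u + Y_r lands exactly on X_v, which is the behavior the interaction relation model is meant to encourage.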
Step F, sum the membership relation loss function and the interaction relation loss function to obtain the heterogeneous information network embedding model.
Optionally, the heterogeneous information network embedding model L can be expressed as: $L = L_{EuAR} + L_{TrIR}$
The embodiment of the present invention analyzes the structural characteristics of relations in a heterogeneous information network and proposes two structure-related measures that consistently divide heterogeneous relations into membership relations and interaction relations. A membership relation model and an interaction relation model are then provided, and a novel relation-structure-aware heterogeneous information network embedding model is obtained by combining the two. By solving the minimum of the heterogeneous information network embedding model, the target embedding vector of each node in the heterogeneous information network to be processed is obtained. In addition, the embodiment of the present invention has been tested thoroughly on the three data sets of Fig. 5, which verifies the effectiveness of the heterogeneous information network embedding method of the present invention. The experimental results show that the embodiment of the present invention significantly outperforms existing network embedding methods on multiple data mining tasks.
In a second aspect, the embodiment of the present invention discloses a heterogeneous information network embedding device, as shown in Fig. 6. Fig. 6 is a schematic structural diagram of a heterogeneous information network embedding device according to the embodiment of the present invention. The device comprises:
a node information obtaining module 601, configured to obtain the node type of each node contained in the heterogeneous information network to be processed and each connection relation between nodes;
a target relation determining module 602, configured to determine, according to the node type of each node and each connection relation between nodes, the target relation corresponding to the node relation tuple formed by each connection relation and its corresponding nodes; the target relation is a membership relation or an interaction relation;
a membership relation tuple input module 603, configured to, for each node relation tuple of the membership relation, input the initial embedding vector of the connection relation in the node relation tuple and the initial embedding vector of each node in the node relation tuple into the membership relation model of the predetermined heterogeneous information network embedding model;
an interaction relation tuple input module 604, configured to, for each node relation tuple of the interaction relation, input the initial embedding vector of the connection relation in the node relation tuple and the initial embedding vector of each node in the node relation tuple into the interaction relation model of the predetermined heterogeneous information network embedding model;
a target embedding vector output module 605, configured to output, when the value of the heterogeneous information network embedding model is minimal, the target embedding vector of each node in the heterogeneous information network to be processed.
In the heterogeneous information network embedding device of the embodiment of the present invention, the structural characteristics of relations in the heterogeneous network are analyzed, and the node relation tuples formed by each connection relation and its corresponding nodes are divided into membership relations and interaction relations. Because nodes in a membership relation share similar characteristics, nodes connected by a membership relation can be brought directly close to each other, and the present invention provides a membership relation model for this purpose; nodes in an interaction relation interact strongly, and the present invention provides an interaction relation model for this purpose. The heterogeneous information network embedding model is obtained by combining the membership relation model and the interaction relation model, and its minimum is then solved to obtain the target embedding vector of each node in the heterogeneous information network to be processed. Based on the structural characteristics of the heterogeneous information network itself, the present invention performs targeted network embedding analysis on each node in the heterogeneous information network, so that the resulting heterogeneous information network embedding better meets the needs of subsequent applications.
Optionally, in an embodiment of the heterogeneous information network embedding device of the present invention, the target relation determining module 602 comprises:
an average degree value calculating submodule, configured to calculate, according to the node type of each node and each connection relation between nodes, the average degree value of the nodes corresponding to each connection relation;
a first target relation determining submodule, configured to determine, according to the magnitude of the average degree value of the nodes corresponding to each connection relation, the target relation corresponding to the node relation tuple formed by each connection relation and its corresponding nodes.
Optionally, in an embodiment of the heterogeneous information network embedding device of the present invention, the target relation determining module 602 comprises:
a node relation tuple classifying submodule, configured to classify the node relation tuples formed by each connection relation and its corresponding nodes according to their connection relation;
a sparsity value calculating submodule, configured to calculate, for each type of node relation tuple, the sparsity value of that type of node relation tuple;
a second target relation determining submodule, configured to determine the target relation of each type of node relation tuple according to the magnitude of the sparsity value.
Optionally, in an embodiment of the heterogeneous information network embedding device of the present invention, the device further comprises:
a sample obtaining module, configured to obtain positive samples of the membership relation, positive samples of the interaction relation, and preset negative samples; a positive sample is a node relation tuple whose connection relation exists in the sample heterogeneous information network; a negative sample is a node relation tuple whose connection relation does not exist in the sample heterogeneous information network;
a membership similarity function determining module, configured to determine, for the membership relation, the membership similarity function of each positive sample of the membership relation and the membership similarity function of each negative sample, respectively;
a membership relation loss function determining module, configured to determine the membership relation loss function from the membership similarity function of each positive sample of the membership relation and the membership similarity function of each negative sample;
an interaction similarity function determining module, configured to determine, for the interaction relation, the interaction similarity function of each positive sample of the interaction relation and the interaction similarity function of each negative sample, respectively;
an interaction relation loss function determining module, configured to determine the interaction relation loss function from the interaction similarity function of each positive sample of the interaction relation and the interaction similarity function of each negative sample;
a heterogeneous information network embedding model determining module, configured to sum the membership relation loss function and the interaction relation loss function to obtain the heterogeneous information network embedding model.
Optionally, in an embodiment of the heterogeneous information network embedding device of the present invention, the membership similarity function is expressed as:
$$f(p, q) = w_{pq} \, \|X_p - X_q\|_2^2$$
where f(p, q) denotes the membership similarity function between the nodes of any positive sample of the membership relation; $X_p$ denotes the initial embedding vector of node p in the positive sample; $X_q$ denotes the initial embedding vector of node q in the positive sample; and $w_{pq}$ denotes the weight of the connection relation between node p and node q;
The membership relation loss function is expressed as:
$$L_{EuAR} = \sum_{s \in R_{AR}} \sum_{\langle p,s,q \rangle \in P_{AR}} \sum_{\langle p',s,q' \rangle \in P'_{AR}} \max[0, \gamma + f(p, q) - f(p', q')]$$
where $L_{EuAR}$ denotes the membership relation loss function; $s \in R_{AR}$ indicates that the connection relation s in the node relation tuple belongs to the membership relation; $\langle p,s,q \rangle \in P_{AR}$ denotes a positive sample of the membership relation; $\langle p',s,q' \rangle \in P'_{AR}$ denotes any negative sample; γ denotes the margin hyperparameter, γ > 0; f(p, q) denotes the membership similarity function of any positive sample of the membership relation; and f(p', q') denotes the membership similarity function of any negative sample;
The interaction similarity function is expressed as:
$$g(u, v) = w_{u,v} \, \|X_u + Y_r - X_v\|$$
where g(u, v) denotes the interaction similarity function between the nodes of any positive sample of the interaction relation; $X_u$ denotes the initial embedding vector of node u in the positive sample; $X_v$ denotes the initial embedding vector of node v in the positive sample; $Y_r$ denotes the initial embedding vector of the connection relation r in the positive sample; and $w_{u,v}$ denotes the weight of the connection relation between node u and node v;
The interaction relation loss function is expressed as:
$$L_{TrIR} = \sum_{r \in R_{IR}} \sum_{\langle u,r,v \rangle \in P_{IR}} \sum_{\langle u',r,v' \rangle \in P'_{IR}} \max[0, \gamma + g(u, v) - g(u', v')]$$
where $L_{TrIR}$ denotes the interaction relation loss function; $r \in R_{IR}$ indicates that the connection relation r in the node relation tuple belongs to the interaction relation; $\langle u,r,v \rangle \in P_{IR}$ denotes a positive sample of the interaction relation; $\langle u',r,v' \rangle \in P'_{IR}$ denotes any negative sample; γ denotes the margin hyperparameter, γ > 0; g(u, v) denotes the interaction similarity function of any positive sample of the interaction relation; and g(u', v') denotes the interaction similarity function of any negative sample.
In a third aspect, the embodiment of the present invention discloses an electronic device, as shown in Fig. 7. Fig. 7 is a schematic structural diagram of an electronic device according to the embodiment of the present invention. The electronic device includes a processor 701, a communication interface 702, a memory 703 and a communication bus 704, wherein the processor 701, the communication interface 702 and the memory 703 communicate with one another through the communication bus 704;
the memory 703 is configured to store a computer program;
the processor 701 is configured to implement the following method steps when executing the program stored in the memory:
obtaining the node type of each node contained in the heterogeneous information network to be processed and each connection relation between nodes;
determining, according to the node type of each node and each connection relation between nodes, the target relation corresponding to the node relation tuple formed by each connection relation and its corresponding nodes; the target relation is a membership relation or an interaction relation;
for each node relation tuple of the membership relation, inputting the initial embedding vector of the connection relation in the node relation tuple and the initial embedding vector of each node in the node relation tuple into the membership relation model of the predetermined heterogeneous information network embedding model;
for each node relation tuple of the interaction relation, inputting the initial embedding vector of the connection relation in the node relation tuple and the initial embedding vector of each node in the node relation tuple into the interaction relation model of the predetermined heterogeneous information network embedding model;
when the value of the heterogeneous information network embedding model is minimal, outputting the target embedding vector of each node in the heterogeneous information network to be processed.
The communication bus 704 of the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 704 may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, it is drawn as a single thick line in the figure, which does not mean that there is only one bus or only one type of bus.
The communication interface 702 is used for communication between the above electronic device and other devices.
The memory 703 may include random access memory (Random Access Memory, RAM) and may also include non-volatile memory (Non-Volatile Memory, NVM), for example at least one disk memory. Optionally, the memory 703 may also be at least one storage device located remotely from the aforementioned processor 701.
The above processor 701 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), and the like; it may also be a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In the electronic device of the embodiment of the present invention, the structural characteristics of relations in the heterogeneous network are analyzed, and the node relation tuples formed by each connection relation and its corresponding nodes are divided into membership relations and interaction relations. Because nodes in a membership relation share similar characteristics, nodes connected by a membership relation can be brought directly close to each other, and the present invention provides a membership relation model for this purpose; nodes in an interaction relation interact strongly, and the present invention provides an interaction relation model for this purpose. The heterogeneous information network embedding model is obtained by combining the membership relation model and the interaction relation model, and its minimum is then solved to obtain the target embedding vector of each node in the heterogeneous information network to be processed. Based on the structural characteristics of the heterogeneous information network itself, the present invention performs targeted network embedding analysis on each node in the heterogeneous information network, so that the resulting heterogeneous information network embedding better meets the needs of subsequent applications.
In another aspect, an embodiment of the present invention discloses a computer-readable storage medium in which a computer program is stored; when the computer program is executed by a processor, the steps of any one of the above heterogeneous information network embedding methods are implemented.
In the computer-readable storage medium of this embodiment of the present invention, the structural characteristics of the relations in the heterogeneous network are analyzed, and each node-relation tuple, formed by an inter-node connection relation and the nodes it connects, is classified as either a membership relation or an interaction relation. Because nodes linked by a membership relation share similar characteristics, such nodes can be placed directly close to each other, and the present invention provides a membership relation model for this purpose. Nodes linked by an interaction relation exhibit a strong interaction, and the present invention provides an interaction relation model for this purpose. The heterogeneous information network embedding model is obtained by combining the membership relation model and the interaction relation model; the minimum of this model is then solved to obtain the target embedding vector of each node in the heterogeneous information network to be processed. Based on the structural characteristics of the heterogeneous information network itself, the present invention performs a targeted network embedding analysis of each node, so that the resulting embedding better meets the needs of subsequent applications.
In another aspect, an embodiment of the present invention discloses a computer program product comprising instructions which, when run on a computer, cause the computer to execute the steps of any one of the above heterogeneous information network embedding methods.
In the computer program product comprising instructions of this embodiment of the present invention, the structural characteristics of the relations in the heterogeneous network are analyzed, and each node-relation tuple, formed by an inter-node connection relation and the nodes it connects, is classified as either a membership relation or an interaction relation. Because nodes linked by a membership relation share similar characteristics, such nodes can be placed directly close to each other, and the present invention provides a membership relation model for this purpose. Nodes linked by an interaction relation exhibit a strong interaction, and the present invention provides an interaction relation model for this purpose. The heterogeneous information network embedding model is obtained by combining the membership relation model and the interaction relation model; the minimum of this model is then solved to obtain the target embedding vector of each node in the heterogeneous information network to be processed. Based on the structural characteristics of the heterogeneous information network itself, the present invention performs a targeted network embedding analysis of each node, so that the resulting embedding better meets the needs of subsequent applications.
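The joint model described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the patented implementation: the membership relation is scored with a weighted squared Euclidean distance (the squaring is an assumption), the interaction relation is scored with the translation distance w_uv ||X_u + Y_r − X_v|| given in claim 5, both losses use a margin-based hinge form, and each positive sample is paired with exactly one negative sample. All function names are hypothetical.

```python
import math


def membership_score(x_p, x_q, w_pq=1.0):
    """Weighted squared Euclidean distance between two node embeddings (assumed form)."""
    return w_pq * sum((a - b) ** 2 for a, b in zip(x_p, x_q))


def interaction_score(x_u, y_r, x_v, w_uv=1.0):
    """Weighted translation distance: the relation vector y_r should carry x_u to x_v."""
    return w_uv * math.sqrt(sum((a + r - b) ** 2 for a, r, b in zip(x_u, y_r, x_v)))


def margin_loss(pos_scores, neg_scores, margin=1.0):
    """Hinge loss: a positive sample should score lower than its paired negative by `margin`."""
    return sum(max(0.0, margin + p - n) for p, n in zip(pos_scores, neg_scores))


def joint_loss(ar_pos, ar_neg, ir_pos, ir_neg, margin=1.0):
    """Sum of the membership-relation loss and the interaction-relation loss."""
    return margin_loss(ar_pos, ar_neg, margin) + margin_loss(ir_pos, ir_neg, margin)
```

Minimizing `joint_loss` over the embedding vectors, for example by stochastic gradient descent in an automatic-differentiation framework, would then yield the target embedding of each node; the optimizer is left out here because the text does not specify one.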
The above embodiments may be implemented wholly or partly in software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired means (for example coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (for example infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example a floppy disk, hard disk, or magnetic tape), an optical medium (for example a DVD), or a semiconductor medium (for example a solid state disk (SSD)).
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.
The embodiments in this specification are described in a related manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus, electronic device and storage medium embodiments are described relatively briefly because they are substantially similar to the method embodiment; for relevant details, refer to the description of the method embodiment.
The above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.

Claims (10)

1. A heterogeneous information network embedding method, characterized in that the method comprises:
obtaining the node type of each node included in a heterogeneous information network to be processed, and each inter-node connection relation;
determining, according to the node type of each node and each inter-node connection relation, the target relation corresponding to each node-relation tuple formed by an inter-node connection relation and its corresponding nodes; the target relation being a membership relation or an interaction relation;
for each node-relation tuple of the membership relation, inputting the initial embedding vector of the inter-node connection relation in the node-relation tuple and the initial embedding vector of each node in the node-relation tuple into the membership relation model of a predetermined heterogeneous information network embedding model;
for each node-relation tuple of the interaction relation, inputting the initial embedding vector of the inter-node connection relation in the node-relation tuple and the initial embedding vector of each node in the node-relation tuple into the interaction relation model of the predetermined heterogeneous information network embedding model;
when the value of the heterogeneous information network embedding model is at its minimum, outputting the target embedding vector of each node in the heterogeneous information network to be processed.
2. The heterogeneous information network embedding method according to claim 1, characterized in that determining, according to the node type of each node and each inter-node connection relation, the target relation corresponding to each node-relation tuple formed by an inter-node connection relation and its corresponding nodes comprises:
calculating, according to the node type of each node and each inter-node connection relation, the average degree of the nodes corresponding to each inter-node connection relation;
determining, according to the magnitude of the average degree of the nodes corresponding to each inter-node connection relation, the target relation corresponding to each node-relation tuple formed by an inter-node connection relation and its corresponding nodes.
3. The heterogeneous information network embedding method according to claim 1, characterized in that determining, according to the node type of each node and each inter-node connection relation, the target relation corresponding to each node-relation tuple formed by an inter-node connection relation and its corresponding nodes comprises:
classifying the node-relation tuples formed by each inter-node connection relation and its corresponding nodes according to the inter-node connection relation;
for each type of node-relation tuple, calculating the sparsity of that type of node-relation tuple;
determining, according to the magnitude of the sparsity, the target relation of each type of node-relation tuple.
4. The heterogeneous information network embedding method according to claim 1, characterized in that the step of predetermining the heterogeneous information network embedding model comprises:
obtaining positive samples of the membership relation, positive samples of the interaction relation, and preset negative samples; a positive sample being a node-relation tuple for which an inter-node connection relation exists in a sample heterogeneous information network; a negative sample being a node-relation tuple for which no inter-node connection relation exists in the sample heterogeneous information network;
for the membership relation, determining the membership similarity function of each positive sample of the membership relation, and determining the membership similarity function of each negative sample;
determining the membership relation loss function from the membership similarity functions of the positive samples of the membership relation and the membership similarity functions of the negative samples;
for the interaction relation, determining the interaction similarity function of each positive sample of the interaction relation, and determining the interaction similarity function of each negative sample;
determining the interaction relation loss function from the interaction similarity functions of the positive samples of the interaction relation and the interaction similarity functions of the negative samples;
summing the membership relation loss function and the interaction relation loss function to obtain the heterogeneous information network embedding model.
5. The heterogeneous information network embedding method according to claim 4, characterized in that the membership similarity function is expressed as:
where f(p, q) denotes the membership similarity function between the nodes of any membership relation positive sample; X_p denotes the initial embedding vector of node p in the membership relation positive sample; X_q denotes the initial embedding vector of node q in the membership relation positive sample; w_pq denotes the weight of the inter-node connection relation between node p and node q;
the membership relation loss function is expressed as:
where L_EuAR denotes the membership relation loss function; s ∈ R_AR denotes that the inter-node connection relation s in the node-relation tuple belongs to the membership relation; <p, s, q> ∈ P_AR denotes a membership relation positive sample; <p′, s, q′> ∈ P′_AR denotes any negative sample; γ denotes the margin hyperparameter, γ > 0; f(p, q) denotes the membership similarity function of any membership relation positive sample; f(p′, q′) denotes the membership similarity function of any negative sample;
the interaction similarity function is expressed as:
g(u, v) = w_{u,v} ||X_u + Y_r − X_v||
where g(u, v) denotes the interaction similarity function between the nodes of any interaction relation positive sample; X_u denotes the initial embedding vector of node u in the interaction relation positive sample; X_v denotes the initial embedding vector of node v in the interaction relation positive sample; Y_r denotes the initial embedding vector of the inter-node connection relation r in the interaction relation positive sample; w_{u,v} denotes the weight of the inter-node connection relation between node u and node v;
the interaction relation loss function is expressed as:
where L_TrIR denotes the interaction relation loss function; r ∈ R_IR denotes that the inter-node connection relation r in the node-relation tuple belongs to the interaction relation; <u, r, v> ∈ P_IR denotes an interaction relation positive sample; <u′, r, v′> ∈ P′_IR denotes any negative sample; γ denotes the margin hyperparameter, γ > 0; g(u, v) denotes the interaction similarity function of any interaction relation positive sample; g(u′, v′) denotes the interaction similarity function of any negative sample.
6. A heterogeneous information network embedding apparatus, characterized in that the apparatus comprises:
a node information obtaining module, configured to obtain the node type of each node included in a heterogeneous information network to be processed, and each inter-node connection relation;
a target relation determining module, configured to determine, according to the node type of each node and each inter-node connection relation, the target relation corresponding to each node-relation tuple formed by an inter-node connection relation and its corresponding nodes; the target relation being a membership relation or an interaction relation;
a membership relation node-relation tuple input module, configured to, for each node-relation tuple of the membership relation, input the initial embedding vector of the inter-node connection relation in the node-relation tuple and the initial embedding vector of each node in the node-relation tuple into the membership relation model of a predetermined heterogeneous information network embedding model;
an interaction relation node-relation tuple input module, configured to, for each node-relation tuple of the interaction relation, input the initial embedding vector of the inter-node connection relation in the node-relation tuple and the initial embedding vector of each node in the node-relation tuple into the interaction relation model of the predetermined heterogeneous information network embedding model;
a target embedding vector output module, configured to output, when the value of the heterogeneous information network embedding model is at its minimum, the target embedding vector of each node in the heterogeneous information network to be processed.
7. The heterogeneous information network embedding apparatus according to claim 6, characterized in that the target relation determining module comprises:
an average degree calculation submodule, configured to calculate, according to the node type of each node and each inter-node connection relation, the average degree of the nodes corresponding to each inter-node connection relation;
a first target relation determining submodule, configured to determine, according to the magnitude of the average degree of the nodes corresponding to each inter-node connection relation, the target relation corresponding to each node-relation tuple formed by an inter-node connection relation and its corresponding nodes.
8. The heterogeneous information network embedding apparatus according to claim 6, characterized in that the target relation determining module comprises:
a node-relation tuple classification submodule, configured to classify the node-relation tuples formed by each inter-node connection relation and its corresponding nodes according to the inter-node connection relation;
a sparsity calculation submodule, configured to calculate, for each type of node-relation tuple, the sparsity of that type of node-relation tuple;
a second target relation determining submodule, configured to determine, according to the magnitude of the sparsity, the target relation of each type of node-relation tuple.
9. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another via the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the method steps of any one of claims 1-5 when executing the program stored in the memory.
10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and the method steps of any one of claims 1-5 are implemented when the computer program is executed by a processor.
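The formula images for the membership similarity function and for the two loss functions in claim 5 are not reproduced in the text above. The following is a hedged reconstruction from the symbol definitions given there, not the authoritative claim wording. It assumes a weighted squared Euclidean distance for f (suggested by the subscript Eu; the unsquared norm would fit the definitions equally well) and a margin-based hinge form for both losses (suggested by the margin hyperparameter γ > 0). Only g(u, v) is taken directly from the claim text, and the final sum follows the summation step of claim 4; the symbol L for the joint model is introduced here for illustration.

```latex
% Membership similarity (assumed form) and its loss
f(p, q) = w_{pq} \, \lVert X_p - X_q \rVert_2^2
L_{EuAR} = \sum_{s \in R_{AR}} \; \sum_{\langle p, s, q \rangle \in P_{AR}} \; \sum_{\langle p', s, q' \rangle \in P'_{AR}}
           \max\bigl[ 0,\; \gamma + f(p, q) - f(p', q') \bigr]

% Interaction similarity (from the claim text) and its loss (assumed hinge form)
g(u, v) = w_{u,v} \, \lVert X_u + Y_r - X_v \rVert
L_{TrIR} = \sum_{r \in R_{IR}} \; \sum_{\langle u, r, v \rangle \in P_{IR}} \; \sum_{\langle u', r, v' \rangle \in P'_{IR}}
           \max\bigl[ 0,\; \gamma + g(u, v) - g(u', v') \bigr]

% Joint model: sum of the two losses
L = L_{EuAR} + L_{TrIR}
```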
CN201910052260.6A 2019-01-21 2019-01-21 Heterogeneous information network embedding method and device, electronic equipment and storage medium Active CN109800232B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910052260.6A CN109800232B (en) 2019-01-21 2019-01-21 Heterogeneous information network embedding method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109800232A true CN109800232A (en) 2019-05-24
CN109800232B CN109800232B (en) 2021-03-19

Family

ID=66559911

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910052260.6A Active CN109800232B (en) 2019-01-21 2019-01-21 Heterogeneous information network embedding method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109800232B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102890703A (en) * 2012-07-20 2013-01-23 浙江工业大学 Network heterogeneous multidimensional scaling (HMDS) method
CN103034687A (en) * 2012-11-29 2013-04-10 中国科学院自动化研究所 Correlation module identifying method based on 2-type heterogeneous network
CN105761154A (en) * 2016-04-11 2016-07-13 北京邮电大学 Socialized recommendation method and device
CN106407373A (en) * 2016-09-12 2017-02-15 电子科技大学 Heterogeneous network community structure and community discovery method based on the structure
CN106777339A (en) * 2017-01-13 2017-05-31 深圳市唯特视科技有限公司 A kind of method that author is recognized based on heterogeneous network incorporation model
CN107491540A (en) * 2017-08-24 2017-12-19 济南浚达信息技术有限公司 A kind of combination depth Bayesian model and the film of collaboration Heterogeneous Information insertion recommend method
US20180020250A1 (en) * 2015-09-08 2018-01-18 Tencent Technology (Shenzhen) Company Limited Recommendation information pushing method, server, and storage medium
CN108694469A (en) * 2018-06-08 2018-10-23 哈尔滨工程大学 A kind of Relationship Prediction method of knowledge based collection of illustrative plates

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
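PAPER_READER: "Heterogeneous Information Network Embedding Learning" (异质信息网络嵌入学习), 《HTTPS://BLOG.CSDN.NET/PAPER_READER/ARTICLE/DETAILS/84197903》 *
Wu Jing et al.: "The Influence of Network Embeddedness on the Information Advantage of Syndicated Venture Capital" (网络嵌入性对联合风险投资信息优势的影响), 《Science Research Management》 (科研管理) *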

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111861535A (en) * 2020-04-22 2020-10-30 北京嘀嘀无限科技发展有限公司 Order type prediction method, prediction device and readable storage medium
CN112232492A (en) * 2020-10-30 2021-01-15 北京邮电大学 Decoupling-based heterogeneous network embedding method and device and electronic equipment
CN112508115A (en) * 2020-12-15 2021-03-16 北京百度网讯科技有限公司 Method, apparatus, device and computer storage medium for building node representation model
CN112508115B (en) * 2020-12-15 2023-10-24 北京百度网讯科技有限公司 Method, apparatus, device and computer storage medium for establishing node representation model
CN112770013A (en) * 2021-01-15 2021-05-07 电子科技大学 Heterogeneous information network embedding method based on side sampling

Also Published As

Publication number Publication date
CN109800232B (en) 2021-03-19

Similar Documents

Publication Publication Date Title
CN109800232A (en) Heterogeneous information network embedding method and device, electronic equipment and storage medium
Serafino et al. True scale-free networks hidden by finite size effects
Cao et al. Identifying overlapping communities as well as hubs and outliers via nonnegative matrix factorization
US8615404B2 (en) Self-describing data framework
Li et al. A topic-biased user reputation model in rating systems
Song et al. Link sign prediction and ranking in signed directed social networks
CN106326585A (en) Prediction analysis method based on bayesian network reasoning and device thereof
Kumar et al. An upper approximation based community detection algorithm for complex networks
Chen et al. Finding communities by their centers
CN111126510A (en) Method for calculating similarity in heterogeneous network and related components thereof
ElBarawy et al. Improving social network community detection using DBSCAN algorithm
US20170236226A1 (en) Computerized systems, processes, and user interfaces for globalized score for a set of real-estate assets
Ganji et al. A declarative approach to constrained community detection
Liebmann et al. Hierarchical correlation clustering in multiple 2d scalar fields
Freire et al. Getting decision support from context-specific online social networks: a case study
Touboul Importance of the cutoff value in the quadratic adaptive integrate-and-fire model
Hart-Davidson et al. Genre signals in textual topologies
CN113761286A (en) Map embedding method and device of knowledge map and electronic equipment
Szymczak Stable Morse decompositions for piecewise constant vector fields on surfaces
Saxena et al. A survey of graph curvature and embedding in non-euclidean spaces
Song et al. Triangle-based representative possible worlds of uncertain graphs
Adi Prasetya et al. Modeling the co-evolving polarization of opinion and news propagation structure in social media
Han et al. An effective heterogeneous information network representation learning framework
Sathanur et al. Exploring the role of intrinsic nodal activation on the spread of influence in complex networks
Zhu et al. A robust reputation iterative algorithm based on Z-statistics in a rating system with thorny objects

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant