CN107944489B - Large-scale hybrid graph feature learning method based on structure-semantics fusion - Google Patents


Info

Publication number
CN107944489B
CN107944489B (application CN201711169332.2A, granted as CN 107944489 B)
Authority
CN
China
Prior art keywords
node
loss function
character representation
train
label
Prior art date
Legal status
Active
Application number
CN201711169332.2A
Other languages
Chinese (zh)
Other versions
CN107944489A (en)
Inventor
王建民
龙明盛
裴忠一
黄向东
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN201711169332.2A
Publication of CN107944489A
Application granted
Publication of CN107944489B
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217: Validation; Performance evaluation; Active pattern learning techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/26: Techniques for post-processing, e.g. correcting the recognition result
    • G06V30/262: Techniques for post-processing, e.g. correcting the recognition result using context analysis, e.g. lexical, syntactic or semantic context
    • G06V30/274: Syntactic or semantic context, e.g. balancing

Abstract

The present invention provides a large-scale hybrid graph feature learning method based on structure-semantics fusion, comprising: obtaining a training semantic label information set S_train; obtaining a node pair set P_e, P_e = {(u, v)}, and traversing the node pairs (u, v); judging whether the traversal of the node pairs is complete; if the traversal is not complete, performing negative sampling on node u and computing the connected loss function and the unconnected loss function; if node u is in the training node set V_train, computing the semantic loss function according to S_train; updating the initial feature representations of node u, of node v, and of the nodes obtained by negative sampling; and repeating the judgment until all node pairs (u, v) have been traversed. The method provided by the invention corrects the feature representation of each node according to its semantic label information, taking semantic labels as part of graph feature learning and thereby improving the quality of the learned features.

Description

Large-scale hybrid graph feature learning method based on structure-semantics fusion
Technical field
The present invention relates to the technical field of computer data analysis, and in particular to a large-scale hybrid graph feature learning method based on structure-semantics fusion.
Background technology
A large amount of valuable information can be mined from graphs, such as which nodes have high similarity, which nodes form a community, and which potential connections may exist. Graph feature learning is an important technique in the field of graph data mining, providing the foundation for applying machine learning algorithms to graph data. Its goal is to generate a feature vector for every node in the graph; these vectors serve as the input to machine learning algorithms, yielding analysis results or models that reflect the characteristics of the graph.
A variety of graph feature learning methods have been disclosed in the prior art. A large body of work addresses the problem of preserving graph structure, and such connection-structure-based methods have achieved good results. In practical applications, however, graph nodes often carry labels, such as personal education background, professional background, and hobbies in a social network, or the tags and groupings of blogs, pictures, and text on content-sharing sites. These labels are of great significance for graph data mining problems, yet the information they contain often cannot be exploited by feature learning methods based on connection structure. In addition, some research carries out graph feature learning based on node community information or on text or images attached to the nodes.
In the prior art, graph feature learning methods based on connection structure cannot mine the information contained in node labels, and methods based on community information or on attached text or images are not suited to mining label information either. Since no method can exploit the information carried by node labels, the quality of graph feature learning remains low.
Invention content
(1) Technical problem to be solved
The object of the present invention is to provide a large-scale hybrid graph feature learning method based on structure-semantics fusion, which solves the technical problem that prior-art graph feature learning methods cannot mine the information contained in node labels, resulting in low feature learning quality.
(2) technical solution
To solve the above technical problem, in one aspect, the present invention provides a large-scale hybrid graph feature learning method based on structure-semantics fusion, comprising:
obtaining a training semantic label information set S_train, where S_train is the set of semantic label information corresponding to a training node set V_train, V_train is a set of nodes randomly sampled from the graph according to a preset sampling ratio, and the graph is G, G = (V, E, S), where V is the set of graph nodes, E is the set of edges, and S is the semantic label information of the graph;
obtaining a node pair set P_e, P_e = {(u, v)}, where a node pair (u, v) is an element of P_e, and nodes u and v are the two endpoints of an edge sampled from E;
traversing all node pairs (u, v) in P_e;
judging whether the traversal is complete;
if the traversal is not complete, performing negative sampling on node u, and computing the connected loss function and the unconnected loss function;
if node u is in V_train, computing the semantic loss function according to S_train;
updating, according to the connected loss function, the unconnected loss function, and the semantic loss function, the initial feature representations of node u, of node v, and of the nodes obtained by negative sampling, to obtain new feature representations of node u, of node v, and of the negatively sampled nodes;
repeating the judgment of whether the traversal is complete, until the traversal is complete.
Further, the method further comprises:
generating, according to the node semantic label information s_i and a label l_j, a feature representation LF_j of label l_j, where s_i is the semantic label information of training node i and an element of S_train, l_j is an element of the label set L_j, L_j is the set of labels carried by training node i, training node i is a node in V_train, and j is a positive integer.
Further, the method further comprises:
randomly initializing the feature representation of every node in V according to a normal distribution, to obtain the set NF of node feature representations, NF = {NF_k, k ∈ [1, n]}, where NF_k is the feature representation of node k, n is the number of nodes in V, and n and k are positive integers.
Further, obtaining the node pair set P_e, P_e = {(u, v)}, specifically comprises:
sampling E according to the ratio of each edge's weight to the sum of all edge weights, to obtain the node pair set P_e, P_e = {(u, v)}, where a node pair (u, v) is an element of P_e, and nodes u and v are the two endpoints of an edge sampled from E.
Further, performing negative sampling on node u and computing the connected loss function and the unconnected loss function specifically comprises:
performing negative sampling on node u according to a preset positive-negative ratio, to obtain a node pair set P_e^neg = {(u, w)}, where a node pair (u, w) is an element of P_e^neg, and nodes u and w are two non-adjacent nodes;
computing the connected loss function Loss_structure(u, v), where NF_u is the feature representation of node u and NF_v is the feature representation of node v;
computing the unconnected loss function Loss_structure^neg(u, w), where NF_u is the feature representation of node u and NF_w is the feature representation of node w.
Further, computing the semantic loss function according to S_train specifically comprises:
computing the semantic loss function Loss_semantic(u), where label l_u is an element of the label set L_u, label l'_u is an element of the label set L'_u, L_u is the set of labels carried by node u, L'_u is the difference set of the full label set L and L_u, L is the set of labels carried by all nodes in the graph, NF_u is the feature representation of node u, LF_u is the feature representation of label l_u, and LF'_u is the feature representation of label l'_u.
Further, the method further comprises:
if node u is not in V_train, updating, according to the connected loss function and the unconnected loss function, the initial feature representations of node u, of node v, and of the nodes obtained by negative sampling, to obtain new feature representations of node u, of node v, and of the negatively sampled nodes.
In another aspect, the present invention provides a large-scale hybrid graph feature learning device based on structure-semantics fusion, comprising:
an acquisition module, configured to obtain the training semantic label information set S_train, where S_train is the set of semantic label information corresponding to the training node set V_train, V_train is a set of nodes randomly sampled from the graph according to a preset sampling ratio, and the graph is G, G = (V, E, S), where V is the set of graph nodes, E is the set of edges, and S is the semantic label information of the graph;
and to obtain the node pair set P_e, P_e = {(u, v)}, where a node pair (u, v) is an element of P_e, and nodes u and v are the two endpoints of an edge sampled from E;
a traversal module, configured to traverse all node pairs (u, v) in P_e;
a computing module, configured to judge whether the traversal is complete;
if the traversal is not complete, to perform negative sampling on node u and compute the connected loss function and the unconnected loss function;
and if node u is in V_train, to compute the semantic loss function according to S_train;
an update module, configured to update, according to the connected loss function, the unconnected loss function, and the semantic loss function, the initial feature representations of node u, of node v, and of the nodes obtained by negative sampling, to obtain new feature representations of node u, of node v, and of the negatively sampled nodes;
the computing module being further configured to repeat the judgment of whether the traversal is complete, until the traversal is complete.
In yet another aspect, the present invention provides an electronic device for large-scale hybrid graph feature learning based on structure-semantics fusion, comprising:
a memory and a processor, the processor and the memory communicating with each other through a bus; the memory stores program instructions executable by the processor, and the processor calls the program instructions to perform the above method.
In yet another aspect, the present invention provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform the above method.
In yet another aspect, the present invention provides a computer-readable storage medium on which a computer program is stored, the computer program implementing the above method when executed by a processor.
(3) Advantageous effects
The large-scale hybrid graph feature learning method based on structure-semantics fusion provided by the present invention corrects the feature representation of each node according to its semantic label information, taking semantic labels as part of graph feature learning and thereby improving the quality of the learned features.
Description of the drawings
Fig. 1 is a schematic diagram of the large-scale hybrid graph feature learning method based on structure-semantics fusion according to an embodiment of the present invention;
Fig. 2 is a logic flow chart of the large-scale hybrid graph feature learning method based on structure-semantics fusion according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the large-scale hybrid graph feature learning device based on structure-semantics fusion according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of the electronic device for large-scale hybrid graph feature learning based on structure-semantics fusion provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the purposes, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative work shall fall within the protection scope of the present invention.
Embodiment 1:
Fig. 1 is a schematic diagram of the large-scale hybrid graph feature learning method based on structure-semantics fusion according to an embodiment of the present invention. As shown in Fig. 1, the embodiment provides a method comprising:
Step S10, obtaining a training semantic label information set S_train, where S_train is the set of semantic label information corresponding to a training node set V_train, V_train is a set of nodes randomly sampled from the graph according to a preset sampling ratio, and the graph is G, G = (V, E, S), where V is the set of graph nodes, E is the set of edges, and S is the semantic label information of the graph;
Step S20, obtaining a node pair set P_e, P_e = {(u, v)}, where a node pair (u, v) is an element of P_e, and nodes u and v are the two endpoints of an edge sampled from E;
Step S30, traversing all node pairs (u, v) in P_e;
Step S40, judging whether the traversal is complete;
Step S50, if the traversal is not complete, performing negative sampling on node u, and computing the connected loss function and the unconnected loss function;
Step S60, if node u is in V_train, computing the semantic loss function according to S_train;
Step S70, updating, according to the connected loss function, the unconnected loss function, and the semantic loss function, the initial feature representations of node u, of node v, and of the nodes obtained by negative sampling, to obtain new feature representations of node u, of node v, and of the negatively sampled nodes;
Step S80, repeating the judgment of whether the traversal is complete, until the traversal is complete.
Specifically, a graph G = (V, E, S) containing semantic label information must first be constructed, where V is the set of graph nodes, E is the set of edges, and S is the semantic label information; S = {s_i, i ∈ [1, n]}, where n is the number of graph nodes and s_i is the semantic label information of node i; L_i = {l_i^k} is the set of labels carried by node i, where l_i^k is the k-th label of node i; L is the complete set of all labels in the graph.
Then the training semantic label information set S_train is obtained, where S_train is the set of semantic label information corresponding to the training node set V_train, and V_train is a set of nodes randomly sampled from the graph according to a preset sampling ratio. The sampling ratio may be set, for example, to r_l = {0.01, 0.05, 0.1}; according to this ratio, the training set S_train used to generate the feature representations of all nodes is randomly drawn from S.
Then the node pair set P_e = {(u, v)} is obtained, where nodes u and v are the two endpoints of an edge sampled from E.
Then the node pairs (u, v) are traversed, and it is judged whether the traversal is complete.
If the traversal is not complete, negative sampling is performed on node u, and the connected loss function and the unconnected loss function are computed. Negative sampling of node u must take the specific characteristics of the graph into account; it may be performed according to a preset positive-negative ratio, for example r_e = {0.1, 0.2, 0.5}.
If node u is in V_train, the semantic loss function is computed according to S_train.
According to the connected loss function, the unconnected loss function, and the semantic loss function, the initial feature representations of node u, of node v, and of the negatively sampled nodes are updated, yielding their new feature representations; this update can be completed by the back-propagation algorithm.
The judgment of whether the traversal is complete is repeated until all node pairs (u, v) have been traversed, finally yielding the feature representations of all nodes.
Further, the method further comprises:
generating, according to the node semantic label information s_i and a label l_j, a feature representation LF_j of label l_j.
Specifically, each node's semantic label information s_i is regarded as a sentence and each label l_j as a word, and the skip-gram model used in natural language learning to generate word features is applied to generate the feature representation LF_j of label l_j. Here s_i is the semantic label information of training node i and an element of S_train, l_j is an element of the label set L_j, L_j is the set of labels carried by training node i, training node i is a node in V_train, and j is a positive integer.
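As a concrete illustration of this step, the sketch below treats each training node's label set as a "sentence" and produces a dense vector LF_j per label. The patent specifies the skip-gram model; the count-based factorization used here is only a lightweight stand-in with the same input and output shape, and every name and value in it is illustrative.

```python
import numpy as np

# Each node's semantic label information s_i is treated as a "sentence"
# of labels; labels co-occurring on the same node are context for each
# other. Factorizing the (smoothed) co-occurrence counts yields dense
# label vectors LF_j, as a stand-in for the skip-gram model.
def learn_label_features(label_sentences, dim=4):
    labels = sorted({l for sent in label_sentences for l in sent})
    index = {l: i for i, l in enumerate(labels)}
    cooc = np.zeros((len(labels), len(labels)))
    for sent in label_sentences:
        for a in sent:
            for b in sent:
                if a != b:
                    cooc[index[a], index[b]] += 1.0
    # SVD of the log-smoothed counts gives one vector per label
    u, s, _ = np.linalg.svd(np.log1p(cooc), full_matrices=False)
    dim = min(dim, u.shape[1])
    LF = u[:, :dim] * s[:dim]
    return {l: LF[index[l]] for l in labels}

# Illustrative training label information S_train (three nodes)
S_train = [["phd", "ml", "graphs"], ["ml", "graphs"], ["phd", "ml"]]
LF = learn_label_features(S_train, dim=2)
```

Labels that often co-occur on the same nodes end up with similar vectors, which is the property the semantic loss later relies on.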
Further, the method further comprises:
randomly initializing the feature representations of all nodes in the graph according to a normal distribution.
Specifically, this yields the node feature representation set NF, NF = {NF_k, k ∈ [1, n]}, where NF_k is the feature representation of node k, each NF_k is a real-valued feature vector of length m, n is the number of nodes in V, and n and k are positive integers.
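The initialization step above can be sketched in a few lines; the dimension m, the scale, and the seed are illustrative choices not fixed by the patent text.

```python
import numpy as np

# Random initialization of the node feature set NF = {NF_k}: each of
# the n nodes gets a length-m real vector drawn from a normal
# distribution, as the starting point before loss-driven updates.
def init_node_features(n, m, seed=0):
    rng = np.random.default_rng(seed)
    return rng.normal(loc=0.0, scale=0.1, size=(n, m))

NF = init_node_features(n=5, m=8)  # 5 nodes, feature length m = 8
```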
Further, obtaining the node pair set P_e = {(u, v)} specifically comprises:
sampling E according to the ratio of each edge's weight to the sum of all edge weights, to obtain the node pair set P_e = {(u, v)}, where nodes u and v are the two endpoints of a sampled edge.
Specifically, E is sampled according to the ratio of the weight ω_uv of each edge to the sum of all edge weights, yielding the node pair set P_e = {(u, v)}; the larger ω_uv is, the greater the role u and v play in the learning process.
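The weighted edge sampling can be sketched as below: each edge is drawn with probability proportional to its weight ω_uv, so heavier edges contribute more pairs to P_e. The edge list and sizes are illustrative.

```python
import random

# Edge sampling by weight: an edge (u, v, w_uv) is drawn with
# probability w_uv / sum of all weights, so the pair (u, v) of a
# heavier edge appears more often in P_e.
def sample_edge_pairs(edges, num_pairs, seed=0):
    rng = random.Random(seed)
    return rng.choices(
        [(u, v) for u, v, _ in edges],
        weights=[w for _, _, w in edges],
        k=num_pairs,
    )

E = [("a", "b", 3.0), ("b", "c", 1.0), ("c", "d", 0.5)]
P_e = sample_edge_pairs(E, num_pairs=10)
```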
Further, performing negative sampling on node u and computing the connected loss function and the unconnected loss function specifically comprises:
performing negative sampling on node u according to a preset positive-negative ratio, to obtain a node pair set P_e^neg = {(u, w)}, where a node pair (u, w) is an element of P_e^neg, and nodes u and w are two non-adjacent nodes;
computing the connected loss function Loss_structure(u, v), where NF_u is the feature representation of node u and NF_v is the feature representation of node v;
computing the unconnected loss function Loss_structure^neg(u, w), where NF_u is the feature representation of node u and NF_w is the feature representation of node w.
Specifically, according to the characteristics of the graph, the negative sampling technique is applied to node u according to the preset positive-negative ratio, which may be set to r_e = {0.1, 0.2, 0.5}, yielding the node pair set P_e^neg = {(u, w)}, where u and w are two non-adjacent nodes obtained by negative sampling.
The connected loss function Loss_structure(u, v) quantifies the error incurred by the feature representations of two directly adjacent nodes u and v in characterizing their connection; the smaller the error, the better NF_u and NF_v reflect the connection between u and v. Negative sampling yields several pairs (u, w1), (u, w2), and so on; the positive pair (u, v) is used to compute the connected loss function.
The unconnected loss function Loss_structure^neg(u, w) quantifies the error incurred by the feature representations of two non-adjacent nodes u and w in characterizing their non-connection; the smaller the error, the better NF_u and NF_w reflect that u and w are not connected. The negatively sampled pairs (u, w1), (u, w2), and so on are used to compute the unconnected loss function.
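The exact formulas for the two structural losses appear only in the patent's equations, which are not reproduced in this text. The sketch below therefore uses a common sigmoid form that matches the surrounding description: the connected loss shrinks as NF_u and NF_v align, and the unconnected loss shrinks as NF_u and NF_w diverge. All names and values are illustrative assumptions, not the patented formulas.

```python
import math
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def loss_connected(nf_u, nf_v):
    # Loss_structure(u, v): small when adjacent nodes' features align
    return -math.log(1.0 / (1.0 + math.exp(-dot(nf_u, nf_v))))

def loss_unconnected(nf_u, nf_w):
    # Loss_structure^neg(u, w): small when non-adjacent features diverge
    return -math.log(1.0 / (1.0 + math.exp(dot(nf_u, nf_w))))

def negative_sample(u, neighbors, all_nodes, k, seed=0):
    # Draw k nodes w not adjacent to u; the preset positive-negative
    # ratio r_e fixes k relative to the number of positive pairs.
    rng = random.Random(seed)
    candidates = [w for w in all_nodes if w != u and w not in neighbors]
    return [(u, rng.choice(candidates)) for _ in range(k)]

nf = {"u": [0.5, 0.1], "v": [0.4, 0.2], "w": [-0.3, 0.6]}
connected = loss_connected(nf["u"], nf["v"])
unconnected = loss_unconnected(nf["u"], nf["w"])
```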
Further, computing the semantic loss function according to S_train specifically comprises:
computing, if node u is in V_train, the semantic loss function Loss_semantic(u) according to S_train, where label l_u is an element of the label set L_u, label l'_u is an element of the label set L'_u, L_u = {l_u^k} is the set of labels carried by node u, L'_u is the difference set of the full label set L and L_u, L is the set of labels carried by all nodes in the graph, NF_u is the feature representation of node u, LF_u is the feature representation of label l_u, and LF'_u is the feature representation of label l'_u; LF_u and LF'_u act as anchor points in the feature space of node u.
The semantic loss function quantifies the error incurred by a node's feature representation in expressing its semantic label information; the smaller the error, the better NF_u reflects the association between u and l_u and the non-association between u and l'_u.
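The semantic loss can be sketched as follows: the features LF_u of the node's own labels act as positive anchors that NF_u is pulled toward, and the features LF'_u of labels outside L_u act as negative anchors it is pushed away from. The patented formula itself is not reproduced in the text; this sigmoid form is one consistent possibility, and all names and values are illustrative.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def loss_semantic(nf_u, own_label_feats, other_label_feats):
    dot = lambda x, y: sum(a * b for a, b in zip(x, y))
    loss = 0.0
    for lf in own_label_feats:    # u should associate with each l_u in L_u
        loss += -math.log(sigmoid(dot(nf_u, lf)))
    for lf in other_label_feats:  # u should not associate with l'_u in L \ L_u
        loss += -math.log(sigmoid(-dot(nf_u, lf)))
    return loss

nf_u = [0.6, -0.2]
own = [[0.5, -0.1]]    # feature LF_u of a label node u carries
other = [[-0.4, 0.3]]  # feature LF'_u of a label node u does not carry
semantic = loss_semantic(nf_u, own, other)
```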
Further, the method further comprises:
if node u is not in V_train, updating, according to the connected loss function and the unconnected loss function, the initial feature representations of node u, of node v, and of the nodes obtained by negative sampling, to obtain new feature representations of node u, of node v, and of the negatively sampled nodes.
Fig. 2 is a logic flow chart of the large-scale hybrid graph feature learning method based on structure-semantics fusion according to an embodiment of the present invention. The method of this embodiment is further described below with reference to the logic flow of Fig. 2:
First, the graph G = (V, E, S) containing semantic label information is constructed, and the hybrid graph data G is loaded.
Then the semantic label information is sampled: a set V_train of nodes is randomly drawn from the graph according to the preset sampling ratio, and the training semantic label information set S_train corresponding to V_train is obtained.
Then the semantic label feature representations are learned: according to the node semantic label information s_i and labels l_j, the label feature representations LF_j are generated. The node feature representations are initialized: the feature representation of every node in V is randomly initialized according to a normal distribution, yielding the set NF = {NF_k, k ∈ [1, n]}, where NF_k is the feature representation of node k, n is the number of nodes in V, and n and k are positive integers.
Then the edges are sampled by weight: E is sampled according to the ratio of each edge's weight to the sum of all edge weights, yielding the node pair set P_e = {(u, v)}, where a node pair (u, v) is an element of P_e, and nodes u and v are the two endpoints of a sampled edge.
Then the sampled edges are traversed, i.e. the node pairs (u, v) corresponding to the edges are traversed.
Then it is judged whether the traversal is complete; negative sampling is performed, and the structural loss function and the negative-example structural loss function are computed. The structural loss function is the connected loss function, and the negative-example structural loss function is the unconnected loss function.
Then it is judged whether the semantic labels of node u were sampled, i.e. whether node u is in V_train. If so, the semantic loss function is also computed, and the feature representations of the nodes are updated accordingly using the back-propagation algorithm; the initial feature representations of node u, of node v, and of the negatively sampled nodes are all updated.
If the semantic label information of node u was not sampled, the feature representations of the nodes are updated using the back-propagation algorithm according to the connected loss function and the unconnected loss function only.
The judgment of whether the traversal is complete is repeated until the traversal is complete, finally yielding the feature representations of all nodes.
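The full Fig. 2 flow can be sketched end to end: initialize features, sample edges by weight, negatively sample, and apply gradient steps on surrogate sigmoid losses, with the semantic term applied only to nodes whose labels were sampled. The patent specifies back-propagation; the loss forms, hyperparameters, and toy data below are illustrative assumptions.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train(nodes, edges, S_train, LF, dim=2, epochs=30, lr=0.1, neg_k=2, seed=0):
    rng = random.Random(seed)
    # Initialize NF_k from a normal distribution
    NF = {n: [rng.gauss(0, 0.1) for _ in range(dim)] for n in nodes}
    adj = {n: set() for n in nodes}
    for u, v, _ in edges:
        adj[u].add(v); adj[v].add(u)

    def step(u, x, sign):
        # One SGD step pushing NF_u toward (+1) or away from (-1) vector x
        g = sign * (1.0 - sigmoid(sign * sum(a * b for a, b in zip(NF[u], x))))
        for i in range(dim):
            NF[u][i] += lr * g * x[i]

    for _ in range(epochs):
        # Sample one edge by weight, giving a positive pair (u, v)
        (u, v) = rng.choices([(a, b) for a, b, _ in edges],
                             weights=[w for _, _, w in edges], k=1)[0]
        step(u, NF[v], +1)                      # connected loss term
        for _ in range(neg_k):                  # unconnected loss terms
            w = rng.choice([n for n in nodes if n != u and n not in adj[u]])
            step(u, NF[w], -1)
        if u in S_train:                        # semantic loss, sampled nodes only
            for label in S_train[u]:
                step(u, LF[label], +1)
    return NF

nodes = ["a", "b", "c", "d"]
edges = [("a", "b", 2.0), ("b", "c", 1.0), ("c", "d", 1.0)]
LF = {"ml": [1.0, 0.0], "db": [0.0, 1.0]}       # label features (illustrative)
S_train = {"a": ["ml"], "d": ["db"]}            # sampled semantic labels
NF = train(nodes, edges, S_train, LF)
```

This also shows how the semantic anchors propagate through structure: only nodes a and d carry labels, but their neighbors' features are pulled along via the connected loss.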
The large-scale hybrid graph feature learning method based on structure-semantics fusion provided by the embodiment of the present invention corrects the feature representations of nodes according to semantic label information, taking semantic labels as part of graph feature learning and improving its quality. During learning, the influence of semantic information is propagated to all nodes through the local connection structure, so that even when semantic information is scarce the overall feature learning effect is effectively improved, giving the method good practicability. The sampling-based learning strategy ensures low algorithmic complexity, adapting the method to feature learning on large-scale graphs.
Embodiment 2:
Fig. 3 is a schematic diagram of the large-scale composite graph feature learning apparatus based on structure-semantics fusion according to an embodiment of the present invention. As shown in Fig. 3, an embodiment of the present invention provides a large-scale composite graph feature learning apparatus based on structure-semantics fusion for carrying out the method described in the above embodiment, including an acquisition module 10, a traversal module 20, a computing module 30, and an update module 40, wherein:
The acquisition module 10 is used for obtaining a training semantic label information set Strain, where Strain is the set of semantic label information corresponding to a training node set Vtrain, Vtrain is a set of several nodes obtained by random sampling from a graph according to a preset sampling ratio, the graph is G, G = (V, E, S), where V is the set of graph nodes, E is the set of edges in the graph, and S is the graph semantic label information;
and for obtaining a node pair set Pe, Pe = {(u, v)}, where a node pair (u, v) is an element of the node pair set Pe, and node u and node v are the two nodes corresponding to an edge sampled from E.
The traversal module 20 is used for traversing all node pairs (u, v) in the node pair set Pe.
The computing module 30 is used for judging whether the traversal is completed;
if it is judged that the traversal is not completed, carrying out negative sampling on node u and calculating the connected loss function and the not-connected loss function;
and, if it is judged that node u is in Vtrain, calculating the semantic loss function according to Strain.
The update module 40 is used for updating, according to the connected loss function, the not-connected loss function, and the semantic loss function, the initialization feature representation of node u, the initialization feature representation of node v, and the initialization feature representations of the nodes obtained by negative sampling, to obtain a new feature representation of node u, a new feature representation of node v, and new feature representations of the nodes obtained by negative sampling.
The computing module is further used for repeatedly judging whether the traversal is completed, until the traversal is completed.
In the large-scale composite graph feature learning apparatus based on structure-semantics fusion provided by this embodiment of the present invention, the feature representations of nodes are corrected according to semantic label information, and the semantic label information is taken as a part of graph feature learning, which improves the quality of graph feature learning. During learning, the influence of semantic information on all nodes is propagated locally along the connection structure, so that the overall feature learning effect can still be improved effectively when semantic information is scarce, giving the apparatus good practicability. The sampling-based learning strategy keeps the algorithm complexity low, so that the apparatus adapts to feature learning on large-scale graphs.
Embodiment 3:
Fig. 4 is a structural schematic diagram of the electronic device for large-scale composite graph feature learning based on structure-semantics fusion provided by an embodiment of the present invention. As shown in Fig. 4, the device includes: a processor (processor) 801, a memory (memory) 802, and a bus 803;
wherein the processor 801 and the memory 802 communicate with each other through the bus 803;
the processor 801 is used for calling program instructions in the memory 802 to execute the method provided by each of the above method embodiments, for example including:
obtaining a training semantic label information set Strain, where Strain is the set of semantic label information corresponding to a training node set Vtrain, Vtrain is a set of several nodes obtained by random sampling from a graph according to a preset sampling ratio, the graph is G, G = (V, E, S), where V is the set of graph nodes, E is the set of edges in the graph, and S is the graph semantic label information;
obtaining a node pair set Pe, Pe = {(u, v)}, where a node pair (u, v) is an element of the node pair set Pe, and node u and node v are the two nodes corresponding to an edge sampled from E;
traversing all node pairs (u, v) in the node pair set Pe;
judging whether the traversal is completed;
if it is judged that the traversal is not completed, carrying out negative sampling on node u and calculating the connected loss function and the not-connected loss function;
if it is judged that node u is in Vtrain, calculating the semantic loss function according to Strain;
updating, according to the connected loss function, the not-connected loss function, and the semantic loss function, the initialization feature representation of node u, the initialization feature representation of node v, and the initialization feature representations of the nodes obtained by negative sampling, to obtain a new feature representation of node u, a new feature representation of node v, and new feature representations of the nodes obtained by negative sampling;
repeatedly judging whether the traversal is completed, until the traversal is completed.
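The three loss functions named in the steps above can be sketched as follows. The patent's exact formulas are given only as images in the claims and are not reproduced here, so the log-sigmoid forms below are assumptions, chosen because they match the stated intent: a connected pair (u, v) should have similar feature representations, a non-adjacent pair (u, w) dissimilar ones, and a training node u should sit close to the feature representations of its own labels and far from the remaining labels.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def connected_loss(nf_u, nf_v):
    """Assumed Loss_structure(u, v): small when NF_u and NF_v are aligned."""
    return -np.log(sigmoid(nf_u @ nf_v))

def not_connected_loss(nf_u, nf_w):
    """Assumed negative-example structural loss for a non-adjacent pair (u, w):
    small when NF_u and NF_w point in opposite directions."""
    return -np.log(sigmoid(-(nf_u @ nf_w)))

def semantic_loss(nf_u, own_label_feats, other_label_feats):
    """Assumed semantic loss: attract NF_u to the feature representations of
    the labels in L_u, and repel it from the labels in the complement L \\ L_u."""
    attract = sum(-np.log(sigmoid(nf_u @ lf)) for lf in own_label_feats)
    repel = sum(-np.log(sigmoid(-(nf_u @ lf))) for lf in other_label_feats)
    return attract + repel
```

Minimizing these losses by back-propagation reproduces the update behavior described above: connected pairs are pulled together, negative-sampled pairs pushed apart, and label information corrects the feature representations of training nodes.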
Embodiment 4:
An embodiment of the present invention discloses a computer program product, the computer program product including a computer program stored on a non-transitory computer-readable storage medium, the computer program including program instructions which, when executed by a computer, enable the computer to execute the method provided by each of the above method embodiments, for example including:
obtaining a training semantic label information set Strain, where Strain is the set of semantic label information corresponding to a training node set Vtrain, Vtrain is a set of several nodes obtained by random sampling from a graph according to a preset sampling ratio, the graph is G, G = (V, E, S), where V is the set of graph nodes, E is the set of edges in the graph, and S is the graph semantic label information;
obtaining a node pair set Pe, Pe = {(u, v)}, where a node pair (u, v) is an element of the node pair set Pe, and node u and node v are the two nodes corresponding to an edge sampled from E;
traversing all node pairs (u, v) in the node pair set Pe;
judging whether the traversal is completed;
if it is judged that the traversal is not completed, carrying out negative sampling on node u and calculating the connected loss function and the not-connected loss function;
if it is judged that node u is in Vtrain, calculating the semantic loss function according to Strain;
updating, according to the connected loss function, the not-connected loss function, and the semantic loss function, the initialization feature representation of node u, the initialization feature representation of node v, and the initialization feature representations of the nodes obtained by negative sampling, to obtain a new feature representation of node u, a new feature representation of node v, and new feature representations of the nodes obtained by negative sampling;
repeatedly judging whether the traversal is completed, until the traversal is completed.
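The two sampling steps above can be sketched as follows: building the node pair set Pe by drawing edges with probability proportional to each edge's share of the total edge weight (as described for the weighted case), and negative sampling for node u at a preset positive-negative ratio by drawing non-adjacent nodes. The function names and data layout are illustrative, not taken from the patent.

```python
import random

def sample_edge_pairs(weighted_edges, n_samples, rng=random):
    """Build the node pair set Pe: each (u, v, weight) edge is drawn with
    probability equal to its weight's share of the total edge weight."""
    edges = [(u, v) for u, v, _ in weighted_edges]
    weights = [w for _, _, w in weighted_edges]
    return rng.choices(edges, weights=weights, k=n_samples)

def sample_negative_pairs(u, adjacency, nodes, ratio, rng=random):
    """Negative sampling for node u at a preset positive-negative ratio:
    draw `ratio` nodes w that are neither u itself nor adjacent to u,
    yielding pairs (u, w) of two non-adjacent nodes."""
    pairs = []
    while len(pairs) < ratio:
        w = rng.choice(nodes)
        if w != u and w not in adjacency.get(u, ()):
            pairs.append((u, w))
    return pairs
```

Rejection sampling as in `sample_negative_pairs` is cheap when the graph is sparse, since a uniformly drawn node is unlikely to be a neighbor of u; for dense graphs a precomputed non-neighbor table would be the safer choice.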
Embodiment 5:
An embodiment of the present invention provides a non-transitory computer-readable storage medium storing computer instructions, the computer instructions causing the computer to execute the method provided by each of the above method embodiments, for example including:
obtaining a training semantic label information set Strain, where Strain is the set of semantic label information corresponding to a training node set Vtrain, Vtrain is a set of several nodes obtained by random sampling from a graph according to a preset sampling ratio, the graph is G, G = (V, E, S), where V is the set of graph nodes, E is the set of edges in the graph, and S is the graph semantic label information;
obtaining a node pair set Pe, Pe = {(u, v)}, where a node pair (u, v) is an element of the node pair set Pe, and node u and node v are the two nodes corresponding to an edge sampled from E;
traversing all node pairs (u, v) in the node pair set Pe;
judging whether the traversal is completed;
if it is judged that the traversal is not completed, carrying out negative sampling on node u and calculating the connected loss function and the not-connected loss function;
if it is judged that node u is in Vtrain, calculating the semantic loss function according to Strain;
updating, according to the connected loss function, the not-connected loss function, and the semantic loss function, the initialization feature representation of node u, the initialization feature representation of node v, and the initialization feature representations of the nodes obtained by negative sampling, to obtain a new feature representation of node u, a new feature representation of node v, and new feature representations of the nodes obtained by negative sampling;
repeatedly judging whether the traversal is completed, until the traversal is completed.
One of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments can be completed by program instructions together with related hardware. The aforementioned program can be stored in a computer-readable storage medium; when the program is executed, the steps of the above method embodiments are executed. The aforementioned storage medium includes various media capable of storing program code, such as ROM, RAM, magnetic disks, or optical disks.
The embodiments of the apparatus and the device described above are merely schematic. Units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement them without creative labor.
Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be realized by means of software plus a necessary general hardware platform, or of course by hardware. Based on this understanding, the above technical solution, or the part thereof contributing to the prior art, can essentially be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, or optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the method described in each embodiment or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are merely illustrative of the technical solutions of the present invention rather than limiting them. Although the present invention has been described in detail with reference to the aforementioned embodiments, those of ordinary skill in the art should understand that the technical solutions described in the above embodiments may still be modified, or some of the technical features may be equivalently replaced; and such modifications or replacements do not depart the essence of the corresponding technical solutions from the spirit and scope of the technical solutions of the various embodiments of the present invention.

Claims (7)

1. A large-scale composite graph feature learning method based on structure-semantics fusion, characterized by including:
obtaining a training semantic label information set Strain, where Strain is the set of semantic label information corresponding to a training node set Vtrain, Vtrain is a set of several nodes obtained by random sampling from a graph according to a preset sampling ratio, the graph is G, G = (V, E, S), where V is the set of graph nodes, E is the set of edges in the graph, and S is the graph semantic label information;
obtaining a node pair set Pe, Pe = {(u, v)}, where a node pair (u, v) is an element of the node pair set Pe, and node u and node v are the two nodes corresponding to an edge sampled from E;
traversing all node pairs (u, v) in the node pair set Pe;
judging whether the traversal is completed;
if it is judged that the traversal is not completed, carrying out negative sampling on node u, and calculating a connected loss function and a not-connected loss function;
if it is judged that node u is in Vtrain, calculating a semantic loss function according to Strain;
updating, according to the connected loss function, the not-connected loss function, and the semantic loss function, the initialization feature representation of node u, the initialization feature representation of node v, and the initialization feature representations of the nodes obtained by negative sampling, to obtain a new feature representation of node u, a new feature representation of node v, and new feature representations of the nodes obtained by negative sampling;
repeatedly judging whether the traversal is completed, until the traversal is completed;
further including:
generating, according to node semantic label information si and label lj, a feature representation LFj of label lj, where the node semantic label information si is the semantic label information of training node i, si is an element of Strain, label lj is an element of a label set Lj, the label set Lj is the set of labels included by training node i, training node i is a node in Vtrain, and j is a positive integer;
the carrying out negative sampling on node u and calculating the connected loss function and the not-connected loss function being specifically:
carrying out negative sampling on node u according to a preset positive-negative ratio, to obtain a node pair set of which a node pair (u, w) is an element, node u and node w being two non-adjacent nodes;
calculating the connected loss function, specifically,
where Lossstructure(u, v) is the connected loss function, NFu is the feature representation of node u, and NFv is the feature representation of node v;
calculating the not-connected loss function, specifically,
where the left-hand side is the not-connected loss function, NFu is the feature representation of node u, and NFw is the feature representation of node w;
the calculating the semantic loss function according to Strain being specifically:
where the left-hand side is the semantic loss function, label lu is an element of the label set Lu, the label set Lu is the set of labels included by node u, its complement is the difference set of the label set L and the label set Lu, the label set L is the set of labels included by all nodes in the graph, NFu is the feature representation of node u, and LFu is the feature representation of label lu.
2. The method according to claim 1, characterized by further including:
randomly initializing the feature representations of V according to a normal distribution function, to obtain the set NF of the feature representations of V, NF = {NFk, k ∈ [1, n]}, where NFk is the feature representation of node k, n is the number of nodes in V, and n and k are positive integers.
3. The method according to claim 1, characterized in that the obtaining the node pair set Pe, Pe = {(u, v)}, is specifically:
sampling E according to the ratio of the weight of each edge in E to the sum of all edge weights, to obtain the node pair set Pe, Pe = {(u, v)}, where a node pair (u, v) is an element of the node pair set Pe, and node u and node v are the two nodes corresponding to an edge sampled from E.
4. The method according to any one of claims 1 to 3, characterized by further including:
if it is judged that node u is not in Vtrain, updating, according to the connected loss function and the not-connected loss function, the initialization feature representation of node u, the initialization feature representation of node v, and the initialization feature representations of the nodes obtained by negative sampling, to obtain a new feature representation of node u, a new feature representation of node v, and new feature representations of the nodes obtained by negative sampling.
5. A large-scale composite graph feature learning apparatus based on structure-semantics fusion, characterized by including:
an acquisition module for obtaining a training semantic label information set Strain, where Strain is the set of semantic label information corresponding to a training node set Vtrain, Vtrain is a set of several nodes obtained by random sampling from a graph according to a preset sampling ratio, the graph is G, G = (V, E, S), where V is the set of graph nodes, E is the set of edges in the graph, and S is the graph semantic label information;
and for obtaining a node pair set Pe, Pe = {(u, v)}, where a node pair (u, v) is an element of the node pair set Pe, and node u and node v are the two nodes corresponding to an edge sampled from E;
a traversal module for traversing all node pairs (u, v) in the node pair set Pe;
a computing module for judging whether the traversal is completed;
and, if it is judged that the traversal is not completed, for carrying out negative sampling on node u and calculating a connected loss function and a not-connected loss function;
and, if it is judged that node u is in Vtrain, for calculating a semantic loss function according to Strain;
an update module for updating, according to the connected loss function, the not-connected loss function, and the semantic loss function, the initialization feature representation of node u, the initialization feature representation of node v, and the initialization feature representations of the nodes obtained by negative sampling, to obtain a new feature representation of node u, a new feature representation of node v, and new feature representations of the nodes obtained by negative sampling;
the computing module being further used for repeatedly judging whether the traversal is completed, until the traversal is completed;
the acquisition module being further used for generating, according to node semantic label information si and label lj, a feature representation LFj of label lj, where the node semantic label information si is the semantic label information of training node i, si is an element of Strain, label lj is an element of a label set Lj, the label set Lj is the set of labels included by training node i, training node i is a node in Vtrain, and j is a positive integer;
the carrying out negative sampling on node u and calculating the connected loss function and the not-connected loss function being specifically:
carrying out negative sampling on node u according to a preset positive-negative ratio, to obtain a node pair set of which a node pair (u, w) is an element, node u and node w being two non-adjacent nodes;
calculating the connected loss function, specifically,
where Lossstructure(u, v) is the connected loss function, NFu is the feature representation of node u, and NFv is the feature representation of node v;
calculating the not-connected loss function, specifically,
where the left-hand side is the not-connected loss function, NFu is the feature representation of node u, and NFw is the feature representation of node w;
the calculating the semantic loss function according to Strain being specifically:
where the left-hand side is the semantic loss function, label lu is an element of the label set Lu, the label set Lu is the set of labels included by node u, its complement is the difference set of the label set L and the label set Lu, the label set L is the set of labels included by all nodes in the graph, NFu is the feature representation of node u, and LFu is the feature representation of label lu.
6. An electronic device for large-scale composite graph feature learning based on structure-semantics fusion, characterized by including:
a memory and a processor, the processor and the memory communicating with each other through a bus; the memory storing program instructions executable by the processor, the processor calling the program instructions to be able to execute the method of any one of claims 1 to 4.
7. A computer program product, characterized in that the computer program product includes a computer program stored on a non-transitory computer-readable storage medium, the computer program including program instructions which, when executed by a computer, cause the computer to execute the method of any one of claims 1 to 4.
CN201711169332.2A 2017-11-17 2017-11-17 Extensive combination chart feature learning method based on structure semantics fusion Active CN107944489B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711169332.2A CN107944489B (en) 2017-11-17 2017-11-17 Extensive combination chart feature learning method based on structure semantics fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711169332.2A CN107944489B (en) 2017-11-17 2017-11-17 Extensive combination chart feature learning method based on structure semantics fusion

Publications (2)

Publication Number Publication Date
CN107944489A CN107944489A (en) 2018-04-20
CN107944489B true CN107944489B (en) 2018-10-16

Family

ID=61929647

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711169332.2A Active CN107944489B (en) 2017-11-17 2017-11-17 Extensive combination chart feature learning method based on structure semantics fusion

Country Status (1)

Country Link
CN (1) CN107944489B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110417594B (en) * 2019-07-29 2020-10-27 吉林大学 Network construction method and device, storage medium and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8386574B2 (en) * 2009-10-29 2013-02-26 Xerox Corporation Multi-modality classification for one-class classification in social networks
CN104715418A (en) * 2015-03-16 2015-06-17 北京航空航天大学 Novel social network sampling method
CN105376243A (en) * 2015-11-27 2016-03-02 中国人民解放军国防科学技术大学 Differential privacy protection method for online social network based on stratified random graph
CN105760503A (en) * 2016-02-23 2016-07-13 清华大学 Method for quickly calculating graph node similarity
CN107133262A (en) * 2017-03-30 2017-09-05 浙江大学 A kind of personalized POI embedded based on many influences recommends method
CN107273490A (en) * 2017-06-14 2017-10-20 北京工业大学 A kind of combination mistake topic recommendation method of knowledge based collection of illustrative plates
CN107729290A (en) * 2017-09-21 2018-02-23 北京大学深圳研究生院 A kind of expression learning method of ultra-large figure using the optimization of local sensitivity Hash

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8386490B2 (en) * 2010-10-27 2013-02-26 Eastman Kodak Company Adaptive multimedia semantic concept classifier
CN106970614A (en) * 2017-03-10 2017-07-21 江苏物联网研究发展中心 The construction method of improved trellis topology semantic environment map
CN107294851B (en) * 2017-06-16 2019-11-26 西安电子科技大学 A kind of router level network topology estimating method
CN107341611A (en) * 2017-07-06 2017-11-10 浙江大学 A kind of operation flow based on convolutional neural networks recommends method

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8386574B2 (en) * 2009-10-29 2013-02-26 Xerox Corporation Multi-modality classification for one-class classification in social networks
CN104715418A (en) * 2015-03-16 2015-06-17 北京航空航天大学 Novel social network sampling method
CN105376243A (en) * 2015-11-27 2016-03-02 中国人民解放军国防科学技术大学 Differential privacy protection method for online social network based on stratified random graph
CN105760503A (en) * 2016-02-23 2016-07-13 清华大学 Method for quickly calculating graph node similarity
CN107133262A (en) * 2017-03-30 2017-09-05 浙江大学 A kind of personalized POI embedded based on many influences recommends method
CN107273490A (en) * 2017-06-14 2017-10-20 北京工业大学 A kind of combination mistake topic recommendation method of knowledge based collection of illustrative plates
CN107729290A (en) * 2017-09-21 2018-02-23 北京大学深圳研究生院 A kind of expression learning method of ultra-large figure using the optimization of local sensitivity Hash

Also Published As

Publication number Publication date
CN107944489A (en) 2018-04-20

Similar Documents

Publication Publication Date Title
CN105843703B (en) Workflow is created to solve the method and system of at least one system problem
CN107330956A (en) A kind of unsupervised painting methods of caricature manual draw and device
CN110533974B (en) Intelligent volume assembling method and system and computer readable storage medium
CN109960810A (en) A kind of entity alignment schemes and device
Pinggera et al. Modeling styles in business process modeling
EP3309689A1 (en) Systems and methods for selecting optimal variables using modified teaching learning based search optimization technique
CN110288007A (en) The method, apparatus and electronic equipment of data mark
Ring Activity-based methodology for development and analysis of integrated DoD architectures
CN109948140A (en) A kind of term vector embedding grammar and device
CN108710907A (en) Handwritten form data classification method, model training method, device, equipment and medium
CN109784159A (en) The processing method of scene image, apparatus and system
CN109272044A (en) A kind of image similarity determines method, apparatus, equipment and storage medium
CN109614480A (en) A kind of generation method and device of the autoabstract based on production confrontation network
CN105335375B (en) Topics Crawling method and apparatus
CN107944489B (en) Extensive combination chart feature learning method based on structure semantics fusion
Esposito Modern web development: understanding domains, technologies, and user experience
CN107392229A (en) A kind of network representation method based on the Relation extraction that most gears to the needs of the society
Noguchi Recommended best practices based on MBSE pilot projects
CN109408396A (en) Method for evaluating software quality, device, equipment and computer readable storage medium
CN110263328A (en) A kind of disciplinary capability type mask method, device, storage medium and terminal device
CN109840867B (en) Intelligent teaching method, equipment and device
Hoel et al. Data sharing for learning analytics–Questioning the risks and benefits
Yu et al. Computational design: technology, cognition and environments
Azkue A digital tool for three-dimensional visualization and annotation in anatomy and embryology learning
CN112328812B (en) Domain knowledge extraction method and system based on self-adjusting parameters and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CB03 Change of inventor or designer information
CB03 Change of inventor or designer information

Inventor after: Long Mingsheng

Inventor after: Pei Zhongyi

Inventor after: Wang Jianmin

Inventor after: Huang Xiangdong

Inventor before: Wang Jianmin

Inventor before: Long Mingsheng

Inventor before: Pei Zhongyi

Inventor before: Huang Xiangdong