CN112559737B - Node classification method and system of knowledge graph - Google Patents



Publication number
CN112559737B
Authority
CN
China
Prior art keywords
nodes
neighbor
node
features
aggregated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011311415.2A
Other languages
Chinese (zh)
Other versions
CN112559737A (en)
Inventor
赵九州
侯乐
胡碧峰
胡茂海
李文锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Workway Shenzhen Information Technology Co ltd
Original Assignee
Workway Shenzhen Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Workway Shenzhen Information Technology Co ltd
Priority to CN202011311415.2A
Publication of CN112559737A
Application granted
Publication of CN112559737B
Active (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a node classification method and system for a knowledge graph. The method comprises: extracting the attribute features of a node itself and the node relationship features between the node and other related nodes, and taking the two together as the joint features of the node; calculating the node degree value and the web page ranking (PageRank) value of each neighbor node of the node, filtering the neighbor nodes according to these values, and aggregating the filtered neighbor nodes to obtain aggregated neighbor features; and training, according to the joint features of the node and the aggregated neighbor features, a target classifier for classifying the nodes of the knowledge graph. The invention can improve the accuracy of node classification.

Description

Node classification method and system of knowledge graph
Technical Field
The invention relates to the technical field of natural language processing, in particular to a node classification method and system of a knowledge graph.
Background
Current node classification on graphs mainly uses three kinds of methods. The first is based on label propagation: nodes of the same type usually share some relationship. For example, if two people are colleagues and one of them works on deep learning, the other very probably works on deep learning as well. The second is based on node representation: algorithms such as DeepWalk represent each node as a vector, and a classifier then classifies the nodes. The third is based on graph neural networks, which fall into two types: graph neural networks based on an adjacency matrix and graph neural networks based on an aggregation operation. The adjacency-matrix approach is strongly affected by the graph scale; when the number of nodes is too large, the correspondingly large adjacency matrix cannot be constructed. The aggregation approach requires less computation, but suitable hyperparameters must be chosen, such as the number of neighbor nodes and the relationship degree to consider. Different nodes may have too many or too few neighbor nodes, which makes the choice difficult. In a risk control scenario, for example, many of a user's related nodes may be meaningless and contribute nothing to the representation of the main node. As for the degree, some relationships involve a very high degree, and setting the model hyperparameter below that degree strongly affects the result.
Therefore, how to classify the nodes by using the self information of the nodes and the information of the neighbor nodes becomes a technical problem to be solved urgently.
Disclosure of Invention
In view of this, the present invention provides a node classification method for a knowledge graph, which trains a target classifier by making effective use of node information and neighbor node information, so as to improve the accuracy of node classification.
In one aspect, the invention provides a node classification method of a knowledge graph, comprising: extracting the attribute features of a node itself and the node relationship features between the node and other related nodes, and taking the two together as the joint features of the node;
calculating the node degree value and the web page ranking value of each neighbor node of the node, filtering the neighbor nodes according to the node degree value and the web page ranking value of each neighbor node, and aggregating the filtered neighbor nodes to obtain aggregated neighbor features;
and training, according to the joint features of the node and the aggregated neighbor features, a target classifier for classifying the nodes of the knowledge graph.
Further, the step of training to obtain a target classifier for classifying the nodes of the knowledge graph according to the joint features of the nodes and the aggregated neighbor features includes:
carrying out weighted summation according to the node joint characteristics and the aggregated neighbor characteristics;
inputting the value obtained by the weighted summation into an initial classifier, wherein the initial classifier comprises a graph neural network, a fully connected layer and a softmax layer; the loss function of the initial classifier is a cross entropy loss function;
and minimizing the cross entropy loss function, and training to obtain the target classifier.
Further, the step of performing weighted summation according to the node joint features and the aggregated neighbor features is specifically represented as:
$$h_v^{k} = \sigma\left(W_k \cdot h_{u\_agg}^{k-1} + B_k \cdot h_v^{k-1}\right)$$
where $\sigma$ denotes a nonlinear transformation, and $W_k$ and $B_k$ each represent a weight to be learned; $h_v^{k}$ represents the aggregated feature of the node at the current layer k, $h_{u\_agg}^{k-1}$ represents the aggregated feature of the neighbor nodes at the previous layer k-1, and $h_v^{k-1}$ represents the aggregated feature of the node at the previous layer k-1.
Further, the step of aggregating the filtered neighbor nodes to obtain the aggregated neighbor features is specifically represented as:
$$h_{u\_agg} = \mathrm{softmax}\left(\frac{QK^{T}}{d}\right)V$$
$$QK^{T} \in \mathbb{R}^{n \times n}$$
$$\frac{QK^{T}}{d}$$
wherein Q, K and V each represent the matrix obtained by concatenating the joint features $h_u^{k-1}$ of the filtered neighbor nodes, d represents a scaling factor, and $h_{u\_agg}$ represents the neighbor features obtained by aggregation.
Further, the step of filtering the neighbor nodes according to the node degree values and the web page ranking values of the neighbor nodes includes:
filtering out the neighbor nodes whose node degree values are smaller than a first preset threshold and whose web page ranking values are smaller than a second preset threshold.
In another aspect, the invention provides a node classification system of a knowledge graph, comprising: a self-feature acquisition unit, configured to extract the attribute features of a node itself and the node relationship features between the node and other related nodes, and to take the two together as the joint features of the node;
a neighbor feature acquisition unit, configured to calculate the node degree value and the web page ranking value of each neighbor node of the node, filter the neighbor nodes according to the node degree value and the web page ranking value of each neighbor node, and aggregate the filtered neighbor nodes to obtain aggregated neighbor features;
and a classifier training unit, configured to train, according to the joint features of the node and the aggregated neighbor features, a target classifier for classifying the nodes of the knowledge graph.
Further, the classifier training unit includes:
the weighting processing subunit is used for carrying out weighted summation according to the node joint characteristic and the aggregated neighbor characteristic;
the learning training subunit is used for inputting the value obtained by the weighted summation into an initial classifier, wherein the initial classifier comprises a graph neural network, a fully connected layer and a softmax layer; the loss function of the initial classifier is a cross entropy loss function; and for minimizing the cross entropy loss function and training to obtain the target classifier.
Further, the weighting processing subunit is specifically configured to perform weighted summation according to the following formula:
$$h_v^{k} = \sigma\left(W_k \cdot h_{u\_agg}^{k-1} + B_k \cdot h_v^{k-1}\right)$$
where $\sigma$ denotes a nonlinear transformation, and $W_k$ and $B_k$ each represent a weight to be learned; $h_v^{k}$ represents the aggregated feature of the node at the current layer k, $h_{u\_agg}^{k-1}$ represents the aggregated feature of the neighbor nodes at the previous layer k-1, and $h_v^{k-1}$ represents the aggregated feature of the node at the previous layer k-1.
Further, the neighbor feature obtaining unit is configured to perform aggregation according to the following formulas to obtain the aggregated neighbor features:
$$h_{u\_agg} = \mathrm{softmax}\left(\frac{QK^{T}}{d}\right)V$$
$$QK^{T} \in \mathbb{R}^{n \times n}$$
$$\frac{QK^{T}}{d}$$
wherein Q, K and V each represent the matrix obtained by concatenating the joint features $h_u^{k-1}$ of the filtered neighbor nodes, d represents a scaling factor, and $h_{u\_agg}$ represents the neighbor features obtained by aggregation.
Further, the neighbor feature obtaining unit is specifically configured to filter out neighbor nodes whose node degree values are smaller than a first preset threshold and whose web page ranking values are smaller than a second preset threshold.
The node classification method and system of the knowledge graph combine the attribute features of a node with its node relationship features to form the joint features of the node, filter the neighbor nodes according to their node degree values and web page ranking values, select the optimal neighbor nodes for aggregation to obtain the aggregated neighbor features, and train a target classifier for classifying the nodes of the knowledge graph according to the joint features of the node and the aggregated neighbor features. The features of the node itself and of its neighbor nodes are thereby used effectively, the calculation speed is high, and the target classifier is not affected by the scale of the knowledge graph. Because the optimal neighbor nodes are selected for aggregation, the negative effects of an unreasonable choice of relationship degree or neighbor count are avoided, invalid nodes are effectively filtered out, and the accuracy of node classification is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart of a node classification method of a knowledge-graph according to an exemplary first embodiment of the present invention.
Fig. 2 is a block diagram of a node classification system of a knowledge-graph according to an exemplary second embodiment of the present invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
It should be noted that, in the case of no conflict, the features in the following embodiments and examples may be combined with each other; moreover, all other embodiments that can be derived by one of ordinary skill in the art from the embodiments disclosed herein without making any creative effort fall within the scope of the present disclosure.
It is noted that various aspects of the embodiments are described below within the scope of the appended claims. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is merely illustrative. Based on the disclosure, one skilled in the art should appreciate that one aspect described herein may be implemented independently of any other aspects and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method practiced using any number of the aspects set forth herein. Additionally, such an apparatus may be implemented and/or such a method may be practiced using other structure and/or functionality in addition to one or more of the aspects set forth herein.
Fig. 1 is a flowchart of a node classification method of a knowledge-graph according to an exemplary first embodiment of the present invention, and as shown in fig. 1, the node classification method of a knowledge-graph according to the present invention includes:
step 101: extracting self attribute features of the nodes and node relation features of other nodes related to the nodes, and taking the self attribute features and the node relation features of the nodes as joint features of the nodes.
Specifically, the attribute features of a node itself, for example a "person" node, may be age, occupation, income, and so on. In a risk control scenario, the attribute features of a person node may include the loan amount, bank card balance, monthly income, whether there is an overdue record, age, and the like, as shown in Table 1.
TABLE 1
Name        Loan amount   Bank card balance   Monthly income   Overdue   Age
Zhang San   350000        100000              35000            No        35
Li Si       40000         20000               12000            No        24
The node relationship features of other nodes related to the node can be understood as introducing prior relationship features of the node: the nodes to be classified are analyzed manually to determine which relationships are important, and the more important relationships are counted. For example, in a risk control scenario, if two nodes have an n-degree relationship that may influence the result, the relationship is encoded, using one-hot or another encoding method, and the encoded value is concatenated with the original features of the node. This avoids losing the relationship between two nodes when they are far apart, and it is fast to compute.
Specifically, the prior features are calculated for the user nodes that require loan overdue prediction, for example whether a person node has an important relationship with an overdue person and whether it has a relationship with a high-risk enterprise. As shown in Table 2 below, these relationships are encoded and added to the node features.
TABLE 2
Other users   Relationship       Overdue user
Zhang San     Father and son     Li Si
Xiao Hong     Friend             Xiao Ming

Other users   Relationship       High-risk enterprise
Zhang San     Employee           Enterprise A
Xiao Hong     Senior executive   Enterprise B
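The encoding and concatenation described in step 101 can be sketched as follows. This is only an illustrative sketch: the relation vocabulary `RELATION_TYPES`, the helper names, and the choice to merge several relations into one 0/1 vector are assumptions made for the example and are not specified in the description.

```python
# Hypothetical relation vocabulary; the description does not fix one.
RELATION_TYPES = ["father_son", "friend", "employee", "senior_executive"]


def one_hot(relation, vocabulary=RELATION_TYPES):
    """Return a one-hot list marking `relation` within `vocabulary`."""
    return [1.0 if relation == name else 0.0 for name in vocabulary]


def encode_relations(relations, vocabulary=RELATION_TYPES):
    """Merge the one-hot codes of several relations into one 0/1 vector."""
    encoded = [0.0] * len(vocabulary)
    for relation in relations:
        for i, bit in enumerate(one_hot(relation, vocabulary)):
            encoded[i] = max(encoded[i], bit)
    return encoded


def joint_feature(attributes, relations):
    """Concatenate a node's own attributes with its encoded relations."""
    return list(attributes) + encode_relations(relations)


# Zhang San in Table 1: loan amount, balance, monthly income, overdue flag, age.
zhang_san_attributes = [350000.0, 100000.0, 35000.0, 0.0, 35.0]
# Zhang San in Table 2: father-son link to an overdue user, employee of Enterprise A.
zhang_san_relations = ["father_son", "employee"]
zhang_san_joint = joint_feature(zhang_san_attributes, zhang_san_relations)
print(zhang_san_joint)
```

In practice the raw attribute values would probably be normalized before concatenation so that large amounts do not dominate the 0/1 relation bits; that step is omitted here.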
Step 102: calculating the node degree value and the web page ranking value of each neighbor node of the node, filtering the neighbor nodes according to the node degree value and the web page ranking value of each neighbor node, and aggregating the filtered neighbor nodes to obtain the aggregated neighbor features.
Specifically, the neighbor nodes are sampled in a biased manner, whereas the current mainstream method is random sampling. In this embodiment, some basic information of each neighbor node is evaluated: the node degree value and the web page ranking value (i.e., the PageRank value) of each neighbor node are calculated and combined by weighted summation, the result is used as the weight of that node, and biased sampling is performed according to the weights. The node degree d(v) is calculated as follows, where v represents the node to be calculated, u represents a neighbor node of v, N(v) represents the set of neighbor nodes of v, and |u| represents the number of neighbor nodes.
$$d(v) = \sum_{u \in N(v)} |u|$$
The web page ranking value is calculated as follows, where N represents the total number of nodes, q is the damping coefficient, $p_i$ is the node to be solved, $p_j$ is a neighbor node of $p_i$, and $L(p_j)$ represents the out-degree of $p_j$.
$$\mathrm{PageRank}(p_i) = \frac{1-q}{N} + q \sum_{p_j} \frac{\mathrm{PageRank}(p_j)}{L(p_j)}$$
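As a rough illustration of how the web page ranking value could be computed, the sketch below iterates the standard PageRank update on a small directed graph. The damping value q = 0.85, the fixed iteration count, the toy graph and the assumption that every node has at least one outgoing edge are choices made for this example only; the description does not prescribe them.

```python
def pagerank(out_links, q=0.85, iterations=50):
    """Iterate PageRank(p_i) = (1 - q) / N + q * sum(PageRank(p_j) / L(p_j)).

    `out_links` maps each node to the list of nodes it points to.
    Assumes every node appears as a key and has at least one outgoing edge.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - q) / n for node in nodes}
        for p_j, targets in out_links.items():
            # p_j passes an equal share of its rank along each outgoing edge.
            share = q * rank[p_j] / len(targets)
            for p_i in targets:
                new_rank[p_i] += share
        rank = new_rank
    return rank


# Toy graph: one account receives links from two nodes, the other from one.
graph = {
    "user": ["active_account", "idle_account"],
    "merchant": ["active_account"],
    "active_account": ["user", "merchant"],
    "idle_account": ["user"],
}
ranks = pagerank(graph)
print(ranks["active_account"] > ranks["idle_account"])
```

The account that is linked from more nodes ends up with the higher value, which is the property the filtering step relies on.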
In a specific operation, the neighbor nodes whose node degree values are smaller than the first preset threshold and whose web page ranking values are smaller than the second preset threshold may be filtered out. For example, in a risk control scenario, many account nodes may have been opened by users but never used, and the PageRank values of such nodes are low, so these meaningless nodes are filtered out.
PageRank(frequently used account) > PageRank(infrequently used account)
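The threshold filtering and the biased sampling described in step 102 might look like the following sketch. The weights `alpha` and `beta`, the threshold values, sampling with replacement, and the toy degree and rank values are all assumptions introduced for illustration.

```python
import random


def filter_neighbors(neighbors, degree, rank, min_degree, min_rank):
    """Drop neighbors whose degree AND PageRank both fall below the thresholds."""
    return [u for u in neighbors
            if not (degree[u] < min_degree and rank[u] < min_rank)]


def sampling_weights(neighbors, degree, rank, alpha=0.5, beta=0.5):
    """Weighted sum of degree and PageRank, used as each neighbor's weight."""
    return [alpha * degree[u] + beta * rank[u] for u in neighbors]


def biased_sample(neighbors, weights, k, seed=0):
    """Draw k neighbors with replacement, proportionally to their weights."""
    return random.Random(seed).choices(neighbors, weights=weights, k=k)


# Toy values for four neighbors of one node.
degree = {"a": 5, "b": 1, "c": 1, "d": 3}
rank = {"a": 0.40, "b": 0.02, "c": 0.30, "d": 0.05}

kept = filter_neighbors(["a", "b", "c", "d"], degree, rank,
                        min_degree=2, min_rank=0.1)
weights = sampling_weights(kept, degree, rank)
sample = biased_sample(kept, weights, k=3)
print(kept, sample)
```

Only neighbor "b" is below both thresholds, so it is the only one removed. Degree and PageRank live on different scales, so a real implementation would likely normalize them before the weighted sum.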
The specific aggregation operation is to weight the nodes of the previous layer and then perform a weighted summation over the neighbor nodes. In specific operation, the features $h_u^{k-1}$ of the filtered neighbor nodes of the node to be solved (which can be understood as the joint features of the neighbor nodes) are first concatenated into a matrix that serves as Q, K and V respectively; Q, K and V are identical. Q is multiplied by the transpose of K according to formula (3) to obtain an [n, n] matrix, where n is the number of nodes in Q/K/V; this matrix represents the correlation between the nodes. The matrix is then divided by a scaling factor d, which prevents the softmax gradient from vanishing, as shown in formula (4). Softmax is applied to obtain a normalized weight matrix, and the weight matrix is multiplied by V to obtain the weighted result. This process is called self-attention, i.e., formula (2), and yields the aggregated neighbor features $h_{u\_agg}$.
$$h_v^{k} = \sigma\left(W_k \cdot h_{u\_agg}^{k-1} + B_k \cdot h_v^{k-1}\right) \quad (1)$$
$$h_{u\_agg} = \mathrm{softmax}\left(\frac{QK^{T}}{d}\right)V \quad (2)$$
$$QK^{T} \in \mathbb{R}^{n \times n} \quad (3)$$
$$\frac{QK^{T}}{d} \quad (4)$$
where $\sigma$ denotes a nonlinear transformation, and $W_k$ and $B_k$ each represent a weight to be learned; $h_v^{k}$ represents the aggregated feature of the node at the current layer k, $h_{u\_agg}^{k-1}$ represents the aggregated feature of the neighbor nodes at the previous layer k-1, and $h_v^{k-1}$ represents the aggregated feature of the node at the previous layer k-1.
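The self-attention aggregation and the layer update can be sketched in plain Python as below. Several details are assumptions made for the example rather than statements of the described method: the attention output is reduced to a single vector by mean pooling, the nonlinearity is ReLU, d is set to the square root of the feature width, and the weight matrices are identity matrices.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Apply a numerically stable softmax to every row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def aggregate_neighbors(h_u, d):
    """Self-attention with Q = K = V = h_u (n neighbors, f features each)."""
    scores = matmul(h_u, transpose(h_u))               # Q K^T, shape [n, n]
    scaled = [[s / d for s in row] for row in scores]  # divide by scaling factor d
    weights = softmax_rows(scaled)                     # normalized weight matrix
    return matmul(weights, h_u)                        # weights times V, shape [n, f]


def mean_pool(m):
    """Assumed reduction of the [n, f] attention output to one vector."""
    return [sum(col) / len(m) for col in zip(*m)]


def matvec(w, x):
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def update_node(h_v, h_agg, w_k, b_k):
    """h_v^k = sigma(W_k h_agg + B_k h_v), with ReLU assumed as sigma."""
    mixed = [p + q for p, q in zip(matvec(w_k, h_agg), matvec(b_k, h_v))]
    return [max(0.0, x) for x in mixed]


h_u = [[1.0, 0.0], [0.0, 1.0]]        # two filtered neighbors, two features
attended = aggregate_neighbors(h_u, d=math.sqrt(len(h_u[0])))
h_agg = mean_pool(attended)
identity = [[1.0, 0.0], [0.0, 1.0]]   # placeholder for learned W_k and B_k
h_next = update_node([1.0, -2.0], h_agg, identity, identity)
print(h_agg, h_next)
```

Each row of the attention weight matrix sums to one, so every neighbor's output row is a convex combination of the neighbor features.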
Step 103: training, according to the joint features of the node and the aggregated neighbor features, a target classifier for classifying the nodes of the knowledge graph.
Preferably, step 103 may comprise:
performing weighted summation according to the node joint features and the aggregated neighbor features to obtain $h_v^{k}$, i.e., performing the weighted summation according to formula (1) above;
inputting the value obtained by the weighted summation into an activation function of an initial classifier, wherein the initial classifier comprises a graph neural network, a fully connected layer and a softmax layer, that is, the graph neural network is followed by the fully connected layer and the softmax layer for classification; the loss function of the initial classifier is a cross entropy loss function;
and calculating the cross entropy loss function, learning the weights of the initial classifier by minimizing the cross entropy loss function, and training to obtain the target classifier. The features of each node can then be input into the target classifier to judge which users are likely to become overdue. In specific operation, an optimizer such as Adam can be used to minimize the loss function.
For the training process of the initial classifier, the process of minimizing the cross entropy loss function, and the detailed process of training to obtain the target classifier, reference may be made to the prior art; details are not repeated here.
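To make the training objective concrete, the sketch below minimizes a cross entropy loss for a single linear layer followed by softmax. It is a deliberately simplified stand-in: the graph neural network is omitted, plain stochastic gradient descent is used in place of Adam, and the features, labels, learning rate and epoch count are invented for the example.

```python
import math


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_for(w, b, x):
    """Fully connected layer: one score per class."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bias for row, bias in zip(w, b)]


def mean_cross_entropy(w, b, features, labels):
    losses = [-math.log(softmax(logits_for(w, b, x))[y])
              for x, y in zip(features, labels)]
    return sum(losses) / len(losses)


def train(features, labels, num_classes, lr=0.5, epochs=200):
    """Minimize cross entropy with per-sample gradient descent."""
    dim = len(features[0])
    w = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    for _ in range(epochs):
        for x, y in zip(features, labels):
            probs = softmax(logits_for(w, b, x))
            for c in range(num_classes):
                # Gradient of cross entropy w.r.t. logit c is p_c - 1[c == y].
                grad = probs[c] - (1.0 if c == y else 0.0)
                for i in range(dim):
                    w[c][i] -= lr * grad * x[i]
                b[c] -= lr * grad
    return w, b


def predict(w, b, x):
    scores = logits_for(w, b, x)
    return scores.index(max(scores))


# Toy node embeddings; label 1 stands for "likely overdue", 0 for "not overdue".
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
w, b = train(features, labels, num_classes=2)
predictions = [predict(w, b, x) for x in features]
final_loss = mean_cross_entropy(w, b, features, labels)
print(predictions, final_loss)
```

With untrained zero weights the loss equals ln 2 for two classes, so a final loss below that value indicates that the minimization has made progress.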
This embodiment improves on the current mainstream graph neural network based on the aggregation operation. It combines the attribute features of a node with its node relationship features to form the joint features of the node, filters the neighbor nodes according to their node degree values and web page ranking values, selects the optimal neighbor nodes for aggregation to obtain the aggregated neighbor features, and then trains a target classifier for classifying the nodes of the knowledge graph according to the joint features of the node and the aggregated neighbor features. The features of the node itself and of its neighbor nodes are thereby used effectively, the calculation speed is high, and no adjacency matrix needs to be constructed. At the same time, the final result is not greatly affected by the choice of hyperparameters, and because more suitable neighbor nodes are selected for aggregation, invalid nodes are effectively filtered out and the accuracy of node classification is improved.
Fig. 2 is a block diagram of a node classification system of a knowledge-graph according to an exemplary second embodiment of the present invention. As shown in fig. 2, the node classification system of the knowledge-graph includes:
a self-feature obtaining unit 201, configured to extract a self-attribute feature of a node and a node relationship feature of another node related to the node, and use the self-attribute feature and the node relationship feature of the node as a joint feature of the node;
the neighbor feature obtaining unit 202 is configured to calculate the node degree value and the web page ranking value of each neighbor node of the node, filter the neighbor nodes according to the node degree value and the web page ranking value of each neighbor node, and aggregate the filtered neighbor nodes to obtain the aggregated neighbor features;
and the classifier training unit 203 is configured to train to obtain a target classifier for classifying the nodes of the knowledge graph according to the joint features of the nodes and the aggregated neighbor features.
Preferably, the classifier training unit includes:
a weighting processing subunit (not shown in the figure) for performing weighted summation according to the node joint feature and the aggregated neighbor feature;
a learning training subunit (not shown in the figure) for inputting the value obtained by the weighted summation into an initial classifier, wherein the initial classifier comprises a graph neural network, a fully connected layer and a softmax layer; the loss function of the initial classifier is a cross entropy loss function; and for minimizing the cross entropy loss function and training to obtain the target classifier.
Preferably, the weighting processing subunit is specifically configured to perform weighted summation according to the following formula:
$$h_v^{k} = \sigma\left(W_k \cdot h_{u\_agg}^{k-1} + B_k \cdot h_v^{k-1}\right)$$
where $\sigma$ denotes a nonlinear transformation, and $W_k$ and $B_k$ each represent a weight to be learned; $h_v^{k}$ represents the aggregated feature of the node at the current layer k, $h_{u\_agg}^{k-1}$ represents the aggregated feature of the neighbor nodes at the previous layer k-1, and $h_v^{k-1}$ represents the aggregated feature of the node at the previous layer k-1.
Preferably, the neighbor feature obtaining unit is configured to perform aggregation according to the following formulas to obtain the aggregated neighbor features:
$$h_{u\_agg} = \mathrm{softmax}\left(\frac{QK^{T}}{d}\right)V$$
$$QK^{T} \in \mathbb{R}^{n \times n}$$
$$\frac{QK^{T}}{d}$$
wherein Q, K and V each represent the matrix obtained by concatenating the joint features $h_u^{k-1}$ of the filtered neighbor nodes, d represents a scaling factor, and $h_{u\_agg}$ represents the neighbor features obtained by aggregation.
Preferably, the neighbor feature obtaining unit is specifically configured to filter out neighbor nodes whose node degree values are smaller than a first preset threshold and whose web page ranking values are smaller than a second preset threshold.
This embodiment improves on the current mainstream graph neural network based on the aggregation operation. It combines the attribute features of a node with its node relationship features to form the joint features of the node, filters the neighbor nodes according to their node degree values and web page ranking values, selects the optimal neighbor nodes for aggregation to obtain the aggregated neighbor features, and then trains a target classifier for classifying the nodes of the knowledge graph according to the joint features of the node and the aggregated neighbor features. The features of the node itself and of its neighbor nodes are thereby used effectively, the calculation speed is high, and no adjacency matrix needs to be constructed. At the same time, the final result is not greatly affected by the choice of hyperparameters, and because more suitable neighbor nodes are selected for aggregation, invalid nodes are effectively filtered out and the accuracy of node classification is improved.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (2)

1. A node classification method of a knowledge graph is characterized by comprising the following steps:
extracting self attribute features of the nodes and node relation features of other nodes related to the nodes, and taking the self attribute features and the node relation features of the nodes as joint features of the nodes;
calculating the node degree value and the web page ranking value of each neighbor node of the node, filtering the neighbor nodes according to the node degree value and the web page ranking value of each neighbor node, and aggregating the filtered neighbor nodes to obtain aggregated neighbor features;
training to obtain a target classifier for classifying the nodes of the knowledge graph according to the joint features of the nodes and the aggregated neighbor features;
the step of aggregating the filtered neighbor nodes to obtain the aggregated neighbor features is specifically represented as:
$$h_{u\_agg} = \mathrm{softmax}\left(\frac{QK^{T}}{d}\right)V$$
$$QK^{T} \in \mathbb{R}^{n \times n}$$
$$\frac{QK^{T}}{d}$$
wherein Q, K and V each represent the matrix obtained by concatenating the joint features $h_u^{k-1}$ of the filtered neighbor nodes, d represents a scaling factor, and $h_{u\_agg}$ denotes the neighbor features obtained by aggregation;
the step of training to obtain a target classifier for classifying the nodes of the knowledge graph according to the joint features of the nodes and the aggregated neighbor features comprises the following steps:
performing a weighted summation of the joint features of the nodes and the aggregated neighbor features;
inputting the value obtained by the weighted summation into an initial classifier, wherein the initial classifier comprises a graph neural network, a fully connected layer, and a softmax layer, and the loss function of the initial classifier is a cross-entropy loss function;
minimizing the cross-entropy loss function to train and obtain the target classifier;
wherein the step of performing the weighted summation of the joint features of the nodes and the aggregated neighbor features is specifically represented as:
Figure FDA0003456096480000015
where σ denotes a nonlinear transformation, and $W_k$ and $B_k$ each denote a weight to be learned;
Figure FDA0003456096480000016
denotes the aggregated feature of the node at the current layer k,
Figure FDA0003456096480000021
denotes the aggregated feature of the neighbor nodes at the previous layer k-1, and
Figure FDA0003456096480000022
denotes the aggregated feature of the node at the previous layer k-1;
wherein the step of filtering the neighbor nodes according to the node values and the webpage ranking values of the neighbor nodes comprises:
filtering out the neighbor nodes whose node values are smaller than a first preset threshold and whose webpage ranking values are smaller than a second preset threshold.
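As an illustration of the aggregation and weighted-summation steps of claim 1, the sketch below shows one plausible reading in Python. The formula images are not reproduced in this text, so the exact expressions are assumptions: the aggregation is taken to be scaled dot-product self-attention, softmax(QK^T / sqrt(d))V with Q = K = V equal to the stacked neighbor joint features, averaged over rows to yield a single vector; the weighted summation is taken to be σ(W_k · h_agg + B_k · h_self) with σ chosen as ReLU. All function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def attention_aggregate(neighbor_feats: Matrix) -> Vector:
    """Assumed form: row-mean of softmax(Q K^T / sqrt(d)) V with Q = K = V."""
    d = len(neighbor_feats[0])
    scale = math.sqrt(d)  # d plays the role of the scaling factor
    attended = []
    for q in neighbor_feats:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in neighbor_feats]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, neighbor_feats)) for j in range(d)]
        )
    n = len(attended)
    return [sum(row[j] for row in attended) / n for j in range(d)]


def combine(h_self: Vector, h_agg: Vector, w_k: Matrix, b_k: Matrix) -> Vector:
    """Assumed form: h_v^k = relu(W_k h_agg^{k-1} + B_k h_v^{k-1})."""
    z = [a + b for a, b in zip(matvec(w_k, h_agg), matvec(b_k, h_self))]
    return [max(0.0, x) for x in z]


# Hypothetical joint features of two filtered neighbors.
h_agg = attention_aggregate([[1.0, 0.0], [0.0, 1.0]])
identity = [[1.0, 0.0], [0.0, 1.0]]
h_next = combine([1.0, -3.0], [1.0, 2.0], identity, identity)
```

Because every attention row is a convex combination of the neighbor features, the aggregated vector always stays within the range spanned by those features, whatever the number of neighbors.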
2. A system for node classification of a knowledge-graph, comprising:
the self-feature acquisition unit is used for extracting self-attribute features of the nodes and node relation features of other nodes related to the nodes, and taking the self-attribute features and the node relation features of the nodes as joint features of the nodes;
the neighbor feature acquisition unit is used for calculating the node value and the webpage ranking value of each neighbor node of the nodes, filtering the neighbor nodes according to the node value and the webpage ranking value of each neighbor node, and aggregating the filtered neighbor nodes to obtain aggregated neighbor features;
the classifier training unit is used for training to obtain a target classifier for classifying the nodes of the knowledge graph according to the joint features of the nodes and the aggregated neighbor features;
wherein the neighbor feature acquisition unit is configured to aggregate according to the following formulas to obtain the aggregated neighbor features:
Figure FDA0003456096480000023
Figure FDA0003456096480000024
Figure FDA0003456096480000025
wherein Q, K, and V each denote a matrix obtained by concatenating the joint features
Figure FDA0003456096480000026
of the filtered neighbor nodes, d denotes a scaling factor, and $h_{u\_agg}$ denotes the aggregated neighbor features;
the classifier training unit comprises:
a weighting processing subunit, configured to perform a weighted summation of the joint features of the nodes and the aggregated neighbor features;
a learning training subunit, configured to input the value obtained by the weighted summation into an initial classifier, wherein the initial classifier comprises a graph neural network, a fully connected layer, and a softmax layer, and the loss function of the initial classifier is a cross-entropy loss function; and to minimize the cross-entropy loss function to train and obtain the target classifier;
wherein the weighting processing subunit is specifically configured to perform the weighted summation according to the following formula:
Figure FDA0003456096480000031
where σ denotes a nonlinear transformation, and $W_k$ and $B_k$ each denote a weight to be learned;
Figure FDA0003456096480000032
denotes the aggregated feature of the node at the current layer k,
Figure FDA0003456096480000033
denotes the aggregated feature of the neighbor nodes at the previous layer k-1, and
Figure FDA0003456096480000034
denotes the aggregated feature of the node at the previous layer k-1;
wherein the neighbor feature acquisition unit is specifically configured to filter out the neighbor nodes whose node values are smaller than a first preset threshold and whose webpage ranking values are smaller than a second preset threshold.
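The role of the learning training subunit (a fully connected layer followed by softmax, trained by minimizing a cross-entropy loss) can be sketched as below. This is a simplified illustration, not the claimed classifier: the graph neural network layers are omitted, the node features are assumed to be already-computed fixed vectors, plain per-sample gradient descent is used, and the function names, learning rate, epoch count, and toy data are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def softmax(z: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


def logits_for(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Fully connected layer: one logit per class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def train_head(
    features: Matrix, labels: List[int], num_classes: int,
    lr: float = 0.5, epochs: int = 200,
) -> Tuple[Matrix, Vector, List[float]]:
    """Train the fully connected + softmax head by minimizing cross-entropy."""
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    bias = [0.0] * num_classes
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, y in zip(features, labels):
            probs = softmax(logits_for(weights, bias, x))
            total += -math.log(probs[y])  # cross-entropy for one sample
            for c in range(num_classes):
                # Gradient of cross-entropy w.r.t. logit c is p_c - 1[c == y].
                grad = probs[c] - (1.0 if c == y else 0.0)
                bias[c] -= lr * grad
                for j in range(dim):
                    weights[c][j] -= lr * grad * x[j]
        losses.append(total / len(features))
    return weights, bias, losses


def predict(weights: Matrix, bias: Vector, x: Vector) -> int:
    """Return the index of the most probable class."""
    probs = softmax(logits_for(weights, bias, x))
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical node embeddings and class labels.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
weights, bias, losses = train_head(feats, labels, num_classes=2)
```

In a full system, the gradients would also flow back through the graph neural network layers so that the weights of the weighted summation are learned jointly with the classification head.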
CN202011311415.2A 2020-11-20 2020-11-20 Node classification method and system of knowledge graph Active CN112559737B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011311415.2A CN112559737B (en) 2020-11-20 2020-11-20 Node classification method and system of knowledge graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011311415.2A CN112559737B (en) 2020-11-20 2020-11-20 Node classification method and system of knowledge graph

Publications (2)

Publication Number Publication Date
CN112559737A CN112559737A (en) 2021-03-26
CN112559737B true CN112559737B (en) 2022-03-11

Family

ID=75044275

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011311415.2A Active CN112559737B (en) 2020-11-20 2020-11-20 Node classification method and system of knowledge graph

Country Status (1)

Country Link
CN (1) CN112559737B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113642704A (en) * 2021-08-02 2021-11-12 上海明略人工智能(集团)有限公司 Graph feature derivation method, system, storage medium and electronic device

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10855706B2 (en) * 2016-10-11 2020-12-01 Battelle Memorial Institute System and methods for automated detection, reasoning and recommendations for resilient cyber systems
CN108595708A (en) * 2018-05-10 2018-09-28 北京航空航天大学 A kind of exception information file classification method of knowledge based collection of illustrative plates
CN108875051B (en) * 2018-06-28 2020-04-28 中译语通科技股份有限公司 Automatic knowledge graph construction method and system for massive unstructured texts
CN110147414B (en) * 2019-05-23 2022-05-13 北京金山数字娱乐科技有限公司 Entity characterization method and device of knowledge graph
CN110321482B (en) * 2019-06-11 2023-04-18 创新先进技术有限公司 Information recommendation method, device and equipment
CN110866190B (en) * 2019-11-18 2021-05-14 支付宝(杭州)信息技术有限公司 Method and device for training neural network model for representing knowledge graph
CN111813955B (en) * 2020-07-01 2021-10-19 浙江工商大学 Service clustering method based on knowledge graph representation learning
CN111597358A (en) * 2020-07-22 2020-08-28 中国人民解放军国防科技大学 Knowledge graph reasoning method and device based on relational attention and computer equipment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
GPT-2 Code Walkthrough [2]: Attention; iSikai; "https://blog.csdn.net/oksupersonic/article/details/104431143 CSDN Blog"; 20200221; pp. 1-6 *
How to understand Q, K, V in attention?; 陀飞轮, Zhihu users, et al.; "https://www.zhihu.com/question/298810062/answer/1828080188?utm_source=zhihu&utm_medium=social&utm_oi=862674298593230848 Zhihu"; 20200405; pp. 1-2 *
Attention Mechanism study notes; ferb2015; "https://blog.csdn.net/eqiang8848/article/details/84679396 CSDN Blog"; 20181201; pp. 1-3 *

Also Published As

Publication number Publication date
CN112559737A (en) 2021-03-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant