CN114168799B - Method, device and medium for acquiring characteristics of node adjacency in graph data structure - Google Patents

Info

Publication number
CN114168799B
CN114168799B
Authority
CN
China
Prior art keywords: node, degree, data structure, graph data, path
Prior art date
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number
CN202111421510.2A
Other languages: Chinese (zh)
Other versions: CN114168799A
Inventor
赵亮
Current Assignee
Sichuan Yuncong Tianfu Artificial Intelligence Technology Co ltd
Original Assignee
Sichuan Yuncong Tianfu Artificial Intelligence Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Sichuan Yuncong Tianfu Artificial Intelligence Technology Co ltd
Priority to CN202111421510.2A
Publication of CN114168799A
Application granted
Publication of CN114168799B


Classifications

    • G: Physics
    • G06: Computing; calculating or counting
    • G06F: Electric digital data processing
    • G06F16/00: Information retrieval; database structures therefor; file system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/901: Indexing; data structures therefor; storage structures
    • G06F16/9024: Graphs; linked lists
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/237: Lexical tools
    • G06F40/242: Dictionaries
    • G06F40/279: Recognition of textual entities
    • G06F40/284: Lexical analysis, e.g. tokenisation or collocates
    • G06F40/30: Semantic analysis


Abstract

The invention relates to the technical field of graph data structures, and in particular to a method, device and medium for acquiring features of node adjacency relationships in a graph data structure, aiming to solve the problem of how to acquire deep features of node adjacency relationships. To this end, the method of the invention includes: acquiring, from the graph data structure, a plurality of node paths for each node with that node as the starting node; generating, with a word vector dictionary, the feature vector of the node adjacency relationship of the starting node of each node path from the words represented by the nodes on the path; training a language model with each node and its feature vector; and using the trained language model to obtain the feature vector of the node adjacency relationship of a target node in a target graph data structure. This feature vector can represent the adjacency features between nodes at multiple levels, that is, the deep features of the node adjacency relationship, thereby overcoming the drawback of the prior art, which can obtain only shallow features of node adjacency relationships in a graph data structure.

Description

Method, device and medium for acquiring characteristics of node adjacency in graph data structure
Technical Field
The invention relates to the technical field of graph data structures, and in particular provides a method, a device and a medium for acquiring features of node adjacency relationships in a graph data structure.
Background
A graph data structure (Graph) consists of a finite, non-empty set of nodes (vertices) and a set of edges between the nodes, typically denoted G(V, E), where G represents the graph, V is the set of nodes in G, and E is the set of edges in G. Graph data structures are now widely used in application scenarios such as social networks, online shopping, financial transactions, protein and chemical molecular structures, pattern recognition, and very-large-scale integrated circuit design. For example, in an online-shopping scenario the graph may consist of nodes (individuals, commodities) and edges (purchases, collections); in a financial-transaction scenario it may consist of nodes (individuals, businesses) and edges (funds transfers). The features of the adjacency relationships between different nodes in the graph data structure play an important role in enhancing the feature expression of the nodes themselves, because in the relevant data analysis neighboring nodes are more likely to share the same or similar properties than distant nodes.
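For concreteness, a minimal G(V, E) can be sketched as an adjacency list; the node names and edges below are illustrative only and do not come from the patent:

```python
from collections import defaultdict

def build_graph(edges):
    """Build an undirected adjacency-list graph from (u, v) edge pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

# Example: individuals and commodities linked by "purchase" edges.
g = build_graph([("alice", "book"), ("alice", "pen"), ("bob", "book")])
print(sorted(g["book"]))  # neighbors of the "book" node
```

The adjacency-list form makes the neighbor lookups used throughout the method (finding the nodes one hop from a search node) a constant-time dictionary access.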
The conventional methods for acquiring features of node adjacency relationships in a graph data structure are mainly node-level acquisition methods and graph-level acquisition methods. A node-level method mainly creates an independent feature for each node and uses these features iteratively to acquire feature information about the current node's neighborhood; a graph-level method mainly constructs an adjacency matrix and uses it to acquire the features of node adjacency relationships. Because both methods can only acquire shallow features of node adjacency relationships and cannot acquire the adjacency features of multi-level nodes, using the features they produce for data analysis greatly affects the accuracy of the analysis.
Accordingly, there is a need in the art for a new feature acquisition scheme for node adjacencies in graph data structures that addresses the above-described problems.
Disclosure of Invention
The present invention has been made to overcome the above-mentioned drawbacks, and provides a method, an apparatus, and a medium for obtaining a feature of a node adjacency in a graph data structure, which solve or at least partially solve the technical problem of how to obtain a deep feature of a node adjacency (adjacency feature of a multi-level node) in the graph data structure, so as to improve analysis accuracy when using the feature of the node adjacency for data analysis.
In a first aspect, the present invention provides a feature acquisition method for a node adjacency in a graph data structure, where each node in the graph data structure represents a word, edges between nodes represent semantic relationships between words represented by the nodes, and the feature acquisition method includes:
acquiring, according to the graph data structure, a plurality of node paths corresponding to each node when that node is used as a starting node;
acquiring word vectors of the words represented by each node in each node path using a word vector dictionary, and respectively generating the feature vectors of the node adjacency relationships corresponding to the starting nodes of the node paths according to the word vectors;
taking each node in the graph data structure and the feature vector of the corresponding node adjacency relationship as training samples, and performing model training on the language model;
and obtaining the feature vector of the node adjacency relationship of a target node in a target graph data structure using the trained language model.
In one technical solution of the method for obtaining a feature of a node adjacency in the graph data structure, the step of obtaining, according to the graph data structure, a plurality of node paths corresponding to each node when each node is used as a starting node, specifically includes:
Step S1: taking a starting node of the current node path as a searching node;
Step S2: acquiring all one-degree nodes reachable from the search node, and determining the weight of each one-degree node according to its degree; if the graph data structure is a directed graph, a one-degree node is a node at a distance of 1 from the search node along the out-degree direction; if the graph data structure is an undirected graph, a one-degree node is a node at a distance of 1 from the search node;
Step S3: randomly selecting a one-degree node as the next node of the current node path according to the weights;
Step S4: judging whether a path-stop condition is met; if yes, stopping acquiring the current node path; if not, taking the next node as the search node and returning to step S2.
In one technical solution of the method for acquiring features of node adjacency relationships in the graph data structure, the step of determining the weight of each one-degree node according to its degree specifically includes determining the weight of each one-degree node as shown in the following formula:

w_ij = o_ij / ΣN_j1

where w_ij denotes the weight of the i-th one-degree node of the j-th search node, o_ij denotes the degree of the i-th one-degree node of the j-th search node, N_j1 denotes the set of degrees of all one-degree nodes of the j-th search node, and ΣN_j1 denotes the sum of the degrees of all one-degree nodes of the j-th search node;
And/or
The path-stop condition is that, in the graph data structure, the next node has no neighbor nodes other than the nodes already in the current node path, or that the number of nodes contained in the current node path reaches a preset number threshold.
In one technical solution of the method for obtaining the feature of the node adjacency in the graph data structure, after the step of obtaining the feature vector of the node adjacency of the target node in the target graph data structure by using the trained language model, the feature obtaining method further includes:
performing feature vector decomposition on the feature vector of the node adjacency of the target node by adopting a singular value decomposition method to obtain a final feature vector of the node adjacency;
And/or
The language model is a Word2vec model.
In a second aspect, there is provided a feature acquisition apparatus for node adjacency in a graph data structure, each node in the graph data structure representing a word, edges between nodes representing semantic relationships between words represented by the nodes, the feature acquisition apparatus comprising:
A node path acquisition module configured to acquire a plurality of node paths respectively corresponding to each node when each node is taken as a starting node according to the graph data structure;
The first feature vector acquisition module is configured to acquire word vectors of words represented by each node in each node path by using a word vector dictionary, and respectively generate feature vectors of node adjacency relations corresponding to initial nodes in each node path according to the word vectors;
A model training module configured to perform model training on the language model, with each node in the graph data structure and the feature vector of its corresponding node adjacency relationship as training samples;
And the second feature vector acquisition module is configured to acquire feature vectors of node adjacency relations of the target nodes in the target graph data structure by adopting the trained language model.
In one aspect of the foregoing feature acquisition apparatus for node adjacency in a graph data structure, the node path acquisition module is further configured to perform the following operations:
Step S1: taking a starting node of the current node path as a searching node;
Step S2: acquiring all one-degree nodes reachable from the search node, and determining the weight of each one-degree node according to its degree; if the graph data structure is a directed graph, a one-degree node is a node at a distance of 1 from the search node along the out-degree direction; if the graph data structure is an undirected graph, a one-degree node is a node at a distance of 1 from the search node;
Step S3: randomly selecting a one-degree node as the next node of the current node path according to the weights;
Step S4: judging whether a path-stop condition is met; if yes, stopping acquiring the current node path; if not, taking the next node as the search node and returning to step S2.
In one aspect of the above feature acquisition apparatus, the node path acquisition module includes a weight determination submodule configured to determine the weight of each one-degree node according to its degree, as shown in the following formula:

w_ij = o_ij / ΣN_j1

where w_ij denotes the weight of the i-th one-degree node of the j-th search node, o_ij denotes the degree of the i-th one-degree node of the j-th search node, N_j1 denotes the set of degrees of all one-degree nodes of the j-th search node, and ΣN_j1 denotes the sum of the degrees of all one-degree nodes of the j-th search node;
And/or
The path-stop condition is that, in the graph data structure, the next node has no neighbor nodes other than the nodes already in the current node path, or that the number of nodes contained in the current node path reaches a preset number threshold.
In one technical solution of the above feature acquisition apparatus, the second feature vector acquisition module includes a feature vector decomposition submodule configured to perform feature vector decomposition on the feature vector of the node adjacency relationship of the target node using a singular value decomposition method, so as to acquire the final feature vector of the node adjacency relationship;
And/or
The language model is a Word2vec model.
In a third aspect, a control device is provided, comprising a processor and a storage device, the storage device being adapted to store a plurality of program codes adapted to be loaded and executed by the processor to perform the method for acquiring features of node adjacency relationships in a graph data structure according to any one of the above technical solutions.
In a fourth aspect, there is provided a computer readable storage medium having stored therein a plurality of program codes adapted to be loaded and executed by a processor to perform the method for feature acquisition of node adjacencies in a graph data structure according to any one of the above-mentioned methods for feature acquisition of node adjacencies in a graph data structure.
The technical scheme provided by the invention has at least one or more of the following beneficial effects:
In the technical solution implementing the invention, each node in the graph data structure represents a word, and edges between nodes represent semantic relationships between the words represented by the nodes. The feature acquisition method for node adjacency relationships in the graph data structure may include: acquiring, according to the graph data structure, a plurality of node paths corresponding to each node with that node as the starting node; acquiring, with a word vector dictionary, word vectors of the words represented by the nodes in each node path, and generating from them the feature vector of the node adjacency relationship corresponding to the starting node of each path; taking each node in the graph data structure and the feature vector of its corresponding node adjacency relationship as training samples and performing model training on the language model; and obtaining, with the trained language model, the feature vector of the node adjacency relationship of a target node in a target graph data structure.
Because each node path is formed by connecting a plurality of nodes in sequence, forming communication paths across several levels of nodes, a feature vector generated with the word vector dictionary from the words represented by the nodes on a path can represent the adjacency features between nodes at multiple levels, that is, the deep features of the node adjacency relationship. By training the language model with these feature vectors, the model can learn how to acquire the deep features of node adjacency relationships from a graph data structure, so the trained language model can be directly adopted to acquire the feature vector of the node adjacency relationship (the deep features) of any target node in a target graph data structure. This overcomes the defect of the prior art, which can obtain only shallow features of node adjacency relationships, and avoids the resulting loss of accuracy when the features are used for data analysis.
Drawings
The present disclosure will become more readily understood with reference to the accompanying drawings. As will be readily appreciated by those skilled in the art: the drawings are for illustrative purposes only and are not intended to limit the scope of the present invention. Wherein:
FIG. 1 is a flow chart illustrating the main steps of a method for obtaining characteristics of node adjacencies in a graph data structure according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating the main steps of a method for obtaining characteristics of node adjacencies in a graph data structure according to another embodiment of the present invention;
Fig. 3 is a schematic block diagram of a main structure of a feature acquiring apparatus of a node adjacency in a graph data structure according to an embodiment of the present invention.
Detailed Description
Some embodiments of the invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are merely for explaining the technical principles of the present invention, and are not intended to limit the scope of the present invention.
In the description of the present invention, a "module" or "processor" may include hardware, software, or a combination of both. A module may comprise hardware circuitry, various suitable sensors, communication ports, and memory, or software components such as program code, or a combination of software and hardware. The processor may be a central processing unit, a microprocessor, a digital signal processor, or any other suitable processor. The processor has data and/or signal processing functions. The processor may be implemented in software, hardware, or a combination of both. Non-transitory computer-readable storage media include any suitable medium that can store program code, such as magnetic disks, hard disks, optical discs, flash memory, read-only memory, and random access memory. The term "A and/or B" means all possible combinations of A and B, such as A alone, B alone, or A and B. The term "at least one A or B" or "at least one of A and B" has a meaning similar to "A and/or B" and may include A alone, B alone, or A and B. The singular forms "a", "an" and "the" include plural referents.
Referring to fig. 1, fig. 1 is a schematic flow chart of main steps of a method for obtaining characteristics of node adjacency in a graph data structure according to an embodiment of the present invention. In this embodiment, each node in the graph data structure represents a word, and edges between nodes represent semantic relationships between words represented by the nodes. For example, the words represented by two nodes are individuals and goods, respectively, and the semantic relationship represented by an edge between the two nodes may be a purchase. As shown in fig. 1, the method for obtaining the feature of the node adjacency in the graph data structure in the embodiment of the present invention mainly includes the following steps S101 to S104.
Step S101: and acquiring a plurality of node paths corresponding to each node when each node is used as a starting node according to the graph data structure.
In this embodiment, multiple node paths are acquired for each node, and the starting node in all node paths corresponding to the same node is the node. For example, the graph data structure includes 10 nodes (a, b, c, d, e, f, g, h, i, j), for each of which 3 node paths are acquired. For node a, the starting node in the 3 node paths of node a is node a.
The node path refers to a communication path formed by sequentially connecting a plurality of nodes. For example, one node path of the node a may be a communication path formed by sequentially connecting the nodes a, c, and d.
Step S102: and acquiring word vectors of words represented by each node in each node path by using a word vector dictionary, and respectively generating feature vectors of node adjacency relations corresponding to the initial nodes in each node path according to the word vectors.
The word vector dictionary may be a dictionary containing a word vector for each word, constructed by one-hot encoding. As shown in fig. 2, by consulting the one-hot dictionary in fig. 2 it can be determined that the word vector of the word represented by node a in the graph structure (the graph data structure) is 1000…0 and the word vector of the word represented by node b is 0100…0.
In this embodiment, a word vector dictionary may be used to obtain, in sequence, the word vectors of the words represented by the nodes in a node path, and then the feature vector of the node path, that is, the feature vector of the node adjacency relationship of the path's starting node, is generated from those word vectors. The set in fig. 2 represents the set of all node paths with node a as the starting node. Taking the node path a-c-b-e-f as an example, the feature vector of this node path is obtained from the word vectors of the words represented by nodes a, c, b, e and f. In this embodiment, the word vectors of the words represented by the nodes in a node path may be represented as a matrix.
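The construction above can be sketched as follows; the six-word vocabulary and the path are illustrative assumptions, loosely following the a-c-b-e-f example of fig. 2:

```python
import numpy as np

# Hypothetical one-hot word-vector dictionary keyed by node word.
vocab = ["a", "b", "c", "d", "e", "f"]
onehot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}

def path_feature_matrix(path):
    """Stack the one-hot word vectors of the nodes in a path, row by row."""
    return np.stack([onehot[node] for node in path])

M = path_feature_matrix(["a", "c", "b", "e", "f"])
print(M.shape)  # (5, 6): five nodes on the path, six-word vocabulary
```

Each row of the matrix is the one-hot vector of one node on the path, so the matrix as a whole encodes the adjacency relationship of the starting node along that path.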
Step S103: and taking each node in the graph data structure and the feature vector of the corresponding node adjacency as training samples, and carrying out model training on the language model.
In this embodiment, the language model may be a Word2vec model, or another type of language model from the field of natural language processing. A CBOW (Continuous Bag-of-Words) training method or a Skip-Gram training method may be used to train the Word2vec model, and a person skilled in the art may flexibly set the parameters of either method according to actual requirements. As shown in fig. 2, the set of feature vectors of node adjacency relationships may be fed to the language model as input embeddings; after the output embeddings of the language model are obtained, the model loss may be calculated from the input and output embeddings, and the model parameters of the language model may then be adjusted according to the loss to complete training.
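The training step can be illustrated with a minimal Skip-Gram loop in NumPy, treating node paths as "sentences". This is a hedged stand-in for a full Word2vec implementation; the toy paths, window size, embedding dimension and learning rate are all illustrative assumptions, not values from the patent:

```python
import numpy as np

# Hedged Skip-Gram sketch: each node's input-embedding row becomes its
# adjacency-relation feature vector after training on node paths.
rng = np.random.default_rng(0)
paths = [["a", "c", "b"], ["a", "c", "d"], ["b", "e", "f"]]
vocab = sorted({n for p in paths for n in p})
idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 8

W_in = rng.normal(0.0, 0.1, (V, D))   # input embeddings (kept as features)
W_out = rng.normal(0.0, 0.1, (D, V))  # output projection

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

lr, window = 0.05, 1
for _ in range(200):                  # tiny training loop
    for path in paths:
        for pos, center in enumerate(path):
            for off in (-window, window):
                if not 0 <= pos + off < len(path):
                    continue
                c, t = idx[center], idx[path[pos + off]]
                h = W_in[c]
                p = softmax(W_out.T @ h)   # predicted context distribution
                grad = p.copy()
                grad[t] -= 1.0             # cross-entropy gradient
                W_out -= lr * np.outer(h, grad)
                W_in[c] -= lr * (W_out @ grad)

print(W_in.shape)  # one D-dimensional feature vector per node word
```

In practice a library implementation (e.g. a ready-made Word2vec) would replace this loop; the sketch only shows how paths-as-sentences yield per-node embeddings.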
Step S104: and obtaining the feature vector of the node adjacency relation of the target node in the target graph data structure by adopting the trained language model.
In this embodiment, the target graph data structure may be input to a trained language model, and the language model may output feature vectors of node adjacency relations corresponding to each target node according to word vectors of words represented by each target node in the target graph data structure.
Based on the above steps S101 to S104: because each node path is formed by connecting a plurality of nodes in sequence, forming communication paths across several levels of nodes, a feature vector generated with the word vector dictionary from the words represented by the nodes on a path can represent the adjacency features between nodes at multiple levels, that is, the deep features of the node adjacency relationship. By training the language model with these feature vectors, the model learns the ability to acquire the deep features of node adjacency relationships from a graph data structure, so the trained language model can be used directly to obtain the feature vector of the node adjacency relationship of any target node in a target graph data structure. This overcomes the defect of the prior art, which can obtain only shallow features of node adjacency relationships, and avoids the loss of analysis accuracy that those shallow features cause when used for data analysis.
The above steps S101 and S104 are further described below.
In one embodiment of the above step S101, a plurality of node paths corresponding to each node when each node is used as a start node may be acquired according to the graph data structure and according to the following steps 11 to 14.
Step 11: and taking the initial node of the current node path as a searching node.
Step 12: acquiring all one-degree nodes reachable from the search node and determining the weight of each one-degree node according to its degree. The greater a one-degree node's weight, the more important that node is to the search node.
In this embodiment, if the graph data structure is a directed graph, a one-degree node is a node at a distance of 1 from the search node along the out-degree direction; if the graph data structure is an undirected graph, a one-degree node is a node at a distance of 1 from the search node. Directed and undirected graphs are both conventional graph data structures in the data-structure field, and the meaning of degree in each is not repeated here.
In one embodiment of step 12, the weight of each one-degree node may be determined according to its degree, as shown in the following formula (1):

w_ij = o_ij / ΣN_j1    (1)

The parameters in formula (1) have the following meanings: w_ij denotes the weight of the i-th one-degree node of the j-th search node, o_ij denotes the degree of the i-th one-degree node of the j-th search node, N_j1 denotes the set of degrees of all one-degree nodes of the j-th search node, and ΣN_j1 denotes the sum of those degrees.
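A quick numeric check of formula (1); the degrees below are illustrative, chosen so the weights come out to 0.7 and 0.1, echoing the worked example later in the embodiment:

```python
# Formula (1): w_ij = o_ij / sum of the degrees of all one-degree nodes.
degrees = {"c": 7, "i": 1, "n": 1, "o": 1}  # illustrative one-degree-node degrees
total = sum(degrees.values())               # ΣN_j1 = 10
weights = {node: d / total for node, d in degrees.items()}
print(weights)  # node c gets 0.7, the others 0.1 each
```

By construction the weights are non-negative and sum to 1, so they can be used directly as a sampling distribution in the next step.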
Step 13: and randomly selecting one degree node as the next node of the current node path according to the weight.
In weighted random selection algorithms conventional in the data-processing field, items with larger weights are selected with higher probability. Therefore, in this embodiment, when a one-degree node is randomly selected by weight as the next node of the current node path, the node most important to the search node (largest weight, i.e. largest degree) is the most likely to be selected, so the resulting node path more accurately reflects the node adjacency relationships of the search node in the graph data structure. Moreover, since weighted random selection can also occasionally pick a less important one-degree node (smaller weight or degree) as the next node, it can to some extent also mine hidden adjacency relationships of the search node in the graph data structure.
Step 14: judging whether the path-stop condition is met; if yes, stopping acquiring the current node path; if not, taking the next node obtained in step 13 as the search node and returning to step 12.
The path stop acquisition condition in this embodiment may be that the next node has no other neighboring node in the graph data structure than the node in the current node path, i.e., the next node is the last node. In addition, the path stop acquiring condition may be that the number of nodes included in the current node path reaches a preset number threshold. In this embodiment, a person skilled in the art may flexibly set a specific value of the number threshold according to actual needs, for example, the number threshold may be 3 or 5.
One node path of one node can be obtained through the steps 11 to 14, and a plurality of node paths corresponding to each node can be obtained by repeatedly executing the steps 11 to 14.
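The path-sampling procedure of steps 11 to 14 can be sketched as a degree-weighted random walk; the toy graph, stop threshold and function name below are illustrative assumptions, not part of the patent:

```python
import random

# Hedged sketch of steps 11-14: build one node path by repeatedly picking a
# neighbor not yet in the path, weighted by that neighbor's degree.
graph = {  # undirected adjacency lists (illustrative)
    "a": ["c", "i", "n", "o"], "c": ["a", "b", "d"], "b": ["c"],
    "d": ["c"], "i": ["a"], "n": ["a"], "o": ["a"],
}

def sample_path(start, max_len=3, seed=0):
    rng = random.Random(seed)
    path = [start]                                     # step 11: start node
    while len(path) < max_len:                         # step 14: length threshold
        search = path[-1]
        candidates = [v for v in graph[search] if v not in path]
        if not candidates:                             # step 14: dead end
            break
        weights = [len(graph[v]) for v in candidates]  # step 12: degree weights
        path.append(rng.choices(candidates, weights=weights)[0])  # step 13
    return path

print(sample_path("a"))  # a path of at most 3 nodes starting at "a"
```

Repeatedly calling the function with different seeds yields the multiple node paths per starting node described in step S101.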
Referring to fig. 2, steps 11 to 14 are further described below using the graph structure shown in fig. 2 as an example. The graph structure shown in fig. 2 (the graph data structure described in the foregoing method embodiment) includes nodes a, b, c, d, e, f, g, i, j, k, l, m, n, o and p; the acquisition of a node path for node a is described below.
(1) Taking node a as the search node of the current node path, the one-degree nodes reachable from node a are nodes c, i, n and o, with weights 0.7, 0.1, 0.1 and 0.1 respectively. Node c is randomly selected from nodes c, i, n and o according to these weights and taken as the next node of the current node path, i.e., the current node path is a-c.
(2) Assuming that the path stop acquisition condition is that the number of nodes included in the current node path reaches a preset number threshold and the preset number threshold is 3, if it is judged that the path stop acquisition condition is not met, taking the node c as a search node, wherein the first-degree nodes from the node c comprise nodes b, d, i and p, and the weights of the nodes b, d, i and p are 0.5, 0.2, 0.1 and 0.1 in sequence. And randomly selecting a node b from the nodes b, d, i and p according to the weight, and taking the node b as the next node of the current node path, namely, the current node path is a-c-b.
(3) Since the path stop acquisition condition is now satisfied (the current node path contains 3 nodes), the acquisition of the current node path stops, and the final current node path is a-c-b.
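The walk described in steps 11 to 14 and illustrated above can be sketched as follows. The function name and the adjacency-list representation are illustrative assumptions, and the sketch covers the undirected case, where a node's degree is simply the length of its neighbor list:

```python
import random

def weighted_random_walk(graph, start, max_len=3, seed=None):
    """Acquire one node path starting from `start` (steps 11 to 14).

    At each step, the search node's first-degree nodes (direct neighbors not
    already on the path) are weighted in proportion to their degrees, one is
    drawn at random, and the walk stops when either stop condition holds:
    no admissible neighbor remains, or the path reaches `max_len` nodes.
    """
    rng = random.Random(seed)
    path = [start]
    search = start
    while len(path) < max_len:                      # threshold stop condition
        neighbors = [n for n in graph[search] if n not in path]
        if not neighbors:                           # terminal-node stop condition
            break
        degrees = [len(graph[n]) for n in neighbors]
        total = sum(degrees)
        weights = [d / total for d in degrees]      # each weight: degree / sum of degrees
        search = rng.choices(neighbors, weights=weights, k=1)[0]
        path.append(search)
    return path
```

Repeatedly calling this function with the same start node yields the plurality of node paths per node described above; different random draws produce different paths.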
Further, in another embodiment of the feature acquisition method for node adjacency relationships in a graph data structure according to the present invention, the method may include not only steps S101 to S104 of the foregoing method embodiment but also the following step S105.
Step S105: and adopting a singular value decomposition (Singular Value Decomposition, SVD) method to perform feature vector decomposition on the feature vector of the node adjacent relation of the target node so as to obtain the final feature vector of the node adjacent relation.
Because the number of nodes in a graph data structure is usually large, the feature vectors of the node adjacency relationships are highly sparse; performing feature vector decomposition on the feature vector of the node adjacency relationship of the target node by singular value decomposition reduces this sparsity. The final feature vector of the node adjacency relationship can represent nodes that are relatively close and have relatively high degrees in the graph data structure, and therefore represents the node adjacency relationship better.
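A minimal sketch of step S105, under the assumption that the sparse feature vectors are stacked into a matrix with one row per node, could use numpy's SVD routine; the function name and the choice of keeping the k largest singular values are illustrative:

```python
import numpy as np

def reduce_adjacency_features(features, k):
    """Reduce sparse node-adjacency feature vectors (one row per node) to k
    dense dimensions: decompose the matrix by SVD and project each row onto
    the axes of the k largest singular values."""
    U, s, Vt = np.linalg.svd(features, full_matrices=False)
    return U[:, :k] * s[:k]     # final feature vectors, one row per node
```

Keeping all singular values preserves the pairwise inner products of the original rows exactly; truncating to k < rank trades a small reconstruction error for denser, lower-dimensional vectors.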
It should be noted that, although the foregoing embodiments describe the steps in a specific order, it will be understood by those skilled in the art that, in order to achieve the effects of the present invention, the steps are not necessarily performed in such an order, and may be performed simultaneously (in parallel) or in other orders, and these variations are within the scope of the present invention.
The invention further provides a device for acquiring the characteristics of the node adjacency in the graph data structure.
Referring to fig. 3, fig. 3 is a main structural block diagram of a feature acquiring apparatus for node adjacency in a graph data structure according to an embodiment of the present invention, in which each node in the graph data structure represents a word, and edges between nodes represent semantic relationships between words represented by the nodes. As shown in fig. 3, the feature acquiring device for a node adjacency in the graph data structure in the embodiment of the present invention mainly includes a node path acquiring module, a first feature vector acquiring module, a model training module, and a second feature vector acquiring module. The node path acquisition module may be configured to acquire, according to the graph data structure, a plurality of node paths respectively corresponding to each node when each node is taken as a starting node; the first feature vector obtaining module may be configured to obtain a word vector of a word represented by each node in each node path using a word vector dictionary, and generate feature vectors of node adjacency relations corresponding to the start node in each node path according to the word vectors; the model training module may be configured to model the language model with feature vectors of each node and corresponding node adjacency in the graph data structure as training samples; the second feature vector acquisition module may be configured to acquire feature vectors of node adjacencies of the target nodes in the target graph data structure using the trained language model. The language model may be a Word2vec model.
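The first feature vector acquisition module's combination of per-node word vectors into one feature vector for the start node is not fully specified above; the following sketch assumes simple averaging, and the function and dictionary names are hypothetical:

```python
import numpy as np

def path_feature(node_path, node_to_word, word_vector_dict):
    """Generate the feature vector of the node adjacency relationship for the
    start node of one node path: look up, via the word vector dictionary, the
    word vector of the word each node represents, then combine them (averaging
    is an assumption here; the text leaves the exact combination open)."""
    vectors = [np.asarray(word_vector_dict[node_to_word[n]]) for n in node_path]
    return np.mean(vectors, axis=0)
```

One such feature vector per node path, paired with the path's start node, would form a training sample for the model training module.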
In one embodiment, the node path acquisition module may be further configured to: Step S1: taking the starting node of the current node path as the search node; Step S2: acquiring all first-degree nodes starting from the search node and respectively determining the weight of each first-degree node according to the degree of each first-degree node, wherein a first-degree node is a node at a distance of 1 from the search node; if the graph data structure is a directed graph, the degree of a first-degree node is its out-degree; if the graph data structure is an undirected graph, the degree of a first-degree node is its degree in the undirected graph; Step S3: randomly selecting a first-degree node as the next node of the current node path according to the weights; Step S4: judging whether the path stop acquisition condition is met; if yes, stopping acquiring the current node path; if not, taking the next node as the search node and returning to step S2.
In one embodiment, the node path acquisition module may include a weight determination sub-module, which may be configured to determine the weight of each first-degree node according to the degree of each first-degree node and in the manner shown in formula (1).
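Formula (1), as reconstructed from its variable definitions in claim 2, normalizes the degrees of the search node's first-degree nodes into selection weights; a minimal sketch (the function name is an assumption):

```python
def first_degree_weights(degrees):
    """Formula (1): the weight of each first-degree node equals its degree
    divided by the sum of the degrees of all first-degree nodes of the
    current search node, so the weights sum to 1."""
    total = sum(degrees)
    return [d / total for d in degrees]
```

For instance, first-degree nodes with degrees 7, 1, 1 and 1 would receive weights 0.7, 0.1, 0.1 and 0.1, matching the style of the fig. 2 example.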
In one embodiment, the path stop acquisition condition is that the next node has no neighbor node in the graph data structure other than the nodes in the current node path, or the path stop acquisition condition is that the number of nodes contained in the current node path reaches a preset number threshold.
In one embodiment, the second feature vector obtaining module may include a feature vector decomposition sub-module, and the feature vector decomposition sub-module may be configured to perform feature vector decomposition on the feature vector of the node adjacency of the target node using a singular value decomposition method to obtain the feature vector of the final node adjacency.
The technical principles adopted, the technical problems solved and the technical effects produced by this device embodiment are similar to those of the embodiment of the feature acquisition method for node adjacency relationships in a graph data structure shown in fig. 1. A person skilled in the art can clearly understand that, for convenience and brevity of description, the specific working process and related description of the device for acquiring characteristics of node adjacency relationships in a graph data structure may refer to the description of the method embodiment, and are not repeated herein.
It will be appreciated by those skilled in the art that the present invention may implement all or part of the methods of the above-described embodiments by means of a computer program instructing relevant hardware. The computer program may be stored in a computer readable storage medium, and when executed by a processor, implements the steps of the above-described method embodiments. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable storage medium may include any entity or device capable of carrying the computer program code, such as a medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory, a random access memory, an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer readable storage medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in each jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer readable storage medium does not include electrical carrier signals and telecommunications signals.
Further, the invention also provides a control device. In one embodiment of the control device according to the present invention, the control device includes a processor and a storage device, the storage device may be configured to store a program for executing the method for acquiring the characteristics of the node adjacencies in the graph data structure of the above-described method embodiment, and the processor may be configured to execute the program in the storage device, the program including, but not limited to, the program for executing the method for acquiring the characteristics of the node adjacencies in the graph data structure of the above-described method embodiment. For convenience of explanation, only those portions of the embodiments of the present invention that are relevant to the embodiments of the present invention are shown, and specific technical details are not disclosed, please refer to the method portions of the embodiments of the present invention. The control device may be a control device formed of various electronic devices.
Further, the invention also provides a computer readable storage medium. In one embodiment of the computer readable storage medium according to the present invention, the computer readable storage medium may be configured to store a program for executing the method for acquiring characteristics of node adjacencies in the graph data structure of the above-described method embodiment, where the program may be loaded and executed by a processor to implement the method for acquiring characteristics of node adjacencies in the graph data structure. For convenience of explanation, only those portions of the embodiments of the present invention that are relevant to the embodiments of the present invention are shown, and specific technical details are not disclosed, please refer to the method portions of the embodiments of the present invention. The computer readable storage medium may be a storage device including various electronic devices, and optionally, the computer readable storage medium in the embodiments of the present invention is a non-transitory computer readable storage medium.
Further, it should be understood that, since the respective modules are merely set to illustrate the functional units of the apparatus of the present invention, the physical devices corresponding to the modules may be the processor itself, or a part of software in the processor, a part of hardware, or a part of a combination of software and hardware. Accordingly, the number of individual modules in the figures is merely illustrative.
Those skilled in the art will appreciate that the various modules in the apparatus may be adaptively split or combined. Such splitting or combining of specific modules does not cause the technical solution to deviate from the principle of the present invention, and therefore, the technical solution after splitting or combining falls within the protection scope of the present invention.
Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will fall within the scope of the present invention.

Claims (8)

1. A feature acquisition method for node adjacency relations in a graph data structure, wherein each node in the graph data structure represents a word, edges between nodes represent semantic relations between words represented by the nodes, the feature acquisition method comprising:
Acquiring a plurality of node paths corresponding to each node when each node is used as a starting node according to the graph data structure;
acquiring word vectors of words represented by each node in each node path by using a word vector dictionary, and respectively generating feature vectors of node adjacency relations corresponding to the initial nodes in each node path according to the word vectors;
taking each node in the graph data structure and the feature vector of the corresponding node adjacency relationship as training samples, and carrying out model training on the language model;
Acquiring a feature vector of a node adjacency relationship of a target node in a target graph data structure by adopting a trained language model;
The step of acquiring a plurality of node paths corresponding to each node when each node is used as a starting node according to the graph data structure specifically comprises the following steps:
Step S1: taking a starting node of the current node path as a searching node;
Step S2: acquiring all first-degree nodes starting from the search node and respectively determining the weight of each first-degree node according to the degree of each first-degree node; wherein a first-degree node is a node at a distance of 1 from the search node; if the graph data structure is a directed graph, the degree of a first-degree node is its out-degree; if the graph data structure is an undirected graph, the degree of a first-degree node is its degree in the undirected graph;
step S3: randomly selecting one degree node as the next node of the current node path according to the weight;
Step S4: judging whether a path stop acquisition condition is met; if yes, stopping acquiring the current node path; if not, the next node is used as a searching node and the step S2 is carried out.
2. The feature acquisition method for node adjacency relationships in a graph data structure according to claim 1, wherein the step of respectively determining the weight of each first-degree node according to the degree of each first-degree node comprises:
determining the weight of each first-degree node according to the degree of each first-degree node and according to the method shown in the following formula:
w_ij = o_ij / ΣN_j1
wherein w_ij represents the weight of the i-th first-degree node of the j-th search node, o_ij represents the degree of the i-th first-degree node of the j-th search node, N_j1 represents the set of the degrees of all first-degree nodes of the j-th search node, and ΣN_j1 represents the sum of the degrees of all first-degree nodes of the j-th search node;
And/or
the path stop acquisition condition is that the next node has no neighbor node in the graph data structure other than the nodes in the current node path, or the path stop acquisition condition is that the number of nodes contained in the current node path reaches a preset number threshold.
3. The feature acquisition method of node adjacencies in a graph data structure according to claim 1, wherein after the step of acquiring feature vectors of node adjacencies of target nodes in a target graph data structure using a trained language model, the feature acquisition method further comprises:
performing feature vector decomposition on the feature vector of the node adjacency of the target node by adopting a singular value decomposition method to obtain a final feature vector of the node adjacency;
And/or
The language model is a Word2vec model.
4. A feature acquisition device for node adjacency in a graph data structure, wherein each node in the graph data structure represents a word, and edges between nodes represent semantic relationships between words represented by the nodes, the feature acquisition device comprising:
A node path acquisition module configured to acquire a plurality of node paths respectively corresponding to each node when each node is taken as a starting node according to the graph data structure;
The first feature vector acquisition module is configured to acquire word vectors of words represented by each node in each node path by using a word vector dictionary, and respectively generate feature vectors of node adjacency relations corresponding to initial nodes in each node path according to the word vectors;
a model training module configured to perform model training on a language model, taking each node in the graph data structure and the feature vector of the corresponding node adjacency relationship as training samples;
the second feature vector acquisition module is configured to acquire feature vectors of node adjacency relations of the target nodes in the target graph data structure by adopting a trained language model;
wherein the node path acquisition module is further configured to:
Step S1: taking a starting node of the current node path as a searching node;
Step S2: acquiring all first-degree nodes starting from the search node and respectively determining the weight of each first-degree node according to the degree of each first-degree node; wherein a first-degree node is a node at a distance of 1 from the search node; if the graph data structure is a directed graph, the degree of a first-degree node is its out-degree; if the graph data structure is an undirected graph, the degree of a first-degree node is its degree in the undirected graph;
step S3: randomly selecting one degree node as the next node of the current node path according to the weight;
Step S4: judging whether a path stop acquisition condition is met; if yes, stopping acquiring the current node path; if not, the next node is used as a searching node and the step S2 is carried out.
5. The feature acquisition device for node adjacency relationships in a graph data structure according to claim 4, wherein the node path acquisition module includes a weight determination submodule configured to determine the weight of each first-degree node according to the degree of each first-degree node and according to the method shown in the following formula:
w_ij = o_ij / ΣN_j1
wherein w_ij represents the weight of the i-th first-degree node of the j-th search node, o_ij represents the degree of the i-th first-degree node of the j-th search node, N_j1 represents the set of the degrees of all first-degree nodes of the j-th search node, and ΣN_j1 represents the sum of the degrees of all first-degree nodes of the j-th search node;
And/or
the path stop acquisition condition is that the next node has no neighbor node in the graph data structure other than the nodes in the current node path, or the path stop acquisition condition is that the number of nodes contained in the current node path reaches a preset number threshold.
6. The feature acquisition device of a node adjacency in a graph data structure according to claim 4, wherein the second feature vector acquisition module includes a feature vector decomposition sub-module configured to perform feature vector decomposition on feature vectors of the node adjacency of the target node by using a singular value decomposition method to acquire feature vectors of a final node adjacency;
And/or
The language model is a Word2vec model.
7. A control device comprising a processor and a storage device, said storage device being adapted to store a plurality of program code, characterized in that said program code is adapted to be loaded and executed by said processor to perform the method of feature acquisition of node adjacencies in a graph data structure as claimed in any of claims 1 to 3.
8. A computer readable storage medium having stored therein a plurality of program code adapted to be loaded and executed by a processor to perform the feature acquisition method of node adjacency in a graph data structure as claimed in any one of claims 1 to 3.
CN202111421510.2A 2021-11-26 2021-11-26 Method, device and medium for acquiring characteristics of node adjacency in graph data structure Active CN114168799B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111421510.2A CN114168799B (en) 2021-11-26 2021-11-26 Method, device and medium for acquiring characteristics of node adjacency in graph data structure


Publications (2)

Publication Number Publication Date
CN114168799A CN114168799A (en) 2022-03-11
CN114168799B (en) 2024-06-11

Family

ID=80480977


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111694957A (en) * 2020-05-29 2020-09-22 新华三大数据技术有限公司 Question list classification method and device based on graph neural network and storage medium
CN112069822A (en) * 2020-09-14 2020-12-11 上海风秩科技有限公司 Method, device and equipment for acquiring word vector representation and readable medium
CN112347316A (en) * 2020-10-21 2021-02-09 上海淇玥信息技术有限公司 GraphSAGE-based bad preference behavior detection method and device and electronic equipment
CN112528037A (en) * 2020-12-04 2021-03-19 北京百度网讯科技有限公司 Edge relation prediction method, device, equipment and storage medium based on knowledge graph
CN113066480A (en) * 2021-03-26 2021-07-02 北京达佳互联信息技术有限公司 Voice recognition method and device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107491534B (en) * 2017-08-22 2020-11-20 北京百度网讯科技有限公司 Information processing method and device


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Application of graph structure and the Dijkstra algorithm in UAV track planning; Yan Hao; Fan Xing; Xia Xuezhi; Fire Control & Command Control; 2010-04-15 (04); full text *
Multi-motor synchronous control system using an RBF neural network adaptive PID controller and an adaptive neuron decoupling compensator; Zhao Liang; Liu Xingqiao; Chen Chong; Bai Xuefei; Electric Drive; 2009-01-01 (01); full text *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant