CN112463989A - Knowledge graph-based information acquisition method and system - Google Patents
- Publication number
- CN112463989A (application number CN202011458989.2A)
- Authority
- CN
- China
- Prior art keywords
- node
- knowledge
- target
- random walk
- mapping
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
- G06Q10/047—Optimisation of routes or paths, e.g. travelling salesman problem
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/40—Business processes related to the transportation industry
Abstract
The invention provides a knowledge graph-based information acquisition method and system. The method comprises: constructing a knowledge graph based on a rail transit text corpus; performing a random walk with a target node as the start node to obtain a target random walk path; and performing knowledge acquisition mapping on the target random walk path to generate a knowledge vector. By establishing a knowledge graph-based text representation and using the random walk method to determine the knowledge vectors of all nodes in the graph, the method and system achieve a vectorized representation of the knowledge graph, so that the knowledge graph can be modularized and called conveniently. Furthermore, the evolution patterns of knowledge can be inferred from the constructed knowledge graph, providing accurate and traceable knowledge data for intelligent decision making in the field of rail transit.
Description
Technical Field
The invention relates to the technical field of rail transit, in particular to a knowledge graph-based information acquisition method and system.
Background
The knowledge graph is one of the important technologies for assisting decision making and is widely applied in industries such as healthcare, finance, e-commerce, judicial expertise and education.
Knowledge acquisition is a key step in creating a knowledge graph. Knowledge extraction generally comprises extracting entities and extracting the relationships between them, that is, automatically identifying named entities in the original corpus. The currently prevalent method predicts over the text sequence with a long short-term memory (LSTM) network and completes named entity recognition in combination with a conditional random field (CRF). In addition, some knowledge is obtained by manually creating semantic and grammatical rules to identify entities and the relationships between them.
From the perspective of natural language processing, knowledge extraction can be regarded as a word embedding technique (also called a pre-training technique) that improves the performance of downstream tasks. The Word2vec model defines a word prediction task: a sliding window determines the context length and separates the context (background) words from the center word, and the task has two main implementations, CBOW and Skip-gram. The Fast-CNN and GloVe methods complete knowledge acquisition using word structure information and co-occurrence matrix information, respectively.
When knowledge is acquired with these methods, either the modeling process is complex or the extracted results deviate considerably, so the requirements of rail transit in actual operation, safety management, equipment maintenance and other areas cannot be met.
Disclosure of Invention
Aiming at the problems in the prior art, the embodiment of the invention provides a knowledge graph-based information acquisition method and system.
The invention provides a knowledge graph-based information acquisition method, which comprises the following steps:
constructing a knowledge graph based on a rail transit text corpus;
performing a random walk with a target node as the start node to obtain a target random walk path;
and carrying out knowledge acquisition mapping on the target random walk path to generate a knowledge vector.
In the knowledge graph-based information acquisition method provided by the invention, constructing the knowledge graph based on the rail transit text corpus comprises the following steps:
extracting the words contained in each sentence of the rail transit text corpus as nodes of the knowledge graph;
setting a connecting edge of the knowledge graph between any two nodes appearing in the same sentence;
and acquiring the knowledge graph.
In the knowledge graph-based information acquisition method provided by the invention, performing the random walk with the target node as the start node to obtain the target random walk path comprises the following steps:
setting the maximum number γ of walk paths of each node during the random walk and the maximum number m of nodes in the target random walk path;
determining an optimal walk node from the γ walk paths corresponding to the current node by using a normal-distribution probability calculation;
iteratively obtaining the next walk node after the optimal walk node, and outputting the target random walk path once it contains m nodes in total;
wherein the target random walk path is formed by connecting the target node and the optimal walk nodes in iteration order.
According to the knowledge graph-based information acquisition method provided by the invention, the knowledge acquisition and mapping of the target random walk path to generate a knowledge vector comprises the following steps:
obtaining the probability $Pr_n$ that the target node $n$ is mapped to $v_n$, the probability $Pr_{n-1}$ that the node preceding the target node on the target random walk path is mapped to $v_{n-1}$, and the probability $Pr_{n+1}$ that the node following the target node on the target random walk path is mapped to $v_{n+1}$;
determining the optimal mapping vector of the target node $n$ according to a mapping constraint condition, the mapping constraint condition being to maximize $Pr_{n-1}$ and $Pr_{n+1}$, where $v_n$ is the optimal mapping vector of the target node, $v_{n-1}$ is the optimal mapping vector of the previous node, and $v_{n+1}$ is the optimal mapping vector of the next node;
iteratively obtaining an optimal mapping vector of each node;
and synthesizing each optimal mapping vector according to the target random walk path to generate the knowledge vector.
In the knowledge graph-based information acquisition method provided by the invention, performing knowledge acquisition mapping on the target random walk path to generate a knowledge vector comprises the following steps:
based on a sliding window algorithm with the window size set to $z$, obtaining the probability $Pr_n$ that the target node $n$ is mapped to $v_n$, the average mapping probability $Pr_{n-z}$ of the $z$ nodes preceding the target node on the target random walk path, and the average mapping probability $Pr_{n+z}$ of the $z$ nodes following the target node on the target random walk path;
determining the optimal mapping vector of the target node $n$ according to a mapping constraint condition, the mapping constraint condition being to maximize $Pr_{n-z}$ and $Pr_{n+z}$, where $v_n$ is the optimal mapping vector of the target node;
iteratively obtaining an optimal mapping vector of each node;
and synthesizing each optimal mapping vector according to the target random walk path to generate the knowledge vector.
In the knowledge graph-based information acquisition method provided by the invention, iteratively obtaining the optimal mapping vector of each node specifically comprises:
iteratively obtaining the optimal mapping vector of each node using stochastic gradient descent and error back-propagation.
In the knowledge graph-based information acquisition method provided by the invention, obtaining the probability $Pr_n$ that the target node $n$ is mapped to $v_n$ comprises:
converting each node in the knowledge graph into a vector representation based on a word2vec model;
constructing a Huffman tree according to the number of connecting edges of each node;
obtaining the probability $Pr_n$ corresponding to the target node $n$ based on the Huffman tree.
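The text does not spell out how the Huffman tree is built from the edge counts. As a minimal sketch, assuming each node's degree (number of connecting edges) is used as its Huffman weight, in the same way word frequency is used in word2vec's hierarchical softmax, the tree can be built with a priority queue. The function name `huffman_codes` and the sample degrees are hypothetical.

```python
import heapq
import itertools


def huffman_codes(weights):
    """Build binary Huffman codes from {node: weight}; heavier nodes get shorter codes.

    Assumption: the weight of a node is its degree in the knowledge graph.
    """
    counter = itertools.count()  # tie-breaker so heapq never compares the dicts
    heap = [(w, next(counter), {node: ""}) for node, w in weights.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A lone node still needs a one-bit code to define a root-to-leaf path.
        return {node: "0" for node in heap[0][2]}
    while len(heap) > 1:
        # Merge the two lightest subtrees under a new inner node.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {n: "0" + c for n, c in left.items()}
        merged.update({n: "1" + c for n, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next(counter), merged))
    return heap[0][2]


degrees = {"train": 5, "signal": 4, "door": 2, "brake": 1, "fault": 1}  # hypothetical
codes = huffman_codes(degrees)
```

In a hierarchical softmax, $Pr_n$ would then be the product of the binary (sigmoid) decisions taken along the code of node $n$; those decisions depend on trained inner-node vectors and are not shown here.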
The invention also provides a knowledge graph-based information acquisition system, comprising:
a graph construction unit, configured to construct a knowledge graph based on a rail transit text corpus;
a path determining unit, configured to execute random walk by using the target node as an initial node, and acquire a target random walk path;
and the vector generation unit is used for carrying out knowledge acquisition mapping on the target random walk path to generate a knowledge vector.
The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the steps of the knowledge-graph-based information acquisition method.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the knowledge-graph based information acquisition method as described in any one of the above.
With the knowledge graph-based information acquisition method and system, a knowledge graph-based text representation is established and the random walk method is used to determine the knowledge vectors of all nodes in the graph, achieving a vectorized representation of the knowledge graph so that it can be modularized and called conveniently. Furthermore, the evolution patterns of knowledge can be inferred from the constructed knowledge graph, providing accurate and traceable knowledge data for intelligent decision making in the field of rail transit.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a knowledge-graph based information acquisition method provided by the present invention;
FIG. 2 is a schematic diagram of a knowledge-graph based information acquisition system according to the present invention;
FIG. 3 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The knowledge-graph-based information acquisition method and system provided by the embodiment of the invention are described below with reference to fig. 1 to 3.
Fig. 1 is a schematic flow chart of the method for acquiring knowledge-graph-based information according to the present invention, as shown in fig. 1, including but not limited to the following steps:
step S1: constructing a knowledge graph based on a rail transit text corpus;
step S2: executing random walk by taking the target node as an initial node to obtain a target random walk path;
step S3: and carrying out knowledge acquisition mapping on the target random walk path to generate a knowledge vector.
In step S1, the rail transit text corpus may be built from various collected related data, such as maintenance records, data recorded by on-board devices, and server process files. The data may be collected from trains, stations and dispatch centers, or via the Internet, for example from expert databases; the invention does not specifically limit this.
After the rail transit text corpus is created, the method further comprises creating a knowledge graph based on the text recorded in the rail transit text corpus. A knowledge graph is essentially a semantic network: its nodes represent entities or concepts, and its edges represent the various semantic relationships between them. Therefore, in the invention, each word in the rail transit text corpus can be used as a node, and the association between two words as the connecting edge between the two nodes, to create the knowledge graph.
The constructed knowledge graph combines theories and methods from disciplines such as applied mathematics, graphics, information visualization and information science with methods such as bibliometric citation analysis and co-occurrence analysis. Through data mining, information processing, knowledge measurement and graph drawing, it displays complex rail transit knowledge as a visual graph, providing accurate and traceable knowledge data for inferring how knowledge evolves and for intelligent decision making in the field of rail transit.
Further, in step S2, the target node is any node in the knowledge graph: each node in the knowledge graph is used in turn as the start node, and the random walk method is used to obtain the random walk path corresponding to each node.
Starting a random walk from a target node in the constructed knowledge graph yields an irregular random walk path formed by the visited nodes. Optionally, the walk direction may be restricted according to relevant constraints to obtain the target random walk path.
A random walk is a process in which future steps and directions cannot be predicted from past behavior. Its core idea is that the conserved quantity carried by any irregular walker corresponds to a diffusion transport law; the random walk is close to Brownian motion and is an idealized mathematical model of it.
Further, in step S3, because each node in the acquired random walk path is in text form, the knowledge graph as a whole is inconvenient to store and modularize and cannot serve as a concrete working tool for rail transit, for example for fault cause analysis. The knowledge graph-based information acquisition method provided by the invention therefore performs knowledge acquisition mapping on each acquired target random walk path to generate the corresponding knowledge vector; vectorizing all target random walk paths in this way realizes, to a certain extent, vectorization of the whole knowledge graph.
The knowledge vector is synthesized from the sub-vectors of the words associated with the nodes that make up the target random walk path.
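The text does not state how the sub-vectors are combined into the knowledge vector. One simple option, assumed here purely for illustration, is to average the node vectors along the path (mean pooling); concatenation would preserve order at the cost of a path-length-dependent dimension. The function name `synthesize` and the toy embeddings are hypothetical.

```python
def synthesize(path, embeddings):
    """Mean-pool the node vectors along a walk path (hypothetical synthesis rule)."""
    dim = len(embeddings[path[0]])
    total = [0.0] * dim
    for node in path:
        for i, x in enumerate(embeddings[node]):
            total[i] += x
    return [x / len(path) for x in total]


knowledge_vector = synthesize(["a", "b"], {"a": [1.0, 2.0], "b": [3.0, 4.0]})
```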
By establishing a knowledge graph-based text representation and using the random walk method to determine the knowledge vector of each node in the graph, the knowledge graph-based information acquisition method provided by the invention achieves a vectorized representation of the knowledge graph, so that the knowledge graph can be modularized and called conveniently. Furthermore, the evolution patterns of knowledge can be inferred from the constructed knowledge graph, providing accurate and traceable knowledge data for intelligent decision making in the field of rail transit.
Based on the content of the foregoing embodiment, as an optional embodiment, constructing the knowledge graph based on the rail transit text corpus includes: extracting the words contained in each sentence of the rail transit text corpus as nodes of the knowledge graph; setting a connecting edge of the knowledge graph between any two nodes that appear in the same sentence; and obtaining the knowledge graph.
Word extraction identifies the words that make up a sentence or text, usually with tags such as person names, institution/organization names, geographic locations, times/dates and numerical values; the specific tag definitions can be adjusted for different tasks. A connecting edge expresses the association between two words in the form of a connection.
Optionally, since most of the sentences in the rail transit text corpus are Chinese sentences, the term frequency-inverse document frequency (TF-IDF) method may be used to extract the words in the sentences of the rail transit text corpus.
Furthermore, since each sentence in the rail transit text corpus contains not only keywords but often many meaningless words, the pyhanlp word segmentation tool can be used to extract the keywords, so that only keywords are used as nodes of the knowledge graph.
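A minimal TF-IDF keyword filter, assuming the sentences have already been segmented into words (for example by a segmentation tool such as pyhanlp) and treating each sentence as one document for the IDF statistics. Both choices and the name `tfidf_keywords` are illustrative assumptions, not the patent's specified procedure.

```python
import math
from collections import Counter


def tfidf_keywords(sentences, top_k=3):
    """Return the top_k highest-scoring TF-IDF words of each tokenized sentence."""
    n_docs = len(sentences)
    # Document frequency: in how many sentences each word occurs.
    df = Counter(word for sent in sentences for word in set(sent))
    result = []
    for sent in sentences:
        if not sent:
            result.append([])
            continue
        tf = Counter(sent)
        scores = {w: (c / len(sent)) * math.log(n_docs / df[w]) for w, c in tf.items()}
        # Highest score first; ties broken alphabetically for reproducibility.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result.append(ranked[:top_k])
    return result


sentences = [["the", "brake", "fault"], ["the", "door", "fault"], ["the", "signal", "delay"]]
keywords = tfidf_keywords(sentences, top_k=2)
```

A word such as "the" that occurs in every sentence receives an IDF of zero and therefore drops out of the keyword lists.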
It should be noted that the knowledge graph-based information acquisition method provided by the present invention does not specifically limit the manner of word extraction.
Further, because the distribution of words in a language model follows a power law, the knowledge graph constructed by the invention is preferably a scale-free graph. In this knowledge graph, nodes represent words, an edge between two nodes means that the two words appear in the same sentence, and every edge weight is set to the constant 1. The resulting knowledge graph is therefore unweighted and undirected: if node A on the knowledge graph is connected to node B, then node B is also connected to node A.
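The sentence co-occurrence construction described above can be sketched as an adjacency-set dictionary, assuming each sentence has already been reduced to its keywords. Storing each edge in both directions makes the graph undirected, and using sets (no weights) makes every edge weight implicitly 1. The function name `build_cooccurrence_graph` is hypothetical.

```python
from itertools import combinations


def build_cooccurrence_graph(sentences):
    """Undirected, unweighted word graph: an edge joins two words sharing a sentence."""
    graph = {}
    for words in sentences:
        unique = sorted(set(words))
        for w in unique:
            graph.setdefault(w, set())
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)  # symmetric: A-B implies B-A
    return graph


graph = build_cooccurrence_graph([["brake", "fault", "train"], ["door", "fault"]])
```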
In the knowledge graph-based information acquisition method provided by the invention, the knowledge graph is set as a scale-free graph according to the distribution characteristics of the language model, namely that frequently used words are concentrated in a small subset while most words are rarely used. This reflects the characteristics of real data and effectively reduces the complexity of graph construction.
Based on the content of the foregoing embodiment, as an optional embodiment, performing the random walk with the target node as the start node to obtain the target random walk path includes:
setting the maximum number γ of walk paths of each node during the random walk and the maximum number m of nodes in the target random walk path;
determining an optimal walk node from the γ walk paths corresponding to the current node by using a normal-distribution probability calculation;
iteratively obtaining the next walk node after the optimal walk node, and outputting the target random walk path once it contains m nodes in total;
wherein the target random walk path is formed by connecting the target node and the optimal walk nodes in iteration order.
Specifically, in the process of random walk path generation, the parameter γ is determined first, meaning that each node generates γ paths. In addition, the length of each random walk path is defined as m, that is, each random walk path contains m words.
It should be noted that the random walk path is generated iteratively, node by node, and the selection of the next node may go back to the previous node; such a choice also matches actual situations in natural language processing. The next node can be selected using a normal-distribution probability calculation.
For example, when the current node is the n-th node (n < m) and the (n+1)-th node needs to be determined, the n-th node has γ walk paths to choose from, that is, γ candidate nodes for the (n+1)-th position on the target walk path. The probability of each of these γ nodes being the (n+1)-th node can be calculated using the normal distribution, and the node with the highest probability is taken as the optimal walk node.
In this way, starting from the node after the target node, each next node is determined iteratively step by step, yielding a target random walk path that meets the requirements.
For the target walk path, according to a language model, the probability of a node can be calculated as follows:
where $t_1, t_2, t_3, \ldots, t_\gamma$ are the $\gamma$ nodes on the target walk path, $k$ is an intermediate parameter, and $p(t_\gamma)$ is the probability that the $\gamma$-th node becomes the optimal walk node of the $(\gamma-1)$-th node.
In the knowledge graph-based information acquisition method provided by the invention, the probability of each node following the target node is calculated using a normal distribution while the target random walk path is acquired, and the path is generated step by step. This conforms to the distribution of words in the language model and can effectively improve the precision and robustness of path prediction.
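The text leaves open which quantity the normal distribution is evaluated on. The sketch below assumes, purely for illustration, that each neighbour is weighted by a normal density of its degree centred on the current node's degree; it samples from those weights instead of always taking the single most probable node, so that the γ walks from one start node can differ (replacing the sampling with an argmax would follow the text literally). `generate_walks`, `normal_weights` and `sigma` are hypothetical names.

```python
import math
import random


def normal_weights(values, mu, sigma):
    """Relative normal-density weights N(x; mu, sigma), rescaled so the largest is 1."""
    exponents = [-((x - mu) ** 2) / (2 * sigma ** 2) for x in values]
    top = max(exponents)
    # Shifting by the maximum exponent keeps every weight from underflowing to 0.
    return [math.exp(e - top) for e in exponents]


def generate_walks(graph, gamma, m, rng, sigma=2.0):
    """Return gamma walks of at most m nodes starting from every node of `graph`.

    Hypothetical selection rule: each neighbour is weighted by a normal density of
    its degree, centred on the current node's degree. The previous node remains a
    candidate, so a walk may step back.
    """
    walks = []
    for start in sorted(graph):
        for _ in range(gamma):
            path = [start]
            while len(path) < m:
                current = path[-1]
                candidates = sorted(graph[current])
                if not candidates:  # isolated node: the walk cannot continue
                    break
                degrees = [len(graph[c]) for c in candidates]
                weights = normal_weights(degrees, len(graph[current]), sigma)
                path.append(rng.choices(candidates, weights=weights, k=1)[0])
            walks.append(path)
    return walks


graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
walks = generate_walks(graph, gamma=2, m=4, rng=random.Random(0))
```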
Based on the content of the foregoing embodiment, as an optional embodiment, the performing knowledge acquisition mapping on the target random walk path to generate a knowledge vector includes:
obtaining the probability $Pr_n$ that the target node $n$ is mapped to $v_n$, the probability $Pr_{n-1}$ that the node preceding the target node on the target random walk path is mapped to $v_{n-1}$, and the probability $Pr_{n+1}$ that the node following the target node on the target random walk path is mapped to $v_{n+1}$;
determining the optimal mapping vector of the target node $n$ according to a mapping constraint condition, the mapping constraint condition being to maximize $Pr_{n-1}$ and $Pr_{n+1}$, where $v_n$ is the optimal mapping vector of the target node, $v_{n-1}$ is the optimal mapping vector of the previous node, and $v_{n+1}$ is the optimal mapping vector of the next node;
iteratively obtaining an optimal mapping vector of each node;
and synthesizing each optimal mapping vector according to the target random walk path to generate the knowledge vector.
The invention provides a method for vectorizing an obtained target random walk path, namely sequentially obtaining vectorization mapping of each node on the target random walk path.
The nodes passed by a certain random walk path are assumed to be: node 1-node 2-node 3-node 4-node 5-node 6.
Since the mapping of node 3 is related to nodes 2 and 4, the mapping of node 3 must maximize the following:
$Pr(v_2 \mid v_3)$;
$Pr(v_4 \mid v_3)$;
where $v$ denotes the vector representation of a node and $Pr$ is the probability of mapping a node to the corresponding vector. The two expressions mean that, given $v_3$, the probabilities of $v_2$ and of $v_4$ are each maximized; that is, the following mapping constraint is satisfied: the probability that the node preceding the target node corresponding to $v_3$ is mapped to $v_2$, and the probability that the node following it is mapped to $v_4$, are both maximized.
Following the above method, each node is vectorized iteratively until text vectorization of all nodes is complete, and the optimal mappings of the nodes are synthesized to obtain the target knowledge vector.
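The constraint for node 3 has the form of a skip-gram objective with a window of one node. Assuming the usual softmax form $Pr(v_j \mid v_i) = \exp(u_j^\top v_i) / \sum_k \exp(u_k^\top v_i)$ with separate output vectors $u$ (an assumption; the text does not give the formula), the training pairs and the probability to be maximized can be sketched as follows. `context_pairs` and `softmax_prob` are hypothetical names.

```python
import math


def context_pairs(path, window=1):
    """(center, context) pairs within `window` positions of each other on the path."""
    pairs = []
    for i, center in enumerate(path):
        for j in range(max(0, i - window), min(len(path), i + window + 1)):
            if j != i:
                pairs.append((center, path[j]))
    return pairs


def softmax_prob(center_vec, context, out_vecs):
    """Pr(context | center) as a softmax over dot products with every output vector."""
    scores = {w: sum(a * b for a, b in zip(center_vec, v)) for w, v in out_vecs.items()}
    top = max(scores.values())  # subtract the max score for numerical stability
    total = sum(math.exp(s - top) for s in scores.values())
    return math.exp(scores[context] - top) / total


path = [1, 2, 3, 4, 5, 6]  # node 1 - node 2 - ... - node 6
node3 = [pair for pair in context_pairs(path) if pair[0] == 3]
```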
The knowledge graph-based information acquisition method provided by the invention determines the mapping mode of the target node according to the mapping probability of two adjacent nodes of each target node in the random walk path by utilizing a step-by-step iteration mode and according to the constraint condition, thereby effectively improving the mapping robustness and the mapping accuracy.
Based on the content of the foregoing embodiment, as an optional embodiment, performing knowledge acquisition mapping on the target random walk path to generate a knowledge vector includes:
based on a sliding window algorithm with the window size set to $z$, obtaining the probability $Pr_n$ that the target node $n$ is mapped to $v_n$, the average mapping probability $Pr_{n-z}$ of the $z$ nodes preceding the target node on the target random walk path, and the average mapping probability $Pr_{n+z}$ of the $z$ nodes following the target node on the target random walk path;
determining the optimal mapping vector of the target node $n$ according to a mapping constraint condition, the mapping constraint condition being to maximize $Pr_{n-z}$ and $Pr_{n+z}$, where $v_n$ is the optimal mapping vector of the target node;
iteratively obtaining an optimal mapping vector of each node;
and synthesizing each optimal mapping vector according to the target random walk path to generate the knowledge vector.
In the knowledge graph-based information acquisition method provided by the invention, firstly, a sliding window algorithm is adopted, and the probability average value of a certain number of nodes before and after the target node is used as the reference of the mapping direction.
The sliding Window algorithm (Moving Window) controls the traffic volume by limiting the maximum number of cells that can be received in each value Window. In the invention, the number of words (nodes) in one calculation is determined by a sliding window algorithm.
Here, the average mapping probability $Pr_{n-z}$ of the target node over the $z$ nodes preceding it on the target random walk path is obtained by computing the probability for each of those $z$ nodes and taking the mean of the $z$ probabilities. Similarly, the average mapping probability $Pr_{n+z}$ of the target node over the $z$ nodes following it on the target random walk path is obtained by computing the probability for each of those $z$ nodes and taking the mean of the $z$ probabilities.
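The text does not define how each per-node probability is computed, so the following Python sketch assumes a skip-gram style softmax over dot products of node vectors; the `emb` dictionary and the helper names are hypothetical. It only illustrates how $Pr_{n-z}$ and $Pr_{n+z}$ could be formed as window averages around position $n$ of a walk path.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def softmax_prob(target: Vector, context_id: str, emb: Dict[str, Vector]) -> float:
    """Assumed skip-gram style Pr(context_id | target): softmax over dot products."""
    scores = {k: sum(a * b for a, b in zip(target, v)) for k, v in emb.items()}
    top = max(scores.values())  # subtract the max score for numerical stability
    denom = sum(math.exp(s - top) for s in scores.values())
    return math.exp(scores[context_id] - top) / denom


def window_average_probs(path: Sequence[str], n: int, z: int,
                         emb: Dict[str, Vector]) -> Tuple[float, float]:
    """Return (Pr_{n-z}, Pr_{n+z}) for the node at position n of a walk path.

    Each value is the mean probability over up to z nodes before / after
    position n; a side with no nodes (start or end of the path) yields 0.0.
    """
    target = emb[path[n]]
    before = path[max(0, n - z):n]
    after = path[n + 1:n + 1 + z]

    def mean_prob(ids: Sequence[str]) -> float:
        if not ids:
            return 0.0
        return sum(softmax_prob(target, c, emb) for c in ids) / len(ids)

    return mean_prob(before), mean_prob(after)
```

Under this assumption, setting `z = 1` reduces the two averages to the single-neighbour probabilities $Pr_{n-1}$ and $Pr_{n+1}$ of the adjacent-node embodiment; training would then adjust `emb` to increase both values.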
In the knowledge graph-based information acquisition method, computing the mapping of the target node with a sliding window algorithm effectively reduces the amount of computation, and averaging over the window can greatly improve the accuracy and robustness of the mapping.
Based on the content of the foregoing embodiment, as an optional embodiment, the iteratively obtaining the optimal mapping vector of each node specifically includes:
iteratively obtaining the optimal mapping vector of each node using stochastic gradient descent and error back propagation.
Because computing the optimal mapping vector of each node involves iterative training, the knowledge graph-based information acquisition method provided by the invention uses stochastic gradient descent and error back propagation in the iterative calculation to improve computational efficiency.
Stochastic gradient descent, used here in its mini-batch form, addresses the problem of excessively large training data: each update of the model parameters processes only q training samples, where q is a constant much smaller than the total number of training samples M (usually an integer power of 2). This greatly speeds up training.
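To illustrate the mini-batch update described above, the sketch below fits a toy linear model y ≈ w·x + b instead of node vectors; the model, the squared-error loss, and the parameter names are assumptions chosen for brevity. Each parameter update uses only `q` samples drawn from a reshuffled copy of the data.

```python
import random
from typing import List, Tuple


def minibatch_sgd(data: List[Tuple[float, float]], q: int = 8, lr: float = 0.05,
                  epochs: int = 200, seed: int = 0) -> Tuple[float, float]:
    """Fit y ~ w*x + b by mini-batch SGD on mean squared error, q samples per update."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(data)  # work on a copy so the caller's list is not reordered
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), q):
            batch = data[start:start + q]
            # Gradient of the mean squared error over this mini-batch only
            gw = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            gb = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= lr * gw
            b -= lr * gb
    return w, b
```

With M = 256 samples and q = 8, each epoch performs 32 cheap updates instead of one full pass per update.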
The basic idea of the back propagation (BP) algorithm is that learning consists of two processes: forward propagation of the signal and backward propagation of the error. In forward propagation, the input sample passes from the input layer through the hidden layers (where it is processed) to the output layer. If the actual output of the output layer does not match the expected output (the teacher signal), error back propagation begins: the output error, in some form, is passed back through the hidden layers layer by layer to the input layer. Its main purpose is to distribute the output error to all units of every layer, yielding an error signal for each unit that is then used to correct that unit's weights; in other words, error back propagation is the process of adjusting the weights.
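A minimal sketch of the two passes, assuming a network with one sigmoid hidden layer, a linear output unit, and the loss E = 0.5·(y − target)², none of which the patent specifies. `backward` distributes the output error to each hidden unit and `train_step` uses the resulting gradients to correct the weights.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def forward(x: List[float], W1: Matrix, W2: List[float]) -> Tuple[List[float], float]:
    """Forward propagation: input layer -> sigmoid hidden layer -> linear output."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y = sum(w * hj for w, hj in zip(W2, h))
    return h, y


def backward(x: List[float], target: float, W1: Matrix,
             W2: List[float]) -> Tuple[Matrix, List[float]]:
    """Error back propagation for E = 0.5 * (y - target)**2; returns (dE/dW1, dE/dW2)."""
    h, y = forward(x, W1, W2)
    delta_out = y - target  # error signal of the output unit
    grad_W2 = [delta_out * hj for hj in h]
    # Pass the output error back through W2 and the sigmoid derivative h * (1 - h)
    delta_hidden = [delta_out * W2[j] * h[j] * (1 - h[j]) for j in range(len(h))]
    grad_W1 = [[dj * xi for xi in x] for dj in delta_hidden]
    return grad_W1, grad_W2


def train_step(x: List[float], target: float, W1: Matrix, W2: List[float],
               lr: float = 0.01) -> Tuple[Matrix, List[float]]:
    """One weight correction: move every weight against its error gradient."""
    g1, g2 = backward(x, target, W1, W2)
    new_W1 = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W1, g1)]
    new_W2 = [w - lr * g for w, g in zip(W2, g2)]
    return new_W1, new_W2
```

Comparing `backward` against a central finite difference of the loss is a quick way to confirm that the error signals are distributed correctly.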
Based on the content of the foregoing embodiment, as an optional embodiment, obtaining the probability $Pr_n$ that the target node $n$ is mapped to $v_n$ includes:
converting each node in the knowledge graph into its vector representation based on a word2vec model; constructing a Huffman tree model according to the number of connecting edges of each node; and obtaining the probability $Pr_n$ corresponding to the target node $n$ based on the Huffman tree model.
The word2vec (word to vector) models are a family of related models for generating word vectors. They are shallow, two-layer neural networks trained to reconstruct the linguistic context of words, i.e., to predict the words in neighboring positions; under the bag-of-words assumption used in word2vec, the order of those neighboring words does not matter. After training, the word2vec model can map each node (word) to a vector representation that captures word-to-word relationships.
For the Huffman tree model: given N weights as N leaf nodes, a binary tree is constructed; if the weighted path length of the tree is the minimum possible, the tree is called an optimal binary tree, also known as a Huffman tree. A Huffman tree has the shortest weighted path length, and nodes with larger weights lie closer to the root.
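A minimal Huffman construction using Python's standard `heapq` module, assuming that node degrees (numbers of connecting edges) serve as the leaf weights as described in this embodiment; `huffman_codes` and `_Inner` are hypothetical names. Each leaf receives a 0/1 code whose length equals its depth, so heavier leaves sit closer to the root.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Dict, Hashable, List, Tuple


@dataclass
class _Inner:
    """Internal (non-leaf) Huffman node; leaves are the node labels themselves."""
    left: object
    right: object


def huffman_codes(weights: Dict[Hashable, int]) -> Dict[Hashable, str]:
    """Return a 0/1 code per leaf; len(code) equals the leaf's depth in the tree."""
    tie = itertools.count()  # unique tie-breaker so heapq never compares tree nodes
    heap: List[Tuple[int, int, object]] = [(w, next(tie), leaf) for leaf, w in weights.items()]
    if not heap:
        return {}
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate single-leaf tree
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        # Repeatedly merge the two lightest subtrees
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tie), _Inner(left, right)))
    codes: Dict[Hashable, str] = {}
    stack = [(heap[0][2], "")]
    while stack:  # depth-first walk: "0" = left branch, "1" = right branch
        node, prefix = stack.pop()
        if isinstance(node, _Inner):
            stack.append((node.left, prefix + "0"))
            stack.append((node.right, prefix + "1"))
        else:
            codes[node] = prefix
    return codes
```

For a knowledge graph stored as an adjacency dictionary, the weights could be obtained as `{node: len(nbrs) for node, nbrs in graph.items()}`.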
In the above embodiment provided by the invention, for a given walk path, the probability of a node is calculated according to the language model as follows:
Clearly, computing this directly requires updating a large number of nodes during training, which is unsuitable for large-scale graphs; the hierarchical softmax method of word2vec can therefore be used when computing the knowledge vectors.
In this method, the Huffman tree is built over the nodes of the knowledge graph according to their number of connecting edges, from most to fewest. Training the vector of a given node can then be approximated by a single path from the root node to the corresponding leaf node, which greatly reduces the computational complexity.
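The sketch below illustrates the hierarchical softmax idea described above, assuming the word2vec convention that each inner tree node carries its own vector θ, that code bit 0 contributes σ(h·θ), and that bit 1 contributes 1 - σ(h·θ). The three-leaf tree and the names `hs_probability`, `paths`, and `inner_vecs` are hypothetical. A node's probability needs only as many dot products as its depth, instead of one per node in the graph.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def hs_probability(h: List[float], path: List[Tuple[str, int]],
                   inner_vecs: Dict[str, List[float]]) -> float:
    """Pr(leaf | h) as a product of binary decisions along the root-to-leaf path.

    path: (inner_node_id, bit) pairs starting at the root; bit 0 = left, 1 = right.
    """
    p = 1.0
    for node_id, bit in path:
        s = sigmoid(sum(a * b for a, b in zip(h, inner_vecs[node_id])))
        p *= s if bit == 0 else 1.0 - s
    return p


# Hand-built example: the high-degree node "hub" sits directly under the root "r";
# the lower-degree nodes "x" and "y" share the deeper inner node "u".
paths = {"hub": [("r", 0)], "x": [("r", 1), ("u", 0)], "y": [("r", 1), ("u", 1)]}
inner_vecs = {"r": [1.0, 0.5], "u": [-0.2, 0.8]}
```

Because every inner node splits its probability mass between its two children, the leaf probabilities always sum to 1, so no normalization over all nodes is required.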
Fig. 2 is a schematic structural diagram of the knowledge graph-based information acquisition system provided by the invention. As shown in Fig. 2, the system mainly includes a graph construction unit 1, a path determination unit 2, and a vector generation unit 3, where:
the graph construction unit 1 is mainly used to construct a knowledge graph based on a rail transit text corpus;
the path determination unit 2 is mainly configured to perform a random walk with a target node as the start node to obtain a target random walk path;
the vector generation unit 3 is mainly configured to perform knowledge acquisition mapping on the target random walk path to generate a knowledge vector.
Specifically, on the one hand, the graph construction unit 1 builds a rail transit text corpus from the various relevant data collected, such as maintenance records, data recorded by on-board equipment, and server processing files. On the other hand, after the rail transit text corpus has been created, the graph construction unit 1 is further configured to create a text-based knowledge graph from all the text recorded in the corpus: each word in the rail transit text corpus serves as a node, and the association between any two words serves as a connecting edge between the corresponding two nodes.
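A minimal sketch of the graph construction step, assuming the corpus has already been segmented into one word list per sentence (for Chinese text this requires a word segmenter, which is not shown) and using the same-sentence co-occurrence rule of claim 2 as the association relationship. The function name and the sample tokens are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set


def build_cooccurrence_graph(sentences: Iterable[List[str]]) -> Dict[str, Set[str]]:
    """Words become nodes; two words appearing in the same sentence share an undirected edge."""
    graph: Dict[str, Set[str]] = {}
    for words in sentences:
        unique = sorted(set(words))  # drop repeated words, so no self-loops
        for w in unique:
            graph.setdefault(w, set())
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Hypothetical, pre-tokenized maintenance-log sentences
sample = [["door", "fault", "reset"], ["brake", "fault"], ["fault"]]
graph = build_cooccurrence_graph(sample)
```

Here "fault" becomes the highest-degree node because it co-occurs with every other word, which is the kind of node the Huffman tree would place nearest the root.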
Further, the path determination unit 2 takes each node in the knowledge graph as the start node and uses the random walk method to obtain a random walk path for each node.
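The sketch below generates walks starting from every node. It substitutes a plain uniform random walk (DeepWalk style) for the normal-distribution selection of the optimal walk node recited in claim 3, which the text does not specify in enough detail to reproduce. `m` caps the number of nodes per path as in claim 3; `walks_per_node` and the function names are assumptions.

```python
import random
from typing import Dict, List, Optional, Set


def random_walk(graph: Dict[str, Set[str]], start: str, m: int,
                rng: Optional[random.Random] = None) -> List[str]:
    """Uniform random walk of at most m nodes beginning at `start`."""
    rng = rng or random.Random()
    path = [start]
    while len(path) < m:
        neighbors = sorted(graph[path[-1]])  # sorted so a seeded rng is reproducible
        if not neighbors:
            break  # isolated node: the walk cannot continue
        path.append(rng.choice(neighbors))
    return path


def walks_for_all_nodes(graph: Dict[str, Set[str]], m: int, walks_per_node: int,
                        seed: int = 0) -> Dict[str, List[List[str]]]:
    """Use every node in the graph as a start node, `walks_per_node` times each."""
    rng = random.Random(seed)
    return {node: [random_walk(graph, node, m, rng) for _ in range(walks_per_node)]
            for node in graph}
```

The resulting paths are what the sliding-window mapping then consumes, one target node at a time.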
Further, the vector generation unit 3 performs knowledge acquisition mapping on each acquired target random walk path to generate the corresponding knowledge vector; vectorizing all target random walk paths in this way achieves, to a certain extent, vectorization of the entire knowledge graph.
The knowledge graph-based information acquisition system provided by the invention establishes a knowledge graph-based text representation and uses a random walk method to determine the knowledge vector of each node in the graph, thereby realizing a vectorized representation of the knowledge graph; this makes the knowledge graph modular and easy to invoke. Furthermore, the evolution pattern of knowledge can be inferred from the constructed knowledge graph, providing accurate and traceable knowledge data for intelligent decision making in the rail transit field.
It should be noted that, in specific execution, the knowledge graph-based information acquisition system provided in the embodiments of the invention may be implemented based on the knowledge graph-based information acquisition method described in any of the above embodiments; the details are not repeated here.
Fig. 3 is a schematic structural diagram of an electronic device provided by the invention. As shown in Fig. 3, the electronic device may include a processor 310, a communication interface 320, a memory 330 and a communication bus 340, where the processor 310, the communication interface 320 and the memory 330 communicate with one another via the communication bus 340. The processor 310 may invoke logic instructions in the memory 330 to perform a knowledge graph-based information acquisition method comprising: constructing a knowledge graph based on a rail transit text corpus; performing a random walk with the target node as the start node to obtain a target random walk path; and performing knowledge acquisition mapping on the target random walk path to generate a knowledge vector.
In addition, when the logic instructions in the memory 330 are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the invention further provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium; the computer program comprises program instructions which, when executed by a computer, enable the computer to perform the knowledge graph-based information acquisition method provided by the above methods, the method comprising: constructing a knowledge graph based on a rail transit text corpus; performing a random walk with the target node as the start node to obtain a target random walk path; and performing knowledge acquisition mapping on the target random walk path to generate a knowledge vector.
In yet another aspect, the invention further provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs the knowledge graph-based information acquisition method provided in the above embodiments, the method comprising: constructing a knowledge graph based on a rail transit text corpus; performing a random walk with the target node as the start node to obtain a target random walk path; and performing knowledge acquisition mapping on the target random walk path to generate a knowledge vector.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A knowledge graph-based information acquisition method is characterized by comprising the following steps:
constructing a knowledge graph based on a rail transit text corpus;
performing a random walk with the target node as the start node to obtain a target random walk path;
and performing knowledge acquisition mapping on the target random walk path to generate a knowledge vector.
2. The knowledge graph-based information acquisition method according to claim 1, wherein constructing the knowledge graph based on the rail transit text corpus comprises:
extracting the words contained in each sentence of the rail transit text corpus as nodes of the knowledge graph;
setting a connecting edge of the knowledge graph between any two nodes that appear in the same sentence;
and obtaining the knowledge graph.
3. The method according to claim 1, wherein performing a random walk with the target node as the start node to obtain a target random walk path comprises:
setting a maximum number of walk paths γ for each node during the random walk and a maximum number of nodes m of the target random walk path;
determining an optimal walk node from the γ walk paths corresponding to the current node using a normal-distribution probability calculation method;
iteratively obtaining the next walk node after the optimal walk node, and outputting the target random walk path once it contains m nodes in total;
wherein the target random walk path is formed by connecting the target node and the optimal walk nodes in sequence according to the iteration order.
4. The knowledge graph-based information acquisition method according to claim 3, wherein performing knowledge acquisition mapping on the target random walk path to generate a knowledge vector comprises:
obtaining the probability $Pr_n$ that the target node $n$ is mapped to $v_n$, the probability $Pr_{n-1}$ that the node preceding the target node on the target random walk path is mapped to $v_{n-1}$, and the probability $Pr_{n+1}$ that the node following the target node on the target random walk path is mapped to $v_{n+1}$;
determining the optimal mapping vector of the target node $n$ according to a mapping constraint condition, the mapping constraint condition being to maximize $Pr_{n-1}$ and $Pr_{n+1}$, where $v_n$ is the optimal mapping vector corresponding to the target node, $v_{n-1}$ is the optimal mapping vector corresponding to the previous node, and $v_{n+1}$ is the optimal mapping vector corresponding to the next node;
iteratively obtaining an optimal mapping vector of each node;
and synthesizing each optimal mapping vector according to the target random walk path to generate the knowledge vector.
5. The method according to claim 3, wherein performing knowledge acquisition mapping on the target random walk path to generate a knowledge vector comprises:
based on a sliding window algorithm with the sliding window size set to $z$, obtaining the probability $Pr_n$ that the target node $n$ is mapped to $v_n$, the average mapping probability $Pr_{n-z}$ of the target node over the $z$ nodes preceding it on the target random walk path, and the average mapping probability $Pr_{n+z}$ of the target node over the $z$ nodes following it on the target random walk path;
determining the optimal mapping vector of the target node $n$ according to a mapping constraint condition, the mapping constraint condition being to maximize $Pr_{n-z}$ and $Pr_{n+z}$, where $v_n$ is the optimal mapping vector corresponding to the target node;
iteratively obtaining an optimal mapping vector of each node;
and synthesizing each optimal mapping vector according to the target random walk path to generate the knowledge vector.
6. The method according to claim 4 or 5, wherein the iteratively obtaining the optimal mapping vector of each node specifically comprises:
and iteratively obtaining the optimal mapping vector of each node using stochastic gradient descent and error back propagation.
7. The knowledge graph-based information acquisition method according to claim 4, wherein obtaining the probability $Pr_n$ that the target node $n$ is mapped to $v_n$ comprises:
converting each node in the knowledge graph into its vector representation based on a word2vec model;
constructing a Huffman tree model according to the number of connecting edges of each node;
and obtaining, based on the Huffman tree model, the probability $Pr_n$ corresponding to the target node $n$.
8. A knowledge-graph-based information acquisition system, comprising:
a graph construction unit, configured to construct a knowledge graph based on a rail transit text corpus;
a path determination unit, configured to perform a random walk with the target node as the start node to obtain a target random walk path;
and the vector generation unit is used for carrying out knowledge acquisition mapping on the target random walk path to generate a knowledge vector.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the knowledge-graph based information acquisition method according to any one of claims 1 to 7 when executing the computer program.
10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the method for knowledge-graph based information acquisition of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011458989.2A CN112463989A (en) | 2020-12-11 | 2020-12-11 | Knowledge graph-based information acquisition method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112463989A true CN112463989A (en) | 2021-03-09 |
Family
ID=74803698
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011458989.2A Pending CN112463989A (en) | 2020-12-11 | 2020-12-11 | Knowledge graph-based information acquisition method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112463989A (en) |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080275902A1 (en) * | 2007-05-04 | 2008-11-06 | Microsoft Corporation | Web page analysis using multiple graphs |
US20120330864A1 (en) * | 2011-06-21 | 2012-12-27 | Microsoft Corporation | Fast personalized page rank on map reduce |
US20150032767A1 (en) * | 2013-07-26 | 2015-01-29 | Microsoft Corporation | Query expansion and query-document matching using path-constrained random walks |
CN108287881A (en) * | 2017-12-29 | 2018-07-17 | 北京理工大学 | A kind of optimization method found based on random walk relationship |
US20190122145A1 (en) * | 2017-10-23 | 2019-04-25 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method, apparatus and device for extracting information |
CN109992670A (en) * | 2019-04-04 | 2019-07-09 | 西安交通大学 | A kind of map completion method of knowledge based map neighbour structure |
CN110704636A (en) * | 2019-09-27 | 2020-01-17 | 吉林大学 | Improved Node2vec-based knowledge graph vector representation method |
CN110807103A (en) * | 2019-10-18 | 2020-02-18 | 中国银联股份有限公司 | Knowledge graph construction method and device, electronic equipment and storage medium |
CN111079004A (en) * | 2019-12-06 | 2020-04-28 | 成都理工大学 | Three-part graph random walk recommendation method based on word2vec label similarity |
CN111241241A (en) * | 2020-01-08 | 2020-06-05 | 平安科技(深圳)有限公司 | Case retrieval method, device and equipment based on knowledge graph and storage medium |
CN111444317A (en) * | 2020-03-17 | 2020-07-24 | 杭州电子科技大学 | Semantic-sensitive knowledge graph random walk sampling method |
CN111597350A (en) * | 2020-04-30 | 2020-08-28 | 西安理工大学 | Rail transit event knowledge map construction method based on deep learning |
CN111597420A (en) * | 2020-04-29 | 2020-08-28 | 西安理工大学 | Deep learning-based rail transit standard relation extraction method |
CN112052342A (en) * | 2020-09-04 | 2020-12-08 | 西南大学 | Learning path recommendation method and system based on online test result big data analysis |
Non-Patent Citations (2)
Title |
---|
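吴运兵; 朱丹红; 廖祥文; 张栋; 林开标: "路径张量分解的知识图谱推理算法" (Knowledge graph reasoning algorithm based on path tensor decomposition), 模式识别与人工智能 (Pattern Recognition and Artificial Intelligence), no. 05, 15 May 2017 (2017-05-15) * |
杨晓慧 et al.: "基于符号语义映射的知识图谱表示学习算法" (Knowledge graph representation learning algorithm based on symbolic semantic mapping), 计算机研究与发展 (Journal of Computer Research and Development), vol. 55, no. 8, 31 August 2018 (2018-08-31), pages 1773-1784 * |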
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113553443A (en) * | 2021-07-18 | 2021-10-26 | 北京智慧星光信息技术有限公司 | Relation map generation method and system for recording migration path of knowledge map |
CN113553443B (en) * | 2021-07-18 | 2023-08-22 | 北京智慧星光信息技术有限公司 | Relation map generation method and system for recording knowledge map migration path |
CN114880484A (en) * | 2022-05-11 | 2022-08-09 | 军事科学院系统工程研究院网络信息研究所 | Satellite communication frequency-orbit resource map construction method based on vector mapping |
CN114880484B (en) * | 2022-05-11 | 2023-06-16 | 军事科学院系统工程研究院网络信息研究所 | Satellite communication frequency track resource map construction method based on vector mapping |
CN116737745A (en) * | 2023-08-16 | 2023-09-12 | 杭州州力数据科技有限公司 | Method and device for updating entity vector representation in supply chain network diagram |
CN116737745B (en) * | 2023-08-16 | 2023-10-31 | 杭州州力数据科技有限公司 | Method and device for updating entity vector representation in supply chain network diagram |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||