CN112463989A - Knowledge graph-based information acquisition method and system - Google Patents


Info

Publication number
CN112463989A
Authority
CN
China
Prior art keywords
node
knowledge
target
random walk
mapping
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011458989.2A
Other languages
Chinese (zh)
Inventor
李振
肖骁
郜春海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Traffic Control Technology TCT Co Ltd
Original Assignee
Traffic Control Technology TCT Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Traffic Control Technology TCT Co Ltd filed Critical Traffic Control Technology TCT Co Ltd
Priority to CN202011458989.2A priority Critical patent/CN112463989A/en
Publication of CN112463989A publication Critical patent/CN112463989A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q10/047 Optimisation of routes or paths, e.g. travelling salesman problem
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40 Business processes related to the transportation industry

Abstract

The invention provides a knowledge graph-based information acquisition method and system. The method comprises the following steps: constructing a knowledge graph based on a rail transit text corpus; executing a random walk with a target node as the starting node to obtain a target random walk path; and performing knowledge acquisition mapping on the target random walk path to generate a knowledge vector. By establishing a knowledge graph-based text representation and determining the knowledge vector of each node in the graph with a random walk method, the method and system achieve a vectorized expression of the knowledge graph, so that the knowledge graph can be modularized and called conveniently. Furthermore, the way knowledge changes and develops can be deduced from the constructed knowledge graph, which provides accurate and traceable knowledge data for intelligent decision making in the field of rail transit.

Description

Knowledge graph-based information acquisition method and system
Technical Field
The invention relates to the technical field of rail transit, in particular to a knowledge graph-based information acquisition method and system.
Background
The knowledge graph is one of important technologies for assisting decision making, and is widely applied to industries such as medical treatment, finance, electronic commerce, judicial expertise and education.
Knowledge acquisition is a key step in creating a knowledge graph. Knowledge extraction generally comprises the extraction of entities and of the relationships among them, that is, the automatic identification of named entities from a raw corpus. At present, the widely adopted approach is to predict over the text sequence with a long short-term memory (LSTM) network model and to complete the named entity recognition task in combination with a conditional random field (CRF). Alternatively, some knowledge is obtained by manually writing semantic and grammar rules that identify entities and the relationships between them.
From the perspective of natural language processing, knowledge extraction can be regarded as a word embedding technique (also referred to as a pre-training technique) that improves the performance of downstream tasks. The word2vec model defines a word prediction task: a sliding window determines the text span, context words are distinguished from the center word, and the task is implemented by two methods, CBOW and skip-gram. The Fast-CNN method and the GloVe method complete the knowledge acquisition task by using word structure information and co-occurrence matrix information, respectively.
When knowledge is acquired with the above methods, either the modeling process is complex or the extracted results deviate substantially, so the requirements of rail transit in actual operation, safety management, equipment maintenance and the like cannot be met.
Disclosure of Invention
Aiming at the problems in the prior art, the embodiment of the invention provides a knowledge graph-based information acquisition method and system.
The invention provides a knowledge graph-based information acquisition method, which comprises the following steps:
constructing a knowledge graph based on a rail transit text corpus;
executing a random walk with a target node as the starting node to obtain a target random walk path;
and performing knowledge acquisition mapping on the target random walk path to generate a knowledge vector.
According to the knowledge graph-based information acquisition method provided by the invention, constructing the knowledge graph based on the rail transit text corpus comprises:
extracting the words contained in each sentence of the rail transit text corpus to serve as nodes of the knowledge graph;
setting a connecting edge of the knowledge graph between any two nodes that appear in the same sentence;
and obtaining the knowledge graph.
According to the knowledge graph-based information acquisition method provided by the invention, executing the random walk with the target node as the starting node to obtain the target random walk path comprises:
setting a maximum number γ of walk paths for each node when the random walk is performed, and a maximum number m of nodes in the target random walk path;
determining an optimal walk node from the γ walk paths corresponding to the current node by a normal-distribution probability calculation;
iteratively obtaining the next walk node after the optimal walk node until the total number of nodes in the target random walk path is m, and then outputting the target random walk path;
wherein the target random walk path is formed by connecting the target node and the optimal walk nodes in sequence, in iteration order.
According to the knowledge graph-based information acquisition method provided by the invention, performing knowledge acquisition mapping on the target random walk path to generate the knowledge vector comprises:
obtaining the probability Pr_n that a target node n is mapped to v_n, the probability Pr_{n-1} that the node preceding the target node on the target random walk path is mapped to v_{n-1}, and the probability Pr_{n+1} that the node following the target node on the target random walk path is mapped to v_{n+1};
determining an optimal mapping vector of the target node n according to a mapping constraint condition, the mapping constraint condition being that Pr_{n-1} and Pr_{n+1} are maximized, where v_n is the optimal mapping vector corresponding to the target node, v_{n-1} is the optimal mapping vector corresponding to the preceding node, and v_{n+1} is the optimal mapping vector corresponding to the following node;
iteratively obtaining an optimal mapping vector of each node;
and synthesizing the optimal mapping vectors along the target random walk path to generate the knowledge vector.
According to the knowledge graph-based information acquisition method provided by the invention, performing knowledge acquisition mapping on the target random walk path to generate the knowledge vector comprises:
obtaining, based on a sliding window algorithm with the sliding window size set to z, the probability Pr_n that a target node n is mapped to v_n, the mapping average probability Pr_{n-z} over the z nodes preceding the target node on the target random walk path, and the mapping average probability Pr_{n+z} over the z nodes following the target node on the target random walk path;
determining an optimal mapping vector of the target node n according to a mapping constraint condition, the mapping constraint condition being that Pr_{n-z} and Pr_{n+z} are maximized, where v_n is the optimal mapping vector corresponding to the target node;
iteratively obtaining an optimal mapping vector of each node;
and synthesizing the optimal mapping vectors along the target random walk path to generate the knowledge vector.
According to the knowledge graph-based information acquisition method provided by the invention, iteratively obtaining the optimal mapping vector of each node specifically comprises:
iteratively obtaining the optimal mapping vector of each node by stochastic gradient descent and error back propagation.
According to the knowledge graph-based information acquisition method provided by the invention, obtaining the probability Pr_n that the target node n is mapped to v_n comprises:
converting each node in the knowledge graph into a vector representation based on a word2vec model;
constructing a Huffman tree model according to the number of connecting edges of each node;
and obtaining, based on the Huffman tree model, the probability Pr_n corresponding to the target node n.
The invention also provides a knowledge graph-based information acquisition system, comprising:
a graph construction unit, configured to construct a knowledge graph based on a rail transit text corpus;
a path determining unit, configured to execute a random walk with a target node as the starting node and obtain a target random walk path;
and a vector generation unit, configured to perform knowledge acquisition mapping on the target random walk path to generate a knowledge vector.
The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the steps of the knowledge-graph-based information acquisition method.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the knowledge-graph based information acquisition method as described in any one of the above.
By establishing a knowledge graph-based text representation and determining the knowledge vector of each node in the graph with a random walk method, the knowledge graph-based information acquisition method and system achieve a vectorized expression of the knowledge graph, so that the knowledge graph can be modularized and called conveniently. Furthermore, the way knowledge changes and develops can be deduced from the constructed knowledge graph, which provides accurate and traceable knowledge data for intelligent decision making in the field of rail transit.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a knowledge-graph based information acquisition method provided by the present invention;
FIG. 2 is a schematic diagram of a knowledge-graph based information acquisition system according to the present invention;
FIG. 3 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The knowledge-graph-based information acquisition method and system provided by the embodiment of the invention are described below with reference to fig. 1 to 3.
FIG. 1 is a schematic flow chart of the knowledge graph-based information acquisition method provided by the invention. As shown in FIG. 1, the method includes, but is not limited to, the following steps:
step S1: constructing a knowledge graph based on a rail transit text corpus;
step S2: executing a random walk with a target node as the starting node to obtain a target random walk path;
step S3: performing knowledge acquisition mapping on the target random walk path to generate a knowledge vector.
In step S1, the rail transit text corpus may be constructed from various collected data, such as maintenance records, data recorded by vehicle-mounted devices, and server process files. The data may be collected from trains, stations, and dispatch centers, or obtained via the Internet, for example from expert databases; the invention does not specifically limit this.
After the rail transit text corpus is created, the method further includes creating a knowledge graph based on the text content recorded in the corpus. A knowledge graph is essentially a semantic network: its nodes represent entities or concepts, and its edges represent the semantic relationships between them. In the invention, therefore, each word in the rail transit text corpus can be used as a node, and the association between two words can be used as the connecting edge between the two nodes, to create the knowledge graph.
The constructed knowledge graph combines the theories and methods of disciplines such as applied mathematics, graphics, information visualization, and information science with methods such as bibliometric citation analysis and co-occurrence analysis. Through data mining, information processing, knowledge measurement, and graph drawing, it presents complex rail transit knowledge as a visual graph, which provides accurate and traceable knowledge data for deducing how knowledge changes and develops and for intelligent decision making in the field of rail transit.
Further, in step S2, the target node is any node in the knowledge graph, and each node in the knowledge graph is used as an initial node, and a random walk path corresponding to each node is obtained by using a random walk method.
In the constructed knowledge graph, random walk is started from a target node, and then an irregular random walk path formed by each node can be obtained. Optionally, the walking direction of the random walk may be limited according to a relevant constraint condition, and then the target random walk path may be acquired.
A random walk describes a process whose future steps and directions cannot be predicted from its past behavior. Its core idea is that any conserved quantity carried by irregular walkers corresponds to a diffusion transport law; the random walk is closely related to Brownian motion and is its idealized mathematical model.
Further, in step S3, because each node in the obtained random walk path is in text form, the knowledge graph as a whole is inconvenient to store and modularize and cannot readily be used as a concrete tool in rail transit operations, for example for fault cause analysis. The knowledge graph-based information acquisition method provided by the invention performs knowledge acquisition mapping on each obtained target random walk path, so that vectorizing all target random walk paths generates the corresponding knowledge vectors and, to a certain extent, vectorizes the whole knowledge graph.
Wherein the knowledge vector is synthesized by the sub-vectors corresponding to the words related to each node composing the target random walk path.
By establishing a knowledge graph-based text representation and determining the knowledge vector of each node in the graph with a random walk method, the knowledge graph-based information acquisition method provided by the invention achieves a vectorized expression of the knowledge graph, so that the knowledge graph can be modularized and called conveniently. Furthermore, the way knowledge changes and develops can be deduced from the constructed knowledge graph, which provides accurate and traceable knowledge data for intelligent decision making in the field of rail transit.
Based on the content of the foregoing embodiment, as an optional embodiment, constructing the knowledge graph based on the rail transit text corpus includes: extracting the words contained in each sentence of the rail transit text corpus to serve as nodes of the knowledge graph; setting a connecting edge of the knowledge graph between any two nodes that appear in the same sentence; and obtaining the knowledge graph.
Word extraction identifies each word that makes up a sentence or text. It usually assigns tags such as appointments, organization names, geographic locations, times and dates, and character values; the specific tag definitions can be adjusted for different tasks. A connecting edge represents the association between two words.
Optionally, since most of the sentences in the rail transit text corpus are Chinese sentences, the term frequency-inverse document frequency (TF-IDF) method may be used to extract the words in those sentences.
Furthermore, since each sentence in the rail transit text corpus contains not only keywords but often many invalid words, the pyhanlp word segmentation tool can be used to extract the keywords, so that only keywords serve as nodes of the knowledge graph.
It should be noted that the knowledge graph-based information acquisition method provided by the invention does not specifically limit the manner of word extraction.
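As an illustration of the TF-IDF option, the following minimal sketch scores the words of already-segmented sentences and keeps the highest-scoring ones as candidate nodes. The function names `tf_idf` and `top_keywords`, the sample words, and the choice of the plain `tf * log(N / df)` weighting are assumptions made for this example; the patent does not fix a particular TF-IDF variant.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {word: score} dict per document (hypothetical helper).

    docs is a list of token lists; score = term frequency * log(N / document frequency).
    """
    n = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        scores.append({w: (c / total) * math.log(n / df[w]) for w, c in tf.items()})
    return scores

def top_keywords(doc_scores, k):
    """Pick the k highest-scoring words, breaking ties alphabetically."""
    ranked = sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]

# Toy, already-segmented "sentences" (illustrative data only).
docs = [["signal", "fault", "signal"], ["fault", "brake"], ["fault", "door"]]
scores = tf_idf(docs)
keywords = top_keywords(scores[0], 1)
```

A word such as "fault" that occurs in every sentence receives a score of zero here, which is the behaviour that lets invalid or uninformative words be filtered out before the graph is built.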
Further, because the distribution of words in a language model follows a power law, the knowledge graph constructed by the invention is preferably a scale-free graph. In this knowledge graph, nodes represent words, and an edge between two nodes means that the two words appear in the same sentence. The weight of every edge is set to the constant 1, so the resulting knowledge graph is an unweighted, undirected graph: if node A is connected to node B, then node B is also connected to node A.
The knowledge graph-based information acquisition method provided by the invention follows the distribution characteristics of the language model, namely that a small set of words is used frequently while most words are rarely used. Setting the knowledge graph to be a scale-free graph therefore reflects the characteristics of real data and effectively reduces the complexity of graph construction.
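The construction rule described above (words as nodes, an edge between any two words of the same sentence, no weights, no direction) can be sketched as follows. The function name `build_cooccurrence_graph`, the adjacency-set representation, and the sample sentences are assumptions for illustration only.

```python
from collections import defaultdict
from itertools import combinations

def build_cooccurrence_graph(sentences):
    """Build an unweighted, undirected graph from tokenized sentences.

    Returns {word: set of neighbouring words}; every edge is stored in both
    directions, so "A connected to B" implies "B connected to A".
    """
    graph = defaultdict(set)
    for words in sentences:
        unique = set(words)
        for word in unique:
            graph[word]  # touch the key so single-word sentences still yield a node
        for a, b in combinations(sorted(unique), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)

# Illustrative, already-segmented sentences.
sentences = [
    ["train", "brake", "fault"],
    ["brake", "maintenance"],
]
graph = build_cooccurrence_graph(sentences)
```

Storing neighbours in sets keeps every edge weight implicitly equal to 1, since a repeated co-occurrence does not add anything.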
Based on the content of the foregoing embodiment, as an optional embodiment, executing the random walk with the target node as the starting node to obtain the target random walk path includes:
setting a maximum number γ of walk paths for each node when the random walk is performed, and a maximum number m of nodes in the target random walk path;
determining an optimal walk node from the γ walk paths corresponding to the current node by a normal-distribution probability calculation;
iteratively obtaining the next walk node after the optimal walk node until the total number of nodes in the target random walk path is m, and then outputting the target random walk path;
wherein the target random walk path is formed by connecting the target node and the optimal walk nodes in sequence, in iteration order.
Specifically, in generating random walk paths, the parameter γ is determined first, meaning that each node generates γ paths. In addition, the length of each random walk path is defined as m, that is, each random walk path contains m words.
It should be noted that a random walk path is generated iteratively, node by node. The choice of the next node may return to the previous node, which is consistent with what occurs in natural language processing. The next node can be selected by a normal-distribution probability calculation.
For example, suppose the current node is the n-th node (n < m) and the (n+1)-th node needs to be determined. Since the n-th node has γ walk paths to choose from, γ nodes are candidates for the (n+1)-th node of the target walk path. The probability of each of the γ nodes serving as the (n+1)-th node can be calculated by the normal-distribution probability method, and the node with the highest probability is determined as the optimal walk node.
By adopting the method, the next node is iteratively confirmed step by step from the next node of the target node, and the target random walk path meeting the requirement can be obtained.
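The node-by-node generation can be sketched as below. This is a minimal illustration under stated assumptions, not the patented procedure: γ is read as the number of candidate next nodes considered at each step, and because the text does not give the normal-distribution probability formula, candidate scoring is left as a pluggable `score` function (hypothetical) that defaults to a constant. The names `random_walk`, `score`, and `rng` are invented for this example.

```python
import random

def random_walk(graph, start, m, gamma, score=None, rng=None):
    """Generate a walk of at most m nodes starting at `start`.

    At each step up to `gamma` neighbours of the current node are drawn as
    candidates and the highest-scoring one becomes the next node. Returning to
    the previous node is allowed.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    if score is None:
        # Placeholder: the source's normal-distribution scoring is not specified.
        score = lambda current, candidate: 1.0
    path = [start]
    while len(path) < m:
        current = path[-1]
        neighbours = sorted(graph[current])
        if not neighbours:
            break  # dead end: stop early
        candidates = rng.sample(neighbours, min(gamma, len(neighbours)))
        # max() keeps the first best candidate, so ties follow the random sample order.
        path.append(max(candidates, key=lambda c: score(current, c)))
    return path

toy_graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
walk = random_walk(toy_graph, "a", m=5, gamma=2)
```

Passing a different `score` function is the intended place to plug in whatever probability calculation an implementation actually uses.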
For the target walk path, according to a language model, the calculation for a node can be expressed as follows:
[Equation image BDA0002830565370000091]
where t_1, t_2, t_3, …, t_γ are the γ nodes on the target walk path, k is an intermediate parameter, and p(t_γ) is the probability that the γ-th node becomes the optimal walk node of the (γ-1)-th node.
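The equation itself is present only as an image and is not reproduced in the text. For orientation, a standard language-model chain-rule factorization that uses the same symbols is shown below; it is an assumption offered for illustration and may differ from the formula in the original figure.

```latex
p(t_1, t_2, \ldots, t_\gamma) \;=\; \prod_{k=1}^{\gamma} p\left(t_k \mid t_1, \ldots, t_{k-1}\right)
```

Under this reading, k indexes the position on the path and each factor is the probability of the k-th node given the nodes already visited.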
In the knowledge graph-based information acquisition method provided by the invention, while the target random walk path is being obtained, the probability of the next node after the target node is calculated with a normal distribution, and the target random walk path is generated step by step in this way. This conforms to the distribution of words in the language model and can effectively improve the precision and robustness of path prediction.
Based on the content of the foregoing embodiment, as an optional embodiment, performing knowledge acquisition mapping on the target random walk path to generate the knowledge vector includes:
obtaining the probability Pr_n that a target node n is mapped to v_n, the probability Pr_{n-1} that the node preceding the target node on the target random walk path is mapped to v_{n-1}, and the probability Pr_{n+1} that the node following the target node on the target random walk path is mapped to v_{n+1};
determining an optimal mapping vector of the target node n according to a mapping constraint condition, the mapping constraint condition being that Pr_{n-1} and Pr_{n+1} are maximized, where v_n is the optimal mapping vector corresponding to the target node, v_{n-1} is the optimal mapping vector corresponding to the preceding node, and v_{n+1} is the optimal mapping vector corresponding to the following node;
iteratively obtaining an optimal mapping vector of each node;
and synthesizing the optimal mapping vectors along the target random walk path to generate the knowledge vector.
The invention provides a method for vectorizing an obtained target random walk path, namely sequentially obtaining vectorization mapping of each node on the target random walk path.
The nodes passed by a certain random walk path are assumed to be: node 1-node 2-node 3-node 4-node 5-node 6.
Since the mapping of node 3 is related to nodes 2 and 4, the mapping of node 3 should maximize the following two probabilities:
Pr(v_2 | v_3);
Pr(v_4 | v_3);
where v denotes the vector representation of a node and Pr is the probability of mapping a node to the corresponding vector. The two expressions mean that, given v_3, the probabilities of v_2 and v_4 are each maximized. In other words, the following mapping constraint is satisfied: the probability that the node preceding the target node corresponding to v_3 is mapped to v_2, and the probability that the node following it is mapped to v_4, must both be maximized.
Following this method, the vectorized representation of each node is computed iteratively until all nodes have been vectorized. The optimal mappings of the nodes are then synthesized to obtain the target knowledge vector.
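One common way to give the conditional probabilities Pr(v_2 | v_3) and Pr(v_4 | v_3) a concrete form is a softmax over vector dot products, as in skip-gram models. The sketch below is an assumption in that spirit: the patent does not state this formula, and the single shared vector table, the function names, and the toy vectors are all illustrative.

```python
import math

def softmax_prob(vectors, context, center):
    """Pr(context | center) as a softmax over dot products with the centre vector."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    scores = {node: dot(vec, vectors[center]) for node, vec in vectors.items()}
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {node: math.exp(s - top) for node, s in scores.items()}
    return exps[context] / sum(exps.values())

def neighbour_log_likelihood(vectors, path, i):
    """log Pr(previous | centre) + log Pr(next | centre) for interior position i."""
    center = path[i]
    return (math.log(softmax_prob(vectors, path[i - 1], center))
            + math.log(softmax_prob(vectors, path[i + 1], center)))

# Toy 2-dimensional vectors for three nodes on a path (illustrative values).
vectors = {"node2": [1.0, 0.0], "node3": [1.0, 0.0], "node4": [0.0, 1.0]}
path = ["node2", "node3", "node4"]
objective = neighbour_log_likelihood(vectors, path, 1)
```

Training would adjust the vectors so that this quantity increases for every interior node of every walk; with the toy values, node 2 is already more probable than node 4 given node 3 because its vector is aligned with that of node 3.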
The knowledge graph-based information acquisition method provided by the invention determines the mapping mode of the target node according to the mapping probability of two adjacent nodes of each target node in the random walk path by utilizing a step-by-step iteration mode and according to the constraint condition, thereby effectively improving the mapping robustness and the mapping accuracy.
Based on the content of the foregoing embodiment, as an optional embodiment, performing knowledge acquisition mapping on the target random walk path to generate the knowledge vector includes:
obtaining, based on a sliding window algorithm with the sliding window size set to z, the probability Pr_n that a target node n is mapped to v_n, the mapping average probability Pr_{n-z} over the z nodes preceding the target node on the target random walk path, and the mapping average probability Pr_{n+z} over the z nodes following the target node on the target random walk path;
determining an optimal mapping vector of the target node n according to a mapping constraint condition, the mapping constraint condition being that Pr_{n-z} and Pr_{n+z} are maximized, where v_n is the optimal mapping vector corresponding to the target node;
iteratively obtaining an optimal mapping vector of each node;
and synthesizing the optimal mapping vectors along the target random walk path to generate the knowledge vector.
In the knowledge graph-based information acquisition method provided by the invention, firstly, a sliding window algorithm is adopted, and the probability average value of a certain number of nodes before and after the target node is used as the reference of the mapping direction.
The sliding window (moving window) algorithm controls traffic volume by limiting the maximum number of cells that can be received in each window. In the invention, the sliding window algorithm determines the number of words (nodes) involved in one calculation.
The mapping average probability Pr_{n-z} is obtained by computing the probability of each of the z nodes preceding the target node on the target random walk path and then taking the average of those z probabilities. Similarly, the mapping average probability Pr_{n+z} is obtained by computing the probability of each of the z nodes following the target node on the target random walk path and then taking the average of those z probabilities.
In the knowledge graph-based information acquisition method provided by the invention, the mapping of the target node is calculated with a sliding window algorithm, which effectively reduces the amount of calculation, and taking the average can greatly improve the accuracy and robustness of the mapping.
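The windowed averages can be sketched as follows. The helper names `window_contexts` and `average_window_probs`, the truncation of the window at the ends of the path, and the toy probability table are assumptions for illustration; `prob` stands for whatever per-node mapping probability an implementation provides.

```python
def window_contexts(path, i, z):
    """Return the (up to) z nodes before and after position i on the path."""
    before = path[max(0, i - z):i]
    after = path[i + 1:i + 1 + z]
    return before, after

def average_window_probs(path, i, z, prob):
    """Average prob(context, target) over the preceding and following windows.

    Returns (Pr_{n-z}, Pr_{n+z}) in the notation of the text; an empty window
    (at the start or end of the path) yields 0.0.
    """
    target = path[i]
    def average(nodes):
        return sum(prob(c, target) for c in nodes) / len(nodes) if nodes else 0.0
    before, after = window_contexts(path, i, z)
    return average(before), average(after)

path = ["n1", "n2", "n3", "n4", "n5", "n6"]
# Hypothetical per-node probabilities given the target "n3".
toy = {"n1": 0.1, "n2": 0.3, "n4": 0.2, "n5": 0.4}
pr_before, pr_after = average_window_probs(path, 2, 2, lambda c, t: toy[c])
```

With z = 2 and target n3, the preceding window is n1, n2 and the following window is n4, n5, so only four probabilities are evaluated regardless of how long the path is.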
Based on the content of the foregoing embodiment, as an optional embodiment, the iteratively obtaining the optimal mapping vector of each node specifically includes:
iteratively obtaining the optimal mapping vector of each node by using stochastic gradient descent and error back propagation.
Because computing the optimal mapping vector of each node involves iterative training steps, the knowledge graph-based information acquisition method provided by the invention uses stochastic gradient descent and error back propagation in the iterative calculation to improve computational efficiency.
Stochastic gradient descent, used here in its mini-batch form, addresses the problem of excessively large training data. Each time the model parameters are updated, only q training samples need to be processed, where q is a constant much smaller than the total number of training samples M (usually an integer power of 2), which greatly speeds up the training process.
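To make the role of q concrete, the following sketch fits a one-variable linear model with mini-batch updates. The model, the data, the learning rate, and the function name `minibatch_sgd` are illustrative assumptions and do not come from the patent; only the idea of updating on q samples at a time does.

```python
import random


def minibatch_sgd(xs, ys, q=4, lr=0.05, epochs=200, seed=0):
    """Fit y = w * x + b by least squares, updating on q samples at a time."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)                      # new random batches each epoch
        for start in range(0, len(indices), q):
            batch = indices[start:start + q]      # only q of the M samples
            grad_w = grad_b = 0.0
            for i in batch:
                err = (w * xs[i] + b) - ys[i]
                grad_w += 2.0 * err * xs[i]
                grad_b += 2.0 * err
            w -= lr * grad_w / len(batch)
            b -= lr * grad_b / len(batch)
    return w, b


# M = 32 noise-free samples of the line y = 3x + 1; q = 4 is a power of 2.
xs = [i / 10 for i in range(32)]
ys = [3.0 * x + 1.0 for x in xs]
w, b = minibatch_sgd(xs, ys)
```

Each parameter update touches 4 of the 32 samples, so the cost of one update does not grow with the size of the training set.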
The basic idea of the back propagation (BP) algorithm is that learning consists of two processes: forward propagation of the signal and backward propagation of the error. In forward propagation, an input sample passes from the input layer through the hidden layers, where it is processed, to the output layer. If the actual output of the output layer does not match the expected output (the teacher signal), the error back propagation process begins. In error back propagation, the output error (in some form) is passed back through the hidden layers, layer by layer, to the input layer. The main purpose is to distribute the output error to all units of each layer, so that each unit obtains an error signal that is then used to correct its weights; in other words, back propagating the error is the process of adjusting the weights.
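The two passes can be illustrated with a very small network. The sketch below assumes one hidden layer of sigmoid units, a squared-error loss, and the logical OR function as training data; none of these choices come from the patent, and the function name `train` is hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(samples, hidden=2, lr=0.5, epochs=2000, seed=0):
    """Train a 1-hidden-layer network; return (initial_loss, final_loss)."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # w1[j] holds the weights into hidden unit j (last entry is the bias);
    # w2 holds the weights into the single output unit (last entry is the bias).
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]

    def forward(x):
        # Forward propagation: input layer -> hidden layer -> output layer.
        h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w1]
        y = sigmoid(sum(w * v for w, v in zip(w2, h + [1.0])))
        return h, y

    def loss():
        return sum(0.5 * (forward(x)[1] - t) ** 2 for x, t in samples)

    initial = loss()
    for _ in range(epochs):
        for x, t in samples:
            h, y = forward(x)
            # Error signal of the output unit for E = 0.5 * (y - t)^2.
            delta_out = (y - t) * y * (1.0 - y)
            # Back propagation: share the output error among the hidden units.
            delta_hid = [delta_out * w2[j] * h[j] * (1.0 - h[j]) for j in range(hidden)]
            # Weight correction, output layer first, then hidden layer.
            for j, v in enumerate(h + [1.0]):
                w2[j] -= lr * delta_out * v
            for j in range(hidden):
                for i, v in enumerate(x + [1.0]):
                    w1[j][i] -= lr * delta_hid[j] * v
    return initial, loss()


# Teacher signal: logical OR of the two inputs.
or_samples = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
initial_loss, final_loss = train(or_samples)
```

The hidden-layer error signals are computed before the output weights are changed, so both layers are corrected against the same forward pass.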
Based on the content of the foregoing embodiment, as an optional embodiment, obtaining the probability Pr_n that the target node n is mapped to v_n comprises:
converting each node in the knowledge graph into a respective vector representation based on a word2vec model; constructing a Huffman tree model according to the number of connecting edges of each node; and obtaining the probability Pr_n corresponding to the target node n based on the Huffman tree model.
Here, word2vec (word to vector) is a group of related models used to generate word vectors. These models are shallow, two-layer neural networks trained to reconstruct the contexts of words: the network takes a word as input and must predict the words in neighboring positions. Under the bag-of-words assumption in word2vec, the order of the words is unimportant. After training is complete, the word2vec model can be used to map each node (word) to a vector, and these vectors can be used to represent the relationships between words.
Regarding the Huffman tree model: given N weights as N leaf nodes, a binary tree is constructed; if the weighted path length of the tree is minimal, the binary tree is called an optimal binary tree, also known as a Huffman tree. The Huffman tree is the tree with the shortest weighted path length, and nodes with larger weights are closer to the root.
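A standard greedy construction of such a tree is sketched below. It assumes, for illustration only, that the weight of each node is its number of connecting edges and that the node names and counts shown are hypothetical; the function name `huffman_codes` is likewise not from the patent.

```python
import heapq
import itertools


def huffman_codes(weights):
    """Build a Huffman tree from {leaf: weight}; return {leaf: binary code}."""
    tie_breaker = itertools.count()   # keeps heap entries comparable
    heap = [(w, next(tie_breaker), name) for name, w in weights.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        # Repeatedly merge the two subtrees with the smallest weights.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tie_breaker), (left, right)))

    codes = {}

    def assign(node, prefix):
        if isinstance(node, tuple):   # internal node: (left, right)
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:                         # leaf: a node name
            codes[node] = prefix

    assign(heap[0][2], "")
    return codes


# Hypothetical edge counts for four nodes of a knowledge graph.
edge_counts = {"train": 8, "signal": 5, "brake": 2, "axle": 1}
codes = huffman_codes(edge_counts)
```

The length of a code equals the depth of its leaf, so the node with the most edges receives the shortest code and sits closest to the root.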
In the above embodiment provided by the present invention, for a certain walking path, according to the language model, the calculation method of the node is as follows:
Figure BDA0002830565370000131
Clearly, training with this calculation requires a large number of node updates, which is unfavorable for processing large-scale graphs. Therefore, the hierarchical softmax method of word2vec can be used when calculating the knowledge vector.
In this method, a Huffman tree is constructed over the nodes of the knowledge graph, ordered from the largest to the smallest number of connecting edges. The vector training process for a given node can then be approximated as following one path from the root node to a leaf node, which greatly reduces the computational complexity.
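The root-to-leaf computation can be sketched as a product of binary decisions, one per internal node on the path. The codes, the parameter vectors, and the context vector below are made-up values chosen for illustration, and the convention that bit "0" means "take the left branch with probability sigmoid(score)" is an assumption.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def leaf_probability(code, context_vec, inner_vecs):
    """Probability of reaching the leaf with the given binary code.

    inner_vecs maps the code prefix of each internal node to its parameter
    vector; the empty prefix "" is the root.
    """
    prob = 1.0
    for depth, bit in enumerate(code):
        theta = inner_vecs[code[:depth]]
        score = sum(a * b for a, b in zip(theta, context_vec))
        go_left = sigmoid(score)
        prob *= go_left if bit == "0" else 1.0 - go_left
    return prob


# Hypothetical Huffman codes for four nodes and parameters for the
# three internal nodes of the corresponding tree.
codes = {"train": "0", "signal": "11", "axle": "100", "brake": "101"}
inner_vecs = {"": [0.3, -0.2], "1": [-0.5, 0.4], "10": [0.1, 0.7]}
context = [1.0, 2.0]

probs = {node: leaf_probability(code, context, inner_vecs) for node, code in codes.items()}
total = sum(probs.values())
```

Evaluating one leaf costs one dot product per level of the tree rather than one per node of the graph, and the leaf probabilities still form a valid distribution.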
Fig. 2 is a schematic structural diagram of an information acquisition system based on a knowledge graph provided in the present invention, as shown in fig. 2, the system mainly includes a graph construction unit 1, a path determination unit 2, and a vector generation unit 3, where:
the map construction unit 1 is mainly used for constructing a knowledge map based on a track traffic text corpus;
the path determining unit 2 is mainly configured to execute random walk by using a target node as an initial node to obtain a target random walk path;
the vector generation unit 3 is mainly configured to perform knowledge acquisition mapping on the target random walk path to generate a knowledge vector.
Specifically, on one hand, the map construction unit 1 is used for building a track traffic text corpus from various collected data, such as maintenance records, data recorded by vehicle-mounted equipment, and server processing files. On the other hand, after the track traffic text corpus has been created, the map construction unit 1 is further configured to create a text-based knowledge graph from all the text content recorded in the corpus: each word in the track traffic text corpus can be used as a node, and the association relationship between any two words can be used as a connecting edge between the two nodes.
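A minimal sketch of such a graph is given below, taking "association" to mean that two words appear in the same sentence, as in the claims. Splitting on whitespace and lowercasing are simplifying assumptions (Chinese text would need a word segmenter), and the function name `build_graph` and the two sample sentences are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(sentences):
    """Return {word: set of words that share at least one sentence with it}."""
    graph = defaultdict(set)
    for sentence in sentences:
        words = sorted(set(sentence.lower().split()))
        for word in words:
            graph[word]                      # register the node even if isolated
        for a, b in combinations(words, 2):  # every pair in the same sentence
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


# Hypothetical two-sentence corpus.
corpus = ["brake fault detected", "brake pressure low"]
graph = build_graph(corpus)
```

Here the word shared by both sentences becomes the only link between the two groups of words.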
Further, the path determining unit 2 obtains a random walk path corresponding to each node by using each node in the knowledge graph as an initial node and by using a random walk method.
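A walk of this kind can be sketched as follows. For simplicity the sketch picks the next node uniformly at random among the neighbors, which is an assumption and not the normal-distribution selection recited in the claims; the adjacency data and the function name `random_walk` are also hypothetical.

```python
import random


def random_walk(graph, start, m, seed=None):
    """Walk from start until the path holds m nodes or a dead end is reached."""
    rng = random.Random(seed)
    path = [start]
    while len(path) < m:
        neighbors = sorted(graph[path[-1]])   # sorted for reproducibility
        if not neighbors:
            break
        path.append(rng.choice(neighbors))    # uniform choice (assumption)
    return path


# Hypothetical adjacency sets of a small knowledge graph.
graph = {
    "brake": {"fault", "pressure"},
    "fault": {"brake", "signal"},
    "pressure": {"brake"},
    "signal": {"fault"},
}
# One walk per node, each node serving as the initial node of its own path.
paths = {node: random_walk(graph, node, m=5, seed=0) for node in graph}
```

Every node of the graph yields its own path of at most m nodes, which is the input expected by the subsequent mapping step.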
Further, the vector generation unit 3 performs knowledge acquisition mapping on each obtained target random walk path to generate the corresponding knowledge vector. By vectorizing all the target random walk paths in this way, the entire knowledge graph can be vectorized to some extent.
The knowledge graph-based information acquisition system provided by the invention establishes a knowledge graph-based text representation and determines the knowledge vector of each node in the graph by a random walk method, thereby expressing the knowledge graph in vector form, so that the knowledge graph can be modularized and is convenient to call. Furthermore, the way in which knowledge changes and develops can be inferred from the constructed knowledge graph, providing accurate and traceable knowledge data for intelligent decision making in the field of rail transit.
It should be noted that, when specifically executed, the knowledge graph-based information acquisition system provided in the embodiment of the present invention may be implemented based on the knowledge graph-based information acquisition method described in any of the above embodiments, and the details are not repeated here.
Fig. 3 is a schematic structural diagram of an electronic device provided in the present invention, and as shown in fig. 3, the electronic device may include: a processor (processor)310, a communication interface (communication interface)320, a memory (memory)330 and a communication bus 340, wherein the processor 310, the communication interface 320 and the memory 330 communicate with each other via the communication bus 340. Processor 310 may invoke logic instructions in memory 330 to perform a knowledge-graph based information acquisition method comprising: constructing a knowledge graph based on a track traffic text corpus; executing random walk by taking the target node as an initial node to obtain a target random walk path; and carrying out knowledge acquisition mapping on the target random walk path to generate a knowledge vector.
In addition, the logic instructions in the memory 330 may be implemented in the form of software functional units and, when sold or used as independent products, stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions, which when executed by a computer, enable the computer to perform the method for obtaining knowledge-graph-based information provided by the above methods, the method comprising: constructing a knowledge graph based on a track traffic text corpus; executing random walk by taking the target node as an initial node to obtain a target random walk path; and carrying out knowledge acquisition mapping on the target random walk path to generate a knowledge vector.
In yet another aspect, the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, performing the knowledge graph-based information acquisition method provided in the above embodiments, the method comprising: constructing a knowledge graph based on a track traffic text corpus; executing random walk by taking the target node as an initial node to obtain a target random walk path; and carrying out knowledge acquisition mapping on the target random walk path to generate a knowledge vector.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A knowledge graph-based information acquisition method is characterized by comprising the following steps:
constructing a knowledge graph based on a track traffic text corpus;
executing random walk by taking the target node as an initial node to obtain a target random walk path;
and carrying out knowledge acquisition mapping on the target random walk path to generate a knowledge vector.
2. The knowledge-graph-based information acquisition method according to claim 1, wherein the construction of the knowledge graph based on the track traffic text corpus comprises:
extracting words contained in each sentence in the track traffic text corpus to serve as nodes of the knowledge graph;
setting a connecting edge of the knowledge graph between any two nodes appearing in the same sentence;
and acquiring the knowledge graph.
3. The method according to claim 1, wherein the performing random walk with the target node as a start node to obtain a target random walk path comprises:
setting a maximum number γ of walk paths of each node when the random walk is performed and a maximum number m of nodes of the target random walk path;
determining an optimal walk node from the γ walk paths corresponding to the current node by adopting a probability calculation method of normal distribution;
iteratively obtaining a next walking node of the optimal walking node, and outputting the target random walking path until the total number of nodes of the target random walking path is m;
the target random walk path is formed by sequentially connecting the target nodes with the optimal walk nodes according to an iteration sequence.
4. The knowledge-graph-based information acquisition method according to claim 3, wherein the mapping knowledge acquisition of the target random walk path to generate a knowledge vector comprises:
obtaining the probability Pr_n that a target node n is mapped to v_n, the probability Pr_{n-1} that the previous node of the target node on the target random walk path is mapped to v_{n-1}, and the probability Pr_{n+1} that the next node of the target node on the target random walk path is mapped to v_{n+1};
determining an optimal mapping vector of the target node n according to a mapping constraint condition; the mapping constraint condition is to maximize Pr_{n-1} and Pr_{n+1}; v_n is the optimal mapping vector corresponding to the target node, v_{n-1} is the optimal mapping vector corresponding to the previous node, and v_{n+1} is the optimal mapping vector corresponding to the next node;
iteratively obtaining an optimal mapping vector of each node;
and synthesizing each optimal mapping vector according to the target random walk path to generate the knowledge vector.
5. The method of claim 3, wherein the mapping knowledge acquisition of the target random walk path to generate a knowledge vector comprises:
based on a sliding window algorithm, with the size of the sliding window set to z, obtaining the probability Pr_n that a target node n is mapped to v_n, the mapping average probability Pr_{n-z} of the first z nodes of the target node on the target random walk path, and the mapping average probability Pr_{n+z} of the last z nodes of the target node on the target random walk path;
determining an optimal mapping vector of the target node n according to a mapping constraint condition; the mapping constraint condition is to maximize Pr_{n-z} and Pr_{n+z}; v_n is the optimal mapping vector corresponding to the target node;
iteratively obtaining an optimal mapping vector of each node;
and synthesizing each optimal mapping vector according to the target random walk path to generate the knowledge vector.
6. The method according to claim 4 or 5, wherein the iteratively obtaining the optimal mapping vector of each node specifically comprises:
iteratively obtaining the optimal mapping vector of each node by using stochastic gradient descent and error back propagation.
7. The knowledge-graph-based information acquisition method according to claim 4, wherein obtaining the probability Pr_n that the target node n is mapped to v_n comprises:
converting each node in the knowledge-graph into a respective vector representation based on a word2vec model;
constructing a Huffman tree model according to the quantity relation of the connecting edges of each node;
obtaining, based on the Huffman tree model, the probability Pr_n corresponding to the target node n.
8. A knowledge-graph-based information acquisition system, comprising:
the map construction unit is used for constructing a knowledge map based on a track traffic text corpus;
a path determining unit, configured to execute random walk by using the target node as an initial node, and acquire a target random walk path;
and the vector generation unit is used for carrying out knowledge acquisition mapping on the target random walk path to generate a knowledge vector.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the knowledge-graph based information acquisition method according to any one of claims 1 to 7 when executing the computer program.
10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the method for knowledge-graph based information acquisition of any one of claims 1 to 7.
CN202011458989.2A 2020-12-11 2020-12-11 Knowledge graph-based information acquisition method and system Pending CN112463989A (en)


Publications (1)

Publication Number Publication Date
CN112463989A true CN112463989A (en) 2021-03-09

Family

ID=74803698

Country Status (1)

Country Link
CN (1) CN112463989A (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080275902A1 (en) * 2007-05-04 2008-11-06 Microsoft Corporation Web page analysis using multiple graphs
US20120330864A1 (en) * 2011-06-21 2012-12-27 Microsoft Corporation Fast personalized page rank on map reduce
US20150032767A1 (en) * 2013-07-26 2015-01-29 Microsoft Corporation Query expansion and query-document matching using path-constrained random walks
CN108287881A (en) * 2017-12-29 2018-07-17 北京理工大学 A kind of optimization method found based on random walk relationship
US20190122145A1 (en) * 2017-10-23 2019-04-25 Baidu Online Network Technology (Beijing) Co., Ltd. Method, apparatus and device for extracting information
CN109992670A (en) * 2019-04-04 2019-07-09 西安交通大学 A kind of map completion method of knowledge based map neighbour structure
CN110704636A (en) * 2019-09-27 2020-01-17 吉林大学 Improved Node2 vec-based knowledge graph vector representation method
CN110807103A (en) * 2019-10-18 2020-02-18 中国银联股份有限公司 Knowledge graph construction method and device, electronic equipment and storage medium
CN111079004A (en) * 2019-12-06 2020-04-28 成都理工大学 Three-part graph random walk recommendation method based on word2vec label similarity
CN111241241A (en) * 2020-01-08 2020-06-05 平安科技(深圳)有限公司 Case retrieval method, device and equipment based on knowledge graph and storage medium
CN111444317A (en) * 2020-03-17 2020-07-24 杭州电子科技大学 Semantic-sensitive knowledge graph random walk sampling method
CN111597350A (en) * 2020-04-30 2020-08-28 西安理工大学 Rail transit event knowledge map construction method based on deep learning
CN111597420A (en) * 2020-04-29 2020-08-28 西安理工大学 Deep learning-based rail transit standard relation extraction method
CN112052342A (en) * 2020-09-04 2020-12-08 西南大学 Learning path recommendation method and system based on online test result big data analysis

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
吴运兵;朱丹红;廖祥文;张栋;林开标;: "路径张量分解的知识图谱推理算法", 模式识别与人工智能, no. 05, 15 May 2017 (2017-05-15) *
杨晓慧等: "基于符号语义映射的知识图谱表示学习算法", 《计算机研究与发展》, vol. 55, no. 8, 31 August 2018 (2018-08-31), pages 1773 - 1784 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113553443A (en) * 2021-07-18 2021-10-26 北京智慧星光信息技术有限公司 Relation map generation method and system for recording migration path of knowledge map
CN113553443B (en) * 2021-07-18 2023-08-22 北京智慧星光信息技术有限公司 Relation map generation method and system for recording knowledge map migration path
CN114880484A (en) * 2022-05-11 2022-08-09 军事科学院系统工程研究院网络信息研究所 Satellite communication frequency-orbit resource map construction method based on vector mapping
CN114880484B (en) * 2022-05-11 2023-06-16 军事科学院系统工程研究院网络信息研究所 Satellite communication frequency track resource map construction method based on vector mapping
CN116737745A (en) * 2023-08-16 2023-09-12 杭州州力数据科技有限公司 Method and device for updating entity vector representation in supply chain network diagram
CN116737745B (en) * 2023-08-16 2023-10-31 杭州州力数据科技有限公司 Method and device for updating entity vector representation in supply chain network diagram

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination