CN110795417A - System and method for storing knowledge graph - Google Patents

Publication number
CN110795417A
Authority
CN
China
Prior art keywords
data
entity
knowledge graph
entities
knowledge
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911047867.1A
Other languages
Chinese (zh)
Inventor
张昭
钱学斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Application filed by Beijing Mininglamp Software System Co ltd
Priority to CN201911047867.1A
Publication of CN110795417A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application relates to the field of storage technologies, and in particular to a system and a method for storing a knowledge graph. The storage system comprises a computing engine and a distributed file storage database. The computing engine processes acquired entity data of a plurality of entities corresponding to a target service to generate the knowledge graph corresponding to the target service, where the entity data comprises attribute data and relationship data of each of the plurality of entities; the distributed file storage database stores the knowledge graph. Because the knowledge graph is computed by the computing engine and stored by the distributed file storage database, computation and storage of the knowledge graph are separated, which can improve the efficiency of data computation, storage, and retrieval.

Description

System and method for storing knowledge graph
Technical Field
The present application relates to the field of storage technologies, and in particular, to a system and a method for storing a knowledge graph.
Background
A knowledge graph is a method that uses visualization technology to describe knowledge resources and their carriers, and to mine, analyze, construct, draw, and display knowledge and the interrelations among them. A knowledge graph can extract knowledge hidden in large-scale data to construct a graph-based data model. The ultimate purpose of these technologies is to collect and organize data into structured, reusable, and well-designed storage so that the data can serve more usage scenarios, and the storage format of a knowledge graph matches these requirements almost perfectly.
At present, there is no uniform standard for the storage structure design of knowledge graphs. For graphs with a modest data volume and a fixed structure, a traditional database with relational tables is generally used for storage. However, when the data volume is large, an entity usually contains many attributes, and if these attributes are both computed and stored in a conventional database, such as a graph database (Neo4j), the efficiency of data computation, storage, and retrieval is greatly reduced.
Disclosure of Invention
In view of this, an object of the embodiments of the present application is to provide a system and a method for storing a knowledge graph, which implement separation of computation and storage of the knowledge graph, and can improve efficiency of data computation, storage, and retrieval.
The application mainly comprises the following aspects:
In a first aspect, an embodiment of the present application provides a storage system for a knowledge graph, where the storage system includes a computing engine and a distributed file storage database;
the computing engine is used for acquiring entity data of a plurality of entities corresponding to a target service and generating a knowledge graph corresponding to the target service according to the entity data; the entity data includes attribute data and relationship data for each of the plurality of entities;
and the distributed file storage database is used for acquiring the knowledge graph and storing the knowledge graph.
In one possible embodiment, the storage system further comprises a management relationship database;
the management relationship database is used for storing relational data, and the relational data comprises the entity data.
In a possible implementation, the computation engine is specifically configured to generate the knowledge-graph according to the following steps:
according to the attribute data of each entity in the plurality of entities, establishing node data of a node corresponding to each entity in the knowledge graph;
establishing edge data in the knowledge graph between each entity and other entities according to the relationship data of each entity in the plurality of entities; the other entities are the entities except the entity in the plurality of entities; the edge data comprises an attribute type and an attribute value;
and generating the knowledge graph according to the node data of each entity in the plurality of entities and the edge data between each entity and other entities in the knowledge graph.
In one possible embodiment, the storage system further comprises a connector;
the connector is used for serializing the knowledge graph generated by the computing engine and sending the serialized knowledge graph to the distributed file storage database.
In a possible implementation, the distributed file storage database is specifically configured to store the knowledge-graph according to the following steps:
storing attribute data and relationship data corresponding to each entity in the knowledge graph respectively; the relationship data is stored in the form of triples;
wherein each triple has the form first entity - relationship - second entity, the relationship includes an attribute type and an attribute value, and the first entity and the second entity are two different entities of the plurality of entities.
In one possible embodiment, the distributed file storage database is further configured to:
establishing an index according to the identity of each entity in the plurality of entities so that a user can retrieve entity data of the corresponding entity through the identity;
wherein the attribute data comprises the identity.
In a possible implementation manner, the storage system further comprises a data calling module and a presentation module;
the data calling module is used for reading the knowledge graph from the distributed file storage database and screening out, according to a target attribute type, the node data and edge data corresponding to the target attribute type from the knowledge graph;
and the presentation module is used for displaying the node data and the edge data which are screened out by the data calling module and correspond to the target attribute type.
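The screening performed by the data calling module can be sketched as follows. This is a minimal illustration, not the implementation of the application: the function name `filter_by_attribute_type`, the dictionary layout of nodes and edges, and the rule that a node (point) is kept when it is an endpoint of a selected edge are all assumptions made for the example.

```python
def filter_by_attribute_type(nodes, edges, target_type):
    """Return the node data and edge data that match one attribute type.

    nodes: dict mapping an identity to attribute data (hypothetical layout)
    edges: list of dicts with "src", "dst", "type", "value" (hypothetical layout)
    """
    # Keep only the edges whose attribute type equals the target type.
    selected_edges = [e for e in edges if e["type"] == target_type]
    # Assumption: a node is relevant when it is an endpoint of a kept edge.
    endpoint_ids = {e["src"] for e in selected_edges} | {e["dst"] for e in selected_edges}
    selected_nodes = {nid: attrs for nid, attrs in nodes.items() if nid in endpoint_ids}
    return selected_nodes, selected_edges


nodes = {
    "ID001": {"name": "Person A"},
    "ID002": {"name": "Person B"},
    "ID003": {"name": "Person C"},
}
edges = [
    {"src": "ID001", "dst": "ID003", "type": "call", "value": 10},
    {"src": "ID002", "dst": "ID003", "type": "sms", "value": 3},
]
call_nodes, call_edges = filter_by_attribute_type(nodes, edges, "call")
```

The two returned collections are what a presentation module would then draw; here only Person A, Person C, and the call edge between them remain.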
In a second aspect, an embodiment of the present application further provides a method for storing a knowledge graph, where the method for storing a knowledge graph includes:
acquiring entity data of a plurality of entities corresponding to a target service, and generating a knowledge graph corresponding to the target service according to the entity data; the entity data includes attribute data and relationship data for each of the plurality of entities;
and storing the knowledge graph.
In one possible embodiment, the storage method further includes:
storing relational data, the relational data comprising the entity data.
In a possible implementation manner, the generating a knowledge graph corresponding to the target service according to the entity data includes:
according to the attribute data of each entity in the plurality of entities, establishing node data of a node corresponding to each entity in the knowledge graph;
establishing edge data in the knowledge graph between each entity and other entities according to the relationship data of each entity in the plurality of entities; the other entities are the entities except the entity in the plurality of entities; the edge data comprises an attribute type and an attribute value;
and generating the knowledge graph according to the node data of each entity in the plurality of entities and the edge data between each entity and other entities in the knowledge graph.
In one possible embodiment, the storing the knowledge-graph comprises:
serializing the knowledge graph, and storing the serialized knowledge graph.
In a possible embodiment, the storing the knowledge-graph includes:
storing attribute data and relationship data corresponding to each entity in the knowledge graph respectively; the relationship data is stored in the form of triples;
wherein each triple has the form first entity - relationship - second entity, the relationship includes an attribute type and an attribute value, and the first entity and the second entity are two different entities of the plurality of entities.
In one possible embodiment, the storage method further includes:
establishing an index according to the identity of each entity in the plurality of entities so that a user can retrieve entity data of the corresponding entity through the identity;
wherein the attribute data comprises the identity.
In one possible embodiment, the storage method further includes:
according to a target attribute type, screening out node data and edge data corresponding to the target attribute type from the knowledge graph;
and displaying the screened node data and the screened edge data corresponding to the target attribute type.
In a third aspect, an embodiment of the present application further provides an electronic device, including: a processor, a memory and a bus, wherein the memory stores machine-readable instructions executable by the processor, the processor and the memory communicate with each other through the bus when the electronic device runs, and the machine-readable instructions are executed by the processor to perform the steps of the method for storing a knowledge graph according to the second aspect or any one of the possible embodiments of the second aspect.
In a fourth aspect, the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the method for storing a knowledge-graph described in the second aspect or any one of the possible implementation manners of the second aspect.
In the embodiment of the application, the storage system of the knowledge graph comprises a computing engine and a distributed file storage database, the computing engine is used for computing the acquired entity data of a plurality of entities corresponding to the target service, and the knowledge graph corresponding to the target service can be generated, wherein the entity data comprise attribute data and relation data of each entity in the plurality of entities, and the knowledge graph is stored through the distributed file storage database. By adopting the mode, the knowledge graph is calculated through the calculation engine, and is stored through the distributed file storage database, so that the separation of calculation and storage of the knowledge graph is realized, and the efficiency of data calculation, storage and retrieval can be improved.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
FIG. 1 is a schematic diagram of a knowledge-graph storage system according to an embodiment of the present application;
FIG. 2 is a second schematic diagram of a knowledge-graph storage system according to an embodiment of the present disclosure;
FIG. 3 is a flow chart illustrating a method for storing a knowledge-graph according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
To make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It should be understood that the drawings in the present application are for illustrative and descriptive purposes only and are not used to limit the scope of protection of the present application. Additionally, it should be understood that the schematic drawings are not necessarily drawn to scale. The flowcharts used in this application illustrate operations implemented according to some embodiments of the present application. It should be understood that the operations of the flowcharts may be performed out of order, and that steps with no logical dependency on one another may be performed in reverse order or concurrently. One skilled in the art, under the guidance of this application, may add one or more other operations to, or remove one or more operations from, the flowcharts.
In addition, the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
To enable those skilled in the art to utilize the present disclosure, in connection with the specific application scenario "computation and storage of a knowledge graph," the following embodiments are presented to enable those skilled in the art to apply the general principles defined herein to other embodiments and application scenarios without departing from the spirit and scope of the present disclosure.
The method, apparatus, electronic device or computer-readable storage medium described in the embodiments of the present application may be applied to any scenario that requires knowledge graph calculation and storage, and the embodiments of the present application do not limit a specific application scenario, and any scheme that uses the storage system and method of the knowledge graph provided in the embodiments of the present application is within the scope of protection of the present application.
It is noted that, before the present application was proposed, existing solutions had no uniform standard for the storage structure design of knowledge graphs; for graphs with a modest data volume and a fixed structure, a traditional database with relational tables was generally used for storage. However, when the data volume is large, an entity usually contains many attributes, and if these attributes are both computed and stored in a conventional database, such as a graph database, the efficiency of data computation, storage, and retrieval is greatly reduced.
In view of the above problems, in the embodiment of the present application, a storage system of a knowledge graph includes a computing engine and a distributed file storage database, and a knowledge graph corresponding to a target service may be generated by computing, by the computing engine, acquired entity data of a plurality of entities corresponding to the target service, where the entity data includes attribute data and relationship data of each of the plurality of entities, and the knowledge graph is stored by the distributed file storage database. By adopting the mode, the knowledge graph is calculated through the calculation engine, and is stored through the distributed file storage database, so that the separation of calculation and storage of the knowledge graph is realized, and the efficiency of data calculation, storage and retrieval can be improved.
For the convenience of understanding of the present application, the technical solutions provided in the present application will be described in detail below with reference to specific embodiments.
As shown in fig. 1 and fig. 2, fig. 1 is a schematic structural diagram of a knowledge-graph storage system 100 according to an embodiment of the present application; fig. 2 is a second schematic structural diagram of a knowledge-graph storage system 100 according to an embodiment of the present application. As shown in fig. 1 and 2, a storage system 100 for a knowledge graph provided by an embodiment of the present application includes a computing engine 110 and a distributed file storage database 120.
The computing engine 110 is configured to obtain entity data of a plurality of entities corresponding to a target service, and generate a knowledge graph corresponding to the target service according to the entity data; the entity data includes attribute data and relationship data for each of the plurality of entities.
In a specific implementation, after determining a target service to be performed, the computing engine 110 obtains entity data of a plurality of entities corresponding to the target service, and establishes a knowledge graph corresponding to the target service using the entity data.
Here, the entity data includes attribute data and relationship data of each of the plurality of entities. Where the entity is a person, the attribute data may be the person's name, sex, age, height, place of residence, and the like, and the relationship data may be understood as communication data between persons, such as call data, short message data, logistics data, express delivery data, instant messaging friend lists, and the like. The target service may be any of various types of services, such as a criminal tracing service, a drug-related key person mining service, and the like.
It should be noted that the computing engine 110 may be Spark, a fast, general-purpose computing engine designed for large-scale data processing. The knowledge graph may be generated from the entity data using the graph computation library (Spark GraphX) provided by the computing engine 110. A knowledge graph organizes and stores data and the associations between data well; its structured storage format can reduce or even remove difficulties in data mining and computation, uncover more useful information in the data, and open up more application scenarios.
The distributed file storage database 120 is configured to obtain the knowledge graph and store the knowledge graph.
In a specific implementation, the distributed file storage database 120 obtains a knowledge graph corresponding to the target service from the computing engine 110, and stores the knowledge graph.
Here, the distributed file storage database 120 may be MongoDB. The distributed file storage database 120 is a document-oriented database rather than a relational database, and the data structures it supports are very loose: it stores documents in a binary serialized document format (BSON) that is similar to JavaScript Object Notation (JSON), so it can store relatively complex data types. The distributed file storage database 120 is therefore more flexible, and complex data types such as arrays can be inserted directly into documents. The distributed file storage database 120 also supports a very powerful query language whose syntax is somewhat similar to an object-oriented query language; it can implement most of the functions of single-table queries in a relational database, and it supports building indexes on the data.
Here, the distributed file storage database 120 is selected instead of the Hadoop Distributed File System (HDFS) for the following reasons. First, in terms of storage mode, HDFS stores data in units of files, with block sizes ranging from 64 MB to 128 MB, whereas the distributed file storage database 120, as a document database, offers finer granularity. Second, the distributed file storage database 120 supports indexes, a concept that HDFS does not have, so it reads faster, and its support for insert, delete, and update operations makes written data easier to modify than in HDFS. Third, the response time of HDFS is on the order of minutes, while that of the distributed file storage database 120 is typically on the order of milliseconds.
It should be noted that, in the prior art, in the case of a large amount of data, an entity usually includes a plurality of attributes, and these attributes are calculated and stored in a conventional database, such as a graph database, which may result in a situation that the efficiency of data calculation, storage and retrieval is greatly reduced. Here, the storage system 100 for a knowledge graph provided in the present application includes a calculation engine 110 and a distributed file storage database 120, and the calculation engine 110 calculates the knowledge graph, and the distributed file storage database 120 stores the knowledge graph, so that the separation of calculation and storage of the knowledge graph is realized, and the system can be applied to the storage of various big data knowledge graphs, and can improve the efficiency of data calculation, storage, and retrieval.
In one possible embodiment, as shown in FIG. 2, the knowledge-graph storage system 100 further includes a management relationship database 130;
the management relation database 130 is configured to store relational data, where the relational data includes the entity data.
In a specific implementation, the management relationship database 130 stores entity data, attribute data in the entity data is stored in the form of an entity table, and relationship data in the entity data is stored in the form of a relationship table.
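The entity table and relationship table layout can be sketched as follows. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the management relationship database 130, and the table names `entity` and `call_record` and all column names are hypothetical, since the application does not specify a schema.

```python
import sqlite3

# In-memory SQLite stands in for the relational database (an assumption).
conn = sqlite3.connect(":memory:")

# Entity table: one row of attribute data per entity (hypothetical columns).
conn.execute(
    "CREATE TABLE entity ("
    "id_number TEXT PRIMARY KEY, name TEXT, age INTEGER, phone TEXT)"
)
# Relationship table: one row per communication record (hypothetical columns).
conn.execute(
    "CREATE TABLE call_record ("
    "caller_phone TEXT, callee_phone TEXT, start_time TEXT, duration_s INTEGER)"
)

conn.executemany(
    "INSERT INTO entity VALUES (?, ?, ?, ?)",
    [("ID001", "Person A", 30, "555-0001"), ("ID002", "Person B", 41, "555-0002")],
)
conn.execute(
    "INSERT INTO call_record VALUES (?, ?, ?, ?)",
    ("555-0001", "555-0002", "2019-10-01T09:00:00", 120),
)

# A computing engine would read both tables as its input entity data.
entity_rows = conn.execute("SELECT * FROM entity ORDER BY id_number").fetchall()
call_rows = conn.execute("SELECT * FROM call_record").fetchall()
```

Keeping attribute data and relationship data in separate tables mirrors the later split into node data and edge data.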
Here, the management relationship database 130 (Relational Database Management System, RDBMS) refers to a set of programs that manage logically organized, interrelated data and access to that data, and is used to store relational data. Commonly used relational database management systems include Oracle, DB2, and SQL Server.
In one possible implementation, as shown in fig. 1 and 2, the computation engine 110 is specifically configured to generate the knowledge-graph according to the following steps:
Step (1): establishing node data of the node corresponding to each entity in the knowledge graph according to the attribute data of each entity in the plurality of entities.
In a specific implementation, the computing engine 110 may obtain entity data of a plurality of entities from the entity table and the relationship table in the management relationship database 130, and further construct node data and edge data of the knowledge graph according to the entity data of the plurality of entities, so as to generate the knowledge graph according to the node data and the edge data. Specifically, the calculation engine 110 establishes node data of a node corresponding to each entity in the knowledge graph according to the attribute data of each entity in the plurality of entities.
In an example, the target service is a drug-related key person mining service, the entity may be a person, the attribute data of the person may include an identity number, gender, age, height, and place of residence, and the node data of each node may include the attribute data of the person corresponding to the node.
Step (2): establishing edge data in the knowledge graph between each entity and other entities according to the relationship data of each entity in the plurality of entities; the other entities are the entities except the entity in the plurality of entities; the edge data includes an attribute type and an attribute value.
In particular implementations, the compute engine 110 builds edge data in the knowledge-graph between each entity and other entities based on the relationship data for each of the plurality of entities, where the edge data includes attribute types and attribute values.
Here, the attribute value corresponding to each attribute type between one entity and another entity may be the number of communications of that attribute type; for example, if the attribute type is call data, the attribute value may be the number of calls between person A and person B, the call duration, and the like.
In one example, the attribute types correspond to table data such as logistics, short messages, and calls. The logistics records are read, the entities corresponding to the telephone numbers of the sender and the recipient of each delivery are looked up, and one edge is constructed from each logistics record, where the edge data includes the sending time and the delivery content. The short message records are read, the entities corresponding to the telephone numbers of the sender and the receiver of each short message are looked up, and one edge is constructed from each short message, where the edge data includes the sending time and the message content. The call records are read, the entities corresponding to the telephone numbers of the calling party and the called party of each call record are looked up, and one edge is constructed from each call record, where the edge data includes the call time and duration.
Step (3): generating the knowledge graph according to the node data of each entity in the plurality of entities and the edge data between each entity and other entities in the knowledge graph.
In particular implementations, compute engine 110 generates a knowledge graph of the target business based on node data for each of the plurality of entities and edge data in the knowledge graph between each entity and other entities. Here, the knowledge graph may show node data corresponding to a plurality of entities and edge data of a relationship between any two entities in the plurality of entities, that is, hidden knowledge in large-scale data is extracted to construct a graph-based data model.
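Steps (1) to (3) can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the Spark GraphX implementation referred to in the application: the `Edge` class, the `build_graph` function, the field names, and the rule of skipping records whose telephone numbers match no known entity are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    src: str          # identity of the first entity
    dst: str          # identity of the second entity
    attr_type: str    # attribute type, e.g. "call", "sms", "logistics"
    attr_value: dict  # attribute value, e.g. time, duration, content


def build_graph(entities, records):
    # Step (1): one node per entity, carrying that entity's attribute data.
    nodes = {e["id_number"]: dict(e) for e in entities}

    # Helper lookup so the phone numbers in a record resolve to entities.
    phone_to_id = {e["phone"]: e["id_number"] for e in entities}

    # Step (2): one edge per record whose endpoints are known, distinct entities.
    edges = []
    for attr_type, from_phone, to_phone, details in records:
        src = phone_to_id.get(from_phone)
        dst = phone_to_id.get(to_phone)
        if src is None or dst is None or src == dst:
            continue  # assumption: unresolved or self-referencing records are skipped
        edges.append(Edge(src, dst, attr_type, details))

    # Step (3): the graph is the node data together with the edge data.
    return {"nodes": nodes, "edges": edges}


entities = [
    {"id_number": "ID001", "name": "Person A", "phone": "555-0001"},
    {"id_number": "ID002", "name": "Person B", "phone": "555-0002"},
]
records = [
    ("call", "555-0001", "555-0002", {"time": "2019-10-01T09:00:00", "duration_s": 120}),
    ("sms", "555-0002", "555-0001", {"time": "2019-10-01T09:05:00", "content": "ok"}),
    ("call", "555-0001", "555-9999", {"time": "2019-10-02T10:00:00", "duration_s": 5}),
]
graph = build_graph(entities, records)
```

The third record is dropped because its called number matches no entity, so the resulting graph has two nodes and two edges.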
In the present application, the graph computation software provided by the computing engine 110 is used to construct the knowledge graph corresponding to the target service; graph data is concerned not only with objects but also with the relationships between objects. A graph is a data structure composed of a set of nodes and a set of relationships (edges) among the nodes. In a typical knowledge graph, entities are nodes and relationships are edges, and the two nodes connected by an edge are two different entities. In an undirected graph the edges have no direction, that is, the relationship is symmetric, for example friends in an instant messenger; if the edges have directions, the graph is a directed graph.
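The difference between directed and undirected edges can be illustrated with a small adjacency structure. The function name `build_adjacency` and the sample data are hypothetical and serve only to show the distinction.

```python
from collections import defaultdict


def build_adjacency(edges, directed):
    """Build an adjacency map from (source, target) pairs."""
    adjacency = defaultdict(set)
    for src, dst in edges:
        adjacency[src].add(dst)
        if not directed:
            # An undirected edge is symmetric, so record it both ways.
            adjacency[dst].add(src)
    return adjacency


calls = [("A", "B")]    # A called B: the direction carries meaning
friends = [("A", "B")]  # A and B are friends: the relationship is symmetric

call_graph = build_adjacency(calls, directed=True)
friend_graph = build_adjacency(friends, directed=False)
```

In `call_graph` only A points to B, whereas in `friend_graph` each of A and B lists the other as a neighbor.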
In one possible embodiment, as shown in FIG. 2, the knowledge-graph storage system 100 further comprises a connector 140;
the connector 140 is configured to serialize the knowledge graph generated by the computing engine 110, and send the serialized knowledge graph to the distributed file storage database 120.
In a specific implementation, to facilitate transmission and reduce the space occupied by the knowledge graph, the connector 140 serializes the knowledge graph generated by the computing engine 110. Serialization is the process of converting the graph objects of the knowledge graph into a form that can be stored or transmitted; the form may be a JSON structure, which is convenient to store. During serialization, the graph objects are written to a temporary or persistent storage area; typically all fields of each object instance are serialized, so that the instance is represented by its serialized data, that is, the graph objects in the knowledge graph are converted into binary strings. In short, serialization saves the generated objects (for example, on a disk) so that the serialized knowledge graph can be transmitted to the distributed file storage database 120.
Here, the maximum field size of the distributed file storage database 120 is 16 MB, where 1 MB = 1024 × 1024 B = 1,048,576 B. Assuming that each edge occupies 100 B on average, even 1 MB can hold more than 10,000 edges, so a conservative estimate is that a single field can store at least 10,000 edges. This number far exceeds what the front end can render and what a display can legibly show, so retrieval efficiency can be guaranteed and can far exceed that of retrieval from a conventional graph database.
It should be noted that the connector 140 is middleware that connects the computing engine 110 and the distributed file storage database 120. Here, the connector 140 is a Spark-MongoDB connector, and the computing engine 110 and the distributed file storage database 120 can be connected through the connector 140.
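Serialization into a JSON structure and then into a binary string can be sketched with the Python standard library. The layout of the `graph` dictionary is an assumption made for illustration; the application does not fix a schema, and the actual connector performs its own conversion.

```python
import json

# A tiny graph object with node data and edge data (hypothetical layout).
graph = {
    "nodes": [
        {"id": "ID001", "name": "Person A"},
        {"id": "ID002", "name": "Person B"},
    ],
    "edges": [
        {"src": "ID001", "dst": "ID002", "type": "call", "value": {"duration_s": 120}},
    ],
}

# Serialize: graph object -> JSON text -> bytes that can be stored or sent.
serialized = json.dumps(graph, sort_keys=True).encode("utf-8")

# Deserialize on the receiving side to recover an equivalent object.
restored = json.loads(serialized.decode("utf-8"))
```

The round trip returns an object equal to the original, which is the property a connector relies on when moving the graph between the computing engine and the database.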
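The capacity estimate can be checked with a few lines of arithmetic. The sketch assumes a 16 MB limit with 1 MB = 1024 × 1024 B and an average of 100 B per edge, as stated in the text; it ignores any per-document or per-field encoding overhead, which is a simplification.

```python
BYTES_PER_MB = 1024 * 1024               # 1,048,576 bytes
MAX_FIELD_BYTES = 16 * BYTES_PER_MB      # 16 MB limit from the text
AVG_EDGE_BYTES = 100                     # average edge size assumed in the text

# Number of whole edges that fit, ignoring encoding overhead (a simplification).
edges_per_mb = BYTES_PER_MB // AVG_EDGE_BYTES
edges_per_field = MAX_FIELD_BYTES // AVG_EDGE_BYTES
```

Under these assumptions a single megabyte already holds 10,485 edges and the full 16 MB holds 167,772, so the figure of at least 10,000 edges is indeed conservative.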
In one possible embodiment, as shown in fig. 1 and 2, the distributed file storage database 120 is specifically configured to store the knowledge-graph according to the following steps:
storing attribute data and relationship data corresponding to each entity in the knowledge graph respectively; the relationship data is stored in the form of a triple; wherein the triple includes a first entity, a relationship, and a second entity, the relationship includes an attribute type and an attribute value, and the first entity and the second entity are two different entities of the plurality of entities.
In a specific implementation, the distributed file storage database 120 stores attribute data and relationship data corresponding to each entity in the knowledge graph, that is, stores node data and edge data separately, where the relationship data is stored in a triple form.
In one example, the knowledge graph includes three nodes: node 1, node 2, and node 3. There are 10 calls and 8 logistics records between node 1 and node 3, and 2 logistics records and 3 short messages between node 2 and node 3. The triples corresponding to the relationship data between node 1 and node 3 are then <node 1 - 10 calls - node 3> and <node 1 - 8 logistics - node 3>, and the triples corresponding to the relationship data between node 2 and node 3 are <node 2 - 2 logistics - node 3> and <node 2 - 3 short messages - node 3>.
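The example above can be written out as data. In this Python sketch each triple holds a first entity, a relationship made of an attribute type and an attribute value, and a second entity; the names `Relation`, `Triple`, and `relations_between` are hypothetical and chosen only for illustration.

```python
from collections import namedtuple

# A relationship carries an attribute type and an attribute value.
Relation = namedtuple("Relation", ["attribute_type", "attribute_value"])
# A triple is first entity - relationship - second entity.
Triple = namedtuple("Triple", ["first_entity", "relation", "second_entity"])

triples = [
    Triple("node1", Relation("call", 10), "node3"),
    Triple("node1", Relation("logistics", 8), "node3"),
    Triple("node2", Relation("logistics", 2), "node3"),
    Triple("node2", Relation("short_message", 3), "node3"),
]


def relations_between(first, second):
    """Return every relationship stored between two given entities."""
    return [
        t.relation
        for t in triples
        if t.first_entity == first and t.second_entity == second
    ]
```

Because each relationship type between a pair of nodes is its own triple, node 1 and node 3 are linked by two triples rather than by one edge with a merged payload.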
In one possible embodiment, as shown in fig. 1 and 2, the distributed file storage database 120 is further configured to:
establishing an index according to the identity of each entity in the plurality of entities so that a user can retrieve entity data of the corresponding entity through the identity; wherein the attribute data comprises the identity.
In a specific implementation, after the distributed file storage database 120 obtains the knowledge graph, it extracts the identity of each of the plurality of entities from the knowledge graph. The identity is an identifier that uniquely characterizes the entity; if the entity is a person, for example, the identity may be an identity card number. An index is established according to the identity, so that a user can retrieve, through the identity, the entity data of the corresponding entity, a form of retrieval that conventional databases do not support. The entity data includes the attribute data of the entity and the relationship data between the entity and other entities.
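The idea of an identity index can be sketched in plain Python as a mapping from identity to entity record. The record layout, the sample identities, and the function names are assumptions for illustration; in a document database this role would typically be played by an index on the identity field rather than by an in-memory dictionary.

```python
# Hypothetical entity records; "identity" is part of the attribute data.
entities = [
    {"identity": "ID-001", "attributes": {"name": "A"},
     "relations": [("call", 10, "ID-003")]},
    {"identity": "ID-002", "attributes": {"name": "B"},
     "relations": [("logistics", 2, "ID-003")]},
    {"identity": "ID-003", "attributes": {"name": "C"}, "relations": []},
]


def build_index(records):
    """Map each unique identity to its entity record."""
    index = {}
    for record in records:
        identity = record["identity"]
        if identity in index:
            # The identity must uniquely characterize an entity.
            raise ValueError("duplicate identity: " + identity)
        index[identity] = record
    return index


identity_index = build_index(entities)


def lookup(identity):
    """Retrieve attribute and relationship data by identity, or None."""
    return identity_index.get(identity)
```

A call such as `lookup("ID-001")` returns both the attribute data and the relationship data of that entity in one step, without scanning the other records.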
In one possible embodiment, as shown in fig. 2, the knowledge-graph storage system 100 further comprises a data calling module 150 and a presentation module 160; the data calling module 150 is configured to read the knowledge graph from the distributed file storage database 120, and screen out point data and edge data corresponding to a target attribute type from the knowledge graph according to the target attribute type; the display module 160 is configured to display the point data and the edge data, which are screened by the data calling module 150 and correspond to the target attribute type.
In a specific implementation, the data calling module 150 reads the knowledge graph from the distributed file storage database 120 and screens out the point data and edge data corresponding to a target attribute type, thereby selecting entities with specific attributes. The display module 160 then displays the point data and edge data screened out by the data calling module 150. During display, the point data and edge data are classified according to their specific attributes; the page can mark different types of entities with different colors, and different types of relationships can likewise be marked with different colors.
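A minimal sketch of this screening and display step, assuming edges carry a `type` field and that colors are assigned per relationship type. The sample data, the color table, and the function names `screen` and `display_rows` are all hypothetical.

```python
# Hypothetical point (node) data and edge data read from storage.
nodes = {
    "node1": {"entity_type": "person"},
    "node2": {"entity_type": "company"},
    "node3": {"entity_type": "person"},
}
edges = [
    {"source": "node1", "target": "node3", "type": "call", "value": 10},
    {"source": "node1", "target": "node3", "type": "logistics", "value": 8},
    {"source": "node2", "target": "node3", "type": "logistics", "value": 2},
    {"source": "node2", "target": "node3", "type": "short_message", "value": 3},
]
# Assumed color per relationship type, used only for display.
EDGE_COLORS = {"call": "red", "logistics": "blue", "short_message": "green"}


def screen(all_nodes, all_edges, target_type):
    """Keep edges of the target type and the nodes those edges touch."""
    kept_edges = [e for e in all_edges if e["type"] == target_type]
    touched = {e["source"] for e in kept_edges} | {e["target"] for e in kept_edges}
    kept_nodes = {node_id: all_nodes[node_id] for node_id in touched}
    return kept_nodes, kept_edges


def display_rows(kept_edges):
    """Pair each screened edge with the color its type is drawn in."""
    return [(e["source"], e["target"], EDGE_COLORS[e["type"]]) for e in kept_edges]


call_nodes, call_edges = screen(nodes, edges, "call")
logistics_nodes, logistics_edges = screen(nodes, edges, "logistics")
```

Screening before display keeps the number of points and edges handed to the front end proportional to the selected attribute type rather than to the whole graph.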
In this embodiment, the storage system 100 of the knowledge graph includes a computing engine 110 and a distributed file storage database 120, and the computing engine 110 may perform computation on acquired entity data of a plurality of entities corresponding to the target service to generate the knowledge graph corresponding to the target service, where the entity data includes attribute data and relationship data of each of the plurality of entities, and the knowledge graph is stored in the distributed file storage database 120. By adopting the mode, the knowledge graph is calculated through the calculation engine 110, and is stored through the distributed file storage database 120, so that the separation of calculation and storage of the knowledge graph is realized, and the efficiency of data calculation, storage and retrieval can be improved.
Based on the same inventive concept, an embodiment of the present application further provides a method for storing a knowledge graph that corresponds to the system for storing the knowledge graph. Because the method solves the problem on a principle similar to that of the system, the implementation of the method can refer to the implementation of the system, and repeated details are omitted.
Fig. 3 is a flowchart of a method for storing a knowledge graph according to an embodiment of the present application. As shown in fig. 3, the method for storing the knowledge-graph includes the following steps:
S301: acquiring entity data of a plurality of entities corresponding to a target service, and generating a knowledge graph corresponding to the target service according to the entity data; the entity data includes attribute data and relationship data for each of the plurality of entities.
S302: storing the knowledge graph.
In the embodiment of the application, a knowledge graph corresponding to a target service can be generated by calculating the acquired entity data of a plurality of entities corresponding to the target service, wherein the entity data comprises attribute data and relationship data of each entity in the plurality of entities, and the knowledge graph is stored. By adopting the mode, the separation of the calculation and the storage of the knowledge graph is realized, and the efficiency of data calculation, storage and retrieval can be improved.
In one possible embodiment, the storage method further includes:
storing relational data, the relational data comprising the entity data.
In a possible implementation manner, the generating a knowledge graph corresponding to the target service according to the entity data includes:
according to the attribute data of each entity in the plurality of entities, establishing node data of a node corresponding to each entity in the knowledge graph;
establishing edge data in the knowledge graph between each entity and other entities according to the relationship data of each entity in the plurality of entities; the other entities are the entities except the entity in the plurality of entities; the edge data comprises an attribute type and an attribute value;
and generating the knowledge graph according to the node data of each entity in the plurality of entities and the edge data between each entity and other entities in the knowledge graph.
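The three generation steps above can be sketched as follows: node data is built from attribute data, edge data is built from relationship data, and the two are combined into the graph. The input layout (`id`, `attributes`, `relationships`) and the function name `build_knowledge_graph` are assumptions for illustration, not the application's actual data model.

```python
def build_knowledge_graph(entity_data):
    """Build node data and edge data from per-entity attribute and relationship data."""
    # Step 1: one node per entity, built from its attribute data.
    nodes = {entity["id"]: dict(entity["attributes"]) for entity in entity_data}

    # Step 2: one edge per relationship between an entity and another entity.
    edges = []
    for entity in entity_data:
        for rel in entity["relationships"]:
            # Skip self-references and targets outside the entity set.
            if rel["target"] == entity["id"] or rel["target"] not in nodes:
                continue
            edges.append({
                "source": entity["id"],
                "target": rel["target"],
                "attribute_type": rel["type"],
                "attribute_value": rel["value"],
            })

    # Step 3: the knowledge graph is the node data plus the edge data.
    return {"nodes": nodes, "edges": edges}


# Hypothetical entity data for three entities.
entity_data = [
    {"id": "node1", "attributes": {"name": "A"},
     "relationships": [{"target": "node3", "type": "call", "value": 10}]},
    {"id": "node2", "attributes": {"name": "B"},
     "relationships": [{"target": "node3", "type": "logistics", "value": 2}]},
    {"id": "node3", "attributes": {"name": "C"}, "relationships": []},
]
knowledge_graph = build_knowledge_graph(entity_data)
```

Building all nodes before any edge lets the edge step check that both endpoints exist, which matches the requirement that an edge connects an entity to another entity in the same set.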
In one possible embodiment, the storing the knowledge-graph comprises:
serializing the knowledge graph, and storing the serialized knowledge graph.
In a possible embodiment, the storing the knowledge-graph includes:
storing attribute data and relationship data corresponding to each entity in the knowledge graph respectively; the relationship data is stored in the form of a triple;
wherein the triplet includes a first entity-a relationship-a second entity, the relationship includes an attribute type and an attribute value, and the first entity and the second entity are two different entities of the plurality of entities.
In one possible embodiment, the storage method further includes:
establishing an index according to the identity of each entity in the plurality of entities so that a user can retrieve entity data of the corresponding entity through the identity;
wherein the attribute data comprises the identity.
In one possible embodiment, the storage method further includes:
according to the target attribute type, screening out point data and edge data corresponding to the target attribute type from the knowledge graph;
and displaying the screened point data and the screened edge data corresponding to the target attribute type.
In the embodiment of the application, a knowledge graph corresponding to a target service can be generated by calculating the acquired entity data of a plurality of entities corresponding to the target service, wherein the entity data comprises attribute data and relationship data of each entity in the plurality of entities, and the knowledge graph is stored. By adopting the mode, the separation of calculation and storage of the knowledge graph is realized, and the efficiency of data calculation, storage and retrieval can be improved.
Based on the same inventive concept, FIG. 4 is a schematic structural diagram of an electronic device 400 provided in an embodiment of the present application. The electronic device 400 includes a processor 410, a memory 420, and a bus 430. The memory 420 stores machine-readable instructions executable by the processor 410; when the electronic device 400 is running, the processor 410 and the memory 420 communicate via the bus 430, and the machine-readable instructions are executed by the processor 410 to perform the steps of the above-mentioned method for storing a knowledge graph.
In particular, the machine readable instructions, when executed by the processor 410, may perform the following:
acquiring entity data of a plurality of entities corresponding to a target service, and generating a knowledge graph corresponding to the target service according to the entity data; the entity data includes attribute data and relationship data for each of the plurality of entities;
and storing the knowledge graph.
In the embodiment of the application, a knowledge graph corresponding to a target service can be generated by calculating the acquired entity data of a plurality of entities corresponding to the target service, wherein the entity data comprises attribute data and relationship data of each entity in the plurality of entities, and the knowledge graph is stored. By adopting the mode, the separation of calculation and storage of the knowledge graph is realized, and the efficiency of data calculation, storage and retrieval can be improved.
Based on the same application concept, the embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method for storing a knowledge graph provided by the above embodiment are performed.
Specifically, the storage medium can be a general-purpose storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is executed, the above method for storing the knowledge graph can be performed; the calculation and the storage of the knowledge graph are thereby separated, and the efficiency of data calculation, storage, and retrieval can be improved.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description covers only specific embodiments of the present application, and the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A storage system for a knowledge graph, the storage system comprising a computing engine and a distributed file storage database; wherein,
the computing engine is used for acquiring entity data of a plurality of entities corresponding to the target service and generating a knowledge graph corresponding to the target service according to the entity data; the entity data includes attribute data and relationship data for each of the plurality of entities;
and the distributed file storage database is used for acquiring the knowledge graph and storing the knowledge graph.
2. The storage system of claim 1, further comprising a management relational database;
the management relational database is used for storing relational data, and the relational data comprises the entity data.
3. The storage system of claim 1, wherein the computing engine is specifically configured to generate the knowledge-graph according to the following steps:
according to the attribute data of each entity in the plurality of entities, establishing node data of a node corresponding to each entity in the knowledge graph;
establishing edge data in the knowledge graph between each entity and other entities according to the relationship data of each entity in the plurality of entities; the other entities are the entities except the entity in the plurality of entities; the edge data comprises an attribute type and an attribute value;
and generating the knowledge graph according to the node data of each entity in the plurality of entities and the edge data between each entity and other entities in the knowledge graph.
4. The storage system of claim 1, further comprising a connector;
the connector is used for serializing the knowledge graph generated by the computing engine and sending the serialized knowledge graph to the distributed file storage database.
5. The storage system of claim 1, wherein the distributed file storage database is specifically configured to store the knowledge-graph according to the following steps:
storing attribute data and relationship data corresponding to each entity in the knowledge graph respectively; the relationship data is stored in the form of a triple;
wherein the triplet includes a first entity-a relationship-a second entity, the relationship includes an attribute type and an attribute value, and the first entity and the second entity are two different entities of the plurality of entities.
6. The storage system of claim 1, wherein the distributed file storage database is further configured to:
establishing an index according to the identity of each entity in the plurality of entities so that a user can retrieve entity data of the corresponding entity through the identity;
wherein the attribute data comprises the identity.
7. The storage system of claim 1, further comprising a data call module and a presentation module;
the data calling module is used for reading the knowledge graph from the distributed file storage database and screening out point data and edge data corresponding to the target attribute type from the knowledge graph according to the target attribute type;
and the display module is used for displaying the point data and the edge data which are screened out by the data calling module and correspond to the target attribute type.
8. A method for storing a knowledge graph, the method comprising:
acquiring entity data of a plurality of entities corresponding to a target service, and generating a knowledge graph corresponding to the target service according to the entity data; the entity data includes attribute data and relationship data for each of the plurality of entities;
and storing the knowledge graph.
9. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is run, the machine-readable instructions when executed by the processor performing the steps of the method of storing a knowledge-graph of claim 8.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when being executed by a processor, performs the steps of the method of storing a knowledge-graph according to claim 8.
CN201911047867.1A 2019-10-30 2019-10-30 System and method for storing knowledge graph Pending CN110795417A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911047867.1A CN110795417A (en) 2019-10-30 2019-10-30 System and method for storing knowledge graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911047867.1A CN110795417A (en) 2019-10-30 2019-10-30 System and method for storing knowledge graph

Publications (1)

Publication Number Publication Date
CN110795417A true CN110795417A (en) 2020-02-14

Family

ID=69442209

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911047867.1A Pending CN110795417A (en) 2019-10-30 2019-10-30 System and method for storing knowledge graph

Country Status (1)

Country Link
CN (1) CN110795417A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111597354A (en) * 2020-05-21 2020-08-28 北京明略软件系统有限公司 Knowledge graph configuration method and device, computer equipment and readable storage medium
CN111597355A (en) * 2020-05-22 2020-08-28 北京明略软件系统有限公司 Information processing method and device
CN111639082A (en) * 2020-06-08 2020-09-08 成都信息工程大学 Object storage management method and system of billion-level node scale knowledge graph based on Ceph
CN111898004A (en) * 2020-06-20 2020-11-06 中国建设银行股份有限公司 Data mining method and device, electronic equipment and readable storage medium thereof
CN111930518A (en) * 2020-09-22 2020-11-13 北京东方通科技股份有限公司 Knowledge graph representation learning-oriented distributed framework construction method
CN112287182A (en) * 2020-10-30 2021-01-29 杭州海康威视数字技术股份有限公司 Graph data storage and processing method and device and computer storage medium
CN112612908A (en) * 2021-01-05 2021-04-06 上海云扣科技发展有限公司 Natural resource knowledge graph construction method and device, server and readable memory
CN112860914A (en) * 2021-03-02 2021-05-28 中国电子信息产业集团有限公司第六研究所 Network data analysis system and method of multi-element identification
CN113094515A (en) * 2021-04-13 2021-07-09 国网北京市电力公司 Knowledge graph entity and link extraction method based on electric power marketing data
CN113868254A (en) * 2021-09-28 2021-12-31 北京百度网讯科技有限公司 Method, device and storage medium for removing duplication of entity node in graph database
CN114416913A (en) * 2022-03-28 2022-04-29 支付宝(杭州)信息技术有限公司 Method and device for data slicing of knowledge graph
CN114416891A (en) * 2022-03-28 2022-04-29 支付宝(杭州)信息技术有限公司 Method, system, apparatus and medium for data processing in a knowledge graph
CN114510582A (en) * 2022-04-18 2022-05-17 支付宝(杭州)信息技术有限公司 Knowledge graph-based information processing method and device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109522465A (en) * 2018-10-22 2019-03-26 国家电网公司 The semantic searching method and device of knowledge based map

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
王颖 等: "科技大数据知识图谱构建模型与方法研究", 《数据分析与知识发现》 *
知乎: "知识图谱的构建流程?", 《HTTPS://WWW.ZHIHU.COM/QUESTION/299907037》 *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111597354A (en) * 2020-05-21 2020-08-28 北京明略软件系统有限公司 Knowledge graph configuration method and device, computer equipment and readable storage medium
CN111597355A (en) * 2020-05-22 2020-08-28 北京明略软件系统有限公司 Information processing method and device
CN111639082B (en) * 2020-06-08 2022-12-23 成都信息工程大学 Object storage management method and system of billion-level node scale knowledge graph based on Ceph
CN111639082A (en) * 2020-06-08 2020-09-08 成都信息工程大学 Object storage management method and system of billion-level node scale knowledge graph based on Ceph
CN111898004A (en) * 2020-06-20 2020-11-06 中国建设银行股份有限公司 Data mining method and device, electronic equipment and readable storage medium thereof
CN111930518A (en) * 2020-09-22 2020-11-13 北京东方通科技股份有限公司 Knowledge graph representation learning-oriented distributed framework construction method
CN112287182A (en) * 2020-10-30 2021-01-29 杭州海康威视数字技术股份有限公司 Graph data storage and processing method and device and computer storage medium
CN112287182B (en) * 2020-10-30 2023-09-19 杭州海康威视数字技术股份有限公司 Graph data storage and processing method and device and computer storage medium
CN112612908A (en) * 2021-01-05 2021-04-06 上海云扣科技发展有限公司 Natural resource knowledge graph construction method and device, server and readable memory
CN112860914A (en) * 2021-03-02 2021-05-28 中国电子信息产业集团有限公司第六研究所 Network data analysis system and method of multi-element identification
CN113094515A (en) * 2021-04-13 2021-07-09 国网北京市电力公司 Knowledge graph entity and link extraction method based on electric power marketing data
CN113868254B (en) * 2021-09-28 2022-09-20 北京百度网讯科技有限公司 Method, device and storage medium for removing duplication of entity node in graph database
CN113868254A (en) * 2021-09-28 2021-12-31 北京百度网讯科技有限公司 Method, device and storage medium for removing duplication of entity node in graph database
CN114416891A (en) * 2022-03-28 2022-04-29 支付宝(杭州)信息技术有限公司 Method, system, apparatus and medium for data processing in a knowledge graph
CN114416913A (en) * 2022-03-28 2022-04-29 支付宝(杭州)信息技术有限公司 Method and device for data slicing of knowledge graph
CN114510582A (en) * 2022-04-18 2022-05-17 支付宝(杭州)信息技术有限公司 Knowledge graph-based information processing method and device

Similar Documents

Publication Publication Date Title
CN110795417A (en) System and method for storing knowledge graph
KR20210040003A (en) Knowledge graph generation method, relationship mining method, device, equipment and medium
CN110851209B (en) Data processing method and device, electronic equipment and storage medium
CN108874946B (en) ID management method and device
CN107832440B (en) Data mining method, device, server and computer readable storage medium
CN108733317B (en) Data storage method and device
CN110674247A (en) Barrage information intercepting method and device, storage medium and equipment
CN113297269A (en) Data query method and device
CN111694866A (en) Data searching and storing method, data searching system, data searching device, data searching equipment and data searching medium
CN114564620A (en) Graph data storage method and system and computer equipment
CN113535749A (en) Query statement generation method and device
CN110782169A (en) Method and device for updating business process
WO2019179012A1 (en) Method, device, apparatus and computer readable storage medium for processing text data
CN105827780B (en) A kind of incoming display method and device
CN112765169A (en) Data processing method, device, equipment and storage medium
CN109542357B (en) Command parameter assembling method, system, equipment and computer storage medium
CN111949354A (en) Page content updating method and device
CN114328981B (en) Knowledge graph establishing and data acquiring method and device based on mode mapping
CN116401271A (en) Database table query method, computer device and computer storage medium
CN113407749B (en) Picture index construction method and device, electronic equipment and storage medium
CN111475492B (en) Data processing method and device
CN111310088B (en) Page rendering method and device
CN113742529A (en) Multi-table front-end processing method and device
CN116263770A (en) Method, device, terminal equipment and medium for storing business data based on database
CN115757049B (en) Multi-service module log recording method, system, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200214