CN110472068B - Big data processing method, equipment and medium based on heterogeneous distributed knowledge graph - Google Patents

Publication number
CN110472068B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910770620.6A
Other languages
Chinese (zh)
Other versions
CN110472068A (en
Inventor
宋群豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Transwarp Technology Shanghai Co Ltd
Original Assignee
Transwarp Technology Shanghai Co Ltd
Application filed by Transwarp Technology Shanghai Co Ltd filed Critical Transwarp Technology Shanghai Co Ltd
Priority to CN201910770620.6A priority Critical patent/CN110472068B/en
Publication of CN110472068A publication Critical patent/CN110472068A/en
Application granted granted Critical
Publication of CN110472068B publication Critical patent/CN110472068B/en
Priority to PCT/CN2020/109226 priority patent/WO2021032002A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology

Abstract

An embodiment of the invention discloses a big data processing method, device and medium based on a heterogeneous distributed knowledge graph. The method comprises: constructing a node table and a relation table of the heterogeneous distributed knowledge graph according to a data structure of the heterogeneous distributed knowledge base; determining a graph calculation scene according to a graph calculation request, and determining the type and/or attribute of a node and the type and/or attribute of an edge required by the graph calculation scene; extracting at least one computing node corresponding to the graph calculation scene from the node table and the relation table; filtering out, from the heterogeneous distributed knowledge graph, the node data corresponding to the at least one computing node; and performing data processing on the filtered node data to obtain a data processing result based on the heterogeneous distributed knowledge graph. The embodiment thus provides an efficient data processing method for a heterogeneous distributed knowledge graph, based on a node table and a relation table.

Description

Big data processing method, equipment and medium based on heterogeneous distributed knowledge graph
Technical Field
The embodiment of the invention relates to a knowledge graph technology, in particular to a big data processing method, equipment and a medium based on a heterogeneous distributed knowledge graph.
Background
A knowledge graph, also called a scientific knowledge graph, is known in the library and information science community as knowledge domain visualization or knowledge domain mapping. The life cycle of a knowledge graph consists of the following parts: data ETL (Extract-Transform-Load), knowledge extraction, graph definition, data import, knowledge reasoning and knowledge application.
Knowledge graphs are generally divided into heterogeneous and homogeneous knowledge graphs. In a homogeneous knowledge graph, all nodes share one type and all edges share one type, that is, types are not distinguished; in a heterogeneous knowledge graph, nodes and edges can have different types and even different attributes. At present, heterogeneous knowledge graphs are generally described as multi-tuples such as triples, quintuples and septuples; for example, a large-scale directed knowledge graph composed of "point-edge" pairs is represented by "concept, relationship, and rule". Describing the knowledge graph in multi-tuple form makes it possible to clearly represent the relations between concepts, between concepts and entities, between entities and attributes, between attributes and attribute values, and so on.
Although the multi-tuple form brings many benefits, it is not concise and contains a large amount of redundant information when calculations are performed on a heterogeneous distributed knowledge graph. This makes it hard to filter out the node data of interest and greatly increases computational complexity.
Disclosure of Invention
Embodiments of the invention provide a big data processing method, apparatus, device and medium based on a heterogeneous distributed knowledge graph, with the aim of providing an effective data processing scheme for heterogeneous distributed knowledge graphs.
In a first aspect, an embodiment of the present invention provides a big data processing method based on a heterogeneous distributed knowledge graph, including:
constructing a node table and a relation table of the heterogeneous distributed knowledge graph according to a data structure of the heterogeneous distributed knowledge base;
determining a graph calculation scene according to the graph calculation request, and determining the type and/or attribute of a node and the type and/or attribute of an edge required by the graph calculation scene;
extracting at least one computing node corresponding to the graph computing scene from the node table and the relation table according to the type and/or attribute of the node and the type and/or attribute of the edge required by the graph computing scene;
filtering node data corresponding to at least one computing node from the heterogeneous distributed knowledge graph;
performing data processing on the filtered node data to obtain a data processing result based on the heterogeneous distributed knowledge graph;
wherein the node table includes: the identifier of each node, the type of each node, the attributes of each node, and the type set and attribute set of the nodes; and the relation table includes: the start node identifier of each edge, the target node identifier of each edge, the type of each edge, the attributes of each edge, and the type set and attribute set of the edges.
In a second aspect, an embodiment of the present invention further provides a big data processing apparatus based on a heterogeneous distributed knowledge graph, including:
the building module is used for building a node table and a relation table of the heterogeneous distributed knowledge graph according to the data structure of the heterogeneous distributed knowledge base;
the determining module is used for determining a graph calculation scene according to the graph calculation request, and determining the type and/or attribute of a node and the type and/or attribute of an edge required by the graph calculation scene;
the calculation node acquisition module is used for extracting at least one calculation node corresponding to the graph calculation scene from the node table and the relation table according to the type and/or attribute of the node and the type and/or attribute of the edge required by the graph calculation scene;
the filtering module is used for filtering node data corresponding to at least one computing node from the heterogeneous distributed knowledge graph;
the computing module is used for carrying out data processing on the filtered node data to obtain a data processing result based on the heterogeneous distributed knowledge graph;
wherein the node table includes: the identifier of each node, the type of each node, the attributes of each node, and the type set and attribute set of the nodes; and the relation table includes: the start node identifier of each edge, the target node identifier of each edge, the type of each edge, the attributes of each edge, and the type set and attribute set of the edges.
In a third aspect, an embodiment of the present invention further provides a computer device, including a processor and a memory, where the memory is used to store instructions, and when the instructions are executed, the processor is caused to perform the following operations:
constructing a node table and a relation table of the heterogeneous distributed knowledge graph according to a data structure of the heterogeneous distributed knowledge base;
determining a graph calculation scene according to the graph calculation request, and determining the type and/or attribute of a node and the type and/or attribute of an edge required by the graph calculation scene;
extracting at least one computing node corresponding to the graph computing scene from the node table and the relation table according to the type and/or attribute of the node and the type and/or attribute of the edge required by the graph computing scene;
filtering node data corresponding to at least one computing node from the heterogeneous distributed knowledge graph;
performing data processing on the filtered node data to obtain a data processing result based on the heterogeneous distributed knowledge graph;
wherein the node table includes: the identifier of each node, the type of each node, the attributes of each node, and the type set and attribute set of the nodes; and the relation table includes: the start node identifier of each edge, the target node identifier of each edge, the type of each edge, the attributes of each edge, and the type set and attribute set of the edges.
In a fourth aspect, an embodiment of the present invention further provides a storage medium, where the storage medium is configured to store instructions for performing:
constructing a node table and a relation table of the heterogeneous distributed knowledge graph according to a data structure of the heterogeneous distributed knowledge base;
determining a graph calculation scene according to the graph calculation request, and determining the type and/or attribute of a node and the type and/or attribute of an edge required by the graph calculation scene;
extracting at least one computing node corresponding to the graph computing scene from the node table and the relation table according to the type and/or attribute of the node and the type and/or attribute of the edge required by the graph computing scene;
filtering node data corresponding to at least one computing node from the heterogeneous distributed knowledge graph;
performing data processing on the filtered node data to obtain a data processing result based on the heterogeneous distributed knowledge graph;
wherein the node table includes: the identifier of each node, the type of each node, the attributes of each node, and the type set and attribute set of the nodes; and the relation table includes: the start node identifier of each edge, the target node identifier of each edge, the type of each edge, the attributes of each edge, and the type set and attribute set of the edges.
In the embodiment of the invention, the heterogeneous distributed knowledge graph is represented by a node table and a relation table, from which the type and/or attribute of the node and the type and/or attribute of the edge required by the graph calculation scene can be determined. The corresponding node data can then be filtered out of the heterogeneous distributed knowledge graph according to those types and/or attributes, and data processing is performed only on the filtered node data to obtain a data processing result based on the heterogeneous distributed knowledge graph, without processing the whole graph. The embodiment thus provides an efficient data processing method for heterogeneous distributed knowledge graphs based on the node table and the relation table.
Drawings
Fig. 1 is a flowchart of a big data processing method based on a heterogeneous distributed knowledge graph according to a first embodiment of the present invention;
Fig. 2 is a flowchart of a big data processing method based on a heterogeneous distributed knowledge graph according to a second embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a big data processing apparatus based on a heterogeneous distributed knowledge graph according to a third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Embodiment One
Fig. 1 is a flowchart of a big data processing method based on a heterogeneous distributed knowledge graph according to the first embodiment of the present invention. The method is applicable to performing data processing on a heterogeneous distributed knowledge graph. It can be executed by a big data processing apparatus based on a heterogeneous distributed knowledge graph; the apparatus can be implemented in hardware and/or software and is generally integrated in a computer device, where the software can be written in the Scala and Java programming languages.
A heterogeneous knowledge graph is the counterpart of a homogeneous knowledge graph. In a homogeneous knowledge graph, all nodes share one type and all edges share one type, that is, types are not distinguished; in a heterogeneous knowledge graph, nodes and edges have different types. For example, each node in a homogeneous knowledge graph represents a person, and the relations between people are acquaintance relations. Nodes in a heterogeneous knowledge graph can represent people, accounts, companies and the like; the relation between a person and an account is an owning relation, and the relation between a person and a company is an empowerment relation. Moreover, each type of node and edge has its own attributes. The heterogeneous distributed knowledge graph in this embodiment is a heterogeneous knowledge graph stored in a distributed manner across a plurality of devices. Its data volume is huge and its types and attributes vary, and no effective data processing method for such a knowledge graph currently exists. On this basis, with reference to Fig. 1, the big data processing method provided in this embodiment includes the following operations:
Step 110, constructing a node table and a relation table of the heterogeneous distributed knowledge graph according to the data structure of the heterogeneous distributed knowledge base.
In this embodiment, the heterogeneous distributed knowledge graph (hereinafter referred to as the graph) corresponds to a node table and a relation table. The node table includes the type and attributes of each node, and the relation table includes the type and attributes of each edge, where different types of nodes or edges have different attributes.
Illustratively, the node table includes: the identifier of each node, the type of each node, the attributes of each node, and the type set and attribute set of the nodes. The node identifier uniquely identifies the node and may be a character string or a number. The node type is a required element in a heterogeneous graph. The node attributes may be of a dictionary (map) type holding strings: map<string>, e.g., map<gender->man, age->20>. Storing data of different formats (including numbers and character strings) in the map keeps the data format uniform and provides great flexibility, so that when data processing is later performed on nodes having certain attributes, the methods provided by the map can be used for targeted extraction. The type set and attribute set of the nodes indicate which node types exist in the graph and which attributes of each node type can be used for computation. They may be stored in a hidden column of the node table and are unique to the graph. Accordingly, this information can be serialized and recorded in the meta information of the Schema column, so that it need not be carried repeatedly in every data row, which saves space.
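As an illustration of the node table layout described above, the following is a minimal Python sketch, not the patented implementation. The names `NodeRow`, `NodeTable` and `schema_meta`, as well as the sample attribute values, are hypothetical. Attribute values are kept as strings in a dictionary, and the type set and attribute set are held once at table level rather than in every row.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class NodeRow:
    node_id: str           # unique identifier (a string, or a number rendered as text)
    node_type: str         # required element in a heterogeneous graph
    attrs: Dict[str, str]  # attribute map; every value is stored as a string


@dataclass
class NodeTable:
    rows: List[NodeRow] = field(default_factory=list)
    # Table-level metadata (assumed layout): node type -> attribute names.
    schema_meta: Dict[str, Set[str]] = field(default_factory=dict)

    def add(self, row: NodeRow) -> None:
        """Append a row and keep the type set / attribute set up to date."""
        self.rows.append(row)
        self.schema_meta.setdefault(row.node_type, set()).update(row.attrs)

    def type_set(self) -> Set[str]:
        return set(self.schema_meta)


table = NodeTable()
table.add(NodeRow("001", "person", {"age": "20", "city": "P"}))
table.add(NodeRow("abc.com", "website", {"title": "ABC"}))
```

Because the metadata lives on the table, a caller can ask which types and attributes exist without scanning the rows.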
The relation table includes: the start node identifier of each edge, the target node identifier of each edge, the type of each edge, the attributes of each edge, and the type set and attribute set of the edges. The start node identifier and the target node identifier may be character strings or numbers. The edge type is a required element in a heterogeneous graph. The edge attributes may be of a dictionary (map) type holding strings: map<string>, such as map<gender->man, age->20, progress->M, city->P>. Storing data of different formats (including numbers and character strings) in the map keeps the data format uniform and provides great flexibility, so that when data processing is later performed on edges having certain attributes, the methods provided by the map can be used for targeted extraction. The type set and attribute set of the edges indicate which edge types exist in the graph and which attributes of each edge type can be used for computation. They may be stored in a hidden column of the relation table and are unique to the graph. Accordingly, this information can be serialized and recorded in the meta information of the Schema column, so that it need not be carried repeatedly in every data row, which saves space.
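The relation table and its serialized metadata can be sketched in the same spirit. This is a hedged Python illustration: `EdgeRow`, `edge_schema_meta` and the sample `count` attribute are hypothetical, and JSON is only one possible serialization, chosen here because the text does not name one.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class EdgeRow:
    src_id: str            # start node identifier
    dst_id: str            # target node identifier
    edge_type: str         # required element in a heterogeneous graph
    attrs: Dict[str, str]  # attribute map; every value is stored as a string


def edge_schema_meta(rows: List[EdgeRow]) -> str:
    """Serialize 'edge type -> attribute names' once for the whole table."""
    meta: Dict[str, Set[str]] = {}
    for row in rows:
        meta.setdefault(row.edge_type, set()).update(row.attrs)
    # Sorting makes the serialized form deterministic.
    return json.dumps({t: sorted(names) for t, names in sorted(meta.items())})


edges = [
    EdgeRow("001", "abc.com", "click", {"count": "3"}),
    EdgeRow("abc.com", "bcd.com", "link", {}),
]
meta_json = edge_schema_meta(edges)
```

The resulting string could be attached to the table's schema metadata, so that individual rows carry only their own type and attributes.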
Optionally, constructing a node table and a relationship table of the heterogeneous distributed knowledge graph according to the data structure of the heterogeneous distributed knowledge base may include: loading a node data source and an edge data source for constructing a heterogeneous distributed knowledge graph; identifying a data structure of a node data source and a data structure of an edge data source; constructing a node table according to the data structure of the node data source; constructing a relation table according to the data structure of the edge data source; and reading the data in the node data source and the edge data source into a graph database corresponding to the heterogeneous distributed knowledge graph according to the node table and the relation table.
Optionally, constructing a node table according to the data structure of the node data source may include: generating an identifier of each node in the node table according to the serial number column in the node data source; generating the type of each node in the node table according to the type field in the node data source; generating attributes of each node in a node table according to attribute fields in a node data source; and respectively summarizing the types and the attributes of the nodes to generate a type set and an attribute set of the nodes.
Optionally, constructing the relationship table according to the data structure of the edge data source may include: generating an initial node identifier of each edge in a relation table according to a number column corresponding to the initial node in the edge data source; generating a target node identifier of each edge in a relation table according to a number column corresponding to a target node in an edge data source; generating the type of each edge in the relation table according to the type field in the edge data source; generating the attribute of each edge in the relation table according to the attribute field in the edge data source; and summarizing the types and the attributes of the edges respectively to generate a type set and an attribute set of the edges.
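The node table construction described above (number column to identifier, type field to type, attribute fields to attributes, then summarizing the sets) might look as follows. This is a sketch under assumptions: the source records are plain dictionaries, and the column names `no`, `kind`, `age` and `title` are invented for the example.

```python
from typing import Dict, Iterable, List, Set, Tuple

Record = Dict[str, str]


def build_node_table(
    records: Iterable[Record], id_col: str, type_field: str, attr_fields: List[str]
) -> Tuple[List[dict], Dict[str, Set[str]]]:
    """Return (rows, schema), where schema maps node type -> attribute names."""
    rows: List[dict] = []
    schema: Dict[str, Set[str]] = {}
    for rec in records:
        # Keep only attributes that are actually present for this record.
        attrs = {k: str(rec[k]) for k in attr_fields if rec.get(k) not in (None, "")}
        rows.append({"id": str(rec[id_col]), "type": rec[type_field], "attrs": attrs})
        schema.setdefault(rec[type_field], set()).update(attrs)
    return rows, schema


source = [
    {"no": "001", "kind": "user", "age": "20", "title": ""},
    {"no": "abc.com", "kind": "website", "age": "", "title": "ABC"},
]
rows, schema = build_node_table(source, "no", "kind", ["age", "title"])
```

Each node type ends up with only the attributes observed for it, which is what lets different types carry different attributes.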
Step 120, determining a graph calculation scene according to the graph calculation request, and determining the type and/or attribute of the node and the type and/or attribute of the edge required by the graph calculation scene.
When data processing is to be performed on the graph, it is not necessary to extract all node data. It suffices to determine the specific graph calculation scene according to the graph calculation request and then extract only the node data that the scene needs, which reduces the amount of calculation. As described in step 110, this embodiment uses a node table and a relation table to represent all nodes and edges in the graph; in other words, the node table and the relation table serve as an index over the data of each node in the graph. On this basis, the types and/or attributes of the nodes required by the graph calculation scene are determined from the node table, and the types and/or attributes of the edges required by the graph calculation scene are determined from the relation table.
Optionally, the type and/or attribute of the node required by the graph calculation scene is determined from the type set and attribute set of the nodes, and the type and/or attribute of the edge required by the graph calculation scene is determined from the type set and attribute set of the edges.
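One way to picture step 120 is a lookup from the requested scene to the types it needs, checked against the type sets of the two tables. The Python sketch below is an assumption-laden illustration: the registry `SCENARIO_NEEDS`, the request shape `{"scenario": ...}` and the function name are all hypothetical.

```python
from typing import Dict, Set

# Hypothetical registry: scene name -> node and edge types the scene needs.
SCENARIO_NEEDS: Dict[str, Dict[str, Set[str]]] = {
    "pagerank": {"node_types": {"website"}, "edge_types": {"click", "link"}},
}


def resolve_requirements(
    request: Dict[str, str], node_type_set: Set[str], edge_type_set: Set[str]
) -> Dict[str, Set[str]]:
    """Map a graph calculation request to the types it requires.

    Raises ValueError when the graph's type sets lack a required type.
    """
    needs = SCENARIO_NEEDS[request["scenario"]]
    missing = (needs["node_types"] - node_type_set) | (needs["edge_types"] - edge_type_set)
    if missing:
        raise ValueError(f"graph has no such types: {sorted(missing)}")
    return needs


needs = resolve_requirements(
    {"scenario": "pagerank"}, {"user", "website"}, {"click", "link"}
)
```

Only the table-level type sets are consulted here, so no row of node data has to be read to decide what the scene requires.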
Step 130, extracting at least one computing node corresponding to the graph calculation scene from the node table and the relation table according to the type and/or attribute of the node and the type and/or attribute of the edge required by the graph calculation scene.
Optionally, according to the type and/or attribute of the node and the type and/or attribute of the edge required by the graph calculation scene, node identifiers are looked up in the node table, and start node identifiers and target node identifiers are looked up in the relation table; the at least one computing node is then determined from the node identifiers, the start node identifiers and the target node identifiers.
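The lookup in step 130 can be sketched as a union of identifiers. This Python example is illustrative only; the row layout and the `user` type label are assumptions, while the identifiers 001, abc.com and bcd.com and the click and link edge types come from the PageRank scenario used in this embodiment.

```python
from typing import Dict, List, Set

node_rows: List[Dict[str, str]] = [
    {"id": "001", "type": "user"},
    {"id": "abc.com", "type": "website"},
    {"id": "bcd.com", "type": "website"},
]
edge_rows: List[Dict[str, str]] = [
    {"src": "001", "dst": "abc.com", "type": "click"},
    {"src": "abc.com", "dst": "bcd.com", "type": "link"},
]


def extract_computing_nodes(
    nodes: List[Dict[str, str]],
    edges: List[Dict[str, str]],
    node_types: Set[str],
    edge_types: Set[str],
) -> Set[str]:
    """Collect ids of matching nodes plus both endpoints of matching edges."""
    ids = {row["id"] for row in nodes if row["type"] in node_types}
    for edge in edges:
        if edge["type"] in edge_types:
            ids.update((edge["src"], edge["dst"]))
    return ids


computing_nodes = extract_computing_nodes(
    node_rows, edge_rows, {"website"}, {"click", "link"}
)
```

Node 001 is not of the website type, but it is included because it is the start node of a click edge.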
Step 140, filtering node data corresponding to at least one computing node from the heterogeneous distributed knowledge graph.
In an optional embodiment, after the at least one computing node is determined, the corresponding node data is filtered out of the heterogeneous distributed knowledge graph according to the identifier of each computing node, the start node identifier and the target node identifier.
In another optional embodiment, each piece of node data carries its own node type and attributes together with the types and attributes of its edges; the node data carrying the type and/or attribute of the node and the type and/or attribute of the edge required for calculation is then searched for among all the node data.
In this embodiment, a graph database is used to store the graph. A graph database, for example the Neo4j graph database, generally takes a property graph as its basic representation, in which both nodes and relations can carry attributes. This makes real business scenarios easier to express and suits the storage of a heterogeneous distributed knowledge graph. Specifically, the node data corresponding to the at least one computing node is filtered out of the graph database corresponding to the heterogeneous distributed knowledge graph.
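The two filtering strategies of step 140 can be contrasted in a short Python sketch. It is a simplification: the graph database is replaced by an in-memory list, and the record layout (`id`, `type`, `edge_types`) and the extra `acct-9` node are hypothetical.

```python
from typing import Any, Dict, List, Set

NodeData = Dict[str, Any]

# Stand-in for node data held in the graph store (layout is an assumption).
graph_nodes: List[NodeData] = [
    {"id": "001", "type": "user", "edge_types": ["click"]},
    {"id": "abc.com", "type": "website", "edge_types": ["click", "link"]},
    {"id": "bcd.com", "type": "website", "edge_types": ["link"]},
    {"id": "acct-9", "type": "account", "edge_types": ["owns"]},
]


def filter_by_ids(nodes: List[NodeData], wanted_ids: Set[str]) -> List[NodeData]:
    """Strategy 1: keep node data whose identifier was found via the tables."""
    return [n for n in nodes if n["id"] in wanted_ids]


def filter_by_carried_types(
    nodes: List[NodeData], node_types: Set[str], edge_types: Set[str]
) -> List[NodeData]:
    """Strategy 2: scan all node data and match on the types each record carries."""
    return [
        n for n in nodes
        if n["type"] in node_types or edge_types & set(n["edge_types"])
    ]


by_ids = filter_by_ids(graph_nodes, {"001", "abc.com", "bcd.com"})
by_types = filter_by_carried_types(graph_nodes, {"website"}, {"click", "link"})
```

The first strategy relies on the tables as an index, whereas the second scans every record, so the first would normally touch far less data.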
Step 150, performing data processing on the filtered node data to obtain a data processing result based on the heterogeneous distributed knowledge graph.
After the node data is filtered out, it is calculated according to the current calculation scene. The calculation method provided in this embodiment is described in detail below using the application scenario of computing web page rank (PageRank).
Assume there is a graph of the access relations between internet users and web pages and the link relations between web pages, and PageRank needs to be computed for all web pages. Table 1 is the node table and Table 2 is the relation table.
Table 1. Node table (table image BDA0002173440640000101)
Table 2. Relation table (table image BDA0002173440640000102)
First, the website type, the click type and the link type required for computing PageRank are found in the Schema of the node table and the relation table. Then, the identifiers of the website-type nodes found in the node table are abc.com and bcd.com. Meanwhile, in the relation table, the link-type edge is found to have the start node identifier abc.com and the target node identifier bcd.com, and the click-type edge is found to have the start node identifier 001 and the target node identifier abc.com. The node data corresponding to 001, abc.com and bcd.com is therefore filtered out of the graph database corresponding to the graph. Next, PageRank is computed on the node data of the website type to obtain the PageRank value of each website; for example, the PageRank value of abc.com is 1 and that of bcd.com is 2.
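For readers who want to see the computation itself, the following is a minimal pure-Python power-iteration sketch of PageRank over the link edge from the example. The damping factor 0.85, the iteration count and the handling of pages without outgoing links are conventional choices assumed here, not details given in the text. The scores are normalized to sum to 1, so they will not equal the illustrative values 1 and 2 quoted above.

```python
from typing import Dict, Iterable, List, Tuple


def pagerank(
    nodes: List[str],
    links: Iterable[Tuple[str, str]],
    damping: float = 0.85,
    iters: int = 50,
) -> Dict[str, float]:
    """Power iteration; rank of pages with no out-links is spread evenly."""
    out: Dict[str, List[str]] = {v: [] for v in nodes}
    for src, dst in links:
        out[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        dangling = sum(rank[v] for v in nodes if not out[v])
        nxt = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out[v]:
                nxt[w] += damping * rank[v] / len(out[v])
        rank = nxt
    return rank


# Website nodes and the single link edge from the example scenario.
scores = pagerank(["abc.com", "bcd.com"], [("abc.com", "bcd.com")])
```

Since abc.com links to bcd.com and not the other way round, bcd.com ends up with the higher score, matching the ordering in the example.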
Optionally, the calculation result is added to the attribute set in the node table or the attribute set in the relation table; and/or the calculation result is added to the attributes of the corresponding node in the node table or the attributes of the corresponding edge in the relation table. Continuing the application scenario above, Table 3 shows the new node table.
Table 3. New node table (table image BDA0002173440640000111)
As can be seen, a PageRank attribute is added to the attributes of each node, and the PageRank attribute is added to the attribute set of the nodes.
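Writing a result back into the node table can be sketched as below. The helper `write_back` and the row layout are hypothetical; the PageRank values 1 and 2 are the illustrative values from the example, and they are stored as strings to match the string-valued attribute map.

```python
from typing import Dict, List, Set


def write_back(
    rows: List[dict],
    attribute_set: Dict[str, Set[str]],
    node_type: str,
    attr_name: str,
    values: Dict[str, object],
) -> None:
    """Add a computed attribute to matching rows and to the attribute set."""
    for row in rows:
        if row["type"] == node_type and row["id"] in values:
            row["attrs"][attr_name] = str(values[row["id"]])
    attribute_set.setdefault(node_type, set()).add(attr_name)


rows = [
    {"id": "001", "type": "user", "attrs": {}},
    {"id": "abc.com", "type": "website", "attrs": {}},
    {"id": "bcd.com", "type": "website", "attrs": {}},
]
attribute_set: Dict[str, Set[str]] = {"user": set(), "website": set()}
write_back(rows, attribute_set, "website", "PageRank", {"abc.com": 1, "bcd.com": 2})
```

Registering the new attribute in the attribute set makes it discoverable by later graph calculation scenes in the same way as the original attributes.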
In the embodiment of the invention, the heterogeneous distributed knowledge graph is represented by a node table and a relation table, from which the type and/or attribute of the node and the type and/or attribute of the edge required by the graph calculation scene can be determined. The corresponding node data can then be filtered out of the heterogeneous distributed knowledge graph according to those types and/or attributes, and data processing is performed only on the filtered node data to obtain a data processing result based on the heterogeneous distributed knowledge graph, without processing the whole graph. The embodiment thus provides an efficient data processing method for heterogeneous distributed knowledge graphs based on the node table and the relation table.
The heterogeneous distributed knowledge graph is stored in a distributed manner across a plurality of devices, so data processing on the graph must also be distributed. In a preferred embodiment, filtering node data corresponding to at least one computing node from the heterogeneous distributed knowledge graph comprises: filtering out, from each device, the node data corresponding to the at least one computing node. Calculating the filtered node data to obtain a calculation result based on the heterogeneous distributed knowledge graph comprises: calculating the corresponding node data filtered from each device; and summarizing the calculation results of the devices to obtain the calculation result based on the heterogeneous distributed knowledge graph.
Specifically, the node table further records the device on which each node is stored. Using the node table, the devices storing the nodes required for calculation are determined according to the type and/or attribute of the node and the type and/or attribute of the edge required by the graph calculation scene, and node data is filtered from those devices. The filtered node data on each device is then computed separately, optionally using a library such as NetworkX. Finally, the calculation results of the devices are summarized to obtain the final distributed calculation result.
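The filter-per-device, compute-per-device, then summarize pattern can be illustrated with a deliberately simple computation. In this Python sketch the devices are simulated by a dictionary of partitions, and in-degree counting stands in for the real calculation because its partial results merge by plain addition; the device names, the `002` and `acct-9` nodes and the `owns` edge are hypothetical. NetworkX is not used, so that the example stays self-contained.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str, str]  # (start node id, target node id, edge type)

# Hypothetical partitions: device name -> edges stored on that device.
device_edges: Dict[str, List[Edge]] = {
    "device-1": [("001", "abc.com", "click"), ("abc.com", "bcd.com", "link")],
    "device-2": [("002", "bcd.com", "click"), ("002", "acct-9", "owns")],
}


def local_in_degree(edges: List[Edge], edge_types: Set[str]) -> Counter:
    """Runs per device: filter by edge type, then count incoming edges."""
    return Counter(dst for _, dst, etype in edges if etype in edge_types)


def aggregate(partials: Iterable[Counter]) -> Counter:
    """Summarize the per-device results into one overall result."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


partials = [local_in_degree(e, {"click", "link"}) for e in device_edges.values()]
in_degree = aggregate(partials)
```

Computations whose partial results do not merge by simple addition would need a more elaborate summarizing step than the one shown.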
Embodiment Two
This embodiment further optimizes the embodiment above. Optionally, before the node table and the relation table of the heterogeneous distributed knowledge graph are obtained, the method further includes a process of constructing the heterogeneous distributed knowledge graph. The process is suitable for constructing a heterogeneous distributed knowledge graph from multiple data sources. The multiple data sources include, but are not limited to, traditional relational databases (e.g., Oracle, MySQL), distributed relational databases (e.g., Hive), distributed non-relational databases (e.g., HBase, Elasticsearch), TXT files and CSV files.
The data required for constructing a knowledge graph usually comes from a plurality of different data sources and includes both structured relational data and unstructured text data. Whether the data of the different sources is unified into one complete data source or a separate import script is written for each source, considerable resources and time are needed, so the whole data ETL process is time-consuming and labor-intensive. To overcome this defect of the prior art, with reference to Fig. 2, the method provided by this embodiment of the invention specifically includes the following operations:
Step 210, loading a node data source and an edge data source for constructing the heterogeneous distributed knowledge graph.
In this embodiment, both the node data source and the edge data source may be structured data stores, such as a traditional relational database (e.g., Oracle, MySQL), a distributed relational database (e.g., Hive) or a distributed non-relational database (e.g., HBase, Elasticsearch), and may also be unstructured text, such as TXT files and CSV files.
This embodiment designs an abstract data source interface so that the computing device of the heterogeneous distributed knowledge graph can connect seamlessly to various data sources. Once different data sources are connected through the data source interface, they are operated on in a uniform manner, and their data does not need to be imported into a unified data source. The data source interface comprises a data structure and definition interface and a data reading interface, and may also comprise at least one of a data state checking interface and a data writing interface. The data structure and definition interface encapsulates a data structure identification method and a graph interface definition method. These interfaces are application programming interfaces (APIs).
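An abstract data source interface of this kind might be sketched in Python as an abstract base class. All names here (`DataSource`, `columns`, `read`, `is_available`, `write`, `CsvTextSource`) are hypothetical and do not come from the patent. The two abstract methods correspond to the data structure and reading interfaces, while the state check and write methods are optional.

```python
import csv
import io
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Iterator, List


class DataSource(ABC):
    """Uniform view over heterogeneous sources (names are illustrative)."""

    @abstractmethod
    def columns(self) -> List[str]:
        """Data structure interface: report the source's column names."""

    @abstractmethod
    def read(self) -> Iterator[Dict[str, str]]:
        """Data reading interface: yield one record per row."""

    def is_available(self) -> bool:
        """Optional data state check; sources may override it."""
        return True

    def write(self, rows: Iterable[Dict[str, str]]) -> None:
        """Optional data writing interface; read-only by default."""
        raise NotImplementedError("this source is read-only")


class CsvTextSource(DataSource):
    """CSV content held in memory, standing in for a CSV file."""

    def __init__(self, text: str) -> None:
        self._text = text

    def columns(self) -> List[str]:
        return next(csv.reader(io.StringIO(self._text)))

    def read(self) -> Iterator[Dict[str, str]]:
        return iter(csv.DictReader(io.StringIO(self._text)))


src = CsvTextSource("id,type,age\n001,person,20\n")
records = list(src.read())
```

A database-backed source would implement the same two abstract methods, so the table-building code never needs to know which kind of source it is reading.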
The computing device of the heterogeneous distributed knowledge graph can load the corresponding node data source and the edge data source through the storage path of the node data source and the storage path of the edge data source.
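The data source interface described above can be pictured as an abstract base class with one concrete adapter per kind of source. The following Python sketch is illustrative only: the class names `DataSource`, `CsvTextSource` and `InMemorySource`, the method names, and the `load_all` helper are assumptions that do not appear in the embodiment, and the in-memory class merely stands in for a database-backed source.

```python
import csv
import io
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Sequence


class DataSource(ABC):
    """Hypothetical abstract data source interface (all names are assumptions)."""

    @abstractmethod
    def describe_structure(self) -> List[str]:
        """Data structure and definition interface: return the column names."""

    @abstractmethod
    def read_rows(self) -> Iterator[Dict[str, str]]:
        """Data reading interface: yield one record per row."""

    def check_status(self) -> bool:
        """Optional data state checking interface; normal by default."""
        return True


class CsvTextSource(DataSource):
    """CSV source; takes the text directly so the sketch needs no files."""

    def __init__(self, text: str) -> None:
        self._text = text

    def describe_structure(self) -> List[str]:
        return next(csv.reader(io.StringIO(self._text)))

    def read_rows(self) -> Iterator[Dict[str, str]]:
        yield from csv.DictReader(io.StringIO(self._text))


class InMemorySource(DataSource):
    """Stand-in for a database-backed source such as MySQL or Hive."""

    def __init__(self, columns: Sequence[str], rows: Sequence[Sequence[str]]) -> None:
        self._columns = list(columns)
        self._rows = list(rows)

    def describe_structure(self) -> List[str]:
        return list(self._columns)

    def read_rows(self) -> Iterator[Dict[str, str]]:
        for row in self._rows:
            yield dict(zip(self._columns, row))


def load_all(sources: Sequence[DataSource]) -> List[Dict[str, str]]:
    """Read every healthy source through the same calls, whatever its backend."""
    records: List[Dict[str, str]] = []
    for source in sources:
        if source.check_status():
            records.extend(source.read_rows())
    return records
```

Because callers depend only on the abstract methods, supporting a further source type would mean adding one more subclass rather than a new import script.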
Step 220, identify the data structure of the node data source and the data structure of the edge data source.
In this embodiment, the data structure and definition interface is called to identify the data structures of the node data source and the edge data source.
The data structure of the node data source comprises a number column, a type field, an attribute field and node data, and the data structure of the edge data source comprises a number column corresponding to the starting node, a number column corresponding to the target node, a type field and an attribute field.
Step 230, constructing a node table according to the data structure of the node data source; and constructing a relation table according to the data structure of the edge data source.
Optionally, the data structure and definition interface is called to construct the node table according to the data structure of the node data source and the relation table according to the data structure of the edge data source.
When the node table is constructed, the identifier of each node in the node table is generated according to the number column in the node data source; the type of each node in the node table is generated according to the type field in the node data source; the attributes of each node in the node table are generated according to the attribute fields in the node data source; and the types and the attributes of the nodes are summarized respectively to generate a type set and an attribute set of the nodes.
When the relation table is constructed, the starting node identifier of each edge in the relation table is generated according to the number column corresponding to the starting node in the edge data source; the target node identifier of each edge in the relation table is generated according to the number column corresponding to the target node in the edge data source; the type of each edge in the relation table is generated according to the type field in the edge data source; the attribute of each edge in the relation table is generated according to the attribute field in the edge data source; and the types and the attributes of the edges are summarized respectively to generate a type set and an attribute set of the edges.
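The table construction in step 230 can be sketched as follows. This is a minimal illustration under stated assumptions: rows are plain dictionaries, the column names `id`, `type`, `src` and `dst` are hypothetical defaults, every remaining column is treated as an attribute field, and the attribute set is taken to hold attribute names. None of these choices is specified by the embodiment.

```python
from typing import Any, Dict, Iterable, List, Set


def build_node_table(rows: Iterable[Dict[str, Any]],
                     id_col: str = "id",
                     type_col: str = "type") -> Dict[str, Any]:
    """Number column -> identifier, type field -> type, other fields -> attributes."""
    nodes: Dict[Any, Dict[str, Any]] = {}
    type_set: Set[str] = set()
    attr_set: Set[str] = set()
    for row in rows:
        attrs = {k: v for k, v in row.items() if k not in (id_col, type_col)}
        nodes[row[id_col]] = {"type": row[type_col], "attrs": attrs}
        type_set.add(row[type_col])
        attr_set.update(attrs)  # updating a set from a dict adds its keys
    return {"nodes": nodes, "type_set": type_set, "attr_set": attr_set}


def build_relation_table(rows: Iterable[Dict[str, Any]],
                         src_col: str = "src",
                         dst_col: str = "dst",
                         type_col: str = "type") -> Dict[str, Any]:
    """Start/target number columns -> edge endpoints, plus edge type and attributes."""
    edges: List[Dict[str, Any]] = []
    type_set: Set[str] = set()
    attr_set: Set[str] = set()
    for row in rows:
        attrs = {k: v for k, v in row.items()
                 if k not in (src_col, dst_col, type_col)}
        edges.append({"src": row[src_col], "dst": row[dst_col],
                      "type": row[type_col], "attrs": attrs})
        type_set.add(row[type_col])
        attr_set.update(attrs)
    return {"edges": edges, "type_set": type_set, "attr_set": attr_set}
```

Keeping the type set and attribute set alongside the per-node and per-edge records lets later steps check which types and attributes exist without scanning every record.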
Step 240, reading the data in the node data source and the edge data source into a graph database corresponding to the heterogeneous distributed knowledge graph according to the node table and the relation table.
Optionally, a data reading interface in the data source interface is called, and data in the node data source and the edge data source are read into the graph database corresponding to the heterogeneous distributed knowledge graph according to the node table and the relationship table.
The data reading interface encapsulates a method for reading data into a graph database. The node table and the relationship table include the nodes, the types and attributes of the nodes and edges, and the connection relationships between the nodes and the edges, as described in operation S130. On this basis, the data required by the graph database can be determined according to the node table and the relation table, and that data is read into the graph database.
Step 250, acquiring a node table and a relation table of the heterogeneous distributed knowledge graph.
Step 260, determining a graph calculation scene according to the graph calculation request, and determining the type and/or attribute of the node and the type and/or attribute of the edge required by the graph calculation scene.
Step 270, filtering corresponding node data from the heterogeneous distributed knowledge graph according to the type and/or attribute of the node and the type and/or attribute of the edge required by the graph calculation scene.
Step 280, performing data processing on the filtered node data to obtain a data processing result based on the heterogeneous distributed knowledge graph.
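Steps 260 to 280 can be illustrated with a small example. The sketch below assumes a hypothetical scenario registry (`SCENARIOS`), a made-up "large_transfers" scene, and a simple per-node sum as the data processing step; the embodiment does not prescribe any of these, and a real graph calculation would be substituted for the sum.

```python
from typing import Any, Dict, List

# Hypothetical registry: the node type, edge type and edge attribute
# condition that each graph calculation scene requires (step 260).
SCENARIOS: Dict[str, Dict[str, Any]] = {
    "large_transfers": {
        "node_type": "account",
        "edge_type": "transfer",
        "edge_filter": lambda attrs: attrs.get("amount", 0) >= 1000,
    },
}


def run_scenario(name: str,
                 nodes: Dict[str, Dict[str, Any]],
                 edges: List[Dict[str, Any]]) -> Dict[str, int]:
    """Filter the graph to what the scene needs, then total outgoing amounts."""
    spec = SCENARIOS[name]
    # Step 270: keep only nodes and edges of the required type and attribute.
    kept_nodes = {nid for nid, node in nodes.items()
                  if node["type"] == spec["node_type"]}
    kept_edges = [
        edge for edge in edges
        if edge["type"] == spec["edge_type"]
        and spec["edge_filter"](edge["attrs"])
        and edge["src"] in kept_nodes
        and edge["dst"] in kept_nodes
    ]
    # Step 280: process only the filtered data, never the whole graph.
    totals: Dict[str, int] = {}
    for edge in kept_edges:
        totals[edge["src"]] = totals.get(edge["src"], 0) + edge["attrs"]["amount"]
    return totals
```

The point of the filter is that the processing loop touches only the edges that survive it, so its cost depends on the size of the scene rather than on the size of the whole graph.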
In the embodiment of the invention, the data structure of each data source can be identified through the data structure and definition interface in the data source interface, so different data sources do not need to be imported into a unified database, and there is no need to screen data, write import scripts, or use import tools. By calling the data reading interface in the data source interface, the data in the data sources is read into the graph database according to the node table and the relation table, so that the data is read automatically, without requiring business knowledge or collaboration between business experts and engineering experts. In this embodiment, only the data structure needs to be identified through the data source interface, after which the data is imported automatically according to the node table and the relation table, which saves a large amount of manpower and resources and reduces the cost of connecting to different data sources during data import.
On the basis of the above embodiments, the data source interface further includes: a data status check interface and/or a data write interface. Based on the above, the heterogeneous distributed knowledge graph-based computing method further comprises at least one of the following two embodiments.
The first embodiment: after a node data source and an edge data source for constructing the heterogeneous distributed knowledge graph are loaded, a data state check interface is called, whether the working states of the node data source and the edge data source are normal or not is checked, and the data source with the abnormal working state is fed back to a user.
Optionally, whether the working states of the node data source and the edge data source are normal is checked when the data structures of the node data source and the edge data source are identified and when data is read into the graph database; the working states may also be checked periodically. If the data source is online and the user has access rights, the data source is in a normal state; if the data source is offline or the user does not have access rights, the state of the data source is abnormal. This check helps ensure the stability and security of the graph construction process.
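The normal/abnormal rule stated above reduces to a conjunction of two conditions. The following sketch assumes a hypothetical `SourceState` record whose two flags have already been probed; how a real source is probed for availability and permissions is outside the scope of the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SourceState:
    """Hypothetical snapshot of one data source, as seen by the current user."""
    name: str
    online: bool
    user_has_access: bool


def is_normal(state: SourceState) -> bool:
    """Normal only if the source is online AND the user has access rights."""
    return state.online and state.user_has_access


def abnormal_sources(states: List[SourceState]) -> List[str]:
    """Names of the sources to feed back to the user as abnormal."""
    return [state.name for state in states if not is_normal(state)]
```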
The second embodiment: after the data reading interface in the data source interface is called and the data in the node data source and the edge data source is read into the graph database corresponding to the heterogeneous distributed knowledge graph according to the node table and the relation table, the data writing interface is called to write the data in the graph database back into the corresponding data source. Specifically, each piece of data in the graph database carries its source data source and its location in that data source, e.g., a row number and a column number. The data writing interface is therefore called to write the data back to the corresponding location of the corresponding data source according to that source and location.
The data writing interface realizes the function of writing the data in the knowledge graph back into the data source, which facilitates restoring the knowledge graph and mutual verification between the data source and the knowledge graph.
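Write-back by recorded location can be sketched as below. The representation is an assumption made for illustration: each data source is modeled as a list of rows, and each graph value is keyed by a hypothetical `(source name, row number, column number)` tuple.

```python
from typing import Dict, List, Tuple

# Hypothetical provenance key: (source name, row number, column number).
Provenance = Tuple[str, int, int]


def write_back(graph_values: Dict[Provenance, str],
               sources: Dict[str, List[List[str]]]) -> None:
    """Write each graph value to the cell of the source it was read from."""
    for (source_name, row, col), value in graph_values.items():
        sources[source_name][row][col] = value
```

For example, a value read from row 1, column 1 of a source named `nodes.csv` and later changed in the graph would be written to that same cell, leaving all other cells untouched.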
Example three
Fig. 3 is a schematic structural diagram of a big data processing apparatus based on a heterogeneous distributed knowledge graph according to a third embodiment of the present invention, and this embodiment is suitable for a case of processing big data on the heterogeneous distributed knowledge graph. With reference to fig. 3, the apparatus comprises: a building module 31, a determining module 32, a computation node obtaining module 33, a filtering module 34 and a calculation module 35.
The building module 31 is configured to build a node table and a relationship table of the heterogeneous distributed knowledge graph according to a data structure of the heterogeneous distributed knowledge base;
a determining module 32, configured to determine a graph computation scenario according to the graph computation request, determine the type and/or attribute of a node and the type and/or attribute of an edge required by the graph computation scenario;
a computation node obtaining module 33, configured to extract at least one computation node corresponding to the graph computation scenario from the node table and the relationship table according to the type and/or attribute of the node and the type and/or attribute of the edge required by the graph computation scenario;
a filtering module 34, configured to filter out node data corresponding to at least one computing node from the heterogeneous distributed knowledge graph;
the calculation module 35 is configured to perform data processing on the filtered node data to obtain a data processing result based on the heterogeneous distributed knowledge graph;
wherein the node table includes: the identifier of each node, the type of each node, the attribute of each node, and the type set and the attribute set of the nodes; and the relationship table includes: the starting node identifier of each edge, the target node identifier of each edge, the type of each edge, the attribute of each edge, and the type set and the attribute set of the edges.
In the embodiment of the invention, the heterogeneous distributed knowledge graph is represented by a node table and a relation table, and the type and/or attribute of the node and the type and/or attribute of the edge required by the graph calculation scene can be determined from the node table and the relation table, so that the corresponding node data can be filtered from the heterogeneous distributed knowledge graph according to the type and/or attribute of the node and the type and/or attribute of the edge required by the graph calculation scene; and performing data processing on the filtered node data to obtain a data processing result based on the heterogeneous distributed knowledge graph without performing data processing on the whole graph. Therefore, the data processing method for the heterogeneous distributed knowledge graph is provided based on the node table and the relation table.
Optionally, the heterogeneous distributed knowledge graph is stored in a plurality of devices in a distributed manner. When filtering out node data corresponding to at least one computing node from the heterogeneous distributed knowledge graph, the filtering module 34 is specifically configured to: filter out node data corresponding to the at least one computing node from each device. When calculating the filtered node data to obtain a calculation result based on the heterogeneous distributed knowledge graph, the calculation module 35 is specifically configured to: calculate the corresponding node data filtered from each device; and summarize the calculation result of each device to obtain the calculation result based on the heterogeneous distributed knowledge graph.
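The per-device filtering, per-device calculation and final summarization can be sketched as a map step followed by a merge. In this illustration each "device" is just a list of node records processed by a worker thread, and the calculation is a count of nodes per type; both are assumptions chosen to keep the sketch self-contained, not the embodiment's actual deployment or algorithm.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from typing import Any, Dict, List, Set


def process_partition(partition: List[Dict[str, Any]],
                      wanted_ids: Set[Any]) -> Counter:
    """On one device: filter the local node data, then compute a partial result."""
    local = [node for node in partition if node["id"] in wanted_ids]
    return Counter(node["type"] for node in local)


def process_all(partitions: List[List[Dict[str, Any]]],
                wanted_ids: Set[Any]) -> Counter:
    """Run every partition independently, then summarize the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial(process_partition, wanted_ids=wanted_ids),
                                 partitions))
    total: Counter = Counter()
    for part in partials:
        total.update(part)  # Counter.update adds counts rather than replacing them
    return total
```

This only works unchanged for calculations whose partial results can be merged, such as counts and sums; a calculation that needs edges crossing devices would require an exchange step that the sketch omits.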
Optionally, the apparatus further includes an adding module, configured to add the calculation result to an attribute set in the node table or an attribute set in the relationship table after performing data processing on the filtered node data to obtain a data processing result based on the heterogeneous distributed knowledge graph; and/or adding the calculation result into the attribute of the corresponding node in the node table or the attribute of the corresponding edge in the relation table.
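One way to picture the adding module is shown below. The dictionary layout of the node table (`nodes`, `attr_set`, and a per-node `attrs` mapping) and the attribute name `pagerank` in the usage are hypothetical; the sketch performs both additions, whereas the embodiment allows either one or both.

```python
from typing import Any, Dict


def add_result(node_table: Dict[str, Any],
               attr_name: str,
               results: Dict[Any, Any]) -> None:
    """Store a per-node result as a new attribute of the node table."""
    # Register the new attribute in the attribute set of the nodes.
    node_table["attr_set"].add(attr_name)
    # Attach each value to the attributes of the corresponding node.
    for node_id, value in results.items():
        node_table["nodes"][node_id]["attrs"][attr_name] = value
```

Once stored this way, a result such as a hypothetical `pagerank` score can itself serve as a filter attribute for a later graph calculation scene.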
Optionally, when determining at least one computing node corresponding to the graph calculation scene from the node table and the relationship table according to the type and/or attribute of the node and the type and/or attribute of the edge required by the graph calculation scene, the filtering module 34 is specifically configured to: search for the identifier of the node in the node table, and search for the starting node identifier and the target node identifier in the relation table, according to the type and/or attribute of the node and the type and/or attribute of the edge required by the graph calculation scene; and determine the at least one computing node according to the node identifier, the starting node identifier and the target node identifier.
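The identifier lookup can be sketched with two set constructions. The embodiment does not state how the three kinds of identifier are combined, so the intersection used here (a node must have a required type and also be an endpoint of a required edge) is an assumption, as are the function name and the dictionary layout.

```python
from typing import Any, Dict, List, Set


def find_compute_nodes(nodes: Dict[Any, Dict[str, Any]],
                       edges: List[Dict[str, Any]],
                       node_types: Set[str],
                       edge_types: Set[str]) -> Set[Any]:
    """Combine identifiers found in the node table and the relation table."""
    # Node table lookup: identifiers of nodes with a required type.
    node_ids = {nid for nid, node in nodes.items() if node["type"] in node_types}
    # Relation table lookup: starting and target identifiers of required edges.
    endpoint_ids: Set[Any] = set()
    for edge in edges:
        if edge["type"] in edge_types:
            endpoint_ids.add(edge["src"])
            endpoint_ids.add(edge["dst"])
    # Assumed combination rule: keep nodes that satisfy both lookups.
    return node_ids & endpoint_ids
```

A union instead of an intersection would also be consistent with the text; which rule applies would depend on the graph calculation scene.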
Optionally, the building module 31 is configured to load a node data source and an edge data source for building the heterogeneous distributed knowledge graph when constructing the node table and the relationship table of the heterogeneous distributed knowledge graph according to the data structure of the heterogeneous distributed knowledge base; identifying a data structure of a node data source and a data structure of an edge data source; constructing a node table according to the data structure of the node data source; constructing a relation table according to the data structure of the edge data source; and reading the data in the node data source and the edge data source into a graph database corresponding to the heterogeneous distributed knowledge graph according to the node table and the relation table.
Optionally, when the building module builds the node table according to the data structure of the node data source, the building module is specifically configured to: generate an identifier of each node in the node table according to the number column in the node data source; generate the type of each node in the node table according to the type field in the node data source; generate the attributes of each node in the node table according to the attribute fields in the node data source; and summarize the types and the attributes of the nodes respectively to generate a type set and an attribute set of the nodes.
Optionally, when the building module builds the relationship table according to the data structure of the edge data source, the building module is specifically configured to: generate a starting node identifier of each edge in the relation table according to the number column corresponding to the starting node in the edge data source; generate a target node identifier of each edge in the relation table according to the number column corresponding to the target node in the edge data source; generate the type of each edge in the relation table according to the type field in the edge data source; generate the attribute of each edge in the relation table according to the attribute field in the edge data source; and summarize the types and the attributes of the edges respectively to generate a type set and an attribute set of the edges.
The big data processing device based on the heterogeneous distributed knowledge graph provided by the embodiment of the invention can execute the big data processing method based on the heterogeneous distributed knowledge graph provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
Example four
Fig. 4 is a schematic structural diagram of a computer apparatus according to a fourth embodiment of the present invention, as shown in fig. 4, the computer apparatus includes a processor 40, a memory 41, an input device 42, and an output device 43; the number of processors 40 in the computer device may be one or more, and one processor 40 is taken as an example in fig. 4; the processor 40, the memory 41, the input device 42 and the output device 43 in the computer apparatus may be connected by a bus or other means, and the connection by the bus is exemplified in fig. 4.
The memory 41 serves as a computer-readable storage medium, and may be used for storing software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the heterogeneous distributed knowledge-graph-based big data processing method in the embodiment of the present invention (for example, the building module 31, the determining module 32, the computing node obtaining module 33, the filtering module 34, and the computing module 35 in the heterogeneous distributed knowledge-graph-based big data processing apparatus). The processor 40 executes various functional applications and data processing of the electronic device by executing software programs, instructions and modules stored in the memory 41, that is, implements the data processing method based on the heterogeneous distributed knowledge graph.
The memory 41 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 41 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, memory 41 may further include memory located remotely from processor 40, which may be connected to the electronic device through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 42 is operable to receive input numeric or character information and to generate key signal inputs, such as node tables and data tables, associated with user settings and function controls of the computer apparatus. The output device 43 may include a display device such as a display screen for displaying the result of the big data processing.
Example five
The fifth embodiment of the present invention further provides a storage medium having instructions stored thereon. The instructions, when executed by a computer processor, perform a heterogeneous distributed knowledge-graph based big data processing method, the method comprising:
constructing a node table and a relation table of the heterogeneous distributed knowledge graph according to a data structure of the heterogeneous distributed knowledge base;
determining a graph calculation scene according to the graph calculation request, and determining the type and/or attribute of a node and the type and/or attribute of an edge required by the graph calculation scene;
extracting at least one computing node corresponding to the graph computing scene from the node table and the relation table according to the type and/or attribute of the node and the type and/or attribute of the edge required by the graph computing scene;
filtering node data corresponding to at least one computing node from the heterogeneous distributed knowledge graph;
performing data processing on the filtered node data to obtain a data processing result based on the heterogeneous distributed knowledge graph;
wherein the node table includes: the identifier of each node, the type of each node, the attribute of each node, and the type set and the attribute set of the nodes; and the relationship table includes: the starting node identifier of each edge, the target node identifier of each edge, the type of each edge, the attribute of each edge, and the type set and the attribute set of the edges.
Of course, the storage medium storing the instructions provided by the embodiments of the present invention is not limited to the above method operations, and may also perform related operations in the heterogeneous distributed knowledge graph-based big data processing method provided by any embodiment of the present invention.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It should be noted that, in the embodiment of the big data processing apparatus based on the heterogeneous distributed knowledge graph, the included units and modules are only divided according to the functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (15)

1. A big data processing method based on a heterogeneous distributed knowledge graph is characterized by comprising the following steps:
constructing a node table and a relation table of the heterogeneous distributed knowledge graph according to a data structure of a heterogeneous distributed knowledge base;
determining a graph calculation scene according to the graph calculation request, and determining the type and/or attribute of a node and the type and/or attribute of an edge required by the graph calculation scene;
extracting at least one computing node corresponding to the graph computing scene from the node table and the relation table according to the type and/or attribute of the node and the type and/or attribute of the edge required by the graph computing scene;
filtering out node data corresponding to the at least one computing node from the heterogeneous distributed knowledge graph;
performing data processing on the filtered node data to obtain a data processing result based on the heterogeneous distributed knowledge graph;
wherein the node table includes: the identifier of each node, the type of each node, the attribute of each node, and the type set and the attribute set of the nodes; and the relationship table includes: the starting node identifier of each edge, the target node identifier of each edge, the type of each edge, the attribute of each edge, and the type set and the attribute set of the edges.
2. The method of claim 1, wherein the heterogeneous distributed knowledge graph is stored in a plurality of devices in a distributed manner;
filtering node data corresponding to the at least one computing node from the heterogeneous distributed knowledge graph, including:
filtering out node data corresponding to the at least one computing node from each device;
performing data processing on the filtered node data to obtain a data processing result based on the heterogeneous distributed knowledge graph, including:
calculating the corresponding node data filtered from each device;
and summarizing the calculation result of each device to obtain a data processing result based on the heterogeneous distributed knowledge graph.
3. The method according to claim 1, wherein after performing data processing on the filtered node data to obtain a data processing result based on a heterogeneous distributed knowledge graph, the method further comprises:
adding the data processing result to an attribute set in the node table or an attribute set in the relation table; and/or
and adding the data processing result to the attribute of the corresponding node in the node table or the attribute of the corresponding edge in the relation table.
4. The method according to claim 3, wherein extracting at least one computing node corresponding to the graph computing scene from the node table and the relation table according to the type and/or attribute of the node and the type and/or attribute of the edge required by the graph computing scene comprises:
according to the type and/or attribute of the node and the type and/or attribute of the edge required by the graph computing scene, searching the identifier of the node in the node table, and searching the starting node identifier and the target node identifier in the relation table;
and determining the at least one computing node according to the node identifier, the starting node identifier and the target node identifier.
5. The method of claim 1, wherein constructing the node tables and the relationship tables of the heterogeneous distributed knowledge graph according to the data structure of the heterogeneous distributed knowledge base comprises:
loading a node data source and an edge data source for constructing a heterogeneous distributed knowledge graph;
identifying a data structure of the node data source and a data structure of an edge data source;
constructing the node table according to a data structure of a node data source; constructing the relation table according to the data structure of the edge data source;
and reading the data in the node data source and the edge data source into the graph database corresponding to the heterogeneous distributed knowledge graph according to the node table and the relation table.
6. The method of claim 5, wherein constructing the node table according to the data structure of the node data source comprises:
generating an identifier of each node in the node table according to the serial number column in the node data source;
generating the type of each node in the node table according to the type field in the node data source;
generating attributes of each node in a node table according to attribute fields in a node data source;
and respectively summarizing the types and the attributes of the nodes to generate a type set and an attribute set of the nodes.
7. The method of claim 5, wherein constructing the relational table according to the data structure of the edge data source comprises:
generating an initial node identifier of each edge in a relation table according to a number column corresponding to the initial node in the edge data source;
generating a target node identifier of each edge in a relation table according to a number column corresponding to a target node in an edge data source;
generating the type of each edge in the relation table according to the type field in the edge data source;
generating the attribute of each edge in the relation table according to the attribute field in the edge data source;
and summarizing the types and the attributes of the edges respectively to generate a type set and an attribute set of the edges.
8. A computer device comprising a processor and a memory, the memory to store instructions that, when executed, cause the processor to:
constructing a node table and a relation table of the heterogeneous distributed knowledge graph according to a data structure of a heterogeneous distributed knowledge base;
determining a graph calculation scene according to the graph calculation request, and determining the type and/or attribute of a node and the type and/or attribute of an edge required by the graph calculation scene;
extracting at least one computing node corresponding to the graph computing scene from the node table and the relation table according to the type and/or attribute of the node and the type and/or attribute of the edge required by the graph computing scene;
filtering out node data corresponding to the at least one computing node from the heterogeneous distributed knowledge graph;
performing data processing on the filtered node data to obtain a data processing result based on the heterogeneous distributed knowledge graph;
wherein the node table includes: the identifier of each node, the type of each node, the attribute of each node, and the type set and the attribute set of the nodes; and the relationship table includes: the starting node identifier of each edge, the target node identifier of each edge, the type of each edge, the attribute of each edge, and the type set and the attribute set of the edges.
9. The computer device of claim 8, wherein the heterogeneous distributed knowledge graph is stored in a plurality of devices in a distributed manner;
the processor is configured to filter node data corresponding to the at least one computing node from the heterogeneous distributed knowledge graph by:
filtering out node data corresponding to the at least one computing node from each device;
the processor is configured to obtain a data processing result based on the heterogeneous distributed knowledge graph by:
calculating the corresponding node data filtered from each device;
and summarizing the calculation result of each device to obtain a data processing result based on the heterogeneous distributed knowledge graph.
10. The computer device of claim 8, wherein the processor is further configured to:
after data processing results based on the heterogeneous distributed knowledge graph are obtained,
adding the data processing result to an attribute set in the node table or an attribute set in the relation table; and/or
and adding the data processing result to the attribute of the corresponding node in the node table or the attribute of the corresponding edge in the relation table.
11. The computer device of claim 10, wherein the processor is configured to extract the at least one computing node corresponding to the graph computing scene by:
according to the type and/or attribute of the node and the type and/or attribute of the edge required by the graph computing scene, searching the identifier of the node in the node table, and searching the starting node identifier and the target node identifier in the relation table;
and determining the at least one computing node according to the node identifier, the starting node identifier and the target node identifier.
12. The computer device of claim 8, wherein the processor is configured to construct the node table and the relationship table of the heterogeneous distributed knowledge graph by:
loading a node data source and an edge data source for constructing a heterogeneous distributed knowledge graph;
identifying a data structure of the node data source and a data structure of an edge data source;
constructing the node table according to a data structure of a node data source; constructing the relation table according to the data structure of the edge data source;
and reading the data in the node data source and the edge data source into the graph database corresponding to the heterogeneous distributed knowledge graph according to the node table and the relation table.
13. The computer device of claim 12, wherein the processor is configured to construct the node table by:
generating an identifier of each node in the node table according to the serial number column in the node data source;
generating the type of each node in the node table according to the type field in the node data source;
generating attributes of each node in a node table according to attribute fields in a node data source;
and respectively summarizing the types and the attributes of the nodes to generate a type set and an attribute set of the nodes.
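The node table construction steps can be sketched as below, assuming the node data source is a CSV file whose `id` column holds the number and whose `type` column holds the type, with every other column treated as an attribute field. The column names, the sample data and the function `build_node_table` are assumptions for illustration only.

```python
import csv
import io

# Hypothetical node data source; an empty cell means the attribute is absent.
NODE_CSV = """id,type,name,age
1,person,Alice,30
2,company,Acme,
"""


def build_node_table(text, id_col="id", type_col="type"):
    """Build node rows plus the type set and attribute set of all nodes."""
    rows, type_set, attr_set = [], set(), set()
    for record in csv.DictReader(io.StringIO(text)):
        # Every column other than id/type is an attribute field.
        attrs = {k: v for k, v in record.items()
                 if k not in (id_col, type_col) and v != ""}
        rows.append({"id": record[id_col], "type": record[type_col], "attrs": attrs})
        type_set.add(record[type_col])
        attr_set.update(attrs)  # adds the attribute names (dict keys)
    return rows, type_set, attr_set


node_rows, node_types, node_attrs = build_node_table(NODE_CSV)
```

Keeping the type set and attribute set alongside the rows lets nodes of different types share one table even though they carry different attributes.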
14. The computer device of claim 12, wherein the processor is configured to construct the relation table by:
generating an initial node identifier of each edge in a relation table according to a number column corresponding to the initial node in the edge data source;
generating a target node identifier of each edge in a relation table according to a number column corresponding to a target node in an edge data source;
generating the type of each edge in the relation table according to the type field in the edge data source;
generating the attribute of each edge in the relation table according to the attribute field in the edge data source;
and summarizing the types and the attributes of the edges respectively to generate a type set and an attribute set of the edges.
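A corresponding sketch for the relation table follows, assuming the edge data source has already been read into a list of records with `src_id`, `dst_id` and `type` fields. These field names and the function `build_relation_table` are hypothetical and not taken from the patent.

```python
# Hypothetical edge data source, already parsed into records.
edge_source = [
    {"src_id": "1", "dst_id": "2", "type": "works_at", "since": "2019"},
    {"src_id": "3", "dst_id": "1", "type": "knows"},
]


def build_relation_table(records, src_col="src_id", dst_col="dst_id", type_col="type"):
    """Build edge rows plus the type set and attribute set of all edges."""
    reserved = {src_col, dst_col, type_col}
    rows, type_set, attr_set = [], set(), set()
    for record in records:
        # Fields other than the endpoints and the type are edge attributes.
        attrs = {k: v for k, v in record.items() if k not in reserved}
        rows.append({
            "src": record[src_col],   # initial node identifier
            "dst": record[dst_col],   # target node identifier
            "type": record[type_col],
            "attrs": attrs,
        })
        type_set.add(record[type_col])
        attr_set.update(attrs)
    return rows, type_set, attr_set


edge_rows, edge_types, edge_attrs = build_relation_table(edge_source)
```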
15. A storage medium for storing instructions for performing the heterogeneous distributed knowledge graph-based big data processing method according to any one of claims 1 to 7.
CN201910770620.6A 2019-08-20 2019-08-20 Big data processing method, equipment and medium based on heterogeneous distributed knowledge graph Active CN110472068B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910770620.6A CN110472068B (en) 2019-08-20 2019-08-20 Big data processing method, equipment and medium based on heterogeneous distributed knowledge graph
PCT/CN2020/109226 WO2021032002A1 (en) 2019-08-20 2020-08-14 Big data processing method based on heterogeneous distributed knowledge graph, device, and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910770620.6A CN110472068B (en) 2019-08-20 2019-08-20 Big data processing method, equipment and medium based on heterogeneous distributed knowledge graph

Publications (2)

Publication Number Publication Date
CN110472068A (en) 2019-11-19
CN110472068B (en) 2020-04-24

Family

ID=68512958

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910770620.6A Active CN110472068B (en) 2019-08-20 2019-08-20 Big data processing method, equipment and medium based on heterogeneous distributed knowledge graph

Country Status (2)

Country Link
CN (1) CN110472068B (en)
WO (1) WO2021032002A1 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046115B (en) * 2019-12-24 2023-08-08 四川文轩教育科技有限公司 Heterogeneous database interconnection management method based on knowledge graph
CN111324643B (en) * 2020-03-30 2023-08-29 北京百度网讯科技有限公司 Knowledge graph generation method, relationship mining method, device, equipment and medium
CN113704559A (en) * 2020-05-21 2021-11-26 北京金山数字娱乐科技有限公司 Data processing method and device
CN111708894B (en) * 2020-05-28 2023-06-20 北京赛博云睿智能科技有限公司 Knowledge graph creation method
CN111708895B (en) * 2020-05-28 2023-06-20 北京赛博云睿智能科技有限公司 Knowledge graph system construction method and device
CN113761286B (en) * 2020-06-01 2024-01-02 杭州海康威视数字技术股份有限公司 Knowledge graph embedding method and device and electronic equipment
CN111858956B (en) * 2020-07-07 2024-04-12 咪咕文化科技有限公司 Knowledge graph construction method, knowledge graph construction device, network equipment and storage medium
CN111931069B (en) * 2020-09-25 2021-01-22 浙江口碑网络技术有限公司 User interest determination method and device and computer equipment
CN112364045A (en) * 2020-10-23 2021-02-12 济南慧天云海信息技术有限公司 Heterogeneous data aggregation method
CN112271001B (en) * 2020-11-17 2022-08-16 中山大学 Medical consultation dialogue system and method applying heterogeneous graph neural network
CN114615027A (en) * 2022-02-24 2022-06-10 奇安信科技集团股份有限公司 Behavior data processing method, behavior data processing device, behavior data processing equipment and storage medium
CN114282011B (en) * 2022-03-01 2022-08-23 支付宝(杭州)信息技术有限公司 Knowledge graph construction method and device, and graph calculation method and device
CN114491085B (en) * 2022-04-15 2022-08-09 支付宝(杭州)信息技术有限公司 Graph data storage method and distributed graph data calculation method

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013517574A (en) * 2010-01-15 2013-05-16 アビニシオ テクノロジー エルエルシー Managing data queries
CN108446367A (en) * 2018-03-15 2018-08-24 湖南工业大学 Packaging industry data search method and equipment based on knowledge graph
CN109213747A (en) * 2018-08-08 2019-01-15 麒麟合盛网络技术股份有限公司 Data management method and device
CN109240821A (en) * 2018-07-20 2019-01-18 北京航空航天大学 Distributed cross-domain collaborative computing and service system and method based on edge computing
CN109388663A (en) * 2018-08-24 2019-02-26 中国电子科技集团公司电子科学研究院 Big data intelligent analysis platform oriented to the social security field
CN109918478A (en) * 2019-02-26 2019-06-21 北京悦图遥感科技发展有限公司 Method and apparatus for acquiring geographic product data based on knowledge graph
CN110119463A (en) * 2019-04-04 2019-08-13 厦门快商通信息咨询有限公司 Information processing method, device, equipment and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8504568B2 (en) * 2009-01-08 2013-08-06 Fluid Operations Gmbh Collaborative workbench for managing data from heterogeneous sources
US10586156B2 (en) * 2015-06-25 2020-03-10 International Business Machines Corporation Knowledge canvassing using a knowledge graph and a question and answer system
CN106503035A (en) * 2016-09-14 2017-03-15 海信集团有限公司 Data processing method and device for knowledge graph
CN109657065A (en) * 2018-10-31 2019-04-19 百度在线网络技术(北京)有限公司 Knowledge graph processing method, device and electronic equipment
CN109766445B (en) * 2018-12-13 2024-03-26 平安科技(深圳)有限公司 Knowledge graph construction method and data processing device
CN109885698A (en) * 2019-02-13 2019-06-14 北京航空航天大学 Knowledge graph construction method and device, and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
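Title
"Research on graph computing preprocessing based on the BSP model"; Chen Peng; China Master's Theses Full-text Database, Information Science and Technology; 20160315; full text *
"Research on urban traffic congestion area prediction combining knowledge graph and deep learning"; Zhou Guanglin; China Master's Theses Full-text Database, Engineering Science and Technology II; 20190815; full text *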

Also Published As

Publication number Publication date
CN110472068A (en) 2019-11-19
WO2021032002A1 (en) 2021-02-25

Similar Documents

Publication Publication Date Title
CN110472068B (en) Big data processing method, equipment and medium based on heterogeneous distributed knowledge graph
US20200183995A1 (en) Discovery of linkage points between data sources
US10725981B1 (en) Analyzing big data
US9361320B1 (en) Modeling big data
AU2013335231B2 (en) Profiling data with location information
CN112199366A (en) Data table processing method, device and equipment
EP3674918B1 (en) Column lineage and metadata propagation
US20150019544A1 (en) Information service for facts extracted from differing sources on a wide area network
US11698918B2 (en) System and method for content-based data visualization using a universal knowledge graph
US20230075655A1 (en) Systems and methods for context-independent database search paths
CN112231598A (en) Webpage path navigation method and device, electronic equipment and storage medium
CN113688288A (en) Data association analysis method and device, computer equipment and storage medium
CN112579578A (en) Metadata-based data quality management method, device and system and server
CN111639016A (en) Big data log analysis method and device and computer storage medium
CN115062023A (en) Wide table optimization method and device, electronic equipment and computer readable storage medium
CN113934729A (en) Data management method based on knowledge graph, related equipment and medium
CN113157934A (en) Knowledge graph origin processing method and system, electronic device and storage medium
CN113760864A (en) Data model generation method and device
CN109408704B (en) Fund data association method, system, computer device and storage medium
JP6870454B2 (en) Analytical equipment, analytical programs and analytical methods
CN110851517A (en) Source data extraction method, device and equipment and computer storage medium
Aydin et al. Data modelling for large-scale social media analytics: design challenges and lessons learned
CN111078671A (en) Method, device, equipment and medium for modifying data table field
CN115422367A (en) User data mapping construction method and system, electronic equipment and storage medium
CN115629958A (en) Universal field level automatic checking method and device for different service interfaces

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 200233 11-12 / F, building B, 88 Hongcao Road, Xuhui District, Shanghai

Patentee after: Star link information technology (Shanghai) Co.,Ltd.

Address before: 200233 11-12 / F, building B, 88 Hongcao Road, Xuhui District, Shanghai

Patentee before: TRANSWARP TECHNOLOGY (SHANGHAI) Co.,Ltd.