CN111813873B - Entity relationship automatic discovery method and system - Google Patents

Publication number
CN111813873B
Authority
CN
China
Prior art keywords
data
entity
metadata
relationship
relation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010867916.2A
Other languages
Chinese (zh)
Other versions
CN111813873A (en)
Inventor
周春姐
戴鹏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yantai Cloud Software Co ltd
Original Assignee
Yantai Cloud Software Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-08-26
Filing date
2020-08-26
Publication date
2023-09-26

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/907 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually

Abstract

The invention discloses an automatic entity relationship discovery method. In step S5 of the method, data is taken out of the message queue together with data supplied by relationship rediscovery; the extracted entity data is parsed, and metadata is obtained for the parsed entity data according to its entity type. Supported by a data element and data standard system, the invention builds an automatic relationship discovery algorithm and system engine on a graph storage engine and an entity document storage engine. This allows association relationships and their relationship graphs to be discovered and established quickly as massive data floods in, noticeably reduces the error rate compared with manual sorting, and reduces the consumption of manpower, material, and financial resources.

Description

Entity relationship automatic discovery method and system
Technical Field
The invention belongs to the technical field of information processing, and particularly relates to an automatic entity relationship discovery method and system.
Background
Big data is an IT industry term for data sets that cannot be captured, managed, and processed with conventional software tools within an acceptable time frame. It is a massive, fast-growing, and diverse information asset, and new processing modes are required before it can deliver stronger decision-making, insight discovery, and process optimization capabilities.
In traditional service platforms, relationships between data are established largely by hand, by defining association fields and association information; a relational database stores the entities, and relation tables store the relationships between them. In the current big data environment, sorting out massive and constantly changing data is still done manually. As the association relationships between data keep growing, this process becomes burdensome and impractical, and it consumes a great deal of manpower, material, and financial resources. An automatic entity relationship discovery method and system is therefore needed to solve these problems.
Disclosure of Invention
The purpose of the invention is to provide an automatic entity relationship discovery method and system that solves the following problem: in the current big data environment, massive and constantly changing data is still sorted out manually, which becomes impractical as the association relationships between data keep growing and which consumes a great deal of manpower, material, and financial resources.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
an automatic entity relationship discovery method, comprising the following steps:
step S1: data access: receiving external entity data (Schema, Data) and placing it in the relation engine message queue, a globally unique ID being added to the entity data (Schema, Data) during access;
step S2: entity data storage: taking data out of the message queue and storing it in the corresponding entity database according to its entity type (Schema);
step S3: taking data out of the message queue, creating an entity vertex V according to the entity type (Schema), and storing V in the graph database as a vertex;
step S4: taking data out of the message queue and storing it in the compensation data pool of the relation compensation engine;
step S5: taking data out of the message queue and, at the same time, taking data supplied by relationship rediscovery; parsing the extracted entity data (Schema, Data); obtaining the metadata for the parsed entity data according to its entity type (Schema); and storing the obtained metadata in the metadata and data element cache pool;
step S6: relationship compensation: when multiple items of entity data, whose metadata is obtained according to entity type (Schema), enter at the same time, the relationships among the newly entered entities may fall into a discovery blind spot when the data is stored, so new data is stored in the compensation data pool by time period;
step S7: accessing the graph database through a unified relationship access interface and visualizing the relationships.
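Steps S1 to S4 can be pictured with a minimal in-memory sketch. Everything named here is an assumption for illustration: `queue.Queue` stands in for the relation engine message queue, plain dictionaries stand in for the entity database and the graph database, and a single consumer performs S2, S3, and S4 for each message, whereas the patent does not say how the message is delivered to the individual steps.

```python
import queue
import uuid
from collections import defaultdict

# Hypothetical in-memory stand-ins for the components named in the text.
message_queue = queue.Queue()     # relation engine message queue
entity_db = defaultdict(dict)     # entity database: schema -> {entity_id: data}
graph_vertices = {}               # graph database vertices: entity_id -> schema
compensation_pool = []            # compensation data pool


def access_data(schema, data):
    """Step S1: attach a globally unique ID and enqueue the entity."""
    entity_id = str(uuid.uuid4())
    message_queue.put({"id": entity_id, "schema": schema, "data": data})
    return entity_id


def drain_queue():
    """Steps S2 to S4, applied to every message currently queued."""
    while not message_queue.empty():
        msg = message_queue.get()
        entity_db[msg["schema"]][msg["id"]] = msg["data"]  # S2: store by entity type
        graph_vertices[msg["id"]] = msg["schema"]          # S3: create vertex V
        compensation_pool.append(msg)                      # S4: keep for compensation


entity_id = access_data("person", {"mobile": "555-0100"})
drain_queue()
```

After `drain_queue()` returns, the entity is present in all three stores under the same ID, which is what lets later steps connect a graph vertex back to its stored entity data.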
As a further description of the above technical solution:
in step S5, all service fields F in the metadata are traversed according to the field definitions in the metadata.
As a further description of the above technical solution:
in step S5, all metadata having the same service type is queried in reverse according to the service type (metadata) MT of the field F, yielding a metadata list MDL.
As a further description of the above technical solution:
the data FD1 of field F in the entity data is collision-compared with the field data FD2 of the same service type in every entity having the same service type.
As a further description of the above technical solution:
if the collision comparison finds that the data FD1 is the same as the field data FD2 of the same service type, the two entities have the relationship FD1 = FD2 based on field F, so an entity relationship edge E is established and stored in the graph database.
As a further description of the above technical solution:
if the collision comparison finds that the data FD1 differs from the field data FD2 of the same service type, no processing is performed.
As a further description of the above technical solution:
whether or not the collision comparison of FD1 with the field data FD2 of the same service type finds the relationship FD1 = FD2, the traversal continues, and it ends only after all fields have been traversed.
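The field traversal and collision comparison described above can be sketched as follows. This is an illustrative reading, not the patented implementation: the `METADATA` table, the schema and field names, and the edge representation `(id1, id2, service type)` are all hypothetical, and the reverse query is done by a simple scan rather than through a cache pool.

```python
# Hypothetical metadata: schema -> {field name F: service type MT}
METADATA = {
    "person": {"mobile": "phone_number", "name": "person_name"},
    "order": {"contact_phone": "phone_number", "order_no": "order_number"},
}


def metadata_list_for(service_type):
    """Reverse query: every (schema, field) with service type MT (the list MDL)."""
    return [
        (schema, field)
        for schema, fields in METADATA.items()
        for field, mt in fields.items()
        if mt == service_type
    ]


def discover_relationships(entity, entity_db):
    """Return edges (id1, id2, service type) found by collision comparison."""
    edges = []
    for field, mt in METADATA[entity["schema"]].items():    # traverse fields F
        fd1 = entity["data"].get(field)
        if fd1 is None:
            continue
        for schema, other_field in metadata_list_for(mt):   # walk the list MDL
            for other_id, other_data in entity_db.get(schema, {}).items():
                if other_id == entity["id"]:
                    continue                                 # skip the entity itself
                if other_data.get(other_field) == fd1:       # FD1 == FD2: record edge E
                    edges.append((entity["id"], other_id, mt))
                # FD1 != FD2: nothing is recorded and the traversal continues
    return edges


db = {
    "person": {"p1": {"mobile": "555", "name": "Ann"}},
    "order": {
        "o1": {"contact_phone": "555", "order_no": "A1"},
        "o2": {"contact_phone": "777", "order_no": "A2"},
    },
}
found = discover_relationships(
    {"id": "p1", "schema": "person", "data": db["person"]["p1"]}, db
)
```

Here the person and order `o1` share a value in fields of the same service type, so one edge is produced; order `o2` has a different phone value and is skipped without any further action.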
As a further description of the above technical solution:
in step S6, the data compensation task is executed periodically at a set fixed time interval, and the interval is chosen according to how timely the data relationships need to be in the actual service scenario.
As a further description of the above technical solution:
the compensation business flow of the data compensation task is the relationship rediscovery flow, whose processing is identical to that of step S5.
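One possible shape for the periodic compensation task is sketched below. The `CompensationPool` class, its time-bucket layout, and the `rediscover` callback are assumptions made for illustration; the text only states that new data is pooled by time period and that the compensation flow reuses the step S5 processing. A real deployment would trigger `run_compensation` from a scheduler at the chosen fixed interval.

```python
import time
from collections import defaultdict


class CompensationPool:
    """Buckets newly arrived entities by time period (hypothetical structure)."""

    def __init__(self, period_seconds):
        self.period = period_seconds
        self.buckets = defaultdict(list)   # bucket index -> entities

    def add(self, entity, now=None):
        now = time.time() if now is None else now
        self.buckets[int(now // self.period)].append(entity)

    def take_closed_buckets(self, now=None):
        """Remove and return entities from periods that have already ended."""
        now = time.time() if now is None else now
        current = int(now // self.period)
        ready = []
        for index in sorted(b for b in self.buckets if b < current):
            ready.extend(self.buckets.pop(index))
        return ready


def run_compensation(pool, rediscover, now=None):
    """One compensation run: re-apply the step S5 discovery to pooled entities."""
    edges = set()
    for entity in pool.take_closed_buckets(now):
        edges.update(rediscover(entity))
    return edges


pool = CompensationPool(period_seconds=60)
pool.add({"id": "a"}, now=10)    # falls in period 0
pool.add({"id": "b"}, now=70)    # falls in period 1
first_run = run_compensation(pool, lambda e: [(e["id"], "x")], now=100)
```

Only entities from completed periods are re-examined in a run, so an entity that arrived while its peers were still being stored gets a second pass once the period closes.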
As a further description of the above technical solution:
the automatic entity relationship discovery system comprises a relation discovery engine, a big data platform data management center, a metadata and data element cache pool, a relation compensation engine, an entity database, and a graph database, which together form a system with compensation. The relation discovery engine requires the support of a metadata system, the graph database, and the entity database. Metadata in the metadata system defines the data standard structure and is composed of individual data elements; a data element is a service type definition that describes a data attribute. The graph database stores the discovered entity relationships and relationship anchor points, and the entity database stores entity information data.
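The relationship between metadata, data elements, and the cache pool might be modeled as in the sketch below. The class names, attribute names, and the reverse index keyed by service type are hypothetical choices for illustration; the text only says that metadata is composed of data elements and that a data element defines the service type of a data attribute.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """A service type definition describing one data attribute (field)."""
    field_name: str
    service_type: str


@dataclass
class Metadata:
    """The data standard structure for one entity type (Schema)."""
    schema: str
    elements: list


class MetadataCache:
    """Hypothetical metadata and data element cache pool with a reverse index."""

    def __init__(self):
        self.by_schema = {}
        self.by_service_type = defaultdict(list)

    def register(self, metadata):
        self.by_schema[metadata.schema] = metadata
        for element in metadata.elements:
            self.by_service_type[element.service_type].append(
                (metadata.schema, element.field_name)
            )

    def same_service_type(self, service_type):
        """Reverse lookup: all (schema, field) pairs sharing a service type."""
        return list(self.by_service_type.get(service_type, []))


cache = MetadataCache()
cache.register(Metadata("person", [DataElement("mobile", "phone_number")]))
cache.register(Metadata("order", [DataElement("contact_phone", "phone_number")]))
```

Keeping a reverse index from service type to fields makes the "query in reverse by service type" lookup a dictionary access instead of a scan over every registered schema.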
In summary, due to the adoption of the technical scheme, the beneficial effects of the invention are as follows:
1. The invention uses an entity identification method, a data element identification method, a concurrent distributed relationship discovery method, and a method for discovering and compensating the relationship omissions that concurrent discovery may cause in a mass data environment. Supported by a data element, metadata, and data standard system, it builds an automatic relationship discovery algorithm and system engine on a graph storage engine and an entity document storage engine. This allows association relationships and their relationship graphs to be discovered and established quickly as massive data floods in; compared with manual sorting, the error rate is noticeably reduced, and the consumption of manpower, material, and financial resources can be reduced.
2. The invention moves the establishment of entity relationships from manual sorting to automatic discovery, which reduces the error rate and, because relationships are discovered automatically in a mass data environment, greatly improves working efficiency.
3. The invention provides a compensation mechanism for data omissions, ensuring the integrity of entity relationships.
Drawings
FIG. 1 is a schematic diagram of an entity relationship automatic discovery method and system according to the present invention;
FIG. 2 is a flow chart of an entity relationship automatic discovery method and system according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art from these embodiments without inventive effort fall within the scope of protection of the invention.
Referring to FIGS. 1-2, the present invention provides a technical solution: an automatic entity relationship discovery method comprising the following steps:
step S1: data access: receiving external entity data (Schema, Data) and placing it in the relation engine message queue, a globally unique ID being added to the entity data (Schema, Data) during access;
step S2: entity data storage: taking data out of the message queue and storing it in the corresponding entity database according to its entity type (Schema);
step S3: taking data out of the message queue, creating an entity vertex V according to the entity type (Schema), and storing V in the graph database as a vertex;
step S4: taking data out of the message queue and storing it in the compensation data pool of the relation compensation engine;
step S5: taking data out of the message queue and, at the same time, taking data supplied by relationship rediscovery; parsing the extracted entity data (Schema, Data); obtaining the metadata for the parsed entity data according to its entity type (Schema); and storing the obtained metadata in the metadata and data element cache pool;
step S6: relationship compensation: when multiple items of entity data, whose metadata is obtained according to entity type (Schema), enter at the same time, the relationships among the newly entered entities may fall into a discovery blind spot when the data is stored, so new data is stored in the compensation data pool by time period;
step S7: accessing the graph database through a unified relationship access interface and visualizing the relationships.
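Step S7 does not specify what the unified relationship access interface looks like. As one hedged illustration, the sketch below wraps an in-memory graph behind a small class and exports it in Graphviz DOT syntax so that an external tool could render it. The class `RelationGraph`, its method names, and the choice of DOT as the output format are all assumptions, not part of the patent.

```python
from collections import defaultdict


class RelationGraph:
    """Hypothetical access interface over stored vertices V and edges E."""

    def __init__(self):
        self.vertices = {}                   # entity id -> schema
        self.adjacency = defaultdict(set)    # entity id -> {(neighbor id, label)}

    def add_vertex(self, entity_id, schema):
        self.vertices[entity_id] = schema

    def add_edge(self, a, b, label):
        # Relationships found by FD1 = FD2 are symmetric, so store both directions.
        self.adjacency[a].add((b, label))
        self.adjacency[b].add((a, label))

    def neighbors(self, entity_id):
        return sorted(self.adjacency.get(entity_id, ()))

    def to_dot(self):
        """Render the graph as an undirected Graphviz DOT document."""
        lines = ["graph relations {"]
        for vid, schema in sorted(self.vertices.items()):
            lines.append(f'  "{vid}" [label="{schema}:{vid}"];')
        seen = set()
        for a in sorted(self.adjacency):
            for b, label in sorted(self.adjacency[a]):
                key = (min(a, b), max(a, b), label)
                if key not in seen:          # emit each undirected edge once
                    seen.add(key)
                    lines.append(f'  "{key[0]}" -- "{key[1]}" [label="{label}"];')
        lines.append("}")
        return "\n".join(lines)


graph = RelationGraph()
graph.add_vertex("p1", "person")
graph.add_vertex("o1", "order")
graph.add_edge("p1", "o1", "phone_number")
dot_text = graph.to_dot()
```

Labeling each edge with the service type that produced the match keeps the reason for a relationship visible in the rendered graph.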
Specifically, in step S5, all service fields F in the metadata are traversed according to the field definitions in the metadata.
Specifically, in step S5, all metadata having the same service type is queried in reverse according to the service type (metadata) MT of the field F, yielding a metadata list MDL.
Specifically, the data FD1 of field F in the entity data is collision-compared with the field data FD2 of the same service type in every entity having the same service type.
Specifically, if the collision comparison finds that the data FD1 is the same as the field data FD2 of the same service type, the two entities have the relationship FD1 = FD2 based on field F, so an entity relationship edge E is established and stored in the graph database.
Specifically, if the collision comparison finds that the data FD1 differs from the field data FD2 of the same service type, no processing is performed.
Specifically, whether or not the collision comparison of FD1 with the field data FD2 of the same service type finds the relationship FD1 = FD2, the traversal continues, and it ends only after all fields have been traversed.
Specifically, in step S6, the data compensation task is executed periodically at a set fixed time interval, and the interval is chosen according to how timely the data relationships need to be in the actual service scenario.
Specifically, the compensation business flow of the data compensation task is the relationship rediscovery flow, whose processing is identical to that of step S5.
The automatic entity relationship discovery system comprises a relation discovery engine, a big data platform data management center, a metadata and data element cache pool, a relation compensation engine, an entity database, and a graph database, which together form a system with compensation. The relation discovery engine requires the support of a metadata system, the graph database, and the entity database. Metadata in the metadata system defines the data standard structure and is composed of individual data elements; a data element is a service type definition that describes a data attribute. The graph database stores the discovered entity relationships and relationship anchor points, and the entity database stores entity information data.
Working principle: in use, data access is performed first: external entity data (Schema, Data) is received, a globally unique ID is added to it, and it enters the relation engine message queue. For entity data storage, data is taken out of the message queue and stored in the corresponding entity database according to its entity type (Schema). An entity vertex V is created according to the entity type (Schema) and stored in the graph database as a vertex. Data taken out of the message queue is also stored directly in the compensation data pool.
For relationship discovery, data is taken out of the message queue, the metadata definition is obtained according to the entity type (Schema), all service fields F are traversed according to the field definitions in the metadata, and the service type (metadata) MT of each field is obtained. All metadata having the same service type is then queried in reverse according to the service type of field F, yielding a metadata list MDL. The data FD1 of field F in the entity data is collision-compared with the field data FD2 of the same service type in the entities having the same service type. If FD1 and FD2 are the same, the two entities have the relationship FD1 = FD2 based on field F, and an entity relationship edge E is stored in the graph database; if they differ, no processing is performed. Whether or not the relationship FD1 = FD2 is found, the traversal continues until all fields have been traversed.
For relationship compensation, when multiple items of entity data, whose metadata is obtained according to entity type (Schema), enter at the same time, the relationships among the newly entered entities may fall into a discovery blind spot when the data is stored. New data is therefore stored in the compensation data pool by time period, and the data compensation task is executed periodically at a set fixed time interval, chosen according to how timely the data relationships need to be in the actual service scenario. The compensation flow is the relationship rediscovery flow: its processing is identical to that applied to data taken out of the message queue, starting from obtaining the metadata definition according to the entity type (Schema).
The foregoing is only a preferred embodiment of the present invention, and the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical scheme and inventive concept, shall be covered by the scope of protection of the present invention.

Claims (10)

1. An automatic entity relationship discovery method, characterized by comprising the following steps:
step S1: data access: receiving external entity data (Schema, Data) and placing it in the relation engine message queue, a globally unique ID being added to the entity data (Schema, Data) during access;
step S2: entity data storage: taking data out of the message queue and storing it in the corresponding entity database according to its entity type (Schema);
step S3: taking data out of the message queue, creating an entity vertex V according to the entity type (Schema), and storing V in the graph database as a vertex;
step S4: taking data out of the message queue and storing it in the compensation data pool of the relation compensation engine;
step S5: taking data out of the message queue and, at the same time, taking data supplied by relationship rediscovery; parsing the extracted entity data (Schema, Data); obtaining the metadata for the parsed entity data according to its entity type (Schema); and storing the obtained metadata in the metadata and data element cache pool, wherein the automatic entity relationship discovery system comprises a relation discovery engine, a big data platform data management center, a metadata and data element cache pool, a relation compensation engine, an entity database, and a graph database, which together form a system with compensation;
step S6: relationship compensation: when multiple items of entity data, whose metadata is obtained according to entity type (Schema), enter at the same time, the relationships among the newly entered entities may fall into a discovery blind spot when the data is stored, so new data is stored in the compensation data pool by time period;
step S7: accessing the graph database through a unified relationship access interface and visualizing the relationships.
2. The automatic entity relationship discovery method according to claim 1, wherein in step S5, all service fields F in the metadata are traversed according to the field definitions in the metadata.
3. The automatic entity relationship discovery method according to claim 2, wherein in step S5, all metadata having the same service type is queried in reverse according to the service type (metadata) MT of the field F to obtain a metadata list MDL.
4. The automatic entity relationship discovery method according to claim 3, wherein the data FD1 of field F in the entity data is collision-compared with the field data FD2 of the same service type in every entity having the same service type.
5. The automatic entity relationship discovery method according to claim 4, wherein, if the collision comparison finds that the data FD1 is the same as the field data FD2 of the same service type, the two entities have the relationship FD1 = FD2 based on field F, so an entity relationship edge E is established and stored in the graph database.
6. The automatic entity relationship discovery method according to claim 5, wherein, if the collision comparison finds that the data FD1 differs from the field data FD2 of the same service type, no processing is performed.
7. The automatic entity relationship discovery method according to claim 6, wherein, whether or not the collision comparison of FD1 with the field data FD2 of the same service type finds the relationship FD1 = FD2, the traversal continues, and it ends only after all fields have been traversed.
8. The automatic entity relationship discovery method according to claim 1, wherein in step S6 the data compensation task is executed periodically at a set fixed time interval, and the interval is chosen according to how timely the data relationships need to be in the actual service scenario.
9. The automatic entity relationship discovery method according to claim 8, wherein the compensation business flow of the data compensation task is the relationship rediscovery flow, whose processing is identical to that of step S5.
10. An automatic entity relationship discovery system according to any one of claims 1-9, wherein operation of the relation discovery engine requires the support of a metadata system, a graph database, and an entity database; metadata in the metadata system is used to define a data standard structure and is composed of individual data elements; a data element is a service type definition that describes a data attribute; the graph database is used to store discovered entity relationships and relationship anchor points; and the entity database is used to store entity information data.
CN202010867916.2A 2020-08-26 2020-08-26 Entity relationship automatic discovery method and system Active CN111813873B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010867916.2A CN111813873B (en) 2020-08-26 2020-08-26 Entity relationship automatic discovery method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010867916.2A CN111813873B (en) 2020-08-26 2020-08-26 Entity relationship automatic discovery method and system

Publications (2)

Publication Number Publication Date
CN111813873A CN111813873A (en) 2020-10-23
CN111813873B true CN111813873B (en) 2023-09-26

Family

ID=72860696

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010867916.2A Active CN111813873B (en) 2020-08-26 2020-08-26 Entity relationship automatic discovery method and system

Country Status (1)

Country Link
CN (1) CN111813873B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220405309A1 (en) * 2021-06-09 2022-12-22 Adstra, Inc. Systems and methods for a unified matching engine

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10237294B1 (en) * 2017-01-30 2019-03-19 Splunk Inc. Fingerprinting entities based on activity in an information technology environment
CN110533339A (en) * 2019-09-02 2019-12-03 北京旷视科技有限公司 The determination method, apparatus and system of security protection cost
CN110750599A (en) * 2019-09-20 2020-02-04 中国电子科技集团公司第二十八研究所 Associated information extraction and display method based on entity modeling

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王战英 ; 王占宏 ; .基于元数据的分布式通用查询系统研究与实现.微型电脑应用.2017,(08),全文. *

Also Published As

Publication number Publication date
CN111813873A (en) 2020-10-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant