CN111813873B

CN111813873B - Entity relationship automatic discovery method and system

Info

Publication number: CN111813873B
Application number: CN202010867916.2A
Authority: CN
Inventors: 周春姐; 戴鹏飞
Original assignee: Yantai Cloud Software Co ltd
Current assignee: Yantai Cloud Software Co ltd
Priority date: 2020-08-26
Filing date: 2020-08-26
Publication date: 2023-09-26
Anticipated expiration: 2040-08-26
Also published as: CN111813873A

Abstract

The invention discloses an automatic entity relationship discovery method comprising step S5: taking data out of the message queue and, at the same time, extracting data from relationship rediscovery, parsing the extracted entity data, and obtaining metadata from the parsed entity data according to the entity type. Using a graph storage engine and an entity document storage engine, and supported by data elements and a data standard system, the invention establishes an automatic relationship discovery algorithm and system engine. This addresses the problem of rapidly discovering and establishing association relationships, and their relationship graphs, amid a flood of massive data. Compared with manual sorting, the error rate is significantly reduced, and the consumption of manpower, material and financial resources can be reduced.

Description

Entity relationship automatic discovery method and system

Technical Field

The invention belongs to the technical field of information processing, and particularly relates to an automatic entity relationship discovery method and system.

Background

Big data, a term used in the IT industry, refers to data sets that cannot be captured, managed and processed with conventional software tools within a given time frame. It is a massive, fast-growing and diversified information asset that requires new processing modes to deliver stronger decision-making, insight discovery and process optimization capabilities.

In traditional business platforms, relationships between data depend largely on manually established association fields and association information: entities are stored in a relational database, and the relationships between entities are stored in relationship tables. In today's big data environment, however, massive and changeable data are still sorted out manually. As the associations between data keep growing, this process becomes burdensome and impractical and consumes a great deal of manpower, material and financial resources. An automatic entity relationship discovery method and system are therefore needed to solve these problems.

Disclosure of Invention

The invention aims to solve the following problem: in the current big data environment, massive and changeable data are still sorted out manually, and as the associations between data keep increasing, this becomes impractical and consumes a great deal of manpower, material and financial resources. To this end, an automatic entity relationship discovery method and system are provided.

In order to achieve the above purpose, the present invention adopts the following technical scheme:

An automatic entity relationship discovery method comprises the following steps:

Step S1: data access. External entity data (Schema, Data) are received and enter the relationship engine's message queue; during access, a globally unique ID is added to each accessed entity record (Schema, Data);

Step S2: entity data storage. Data are taken from the message queue and stored in the corresponding entity database according to the entity type (Schema);

Step S3: data are taken from the message queue, and an entity vertex V is established according to the entity type (Schema) and stored in the graph database as a vertex;

Step S4: data are taken from the message queue and stored in the compensation data pool of the relationship compensation engine;

Step S5: data are taken from the message queue and, at the same time, from relationship rediscovery; the extracted entity data (Schema, Data) are parsed, metadata are obtained from the parsed entity data according to the entity type (Schema), and the obtained metadata are stored in the metadata and data element cache pool;

Step S6: relationship compensation. When several entity records whose metadata are obtained according to the entity type (Schema) enter at the same time, relationships among these newly entered entities may be missed (blind spots) when the data are stored; new data are therefore kept in the compensation data pool by time period;

Step S7: the graph database is accessed through a unified relationship access interface, and the relationships are visualized.
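Steps S1 to S4 amount to a fan-out: one accessed record, tagged with a global ID, is consumed independently by the entity store, the graph and the compensation pool (and, in step S5, by discovery). The in-memory sketch below assumes one queue per consumer; the class, attribute and consumer names are hypothetical, and a production system would more likely use a message broker with separate consumer groups, a document store and a graph store.

```python
import queue
import uuid
from collections import defaultdict
from typing import Any, Dict, List


class RelationEngineIngest:
    """Hypothetical in-memory stand-in for steps S1-S4."""

    CONSUMERS = ("entity_store", "graph", "compensation", "discovery")

    def __init__(self) -> None:
        # One queue per consumer, so every consumer sees every record.
        self.queues = {name: queue.Queue() for name in self.CONSUMERS}
        self.entity_db: Dict[str, Dict[str, Dict[str, Any]]] = defaultdict(dict)
        self.vertices: Dict[str, str] = {}      # vertex V: global ID -> Schema label
        self.compensation_pool: List[str] = []  # global IDs awaiting rediscovery

    def access(self, schema: str, data: Dict[str, Any]) -> str:
        """S1: attach a globally unique ID and publish to every consumer queue."""
        gid = uuid.uuid4().hex
        for q in self.queues.values():
            q.put((gid, schema, dict(data)))
        return gid

    def run_consumers(self) -> None:
        """S2-S4: each consumer drains its own queue independently."""
        while not self.queues["entity_store"].empty():   # S2: store by Schema
            gid, schema, data = self.queues["entity_store"].get()
            self.entity_db[schema][gid] = data
        while not self.queues["graph"].empty():          # S3: create vertex V
            gid, schema, _ = self.queues["graph"].get()
            self.vertices[gid] = schema
        while not self.queues["compensation"].empty():   # S4: pool for compensation
            gid, _, _ = self.queues["compensation"].get()
            self.compensation_pool.append(gid)
        # The "discovery" queue is left untouched here; step S5 consumes it.


if __name__ == "__main__":
    engine = RelationEngineIngest()
    a = engine.access("Person", {"name": "Zhang", "phone": "13800000000"})
    b = engine.access("Order", {"order_no": "A1", "buyer_phone": "13800000000"})
    engine.run_consumers()
    print(a != b, engine.vertices[b], engine.queues["discovery"].qsize())  # True Order 2
```

Separate queues keep the consumers decoupled, so a slow discovery step does not hold back storage or vertex creation.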

As a further description of the above technical solution:

In step S5, all service fields F in the metadata are traversed according to the field definitions in the metadata.
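Obtaining metadata in step S5 and traversing its service fields can be sketched as follows, assuming metadata is a mapping from entity type (Schema) to `{field F: service type MT}` and that the metadata and data element cache pool is a memoizing dictionary in front of a slower loader. `MetadataCachePool`, `load_metadata` and the catalog contents are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Metadata = Dict[str, str]  # hypothetical layout: field F -> service type MT


class MetadataCachePool:
    """Memoizes metadata per entity type (Schema) so the loader runs once per type."""

    def __init__(self, loader: Callable[[str], Metadata]) -> None:
        self._loader = loader
        self._cache: Dict[str, Metadata] = {}
        self.loads = 0  # how many times the loader was actually called

    def get(self, schema: str) -> Metadata:
        if schema not in self._cache:
            self._cache[schema] = self._loader(schema)
            self.loads += 1
        return self._cache[schema]

    def service_fields(self, schema: str) -> List[Tuple[str, str]]:
        """Traverse every service field F of the metadata with its service type MT."""
        return list(self.get(schema).items())


def load_metadata(schema: str) -> Metadata:
    """Stand-in for a lookup in the big data platform data management center."""
    catalog = {
        "Person": {"name": "PERSON_NAME", "phone": "PHONE_NO"},
        "Order": {"order_no": "ORDER_NO", "buyer_phone": "PHONE_NO"},
    }
    return catalog[schema]


if __name__ == "__main__":
    pool = MetadataCachePool(load_metadata)
    for field_f, mt in pool.service_fields("Order"):
        print(field_f, mt)
    pool.service_fields("Order")  # served from the cache pool
    print(pool.loads)  # 1
```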

As a further description of the above technical solution:

In step S5, all metadata having the same service type are queried in reverse according to the service type (metadata) MT of field F, yielding a metadata list MDL.
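The reverse query can be served by an inverted index from service type MT to the metadata (entity types) that declare a field of that type. The sketch assumes a hypothetical `{Schema: {field F: service type MT}}` catalog layout and represents each MDL entry as a `(Schema, field)` pair; `build_reverse_index` and `metadata_list` are illustrative names, not taken from the source.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Catalog = Dict[str, Dict[str, str]]  # Schema -> {field F: service type MT}


def build_reverse_index(catalog: Catalog) -> Dict[str, List[Tuple[str, str]]]:
    """Map each service type MT to the (Schema, field) pairs that carry it."""
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for schema, fields in catalog.items():
        for field_name, mt in fields.items():
            index[mt].append((schema, field_name))
    return dict(index)


def metadata_list(index: Dict[str, List[Tuple[str, str]]], mt: str) -> List[Tuple[str, str]]:
    """Return MDL: every (Schema, field) that shares service type MT."""
    return index.get(mt, [])


if __name__ == "__main__":
    catalog = {
        "Person": {"name": "PERSON_NAME", "phone": "PHONE_NO"},
        "Order": {"order_no": "ORDER_NO", "buyer_phone": "PHONE_NO"},
    }
    idx = build_reverse_index(catalog)
    print(metadata_list(idx, "PHONE_NO"))  # [('Person', 'phone'), ('Order', 'buyer_phone')]
```

Building the index once per metadata change keeps each reverse query a dictionary lookup instead of a scan over all metadata.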

As a further description of the above technical solution:

The data FD1 of field F in the entity data are collision-compared with the field data FD2 of the same service type in all entities having that service type.

As a further description of the above technical solution:

If FD1 and FD2 are the same, the two entities have a relationship FD1 = FD2 based on field F; an entity relationship edge E is therefore established and stored in the graph database.
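Storing edge E can be sketched with a small in-memory graph, assuming vertices are created before any edge that touches them, that edges are undirected, and that the same edge is stored only once. `GraphStore` and its methods are hypothetical; the `neighbours` query only illustrates the kind of lookup a unified relationship access interface might serve.

```python
from typing import Dict, List, Set, Tuple


class GraphStore:
    """Hypothetical in-memory graph: vertices per entity, undirected labelled edges E."""

    def __init__(self) -> None:
        self.vertices: Dict[str, str] = {}             # global ID -> Schema
        self.edges: Set[Tuple[str, str, str]] = set()  # (gid_a, gid_b, label), gid_a < gid_b

    def add_vertex(self, gid: str, schema: str) -> None:
        self.vertices[gid] = schema

    def add_edge(self, gid_1: str, gid_2: str, label: str) -> bool:
        """Store edge E once; returns False if the same edge already exists."""
        if gid_1 not in self.vertices or gid_2 not in self.vertices:
            raise KeyError("both entities must already exist as vertices")
        a, b = sorted((gid_1, gid_2))  # normalize so direction does not matter
        edge = (a, b, label)
        if edge in self.edges:
            return False
        self.edges.add(edge)
        return True

    def neighbours(self, gid: str) -> List[Tuple[str, str]]:
        """Return (neighbour global ID, label) pairs for one entity."""
        out = [(b, lbl) for a, b, lbl in self.edges if a == gid]
        out += [(a, lbl) for a, b, lbl in self.edges if b == gid]
        return sorted(out)
```

Normalizing the endpoint order makes edge creation idempotent, which matters when the same relationship is found again from the other entity or by a later compensation run.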

As a further description of the above technical solution:

If FD1 and FD2 differ, no processing is performed.

As a further description of the above technical solution:

Whether or not the relationship FD1 = FD2 holds, the collision comparison of FD1 with the same-service-type field data FD2 of all entities having that service type continues, and the traversal ends only after all fields have been traversed.
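The traversal and collision comparison described above (field F, service type MT, metadata list MDL, values FD1 and FD2, edge E) can be combined into one short sketch. It assumes entities are held in memory as `global ID -> (Schema, Data)`, that "the same" means exact equality of non-empty values, and, as a design choice not taken from the source, labels each edge with MT rather than the field name so that a match found from either side yields a single edge. The function name `discover` and the data layout are hypothetical.

```python
from typing import Dict, Set, Tuple

Catalog = Dict[str, Dict[str, str]]               # Schema -> {field F: service type MT}
Entities = Dict[str, Tuple[str, Dict[str, str]]]  # global ID -> (Schema, Data)
Edge = Tuple[str, str, str]                       # (gid_a, gid_b, MT) with gid_a < gid_b


def discover(gid: str, entities: Entities, catalog: Catalog) -> Set[Edge]:
    """Collision-compare every service field of one entity with same-type fields of others."""
    schema, data = entities[gid]
    edges: Set[Edge] = set()
    for field_f, mt in catalog[schema].items():       # traverse all service fields F
        fd1 = data.get(field_f)
        if fd1 in (None, ""):                          # nothing to compare on this field
            continue
        # MDL: every (Schema, field) whose service type equals MT
        mdl = [(s, f) for s, fields in catalog.items() for f, t in fields.items() if t == mt]
        for other_gid, (other_schema, other_data) in entities.items():
            if other_gid == gid:
                continue
            for s, f in mdl:
                if s == other_schema and other_data.get(f) == fd1:  # FD1 == FD2
                    a, b = sorted((gid, other_gid))
                    edges.add((a, b, mt))               # relationship edge E
                # FD1 != FD2: no processing; the traversal simply continues
    return edges
```

With a `Person` whose `phone` and an `Order` whose `buyer_phone` both have service type `PHONE_NO` and hold the same value, `discover` returns one edge between the two entities whichever of them is processed.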

As a further description of the above technical solution:

In step S6, the data compensation task is executed periodically at a set fixed time interval, which is defined according to how timely the data relationships must be in the actual business scenario.
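A minimal sketch of time-period pooling for the compensation task, assuming the fixed interval is expressed in seconds, that pooled records are grouped into windows of that length by arrival time, and that each run processes only windows that have fully elapsed. `CompensationPool`, `add` and `drain_closed` are hypothetical names, and the 300-second interval in the usage is illustrative.

```python
from collections import defaultdict
from typing import Dict, List


class CompensationPool:
    """Groups newly accessed entity IDs by the time window in which they arrived."""

    def __init__(self, period_seconds: float) -> None:
        self.period = period_seconds
        self.buckets: Dict[int, List[str]] = defaultdict(list)

    def add(self, gid: str, arrived_at: float) -> int:
        """Place a record in the window containing its arrival time; return the window key."""
        bucket = int(arrived_at // self.period)
        self.buckets[bucket].append(gid)
        return bucket

    def drain_closed(self, now: float) -> List[List[str]]:
        """Remove and return every window that ended before the current one began."""
        current = int(now // self.period)
        closed = sorted(b for b in self.buckets if b < current)
        return [self.buckets.pop(b) for b in closed]


if __name__ == "__main__":
    pool = CompensationPool(period_seconds=300)  # illustrative 5-minute interval
    pool.add("g1", 10.0)
    pool.add("g2", 290.0)
    pool.add("g3", 310.0)
    print(pool.drain_closed(now=400.0))  # [['g1', 'g2']]
```

A shorter interval gives fresher relationships at the cost of more frequent rediscovery runs, which is the timeliness trade-off the interval is meant to express.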

As a further description of the above technical solution:

The business flow of the data compensation task is the relationship rediscovery flow, whose processing is identical to that of step S5.
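Why compensation is needed can be shown with a small simulation, assuming two entities with matching values arrive concurrently and each is compared only against a snapshot of stored entities that does not yet contain the other. The rediscovery run then repeats the same comparison for the pooled entities against the full store. The matching rule is deliberately reduced to a single shared value per entity, and `match_against` and `compensate` are hypothetical names.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def match_against(gid: str, value: str, snapshot: Dict[str, str]) -> Set[Edge]:
    """Simplified step S5: link gid to every entity in the snapshot holding the same value."""
    edges: Set[Edge] = set()
    for other, other_value in snapshot.items():
        if other != gid and other_value == value:
            a, b = sorted((gid, other))
            edges.add((a, b))
    return edges


def compensate(pooled: Iterable[str], stored: Dict[str, str]) -> Set[Edge]:
    """Rediscovery: rerun the same comparison for pooled entities against the full store."""
    edges: Set[Edge] = set()
    for gid in pooled:
        edges |= match_against(gid, stored[gid], stored)
    return edges


if __name__ == "__main__":
    stored = {"g0": "137"}        # committed before the window
    snapshot = dict(stored)       # both workers read this same snapshot concurrently
    arrivals = {"g1": "138", "g2": "138"}
    online: Set[Edge] = set()
    for gid, value in arrivals.items():
        online |= match_against(gid, value, snapshot)
    stored.update(arrivals)       # both commits land after the comparisons
    print(online)                         # set(): the g1-g2 relationship was missed
    print(compensate(arrivals, stored))   # {('g1', 'g2')}
```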

As a further description of the above technical solution:

The entity relationship automatic discovery system comprises a relationship discovery engine, a big data platform data management center, a metadata and data element cache pool, a relationship compensation engine, an entity database, and a compensation system formed by a graph database. The operation of the relationship discovery engine requires the support of the metadata system, the graph database and the entity database. Metadata define the data standard structure and are composed of individual data elements; a data element is a business type definition describing a data attribute. The graph database stores discovered entity relationships and relationship anchor points, and the entity database stores entity information data.
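The relationship between data elements, metadata and entity data described above can be illustrated with a small data model, assuming a data element is a named business (service) type with a description, and metadata for one entity type is an ordered list of fields, each bound to one data element. The class names and the `validate` helper are hypothetical, not taken from the source.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DataElement:
    """A business (service) type definition describing a data attribute."""
    code: str          # e.g. "PHONE_NO"
    description: str


@dataclass(frozen=True)
class FieldDef:
    """One field of the data standard structure, bound to a data element."""
    name: str
    element: DataElement


@dataclass
class Metadata:
    """Data standard structure for one entity type (Schema), built from data elements."""
    schema: str
    fields: List[FieldDef]

    def service_types(self) -> Dict[str, str]:
        """Field name -> data element code: the view the discovery step traverses."""
        return {f.name: f.element.code for f in self.fields}

    def validate(self, data: Dict[str, object]) -> List[str]:
        """Return the names of fields in the entity data that the metadata does not define."""
        known = {f.name for f in self.fields}
        return sorted(k for k in data if k not in known)
```

Because several metadata definitions can reuse the same `DataElement`, fields in different entity types become comparable whenever they share a data element, which is what makes cross-entity collision comparison possible.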

In summary, due to the adoption of the technical scheme, the beneficial effects of the invention are as follows:

1. Using an entity identification method, a data element identification method, a concurrent distributed relationship discovery method, and a method for discovering and compensating relationships that concurrent discovery may miss in a massive data environment, the invention builds an automatic relationship discovery algorithm and system engine on a graph storage engine and an entity document storage engine, supported by data elements, metadata and a data standard system. This addresses the rapid discovery and establishment of association relationships and their relationship graphs amid a flood of massive data; compared with manual sorting, the error rate is significantly reduced, and the consumption of manpower, material and financial resources can be reduced.

2. In the invention, entity relationships are discovered automatically instead of being arranged manually, which reduces the error rate; because relationships are discovered automatically in a massive data environment, working efficiency is greatly improved.

3. The invention provides a compensation mechanism for omitted data, ensuring the integrity of entity relationships.

Drawings

FIG. 1 is a schematic diagram of an entity relationship automatic discovery method and system according to the present invention;

FIG. 2 is a flow chart of the method and system for automatically discovering entity relationships according to the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Referring to FIGS. 1-2, the present invention provides an automatic entity relationship discovery method comprising the steps described above, which are detailed as follows:

Specifically, in step S5, all the service fields F in the metadata are traversed according to the field definitions in the metadata.

Specifically, in step S5, all metadata having the same service type are queried in reverse according to the service type (metadata) MT of field F, yielding a metadata list MDL.

Specifically, the data FD1 of field F in the entity data are collision-compared with the field data FD2 of the same service type in all entities having that service type.

Specifically, if FD1 and FD2 are the same, the two entities have a relationship FD1 = FD2 based on field F; an entity relationship edge E is therefore established and stored in the graph database.

Specifically, if the collision comparison results of the data FD1 and the field data FD2 of the same service type are different, no processing is performed.

Specifically, whether or not the relationship FD1 = FD2 holds, the collision comparison of FD1 with the same-service-type field data FD2 of all entities having that service type continues, and the traversal ends only after all fields have been traversed.

Specifically, in step S6, the data compensation task is periodically executed according to a set fixed time interval, and the definition of the fixed time interval depends on the requirement of timeliness of the data relationship in the actual service scenario.

Specifically, the business flow of the data compensation task, namely the relationship rediscovery flow, is processed exactly as in step S5.

The entity relationship automatic discovery system comprises a relationship discovery engine, a big data platform data management center, a metadata and data element cache pool, a relationship compensation engine, an entity database, and a compensation system formed by a graph database. The operation of the relationship discovery engine requires the support of the metadata system, the graph database and the entity database. Metadata define the data standard structure and are composed of individual data elements; a data element is a business type definition describing a data attribute. The graph database stores discovered entity relationships and relationship anchor points, and the entity database stores entity information data.

Working principle: in use, data access is performed first. External entity data (Schema, Data) are received and, once a globally unique ID has been added to each entity record, enter the relationship engine's message queue. For entity data storage, data are taken from the message queue and stored in the corresponding entity database according to the entity type (Schema). Data are also taken from the message queue to establish an entity vertex V according to the entity type (Schema), which is stored in the graph database as a vertex, and are stored directly in the compensation data pool.

For relationship discovery, data are taken from the message queue and the metadata definition is obtained according to the entity type (Schema). All service fields F are traversed according to the field definitions in the metadata, and the service type (metadata) MT of each field is obtained. All metadata having the same service type are queried according to the service type of field F, yielding a metadata list MDL. The data FD1 of field F in the entity data are collision-compared with the field data FD2 of the same service type in all entities having that service type. If FD1 and FD2 are the same, the two entities have a relationship FD1 = FD2 based on field F, and an entity relationship edge E is stored in the graph database; if they differ, no processing is performed. Whether or not FD1 = FD2 holds, the traversal continues until all fields have been traversed.

For relationship compensation, when several entity records whose metadata are obtained according to the entity type (Schema) enter at the same time, relationships among the newly entered entities may be missed (blind spots) when the data are stored, so new data are kept in the compensation data pool by time period. The data compensation task is executed periodically at a set fixed time interval, defined according to how timely the data relationships must be in the actual business scenario. This task is the relationship rediscovery flow: starting from obtaining the metadata definition according to the entity type (Schema), its processing is identical to that applied to data taken from the message queue.

The foregoing is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art within the technical scope disclosed by the present invention, according to the technical solution and inventive concept of the present invention, shall be covered by the scope of protection of the present invention.

Claims

1. An automatic entity relationship discovery method, which is characterized by comprising the following steps:

step S5: taking data out of the message queue and, at the same time, extracting data from relationship rediscovery; parsing the extracted entity data (Schema, Data); obtaining metadata from the parsed entity data (Schema, Data) according to the entity type (Schema); and storing the obtained metadata in the metadata and data element cache pool; wherein the entity relationship automatic discovery system comprises a relationship discovery engine, a big data platform data management center, a metadata and data element cache pool, a relationship compensation engine, an entity database, and a compensation system formed by a graph database.

2. The method according to claim 1, wherein in step S5, all the service fields F in the metadata are traversed according to the field definitions in the metadata.

3. The method according to claim 2, wherein in step S5, all metadata having the same service type are queried reversely according to the service type (metadata) MT of the field F to obtain a metadata list MDL.

4. The method for automatically discovering entity relationships according to claim 3, wherein the data FD1 in the entity data corresponding to the field F is compared with the field data FD2 of the same service type of all entities having the same service type.

5. The method of claim 4, wherein the data FD1 are collision-compared with the same-service-type field data FD2 of all entities having the same service type, and if the results are the same, the two entities have a relationship FD1 = FD2 based on field F, so that an entity relationship edge E is established and stored in the graph database.

6. The method for automatically discovering entity relationships according to claim 5, wherein no processing is performed if the collision comparison shows that the data FD1 and the field data FD2 of the same service type are different.

7. The method for automatically discovering entity relationships according to claim 6, wherein the collision comparison of the data FD1 with the field data FD2 of the same service type is performed for all entities having the same service type and, whether or not the relationship FD1 = FD2 exists, the traversal continues until all fields have been traversed.

8. The method according to claim 1, wherein in step S6 the data compensation task is executed periodically at a set fixed time interval, the fixed time interval being defined according to the timeliness required of the data relationships in the actual business scenario.

9. The method of claim 8, wherein the compensation business process of the data compensation task is a relationship rediscovery process, which is exactly the same as the processing method of step S5.

10. An entity relationship automatic discovery system according to any one of claims 1-9, wherein the operation of the relationship discovery engine requires the support of a metadata system, a graph database and an entity database; the metadata define a data standard structure and are composed of individual data elements, each data element being a service type definition describing a data attribute; the graph database stores discovered entity relationships and relationship anchor points; and the entity database stores entity information data.