WO2021218021A1 - Data-based blood relationship analysis method, apparatus, and device and computer-readable storage medium - Google Patents


Info

Publication number
WO2021218021A1
Authority
WO
WIPO (PCT)
Prior art keywords
blood relationship
data
entity object
graph
target
Application number
PCT/CN2020/118135
Other languages
French (fr)
Chinese (zh)
Inventor
黄祥铮 (Huang Xiangzheng)
李钊 (Li Zhao)
万书武 (Wan Shuwu)
李均 (Li Jun)
赵素群 (Zhao Suqun)
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2021218021A1

Classifications

    • Common hierarchy: G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING > G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/2433 Query languages (G06F16/20 Structured data, e.g. relational data > G06F16/24 Querying > G06F16/242 Query formulation)
    • G06F16/23 Updating (G06F16/20 Structured data, e.g. relational data)
    • G06F16/2455 Query execution (G06F16/20 Structured data, e.g. relational data > G06F16/24 Querying > G06F16/245 Query processing)
    • G06F16/288 Entity relationship models (G06F16/20 Structured data, e.g. relational data > G06F16/28 Databases characterised by their database models, e.g. relational or object models > G06F16/284 Relational databases)
    • G06F16/367 Ontology (G06F16/30 Unstructured textual data > G06F16/36 Creation of semantic tools, e.g. ontology or thesauri)

Definitions

  • This application relates to the technical field of knowledge relationship analysis, and in particular to a data blood relationship analysis method, apparatus, device, and computer-readable storage medium.
  • Blood relationship analysis (commonly called data lineage analysis) is a widely used method in data governance. Starting from a given data object, it comprehensively tracks the data processing process to find all related metadata objects and the relationships between them, which makes the fusion processing of data traceable.
  • The inventors found that the data blood relationship analysis tools currently on the market are built on either relational databases or big data platforms. Each can perform blood relationship analysis only on data within a single type of database, and therefore cannot meet the production requirement of analyzing data blood relationships across different types of databases.
  • The main purpose of this application is to provide a data blood relationship analysis method, apparatus, device, and computer-readable storage medium that meet the data blood relationship analysis needs of different types of databases in production practice.
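  • A first aspect of the present application provides a data blood relationship analysis method, which includes the following steps:
  • obtaining the input table and output table of the Structured Query Language (SQL) statement currently executed on a big data platform, and the blood relationship between the input table and the output table;
  • converting the input table and the output table into entity objects under a preset type system, and storing the entity objects in a preset graph database;
  • constructing a blood relationship graph between the entity objects in the graph database according to the blood relationship;
  • receiving the mapping relationship between a business source table and a big data table sent by a data access platform, where the data access platform extracts business data from the business source table of a relational business database, transfers it to the big data table of the big data platform, and records the mapping relationship between the business source table and the big data table while extracting the business data;
  • determining, according to the mapping relationship, the target entity object node to which an ancestor node is to be added in the blood relationship graph; and
  • adding a corresponding ancestor node to the target entity object node to obtain a target blood relationship graph, where the ancestor node represents the entity object converted from the business source table of the target entity object.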
  • A second aspect of the present application provides a data blood relationship analysis apparatus, which includes:
  • an obtaining module, configured to obtain the input table and output table of the Structured Query Language (SQL) statement currently executed on the big data platform, and the blood relationship between the input table and the output table;
  • a conversion module, configured to convert the input table and the output table into entity objects under a preset type system and store the entity objects in a preset graph database;
  • a construction module, configured to construct a blood relationship graph between the entity objects in the graph database according to the blood relationship;
  • a receiving module, configured to receive the mapping relationship between a business source table and a big data table sent by a data access platform, where the data access platform extracts business data from the business source table of a relational business database, transfers it to the big data table of the big data platform, and records the mapping relationship between the business source table and the big data table while extracting the business data;
  • a determining module, configured to determine, according to the mapping relationship, the target entity object node to which an ancestor node is to be added in the blood relationship graph; and
  • an adding module, configured to add a corresponding ancestor node to the target entity object node to obtain a target blood relationship graph, where the ancestor node represents the entity object converted from the business source table of the target entity object.
  • A third aspect of the present application provides a data blood relationship analysis device.
  • The data blood relationship analysis device includes a memory and at least one processor. The memory stores instructions, and the memory and the at least one processor are interconnected through a line. The at least one processor invokes the instructions in the memory so that the data blood relationship analysis device executes the steps of the data blood relationship analysis method of the first aspect.
  • A fourth aspect of the present application provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to execute the steps of the data blood relationship analysis method of the first aspect.
  • This application obtains the input table and output table of the SQL statement currently executed on the big data platform, together with the blood relationship between them; converts the input table and the output table into entity objects under a preset type system and stores the entity objects in a preset graph database; constructs a blood relationship graph between the entity objects in the graph database according to the blood relationship; receives the mapping relationship between the business source table and the big data table sent by the data access platform, which extracts business data from the business source table of the relational business database, transfers it to the big data table of the big data platform, and records the mapping relationship while extracting the business data; determines, according to the mapping relationship, the target entity object node to which an ancestor node is to be added in the blood relationship graph; and adds a corresponding ancestor node to the target entity object node to obtain the target blood relationship graph, where the ancestor node represents the entity object converted from the business source table of the target entity object.
  • The target blood relationship graph is thus generated by combining the business source tables of the relational business database, the big data tables of the big data platform, and the blood relationships between them. This integrates the metadata governance of relational data and big data, meeting the data blood relationship analysis needs of different types of databases in production practice.
  • FIG. 1 is a schematic flowchart of an embodiment of the data blood relationship analysis method of this application;
  • FIG. 2 is a schematic diagram of the communication architecture between the data blood relationship analysis platform and other business platforms in an embodiment of the application;
  • FIG. 3 is a schematic diagram of the blood relationship graph of the big data tables in an embodiment of the application;
  • FIG. 4 is a schematic diagram of updating the blood relationship graph in FIG. 3;
  • FIG. 5 is a schematic diagram of the modules of an embodiment of the data blood relationship analysis apparatus of this application;
  • FIG. 6 is a schematic structural diagram of a data blood relationship analysis device provided by an embodiment of the application.
  • The embodiments of the application provide a data blood relationship analysis method, apparatus, device, and computer-readable storage medium. The target blood relationship graph, generated by combining the business source tables of the relational business database, the big data tables of the big data platform, and the blood relationships between them, integrates the metadata governance of relational data and big data, meeting the data blood relationship analysis needs of different types of databases in production practice.
  • FIG. 1 is a schematic flowchart of an embodiment of the data blood relationship analysis method of the present application. The method includes:
  • Step 101: obtain the input table and output table of the SQL statement currently executed on the big data platform, and the blood relationship between the input table and the output table.
  • In this embodiment, the data blood relationship analysis method is applied to a server on which a data blood relationship analysis platform is deployed.
  • Figure 2 is a schematic diagram of the communication architecture between the data blood relationship analysis platform and other business platforms in an embodiment of the application.
  • The communication architecture includes a data blood relationship analysis platform, a data access platform, a big data platform, and a relational business database, where:
  • the data access platform extracts business data from the relational business database and transfers it to the big data platform. At the same time, it records the mapping relationship between the business source table and the big data table, stores the mapping relationship in its supporting database, and periodically synchronizes it to the data blood relationship analysis platform;
  • the big data platform obtains the flow relationships between the big data tables on the platform from the Structured Query Language (SQL) statements currently executed on it, and sends them to the data blood relationship analysis platform;
  • the data blood relationship analysis platform generates a blood relationship graph from the mapping relationships between business source tables and big data tables and the flow relationships between big data tables, and displays the data blood relationships visually.
  • Relational databases are widely used in enterprise production practice.
  • The relational database and big data platform in this embodiment are chosen according to actual business needs. For example, the relational database can be MySQL, Oracle, SQL Server, PostgreSQL, or another relational database, and the big data platform can be Hadoop, Spark, Storm, or another big data platform.
  • Taking Hadoop as an example, the server obtains the input table and output table of the SQL statement executed on the Hadoop big data platform, as well as the blood relationship between the input table and the output table.
  • The input table is the source table read when the SQL statement is executed, and the output table is the target table written by the SQL statement; the blood relationship between the input table and the output table can be obtained by parsing the SQL statement.
  • Specifically, step 101 may include: monitoring, through a preset hook program, the SQL statement currently executed on the big data platform; and parsing the monitored SQL statement to obtain its input table, output table, and the blood relationship between the input table and the output table.
  • A hook program can be set up on the server in advance to monitor the SQL statements currently executed on the big data platform.
  • The server parses each captured SQL statement into two data sets, “Input” and “Output”, and obtains from them the input table, the output table, and the blood relationship between the input table and the output table.
  • For example, if the hook program captures the SQL statement “insert overwrite table T1 select*from T2” (overwrite table T1 with the data in table T2) currently being executed on the big data platform, a preset syntax parser and lexical parser can parse it into: input table T2, output table T1, and T2 is the source table of T1.
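  • As a rough illustration of the parsing in step 101, the sketch below handles only statements of the form `insert overwrite|into table <out> select ... from <in> [join <in2> ...]`. It rests on loud assumptions: a regular expression stands in for the preset syntax and lexical parsers, the names `parse_lineage` and `Lineage` are hypothetical, and subqueries, CTEs, and SQL comments are not handled.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Lineage:
    """Input tables, output table, and the source -> target edges of one statement."""
    inputs: tuple[str, ...]
    output: str

    def edges(self) -> list[tuple[str, str]]:
        # Each input table is a source (ancestor) of the output table.
        return [(src, self.output) for src in self.inputs]


# Simplified patterns (assumption); a real parser would build a full syntax tree.
_OUTPUT_RE = re.compile(r"insert\s+(?:overwrite|into)\s+table\s+([\w.]+)", re.IGNORECASE)
_INPUT_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def parse_lineage(sql: str) -> Lineage:
    """Hypothetical helper: extract input/output tables from an INSERT ... SELECT."""
    out_match = _OUTPUT_RE.search(sql)
    if out_match is None:
        raise ValueError("not an INSERT ... SELECT statement")
    # dict.fromkeys drops duplicate table names while keeping first-seen order.
    inputs = tuple(dict.fromkeys(_INPUT_RE.findall(sql)))
    return Lineage(inputs=inputs, output=out_match.group(1))


if __name__ == "__main__":
    result = parse_lineage("insert overwrite table T1 select*from T2")
    print(result.inputs, result.output, result.edges())
```

  • Running it prints `('T2',) T1 [('T2', 'T1')]`, i.e. T2 is the source table of T1, matching the example above.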
  • Step 102: convert the input table and the output table into entity objects under a preset type system, and store the entity objects in a preset graph database.
  • A type system defines how values and expressions in a programming language are classified into different types, how those types can be manipulated, and how they interact with one another.
  • A graph database is a non-relational database that uses graph theory to store relationship information between entities.
  • The server converts the input table and output table into entity objects under the preset type system and stores the entity objects in the preset graph database.
  • In this embodiment, the graph database may be JanusGraph, which is mainly composed of two parts:
  • HBase, a distributed, column-oriented, high-performance, non-relational database that supports real-time reading and writing. HBase stores, in real time, the specific entity objects generated by the type system and the blood relationships between them;
  • Elasticsearch, a distributed, scalable, real-time search and analysis engine. Elasticsearch indexes the entity objects in HBase so that the entity objects and their blood relationships can be retrieved quickly in real time.
  • Specifically, the server can store the entity objects in HBase.
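  • The conversion in step 102 can be pictured with the minimal sketch below. Everything in it is an assumption for illustration: the type name `big_data_table`, the `<platform>.<table>` qualified-name format, and the `InMemoryGraphStore` class are hypothetical, and the dictionary merely stands in for the graph database; no JanusGraph or HBase API is used or implied.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EntityType:
    """A type definition in the (hypothetical) preset type system."""
    name: str
    attributes: tuple[str, ...]


@dataclass(frozen=True)
class Entity:
    """An entity object: an instance of an EntityType with a unique qualified name."""
    type_name: str
    qualified_name: str
    attributes: tuple[tuple[str, str], ...] = ()


# Hypothetical type for tables that live on the big data platform.
BIG_DATA_TABLE = EntityType("big_data_table", ("platform", "table"))


def to_entity(table: str, platform: str = "hadoop") -> Entity:
    """Convert a table name into an entity object of type BIG_DATA_TABLE."""
    values = {"platform": platform, "table": table}
    return Entity(
        type_name=BIG_DATA_TABLE.name,
        qualified_name=f"{platform}.{table}",
        attributes=tuple((a, values[a]) for a in BIG_DATA_TABLE.attributes),
    )


@dataclass
class InMemoryGraphStore:
    """Stand-in for the preset graph database: vertices keyed by qualified name."""
    vertices: dict[str, Entity] = field(default_factory=dict)

    def upsert(self, entity: Entity) -> None:
        # Storing the same table again overwrites it, so repeated SQL runs are idempotent.
        self.vertices[entity.qualified_name] = entity


store = InMemoryGraphStore()
for name in ("T2", "T1"):  # input table T2 and output table T1 from step 101
    store.upsert(to_entity(name))
print(sorted(store.vertices))
```

  • The last line prints `['hadoop.T1', 'hadoop.T2']`. Keying vertices by a qualified name is one way to make "the same table" map to a single entity object across many SQL statements.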
  • Step 103: construct a blood relationship graph between the entity objects in the graph database according to the blood relationship.
  • The server constructs the blood relationship graph between the entity objects in the graph database according to the blood relationship between the input table and the output table.
  • Specifically, step 103 may include: invoking a preset graph processing engine and creating, through it, entity object nodes in the graph database in one-to-one correspondence with the input tables and output tables; and adding directed edges between the created entity object nodes according to the blood relationship to generate the blood relationship graph between the entity objects.
  • The graph processing engine can be Graph Engine, a memory-based, distributed, large-scale graph data processing engine.
  • Through Graph Engine, entity object nodes corresponding to the input tables and output tables can be created in the graph database; directed edges are then added between the created nodes according to the blood relationships between the tables, producing a visual blood relationship graph of the big data tables.
  • An example of the constructed blood relationship graph is shown in FIG. 3, a schematic diagram of the blood relationship graph of the big data tables in an embodiment of the application.
  • In the figure, the ancestor node of table “tmp1_org_info” is table “tmp2_org_info” and its descendant node is table “test_org_info”, so the entire blood relationship is clear at a glance.
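  • The node-and-edge construction of step 103, and the ancestor/descendant reading of FIG. 3, reduce to the adjacency-list sketch below. It is an illustration under assumed names (`LineageGraph`, `add_lineage`) and does not use Graph Engine or any graph database API; an edge points from a source (input) table to the target (output) table.

```python
from collections import defaultdict


class LineageGraph:
    """Directed blood relationship graph: an edge u -> v means v is derived from u."""

    def __init__(self) -> None:
        self.nodes: set[str] = set()
        self.children: defaultdict[str, set[str]] = defaultdict(set)  # downstream links
        self.parents: defaultdict[str, set[str]] = defaultdict(set)   # upstream links

    def add_lineage(self, inputs: list[str], output: str) -> None:
        """Create one node per input/output table, then add input -> output edges."""
        self.nodes.update(inputs)
        self.nodes.add(output)
        for src in inputs:
            self.children[src].add(output)
            self.parents[output].add(src)

    def _walk(self, start: str, links: dict[str, set[str]]) -> set[str]:
        # Iterative depth-first search; `seen` also guards against cycles.
        seen: set[str] = set()
        stack = list(links.get(start, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(links.get(node, ()))
        return seen

    def ancestors(self, table: str) -> set[str]:
        return self._walk(table, self.parents)

    def descendants(self, table: str) -> set[str]:
        return self._walk(table, self.children)


graph = LineageGraph()
# Two parsed statements reproducing the chain described for FIG. 3.
graph.add_lineage(["tmp2_org_info"], "tmp1_org_info")
graph.add_lineage(["tmp1_org_info"], "test_org_info")
print(graph.ancestors("tmp1_org_info"))    # {'tmp2_org_info'}
print(graph.descendants("tmp1_org_info"))  # {'test_org_info'}
```

  • Keeping both `parents` and `children` maps doubles the edge storage but makes upstream tracing and downstream impact analysis equally cheap.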
  • Step 104: receive the mapping relationship between the business source table and the big data table sent by the data access platform, where the data access platform extracts business data from the business source table of the relational business database, transfers it to the big data table of the big data platform, and records the mapping relationship between the business source table and the big data table while extracting the business data.
  • When the data access platform extracts business data from the business source table of the relational business database and transfers it to the big data table of the big data platform, it records the mapping relationship between the two tables and periodically synchronizes it to the data blood relationship analysis platform. The data blood relationship analysis platform receives this mapping relationship, which is the prerequisite for subsequently generating the target blood relationship graph.
  • Step 105: according to the mapping relationship, determine the target entity object node to which an ancestor node is to be added in the blood relationship graph.
  • According to the mapping relationship between the business source table and the big data table, the server determines, in the blood relationship graph generated above, the target entity object node to which an ancestor node is to be added.
  • Specifically, step 105 may include: obtaining the table name of the big data table in the mapping relationship; determining whether the blood relationship graph contains an entity object node corresponding to that table name; and if it does, determining that entity object node as the target entity object node to which an ancestor node is to be added.
  • The server obtains the table name of the big data table from the mapping relationship and checks whether the blood relationship graph contains an entity object node with that name. If it does, the table data of that node comes from a business source table of the relational business database, so the node is determined as the target entity object node to which an ancestor node is to be added. If it does not, the blood relationship graph is used directly as the target blood relationship graph.
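  • Step 105 amounts to looking up each mapped big data table name among the graph's nodes. The sketch below assumes the mapping arrives as `(business source table, big data table)` pairs; the function name `find_target_nodes` and the example tables `crm.t_org`, `hr.t_staff`, and `ods_staff_info` are hypothetical.

```python
from collections import defaultdict


def find_target_nodes(
    mapping: list[tuple[str, str]], graph_nodes: set[str]
) -> dict[str, list[str]]:
    """Return {target entity object node: [business source tables feeding it]}.

    Big data tables with no node in the graph are skipped; if nothing matches,
    the existing blood relationship graph is used as the target graph unchanged.
    """
    targets: defaultdict[str, list[str]] = defaultdict(list)
    for source_table, big_data_table in mapping:
        if big_data_table in graph_nodes:  # this node's data comes from a business table
            targets[big_data_table].append(source_table)
    return dict(targets)


graph_nodes = {"tmp2_org_info", "tmp1_org_info", "test_org_info"}
# Hypothetical mapping records; only the first refers to a table in the graph.
mapping = [("crm.t_org", "tmp2_org_info"), ("hr.t_staff", "ods_staff_info")]
print(find_target_nodes(mapping, graph_nodes))  # {'tmp2_org_info': ['crm.t_org']}
```

  • Returning a list per node allows for a big data table that is loaded from more than one business source table.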
  • Step 106: add a corresponding ancestor node to the target entity object node to obtain the target blood relationship graph, where the ancestor node represents the entity object converted from the business source table of the target entity object.
  • The server obtains the business source table corresponding to the target entity object node, converts it into an entity object under the preset type system, and adds that entity object as the ancestor node of the target entity object node. This yields the target blood relationship graph and completes the blood relationship link from the relational business database tables to the big data tables.
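  • A minimal sketch of step 106, assuming the graph is held as a `parents` map (node to set of upstream nodes) plus a `node_types` map. The type label `relational_source_table`, the helper names, and the business source table `crm.t_org` are hypothetical; the point is only that the business source table becomes a new root above the target node.

```python
def add_ancestor_nodes(
    parents: dict[str, set[str]],
    node_types: dict[str, str],
    targets: dict[str, list[str]],
) -> None:
    """Attach each business source table as an ancestor of its target node, in place."""
    for target, source_tables in targets.items():
        for source in source_tables:
            # The business source table becomes an entity of a separate, hypothetical
            # type so relational and big data nodes remain distinguishable.
            node_types.setdefault(source, "relational_source_table")
            parents.setdefault(target, set()).add(source)


def upstream_chain(parents: dict[str, set[str]], table: str) -> list[str]:
    """Walk upward along the first (sorted) parent until reaching a root."""
    chain = [table]
    while True:
        ups = sorted(parents.get(chain[-1], ()))
        if not ups or ups[0] in chain:  # stop at a root, or defensively on a cycle
            return chain
        chain.append(ups[0])


parents = {"tmp1_org_info": {"tmp2_org_info"}, "test_org_info": {"tmp1_org_info"}}
node_types = {t: "big_data_table" for t in ("tmp2_org_info", "tmp1_org_info", "test_org_info")}
add_ancestor_nodes(parents, node_types, {"tmp2_org_info": ["crm.t_org"]})
print(" -> ".join(reversed(upstream_chain(parents, "test_org_info"))))
```

  • This prints `crm.t_org -> tmp2_org_info -> tmp1_org_info -> test_org_info`, a link that now starts in the relational business database and ends in a big data table.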
  • In this way, the target blood relationship graph combines the business source tables of the relational business database, the big data tables of the big data platform, and the blood relationships between them, integrating the metadata governance of relational data and big data and meeting the data blood relationship analysis needs of different types of databases in production practice.
  • After step 106, the method may further include:
  • The server can receive a user-triggered selection instruction that selects the entity object node to be analyzed in the target blood relationship graph; alternatively, the server can use a preset entity object node as the entity object node to be analyzed. Here, analysis refers to analyzing the business that the entity object node is involved in.
  • The server can read a preset business configuration file to obtain the business associated with the entity object node to be analyzed.
  • The server can periodically count the number of chains in the blood relationship chain of the entity object node to be analyzed. This number reflects how often the node is referenced: the more chains, the more popular the business the node is involved in.
  • The number of chains is compared with a first preset threshold and a second preset threshold, where the first preset threshold is greater than the second. When the number of chains is greater than or equal to the first preset threshold, the business associated with the entity object node to be analyzed is marked as a hot business; when it is less than or equal to the second preset threshold, that business is marked as an unpopular business.
  • Specifically, the server compares the obtained number of chains with the first preset threshold and the second preset threshold.
  • If the number of chains is greater than or equal to the first preset threshold, the entity object node to be analyzed is frequently referenced and its associated business is relatively popular, so the server marks that business as a hot business.
  • If the number of chains is less than or equal to the second preset threshold, the server marks the business associated with the entity object node to be analyzed as an unpopular business.
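  • The text does not define exactly how the "number of chains" is counted. The sketch below adopts one possible reading as an explicit assumption: the number of distinct root-to-leaf lineage paths that pass through the node, i.e. (paths from any root down to the node) times (paths from the node down to any leaf), on an acyclic graph. The threshold values, the function names, and the `"unmarked"` label for the in-between case are hypothetical.

```python
from functools import lru_cache


def count_chains(children: dict[str, list[str]], node: str) -> int:
    """Count distinct root-to-leaf lineage paths through `node` (assumes a DAG)."""
    parents: dict[str, list[str]] = {}
    for u, vs in children.items():
        for v in vs:
            parents.setdefault(v, []).append(u)

    @lru_cache(maxsize=None)
    def paths_up(n: str) -> int:
        ups = parents.get(n, [])
        return 1 if not ups else sum(paths_up(p) for p in ups)

    @lru_cache(maxsize=None)
    def paths_down(n: str) -> int:
        downs = children.get(n, [])
        return 1 if not downs else sum(paths_down(c) for c in downs)

    return paths_up(node) * paths_down(node)


def classify(chains: int, hot_threshold: int, cold_threshold: int) -> str:
    """Mark a business using the two preset thresholds (first > second)."""
    if hot_threshold <= cold_threshold:
        raise ValueError("the first preset threshold must exceed the second")
    if chains >= hot_threshold:
        return "hot"
    if chains <= cold_threshold:
        return "unpopular"
    return "unmarked"


# Small example graph in which test_org_info has two upstream chains.
children = {
    "tmp2_org_info": ["tmp1_org_info"],
    "tmp1_org_info": ["test_org_info"],
    "delta1_org_info": ["test_org_info"],
}
for node in ("test_org_info", "tmp1_org_info"):
    n = count_chains(children, node)
    print(node, n, classify(n, hot_threshold=2, cold_threshold=1))
```

  • With these hypothetical thresholds it prints `test_org_info 2 hot` and then `tmp1_org_info 1 unpopular`. Memoizing the path counts keeps the computation linear in the number of edges instead of enumerating every path.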
  • The server can also send the marked hot and unpopular businesses to a front-end page for display. Maintenance of and attention to hot businesses can be strengthened in production, while unpopular businesses may need improvement.
  • This achieves popularity analysis of the businesses associated with the entity object nodes in the target blood relationship graph, helping managers understand which businesses of a department are hot or cold and adjust the department's production plan in time.
  • After step 106, the method may also include: receiving, through a preset user interaction page, a query instruction based on the target blood relationship graph; and, according to the query instruction, sending the target blood relationship graph to the user interaction page for visual display.
  • The data blood relationship analysis platform can provide a user interaction page and an open application programming interface (API) that offer real-time query and search services to managers or other external systems.
  • The server may receive a query instruction based on the target blood relationship graph through the preset user interaction page and, according to the query instruction, send the target blood relationship graph to the user interaction page for visual display.
  • After the step of sending the target blood relationship graph to the user interaction page for visual display according to the query instruction, the method may further include:
  • receiving, at a preset receiving frequency, the mapping relationship between the business source table and the big data table sent by the data access platform; determining whether the mapping relationship has been updated, and detecting whether a new SQL statement has been executed on the big data platform; and if the mapping relationship has been updated or a new SQL statement has been executed on the big data platform, updating the target blood relationship graph accordingly.
  • In other words, the server receives the mapping relationship from the data access platform at the preset receiving frequency and checks whether it has changed, while also detecting whether any new SQL statement has been executed on the big data platform. If either is the case, the target blood relationship graph is updated accordingly.
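  • The per-interval update check can be sketched as follows. Assumptions, labeled as such: the mapping batch is a list of `(business source table, big data table)` pairs, change detection uses an order-independent SHA-256 digest of that batch, and the hook is assumed to report an identifier per executed statement (since identical SQL text may legitimately run many times). The class and function names are hypothetical; the scheduler that calls `needs_update` at the preset receiving frequency is omitted.

```python
import hashlib
import json
from typing import Optional


def fingerprint(mapping: list[tuple[str, str]]) -> str:
    """Order-independent SHA-256 digest of one batch of mapping records."""
    canonical = json.dumps(sorted(mapping))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class UpdateDetector:
    """Per polling interval, decide whether the target graph needs refreshing."""

    def __init__(self) -> None:
        self._last_digest: Optional[str] = None
        self._seen_statements: set[str] = set()

    def needs_update(self, mapping: list[tuple[str, str]], statement_ids: list[str]) -> bool:
        digest = fingerprint(mapping)
        mapping_changed = digest != self._last_digest
        has_new_sql = any(s not in self._seen_statements for s in statement_ids)
        # Remember this poll's state for the next comparison.
        self._last_digest = digest
        self._seen_statements.update(statement_ids)
        return mapping_changed or has_new_sql


detector = UpdateDetector()
mapping = [("crm.t_org", "tmp2_org_info")]
print(detector.needs_update(mapping, []))            # True: first poll, nothing cached yet
print(detector.needs_update(mapping, []))            # False: nothing changed
print(detector.needs_update(mapping, ["query-42"]))  # True: a new SQL statement ran
```

  • Comparing a digest rather than the full previous mapping keeps the state per poll constant in size; a real system might instead compare versions or timestamps if the data access platform provides them.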
  • FIG. 4 is a schematic diagram of updating the blood relationship graph in FIG. 3.
  • After the new SQL statement “insert overwrite table test_org_info select*from delta1_org_info” (overwrite table “test_org_info” with the data in table “delta1_org_info”) is executed, table “test_org_info” has two blood relationship chains, which converge at the “test_org_info” node.
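  • The FIG. 4 update can be reproduced with the small sketch below, which adds the edge the new statement contributes (`delta1_org_info` to `test_org_info`) and enumerates every upstream chain of `test_org_info`. The helper names are hypothetical. Following the description, the existing edge from `tmp1_org_info` is kept, so the two chains converge; whether "insert overwrite" should instead retire the old edge is a design choice the text does not address.

```python
def add_edge(parents: dict[str, set[str]], source: str, target: str) -> None:
    """Record that `target` is derived from `source` (existing edges are kept)."""
    parents.setdefault(target, set()).add(source)


def upstream_paths(parents: dict[str, set[str]], node: str) -> list[list[str]]:
    """Every chain from a root table down to `node`; assumes the graph is acyclic."""
    ups = sorted(parents.get(node, ()))
    if not ups:
        return [[node]]
    return [path + [node] for p in ups for path in upstream_paths(parents, p)]


# Graph of FIG. 3: tmp2_org_info -> tmp1_org_info -> test_org_info.
parents = {"tmp1_org_info": {"tmp2_org_info"}, "test_org_info": {"tmp1_org_info"}}
print(len(upstream_paths(parents, "test_org_info")))  # 1

# New statement: insert overwrite table test_org_info select*from delta1_org_info
add_edge(parents, "delta1_org_info", "test_org_info")
for chain in upstream_paths(parents, "test_org_info"):
    print(" -> ".join(chain))
```

  • After the update the loop prints `delta1_org_info -> test_org_info` and `tmp2_org_info -> tmp1_org_info -> test_org_info`: two chains meeting at the `test_org_info` node.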
  • This realizes real-time updating of the target blood relationship graph, supporting accurate business traceability and impact analysis.
  • An embodiment of the present application also provides a data blood relationship analysis apparatus.
  • FIG. 5 is a schematic diagram of the modules of an embodiment of the data blood relationship analysis apparatus of the present application.
  • The data blood relationship analysis apparatus includes:
  • the obtaining module 501 is configured to obtain the input table and the output table of the structured query language SQL statement currently executed on the big data platform, and the blood relationship between the input table and the output table;
  • the conversion module 502 is configured to convert the input table and the output table into entity objects under a preset type system, and store the entity objects in a preset graph database;
  • the construction module 503 is configured to construct a blood relationship graph between the entity objects in the graph database according to the blood relationship;
  • the receiving module 504 is configured to receive the mapping relationship between the business source table and the big data table sent by the data access platform, where the data access platform extracts business data from the business source table of the relational business database, transfers it to the big data table of the big data platform, and records the mapping relationship between the business source table and the big data table while extracting the business data;
  • the determining module 505 is configured to determine the target entity object node to which the ancestor node is to be added in the blood relationship graph according to the mapping relationship;
  • the adding module 506 is configured to add a corresponding ancestor node to the target entity object node to obtain a target blood relationship graph, where the ancestor node is used to represent the entity object transformed from the business source table of the target entity object.
  • The obtaining module 501 is further configured to:
  • monitor, through a preset hook program, the SQL statement currently executed on the big data platform, and parse the monitored SQL statement to obtain the input table and output table of the SQL statement and the blood relationship between the input table and the output table.
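  • The construction module 503 is further configured to:
  • invoke a preset graph processing engine to create, in the graph database, entity object nodes in one-to-one correspondence with the input tables and output tables, and add directed edges between the created entity object nodes according to the blood relationship to generate the blood relationship graph between the entity objects.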
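  • The determining module 505 is further configured to:
  • obtain the table name of the big data table in the mapping relationship, determine whether the blood relationship graph contains an entity object node corresponding to that table name, and if so, determine that entity object node as the target entity object node to which an ancestor node is to be added.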
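  • The data blood relationship analysis apparatus further includes a business marking module, configured to:
  • determine the entity object node to be analyzed in the target blood relationship graph, read a preset business configuration file to obtain the business associated with that node, periodically count the number of chains in the node's blood relationship chain, and compare the number of chains with a first preset threshold and a second preset threshold (the first being greater than the second); mark the associated business as a hot business when the number of chains is greater than or equal to the first preset threshold, and mark it as an unpopular business when the number of chains is less than or equal to the second preset threshold.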
  • The data blood relationship analysis apparatus further includes a query module, configured to:
  • receive, through a preset user interaction page, a query instruction based on the target blood relationship graph, and send the target blood relationship graph to the user interaction page for visual display according to the query instruction.
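  • The data blood relationship analysis apparatus further includes an update module, configured to:
  • receive, at a preset receiving frequency, the mapping relationship between the business source table and the big data table sent by the data access platform, determine whether the mapping relationship has been updated, detect whether a new SQL statement has been executed on the big data platform, and if either is the case, update the target blood relationship graph accordingly.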
  • The data blood relationship analysis apparatus in the embodiments of the present application is described in detail above from the perspective of modular functional entities; the data blood relationship analysis device in the embodiments is described in detail below from the perspective of hardware processing.
  • FIG. 6 is a schematic structural diagram of a data blood relationship analysis device provided by an embodiment of the application.
  • the data blood relationship analysis device 600 may have relatively large differences due to different configurations or performances, and may include one or more processors (central processing units, CPU) 610 (for example, one or more processors) and a memory 620. Or more than one storage medium 630 (for example, one or one storage device with a large amount of storage) storing application programs 533 or data 632. Among them, the memory 620 and the storage medium 630 may be short-term storage or persistent storage.
  • the program stored in the storage medium 530 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the data blood relationship analysis device 600. Further, the processor 610 may be configured to communicate with the storage medium 630, and execute a series of instruction operations in the storage medium 630 on the data blood relationship analysis device 600.
  • the data blood relationship analysis device 600 may also include one or more power supplies 640, one or more wired or wireless network interfaces 650, one or more input/output interfaces 660, and/or one or more operating systems 631, such as Windows Server, Mac OS X, Unix, Linux, and FreeBSD.
  • the structure shown in FIG. 6 does not constitute a limitation on the data blood relationship analysis device, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
  • the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium.
  • the computer-readable storage medium stores a data blood relationship analysis program which, when executed by a processor, implements the steps of the data blood relationship analysis method described above.
  • if the aforementioned integrated modules or units are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • the technical solution of this application in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, can be embodied as a software product stored in a storage medium and including several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of this application.
  • the aforementioned storage media include any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Abstract

The present application relates to the technical field of big data, and discloses a data-based blood relationship analysis method, apparatus, device, and computer-readable storage medium, intended to meet the need in production practice for data blood relationship analysis across different types of databases. The method comprises: acquiring the input table and the output table of a structured query language (SQL) statement currently being executed on a big data platform, and the blood relationship between the input table and the output table; converting the input table and the output table each into an entity object under a preset type system, and storing the entity objects in a preset graph database; constructing, in the graph database and according to the blood relationship, a blood relationship graph between the entity objects; receiving, from a data access platform, a mapping relationship between a business source table and a big data table; determining, in the blood relationship graph and according to the mapping relationship, a target entity object node to which an ancestor node is to be added; and adding the corresponding ancestor node to the target entity object node to obtain a target blood relationship graph.

Description

Data blood relationship analysis method, apparatus, device, and computer-readable storage medium
This application claims priority to Chinese patent application No. 202010350107.4, filed with the Chinese Patent Office on April 28, 2020 and entitled "Data blood relationship analysis method, apparatus, device, and computer-readable storage medium", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the technical field of knowledge relationship analysis, and in particular to a data blood relationship analysis method, apparatus, device, and computer-readable storage medium.
Background
With the rapid development of Internet technology, massive amounts of business data are generated every day. As data volumes keep growing, data governance has become an increasingly important concern for large companies. Now that big data feeds their day-to-day operational analysis and decision-making systems, accurately tracing changed data back to its source and analyzing the impact of such changes have become important problems.
Blood relationship analysis (also called data lineage analysis) is a common technique in data governance. By comprehensively tracking how data is processed, it finds all metadata objects related to a given starting data object, together with the relationships among those objects, which makes data fusion processing traceable. The inventors found that the data blood relationship analysis tools currently on the market are built either on relational databases or on big data platforms. Each such tool can analyze only the data in a single type of database, so none of them meets the need in production practice to analyze data blood relationships across different types of databases.
Summary
The main purpose of this application is to provide a data blood relationship analysis method, apparatus, device, and computer-readable storage medium that meet the need in production practice for data blood relationship analysis across different types of databases.
A first aspect of this application provides a data blood relationship analysis method, which includes the following steps:
acquiring the input table and the output table of a structured query language (SQL) statement currently being executed on a big data platform, and the blood relationship between the input table and the output table;
converting the input table and the output table each into an entity object under a preset type system, and storing the entity objects in a preset graph database;
constructing, in the graph database and according to the blood relationship, a blood relationship graph between the entity objects;
receiving a mapping relationship, sent by a data access platform, between a business source table and a big data table, where the data access platform extracts business data from the business source tables of a relational business database, transfers it into big data tables on the big data platform, and records the mapping relationship between each business source table and big data table during extraction;
determining, in the blood relationship graph and according to the mapping relationship, a target entity object node to which an ancestor node is to be added; and
adding a corresponding ancestor node to the target entity object node to obtain a target blood relationship graph, where the ancestor node represents the entity object converted from the business source table of the target entity object.
A second aspect of this application provides a data blood relationship analysis apparatus, which includes:
an acquisition module, configured to acquire the input table and the output table of an SQL statement currently being executed on a big data platform, and the blood relationship between the input table and the output table;
a conversion module, configured to convert the input table and the output table each into an entity object under a preset type system, and to store the entity objects in a preset graph database;
a construction module, configured to construct, in the graph database and according to the blood relationship, a blood relationship graph between the entity objects;
a receiving module, configured to receive a mapping relationship, sent by a data access platform, between a business source table and a big data table, where the data access platform extracts business data from the business source tables of a relational business database, transfers it into big data tables on the big data platform, and records the mapping relationship between each business source table and big data table during extraction;
a determining module, configured to determine, in the blood relationship graph and according to the mapping relationship, a target entity object node to which an ancestor node is to be added; and
an adding module, configured to add a corresponding ancestor node to the target entity object node to obtain a target blood relationship graph, where the ancestor node represents the entity object converted from the business source table of the target entity object.
A third aspect of this application provides a data blood relationship analysis device, which includes a memory and at least one processor. The memory stores instructions, and the memory and the at least one processor are interconnected by lines. The at least one processor invokes the instructions in the memory so that the data blood relationship analysis device performs the following steps of the data blood relationship analysis method:
acquiring the input table and the output table of an SQL statement currently being executed on a big data platform, and the blood relationship between the input table and the output table;
converting the input table and the output table each into an entity object under a preset type system, and storing the entity objects in a preset graph database;
constructing, in the graph database and according to the blood relationship, a blood relationship graph between the entity objects;
receiving a mapping relationship, sent by a data access platform, between a business source table and a big data table, where the data access platform extracts business data from the business source tables of a relational business database, transfers it into big data tables on the big data platform, and records the mapping relationship between each business source table and big data table during extraction;
determining, in the blood relationship graph and according to the mapping relationship, a target entity object node to which an ancestor node is to be added; and
adding a corresponding ancestor node to the target entity object node to obtain a target blood relationship graph, where the ancestor node represents the entity object converted from the business source table of the target entity object.
A fourth aspect of this application provides a computer-readable storage medium storing instructions which, when run on a computer, cause the computer to perform the following steps of the data blood relationship analysis method:
acquiring the input table and the output table of an SQL statement currently being executed on a big data platform, and the blood relationship between the input table and the output table;
converting the input table and the output table each into an entity object under a preset type system, and storing the entity objects in a preset graph database;
constructing, in the graph database and according to the blood relationship, a blood relationship graph between the entity objects;
receiving a mapping relationship, sent by a data access platform, between a business source table and a big data table, where the data access platform extracts business data from the business source tables of a relational business database, transfers it into big data tables on the big data platform, and records the mapping relationship between each business source table and big data table during extraction;
determining, in the blood relationship graph and according to the mapping relationship, a target entity object node to which an ancestor node is to be added; and
adding a corresponding ancestor node to the target entity object node to obtain a target blood relationship graph, where the ancestor node represents the entity object converted from the business source table of the target entity object.
In this application, the input table and the output table of an SQL statement currently being executed on a big data platform are acquired, together with the blood relationship between them. The input table and the output table are each converted into an entity object under a preset type system and stored in a preset graph database, and a blood relationship graph between the entity objects is constructed in the graph database according to the blood relationship. A mapping relationship between a business source table and a big data table is then received from a data access platform, which extracts business data from the business source tables of a relational business database into big data tables on the big data platform and records that mapping during extraction. According to the mapping relationship, a target entity object node to which an ancestor node is to be added is determined in the blood relationship graph, and a corresponding ancestor node, representing the entity object converted from the business source table of the target entity object, is added to that node to obtain a target blood relationship graph.
By combining the business source tables of the relational business database, the big data tables of the big data platform, and the blood relationships between them into a single target blood relationship graph, this approach integrates metadata governance for relational data and big data, meeting the need in production practice for data blood relationship analysis across different types of databases.
Brief description of the drawings
FIG. 1 is a schematic flowchart of an embodiment of the data blood relationship analysis method of this application;
FIG. 2 is a schematic diagram of the communication architecture between the data blood relationship analysis platform and other business platforms in an embodiment of this application;
FIG. 3 is a schematic diagram of the blood relationship graph of big data tables in an embodiment of this application;
FIG. 4 is a schematic diagram of updating the blood relationship graph in FIG. 3;
FIG. 5 is a schematic module diagram of an embodiment of the data blood relationship analysis apparatus of this application;
FIG. 6 is a schematic structural diagram of a data blood relationship analysis device provided by an embodiment of this application.
Detailed description
The embodiments of this application provide a data blood relationship analysis method, apparatus, device, and computer-readable storage medium. A target blood relationship graph is generated by combining the business source tables of a relational business database, the big data tables of a big data platform, and the blood relationships between them, which integrates metadata governance for relational data and big data and meets the need in production practice for data blood relationship analysis across different types of databases.
The terms "first", "second", "third", "fourth", and so on (if any) in the description, claims, and drawings of this application are used to distinguish similar objects and do not necessarily describe a particular order or sequence. Data so labeled are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion: a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units explicitly listed, and may include other steps or units that are not explicitly listed or that are inherent to such a process, method, product, or device.
For ease of understanding, the specific flow of an embodiment of the data blood relationship analysis method of this application is described below.
Referring to FIG. 1, which is a schematic flowchart of an embodiment of the data blood relationship analysis method of this application, the method includes:
Step 101: acquire the input table and the output table of an SQL statement currently being executed on the big data platform, and the blood relationship between the input table and the output table.
In this embodiment, the data blood relationship analysis method is applied to a server that hosts a data blood relationship analysis platform. Referring to FIG. 2, which is a schematic diagram of the communication architecture between the data blood relationship analysis platform and other business platforms in an embodiment of this application, the architecture includes the data blood relationship analysis platform, a data access platform, a big data platform, and a relational business database, where:
the data access platform extracts business data from the relational business database and transfers it to the big data platform. At the same time, it records the mapping relationship between each business source table and big data table, stores these mappings in its own backing database, and periodically synchronizes them to the data blood relationship analysis platform;
the big data platform obtains the flow relationships among its big data tables from the structured query language (SQL) statements currently being executed on it, and sends them to the data blood relationship analysis platform; and
the data blood relationship analysis platform generates a blood relationship graph from the mapping relationships between business source tables and big data tables and from the flow relationships among the big data tables, so that data blood relationships can be displayed visually.
It should be noted that relational databases are widely used in enterprise production. The relational database and big data platform in this embodiment depend on actual business needs: for example, the relational database may be MySQL, Oracle, SQL Server, PostgreSQL, or another relational database, and the big data platform may be Hadoop, Spark, Storm, or another big data platform.
First, the server acquires the input table and the output table of an SQL statement executed on the Hadoop big data platform, and the blood relationship between them. The input table is the source table read when the SQL statement is executed, and the output table is the target table it writes; the blood relationship between them can be obtained by parsing the SQL statement.
In one implementation, step 101 may include: listening, through a preset hook program, for SQL statements currently being executed on the big data platform; and parsing each captured SQL statement with a preset syntax parser and lexical parser to obtain its input table, its output table, and the blood relationship between them.
Specifically, a hook program can be installed on the server in advance to listen for SQL statements currently being executed on the big data platform. The server then uses the preset syntax parser and lexical parser to parse each SQL statement into two data sets, "Input" and "Output", from which it obtains the statement's input table, output table, and the blood relationship between them.
For example, if the hook program captures the SQL statement "insert overwrite table T1 select*from T2" (overwrite table T1 with the data in table T2), the preset syntax and lexical parsers parse it into input table T2 and output table T1, with T2 as the source table of T1.
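The hook-and-parse step can be illustrated with a minimal Python sketch. It is not the patent's parser: as an assumption, the preset syntax and lexical parsers are replaced by two regular expressions that only understand statements of the form `INSERT {OVERWRITE|INTO} TABLE <t> SELECT ... FROM <t> [JOIN <t> ...]`, and `parse_sql`, `ParsedStatement`, and `lineage_hook` are hypothetical names. A real deployment would need a full SQL grammar (subqueries, CTEs, multi-table inserts).

```python
import re
from dataclasses import dataclass, field

# Hypothetical, deliberately simplified stand-in for the preset syntax/lexical parsers.
OUTPUT_RE = re.compile(r"\binsert\s+(?:overwrite|into)\s+table\s+([\w.]+)", re.IGNORECASE)
INPUT_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


@dataclass
class ParsedStatement:
    inputs: list = field(default_factory=list)   # the "Input" data set
    outputs: list = field(default_factory=list)  # the "Output" data set

    def lineage(self):
        """(source_table, target_table) pairs: every input feeds every output."""
        return [(src, dst) for src in self.inputs for dst in self.outputs]


def parse_sql(sql):
    """Split one statement into input and output tables."""
    outputs = OUTPUT_RE.findall(sql)
    inputs = [t for t in INPUT_RE.findall(sql) if t not in outputs]
    return ParsedStatement(inputs=inputs, outputs=outputs)


def lineage_hook(sql, sink):
    """Hypothetical hook callback: called with each executed statement, it appends
    the parsed lineage edges to `sink` (e.g. a queue to the analysis platform)."""
    sink.extend(parse_sql(sql).lineage())


if __name__ == "__main__":
    edges = []
    lineage_hook("insert overwrite table T1 select*from T2", edges)
    print(edges)
```

For the example statement above, `edges` ends up as `[('T2', 'T1')]`: T2 is the input, T1 the output, and the edge points from source to target.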
Step 102: convert the input table and the output table each into an entity object under a preset type system, and store the entity objects in a preset graph database.
In computer science, a type system defines how the values and expressions of a programming language are classified into types, how those types can be manipulated, and how they interact. A graph database is a non-relational database that applies graph theory to store relationship information between entities.
In this step, the server converts the input table and the output table each into an entity object under the preset type system and stores the entity objects in the preset graph database. Taking the graph database JanusGraph as an example, JanusGraph is backed here by two main components:
1. HBase: a distributed, column-oriented, high-performance non-relational database that supports real-time reads and writes. HBase stores, in real time, the concrete entity objects generated under the type system and the blood relationships between them;
2. Elasticsearch: a distributed, scalable, real-time search and analytics engine. Elasticsearch indexes the entity objects stored in HBase so that entity objects and their blood relationships can be retrieved quickly in real time.
In this embodiment, the server can store the entity objects in HBase.
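A minimal sketch of step 102, under stated assumptions: the type system is modeled as a `TypeDef` with required attributes, the type name `big_data_table` and the `db.table@cluster` qualified-name format are hypothetical, and `EntityStore` is an in-memory stand-in that mimics the two roles described above (a primary store keyed by entity, the role HBase plays, and a name index, the role Elasticsearch plays). It does not use the JanusGraph, HBase, or Elasticsearch APIs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TypeDef:
    """A type in the (hypothetical) preset type system."""
    name: str
    required_attributes: frozenset


@dataclass
class Entity:
    type_name: str
    qualified_name: str  # unique key of the entity in the store
    attributes: dict = field(default_factory=dict)


# Hypothetical type for big data tables; the source does not name its types.
BIG_DATA_TABLE = TypeDef("big_data_table", frozenset({"db", "name"}))


def table_to_entity(table, cluster="hadoop", type_def=BIG_DATA_TABLE):
    """Convert a table name such as "db.tbl" (or bare "tbl") into an entity of `type_def`."""
    db, _, name = table.rpartition(".")
    attributes = {"db": db or "default", "name": name}
    missing = type_def.required_attributes - set(attributes)
    if missing:
        raise ValueError(f"missing attributes for {type_def.name}: {sorted(missing)}")
    return Entity(type_def.name, f"{attributes['db']}.{name}@{cluster}", attributes)


class EntityStore:
    """In-memory stand-in: a primary map keyed by qualified name (HBase role)
    plus an index from table name to qualified names (Elasticsearch role)."""

    def __init__(self):
        self._entities = {}
        self._name_index = {}

    def upsert(self, entity):
        self._entities[entity.qualified_name] = entity
        self._name_index.setdefault(entity.attributes["name"], set()).add(entity.qualified_name)
        return entity.qualified_name

    def get(self, qualified_name):
        return self._entities.get(qualified_name)

    def search(self, name):
        return sorted(self._name_index.get(name, set()))
```

Keying entities by a qualified name makes repeated conversions of the same table idempotent: re-running step 102 for a table that already exists overwrites the entity instead of duplicating it.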
Step 103: construct, in the graph database and according to the blood relationship, a blood relationship graph between the entity objects.
In this step, the server constructs the blood relationship graph between the entity objects in the graph database according to the blood relationship between the input table and the output table.
Further, step 103 may include: invoking a preset graph processing engine to create, in the graph database, entity object nodes in one-to-one correspondence with the input and output tables; and adding directed edges between the created nodes according to the blood relationship, to generate the blood relationship graph between the entity objects.
In this embodiment, the graph processing engine may be Graph Engine, a memory-based distributed engine for processing large-scale graph data. Graph Engine creates, in the graph database, entity object nodes in one-to-one correspondence with the input and output tables, then adds directed edges between those nodes according to the blood relationships between the tables, producing a blood relationship graph of the big data tables that can be visualized.
For example, suppose the following two SQL statements have been executed, in this order, on the big data platform:
1. insert overwrite table test_org_info select*from tmp1_org_info (overwrite table "test_org_info" with the data in table "tmp1_org_info");
2. insert overwrite table tmp1_org_info select*from tmp2_org_info (overwrite table "tmp1_org_info" with the data in table "tmp2_org_info").
The resulting blood relationship graph is shown in FIG. 3, a schematic diagram of the blood relationship graph of big data tables in an embodiment of this application. In the figure, the ancestor node of table "tmp1_org_info" is table "tmp2_org_info" and its descendant node is table "test_org_info", so the whole blood relationship is clear at a glance.
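The graph construction of step 103 and the FIG. 3 example can be sketched in plain Python. `LineageGraph` is a hypothetical stand-in for the graph processing engine, not Graph Engine's API: each table becomes a node on first use, each input/output pair becomes a directed edge from source to target, and ancestors and descendants are found by breadth-first search over the reversed and forward edges.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Directed lineage graph: one node per table, one edge source -> target
    per input/output pair reported by step 101."""

    def __init__(self):
        self.nodes = set()
        self._children = defaultdict(set)  # source -> tables derived from it
        self._parents = defaultdict(set)   # target -> tables it was derived from

    def add_edge(self, source, target):
        self.nodes.update((source, target))  # create nodes on first sight
        self._children[source].add(target)
        self._parents[target].add(source)

    def _reachable(self, start, adjacency):
        """Breadth-first search from `start` along `adjacency`."""
        seen, queue = set(), deque([start])
        while queue:
            for nxt in adjacency.get(queue.popleft(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def ancestors(self, table):
        return self._reachable(table, self._parents)

    def descendants(self, table):
        return self._reachable(table, self._children)


if __name__ == "__main__":
    g = LineageGraph()
    g.add_edge("tmp1_org_info", "test_org_info")  # statement 1
    g.add_edge("tmp2_org_info", "tmp1_org_info")  # statement 2
    print(g.ancestors("tmp1_org_info"), g.descendants("tmp1_org_info"))
```

Running it reproduces the FIG. 3 reading: the only ancestor of `tmp1_org_info` is `tmp2_org_info`, and its only descendant is `test_org_info`. Because edges are stored independently of execution order, statement 2 arriving after statement 1 still yields the full chain.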
Step 104: receive the mapping relationship, sent by the data access platform, between a business source table and a big data table, where the data access platform extracts business data from the business source tables of the relational business database into big data tables on the big data platform and records the mapping relationship between each business source table and big data table during extraction.
In this step, while extracting business data from the business source tables of the relational business database into the big data tables of the big data platform, the data access platform records the mapping relationship between each business source table and big data table and periodically synchronizes it to the data blood relationship analysis platform. The analysis platform receives these mappings, which are a prerequisite for later generating the target blood relationship graph.
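How the analysis platform might ingest one synchronization batch is sketched below. The JSON array payload and the field names `source_db`, `source_table`, and `big_data_table` are assumptions; the source only says that mappings are recorded during extraction and synchronized periodically. `receive_mappings` is a hypothetical name. Deduplicating against previously received mappings keeps repeated syncs idempotent.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TableMapping:
    source_db: str       # relational business database (hypothetical field name)
    source_table: str    # business source table
    big_data_table: str  # table the data was transferred into


def receive_mappings(payload, known):
    """Parse one periodic sync batch (assumed to be a JSON array) and return only
    mappings not received before; `known` accumulates everything seen so far."""
    fresh = []
    for record in json.loads(payload):
        mapping = TableMapping(record["source_db"], record["source_table"], record["big_data_table"])
        if mapping not in known:
            known.add(mapping)
            fresh.append(mapping)
    return fresh
```

Only the fresh mappings need to go through steps 105 and 106, so a sync that repeats yesterday's batch causes no graph changes.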
步骤105,根据映射关系,在血缘关系图谱中确定待添加祖先节点的目标实体对象节点;Step 105: According to the mapping relationship, determine the target entity object node to which the ancestor node is to be added in the blood relationship graph;
该步骤中,服务器根据业务源表与大数据表之间的映射关系,在上述生成的血缘关系图谱中确定待添加祖先节点的目标实体对象节点。In this step, the server determines the target entity object node of the ancestor node to be added in the blood relationship graph generated above according to the mapping relationship between the business source table and the big data table.
进一步地,该步骤105可以包括:获取映射关系中的大数据表的表名;判断血缘关系图谱中是否存在与表名对应的实体对象节点;若血缘关系图谱中存在与表名对应的实体对象节点,则将与表名对应的实体对象节点确定为待添加祖先节点的目标实体对象节点。Further, this step 105 may include: obtaining the table name of the big data table in the mapping relationship; judging whether there is an entity object node corresponding to the table name in the blood relationship graph; if there is an entity object corresponding to the table name in the blood relationship graph Node, the entity object node corresponding to the table name is determined as the target entity object node of the ancestor node to be added.
在本实施例中,服务器获取业务源表与大数据表之间的映射关系中的大数据表的表名,然后判断血缘关系图谱中是否存在与该表名对应的实体对象节点,若存在,说明该实体对象节点的表数据来源于关系型业务数据库的业务源表,此时将该实体对象节点确定为待添加祖先节点的目标实体对象节点,若不存在,则直接将该血缘关系图谱确定为目标血缘关系图谱。In this embodiment, the server obtains the table name of the big data table in the mapping relationship between the business source table and the big data table, and then determines whether there is an entity object node corresponding to the table name in the blood relationship graph, and if it exists, Explain that the table data of the entity object node comes from the business source table of the relational business database. At this time, the entity object node is determined as the target entity object node of the ancestor node to be added. If it does not exist, the blood relationship graph is directly determined For the target blood relationship map.
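The lookup in step 105 can be sketched in Python as follows. The `TableMapping` record shape is a hypothetical assumption, because the text does not specify the format the data access platform sends. The sketch also reduces the lineage graph to the set of node names it contains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableMapping:
    """One mapping record from the data access platform (hypothetical shape)."""
    source_db: str       # relational business database
    source_table: str    # business source table
    big_data_table: str  # big data table the business data was loaded into


def find_target_nodes(graph_nodes: set[str],
                      mappings: list[TableMapping]) -> dict[str, list[TableMapping]]:
    """Group mapping records by big data table, keeping only tables that
    already have an entity object node in the lineage graph."""
    targets: dict[str, list[TableMapping]] = {}
    for m in mappings:
        # Table-name lookup: does the graph contain a node for this big data table?
        if m.big_data_table in graph_nodes:
            targets.setdefault(m.big_data_table, []).append(m)
    # An empty result means no ancestors are needed; the graph is already the target graph.
    return targets


nodes = {"ods.org_info", "dw.org_summary"}
mappings = [TableMapping("crm", "t_org", "ods.org_info"),
            TableMapping("crm", "t_user", "ods.user_info")]
print(find_target_nodes(nodes, mappings))
```

Only `ods.org_info` is returned here. `ods.user_info` has no node in the graph, so its mapping is ignored.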
Step 106: Add the corresponding ancestor node to each target entity object node to obtain the target lineage graph. An ancestor node represents the entity object converted from the business source table of the target entity object.
In this step, the server obtains the business source table corresponding to each target entity object node and converts each one into an entity object under the preset type system. It then adds that entity object to the lineage graph as an ancestor node of the target entity object node. The result is the target lineage graph, which completes the end-to-end lineage link from the relational business database tables to the big data tables.
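A minimal sketch of step 106 follows. It assumes the lineage graph is held as an adjacency map from each upstream node name to its downstream node names. The `rdb:<database>.<table>` naming for ancestor nodes is a hypothetical convention, not one defined in the text.

```python
def add_ancestor_nodes(edges: dict[str, set[str]],
                       targets: dict[str, list[tuple[str, str]]]) -> dict[str, set[str]]:
    """Attach one ancestor node per business source table to its target node.

    edges:   upstream node name -> downstream node names (modified in place)
    targets: target big data table -> list of (source_db, source_table) pairs
    """
    for big_table, sources in targets.items():
        for db, table in sources:
            ancestor = f"rdb:{db}.{table}"  # hypothetical naming for source-table entities
            # Directed edge: business source table -> big data table
            edges.setdefault(ancestor, set()).add(big_table)
    return edges


graph = {"ods.org_info": {"dw.org_summary"}}
add_ancestor_nodes(graph, {"ods.org_info": [("crm", "t_org")]})
print(graph)
```

Existing edges are left untouched. The new node is simply prepended to the existing chain.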
This embodiment generates the target lineage graph by combining three inputs: the business source tables of the relational business database, the big data tables of the big data platform, and the lineage relationships between them. It thereby integrates metadata governance for relational data and big data, meeting the production need for lineage analysis across different types of databases.
Further, a second embodiment of the data lineage analysis method of this application is proposed on the basis of the first embodiment.
In this embodiment, after step 106, the method may further include:
Determining the entity object node to be analyzed in the target lineage graph;
In this step, the server may receive a selection instruction triggered by a user and select the entity object node to be analyzed in the target lineage graph. Alternatively, the server may use a preset entity object node as the node to be analyzed. Here, "analysis" means analyzing the business in which the entity object node is involved.
Obtaining the business associated with the entity object node to be analyzed, and counting the number of lineage chains that contain this node;
In this step, the server may read a preset business configuration file to obtain the business associated with the entity object node to be analyzed. Because one entity object node may belong to several lineage chains, the server may also count, at regular intervals, the lineage chains that contain the node. This chain count reflects how often the node is referenced: the more chains, the more popular the business the node is involved in.
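The text does not define exactly what counts as one lineage chain. Assume that a chain is a maximal path from a source node (no upstream table) to a sink node (no downstream table). Under that assumption, the number of chains through a node equals the number of paths from any source down to the node, multiplied by the number of paths from the node down to any sink. A memoized traversal of an acyclic lineage graph computes this without enumerating paths. The sketch below is illustrative only, and the adjacency-map representation is an assumption.

```python
from functools import lru_cache


def count_chains_through(edges: dict[str, set[str]], node: str) -> int:
    """Count maximal source-to-sink paths in an acyclic lineage graph that pass
    through `node`. `edges` maps each upstream node to its downstream nodes."""
    parents: dict[str, set[str]] = {}
    for up, downs in edges.items():
        for down in downs:
            parents.setdefault(down, set()).add(up)

    @lru_cache(maxsize=None)
    def paths_in(v: str) -> int:
        # Number of paths from any source (node without parents) down to v
        ups = parents.get(v, set())
        return 1 if not ups else sum(paths_in(p) for p in ups)

    @lru_cache(maxsize=None)
    def paths_out(v: str) -> int:
        # Number of paths from v down to any sink (node without children)
        downs = edges.get(v, set())
        return 1 if not downs else sum(paths_out(c) for c in downs)

    return paths_in(node) * paths_out(node)


edges = {"org_info": {"test_org_info"}, "delta1_org_info": {"test_org_info"}}
print(count_chains_through(edges, "test_org_info"))
```

Under this definition, a node fed by two upstream tables that have no ancestors of their own lies on two chains.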
Comparing the chain count with a first preset threshold and a second preset threshold, where the first preset threshold is greater than the second. When the chain count is greater than or equal to the first preset threshold, the business associated with the entity object node to be analyzed is marked as a popular business. When the chain count is less than or equal to the second preset threshold, that business is marked as an unpopular business.
In this step, the server compares the chain count with the first and second preset thresholds. A chain count greater than or equal to the first preset threshold means the entity object node to be analyzed is referenced frequently and its associated business is popular, so the server marks that business as popular. Conversely, a chain count less than or equal to the second preset threshold means the node is rarely referenced and its associated business sees little use, so the server marks that business as unpopular. The server may also send the marked popular and unpopular businesses to a front-end page for display. In production, popular businesses may warrant closer maintenance and attention, while unpopular businesses may need improvement.
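The threshold comparison can be sketched as follows. The threshold values are placeholders. A count that falls strictly between the two thresholds is left unmarked, because the text assigns no label to that case.

```python
def classify_business(chain_count: int, first_threshold: int, second_threshold: int) -> str:
    """Label a business by the number of lineage chains through its node."""
    if first_threshold <= second_threshold:
        raise ValueError("the first preset threshold must be greater than the second")
    if chain_count >= first_threshold:
        return "popular"
    if chain_count <= second_threshold:
        return "unpopular"
    return "unmarked"  # between the thresholds: no label is assigned


# Placeholder thresholds; real values would come from configuration.
for count in (12, 5, 1):
    print(count, classify_business(count, first_threshold=10, second_threshold=2))
```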
In this way, the server can analyze how popular the businesses associated with entity object nodes in the target lineage graph are. This helps managers see which business departments are busy or idle and adjust their production plans in time.
Further, a third embodiment of the data lineage analysis method of this application is proposed on the basis of the first embodiment.
In this embodiment, after step 106, the method may further include: receiving, through a preset user interaction page, a query instruction based on the target lineage graph; and, according to the query instruction, sending the target lineage graph to the user interaction page for visual display.
In this embodiment, the data lineage analysis platform may provide a user interaction page and an open application programming interface (API). Through these, it offers real-time query and search services to managers or to other external systems. Specifically, the server may receive a query instruction based on the target lineage graph through the preset user interaction page. According to that instruction, it sends the target lineage graph to the page for visual display.
Visualizing data lineage makes the ancestor data of business data clearly visible. When an incident occurs in production, it can be traced quickly back to its exact source, so the cause can be analyzed promptly and production measures improved.
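Tracing an incident back to its source amounts to walking the target lineage graph upstream. The breadth-first sketch below assumes an adjacency map from each upstream node name to its downstream node names, and the table names are hypothetical.

```python
from collections import deque


def trace_to_sources(edges: dict[str, set[str]], node: str) -> set[str]:
    """Return the root sources (nodes without upstream tables) that feed `node`."""
    parents: dict[str, set[str]] = {}
    for up, downs in edges.items():
        for down in downs:
            parents.setdefault(down, set()).add(up)

    sources: set[str] = set()
    seen = {node}
    queue = deque([node])
    while queue:  # breadth-first walk against the edge direction
        current = queue.popleft()
        ups = parents.get(current, set())
        if not ups and current != node:
            sources.add(current)
        for up in ups:
            if up not in seen:
                seen.add(up)
                queue.append(up)
    return sources


lineage = {
    "rdb:crm.t_org": {"ods.org_info"},
    "ods.org_info": {"dw.org_summary"},
    "ods.region": {"dw.org_summary"},
}
print(trace_to_sources(lineage, "dw.org_summary"))
```

The `seen` set keeps the walk linear in the size of the graph even when several chains share intermediate tables.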
Further, after the step of sending the target lineage graph to the user interaction page for visual display according to the query instruction, the method may further include:
Receiving, at a preset receiving frequency, the mapping relationship between the business source tables and the big data tables sent by the data access platform. Next, determining whether the mapping relationship has been updated, and detecting whether a new SQL statement has been executed on the big data platform. If the mapping relationship has been updated or a new SQL statement has been executed, the target lineage graph is updated accordingly.
In this embodiment, the server may receive the mapping relationship sent by the data access platform at the preset receiving frequency and determine whether it has been updated. At the same time, the server detects whether a new SQL statement has been executed on the big data platform. If either has occurred, the server updates the target lineage graph accordingly.
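The periodic check can be sketched under two assumptions the text does not state. First, mapping relationships are compared as sets of `(source database, source table, big data table)` tuples. Second, "new SQL statement" is approximated by hashing each statement after collapsing whitespace and remembering the hashes already seen. The names are hypothetical.

```python
import hashlib

MappingRow = tuple[str, str, str]  # (source_db, source_table, big_data_table), hypothetical shape


def check_for_updates(previous: set[MappingRow], current: set[MappingRow],
                      seen_sql: set[str], executed_sql: list[str]) -> tuple[bool, list[str]]:
    """Return (graph needs updating, statements not seen before).

    `seen_sql` holds fingerprints of statements already folded into the graph
    and is updated in place.
    """
    mapping_changed = previous != current
    fresh: list[str] = []
    for stmt in executed_sql:
        normalized = " ".join(stmt.split())  # collapse whitespace only
        fingerprint = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if fingerprint not in seen_sql:
            seen_sql.add(fingerprint)
            fresh.append(stmt)
    return mapping_changed or bool(fresh), fresh


seen: set[str] = set()
rows = {("crm", "t_org", "ods.org_info")}
sql = "insert overwrite table test_org_info select * from delta1_org_info"
print(check_for_updates(rows, rows, seen, [sql]))  # first sighting triggers an update
print(check_for_updates(rows, rows, seen, [sql]))  # repeat does not
```

The returned statements are the ones whose input and output tables would then be parsed and merged into the graph.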
Referring to FIG. 4, FIG. 4 is a schematic diagram of updating the lineage graph in FIG. 3. Suppose the server detects that the new SQL statement "insert overwrite table test_org_info select * from delta1_org_info" has been executed on the big data platform. This statement overwrites table "test_org_info" with the data in table "delta1_org_info". As a result, table "test_org_info" now has two lineage chains, and they converge at the "test_org_info" node.
In this way, the target lineage graph is updated in real time, which supports accurate business traceability and impact analysis.
An embodiment of this application further provides a data lineage analysis apparatus.
Referring to FIG. 5, FIG. 5 is a schematic module diagram of an embodiment of the data lineage analysis apparatus of this application. In this embodiment, the data lineage analysis apparatus includes:
An obtaining module 501, configured to obtain the input tables and output tables of the Structured Query Language (SQL) statement currently being executed on the big data platform, and the lineage relationship between the input tables and the output tables;
A conversion module 502, configured to convert the input tables and the output tables into entity objects under a preset type system and store the entity objects in a preset graph database;
A construction module 503, configured to construct, in the graph database, a lineage graph between the entity objects according to the lineage relationship;
A receiving module 504, configured to receive the mapping relationship between business source tables and big data tables sent by the data access platform. The data access platform extracts business data from the business source tables of a relational business database and loads it into big data tables on the big data platform. During extraction, it records the mapping relationship between the business source tables and the big data tables;
A determining module 505, configured to determine, in the lineage graph and according to the mapping relationship, the target entity object nodes to which ancestor nodes are to be added;
An adding module 506, configured to add the corresponding ancestor node to each target entity object node to obtain the target lineage graph. An ancestor node represents the entity object converted from the business source table of the target entity object.
Optionally, the obtaining module 501 is further configured to:
Monitor, through a preset hook program, the SQL statements currently being executed on the big data platform;
Parse each monitored SQL statement with a preset syntax parser and lexical parser to obtain its input tables, its output tables, and the lineage relationship between them.
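The text does not name the syntax and lexical parsers. As a rough stand-in, the Python sketch below uses regular expressions. It pulls output tables from `INSERT OVERWRITE`/`INSERT INTO` clauses and input tables from `FROM`/`JOIN` clauses. It is an assumption-laden illustration only: it ignores CTEs, quoted identifiers, comments and many other constructs. That is why a real hook would rely on the SQL engine's own parser instead.

```python
import re

# Output tables: INSERT OVERWRITE TABLE t / INSERT INTO [TABLE] t
_OUTPUT_RE = re.compile(r"\binsert\s+(?:overwrite|into)\s+(?:table\s+)?([\w.]+)", re.IGNORECASE)
# Input tables: the identifier after FROM or JOIN (subqueries starting with "(" are skipped)
_INPUT_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def extract_lineage(sql: str) -> tuple[list[str], list[str], list[tuple[str, str]]]:
    """Return (input tables, output tables, input->output edges) for one statement."""
    outputs = list(dict.fromkeys(_OUTPUT_RE.findall(sql)))  # de-duplicate, keep order
    # A table both read and written by the statement is not recorded as its own input.
    inputs = [t for t in dict.fromkeys(_INPUT_RE.findall(sql)) if t not in outputs]
    edges = [(src, dst) for dst in outputs for src in inputs]
    return inputs, outputs, edges


print(extract_lineage("insert overwrite table test_org_info select*from delta1_org_info"))
```

Applied to the FIG. 4 statement, this yields `delta1_org_info` as the input table, `test_org_info` as the output table, and one directed edge between them.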
Optionally, the construction module 503 is further configured to:
Call a preset graph processing engine and use it to create, in the graph database, entity object nodes in one-to-one correspondence with the input tables and the output tables;
Add directed edges between the created entity object nodes according to the lineage relationship, generating the lineage graph between the entity objects.
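The construction step can be sketched in memory as below. This sketch stands in for the graph processing engine and graph database, neither of which the text names. The `LineageGraph` class and its attribute names are hypothetical. The point is that each table maps to exactly one node, and each input/output pair maps to one directed edge.

```python
from dataclasses import dataclass, field


@dataclass
class LineageGraph:
    """Hypothetical in-memory stand-in for the graph database."""
    nodes: dict[str, dict[str, str]] = field(default_factory=dict)  # table name -> entity attributes
    edges: set[tuple[str, str]] = field(default_factory=set)        # (input table, output table)

    def upsert_entity(self, name: str, type_name: str = "big_data_table") -> None:
        # One node per table: re-running the same SQL must not duplicate nodes.
        self.nodes.setdefault(name, {"typeName": type_name, "name": name})

    def add_lineage(self, inputs: list[str], outputs: list[str]) -> None:
        for table in [*inputs, *outputs]:
            self.upsert_entity(table)
        for src in inputs:
            for dst in outputs:
                self.edges.add((src, dst))  # directed edge: data flows input -> output


graph = LineageGraph()
graph.add_lineage(["ods.a", "ods.b"], ["dw.summary"])
graph.add_lineage(["ods.a"], ["dw.summary"])  # duplicate lineage is absorbed
print(sorted(graph.nodes), sorted(graph.edges))
```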
Optionally, the determining module 505 is further configured to:
Obtain the table name of the big data table in the mapping relationship;
Determine whether the lineage graph contains an entity object node corresponding to the table name;
If the lineage graph contains an entity object node corresponding to the table name, determine that node as the target entity object node to which an ancestor node is to be added.
Optionally, the data lineage analysis apparatus further includes a business marking module, configured to:
Determine the entity object node to be analyzed in the target lineage graph;
Obtain the business associated with the entity object node to be analyzed, and count the number of lineage chains that contain this node;
Compare the chain count with a first preset threshold and a second preset threshold, where the first preset threshold is greater than the second preset threshold;
When the chain count is greater than or equal to the first preset threshold, mark the business associated with the entity object node to be analyzed as a popular business;
When the chain count is less than or equal to the second preset threshold, mark the business associated with the entity object node to be analyzed as an unpopular business.
Optionally, the data lineage analysis apparatus further includes a query module, configured to:
Receive, through a preset user interaction page, a query instruction based on the target lineage graph;
Send, according to the query instruction, the target lineage graph to the user interaction page for visual display.
Optionally, the data lineage analysis apparatus further includes an update module, configured to:
Receive, at a preset receiving frequency, the mapping relationship between the business source tables and the big data tables sent by the data access platform;
Determine whether the mapping relationship has been updated, and detect whether a new SQL statement has been executed on the big data platform;
If the mapping relationship has been updated or a new SQL statement has been executed on the big data platform, update the target lineage graph accordingly.
The functions and beneficial effects of each module of the data lineage analysis apparatus correspond to the steps in the embodiments of the data lineage analysis method described above, and are not repeated here.
The data lineage analysis apparatus in the embodiments of this application has been described above in terms of modular functional entities. The data lineage analysis device in the embodiments of this application is described below in terms of hardware.
Referring to FIG. 6, FIG. 6 is a schematic structural diagram of the data lineage analysis device provided by an embodiment of this application. The data lineage analysis device 600 may vary considerably with configuration or performance. It may include one or more central processing units (CPUs) 610, a memory 620, and one or more storage media 630 (for example, one or more mass storage devices) that store application programs 633 or data 632. The memory 620 and the storage medium 630 may provide transient or persistent storage. The programs stored in the storage medium 630 may include one or more modules (not shown), and each module may include a series of instruction operations for the data lineage analysis device 600. Further, the processor 610 may be configured to communicate with the storage medium 630 and to execute, on the data lineage analysis device 600, the series of instruction operations stored in the storage medium 630.
The data lineage analysis device 600 may further include one or more power supplies 640, one or more wired or wireless network interfaces 650, one or more input/output interfaces 660, and/or one or more operating systems 631, such as Windows Server, Mac OS X, Unix, Linux, or FreeBSD. Those skilled in the art will understand that the structure shown in FIG. 6 does not limit the data lineage analysis device. The device may include more or fewer components than shown, combine certain components, or arrange the components differently.
This application further provides a computer-readable storage medium, which may be non-volatile or volatile. The computer-readable storage medium stores a data lineage analysis program that, when executed by a processor, implements the steps of the data lineage analysis method described above.
For the method implemented and the beneficial effects achieved when the data lineage analysis program runs on the processor, refer to the embodiments of the data lineage analysis method of this application. These are not repeated here.
Those skilled in the art will understand that integrated modules or units implemented as software functional units and sold or used as independent products may be stored in a computer-readable storage medium. On this understanding, the technical solution of this application may be embodied as a software product. This applies to the solution in essence, to the part that contributes to the prior art, or to all or part of the solution. The computer software product is stored in a storage medium. It includes several instructions that cause a computer device (a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. Such storage media include any medium that can store program code, for example a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments only illustrate the technical solutions of this application and do not limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand the following. The technical solutions described in those embodiments may still be modified, or some of their technical features may be replaced with equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.

Claims (20)

  1. A data lineage analysis method, comprising the following steps:
    acquiring the input tables and output tables of a Structured Query Language (SQL) statement currently being executed on a big data platform, and the lineage relationship between the input tables and the output tables;
    converting the input tables and the output tables into entity objects under a preset type system, and storing the entity objects in a preset graph database;
    constructing, in the graph database, a lineage graph between the entity objects according to the lineage relationship;
    receiving a mapping relationship between business source tables and big data tables sent by a data access platform, wherein the data access platform is configured to extract business data from the business source tables of a relational business database, load the business data into the big data tables of the big data platform, and record the mapping relationship between the business source tables and the big data tables while extracting the business data;
    determining, in the lineage graph and according to the mapping relationship, a target entity object node to which an ancestor node is to be added;
    adding a corresponding ancestor node to the target entity object node to obtain a target lineage graph, wherein the ancestor node represents the entity object converted from the business source table of the target entity object.
  2. The data lineage analysis method according to claim 1, wherein the step of acquiring the input tables and output tables of the SQL statement currently being executed on the big data platform, and the lineage relationship between the input tables and the output tables, comprises:
    monitoring, through a preset hook program, the SQL statement currently being executed on the big data platform;
    parsing the monitored SQL statement with a preset syntax parser and lexical parser to obtain the input tables and output tables of the SQL statement and the lineage relationship between the input tables and the output tables.
  3. The data lineage analysis method according to claim 1, wherein the step of constructing, in the graph database, the lineage graph between the entity objects according to the lineage relationship comprises:
    calling a preset graph processing engine, and creating, through the graph processing engine, entity object nodes in the graph database in one-to-one correspondence with the input tables and the output tables;
    adding directed edges between the created entity object nodes according to the lineage relationship to generate the lineage graph between the entity objects.
  4. The data lineage analysis method according to claim 1, wherein the step of determining, in the lineage graph and according to the mapping relationship, the target entity object node to which an ancestor node is to be added comprises:
    obtaining the table name of the big data table in the mapping relationship;
    determining whether the lineage graph contains an entity object node corresponding to the table name;
    if the lineage graph contains an entity object node corresponding to the table name, determining that entity object node as the target entity object node to which an ancestor node is to be added.
  5. The data lineage analysis method according to any one of claims 1 to 4, further comprising, after the step of adding a corresponding ancestor node to the target entity object node to obtain the target lineage graph, wherein the ancestor node represents the entity object converted from the business source table of the target entity object:
    determining an entity object node to be analyzed in the target lineage graph;
    obtaining the business associated with the entity object node to be analyzed, and counting the number of lineage chains containing the entity object node to be analyzed;
    comparing the number of chains with a first preset threshold and a second preset threshold, wherein the first preset threshold is greater than the second preset threshold;
    when the number of chains is greater than or equal to the first preset threshold, marking the business associated with the entity object node to be analyzed as a popular business;
    when the number of chains is less than or equal to the second preset threshold, marking the business associated with the entity object node to be analyzed as an unpopular business.
  6. The data lineage analysis method according to any one of claims 1 to 4, further comprising, after the step of adding a corresponding ancestor node to the target entity object node to obtain the target lineage graph, wherein the ancestor node represents the entity object converted from the business source table of the target entity object:
    receiving, through a preset user interaction page, a query instruction based on the target lineage graph;
    sending, according to the query instruction, the target lineage graph to the user interaction page for visual display.
  7. The data lineage analysis method according to claim 6, further comprising, after the step of sending the target lineage graph to the user interaction page for visual display according to the query instruction:
    receiving, at a preset receiving frequency, the mapping relationship between the business source tables and the big data tables sent by the data access platform;
    determining whether the mapping relationship has been updated, and detecting whether a new SQL statement has been executed on the big data platform;
    if the mapping relationship has been updated or a new SQL statement has been executed on the big data platform, updating the target lineage graph accordingly.
  8. A data lineage analysis apparatus, wherein the data lineage analysis apparatus comprises:
    An obtaining module, configured to obtain the input table and the output table of a Structured Query Language (SQL) statement currently executed on a big data platform, and the lineage relationship between the input table and the output table;
    A conversion module, configured to convert the input table and the output table respectively into entity objects under a preset type system, and to store the entity objects in a preset graph database;
    A construction module, configured to construct, according to the lineage relationship, a lineage graph between the entity objects in the graph database;
    A receiving module, configured to receive the mapping relationship between business source tables and big data tables sent by a data access platform, wherein the data access platform is configured to extract business data from the business source tables of a relational business database and store it in the big data tables of the big data platform, and records the mapping relationship between the business source tables and the big data tables while extracting the business data;
    A determining module, configured to determine, according to the mapping relationship, the target entity object node in the lineage graph to which an ancestor node is to be added;
    An adding module, configured to add a corresponding ancestor node to the target entity object node to obtain a target lineage graph, wherein the ancestor node represents the entity object converted from the business source table of the target entity object.
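The determining and adding modules can be illustrated with a small in-memory sketch. It assumes a toy graph where nodes are keyed by table name and edges point from upstream to downstream; the `source::` prefix used to name ancestor nodes and the type labels are hypothetical conventions, not something the claims specify.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class LineageGraph:
    # Entity-object nodes keyed by table name, with a (hypothetical) type label.
    nodes: Dict[str, str] = field(default_factory=dict)
    # Directed edges (upstream, downstream): data flows from the first node to the second.
    edges: Set[Tuple[str, str]] = field(default_factory=set)


def add_ancestor_nodes(graph: LineageGraph, mapping: Dict[str, str]) -> List[str]:
    """mapping: big data table name -> business source table name, as recorded by the data access platform.

    Returns the target entity object nodes that received an ancestor node.
    """
    targets = []
    for big_table, source_table in mapping.items():
        if big_table not in graph.nodes:
            continue  # table never appeared in executed SQL, so there is no node to extend
        ancestor = f"source::{source_table}"  # hypothetical naming for source-table entity objects
        graph.nodes[ancestor] = "business_source_table"
        graph.edges.add((ancestor, big_table))
        targets.append(big_table)
    return targets
```

For example, with a graph containing `dw.orders -> dm.daily_sales` and a mapping `{"dw.orders": "crm.t_order"}`, the call returns `["dw.orders"]` and adds the edge `("source::crm.t_order", "dw.orders")`, so the lineage now reaches back into the relational business database.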
  9. A data lineage analysis device, wherein the data lineage analysis device comprises a memory and at least one processor, the memory stores instructions, and the memory and the at least one processor are interconnected by wires;
    The at least one processor invokes the instructions in the memory, so that the data lineage analysis device performs the following steps of the data lineage analysis method:
    Obtaining the input table and the output table of a Structured Query Language (SQL) statement currently executed on a big data platform, and the lineage relationship between the input table and the output table;
    Converting the input table and the output table respectively into entity objects under a preset type system, and storing the entity objects in a preset graph database;
    Constructing, according to the lineage relationship, a lineage graph between the entity objects in the graph database;
    Receiving the mapping relationship between business source tables and big data tables sent by a data access platform, wherein the data access platform is configured to extract business data from the business source tables of a relational business database and store it in the big data tables of the big data platform, and records the mapping relationship between the business source tables and the big data tables while extracting the business data;
    Determining, according to the mapping relationship, the target entity object node in the lineage graph to which an ancestor node is to be added;
    Adding a corresponding ancestor node to the target entity object node to obtain a target lineage graph, wherein the ancestor node represents the entity object converted from the business source table of the target entity object.
  10. The data lineage analysis device according to claim 9, wherein obtaining the input table and the output table of the SQL statement currently executed on the big data platform, and the lineage relationship between the input table and the output table, comprises the following steps:
    Monitoring, through a preset hook program, the SQL statement currently executed on the big data platform;
    Parsing the monitored SQL statement through a preset syntax parser and a preset lexical parser to obtain the input table and the output table of the SQL statement, and the lineage relationship between the input table and the output table.
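The parsing step of claim 10 can be approximated as below. The hook that captures statements is platform-specific and not shown. This sketch swaps the claimed syntax parser for a much simpler technique: a regex tokenizer (the lexical step) followed by a keyword scan over tokens. It only recognizes `INSERT INTO/OVERWRITE [TABLE] t`, `CREATE TABLE t`, `FROM t` and `JOIN t`, and it will misreport CTE names, comma joins, `CREATE TABLE IF NOT EXISTS`, and constructs such as `EXTRACT(YEAR FROM col)`, which a real grammar-based parser handles.

```python
import re
from typing import List, Set, Tuple

# Lexical step: string literals, dotted identifiers, integers, or any other single non-space character.
TOKEN_RE = re.compile(r"'[^']*'|[A-Za-z_][\w.]*|\d+|\S")


def tokenize(sql: str) -> List[str]:
    # Backquotes (Hive/MySQL identifier quoting) are dropped so `db`.`t` becomes db.t.
    return TOKEN_RE.findall(sql.replace("`", ""))


def extract_lineage(sql: str) -> Tuple[Set[str], Set[str], Set[Tuple[str, str]]]:
    """Return (input tables, output tables, input->output edges) for one statement."""
    tokens = tokenize(sql)
    upper = [t.upper() for t in tokens]
    inputs: Set[str] = set()
    outputs: Set[str] = set()
    for i, tok in enumerate(upper):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if nxt is None:
            continue
        if tok in ("FROM", "JOIN") and nxt != "(":
            inputs.add(nxt.lower())  # a subquery "FROM (" is skipped; its inner FROM is seen later
        elif tok == "TABLE" and i > 0 and upper[i - 1] in ("INTO", "OVERWRITE", "CREATE"):
            outputs.add(nxt.lower())
        elif tok == "INTO" and nxt.upper() != "TABLE":
            outputs.add(nxt.lower())
    # Table-level lineage: every input feeds every output of the statement.
    edges = {(src, dst) for src in inputs for dst in outputs}
    return inputs, outputs, edges
```

For `INSERT OVERWRITE TABLE dm.daily_sales SELECT ... FROM dw.orders o JOIN dw.customers c ...` this yields inputs `{dw.orders, dw.customers}`, output `{dm.daily_sales}` and two lineage edges.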
  11. The data lineage analysis device according to claim 9, wherein constructing, according to the lineage relationship, the lineage graph between the entity objects in the graph database comprises the following steps:
    Calling a preset graph processing engine, and creating in the graph database, through the graph processing engine, entity object nodes in one-to-one correspondence with the input table and the output table;
    Adding, according to the lineage relationship, directed edges between the created entity object nodes to generate the lineage graph between the entity objects.
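The node-and-edge construction of claim 11 can be sketched with a toy in-memory store standing in for the graph database and graph processing engine (the claims name neither). The point it illustrates is the one-to-one mapping: the same table always resolves to the same node, and repeated SQL executions do not duplicate edges. The class name and node attributes are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class LineageGraphStore:
    """Toy stand-in for a graph database: one node per table, directed edges input -> output."""

    def __init__(self) -> None:
        self.nodes: Dict[str, dict] = {}
        self.children: Dict[str, Set[str]] = defaultdict(set)

    def upsert_node(self, table: str, type_name: str = "table") -> None:
        # One-to-one: an existing node for this table is reused rather than recreated.
        self.nodes.setdefault(table, {"qualified_name": table, "type": type_name})

    def add_lineage(self, inputs: Iterable[str], outputs: Iterable[str]) -> None:
        ins, outs = list(inputs), list(outputs)
        for table in ins + outs:
            self.upsert_node(table)
        for src in ins:
            for dst in outs:
                self.children[src].add(dst)  # set semantics keep each directed edge unique

    def edge_count(self) -> int:
        return sum(len(dsts) for dsts in self.children.values())
```

Running `add_lineage(["dw.orders", "dw.customers"], ["dm.daily_sales"])` twice still leaves three nodes and two edges, which is the behaviour an idempotent lineage collector needs when the same job runs daily.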
  12. The data lineage analysis device according to claim 9, wherein determining, according to the mapping relationship, the target entity object node in the lineage graph to which an ancestor node is to be added comprises the following steps:
    Acquiring the table name of the big data table in the mapping relationship;
    Judging whether an entity object node corresponding to the table name exists in the lineage graph;
    If an entity object node corresponding to the table name exists in the lineage graph, determining that entity object node as the target entity object node to which an ancestor node is to be added.
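The table-name lookup of claim 12 reduces to a membership test, but table names from the data access platform and from parsed SQL may not be spelled identically. The sketch below assumes, purely for illustration, that names should match case-insensitively and ignoring backquotes and surrounding whitespace; the helper names are hypothetical.

```python
from typing import Dict, Iterable, List


def normalize(table_name: str) -> str:
    # Assumed convention: case-insensitive, backquotes and surrounding whitespace ignored.
    return table_name.replace("`", "").strip().lower()


def find_target_nodes(node_names: Iterable[str], mapping: Dict[str, str]) -> List[str]:
    """Return graph node names whose table name appears as a big data table in the mapping."""
    index = {normalize(name): name for name in node_names}
    targets = []
    for big_table in mapping:
        node = index.get(normalize(big_table))
        if node is not None:  # only tables already present in the lineage graph get an ancestor
            targets.append(node)
    return targets
```

Building the index once keeps the check linear in the number of nodes plus mappings instead of comparing every pair.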
  13. The data lineage analysis device according to any one of claims 9-12, wherein, after the step of adding a corresponding ancestor node to the target entity object node to obtain the target lineage graph, the ancestor node representing the entity object converted from the business source table of the target entity object, the following steps are further included:
    Determining an entity object node to be analyzed in the target lineage graph;
    Acquiring the business associated with the entity object node to be analyzed, and counting the number of lineage chains that contain the entity object node to be analyzed;
    Comparing the number of chains with a first preset threshold and a second preset threshold respectively, the first preset threshold being greater than the second preset threshold;
    When the number of chains is greater than or equal to the first preset threshold, marking the business associated with the entity object node to be analyzed as a hot business;
    When the number of chains is less than or equal to the second preset threshold, marking the business associated with the entity object node to be analyzed as an unpopular (cold) business.
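The claims do not define a "lineage chain" precisely. One reasonable reading, assumed here, is a maximal path from a root table (no upstream) to a leaf table (no downstream). In a directed acyclic graph the number of such chains through a node equals (paths from roots to the node) × (paths from the node to leaves), which can be counted with memoized recursion instead of enumerating paths. The `"normal"` label for counts between the two thresholds is also an assumption; the claim only defines hot and cold.

```python
from functools import lru_cache
from typing import Dict, List


def chain_count(children: Dict[str, List[str]], node: str) -> int:
    """Number of maximal root-to-leaf paths passing through `node`; assumes the graph is acyclic."""
    parents: Dict[str, List[str]] = {}
    for src, dsts in children.items():
        for dst in dsts:
            parents.setdefault(dst, []).append(src)

    @lru_cache(maxsize=None)
    def paths_up(n: str) -> int:
        ps = parents.get(n, [])
        return 1 if not ps else sum(paths_up(p) for p in ps)

    @lru_cache(maxsize=None)
    def paths_down(n: str) -> int:
        cs = children.get(n, [])
        return 1 if not cs else sum(paths_down(c) for c in cs)

    return paths_up(node) * paths_down(node)


def classify(count: int, hot_threshold: int, cold_threshold: int) -> str:
    if hot_threshold <= cold_threshold:
        raise ValueError("the first (hot) threshold must exceed the second (cold) threshold")
    if count >= hot_threshold:
        return "hot"
    if count <= cold_threshold:
        return "cold"
    return "normal"
```

With edges `crm.t_order -> dw.orders`, `dw.orders -> dm.daily_sales`, `dw.orders -> dm.by_region`, `dw.customers -> dm.by_region` and `dm.daily_sales -> rpt.weekly`, `dw.orders` lies on 2 of the 3 chains, so `classify(2, 2, 1)` marks its business as hot.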
  14. The data lineage analysis device according to any one of claims 9-12, wherein, after the step of adding a corresponding ancestor node to the target entity object node to obtain the target lineage graph, the ancestor node representing the entity object converted from the business source table of the target entity object, the following steps are further included:
    Receiving, through a preset user interaction page, a query instruction based on the target lineage graph;
    Sending, according to the query instruction, the target data lineage analysis graph to the user interaction page for visual display.
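A query instruction from the user interaction page is typically answered with a bounded subgraph rather than the whole lineage graph. The sketch below assumes a hypothetical query shape (a table name, a direction of `upstream`, `downstream` or `both`, and a hop limit) and returns a JSON payload of `nodes` and `edges`, a shape many front-end graph libraries accept; none of these names come from the claims.

```python
import json
from collections import deque
from typing import Dict, List, Set, Tuple


def query_subgraph(children: Dict[str, List[str]], table: str, direction: str = "both", depth: int = 2) -> str:
    """Breadth-first walk up to `depth` hops from `table`; returns a JSON string for display."""
    if direction not in ("upstream", "downstream", "both"):
        raise ValueError(f"unknown direction: {direction}")
    parents: Dict[str, List[str]] = {}
    for src, dsts in children.items():
        for dst in dsts:
            parents.setdefault(dst, []).append(src)

    nodes: Set[str] = {table}
    edges: Set[Tuple[str, str]] = set()
    walks = []
    if direction in ("upstream", "both"):
        walks.append((parents, True))
    if direction in ("downstream", "both"):
        walks.append((children, False))
    for adjacency, upstream in walks:
        queue = deque([(table, 0)])
        seen = {table}
        while queue:
            node, hops = queue.popleft()
            if hops == depth:
                continue  # hop limit reached; do not expand further
            for neighbour in adjacency.get(node, []):
                # Edges are always stored in data-flow order (upstream, downstream).
                edges.add((neighbour, node) if upstream else (node, neighbour))
                nodes.add(neighbour)
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append((neighbour, hops + 1))
    payload = {
        "nodes": [{"id": n} for n in sorted(nodes)],
        "edges": [{"source": s, "target": t} for s, t in sorted(edges)],
    }
    return json.dumps(payload)
```

Limiting the hop count keeps the payload small for heavily shared tables, whose full lineage can be too dense to render usefully.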
  15. The data lineage analysis device according to claim 14, wherein, after the step of sending, according to the query instruction, the target data lineage analysis graph to the user interaction page for visual display, the following steps are further included:
    Receiving, at a preset receiving frequency, the mapping relationship between the business source table and the big data table sent by the data access platform;
    Judging whether the mapping relationship has been updated, and detecting whether a new SQL statement has been executed on the big data platform;
    If the mapping relationship has been updated, or a new SQL statement has been executed on the big data platform, updating the target lineage graph accordingly.
  16. A computer-readable storage medium storing a computer program, wherein, when the computer program is executed by a processor, the following steps of the data lineage analysis method are performed:
    Obtaining the input table and the output table of a Structured Query Language (SQL) statement currently executed on a big data platform, and the lineage relationship between the input table and the output table;
    Converting the input table and the output table respectively into entity objects under a preset type system, and storing the entity objects in a preset graph database;
    Constructing, according to the lineage relationship, a lineage graph between the entity objects in the graph database;
    Receiving the mapping relationship between business source tables and big data tables sent by a data access platform, wherein the data access platform is configured to extract business data from the business source tables of a relational business database and store it in the big data tables of the big data platform, and records the mapping relationship between the business source tables and the big data tables while extracting the business data;
    Determining, according to the mapping relationship, the target entity object node in the lineage graph to which an ancestor node is to be added;
    Adding a corresponding ancestor node to the target entity object node to obtain a target lineage graph, wherein the ancestor node represents the entity object converted from the business source table of the target entity object.
  17. The computer-readable storage medium according to claim 16, wherein, when the computer program of the data lineage analysis method is executed by the processor to perform the step of obtaining the input table and the output table of the SQL statement currently executed on the big data platform, and the lineage relationship between the input table and the output table, the following steps are included:
    Monitoring, through a preset hook program, the SQL statement currently executed on the big data platform;
    Parsing the monitored SQL statement through a preset syntax parser and a preset lexical parser to obtain the input table and the output table of the SQL statement, and the lineage relationship between the input table and the output table.
  18. The computer-readable storage medium according to claim 16, wherein, when the computer program of the data lineage analysis method is executed by the processor to perform the step of constructing, according to the lineage relationship, the lineage graph between the entity objects in the graph database, the following steps are included:
    Calling a preset graph processing engine, and creating in the graph database, through the graph processing engine, entity object nodes in one-to-one correspondence with the input table and the output table;
    Adding, according to the lineage relationship, directed edges between the created entity object nodes to generate the lineage graph between the entity objects.
  19. The computer-readable storage medium according to claim 16, wherein, when the computer program of the data lineage analysis method is executed by the processor to perform the step of determining, according to the mapping relationship, the target entity object node in the lineage graph to which an ancestor node is to be added, the following steps are included:
    Acquiring the table name of the big data table in the mapping relationship;
    Judging whether an entity object node corresponding to the table name exists in the lineage graph;
    If an entity object node corresponding to the table name exists in the lineage graph, determining that entity object node as the target entity object node to which an ancestor node is to be added.
  20. The computer-readable storage medium according to any one of claims 16-19, wherein, after the computer program of the data lineage analysis method is executed by the processor to perform the step of adding a corresponding ancestor node to the target entity object node to obtain the target lineage graph, the ancestor node representing the entity object converted from the business source table of the target entity object, the following steps are further included:
    Determining an entity object node to be analyzed in the target lineage graph;
    Acquiring the business associated with the entity object node to be analyzed, and counting the number of lineage chains that contain the entity object node to be analyzed;
    Comparing the number of chains with a first preset threshold and a second preset threshold respectively, the first preset threshold being greater than the second preset threshold;
    When the number of chains is greater than or equal to the first preset threshold, marking the business associated with the entity object node to be analyzed as a hot business;
    When the number of chains is less than or equal to the second preset threshold, marking the business associated with the entity object node to be analyzed as an unpopular (cold) business.
PCT/CN2020/118135 2020-04-28 2020-09-27 Data-based blood relationship analysis method, apparatus, and device and computer-readable storage medium WO2021218021A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010350107.4 2020-04-28
CN202010350107.4A CN111694858A (en) 2020-04-28 2020-04-28 Data blood margin analysis method, device, equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
WO2021218021A1 true WO2021218021A1 (en) 2021-11-04

Family

ID=72476738

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/118135 WO2021218021A1 (en) 2020-04-28 2020-09-27 Data-based blood relationship analysis method, apparatus, and device and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN111694858A (en)
WO (1) WO2021218021A1 (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114328471A (en) * 2022-03-14 2022-04-12 杭州半云科技有限公司 Data model based on data virtualization engine and construction method thereof
CN114428822A (en) * 2022-01-27 2022-05-03 云启智慧科技有限公司 Data processing method and device, electronic equipment and storage medium
CN114911785A (en) * 2022-05-16 2022-08-16 北京航空航天大学 Data blood reason management method and device and electronic equipment
CN116166718A (en) * 2023-04-25 2023-05-26 北京捷泰云际信息技术有限公司 Data blood margin acquisition method and device
CN116450908A (en) * 2023-06-19 2023-07-18 北京大数据先进技术研究院 Self-service data analysis method and device based on data lake and electronic equipment
CN116662308A (en) * 2023-07-28 2023-08-29 恩核(北京)信息技术有限公司 Blood margin data extraction method based on several bins of log files
CN116541887B (en) * 2023-07-07 2023-09-15 云启智慧科技有限公司 Data security protection method for big data platform
CN117131477A (en) * 2023-08-14 2023-11-28 南昌大学 Full-link data tracing method based on local data blood-edge digital watermark
CN117273131A (en) * 2023-11-22 2023-12-22 四川三合力通科技发展集团有限公司 Cross-node data relationship discovery system and method
CN117555950A (en) * 2024-01-12 2024-02-13 山东再起数据科技有限公司 Data blood relationship construction method based on data center

Families Citing this family (25)

Publication number Priority date Publication date Assignee Title
CN111694858A (en) * 2020-04-28 2020-09-22 平安科技(深圳)有限公司 Data blood margin analysis method, device, equipment and computer readable storage medium
CN112100201B (en) * 2020-09-30 2024-02-06 东莞盟大集团有限公司 Data monitoring method, device, equipment and storage medium based on big data technology
CN112256720B (en) * 2020-10-21 2021-08-17 平安科技(深圳)有限公司 Data cost calculation method, system, computer device and storage medium
CN112328575A (en) * 2020-11-12 2021-02-05 杭州数梦工场科技有限公司 Data asset blood margin generation method and device and electronic equipment
CN112363713A (en) * 2020-11-30 2021-02-12 杭州玳数科技有限公司 Binding type SQL blood margin analysis data flow visualization interaction method
CN112540970A (en) * 2020-12-07 2021-03-23 航天信息股份有限公司 Metadata blood relationship analysis method and system based on version management
CN112434071B (en) * 2020-12-15 2021-07-20 北京三维天地科技股份有限公司 Metadata blood relationship and influence analysis platform based on data map
CN112685439B (en) * 2020-12-29 2023-09-22 上海豹云网络信息服务有限公司 Count making method, system, device and storage medium for wind control system
CN112634004B (en) * 2020-12-30 2023-10-13 中国农业银行股份有限公司 Method and system for analyzing blood-cause atlas of credit investigation data
CN112818015B (en) * 2021-01-21 2022-07-15 广州汇通国信科技有限公司 Data tracking method, system and storage medium based on data blood margin analysis
CN112860662B (en) * 2021-01-22 2023-10-17 平安科技(深圳)有限公司 Automatic production data blood relationship establishment method, device, computer equipment and storage medium
CN112749186B (en) * 2021-01-22 2024-02-09 广州虎牙科技有限公司 Data processing method, device, electronic equipment and computer readable storage medium
CN112800149B (en) * 2021-02-18 2023-08-08 浪潮云信息技术股份公司 Data treatment method and system based on data blood edge analysis
CN112989151A (en) * 2021-03-11 2021-06-18 北京锐安科技有限公司 Data blood relationship display method and device, electronic equipment and storage medium
CN113326261B (en) * 2021-04-29 2024-03-08 奇富数科(上海)科技有限公司 Data blood relationship extraction method and device and electronic equipment
CN113312410B (en) * 2021-06-10 2023-11-21 平安证券股份有限公司 Data map construction method, data query method and terminal equipment
CN113360720B (en) * 2021-06-24 2023-11-21 湖北华中电力科技开发有限责任公司 Data asset visualization method, device and equipment based on data blood relationship
CN113486008A (en) * 2021-06-30 2021-10-08 平安信托有限责任公司 Data blood margin analysis method, device, equipment and storage medium
CN113672674A (en) * 2021-07-15 2021-11-19 浙江大华技术股份有限公司 Method, electronic device and storage medium for automatically arranging service flow
CN113485715A (en) * 2021-07-30 2021-10-08 浙江大华技术股份有限公司 Code prompting method and system based on data center platform and data computing platform
CN114969819A (en) * 2022-06-02 2022-08-30 蚂蚁区块链科技(上海)有限公司 Data asset risk discovery method and device
CN116932656B (en) * 2023-09-18 2024-01-09 中孚安全技术有限公司 Data blood edge storage method, system, equipment and medium based on block chain
CN117238398A (en) * 2023-09-19 2023-12-15 昆仑数智科技有限责任公司 Method, device, equipment and readable storage medium for determining data blood relationship
CN117421462B (en) * 2023-12-18 2024-03-08 中信证券股份有限公司 Data processing method and device and electronic equipment
CN117688217A (en) * 2024-02-02 2024-03-12 北方健康医疗大数据科技有限公司 System, method and medium for realizing data blood relationship structure based on directed graph

Citations (5)

Publication number Priority date Publication date Assignee Title
US10025878B1 (en) * 2014-11-11 2018-07-17 Google Llc Data lineage analysis
CN109446279A (en) * 2018-10-15 2019-03-08 顺丰科技有限公司 Based on neo4j big data genetic connection management method, system, equipment and storage medium
CN109582660A (en) * 2018-12-06 2019-04-05 深圳前海微众银行股份有限公司 Data consanguinity analysis method, apparatus, equipment, system and readable storage medium storing program for executing
CN109739894A (en) * 2019-01-04 2019-05-10 深圳前海微众银行股份有限公司 Supplement method, apparatus, equipment and the storage medium of metadata description
CN111694858A (en) * 2020-04-28 2020-09-22 平安科技(深圳)有限公司 Data blood margin analysis method, device, equipment and computer readable storage medium

Cited By (17)

Publication number Priority date Publication date Assignee Title
CN114428822A (en) * 2022-01-27 2022-05-03 云启智慧科技有限公司 Data processing method and device, electronic equipment and storage medium
CN114428822B (en) * 2022-01-27 2022-07-29 云启智慧科技有限公司 Data processing method and device, electronic equipment and storage medium
CN114328471A (en) * 2022-03-14 2022-04-12 杭州半云科技有限公司 Data model based on data virtualization engine and construction method thereof
CN114911785A (en) * 2022-05-16 2022-08-16 北京航空航天大学 Data blood reason management method and device and electronic equipment
CN116166718A (en) * 2023-04-25 2023-05-26 北京捷泰云际信息技术有限公司 Data blood margin acquisition method and device
CN116166718B (en) * 2023-04-25 2023-07-14 北京捷泰云际信息技术有限公司 Data blood margin acquisition method and device
CN116450908A (en) * 2023-06-19 2023-07-18 北京大数据先进技术研究院 Self-service data analysis method and device based on data lake and electronic equipment
CN116450908B (en) * 2023-06-19 2023-10-03 北京大数据先进技术研究院 Self-service data analysis method and device based on data lake and electronic equipment
CN116541887B (en) * 2023-07-07 2023-09-15 云启智慧科技有限公司 Data security protection method for big data platform
CN116662308A (en) * 2023-07-28 2023-08-29 恩核(北京)信息技术有限公司 Blood margin data extraction method based on several bins of log files
CN116662308B (en) * 2023-07-28 2023-11-03 恩核(北京)信息技术有限公司 Blood margin data extraction method based on several bins of log files
CN117131477A (en) * 2023-08-14 2023-11-28 南昌大学 Full-link data tracing method based on local data blood-edge digital watermark
CN117131477B (en) * 2023-08-14 2024-03-29 南昌大学 Full-link data tracing method based on local data blood-edge digital watermark
CN117273131A (en) * 2023-11-22 2023-12-22 四川三合力通科技发展集团有限公司 Cross-node data relationship discovery system and method
CN117273131B (en) * 2023-11-22 2024-02-13 四川三合力通科技发展集团有限公司 Cross-node data relationship discovery system and method
CN117555950A (en) * 2024-01-12 2024-02-13 山东再起数据科技有限公司 Data blood relationship construction method based on data center
CN117555950B (en) * 2024-01-12 2024-04-02 山东再起数据科技有限公司 Data blood relationship construction method based on data center

Also Published As

Publication number Publication date
CN111694858A (en) 2020-09-22

Legal Events

Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 20933094; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 Ep: PCT application non-entry in European phase. Ref document number: 20933094; Country of ref document: EP; Kind code of ref document: A1