CN110727677A - Method and device for tracing blood relationship of table in data warehouse - Google Patents

Method and device for tracing blood relationship of table in data warehouse

Info

Publication number
CN110727677A
Authority
CN
China
Prior art keywords
data warehouse
relationship
reading
blood
blood relationship
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910890108.5A
Other languages
Chinese (zh)
Other versions
CN110727677B (en)
Inventor
杨涵冰
吴豪
刘倩
万鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Shuhe Information Technology Co Ltd
Original Assignee
Shanghai Shuhe Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-09-19
Filing date
2019-09-19
Publication date
2020-01-24
Application filed by Shanghai Shuhe Information Technology Co Ltd
Priority to CN201910890108.5A
Publication of CN110727677A
Application granted
Publication of CN110727677B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Abstract

The invention provides a method and a device for tracing the blood relationship (lineage) of tables in a data warehouse. The method comprises the following steps: reading the generation mode of each table; analyzing the table contents by using the analysis tool corresponding to the generation mode of each table; and reading the table contents by using a script language and determining the upstream and downstream blood relationship of each table. With the method and the device, the complicated blood relationship does not need to be maintained manually, which avoids errors; the existing data warehouse development process does not need to be modified, and the method is transparent to data warehouse developers.

Description

Method and device for tracing blood relationship of table in data warehouse
Technical Field
The invention relates to the field of computers, in particular to a method and a device for tracing blood relationship of tables in a data warehouse.
Background
When using a data warehouse, a user needs to know the source of the tables he or she uses. When updating data warehouse tables, a developer also needs to determine which upstream and downstream tables will be affected by the modification. A data warehouse blood relationship service is therefore needed to address these issues.
Traditionally, the blood relationship of a data warehouse depends on data warehouse developers manually maintaining a table. The workload is extremely large, and the table must be updated by hand whenever the data warehouse is modified, so mismatches and omissions occur easily.
Disclosure of Invention
In order to solve the problems of low efficiency and high error probability of manual maintenance of the blood relationship of a data warehouse in the prior art, the invention provides a method and a device for tracing the blood relationship of a table in the data warehouse.
In a first aspect, the present invention provides a method for tracing blood relationship of a table in a data warehouse, the method comprising:
reading the generation mode of each table;
analyzing the table contents by using the corresponding analysis tool according to the generation mode of each table;
and reading the table contents by using a script language, and determining the upstream and downstream blood relationship of each table.
Further, the method further comprises:
and drawing the upstream and downstream blood relationship of each table by using a rendering tool.
Further, reading the generation mode of each table includes:
reading database tables generated using a data warehouse tool, reading database tables generated using a large scale data processing computing engine.
Further, reading the table contents by using a script language, and determining the upstream and downstream blood relationship of each table comprises:
and removing the temporary tables from the database tables generated by the data warehouse tool, and determining the upstream and downstream blood relationship of each table.
In a second aspect, the present invention provides an apparatus for tracing the blood relationship of tables in a data warehouse, the apparatus comprising:
the read table generation mode module is used for reading the generation mode of each table;
the table content analyzing module is used for analyzing the table content by using the corresponding analyzing tool according to the generating mode of each table;
and the blood relationship tracing module is used for reading the table contents by using the script language and determining the blood relationship between the upstream and the downstream of each table.
Further, the apparatus further comprises:
and the drawing module is used for drawing the upstream and downstream blood relationship of each table by using a rendering tool.
Further, the read table generation mode module includes:
reading database tables generated using a data warehouse tool, reading database tables generated using a large scale data processing computing engine.
Further, the blood relationship tracing module comprises:
and the temporary table deleting unit is used for removing the temporary tables from the database tables generated by the data warehouse tool and determining the upstream and downstream blood relationship of each table.
In a third aspect, the present invention provides an electronic device, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the steps of the method for blood relationship tracing of tables in a data warehouse provided in the first aspect.
In a fourth aspect, the present invention provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method for tracing the blood relationship of tables in a data warehouse provided in the first aspect.
According to the method and the device for tracing the blood relationship of the table in the data warehouse, the complicated blood relationship does not need to be manually maintained, and errors are avoided; the existing data warehouse development process is not required to be modified, and the method is transparent to developers of the data warehouse.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present invention; a person skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a method for tracing the blood relationship of a table in a data warehouse according to an embodiment of the present invention;
FIG. 2 is a block diagram of an apparatus for tracing the blood relationship of a table in a data warehouse according to an embodiment of the present invention;
FIG. 3 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Traditionally, the blood relationship of a data warehouse depends on data warehouse developers manually maintaining a table, and every update must also be made by hand, so mismatches and omissions occur easily. To solve this problem, an embodiment of the present invention provides a method for tracing the blood relationship of a table in a data warehouse. As shown in FIG. 1, the method includes:
step S101, reading the generation mode of each table;
step S102, analyzing table contents by using corresponding analysis tools according to the generation mode of each table;
step S103, reading the table content by using the script language, and determining the blood relationship between the upstream and the downstream of each table.
Specifically, the server reads tables on different platforms to obtain the generation mode of each table, for example, whether a table was generated using Hive SQL or using Spark SQL. Hive is a data warehouse tool based on Hadoop; it can map structured data files to database tables and provides an SQL-like query function. Spark is a unified computing engine for large-scale data processing and is suitable for scenarios that would otherwise need several different distributed platforms.
The table contents are then analyzed by using the analysis tool that corresponds to the generation mode of each table. For example, for a table generated using Hive SQL, a Hive Hook intercepts the Hive query process, reads the Hive SQL execution plan, and records the tables read and the tables written into a MySQL table; a hook is a mechanism that intercepts events, messages, or function calls during processing. As another example, for a table generated using Spark SQL, the analysis is performed in advance in the script that executes the Spark SQL: the execution plan is obtained by using the EXPLAIN function of Hive, and the tables read and the tables written are recorded into a MySQL table.
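The idea of recording "tables read" and "tables written" for one statement can be illustrated with a small Python sketch. Note that this is a simplified stand-in, not the mechanism described above: it scans the SQL text with regular expressions instead of reading a real Hive or Spark execution plan, and the function name `extract_tables` and the sample table names are hypothetical.

```python
import re

# Assumption: a regex scan of the SQL text stands in for execution-plan analysis.
WRITE_RE = re.compile(
    r"\binsert\s+(?:overwrite|into)\s+(?:table\s+)?([\w.]+)", re.IGNORECASE
)
READ_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def extract_tables(sql):
    """Return (read_tables, written_tables) found in one SQL statement."""
    written = {name.lower() for name in WRITE_RE.findall(sql)}
    # A table that is only written should not be reported as read.
    read = {name.lower() for name in READ_RE.findall(sql)} - written
    return read, written


sql = """
INSERT OVERWRITE TABLE dw.order_summary
SELECT o.user_id, SUM(o.amount)
FROM ods.orders o
JOIN ods.users u ON o.user_id = u.id
GROUP BY o.user_id
"""
read_tables, written_tables = extract_tables(sql)
print(sorted(read_tables), sorted(written_tables))
```

A regex scan misses subqueries, CTEs, and views, which is one reason an approach based on the execution plan is more reliable in practice.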
The table contents are read by using a script language. For example, a Python script runs every day, reads the records written into the table within the last 40 days, processes and deduplicates them, and obtains the upstream and downstream blood relationship between the tables, namely which tables each table reads its data from and which tables depend on its data; the result is then stored.
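The daily aggregation step can be sketched as follows. This is a minimal illustration under stated assumptions: each record is a hypothetical `(read_table, written_table, timestamp)` tuple already loaded into memory, and the function name `build_lineage` is invented for the example; only the 40-day window and the deduplication come from the description above.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def build_lineage(records, now, window_days=40):
    """Build upstream/downstream maps from (read, written, timestamp) records."""
    cutoff = now - timedelta(days=window_days)
    # A set of (read, written) pairs removes duplicate records automatically.
    edges = {(read, written) for read, written, ts in records if ts >= cutoff}
    upstream = defaultdict(set)    # table -> tables it reads from
    downstream = defaultdict(set)  # table -> tables that depend on it
    for read, written in edges:
        upstream[written].add(read)
        downstream[read].add(written)
    return upstream, downstream


now = datetime(2019, 9, 19)
records = [
    ("ods.orders", "dw.order_summary", datetime(2019, 9, 18)),
    ("ods.orders", "dw.order_summary", datetime(2019, 9, 17)),  # duplicate edge
    ("ods.users", "dw.order_summary", datetime(2019, 9, 18)),
    ("dw.order_summary", "app.report", datetime(2019, 9, 18)),
    ("ods.legacy", "dw.order_summary", datetime(2019, 6, 1)),   # outside the window
]
upstream, downstream = build_lineage(records, now)
print(sorted(upstream["dw.order_summary"]))
```

Keeping both maps makes either direction of a lineage query a single dictionary lookup.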
According to the method for tracing the blood relationship of the table in the data warehouse, the complicated blood relationship does not need to be manually maintained, and errors are avoided; the existing data warehouse development process is not required to be modified, and the method is transparent to developers of the data warehouse.
Based on the content of the above embodiments, as an alternative embodiment: the method further comprises the following steps:
and drawing the upstream and downstream blood relationship of each table by using a rendering tool.
Specifically, the complete blood relationship of each table is rendered and drawn through the visualization tool D3.
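One way to hand the relationship to a browser-side renderer is to export it as JSON. The sketch below assumes the nodes/links layout that many D3 force-directed graph examples consume; the source does not specify a file format, and the function name `to_graph_json` and the sample edges are hypothetical.

```python
import json


def to_graph_json(edges):
    """Serialize (upstream, downstream) table pairs as a nodes/links document."""
    nodes = sorted({table for edge in edges for table in edge})
    return json.dumps(
        {
            "nodes": [{"id": name} for name in nodes],
            "links": [{"source": src, "target": dst} for src, dst in sorted(edges)],
        },
        indent=2,
    )


edges = {
    ("ods.orders", "dw.order_summary"),
    ("ods.users", "dw.order_summary"),
}
payload = to_graph_json(edges)
print(payload)
```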
Based on the content of the above embodiments, as an alternative embodiment: the generation mode for reading each table comprises the following steps:
reading database tables generated using a data warehouse tool, reading database tables generated using a large scale data processing computing engine.
Specifically, the tables that are read may be generated by different tools, for example database tables generated by using a data warehouse tool or database tables generated by using a large-scale data processing computing engine; the embodiment of the present invention does not specifically limit the generation mode of the tables.
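Because the generation mode is open-ended, choosing an analysis tool per mode can be modeled as a small registry. The following is only a sketch: the mode names `hive_sql` and `spark_sql`, the decorator `register`, and the returned dictionaries are all hypothetical placeholders rather than part of the described method.

```python
ANALYZERS = {}


def register(mode):
    """Decorator that registers an analyzer for one generation mode."""
    def wrap(fn):
        ANALYZERS[mode] = fn
        return fn
    return wrap


@register("hive_sql")
def analyze_hive(table):
    # Placeholder: a real analyzer would rely on records captured by a Hive hook.
    return {"table": table, "plan_source": "hive_hook"}


@register("spark_sql")
def analyze_spark(table):
    # Placeholder: a real analyzer would rely on a plan obtained before execution.
    return {"table": table, "plan_source": "pre_execution_plan"}


def analyze(table, mode):
    """Dispatch to the analyzer registered for the table's generation mode."""
    try:
        analyzer = ANALYZERS[mode]
    except KeyError:
        raise ValueError(f"no analyzer registered for generation mode {mode!r}")
    return analyzer(table)


result = analyze("dw.order_summary", "hive_sql")
print(result)
```

Supporting a further generation mode then only requires registering one more function.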
Based on the content of the above embodiments, as an alternative embodiment: reading the table contents by using a script language, and determining the upstream and downstream blood relationship of each table comprises the following steps:
and removing the temporary tables from the database tables generated by the data warehouse tool, and determining the upstream and downstream blood relationship of each table.
Specifically, when a table name is entered in the query interface, the JS script obtains the complete upstream and downstream blood relationship of that table from the file generated by the Python script. Because generating a Hive table often requires creating temporary tables as intermediate aids, and these temporary tables clutter the view of the blood relationship, they need to be hidden. The JS script therefore additionally accesses the metadata information of each table to exclude temporary tables that have been deleted, and finally determines the upstream and downstream blood relationship of each table.
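Hiding temporary tables can be sketched in Python as a graph rewrite. Two assumptions are made here that the text does not state: a table counts as temporary when it no longer appears in the metadata (passed in as `existing_tables`), and the parents of a hidden table are reconnected directly to its children so the chain stays unbroken. The function name `hide_temp_tables` is hypothetical.

```python
def hide_temp_tables(edges, existing_tables):
    """Drop tables absent from metadata and bridge their parents to their children."""
    edges = set(edges)
    nodes = {table for edge in edges for table in edge}
    for temp in sorted(nodes - set(existing_tables)):
        parents = {src for src, dst in edges if dst == temp and src != temp}
        children = {dst for src, dst in edges if src == temp and dst != temp}
        # Remove every edge touching the temporary table ...
        edges = {(src, dst) for src, dst in edges if temp not in (src, dst)}
        # ... and connect what fed it to what consumed it.
        edges |= {(p, c) for p in parents for c in children if p != c}
    return edges


edges = {
    ("ods.orders", "tmp.orders_stage"),
    ("tmp.orders_stage", "dw.order_summary"),
    ("ods.users", "dw.order_summary"),
}
visible = hide_temp_tables(
    edges, existing_tables={"ods.orders", "ods.users", "dw.order_summary"}
)
print(sorted(visible))
```

Processing one temporary table at a time against the updated edge set also handles chains of several temporary tables in a row.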
According to a further aspect of the present invention, an embodiment provides an apparatus for tracing the blood relationship of a table in a data warehouse; FIG. 2 is a block diagram of this apparatus. The apparatus is used to carry out the tracing of the blood relationship of tables in the data warehouse described in the foregoing embodiments. The descriptions and definitions given for the method in the foregoing embodiments therefore also apply to the execution modules in this embodiment.
The device includes:
a read table generation mode module 201, configured to read generation modes of the tables;
the table content analyzing module 202 is used for analyzing the table content by using the corresponding analyzing tool according to the generation mode of each table;
and the blood relationship tracing module 203 is configured to read table contents by using a script language, and determine the blood relationship between the upstream and downstream of each table.
According to the device for tracing the blood relationship of the table in the data warehouse, the complicated blood relationship does not need to be manually maintained, and errors are avoided; the existing data warehouse development process is not required to be modified, and the method is transparent to developers of the data warehouse.
Based on the content of the above embodiments, as an alternative embodiment: the device also includes:
and the drawing module is used for drawing the upstream and downstream blood relationship of each table by using a rendering tool.
Based on the content of the above embodiments, as an alternative embodiment: the reading table generation mode module comprises:
reading database tables generated using a data warehouse tool, reading database tables generated using a large scale data processing computing engine.
Based on the content of the above embodiments, as an alternative embodiment: the blood relationship tracing module comprises:
and the temporary table deleting unit is used for removing the temporary tables from the database tables generated by the data warehouse tool and determining the upstream and downstream blood relationship of each table.
FIG. 3 is a block diagram of an electronic device according to an embodiment of the present invention. As shown in FIG. 3, the electronic device includes a processor 301, a memory 302, and a bus 303.
The processor 301 and the memory 302 communicate with each other through the bus 303. The processor 301 is configured to call program instructions in the memory 302 to perform the method for tracing the blood relationship of the table in the data warehouse provided by the above embodiments, for example: reading the generation mode of each table; analyzing the table contents by using the corresponding analysis tool according to the generation mode of each table; and reading the table contents by using a script language, and determining the upstream and downstream blood relationship of each table.
Embodiments of the present invention provide a non-transitory computer readable storage medium having stored thereon a computer program that, when executed by a processor, performs the steps of the method for tracing the blood relationship of tables in a data warehouse, for example: reading the generation mode of each table; analyzing the table contents by using the corresponding analysis tool according to the generation mode of each table; and reading the table contents by using a script language, and determining the upstream and downstream blood relationship of each table.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods of the various embodiments or some parts of the embodiments.
Finally, the principle and the implementation of the present invention are explained by applying the specific embodiments in the present invention, and the above description of the embodiments is only used to help understanding the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (10)

1. A method for blood-related relationship tracing of tables within a data warehouse, the method comprising:
reading the generation mode of each table;
analyzing the table contents by using the corresponding analysis tool according to the generation mode of each table;
and reading the table contents by using a script language, and determining the upstream and downstream blood relationship of each table.
2. The method of claim 1, further comprising:
and drawing the upstream and downstream blood relationship of each table by using a rendering tool.
3. The method of claim 1, wherein reading the generation mode of each table comprises:
reading database tables generated using a data warehouse tool, reading database tables generated using a large scale data processing computing engine.
4. The method of claim 3, wherein reading table contents by using a scripting language and determining the upstream and downstream blood relationship of each table comprises:
and removing the temporary tables from the database tables generated by the data warehouse tool, and determining the upstream and downstream blood relationship of each table.
5. An apparatus for blood-related relationship tracing of tables within a data warehouse, the apparatus comprising:
the read table generation mode module is used for reading the generation mode of each table;
the table content analyzing module is used for analyzing the table content by using the corresponding analyzing tool according to the generation mode of each table;
and the blood relationship tracing module is used for reading the table contents by using the script language and determining the blood relationship between the upstream and the downstream of each table.
6. The apparatus of claim 5, further comprising:
and the drawing module is used for drawing the upstream and downstream blood relationship of each table by using a rendering tool.
7. The apparatus of claim 5, wherein the read table generation mode module comprises:
reading database tables generated using a data warehouse tool, reading database tables generated using a large scale data processing computing engine.
8. The apparatus of claim 7, wherein the blood relationship traceability module comprises:
and the temporary table deleting unit is used for removing the temporary tables from the database tables generated by the data warehouse tool and determining the upstream and downstream blood relationship of each table.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program performs the steps of the method for blood-related relationship tracing of tables within a data warehouse according to any one of claims 1 to 4.
10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of a method for blood-related relationship tracing of tables within a data warehouse as claimed in any one of claims 1 to 4.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910890108.5A CN110727677B (en) 2019-09-19 2019-09-19 Method and device for tracing blood relationship of table in data warehouse

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910890108.5A CN110727677B (en) 2019-09-19 2019-09-19 Method and device for tracing blood relationship of table in data warehouse

Publications (2)

Publication Number Publication Date
CN110727677A 2020-01-24
CN110727677B 2022-12-30

Family

ID=69219294

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910890108.5A Active CN110727677B (en) 2019-09-19 2019-09-19 Method and device for tracing blood relationship of table in data warehouse

Country Status (1)

Country Link
CN (1) CN110727677B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111639143A (en) * 2020-06-05 2020-09-08 广州市玄武无线科技股份有限公司 Data blood relationship display method and device of data warehouse and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060235836A1 (en) * 2005-04-14 2006-10-19 International Business Machines Corporation Query conditions on related model entities
WO2016192583A1 (en) * 2015-06-04 2016-12-08 阿里巴巴集团控股有限公司 Data processing method and device for data warehouse
CN107180053A (en) * 2016-03-11 2017-09-19 中国移动通信集团河北有限公司 A kind of data warehouse optimization method and device
CN107644073A (en) * 2017-09-18 2018-01-30 广东中标数据科技股份有限公司 A kind of field consanguinity analysis method, system and device based on depth-first traversal
CN109446279A (en) * 2018-10-15 2019-03-08 顺丰科技有限公司 Based on neo4j big data genetic connection management method, system, equipment and storage medium
CN109710703A (en) * 2019-01-03 2019-05-03 北京顺丰同城科技有限公司 A kind of generation method and device of genetic connection network
CN110232056A (en) * 2019-05-21 2019-09-13 苏宁云计算有限公司 A kind of the blood relationship analytic method and its tool of structured query language

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
无西LC: "hive血缘关系之输入表与目标表的解析" (Hive blood relationship: parsing of input tables and target tables), 《HTTPS://WWW.CNBLOGS.COM/WUXILC/P/9326130.HTML》 *

Also Published As

Publication number Publication date
CN110727677B (en) 2022-12-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant