CN111125229A - Data blood margin generation method and device and electronic equipment - Google Patents
- Publication number
- CN111125229A (application number CN201911376186.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- source
- processing
- target
- identifier
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/283—Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Quality & Reliability (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A data lineage ("data blood margin") generation method, a data lineage generation device, an electronic device, and a machine-readable storage medium are disclosed. In the application, source data is obtained from an interfaced business system and stored locally, the source data being database table data. Target data corresponding to the source data is then generated; the target data comprises at least a first lineage identifier that uniquely characterizes the data source of each row of the target data. Data lineage relationships are thereby constructed accurately at row level, which improves the efficiency and accuracy of data lineage tracing.
Description
Technical Field
One or more embodiments of the present application relate to the field of computer application technologies, and in particular to a data lineage generation method, apparatus, electronic device, and machine-readable storage medium.
Background
A data warehouse (DW or DWH) is a subject-oriented, integrated, time-variant, and relatively stable collection of data. For example, in practical applications, data warehouses are widely used to support enterprise management decisions by providing data sets covering all types of data relevant to those decisions.
A data warehouse has four main characteristics: it is subject-oriented, it is integrated, it is time-variant, and its data is not updatable. "Subject-oriented" means that the data warehouse is built around a definite subject: only data related to the subject is kept, and unrelated detail data is excluded. "Integrated" refers to the process from acquiring data from different sources to generating target data, which requires data processing based on Extract-Transform-Load (ETL) technology. "Time-variant" refers to implicit or explicit changes of the data over time. "Not updatable" means that after the data has been loaded in the Load step of ETL, only query operations can be performed on it; the conventional database operations of inserting, deleting, and modifying are not available.
The data of a data warehouse is based on Online Analytical Processing (OLAP): it reflects the content of historical data over a long period, is a collection of database snapshots taken at different points in time, and includes data derived from those snapshots through statistics, synthesis, and recombination. The data of a traditional database, by contrast, is based on Online Transaction Processing (OLTP).
Disclosure of Invention
The application provides a data lineage generation method, applied to a data warehouse management system, the method comprising:
acquiring source data from an interfaced business system and storing the source data locally, wherein the source data is database table data;
generating target data corresponding to the source data, wherein the target data comprises at least a first lineage identifier uniquely characterizing the data source of each row of the target data.
Optionally, the target data further comprises an index identifier uniquely characterizing each row of the target data;
generating the target data corresponding to the source data comprises:
generating processing data of the source data, wherein the processing data is intermediate data between the source data and the target data, and the processing data comprises at least a second lineage identifier uniquely characterizing the data source of each row of the processing data;
and generating the target data corresponding to the source data based on the index identifier and the processing data.
Optionally, generating the processing data of the source data comprises:
generating first processing data of the source data, wherein the first processing data comprises at least the source data and a second lineage identifier uniquely characterizing, for each row of the first processing data, its data source in the source data;
generating second processing data of the first processing data, wherein the second processing data comprises at least the first processing data and a second lineage identifier uniquely characterizing, for each row of the second processing data, its data source in the first processing data;
and iteratively generating processing data of the second processing data until final third processing data is obtained.
Optionally, generating the target data corresponding to the source data based on the index identifier and the processing data comprises:
generating the target data corresponding to the source data by taking the index identifier and the processing data as the table data of the target data.
Optionally, when data lineage tracing needs to be performed on the target data, the method further comprises:
constructing a data lineage query instruction for the target data, and querying the target data based on the data lineage query instruction to obtain the data lineage tracing the target data back to the source data.
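The tracing step can be illustrated with a minimal sketch. It assumes, purely for illustration, that tables are held in memory as dictionaries, that every index identifier has the form `table:number`, and that each derived row stores the index identifier of its upstream row in a field named `srckey`; the table names and the `trace_lineage` helper are hypothetical and are not taken from the application.

```python
# Hypothetical in-memory tables: table name -> {index identifier -> row}.
# Each derived row stores the index identifier of its upstream row in "srckey";
# source rows have no "srckey".
TABLES = {
    "source_ta": {"source_ta:1": {"id_card": "ID00001", "age": 20}},
    "proc_ta": {"proc_ta:1": {"id_card": "ID00001", "age": 20, "srckey": "source_ta:1"}},
    "target_ta": {"target_ta:1": {"id_card": "ID00001", "age": 20, "srckey": "proc_ta:1"}},
}


def trace_lineage(tables: dict, index_id: str) -> list:
    """Follow srckey pointers from a row back to its source row."""
    path = []
    current = index_id
    while current is not None:
        table_name = current.split(":", 1)[0]  # table identifier prefix
        row = tables[table_name][current]
        path.append(current)
        current = row.get("srckey")  # None once a source row is reached
    return path


print(trace_lineage(TABLES, "target_ta:1"))
# ['target_ta:1', 'proc_ta:1', 'source_ta:1']
```

Because each row points to exactly one upstream row, the trace is a simple pointer walk whose length equals the number of processing stages.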
Optionally, the third processing data further comprises an index identifier uniquely characterizing each row of the third processing data, wherein the index identifier of the third processing data is obtained by combining a table identifier of the third processing data with a unique identifier generated by a unique identifier algorithm; the first lineage identifier points to the index identifier of the third processing data.
Optionally, the source data further comprises an index identifier uniquely characterizing each row of the source data, the first processing data further comprises an index identifier uniquely characterizing each row of the first processing data, and the second processing data further comprises an index identifier uniquely characterizing each row of the second processing data;
the second lineage identifier points to the index identifier of the source data, to the index identifier of the first processing data, or to the index identifier of the second processing data.
Optionally, the unique identifier algorithm is a UUID algorithm or a hash algorithm.
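As a rough illustration of combining a table identifier with a generated unique identifier, the sketch below shows one UUID-based and one hash-based variant. The `table:identifier` layout, the function names, the table identifier `ta3`, the choice of SHA-256, and the mixing-in of the row position are all assumptions made for this example; the application does not specify them.

```python
import hashlib
import uuid


def uuid_index_id(table_id: str) -> str:
    """Combine a table identifier with a random UUID (different on every call)."""
    return f"{table_id}:{uuid.uuid4().hex}"


def hash_index_id(table_id: str, row: dict, position: int) -> str:
    """Combine a table identifier with a SHA-256 digest of the row content.

    The row position is mixed in so that two rows with identical content
    (such as a duplicated source row) still get different identifiers.
    """
    payload = repr((position, sorted(row.items()))).encode("utf-8")
    return f"{table_id}:{hashlib.sha256(payload).hexdigest()[:16]}"


row = {"id_card": "ID00001", "age": 20}
print(uuid_index_id("ta3"))
print(hash_index_id("ta3", row, 0))
```

The UUID variant needs no knowledge of the row, while the hash variant is reproducible: re-running the same processing step over the same input yields the same identifiers.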
The present application further provides a data lineage generation device, applied to a data warehouse management system, the device comprising:
an acquisition module, which acquires source data from an interfaced business system and stores the source data locally, wherein the source data is database table data;
a generation module, which generates target data corresponding to the source data, wherein the target data comprises at least a first lineage identifier uniquely characterizing the data source of each row of the target data.
Optionally, the target data further comprises an index identifier uniquely characterizing each row of the target data;
the generation module further:
generates processing data of the source data, wherein the processing data is intermediate data between the source data and the target data, and the processing data comprises at least a second lineage identifier uniquely characterizing the data source of each row of the processing data;
and generates the target data corresponding to the source data based on the index identifier and the processing data.
Optionally, the generation module further:
generates first processing data of the source data, wherein the first processing data comprises at least the source data and a second lineage identifier uniquely characterizing, for each row of the first processing data, its data source in the source data;
generates second processing data of the first processing data, wherein the second processing data comprises at least the first processing data and a second lineage identifier uniquely characterizing, for each row of the second processing data, its data source in the first processing data;
and iteratively generates processing data of the second processing data until final third processing data is obtained.
Optionally, the generation module further:
generates the target data corresponding to the source data by taking the index identifier and the processing data as the table data of the target data.
Optionally, when data lineage tracing needs to be performed on the target data, the device further comprises:
a tracing module, which constructs a data lineage query instruction for the target data and queries the target data based on the data lineage query instruction to obtain the data lineage tracing the target data back to the source data.
Optionally, the third processing data further comprises an index identifier uniquely characterizing each row of the third processing data, wherein the index identifier of the third processing data is obtained by combining a table identifier of the third processing data with a unique identifier generated by a unique identifier algorithm; the first lineage identifier points to the index identifier of the third processing data.
Optionally, the source data further comprises an index identifier uniquely characterizing each row of the source data, the first processing data further comprises an index identifier uniquely characterizing each row of the first processing data, and the second processing data further comprises an index identifier uniquely characterizing each row of the second processing data;
the second lineage identifier points to the index identifier of the source data, to the index identifier of the first processing data, or to the index identifier of the second processing data.
Optionally, the unique identifier algorithm is a UUID algorithm or a hash algorithm.
The application also provides an electronic device comprising a communication interface, a processor, a memory, and a bus, wherein the communication interface, the processor, and the memory are connected to one another through the bus;
the memory stores machine-readable instructions, and the processor executes the above method by invoking the machine-readable instructions.
In the above embodiments, source data is obtained from the interfaced business system and stored locally, and target data that corresponds to the source data and contains a lineage identifier is generated. Data lineage relationships are thereby constructed accurately at row level, which improves the efficiency and accuracy of data lineage tracing.
Drawings
FIG. 1 is a schematic diagram of ETL data processing performed by a data warehouse management system according to an exemplary embodiment;
FIG. 2 is a flowchart of a data lineage generation method according to an exemplary embodiment;
FIG. 3 is a hardware block diagram of an electronic device according to an exemplary embodiment;
FIG. 4 is a block diagram of a data lineage generation device according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "at the time of", "when", or "in response to a determination", depending on the context.
To help those skilled in the art better understand the technical solutions in the embodiments of the present disclosure, the related art of data generation involved in those embodiments is briefly described below.
Referring to FIG. 1, FIG. 1 is a schematic diagram of ETL data processing performed by a data warehouse management system according to an embodiment of the present disclosure.
As shown in FIG. 1, the data warehouse management system is deployed as a cluster comprising a control node and one or more working nodes managed by the control node. The control node schedules tasks on the working nodes it manages, and the working nodes perform ETL data processing on the source data to obtain the target data.
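The control-node and working-node arrangement can be pictured with a small sketch in which a thread pool stands in for the working nodes. This is only an analogy under stated assumptions: the function names `etl_task` and `control_node`, the one-task-per-table scheduling policy, and the whitespace-trimming transform are hypothetical and are not described in the application.

```python
from concurrent.futures import ThreadPoolExecutor


def etl_task(table_name: str, rows: list) -> tuple:
    """Work done by one working node: a trivial transform (trim text values)."""
    cleaned = [
        {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        for row in rows
    ]
    return table_name, cleaned


def control_node(source_tables: dict, workers: int = 2) -> dict:
    """Schedule one ETL task per source table across the working nodes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(etl_task, name, rows) for name, rows in source_tables.items()]
        return dict(f.result() for f in futures)


source_tables = {
    "ta": [{"id_card": " ID00001 ", "age": 20}],
    "tb": [{"id_card": "ID00003", "graduation": "2018-06-07"}],
}
result = control_node(source_tables)
print(result)
```

In a real cluster the working nodes would be separate machines and the scheduling would go over the network; the thread pool only mirrors the division of roles.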
Based on the architecture shown above, the present application aims to provide a technical solution that generates, for each row of the target data, a lineage identifier characterizing the row's data source in the source data, thereby implementing data lineage generation.
In implementation, the data warehouse management system acquires source data from the interfaced business system and stores the source data locally, wherein the source data is database table data; the data warehouse management system then generates target data corresponding to the source data, wherein the target data comprises at least a first lineage identifier uniquely characterizing the data source of each row of the target data.
In this scheme, source data is obtained from the interfaced business system and stored locally, and target data that corresponds to the source data and contains a lineage identifier is generated. Data lineage relationships are thereby constructed accurately at row level, which improves the efficiency and accuracy of data lineage tracing.
The present application is described below with reference to specific embodiments and specific application scenarios.
Referring to FIG. 2, FIG. 2 is a flowchart of a data lineage generation method according to an embodiment of the present application. The method is applied to a data warehouse management system and performs the following steps:
In this specification, the data warehouse management system refers to a machine or a cluster of machines that performs ETL data processing on data.
For example, the data warehouse management system may be a machine or a machine cluster that comprises a control node and a plurality of working nodes and can perform ETL data processing, as shown in FIG. 1.
For ease of understanding, ETL data processing is briefly described here. ETL is the abbreviation of Extract-Transform-Load and describes the process of extracting (Extract), transforming (Transform), and loading (Load) data from a source end to a destination end. For example, in practical applications, based on ETL data processing, the data warehouse management system may extract data of source databases in distributed, heterogeneous data sources (e.g., the various data tables in a source database) to a temporary intermediate layer, clean, convert, and integrate the data there, and finally load the data into the target database of the data warehouse, where it serves as the basis for online analysis and further data mining.
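The three ETL stages can be sketched as three small functions. The in-memory dictionaries standing in for the source and target databases, the function names, and the specific cleaning rules (trimming text and casting `age` to an integer) are assumptions chosen to keep the example short.

```python
def extract(source_db: dict, table: str) -> list:
    """Extract: copy rows out of the source database into a staging list."""
    return [dict(row) for row in source_db[table]]


def transform(rows: list) -> list:
    """Transform: clean and convert values (here: trim text, cast age to int)."""
    cleaned = []
    for row in rows:
        row = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        row["age"] = int(row["age"])
        cleaned.append(row)
    return cleaned


def load(target_db: dict, table: str, rows: list) -> None:
    """Load: append the transformed rows to the target table."""
    target_db.setdefault(table, []).extend(rows)


source_db = {"ta": [{"id_card": " ID00001 ", "age": "20"}, {"id_card": "ID00002", "age": "21"}]}
target_db = {}
load(target_db, "ta", transform(extract(source_db, "ta")))
print(target_db["ta"])
# [{'id_card': 'ID00001', 'age': 20}, {'id_card': 'ID00002', 'age': 21}]
```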
In this specification, the business system refers to any business system that interfaces with the data warehouse management system.
For example, in practical applications, the business system may be a cloud computing business system, a big data business system, or a security business system that interfaces with the data warehouse management system.
In this specification, the source data refers to database table data in the business system.
For example, in practical applications, the source data may include table data of one or more data tables in a relational database (e.g., MySQL, PostgreSQL).
In this specification, the data warehouse management system obtains the source data from the business system and stores the source data locally.
Taking a big data business system as an example, the data warehouse management system acquires the source data from the big data business system and stores it in a local source database so that subsequent ETL data processing can be performed on the source database. The locally stored source data includes, for example, two data tables: source table ta, shown in Table 1 below, and source table tb, shown in Table 2 below.
id_card | name | age |
ID00001 | A | 20 |
ID00002 | B | 21 |
ID00003 | C | 22 |
ID00004 | D | 23 |
ID00001 | A | 20 |
TABLE 1
id_card | degree | graduation |
ID00001 | Bachelor | 2018-06-06 |
ID00003 | Doctorate | 2018-06-07 |
TABLE 2
The top rows shown in Table 1 (id_card, name, age) and Table 2 (id_card, degree, graduation) are the table fields of each table; every other row shown in Tables 1 and 2 is a row of data of the corresponding table.
In this specification, the target data is data that the data warehouse management system obtains by performing ETL data processing on the source data and that has a data lineage relationship with the source data.
For ease of understanding, the data lineage relationship is briefly introduced here. Data lineage refers to the inheritance relationship, analogous to blood kinship in human society, that forms between the finally obtained data and the source data as data passes through generation, processing, circulation, and retirement.
In this specification, the first lineage identifier is an identifier contained in the target data that uniquely characterizes the data source of each row of the target data.
In one implementation, the first lineage identifier may be a table field, in a data table of the target data, that uniquely characterizes the data source of each row.
Continuing with the above example, for source table ta and source table tb shown in Table 1 and Table 2, the target data obtained through ETL data processing by the data warehouse management system may be target table ta and target table tb. The relationships between target table ta and source table ta, and between target table tb and source table tb, are shown in Table 3 below:
Source table | Target table |
Source table ta | Target table ta = ETL-processed source table ta + srckey |
Source table tb | Target table tb = ETL-processed source table tb + srckey |
TABLE 3
As shown in Table 3, target table ta may include the ETL-processed source table ta and the srckey table field, where srckey is the first lineage identifier that uniquely characterizes the data source of each row of target table ta. Similarly, target table tb may include the ETL-processed source table tb and the srckey table field, where srckey is the first lineage identifier that uniquely characterizes the data source of each row of target table tb.
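A minimal sketch of "target table = processed source table + srckey", using Python's built-in `sqlite3` module. It assumes that SQLite's implicit `rowid` stands in for the identity of each source row and that `srckey` is stored as the text `source_ta:<rowid>`; both are illustrative choices, and the `name` column of Table 1 is omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_ta (id_card TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO source_ta VALUES (?, ?)",
    [("ID00001", 20), ("ID00002", 21), ("ID00003", 22), ("ID00004", 23), ("ID00001", 20)],
)

# Target table = source columns + srckey. SQLite's implicit rowid stands in
# for the source row's identity (an assumption made for this sketch).
conn.execute(
    "CREATE TABLE target_ta AS "
    "SELECT id_card, age, 'source_ta:' || rowid AS srckey FROM source_ta"
)

rows = conn.execute(
    "SELECT id_card, srckey FROM target_ta WHERE id_card = 'ID00001' ORDER BY srckey"
).fetchall()
print(rows)
# [('ID00001', 'source_ta:1'), ('ID00001', 'source_ta:5')]
```

The two ID00001 rows have identical column values, yet their `srckey` values differ, which is what makes the lineage row-level rather than table-level.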
In one embodiment, the target data includes, in addition to the first lineage identifier, an index identifier uniquely characterizing each row of the target data.
Continuing with the above example, target table ta and target table tb include the contents shown in Table 4 below:
Target table | Contents of the target table |
Target table ta | rowkey + ETL-processed source table ta + srckey |
Target table tb | rowkey + ETL-processed source table tb + srckey |
TABLE 4
As shown in Table 4, target table ta may include, in addition to the srckey table field (first lineage identifier), an index identifier rowkey uniquely characterizing each row of target table ta. Similarly, target table tb may include, in addition to the srckey table field (first lineage identifier), an index identifier rowkey uniquely characterizing each row of target table tb.
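The row layout of Table 4 can be sketched as a small record type holding `rowkey`, the processed columns, and `srckey`. The `TargetRow` class, the `build_target` helper, the upstream identifiers `src_ta:1` and `src_ta:5`, and the use of a random UUID for `rowkey` are assumptions for illustration only.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetRow:
    rowkey: str   # index identifier of this target row
    fields: dict  # ETL-processed columns of the upstream row
    srckey: str   # first lineage identifier: where this row came from


def build_target(table_id: str, upstream: dict) -> list:
    """Build target rows from upstream rows keyed by their index identifier."""
    return [
        TargetRow(rowkey=f"{table_id}:{uuid.uuid4().hex}", fields=dict(fields), srckey=up_key)
        for up_key, fields in upstream.items()
    ]


upstream_ta = {
    "src_ta:1": {"id_card": "ID00001", "age": 20},
    "src_ta:5": {"id_card": "ID00001", "age": 20},  # duplicate content, distinct row
}
target_ta = build_target("target_ta", upstream_ta)
for row in target_ta:
    print(row.rowkey, row.srckey)
```

Here `rowkey` answers "which row is this?" and `srckey` answers "which row did it come from?", so the two fields together let a later stage point back at this table in the same way.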
In this specification, the processing data is the intermediate data produced while the data warehouse management system performs ETL data processing on the source data to obtain the target data.
For example, in practical applications, the data warehouse management system may perform ETL data processing on the source data to obtain corresponding processing data; it may then perform ETL data processing on that processing data again, obtaining processing data that has passed through ETL data processing multiple times.
In this specification, the processing data comprises at least a second lineage identifier; the second lineage identifier is an identifier contained in the processing data that uniquely characterizes the data source of each row of the processing data.
In one implementation, the second lineage identifier may be a table field, in a data table of the processing data, that uniquely characterizes the data source of each row.
For example, the processing data corresponding to source table ta is processing table ta, and the processing data corresponding to source table tb is processing table tb; the relationships between processing table ta and source table ta, and between processing table tb and source table tb, are shown in Table 5 below:
Source table | Processing table |
Source table ta | Processing table ta = ETL-processed source table ta + srckey1 |
Source table tb | Processing table tb = ETL-processed source table tb + srckey1 |
TABLE 5
As shown in Table 5, processing table ta may include the ETL-processed source table ta and the srckey1 table field, where srckey1 is the second lineage identifier that uniquely characterizes the data source of each row of processing table ta. Similarly, processing table tb may include the ETL-processed source table tb and the srckey1 table field, where srckey1 is the second lineage identifier that uniquely characterizes the data source of each row of processing table tb.
In this specification, the data warehouse management system generates target data that corresponds to the source data and contains the first lineage identifier.
In one embodiment, in the process of generating the target data that corresponds to the source data and contains the first lineage identifier, the data warehouse management system generates the processing data of the source data.
For ease of understanding, the process in which the data warehouse management system performs ETL data processing on the source data to obtain the corresponding processing data, and then iteratively performs ETL data processing on the processing data to obtain the corresponding target data, is described in detail below with a specific embodiment.
In one embodiment, in the process of generating the processing data of the source data, the data warehouse management system generates first processing data of the source data; the first processing data comprises at least the source data and a second lineage identifier uniquely characterizing, for each row of the first processing data, its data source in the source data.
Continuing with the above example, the first processing data corresponding to source table ta is first processing table ta, and the first processing data corresponding to source table tb is first processing table tb; the relationships between first processing table ta and source table ta, and between first processing table tb and source table tb, are shown in Table 6 below:
Source table | First processing table |
Source table ta | First processing table ta = source table ta + srckey1-A |
Source table tb | First processing table tb = source table tb + srckey1-A |
TABLE 6
As shown in Table 6, first processing table ta may include the ETL-processed source table ta and the srckey1-A table field, where srckey1-A is the second lineage identifier that uniquely characterizes, for each row of first processing table ta, its data source in source table ta. Similarly, first processing table tb may include the ETL-processed source table tb and the srckey1-A table field, where srckey1-A is the second lineage identifier that uniquely characterizes, for each row of first processing table tb, its data source in source table tb.
In this specification, the data warehouse management system may generate second processing data of the first processing data; the second processing data comprises at least the first processing data and a second lineage identifier uniquely characterizing, for each row of the second processing data, its data source in the first processing data.
Continuing with the above example, the second processing data corresponding to first processing table ta (first processing data) is second processing table ta, and the second processing data corresponding to first processing table tb (first processing data) is second processing table tb; the relationships between second processing table ta and first processing table ta, and between second processing table tb and first processing table tb, are shown in Table 7 below:
TABLE 7
As shown in Table 7, the second processing table ta may include the first processing table ta and the srckey1-B table fields; where srckey1-B is a second blood margin identifier that uniquely identifies each row of data of the second processing table ta as corresponding to the source of data from the first processing table ta. Similarly, second process table tb can include second process table tb and srckey1-B table fields; wherein srckey1-B is a second limbal identification that uniquely identifies each row of data of the second processing table tb corresponding to the data source from the first processing table tb.
In this specification, the data warehouse management system may further iteratively generate processing data from the second processing data until the final third processing data is obtained, where the third processing data is the processing data immediately preceding the ETL data processing that produces the target data.
Continuing the above example, the data warehouse management system may perform one or more iterations of ETL data processing on the second processing data until the final third processing data is obtained. For convenience of description and understanding, assume that the data warehouse management system performs ETL data processing on the second processing data one more time and thereby obtains the third processing data, which is the processing data immediately preceding the ETL data processing that produces the target data.
The third processing data corresponding to the second processing table ta (second processing data) is the third processing table ta, and the third processing data corresponding to the second processing table tb (second processing data) is the third processing table tb. The relationships between the third processing table ta and the second processing table ta, and between the third processing table tb and the second processing table tb, are shown in Table 8 below:
TABLE 8
As shown in Table 8, the third processing table ta may include the second processing table ta and the srckey1-C table field, where srckey1-C is the second blood margin identifier that uniquely characterizes, for each row of data of the third processing table ta, the corresponding data source in the second processing table ta. Similarly, the third processing table tb may include the second processing table tb and the srckey1-C table field, where srckey1-C is the second blood margin identifier that uniquely characterizes, for each row of data of the third processing table tb, the corresponding data source in the second processing table tb.
In one illustrated embodiment, the third processing data further includes an index identifier uniquely characterizing each row of data of the third processing data. The index identifier of the third processing data is obtained by combining a table identifier of the third processing data with a unique identifier generated by a unique identifier algorithm, wherein the table identifier is obtained by combining the table name of the third processing data and the unique identifier generated by the unique identifier algorithm, and the unique identifier algorithm is a UUID algorithm or a hash algorithm. The first blood margin identifier points to the index identifier of the third processing data.
Continuing the above example, the third processing data (e.g., the third processing table ta and the third processing table tb shown in Table 8) further includes an index identifier (e.g., the primary key of the third processing table ta and the primary key of the third processing table tb) uniquely identifying each row of data of the third processing data. The index identifiers of the third processing data (primary key rowkey3 of the third processing table ta and primary key rowkey3 of the third processing table tb) are obtained by combining the table identifiers of the third processing data (such as the table name of the third processing table ta and the table name of the third processing table tb) with unique identifiers generated by the unique identifier algorithm (UIDn, UIDm, where n and m are natural numbers). For example, the primary key corresponding to each row of data of the third processing table ta is "table name of the third processing table ta # unique identifier UIDn", and the primary key corresponding to each row of data of the third processing table tb is "table name of the third processing table tb # unique identifier UIDm".
As shown in Table 4, the first blood margin identifier srckey corresponding to the target data (target table ta, target table tb) points to the index identifier of the third processing data (primary key rowkey3 of the third processing table ta and primary key rowkey3 of the third processing table tb, as shown in Table 8); that is, srckey stores a copy whose value is identical to the index identifier of the third processing data.
In practical applications, the data warehouse management system may also combine the table identifier of the third processing data and the unique identifier generated by the unique identifier algorithm in other ways to obtain the index identifier of the third processing data; the manner of combination is not specifically limited in this specification.
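The "table name # unique identifier" form described above can be sketched as follows. This is a minimal illustration only, not the implementation of the specification: the function name `make_index_identifier`, the use of SHA-256 for the hash variant, and the choice to hash the row's sorted items are all assumptions made for the example.

```python
import hashlib
import uuid
from typing import Mapping, Optional


def make_index_identifier(
    table_name: str, row: Optional[Mapping[str, object]] = None
) -> str:
    """Return '<table name>#<unique identifier>' (hypothetical helper).

    Without a row, the unique part is a random UUID (UUID variant).
    With a row, the unique part is a SHA-256 digest of the row's
    key-sorted items (hash variant; the digest choice is an assumption).
    """
    if row is None:
        unique_part = uuid.uuid4().hex
    else:
        # Sort by column name so the digest does not depend on dict order.
        canonical = repr(sorted(row.items())).encode("utf-8")
        unique_part = hashlib.sha256(canonical).hexdigest()
    return f"{table_name}#{unique_part}"


# UUID variant: a fresh identifier on every call.
rowkey3 = make_index_identifier("third_processing_table_ta")

# Hash variant: deterministic for a given row.
hashed = make_index_identifier("third_processing_table_ta", {"id_card": "A001"})
```

Note that the hash variant as sketched yields the same identifier for two rows with identical content, so it is only row-unique if the hashed fields are themselves unique; the UUID variant does not have this constraint.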
In one embodiment, the source data further includes an index identifier uniquely identifying each row of data of the source data, the first processing data further includes an index identifier uniquely identifying each row of data of the first processing data, and the second processing data further includes an index identifier uniquely identifying each row of data of the second processing data.
In this case, the index identifier of each of the source data, the first processed data, and the second processed data may be a primary key of each of the source data, the first processed data, and the second processed data.
Continuing the above example, the index identifier of the source data (e.g., source table ta shown in Table 1 and source table tb shown in Table 2) is "id_card" as shown in Table 1 and "id_card" as shown in Table 2. Similarly, the index identifier of the first processing data (e.g., first processing table ta and first processing table tb shown in Table 6) is abbreviated as rowkey1, and the index identifier of the second processing data (e.g., second processing table ta and second processing table tb shown in Table 7) is abbreviated as rowkey2. It should be noted that rowkey1 and rowkey2 may be generated in a manner similar to rowkey3, that is, in the form "table name of the current table # unique identifier generated by the unique identifier algorithm (for example, a UUID algorithm or a hash algorithm)"; details are not repeated here.
In one embodiment, the second blood margin identifier points to the index identifier of the source data.
Continuing the above example, the second blood margin identifier srckey1-A corresponding to the first processing data (first processing table ta and first processing table tb shown in Table 6) points to the index identifier of the source data (id_card shown in Table 1 and id_card shown in Table 2); that is, srckey1-A stores a copy whose value is identical to the index identifier of the source data (source table ta, source table tb).
In another embodiment, the second blood margin identifier points to the index identifier of the first processing data.
Continuing the above example, the second blood margin identifier srckey1-B corresponding to the second processing data (second processing table ta and second processing table tb shown in Table 7) points to the index identifier of the first processing data (primary key rowkey1 of first processing table ta and primary key rowkey1 of first processing table tb); that is, srckey1-B stores a copy whose value is identical to the index identifier of the first processing data (first processing table ta, first processing table tb).
In yet another embodiment, the second blood margin identifier points to the index identifier of the second processing data.
Continuing the above example, the second blood margin identifier srckey1-C corresponding to the third processing data (third processing table ta and third processing table tb shown in Table 8) points to the index identifier of the second processing data (primary key rowkey2 of second processing table ta and primary key rowkey2 of second processing table tb); that is, srckey1-C stores a copy whose value is identical to the index identifier of the second processing data (second processing table ta, second processing table tb).
In the exemplary process of generating the processing data of the source data described with reference to Tables 6 to 8, the data warehouse management system performs ETL data processing on the source data 3 times: source data -> first processing data -> second processing data -> third processing data, where the third processing data is the final processing data used to obtain the target data. In practical applications, the number of times the data warehouse management system performs ETL data processing on the source data to obtain the processing data is not specifically limited in this specification. For example, the data warehouse management system may perform ETL data processing on the source data only once to obtain first processing data, in which case the first processing data is the target data; alternatively, the data warehouse management system may perform ETL data processing on the source data 2 times or more than 3 times.
In this specification, after the processing data of the source data is generated, the data warehouse management system generates target data corresponding to the source data based on the index identifier and the processing data.
Continuing the above example, the index identifier is, for example, rowkey as shown in Table 4, and the processing data is, for example, the third processing data, which includes the third processing table ta and the third processing table tb shown in Table 8. The data warehouse management system generates the target data (target table ta and target table tb) corresponding to the source data (source table ta and source table tb) by using rowkey (the index identifier) and the third processing data (third processing table ta and third processing table tb) as the table data of the target data.
The contents of target table ta and target table tb are updated relative to Table 4, as shown in Table 9 below:
Target table | Contents of the target table |
---|---|
Target table ta | rowkey + third processing table ta + srckey |
Target table tb | rowkey + third processing table tb + srckey |
TABLE 9
As shown in Table 9, the target table ta includes rowkey, the third processing table ta, and srckey, where rowkey is the index identifier uniquely identifying each row of data of target table ta, and srckey is the first blood margin identifier that uniquely characterizes, for each row of data of target table ta, the corresponding data source in the third processing table ta.
Similarly, target table tb includes rowkey, the third processing table tb, and srckey, where rowkey is the index identifier uniquely identifying each row of data of target table tb, and srckey is the first blood margin identifier that uniquely characterizes, for each row of data of target table tb, the corresponding data source in the third processing table tb.
It should be noted that, in the exemplary process described with reference to Tables 6 to 9, the data warehouse management system obtains the target data through the following data processing chain: source data -> first processing data -> second processing data -> third processing data -> target data. The target data obtained by substituting the source data, the first processing data, the second processing data, and the third processing data into the target data is shown in Table 10 below:
TABLE 10
As shown in Table 10, srckey (first blood margin identifier) points to rowkey3, srckey1-C (second blood margin identifier) points to rowkey2, srckey1-B (second blood margin identifier) points to rowkey1, and srckey1-A (second blood margin identifier) points to the index identifier (primary key of the source table) of the source tables (source table ta, source table tb).
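The pointer chain summarized for Table 10 can be illustrated with a small in-memory sketch. It assumes each ETL stage simply carries the upstream row forward (real transformations are omitted), copies the upstream index identifier into a new blood margin field, and assigns a fresh "table name # UUID" index identifier. The helper name `etl_stage`, the table names, and the sample `id_card` values are hypothetical.

```python
import uuid
from typing import Dict, List

Row = Dict[str, str]


def etl_stage(
    rows: List[Row],
    upstream_index_field: str,
    out_table: str,
    rowkey_field: str,
    srckey_field: str,
) -> List[Row]:
    """One ETL stage (hypothetical): keep the upstream row, add lineage fields."""
    out: List[Row] = []
    for row in rows:
        new_row = dict(row)  # upstream content is carried forward unchanged
        # Blood margin identifier: a copy of the upstream row's index identifier.
        new_row[srckey_field] = row[upstream_index_field]
        # Index identifier of the new row: "<table name>#<unique identifier>".
        new_row[rowkey_field] = f"{out_table}#{uuid.uuid4().hex}"
        out.append(new_row)
    return out


# Made-up source rows; id_card is the source table's primary key.
source_ta = [{"id_card": "A001"}, {"id_card": "A002"}]

first_ta = etl_stage(source_ta, "id_card", "first_ta", "rowkey1", "srckey1-A")
second_ta = etl_stage(first_ta, "rowkey1", "second_ta", "rowkey2", "srckey1-B")
third_ta = etl_stage(second_ta, "rowkey2", "third_ta", "rowkey3", "srckey1-C")
target_ta = etl_stage(third_ta, "rowkey3", "target_ta", "rowkey", "srckey")

for row in target_ta:
    print(row["rowkey"], "<-", row["srckey"], "<- ... <-", row["srckey1-A"])
```

Because every stage keeps the upstream fields, each target row ends up holding srckey, srckey1-C, srckey1-B, and srckey1-A together, which is the multi-level arrangement the text describes.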
It should be noted that, according to the above technical solution, the data warehouse management system generates multi-level blood margin identifiers through iteration, which enables fast and accurate tracing of the data blood margin (tracing can reach the row-level data of the source data).
In an embodiment, after the target data is generated, when data blood margin tracing needs to be performed on the target data, the data warehouse management system constructs a data blood margin query instruction for the target data and queries the target data based on the data blood margin query instruction, to obtain the data blood margin of the target data traced back to the source data.
Continuing the above example, when data blood margin tracing is to be performed on the target data (target table ta and target table tb as shown in Table 10), for example tracing the data source of a certain row of data in target table ta or target table tb (that is, how many rounds of ETL data processing the row went through, which processing tables held the processing data along the way, and which table and row of the source data the row corresponds to), the data warehouse management system may construct a data blood margin query instruction for the target data based on SQL (Structured Query Language).
The data blood margin query instruction may include the blood margin identifiers corresponding to the data to be traced in the target data, for example the multi-level blood margin identifiers srckey, srckey1-C, srckey1-B, and srckey1-A shown in Table 10.
Furthermore, the data warehouse management system can trace back layer by layer through the multi-level blood margin identifiers, from the target data through the processing data to the source data, until the data blood margin of the target data is traced back to the source data. The data blood margin tracing order is as follows:
srckey -> rowkey3 -> srckey1-C -> rowkey2 -> srckey1-B -> rowkey1 -> srckey1-A -> index identifier of the source table (primary key of the source table).
It should be noted that the type of the database in which the data warehouse management system executes the SQL-based data lineage query instruction is not specifically limited in this specification.
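As one possible illustration of an SQL-based blood margin query, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table layout, the underscore column names (`srckey1_a` instead of `srckey1-A`, to avoid quoting), and the sample key values are assumptions for the example; the specification does not prescribe a schema or a database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source_ta (id_card TEXT PRIMARY KEY);
    CREATE TABLE first_ta  (rowkey1 TEXT PRIMARY KEY, srckey1_a TEXT);
    CREATE TABLE second_ta (rowkey2 TEXT PRIMARY KEY, srckey1_b TEXT);
    CREATE TABLE third_ta  (rowkey3 TEXT PRIMARY KEY, srckey1_c TEXT);
    CREATE TABLE target_ta (rowkey  TEXT PRIMARY KEY, srckey    TEXT);

    -- Made-up sample row at every level of the chain.
    INSERT INTO source_ta VALUES ('A001');
    INSERT INTO first_ta  VALUES ('first_ta#u1',  'A001');
    INSERT INTO second_ta VALUES ('second_ta#u2', 'first_ta#u1');
    INSERT INTO third_ta  VALUES ('third_ta#u3',  'second_ta#u2');
    INSERT INTO target_ta VALUES ('target_ta#u4', 'third_ta#u3');
    """
)

# Each JOIN follows one blood margin identifier to the upstream index identifier:
# srckey -> rowkey3 -> srckey1_c -> rowkey2 -> srckey1_b -> rowkey1 -> srckey1_a -> id_card
TRACE_SQL = """
SELECT t.rowkey, p3.rowkey3, p2.rowkey2, p1.rowkey1, s.id_card
FROM target_ta AS t
JOIN third_ta  AS p3 ON p3.rowkey3 = t.srckey
JOIN second_ta AS p2 ON p2.rowkey2 = p3.srckey1_c
JOIN first_ta  AS p1 ON p1.rowkey1 = p2.srckey1_b
JOIN source_ta AS s  ON s.id_card  = p1.srckey1_a
WHERE t.rowkey = ?
"""


def trace(connection: sqlite3.Connection, target_rowkey: str):
    """Return the row-level lineage of one target row, or None if not found."""
    return connection.execute(TRACE_SQL, (target_rowkey,)).fetchone()


lineage = trace(conn, "target_ta#u4")
print(lineage)
```

Since every join condition compares a stored blood margin value against a primary key, the lookup at each level is a key lookup rather than a scan, which is consistent with the fast, row-level tracing the text claims.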
In the above technical solution, source data is obtained from a docked service system and stored locally, and target data that corresponds to the source data and contains the blood margin identifier is generated, thereby enabling accurate construction of data blood margin relationships at the row level and improving the efficiency and accuracy of data blood margin tracing.
Corresponding to the above method embodiments, the present specification also provides embodiments of a data blood margin generation apparatus. The embodiments of the data blood margin generation apparatus of the present specification can be applied to electronic devices. The apparatus embodiments may be implemented by software, by hardware, or by a combination of hardware and software. Taking a software implementation as an example, the apparatus, as a logical apparatus, is formed when the processor of the electronic device in which it is located reads the corresponding computer program instructions from the nonvolatile memory into the memory and runs them. In terms of hardware, fig. 3 shows a hardware structure diagram of the electronic device in which the data blood margin generation apparatus of the present specification is located; in addition to the processor, the memory, the network interface, and the nonvolatile memory shown in fig. 3, the electronic device in which the apparatus is located may also include other hardware according to the actual function of the electronic device, which is not described again.
Fig. 4 is a block diagram of a data blood margin generation apparatus according to an exemplary embodiment of the present disclosure.
Referring to fig. 4, the data blood margin generation apparatus 40 may be applied to the electronic device shown in fig. 3, and the apparatus is applied to a data warehouse management system, and the apparatus includes:
an obtaining module 401, which obtains source data from the docked service system and stores the source data locally; wherein the source data is table data based on a database;
a generating module 402 for generating target data corresponding to the source data; wherein the target data comprises at least a first blood margin identifier uniquely characterizing a data source of each line of data of the target data.
In this embodiment, the target data further includes an index identifier uniquely representing each row of data of the target data;
the generation module 402 further:
generating processing data of the source data; wherein the processing data is process data between the source data and the target data, and the processing data at least comprises a second blood margin identifier uniquely representing a data source of each row of data of the processing data;
and generating target data corresponding to the source data based on the index identification and the processing data.
In this embodiment, the generating module 402 further:
generating first processing data of the source data; wherein the first processing data includes at least the source data and a second blood margin identifier that uniquely characterizes, for each row of data of the first processing data, the corresponding data source in the source data;
generating second processing data of the first processing data; wherein the second processing data includes at least the first processing data and a second blood margin identifier that uniquely characterizes, for each row of data of the second processing data, the corresponding data source in the first processing data;
and iteratively generating the processing data of the second processing data until final third processing data is obtained.
In this embodiment, the generating module 402 further:
and generating target data corresponding to the source data by taking the index identification and the processing data as table data of the target data.
In this embodiment, the apparatus further includes:
a source tracing module 403, which, when data blood margin tracing needs to be performed on the target data, constructs a data blood margin query instruction for the target data and queries the target data based on the data blood margin query instruction, to obtain the data blood margin of the target data traced back to the source data.
In this embodiment, the third processing data further includes an index identifier uniquely representing each row of data of the third processing data, and the index identifier of the third processing data is obtained by combining a table identifier of the third processing data and a unique identifier generated by a unique identifier algorithm; the first blood margin identifier points to the index identifier of the third processing data.
In this embodiment, the source data further includes an index identifier uniquely representing each line of data of the source data, the first processing data further includes an index identifier uniquely representing each line of data of the first processing data, and the second processing data further includes an index identifier uniquely representing each line of data of the second processing data;
the second blood margin identifier points to the index identifier of the source data, or the second blood margin identifier points to the index identifier of the first processing data; alternatively, the second blood margin identifier points to an index identifier of the second processed data.
In this embodiment, the unique identification algorithm is a UUID algorithm or a hash algorithm.
The apparatuses or modules illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product having certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
Corresponding to the method embodiment, the present specification also provides an embodiment of an electronic device. The electronic equipment can be applied to a data warehouse management system; the electronic device includes: a processor and a memory for storing machine executable instructions; wherein the processor and the memory are typically interconnected by an internal bus. In other possible implementations, the device may also include an external interface to enable communication with other devices or components.
In this embodiment, the processor is caused to:
acquiring source data from the docked service system and storing the source data locally; wherein the source data is table data based on a database;
generating target data corresponding to the source data; wherein the target data comprises at least a first blood margin identifier uniquely characterizing a data source of each line of data of the target data.
In this embodiment, the target data further includes an index identifier uniquely representing each row of data of the target data; by reading and executing machine-executable instructions stored by the memory corresponding to control logic for data lineage generation, the processor is caused to:
generating processing data of the source data; wherein the processing data is process data between the source data and the target data, and the processing data at least comprises a second blood margin identifier uniquely representing a data source of each row of data of the processing data;
and generating target data corresponding to the source data based on the index identification and the processing data.
In this embodiment, the processor is caused to:
generating first processing data of the source data; wherein the first processing data includes at least the source data and a second blood margin identifier that uniquely characterizes, for each row of data of the first processing data, the corresponding data source in the source data;
generating second processing data of the first processing data; wherein the second processing data includes at least the first processing data and a second blood margin identifier that uniquely characterizes, for each row of data of the second processing data, the corresponding data source in the first processing data;
and iteratively generating the processing data of the second processing data until final third processing data is obtained.
In this embodiment, the processor is caused to:
and generating target data corresponding to the source data by taking the index identification and the processing data as table data of the target data.
In this embodiment, when data lineage tracing of the target data is required, the processor is caused to, by reading and executing machine executable instructions stored in the memory and corresponding to control logic for data lineage generation:
and constructing a data blood margin query instruction for the target data, and querying the target data based on the data blood margin query instruction to obtain the data blood margin of the target data traced back to the source data.
In this embodiment, the third processing data further includes an index identifier uniquely representing each row of data of the third processing data, and the index identifier of the third processing data is obtained by combining a table identifier of the third processing data and a unique identifier generated by a unique identifier algorithm; the first blood margin identifier points to the index identifier of the third processing data.
In this embodiment, the source data further includes an index identifier uniquely representing each line of data of the source data, the first processing data further includes an index identifier uniquely representing each line of data of the first processing data, and the second processing data further includes an index identifier uniquely representing each line of data of the second processing data;
the second blood margin identifier points to the index identifier of the source data, or the second blood margin identifier points to the index identifier of the first processing data; alternatively, the second blood margin identifier points to an index identifier of the second processed data.
In this embodiment, the unique identification algorithm is a UUID algorithm or a hash algorithm.
Other embodiments of the present disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This specification is intended to cover any variations, uses, or adaptations of the specification following, in general, the principles of the specification and including such departures from the present disclosure as come within known or customary practice within the art to which the specification pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the specification being indicated by the following claims.
It will be understood that the present description is not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present description is limited only by the appended claims.
The above description is only a preferred embodiment of the present disclosure, and should not be taken as limiting the present disclosure, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.
Claims (11)
1. A data lineage generation method applied to a data warehouse management system, the method comprising:
acquiring source data from the docked service system and storing the source data locally; wherein the source data is table data based on a database;
generating target data corresponding to the source data; wherein the target data comprises at least a first blood margin identifier uniquely characterizing a data source of each line of data of the target data.
2. The method of claim 1, the target data further comprising an index identification for each row of data that uniquely characterizes the target data;
the generating target data corresponding to the source data comprises:
generating processing data of the source data; wherein the processing data is process data between the source data and the target data, and the processing data at least comprises a second blood margin identifier uniquely representing a data source of each row of data of the processing data;
and generating target data corresponding to the source data based on the index identification and the processing data.
3. The method of claim 2, the generating the processed data of the source data comprising:
generating first processing data of the source data; wherein the first processing data includes at least the source data and a second blood margin identifier that uniquely characterizes, for each row of data of the first processing data, the corresponding data source in the source data;
generating second processing data of the first processing data; wherein the second processing data includes at least the first processing data and a second blood margin identifier that uniquely characterizes, for each row of data of the second processing data, the corresponding data source in the first processing data;
and iteratively generating the processing data of the second processing data until final third processing data is obtained.
4. The method of claim 2, said generating target data corresponding to said source data based on said index identification and said process data, comprising:
and generating target data corresponding to the source data by taking the index identification and the processing data as table data of the target data.
5. The method of claim 1, when data blood margin tracing of the target data is required, further comprising:
and constructing a data blood margin query instruction for the target data, and querying the target data based on the data blood margin query instruction to obtain the data blood margin of the target data traced back to the source data.
6. The method of claim 3, the third processing data further comprising an index identifier uniquely characterizing each row of data of the third processing data, the index identifier of the third processing data being obtained by combining a table identifier of the third processing data and a unique identifier generated by a unique identifier algorithm; the first blood margin identifier points to the index identifier of the third processing data.
7. The method of claim 3, the source data further comprising an index identifier uniquely characterizing each row of data of the source data, the first process data further comprising an index identifier uniquely characterizing each row of data of the first process data, the second process data further comprising an index identifier uniquely characterizing each row of data of the second process data;
the second blood margin identifier points to the index identifier of the source data, or the second blood margin identifier points to the index identifier of the first processing data; alternatively, the second blood margin identifier points to an index identifier of the second processed data.
8. The method of claim 6 or 7, the unique identification algorithm being a UUID algorithm or a hash algorithm.
9. A data blood margin generation apparatus, the apparatus being applied to a data warehouse management system, the apparatus comprising:
an acquisition module that acquires source data from the docked service system and stores the source data locally; wherein the source data is table data based on a database;
a generation module that generates target data corresponding to the source data; wherein the target data comprises at least a first blood margin identifier uniquely characterizing a data source of each line of data of the target data.
10. An electronic device comprises a communication interface, a processor, a memory and a bus, wherein the communication interface, the processor and the memory are connected with each other through the bus;
the memory has stored therein machine-readable instructions, the processor executing the method of any of claims 1 to 8 by calling the machine-readable instructions.
11. A machine readable storage medium having stored thereon machine readable instructions which, when invoked and executed by a processor, carry out the method of any of claims 1 to 8.
Priority Applications (1)
Application Number | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201911376186.XA | CN111125229B | 2019-12-24 | 2019-12-24 | Data blood edge generation method and device and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111125229A | 2020-05-08 |
CN111125229B | 2024-06-28 |
Family
ID=70503890
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201911376186.XA | Active | CN111125229B | 2019-12-24 | 2019-12-24 | Data blood edge generation method and device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111125229B (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111666326A (en) * | 2020-05-29 | 2020-09-15 | 中国工商银行股份有限公司 | ETL scheduling method and device |
CN112115315A (en) * | 2020-09-25 | 2020-12-22 | 平安国际智慧城市科技股份有限公司 | Blood relationship data query method and device, computer equipment and storage medium |
CN112328575A (en) * | 2020-11-12 | 2021-02-05 | 杭州数梦工场科技有限公司 | Data asset blood margin generation method and device and electronic equipment |
CN112463978A (en) * | 2020-11-13 | 2021-03-09 | 上海逸迅信息科技有限公司 | Method and device for generating data blood relationship |
CN112817984A (en) * | 2021-02-22 | 2021-05-18 | 杭州数梦工场科技有限公司 | Data processing method and device, and data source obtaining method and device |
CN114064640A (en) * | 2021-11-09 | 2022-02-18 | 珠海市新德汇信息技术有限公司 | Blood relationship construction method, storage medium and equipment applied to data tracing |
CN114490627A (en) * | 2020-10-27 | 2022-05-13 | 杭州数梦工场科技有限公司 | Data processing method and device, electronic equipment and storage medium |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120011096A1 (en) * | 2010-07-08 | 2012-01-12 | Oracle International Corporation | Efficiently updating rows in a data warehouse |
CN103902653A (en) * | 2014-02-28 | 2014-07-02 | 珠海多玩信息技术有限公司 | Method and device for creating data warehouse table blood relationship graph |
CN108694195A (en) * | 2017-04-10 | 2018-10-23 | 腾讯科技(深圳)有限公司 | A kind of management method and system of Distributed Data Warehouse |
CN108846039A (en) * | 2018-05-29 | 2018-11-20 | 新华三大数据技术有限公司 | Data flow determines method and device |
CN109241026A (en) * | 2018-07-18 | 2019-01-18 | 阿里巴巴集团控股有限公司 | The method, apparatus and system of data management |
CN109299073A (en) * | 2018-10-19 | 2019-02-01 | 杭州数梦工场科技有限公司 | A kind of generation method, system, electronic equipment and the storage medium of data blood relationship |
CN109669981A (en) * | 2018-12-21 | 2019-04-23 | 成都四方伟业软件股份有限公司 | Data relationship management method, device, data relationship acquisition methods and storage medium |
CN109739894A (en) * | 2019-01-04 | 2019-05-10 | 深圳前海微众银行股份有限公司 | Supplement method, apparatus, equipment and the storage medium of metadata description |
US20190171735A1 (en) * | 2017-12-01 | 2019-06-06 | Salesforce.Com, Inc. | Data resolution system for management of distributed data |
CN110019182A (en) * | 2017-08-15 | 2019-07-16 | 华为技术有限公司 | A kind of data traceability method and device |
Also Published As
Publication number | Publication date |
---|---|
CN111125229B (en) | 2024-06-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||