CN103902653A - Method and device for creating data warehouse table blood relationship graph - Google Patents

Method and device for creating data warehouse table blood relationship graph

Info

Publication number
CN103902653A
Authority
CN
China
Prior art keywords
data warehouse
action statement
name
statement
table name
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410072773.0A
Other languages
Chinese (zh)
Other versions
CN103902653B (en)
Inventor
陈武
刘超洪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZHUHAI DUOWAN INFORMATION TECHNOLOGY Ltd
Original Assignee
ZHUHAI DUOWAN INFORMATION TECHNOLOGY Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZHUHAI DUOWAN INFORMATION TECHNOLOGY Ltd filed Critical ZHUHAI DUOWAN INFORMATION TECHNOLOGY Ltd
Priority to CN201410072773.0A priority Critical patent/CN103902653B/en
Publication of CN103902653A publication Critical patent/CN103902653A/en
Application granted granted Critical
Publication of CN103902653B publication Critical patent/CN103902653B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and device for creating a data warehouse table blood relationship graph and belongs to the field of computers. The method comprises the steps of analyzing each data warehouse operational statement which has access to a data warehouse to obtain the name of a data warehouse destination table to which each data warehouse operational statement has access, storing the corresponding relationship between the statement identification of each data warehouse operational statement and the name of the corresponding data warehouse destination table in a corresponding relationship table, obtaining the name of a data warehouse source table corresponding to each data warehouse destination table in the corresponding relationship table, and creating the data warehouse table blood relationship graph according to the name of each data warehouse destination table and the name of the data warehouse source table corresponding to each data warehouse destination table. The device comprises an analyzing module, a first storage module, a first obtaining module and a creating module. According to the method and device, a server can automatically create the data warehouse table blood relationship graph.

Description

A method and device for creating a data warehouse table blood relationship graph
Technical field
The present invention relates to the field of computers, and in particular to a method and device for creating a data warehouse table blood relationship graph.
Background art
A data warehouse stores many kinds of business data, and different business data are stored in different business tables. A data warehouse therefore holds multiple business tables, and how to build a data warehouse table blood relationship graph from these business tables is a problem that urgently needs solving.
At present, data warehouse administrators parse the data warehouse operation statements and build the data warehouse table blood relationship graph by hand. This manual process is error-prone, and because the volume of business data in a data warehouse is very large, it imposes a heavy workload on the administrators.
Summary of the invention
To solve the problems of the prior art, the present invention provides a method and device for creating a data warehouse table blood relationship graph. The technical solution is as follows:
In one aspect, a method for creating a data warehouse table blood relationship graph is provided, the method comprising:
parsing each data warehouse operation statement that accesses the data warehouse to obtain the table name of the data warehouse destination table accessed by each data warehouse operation statement;
storing the correspondence between the statement identifier of each data warehouse operation statement and the table name of the accessed data warehouse destination table in a mapping table;
obtaining, according to the mapping table, the table name of the data warehouse source table corresponding to each data warehouse destination table in the mapping table; and
building the data warehouse table blood relationship graph according to the table name of each data warehouse destination table and the table name of the data warehouse source table corresponding to each data warehouse destination table.
Further, parsing each data warehouse operation statement that accesses the data warehouse to obtain the table name of the data warehouse destination table accessed by each data warehouse operation statement comprises:
parsing each data warehouse operation statement that accesses the data warehouse to obtain the access mode of each data warehouse operation statement;
obtaining the data warehouse operation statements whose access mode is write mode; and
parsing the data warehouse operation statements whose access mode is write mode to obtain the table names of all data warehouse destination tables that they access.
Further, after parsing each data warehouse operation statement that accesses the data warehouse to obtain the table name of the data warehouse destination table accessed by each data warehouse operation statement, the method also comprises:
obtaining the data warehouse operation statements whose task type is the import type, together with their corresponding import paths;
obtaining, according to the import path, the data warehouse operation statements whose task type is the analysis type and that have the import path; and
binding the data warehouse operation statement whose task type is the import type to the data warehouse operation statement whose task type is the analysis type and that has the import path.
Further, obtaining, according to the mapping table, the table name of the data warehouse source table corresponding to each data warehouse destination table in the mapping table comprises:
for each record in the mapping table, obtaining the statement identifier of the data warehouse operation statement and the table name of the data warehouse destination table stored in the record;
obtaining the data warehouse operation statement according to the obtained statement identifier; and
parsing the obtained data warehouse operation statement to obtain the table name of the data warehouse source table corresponding to each data warehouse destination table.
Further, building the data warehouse table blood relationship graph according to the table name of each data warehouse destination table and the table name of the data warehouse source table corresponding to each data warehouse destination table comprises:
building, in the data warehouse table blood relationship graph, a node corresponding to the table name of the data warehouse destination table and a node corresponding to the table name of the data warehouse source table corresponding to the data warehouse destination table; and
setting the node corresponding to the table name of the data warehouse destination table as a child node of the node corresponding to the table name of its data warehouse source table.
Further, after building the node corresponding to the table name of the data warehouse destination table, the method also comprises:
storing the data warehouse operation statement that accesses the data warehouse destination table in the node corresponding to the table name of the data warehouse destination table; and
sending the data warehouse table blood relationship graph to a terminal, which displays it to the user.
In another aspect, the present invention provides a device for creating a data warehouse table blood relationship graph, the device comprising:
a parsing module, configured to parse each data warehouse operation statement that accesses the data warehouse to obtain the table name of the data warehouse destination table accessed by each data warehouse operation statement;
a first storage module, configured to store the correspondence between the statement identifier of each data warehouse operation statement and the table name of the accessed data warehouse destination table in a mapping table;
a first obtaining module, configured to obtain, according to the mapping table, the table name of the data warehouse source table corresponding to each data warehouse destination table in the mapping table; and
a building module, configured to build the data warehouse table blood relationship graph according to the table name of each data warehouse destination table and the table name of the data warehouse source table corresponding to each data warehouse destination table.
Further, the parsing module comprises:
a first parsing unit, configured to parse each data warehouse operation statement that accesses the data warehouse to obtain the access mode of each data warehouse operation statement;
an obtaining unit, configured to obtain the data warehouse operation statements whose access mode is write mode; and
a second parsing unit, configured to parse the data warehouse operation statements whose access mode is write mode to obtain the table names of all data warehouse destination tables that they access.
Further, the device also comprises:
a second obtaining module, configured to obtain the data warehouse operation statements whose task type is the import type, together with their corresponding import paths;
a third obtaining module, configured to obtain, according to the import path, the data warehouse operation statements whose task type is the analysis type and that have the import path; and
a binding module, configured to bind the data warehouse operation statement whose task type is the import type to the data warehouse operation statement whose task type is the analysis type and that has the import path.
Further, the first obtaining module comprises:
a first obtaining unit, configured to obtain, for each record in the mapping table, the statement identifier of the data warehouse operation statement and the table name of the data warehouse destination table stored in the record;
a second obtaining unit, configured to obtain the data warehouse operation statement according to the obtained statement identifier; and
a third parsing unit, configured to parse the obtained data warehouse operation statement to obtain the table name of the data warehouse source table corresponding to each data warehouse destination table.
Further, the building module comprises:
a construction unit, configured to build, in the data warehouse table blood relationship graph, a node corresponding to the table name of the data warehouse destination table and a node corresponding to the table name of the data warehouse source table corresponding to the data warehouse destination table; and
a setting unit, configured to set the node corresponding to the table name of the data warehouse destination table as a child node of the node corresponding to the table name of its data warehouse source table.
Further, the device also comprises:
a second storage module, configured to store the data warehouse operation statement that accesses the data warehouse destination table in the node corresponding to the table name of the data warehouse destination table; and
a sending module, configured to send the data warehouse table blood relationship graph to a terminal, which displays it to the user.
In the embodiments of the present invention, the server parses each data warehouse operation statement that accesses the data warehouse, obtains the table name of the data warehouse destination table accessed by each statement and the table name of the data warehouse source table corresponding to each destination table, and automatically builds the data warehouse table blood relationship graph from these table names. This reduces manual workload and improves the speed and accuracy of building the data warehouse table blood relationship graph.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for creating a data warehouse table blood relationship graph provided by Embodiment 1 of the present invention;
Fig. 2 is a flowchart of a method for creating a data warehouse table blood relationship graph provided by Embodiment 2 of the present invention;
Fig. 3 is a schematic structural diagram of a device for creating a data warehouse table blood relationship graph provided by Embodiment 3 of the present invention.
Detailed description
To make the objectives, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the drawings.
Embodiment 1
This embodiment of the present invention provides a method for creating a data warehouse table blood relationship graph. Referring to Fig. 1, the method comprises:
Step 101: parse each data warehouse operation statement that accesses the data warehouse to obtain the table name of the data warehouse destination table accessed by each data warehouse operation statement;
Step 102: store the correspondence between the statement identifier of each data warehouse operation statement and the table name of the accessed data warehouse destination table in a mapping table;
Step 103: according to the mapping table, obtain the table name of the data warehouse source table corresponding to each data warehouse destination table in the mapping table;
Step 104: build the data warehouse table blood relationship graph according to the table name of each data warehouse destination table and the table name of the data warehouse source table corresponding to each data warehouse destination table.
In this embodiment of the present invention, the server parses each data warehouse operation statement that accesses the data warehouse, obtains the table name of the data warehouse destination table accessed by each statement and the table name of the data warehouse source table corresponding to each destination table, and automatically builds the data warehouse table blood relationship graph from these table names. This reduces manual workload and improves the speed and accuracy of building the data warehouse table blood relationship graph.
Embodiment 2
This embodiment of the present invention provides a method for creating a data warehouse table blood relationship graph. Referring to Fig. 2, the method comprises:
Step 201: the server obtains, from the data warehouse, the data warehouse operation statements with which each service point accesses the data warehouse;
Specifically, according to the business type, the server obtains from the data warehouse the data warehouse operation statements of each service point belonging to the same business type, and assigns a statement identifier to each data warehouse operation statement.
The server stores the correspondence between business types and the data warehouse operation statements that access the data warehouse, so the operation statements of each service point corresponding to a business type can be looked up in this correspondence by business type.
It should be noted that all operations on data in the data warehouse are performed through data warehouse operation statements that access the data warehouse. Such a statement may be, for example:
insert overwrite table hive_table_b
select *
from hive_table_a
……
Step 202: the server parses each data warehouse operation statement that accesses the data warehouse to obtain the table name of the data warehouse destination table accessed by each data warehouse operation statement;
Specifically, the server parses each data warehouse operation statement that accesses the data warehouse to obtain its access mode. The access mode is either read mode or write mode, where write mode covers both writing data to the data warehouse and writing data to a local file. The server obtains each data warehouse operation statement whose access mode is write mode, parses it with the first regular expression matching rule, and finds the table name of the data warehouse destination table that the statement accesses.
The first regular expression matching rule is as follows:
(1) a character string that follows "insert into", precedes "(", and contains neither whitespace nor "(";
(2) a character string that follows "replace into", precedes "(", and contains neither whitespace nor "(";
(3) a character string that follows "insert overwrite table" and contains no whitespace;
(4) a character string that follows "overwrite into table" and contains no whitespace.
It should be noted that the rules above are alternatives: a match on any one of them suffices. In the embodiments of the present invention, matching is case-insensitive for all characters.
The first regular expression code is:
(?i)insert\\s+into\\s+([^\\s\\(]+)\\s*\\(|(?i)replace\\s+into\\s+([^\\s\\(]+)\\s*\\(|(?i)insert\\s+overwrite\\s+table\\s+(\\S+)\\s+|(?i)overwrite\\s+into\\s+table\\s+(\\S+)\\s+
It should be noted that the data warehouse stores many kinds of business data. The server loads this massive volume of business data into Hive-layer source tables of the data warehouse, then extracts, cleans and transforms the Hive-layer source tables differently for different businesses to obtain various Hive-layer business tables. Based on these Hive-layer business tables, the server splits them further, possibly many times, according to finer-grained business needs, producing more Hive-layer business tables that make statistical analysis of different businesses easier. According to the business, the server can also import the business data in the Hive-layer business tables, after statistical analysis, into MySQL-layer business tables for display on web pages. In some special cases, the business data in the MySQL-layer business tables can be further extracted and computed to make web page display more convenient.
Rules (1) and (2) of the regular expression matching rule above find MySQL-layer table names, and rules (3) and (4) find Hive-layer table names.
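As an illustration of step 202, the following is a minimal Python sketch of the four alternatives of the first matching rule. It is not the patented implementation: the function name `find_destination_tables` and the "mysql"/"hive" labels are hypothetical, and the pattern is re-expressed for Python's `re` module, with the `re.IGNORECASE` flag standing in for the inline `(?i)` markers.

```python
import re

# Four alternatives, in the same order as rules (1) to (4).
# Groups 1-2 capture MySQL-layer names, groups 3-4 capture Hive-layer names.
DEST_TABLE_PATTERN = re.compile(
    r"insert\s+into\s+([^\s(]+)\s*\("
    r"|replace\s+into\s+([^\s(]+)\s*\("
    r"|insert\s+overwrite\s+table\s+(\S+)\s+"
    r"|overwrite\s+into\s+table\s+(\S+)\s+",
    re.IGNORECASE,
)


def find_destination_tables(statement):
    """Return a list of (layer, table_name) pairs for every write target found."""
    results = []
    for match in DEST_TABLE_PATTERN.finditer(statement):
        mysql_a, mysql_b, hive_a, hive_b = match.groups()
        if mysql_a or mysql_b:
            results.append(("mysql", mysql_a or mysql_b))
        else:
            results.append(("hive", hive_a or hive_b))
    return results
```

A statement that writes to a local directory matches none of the four alternatives, so it yields no destination table in this sketch.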
Step 203: the server parses each data warehouse operation statement that accesses the data warehouse to obtain the task type of each data warehouse operation statement;
The task types are analysis, import and extraction. Splitting Hive-layer business data and analyzing Hive-layer business data belong to the analysis type; importing business data from Hive-layer business tables into MySQL-layer business tables belongs to the import type; extracting and computing business data in MySQL-layer business tables belongs to the extraction type.
Step 204: the server obtains the data warehouse operation statements whose task type is the import type, together with their corresponding import paths;
When the task type is the import type, the server can store the correspondence between data warehouse operation statements and import paths, so the import-type data warehouse operation statements and their corresponding import paths can be obtained according to the task type.
For example, a data warehouse operation statement whose task type is the import type is:
Insert into mysql_table_a(key1,col1,col2)
values(:key1,:col1,:col2)
ON DUPLICATE KEY UPDATE
col1=values(col1),
col2=values(col2);
The import path corresponding to the above data warehouse operation statement is "/path1/path2/".
Further, the server reads all text content under the import path directory "/path1/path2/" and executes the following operation statement:
Insert into mysql_table_a(key1,col1,col2)
values(:key1,:col1,:col2)
ON DUPLICATE KEY UPDATE
col1=values(col1),
col2=values(col2)
Step 205: according to the import path, the server obtains the data warehouse operation statements whose task type is the analysis type and that have this import path;
Specifically, the server obtains the data warehouse operation statements whose task type is the analysis type and, from among them, obtains the statements that have the same import path as the import-type data warehouse operation statement of step 204.
The data warehouse destination table of an import-type data warehouse operation statement can correspond to at least two data warehouse operation statements. One is the current import-type statement; the others are analysis-type statements that have the same import path as the import-type statement.
For example, an analysis-type data warehouse operation statement that has the same import path as the import-type data warehouse operation statement is:
insert overwrite local directory '/path1/path2/path3/'
select count(1)
from hive_table_a
Step 206: the server binds the data warehouse operation statement whose task type is the import type to the data warehouse operation statement whose task type is the analysis type and that has this import path;
By binding the import-type data warehouse operation statement to the analysis-type data warehouse operation statement that has the import path of step 204, the server establishes the association between MySQL-layer table names and Hive-layer table names.
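Steps 204 to 206 can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names `output_directory` and `bind_statements` are hypothetical, the local-directory pattern is not given in the text, and, because the example analysis statement writes to a directory under the import path, the sketch treats a directory as matching when it starts with the import path.

```python
import re

# Assumed pattern for the directory an analysis statement writes to (not from the source).
LOCAL_DIR_PATTERN = re.compile(
    r"insert\s+overwrite\s+local\s+directory\s+['\"]([^'\"]+)['\"]",
    re.IGNORECASE,
)


def output_directory(statement):
    """Return the local directory written by the statement, or None."""
    match = LOCAL_DIR_PATTERN.search(statement)
    return match.group(1) if match else None


def bind_statements(import_paths, analysis_statements):
    """Bind each import-type statement to analysis-type statements sharing its path.

    import_paths: {import_statement_id: import_path}
    analysis_statements: {analysis_statement_id: statement_text}
    Returns {import_statement_id: [analysis_statement_id, ...]}.
    """
    bindings = {}
    for import_id, path in import_paths.items():
        bound = []
        for analysis_id, text in analysis_statements.items():
            directory = output_directory(text)
            # Assumption: "same import path" is approximated by a prefix match.
            if directory is not None and directory.startswith(path):
                bound.append(analysis_id)
        bindings[import_id] = bound
    return bindings
```

An exact string comparison could replace the prefix test if import paths and output directories are always recorded identically.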
Step 207: the server stores the correspondence between the statement identifier of each data warehouse operation statement and the table name of the accessed data warehouse destination table in the mapping table;
Specifically, the server stores in the mapping table the accessed data warehouse destination table together with the statement identifier of the corresponding import-type data warehouse operation statement and the statement identifiers of the analysis-type data warehouse operation statements that have the same import path as the import-type statement.
Because the server stores the correspondence between statement identifiers and destination table names in the mapping table, it can use a statement identifier to obtain the data warehouse operation statement that contains the table name of the accessed data warehouse destination table.
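One possible in-memory shape for the mapping table of step 207 is sketched below. The record layout (a list of `(statement_id, destination_table)` tuples) and the name `build_mapping_table` are assumptions for illustration; the text does not specify how the table is stored.

```python
def build_mapping_table(destinations, bindings):
    """Build (statement_id, destination_table) records.

    destinations: {statement_id: destination_table_name}
    bindings: {import_statement_id: [bound_analysis_statement_id, ...]}
    Bound analysis statements are recorded against the import statement's
    destination table, so both statements map to the same table.
    """
    records = []
    for statement_id, table in destinations.items():
        records.append((statement_id, table))
        for bound_id in bindings.get(statement_id, []):
            records.append((bound_id, table))
    return records
```

With this layout, a MySQL-layer destination table ends up with at least two records: one for the import statement and one for each bound analysis statement.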
Step 208: according to the mapping table, the server obtains the table name of the data warehouse source table corresponding to each data warehouse destination table in the mapping table;
Step 208 may specifically comprise the following steps (1) to (3):
(1) for each record in the mapping table, the server obtains the statement identifier of the data warehouse operation statement and the table name of the data warehouse destination table stored in the record;
Specifically, the server goes through the records in the mapping table in turn and obtains the statement identifier and the destination table name stored in each record.
(2) the server obtains the data warehouse operation statement according to the obtained statement identifier;
The server stores the correspondence between statement identifiers and data warehouse operation statements, so the operation statement corresponding to a statement identifier can be looked up in this correspondence.
(3) the server parses the obtained data warehouse operation statement to obtain the table name of the data warehouse source table corresponding to each data warehouse destination table.
Specifically, the server parses the obtained data warehouse operation statement with the second regular expression matching rule and finds the table name of the data warehouse source table corresponding to the data warehouse destination table of that statement.
The second regular expression code is as follows:
"(?i)\\s+"+table+"(\\s+|$|;)"
In the embodiments of the present invention, the server determines whether the obtained data warehouse operation statement contains the table name of a data warehouse table; if it does, the table name found by parsing the statement is the table name of the data warehouse source table corresponding to the data warehouse destination table.
When the task type of the data warehouse operation statement is the analysis type, the server only needs to match Hive-layer table names; when the task type is the import type or the extraction type, the server only needs to match MySQL-layer table names.
Further, the server can store the table name of each data warehouse destination table and the table name of its corresponding data warehouse source table in a second mapping table; a complete data warehouse table blood relationship graph can be built from the contents of the second mapping table.
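The source-table lookup of step 208 can be sketched as follows. The function names are hypothetical, the list of candidate table names is assumed to be supplied by the caller (for example, all known table names of the relevant layer), and `re.escape` is added as a precaution that the original expression does not include.

```python
import re


def references_table(statement, table):
    """Apply the second matching rule: does the statement mention this table name?"""
    pattern = re.compile(r"\s+" + re.escape(table) + r"(\s+|$|;)", re.IGNORECASE)
    return pattern.search(statement) is not None


def find_source_tables(statement, destination, candidate_tables):
    """Return the candidate tables, other than the destination, that the statement mentions."""
    return [
        table
        for table in candidate_tables
        if table != destination and references_table(statement, table)
    ]


def build_second_mapping(records, statements, candidate_tables):
    """records: [(statement_id, destination)]; returns {destination: [source, ...]}."""
    second_mapping = {}
    for statement_id, destination in records:
        sources = second_mapping.setdefault(destination, [])
        for table in find_source_tables(statements[statement_id], destination, candidate_tables):
            if table not in sources:
                sources.append(table)
    return second_mapping
```

Because the pattern requires whitespace before the name and whitespace, end of input, or ";" after it, a candidate such as `hive_table` does not match inside `hive_table_a`.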
Step 209: the server builds the data warehouse table blood relationship graph according to the table name of each data warehouse destination table and the table name of the data warehouse source table corresponding to each data warehouse destination table;
Step 209 may specifically comprise the following steps (1) and (2):
(1) in the data warehouse table blood relationship graph, the server builds a node corresponding to the table name of the data warehouse destination table and a node corresponding to the table name of the data warehouse source table corresponding to the data warehouse destination table;
(2) the server sets the node corresponding to the table name of the data warehouse destination table as a child node of the node corresponding to the table name of its data warehouse source table.
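The two sub-steps of step 209 can be illustrated with a small Python sketch. The `TableNode` class and `build_graph` function are hypothetical names, and the input is assumed to be a dictionary from each destination table name to the list of its source table names.

```python
class TableNode:
    """One node of the graph, keyed by table name."""

    def __init__(self, name):
        self.name = name
        self.children = []  # nodes of destination tables derived from this table


def build_graph(lineage):
    """lineage: {destination_table: [source_table, ...]}; returns {table_name: TableNode}."""
    nodes = {}

    def node_for(name):
        # Create the node on first use so each table has exactly one node.
        if name not in nodes:
            nodes[name] = TableNode(name)
        return nodes[name]

    for destination, sources in lineage.items():
        destination_node = node_for(destination)
        for source in sources:
            source_node = node_for(source)
            # The destination node becomes a child of its source node.
            if destination_node not in source_node.children:
                source_node.children.append(destination_node)
    return nodes
```

Keeping one node per table name means a table that feeds several destination tables simply accumulates several children.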
Step 210: the server stores the data warehouse operation statement that accesses the data warehouse destination table in the node corresponding to the table name of the data warehouse destination table;
Because the server stores the operation statement in the node corresponding to the table name of the data warehouse destination table, the operation statement that accesses a destination table can be obtained directly from that table's node in the data warehouse table blood relationship graph.
Step 211: the server sends the data warehouse table blood relationship graph to the terminal;
Step 212: the terminal receives the data warehouse table blood relationship graph sent by the server and displays it to the user.
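Steps 210 to 212 might look like the sketch below. The dictionary-based node layout and both function names are hypothetical, and JSON is chosen here only as an example serialization; the text does not say how the graph is encoded or transported to the terminal.

```python
import json


def attach_statements(graph, mapping_records, statements):
    """Store each operation statement in the node of the destination table it writes.

    graph: {table_name: {"children": [...], "statements": [...]}}
    mapping_records: [(statement_id, destination_table)]
    statements: {statement_id: statement_text}
    """
    for statement_id, table in mapping_records:
        node = graph.setdefault(table, {"children": [], "statements": []})
        node["statements"].append(statements[statement_id])
    return graph


def serialize_graph(graph):
    """Encode the graph as JSON text that a terminal could receive and render."""
    return json.dumps(graph, sort_keys=True)
```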
In this embodiment of the present invention, the server parses each data warehouse operation statement that accesses the data warehouse, obtains the table name of the data warehouse destination table accessed by each statement and the table name of the data warehouse source table corresponding to each destination table, and automatically builds the data warehouse table blood relationship graph from these table names. This reduces manual workload and improves the speed and accuracy of building the data warehouse table blood relationship graph.
Embodiment 3
The embodiment of the present invention provides a kind of device that builds data warehouse table genetic connection figure.Referring to Fig. 3, wherein, this device comprises:
Parsing module 301, for resolving each data warehouse action statement in visit data warehouse, obtains the table name of the data warehouse object table of each data warehouse action statement access;
The first memory module 302, for being stored in mapping table by the statement mark of each data warehouse action statement with the corresponding relation of the table name of the data warehouse object table of access;
The first acquisition module 303, for according to mapping table, obtains the table name that data warehouse source corresponding to each data warehouse object table in mapping table shown;
Build module 304, for according to the table name of the table name of each data warehouse object table and data warehouse corresponding to each data warehouse object table source table, build data warehouse table genetic connection figure.
Further, the parsing module 301 comprises:
A first parsing unit, configured to parse each data warehouse operation statement that accesses the data warehouse and obtain the access mode corresponding to each such statement;
An acquiring unit, configured to obtain the data warehouse operation statements whose access mode is write mode;
A second parsing unit, configured to parse the data warehouse operation statements whose access mode is write mode and obtain the table names of all data warehouse target tables that are accessed in write mode.
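As an illustrative sketch only, the three units above could be approximated as follows. The keyword patterns, the function names `access_mode` and `target_tables`, and the sample statements are assumptions made for this example; the embodiment does not prescribe a particular SQL dialect or parser, and a real implementation would likely use a proper SQL parser rather than regular expressions.

```python
import re

# Assumed convention: a statement is in write mode when it names a table
# directly after INSERT INTO / INSERT OVERWRITE [TABLE] or CREATE TABLE.
WRITE_TARGET = re.compile(
    r"\b(?:INSERT\s+(?:INTO|OVERWRITE)\s+(?:TABLE\s+)?"
    r"|CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?)([\w.]+)",
    re.IGNORECASE,
)

def access_mode(statement):
    """First parsing unit: classify a statement as 'write' or 'read'."""
    return "write" if WRITE_TARGET.search(statement) else "read"

def target_tables(statement):
    """Second parsing unit: table names the statement writes to, in order."""
    return [m.group(1).lower() for m in WRITE_TARGET.finditer(statement)]

# Hypothetical statements keyed by statement identifier.
statements = {
    "s1": "INSERT OVERWRITE TABLE dw.daily_users SELECT uid FROM ods.login_log",
    "s2": "SELECT COUNT(*) FROM dw.daily_users",
}

# Acquiring unit: keep only write-mode statements, then extract their targets.
write_targets = {
    stmt_id: target_tables(sql)
    for stmt_id, sql in statements.items()
    if access_mode(sql) == "write"
}
```

Here `write_targets` keeps only statement `s1`, since `s2` merely reads from a table and therefore contributes no target table.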
Further, the device also comprises:
A second acquisition module, configured to obtain the data warehouse operation statements whose task type is the import type, together with their corresponding import paths;
A third acquisition module, configured to obtain, according to the import path, the data warehouse operation statements whose task type is the analysis type and which have that import path;
A binding module, configured to bind the data warehouse operation statement whose task type is the import type to the data warehouse operation statement whose task type is the analysis type and which has the import path.
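The binding of import-type statements to analysis-type statements by shared import path might be sketched as below. The task record layout (`id`, `type`, `path`) and the function name `bind_by_import_path` are hypothetical; the embodiment only states that statements of the two task types are bound when they have the same import path.

```python
from collections import defaultdict

# Hypothetical task records; the field names are illustrative assumptions.
tasks = [
    {"id": "t1", "type": "import", "path": "/data/login/2014-02-28"},
    {"id": "t2", "type": "analysis", "path": "/data/login/2014-02-28"},
    {"id": "t3", "type": "analysis", "path": "/data/pay/2014-02-28"},
]

def bind_by_import_path(tasks):
    """Map each import-type task id to the analysis-type task ids sharing its path."""
    analysis_by_path = defaultdict(list)
    for task in tasks:
        if task["type"] == "analysis":
            analysis_by_path[task["path"]].append(task["id"])
    return {
        task["id"]: analysis_by_path.get(task["path"], [])
        for task in tasks
        if task["type"] == "import"
    }

bindings = bind_by_import_path(tasks)
```

In this example `t1` is bound to `t2` only, because `t3` has a different import path.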
Further, the first acquisition module 303 comprises:
A first acquiring unit, configured to obtain, for every record in the mapping table, the statement identifier of the data warehouse operation statement and the table name of the data warehouse target table stored in the record;
A second acquiring unit, configured to obtain the data warehouse operation statement according to the obtained statement identifier;
A third parsing unit, configured to parse the obtained data warehouse operation statement and obtain the table name of the data warehouse source table corresponding to each data warehouse target table.
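A minimal sketch of this lookup, assuming the mapping table is a list of (statement identifier, target table name) records and that source tables appear after `FROM` or `JOIN` keywords, is given below. The regular expression, the function name `source_tables`, and the sample data are assumptions for illustration; subqueries, aliases, and common table expressions would need a real SQL parser.

```python
import re

# Assumed convention: source tables follow FROM or JOIN.
SOURCE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)

# Hypothetical statement store and mapping table.
statements = {
    "s1": (
        "INSERT OVERWRITE TABLE dw.daily_users "
        "SELECT l.uid FROM ods.login_log l "
        "JOIN ods.user_info u ON l.uid = u.uid"
    ),
}
mapping_table = [("s1", "dw.daily_users")]

def source_tables(mapping_table, statements):
    """For each target table, collect the source tables read by its statements."""
    sources = {}
    for stmt_id, target in mapping_table:   # first acquiring unit: read the record
        sql = statements[stmt_id]           # second acquiring unit: fetch statement
        bucket = sources.setdefault(target, [])
        for name in SOURCE.findall(sql):    # third parsing unit: extract sources
            name = name.lower()
            if name != target and name not in bucket:
                bucket.append(name)
    return sources

sources_by_target = source_tables(mapping_table, statements)
```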
Further, the building module 304 comprises:
A construction unit, configured to build, in the data warehouse table blood relationship graph, a node corresponding to the table name of the data warehouse target table and a node corresponding to the table name of the data warehouse source table corresponding to that target table;
A designating unit, configured to take the node corresponding to the table name of the data warehouse target table as a child node of the node corresponding to the table name of its corresponding data warehouse source table.
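The parent-child relation described above can be sketched with a small in-memory graph. The `Node` class, the function `build_graph`, and the table names are assumptions for this example; the embodiment does not specify how nodes are represented.

```python
class Node:
    """One node per table name; children are the tables derived from it."""

    def __init__(self, table_name):
        self.table_name = table_name
        self.children = []

def build_graph(sources_by_target):
    """Create a node per table and attach each target under each of its sources."""
    nodes = {}

    def node(name):
        if name not in nodes:
            nodes[name] = Node(name)
        return nodes[name]

    for target, sources in sources_by_target.items():
        child = node(target)
        for source in sources:
            parent = node(source)
            if child not in parent.children:
                parent.children.append(child)
    return nodes

# Hypothetical input: target table name -> source table names.
graph = build_graph({
    "dw.daily_users": ["ods.login_log"],
    "rpt.weekly_users": ["dw.daily_users"],
})
```

Because a table can be both a target of one statement and a source of another, the same node may appear as a child and as a parent, which is what lets the graph chain across several processing stages.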
Further, the device also comprises:
A second storage module, configured to store the data warehouse operation statement that accesses the data warehouse target table in the node corresponding to the table name of that data warehouse target table;
A sending module, configured to send the data warehouse table blood relationship graph to the terminal, so that the terminal displays it to the user.
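One possible way to attach statements to nodes and hand the graph to a terminal is to serialize it as JSON, as sketched below. The payload layout, the function name `graph_payload`, and the choice of JSON are assumptions; the embodiment does not specify a transfer format.

```python
import json

def graph_payload(edges, statements_by_target):
    """Serialize the graph; each node carries the statements that write to it."""
    names = sorted({name for edge in edges for name in edge})
    return json.dumps(
        {
            "nodes": [
                {"table": name, "statements": statements_by_target.get(name, [])}
                for name in names
            ],
            "edges": [{"source": s, "target": t} for s, t in edges],
        },
        ensure_ascii=False,
    )

# Hypothetical edge list (source table, target table) and stored statements.
payload = graph_payload(
    [("ods.login_log", "dw.daily_users")],
    {"dw.daily_users": ["INSERT OVERWRITE TABLE dw.daily_users SELECT uid FROM ods.login_log"]},
)
```

Storing the statement alongside the node lets the terminal show a user not only which tables feed a given table but also which statement produced it.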
In this embodiment of the present invention, the server parses each data warehouse operation statement that accesses the data warehouse, obtains the table name of the data warehouse target table accessed by each data warehouse operation statement and the table name of the data warehouse source table corresponding to each data warehouse target table, and automatically builds the data warehouse table blood relationship graph from the table name of each data warehouse target table and the table name of its corresponding data warehouse source table. This reduces manual workload and improves the speed and accuracy with which table blood relationships in the data warehouse are built.
It should be noted that, when the device for building a data warehouse table blood relationship graph provided by the above embodiment builds the graph, the division into the functional modules described above is used only as an example. In practical applications, the above functions may be assigned to different functional modules as required; that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above. In addition, the device provided by the above embodiment and the method embodiment for building a data warehouse table blood relationship graph belong to the same concept; for the specific implementation process, refer to the method embodiment, which is not repeated here.
A person of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (12)

1. A method for building a data warehouse table blood relationship graph, characterized in that the method comprises:
Parsing each data warehouse operation statement that accesses the data warehouse, and obtaining the table name of the data warehouse target table accessed by each data warehouse operation statement;
Storing, in a mapping table, the correspondence between the statement identifier of each data warehouse operation statement and the table name of the data warehouse target table it accesses;
Obtaining, according to the mapping table, the table name of the data warehouse source table corresponding to each data warehouse target table in the mapping table;
Building the data warehouse table blood relationship graph according to the table name of each data warehouse target table and the table name of the data warehouse source table corresponding to each data warehouse target table.
2. The method as claimed in claim 1, characterized in that parsing each data warehouse operation statement that accesses the data warehouse and obtaining the table name of the data warehouse target table accessed by each data warehouse operation statement comprises:
Parsing each data warehouse operation statement that accesses the data warehouse, and obtaining the access mode corresponding to each such statement;
Obtaining the data warehouse operation statements whose access mode is write mode;
Parsing the data warehouse operation statements whose access mode is write mode, and obtaining the table names of all data warehouse target tables that are accessed in write mode.
3. The method as claimed in claim 1, characterized in that, after parsing each data warehouse operation statement that accesses the data warehouse and obtaining the table name of the data warehouse target table accessed by each data warehouse operation statement, the method further comprises:
Obtaining the data warehouse operation statements whose task type is the import type, together with their corresponding import paths;
Obtaining, according to the import path, the data warehouse operation statements whose task type is the analysis type and which have the import path;
Binding the data warehouse operation statement whose task type is the import type to the data warehouse operation statement whose task type is the analysis type and which has the import path.
4. The method as claimed in claim 1, characterized in that obtaining, according to the mapping table, the table name of the data warehouse source table corresponding to each data warehouse target table in the mapping table comprises:
For every record in the mapping table, obtaining the statement identifier of the data warehouse operation statement and the table name of the data warehouse target table stored in the record;
Obtaining the data warehouse operation statement according to the obtained statement identifier;
Parsing the obtained data warehouse operation statement, and obtaining the table name of the data warehouse source table corresponding to each data warehouse target table.
5. The method as claimed in claim 1, characterized in that building the data warehouse table blood relationship graph according to the table name of each data warehouse target table and the table name of the data warehouse source table corresponding to each data warehouse target table comprises:
Building, in the data warehouse table blood relationship graph, a node corresponding to the table name of the data warehouse target table and a node corresponding to the table name of the data warehouse source table corresponding to the data warehouse target table;
Taking the node corresponding to the table name of the data warehouse target table as a child node of the node corresponding to the table name of the data warehouse source table corresponding to the data warehouse target table.
6. The method as claimed in claim 5, characterized in that, after building the node corresponding to the table name of the data warehouse target table, the method further comprises:
Storing the data warehouse operation statement that accesses the data warehouse target table in the node corresponding to the table name of the data warehouse target table;
Sending the data warehouse table blood relationship graph to a terminal, so that the terminal displays it to the user.
7. A device for building a data warehouse table blood relationship graph, characterized in that the device comprises:
A parsing module, configured to parse each data warehouse operation statement that accesses the data warehouse and obtain the table name of the data warehouse target table accessed by each data warehouse operation statement;
A first storage module, configured to store, in a mapping table, the correspondence between the statement identifier of each data warehouse operation statement and the table name of the data warehouse target table it accesses;
A first acquisition module, configured to obtain, according to the mapping table, the table name of the data warehouse source table corresponding to each data warehouse target table in the mapping table;
A building module, configured to build the data warehouse table blood relationship graph according to the table name of each data warehouse target table and the table name of the data warehouse source table corresponding to each data warehouse target table.
8. The device as claimed in claim 7, characterized in that the parsing module comprises:
A first parsing unit, configured to parse each data warehouse operation statement that accesses the data warehouse and obtain the access mode corresponding to each such statement;
An acquiring unit, configured to obtain the data warehouse operation statements whose access mode is write mode;
A second parsing unit, configured to parse the data warehouse operation statements whose access mode is write mode and obtain the table names of all data warehouse target tables that are accessed in write mode.
9. The device as claimed in claim 7, characterized in that the device further comprises:
A second acquisition module, configured to obtain the data warehouse operation statements whose task type is the import type, together with their corresponding import paths;
A third acquisition module, configured to obtain, according to the import path, the data warehouse operation statements whose task type is the analysis type and which have the import path;
A binding module, configured to bind the data warehouse operation statement whose task type is the import type to the data warehouse operation statement whose task type is the analysis type and which has the import path.
10. The device as claimed in claim 7, characterized in that the first acquisition module comprises:
A first acquiring unit, configured to obtain, for every record in the mapping table, the statement identifier of the data warehouse operation statement and the table name of the data warehouse target table stored in the record;
A second acquiring unit, configured to obtain the data warehouse operation statement according to the obtained statement identifier;
A third parsing unit, configured to parse the obtained data warehouse operation statement and obtain the table name of the data warehouse source table corresponding to each data warehouse target table.
11. The device as claimed in claim 7, characterized in that the building module comprises:
A construction unit, configured to build, in the data warehouse table blood relationship graph, a node corresponding to the table name of the data warehouse target table and a node corresponding to the table name of the data warehouse source table corresponding to the data warehouse target table;
A designating unit, configured to take the node corresponding to the table name of the data warehouse target table as a child node of the node corresponding to the table name of the data warehouse source table corresponding to the data warehouse target table.
12. The device as claimed in claim 11, characterized in that the device further comprises:
A second storage module, configured to store the data warehouse operation statement that accesses the data warehouse target table in the node corresponding to the table name of the data warehouse target table;
A sending module, configured to send the data warehouse table blood relationship graph to a terminal, so that the terminal displays it to the user.
CN201410072773.0A 2014-02-28 2014-02-28 Method and apparatus for building a data warehouse table blood relationship graph Active CN103902653B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410072773.0A CN103902653B (en) 2014-02-28 2014-02-28 Method and apparatus for building a data warehouse table blood relationship graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410072773.0A CN103902653B (en) 2014-02-28 2014-02-28 Method and apparatus for building a data warehouse table blood relationship graph

Publications (2)

Publication Number Publication Date
CN103902653A true CN103902653A (en) 2014-07-02
CN103902653B CN103902653B (en) 2017-08-01

Family

ID=50993976

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410072773.0A Active CN103902653B (en) 2014-02-28 2014-02-28 Method and apparatus for building a data warehouse table blood relationship graph

Country Status (1)

Country Link
CN (1) CN103902653B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101609473A (en) * 2009-07-30 2009-12-23 金蝶软件(中国)有限公司 A kind of method of Structured Query Language (SQL) of reconstruct report query and device
CN101859303A (en) * 2009-04-07 2010-10-13 中国移动通信集团湖北有限公司 Metadata management method and management system
CN102239458A (en) * 2008-12-02 2011-11-09 起元技术有限责任公司 Visualizing relationships between data elements
US8468120B2 (en) * 2010-08-24 2013-06-18 International Business Machines Corporation Systems and methods for tracking and reporting provenance of data used in a massively distributed analytics cloud
CN103186541A (en) * 2011-12-27 2013-07-03 阿里巴巴集团控股有限公司 Generation method and device for mapping relationship

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
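杨玢玢: "数据仓库元数据的管理与运用" (Management and Use of Data Warehouse Metadata), China Master's Theses Full-text Database, Information Science and Technology series *
衡铁刚: "面向疑点核实的数据路径追踪技术研究" (Research on Data Path Tracing Technology for Verifying Suspicious Points), China Master's Theses Full-text Database, Information Science and Technology series *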

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104915390A (en) * 2015-05-25 2015-09-16 广州精点计算机科技有限公司 ETL data lineage query system and query method
CN105868521A (en) * 2015-12-14 2016-08-17 乐视网信息技术(北京)股份有限公司 Data information processing method and apparatus
CN106997369A (en) * 2016-01-26 2017-08-01 阿里巴巴集团控股有限公司 Data clearing method and device
CN106997369B (en) * 2016-01-26 2020-11-24 阿里巴巴集团控股有限公司 Data cleaning method and device
CN107239458A (en) * 2016-03-28 2017-10-10 阿里巴巴集团控股有限公司 The method and device of development object relation is calculated based on big data
TWI736587B (en) * 2016-03-28 2021-08-21 香港商阿里巴巴集團服務有限公司 Method and device for estimating the relationship of development objects based on big data
CN108132957A (en) * 2016-12-01 2018-06-08 中国移动通信有限公司研究院 A kind of data base processing method and device
CN108132957B (en) * 2016-12-01 2021-09-10 中国移动通信有限公司研究院 Database processing method and device
CN110019384B (en) * 2017-08-15 2023-06-27 阿里巴巴集团控股有限公司 Method for acquiring blood edge data, method and device for providing blood edge data
CN110019384A (en) * 2017-08-15 2019-07-16 阿里巴巴集团控股有限公司 A kind of acquisition methods of blood relationship data provide the method and device of blood relationship data
CN109947998B (en) * 2017-12-20 2023-05-26 Sap欧洲公司 Computing data lineage across networks of heterogeneous systems
CN109947998A (en) * 2017-12-20 2019-06-28 Sap欧洲公司 The calculating data lineage of network across heterogeneous system
CN108038248A (en) * 2017-12-28 2018-05-15 携程计算机技术(上海)有限公司 ETL relies on automatic identifying method and system
CN108038248B (en) * 2017-12-28 2021-11-26 携程计算机技术(上海)有限公司 ETL dependency automatic identification method and system
CN110019315A (en) * 2018-06-19 2019-07-16 杭州数澜科技有限公司 A kind of method and apparatus for the parsing of data blood relationship
CN109614432A (en) * 2018-12-05 2019-04-12 北京百分点信息科技有限公司 A kind of system and method for the acquisition data genetic connection based on syntactic analysis
CN109614432B (en) * 2018-12-05 2021-01-05 北京百分点信息科技有限公司 System and method for acquiring data blood relationship based on syntactic analysis
CN109669981A (en) * 2018-12-21 2019-04-23 成都四方伟业软件股份有限公司 Data relationship management method, device, data relationship acquisition methods and storage medium
CN109857818A (en) * 2019-02-03 2019-06-07 北京字节跳动网络技术有限公司 Determine method, apparatus, storage medium and the electronic equipment of the relations of production
CN110008291B (en) * 2019-04-10 2022-03-11 北京字节跳动网络技术有限公司 Data early warning method and device, storage medium and electronic equipment
CN110008291A (en) * 2019-04-10 2019-07-12 北京字节跳动网络技术有限公司 Data early warning method, device, storage medium and electronic equipment
CN110232056B (en) * 2019-05-21 2022-02-25 苏宁云计算有限公司 Blood margin analysis method and tool of structured query language
CN110232056A (en) * 2019-05-21 2019-09-13 苏宁云计算有限公司 A kind of the blood relationship analytic method and its tool of structured query language
CN110795509A (en) * 2019-09-29 2020-02-14 北京淇瑀信息科技有限公司 Method and device for constructing index blood relationship graph of data warehouse and electronic equipment
CN110795509B (en) * 2019-09-29 2024-02-09 北京淇瑀信息科技有限公司 Method and device for constructing index blood-margin relation graph of data warehouse and electronic equipment
CN111125229A (en) * 2019-12-24 2020-05-08 杭州数梦工场科技有限公司 Data blood margin generation method and device and electronic equipment
CN111639143A (en) * 2020-06-05 2020-09-08 广州市玄武无线科技股份有限公司 Data blood relationship display method and device of data warehouse and electronic equipment
CN111782738A (en) * 2020-08-14 2020-10-16 北京斗米优聘科技发展有限公司 Method and device for constructing database table level blood relationship
CN112231203A (en) * 2020-09-28 2021-01-15 四川新网银行股份有限公司 Data warehouse test analysis method based on blood relationship
CN112434042A (en) * 2020-12-03 2021-03-02 深圳市欢太科技有限公司 Data relationship construction method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN103902653B (en) 2017-08-01

Similar Documents

Publication Publication Date Title
CN103902653A (en) Method and device for creating data warehouse table blood relationship graph
US8364683B2 (en) Importing and reconciling resources from disjoint name spaces to a common namespace
US20180144061A1 (en) Edge store designs for graph databases
JP6262874B2 (en) Database implementation method
US9619492B2 (en) Data migration
CN102306168B (en) Log operation method and device and file system
US10353874B2 (en) Method and apparatus for associating information
US10002142B2 (en) Method and apparatus for generating schema of non-relational database
US9406018B2 (en) Systems and methods for semantic data integration
CN110019111B (en) Data processing method, data processing device, storage medium and processor
CN105900093A (en) Keyvalue database data table updating method and data table updating device
US20180357330A1 (en) Compound indexes for graph databases
CN107016115B (en) Data export method and device, computer readable storage medium and electronic equipment
CN110781183A (en) Method and device for processing incremental data in Hive database and computer equipment
US20200133787A1 (en) Method, electronic device and computer readable medium of file management
CN108241676A (en) Realize the method and apparatus that data synchronize
JP6700554B2 (en) Distributed processing management method, distributed processing management program, and distributed processing management device
CN103049494A (en) Method and device for storing table of extensible markup language (XML) file
CN107766519B (en) Method for visually configuring data structure
CN109446167A (en) A kind of storage of daily record data, extracting method and device
CN108897742A (en) A kind of log method for internationalizing, system, equipment and computer readable storage medium
US20180144060A1 (en) Processing deleted edges in graph databases
US10664501B2 (en) Deriving and interpreting users collective data asset use across analytic software systems
CN111078905A (en) Data processing method, device, medium and equipment
CN107609038A (en) Data clearing method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP02 Change in the address of a patent holder

Address after: 519000 High-tech Zone, Zhuhai City, Guangdong Province, Unit 1, Fourth Floor C, Building A, Headquarters Base No. 1, Qianwan Third Road, Tangjiawan Town

Patentee after: ZHUHAI DUOWAN INFORMATION TECHNOLOGY Ltd.

Address before: 519080 Zone B, 1st Floor, Convention Center, No. 1, Software Park Road, Tangjiawan Town, Zhuhai, Guangdong

Patentee before: ZHUHAI DUOWAN INFORMATION TECHNOLOGY Ltd.