CN107644060A

CN107644060A - A smart grid cross-node join method based on an E-R sharding strategy

Info

Publication number: CN107644060A
Application number: CN201710742995.2A
Authority: CN
Inventors: 陈硕; 毛洪涛; 李钊; 雷振江; 唐胜; 谢玉波; 曹健; 耿洪碧; 李强; 秦鹏飞
Original assignee: CHINA REALTIME DATABASE Co Ltd; State Grid Corp of China SGCC; State Grid Liaoning Electric Power Co Ltd; Electric Power Research Institute of State Grid Liaoning Electric Power Co Ltd
Current assignee: CHINA REALTIME DATABASE Co Ltd; State Grid Corp of China SGCC; State Grid Liaoning Electric Power Co Ltd; Electric Power Research Institute of State Grid Liaoning Electric Power Co Ltd
Priority date: 2017-08-25
Filing date: 2017-08-25
Publication date: 2018-01-30

Abstract

The present invention relates to a smart grid cross-node join method based on an E-R sharding strategy. Traditional distributed relational databases shard data tables by horizontal partitioning: some rows of a table are placed in one database and the remaining rows in other databases. This approach has several drawbacks: (1) splitting rules are difficult to abstract; (2) transaction consistency across shards is difficult to guarantee; (3) repeated data expansion and maintenance are very difficult; (4) cross-database join performance is poor. Addressing the characteristics of sharded table data in distributed relational databases, the invention aims to: shard distributed database tables according to E-R relationships; improve join efficiency on distributed database tables (data sharded according to E-R relationships can largely avoid cross-database operations during joins); and, finally, provide a method for efficient sharding of, and join operations on, distributed relational database tables.

Description

A smart grid cross-node join method based on an E-R sharding strategy

Technical field

The present invention relates to a cross-node data join method, in particular a cross-node join method based on an E-R sharding strategy in a smart grid.

Technical background

Traditional distributed relational databases generally shard data tables by horizontal partitioning. Horizontal partitioning splits a table by data rows: some rows of the table are placed in one database and the remaining rows in other databases. This approach has several drawbacks: (1) splitting rules are difficult to abstract; (2) transaction consistency across shards is difficult to guarantee; (3) repeated data expansion and maintenance are very difficult; (4) cross-database join performance is poor.

Addressing the characteristics of sharded table data in distributed relational databases, the present invention aims to: (1) shard distributed database tables according to E-R relationships; (2) improve join efficiency on distributed database tables (data sharded according to E-R relationships can largely avoid cross-database operations during joins); (3) finally, provide a method for efficient sharding of, and join operations on, distributed relational database tables.

Join refers to the join operation. In relational algebra, a join consists of a Cartesian product followed by a selection. The Cartesian product first multiplies the two datasets, and the selection is then applied to the resulting set, ensuring that only rows from the two datasets that share an overlapping part are merged. The whole point of a join is to merge two datasets (usually tables) horizontally and produce a new result set, by combining each row in one data source with the matching rows in the other data source into a new tuple. In a relational database, JOIN is fundamentally the process of combining and restructuring the two or more tables involved. The result can be saved as a table or used as a table. The basis of this combination, in other words its point of contact, is the common column present in both tables. Because the tables in a cluster are stored on different server nodes, a join whose tables are distributed across different server nodes must be carried out as a cross-node join.
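The definition above (a Cartesian product followed by a selection) can be illustrated with a minimal Python sketch. The function name `theta_join`, the sample tables and their column names are assumptions made for illustration only and do not come from the patent.

```python
from itertools import product


def theta_join(left, right, predicate):
    """Join two lists of dict rows: Cartesian product, then selection."""
    return [
        {**l_row, **r_row}                       # merge the pair into one tuple
        for l_row, r_row in product(left, right)  # Cartesian product
        if predicate(l_row, r_row)                # selection on the pairs
    ]


# Hypothetical sample data, not taken from the patent.
customers = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 1},
    {"order_id": 12, "customer_id": 3},  # no matching customer
]

joined = theta_join(customers, orders, lambda c, o: c["id"] == o["customer_id"])
```

Only the pairs that agree on the common column survive the selection; real database engines avoid materialising the full product, but the result set is the same.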

Technical flow

In data partitioning, and in horizontal partitioning in particular, the two things a database ultimately has to handle are the splitting of data and the aggregation of data. Choosing a suitable splitting rule is critical, because it determines how difficult the subsequent aggregation will be, and it can even make cross-database aggregation unnecessary. Relational databases are built on the entity-relationship model (Entity-Relationship Model), which describes things in the real world and the relationships between them; the "ER" in ER tables derives from this. Following this idea, a data sharding strategy based on E-R relationships is proposed here: the records of a child table are stored on the same data shard as their associated parent table records, that is, the child table depends on the parent table, and table grouping (Table Group) ensures that a data join never becomes a cross-database operation.
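The placement rule described above can be sketched as follows. This is a minimal illustration, assuming a modulo rule on the parent key and three in-memory shards; `NUM_SHARDS`, `shard_of`, `insert_parent` and `insert_child` are hypothetical names, and the patent does not prescribe this particular routing function.

```python
NUM_SHARDS = 3  # hypothetical shard count

# Each shard holds its own slice of the parent table and the child table.
shards = [{"parent": [], "child": []} for _ in range(NUM_SHARDS)]


def shard_of(parent_key: int) -> int:
    """Hypothetical routing rule: modulo on the parent key."""
    return parent_key % NUM_SHARDS


def insert_parent(row: dict) -> int:
    idx = shard_of(row["id"])
    shards[idx]["parent"].append(row)
    return idx


def insert_child(row: dict) -> int:
    # The child row is routed by its parent's key, not by its own id,
    # so it always lands on the shard that holds the parent record.
    idx = shard_of(row["parent_id"])
    shards[idx]["child"].append(row)
    return idx


p = insert_parent({"id": 7})
c1 = insert_child({"id": 100, "parent_id": 7})
c2 = insert_child({"id": 101, "parent_id": 7})
```

Because both tables are routed by the same key, every parent row and all of its child rows share one shard, which is what allows the join to stay local.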

Technical flow and brief description of the drawings

Fig. 1 shows the cross-node join flow based on the E-R sharding strategy.

Process description:

1. Data tables are extracted from multiple source business systems, their correctness and integrity are verified, and erroneous data is corrected;

2. The data tables are organized and abstracted. Tables that cannot be abstracted into a parent-child relationship are stored in their original storage mode; tables that can be abstracted into a parent-child relationship use the cross-node join method based on the E-R sharding strategy proposed by the invention;

3. Tables are partitioned according to the E-R partitioning method. In some businesses, such as orders (order) and order details (order_detail), the detail table depends on the order table; that is, the tables have a master-slave relationship. For this kind of business a suitable splitting rule can be abstracted, for example splitting by one ID with all other related tables depending on that ID, or alternatively splitting by order ID; in short, part of the business can always be abstracted into parent-child tables. Such tables are suited to ER sharding: child table records are stored on the same data shard as their associated parent table records, avoiding cross-database join operations. Taking order and order_detail as an example, the following shard configuration is defined in schema.xml: order and order_detail are partitioned by order_id, which ensures that rows with the same order_id are assigned to the same shard. On insert, the database obtains the shard where the order resides and then inserts the order_detail row into that same shard.

The XML file configuration is as follows:

</table>

4. Shards are stored using the E-R join technique. We draw on the design of Foundation DB, which innovatively proposed the concept of Table Group: the storage location of a child table depends on its main table, and the two are stored physically adjacent, which thoroughly solves JOIN efficiency and performance problems. Following this idea, a data sharding strategy based on E-R relationships is proposed, in which child table records are stored on the same data shard as their associated parent table records. The customer table uses the sharding-by-intfile strategy and is sharded across dn1 and dn2; orders is sharded following its parent table, and the association between the two tables is orders.customer_id=customer.id. The customer rows on shard dn1 can therefore be joined locally with the orders on dn1, and likewise on dn2; merging the data of the two nodes then completes the overall JOIN. If the orders table holds 1,000,000 rows on each shard, 100 shards hold 100 million rows in total. The data sharding pattern based on E-R mapping basically solves the problems faced by more than 80% of enterprise applications.
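The local-join-then-merge behaviour in step 4 can be sketched as below. The node names dn1 and dn2 and the relation orders.customer_id=customer.id come from the text; the sample rows, the in-memory layout and the helper `local_join` are assumptions for illustration, not the patented implementation.

```python
def local_join(customers, orders):
    """Join orders to customers on orders.customer_id = customer.id."""
    by_id = {c["id"]: c for c in customers}
    return [
        {**by_id[o["customer_id"]], "order_id": o["order_id"]}
        for o in orders
        if o["customer_id"] in by_id
    ]


# Hypothetical layout: each node holds customers plus the orders that depend on them.
shards = {
    "dn1": {
        "customer": [{"id": 1, "name": "A"}],
        "orders": [{"order_id": 10, "customer_id": 1},
                   {"order_id": 11, "customer_id": 1}],
    },
    "dn2": {
        "customer": [{"id": 2, "name": "B"}],
        "orders": [{"order_id": 20, "customer_id": 2}],
    },
}

# Each node joins only its own data; the coordinator just concatenates results.
merged = []
for node in ("dn1", "dn2"):
    merged.extend(local_join(shards[node]["customer"], shards[node]["orders"]))

# Reference result: join over all data gathered in one place.
all_customers = [c for s in shards.values() for c in s["customer"]]
all_orders = [o for s in shards.values() for o in s["orders"]]
global_result = local_join(all_customers, all_orders)
```

With co-located parent and child rows, the merged per-node results equal the global join, so no row has to cross a node boundary before it is joined.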

Implementation result

1. A distributed cluster environment of 10 nodes was built. The data comprises two tables, R and S. Table R has 3 attributes: employee number, age and wage. Table S has 3 attributes: employee number, line manager number and company number. All attributes are integers and follow a uniform distribution. Tables R and S are joined on employee number, and each table has 15,000,000 tuples.

2. Analysis identifies table R as the main table, with table S dependent on table R. The tables are partitioned by employee number, all other related tables depend on the employee number, and parent-child tables are abstracted out. Such tables are suited to E-R sharding: child table records are stored on the same data shard as their associated parent table records, avoiding cross-database join operations.

3. The query using the cross-node join method with the E-R sharding strategy took 38 seconds, whereas the conventional method, without the E-R sharding strategy, took 162 seconds. The invention substantially reduces query time, markedly improves query efficiency, and lowers O&M cost.
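For reference, the two timings reported above correspond to the following ratios; this is simple arithmetic on the stated figures, not an additional measurement.

```latex
\[
\text{speedup} = \frac{162\ \text{s}}{38\ \text{s}} \approx 4.26,
\qquad
\text{time reduction} = \frac{162 - 38}{162} = \frac{124}{162} \approx 76.5\%
\]
```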

Claims

1. A smart grid cross-node join method based on an E-R sharding strategy, characterized in that: many tables in a smart grid database have a master-slave relationship; suitable splitting rules can be abstracted from the way the business is partitioned, and part of the business can be abstracted into parent-child tables; such tables are suited to ER sharding, in which child table records are stored on the same data shard as their associated parent table records, avoiding cross-database join operations; the storage location of the child table depends on the main table, and the two are stored physically adjacent, thereby thoroughly solving JOIN efficiency and performance problems.

2. The smart grid cross-node join method based on an E-R sharding strategy as claimed in claim 1, characterized in that: two fairly typical tables in a smart grid are the electricity user Customer table and the electrical equipment Device table; a master-slave relationship can be identified between the two tables, with the electrical equipment Device table depending on the electricity user Customer table; the primary key parent_id in the Device table and the primary key id of the Customer table have a one-to-one dependence; the two tables are therefore stored in the same shard, so that join queries over the two tables avoid cross-database operations and the join efficiency of distributed database tables is markedly improved.