CN107644060A - A smart grid cross-node join method based on an E-R sharding strategy - Google Patents

Info

Publication number
CN107644060A
Authority
CN
China
Prior art keywords
tables
data
join
shard
smart grid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710742995.2A
Other languages
Chinese (zh)
Inventor
陈硕
毛洪涛
李钊
雷振江
唐胜
谢玉波
曹健
耿洪碧
李强
秦鹏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CHINA REALTIME DATABASE Co Ltd
State Grid Corp of China SGCC
State Grid Liaoning Electric Power Co Ltd
Electric Power Research Institute of State Grid Liaoning Electric Power Co Ltd
Original Assignee
CHINA REALTIME DATABASE Co Ltd
State Grid Corp of China SGCC
State Grid Liaoning Electric Power Co Ltd
Electric Power Research Institute of State Grid Liaoning Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CHINA REALTIME DATABASE Co Ltd, State Grid Corp of China SGCC, State Grid Liaoning Electric Power Co Ltd, Electric Power Research Institute of State Grid Liaoning Electric Power Co Ltd filed Critical CHINA REALTIME DATABASE Co Ltd
Priority to CN201710742995.2A priority Critical patent/CN107644060A/en
Publication of CN107644060A publication Critical patent/CN107644060A/en
Pending legal-status Critical Current

Abstract

The present invention relates to a smart grid cross-node join method based on an E-R sharding strategy. Traditional distributed relational databases shard data tables by horizontal partitioning: some rows of a table are placed in one database and other rows are placed in other databases. This sharding approach has several shortcomings: (1) splitting rules are difficult to abstract; (2) transaction consistency across shards is difficult to achieve; (3) repeated data expansion is difficult and costly to maintain; (4) cross-database join performance is poor. The invention addresses the characteristics of sharded table data in distributed relational databases and aims to: shard distributed database tables according to E-R relationships; improve the join efficiency of distributed database tables (data sharded according to E-R relationships can largely avoid cross-database operations during joins); and, finally, provide a method for efficient sharding and join operations on distributed relational database tables.

Description

A smart grid cross-node join method based on an E-R sharding strategy
Technical field
The present invention relates to a cross-node join method for data, and in particular to a cross-node join method for smart grids based on an E-R sharding strategy.
Technical background
Traditional distributed relational databases usually shard data tables by horizontal partitioning. Horizontal partitioning splits a table by rows: some rows of the table are placed in one database and other rows are placed in other databases. This sharding approach has several shortcomings: (1) splitting rules are difficult to abstract; (2) transaction consistency across shards is difficult to achieve; (3) repeated data expansion is difficult and costly to maintain; (4) cross-database join performance is poor.
The invention addresses the characteristics of sharded table data in distributed relational databases and aims to: (1) shard distributed database tables according to E-R relationships; (2) improve the join efficiency of distributed database tables (data sharded according to E-R relationships can largely avoid cross-database operations during joins); (3) finally, provide a method for efficient sharding and join operations on distributed relational database tables.
Join refers to the join operation. In relational algebra, a join is composed of a Cartesian product followed by a selection. The Cartesian product first multiplies the two data sets; a selection is then applied to the resulting set so that only rows from the two data sets that match each other are merged. The purpose of a join is to merge two data sets (usually tables) horizontally and produce a new result set, by combining each row of one data source with the matching rows of the other data source into a new tuple. In a relational database, JOIN is fundamentally the process of combining and reconstructing two or more related tables. The result can be saved as a table or used as one. The basis of this combination, in other words the point of connection, is the columns that the two tables have in common. Because the tables in a cluster are stored on different server nodes, a join whose tables are distributed across different server nodes must be carried out as a cross-node join.
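The relational-algebra definition above (a Cartesian product followed by a selection) can be illustrated with a minimal Python sketch. The table contents and the helper name `join` are made up for illustration and do not come from the patent; a real database would use an index or hash join rather than materializing the full product.

```python
from itertools import product

def join(r, s, predicate):
    """Join = selection applied to the Cartesian product of r and s."""
    return [(a, b) for a, b in product(r, s) if predicate(a, b)]

# Hypothetical sample rows: (order_id, customer) and (detail_id, order_id, item)
orders = [(1, "alice"), (2, "bob")]
details = [(10, 1, "meter"), (11, 1, "cable"), (12, 2, "switch")]

# Keep only the pairs whose order_id values match (an equi-join).
joined = join(orders, details, lambda o, d: o[0] == d[1])
```

Of the six pairs in the product, only the three with matching order_id values survive the selection.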
Technical flow
In data partitioning, and in horizontal partitioning in particular, the two processes a database must ultimately handle are the splitting of data and the aggregation of data. Choosing suitable splitting rules is crucial, because it determines how difficult subsequent data aggregation will be and can even make cross-database aggregation unnecessary. Relational databases are built on the entity-relationship model (Entity-Relationship Model), which describes things in the real world and the relationships between them; E-R tables derive from this model. Following this idea, the invention proposes a data sharding strategy based on E-R relationships: the records of a child table are stored on the same data shard as the associated parent table records, that is, the child table depends on the parent table, and table grouping (Table Group) ensures that data joins do not cross databases.
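The effect of the splitting rule on co-location can be sketched as follows. This is an illustrative model only: the modulo routing rule, the function names, and the sample data are assumptions, not details taken from the patent.

```python
def shard_of(key, shard_count):
    """Hypothetical routing rule: integer key modulo the number of shards."""
    return key % shard_count

def cross_shard_children(children, shard_count, by_parent):
    """Count child rows that land on a different shard than their parent.

    children: list of (child_id, parent_id) pairs.
    by_parent: route the child by parent_id (E-R style) or by its own id.
    """
    count = 0
    for child_id, parent_id in children:
        parent_shard = shard_of(parent_id, shard_count)
        child_shard = shard_of(parent_id if by_parent else child_id, shard_count)
        if child_shard != parent_shard:
            count += 1
    return count

# Made-up data: three child rows per parent, child ids are sequential.
children = [(child_id, child_id // 3) for child_id in range(30)]

independent = cross_shard_children(children, 4, by_parent=False)
er_style = cross_shard_children(children, 4, by_parent=True)
```

When each table is routed by its own key, some child rows end up away from their parents and a join has to cross shards; routing the child by the parent key leaves no such rows.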
Technical flow and brief description of the drawings
Fig. 1 shows the cross-node join flow based on the E-R sharding strategy.
Process description:
1. Data tables are extracted from multiple source business systems, their correctness and completeness are verified, and erroneous data are corrected;
2. The data tables are organized and abstracted. Tables that cannot be abstracted into a parent-child relationship are stored in their original storage mode; tables that can be abstracted into a parent-child relationship use the cross-node join method based on the E-R sharding strategy proposed by the invention;
3. The tables are split according to the E-R sharding method. In some businesses, such as orders (order) and order details (order_detail), the detail table depends on the order table, that is, a master-slave relationship exists between the tables. Suitable splitting rules can be abstracted for this kind of business: for example, split by an ID on which all other related tables depend, or split by order ID. In short, part of the business can always be abstracted into parent-child tables. Such tables are suited to E-R sharding: the records of the child table are stored on the same data shard as the associated parent table records, which avoids cross-database join operations. Taking order and order_detail as an example, the following shard configuration is defined in schema.xml. Both order and order_detail are split by order_id, which ensures that rows with the same order_id are assigned to the same shard. On insertion, the database obtains the shard where the order resides and then inserts the order_detail row into that same shard.
The XML file configuration is as follows:
<table name="order" dataNode="ds$1-32" rule="mod-long">
<childTable name="order_detail" primaryKey="id" joinKey="order_id" parentKey="order_id"/>
</table>
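The insertion behaviour described for this configuration can be sketched in Python. This is a simulation under stated assumptions, not the database's implementation: the "mod-long" rule is assumed to mean "key modulo the number of data nodes", and the `Router` class and its method names are hypothetical.

```python
SHARD_COUNT = 32  # matches the 32 data nodes named in the configuration

def mod_long(key, shard_count=SHARD_COUNT):
    """Assumed behaviour of the "mod-long" rule: key modulo shard count."""
    return key % shard_count

class Router:
    def __init__(self, shard_count=SHARD_COUNT):
        self.shard_count = shard_count
        # One dict of table name -> rows per simulated data node.
        self.shards = [{"order": [], "order_detail": []} for _ in range(shard_count)]

    def insert_order(self, order):
        shard = mod_long(order["order_id"], self.shard_count)
        self.shards[shard]["order"].append(order)
        return shard

    def insert_order_detail(self, detail):
        # The child row is routed by its join key (order_id), so it follows its parent.
        shard = mod_long(detail["order_id"], self.shard_count)
        self.shards[shard]["order_detail"].append(detail)
        return shard

router = Router()
order_shard = router.insert_order({"order_id": 1001, "customer": "alice"})
detail_shard = router.insert_order_detail({"id": 1, "order_id": 1001, "item": "meter"})
```

Because both inserts are routed by order_id, the order and its detail row land on the same simulated shard, which is the property the joinKey/parentKey pairing is meant to guarantee.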
4. The shards are stored using the E-R join technique. We draw on the design of FoundationDB, which innovatively introduced the concept of the Table Group: the storage location of a child table depends on its main table, and the two are stored physically close together, which thoroughly solves the efficiency and performance problems of JOIN. Following this idea, we propose the data sharding strategy based on E-R relationships, in which the records of a child table are stored on the same data shard as the associated parent table records. The customer table uses the sharding-by-intfile strategy and is sharded across dn1 and dn2, while orders is sharded according to its parent table; the relationship between the two tables is orders.customer_id = customer.id. The customer rows on shard dn1 can then be joined locally with the orders rows on dn1, and likewise on dn2; merging the data from the two nodes completes the overall JOIN. If the orders table holds 1,000,000 rows on each shard, 100 shards hold 100 million rows. The data sharding model based on E-R mapping essentially solves the problems faced by more than 80% of enterprise applications.
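The "join locally on each shard, then merge" step can be sketched as follows. The shard layout, row contents, and function names are invented for illustration; only the join condition orders.customer_id = customer.id and the row-count arithmetic come from the text.

```python
def local_join(customers, orders):
    """Equi-join on orders.customer_id = customer.id within one shard."""
    by_id = {c["id"]: c for c in customers}
    return [(by_id[o["customer_id"]], o) for o in orders if o["customer_id"] in by_id]

def sharded_join(shards):
    """Run the join on every shard independently, then concatenate the results."""
    result = []
    for shard in shards:
        result.extend(local_join(shard["customer"], shard["orders"]))
    return result

# Made-up layout: two shards (dn1, dn2); each order sits with its customer.
dn1 = {"customer": [{"id": 1, "name": "alice"}],
       "orders": [{"id": 100, "customer_id": 1}, {"id": 101, "customer_id": 1}]}
dn2 = {"customer": [{"id": 2, "name": "bob"}],
       "orders": [{"id": 200, "customer_id": 2}]}

rows = sharded_join([dn1, dn2])

# Row-count arithmetic from the text: 1,000,000 orders per shard on 100 shards.
total_orders = 1_000_000 * 100
```

No shard needs rows from another shard to compute its part of the result, so the merge is a plain concatenation.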
Implementation result
1. A distributed cluster environment of 10 nodes was built. The data consist of two tables, R and S. Table R has 3 attributes: employee number, age, and wage. Table S has 3 attributes: employee number, line manager number, and company number. All attributes are integers and uniformly distributed. Tables R and S are joined on employee number, and each table contains 15,000,000 tuples.
2. By analysis, table R is identified as the main table and table S as dependent on table R. The tables are split by employee number, on which the other related tables all depend, yielding parent-child tables. Such tables are suited to E-R sharding: the records of the child table are stored on the same data shard as the associated parent table records, which avoids cross-database join operations.
3. With the cross-node join method using the E-R sharding strategy, the query takes 38 seconds; with the conventional method, which does not use the E-R sharding strategy, the query takes 162 seconds. The invention substantially reduces query time, markedly improves query efficiency, and reduces operation and maintenance costs.
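As a quick check on the two reported timings (38 seconds with E-R sharding, 162 seconds without), the implied speedup and relative reduction in query time work out roughly as follows; the values are rounded and are derived only from those two figures:

```latex
\[
\text{speedup} = \frac{162\ \text{s}}{38\ \text{s}} \approx 4.26,
\qquad
\text{reduction} = \frac{162 - 38}{162} \approx 76.5\%
\]
```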

Claims (2)

1. A smart grid cross-node join method based on an E-R sharding strategy, characterized in that: many tables in a smart grid database have a master-slave relationship; suitable splitting rules can be abstracted according to how the business is partitioned, and part of the business can be abstracted into tables with a parent-child relationship; such tables are suited to E-R sharding, in which the records of the child table are stored on the same data shard as the associated parent table records, avoiding cross-database join operations; the storage location of the child table depends on the main table, and the two are stored physically close together, thereby thoroughly solving the efficiency and performance problems of JOIN.
2. The smart grid cross-node join method based on an E-R sharding strategy as claimed in claim 1, characterized in that: two typical tables in a smart grid are the electricity user table Customer and the electrical equipment table Device; a master-slave relationship can be identified between the two tables, with the Device table depending on the Customer table; the key parent_id in the Device table and the primary key id of the Customer table have a one-to-one dependency; the two tables are therefore stored in the same shard, so that join queries on the two tables avoid cross-database operations and the join efficiency of distributed database tables is markedly improved.
CN201710742995.2A 2017-08-25 2017-08-25 A smart grid cross-node join method based on an E-R sharding strategy Pending CN107644060A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710742995.2A CN107644060A (en) 2017-08-25 2017-08-25 A smart grid cross-node join method based on an E-R sharding strategy

Publications (1)

Publication Number Publication Date
CN107644060A true CN107644060A (en) 2018-01-30

Family

ID=61110181

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710742995.2A Pending CN107644060A (en) A smart grid cross-node join method based on an E-R sharding strategy

Country Status (1)

Country Link
CN (1) CN107644060A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150227521A1 (en) * 2014-02-07 2015-08-13 Scalebase Inc. System and method for analysis and management of data distribution in a distributed database environment
CN105393251A (en) * 2013-06-12 2016-03-09 甲骨文国际公司 An in-database sharded queue
CN105404638A (en) * 2015-09-28 2016-03-16 高新兴科技集团股份有限公司 Method for solving correlated query of distributed cross-database fragment table
CN105930407A (en) * 2016-04-18 2016-09-07 北京思特奇信息技术股份有限公司 Cross-database associated query method and system for distributed database
CN106341454A (en) * 2016-08-23 2017-01-18 世纪龙信息网络有限责任公司 Across-room multiple-active distributed database management system and across-room multiple-active distributed database management method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105393251A (en) * 2013-06-12 2016-03-09 甲骨文国际公司 An in-database sharded queue
US20150227521A1 (en) * 2014-02-07 2015-08-13 Scalebase Inc. System and method for analysis and management of data distribution in a distributed database environment
CN105404638A (en) * 2015-09-28 2016-03-16 高新兴科技集团股份有限公司 Method for solving correlated query of distributed cross-database fragment table
CN105930407A (en) * 2016-04-18 2016-09-07 北京思特奇信息技术股份有限公司 Cross-database associated query method and system for distributed database
CN106341454A (en) * 2016-08-23 2017-01-18 世纪龙信息网络有限责任公司 Across-room multiple-active distributed database management system and across-room multiple-active distributed database management method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
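SHIYAN1248..: ""Mycat跨分片Join指南" [Mycat Cross-Shard Join Guide]", 《道客巴巴 [Doc88], link: HTTPS://WWW.DOC88.COM/P-2092373574194.HTML》 *
ZHANGLEI_16 (repost): ""mycat ER分片的场景详细分析" [Detailed analysis of Mycat ER sharding scenarios]", 《CSDN:HTTPS://BLOG.CSDN.NET/ZHANGLEI_16/ARTICLE/DETAILS/50779929》 *
王亚玲 et al.: "数据库系统应用分片中间件 [Sharding middleware for database system applications]", 《计算机系统应用》 *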

Similar Documents

Publication Publication Date Title
CN106227800B (en) Storage method and management system for highly-associated big data
Zhao et al. Modeling MongoDB with relational model
CN110618983A (en) JSON document structure-based industrial big data multidimensional analysis and visualization method
EP2527996B1 (en) Equi-joins between split tables
CN109669934A (en) A kind of data warehouse and its construction method suiting electric power customer service
CN104391948A (en) Data standardization construction method and system of data warehouse
US20180004781A1 (en) Data lineage analysis
CN105117442B (en) A kind of big data querying method based on probability
CN102867066B (en) Data Transform Device and data summarization method
CN108280159B (en) Method for converting graph database into relational database
Ngu et al. B+-tree construction on massive data with Hadoop
CN109299154A (en) A kind of data-storage system and method for big data
CN103631922A (en) Hadoop cluster-based large-scale Web information extraction method and system
CN103646100A (en) Report data organization model
CN109871470B (en) Power grid equipment data labeling management system and implementation method
CN104834754A (en) SPARQL semantic data query optimization method based on connection cost
US20150169656A1 (en) Distributed database system
CN102902811A (en) Database design method for quickly generating tree structure
CN101710336A (en) Method for accelerating data processing by using relational middleware
CN101916281B (en) Concurrent computational system and non-repetition counting method
CN104504030B (en) A kind of indexing means towards power dispatching automation magnanimity message
CN103377236B (en) A kind of Connection inquiring method and system for distributed data base
CN102708188A (en) Method and system for data separation
CN107644060A (en) A smart grid cross-node join method based on an E-R sharding strategy
GB2609831A (en) Multi-value primary keys for plurality of unique identifiers of entities

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180130
