CN107644060A - A smart grid cross-node join method based on an E-R sharding strategy - Google Patents
- Publication number
- CN107644060A (application CN201710742995.2A)
- Authority
- CN
- China
- Prior art keywords
- tables
- data
- join
- sharding
- smart grid
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The present invention relates to a smart grid cross-node join method based on an E-R sharding strategy. Traditional distributed relational databases shard data tables by horizontal partitioning: some rows of a table are placed in one database and other rows in other databases. This partitioning approach has several shortcomings: (1) partitioning rules are hard to abstract; (2) transaction consistency across shards is hard to guarantee; (3) repeated data expansion and maintenance are very difficult; (4) cross-database join performance is poor. The present invention targets the characteristics of sharded table data in distributed relational databases and aims to: shard the table data of a distributed database according to E-R relationships; improve the join efficiency of distributed database tables (data sharded by E-R relationship can largely avoid cross-database operations during a join); and, finally, provide a method for efficient sharding and join operations on the tables of a distributed relational database.
Description
Technical field
The present invention relates to a cross-node data join method, and in particular to a cross-node join method based on an E-R sharding strategy in a smart grid.
Technical background
Traditional distributed relational databases generally shard data tables by horizontal partitioning. Horizontal partitioning splits a table by rows: some rows of the table are placed in one database and other rows in other databases. This partitioning approach has several shortcomings: (1) partitioning rules are hard to abstract; (2) transaction consistency across shards is hard to guarantee; (3) repeated data expansion and maintenance are very difficult; (4) cross-database join performance is poor.
The present invention targets the characteristics of sharded table data in distributed relational databases and aims to: (1) shard the table data of a distributed database according to E-R relationships; (2) improve the join efficiency of distributed database tables (data sharded by E-R relationship can largely avoid cross-database operations during a join); (3) finally, provide a method for efficient sharding and join operations on the tables of a distributed relational database.
Join refers to the join operation. In relational algebra, a join is composed of a Cartesian product followed by a selection. The Cartesian product first multiplies the two data sets; a selection is then applied to the resulting set so that only rows that come from the two data sets and have matching values are merged. The whole point of a join is to combine two data sets (usually tables) horizontally and produce a new result set, by combining each row of one data source with the matching rows of the other into a new tuple. In a relational database, a JOIN is essentially the process of combining and reconstructing the two or more tables involved. The result can be saved as a table or used as one. The basis of this combination, in other words the point of connection, is the column common to the two tables. Because the tables in a cluster are stored on different server nodes, a join whose tables are distributed across different server nodes has to be performed as a cross-node join.
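The relational-algebra definition above (Cartesian product, then selection on the common column) can be sketched in a few lines of Python. This is only an illustration: the function name `equi_join`, the `l.`/`r.` column prefixes, and the sample rows are assumptions and do not come from the patent.

```python
from itertools import product


def equi_join(left, right, left_key, right_key):
    """Join two lists of dict rows: Cartesian product, then selection."""
    joined = []
    for l_row, r_row in product(left, right):      # Cartesian product
        if l_row[left_key] == r_row[right_key]:    # selection on the common column
            # Merge the two matching rows into one new tuple; prefix the
            # column names so columns of the two tables cannot collide.
            merged = {f"l.{k}": v for k, v in l_row.items()}
            merged.update({f"r.{k}": v for k, v in r_row.items()})
            joined.append(merged)
    return joined


# Hypothetical sample rows, used only for illustration.
orders = [{"order_id": 1, "user": "a"}, {"order_id": 2, "user": "b"}]
details = [
    {"id": 10, "order_id": 1, "item": "meter"},
    {"id": 11, "order_id": 1, "item": "cable"},
    {"id": 12, "order_id": 3, "item": "relay"},
]
result = equi_join(orders, details, "order_id", "order_id")
```

Real database engines avoid materializing the full product (for example with hash or merge joins), but the result set is the same. When the two inputs live on different nodes, one side has to be shipped across the network before this matching can happen, which is the cost the method sets out to avoid.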
Technical solution
In data partitioning, and in horizontal partitioning in particular, the two processes a database ultimately has to handle are splitting data and aggregating data. Choosing suitable partitioning rules is critical, because it determines how difficult subsequent data aggregation will be and can even make cross-database aggregation unnecessary. Relational databases are built on the entity-relationship model (Entity-Relationship Model), which describes things in the real world and the relationships between them; E-R tables are derived from it. Following this idea, this document proposes a data sharding strategy based on E-R relationships: the records of a child table are stored on the same data shard as their associated parent table records, that is, the child table depends on the parent table, and table grouping (Table Group) ensures that a data join never crosses databases.
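The co-location rule just described can be sketched as a routing function: the parent row is routed by its key, and the child row is routed by the same key value rather than by its own primary key. The sketch below is a minimal illustration; the `TableGroup` class, the modulo rule in `shard_of`, the shard count, and the `order_id` sample data are all assumptions, not the patent's implementation.

```python
NUM_SHARDS = 4  # hypothetical shard count, chosen only for illustration


def shard_of(join_key_value, num_shards=NUM_SHARDS):
    """Modulo routing on an integer key (an assumed rule for this sketch)."""
    return join_key_value % num_shards


class TableGroup:
    """Parent and child rows share shards: the child follows the parent's key."""

    def __init__(self, num_shards=NUM_SHARDS):
        self.shards = [{"parent": [], "child": []} for _ in range(num_shards)]

    def insert_parent(self, row, key):
        self.shards[shard_of(row[key], len(self.shards))]["parent"].append(row)

    def insert_child(self, row, join_key):
        # The child never picks a shard from its own primary key;
        # it is placed wherever its parent's key routes to.
        self.shards[shard_of(row[join_key], len(self.shards))]["child"].append(row)


group = TableGroup()
for oid in range(1, 9):
    group.insert_parent({"order_id": oid}, "order_id")
    for n in range(2):
        group.insert_child({"id": oid * 10 + n, "order_id": oid}, "order_id")

# Check that every child row sits on the shard that holds its parent row.
colocated = all(
    any(p["order_id"] == c["order_id"] for p in shard["parent"])
    for shard in group.shards
    for c in shard["child"]
)
```

Because both inserts route on the same value, a join on that key never needs rows from another shard.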
Technical flow and brief description of the drawings
Fig. 1 shows the cross-node join flow based on the E-R sharding strategy.
Process description:
1. Extract data tables from multiple source business systems, verify the correctness and completeness of the tables, and correct any erroneous data;
2. Organize and abstract the data tables. Tables that cannot be abstracted into a parent-child relationship are still stored in their original way; tables that can be abstracted into a parent-child relationship use the cross-node join method based on the E-R sharding strategy proposed by the present invention;
3. Partition the tables according to the E-R sharding method. In some business scenarios, such as orders (order) and order details (order_detail), the detail table depends on the order table, that is, the tables have a master-detail relationship. For this kind of business a suitable partitioning rule can be abstracted, for example partitioning by ID with all related tables depending on that ID, or partitioning by order ID; in short, some business scenarios always yield tables with a parent-child relationship. Such tables are suited to E-R sharding: the records of the child table are stored on the same data shard as the associated parent table records, which avoids cross-database joins. Taking order and order_detail as an example, the shard configuration below is defined in schema.xml. order and order_detail are partitioned by order_id, which ensures that rows with the same order_id are assigned to the same shard. On insert, the database first determines the shard that holds the order and then inserts the order_detail row into that same shard.
The XML configuration is as follows:
<table name="order" dataNode="ds$1-32" rule="mod-long">
    <childTable name="order_detail" primaryKey="id" joinKey="order_id" parentKey="order_id"/>
</table>
4. Store the shards using the E-R join technique. We drew on the design ideas of FoundationDB, which innovatively introduced the concept of the Table Group: the storage location of a child table depends on its main table, and the two are stored physically adjacent, which thoroughly solves the efficiency and performance problems of JOIN. Following this idea, a data sharding strategy based on E-R relationships is proposed, in which the records of a child table are stored on the same data shard as the associated parent table records. The customer table uses the sharding-by-intfile sharding strategy and is sharded across dn1 and dn2; the orders table is sharded by following its parent table, and the association between the two tables is orders.customer_id = customer.id. The customer rows on shard dn1 can therefore be joined locally with the orders rows on dn1, and likewise on dn2; merging the data from the two nodes then completes the overall JOIN. If the orders table holds 1,000,000 rows on each shard, 100 shards hold 100 million rows in total. The data sharding model based on E-R mapping essentially solves the problems faced by more than 80% of enterprise applications.
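The dn1/dn2 example in step 4 can be sketched as follows: each shard joins its own customer and orders rows, and the partial results are simply concatenated. The row contents, the `local_join` helper, and the `name` and `order_no` columns are hypothetical; only the join condition orders.customer_id = customer.id and the shard names come from the text.

```python
def local_join(customers, orders):
    """Hash join of the customer and orders rows held on one shard."""
    by_id = {c["id"]: c for c in customers}
    return [
        (by_id[o["customer_id"]]["name"], o["order_no"])
        for o in orders
        if o["customer_id"] in by_id
    ]


# Two shards, dn1 and dn2; each orders row is stored next to its customer row.
dn1 = {
    "customer": [{"id": 1, "name": "alice"}],
    "orders": [
        {"order_no": 100, "customer_id": 1},
        {"order_no": 101, "customer_id": 1},
    ],
}
dn2 = {
    "customer": [{"id": 2, "name": "bob"}],
    "orders": [{"order_no": 200, "customer_id": 2}],
}

# Join locally on each shard, then concatenate the partial results.
merged = []
for shard in (dn1, dn2):
    merged.extend(local_join(shard["customer"], shard["orders"]))

# Reference: the same join over all rows gathered in one place.
all_customers = dn1["customer"] + dn2["customer"]
all_orders = dn1["orders"] + dn2["orders"]
global_result = local_join(all_customers, all_orders)

# Row-count arithmetic from the text: 1,000,000 rows per shard on 100 shards.
total_rows = 1_000_000 * 100
```

The merged per-shard result matches the global join only because every orders row is stored with its customer; if the two tables were sharded independently, a local join would miss matches that sit on another node.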
Implementation result
1. A distributed cluster environment of 10 nodes was built. The data comprise two tables, R and S. Table R has three attributes: employee number, age, and wage. Table S has three attributes: employee number, line manager number, and company number. All attributes are integers and follow a uniform distribution. Tables R and S are joined on employee number, and each table holds 15,000,000 tuples.
2. Analysis identifies table R as the main table, with table S depending on table R. The tables are partitioned by employee number, all related tables depend on the employee number, and the parent-child tables are abstracted accordingly. Such tables are suited to E-R sharding: the records of the child table are stored on the same data shard as the associated parent table records, which avoids cross-database joins.
3. With the cross-node join method based on the E-R sharding strategy, the query takes 38 seconds; with the conventional method, which does not use the E-R sharding strategy, the query takes 162 seconds. The present invention thus reduces query time by a factor of about 4.3 (162 s / 38 s), significantly improving query efficiency and reducing operation and maintenance costs.
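To make the effect of co-location concrete, the toy sketch below counts how many S rows would need their matching R row fetched from another node under two layouts. It does not reproduce the reported timings. The table sizes (1,000 rows rather than 15,000,000), the generated attribute values, and the baseline layout (S sharded by company number) are all assumptions; the patent does not say how the conventional method placed its data.

```python
NODES = 10  # cluster size from the experiment described above

# Toy stand-ins for tables R and S (1,000 rows each, values are made up).
R = [{"emp_no": e, "age": 20 + e % 40, "wage": 3000 + e} for e in range(1000)]
S = [
    {"emp_no": e, "manager_no": e // 10, "company_no": e // 100}
    for e in range(1000)
]


def by_emp(row):
    """E-R layout: route by the join key, employee number."""
    return row["emp_no"] % NODES


def by_company(row):
    """Assumed baseline layout: S routed by company number instead."""
    return row["company_no"] % NODES


def remote_matches(r_rows, s_rows, r_node, s_node):
    """Count S rows whose matching R row is stored on a different node."""
    home = {r["emp_no"]: r_node(r) for r in r_rows}
    return sum(1 for s in s_rows if home[s["emp_no"]] != s_node(s))


er_remote = remote_matches(R, S, by_emp, by_emp)
baseline_remote = remote_matches(R, S, by_emp, by_company)

# Ratio of the two query times reported in the text.
speedup = 162 / 38
```

In this toy data set the E-R layout needs no remote lookups, while the assumed baseline needs one for 900 of the 1,000 S rows; the exact figure depends entirely on the made-up data and is not a measurement.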
Claims (2)
1. A smart grid cross-node join method based on an E-R sharding strategy, characterized in that: many tables in a smart grid database have a master-detail relationship; suitable partitioning rules can be abstracted from the way the business is partitioned, and some business scenarios yield tables with a parent-child relationship; such tables are suited to E-R sharding, in which the records of the child table are stored on the same data shard as the associated parent table records, avoiding cross-database joins; the storage location of the child table depends on the main table and the two are stored physically adjacent, which thoroughly solves the efficiency and performance problems of JOIN.
2. The smart grid cross-node join method based on an E-R sharding strategy according to claim 1, characterized in that: two typical tables in a smart grid are the electricity consumer table Customer and the electrical equipment table Device; a master-detail relationship can be worked out between the two tables, with the Device table depending on the Customer table; the primary key parent_id in the Device table and the primary key id of the Customer table have a one-to-one dependency; the two tables are therefore stored in the same shard, so that a join query over the two tables avoids cross-database operations and markedly improves the join efficiency of distributed database tables.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710742995.2A CN107644060A (en) | 2017-08-25 | 2017-08-25 | A smart grid cross-node join method based on an E-R sharding strategy |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710742995.2A CN107644060A (en) | 2017-08-25 | 2017-08-25 | A smart grid cross-node join method based on an E-R sharding strategy |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107644060A true CN107644060A (en) | 2018-01-30 |
Family
ID=61110181
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710742995.2A Pending CN107644060A (en) | 2017-08-25 | 2017-08-25 | A smart grid cross-node join method based on an E-R sharding strategy |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107644060A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180130 |