CN104503966B - The efficiently non-maintaining automatic Mesh Partition Method of PostgreSQL big datas - Google Patents
The efficiently non-maintaining automatic Mesh Partition Method of PostgreSQL big datas Download PDFInfo
- Publication number
- CN104503966B CN104503966B CN201410550641.4A CN201410550641A CN104503966B CN 104503966 B CN104503966 B CN 104503966B CN 201410550641 A CN201410550641 A CN 201410550641A CN 104503966 B CN104503966 B CN 104503966B
- Authority
- CN
- China
- Prior art keywords
- partition
- data
- subregion
- fixed
- active
- Prior art date
Links
- 238000005192 partition Methods 0.000 title claims abstract description 126
- 241001269238 Data Species 0.000 title claims abstract description 14
- 239000010410 layers Substances 0.000 claims abstract description 5
- 230000000875 corresponding Effects 0.000 claims description 18
- 230000000694 effects Effects 0.000 claims description 13
- 238000000638 solvent extraction Methods 0.000 claims description 10
- 239000007787 solids Substances 0.000 claims description 2
- 238000003780 insertion Methods 0.000 description 5
- 101710062303 moon Proteins 0.000 description 4
- 235000013399 edible fruits Nutrition 0.000 description 2
- 238000000034 methods Methods 0.000 description 2
- 230000002159 abnormal effects Effects 0.000 description 1
- 238000010586 diagrams Methods 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
Abstract
Description
Technical field
The present invention relates to database big data processing technology field, more particularly to a kind of PostgreSQL big datas efficiently to exempt from Safeguard automatic Mesh Partition Method.
Background technology
Existing PostgreSQL databases are because the record quantity to single table is conditional, while the space stored Aspect is also restricted, even if these are not all limited, what MDL (data manipulation language order) operating efficiency also can be with table is big Small growth is affected.Processing of the existing PostgreSQL databases to big data is typically using data partition or data hierarchy Mode is realized.For example, PostgreSQL databases not only support this several data partition mode of range, list and hash, together When also support the data hierarchy mode of any combination, but data hierarchy mode needs to realize by trigger.
At present, problems be present when using the traditional data subregion of PostgreSQL databases:
First, the data partition method of existing PostgreSQL databases is more inflexible, the definition of partitions of database table is once It is fixed up, the later stage with the change of data volume, the division of partition table with regard to cumbersome, it is necessary to human intervention, and can block up DML operation is filled in, influences business.For example, existing PostgreSQL partitions of database table partition definition and subregion constraint with The passage of time, partition table need human intervention to add, if to extend subregion, such as are needed artificially from day subregion is expanded to the moon Intervene, be easy to go wrong.
Second, for range partition, it is necessary to periodically increase subregion, otherwise data can be because value can not fall at defined point Overflowed in area, return to mistake, influence business.Even if there is default partition, DML efficiency is also influenceed.It is for example, existing The partition definition and entrance object binding of PostgreSQL databases, safeguard cumbersome.
Third, subregion is more, zoned logic code is longer, on the one hand influences efficiency, on the other hand adds code maintenance Difficulty and error probability.For example, the zoned logic of existing PostgreSQL databases is over time, logical code needs Human intervention is wanted to change, subregion is more, and code is huger, and efficiency is lower, and error probability is bigger.Extension subregion needs human intervention Change zoned logic.
The content of the invention
The present invention is to solve to cause later stage partition table after the definition of the partition table of PostgreSQL databases is fixed up It is not easy to divide, data are easily overflowed from range partition table, and the code that partition table causes zoned logic is longer so as to causing DML efficiency is low and increases the deficiency of code maintenance difficulty, there is provided and it is a kind of that zoned logic is fixed in a circulation, according to solid Fixed circulation creates fixed zoned logic, and active partition is peeled off from zoned logic, and data shift automatically to active partition, according to Data volume divides the efficiently non-maintaining automatic Mesh Partition Method of PostgreSQL big datas of subregion automatically.
To achieve these goals, the present invention uses following technical scheme:
The efficiently non-maintaining automatic Mesh Partition Method of PostgreSQL big datas, comprises the following steps:
(1) create into oral thermometer, and application program only interacts with entering oral thermometer;
(2) according to the subregion value tag of partitions of database table, consolidating over time is converted to by partition value is abstract Determine cycle values;
(3) fixed zoned logic is created according to fixed cycle values;
(4) the subregion succession table according to zoned logic code and the field value distribution for entering oral thermometer is created;
(5) active partition is peeled off from zoned logic, allows active partition to be automatically created over time;
(6).The data timing of fixed partition is transferred to active partition using inherited characteristics.And delete after the transfer is complete Except the data in fixed partition;
(7) line splitting design is entered according to data volume, if single table data volume is too huge, the big current work of time span Dynamic partition table splits into the small goal activities partition table of time span, goal activities partition table is first added inheritance, simultaneously The data shifting logic of timing is changed, data are write toward goal activities partition table, then start an affairs by current active subregion Data splitting in table simultaneously switches off the connection of current active partition table into goal activities partition table;
(8) layering and the data volume rule of correspondence are pre-defined, when active partition table data volume exceedes upper limit threshold then certainly Move toward next split layer;
(9) asynchronous succession table is created according to the data shifting logic of the asynchronous active movement of subregion succession table;
(10) first table of method, subregion table name and the timestamp for recording each active partition is created;
(11) logical function for data transfer is created, the logical function obtains current asynchronous subregion from first table first Pattern, then judge whether to reach critical point, take the size of a upper partition table from statistical form if critical point is reached, then root Select corresponding partitioning strategies to create the division according to data volume, and a record is increased newly to indicate current partitioning strategies, such as in first table Fruit does not reach critical point, then the data shifting logic according to corresponding to calling current bay strategy, and select to close according to current time Suitable fixed partition, taken out in selected fixed partition minimum time and maximum time, and from first table take out minimum time and The subregion table name of maximum time scope and front and rear subregion table name, and the data of fixed partition are write into these active partition tables, And the data of fixed partition table are removed simultaneously;
(12) data transfer function is called in timing.
Preferably, being shifted using bulkload mode data, fixed partition is deleted using TRUNCATE mode Data.
Preferably, the fixed cycles value of partition value include monthly circulation, by Zhou Xunhuan, per diem circulation, by the hour circulation, Circulated by minute, circulate by the second or circulated by half an hour.
If the preferably, granularity of zoned logic by the hour, it is only necessary to create 24 subregions.
The present invention can reach following effect:
1st, zoned logic is fixed in a circulation, realizes that zoned logic code is non-maintaining.Reduce race way simultaneously The data volume in domain, it can like this distinguish and store with active partition, such as the storage of high IO abilities and the storage of low IO abilities. Save the carrying cost of high IO abilities.
2nd, fixed zoned logic, such as granularity by the hour are created according to fixed cycles, it is only necessary to create 24 subregions. After partition table and zoned logic are fixed, subregion and code are non-maintaining.Because logical code part is simpler compared to traditional logic It is clean, also solve code efficiency problem.
3rd, active partition is peeled off from zoned logic, automatically creates active partition over time, solves subregion The problem of safeguarding.
4th, data shift automatically to active partition, using inheritance and DDL affairs solution Data Consistency and certainly The problem of dynamic subregion.
5th, subregion is divided according to data volume automatically, the acquisition of data volume can be used in data from database to be obtained, and is divided Threshold value can be stored in first table or logical code, and automatic division solves the problems, such as main entrance search efficiency, because table is excessive Or Dan Biaoguo big city brings parsing or search efficiency problem.
Brief description of the drawings
Fig. 1 is a kind of architecture principle schematic diagram of the present invention.
Fig. 2 is a kind of flow principle schematic of logical function of the present invention.
Embodiment
Below by embodiment, and with reference to accompanying drawing, technical scheme is described in further detail.
Example one:The efficiently non-maintaining automatic Mesh Partition Method of PostgreSQL big datas, referring to shown in Fig. 1, Fig. 2, including such as Lower step:
(1) create into oral thermometer, and business side parallel data DML application program only interacts with entering oral thermometer.
(2) according to the feature of partition value, value is abstracted and is converted to a cycle values over time, such as time word Section, can be evolved into the cycle values temporally elapsed:(1-12) monthly is circulated, by Zhou Xunhuan (1-53), per diem circulates (1- 366) (1-24), is circulated by the hour, (1-60) is circulated by minute, circulates (1-60) by the second, circulated (0100-2400) by half an hour Deng, for solve zoned logic code maintenance provide the foundation.
(3) fixed zoned logic is created according to fixed cycles value, the zoned logic code of zoned logic is according to field value Succession table corresponding to being distributed to.Such as granularity by the hour, it is only necessary to create 24 subregions.Partition table and zoned logic are fixed Afterwards, subregion and code are non-maintaining.Solve efficiency, zoned logic code maintenance problem.
(4) the subregion succession table according to zoned logic code and the field value distribution for entering oral thermometer is created.For example, be distributed to after Hold table 1 and succession table n.
(5) active partition is peeled off from zoned logic, allows active partition to be automatically created over time.Solution The problem of subregion of having determined is safeguarded.
(6) using inherited characteristics by the active partition that is transferred to of the data timing of fixed partition, and delete after the transfer is complete Except the data in fixed partition.The design that data shift automatically to active partition, using inherited characteristics by the data of fixed partition Timing is transferred to active partition, and is deleted from fixed partition, and data transfer can use bulkload mode, accelerate transfer Speed, TRUNCATE modes can be selected by deleting the data of fixed partition, accelerate speed, while will not block DML, be solved certainly The problem of dynamic subregion.
(7) line splitting design is entered according to data volume, if single table data volume is too huge, the big current work of time span Dynamic partition table splits into the small goal activities partition table of time span, goal activities partition table is first added inheritance, simultaneously The data shifting logic of timing is changed, data are write toward goal activities partition table, then start an affairs by current active subregion Data splitting in table simultaneously switches off the connection of current active partition table into goal activities partition table.Such as current active Subregion is that single table data volume is too huge by month subregion, to split into per diem subregion, day table is first added inheritance, together Shi Xiugai timing datas shift logic, data in the past write by table, then start an affairs by menology data splitting to Biao simultaneously Menology relation is disconnected, ensures data consistency, is all transparent for the corresponding use of these operations, and does not block DML, solves to pass The problem of system subregion division may block DML.
(8) layering and the data volume rule of correspondence are pre-defined, when active partition table data volume exceedes upper limit threshold then certainly Move toward next split layer.For example, layering and the data volume rule of correspondence are pre-defined, for example, year, the moon, week, day, hour, minute The rule of corresponding data magnitude, when partition data amount exceedes upper limit threshold then toward next split layer, automatic division is realized in solution Ask.
(9) asynchronous succession table a1 and asynchronous is created according to the data shifting logic of the asynchronous active movement of subregion succession table Inherit table an.
(10) first table of method, subregion table name and the timestamp for recording each active partition is created.
(11) logical function for data transfer is created, the logical function obtains current asynchronous subregion from first table first Pattern, then judge whether to reach critical point, take the size of a upper partition table from statistical form if critical point is reached, then root Select corresponding partitioning strategies to create the division according to data volume, and a record is increased newly to indicate current partitioning strategies, such as in first table Fruit does not reach critical point, then the data shifting logic according to corresponding to calling current bay strategy, and select to close according to current time Suitable fixed partition, taken out in selected fixed partition minimum time and maximum time, and from first table take out minimum time and The subregion table name of maximum time scope and front and rear subregion table name, and the data of fixed partition are write into these active partition tables, And the data of fixed partition table are removed simultaneously.
(12) data transfer function is called in timing.
Example two:The efficiently non-maintaining automatic Mesh Partition Method of PostgreSQL big datas, referring to shown in Fig. 1, Fig. 2, including such as Lower step:
First, enter oral thermometer and the zoned logic with service interaction are implemented.
1-1, create into oral thermometer tbl, specify a subregion field such as ctime, it is desirable to which can be distinguished from field contents should Which inherits table for row insertion, such as type is timestamp.Application program only interacts with entering oral thermometer.
1-2, succession table into oral thermometer is created, such as create 24 succession tables by the hour, tbl_01 to tbl_24, be each The ctime for inheriting table adds necessary constraint, such as tbl_01 constraints checkto_char (ctime, ' hh24 ')=01.
1-3, insertion trigger function is created, according to ctime function field values, insert corresponding succession table.Such as 2014- 01-0112:01:01 insertion tbl_12 tables.
1-4, in order to prevent from deleting and update into oral thermometer, create and delete, update trigger function, deletion is performed when entering oral thermometer Or during renewal operation, directly return abnormal.
1-5, create line trigger function, the insertion trigger function that the triggering of line trigger function above creates.
This part avoids the need for maintaining again after having created, when data are inserted into oral thermometer, be automatically inserted into corresponding to after Hold table.
2nd, asynchronous data transfer logic is implemented.
2-1, first table is created, record the data partition dimension (such as day, the moon, year) of asynchronous succession table, subregion table name, time Stamp, original state.
2-2, the asynchronous succession table dimension of initialization, are arranged to minimum dimension (such as day).
2-3, create function for data shifting logic.
2-4, function handling process:Open ssi or rr affairs, from first table obtain current asynchronous compartment model (such as day, the moon, Year), and judge whether to reach critical point, if reaching critical point, the size of a upper partition table, Ran Hougen are taken from statistical form Corresponding partitioning strategies is selected according to data volume, is created the division, and a record, current bay strategy are increased newly in the list of elements.Then root According to partitioning strategies, data shifting logic corresponding to calling, suitable fixed partition is then selected according to current time, what is selected Min and maxtime is taken out in fixed partition.Then subregion and the front and rear increase each one in this period are taken out from the list of elements Individual partition table, based on these partition tables and partitioning strategies, the data of fixed partition are inserted into these partition tables, and remove choosing Determine the record of fixed partition.If being not reaching to critical point, enter directly into according to partitioning strategies, data corresponding to calling turn This step of shifting logic.In this example, function handling process first obtains current asynchronous from first table and inherits table subregion dimension, initial dimension Spend for day.If it is currently non-initial dimension:It is different from upper one of statistical form acquisition according to the subregion dimension of current asynchronous succession table Step inherits the amount of capacity of table subregion, if being currently original dimensions:After original dimensions create generation 3 days, the asynchronous of the previous day is taken Inherit the amount of capacity of table subregion.Whether control threshold decision needs to change dimension, such as day table capacity is less than threshold value, is changed to Menology;Menology capacity is more than threshold value, is changed to a day table, and capacity is changed to chronology less than threshold value;Chronology capacity changes more than threshold value For menology.If the dimension of asynchronous succession table is changed, change is write into first table, the dimension as current asynchronous succession table Degree.Min, max times are taken out from 2 partition datas before succession table, such as is 17 points now, takes 15 partition data.Judge Table data are inherited whether across the asynchronous succession current dimension of table subregion, (such as min, max contain the data of 2 days, then across day dimension Degree).If across dimension, judge asynchronous succession table corresponding to dimension whether all in the presence of (such as tbl_20140101, tbl_ 20140102), in the absence of then asynchronous succession table corresponding to establishment, and corresponding constraint is added.Data insertion corresponding to it is asynchronous after Hold table.The succession table subregion of the truncate transfer data, truncated data.
2-5 opens repeatableread or ssi affairs, calls data transfer function.
Step 2-4 is made timed task by 2-6, and timing performs (but can not surmount 1 day), it is proposed that 1 hour performs one It is secondary.
Example one is that zoned logic is fixed in a circulation, realizes that zoned logic code is non-maintaining.Reduce simultaneously The data volume of race way, it can like this distinguish and store with active partition, such as storage and the low IO abilities of high IO abilities Storage.Save the carrying cost of high IO abilities.Second, fixed zoned logic is created according to fixed cycles, such as by the hour Granularity, it is only necessary to create 24 subregions.After partition table and zoned logic are fixed, subregion and code are non-maintaining.Because logical code Part is more succinct compared to traditional logic, also solves code efficiency problem.Third, active partition is shelled from zoned logic From automatically creating active partition over time, solve the problems, such as that subregion is safeguarded.Fourth, data to active partition from Turn is moved, and solves the problems, such as Data Consistency and auto-partition using inheritance and DDL affairs.Fifth, according to data volume Automatic division subregion, the acquisition of data volume can be used in data from database to be obtained, division threshold value can be stored in first table or In person's logical code, automatic division solves the problems, such as main entrance search efficiency, because table is excessive or Dan Biaoguo big city brings Parsing or search efficiency problem.
Do not limited when describing embodiments of the present invention above in conjunction with accompanying drawing, but realizing by above-described embodiment, this area Those of ordinary skill can make a variety of changes or change within the scope of the appended claims.
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410550641.4A CN104503966B (en) | 2014-10-16 | 2014-10-16 | The efficiently non-maintaining automatic Mesh Partition Method of PostgreSQL big datas |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410550641.4A CN104503966B (en) | 2014-10-16 | 2014-10-16 | The efficiently non-maintaining automatic Mesh Partition Method of PostgreSQL big datas |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104503966A CN104503966A (en) | 2015-04-08 |
CN104503966B true CN104503966B (en) | 2017-12-12 |
Family
ID=52945364
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410550641.4A CN104503966B (en) | 2014-10-16 | 2014-10-16 | The efficiently non-maintaining automatic Mesh Partition Method of PostgreSQL big datas |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104503966B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107818167A (en) * | 2017-11-08 | 2018-03-20 | 顺丰科技有限公司 | MySQL subregions automatic management method, system, equipment, storage medium |
CN107818167B (en) * | 2017-11-08 | 2020-06-30 | 顺丰科技有限公司 | MySQL partition automatic management method, system, equipment and storage medium |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105354315B (en) * | 2015-11-11 | 2018-10-30 | 华为技术有限公司 | Method, sublist node and the system of distributed data base neutron table splitting |
CN106649552B (en) * | 2016-11-07 | 2020-05-12 | 湖北省农村信用社联合社网络信息中心 | Normalized data cleaning method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003271494A (en) * | 2002-03-13 | 2003-09-26 | Ntt Comware Corp | Information collection system, information collection method, information collection program and recording medium |
CN1645336A (en) * | 2005-01-20 | 2005-07-27 | 上海复旦光华信息科技股份有限公司 | Automatic extraction and analysis for formwork based on heterogenerous logbook |
CN101957748A (en) * | 2009-07-17 | 2011-01-26 | 中国移动通信集团黑龙江有限公司 | Method, device and system for logic analysis of data |
GB2505210A (en) * | 2012-08-22 | 2014-02-26 | Datashaka Ltd | Processing time series data |
-
2014
- 2014-10-16 CN CN201410550641.4A patent/CN104503966B/en active IP Right Grant
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003271494A (en) * | 2002-03-13 | 2003-09-26 | Ntt Comware Corp | Information collection system, information collection method, information collection program and recording medium |
CN1645336A (en) * | 2005-01-20 | 2005-07-27 | 上海复旦光华信息科技股份有限公司 | Automatic extraction and analysis for formwork based on heterogenerous logbook |
CN101957748A (en) * | 2009-07-17 | 2011-01-26 | 中国移动通信集团黑龙江有限公司 | Method, device and system for logic analysis of data |
GB2505210A (en) * | 2012-08-22 | 2014-02-26 | Datashaka Ltd | Processing time series data |
Non-Patent Citations (1)
Title |
---|
基于PostgreSQL数据库的GML数据存储;张爱国等;《测绘科学》;20080131;第194-196页 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107818167A (en) * | 2017-11-08 | 2018-03-20 | 顺丰科技有限公司 | MySQL subregions automatic management method, system, equipment, storage medium |
CN107818167B (en) * | 2017-11-08 | 2020-06-30 | 顺丰科技有限公司 | MySQL partition automatic management method, system, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN104503966A (en) | 2015-04-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Silvestro et al. | Revisiting the origin and diversification of vascular plants through a comprehensive Bayesian analysis of the fossil record | |
Starr et al. | Democratic dominoes revisited: The hazards of governmental transitions, 1974-1996 | |
US9521052B1 (en) | Methods, systems, and computer readable mediums for utilizing application programming interfaces for accessing key performance indicator information | |
Guimaraes Jr et al. | Asymmetries in specialization in ant–plant mutualistic networks | |
Lusk | Leaf area and growth of juvenile temperate evergreens in low light: species of contrasting shade tolerance change rank during ontogeny | |
Chong-Seng et al. | Macroalgal herbivory on recovering versus degrading coral reefs | |
CN105830064B (en) | Mood generating means and computer-readable recording medium | |
CN102521405B (en) | Massive structured data storage and query methods and systems supporting high-speed loading | |
Wiegand et al. | A patch-dynamics approach to savanna dynamics and woody plant encroachment–insights from an arid savanna | |
CN103617232B (en) | A kind of paging query method for HBase table | |
Silva Matos et al. | The role of density dependence in the population dynamics of a tropical palm | |
Derroire et al. | Resilience of tropical dry forests–a meta‐analysis of changes in species diversity and composition during secondary succession | |
Holt et al. | A theoretical framework for intraguild predation | |
Jacquemyn et al. | Size‐dependent flowering and costs of reproduction affect population dynamics in a tuberous perennial woodland orchid | |
CN102129458B (en) | Method and device for storing relational database | |
US20160147820A1 (en) | Variable Sized Database Dictionary Block Encoding | |
CN104951340B (en) | A kind of information processing method and device | |
Augusto et al. | The enigma of the rise of angiosperms: can we untie the knot? | |
Ogburn et al. | Anatomical variation in Cactaceae and relatives: trait lability and evolutionary innovation | |
US20120221523A1 (en) | Database Backup and Restore with Integrated Index Reorganization | |
Zhu et al. | Density‐dependent survival varies with species life‐history strategy in a tropical forest | |
US8346722B2 (en) | Replica placement strategy for distributed data persistence | |
Coomes et al. | Mortality and tree‐size distributions in natural mixed‐age forests | |
Mayle et al. | Long-term forest–savannah dynamics in the Bolivian Amazon: implications for conservation | |
Popic et al. | Flower‐visitor networks only partially predict the function of pollen transport by bees |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
CP02 | Change in the address of a patent holder | ||
CP02 | Change in the address of a patent holder |
Address after: 310000 room 821, building 1, Xinjun building, 330 Internet of things street, Xixing street, Binjiang District, Hangzhou City, Zhejiang Province Patentee after: Hangzhou Sky-mobi Technology Co., Ltd. Address before: 10, building 2, block B, The Union Buildings, No. 310013, Bauhinia Road, Zhejiang, Hangzhou Patentee before: Hangzhou Sky-mobi Technology Co., Ltd. |