CN103944964A - Distributed system and method carrying out expansion step by step through same
- Publication number: CN103944964A (application CN201410116840.4A)
- Authority: CN (China)
- Prior art keywords: module, data, etl, task, new
- Legal status: Pending (as listed by Google; not a legal conclusion)
Landscapes
- Multi Processors (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a distributed system and a method for expanding it step by step. The system comprises an ETL data-writing module, a buffer module, a data redistribution module, a data directory distribution table generation module, a scheduling module and a plurality of databases hosted on a plurality of servers. The buffer module is an ETL portable hard disk, and the buffer module and the databases communicate with the ETL data-writing module. The scheduling module comprises a pause unit, a start unit and a task-adding unit. The invention solves the problem that old nodes cannot be used after expansion, which lowers server cost and increases revenue. The system can continue to provide service externally during daytime working hours, and new and old nodes can run at the same time.
Description
Technical field
The present invention relates to a system expansion method, and specifically to a distributed system and a method for expanding that system step by step.
Background
In big-data processing systems, each node stores a very large volume of data; for example, a single hard disk may hold 2.8 TB of actual data. When the processing capacity of the system is insufficient and data nodes must be added to increase it, redistributing the data of the original system across all data nodes (old nodes plus new nodes) is a difficult problem. Most current distributed data systems distribute data by hashing, and when nodes are added the system redistributes the data by hash. Because the data volume is huge, this redistribution often cannot be completed even within 24 hours, while for most of its production time the system must also continue to serve external requests; this conflict needs to be resolved. Some products cannot achieve this at all. For example, in a mobile signaling acquisition and analysis system in Guangdong built on EMC's Greenplum, when the cluster was to be expanded from 10 nodes to 50 nodes there was no way to do so; the new and old nodes had to be used to import data in parallel, and the final new system had only 40 nodes rather than 50. The problem becomes even more serious when expanding from 50 to 100 nodes.
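The cost of hash-based redistribution can be estimated with a short Python sketch. It assumes plain modulo placement (`hash(key) % node_count`), which is only one simple hashing scheme and not necessarily what Greenplum or any other product uses; the integers 0 to 9,999 stand in for hash values, and `node_for` and `moved_fraction` are hypothetical helper names. Under this assumption, expanding from 10 to 50 nodes changes the home node of 80% of the records, and expanding from 50 to 100 nodes changes 50%.

```python
# Minimal sketch (assumption: modulo-hash placement, integer keys used as
# their own hash values) of how many records change nodes on expansion.


def node_for(key_hash: int, node_count: int) -> int:
    """Modulo-hash placement: returns a node index in [0, node_count)."""
    return key_hash % node_count


def moved_fraction(key_hashes: range, old_nodes: int, new_nodes: int) -> float:
    """Share of keys whose node changes when the cluster is resized."""
    moved = sum(
        1 for h in key_hashes
        if node_for(h, old_nodes) != node_for(h, new_nodes)
    )
    return moved / len(key_hashes)


if __name__ == "__main__":
    # Expanding from 10 to 50 nodes, as in the example above.
    print(moved_fraction(range(10_000), 10, 50))   # 0.8
    # Expanding from 50 to 100 nodes.
    print(moved_fraction(range(10_000), 50, 100))  # 0.5
```

A key stays put only when `h % old == h % new`, so with modulo hashing almost every resize touches most of the data, which is the cost the directory-table approach described below is meant to avoid.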
Summary of the invention
To overcome the above-mentioned shortcomings of redistribution-based expansion in data processing systems, the present invention provides a distributed system and a method for expanding it step by step. The system can be expanded gradually: on the one hand, it can still provide service externally during daytime working hours; on the other hand, both new and old nodes keep running in the system.
The invention provides a distributed system that is expanded step by step. It comprises an ETL data-writing module, a buffer module, a data redistribution module, a data directory distribution table generation module, a scheduling module and a plurality of databases. The databases are hosted on a plurality of servers. The buffer module is an ETL portable hard disk, and the buffer module and the databases are communicatively connected to the ETL data-writing module. The scheduling module comprises a pause unit, a start unit and a task-adding unit.
Specifically, a method for expanding the above distributed system step by step comprises the following steps:
S1: after a plurality of new databases have been created in the distributed system, the data directory distribution table generation module generates a new data directory distribution table according to the number of new databases;
S2: the pause unit pauses the writing of data by the ETL data-writing module, files are saved to the buffer module, and the task-adding unit adds the ETL data-writing tasks to a task queue;
S3: the start unit starts the data redistribution module, which redistributes the data according to the new data directory distribution table;
S4: the ETL data-writing module is restarted and processes the queued ETL data-writing tasks at an accelerated rate; after the ETL data writing is complete, online query tasks are restarted.
Preferably, the data directory distribution table generation module generates the new data directory distribution table from the number of new databases according to business rules.
Preferably, the new data directory distribution table moves 30%-60% of the data on the old nodes to the new nodes.
Preferably, the new data directory distribution table moves 50% of the data on the old nodes to the new nodes.
Preferably, the task-adding unit comprises a selection unit and a command unit; the selection unit can set the priority of each ETL data-writing task, and the command unit determines the order in which tasks are executed according to their priority.
Preferably, the priorities of the ETL data-writing tasks are divided into highest priority, second priority and normal priority.
The advantages of the present invention are as follows. The invention expands the system step by step, spread over several days, and after expansion both old and new nodes can be used. This solves the problem that old nodes cannot be used after expansion, thereby reducing server cost and increasing revenue. A big-data server generally costs about 50,000-150,000 yuan; at 100,000 yuan per server, reusing 20 old nodes saves 2,000,000 yuan. Moreover, because the system can be expanded gradually, on the one hand it can still provide service externally during daytime working hours, and on the other hand both new and old nodes can run in the system.
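The savings estimate is straightforward arithmetic, taking the 100,000 yuan per-server figure used in the text (the midpoint of the quoted 50,000-150,000 yuan range); over the full quoted range, the same 20 reused nodes would correspond to roughly 1 to 3 million yuan:

$$
\text{savings} = 20 \times 100\,000\ \text{yuan} = 2\,000\,000\ \text{yuan}, \qquad
20 \times [50\,000,\ 150\,000]\ \text{yuan} = [1\,000\,000,\ 3\,000\,000]\ \text{yuan}
$$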
Brief description of the drawings
Fig. 1 is a structural diagram of the step-by-step expandable distributed system provided by the invention;
Fig. 2 is a structural diagram of the scheduling module of the invention;
Fig. 3 is a schematic diagram of the expansion process in the step-by-step expansion method for the distributed system of the invention.
Embodiment
First, some terms used in the present invention are explained:
A database (here in the data-warehouse sense) is a subject-oriented, integrated, non-updatable (non-volatile) and time-variant collection of data used to support decision-making analysis in an enterprise or organization. It is generally used to store the enterprise's historical data and, through the ETL process, to produce enterprise reports and the like.
ETL refers to extracting data from distributed, heterogeneous data sources (such as relational data and flat data files) into a temporary intermediate layer, where it is cleaned, transformed and integrated, and finally loading it into the database, where it becomes the basis for enterprise reports, online analytical processing and data mining. ETL tasks generally run at night, processing the enterprise's large batches of data to produce key performance indicators (KPI, Key Performance Indicator) that are loaded into reports.
A data source is the source data required by a given ETL computation task; sometimes it is data from a production database, and sometimes it is data produced by another ETL program.
A production database is the database used by the enterprise's daytime business operations, and it is the largest data source of the database.
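To make the ETL definition concrete, here is a toy Python pipeline using only the standard library (`csv`, `io`, `sqlite3`). The CSV content, the `kpi` table and the per-city call-count KPI are invented for illustration; the point is only the extract, clean/transform, load sequence described above, not how the patented system implements ETL.

```python
# Minimal ETL sketch (hypothetical data and table names): extract rows from a
# flat CSV source, clean and transform them, then load a KPI table into SQLite.
import csv
import io
import sqlite3

RAW_CSV = """city,calls
 gz ,120
sz,80
gz,30
,15
sz,not_a_number
"""


def extract(text: str) -> list[dict[str, str]]:
    """Extract: read raw records from a flat file into an intermediate list."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict[str, str]]) -> dict[str, int]:
    """Clean (trim, drop invalid rows), then aggregate calls per city as a KPI."""
    totals: dict[str, int] = {}
    for row in rows:
        city = row["city"].strip().upper()
        calls = row["calls"].strip()
        if not city or not calls.isdigit():
            continue  # cleaning step: skip incomplete or malformed records
        totals[city] = totals.get(city, 0) + int(calls)
    return totals


def load(conn: sqlite3.Connection, kpis: dict[str, int]) -> None:
    """Load: write the KPI rows into the target database table."""
    conn.execute("CREATE TABLE IF NOT EXISTS kpi (city TEXT PRIMARY KEY, calls INTEGER)")
    conn.executemany("INSERT OR REPLACE INTO kpi VALUES (?, ?)", kpis.items())
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(RAW_CSV)))
    print(conn.execute("SELECT city, calls FROM kpi ORDER BY city").fetchall())
    # [('GZ', 150), ('SZ', 80)]
```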
The present invention is further explained below with reference to the accompanying drawings and specific embodiments.
As shown in Fig. 1, a distributed system that is expanded step by step comprises an ETL data-writing module 1, a buffer module 2, a data redistribution module 3, a data directory distribution table generation module 4, a scheduling module 5 and a plurality of databases 6. The databases 6 are hosted on a plurality of servers. The buffer module 2 is an ETL portable hard disk, and the buffer module 2 and the databases 6 are communicatively connected to the ETL data-writing module 1. As shown in Fig. 2, the scheduling module 5 comprises a pause unit 50, a start unit 51 and a task-adding unit 52.
As shown in Fig. 3, a method for expanding the above distributed system step by step comprises the following steps:
S1: after the new databases (hereinafter also called nodes) of the distributed system have been created, the data directory distribution table generation module 4 generates a new data directory distribution table according to the number of new databases;
S2: the pause unit 50 pauses the writing of data by the ETL data-writing module 1, files are saved to the buffer module, and the task-adding unit 52 adds the ETL data-writing tasks to the task queue;
S3: the start unit 51 starts the data redistribution module 3, which redistributes the data according to the new data directory distribution table; the ETL data-writing module 1 then locates the new node for the data and stores the data there;
S4: the start unit 51 restarts the ETL data-writing module 1, which processes the queued ETL data-writing tasks at an accelerated rate;
S5: after the ETL data writing is complete, the start unit 51 restarts the online query tasks.
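Steps S1 to S5 can be sketched as a single-process Python simulation. Everything here is an assumption for illustration: the class `StepwiseExpansion` and its methods are hypothetical stand-ins for modules 1 to 6, the buffer module is modeled as an in-memory list rather than a portable hard disk, the directory table is a plain dict, generation of the S1 table by module 4 is not modeled (the table is passed in), and online queries are assumed to be suspended from S2 until S5 (the text only states that they are restarted in S5).

```python
# Hypothetical single-process model of the S1-S5 expansion sequence.
from collections import deque


class StepwiseExpansion:
    """Toy stand-in for scheduling module 5 driving modules 1-4 and databases 6."""

    def __init__(self, directory: dict[str, str]) -> None:
        self.directory = dict(directory)                  # directory table: key -> node
        self.nodes: dict[str, dict[str, list[str]]] = {}  # databases: node -> {key: records}
        self.buffer: list[tuple[str, str]] = []           # buffer module 2 (in memory here)
        self.task_queue: deque = deque()                  # queued write tasks (buffer indices)
        self.etl_paused = False
        self.queries_online = True

    def _store(self, key: str, record: str) -> None:
        node = self.directory[key]
        self.nodes.setdefault(node, {}).setdefault(key, []).append(record)

    def etl_write(self, key: str, record: str) -> None:
        """ETL data-writing module 1: write directly, or park the file while paused."""
        if self.etl_paused:
            self.buffer.append((key, record))
            self.task_queue.append(len(self.buffer) - 1)  # task-adding unit 52
        else:
            self._store(key, record)

    def pause(self) -> None:
        """S2: pause unit 50 stops ETL writes (assumed: online queries stop too)."""
        self.etl_paused = True
        self.queries_online = False

    def redistribute(self, new_directory: dict[str, str]) -> None:
        """S3: start unit 51 runs module 3 with the table generated in S1."""
        for key, new_node in new_directory.items():
            old_node = self.directory.get(key)
            if old_node is None or old_node == new_node:
                continue  # unchanged keys stay on their current node
            records = self.nodes.get(old_node, {}).pop(key, [])
            if records:
                self.nodes.setdefault(new_node, {}).setdefault(key, []).extend(records)
        self.directory = dict(new_directory)

    def resume(self) -> None:
        """S4: restart ETL and drain the queue; S5: bring online queries back."""
        self.etl_paused = False
        while self.task_queue:
            key, record = self.buffer[self.task_queue.popleft()]
            self._store(key, record)  # routed through the new directory table
        self.buffer.clear()
        self.queries_online = True


if __name__ == "__main__":
    system = StepwiseExpansion({"04": "DB2", "05": "DB2", "06": "DB2"})
    system.etl_write("05", "r1")
    system.etl_write("06", "r2")
    system.pause()                                   # S2
    system.etl_write("05", "r3")                     # arrives mid-expansion: buffered
    system.redistribute({"04": "DB201", "05": "DB201", "06": "DB2"})  # S1 table, S3
    system.resume()                                  # S4 + S5
    print(system.nodes)
    # {'DB2': {'06': ['r2']}, 'DB201': {'05': ['r1', 'r3']}}
```

Queuing buffered writes and replaying them only after redistribution means late-arriving data is routed with the new table, so it never lands on a node that is about to give that key away.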
Preferably, the data directory distribution table generation module 4 generates the new data directory distribution table from the number of new databases according to business rules.
Preferably, the new data directory distribution table moves 30%-60% of the data on the old nodes to the new nodes. In this embodiment, it moves 50% of the data on the old nodes to the new nodes, as illustrated by the tables below.
In distributed systems, data is often distributed by hash value or by business rules. Hash-based distribution is implemented mainly by a mathematical algorithm and cannot be controlled manually during expansion, so old nodes cannot continue to be used after expansion, which increases the cost of the database. In the present invention, a new data directory distribution table is generated according to business rules and used to distribute the data, which makes the expansion easy to control.
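The contrast with hash placement can be sketched as a directory lookup. This Python sketch assumes records are routed by the last two digits of a number, using the key-to-node pairs from the example tables in this section; `route`, `reassign` and the sample number are hypothetical. Because the mapping is an explicit table rather than a formula, only the keys an operator reassigns change nodes, and the old nodes keep serving the rest.

```python
# Minimal sketch, assuming records are routed by the last two digits of a
# number through an explicit directory table (key -> node). Names are hypothetical.

OLD_DIRECTORY = {"00": "DB1", "01": "DB1", "02": "DB1", "03": "DB1",
                 "04": "DB2", "05": "DB2", "06": "DB2", "07": "DB2"}


def route(number: str, directory: dict[str, str]) -> str:
    """Return the node holding `number`, keyed by its last two digits."""
    return directory[number[-2:]]


def reassign(directory: dict[str, str], moves: dict[str, str]) -> dict[str, str]:
    """Return a new directory table with only the listed keys pointed at new nodes."""
    updated = dict(directory)
    updated.update(moves)
    return updated


if __name__ == "__main__":
    new_directory = reassign(OLD_DIRECTORY, {"00": "DB101", "01": "DB101",
                                             "04": "DB201", "05": "DB201"})
    print(route("13800000105", OLD_DIRECTORY))  # DB2
    print(route("13800000105", new_directory))  # DB201
    changed = sorted(k for k in OLD_DIRECTORY if OLD_DIRECTORY[k] != new_directory[k])
    print(changed)  # ['00', '01', '04', '05']
```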
For example, suppose the new data directory distribution table moves 50% of the data on the old nodes to the new nodes.
Assume the data is distributed by the last two digits of a number. Before expansion, the distribution is as follows:
District/city | Record count | Node |
---|---|---|
00 | 50 | DB1 |
01 | 50 | DB1 |
02 | 50 | DB1 |
03 | 40 | DB1 |
04 | 20 | DB2 |
05 | 80 | DB2 |
06 | 40 | DB2 |
07 | 60 | DB2 |
... | ... | ... |
During expansion, the data directory distribution table can be adjusted, for example as follows:
District/city | Record count | Node |
---|---|---|
00 | 50 | DB101 |
01 | 50 | DB101 |
02 | 50 | DB1 |
03 | 40 | DB1 |
04 | 20 | DB201 |
05 | 80 | DB201 |
06 | 40 | DB2 |
07 | 60 | DB2 |
... | ... | ... |
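The adjustment in the two tables can be reproduced by a simple planning rule. The Python sketch below is one possible rule, not necessarily the one the patent's business rules use: it assumes each old node is paired with one new node (DB1 with DB101, DB2 with DB201) and moves keys greedily in key order until at least the target share of that node's records has been reassigned. `plan_directory` and `moved_share` are hypothetical names. With a 50% target it yields exactly the adjusted table above; because keys move as whole units, DB1 sends 100 of its 190 records (about 52.6%) and DB2 sends 100 of 200 (50%), both inside the preferred 30%-60% range.

```python
# Hypothetical planner: builds a new directory table that moves roughly a
# target share of each old node's records to a paired new node.

EXAMPLE_COUNTS = {"00": 50, "01": 50, "02": 50, "03": 40,
                  "04": 20, "05": 80, "06": 40, "07": 60}
EXAMPLE_DIRECTORY = {"00": "DB1", "01": "DB1", "02": "DB1", "03": "DB1",
                     "04": "DB2", "05": "DB2", "06": "DB2", "07": "DB2"}


def plan_directory(counts: dict[str, int], directory: dict[str, str],
                   pairing: dict[str, str], target: float = 0.5) -> dict[str, str]:
    """Reassign keys from each old node to its paired new node until at least
    `target` of that node's records have been moved (greedy, in key order)."""
    new_directory = dict(directory)
    for old_node, new_node in pairing.items():
        keys = sorted(k for k, node in directory.items() if node == old_node)
        total = sum(counts[k] for k in keys)
        moved = 0
        for key in keys:
            if moved >= target * total:
                break  # this old node has already shed its target share
            new_directory[key] = new_node
            moved += counts[key]
    return new_directory


def moved_share(counts: dict[str, int], old: dict[str, str],
                new: dict[str, str], node: str) -> float:
    """Fraction of an old node's records that the new table sends elsewhere."""
    keys = [k for k, n in old.items() if n == node]
    total = sum(counts[k] for k in keys)
    return sum(counts[k] for k in keys if new[k] != node) / total


if __name__ == "__main__":
    new = plan_directory(EXAMPLE_COUNTS, EXAMPLE_DIRECTORY,
                         {"DB1": "DB101", "DB2": "DB201"}, target=0.5)
    print(new)
    for node in ("DB1", "DB2"):
        print(node, round(moved_share(EXAMPLE_COUNTS, EXAMPLE_DIRECTORY, new, node), 3))
    # DB1 0.526
    # DB2 0.5
```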
Preferably, the task-adding unit 52 comprises a selection unit 520 and a command unit 521; the selection unit 520 can set the priority of each ETL data-writing task, and the command unit 521 determines the order in which tasks are executed according to their priority.
In a preferred embodiment, the priorities of the ETL data-writing tasks are divided into highest priority, second priority and normal priority.
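The split between the selection unit and the command unit can be sketched with a priority queue built on Python's `heapq`. In this sketch the numeric values of the three levels, the task names and first-in-first-out ordering within a level are assumptions; the text only states that there are three priority levels and that execution order follows priority. `EtlTaskQueue` and `Priority` are hypothetical names.

```python
# Minimal sketch of task-adding unit 52: the selection unit assigns one of three
# priority levels, the command unit releases tasks highest-priority first,
# FIFO within a level. Level values and task names are hypothetical.
import heapq
import itertools
from enum import IntEnum


class Priority(IntEnum):
    HIGHEST = 0
    SECOND = 1
    NORMAL = 2


class EtlTaskQueue:
    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._seq = itertools.count()  # tie-breaker preserves arrival order per level

    def add(self, task: str, priority: Priority = Priority.NORMAL) -> None:
        """Selection unit 520: record the task together with its priority."""
        heapq.heappush(self._heap, (int(priority), next(self._seq), task))

    def next_task(self) -> str:
        """Command unit 521: return the task that should run next."""
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    q = EtlTaskQueue()
    q.add("load_cdr_0301")
    q.add("load_kpi_daily", Priority.HIGHEST)
    q.add("load_cdr_0302")
    q.add("load_billing", Priority.SECOND)
    print([q.next_task() for _ in range(len(q))])
    # ['load_kpi_daily', 'load_billing', 'load_cdr_0301', 'load_cdr_0302']
```

The sequence counter matters: without it, two tasks at the same level would be compared by name, and execution order would no longer follow arrival order.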
Persons of ordinary skill in the art should understand that various modifications, refinements and combinations of the present invention, as well as supplements to or replacements of its technical features, can be made without departing from the basic principles of the invention; such equivalent substitutions or obvious variations all fall within the protection scope of the present invention.
Claims (7)
1. A distributed system for step-by-step expansion, characterized in that it comprises an ETL data-writing module, a buffer module, a data redistribution module, a data directory distribution table generation module, a scheduling module and a plurality of databases; the plurality of databases are hosted on a plurality of servers; the buffer module is an ETL portable hard disk, and the buffer module and the plurality of databases are communicatively connected to the ETL data-writing module; and the scheduling module comprises a pause unit, a start unit and a task-adding unit.
2. A method for step-by-step expansion of a distributed system, characterized in that it comprises the following steps:
S1: after a plurality of new databases of the distributed system have been created, the data directory distribution table generation module generates a new data directory distribution table according to the number of new databases;
S2: the pause unit pauses the writing of data by the ETL data-writing module, files are saved to the buffer module, and the task-adding unit adds the ETL data-writing tasks to a task queue;
S3: the start unit starts the data redistribution module, which redistributes the data according to the new data directory distribution table;
S4: the ETL data-writing module is restarted and processes the queued ETL data-writing tasks at an accelerated rate; after the ETL data writing is complete, online query tasks are started.
3. The step-by-step expansion method according to claim 2, characterized in that the data directory distribution table generation module generates the new data directory distribution table from the number of new databases according to business rules.
4. The step-by-step expansion method according to claim 3, characterized in that the new data directory distribution table moves 30%-60% of the data on the old nodes to the new nodes.
5. The step-by-step expansion method according to claim 4, characterized in that the new data directory distribution table moves 50% of the data on the old nodes to the new nodes.
6. The step-by-step expansion method according to claim 2, characterized in that the task-adding unit comprises a selection unit and a command unit; the selection unit can set the priority of each ETL data-writing task, and the command unit determines the order in which tasks are executed according to their priority.
7. The step-by-step expansion method according to claim 6, characterized in that the priorities of the ETL data-writing tasks are divided into highest priority, second priority and normal priority.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410116840.4A | 2014-03-27 | 2014-03-27 | Distributed system and method carrying out expansion step by step through same |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103944964A | 2014-07-23 |
Family ID: 51192445
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201410116840.4A | Pending | 2014-03-27 | 2014-03-27 | Distributed system and method carrying out expansion step by step through same |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103944964A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060026199A1 (en) * | 2004-07-15 | 2006-02-02 | Mariano Crea | Method and system to load information in a general purpose data warehouse database |
US20110213775A1 (en) * | 2010-03-01 | 2011-09-01 | International Business Machines Corporation | Database Table Look-up |
CN102332004A (en) * | 2011-07-29 | 2012-01-25 | 中国科学院计算技术研究所 | Data processing method and system for managing mass data |
CN102521297A (en) * | 2011-11-30 | 2012-06-27 | 北京人大金仓信息技术股份有限公司 | Method for achieving system dynamic expansion in shared-nothing database cluster |
CN102999537A (en) * | 2011-09-19 | 2013-03-27 | 阿里巴巴集团控股有限公司 | System and method for data migration |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104391989A (en) * | 2014-12-16 | 2015-03-04 | 浪潮电子信息产业股份有限公司 | Distributed ETL all-in-one machine system |
CN106301864A (en) * | 2015-06-11 | 2017-01-04 | 腾讯科技(深圳)有限公司 | A kind of server system expansion method, device and dilatation processing equipment |
CN106301864B (en) * | 2015-06-11 | 2019-12-27 | 腾讯科技(深圳)有限公司 | Server system capacity expansion method and device and capacity expansion processing equipment |
WO2017036242A1 (en) * | 2015-08-31 | 2017-03-09 | 华为技术有限公司 | Data processing method, apparatus, and system |
CN106407308A (en) * | 2016-08-31 | 2017-02-15 | 天津南大通用数据技术股份有限公司 | Method and device for expanding capacity of distributed database |
CN108008913A (en) * | 2016-10-27 | 2018-05-08 | 杭州海康威视数字技术股份有限公司 | A kind of expansion method based on management node, device and storage system |
CN108008913B (en) * | 2016-10-27 | 2020-12-18 | 杭州海康威视数字技术股份有限公司 | Management node-based capacity expansion method and device and storage system |
CN111061737A (en) * | 2019-12-12 | 2020-04-24 | 税友软件集团股份有限公司 | Distributed database rapid capacity expansion device |
CN111061737B (en) * | 2019-12-12 | 2023-05-09 | 税友软件集团股份有限公司 | Quick capacity expanding device of distributed database |
CN111538718A (en) * | 2020-04-22 | 2020-08-14 | 杭州宇为科技有限公司 | Entity id generation and positioning method, capacity expansion method and equipment of distributed system |
CN111538718B (en) * | 2020-04-22 | 2023-10-27 | 杭州宇为科技有限公司 | Entity id generation and positioning method, capacity expansion method and equipment of distributed system |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
DD01 | Delivery of document by public notice | Addressee: SHANGHAI CLOUDYBI INFORMATION TECHNOLOGY CO., LTD. Document name: the First Notification of an Office Action |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20140723 |