CN103944964A - Distributed system and method carrying out expansion step by step through same - Google Patents

Info

Publication number
CN103944964A
Authority
CN
China
Legal status
Pending
Application number
CN201410116840.4A
Other languages
Chinese (zh)
Inventor
李晓华
Current Assignee
SHANGHAI CLOUDYBI INFORMATION TECHNOLOGY Co Ltd
Original Assignee
SHANGHAI CLOUDYBI INFORMATION TECHNOLOGY Co Ltd
Priority date
2014-03-27
Filing date
2014-03-27
Publication date
2014-07-23
Application filed by SHANGHAI CLOUDYBI INFORMATION TECHNOLOGY Co Ltd
Priority to CN201410116840.4A
Publication of CN103944964A
Legal status: Pending (current)

Landscapes

  • Multi Processors (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a distributed system and a method for stepwise capacity expansion using the system. The system comprises an ETL data-writing module, a buffer module, a data redistribution module, a data index distribution table forming module, a dispatching module and a plurality of databases. The databases are hosted on a plurality of servers. The buffer module is an ETL removable hard disk and communicates with the databases and the ETL data-writing module. The dispatching module comprises a pausing unit, a starting unit and a task adding unit. The invention solves the problem that old nodes cannot be used after expansion, thereby lowering server costs and increasing profit. The system can continue to serve external users during daytime working hours, and new and old nodes can run at the same time.

Description

Distributed system and method for stepwise capacity expansion based on the system
Technical field
The present invention relates to a system capacity expansion method, and in particular to a distributed system and a method for stepwise capacity expansion based on that system.
Background art
In big-data processing systems, each node stores a very large volume of data; for example, a single hard disk may actually hold 2.8 TB. When processing capacity is insufficient and data nodes must be added, redistributing the existing data across all data nodes (old nodes plus new nodes) is a difficult problem. Most current distributed data systems distribute data by hashing, and when nodes are added the system redistributes the data by hashing again. Because the data volume is huge, this redistribution often cannot be completed even within 24 hours, yet most production systems must keep serving external users during that time, and this conflict needs to be resolved. Some products could not achieve this: with EMC's Greenplum system, in a mobile signaling collection and analysis system in Guangdong, an expansion from 10 to 50 nodes was not possible, so the old and new nodes had to import data in parallel and the resulting new system had only 40 nodes rather than 50. The problem becomes more serious when expanding from 50 to 100 nodes.
Summary of the invention
To overcome the above shortcomings of redistribution-based capacity expansion in data processing systems, the present invention provides a distributed system and a method for stepwise capacity expansion based on it. The system can be expanded step by step, which ensures on the one hand that it can still serve external users during daytime working hours, and on the other hand that both new and old nodes can run in the system.
The invention provides a distributed system for stepwise capacity expansion, comprising an ETL data-writing module, a buffer module, a data redistribution module, a data index distribution table forming module, a dispatching module and a plurality of databases. The databases are hosted on a plurality of servers; the buffer module is an ETL removable hard disk; and the buffer module and the databases are communicatively connected to the ETL data-writing module. The dispatching module comprises a pausing unit, a starting unit and a task adding unit.
Specifically, a method for stepwise capacity expansion based on the above distributed system comprises the following steps:
S1: after new databases are added to the distributed system, the data index distribution table forming module generates a new data index distribution table according to the number of new databases;
S2: the pausing unit suspends data writing by the ETL data-writing module and saves the files to the buffer module, and the task adding unit adds the ETL data-writing tasks to a task queue;
S3: the starting unit starts the data redistribution module, which redistributes the data according to the new data index distribution table;
S4: the ETL data-writing module is restarted and works through the queued ETL data-writing tasks at an accelerated pace; once ETL data writing is complete, online query tasks are restarted.
Preferably, the data index distribution table forming module generates the new data index distribution table from the number of new databases according to business rules.
Preferably, the new data index distribution table moves 30% to 60% of the data on the old nodes to the new nodes.
Preferably, the new data index distribution table moves 50% of the data on the old nodes to the new nodes.
Preferably, the task adding unit comprises a selection unit and a command unit; the selection unit can set the priority of each ETL data-writing task, and the command unit selects which task to execute first according to task priority.
Preferably, ETL data-writing task priorities are divided into highest priority, secondary priority and normal priority.
Advantages of the present invention: the invention expands the system step by step, day by day. After expansion both old and new nodes can be used, which solves the problem that old nodes become unusable after expansion, thereby reducing server costs and increasing profit. A big-data server typically costs about 50,000 to 150,000 yuan; at 100,000 yuan per server, reusing 20 old nodes saves 2,000,000 yuan. Moreover, because the system is expanded step by step, it can still serve external users during daytime working hours, and both new and old nodes can run in the system.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the distributed system for stepwise capacity expansion provided by the invention;
Fig. 2 is a schematic structural diagram of the dispatching module of the invention;
Fig. 3 is a schematic diagram of the expansion process in the stepwise capacity expansion method for the distributed system of the invention.
Embodiment
First, some terms used in the present invention are explained:
Database: a subject-oriented, integrated, non-volatile and time-variant collection of data (in the sense of a data warehouse) used to support decision-making analysis in an enterprise or organization. It is generally used to store an enterprise's historical data and, through the ETL process, to produce enterprise reports and the like.
ETL: extracting data from distributed, heterogeneous data sources (such as relational data and flat data files) into a temporary intermediate layer, then cleaning, transforming and integrating it, and finally loading it into the database, where it becomes the basis for enterprise reporting, online analytical processing and data mining. ETL tasks generally run at night, processing the enterprise's bulk data to produce key performance indicators (KPIs), which are loaded into reports.
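The three ETL stages described above can be illustrated with a toy sketch. It assumes in-memory lists of dicts stand in for the heterogeneous sources and a plain dict stands in for the target table; the field names `region` and `amount` and the per-region total used as a "KPI" are hypothetical.

```python
from collections import defaultdict


def extract(sources):
    """Pull raw rows from several heterogeneous sources into one staging list."""
    staging = []
    for rows in sources:
        staging.extend(rows)
    return staging


def transform(staging):
    """Clean (drop rows with a missing amount), convert types, and aggregate per region."""
    totals = defaultdict(float)
    for row in staging:
        if row.get("amount") is None:
            continue  # cleaning step: discard incomplete records
        totals[row["region"]] += float(row["amount"])  # flat files deliver strings
    return dict(totals)


def load(kpis, warehouse):
    """Load the computed indicators into the target store."""
    warehouse.update(kpis)
    return warehouse


relational = [{"region": "north", "amount": 10}, {"region": "south", "amount": None}]
flat_file = [{"region": "north", "amount": "5.5"}, {"region": "south", "amount": "2"}]
warehouse = load(transform(extract([relational, flat_file])), {})
print(warehouse)  # {'north': 15.5, 'south': 2.0}
```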
Data source: the source data required by a given ETL computation task; sometimes it is data from the production database, and sometimes it is data produced by another ETL program.
Production database: the database used by the enterprise's daytime business operations; it is the largest data source for the database.
The present invention is further explained below with reference to the drawings and specific embodiments.
As shown in Fig. 1, a distributed system for stepwise capacity expansion comprises an ETL data-writing module 1, a buffer module 2, a data redistribution module 3, a data index distribution table forming module 4, a dispatching module 5 and a plurality of databases 6. The databases 6 are hosted on a plurality of servers; the buffer module 2 is an ETL removable hard disk; and the buffer module 2 and the databases 6 are communicatively connected to the ETL data-writing module 1. As shown in Fig. 2, the dispatching module 5 comprises a pausing unit 50, a starting unit 51 and a task adding unit 52.
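The way the ETL data-writing module uses the data index distribution table and the buffer module can be sketched as follows. This is a minimal illustration, not the patented implementation. It assumes the distribution key is the last two digits of a number (as in the example tables later in this description) and that a Python list stands in for the ETL removable hard disk. `route_key`, `Database`, `EtlWriteModule` and the sample records are hypothetical names.

```python
from dataclasses import dataclass, field


def route_key(number: str) -> str:
    """Hypothetical distribution key: the last two digits of a number."""
    return number[-2:]


@dataclass
class Database:
    """One database node hosted on a server (names such as 'DB1' are illustrative)."""
    name: str
    rows: list = field(default_factory=list)


@dataclass
class EtlWriteModule:
    """Routes each record to the node named in the data index distribution table.

    While paused, records go to `buffer`, which stands in for the ETL
    removable hard disk, instead of to a database.
    """
    index_table: dict                      # distribution key -> node name
    nodes: dict                            # node name -> Database
    buffer: list = field(default_factory=list)
    paused: bool = False

    def write(self, number: str, record: str):
        key = route_key(number)
        if self.paused:
            self.buffer.append((key, record))
            return None                    # nothing written to a database yet
        node = self.nodes[self.index_table[key]]
        node.rows.append(record)
        return node.name


nodes = {name: Database(name) for name in ("DB1", "DB2")}
etl = EtlWriteModule(index_table={"00": "DB1", "04": "DB2"}, nodes=nodes)
print(etl.write("13800000004", "record A"))  # DB2
etl.paused = True
print(etl.write("13800000000", "record B"))  # None (parked in the buffer)
```

Because routing goes through an explicit table rather than a formula, changing one table entry is enough to send future writes for that key to a different node.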
As shown in Fig. 3, a method for stepwise capacity expansion based on the above distributed system comprises the following steps:
S1: after new databases (hereinafter also called nodes) are added to the distributed system, the data index distribution table forming module 4 generates a new data index distribution table according to the number of new databases;
S2: the pausing unit 50 suspends data writing by the ETL data-writing module 1 and saves the files to the buffer module, and the task adding unit 52 adds the ETL data-writing tasks to the task queue;
S3: the starting unit 51 starts the data redistribution module 3, which redistributes the data according to the new data index distribution table; the ETL data-writing module 1 looks up the new node location and stores the data there;
S4: the starting unit 51 restarts the ETL data-writing module 1, which works through the queued ETL data-writing tasks at an accelerated pace;
S5: after ETL data writing is complete, the starting unit 51 restarts online query tasks.
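Steps S2 to S5 can be walked through with a small in-memory simulation. It is a sketch under stated assumptions, not the patented implementation. The new table from S1 is passed in ready-made. Online queries are assumed to be off during the window, since S5 restarts them. The buffer and task queue are plain Python containers, and all class and method names are hypothetical.

```python
from collections import deque


class StepwiseExpansion:
    """In-memory walk-through of steps S2-S5; all names are hypothetical.

    nodes maps node name -> {distribution key -> list of records}.
    """

    def __init__(self, index_table, nodes):
        self.index_table = dict(index_table)   # distribution key -> node name
        self.nodes = nodes
        self.buffer = []                       # stand-in for the ETL removable hard disk
        self.task_queue = deque()              # queued ETL write tasks (buffer positions)
        self.etl_paused = False
        self.queries_online = True

    def _store(self, key, record):
        node = self.index_table[key]
        self.nodes.setdefault(node, {}).setdefault(key, []).append(record)

    def etl_write(self, key, record):
        if self.etl_paused:
            self.buffer.append((key, record))             # S2: save the file to the buffer
            self.task_queue.append(len(self.buffer) - 1)  # S2: queue the write task
        else:
            self._store(key, record)

    def s2_pause(self):
        self.etl_paused = True
        self.queries_online = False  # assumption: S5 restarts queries, so they stop here

    def s3_redistribute(self, new_index_table):
        """Move every key whose node changed in the new table produced by S1."""
        old_table, self.index_table = self.index_table, dict(new_index_table)
        for key, new_node in self.index_table.items():
            old_node = old_table.get(key)
            if old_node is not None and old_node != new_node:
                records = self.nodes.get(old_node, {}).pop(key, [])
                self.nodes.setdefault(new_node, {}).setdefault(key, []).extend(records)

    def s4_resume_etl(self):
        """Replay queued writes; they are routed with the new table."""
        self.etl_paused = False
        while self.task_queue:
            key, record = self.buffer[self.task_queue.popleft()]
            self._store(key, record)
        self.buffer.clear()

    def s5_resume_queries(self):
        self.queries_online = True


exp = StepwiseExpansion({"00": "DB1", "01": "DB1"}, {"DB1": {"00": ["a"], "01": ["b"]}})
exp.s2_pause()
exp.etl_write("00", "c")                           # arrives during the expansion window
exp.s3_redistribute({"00": "DB101", "01": "DB1"})  # S1 output: key 00 moves to DB101
exp.s4_resume_etl()
exp.s5_resume_queries()
print(exp.nodes)  # {'DB1': {'01': ['b']}, 'DB101': {'00': ['a', 'c']}}
```

The record written during the window ends up on the new node because the queued task is replayed only after the table has been switched.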
Preferably, the data index distribution table forming module 4 generates the new data index distribution table from the number of new databases according to business rules.
Preferably, the new data index distribution table moves 30% to 60% of the data on the old nodes to the new nodes. In this embodiment, it moves 50% of the data on the old nodes to the new nodes, as illustrated by the example tables below.
In distributed systems, data is often distributed by hash value or by business rules. Hash-based distribution is implemented by a mathematical algorithm and cannot be controlled manually during expansion, so the old nodes cannot continue to be used after expansion, which increases the cost of the database. In the present invention, a new data index distribution table is generated according to business rules and used for distribution, which makes expansion easy to control.
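To make the contrast concrete, the sketch below assumes the simplest hash placement, `key % node_count` over integer keys. This is an assumption: real systems use their own hash functions, and schemes such as consistent hashing move far fewer keys. For each key it records whether the key stays, moves to a new node, or moves from one old node to another. That last kind of movement adds no capacity, and an explicit distribution table can avoid it.

```python
def classify_moves(keys, old_n, new_n):
    """Classify keys when placement is `key % node_count`.

    Old nodes are numbered 0..old_n-1 and new nodes old_n..new_n-1.
    Returns (stay, moved_to_new_node, moved_between_old_nodes).
    """
    stay = to_new = between_old = 0
    for k in keys:
        before, after = k % old_n, k % new_n
        if before == after:
            stay += 1
        elif after >= old_n:
            to_new += 1
        else:
            between_old += 1
    return stay, to_new, between_old


# Keys 0-99, e.g. the last two digits of a number; expand from 10 to 15 nodes.
print(classify_moves(range(100), 10, 15))  # (40, 30, 30)
```

Under this placement, 30 of the 100 keys are shuffled between old nodes. The formula, not the operator, also decides which keys move.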
Example: the new data index distribution table moves 50% of the data on the old nodes to the new nodes.
Assume the data is distributed by the last two digits of a number. Before expansion, the distribution is as follows:
Key (last two digits)  Record count  Node
00                     50            DB1
01                     50            DB1
02                     50            DB1
03                     40            DB1
04                     20            DB2
05                     80            DB2
06                     40            DB2
07                     60            DB2
...                    ...           ...
During expansion, the data index distribution table can be adjusted, for example:
Key (last two digits)  Record count  Node
00                     50            DB101
01                     50            DB101
02                     50            DB1
03                     40            DB1
04                     20            DB201
05                     80            DB201
06                     40            DB2
07                     60            DB2
...                    ...           ...
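The description does not specify the business rule that produces the second table. The sketch below offers one rule as an assumption; it happens to reproduce the example. Each old node is paired with one new node (DB1 with DB101, DB2 with DB201). The rule walks the old node's keys in order and reassigns them until at least the target share of its records has moved. `generate_index_table` and its parameters are hypothetical names.

```python
def generate_index_table(old_table, counts, new_node_for, share=0.5):
    """Hypothetical business rule for the data index distribution table.

    old_table:    distribution key -> old node
    counts:       distribution key -> record count
    new_node_for: old node -> paired new node that receives its keys
    share:        target fraction of each old node's records to move
    Returns (new_table, fraction of records actually moved per old node).
    """
    new_table = dict(old_table)
    keys_by_node = {}
    for key in sorted(old_table):
        keys_by_node.setdefault(old_table[key], []).append(key)

    moved_share = {}
    for node, keys in keys_by_node.items():
        total = sum(counts[k] for k in keys)
        moved = 0
        for key in keys:
            if moved >= share * total:
                break  # target reached for this old node
            new_table[key] = new_node_for[node]
            moved += counts[key]
        moved_share[node] = moved / total
    return new_table, moved_share


old = {"00": "DB1", "01": "DB1", "02": "DB1", "03": "DB1",
       "04": "DB2", "05": "DB2", "06": "DB2", "07": "DB2"}
counts = {"00": 50, "01": 50, "02": 50, "03": 40,
          "04": 20, "05": 80, "06": 40, "07": 60}
new_table, shares = generate_index_table(old, counts, {"DB1": "DB101", "DB2": "DB201"})
print(new_table)
print(shares)  # DB1 moves 100 of 190 records (about 53%), DB2 moves 100 of 200 (50%)
```

Both moved shares fall inside the preferred 30% to 60% range. Because keys move whole, the share for a node can overshoot the target slightly, as it does for DB1.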
Preferably, the task adding unit 52 comprises a selection unit 520 and a command unit 521; the selection unit 520 can set the priority of each ETL data-writing task, and the command unit 521 selects which task to execute first according to task priority.
In a preferred embodiment, ETL data-writing task priorities are divided into highest priority, secondary priority and normal priority.
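The task adding unit can be sketched as a priority queue. In this sketch, `add` plays the role of the selection unit (it sets the priority) and `next_task` plays the role of the command unit (it picks the task to run first). The three levels come from the text above. Their numeric ordering, first-in-first-out order within a level, and the task names are assumptions.

```python
import heapq
import itertools

# Lower value runs first; the three levels are from the description.
PRIORITY = {"highest": 0, "secondary": 1, "normal": 2}


class EtlTaskQueue:
    """Priority queue for queued ETL write tasks, FIFO within one priority level."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker that preserves insertion order

    def add(self, task, level="normal"):
        """Selection-unit role: attach a priority to the task and enqueue it."""
        heapq.heappush(self._heap, (PRIORITY[level], next(self._seq), task))

    def next_task(self):
        """Command-unit role: return the task to execute first, or None if empty."""
        return heapq.heappop(self._heap)[2] if self._heap else None


q = EtlTaskQueue()
q.add("load_batch_04")                    # hypothetical task names
q.add("daily_kpi_report", "highest")
q.add("load_batch_05")
q.add("refresh_region_dimension", "secondary")
print([q.next_task() for _ in range(4)])
# ['daily_kpi_report', 'refresh_region_dimension', 'load_batch_04', 'load_batch_05']
```

The sequence counter keeps tasks with equal priority in arrival order and stops `heapq` from ever comparing the task objects themselves.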
Those skilled in the art should understand that various modifications, refinements and combinations of the present invention, as well as supplements to or replacements of its technical features, may be made without departing from the basic principles of the invention, and all such equivalent substitutions or obvious variations fall within the protection scope of the present invention.

Claims (7)

1. A distributed system for stepwise capacity expansion, characterized in that it comprises an ETL data-writing module, a buffer module, a data redistribution module, a data index distribution table forming module, a dispatching module and a plurality of databases, wherein the databases are hosted on a plurality of servers, the buffer module is an ETL removable hard disk, and the buffer module and the databases are communicatively connected to the ETL data-writing module; and the dispatching module comprises a pausing unit, a starting unit and a task adding unit.
2. A method for stepwise capacity expansion of a distributed system, characterized in that it comprises the following steps:
S1: after new databases are added to the distributed system, the data index distribution table forming module generates a new data index distribution table according to the number of new databases;
S2: the pausing unit suspends data writing by the ETL data-writing module and saves the files to the buffer module, and the task adding unit adds the ETL data-writing tasks to a task queue;
S3: the starting unit starts the data redistribution module, which redistributes the data according to the new data index distribution table;
S4: the ETL data-writing module is restarted and works through the queued ETL data-writing tasks at an accelerated pace; once ETL data writing is complete, online query tasks are started.
3. The stepwise capacity expansion method according to claim 2, characterized in that the data index distribution table forming module generates the new data index distribution table from the number of new databases according to business rules.
4. The stepwise capacity expansion method according to claim 3, characterized in that the new data index distribution table moves 30% to 60% of the data on the old nodes to the new nodes.
5. The stepwise capacity expansion method according to claim 4, characterized in that the new data index distribution table moves 50% of the data on the old nodes to the new nodes.
6. The stepwise capacity expansion method according to claim 2, characterized in that the task adding unit comprises a selection unit and a command unit, the selection unit can set the priority of each ETL data-writing task, and the command unit selects which task to execute first according to task priority.
7. The stepwise capacity expansion method according to claim 6, characterized in that ETL data-writing task priorities are divided into highest priority, secondary priority and normal priority.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410116840.4A CN103944964A (en) 2014-03-27 2014-03-27 Distributed system and method carrying out expansion step by step through same

Publications (1)

Publication Number Publication Date
CN103944964A 2014-07-23

Family

ID=51192445

Country Status (1)

Country Link
CN (1) CN103944964A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060026199A1 (en) * 2004-07-15 2006-02-02 Mariano Crea Method and system to load information in a general purpose data warehouse database
US20110213775A1 (en) * 2010-03-01 2011-09-01 International Business Machines Corporation Database Table Look-up
CN102332004A (en) * 2011-07-29 2012-01-25 中国科学院计算技术研究所 Data processing method and system for managing mass data
CN102521297A (en) * 2011-11-30 2012-06-27 北京人大金仓信息技术股份有限公司 Method for achieving system dynamic expansion in shared-nothing database cluster
CN102999537A (en) * 2011-09-19 2013-03-27 阿里巴巴集团控股有限公司 System and method for data migration

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104391989A (en) * 2014-12-16 2015-03-04 浪潮电子信息产业股份有限公司 Distributed ETL all-in-one machine system
CN106301864A (en) * 2015-06-11 2017-01-04 腾讯科技(深圳)有限公司 A kind of server system expansion method, device and dilatation processing equipment
CN106301864B (en) * 2015-06-11 2019-12-27 腾讯科技(深圳)有限公司 Server system capacity expansion method and device and capacity expansion processing equipment
WO2017036242A1 (en) * 2015-08-31 2017-03-09 华为技术有限公司 Data processing method, apparatus, and system
CN106407308A (en) * 2016-08-31 2017-02-15 天津南大通用数据技术股份有限公司 Method and device for expanding capacity of distributed database
CN108008913A (en) * 2016-10-27 2018-05-08 杭州海康威视数字技术股份有限公司 A kind of expansion method based on management node, device and storage system
CN108008913B (en) * 2016-10-27 2020-12-18 杭州海康威视数字技术股份有限公司 Management node-based capacity expansion method and device and storage system
CN111061737A (en) * 2019-12-12 2020-04-24 税友软件集团股份有限公司 Distributed database rapid capacity expansion device
CN111061737B (en) * 2019-12-12 2023-05-09 税友软件集团股份有限公司 Quick capacity expanding device of distributed database
CN111538718A (en) * 2020-04-22 2020-08-14 杭州宇为科技有限公司 Entity id generation and positioning method, capacity expansion method and equipment of distributed system
CN111538718B (en) * 2020-04-22 2023-10-27 杭州宇为科技有限公司 Entity id generation and positioning method, capacity expansion method and equipment of distributed system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
DD01 Delivery of document by public notice

Addressee: SHANGHAI CLOUDYBI INFORMATION TECHNOLOGY CO., LTD.

Document name: the First Notification of an Office Action

RJ01 Rejection of invention patent application after publication

Application publication date: 20140723