CN104239493A - Cross-cluster data migration method and system

Info

Publication number: CN104239493A
Application number: CN201410455695.2A
Authority: CN
Inventors: 黄刚; 何洋
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Priority date: 2014-09-09
Filing date: 2014-09-09
Publication date: 2014-12-24
Anticipated expiration: 2034-09-09
Also published as: CN104239493B

Abstract

An embodiment of the invention provides a cross-cluster data migration method and system. All child nodes of the source cluster stop data operations and the in-memory data of the distributed database of the source cluster is persisted, so that the data in the distributed database of the source cluster is persisted before migration. The data tables in the distributed database of the source cluster are compressed to reduce the volume of data transmitted, and the compressed data tables are migrated to a target cluster, which improves migration efficiency. The storage space and total file block count occupied by the data tables in the distributed database of the source cluster before migration are then matched against the storage space and total file block count occupied by the data tables of the target cluster after migration, so that the integrity of the migration can be verified from the matching result.

Description

Cross-cluster data migration method and system

Technical field

Embodiments of the present invention relate to the field of database technology, and in particular to a cross-cluster data migration method and system.

Background art

With the development of Internet applications and the surge in user numbers, the volume of stored data has grown exponentially. Traditional single-database storage can no longer meet the access requirements of massive data, which gave rise to HDFS (Hadoop Distributed File System) and distributed databases.

HBase (Hadoop Database) is a scalable, column-oriented distributed database that uses HDFS as its file storage system and stores data in the form of data tables. On commodity hardware it can support large data tables with billions of rows and millions of columns, and it supports random write and read operations on data of that scale. Because it offers high reliability and high scalability and supports random access and MapReduce parallel computation, it is widely used. Hadoop is a distributed system infrastructure developed by the Apache Foundation; with it, users can develop distributed programs without understanding the low-level details of distribution and can make full use of a cluster for high-speed computation and mass data access.

In practice, data migration is unavoidable. In particular, when an online HBase cluster must be taken offline, or when a machine room is relocated, an urgent mass data migration is required: the data tables of the old cluster are moved into a new cluster, which then continues to provide mass data access services to the accessing business parties.

Existing data migration techniques usually use the data copy component of Hadoop to perform a distributed copy, thereby moving the data tables of one cluster to a new cluster. After the data copy completes, the relevant service processes of the new cluster are started.

These techniques have the following defects. The integrity of the data after migration cannot be guaranteed. The migration time depends strictly on the scale of the migrated data, which makes it hard to control; if inter-cluster network bandwidth is limited and the volume of data to migrate is large, it is difficult to finish the migration within a short migration window. In other words, migration efficiency is low.

Summary of the invention

Embodiments of the present invention provide a cross-cluster data migration method and system that ensure the integrity and high efficiency of cross-cluster data migration.

In a first aspect, an embodiment of the present invention provides a cross-cluster data migration method, comprising:

The master node of the source cluster calls a stop command to make each child node of the source cluster stop data operations;

The master node of the source cluster uses the buffer flush component of the distributed database of the source cluster to persist the data held in the memory of the distributed database to the distributed file system HDFS;

The master node of the source cluster compresses the data tables contained in the distributed database of the source cluster using a set compression algorithm;

The master node of the source cluster counts the first storage space size and the first total file block count of the HDFS occupied by the data tables in the distributed database of the source cluster;

The master node of the source cluster migrates the data tables in the distributed database of the source cluster into the distributed database of the target cluster, based on a previously obtained mapping between the IP addresses and host names of the nodes in the target cluster;

If a migration-complete message returned by the web management interface of the MapReduce process of the source cluster is obtained, the master node of the target cluster counts the second storage space size and the second total file block count of the corresponding HDFS occupied by the data tables in the distributed database of the target cluster, and matches the second storage space size and the second total file block count against the first storage space size and the first total file block count;

If the match succeeds, the master node of the target cluster decompresses the data tables migrated into the target cluster using the decompression algorithm corresponding to the set compression algorithm;

The master node of the target cluster starts the target cluster based on a startup policy.

In a second aspect, an embodiment of the present invention further provides a cross-cluster data migration system comprising a source cluster and a target cluster, wherein the source cluster comprises a master node and at least one child node, and the target cluster comprises a master node and at least one child node;

The master node of the source cluster comprises:

A stop module, configured to call a stop command to make each child node of the source cluster stop data operations;

A persistence module, configured to use the buffer flush component of the distributed database of the source cluster to persist the data held in the memory of the distributed database to the distributed file system HDFS;

A compression module, configured to compress the data tables contained in the distributed database of the source cluster using a set compression algorithm;

A statistics module, configured to count the first storage space size and the first total file block count of the HDFS occupied by the data tables in the distributed database of the source cluster;

A migration module, configured to migrate the data tables in the distributed database of the source cluster into the distributed database of the target cluster, based on a previously obtained mapping between the IP addresses and host names of the nodes in the target cluster;

The master node of the target cluster comprises:

A statistics module, configured to, if a migration-complete message returned by the web management interface of the MapReduce process of the source cluster is obtained, count the second storage space size and the second total file block count of the corresponding HDFS occupied by the data tables in the distributed database of the target cluster, and match the second storage space size and the second total file block count against the first storage space size and the first total file block count;

A decompression module, configured to, if the match succeeds, decompress the data tables migrated into the target cluster using the decompression algorithm corresponding to the set compression algorithm;

A start module, configured to start the target cluster based on a startup policy.

In the cross-cluster data migration method and system provided by the embodiments of the present invention, each child node of the source cluster stops data operations and the data in the memory of the distributed database of the source cluster is persisted, so that the data in the distributed database of the source cluster is persisted before migration. Compressing the data tables in the distributed database of the source cluster reduces the volume of data transmitted, and migrating the compressed data tables into the target cluster improves migration efficiency. The storage space size and total file block count occupied by the data tables in the distributed database of the source cluster before migration are then matched against those occupied by the data tables of the target cluster after migration, so that the integrity of the migration can be verified from the matching result.

Brief description of the drawings

To illustrate the present invention more clearly, the drawings used in the description are briefly introduced below. The drawings described below show some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flowchart of a cross-cluster data migration method provided by Embodiment 1 of the present invention;

Fig. 2 is a flowchart of a cross-cluster data migration method provided by Embodiment 3 of the present invention;

Fig. 3a is a schematic structural diagram of the master node of the source cluster in a cross-cluster data migration system provided by Embodiment 4 of the present invention;

Fig. 3b is a schematic structural diagram of the master node of the target cluster in a cross-cluster data migration system provided by Embodiment 4 of the present invention.

Detailed description

To make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments are described in further detail below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. The specific embodiments described herein serve only to explain the present invention and do not limit it; all other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention. Note also that, for ease of description, the drawings show only the parts related to the present invention rather than the full content.

Embodiment 1

Fig. 1 is a flowchart of a cross-cluster data migration method provided by Embodiment 1 of the present invention. The method applies to a cross-cluster data migration system comprising a source cluster and a target cluster, each of which comprises a master node and at least one child node. The master node and the child nodes of the source cluster form an HDFS, and the data tables to be migrated are stored in the source cluster; the master node and the child nodes of the target cluster likewise form an HDFS, which stores the data tables migrated from the source cluster.

The method comprises:

Step 110: the master node of the source cluster calls a stop command to make each child node of the source cluster stop data operations;

In this step, each child node of the source cluster stops data operations so that the data on each node can be persisted before migration. Specifically, the business party corresponding to each child node can first be notified to stop data write or read operations, after which the stop command is called to make each child node of the source cluster stop data operations. Alternatively, the stop command can be called directly to make each child node of the source cluster stop data operations.

Step 120: the master node of the source cluster uses the buffer flush component of the distributed database of the source cluster to persist the data held in the memory of the distributed database to HDFS;

This step persists the data in the distributed database of the source cluster.

The buffer flush component persists the data temporarily held in the memory of the distributed database to the disks of HDFS.
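As an illustration of step 120, the following minimal sketch generates input for a database command line interface. It assumes that the distributed database is HBase and that the buffer flush component corresponds to the HBase shell `flush` command; the function name and the table names are hypothetical and are not taken from this document.

```python
import re

# Conservative check on table names (letters, digits, '_', '.', '-', ':').
_TABLE_NAME = re.compile(r"[A-Za-z0-9_.:-]+")


def build_flush_script(tables):
    """Return HBase shell input that flushes each table's in-memory data."""
    lines = []
    for name in tables:
        if not _TABLE_NAME.fullmatch(name):
            raise ValueError(f"unexpected table name: {name!r}")
        lines.append(f"flush '{name}'")
    lines.append("exit")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical table names; the output could be piped into `hbase shell`.
    print(build_flush_script(["orders", "promotions"]), end="")
```

Generating the script as text keeps the sketch runnable without a cluster; in a real deployment the master node would pass this text to the database shell.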

Step 130: the master node of the source cluster compresses the data tables contained in the distributed database of the source cluster using a set compression algorithm;

This step compresses the data tables to be migrated in the source cluster. Specifically, the compression state of each data table can be checked and any data table that is not yet compressed can be compressed, using the LZO (Lempel-Ziv-Oberhumer) compression algorithm, the SNAPPY compression algorithm, or another compression algorithm.

LZO is a lossless compression algorithm with a high compression ratio and very fast decompression; being lossless, the data can be reproduced exactly after compression. SNAPPY is a toolkit for compression and decompression that aims to provide high compression speed with a reasonable compression ratio.

The distributed database uses HDFS as its file storage system and stores data in the form of data tables, and on commodity hardware it can support large data tables with billions of rows and millions of columns. Compressing the data tables to be migrated therefore effectively reduces the volume of data transmitted and helps improve migration efficiency.
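The effect of lossless compression on transfer volume can be illustrated with a small sketch. Neither LZO nor SNAPPY ships with the Python standard library, so `zlib` is used here purely as a stand-in; the function names and the sample rows are hypothetical.

```python
import zlib


def compress_table(raw: bytes) -> bytes:
    """Compress table bytes losslessly (zlib stands in for LZO/SNAPPY)."""
    return zlib.compress(raw)


def decompress_table(packed: bytes) -> bytes:
    """Restore the exact original bytes from the compressed form."""
    return zlib.decompress(packed)


if __name__ == "__main__":
    # Hypothetical, highly repetitive rows, as is common in tabular data.
    rows = b"".join(b"row-%06d,promo,2014-09-09\n" % i for i in range(10_000))
    packed = compress_table(rows)
    print("bytes before:", len(rows), "bytes after:", len(packed))
    assert decompress_table(packed) == rows  # lossless round trip
```

The round trip matters for the later steps: the target cluster can only decompress with the algorithm that corresponds to the one used on the source side.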

Step 140: the master node of the source cluster counts the first storage space size and the first total file block count of the HDFS occupied by the data tables in the distributed database of the source cluster;

In this step, the data tables contained in the distributed database of the source cluster are the data tables to be migrated and can be stored under the root directory of the distributed database. Because the distributed database of the source cluster uses HDFS as its file storage system, its disk storage space is defined by HDFS and is the sum of the disk storage space of the child nodes of the source cluster. The space that the data tables occupy within the HDFS disk storage space is the first storage space.

Because the data tables to be migrated hold a very large volume of data, they are stored in a distributed manner: each data table is split into multiple file blocks, and different file blocks are stored on the disks of different child nodes of the source cluster. The first total file block count is the total number of file blocks belonging to the data tables to be migrated.
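The two statistics of step 140 can be sketched as follows. The sketch assumes that the sizes of the files belonging to the data tables are already known, and that each file is split into fixed-size blocks of 128 MB; the block size, the class name and the function name are assumptions for illustration only.

```python
from dataclasses import dataclass

BLOCK_SIZE = 128 * 1024 * 1024  # assumed HDFS block size in bytes


@dataclass(frozen=True)
class TableStats:
    total_bytes: int   # storage space size
    total_blocks: int  # total file block count


def collect_stats(file_sizes, block_size=BLOCK_SIZE):
    """Sum the bytes and the file blocks for the given file sizes (bytes)."""
    total_bytes = sum(file_sizes)
    # Integer ceiling division: a file of `size` bytes needs
    # ceil(size / block_size) blocks; an empty file needs none.
    total_blocks = sum((size + block_size - 1) // block_size
                       for size in file_sizes)
    return TableStats(total_bytes, total_blocks)
```

Running the same function on the source cluster before migration and on the target cluster after migration yields the two pairs of values that are later compared.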

Step 150: the master node of the source cluster migrates the data tables in the distributed database of the source cluster into the distributed database of the target cluster, based on a previously obtained mapping between the IP addresses and host names of the nodes in the target cluster;

This step migrates the data tables in the distributed database of the source cluster into the distributed database of the target cluster.

Note that step 110 stops only the data access service provided by the distributed database of the source cluster; the HDFS service of the source cluster keeps running normally.

Note also that, using the mapping between the IP addresses and host names of the nodes in the target cluster, the master node of the source cluster can locate the HDFS of the target cluster, so the data tables in the distributed database of the source cluster can be migrated into the HDFS of the target cluster. Because the distributed database of the target cluster uses HDFS as its file storage system, this amounts to migrating those data tables into the distributed database of the target cluster.
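A minimal sketch of how the mapping could be used is given below. It assumes that the mapping is kept in hosts-file style lines (`<ip> <hostname>`), that the copy is performed with Hadoop's DistCp tool, and that the target NameNode listens on port 8020; none of these details is stated in this document, and all function names, host names and paths are hypothetical.

```python
def parse_host_mapping(text):
    """Parse '<ip> <hostname>...' lines into a {hostname: ip} dictionary."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping[name] = ip
    return mapping


def build_copy_command(src_uri, dst_host, dst_path, mapping, port=8020):
    """Build a `hadoop distcp` argument list for a known target host."""
    if dst_host not in mapping:
        # Without a mapping the source master cannot locate the target HDFS.
        raise KeyError(f"no IP mapping for target host {dst_host!r}")
    return ["hadoop", "distcp", src_uri, f"hdfs://{dst_host}:{port}{dst_path}"]


if __name__ == "__main__":
    hosts = "10.0.0.1 target-master  # hypothetical\n10.0.0.2 target-node1\n"
    mapping = parse_host_mapping(hosts)
    print(build_copy_command("hdfs://source-master:8020/hbase",
                             "target-master", "/hbase", mapping))
```

The command is returned as an argument list so that it could be handed to a process launcher without shell quoting issues.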

Step 160: if a migration-complete message returned by the web management interface of the MapReduce process of the source cluster is obtained, the master node of the target cluster counts the second storage space size and the second total file block count of the corresponding HDFS occupied by the data tables in the distributed database of the target cluster, and matches the second storage space size and the second total file block count against the first storage space size and the first total file block count;

In this step, once the migration of the data tables is observed to be complete, the second storage space size and the second total file block count of the corresponding HDFS occupied by the data tables of the distributed database of the target cluster are first counted; the second storage space size is then compared with the first storage space size, and the second total file block count with the first total file block count.

The second storage space size and the second total file block count are defined analogously to the first storage space size and the first total file block count and are not described again here.

MapReduce is a general programming model for distributed parallel computing tasks, used for parallel operations on large-scale data. The web management interface of the MapReduce process can be used to monitor the migration, for example the real-time migration speed, the percentage migrated, the estimated remaining time, and description information of the data already migrated.

Step 170: if the match succeeds, the master node of the target cluster decompresses the data tables migrated into the target cluster using the decompression algorithm corresponding to the set compression algorithm;

In this step, a successful match means that the first storage space size of the HDFS occupied by the data tables in the distributed database of the source cluster before migration equals the second storage space size of the HDFS of the target cluster occupied after migration, and that the total file block count before migration equals the total file block count after migration. A successful match thus indicates that the data tables in the distributed database of the source cluster have been migrated completely into the distributed database of the target cluster.
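The matching rule of steps 160 and 170 can be sketched in a few lines. The names below are hypothetical, and the "abort" outcome is an assumption: this document only states what happens when the match succeeds.

```python
from typing import NamedTuple


class Stats(NamedTuple):
    total_bytes: int   # storage space size
    total_blocks: int  # total file block count


def migration_matches(source: Stats, target: Stats) -> bool:
    """True only if both the storage size and the block count agree."""
    return (source.total_bytes == target.total_bytes
            and source.total_blocks == target.total_blocks)


def next_action(source: Stats, target: Stats) -> str:
    # Decompression is allowed only after a successful match; what to do
    # on a mismatch is an assumption of this sketch.
    return "decompress" if migration_matches(source, target) else "abort"
```

Both values must agree: equal sizes with different block counts, or the reverse, are treated as a failed match.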

After the data tables have been migrated completely, this step decompresses the data tables migrated into the target cluster.

Step 180: the master node of the target cluster starts the target cluster based on a startup policy.

This step starts the target cluster so that each node of the target cluster works normally.

In the technical solution of this embodiment, each child node of the source cluster stops data operations and the data in the memory of the distributed database of the source cluster is persisted, so that the data in the distributed database of the source cluster is persisted before migration. Compressing the data tables in the distributed database of the source cluster reduces the volume of data transmitted, and migrating the compressed data tables into the target cluster improves migration efficiency. The storage space size and total file block count occupied by the data tables in the distributed database of the source cluster before migration are then matched against those occupied by the data tables of the target cluster after migration, so that the integrity of the migration can be verified from the matching result.

Embodiment 2

On the basis of the above embodiment, this embodiment further comprises, before the master node of the source cluster counts the first storage space size and the first total file block count of the HDFS occupied by the data tables in the distributed database of the source cluster:

The master node of the source cluster uses the full file merge component of the distributed database of the source cluster to remove, from the disk storage space of the distributed database of the source cluster, the data tables that satisfy a preset removal policy.

After the data tables to be migrated have been compressed, this step removes the invalid data tables from the disk storage space of the distributed database of the source cluster, further reducing the volume of data to migrate and improving migration efficiency.

The preset removal policy can be implemented in several ways, for example comprising at least one of the following:

Taking data tables that carry a delete marker in the disk storage space of the distributed database of the source cluster as data tables to be cleaned;

Taking data tables that have reached their time to live in the disk storage space of the distributed database of the source cluster as data tables to be cleaned;

Note that the time to live of a data table can be preset as required, and expired data tables are eliminated according to it. For an e-commerce platform, the time to live of a data table can typically be set according to the duration of a promotional campaign, for example 30 days, 7 days, or a specific period on a particular day. If a store anniversary sale runs from 10 a.m. to 10 p.m. on one day, then when the sale ends, the data tables whose time to live was set for the sale have expired. Removing expired data tables helps save storage space and improve migration efficiency.

Taking data tables whose maximum version number exceeds a threshold in the disk storage space of the distributed database of the source cluster as data tables to be cleaned.

Note that the maximum version number of a data table can be preset as required and is usually set to 3. For data tables that are updated more frequently it can be set to 1, so that invalid data tables are eliminated quickly, which helps save storage space and improve migration efficiency.
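The three removal policies above can be combined as in the following minimal sketch. The record layout, the field names and the default threshold of 3 versions are assumptions for illustration; in a real database the removal itself would be carried out by the full file merge component rather than by application code.

```python
from dataclasses import dataclass


@dataclass
class TableInfo:
    name: str
    delete_marked: bool  # carries a delete marker
    age_seconds: int     # time elapsed since the data was written
    ttl_seconds: int     # preset time to live
    versions: int        # number of versions currently stored


def tables_to_clean(tables, max_versions=3):
    """Return the names of tables that satisfy at least one removal policy."""
    selected = []
    for t in tables:
        expired = t.age_seconds >= t.ttl_seconds
        too_many_versions = t.versions > max_versions
        if t.delete_marked or expired or too_many_versions:
            selected.append(t.name)
    return selected


if __name__ == "__main__":
    day = 24 * 3600
    tables = [
        TableInfo("anniversary_sale", False, 2 * day, 12 * 3600, 1),  # expired
        TableInfo("orders", False, 2 * day, 30 * day, 2),             # kept
        TableInfo("old_cart", True, 1 * day, 30 * day, 1),            # deleted
    ]
    print(tables_to_clean(tables))
```

Lowering `max_versions` to 1 for frequently updated tables selects more data for removal, which matches the note above.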

In the technical solution of this embodiment, after the data in the distributed database of the source cluster has been persisted before migration, the data tables to be migrated are compressed, which reduces the volume of data transmitted; removing the invalid data tables from the disk storage space of the distributed database of the source cluster further reduces the volume of data to migrate. The data tables that have undergone compression and removal are then migrated into the target cluster, improving migration efficiency. By matching the storage space size and total file block count occupied by the data tables of the distributed database of the source cluster before migration against those occupied by the data tables of the target cluster after migration, the integrity of the migration can be verified from the matching result.

In the above scheme, the buffer flush component and the full file merge component can be triggered by calling the command line interface of the distributed database of the source cluster.

The command line interface is the interactive interface between the operating system and the user. In the Linux operating system, the command line interface is called the shell; its main role is to serve the user, for example by receiving input from the keyboard or displaying execution results on the screen.

In the above scheme, before the master node of the target cluster decompresses the data tables migrated into the target cluster using the decompression algorithm corresponding to the set compression algorithm, the method preferably further comprises:

The master node of the target cluster calls the consistency check component in the target cluster to check the consistency of the data tables contained in the distributed database of the target cluster;

If they are consistent, the master node of the target cluster is triggered to decompress the data tables migrated into the target cluster using the decompression algorithm corresponding to the set compression algorithm.

Note that checking the consistency of a data table means checking whether the description information of the data table agrees with the attribute information of the data table that actually exists in the HDFS of the target cluster. If they agree, the master node of the target cluster is triggered to decompress the data tables migrated into the target cluster using the decompression algorithm corresponding to the set compression algorithm; if they do not agree, the consistency check component is used to repair the data table. This step is a supplementary check performed after the integrity of the data table migration has been verified, and it improves the consistency of the data tables migrated into the target cluster.
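The idea of the consistency check can be sketched as a comparison of two views of the same tables. The sketch assumes that both the description information and the attributes found on HDFS are available as plain dictionaries; the attribute names and the function name are hypothetical, and the repair action itself is left to the consistency check component.

```python
def find_inconsistencies(described, actual):
    """Compare table description info with attributes found on HDFS.

    Both arguments map a table name to a dict of attributes. The result
    maps each inconsistent table to the sorted attribute names that differ.
    """
    problems = {}
    for table in sorted(set(described) | set(actual)):
        want = described.get(table, {})
        have = actual.get(table, {})
        diff = sorted(key for key in set(want) | set(have)
                      if want.get(key) != have.get(key))
        if diff:
            problems[table] = diff
    return problems


if __name__ == "__main__":
    described = {"orders": {"regions": 4, "compression": "SNAPPY"}}
    actual = {"orders": {"regions": 3, "compression": "SNAPPY"}}
    print(find_inconsistencies(described, actual))
```

An empty result corresponds to the consistent case, in which decompression can proceed.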

Embodiment 3

Fig. 2 is a flowchart of a cross-cluster data migration method provided by Embodiment 3 of the present invention. On the basis of the above embodiments, this embodiment provides a preferred scheme by which the master node of the target cluster starts the target cluster based on a startup policy. The preferred method comprises:

Step 210: the master node of the target cluster calls a start command to start the target cluster;

Step 220: if the log files associated with the distributed database of the target cluster contain no error log information or warning log information, the master node of the target cluster calls the overall health check component in the distributed database of the target cluster to check the overall health of the target cluster;

This step checks the log files associated with the distributed database on the nodes of the target cluster. If error log information or warning log information exists, the relevant component of the distributed database of the target cluster is called, as prompted, to resolve the problem; if none exists, the health check component of the distributed database of the target cluster is used to check the overall health of the target cluster.

Checking the overall health of the target cluster includes checking whether the data tables of the target cluster are in a normal state.

Step 230: the master node of the target cluster calls the command line interface of the distributed database to set the state of the data tables to enabled.

Based on the result of the overall health check of the target cluster in step 220, this step keeps the data tables in the target cluster in the normal "enable" state.

In the technical solution of this embodiment, after the target cluster is started, the log files associated with the distributed database on the nodes of the target cluster are checked. If error log information or warning log information exists, the relevant component of the distributed database of the target cluster is called, as prompted, to resolve the problem; if none exists, the health check component of the distributed database of the target cluster is used to check the overall health of the target cluster. Based on the result of the overall health check, the data tables in the target cluster are kept in the normal "enable" state, so that the data tables in the target cluster can provide normal access service.
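The branching in step 220 can be sketched as a simple log scan. The sketch assumes that log lines carry log4j-style level tokens `ERROR` and `WARN`; the function names and the returned labels are hypothetical.

```python
import re

# Matches the level tokens as whole words, e.g. "... WARN [main] ...".
_PROBLEM = re.compile(r"\b(ERROR|WARN)\b")


def scan_log(lines):
    """Return the log lines that carry an ERROR or WARN level."""
    return [line for line in lines if _PROBLEM.search(line)]


def next_startup_step(log_lines):
    """Decide what to do after the start command has been issued."""
    problems = scan_log(log_lines)
    if problems:
        return ("resolve", problems)  # fix the reported problems first
    return ("health_check", [])       # no problems: check overall health


if __name__ == "__main__":
    log = [
        "2014-09-09 10:00:01 INFO  master started",
        "2014-09-09 10:00:05 WARN  region server slow to report",
    ]
    print(next_startup_step(log))
```

Only when the scan returns no lines does the flow move on to the overall health check and, after it, to enabling the data tables.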

In the above scheme, after the master node of the target cluster, on finding through the distributed database management page of the target cluster that a data table contained in the distributed database of the target cluster is not in the enabled state, calls the command line interface of the distributed database to set the state of that data table to enabled, the method preferably further comprises:

Sending the mapping between the IP addresses and host names of the nodes in the target cluster, together with the connection information of the distributed database of the target cluster, to the business party, and notifying the business party to verify the data service of the target cluster.

Embodiment four

Refer to Fig. 3 a and Fig. 3 b.The embodiment of the present invention four provides one across company-data migratory system, and this system comprises: source cluster and target cluster, and described source cluster comprises main controlled node and at least one child node, and described target cluster comprises main controlled node and at least one child node.

The main controlled node of described source cluster comprises: stopping modular 310, persistence module 320, compression module 330, statistical module 340 and transferring module 350.

The main controlled node of described target cluster comprises: statistical module 360, decompression module 370 and startup module 380.

Wherein, stopping modular 310 stops data manipulation for each child node calling control source cluster of ceasing and desisting order; Persistence module 320 for utilizing the clearing buffers area assembly of the distributed data base of source cluster, by the data persistence in described distributed data base internal memory in HDFS; The tables of data of compression module 330 for comprising the distributed data base of source cluster, adopts the compression algorithm of setting to compress; Statistical module 340 is for first storage size of the HDFS shared by the tables of data in the distributed data base of Statistic Source cluster and the first general act block number; Tables of data in distributed data base in the cluster of source, for the IP address of node that comprises based on the target cluster obtained in advance and the mapping relations of Hostname, migrates in the distributed data base of described target cluster by transferring module 350;

The statistics module 360 is configured to, upon obtaining the data migration completion message returned by the web management interface of the MapReduce process of the source cluster, count the second storage space size and the second total number of file blocks of the HDFS occupied by the data tables in the distributed database of the target cluster, and to match the second storage space size and the second total number of file blocks against the first storage space size and the first total number of file blocks. The decompression module 370 is configured to, if the matching succeeds, decompress the data tables migrated to the target cluster using the decompression algorithm corresponding to the set compression algorithm. The startup module 380 is configured to start the target cluster based on a startup strategy.
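The matching step amounts to comparing two pairs of numbers. The sketch below is a minimal Python illustration that assumes "matching" means exact equality of both values; the `HdfsFootprint` type and the `footprints_match` function are hypothetical names introduced here.

```python
from typing import NamedTuple


class HdfsFootprint(NamedTuple):
    """HDFS usage of the data tables on one cluster."""
    storage_bytes: int  # storage space size occupied on HDFS
    total_blocks: int   # total number of file blocks


def footprints_match(first: HdfsFootprint, second: HdfsFootprint) -> bool:
    """Return True only if both the size and the block count are identical."""
    return (
        first.storage_bytes == second.storage_bytes
        and first.total_blocks == second.total_blocks
    )
```

Because the tables are still compressed when both measurements are taken, the two footprints describe the same bytes, and a mismatch in either value indicates an incomplete migration.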

In the above scheme, the master node of the source cluster preferably further comprises: a removal module, configured to, before the first storage space size and the first total number of file blocks of the HDFS occupied by the data tables in the distributed database of the source cluster are counted, use the full file merge component of the distributed database of the source cluster to remove, from the disk storage space of the distributed database of the source cluster, the data tables that satisfy a preset removal strategy.

In the above scheme, the preset removal strategy comprises at least one of the following:

In the above scheme, the persistence module 320 is specifically configured to trigger the buffer flush component by calling the distributed database command line interface of the source cluster, so as to persist the data held in the memory of the distributed database to the distributed file system HDFS. The removal module is specifically configured to, before the first storage space size and the first total number of file blocks of the HDFS occupied by the data tables in the distributed database of the source cluster are counted, trigger the full file merge component by calling the distributed database command line interface of the source cluster, so as to remove, from the disk storage space of the distributed database of the source cluster, the data tables that satisfy the preset removal strategy.

In the above scheme, the master node of the target cluster preferably further comprises: a consistency detection module, configured to, before the data tables migrated to the target cluster are decompressed using the decompression algorithm corresponding to the set compression algorithm, call the consistency detection component in the target cluster to check the consistency of the data tables comprised in the distributed database of the target cluster; if they are consistent, the master node of the target cluster is triggered to decompress the data tables migrated to the target cluster using the decompression algorithm corresponding to the set compression algorithm.
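The pairing of a set compression algorithm with its corresponding decompression algorithm, gated by the consistency check, can be sketched in Python as follows. The sketch uses the standard-library `gzip` and `bz2` codecs purely as stand-ins; the embodiment does not say which algorithms are used, and the `CODECS` table and function names are hypothetical.

```python
import bz2
import gzip
from typing import Callable, Dict, Tuple

# Each algorithm name maps to its (compress, decompress) pair, so the
# decompression side always uses the counterpart of the set algorithm.
CODECS: Dict[str, Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]] = {
    "gzip": (gzip.compress, gzip.decompress),
    "bz2": (bz2.compress, bz2.decompress),
}


def compress_table(data: bytes, algorithm: str) -> bytes:
    """Compress table bytes on the source side with the set algorithm."""
    return CODECS[algorithm][0](data)


def decompress_if_consistent(blob: bytes, algorithm: str, is_consistent: bool) -> bytes:
    """Decompress on the target side only after the consistency check passed."""
    if not is_consistent:
        raise RuntimeError("consistency check failed; tables left compressed")
    return CODECS[algorithm][1](blob)
```

Refusing to decompress inconsistent tables keeps the target data in the same state that was measured during matching, which makes a retry of the migration simpler.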

In the above scheme, the startup module 390 preferably comprises: a starting unit, an overall health check unit and a data table state setting unit.

The starting unit is configured to call a startup command to start the target cluster. The overall health check unit is configured to, if no error log information or warning log information exists in the log file associated with the distributed database of the target cluster, call the overall health check component in the distributed database of the target cluster to check the overall health of the target cluster. The data table state setting unit is configured to call the command line interface of the distributed database to set the state of the data tables to the enabled state.
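The three units form a short gated sequence: start, inspect the log, check health, then enable the tables. The Python sketch below is a minimal illustration under stated assumptions: the log markers `ERROR` and `WARN` are assumed level names, the returned status strings are hypothetical, and the four callables stand in for the real startup command, log reader, health check component and command line interface.

```python
from typing import Callable, Iterable, Sequence


def log_is_clean(lines: Iterable[str], markers: Sequence[str] = ("ERROR", "WARN")) -> bool:
    """Return True if no line contains an error or warning marker."""
    return not any(marker in line for line in lines for marker in markers)


def start_target_cluster(
    start: Callable[[], None],
    read_log: Callable[[], Iterable[str]],
    check_health: Callable[[], bool],
    enable_tables: Callable[[], None],
) -> str:
    """Start the cluster and enable its tables only if every gate passes."""
    start()
    if not log_is_clean(read_log()):
        return "log_problems"
    if not check_health():
        return "unhealthy"
    enable_tables()
    return "started"
```

Tables are enabled last so that the business side never sees a table as available on a cluster whose log or health check already shows a problem.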

In the above scheme, the startup module 390 may further comprise: a data service verification unit, configured to, where the distributed database management page of the target cluster shows that a data table comprised in the distributed database of the target cluster is not in the enabled state and the command line interface of the distributed database has been called to set the state of that data table to the enabled state, send the mapping relations between the IP addresses and the host names of the nodes comprised in the target cluster, together with the connection information of the distributed database of the target cluster, to the business side, and notify the business side to verify the data service of the target cluster.

In the cross-cluster data migration system provided by the embodiments of the present invention, the master node of the source cluster and the master node of the target cluster can perform the cross-cluster data migration method provided by any embodiment of the present invention, and possess the functional modules and beneficial effects corresponding to the method performed.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention and not to limit it; the preferred implementations in the embodiments are likewise not limiting. For those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present invention shall be included within the protection scope of the present invention.

Claims

1. A cross-cluster data migration method, characterized by comprising:

2. The method according to claim 1, characterized in that, before the master node of the source cluster counts the first storage space size and the first total number of file blocks of the HDFS occupied by the data tables in the distributed database of the source cluster, the method further comprises:

3. The method according to claim 2, characterized in that the preset removal strategy comprises at least one of the following:

4. The method according to claim 2, characterized in that the buffer flush component and the full file merge component are triggered by calling the distributed database command line interface of the source cluster.

5. The method according to claim 1, characterized in that, before the master node of the target cluster decompresses the data tables migrated to the target cluster using the decompression algorithm corresponding to the set compression algorithm, the method further comprises:

6. The method according to any one of claims 1-5, characterized in that the master node of the target cluster starting the target cluster based on a startup strategy comprises:

The master node of the target cluster calls a startup command to start the target cluster;

If no error log information or warning log information exists in the log file associated with the distributed database of the target cluster, the master node of the target cluster calls the overall health check component in the distributed database of the target cluster to check the overall health of the target cluster;

The master node of the target cluster calls the command line interface of the distributed database to set the state of the data tables to the enabled state.

7. The method according to claim 6, characterized in that, where the distributed database management page of the target cluster shows that a data table comprised in the distributed database of the target cluster is not in the enabled state, and the master node of the target cluster has called the command line interface of the distributed database to set the state of that data table to the enabled state, the method further comprises:

8. A cross-cluster data migration system, comprising a source cluster and a target cluster, the source cluster comprising a master node and at least one child node, and the target cluster comprising a master node and at least one child node, characterized in that:

The master node of the source cluster comprises:

The master node of the target cluster comprises:

A startup module, configured to start the target cluster based on a startup strategy.

9. The system according to claim 8, characterized in that the master node of the source cluster further comprises:

A removal module, configured to, before the first storage space size and the first total number of file blocks of the HDFS occupied by the data tables in the distributed database of the source cluster are counted, use the full file merge component of the distributed database of the source cluster to remove, from the disk storage space of the distributed database of the source cluster, the data tables that satisfy a preset removal strategy.

10. The system according to claim 9, characterized in that the preset removal strategy comprises at least one of the following: