CN104239493A - Cross-cluster data migration method and system - Google Patents

Cross-cluster data migration method and system

Info

Publication number
CN104239493A
CN104239493A CN201410455695.2A CN201410455695A CN104239493A CN 104239493 A CN104239493 A CN 104239493A CN 201410455695 A CN201410455695 A CN 201410455695A CN 104239493 A CN104239493 A CN 104239493A
Authority
CN
China
Prior art keywords
cluster
data
tables
data base
target cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410455695.2A
Other languages
Chinese (zh)
Other versions
CN104239493B (en)
Inventor
黄刚
何洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201410455695.2A
Publication of CN104239493A
Application granted
Publication of CN104239493B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24532 Query optimisation of parallel queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention provides a cross-cluster data migration method and system. All child nodes of the source cluster stop data operations and the in-memory data of the source cluster's distributed database is persisted, so that the data in that database is durable before migration. The data tables in the source cluster's distributed database are compressed, which reduces the amount of data transferred, and the compressed tables are migrated to the target cluster, which improves migration efficiency. The storage space and total number of file blocks occupied by the data tables in the source cluster before migration are then matched against the storage space and total number of file blocks occupied by the data tables in the target cluster after migration, so that the integrity of the migration can be verified from the matching result.

Description

Cross-cluster data migration method and system
Technical field
Embodiments of the present invention relate to the field of database technology, and in particular to a cross-cluster data migration method and system.
Background
With the development of Internet applications and the surge in the number of users, the volume of stored data grows exponentially. Traditional single-database storage can no longer meet the access requirements of massive data, and HDFS (Hadoop Distributed File System) and distributed databases emerged in response.
HBase (Hadoop Database) is a scalable, column-oriented distributed database. It uses HDFS as its file storage system and stores data in the form of data tables. On commodity hardware it can support large tables with billions of rows and millions of columns, and it supports random write and read operations on data of this scale. Because it offers high reliability, high scalability, random access, and support for MapReduce parallel computation, it is widely used. Hadoop is a distributed system infrastructure developed by the Apache Foundation; it lets users develop distributed programs without understanding the low-level details of distribution and make full use of the power of a cluster for high-speed computation and massive data access.
In practice, data migration is unavoidable. In particular, when an online HBase cluster needs to be taken offline, or when a data center is relocated, an urgent task of migrating massive data arises: the data tables of the old cluster must be moved to a new cluster, which continues to provide massive data access services to the accessing business side.
Existing data migration techniques usually use Hadoop's data copy component to perform a distributed copy, thereby moving the data tables of one cluster to a new cluster. After the copy completes, the relevant service processes of the new cluster are started.
These techniques have the following defects. The integrity of the data after migration cannot be guaranteed. The migration time depends strictly on the volume of data migrated, which makes it hard to control; if inter-cluster network bandwidth is limited and the data volume is large, it is difficult to complete the migration within a short migration window, that is, migration efficiency is low.
Summary of the invention
Embodiments of the present invention provide a cross-cluster data migration method and system, to ensure the integrity and efficiency of cross-cluster data migration.
In a first aspect, an embodiment of the present invention provides a cross-cluster data migration method, comprising:
the master node of the source cluster invokes a stop command to make each child node of the source cluster stop data operations;
the master node of the source cluster uses the buffer-flush component of the source cluster's distributed database to persist the data held in the memory of the distributed database to the distributed file system HDFS;
the master node of the source cluster compresses the data tables contained in the source cluster's distributed database using a set compression algorithm;
the master node of the source cluster counts the first storage space size and the first total number of file blocks of the HDFS occupied by the data tables in the source cluster's distributed database;
the master node of the source cluster migrates the data tables in the source cluster's distributed database to the distributed database of the target cluster, based on a pre-obtained mapping between the IP addresses and host names of the nodes in the target cluster;
if a data-migration-complete message returned by the web management interface of the source cluster's MapReduce process is obtained, the master node of the target cluster counts the second storage space size and the second total number of file blocks of the corresponding HDFS occupied by the data tables in the target cluster's distributed database, and matches them against the first storage space size and the first total number of file blocks;
if the match succeeds, the master node of the target cluster decompresses the data tables migrated to the target cluster using the decompression algorithm corresponding to the set compression algorithm;
the master node of the target cluster starts the target cluster based on a startup policy.
In a second aspect, an embodiment of the present invention further provides a cross-cluster data migration system comprising a source cluster and a target cluster, where the source cluster comprises a master node and at least one child node, and the target cluster comprises a master node and at least one child node;
The master node of the source cluster comprises:
a stop module, configured to invoke a stop command to make each child node of the source cluster stop data operations;
a persistence module, configured to use the buffer-flush component of the source cluster's distributed database to persist the data held in the memory of the distributed database to the distributed file system HDFS;
a compression module, configured to compress the data tables contained in the source cluster's distributed database using a set compression algorithm;
a statistics module, configured to count the first storage space size and the first total number of file blocks of the HDFS occupied by the data tables in the source cluster's distributed database;
a migration module, configured to migrate the data tables in the source cluster's distributed database to the distributed database of the target cluster, based on a pre-obtained mapping between the IP addresses and host names of the nodes in the target cluster;
The master node of the target cluster comprises:
a statistics module, configured to, if a data-migration-complete message returned by the web management interface of the source cluster's MapReduce process is obtained, count the second storage space size and the second total number of file blocks of the corresponding HDFS occupied by the data tables in the target cluster's distributed database, and match them against the first storage space size and the first total number of file blocks;
a decompression module, configured to, if the match succeeds, decompress the data tables migrated to the target cluster using the decompression algorithm corresponding to the set compression algorithm;
a startup module, configured to start the target cluster based on a startup policy.
With the cross-cluster data migration method and system provided by the embodiments of the present invention, each child node of the source cluster stops data operations and the data in the memory of the source cluster's distributed database is persisted, so the data in the source cluster's distributed database is persisted before migration. Compressing the data tables in the source cluster's distributed database reduces the amount of data transferred, and migrating the compressed tables to the target cluster improves migration efficiency. The storage space size and total number of file blocks occupied by the data tables in the source cluster's distributed database before migration are then matched against those occupied by the data tables of the target cluster after migration, so the integrity of the migration can be verified from the matching result.
Brief description of the drawings
To illustrate the present invention more clearly, the drawings used in the description are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a cross-cluster data migration method provided by Embodiment one of the present invention;
Fig. 2 is a flowchart of a cross-cluster data migration method provided by Embodiment three of the present invention;
Fig. 3a is a schematic structural diagram of the master node of the source cluster in a cross-cluster data migration system provided by Embodiment four of the present invention;
Fig. 3b is a schematic structural diagram of the master node of the target cluster in a cross-cluster data migration system provided by Embodiment four of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the technical solutions in the embodiments are described in further detail below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. The specific embodiments described here serve only to explain the invention and do not limit it; all other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the invention. Note also that, for ease of description, the drawings show only the parts related to the invention rather than the whole.
Embodiment one
Fig. 1 is a flowchart of a cross-cluster data migration method provided by Embodiment one of the present invention. The method of this embodiment applies to a cross-cluster data migration system comprising a source cluster and a target cluster, where the source cluster comprises a master node and at least one child node, and the target cluster comprises a master node and at least one child node. The master node and the child nodes of the source cluster form an HDFS, and the data tables to be migrated are stored in the source cluster; the master node and the child nodes of the target cluster likewise form an HDFS, which stores the data tables migrated from the source cluster.
The method comprises:
Step 110: the master node of the source cluster invokes a stop command to make each child node of the source cluster stop data operations;
In this step, each child node of the source cluster stops data operations so that the data in each node can be persisted before migration. Specifically, the business side corresponding to each child node can first be notified to stop data write or read operations, after which the stop command is invoked to make each child node of the source cluster stop data operations. Alternatively, the stop command can be invoked directly.
Step 120: the master node of the source cluster uses the buffer-flush component of the source cluster's distributed database to persist the data held in the memory of the distributed database to HDFS;
This step persists the data in the source cluster's distributed database.
The buffer-flush component persists the data temporarily held in the memory of the distributed database to the disks of the HDFS.
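As an illustration of step 120, the following minimal sketch generates a script for the database command-line interface. It assumes that the distributed database is HBase and that the buffer-flush component corresponds to the HBase shell's `flush` command; the patent itself does not name the command, and the table names are hypothetical.

```python
def build_flush_script(tables):
    """Return HBase shell commands that flush each table's in-memory data to HDFS.

    Assumption: the buffer-flush component is the HBase shell `flush` command.
    """
    # One `flush '<table>'` line per table, each terminated by a newline.
    return "".join(f"flush '{name}'\n" for name in tables)


if __name__ == "__main__":
    # Hypothetical table names, for illustration only.
    print(build_flush_script(["orders", "users"]), end="")
```

The generated text could then be fed to the database shell on the master node of the source cluster; how the shell is invoked is left out of the sketch.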
Step 130: the master node of the source cluster compresses the data tables contained in the source cluster's distributed database using a set compression algorithm;
This step compresses the data tables to be migrated in the source cluster. Specifically, the compression status of each data table can be checked and the tables that have not yet been compressed can be compressed, using the LZO (Lempel-Ziv-Oberhumer) algorithm, the Snappy algorithm, or another compression algorithm.
LZO is a compression algorithm with a high compression ratio and very fast decompression; it is lossless, that is, the compressed data can be reproduced exactly. Snappy is a library for compression and decompression that aims to provide very high compression speed with a reasonable compression ratio.
The distributed database uses HDFS as its file storage system and stores data in the form of data tables; on commodity hardware it can support large tables with billions of rows and millions of columns. Compressing the data tables to be migrated therefore effectively reduces the amount of data to transmit, which helps improve migration efficiency.
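The selection described in step 130 can be sketched as follows. The sketch assumes HBase, where compression is configured per column family and an uncompressed family reports `NONE`, and it emits HBase shell `alter` commands; the table and family names, and the choice of Snappy as the default codec, are assumptions made for illustration.

```python
def build_compress_commands(families_by_table, codec="SNAPPY"):
    """Return `alter` commands for column families that are not yet compressed.

    families_by_table maps table name -> {column family -> current codec}.
    """
    commands = []
    for table in sorted(families_by_table):
        for family, current in sorted(families_by_table[table].items()):
            # Only families whose current codec is NONE need to be compressed.
            if current.upper() == "NONE":
                commands.append(
                    f"alter '{table}', {{NAME => '{family}', COMPRESSION => '{codec}'}}"
                )
    return commands


if __name__ == "__main__":
    # Hypothetical compression status of two tables.
    status = {"orders": {"cf": "NONE"}, "users": {"cf": "SNAPPY"}}
    for command in build_compress_commands(status):
        print(command)
```

Tables that are already compressed are skipped, which matches the idea of checking the compression status before compressing.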
Step 140: the master node of the source cluster counts the first storage space size and the first total number of file blocks of the HDFS occupied by the data tables in the source cluster's distributed database;
In this step, the data tables contained in the source cluster's distributed database are the tables to be migrated and can be stored under the root directory of that database. Because the source cluster's distributed database uses HDFS as its file storage system, its disk storage space is defined on HDFS and is the sum of the disk storage space of the child nodes of the source cluster. The storage space that the data tables occupy within the HDFS disk storage space is the first storage space.
Because the tables to be migrated hold a very large amount of data, they are stored in a distributed manner: each table is split into multiple file blocks, and different file blocks are stored on the disks of different child nodes of the source cluster. The first total number of file blocks is the total count of file blocks corresponding to the tables to be migrated.
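The two figures counted in step 140 can be illustrated with a small calculation. This is a sketch under stated assumptions: every file is split into fixed-size blocks, the block size is 128 MiB (an assumed value, configurable in practice), and the file sizes are hypothetical. On a real cluster the figures would be read from HDFS rather than computed this way.

```python
from typing import Iterable, NamedTuple


class HdfsUsage(NamedTuple):
    total_bytes: int   # storage space size occupied by the tables
    total_blocks: int  # total number of file blocks


def summarize(file_sizes: Iterable[int], block_size: int = 128 * 1024 * 1024) -> HdfsUsage:
    """Sum file sizes and count the blocks needed to hold each file."""
    total_bytes = 0
    total_blocks = 0
    for size in file_sizes:
        total_bytes += size
        # Integer ceiling division; an empty file occupies no block.
        total_blocks += (size + block_size - 1) // block_size
    return HdfsUsage(total_bytes, total_blocks)


if __name__ == "__main__":
    # Hypothetical file sizes with a tiny block size to keep the numbers readable.
    print(summarize([0, 1, 10, 11], block_size=10))
```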
Step 150: the master node of the source cluster migrates the data tables in the source cluster's distributed database to the distributed database of the target cluster, based on a pre-obtained mapping between the IP addresses and host names of the nodes in the target cluster;
This step migrates the data tables in the source cluster's distributed database to the distributed database of the target cluster.
Note that step 110 stops only the data access service provided by the source cluster's distributed database; the HDFS service of the source cluster continues to run normally.
Note also that, using the mapping between the IP addresses and host names of the nodes in the target cluster, the master node of the source cluster can locate the HDFS of the target cluster, so the data tables in the source cluster's distributed database can be migrated to the HDFS of the target cluster. Because the target cluster's distributed database uses HDFS as its file storage system, the data tables are thereby migrated into the distributed database of the target cluster.
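A minimal sketch of step 150 follows. It assumes that the mapping is made available as hosts-file style entries and that the copy is performed with Hadoop's `distcp` tool; the patent does not name the copy tool for this step, and the host names, port, and `/hbase` root directory are hypothetical.

```python
def hosts_entries(ip_by_host):
    """Return hosts-file style lines ("<ip>\t<host>") for the target cluster nodes."""
    return [f"{ip}\t{host}" for host, ip in sorted(ip_by_host.items())]


def build_copy_command(src_namenode, dst_namenode, path="/hbase"):
    """Return an argument list for a distributed copy between two HDFS instances.

    Assumption: the copy is done with `hadoop distcp <source URI> <target URI>`.
    """
    return [
        "hadoop",
        "distcp",
        f"hdfs://{src_namenode}{path}",
        f"hdfs://{dst_namenode}{path}",
    ]


if __name__ == "__main__":
    # Hypothetical mapping obtained in advance from the target cluster.
    mapping = {"target-master": "10.0.0.1", "target-node1": "10.0.0.2"}
    print("\n".join(hosts_entries(mapping)))
    print(" ".join(build_copy_command("source-master:8020", "target-master:8020")))
```

The sketch only builds the command; running it and monitoring the resulting MapReduce job are outside its scope.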
Step 160: if a data-migration-complete message returned by the web management interface of the source cluster's MapReduce process is obtained, the master node of the target cluster counts the second storage space size and the second total number of file blocks of the corresponding HDFS occupied by the data tables in the target cluster's distributed database, and matches them against the first storage space size and the first total number of file blocks;
In this step, once the migration of the data tables is observed to be complete, the second storage space size and the second total number of file blocks of the corresponding HDFS occupied by the data tables of the target cluster's distributed database are first counted; the second storage space size is then compared with the first storage space size, and the second total number of file blocks with the first total number of file blocks.
The second storage space size and the second total number of file blocks are defined analogously to the first storage space size and the first total number of file blocks and are not described again here.
MapReduce is a general programming model for distributed parallel computing tasks, used for parallel processing of large-scale data. The progress of the migration can be monitored through the web management interface of the MapReduce process, for example the real-time migration speed, the percentage migrated, the estimated remaining time, and descriptive information about the data already migrated.
Step 170: if the match succeeds, the master node of the target cluster decompresses the data tables migrated to the target cluster using the decompression algorithm corresponding to the set compression algorithm;
In this step, a successful match means that the first storage space size of the HDFS occupied by the data tables in the source cluster's distributed database before migration equals the second storage space size of the target cluster's HDFS occupied after migration, and that the total number of file blocks before migration equals the total number of file blocks after migration. In other words, a successful match indicates that the data tables in the source cluster's distributed database have been migrated completely to the target cluster's distributed database.
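The matching rule of steps 160 and 170 reduces to two equality checks. The sketch below shows that rule only; the `MigrationStats` type and the sample figures are hypothetical and stand in for the values counted on the two master nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationStats:
    storage_bytes: int  # storage space size occupied in HDFS
    file_blocks: int    # total number of file blocks


def migration_is_complete(source: MigrationStats, target: MigrationStats) -> bool:
    """The match succeeds only if both the size and the block count are equal."""
    return (
        source.storage_bytes == target.storage_bytes
        and source.file_blocks == target.file_blocks
    )


if __name__ == "__main__":
    before = MigrationStats(storage_bytes=22, file_blocks=4)  # first size and block count
    after = MigrationStats(storage_bytes=22, file_blocks=4)   # second size and block count
    print(migration_is_complete(before, after))
```

Decompression on the target cluster would proceed only when the function returns true.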
This step decompresses the data tables migrated to the target cluster after they have been migrated completely.
Step 180: the master node of the target cluster starts the target cluster based on a startup policy.
This step starts the target cluster so that each node of the target cluster works normally.
In the technical solution of this embodiment, each child node of the source cluster stops data operations and the data in the memory of the source cluster's distributed database is persisted, so the data in the source cluster's distributed database is persisted before migration. Compressing the data tables in the source cluster's distributed database reduces the amount of data transferred, and migrating the compressed tables to the target cluster improves migration efficiency. The storage space size and total number of file blocks occupied by the data tables in the source cluster's distributed database before migration are then matched against those occupied by the data tables of the target cluster after migration, so the integrity of the migration can be verified from the matching result.
Embodiment two
On the basis of the above embodiment, this embodiment further comprises, before the master node of the source cluster counts the first storage space size and the first total number of file blocks of the HDFS occupied by the data tables in the source cluster's distributed database:
the master node of the source cluster uses the full file merge component of the source cluster's distributed database to clear, from the disk storage space of the source cluster's distributed database, the data tables that satisfy a preset clearing policy.
This step, performed after the tables to be migrated are compressed, clears invalid data tables from the disk storage space of the source cluster's distributed database, further reducing the amount of data to migrate and improving migration efficiency.
The preset clearing policy can be implemented in several ways, for example including at least one of the following:
treating data tables that carry a delete marker in the disk storage space of the source cluster's distributed database as tables to be cleared;
treating data tables that have reached their time to live in the disk storage space of the source cluster's distributed database as tables to be cleared;
Note that the time to live of a data table can be preset as required, and expired tables are eliminated according to it. On an e-commerce platform, the time to live of the relevant tables can usually be set according to the duration of a promotion, for example 30 days, 7 days, or a certain period on a specific day, such as 10 a.m. to 10 p.m. on the day of a store anniversary. When the anniversary ends, the data tables whose time to live was set for that anniversary become expired tables. Clearing expired tables helps save storage space and improve migration efficiency.
treating data tables whose maximum version number exceeds a threshold in the disk storage space of the source cluster's distributed database as tables to be cleared.
Note that the maximum version number of a data table can be preset as required and is usually set to 3. For tables that are updated relatively frequently it can be set to 1, so that stale data is eliminated quickly, which helps save storage space and improve migration efficiency.
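The three clearing policies above can be combined into a single predicate. The following is a sketch under stated assumptions: the `TableEntry` record and its field names are hypothetical, time is expressed in seconds, and an entry qualifies for clearing when any one policy applies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableEntry:
    name: str
    delete_marked: bool = False          # carries a delete marker
    created_at: float = 0.0              # creation time, in seconds
    ttl_seconds: Optional[float] = None  # time to live; None means no expiry
    version_count: int = 1               # number of stored versions


def should_clear(entry: TableEntry, now: float, max_versions: int = 3) -> bool:
    """Return True if the entry satisfies at least one clearing policy."""
    if entry.delete_marked:
        return True
    if entry.ttl_seconds is not None and now - entry.created_at >= entry.ttl_seconds:
        return True
    # Version policy: more versions stored than the configured maximum.
    return entry.version_count > max_versions


if __name__ == "__main__":
    now = 1000.0
    entries = [
        TableEntry("deleted", delete_marked=True),
        TableEntry("expired", created_at=0.0, ttl_seconds=500.0),
        TableEntry("fresh", created_at=900.0, ttl_seconds=500.0),
        TableEntry("too_many_versions", version_count=4),
    ]
    print([e.name for e in entries if should_clear(e, now)])
```

The default of 3 for `max_versions` mirrors the usual setting mentioned above; passing 1 models the frequently updated case.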
In the technical solution of this embodiment, after the data in the source cluster's distributed database is persisted before migration, the tables to be migrated are compressed, which reduces the amount of data transferred. Clearing the invalid data tables from the disk storage space of the source cluster's distributed database reduces the amount of data to migrate further. The tables that have been compressed and cleaned are then migrated to the target cluster, which improves migration efficiency. The storage space size and total number of file blocks occupied by the data tables in the source cluster's distributed database before migration are matched against those occupied by the data tables of the target cluster after migration, so the integrity of the migration can be verified from the matching result.
In the above scheme, the buffer-flush component and the full file merge component can be triggered by invoking the command-line interface of the source cluster's distributed database.
The command-line interface is the interactive interface between the operating system and the user. In the Linux operating system the command-line interface is called the shell; its main role is to serve the user, for example by receiving input from the keyboard or displaying execution results on the screen.
In the above scheme, before the master node of the target cluster decompresses the data tables migrated to the target cluster using the decompression algorithm corresponding to the set compression algorithm, the method preferably further comprises:
the master node of the target cluster invokes the consistency check component in the target cluster to check the consistency of the data tables contained in the target cluster's distributed database;
if they are consistent, the master node of the target cluster is triggered to decompress the data tables migrated to the target cluster using the decompression algorithm corresponding to the set compression algorithm.
Note that checking the consistency of a data table means checking whether the table's descriptive information is consistent with the attribute information of the table that actually exists in the HDFS of the target cluster. If they are consistent, the master node of the target cluster is triggered to decompress the data tables migrated to the target cluster using the decompression algorithm corresponding to the set compression algorithm; if they are inconsistent, the consistency check component is used to repair them. This step is a supplementary check performed after the integrity verification of the migration and improves the consistency of the data tables migrated to the target cluster.
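The consistency check described above compares two views of the same tables. The sketch below illustrates only that comparison, with both views given as hypothetical dictionaries; it does not reproduce any real consistency check component, and the attribute layout is an assumption.

```python
def find_inconsistencies(described, actual):
    """Compare table descriptions with the attributes found in HDFS.

    Both arguments map table name -> attribute dictionary.
    Returns a sorted list of (table, problem) pairs; empty means consistent.
    """
    problems = []
    for table in sorted(set(described) | set(actual)):
        if table not in actual:
            problems.append((table, "missing in HDFS"))
        elif table not in described:
            problems.append((table, "missing in description"))
        elif described[table] != actual[table]:
            problems.append((table, "attribute mismatch"))
    return problems


if __name__ == "__main__":
    # Hypothetical descriptions and on-disk attributes.
    described = {"orders": {"families": ["cf"]}, "users": {"families": ["cf"]}}
    actual = {"orders": {"families": ["cf"]}, "users": {"families": ["info"]}}
    print(find_inconsistencies(described, actual))
```

An empty result would allow decompression to start; a non-empty result would be handed to the repair path.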
Embodiment three
Fig. 2 is a flowchart of a cross-cluster data migration method provided by Embodiment three of the present invention. Building on the foregoing embodiments, this embodiment provides a preferred scheme in which the master node of the target cluster starts the target cluster based on a start-up strategy. The preferred method comprises:
Step 210: the master node of the target cluster calls a start command to start the target cluster;
Step 220: if no error log information or warning log information exists in the log files associated with the distributed database of the target cluster, the master node of the target cluster calls the overall health check component in the distributed database of the target cluster to check the overall health of the target cluster;
Specifically, this step checks the log files associated with the distributed database on the nodes included in the target cluster. If error log information or warning log information exists, the relevant component of the distributed database of the target cluster is called, as prompted, to resolve the problem; if none exists, the health check component of the distributed database of the target cluster is used to check the overall health of the target cluster.
Checking the overall health of the target cluster includes checking whether the data tables of the target cluster are in a normal state.
Step 230: the master node of the target cluster calls the command line interface of the distributed database to set the state of the data tables to the enabled state.
Specifically, based on the result of the overall health check of the target cluster in step 220, this step keeps the data tables in the target cluster in the normal "enable" state.
In the technical scheme of this embodiment, after the target cluster is started, the log files associated with the distributed database on the nodes included in the target cluster are checked. If error log information or warning log information exists, the relevant component of the distributed database of the target cluster is called, as prompted, to resolve the problem; if none exists, the health check component of the distributed database of the target cluster is used to check the overall health of the target cluster. Based on the result of that check, the data tables in the target cluster are kept in the normal "enable" state, so that they can provide normal access service.
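Steps 220 and 230 can be sketched in Python as follows. This is only an illustration under stated assumptions: the log format (a severity keyword such as ERROR or WARN appearing as a standalone word) is assumed, and `check_health` and `enable_table` are hypothetical callables standing in for the health check component and the command line interface of the distributed database.

```python
import re

# Assumed log format: the severity appears as a standalone word.
_PROBLEM = re.compile(r"\b(ERROR|WARN|WARNING)\b")


def problem_lines(log_lines):
    """Return the log lines that carry error or warning information."""
    return [line for line in log_lines if _PROBLEM.search(line)]


def start_target_cluster(log_lines, check_health, enable_table):
    """Run the checks that follow the start command (steps 220 and 230).

    check_health() is a hypothetical callable returning {table_name: state};
    enable_table(name) is a hypothetical callable that enables one table.
    """
    problems = problem_lines(log_lines)
    if problems:
        # The text resolves these with the relevant database component first.
        return {"status": "fix_logs", "problems": problems, "enabled": []}

    enabled = []
    for name, state in sorted(check_health().items()):
        if state != "enabled":
            enable_table(name)
            enabled.append(name)
    return {"status": "ready", "problems": [], "enabled": enabled}
```

The health check is skipped while the logs still contain errors or warnings, which mirrors the condition stated in step 220.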
In the above scheme, after the step in which, if the distributed database management page of the target cluster shows that a data table included in the distributed database of the target cluster is not in the enabled state, the master node of the target cluster calls the command line interface of the distributed database to set the state of the data table to the enabled state, the method preferably further comprises:
Sending the mapping between the IP addresses and host names of the nodes included in the target cluster, together with the connection information of the distributed database of the target cluster, to the business party, and notifying the business party to verify the data service of the target cluster.
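As a rough illustration of the information handed to the business party, the following Python sketch assembles the IP-to-host-name mapping and the connection information into a single message. The message layout, the function name `build_handover_message` and the field names are assumptions made for this example; the text does not specify any format.

```python
import ipaddress


def build_handover_message(host_map, connection_info):
    """Assemble the data sent to the business party.

    host_map maps IP address strings to host names; connection_info is an
    opaque, hypothetical connection string for the target database.
    """
    # Sort numerically by address so that 10.0.0.2 precedes 10.0.0.10.
    ordered = sorted(host_map, key=ipaddress.ip_address)
    hosts = "\n".join(f"{ip}\t{host_map[ip]}" for ip in ordered)
    return {
        "hosts": hosts,
        "connection": connection_info,
        "action": "verify_data_service",
    }
```

Parsing each address with `ipaddress.ip_address` also rejects malformed entries early, because an invalid address raises `ValueError`.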
Embodiment four
Refer to Fig. 3a and Fig. 3b. Embodiment four of the present invention provides a cross-cluster data migration system comprising a source cluster and a target cluster. The source cluster comprises a master node and at least one child node, and the target cluster comprises a master node and at least one child node.
The master node of the source cluster comprises: a stopping module 310, a persistence module 320, a compression module 330, a statistics module 340 and a migration module 350.
The master node of the target cluster comprises: a statistics module 360, a decompression module 370 and a start module 380.
The stopping module 310 is configured to call a stop command to make each child node of the source cluster stop data operations. The persistence module 320 is configured to use the buffer flush component of the distributed database of the source cluster to persist the data in the memory of the distributed database to HDFS. The compression module 330 is configured to compress the data tables included in the distributed database of the source cluster using a set compression algorithm. The statistics module 340 is configured to count the first storage space size and the first total file block count of the HDFS occupied by the data tables in the distributed database of the source cluster. The migration module 350 is configured to migrate the data tables in the distributed database of the source cluster into the distributed database of the target cluster, based on a pre-obtained mapping between the IP addresses and host names of the nodes included in the target cluster.
The statistics module 360 is configured to, when a data-migration-complete message returned by the web management interface of the MapReduce process of the source cluster is obtained, count the second storage space size and the second total file block count of the corresponding HDFS occupied by the data tables in the distributed database of the target cluster, and to match the second storage space size and the second total file block count against the first storage space size and the first total file block count. The decompression module 370 is configured to, if the match succeeds, decompress the data tables migrated to the target cluster using the decompression algorithm corresponding to the set compression algorithm. The start module 380 is configured to start the target cluster based on a start-up strategy.
In the technical scheme of this embodiment, making each child node of the source cluster stop data operations and persisting the data in the memory of the distributed database of the source cluster ensures that the data in the distributed database of the source cluster is persisted before migration. Compressing the data tables in the distributed database of the source cluster reduces the volume of data transferred, and migrating the compressed data tables to the target cluster improves migration efficiency. Matching the storage space size and total file block count occupied by the data tables in the distributed database of the source cluster before migration against those occupied by the data tables of the target cluster after migration then allows the integrity of the migration to be verified from the matching result.
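The integrity verification performed by the statistics module of the target cluster amounts to comparing two pairs of numbers. A minimal Python sketch follows; the type `TableStats` and the functions `mismatches` and `may_decompress` are hypothetical names introduced for illustration, and how the figures are actually collected from HDFS is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableStats:
    """HDFS usage of the migrated data tables on one cluster."""
    size_bytes: int    # storage space size occupied in HDFS
    total_blocks: int  # total file block count


def mismatches(source, target):
    """Return human-readable differences; an empty list means the match succeeded."""
    found = []
    if source.size_bytes != target.size_bytes:
        found.append(f"size: {source.size_bytes} != {target.size_bytes}")
    if source.total_blocks != target.total_blocks:
        found.append(f"blocks: {source.total_blocks} != {target.total_blocks}")
    return found


def may_decompress(source, target):
    """Decompression on the target cluster proceeds only when both figures match."""
    return not mismatches(source, target)
```

Both figures must agree: equal sizes with a different block count, or the reverse, are treated as a failed match.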
In the above scheme, the master node of the source cluster preferably further comprises a cleanup module, configured to, before the first storage space size and the first total file block count of the HDFS occupied by the data tables in the distributed database of the source cluster are counted, use the full file merge component of the distributed database of the source cluster to remove, from the disk storage space of the distributed database of the source cluster, the data tables that satisfy a preset cleanup strategy.
In the above scheme, the preset cleanup strategy comprises at least one of the following:
Taking data tables carrying a deletion mark in the disk storage space of the distributed database of the source cluster as data tables to be cleaned;
Taking data tables that have reached their time to live in the disk storage space of the distributed database of the source cluster as data tables to be cleaned;
Taking data tables whose maximum version number is greater than a threshold in the disk storage space of the distributed database of the source cluster as data tables to be cleaned.
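The three conditions of the preset cleanup strategy can be expressed as a single predicate. The Python sketch below is an illustration only: the record type `StoredTable`, its field names and the function `should_clean` are hypothetical, and the actual removal is performed by the full file merge component rather than by code like this.

```python
from dataclasses import dataclass


@dataclass
class StoredTable:
    """Hypothetical summary of one data table on disk."""
    has_delete_marker: bool  # carries a deletion mark
    age_seconds: int         # time elapsed since the data was written
    ttl_seconds: int         # configured time to live
    max_version: int         # maximum version number


def should_clean(table, version_threshold):
    """Return True when at least one cleanup condition holds."""
    return (
        table.has_delete_marker
        or table.age_seconds >= table.ttl_seconds
        or table.max_version > version_threshold
    )
```

Because the strategy is "at least one of", the conditions are combined with a logical OR; a deployment that enables only some of them would drop the corresponding terms.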
In the above scheme, the persistence module 320 is specifically configured to trigger the buffer flush component by calling the distributed database command line interface of the source cluster, so as to persist the data in the memory of the distributed database to the distributed file system HDFS. The cleanup module is specifically configured to, before the first storage space size and the first total file block count of the HDFS occupied by the data tables in the distributed database of the source cluster are counted, trigger the full file merge component by calling the distributed database command line interface of the source cluster, so as to remove, from the disk storage space of the distributed database of the source cluster, the data tables that satisfy the preset cleanup strategy.
In the above scheme, the master node of the target cluster preferably further comprises a consistency detection module, configured to, before the data tables migrated to the target cluster are decompressed using the decompression algorithm corresponding to the set compression algorithm, call the consistency detection component in the target cluster to detect the consistency of the data tables included in the distributed database of the target cluster; if they are consistent, the master node of the target cluster is triggered to decompress the data tables migrated to the target cluster using the decompression algorithm corresponding to the set compression algorithm.
In the above scheme, the start module 390 preferably comprises: a start unit, an overall health detection unit and a data table state setting unit.
The start unit is configured to call a start command to start the target cluster. The overall health detection unit is configured to, if no error log information or warning log information exists in the log files associated with the distributed database of the target cluster, call the overall health check component in the distributed database of the target cluster to check the overall health of the target cluster. The data table state setting unit is configured to call the command line interface of the distributed database to set the state of the data tables to the enabled state.
In the above scheme, the start module 390 may further comprise a data service verification unit, configured to, after the step in which, if the distributed database management page of the target cluster shows that a data table included in the distributed database of the target cluster is not in the enabled state, the command line interface of the distributed database is called to set the state of the data table to the enabled state, send the mapping between the IP addresses and host names of the nodes included in the target cluster, together with the connection information of the distributed database of the target cluster, to the business party, and notify the business party to verify the data service of the target cluster.
The master node of the source cluster and the master node of the target cluster in the cross-cluster data migration system provided by the embodiments of the present invention can perform the cross-cluster data migration method provided by any embodiment of the present invention, and have the functional modules and beneficial effects corresponding to the method performed.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical scheme of the present invention, not to limit it; the preferred implementations within the embodiments are likewise not limiting. To those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A cross-cluster data migration method, characterized by comprising:
A master node of a source cluster calls a stop command to make each child node of the source cluster stop data operations;
The master node of the source cluster uses a buffer flush component of the distributed database of the source cluster to persist the data in the memory of the distributed database to the distributed file system HDFS;
The master node of the source cluster controls compression of the data tables included in the distributed database of the source cluster using a set compression algorithm;
The master node of the source cluster counts a first storage space size and a first total file block count of the HDFS occupied by the data tables in the distributed database of the source cluster;
The master node of the source cluster migrates the data tables in the distributed database of the source cluster into the distributed database of a target cluster, based on a pre-obtained mapping between the IP addresses and host names of the nodes included in the target cluster;
If a data-migration-complete message returned by the web management interface of the MapReduce process of the source cluster is obtained, a master node of the target cluster counts a second storage space size and a second total file block count of the corresponding HDFS occupied by the data tables in the distributed database of the target cluster, and matches the second storage space size and the second total file block count against the first storage space size and the first total file block count;
If the match succeeds, the master node of the target cluster decompresses the data tables migrated to the target cluster using the decompression algorithm corresponding to the set compression algorithm;
The master node of the target cluster starts the target cluster based on a start-up strategy.
2. The method according to claim 1, characterized in that, before the master node of the source cluster counts the first storage space size and the first total file block count of the HDFS occupied by the data tables in the distributed database of the source cluster, the method further comprises:
The master node of the source cluster uses a full file merge component of the distributed database of the source cluster to remove, from the disk storage space of the distributed database of the source cluster, the data tables that satisfy a preset cleanup strategy.
3. The method according to claim 2, characterized in that the preset cleanup strategy comprises at least one of the following:
Taking data tables carrying a deletion mark in the disk storage space of the distributed database of the source cluster as data tables to be cleaned;
Taking data tables that have reached their time to live in the disk storage space of the distributed database of the source cluster as data tables to be cleaned;
Taking data tables whose maximum version number is greater than a threshold in the disk storage space of the distributed database of the source cluster as data tables to be cleaned.
4. The method according to claim 2, characterized in that the buffer flush component and the full file merge component are triggered by calling the distributed database command line interface of the source cluster.
5. The method according to claim 1, characterized in that, before the master node of the target cluster decompresses the data tables migrated to the target cluster using the decompression algorithm corresponding to the set compression algorithm, the method further comprises:
The master node of the target cluster calls a consistency detection component in the target cluster to detect the consistency of the data tables included in the distributed database of the target cluster;
If they are consistent, the master node of the target cluster is triggered to decompress the data tables migrated to the target cluster using the decompression algorithm corresponding to the set compression algorithm.
6. The method according to any one of claims 1 to 5, characterized in that the master node of the target cluster starting the target cluster based on a start-up strategy comprises:
The master node of the target cluster calls a start command to start the target cluster;
If no error log information or warning log information exists in the log files associated with the distributed database of the target cluster, the master node of the target cluster calls the overall health check component in the distributed database of the target cluster to check the overall health of the target cluster;
The master node of the target cluster calls the command line interface of the distributed database to set the state of the data tables to the enabled state.
7. The method according to claim 6, characterized in that, after the step in which, if the distributed database management page of the target cluster shows that a data table included in the distributed database of the target cluster is not in the enabled state, the master node of the target cluster calls the command line interface of the distributed database to set the state of the data table to the enabled state, the method further comprises:
Sending the mapping between the IP addresses and host names of the nodes included in the target cluster, together with the connection information of the distributed database of the target cluster, to the business party, and notifying the business party to verify the data service of the target cluster.
8. A cross-cluster data migration system, comprising a source cluster and a target cluster, the source cluster comprising a master node and at least one child node, and the target cluster comprising a master node and at least one child node, characterized in that:
The master node of the source cluster comprises:
A stopping module, configured to call a stop command to make each child node of the source cluster stop data operations;
A persistence module, configured to use a buffer flush component of the distributed database of the source cluster to persist the data in the memory of the distributed database to the distributed file system HDFS;
A compression module, configured to compress the data tables included in the distributed database of the source cluster using a set compression algorithm;
A statistics module, configured to count a first storage space size and a first total file block count of the HDFS occupied by the data tables in the distributed database of the source cluster;
A migration module, configured to migrate the data tables in the distributed database of the source cluster into the distributed database of the target cluster, based on a pre-obtained mapping between the IP addresses and host names of the nodes included in the target cluster;
The master node of the target cluster comprises:
A statistics module, configured to, if a data-migration-complete message returned by the web management interface of the MapReduce process of the source cluster is obtained, count a second storage space size and a second total file block count of the corresponding HDFS occupied by the data tables in the distributed database of the target cluster, and match the second storage space size and the second total file block count against the first storage space size and the first total file block count;
A decompression module, configured to, if the match succeeds, decompress the data tables migrated to the target cluster using the decompression algorithm corresponding to the set compression algorithm;
A start module, configured to start the target cluster based on a start-up strategy.
9. The system according to claim 8, characterized in that the master node of the source cluster further comprises:
A cleanup module, configured to, before the first storage space size and the first total file block count of the HDFS occupied by the data tables in the distributed database of the source cluster are counted, use a full file merge component of the distributed database of the source cluster to remove, from the disk storage space of the distributed database of the source cluster, the data tables that satisfy a preset cleanup strategy.
10. The system according to claim 9, characterized in that the preset cleanup strategy comprises at least one of the following:
Taking data tables carrying a deletion mark in the disk storage space of the distributed database of the source cluster as data tables to be cleaned;
Taking data tables that have reached their time to live in the disk storage space of the distributed database of the source cluster as data tables to be cleaned;
Taking data tables whose maximum version number is greater than a threshold in the disk storage space of the distributed database of the source cluster as data tables to be cleaned.
CN201410455695.2A 2014-09-09 2014-09-09 cross-cluster data migration method and system Active CN104239493B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410455695.2A CN104239493B (en) 2014-09-09 2014-09-09 cross-cluster data migration method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410455695.2A CN104239493B (en) 2014-09-09 2014-09-09 cross-cluster data migration method and system

Publications (2)

Publication Number Publication Date
CN104239493A true CN104239493A (en) 2014-12-24
CN104239493B CN104239493B (en) 2017-05-10

Family

ID=52227552

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410455695.2A Active CN104239493B (en) 2014-09-09 2014-09-09 cross-cluster data migration method and system

Country Status (1)

Country Link
CN (1) CN104239493B (en)

Also Published As

Publication number Publication date
CN104239493B (en) 2017-05-10

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant