CN105718507A - Data migration method and device - Google Patents

Info

Publication number
CN105718507A
Authority
CN
China
Prior art keywords
data
thread
hdfs
cluster
migration
Prior art date
Application number
CN201610007991.5A
Other languages
Chinese (zh)
Inventor
郑振峰
Original Assignee
杭州数梦工场科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 杭州数梦工场科技有限公司
Priority to CN201610007991.5A
Publication of CN105718507A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21: Design, administration or maintenance of databases
    • G06F16/214: Database migration support

Abstract

The invention provides a data migration method and device. The method comprises the following steps: a data source of a first cluster is loaded through a first class loader; a data source of a second cluster is loaded through a second class loader; the first class loader and the second class loader both inherit the loader of a data migration tool; in the data migration tool, a first thread reads data from the data source of the first cluster through the first class loader and puts the data into a data queue; and in the data migration tool, a second thread writes the data in the data queue into the data source of the second cluster through the second class loader. The data migration method and device improve the efficiency of data migration across Hadoop clusters.

Description

A data migration method and apparatus

Technical Field

[0001] The present disclosure relates to computer technology, and in particular to a data migration method and apparatus.

Background

[0002] Hadoop is a software framework for distributed processing of large amounts of data and a distributed computing platform that is easy for users to set up and use. Users can readily develop and run applications that process massive data sets on Hadoop, and it is widely used in big data processing. As demand for big data applications has grown, Hadoop has gone through a series of version changes to remove the technical bottlenecks created by changing requirements. However, Hadoop versions are generally not compatible with one another, so data migration is an indispensable operation during a version upgrade. For example, data migration between HIVE (a data warehouse tool based on Hadoop) and HDFS (Hadoop Distributed File System) is a frequently encountered scenario.

[0003] In the scenario of cross-cluster data migration between HIVE and HDFS, the current approach is to use Hadoop's own tools to export the data of one Hadoop cluster to local storage and then import it into the HDFS or HIVE of another Hadoop cluster. With this approach, the user must perform two operations to complete the migration, and the whole process amounts to two full data migrations, which lowers migration efficiency. In addition, the approach requires exporting the data to local storage, which occupies local disk space, and the disk I/O involved is time-consuming, which further lowers migration efficiency.

Summary

[0004] In view of this, the present disclosure provides a data migration method and apparatus to improve the efficiency of data migration across Hadoop clusters.

[0005] Specifically, the present disclosure is implemented through the following technical solutions:

[0006] In a first aspect, a data migration method is provided. The data migration method is executed by a data migration tool, and the method includes:

[0007] loading a data source of a first cluster through a first class loader, and loading a data source of a second cluster through a second class loader, where the first class loader and the second class loader both inherit the loader of the data migration tool;

[0008] in the data migration tool, a first thread reading data from the data source of the first cluster through the first class loader and putting the data into a data queue;

[0009] in the data migration tool, a second thread writing the data in the data queue into the data source of the second cluster through the second class loader.

[0010] In a second aspect, a data migration apparatus is provided, including:

[0011] a loader configuration module, configured to load a data source of a first cluster through a first class loader and to load a data source of a second cluster through a second class loader, where the first class loader and the second class loader both inherit the loader of the data migration tool;

[0012] a data reading module, configured so that, in the data migration tool, a first thread reads data from the data source of the first cluster through the first class loader and puts the data into a data queue;

[0013] a data writing module, configured so that, in the data migration tool, a second thread writes the data in the data queue into the data source of the second cluster through the second class loader.

[0014] In the data migration method and apparatus of the embodiments of the present disclosure, the first class loader and the second class loader respectively load the data directories of the two clusters. Data read by the first class loader from the data directory of the first cluster can be obtained by the second class loader and written to the data directory of the second cluster, so that data migration between the two clusters is achieved within the JVM. Compared with migrating the data in two separate operations, this improves the efficiency of data migration across Hadoop clusters.

Brief Description of the Drawings

[0015] FIG. 1 is a schematic diagram of the principle of a data migration method provided by an embodiment of the present disclosure;

[0016] FIG. 2 is a flowchart of a data migration method provided by an embodiment of the present disclosure;

[0017] FIG. 3 is a schematic diagram of a data migration process provided by an embodiment of the present disclosure;

[0018] FIG. 4 is a schematic diagram of another data migration process provided by an embodiment of the present disclosure;

[0019] FIG. 5 is a schematic diagram of yet another data migration process provided by an embodiment of the present disclosure;

[0020] FIG. 6 is a schematic structural diagram of a data migration apparatus provided by an embodiment of the present disclosure.

Detailed Description

[0021] To overcome the low efficiency of current data migration, in which two migration tools must be operated twice to complete a migration, the data migration method provided by the embodiments of the present application performs the migration within a single data migration tool, thereby improving the efficiency of cross-cluster data migration.

[0022] FIG. 1 illustrates the principle of the data migration method of the present application. As shown in FIG. 1, assume that the first cluster 11 and the second cluster 12 are two different Hadoop clusters. For example, the first cluster 11 may be CDH 5.2.0 and the second cluster 12 may be Hadoop 2.6.0; or the first cluster 11 may be Hadoop 1.x and the second cluster 12 Hadoop 2.x; or the first cluster 11 may be HDP and the second cluster 12 Hadoop; and so on, without listing every case. In these examples, data migration between the first cluster 11 and the second cluster 12 is cross-cluster data migration.

[0023] The cross-cluster data migration scenario may be data migration between HIVE and HDFS. For example, data in the HIVE of CDH 5.2.0 may be migrated to the HDFS of Hadoop 2.6.0, or data in the HDFS of Hadoop 2.6.0 may be migrated to the HIVE of CDH 5.2.0. That is, the migration may run in either direction between HIVE and HDFS.

[0024] Referring again to FIG. 1, the present application provides a data migration tool 13, within which the cross-cluster data migration process is carried out. For example, the data migration tool 13 may be a JVM (Java Virtual Machine). In the JVM, two class loaders can be created according to the parent delegation mechanism of class loaders, in order to operate on the data of the two clusters. FIG. 2 illustrates the flow of the data migration method:

[0025] In step 201, the data source of the first cluster is loaded through the first class loader, and the data source of the second cluster is loaded through the second class loader; the first class loader and the second class loader both inherit the loader of the data migration tool.

[0026] For example, the JVM may first be initialized, loading the Jar packages required by the JVM tool itself.

[0027] In this step, two class loaders can be created. With reference to the example of FIG. 1, one is called the first class loader 14 and the other the second class loader 15.

[0028] The first class loader 14 inherits the loader of the current JVM. For example, when the first cluster 11 is CDH 5.2.0, the first class loader 14 may be named CDH 5.2.0 Class Loader. The first class loader 14 is used to load the data source of the first cluster 11, for example, the directory where the CDH Lib is located, including the HIVE JDBC driver Jar.

[0029] The second class loader 15 likewise inherits the loader of the JVM. For example, when the second cluster 12 is Hadoop 2.6.0, the second class loader 15 may be named Hadoop 2.6.0 Class Loader. The second class loader 15 is used to load the data source of the second cluster 12, for example, the directory where the Hadoop Lib is located. The first class loader 14 and the second class loader 15 thus inherit the same loader, which is the parent delegation mechanism.
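As an illustration of step 201, the following is a minimal sketch of two sibling class loaders that share one parent, written with the standard `java.net.URLClassLoader`. The class name `SiblingLoaders`, the method `forLibDir`, and the `/opt/...` library paths are assumptions made for this example and do not come from the patent; the patent does not state which class loader implementation is used.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Sketch: one class loader per cluster library directory, both with the tool's loader as parent. */
final class SiblingLoaders {

    /** Collects every *.jar in libDir (if the directory exists) into a loader that delegates to parent first. */
    static URLClassLoader forLibDir(ClassLoader parent, Path libDir) {
        List<URL> urls = new ArrayList<>();
        if (Files.isDirectory(libDir)) {
            try (DirectoryStream<Path> jars = Files.newDirectoryStream(libDir, "*.jar")) {
                for (Path jar : jars) {
                    urls.add(jar.toUri().toURL());
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return new URLClassLoader(urls.toArray(new URL[0]), parent);
    }

    public static void main(String[] args) {
        // The loader of the migration tool itself acts as the common parent.
        ClassLoader toolLoader = SiblingLoaders.class.getClassLoader();
        // Hypothetical locations; substitute the real CDH and Hadoop lib directories.
        URLClassLoader cdhLoader = forLibDir(toolLoader, Paths.get("/opt/cdh-5.2.0/lib"));
        URLClassLoader hadoopLoader = forLibDir(toolLoader, Paths.get("/opt/hadoop-2.6.0/lib"));
        System.out.println(cdhLoader.getParent() == hadoopLoader.getParent());
    }
}
```

Because each loader sees only its own jars plus whatever the shared parent provides, two incompatible versions of the same Hadoop class can coexist in one JVM, which is the property the method relies on.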

[0030] In step 202, in the data migration tool, the first thread reads data from the data source of the first cluster through the first class loader and puts the data into the data queue.

[0031] For example, a new thread can be created in the JVM; this thread may be called the first thread. The first thread is used to read data from the first cluster. The Context Class Loader of the first thread can be set to the CDH 5.2.0 Class Loader, so that the first thread can read the data in the first cluster through the first class loader, the CDH 5.2.0 Class Loader.

[0032] In this step, the data read by the first thread can be put into a data queue. FIG. 1 shows one such data queue 16, into which the data read by the first thread is placed.

[0033] In step 203, in the data migration tool, the second thread writes the data in the data queue into the data source of the second cluster through the second class loader.

[0034] For example, another new thread can be created in the JVM; this thread may be called the second thread. The second thread is used to write data to the second cluster, and its Context Class Loader can be set to the Hadoop 2.6.0 Class Loader. In this step, the data that the first thread put into data queue 16 can be read out via the Hadoop 2.6.0 Class Loader and written to the second cluster.
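Steps 202 and 203 can be sketched as a producer and a consumer that share a bounded queue, each thread carrying its own context class loader. This is a simplified sketch under stated assumptions: the class `QueueMigration`, the method `migrate`, the queue capacity of 1024, and the end-of-data sentinel are all invented for illustration, and in-memory lists stand in for the real cluster data sources.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Sketch: a reader thread and a writer thread linked by a data queue inside one JVM. */
final class QueueMigration {
    // Distinct object used only to signal end of data; compared by identity, never by value.
    private static final String END_OF_DATA = new String("END_OF_DATA");

    static List<String> migrate(List<String> sourceRows, ClassLoader readLoader, ClassLoader writeLoader) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(1024);
        List<String> written = new ArrayList<>(); // stand-in for the second cluster

        Thread reader = new Thread(() -> {
            try {
                for (String row : sourceRows) {
                    queue.put(row); // blocks when the queue is full
                }
                queue.put(END_OF_DATA);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "first-thread");
        // Libraries called from this thread resolve classes through the first cluster's loader.
        reader.setContextClassLoader(readLoader);

        Thread writer = new Thread(() -> {
            try {
                for (String row = queue.take(); row != END_OF_DATA; row = queue.take()) {
                    written.add(row);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "second-thread");
        writer.setContextClassLoader(writeLoader);

        reader.start();
        writer.start();
        try {
            reader.join();
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("migration interrupted", e);
        }
        return written;
    }
}
```

The bounded queue keeps memory use limited when the reader is faster than the writer, since `put` blocks until the writer catches up.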

[0035] In the data migration method of this example, the first class loader and the second class loader respectively load the data directories of the two clusters, for example, the directory where the Hadoop Lib is located. A data directory corresponds to the location where cluster data is stored and may also be called a data source in this example. Data read by the first class loader from the data directory of the first cluster can be obtained by the second class loader and written to the data directory of the second cluster, so that data migration between the two clusters is achieved within the one JVM; compared with migrating the data in two separate operations, this improves migration efficiency. The user only needs to provide the LIB matching the version of each of the two clusters, and the scheme above then completes data migration between the HIVE and HDFS of the two clusters in a single process, shielding the implementation differences between cluster versions.

[0036] In addition, when the second thread writes data to the second cluster in step 203, the data may also be cleaned; for example, the data may be cleaned and filtered according to user-specified rules before being written. Moreover, the data in transit is all held within the same JVM and may be cached temporarily in memory, so there are no disk I/O operations, which greatly improves migration efficiency. While the data is cached in memory, it is cleaned and filtered within its in-memory lifetime and then written to the second cluster.
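The in-memory cleaning step might look like the sketch below, in which a batch taken from the queue is filtered by a user-supplied rule before it is written. The class `BatchCleaner`, the method `clean`, and the representation of a rule as a `Predicate<String>` are assumptions for illustration; the patent does not specify how rules are expressed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Sketch: filter an in-memory batch with a user-specified rule before writing it out. */
final class BatchCleaner {

    /** Returns the rows that satisfy keepRule, preserving their original order. */
    static List<String> clean(List<String> batch, Predicate<String> keepRule) {
        List<String> kept = new ArrayList<>();
        for (String row : batch) {
            if (keepRule.test(row)) {
                kept.add(row);
            }
        }
        return kept;
    }
}
```

For instance, passing `row -> !row.isEmpty()` as the rule would drop blank rows while the batch is still in memory, so nothing has to be staged on disk.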

[0037] The data migration method provided by the present application can be applied to migration from HIVE to HDFS as well as to migration from HDFS to HIVE. The following examples describe in detail how the method is used to migrate data between HIVE and HDFS.

[0038] Migrating HIVE data to HDFS

[0039] In this example, the first cluster 11 is, for example, CDH 5.2.0, and the data source of the first cluster is HIVE, that is, CDH 5.2.0 Hive, as shown in FIG. 3. The second cluster 12 is, for example, Hadoop 2.6.0, and the data source of the second cluster is HDFS, that is, Hadoop 2.6.0 HDFS Cluster. This example migrates the data in CDH 5.2.0 Hive to Hadoop 2.6.0 HDFS Cluster.

[0040] Referring again to FIG. 3, the migration from CDH 5.2.0 Hive to Hadoop 2.6.0 HDFS Cluster can be performed concurrently to further improve migration efficiency. To migrate concurrently, the data to be migrated must be split. For example, the data in CDH 5.2.0 Hive is divided into multiple data slices; the n data slices slice1, slice2, ..., slice n in FIG. 3 are migrated in parallel.

[0041] In this example, multiple first threads can be created in the JVM, each of which reads one of the data slices. For example, one first thread reads slice1, another first thread reads slice2, and yet another first thread reads slice n. These first threads form a first thread pool. In a specific implementation, a first thread can connect to HIVE through a HIVE interface. The HIVE interface may be a JDBC (Java Data Base Connectivity) interface; JDBC is a Java API for executing SQL statements that provides uniform access to a variety of relational databases. The first thread can create a JDBC Driver instance to connect to HIVE and use a Statement to execute a select statement that reads the data in HIVE, for example, submitting every 5000 rows of data to the data queue.
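The "submit every 5000 rows" behavior of a first thread can be sketched as a small batching helper. To keep the example runnable without a Hive server, it consumes a plain `Iterator` rather than a JDBC `ResultSet`; wrapping a real `ResultSet` in such an iterator is left out and would be an additional assumption. The names `RowBatcher` and `forEachBatch` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/** Sketch: group rows into fixed-size batches and hand each batch to a sink such as a data queue. */
final class RowBatcher {

    static <T> void forEachBatch(Iterator<T> rows, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<>(batchSize);
        while (rows.hasNext()) {
            batch.add(rows.next());
            if (batch.size() == batchSize) {
                sink.accept(batch);
                batch = new ArrayList<>(batchSize); // start a fresh list; the sink owns the old one
            }
        }
        if (!batch.isEmpty()) {
            sink.accept(batch); // final partial batch
        }
    }
}
```

With a batch size of 5000 and a sink that calls `queue.put(batch)`, each queue entry would carry up to 5000 rows, which reduces per-row queue overhead.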

[0042] Referring again to FIG. 3, a second thread pool is also created in the JVM. The second thread pool includes multiple second threads, each corresponding to one first thread and used to connect to HDFS through an HDFS interface and write the data slice read by that first thread into one HDFS file. For example, the HDFS interface is the FileSystem interface; the FileSystem Java API can be used to operate on HDFS, performing reads, writes, deletes, and other operations on data in HDFS. The second thread can create an HDFS FileSystem, create an OutputStream instance, and write the data of one slice into one corresponding HDFS SubFile. For example, the data of slice1 is written into the corresponding HDFS SubFile1, and the data of slice2 is written into the corresponding HDFS SubFile2. In addition, in this example the connections to the HIVE or HDFS of the two clusters are made through the abstract interfaces of JDBC and streams, which shields the differences between Hadoop distributions.

[0043] In this example, the data in CDH 5.2.0 Hive can be divided into data slices in several ways; it can be divided by an ID index column or by an integer column of the HIVE table. For example, suppose the HIVE table has an ID index column covering rows 1-30000000, thirty million rows in total. If the data is split into three data slices, rows 1-10000000 are read by one first thread, rows 10000001-20000000 by another first thread, and rows 20000001-30000000 by a third first thread. As another example, suppose the HIVE table has an integer column Price and the data is split into 50 slices; each first thread then reads the data given by select * from table where Price % 50 = i limit 10000, where i is the thread ID number. Given the way HDFS handles files, the amount of data in each data slice should not be too small.
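The two slicing strategies described above can be sketched as helpers that generate one query per first thread. The class `SliceQueries`, the table name `orders`, and the column names used below are hypothetical, and the `limit` clause from the text is omitted for brevity. The modulo variant assumes non-negative column values.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: build one SELECT per slice, either by contiguous ID range or by modulo on an integer column. */
final class SliceQueries {

    /** Splits [minId, maxId] into contiguous ranges whose sizes differ by at most one row. */
    static List<String> byIdRange(String table, String idColumn, long minId, long maxId, int slices) {
        long total = maxId - minId + 1;
        if (slices <= 0 || total < slices) {
            throw new IllegalArgumentException("need 1 <= slices <= row count");
        }
        long base = total / slices;
        long extra = total % slices; // the first 'extra' slices receive one additional row
        List<String> queries = new ArrayList<>();
        long start = minId;
        for (int i = 0; i < slices; i++) {
            long end = start + base + (i < extra ? 1 : 0) - 1;
            queries.add("SELECT * FROM " + table + " WHERE " + idColumn
                    + " BETWEEN " + start + " AND " + end);
            start = end + 1;
        }
        return queries;
    }

    /** Assigns each row to slice (column % slices); query i selects remainder i. */
    static List<String> byModulo(String table, String intColumn, int slices) {
        if (slices <= 0) {
            throw new IllegalArgumentException("slices must be positive");
        }
        List<String> queries = new ArrayList<>();
        for (int i = 0; i < slices; i++) {
            queries.add("SELECT * FROM " + table + " WHERE " + intColumn + " % " + slices + " = " + i);
        }
        return queries;
    }
}
```

Range slicing gives each thread a contiguous block that is easy to retry, while modulo slicing works when the column has gaps or no known bounds.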

[0044] In addition, once all the HIVE data has been imported, the user may choose to merge the multiple HDFS SubFiles; this operation is optional. For example, if the HIVE table supports timestamps, the data of each time period can be imported into HDFS separately, and the HDFS files of the different time periods can then be merged.

[0045] The advantage of concurrent migration is that when the transfer of a data slice fails, it is enough to delete the corresponding HDFS file and retransmit. Only the data of that one slice is migrated again, rather than re-running the migration of all the data, which improves the efficiency of failure recovery.
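The per-slice recovery described above can be sketched as a retry loop that deletes the partial output of a failed slice before copying it again. The class `SliceRetry`, the `SliceJob` interface, and the `maxAttempts` limit are assumptions for illustration; the patent does not say how many retries are made or how the deletion is triggered.

```java
import java.util.function.IntConsumer;

/** Sketch: retry a single failed slice without touching the slices that already succeeded. */
final class SliceRetry {

    /** One attempt to copy a slice; may fail with any exception. */
    interface SliceJob {
        void run(int sliceIndex) throws Exception;
    }

    /** Returns the number of attempts used; deletes partial output after every failed attempt. */
    static int runWithRetry(int sliceIndex, int maxAttempts, SliceJob copy, IntConsumer deletePartial) {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                copy.run(sliceIndex);
                return attempt;
            } catch (Exception e) {
                last = e;
                deletePartial.accept(sliceIndex); // e.g. remove the HDFS file of this slice only
            }
        }
        throw new IllegalStateException(
                "slice " + sliceIndex + " failed after " + maxAttempts + " attempts", last);
    }
}
```

Because each slice maps to its own target file, cleanup and retry stay local to that slice.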

[0046] Migrating HDFS data to HIVE

[0047] In this example, the data source of the first cluster is HDFS, for example CDH 5.2.0 HDFS Cluster, and the data source of the second cluster is HIVE, for example Hadoop 2.6.0 HIVE. When migrating HDFS data to HIVE, a concurrent migration scheme can also be used to further improve migration efficiency. Moreover, the underlying storage of HIVE is HDFS, so this example implements concurrent writes on the basis of HDFS. Hive tables are divided into external tables and internal tables; the migration approach for each is described below.

[0048] External table

[0049] Referring to FIG. 4, in this example data is migrated from CDH 5.2.0 HDFS Cluster to a Hadoop 2.6.0 HIVE external table. A HIVE external table usually uses a specified HDFS directory as its storage space, and inserting data is equivalent to adding or modifying files in that HDFS directory. Writing data to a HIVE external table is therefore equivalent to writing data to an HDFS directory: it is enough to write the data to the HDFS directory specified by the external table.

[0050] As shown in FIG. 4, the data of CDH 5.2.0 HDFS Cluster is split into multiple segment Threads, that is, into multiple data slices: segment Thread 1 is one data slice, segment Thread 2 is another, and so on; in this example there are n slices. The JVM again reads the data through multiple first threads in the first thread pool, each first thread reading one data slice. The second threads in the second thread pool correspond one-to-one with the first threads in the first thread pool; each second thread corresponds to one first thread and writes that first thread's data slice into one HDFS file in the HDFS directory specified by the HIVE external table. For example, a first thread reads segment Thread 1, and the second thread corresponding to that first thread then writes segment Thread 1 to HDFS SubFile 1 in the specified HDFS directory.

[0051] Internal table

[0052] Referring to FIG. 5, the scheme for migrating data from CDH 5.2.0 HDFS Cluster to a Hadoop 2.6.0 HIVE internal table is similar to the scheme for migrating to an external table. The difference is that the storage location of an internal table's data is managed by Hive. In this example, therefore, a temporary HDFS directory can be created and, following the external-table scheme above, the data of each data slice is read by the first thread pool and written to a corresponding HDFS file in the temporary HDFS directory. For example, a first thread reads segment Thread 1, and the second thread corresponding to that first thread then writes segment Thread 1 to HDFS SubFile 1 in the temporary HDFS directory.

[0053] After a second thread has written all the data it is responsible for to the HDFS file in the temporary HDFS directory, it can connect to HIVE through the HIVE interface and import the data in the HDFS file into HIVE. For example, it can create a JDBC Driver instance to connect to the Hive database and use a load data inpath statement to import the data in the HDFS file into HIVE; this process is a data copy. When the copy is finished, the corresponding temporary HDFS file can be deleted, and once all the data has been copied, the temporary HDFS directory that was created can be deleted.

[0054] Both of the cases above, writing data to a HIVE external table and to a HIVE internal table, involve splitting the source HDFS file. In this example, the file can simply be divided evenly by size. For example, if the source HDFS file to be migrated from CDH 5.2.0 HDFS Cluster is 1 GB in size and is split into 10 data slices, the size of each data slice is obtained by dividing the file size evenly: segment Thread 1 contains the data from 0 to 102.4 MB, and segment Thread 2 contains the data from 102.4 to 204.8 MB.

[0055] When the file is split, each data slice must determine its own true start position and end position. To determine the start position: if the character immediately before the nominal split position is a newline, the nominal position is the start position; otherwise the slice reads forward until it reaches a newline, and the next line is where this slice should begin reading. To determine the end position: if the nominal last position of the slice is not a newline, the slice reads forward until it reaches a newline, and the line read in this way is the last line of data that this slice should read.
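The even split and the line-boundary adjustment can be sketched together as follows. For the sake of a self-contained example, the file is represented as an in-memory byte array; a real implementation would presumably seek within an HDFS input stream instead, which is not shown. The class `LineAlignedSplits` and its method names are hypothetical.

```java
/** Sketch: split data into equal-sized slices, then move each cut forward to the next line start. */
final class LineAlignedSplits {

    /** Returns {start, endExclusive} per slice; every slice begins and ends on a line boundary. */
    static int[][] split(byte[] data, int slices) {
        if (slices <= 0) {
            throw new IllegalArgumentException("slices must be positive");
        }
        int[] cuts = new int[slices + 1];
        for (int i = 0; i <= slices; i++) {
            int nominal = (int) ((long) data.length * i / slices); // even split by size
            cuts[i] = alignToLineStart(data, nominal);
        }
        int[][] result = new int[slices][];
        for (int i = 0; i < slices; i++) {
            // The end of slice i is the start of slice i + 1, so no line is read twice or skipped.
            result[i] = new int[] {cuts[i], cuts[i + 1]};
        }
        return result;
    }

    /** If the previous byte is a newline, pos already starts a line; otherwise scan forward past the next newline. */
    static int alignToLineStart(byte[] data, int pos) {
        if (pos <= 0) {
            return 0;
        }
        while (pos < data.length && data[pos - 1] != '\n') {
            pos++;
        }
        return Math.min(pos, data.length);
    }
}
```

Using the adjusted start of the next slice as the exclusive end of the current one implements both rules at once: a slice whose nominal end falls mid-line reads on to the end of that line, and the following slice starts on the line after it. A slice can come out empty when a single line spans more than one nominal slice.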

[0056] FIG. 6 illustrates the structure of a data migration apparatus that can execute the data migration method of the embodiments above. As shown in FIG. 6, the apparatus may include a loader configuration module 61, a data reading module 62, and a data writing module 63.

[0057] The loader configuration module 61 is configured to load the data source of the first cluster through the first class loader and to load the data source of the second cluster through the second class loader; the first class loader and the second class loader both inherit the loader of the data migration tool.

[0058] The data reading module 62 is configured so that, in the data migration tool, the first thread reads data from the data source of the first cluster through the first class loader and puts the data into the data queue.

[0059] The data writing module 63 is configured so that, in the data migration tool, the second thread writes the data in the data queue into the data source of the second cluster through the second class loader.

[0060] In one example, the data source of the first cluster is HIVE and the data source of the second cluster is HDFS.

[0061] The data reading module 62 includes a first thread pool. The first thread pool includes multiple first threads, each of which connects to HIVE through a HIVE interface and reads one data slice of the HIVE data source.

[0062] The data writing module 63 includes a second thread pool. The second thread pool includes multiple second threads, each of which corresponds to one first thread, connects to HDFS through an HDFS interface, and writes the data slice of that first thread into one HDFS file.

[0063] For example, the HIVE interface is a JDBC interface and the HDFS interface is the FileSystem interface.

[0064] In one example, the data source of the first cluster is HDFS and the data source of the second cluster is HIVE.

[0065] The data reading module 62 includes a first thread pool. The first thread pool includes multiple first threads, each of which reads one data slice of the HDFS data source.

[0066] The data writing module 63 includes a second thread pool. The second thread pool includes multiple second threads, each of which corresponds to one first thread and writes the data slice of that first thread into one HDFS file in the HDFS directory specified by the HIVE external table.

[0067] In one example, the data source of the first cluster is HDFS and the data source of the second cluster is HIVE.

[0068] The data reading module 62 includes a first thread pool. The first thread pool includes multiple first threads, each of which reads one data slice of the HDFS data source.

[0069] The data writing module 63 includes a second thread pool. The second thread pool includes multiple second threads, each of which corresponds to one first thread, writes the data slice of that first thread into one HDFS file in a temporary HDFS directory, connects to HIVE through the HIVE interface, and writes the data in the HDFS file into HIVE.

[0070] If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the essence of the technical solution of the present disclosure, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.

[0071] The above are merely preferred embodiments of the present disclosure and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall fall within the scope of protection of the present disclosure.

Claims (10)

1. A data migration method, characterized in that the data migration method is performed by a data migration tool, the method comprising: loading a data source of a first cluster through a first class loader, and loading a data source of a second cluster through a second class loader, wherein the first class loader and the second class loader both inherit the loader of the data migration tool; in the data migration tool, reading, by a first thread, data of the data source of the first cluster through the first class loader, and putting the data into a data queue; and in the data migration tool, writing, by a second thread, the data in the data queue into the data source of the second cluster through the second class loader.
2. The method according to claim 1, characterized in that the data source of the first cluster is HIVE, and the data source of the second cluster is HDFS; the data migration tool includes a first thread pool containing a plurality of the first threads, each first thread being used to connect to HIVE through a HIVE interface and read one data shard of the HIVE data source; and the data migration tool includes a second thread pool containing a plurality of the second threads, each second thread corresponding to one first thread and being used to connect to HDFS through an HDFS interface and write the data shard of that first thread into one HDFS file.
3. The method according to claim 2, characterized in that the HIVE interface is a JDBC interface, and the HDFS interface is a FileSystem interface.
4. The method according to claim 1, characterized in that the data source of the first cluster is HDFS, and the data source of the second cluster is HIVE; the data migration tool includes a first thread pool containing a plurality of the first threads, each first thread being used to read one data shard of the HDFS data source; and the data migration tool includes a second thread pool containing a plurality of the second threads, each second thread corresponding to one first thread and being used to write the data shard of that first thread into one HDFS file in the HDFS directory specified by a HIVE external table.
5. The method according to claim 1, characterized in that the data source of the first cluster is HDFS, and the data source of the second cluster is HIVE; the data migration tool includes a first thread pool containing a plurality of the first threads, each first thread being used to read one data shard of the HDFS data source; and the data migration tool includes a second thread pool containing a plurality of the second threads, each second thread corresponding to one first thread and being used to write the data shard of that first thread into one HDFS file in a temporary HDFS directory, connect to HIVE through a HIVE interface, and write the data in that HDFS file into HIVE.
6. A data migration device, characterized by comprising: a loader configuration module, used to load a data source of a first cluster through a first class loader and to load a data source of a second cluster through a second class loader, wherein the first class loader and the second class loader both inherit the loader of the data migration tool; a data reading module, used so that, in the data migration tool, a first thread reads data of the data source of the first cluster through the first class loader and puts the data into a data queue; and a data writing module, used so that, in the data migration tool, a second thread writes the data in the data queue into the data source of the second cluster through the second class loader.
7. The device according to claim 6, characterized in that the data source of the first cluster is HIVE, and the data source of the second cluster is HDFS; the data reading module includes a first thread pool containing a plurality of the first threads, each first thread being used to connect to HIVE through a HIVE interface and read one data shard of the HIVE data source; and the data writing module includes a second thread pool containing a plurality of the second threads, each second thread corresponding to one first thread and being used to connect to HDFS through an HDFS interface and write the data shard of that first thread into one HDFS file.
8. The device according to claim 7, characterized in that the HIVE interface is a JDBC interface, and the HDFS interface is a FileSystem interface.
9. The device according to claim 6, characterized in that the data source of the first cluster is HDFS, and the data source of the second cluster is HIVE; the data reading module includes a first thread pool containing a plurality of the first threads, each first thread being used to read one data shard of the HDFS data source; and the data writing module includes a second thread pool containing a plurality of the second threads, each second thread corresponding to one first thread and being used to write the data shard of that first thread into one HDFS file in the HDFS directory specified by a HIVE external table.
10. The device according to claim 6, characterized in that the data source of the first cluster is HDFS, and the data source of the second cluster is HIVE; the data reading module includes a first thread pool containing a plurality of the first threads, each first thread being used to read one data shard of the HDFS data source; and the data writing module includes a second thread pool containing a plurality of the second threads, each second thread corresponding to one first thread and being used to write the data shard of that first thread into one HDFS file in a temporary HDFS directory, connect to HIVE through a HIVE interface, and write the data in that HDFS file into HIVE.
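
The class-loader arrangement recited in claims 1 and 6 could be sketched roughly as follows. The sketch assumes that "inherit the loader of the data migration tool" means each cluster's loader uses the tool's own loader as its parent; the claims do not spell this out. The class name `SiblingLoadersSketch` and the jar-URL parameters are hypothetical.

```java
import java.net.URL;
import java.net.URLClassLoader;

final class SiblingLoadersSketch {
    /**
     * Builds two sibling class loaders that share the tool's loader as parent.
     * Classes on the tool's own class path are shared through the parent, while
     * classes found only in a cluster's jars are defined separately by each loader
     * and so stay isolated from the other cluster's versions.
     */
    static ClassLoader[] createLoaders(URL[] firstClusterJars, URL[] secondClusterJars) {
        ClassLoader toolLoader = SiblingLoadersSketch.class.getClassLoader();
        return new ClassLoader[] {
            new URLClassLoader(firstClusterJars, toolLoader),
            new URLClassLoader(secondClusterJars, toolLoader)
        };
    }

    public static void main(String[] args) throws Exception {
        // Empty jar lists keep the example self-contained; real use would pass each cluster's client jars.
        ClassLoader[] loaders = createLoaders(new URL[0], new URL[0]);
        // A class visible to the parent resolves to the same Class object through both loaders.
        System.out.println(loaders[0].loadClass("java.lang.String")
                == loaders[1].loadClass("java.lang.String"));
    }
}
```

With real jar lists, a first thread would typically load the first cluster's client classes through the first loader and a second thread would do the same through the second loader, for instance by setting each thread's context class loader. A long-running tool would also close the `URLClassLoader` instances when a migration finishes.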
CN201610007991.5A 2016-01-06 2016-01-06 Data migration method and device CN105718507A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610007991.5A CN105718507A (en) 2016-01-06 2016-01-06 Data migration method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610007991.5A CN105718507A (en) 2016-01-06 2016-01-06 Data migration method and device

Publications (1)

Publication Number Publication Date
CN105718507A true CN105718507A (en) 2016-06-29

Family

ID=56147117

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610007991.5A CN105718507A (en) 2016-01-06 2016-01-06 Data migration method and device

Country Status (1)

Country Link
CN (1) CN105718507A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7778960B1 (en) * 2005-10-20 2010-08-17 American Megatrends, Inc. Background movement of data between nodes in a storage cluster
CN102999537A (en) * 2011-09-19 2013-03-27 阿里巴巴集团控股有限公司 System and method for data migration
CN103324592A (en) * 2013-06-24 2013-09-25 华为技术有限公司 Data migration control method, data migration method and data migration device
CN104346240A (en) * 2013-08-05 2015-02-11 国际商业机器公司 Method and Apparatus Utilizing Multiple Memory Pools During Mobility Operations
CN104598453A (en) * 2013-10-31 2015-05-06 中国银联股份有限公司 Data migration method based on data buffering

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
周志明: "《深入理解Java虚拟机 JVM高级特性与最佳实践》", 30 June 2013, 机械工业出版社 *
鲍亮 等: "《深入浅出云计算》", 31 October 2012, 清华大学出版社 *

Legal Events

Date Code Title Description
C06 Publication
C10 Entry into substantive examination