CN105718507A - Data migration method and device - Google Patents

Publication number: CN105718507A
Application number: CN201610007991.5A
Authority: CN (China)
Other languages: Chinese (zh)
Inventor: 郑振峰
Original assignee: Hangzhou Dt Dream Technology Co Ltd
Current assignee: Hangzhou Dt Dream Technology Co Ltd (the listed assignees may be inaccurate)
Legal status: Pending (an assumption, not a legal conclusion)
Priority date: 2016-01-06 (an assumption, not a legal conclusion)
Filing date: 2016-01-06
Prior art keywords: data, thread, hdfs, cluster, hive

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21: Design, administration or maintenance of databases
    • G06F16/214: Database migration support

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data migration method and device. The method comprises the following steps: a data source of a first cluster is loaded through a first class loader, and a data source of a second cluster is loaded through a second class loader, where the first class loader and the second class loader both inherit from the class loader of a data migration tool; in the data migration tool, a first thread reads data from the data source of the first cluster through the first class loader and puts the data into a data queue; and in the data migration tool, a second thread writes the data in the data queue to the data source of the second cluster through the second class loader. The data migration method and device improve the efficiency of data migration across Hadoop clusters.

Description

Data migration method and device
Technical Field
The present disclosure relates to computer technology, and in particular to a data migration method and device.
Background
Hadoop is a software framework for the distributed processing of massive data. It is a distributed computing platform that users can easily set up and use, on which they can readily develop and run applications that process massive data, and it is widely used in big data processing. As big data application requirements have kept expanding, Hadoop has gone through a series of version changes to resolve the technical bottlenecks that these growing requirements cause. However, Hadoop versions are often incompatible with one another, so data migration becomes an indispensable operation during a version upgrade. For example, data migration between HIVE (a data warehouse tool based on Hadoop) and HDFS (Hadoop Distributed File System) is a frequently encountered scenario.
In the scenario of cross-cluster data migration between HIVE and HDFS, the current approach is to use Hadoop's own tools to export the data of one Hadoop cluster to local storage and then import it into the HDFS or HIVE of another Hadoop cluster. With this approach, the user has to perform two operations to complete the migration, and the whole process amounts to two complete data migrations, which reduces migration efficiency. In addition, exporting the data to local storage occupies local disk space, and disk I/O is time-consuming, which further reduces migration efficiency.
Summary of the Invention
In view of this, the present disclosure provides a data migration method and device to improve the efficiency of data migration across Hadoop clusters.
Specifically, the present disclosure is implemented through the following technical solutions:
In a first aspect, a data migration method is provided. The data migration method is performed by a data migration tool, and the method includes:
loading a data source of a first cluster through a first class loader, and loading a data source of a second cluster through a second class loader, where the first class loader and the second class loader both inherit from the class loader of the data migration tool;
in the data migration tool, reading, by a first thread, data of the data source of the first cluster through the first class loader, and putting the data into a data queue;
in the data migration tool, writing, by a second thread, the data in the data queue to the data source of the second cluster through the second class loader.
In a second aspect, a data migration device is provided, including:
a loader configuration module, configured to load a data source of a first cluster through a first class loader and load a data source of a second cluster through a second class loader, where the first class loader and the second class loader both inherit from the class loader of the data migration tool;
a data reading module, configured so that, in the data migration tool, a first thread reads data of the data source of the first cluster through the first class loader and puts the data into a data queue;
a data writing module, configured so that, in the data migration tool, a second thread writes the data in the data queue to the data source of the second cluster through the second class loader.
In the data migration method and device of the embodiments of the present disclosure, the data directories of the two clusters are loaded by the first class loader and the second class loader respectively. The data that the first class loader reads from the data directory of the first cluster can be obtained by the second class loader and written to the data directory of the second cluster, so data migration between the two clusters can be carried out within a single JVM (Java Virtual Machine). Compared with migrating data in two separate operations, this improves the efficiency of data migration across Hadoop clusters.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of a data migration method provided by an embodiment of the present disclosure;
Fig. 2 is a flowchart of a data migration method provided by an embodiment of the present disclosure;
Fig. 3 is a schematic diagram of a data migration process provided by an embodiment of the present disclosure;
Fig. 4 is a schematic diagram of another data migration process provided by an embodiment of the present disclosure;
Fig. 5 is a schematic diagram of yet another data migration process provided by an embodiment of the present disclosure;
Fig. 6 is a schematic structural diagram of a data migration device provided by an embodiment of the present disclosure.
Detailed Description
In current data migration, a migration can only be completed by running two migration tools in two separate operations, which makes migration inefficient. To overcome this problem, the data migration method provided by the embodiments of the present application carries out the migration within a single data migration tool, so as to improve the efficiency of cross-cluster data migration.
Fig. 1 illustrates the principle of the data migration method of the present application. As shown in Fig. 1, assume that the first cluster 11 and the second cluster 12 are two different Hadoop clusters. For example, the first cluster 11 may be CDH 5.2.0 and the second cluster 12 may be Hadoop 2.6.0; or the first cluster 11 may be Hadoop 1.x and the second cluster 12 Hadoop 2.x; or the first cluster 11 may be HDP and the second cluster 12 Hadoop; and so on (further examples are not enumerated here). In these examples, data migration between the first cluster 11 and the second cluster 12 is cross-cluster data migration.
The cross-cluster data migration scenario may be migration between HIVE and HDFS. For example, data in the HIVE of CDH 5.2.0 may be migrated to the HDFS of Hadoop 2.6.0, or data in the HDFS of Hadoop 2.6.0 may be migrated to the HIVE of CDH 5.2.0. That is, the migration may run in either direction between HIVE and HDFS.
Still referring to Fig. 1, the present application provides a data migration tool 13, within which the cross-cluster data migration process is carried out. For example, the data migration tool 13 may be a JVM (Java Virtual Machine). In this JVM, two class loaders can be created according to the parent delegation mechanism of ClassLoader, to perform data operations on the two clusters. Fig. 2 illustrates the flow of this data migration method:
In step 201, the data source of the first cluster is loaded through the first class loader, and the data source of the second cluster is loaded through the second class loader; the first class loader and the second class loader both inherit from the class loader of the data migration tool.
For example, the JVM may first be initialized, loading the Jar packages that the JVM tool itself requires.
In this step, two class loaders may be created. In the example of Fig. 1, one is called the first class loader 14 and the other the second class loader 15.
The first class loader 14 inherits from the class loader of the current JVM. For example, when the first cluster 11 is CDH 5.2.0, the first class loader 14 may be named CDH5.2.0ClassLoader. The data source of the first cluster 11 is loaded with this first class loader 14; for example, it loads the directory where the CDH Lib resides, including the HIVE JDBC driver Jar.
The second class loader 15 likewise inherits from the class loader of the JVM. For example, when the second cluster 12 is Hadoop 2.6.0, the second class loader 15 may be named Hadoop2.6.0ClassLoader. The data source of the second cluster 12 is loaded with this second class loader 15; for example, it loads the directory where the Hadoop Lib resides. The first class loader 14 and the second class loader 15 inherit from the same class loader, in accordance with the parent delegation mechanism.
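Step 201 can be illustrated with a minimal sketch, assuming each cluster's Lib directory is a folder of Jar files and using the standard java.net.URLClassLoader. The class name ClusterLoaders, the method forLibDir, and the directory paths are hypothetical and do not come from the disclosure.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

final class ClusterLoaders {
    /** Builds a loader over every .jar file in libDir whose parent is the tool's own loader. */
    static URLClassLoader forLibDir(File libDir, ClassLoader toolLoader) {
        List<URL> jars = new ArrayList<>();
        File[] files = libDir.listFiles((dir, name) -> name.endsWith(".jar"));
        if (files != null) { // listFiles returns null when libDir is not a readable directory
            for (File jar : files) {
                try {
                    jars.add(jar.toURI().toURL());
                } catch (MalformedURLException e) {
                    throw new IllegalArgumentException("Bad jar path: " + jar, e);
                }
            }
        }
        return new URLClassLoader(jars.toArray(new URL[0]), toolLoader);
    }

    public static void main(String[] args) throws Exception {
        ClassLoader toolLoader = ClusterLoaders.class.getClassLoader();
        // Hypothetical directories; point these at the lib folders of the two clusters.
        try (URLClassLoader first = forLibDir(new File("libs/cdh-5.2.0"), toolLoader);
             URLClassLoader second = forLibDir(new File("libs/hadoop-2.6.0"), toolLoader)) {
            // Both loaders share one parent, so the tool's classes are common to both sides.
            System.out.println(first.getParent() == second.getParent());
        }
    }
}
```

Because delegation is parent-first, this arrangement presumably only isolates the two cluster versions if the tool's own classpath does not already contain Hadoop or Hive classes; otherwise the parent would answer first.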
In step 202, in the data migration tool, the first thread reads data of the data source of the first cluster through the first class loader and puts the data into a data queue.
For example, the JVM may create a new thread, which may be called the first thread. The first thread is used to read data from the first cluster. The ContextClassLoader of the first thread may be set to CDH5.2.0ClassLoader, so that the first thread can read the data in the first cluster through the first class loader CDH5.2.0ClassLoader.
In this step, the data read by the first thread may be put into a data queue. Fig. 1 shows one such data queue 16, into which the data read by the first thread is put.
In step 203, in the data migration tool, the second thread writes the data in the data queue to the data source of the second cluster through the second class loader.
For example, the JVM may create another new thread, which may be called the second thread. The second thread is used to write data to the second cluster, and its ContextClassLoader may be set to Hadoop2.6.0ClassLoader. In this step, Hadoop2.6.0ClassLoader can read from the data queue 16 the data that the first thread put in, and write that data to the second cluster.
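Steps 202 and 203 can be sketched as a reader thread and a writer thread joined by a blocking queue, each with its own context class loader. This is only an illustration under stated assumptions: in-memory lists stand in for the two clusters, and the names QueuePipeline, migrate, and the END marker are hypothetical.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class QueuePipeline {
    // Hypothetical end-of-data marker, compared by identity so no real row can match it.
    private static final String END = new String("END");

    /** Copies every row of source into a new list by way of a bounded data queue. */
    static List<String> migrate(List<String> source, ClassLoader readLoader, ClassLoader writeLoader) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(1024);
        List<String> target = new ArrayList<>();

        Thread first = new Thread(() -> {
            try {
                for (String row : source) queue.put(row); // stands in for reading the first cluster
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "first-thread");
        first.setContextClassLoader(readLoader);

        Thread second = new Thread(() -> {
            try {
                for (String row = queue.take(); row != END; row = queue.take()) {
                    target.add(row); // stands in for writing to the second cluster
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "second-thread");
        second.setContextClassLoader(writeLoader);

        first.start();
        second.start();
        try {
            first.join();
            second.join(); // join makes the writer's additions visible to the caller
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return target;
    }

    public static void main(String[] args) {
        ClassLoader tool = QueuePipeline.class.getClassLoader();
        ClassLoader firstLoader = new URLClassLoader(new URL[0], tool);
        ClassLoader secondLoader = new URLClassLoader(new URL[0], tool);
        System.out.println(migrate(Arrays.asList("r1", "r2", "r3"), firstLoader, secondLoader));
    }
}
```

A bounded queue is used here so that a fast reader blocks rather than filling memory when the writer falls behind; the capacity of 1024 is an arbitrary choice.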
In the data migration method of this example, the data directories of the two clusters are loaded by the first class loader and the second class loader respectively, for example the directory where the Hadoop Lib resides. A data directory corresponds to the location where cluster data is stored and may also be called a data source in this example. The data that the first class loader reads from the data directory of the first cluster can be obtained by the second class loader and written to the data directory of the second cluster, so data migration between the two clusters can be carried out within this JVM; compared with migrating data in two separate operations, this improves migration efficiency. As long as the user provides the LIB matching the version number of each of the two clusters, the above scheme can complete data migration between the HIVE and HDFS of the two clusters in a single process, shielding the implementation differences between cluster versions.
In addition, when the second thread writes data to the second cluster in step 203, the data may also be cleaned; for example, the data may be cleaned and filtered according to rules specified by the user before it is written. Furthermore, the data in the migration process is all kept within the same JVM and may be cached temporarily in memory, so no disk I/O is involved, which greatly improves migration efficiency. When the data is cached in memory, it is cleaned and filtered within its in-memory lifetime and then written to the second cluster.
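The in-memory cleaning step might look like the following sketch, in which a user-specified rule is modelled as a java.util.function.Predicate applied to each cached batch before it is written. The names RowCleaner and clean, the trimming of whitespace, and the sample rule are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class RowCleaner {
    /** Trims each row of an in-memory batch and keeps only rows that satisfy the rule. */
    static List<String> clean(List<String> batch, Predicate<String> rule) {
        List<String> kept = new ArrayList<>();
        for (String row : batch) {
            String trimmed = row.trim();
            if (rule.test(trimmed)) {
                kept.add(trimmed);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical rule: drop rows that are empty after trimming.
        Predicate<String> notEmpty = row -> !row.isEmpty();
        System.out.println(clean(Arrays.asList(" a ", "", "b"), notEmpty));
    }
}
```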
The data migration method provided by the present application is applicable both to migration in the HIVE-to-HDFS direction and to migration in the HDFS-to-HIVE direction. The following examples describe in detail how the method is used to migrate data between HIVE and HDFS.
Data migration from HIVE to HDFS
In this example, the first cluster 11 is, for example, CDH 5.2.0, and its data source is HIVE, that is, CDH5.2.0 Hive, as shown in Fig. 3. The second cluster 12 is, for example, Hadoop 2.6.0, and its data source is HDFS, that is, Hadoop2.6.0 HDFS Cluster. This example migrates the data in CDH5.2.0 Hive to Hadoop2.6.0 HDFS Cluster.
Still referring to Fig. 3, the migration from CDH5.2.0 Hive to Hadoop2.6.0 HDFS Cluster may use concurrent migration to further improve efficiency. To migrate concurrently, the data to be migrated must be split. For example, the data in CDH5.2.0 Hive is divided into multiple data slices, slice1, slice2, ..., slicen in Fig. 3, and these n data slices are migrated in parallel.
In this example, multiple first threads may be created in the JVM, each reading one of the data slices: one first thread reads slice1, another reads slice2, yet another reads slicen, and so on. These first threads form a first thread pool. In a specific implementation, a first thread may connect to HIVE through a HIVE interface. This HIVE interface may be the JDBC (Java DataBase Connectivity) interface, a Java API for executing SQL statements that provides unified access to multiple relational databases. The first thread may create a JDBC Driver instance to connect to HIVE and use a Statement to execute a select statement that reads the data in HIVE; for example, every 5000 rows of data may be submitted to the data queue.
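The read side can be sketched with the standard java.sql API, handing rows over in batches of 5000. This assumes the caller has already opened a java.sql.Connection to HIVE and supplies a sink that forwards each batch to the data queue; the names HiveSliceReader, Batcher, and readSlice, and the tab-joined row format, are hypothetical.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class HiveSliceReader {
    static final int BATCH_ROWS = 5000;

    /** Collects rows and hands them to the sink once batchRows have accumulated. */
    static final class Batcher {
        private final Consumer<List<String>> sink;
        private final int batchRows;
        private List<String> current = new ArrayList<>();

        Batcher(Consumer<List<String>> sink, int batchRows) {
            this.sink = sink;
            this.batchRows = batchRows;
        }

        void add(String row) {
            current.add(row);
            if (current.size() == batchRows) flush();
        }

        /** Submits any partially filled batch; call once after the last row. */
        void flush() {
            if (!current.isEmpty()) {
                sink.accept(current);
                current = new ArrayList<>();
            }
        }
    }

    /** Runs one slice's select statement and submits its rows to the sink in batches. */
    static void readSlice(Connection hive, String sliceSql, Consumer<List<String>> sink)
            throws SQLException {
        try (Statement st = hive.createStatement(); ResultSet rs = st.executeQuery(sliceSql)) {
            int columns = rs.getMetaData().getColumnCount();
            Batcher batcher = new Batcher(sink, BATCH_ROWS);
            while (rs.next()) {
                StringBuilder row = new StringBuilder();
                for (int c = 1; c <= columns; c++) { // JDBC columns are 1-based
                    if (c > 1) row.append('\t');
                    row.append(rs.getString(c));
                }
                batcher.add(row.toString());
            }
            batcher.flush();
        }
    }

    public static void main(String[] args) {
        // Demonstrates only the batching, with a batch size of 2 and no HIVE connection.
        Batcher demo = new Batcher(batch -> System.out.println(batch), 2);
        demo.add("a");
        demo.add("b");
        demo.add("c");
        demo.flush();
    }
}
```

Submitting whole batches rather than single rows keeps queue traffic low; the sink is left abstract so the same reader could feed a blocking queue or a test collector.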
Still referring to Fig. 3, a second thread pool is also created in the JVM. The second thread pool includes multiple second threads, each corresponding to one first thread, and each connects to HDFS through an HDFS interface and writes the data slice read by its first thread to one HDFS file. For example, the HDFS interface is the FileSystem interface; with the FileSystem Java API, HDFS can be operated on, performing operations such as reading, writing, and deleting data. The second thread may create an HDFS FileSystem and an OutputStream instance, and write the data of one slice to a corresponding HDFS SubFile. For example, the data of slice1 is written to HDFS SubFile1 and the data of slice2 to HDFS SubFile2. In addition, this example connects to the HIVE or HDFS of the two clusters through the abstract interfaces of JDBC and streams, which shields the differences between Hadoop distributions.
In this example, there are several ways to divide the data in CDH5.2.0 Hive into data slices; the division may be based on an ID index column or an integer column in the HIVE table. For example, if the HIVE table has an ID index column covering rows 1-30000000, thirty million rows in total, and it is split into three data slices, then rows 1-10000000 are read by one first thread, rows 10000001-20000000 by another first thread, and rows 20000001-30000000 by yet another. As another example, if the HIVE table has an integer column Price and the data is split into 50 slices, then the data read by each first thread is select * from table where Price % 50 = i limit 10000, where i is the thread ID number. Given how HDFS handles files, the data volume of each data slice should not be too small.
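The two slicing strategies can be sketched as small helpers that generate one select statement per slice. The names SliceQueries, byIdRange, and byModulo are hypothetical, the table and column names are placeholders, and the limit clause from the text is left out of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

final class SliceQueries {
    /** Splits the inclusive ID range [minId, maxId] into contiguous ranges, one query each. */
    static List<String> byIdRange(String table, String idColumn, long minId, long maxId, int slices) {
        List<String> queries = new ArrayList<>();
        long total = maxId - minId + 1;
        long start = minId;
        for (int i = 0; i < slices; i++) {
            // The first (total % slices) slices take one extra row so every ID is covered.
            long size = total / slices + (i < total % slices ? 1 : 0);
            long end = start + size - 1;
            queries.add("select * from " + table + " where " + idColumn
                    + " between " + start + " and " + end);
            start = end + 1;
        }
        return queries;
    }

    /** Assigns each row to the slice given by the integer column's remainder modulo slices. */
    static List<String> byModulo(String table, String intColumn, int slices) {
        List<String> queries = new ArrayList<>();
        for (int i = 0; i < slices; i++) {
            queries.add("select * from " + table + " where " + intColumn
                    + " % " + slices + " = " + i);
        }
        return queries;
    }

    public static void main(String[] args) {
        for (String q : byIdRange("t", "id", 1, 30000000, 3)) {
            System.out.println(q);
        }
        System.out.println(byModulo("t", "Price", 50).get(7));
    }
}
```

Range slicing gives evenly sized slices only when the IDs are dense, while modulo slicing depends on how the integer values are distributed; negative values would need extra care because the remainder can then be negative.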
In addition, after all the HIVE data has been imported, the user may choose to merge the multiple HDFS SubFiles; this is optional. As another example, if the HIVE table supports timestamps, the data of each time period may be imported into HDFS separately, and the HDFS files of the different time periods then merged.
The advantage of concurrent migration is that when the transfer of a data slice fails, it suffices to delete the corresponding HDFS file and retransmit. Only the data of that slice is migrated again, rather than re-running the entire migration, which improves the efficiency of failure recovery.
Data migration from HDFS to HIVE
In this example, the data source of the first cluster is HDFS, for example CDH5.2.0 HDFS Cluster, and the data source of the second cluster is HIVE, for example Hadoop2.6.0 HIVE. When migrating data from HDFS to HIVE, concurrent migration may also be used to further improve efficiency. Moreover, the underlying storage of HIVE is HDFS, so this example implements concurrent writing on the basis of HDFS. Hive tables are divided into external tables and internal tables; the migration approach for each is described below.
External table
Referring to Fig. 4, in this example data is migrated from CDH5.2.0 HDFS Cluster to a Hadoop2.6.0 HIVE external table. A HIVE external table usually uses a specified HDFS directory as its storage space, and inserting data is equivalent to adding or modifying files under that HDFS directory. Writing data to a HIVE external table is therefore equivalent to writing data to an HDFS directory: the data is written to the HDFS directory specified by the external table.
As shown in Fig. 4, the data of CDH5.2.0 HDFS Cluster is split into multiple segmentThreads, that is, multiple data slices: segmentThread1 is one data slice, segmentThread2 is another, and so on; this example uses n slices. The JVM again reads the data through the multiple first threads in the first thread pool, each first thread reading one data slice. The multiple second threads in the second thread pool correspond one-to-one with the first threads in the first thread pool; each second thread writes the data slice of its first thread to one HDFS file in the HDFS directory specified by the HIVE external table. For example, if a first thread has read segmentThread1, the second thread corresponding to that first thread writes segmentThread1 to HDFS SubFile1 in the specified HDFS directory.
Internal table
As shown in Fig. 5, the scheme for migrating data from CDH5.2.0 HDFS Cluster to a Hadoop2.6.0 HIVE internal table is similar to the scheme for an external table. The difference is that the data storage location of an internal table is managed by Hive. In this example, therefore, a temporary HDFS directory may be created, and, following the external-table scheme above, the data of each data slice read by the first thread pool is written to a corresponding HDFS file in the temporary HDFS directory. For example, if a first thread has read segmentThread1, the second thread corresponding to that first thread writes segmentThread1 to HDFS SubFile1 in the temporary HDFS directory.
After each second thread has written all the data it is responsible for to its HDFS file in the temporary HDFS directory, it may connect to HIVE through the HIVE interface and import the data in the HDFS file into HIVE. For example, a JDBC Driver instance may be created to connect to the Hive database, and a load data inpath statement used to import the data in the HDFS file into HIVE; this process is a data copy. After the copy finishes, the corresponding temporary HDFS file may be deleted, and once all data has been copied, the temporary HDFS directory that was created is deleted.
Both cases above, writing data to a HIVE external table and to an internal table, involve splitting the source HDFS file. In this example, the file may simply be divided evenly by size. For example, if the source HDFS file to be migrated in CDH5.2.0 HDFS Cluster is 1 GB and is split into 10 data slices, the size of each slice is obtained by dividing the file size evenly: segmentThread1 covers the data from 0 to 102.4 MB, segmentThread2 covers the data from 102.4 MB to 204.8 MB, and so on.
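The even division by file size can be sketched as a helper that returns nominal byte ranges, before any adjustment to line boundaries. The names FileSplitter and split are hypothetical, and 1 GB is taken here to mean 1073741824 bytes.

```java
final class FileSplitter {
    /** Returns {start, end} byte offsets, end exclusive, for each of the near-equal slices. */
    static long[][] split(long fileSize, int slices) {
        long[][] ranges = new long[slices][2];
        for (int i = 0; i < slices; i++) {
            // Integer division keeps the ranges contiguous; the product could overflow a
            // long only for file sizes far beyond anything this sketch is meant for.
            ranges[i][0] = fileSize * i / slices;
            ranges[i][1] = fileSize * (i + 1) / slices;
        }
        return ranges;
    }

    public static void main(String[] args) {
        long oneGb = 1024L * 1024L * 1024L;
        for (long[] r : split(oneGb, 10)) {
            System.out.println(r[0] + " .. " + r[1]);
        }
    }
}
```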
When the file is split, each data slice must determine its own true start and end positions. When determining the start position, if the character immediately before the current split position is a newline, the current position is the start position; otherwise the slice reads forward until it reaches a newline, and the next line is where this slice should start reading. When determining the end position, if the last position of the current split is not a newline, the slice reads forward until it reaches a newline, and that line is the last line of data this slice should read.
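The boundary rule can be sketched over an in-memory byte array, which stands in for the source HDFS file. One function serves both ends, since the true end of a slice is the true start of the next one. The names LineAligner, align, and slice are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class LineAligner {
    /** Moves pos forward to the first line start at or after pos. */
    static int align(byte[] data, int pos) {
        if (pos <= 0) return 0;
        if (pos >= data.length) return data.length;
        if (data[pos - 1] == '\n') return pos; // already at the start of a line
        int p = pos;
        while (p < data.length && data[p] != '\n') p++; // read forward to the newline
        return Math.min(p + 1, data.length);            // the next line starts after it
    }

    /** Returns the whole lines belonging to the nominal range [nominalStart, nominalEnd). */
    static String slice(byte[] data, int nominalStart, int nominalEnd) {
        int start = align(data, nominalStart);
        int end = align(data, nominalEnd);
        return new String(data, start, end - start, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = "aa\nbbbb\ncc\n".getBytes(StandardCharsets.UTF_8);
        // Nominal halves [0, 5) and [5, 11) cut the second line in two.
        System.out.print(slice(data, 0, 5));
        System.out.print(slice(data, 5, 11));
    }
}
```

With this rule a line that straddles a nominal boundary goes entirely to the earlier slice, so every line is read exactly once.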
Fig. 6 illustrates the structure of a data migration device that can perform the data migration method of the above embodiments. As shown in Fig. 6, the device may include a loader configuration module 61, a data reading module 62, and a data writing module 63.
The loader configuration module 61 is configured to load the data source of the first cluster through the first class loader and load the data source of the second cluster through the second class loader; the first class loader and the second class loader both inherit from the class loader of the data migration tool;
The data reading module 62 is configured so that, in the data migration tool, the first thread reads data of the data source of the first cluster through the first class loader and puts the data into a data queue;
The data writing module 63 is configured so that, in the data migration tool, the second thread writes the data in the data queue to the data source of the second cluster through the second class loader.
In one example, the data source of the first cluster is HIVE and the data source of the second cluster is HDFS;
The data reading module 62 includes a first thread pool comprising a plurality of the first threads; each first thread is configured to connect to HIVE through a HIVE interface and read one data slice of the HIVE data source;
The data writing module 63 includes a second thread pool comprising a plurality of the second threads; each second thread corresponds to one first thread and is configured to connect to HDFS through an HDFS interface and write the data slice of that first thread to one HDFS file.
For example, the HIVE interface is the JDBC interface and the HDFS interface is the FileSystem interface.
In another example, the data source of the first cluster is HDFS and the data source of the second cluster is HIVE.
The data reading module 62 includes a first thread pool comprising a plurality of the first threads; each first thread is configured to read one data slice of the HDFS data source;
The data writing module 63 includes a second thread pool comprising a plurality of the second threads; each second thread corresponds to one first thread and is configured to write the data slice of that first thread to one HDFS file in the HDFS directory specified by a HIVE external table.
In yet another example, the data source of the first cluster is HDFS and the data source of the second cluster is HIVE.
The data reading module 62 includes a first thread pool comprising a plurality of the first threads; each first thread is configured to read one data slice of the HDFS data source;
The data writing module 63 includes a second thread pool comprising a plurality of the second threads; each second thread corresponds to one first thread and is configured to write the data slice of that first thread to one HDFS file in a temporary HDFS directory, connect to HIVE through the HIVE interface, and write the data in that HDFS file to HIVE.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The above are merely preferred embodiments of the present disclosure and are not intended to limit it. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present disclosure shall fall within its scope of protection.

Claims (10)

1. A data migration method, characterized in that the data migration method is performed by a data migration tool, and the method comprises:
loading a data source of a first cluster through a first class loader, and loading a data source of a second cluster through a second class loader, wherein the first class loader and the second class loader both inherit from the class loader of the data migration tool;
in the data migration tool, reading, by a first thread, data of the data source of the first cluster through the first class loader, and putting the data into a data queue; and
in the data migration tool, writing, by a second thread, the data in the data queue to the data source of the second cluster through the second class loader.
2. The method according to claim 1, characterized in that:
the data source of the first cluster is HIVE, and the data source of the second cluster is HDFS;
the data migration tool comprises a first thread pool, the first thread pool comprises a plurality of the first threads, and each first thread is configured to connect to HIVE through a HIVE interface and read one data slice of the HIVE data source; and
the data migration tool comprises a second thread pool, the second thread pool comprises a plurality of the second threads, and each second thread corresponds to one first thread and is configured to connect to HDFS through an HDFS interface and write the data slice of that first thread to one HDFS file.
3. The method according to claim 2, characterized in that the HIVE interface is a JDBC interface and the HDFS interface is a FileSystem interface.
4. The method according to claim 1, characterized in that:
the data source of the first cluster is HDFS, and the data source of the second cluster is HIVE;
the data migration tool comprises a first thread pool, the first thread pool comprises a plurality of the first threads, and each first thread is configured to read one data slice of the HDFS data source; and
the data migration tool comprises a second thread pool, the second thread pool comprises a plurality of the second threads, and each second thread corresponds to one first thread and is configured to write the data slice of that first thread to one HDFS file in the HDFS directory specified by a HIVE external table.
5. The method according to claim 1, characterized in that:
the data source of the first cluster is HDFS, and the data source of the second cluster is HIVE;
the data migration tool comprises a first thread pool, the first thread pool comprises a plurality of the first threads, and each first thread is configured to read one data slice of the HDFS data source; and
the data migration tool comprises a second thread pool, the second thread pool comprises a plurality of the second threads, and each second thread corresponds to one first thread and is configured to write the data slice of that first thread to one HDFS file in a temporary HDFS directory, connect to HIVE through a HIVE interface, and write the data in that HDFS file to HIVE.
6. A data migration device, characterized by comprising:
a loader configuration module, configured to load a data source of a first cluster through a first class loader and load a data source of a second cluster through a second class loader, wherein the first class loader and the second class loader both inherit from the class loader of the data migration tool;
a data reading module, configured so that, in the data migration tool, a first thread reads data of the data source of the first cluster through the first class loader and puts the data into a data queue; and
a data writing module, configured so that, in the data migration tool, a second thread writes the data in the data queue to the data source of the second cluster through the second class loader.
7. device according to claim 6, it is characterised in that the data source of described first cluster is HIVE, and the data source of the second cluster is HDFS;
Described data read module, including first thread pond, described first thread pond includes multiple described first thread, and each first thread, for connecting HIVE by HIVE interface, reads a data fragmentation of the data source of described HIVE;
Described Data write. module, including the second thread pool, described second thread pool includes multiple described second thread, the corresponding first thread of each second thread, for connecting HDFS by HDFS interface, the data fragmentation of described first thread is write in a HDFS file.
8. The device according to claim 7, characterized in that the HIVE interface is a JDBC interface and the HDFS interface is a FileSystem interface.
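The one-to-one pairing of reading and writing threads in claim 7 can be sketched as two fixed-size pools in which pair i shares its own queue and produces its own output file. This is an illustrative sketch under stated assumptions, not the patented implementation: `PairedPools` and the `part-i` file names are hypothetical, in-memory lists stand in for shards read over the HIVE interface, and local files stand in for HDFS files.

```java
import java.io.BufferedWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PairedPools {
    // Unique sentinel object marking the end of a shard; compared by identity.
    private static final String END = new String("<end>");

    // Each inner list is one shard. Returns one output file per shard.
    // Assumes at least one shard, since a fixed thread pool needs a positive size.
    static List<Path> migrate(List<List<String>> shards, Path outDir) throws Exception {
        int n = shards.size();
        ExecutorService readers = Executors.newFixedThreadPool(n); // first thread pool
        ExecutorService writers = Executors.newFixedThreadPool(n); // second thread pool
        List<Future<Path>> results = new ArrayList<>();
        try {
            for (int i = 0; i < n; i++) {
                List<String> shard = shards.get(i);
                Path file = outDir.resolve("part-" + i);
                BlockingQueue<String> queue = new ArrayBlockingQueue<>(8); // private to pair i

                // Reading thread i: enqueue every record of shard i.
                readers.submit(() -> {
                    for (String rec : shard) queue.put(rec);
                    queue.put(END);
                    return null;
                });

                // Writing thread i: drain the queue of pair i into file i.
                results.add(writers.submit(() -> {
                    try (BufferedWriter out = Files.newBufferedWriter(file)) {
                        for (String rec = queue.take(); rec != END; rec = queue.take()) {
                            out.write(rec);
                            out.newLine();
                        }
                    }
                    return file;
                }));
            }
            List<Path> files = new ArrayList<>();
            for (Future<Path> f : results) files.add(f.get()); // wait for every writer
            return files;
        } finally {
            readers.shutdown();
            writers.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path outDir = Files.createTempDirectory("migrated");
        List<Path> files = migrate(List.of(List.of("a", "b"), List.of("c")), outDir);
        for (Path p : files) System.out.println(p.getFileName() + " " + Files.readAllLines(p));
    }
}
```

Sizing both pools to the number of shards guarantees that every reader and its writer run at the same time, so a full queue can never stall a pair whose writer has not yet been scheduled.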
9. The device according to claim 6, characterized in that the data source of the first cluster is HDFS and the data source of the second cluster is HIVE;
The data reading module comprises a first thread pool, the first thread pool comprises a plurality of the first threads, and each first thread is configured to read one data shard of the HDFS data source;
The data writing module comprises a second thread pool, the second thread pool comprises a plurality of the second threads, and each second thread corresponds to one first thread and is configured to write the data shard of that first thread to one HDFS file in the HDFS directory specified by a HIVE external table.
10. The device according to claim 6, characterized in that the data source of the first cluster is HDFS and the data source of the second cluster is HIVE;
The data reading module comprises a first thread pool, the first thread pool comprises a plurality of the first threads, and each first thread is configured to read one data shard of the HDFS data source;
The data writing module comprises a second thread pool, the second thread pool comprises a plurality of the second threads, and each second thread corresponds to one first thread and is configured to write the data shard of that first thread to an HDFS file in a temporary HDFS directory, connect to HIVE through the HIVE interface, and write the data in the HDFS file into HIVE.
CN201610007991.5A 2016-01-06 2016-01-06 Data migration method and device Pending CN105718507A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610007991.5A CN105718507A (en) 2016-01-06 2016-01-06 Data migration method and device

Publications (1)

Publication Number Publication Date
CN105718507A true CN105718507A (en) 2016-06-29

Family

ID=56147117

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610007991.5A Pending CN105718507A (en) 2016-01-06 2016-01-06 Data migration method and device

Country Status (1)

Country Link
CN (1) CN105718507A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7778960B1 (en) * 2005-10-20 2010-08-17 American Megatrends, Inc. Background movement of data between nodes in a storage cluster
CN102999537A (en) * 2011-09-19 2013-03-27 阿里巴巴集团控股有限公司 System and method for data migration
CN103324592A (en) * 2013-06-24 2013-09-25 华为技术有限公司 Data migration control method, data migration method and data migration device
CN104346240A (en) * 2013-08-05 2015-02-11 国际商业机器公司 Method and Apparatus Utilizing Multiple Memory Pools During Mobility Operations
CN104598453A (en) * 2013-10-31 2015-05-06 中国银联股份有限公司 Data migration method based on data buffering

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
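Zhou Zhiming (周志明): "Understanding the Java Virtual Machine in Depth: Advanced JVM Features and Best Practices" (《深入理解Java虚拟机 JVM高级特性与最佳实践》), 30 June 2013, China Machine Press (机械工业出版社) *
Bao Liang (鲍亮) et al.: "Cloud Computing Explained Simply" (《深入浅出云计算》), 31 October 2012, Tsinghua University Press (清华大学出版社) *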

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106777345B (en) * 2017-01-16 2020-07-28 浪潮软件科技有限公司 Data extraction loading method based on mass data migration
CN106777345A (en) * 2017-01-16 2017-05-31 山东浪潮商用系统有限公司 A kind of data pick-up loading method based on mass data migration
CN109753493A (en) * 2019-01-04 2019-05-14 中国银行股份有限公司 The method, apparatus and equipment of Data Migration are carried out between database
CN110493302A (en) * 2019-07-01 2019-11-22 联想(北京)有限公司 A kind of document transmission method, equipment and computer readable storage medium
CN110535978A (en) * 2019-10-08 2019-12-03 湖南新云网科技有限公司 Data transmission method, device, system and intelligent wearable equipment and storage medium
CN111262915A (en) * 2020-01-10 2020-06-09 北京东方金信科技有限公司 Kafka cluster-crossing data conversion system and method
CN111262915B (en) * 2020-01-10 2020-09-22 北京东方金信科技有限公司 Kafka cluster-crossing data conversion system and method
CN111291023A (en) * 2020-02-09 2020-06-16 苏州浪潮智能科技有限公司 Data migration method, system, device and medium
CN111274213B (en) * 2020-02-13 2022-07-15 苏州浪潮智能科技有限公司 Distributed file system HDFS (Hadoop distributed file system) cross-Insight cluster real-time data transmission method and system
CN111274213A (en) * 2020-02-13 2020-06-12 苏州浪潮智能科技有限公司 Distributed file system HDFS (Hadoop distributed file system) cross-Insight cluster real-time data transmission method and system
CN115698974B (en) * 2020-06-10 2023-12-15 西拉塔股份有限公司 Method, device and system for migrating active file systems
CN115698974A (en) * 2020-06-10 2023-02-03 万迪斯科股份有限公司 Method, device and system for migrating active file systems
CN111913663A (en) * 2020-07-29 2020-11-10 星辰天合(北京)数据科技有限公司 Storage volume online migration method and device and storage volume online migration system
CN112363678A (en) * 2021-01-13 2021-02-12 北京东方通软件有限公司 Data migration method and system based on message middleware
CN112363678B (en) * 2021-01-13 2021-04-30 北京东方通软件有限公司 Data migration method and system based on message middleware
CN113032818B (en) * 2021-05-27 2021-08-31 北京国电通网络技术有限公司 Task encryption method and device, electronic equipment and computer readable medium
CN113032818A (en) * 2021-05-27 2021-06-25 北京国电通网络技术有限公司 Task encryption method and device, electronic equipment and computer readable medium
CN113656474A (en) * 2021-08-05 2021-11-16 京东科技控股股份有限公司 Service data access method and device, electronic equipment and storage medium
CN114363321A (en) * 2021-12-30 2022-04-15 支付宝(杭州)信息技术有限公司 File transmission method, equipment and system
CN116628068A (en) * 2023-07-25 2023-08-22 杭州衡泰技术股份有限公司 Data handling method, system and readable storage medium based on dynamic window

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160629