CN103176843A

CN103176843A - File migration method and file migration equipment of Map Reduce distributed system

Info

Publication number: CN103176843A
Application number: CN2013100906609A
Authority: CN
Inventors: 潘瑾瑜
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2013-03-20
Filing date: 2013-03-20
Publication date: 2013-06-26
Anticipated expiration: 2033-03-20
Also published as: CN103176843B

Abstract

The invention provides a file migration method and file migration equipment for a MapReduce distributed system. A migration job for migrating a target file is started. The migration job comprises at least a first Map task and a second Map task executed in parallel, and a Reduce task corresponding to the first Map task and the second Map task, so that the metadata of the target file in the target MapReduce distributed system can be generated in the Reduce task. Because the migration job for migrating the target file comprises at least the first Map task and the second Map task, and these two Map tasks are executed in parallel, the migration time of the target file can be shortened, and the migration efficiency of the target file is accordingly improved.

Description

File migration method and apparatus for a MapReduce distributed system

[technical field]

The present invention relates to file migration technology, and in particular to a file migration method and apparatus for a MapReduce distributed system.

[background art]

In recent years, with the rapid development of broadband network technology and parallel computing theory, a simplified distributed system, the Map and Reduce (MapReduce) distributed system, has emerged to provide services for many applications, for example search engines. In a MapReduce distributed system, also called a MapReduce distributed cluster (for example, a Hadoop system), one data processing procedure is called a job (Job). After a Job is submitted, the data to be processed is divided into N parts, and each part is processed by one Map task. Map tasks run on the node devices of the MapReduce distributed system, and one or more Map tasks can run on a single node device. The outputs of all Map tasks are aggregated by a Reduce task, which outputs the corresponding result. Hadoop is an open-source project of the Apache Software Foundation.
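The Job flow described above (split the input into N parts, run one Map task per part, aggregate all Map outputs in a Reduce step) can be illustrated with a small in-memory Python model. This is a sketch only: it is not the Hadoop API, the names run_job, word_map and sum_reduce are hypothetical, the Map tasks here run one after another rather than on separate node devices, and word counting is an assumed workload chosen just to show the data flow.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def run_job(data: List[str], n_parts: int,
            map_fn: Callable[[str], Iterable[Tuple[str, int]]],
            reduce_fn: Callable[[str, List[int]], int]) -> Dict[str, int]:
    """Toy model of one MapReduce Job: split, map each part, group by key, reduce."""
    # Divide the pending data into (at most) N parts; each part goes to one Map task.
    size = max(1, -(-len(data) // n_parts))  # ceiling division
    parts = [data[i:i + size] for i in range(0, len(data), size)]
    # Run every Map task and collect its intermediate (key, value) pairs by key.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for part in parts:
        for record in part:
            for key, value in map_fn(record):
                grouped[key].append(value)
    # The Reduce step aggregates the outputs of all Map tasks, key by key.
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


def word_map(line: str) -> List[Tuple[str, int]]:
    return [(word, 1) for word in line.split()]


def sum_reduce(key: str, values: List[int]) -> int:
    return sum(values)


result = run_job(["a b", "b c", "c c"], n_parts=2, map_fn=word_map, reduce_fn=sum_reduce)
print(result)  # {'a': 1, 'b': 2, 'c': 3}
```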

However, in existing file migration for MapReduce distributed systems, migration is performed with the whole file as the unit, that is, one file is migrated within a single task, so migration efficiency is low.

[summary of the invention]

Aspects of the present invention provide a file migration method and apparatus for a MapReduce distributed system, so as to improve the migration efficiency of files.

One aspect of the present invention provides a file migration method for a MapReduce distributed system, comprising:

starting a migration job for migrating a target file, wherein the migration job comprises at least a first Map task and a second Map task executed in parallel, and a Reduce task corresponding to the first Map task and the second Map task; the target file comprises at least first data and second data, the first data being stored in at least one first data block and the second data being stored in at least one second data block;

in the first Map task, copying the first data to a target MapReduce distributed system according to identification information of the target file and identification information of the at least one first data block;

in the second Map task, copying the second data to the target MapReduce distributed system according to the identification information of the target file and identification information of the at least one second data block;

in the Reduce task, generating metadata of the target file in the target MapReduce distributed system according to the identification information of the target file, the identification information of the at least one first data block, and the identification information of the at least one second data block.

In the aspect above and any possible implementation thereof, an implementation is further provided in which the identification information of the target file comprises path information under which the target file is stored in the file system of the target MapReduce distributed system.

In the aspect above and any possible implementation thereof, an implementation is further provided in which:

the identification information of the at least one first data block is the offset of the start position of the at least one first data block within the target file;

the identification information of the at least one second data block is the offset of the start position of the at least one second data block within the target file.
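A minimal sketch of how such identifiers could be derived, assuming the target file is identified by its storage path and each data block by the byte offset at which it starts within the file. The function name block_ids, the path /user/demo/file1 and the 4-byte block size are illustrative assumptions; HDFS blocks are far larger.

```python
from typing import List, Tuple


def block_ids(path: str, file_size: int, block_size: int) -> List[Tuple[str, int, int]]:
    """Return (file path, start offset, length) for each block of a file."""
    ids = []
    for offset in range(0, file_size, block_size):
        length = min(block_size, file_size - offset)  # the last block may be shorter
        ids.append((path, offset, length))
    return ids


blocks = block_ids("/user/demo/file1", file_size=10, block_size=4)
print(blocks)
# [('/user/demo/file1', 0, 4), ('/user/demo/file1', 4, 4), ('/user/demo/file1', 8, 2)]
```

Because offsets are unique within one file, the pair (path, offset) is enough to name any block of any file being migrated.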

In the aspect above and any possible implementation thereof, an implementation is further provided in which generating, in the Reduce task, the metadata of the target file in the target MapReduce distributed system according to the identification information of the target file, the identification information of the at least one first data block, and the identification information of the at least one second data block comprises:

in the Reduce task, modifying the metadata in a mapping table maintained by the target MapReduce distributed system according to the identification information of the target file, the identification information of the at least one first data block, and the identification information of the at least one second data block, so as to generate the metadata of the target file in the target MapReduce distributed system; or

in the Reduce task, according to the identification information of the target file, the identification information of the at least one first data block, the identification information of the at least one second data block, and the metadata in the mapping table maintained by the target MapReduce distributed system, copying the first data and the second data again into a new file to serve as the target file, and generating metadata of the new file in the target MapReduce distributed system.

In the aspect above and any possible implementation thereof, an implementation is further provided in which the migration job is specifically used to migrate the target file from a source MapReduce distributed system to the target MapReduce distributed system.

Another aspect of the present invention provides a file migration apparatus for a MapReduce distributed system, comprising:

a starting unit, configured to start a migration job for migrating a target file, wherein the migration job comprises at least a first Map task and a second Map task executed in parallel, and a Reduce task corresponding to the first Map task and the second Map task; the target file comprises at least first data and second data, the first data being stored in at least one first data block and the second data being stored in at least one second data block;

a first Map task execution unit, configured to copy, in the first Map task, the first data to a target MapReduce distributed system according to identification information of the target file and identification information of the at least one first data block;

a second Map task execution unit, configured to copy, in the second Map task, the second data to the target MapReduce distributed system according to the identification information of the target file and identification information of the at least one second data block;

a Reduce task execution unit, configured to generate, in the Reduce task, metadata of the target file in the target MapReduce distributed system according to the identification information of the target file, the identification information of the at least one first data block, and the identification information of the at least one second data block.

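In the aspect above and any possible implementation thereof, an implementation is further provided in which the Reduce task execution unit is specifically configured to modify, in the Reduce task, the metadata in a mapping table maintained by the target MapReduce distributed system according to the identification information of the target file, the identification information of the at least one first data block, and the identification information of the at least one second data block, so as to generate the metadata of the target file in the target MapReduce distributed system; or to copy, in the Reduce task, the first data and the second data again into a new file to serve as the target file, according to the same identification information and the metadata in the mapping table maintained by the target MapReduce distributed system, and generate metadata of the new file in the target MapReduce distributed system.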

As can be seen from the above technical solutions, in the embodiments of the present invention a migration job for migrating a target file is started, the migration job comprising at least a first Map task and a second Map task executed in parallel, and a Reduce task corresponding to both. In the first Map task, the first data is copied to the target MapReduce distributed system according to the identification information of the target file and of the at least one first data block; in the second Map task, the second data is copied to the target MapReduce distributed system according to the identification information of the target file and of the at least one second data block; and in the Reduce task, the metadata of the target file in the target MapReduce distributed system is generated according to the identification information of the target file, of the at least one first data block, and of the at least one second data block. Because the migration job for one target file comprises at least the first Map task and the second Map task, and these are executed in parallel, the migration time of the target file is shortened and its migration efficiency is improved.

[description of drawings]

To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Apparently, the accompanying drawings in the following description show some embodiments of the present invention, and a person of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative effort.

Fig. 1 is a schematic flowchart of a file migration method for a MapReduce distributed system according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the migration job started in the embodiment corresponding to Fig. 1 migrating a target file from Hadoop system A to Hadoop system B;

Fig. 3 is a schematic structural diagram of a file migration apparatus for a MapReduce distributed system according to another embodiment of the present invention.

[embodiment]

To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Apparently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.

In addition, the term "and/or" herein describes only an association relationship between associated objects and indicates that three relationships may exist. For example, "A and/or B" may indicate three cases: only A exists, both A and B exist, and only B exists. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects.

Fig. 1 is a schematic flowchart of a file migration method for a MapReduce distributed system according to an embodiment of the present invention.

101. Start a migration job for migrating a target file. The migration job comprises at least a first Map task and a second Map task executed in parallel, and a Reduce task corresponding to the first Map task and the second Map task. The target file comprises at least first data and second data; the first data is stored in at least one first data block, and the second data is stored in at least one second data block.

102. In the first Map task, copy the first data to a target MapReduce distributed system according to the identification information of the target file and the identification information of the at least one first data block.

103. In the second Map task, copy the second data to the target MapReduce distributed system according to the identification information of the target file and the identification information of the at least one second data block.

104. In the Reduce task, generate the metadata of the target file in the target MapReduce distributed system according to the identification information of the target file, the identification information of the at least one first data block, and the identification information of the at least one second data block.
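Steps 101 to 104 can be sketched end to end in Python. This is a hedged, in-memory illustration, not the patented implementation: the two block stores are plain dictionaries keyed by (file path, block start offset), the metadata is modeled as an ordered list of offsets, threads stand in for parallel Map tasks, and the names migrate, map_task and reduce_task are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-ins for the block stores of the two systems.
# Key: (file path, start offset of the block in the file) -> block contents.
BlockStore = Dict[Tuple[str, int], bytes]
MappingTable = Dict[str, List[int]]  # file path -> ordered block start offsets


def map_task(source: BlockStore, target: BlockStore, path: str, offset: int) -> Tuple[str, int]:
    """Steps 102/103: copy one block, identified by (path, offset), to the target."""
    # Each task writes a distinct key, so the parallel tasks do not overwrite each other.
    target[(path, offset)] = source[(path, offset)]
    return path, offset  # (Key, Value) handed on to the Reduce task


def reduce_task(path: str, offsets: List[int], table: MappingTable) -> None:
    """Step 104: generate the target file's metadata from the block identifiers."""
    table[path] = sorted(offsets)


def migrate(source: BlockStore, path: str, target: BlockStore, table: MappingTable) -> None:
    """Step 101: start a job with one Map task per block, then one Reduce task."""
    offsets = [off for (p, off) in source if p == path]
    with ThreadPoolExecutor(max_workers=max(1, len(offsets))) as pool:
        futures = [pool.submit(map_task, source, target, path, off) for off in offsets]
        emitted = [f.result() for f in futures]  # wait for every Map task to finish
    reduce_task(path, [off for _, off in emitted], table)


src: BlockStore = {("file1", 0): b"abcd", ("file1", 4): b"efgh", ("file1", 8): b"ij"}
dst: BlockStore = {}
meta: MappingTable = {}
migrate(src, "file1", dst, meta)
print(meta)  # {'file1': [0, 4, 8]}
print(b"".join(dst[("file1", off)] for off in meta["file1"]))  # b'abcdefghij'
```

The sketch uses one Map task per block, the finest split the method allows; grouping several blocks into each Map task would equally satisfy "at least a first and a second Map task".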

It should be noted that the execution body of steps 101 to 104 may be a MapReduce distributed system, for example, the target MapReduce distributed system or an independent MapReduce distributed system.

In this way, a migration job for migrating a target file is started, comprising at least a first Map task and a second Map task executed in parallel and a corresponding Reduce task. In the first Map task, the first data is copied to the target MapReduce distributed system according to the identification information of the target file and of the at least one first data block; in the second Map task, the second data is copied to the target MapReduce distributed system according to the identification information of the target file and of the at least one second data block; and in the Reduce task, the metadata of the target file in the target MapReduce distributed system is generated according to the identification information of the target file, of the at least one first data block, and of the at least one second data block. Because the migration job for one target file comprises at least the first and second Map tasks, executed in parallel, the migration time of the target file is shortened and its migration efficiency is improved.

In existing file migration methods for MapReduce distributed systems, the migration job for migrating a target file contains only one Map task; that is, migration is performed with the file as the unit within a single Map task, so migration efficiency is low.
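As a rough, idealized model (an illustrative assumption, not a measurement reported in this document): let a target file occupy k data blocks, let copying block i take time t_i, assume enough Map slots are available to run all k copies at once, and let the Reduce step take t_R. The single-Map-task approach and the per-block parallel approach then take approximately:

```latex
T_{\text{single}} = \sum_{i=1}^{k} t_i ,
\qquad
T_{\text{parallel}} \approx \max_{1 \le i \le k} t_i + t_R
```

For five equal blocks (t_i = t) and a metadata-only Reduce step (t_R much smaller than t), this is roughly 5t versus a little more than t; scheduling overhead and limited slots or bandwidth would reduce the gain in practice.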

Optionally, in a possible implementation of this embodiment, the identification information of the target file may comprise path information under which the target file is stored in the file system of the target MapReduce distributed system.

Optionally, in a possible implementation of this embodiment, the identification information of the at least one first data block may be the offset of the start position of the at least one first data block within the target file; correspondingly, the identification information of the at least one second data block is the offset of the start position of the at least one second data block within the target file.

Optionally, in a possible implementation of this embodiment, in step 104 the metadata in the mapping table maintained by the target MapReduce distributed system may specifically be modified in the Reduce task, according to the identification information of the target file, the identification information of the at least one first data block, and the identification information of the at least one second data block, so as to generate the metadata of the target file in the target MapReduce distributed system.

In this implementation, the Reduce task only needs to modify (for example, merge) the metadata in the mapping table maintained by the target MapReduce distributed system, without rewriting the first data and the second data contained in the target file, which further improves the migration efficiency of the target file. The metadata may be the file information of the target file together with the data block information of the at least one first data block storing the first data and of the at least one second data block storing the second data.
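A minimal sketch of this metadata-only variant, assuming the mapping table is a dictionary from entry name to the list of block IDs it consists of, and that each Map task left behind a temporary entry for the blocks it wrote. The names merge_metadata, tmp-0 and blk_a are hypothetical. Only the table is touched; no block contents are read or copied.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping table: entry name -> list of block IDs it consists of.
MappingTable = Dict[str, List[str]]


def merge_metadata(table: MappingTable, target_path: str,
                   parts: List[Tuple[int, str]]) -> None:
    """Metadata-only Reduce step: make target_path refer to the already-copied blocks.

    `parts` holds (start offset in the target file, temporary entry name) pairs,
    one per Map task. Block contents are never read or rewritten; only the
    mapping table changes. The temporary entries are left in place.
    """
    blocks: List[str] = []
    for _, entry in sorted(parts):  # order the pieces by their offset in the file
        blocks.extend(table[entry])
    table[target_path] = blocks


table: MappingTable = {"tmp-0": ["blk_a"], "tmp-4": ["blk_b"], "tmp-8": ["blk_c"]}
merge_metadata(table, "file1", [(8, "tmp-8"), (0, "tmp-0"), (4, "tmp-4")])
print(table["file1"])  # ['blk_a', 'blk_b', 'blk_c']
```

Sorting by start offset is what lets the Map tasks finish in any order while the merged entry still lists blocks in file order.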

Optionally, in a possible implementation of this embodiment, in step 104 the first data and the second data may alternatively be copied again into a new file in the Reduce task, according to the identification information of the target file, the identification information of the at least one first data block, the identification information of the at least one second data block, and the metadata in the mapping table maintained by the target MapReduce distributed system; the new file serves as the target file, and the metadata of the new file in the target MapReduce distributed system is generated.

In this implementation, the Reduce task needs to rewrite the first data and the second data contained in the target file, and after rewriting it generates the metadata of the new file in the target MapReduce distributed system.
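A minimal sketch of this rewrite variant, assuming the pieces copied by the Map tasks are available as byte strings keyed by their start offsets in the target file. The function name, the dictionaries standing in for the file system and the mapping table, and the 4-byte block size are all hypothetical. Every byte is copied once more, which costs time but produces a freshly laid-out new file.

```python
from typing import Dict, List


def rewrite_to_new_file(pieces: Dict[int, bytes], new_path: str, files: Dict[str, bytes],
                        table: Dict[str, List[int]], block_size: int) -> None:
    """Rewrite-style Reduce step: copy the migrated pieces again into one new file.

    `pieces` maps each piece's start offset in the target file to its bytes.
    """
    expected = 0
    for off in sorted(pieces):  # the pieces must tile the file with no gap or overlap
        if off != expected:
            raise ValueError(f"gap or overlap at offset {off}")
        expected += len(pieces[off])
    data = b"".join(pieces[off] for off in sorted(pieces))  # reassemble in file order
    files[new_path] = data
    # Metadata of the new file: start offsets of the blocks it is rewritten into.
    table[new_path] = list(range(0, len(data), block_size))


files: Dict[str, bytes] = {}
table: Dict[str, List[int]] = {}
rewrite_to_new_file({4: b"efgh", 0: b"abcd", 8: b"ij"}, "file1", files, table, block_size=4)
print(files["file1"], table["file1"])  # b'abcdefghij' [0, 4, 8]
```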

Further, in the Reduce task, other metadata that contradicts the generated metadata may also be deleted, which further improves the reliability of migrating the target file.
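The text does not define when metadata "contradicts" the generated metadata. The sketch below assumes one plausible reading: any other mapping-table entry that still lists a block now owned by the target file's entry (for example, a temporary per-Map-task entry) is stale and is deleted. The function name drop_conflicting and all entry and block names are hypothetical.

```python
from typing import Dict, List

MappingTable = Dict[str, List[str]]  # entry name -> block IDs (hypothetical layout)


def drop_conflicting(table: MappingTable, target_path: str) -> List[str]:
    """Delete other entries that still claim blocks now listed under target_path.

    Returns the names of the deleted entries.
    """
    owned = set(table[target_path])
    stale = [name for name, blocks in table.items()
             if name != target_path and owned.intersection(blocks)]
    for name in stale:  # the list is built first, so deleting here is safe
        del table[name]
    return stale


table: MappingTable = {"file1": ["blk_a", "blk_b"], "tmp-0": ["blk_a"],
                       "tmp-4": ["blk_b"], "other": ["blk_z"]}
print(drop_conflicting(table, "file1"))  # ['tmp-0', 'tmp-4']
print(sorted(table))  # ['file1', 'other']
```

Unrelated entries such as "other" survive, so the cleanup only removes records that would otherwise give two owners to the same block.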

Optionally, in a possible implementation of this embodiment, the migration job may specifically be used to migrate the target file from a source MapReduce distributed system to the target MapReduce distributed system. That is, before step 101 the target file is stored in the file system of the source MapReduce distributed system, and after step 104 the target file has been written into the file system of the target MapReduce distributed system.

To make the method provided by the embodiments of the present invention clearer, a Hadoop system is used below as an example, in which the file system of the MapReduce distributed system is the Hadoop Distributed File System (HDFS). As shown in Fig. 2, assume that target file 1 (file name: file 1; storage path: file1) is stored in data block 1, data block 2, data block 3, data block 4, and data block 5 in the file system HDFS of Hadoop system A. Target file 1 comprises data 1, data 2, data 3, data 4, and data 5, which are stored in data block 1, data block 2, data block 3, data block 4, and data block 5, respectively.

Hadoop system A maintains a mapping table A, which contains the following metadata related to the target file:

File 1, [data block 1, data block 2, data block 3, data block 4 and data block 5];

The migration apparatus starts a migration job comprising Map task 1, Map task 2, Map task 3, Map task 4, and Map task 5 executed in parallel, and a corresponding Reduce task. The key (Key) and value (Value) of Map task N (N = 1, 2, 3, 4, 5) are, respectively, the identification information file1 of the target file and the identification information of data block N. Specifically:

In Map task 1, data 1 is copied to Hadoop system B according to the identification information file1 of the target file and the identification information offset1 of data block 1;

In Map task 2, data 2 is copied to Hadoop system B according to the identification information file1 of the target file and the identification information offset2 of data block 2;

In Map task 3, data 3 is copied to Hadoop system B according to the identification information file1 of the target file and the identification information offset3 of data block 3;

In Map task 4, data 4 is copied to Hadoop system B according to the identification information file1 of the target file and the identification information offset4 of data block 4; and

In Map task 5, data 5 is copied to Hadoop system B according to the identification information file1 of the target file and the identification information offset5 of data block 5.
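The key/value pairs used by the five Map tasks can be illustrated as follows. The sketch assumes a hypothetical 64 MB block size so that offsetN can be computed: in HDFS every block of a file except the last is full, so block N starts at (N - 1) * block size. It also shows why using file1 as the key for all five Map tasks routes every block identifier to a single Reduce call. The names map_inputs and shuffle are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

BLOCK_SIZE = 64 * 1024 * 1024  # assumed block size, chosen for illustration only


def map_inputs(path: str, n_blocks: int) -> List[Tuple[str, int]]:
    """(Key, Value) for Map task N: (target file id, start offset of data block N)."""
    return [(path, (n - 1) * BLOCK_SIZE) for n in range(1, n_blocks + 1)]


def shuffle(pairs: List[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group Map outputs by key, as the framework does before the Reduce task."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


pairs = map_inputs("file1", 5)
groups = shuffle(pairs)
print(len(groups), groups["file1"][:2])  # 1 [0, 67108864]
```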

Hadoop system B maintains a mapping table B, which contains the following metadata related to the target file:

File 1, [data block 1];

File 2, [data block 2];

File 3, [data block 3];

File 4, [data block 4];

File 5, [data block 5];

The key (Key) and value (Value) of the Reduce task are, respectively, the identification information file1 of the target file and the identification information of data block N. Specifically:

In the Reduce task, the metadata in mapping table B maintained by Hadoop system B is modified according to the identification information file1 of the target file and the identification information offset1, offset2, offset3, offset4, and offset5 of data block 1, data block 2, data block 3, data block 4, and data block 5, respectively, so as to generate the metadata of the target file in Hadoop system B. The modified metadata related to the target file is as follows:

File 1, [data block 1, data block 2, data block 3, data block 4 and data block 5];

Other metadata in mapping table B that contradicts this metadata is then deleted. The deleted metadata is as follows:

File 1, [data block 1];

File 2, [data block 2];

File 3, [data block 3];

File 4, [data block 4];

File 5, [data block 5];

At this point, the target file has been migrated from Hadoop system A to Hadoop system B; that is, target file 1 is stored in data block 1, data block 2, data block 3, data block 4, and data block 5 in the file system HDFS of Hadoop system B.

In this embodiment, a migration job for migrating a target file is started, comprising at least a first Map task and a second Map task executed in parallel and a corresponding Reduce task. The first and second Map tasks copy the first data and the second data, respectively, to the target MapReduce distributed system according to the identification information of the target file and of the data blocks holding that data, and the Reduce task then generates the metadata of the target file in the target MapReduce distributed system according to the identification information of the target file, of the at least one first data block, and of the at least one second data block. Because the migration job for one target file comprises at least two Map tasks executed in parallel, the migration time of the target file is shortened and its migration efficiency is improved.

In addition, if migration of the target file fails, only the data stored in the data blocks whose migration failed needs to be migrated again, rather than the whole target file, which further improves the migration efficiency of the target file.
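A minimal sketch of block-level retry, assuming each block copy reports success or failure and that only failed blocks are resubmitted, up to a fixed number of rounds. The callback copy_block, the round limit of 3, and the simulated failure of the block at offset 8 are hypothetical.

```python
from typing import Callable, Dict, List


def migrate_with_retry(offsets: List[int], copy_block: Callable[[int], bool],
                       max_rounds: int = 3) -> List[int]:
    """Re-run the copy only for blocks whose previous attempt failed.

    `copy_block(offset)` is a hypothetical callback returning True on success.
    Returns the offsets still failing after `max_rounds` rounds (empty on success).
    """
    pending = list(offsets)
    for _ in range(max_rounds):
        if not pending:
            break
        pending = [off for off in pending if not copy_block(off)]
    return pending


attempts: Dict[int, int] = {}


def flaky_copy(offset: int) -> bool:
    """Simulated copy: the block at offset 8 fails on its first attempt only."""
    attempts[offset] = attempts.get(offset, 0) + 1
    return not (offset == 8 and attempts[offset] == 1)


left = migrate_with_retry([0, 4, 8], flaky_copy)
print(left, attempts)  # [] {0: 1, 4: 1, 8: 2}
```

Blocks at offsets 0 and 4 are copied exactly once; only the failed block is retried.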

It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of actions. However, a person skilled in the art should know that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. In addition, a person skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.

In the foregoing embodiments, each embodiment is described with its own emphasis. For any part not described in detail in one embodiment, refer to the related descriptions of the other embodiments.

Fig. 3 is a schematic structural diagram of a file migration apparatus for a MapReduce distributed system according to another embodiment of the present invention. As shown in Fig. 3, the file migration apparatus of this embodiment may comprise a starting unit 31, a first Map task execution unit 32, a second Map task execution unit 33, and a Reduce task execution unit 34. The starting unit 31 is configured to start a migration job for migrating a target file, wherein the migration job comprises at least a first Map task and a second Map task executed in parallel, and a Reduce task corresponding to the first Map task and the second Map task; the target file comprises at least first data and second data, the first data being stored in at least one first data block and the second data being stored in at least one second data block. The first Map task execution unit 32 is configured to copy, in the first Map task, the first data to a target MapReduce distributed system according to the identification information of the target file and the identification information of the at least one first data block. The second Map task execution unit 33 is configured to copy, in the second Map task, the second data to the target MapReduce distributed system according to the identification information of the target file and the identification information of the at least one second data block. The Reduce task execution unit 34 is configured to generate, in the Reduce task, the metadata of the target file in the target MapReduce distributed system according to the identification information of the target file, the identification information of the at least one first data block, and the identification information of the at least one second data block.

It should be noted that the file migration apparatus for a MapReduce distributed system provided in this embodiment may be a MapReduce distributed system, for example, the target MapReduce distributed system or an independent MapReduce distributed system.

In this way, the starting unit starts a migration job for migrating a target file, the job comprising at least a first Map task and a second Map task executed in parallel and a corresponding Reduce task. The first Map task execution unit copies, in the first Map task, the first data to the target MapReduce distributed system according to the identification information of the target file and of the at least one first data block; the second Map task execution unit copies, in the second Map task, the second data to the target MapReduce distributed system according to the identification information of the target file and of the at least one second data block; and the Reduce task execution unit generates, in the Reduce task, the metadata of the target file in the target MapReduce distributed system according to the identification information of the target file, of the at least one first data block, and of the at least one second data block. Because the migration job for one target file comprises at least the first and second Map tasks, executed in parallel, the migration time of the target file is shortened and its migration efficiency is improved.

In existing file migration apparatuses for MapReduce distributed systems, the migration job for migrating a target file contains only one Map task; that is, migration is performed with the file as the unit within a single Map task, so migration efficiency is low.

Optionally, in a possible implementation of this embodiment, the Reduce task execution unit 34 may specifically be configured to modify, in the Reduce task, the metadata in the mapping table maintained by the target MapReduce distributed system according to the identification information of the target file, the identification information of the at least one first data block, and the identification information of the at least one second data block, so as to generate the metadata of the target file in the target MapReduce distributed system.

In this implementation, the Reduce task execution unit 34 only needs to modify (for example, merge), in the Reduce task, the metadata in the mapping table maintained by the target MapReduce distributed system, without rewriting the first data and the second data contained in the target file, which further improves the migration efficiency of the target file. The metadata may be the file information of the target file together with the data block information of the at least one first data block storing the first data and of the at least one second data block storing the second data.

Optionally, in one possible implementation of this embodiment, the Reduce task execution unit 34 may alternatively be configured to, in the Reduce task, copy the first data and the second data again into a new file, which serves as the target file, according to the identification information of the target file, of the at least one first data block, and of the at least one second data block, and according to the metadata in the mapping table maintained by the target MapReduce distributed system; the unit then generates the metadata of the new file in the target MapReduce distributed system.

In this implementation, the Reduce task execution unit 34 needs to rewrite the first data and the second data contained in the target file in the Reduce task, and after the rewrite it generates the metadata of the new file in the target MapReduce distributed system.
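By contrast, the rewrite variant reads the migrated data back and writes it out again as a new file, so new blocks and new metadata are produced. In the following minimal sketch, the fixed `block_size` re-chunking and the metadata fields are assumptions chosen for illustration, and `rewrite_as_new_file` is a hypothetical name.

```python
def rewrite_as_new_file(old_blocks, first_ids, second_ids, new_path, block_size, table):
    """Reduce-side rewrite: copy the migrated data once more into a fresh file."""
    if block_size < 1:
        raise ValueError("block_size must be positive")
    # Read the first data, then the second data, in file order.
    data = b"".join(old_blocks[b] for b in [*first_ids, *second_ids])
    # Writing the new file splits it into new blocks of at most block_size bytes.
    new_blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    # The generated metadata describes the new file, not the original blocks.
    table[new_path] = {"length": len(data), "num_blocks": len(new_blocks)}
    return new_blocks


table = {}
blocks = rewrite_as_new_file({1: b"abc", 2: b"de", 3: b"fgh"}, [1, 2], [3], "/file1", 4, table)
print(blocks, table)
# [b'abcd', b'efgh'] {'/file1': {'length': 8, 'num_blocks': 2}}
```

This costs a second full copy of the data, which is why the text presents the metadata-merge variant as the more efficient option.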

Further, in the Reduce task, the Reduce task execution unit 34 may also delete other metadata that conflicts with the generated metadata, which further improves the reliability of migrating the target file.
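The text does not define what makes metadata "conflicting". The following sketch assumes one plausible reading: another mapping-table entry that claims any of the blocks now owned by the migrated file (for example, a leftover partial entry). The table layout, the `._part0` entry, and the function name are hypothetical.

```python
def remove_conflicting_entries(table, kept_path):
    """Drop other entries that claim any block owned by table[kept_path]."""
    kept_blocks = set(table[kept_path]["blocks"])
    # Collect first, then delete, so the dict is not mutated while iterating.
    conflicting = [
        path for path, entry in table.items()
        if path != kept_path and kept_blocks & set(entry["blocks"])
    ]
    for path in conflicting:
        del table[path]
    return conflicting


table = {
    "/file1": {"blocks": [1, 2, 3, 4, 5]},
    "/file1._part0": {"blocks": [1, 2, 3]},  # hypothetical leftover partial entry
    "/file2": {"blocks": [9]},
}
print(remove_conflicting_entries(table, "/file1"))  # ['/file1._part0']
print(sorted(table))  # ['/file1', '/file2']
```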

Optionally, in one possible implementation of this embodiment, the migration job may specifically be used to migrate the target file from a source MapReduce distributed system to the target MapReduce distributed system. That is, before the file migration equipment of this embodiment performs its operations, the target file is stored in the file system of the source MapReduce distributed system; after the operations, the target file has been written into the file system of the target MapReduce distributed system.

To make the method provided by the embodiments of the present invention clearer, the Hadoop system is taken as an example below, in which the file system of the MapReduce distributed system is the Hadoop Distributed File System (HDFS). As shown in Fig. 2, assume that target file 1 (file name "file 1", storage path "file1") is stored in data block 1, data block 2, data block 3, data block 4, and data block 5 in HDFS, the file system of Hadoop system A. Target file 1 comprises data 1 through data 5: data 1 is stored in data block 1, data 2 in data block 2, data 3 in data block 3, data 4 in data block 4, and data 5 in data block 5. For a detailed description, refer to the related content in the embodiment corresponding to Fig. 1; it is not repeated here.
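The Fig. 2 layout can be modeled as a small data structure: a namespace mapping the path to its ordered blocks, and a block store holding data 1 through data 5. The sketch below migrates file1 to a hypothetical empty target cluster and then checks that reading the path on both sides yields the same bytes. Three details are assumptions: the split of blocks 1 to 3 and 4 to 5 between the two Map tasks, the placeholder block contents, and the name `hadoop_b` for the target. The Map tasks run sequentially here for brevity.

```python
# Fig. 2 layout on Hadoop system A (block contents are placeholders).
hadoop_a = {
    "namespace": {"file1": [1, 2, 3, 4, 5]},  # path -> ordered block ids
    "blocks": {i: f"data {i}".encode() for i in range(1, 6)},  # block id -> data
}


def read_file(cluster, path):
    """Resolve a path through the namespace and concatenate its blocks in order."""
    return b"".join(cluster["blocks"][b] for b in cluster["namespace"][path])


def migrate_file(src, dst, path, first_ids, second_ids):
    """Copy the two Map tasks' blocks, then record the file (Reduce-side metadata)."""
    for block_id in [*first_ids, *second_ids]:
        dst["blocks"][block_id] = src["blocks"][block_id]
    dst["namespace"][path] = [*first_ids, *second_ids]


hadoop_b = {"namespace": {}, "blocks": {}}  # hypothetical empty target cluster
migrate_file(hadoop_a, hadoop_b, "file1", [1, 2, 3], [4, 5])
print(read_file(hadoop_b, "file1") == read_file(hadoop_a, "file1"))  # True
```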

In the present embodiment, the starting unit starts a migration job comprising at least a first Map task and a second Map task, executed in parallel, plus a corresponding Reduce task. The first and second Map task execution units copy the first data and the second data, respectively, to the target MapReduce distributed system according to the identification information of the target file and of the respective data blocks. The Reduce task execution unit then generates the metadata of the target file in the target MapReduce distributed system from the identification information of the target file, of the at least one first data block, and of the at least one second data block. Because the Map tasks that migrate the target file run in parallel, the migration time of the target file is shortened and its migration efficiency is improved.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.

In the several embodiments provided by the present invention, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between devices or units may be electrical, mechanical, or in other forms.

Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the objectives of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.

An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Finally, it should be noted that the above embodiments are merely intended to describe, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments, or make equivalent replacements of some of the technical features therein, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A file migration method for a MapReduce distributed system, characterized by comprising:

2. The method according to claim 1, characterized in that the identification information of the target file comprises path information under which the target file is stored in the file system of the target MapReduce distributed system.

3. The method according to claim 1 or 2, characterized in that,

4. The method according to any one of claims 1 to 3, characterized in that generating, in the Reduce task, the metadata of the target file in the target MapReduce distributed system according to the identification information of the target file, the identification information of the at least one first data block, and the identification information of the at least one second data block comprises:

5. The method according to any one of claims 1 to 4, characterized in that the migration job is specifically used for

6. File migration equipment for a MapReduce distributed system, characterized by comprising:

7. The equipment according to claim 6, characterized in that the identification information of the target file comprises path information under which the target file is stored in the file system of the target MapReduce distributed system.

8. The equipment according to claim 6 or 7, characterized in that,

9. The equipment according to any one of claims 6 to 8, characterized in that the Reduce task execution unit is specifically configured to

10. The equipment according to any one of claims 6 to 9, characterized in that the migration job is specifically used for