CN103176843B - File migration method and apparatus for a MapReduce distributed system - Google Patents
- Publication number
- CN103176843B (application CN201310090660.9A)
- Authority
- CN
- China
- Prior art keywords
- identification information
- target file
- data block
- data
- distributed system
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a file migration method and apparatus for a MapReduce distributed system. In the embodiments of the present invention, a migration job for migrating a target file is started. The migration job includes at least a first Map task and a second Map task that are executed in parallel, and a Reduce task corresponding to the first Map task and the second Map task; in the Reduce task, metadata of the target file in the target MapReduce distributed system is generated. Because the migration job that migrates a single target file includes at least the first Map task and the second Map task, and these two Map tasks are executed in parallel, the migration time of the target file is shortened and the migration efficiency of the target file is improved.
Description
[Technical Field]
The present invention relates to file migration technology, and in particular to a file migration method and apparatus for a MapReduce distributed system.
[Background]
In recent years, with the rapid development of broadband network technology and parallel computing theory, a simplified distributed system, the map-and-reduce (MapReduce) distributed system, has emerged to provide services for a variety of applications, for example search engines. In a MapReduce distributed system (which may also be called a MapReduce distributed cluster, for example a Hadoop system), one data-processing procedure is called a job (Job). After a job is submitted, the data to be processed is divided into N parts, and each part is processed by a map (Map) task. Map tasks run on node devices of the MapReduce distributed system, and one or more Map tasks can run on a single node device. The outputs of all Map tasks are aggregated by a reduce (Reduce) task, which outputs the corresponding result. Hadoop is an open-source project of the Apache Software Foundation.
However, file migration in MapReduce distributed systems is performed in units of files: each file is migrated within a single task, so migration efficiency is low.
[Summary of the Invention]
Aspects of the present invention provide a file migration method and apparatus for a MapReduce distributed system, to improve file migration efficiency.
One aspect of the present invention provides a file migration method for a MapReduce distributed system, comprising:
starting a migration job for migrating a target file, the migration job including at least a first Map task and a second Map task executed in parallel, and a Reduce task corresponding to the first Map task and the second Map task, wherein the target file includes at least first data and second data, the first data is stored in at least one first data block, and the second data is stored in at least one second data block;
in the first Map task, copying the first data into a target MapReduce distributed system according to the identifier of the target file and the identifier of the at least one first data block;
in the second Map task, copying the second data into the target MapReduce distributed system according to the identifier of the target file and the identifier of the at least one second data block;
in the Reduce task, generating metadata of the target file in the target MapReduce distributed system according to the identifier of the target file, the identifier of the at least one first data block, and the identifier of the at least one second data block.
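The four claimed steps can be illustrated with a small in-memory simulation. This is a minimal sketch under stated assumptions, not the patented implementation: dicts stand in for the target system's storage and metadata, a thread pool stands in for parallel Map slots, each block is identified by its starting offset, and the names `map_task`, `reduce_task`, `migrate`, and the path `/user/demo/file1` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def map_task(target_store, file_id, offset, data):
    # Copy one block's data into the (simulated) target system,
    # keyed by the target-file identifier and the block offset.
    target_store[(file_id, offset)] = data
    # Emit a key/value pair for the Reduce task: key = file id, value = offset.
    return file_id, offset


def reduce_task(pairs):
    # Group block offsets by file id and sort them to form the file's metadata.
    metadata = {}
    for file_id, offset in pairs:
        metadata.setdefault(file_id, []).append(offset)
    return {fid: sorted(offsets) for fid, offsets in metadata.items()}


def migrate(file_id, blocks):
    """blocks: list of (offset, data) pairs read from the source system."""
    target_store = {}
    with ThreadPoolExecutor() as pool:
        # One Map task per block, all submitted so they can run in parallel.
        futures = [pool.submit(map_task, target_store, file_id, off, data)
                   for off, data in blocks]
        pairs = [f.result() for f in futures]
    return target_store, reduce_task(pairs)


store, meta = migrate("/user/demo/file1", [(0, b"abc"), (3, b"def")])
print(meta)  # {'/user/demo/file1': [0, 3]}
```

The Reduce step only needs the identifiers emitted by the Map tasks, not the block data itself, which is what lets the Map tasks copy data independently.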
According to the above aspect and any possible implementation, an implementation is further provided in which the identifier of the target file includes path information of the target file as stored in the file system of the target MapReduce distributed system.
According to the above aspect and any possible implementation, an implementation is further provided in which:
the identifier of the at least one first data block is the offset of the starting position of the at least one first data block within the target file; and
the identifier of the at least one second data block is the offset of the starting position of the at least one second data block within the target file.
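Because each block identifier is the offset of the block's start within the target file, the identifiers follow from the block sizes as a running sum. A minimal sketch, assuming the block sizes are known in file order; the 128/128/44 split is a made-up example:

```python
def block_offsets(block_sizes):
    """Return the starting offset of each block within the file."""
    offsets, pos = [], 0
    for size in block_sizes:
        offsets.append(pos)  # this block starts where the previous one ended
        pos += size
    return offsets


# Hypothetical 300 MB file stored as 128 MB + 128 MB + 44 MB blocks (sizes in MB).
print(block_offsets([128, 128, 44]))  # [0, 128, 256]
```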
According to the above aspect and any possible implementation, an implementation is further provided in which generating, in the Reduce task, the metadata of the target file in the target MapReduce distributed system according to the identifier of the target file, the identifier of the at least one first data block, and the identifier of the at least one second data block comprises:
in the Reduce task, modifying, according to the identifier of the target file, the identifier of the at least one first data block, and the identifier of the at least one second data block, the metadata in a mapping table maintained by the target MapReduce distributed system, so as to generate the metadata of the target file in the target MapReduce distributed system; or
in the Reduce task, according to the identifier of the target file, the identifier of the at least one first data block, the identifier of the at least one second data block, and the metadata in the mapping table maintained by the target MapReduce distributed system, copying the first data and the second data again into a new file that serves as the target file, and generating metadata of the new file in the target MapReduce distributed system.
According to the above aspect and any possible implementation, an implementation is further provided in which the migration job is specifically used to:
migrate the target file from a source MapReduce distributed system to the target MapReduce distributed system.
Another aspect of the present invention provides a file migration apparatus for a MapReduce distributed system, comprising:
a start unit, configured to start a migration job for migrating a target file, the migration job including at least a first Map task and a second Map task executed in parallel, and a Reduce task corresponding to the first Map task and the second Map task, wherein the target file includes at least first data and second data, the first data is stored in at least one first data block, and the second data is stored in at least one second data block;
a first Map task execution unit, configured to copy, in the first Map task, the first data into a target MapReduce distributed system according to the identifier of the target file and the identifier of the at least one first data block;
a second Map task execution unit, configured to copy, in the second Map task, the second data into the target MapReduce distributed system according to the identifier of the target file and the identifier of the at least one second data block; and
a Reduce task execution unit, configured to generate, in the Reduce task, metadata of the target file in the target MapReduce distributed system according to the identifier of the target file, the identifier of the at least one first data block, and the identifier of the at least one second data block.
According to the above aspect and any possible implementation, an implementation is further provided in which the identifier of the target file includes path information of the target file as stored in the file system of the target MapReduce distributed system.
According to the above aspect and any possible implementation, an implementation is further provided in which:
the identifier of the at least one first data block is the offset of the starting position of the at least one first data block within the target file; and
the identifier of the at least one second data block is the offset of the starting position of the at least one second data block within the target file.
According to the above aspect and any possible implementation, an implementation is further provided in which the Reduce task execution unit is specifically configured to:
in the Reduce task, modify, according to the identifier of the target file, the identifier of the at least one first data block, and the identifier of the at least one second data block, the metadata in a mapping table maintained by the target MapReduce distributed system, so as to generate the metadata of the target file in the target MapReduce distributed system; or
in the Reduce task, according to the identifier of the target file, the identifier of the at least one first data block, the identifier of the at least one second data block, and the metadata in the mapping table maintained by the target MapReduce distributed system, copy the first data and the second data again into a new file that serves as the target file, and generate metadata of the new file in the target MapReduce distributed system.
According to the above aspect and any possible implementation, an implementation is further provided in which the migration job is specifically used to:
migrate the target file from a source MapReduce distributed system to the target MapReduce distributed system.
As can be seen from the above technical solutions, in the embodiments of the present invention a migration job for migrating a target file is started, the migration job including at least a first Map task and a second Map task executed in parallel, and a Reduce task corresponding to the first Map task and the second Map task. In the first Map task, the first data is copied into the target MapReduce distributed system according to the identifier of the target file and the identifier of the at least one first data block; in the second Map task, the second data is copied into the target MapReduce distributed system according to the identifier of the target file and the identifier of the at least one second data block; and in the Reduce task, metadata of the target file in the target MapReduce distributed system can then be generated according to the identifier of the target file, the identifier of the at least one first data block, and the identifier of the at least one second data block. Because the migration job that migrates one target file includes at least the first Map task and the second Map task, and the two are executed in parallel, the migration time of the target file is shortened and its migration efficiency is improved.
[Brief Description of the Drawings]
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the accompanying drawings in the following description show some embodiments of the present invention, and persons of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative effort.
Fig. 1 is a schematic flowchart of a file migration method for a MapReduce distributed system according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of migrating a target file from Hadoop system A to Hadoop system B through the migration job started in the embodiment corresponding to Fig. 1;
Fig. 3 is a schematic structural diagram of a file migration apparatus for a MapReduce distributed system according to another embodiment of the present invention.
[Detailed Description of Embodiments]
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort shall fall within the protection scope of the present invention.
In addition, the term "and/or" in this specification describes only an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that A exists alone, that both A and B exist, or that B exists alone. The character "/" in this specification generally indicates an "or" relationship between the associated objects before and after it.
Fig. 1 is a schematic flowchart of a file migration method for a MapReduce distributed system according to an embodiment of the present invention; the method comprises steps 101 to 104 below.
101. Start a migration job for migrating a target file. The migration job includes at least a first Map task and a second Map task executed in parallel, and a Reduce task corresponding to the first Map task and the second Map task. The target file includes at least first data and second data; the first data is stored in at least one first data block, and the second data is stored in at least one second data block.
102. In the first Map task, copy the first data into the target MapReduce distributed system according to the identifier of the target file and the identifier of the at least one first data block.
103. In the second Map task, copy the second data into the target MapReduce distributed system according to the identifier of the target file and the identifier of the at least one second data block.
104. In the Reduce task, generate metadata of the target file in the target MapReduce distributed system according to the identifier of the target file, the identifier of the at least one first data block, and the identifier of the at least one second data block.
It should be noted that steps 101 to 104 may be executed by a MapReduce distributed system, for example the target MapReduce distributed system or a separate MapReduce distributed system.
In this way, a migration job for migrating the target file is started that includes at least a first Map task and a second Map task executed in parallel, and a corresponding Reduce task. The first Map task copies the first data into the target MapReduce distributed system according to the identifier of the target file and the identifier of the at least one first data block; the second Map task copies the second data into the target MapReduce distributed system according to the identifier of the target file and the identifier of the at least one second data block; and the Reduce task then generates metadata of the target file in the target MapReduce distributed system from the identifier of the target file and the identifiers of the first and second data blocks. Because the migration job for one target file contains at least two Map tasks executed in parallel, the migration time of the target file is shortened and its migration efficiency is improved.
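The time saving from parallel Map tasks can be estimated with a simple scheduling model. This is an idealized sketch, not a measurement: it assumes every block copy takes a fixed, made-up time, that blocks are assigned greedily to whichever Map slot frees up first, and it ignores network and disk contention and task start-up overhead.

```python
import heapq


def makespan(block_times, slots):
    """Finish time when block copies are greedily assigned to `slots` Map slots."""
    finish = [0] * slots  # per-slot busy-until times, kept as a min-heap
    for t in block_times:
        earliest = heapq.heappop(finish)   # slot that becomes free first
        heapq.heappush(finish, earliest + t)
    return max(finish)


blocks = [10, 10, 10, 10, 10]  # hypothetical per-block copy times, in seconds
print(makespan(blocks, 1))  # 50: one Map task copies the whole file block by block
print(makespan(blocks, 2))  # 30
print(makespan(blocks, 5))  # 10: one Map task per block, enough free slots
```

With one Map task per file the time grows with the number of blocks; with per-block Map tasks it is bounded by the number of available slots.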
In existing file migration methods for MapReduce distributed systems, the migration job that migrates one target file contains only one Map task; that is, migration is performed in units of files within a single Map task, so migration efficiency is low.
Optionally, in a possible implementation of this embodiment, the identifier of the target file may include path information of the target file as stored in the file system of the target MapReduce distributed system.
Optionally, in a possible implementation of this embodiment, the identifier of the at least one first data block may be the offset of the starting position of the at least one first data block within the target file; correspondingly, the identifier of the at least one second data block is the offset of the starting position of the at least one second data block within the target file.
Optionally, in a possible implementation of this embodiment, in 104 the Reduce task may specifically modify, according to the identifier of the target file, the identifier of the at least one first data block, and the identifier of the at least one second data block, the metadata in a mapping table maintained by the target MapReduce distributed system, so as to generate the metadata of the target file in the target MapReduce distributed system.
In this implementation, the Reduce task only needs to modify (for example, merge) the metadata in the mapping table maintained by the target MapReduce distributed system, without rewriting the first data and second data included in the target file, which further improves the migration efficiency of the target file. The metadata may be the file information of the target file together with the block information of the first data block that stores the first data and of the second data block that stores the second data included in the target file.
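The metadata described above (file information plus per-block information) might be modeled as follows. This is a minimal sketch: the field names `block_id`, `offset`, and `length` are hypothetical, and the `is_contiguous` check is an illustrative sanity check added here, not something the text specifies.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BlockInfo:
    block_id: str  # hypothetical block identifier in the target system
    offset: int    # starting offset of the block within the target file
    length: int    # number of bytes stored in the block


@dataclass
class FileMetadata:
    path: str  # target-file identifier (its path in the target file system)
    blocks: List[BlockInfo] = field(default_factory=list)

    @property
    def length(self) -> int:
        return sum(blk.length for blk in self.blocks)

    def is_contiguous(self) -> bool:
        # In offset order, each block must start exactly where the previous ended.
        pos = 0
        for blk in sorted(self.blocks, key=lambda blk: blk.offset):
            if blk.offset != pos:
                return False
            pos += blk.length
        return True


meta = FileMetadata("/user/demo/file1",
                    [BlockInfo("blk2", 128, 44), BlockInfo("blk1", 0, 128)])
print(meta.length, meta.is_contiguous())  # 172 True
```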
Optionally, in a possible implementation of this embodiment, in 104 the Reduce task may alternatively, according to the identifier of the target file, the identifier of the at least one first data block, the identifier of the at least one second data block, and the metadata in the mapping table maintained by the target MapReduce distributed system, copy the first data and the second data again into a new file that serves as the target file, and generate metadata of the new file in the target MapReduce distributed system.
In this implementation, the Reduce task needs to rewrite the first data and second data included in the target file, and then generate metadata of the rewritten new file in the target MapReduce distributed system.
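The rewrite variant can be contrasted with the metadata-only merge in a short sketch: here every copied byte is read and written again, which is why the merge-only implementation is described as more efficient. Assumptions: dicts stand in for the mapping table and the block storage, each copied block sits under its own entry in the mapping table, and the new block id format `"<path>#blk0"` and all names are hypothetical.

```python
def rewrite_into_new_file(mapping_table, block_store, target_path, parts):
    """Concatenate the copied data, in offset order, into one new file.

    mapping_table: file name -> list of block ids (the target system's table)
    block_store:   block id -> bytes (stand-in for the target file system)
    parts:         (name of the entry holding a copied block, offset) pairs
    """
    ordered = sorted(parts, key=lambda part: part[1])
    data = b"".join(
        block_store[block_id]
        for name, _offset in ordered
        for block_id in mapping_table[name]
    )
    new_block_id = f"{target_path}#blk0"  # hypothetical id for the rewritten data
    block_store[new_block_id] = data      # the data is physically written again
    mapping_table[target_path] = [new_block_id]  # metadata of the new file
    return data


table = {"part2": ["blk2"], "part1": ["blk1"]}
store = {"blk1": b"Hello, ", "blk2": b"HDFS"}
print(rewrite_into_new_file(table, store, "/user/demo/file1",
                            [("part2", 7), ("part1", 0)]))  # b'Hello, HDFS'
```

The per-block entries (`part1`, `part2`) are left in place here; removing them corresponds to the optional deletion of conflicting metadata.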
Further, the Reduce task may also delete other metadata that conflicts with the generated metadata, which further improves the reliability of migrating the target file.
Optionally, in a possible implementation of this embodiment, the migration job may specifically be used to migrate the target file from a source MapReduce distributed system to the target MapReduce distributed system. That is, before 101 the target file is stored in the file system of the source MapReduce distributed system, and after 104 the target file has been written into the file system of the target MapReduce distributed system.
To make the method provided by the embodiments of the present invention clearer, a Hadoop system is taken as an example below; the file system of this MapReduce distributed system is the Hadoop Distributed File System (HDFS). As shown in Fig. 2, assume that target file 1 (file name file 1, storage path file1) is stored in data block 1, data block 2, data block 3, data block 4, and data block 5 in the HDFS of Hadoop system A. Target file 1 includes data 1, data 2, data 3, data 4, and data 5: data 1 is stored in data block 1, data 2 in data block 2, data 3 in data block 3, data 4 in data block 4, and data 5 in data block 5.
Hadoop system A maintains a mapping table A, which contains the following metadata related to the target file:
File 1, [data block 1, data block 2, data block 3, data block 4, data block 5];
The migration apparatus starts a migration job that includes Map task 1, Map task 2, Map task 3, Map task 4, and Map task 5, executed in parallel, and a corresponding Reduce task. The key (Key) and value (Value) of Map task N (N = 1, 2, 3, 4, 5) are the identifier file1 of the target file and data block N, respectively. Specifically:
In Map task 1, data 1 is copied into Hadoop system B according to the identifier file1 of the target file and the identifier offset1 of data block 1;
In Map task 2, data 2 is copied into Hadoop system B according to the identifier file1 of the target file and the identifier offset2 of data block 2;
In Map task 3, data 3 is copied into Hadoop system B according to the identifier file1 of the target file and the identifier offset3 of data block 3;
In Map task 4, data 4 is copied into Hadoop system B according to the identifier file1 of the target file and the identifier offset4 of data block 4; and
In Map task 5, data 5 is copied into Hadoop system B according to the identifier file1 of the target file and the identifier offset5 of data block 5.
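What one Map task does with its (file identifier, block offset) input can be sketched with ordinary file I/O. Assumptions: local temporary directories stand in for the HDFS of Hadoop systems A and B, block lengths are known, and each copied block is written as its own part file; the name `copy_block` and the part-file naming scheme are hypothetical, not Hadoop APIs.

```python
import os
import tempfile
from pathlib import Path


def copy_block(src_path, dst_dir, file_id, index, offset, length):
    """Copy one block: `length` bytes starting at `offset` in the source file.

    In this sketch the copy is written as a separate part file named after
    the target-file identifier and the block index.
    """
    with open(src_path, "rb") as src:
        src.seek(offset)          # jump straight to this block's start
        data = src.read(length)
    dst_path = os.path.join(dst_dir, f"{file_id}.part{index}")
    with open(dst_path, "wb") as dst:
        dst.write(data)
    return dst_path


# Demo: temporary directories stand in for systems A and B.
with tempfile.TemporaryDirectory() as system_a, tempfile.TemporaryDirectory() as system_b:
    src_path = os.path.join(system_a, "file1")
    Path(src_path).write_bytes(b"AAABBBCC")       # three blocks: 3, 3, 2 bytes
    layout = [(1, 0, 3), (2, 3, 3), (3, 6, 2)]    # (index, offset, length)
    parts = [copy_block(src_path, system_b, "file1", i, off, n) for i, off, n in layout]
    copied = [Path(p).read_bytes() for p in parts]

print(copied)  # [b'AAA', b'BBB', b'CC']
```

Because each call needs only the file identifier and its own offset, the calls are independent and could run as parallel Map tasks.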
Hadoop system B maintains a mapping table B, which contains the following metadata related to the target file:
File 1, [data block 1];
File 2, [data block 2];
File 3, [data block 3];
File 4, [data block 4];
File 5, [data block 5];
The key (Key) and value (Value) of the Reduce task are the identifier file1 of the target file and data block N, respectively. Specifically, in the Reduce task, the metadata in mapping table B maintained by Hadoop system B is modified according to the identifier file1 of the target file, the identifier offset1 of data block 1, the identifier offset2 of data block 2, the identifier offset3 of data block 3, the identifier offset4 of data block 4, and the identifier offset5 of data block 5, so as to generate the metadata of the target file in Hadoop system B. The modified metadata related to the target file is as follows:
File 1, [data block 1, data block 2, data block 3, data block 4, data block 5];
Other metadata in mapping table B that conflicts with this metadata is then deleted; the deleted metadata is as follows:
File 1, [data block 1];
File 2, [data block 2];
File 3, [data block 3];
File 4, [data block 4];
File 5, [data block 5];
At this point, the target file has been migrated from Hadoop system A to Hadoop system B; that is, target file 1 is stored in data block 1, data block 2, data block 3, data block 4, and data block 5 in the HDFS of Hadoop system B.
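The Reduce-side merge in this example can be reproduced on a dict standing in for mapping table B. This is a sketch under assumptions: the entries are copied from the tables above, while the numeric offsets (128 MB steps) are invented because the text gives only the symbols offset1 to offset5; `reduce_merge` is a hypothetical name.

```python
def reduce_merge(table, target_name, offsets):
    """Merge per-block entries into one entry for the target file.

    table:   mapping table, file name -> list of data blocks
    offsets: per-block entry name -> offset of that block in the target file
    """
    ordered = sorted(offsets, key=offsets.get)            # restore file order
    merged = [blk for name in ordered for blk in table[name]]
    for name in ordered:                                  # delete conflicting entries
        del table[name]
    table[target_name] = merged                           # add the merged entry
    return table


# Mapping table B after Map tasks 1-5, as listed in the example.
table_b = {f"File {n}": [f"data block {n}"] for n in range(1, 6)}
# offset1..offset5 are symbolic in the text; 128 MB steps are an assumption.
offsets = {f"File {n}": (n - 1) * 128 for n in range(1, 6)}

print(reduce_merge(table_b, "File 1", offsets))
# {'File 1': ['data block 1', 'data block 2', 'data block 3', 'data block 4', 'data block 5']}
```

Only the mapping table changes; no block data is read or moved, matching the merge-only implementation.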
In this embodiment, a migration job for migrating the target file is started that includes at least a first Map task and a second Map task executed in parallel, and a corresponding Reduce task. In the first Map task, the first data is copied into the target MapReduce distributed system according to the identifier of the target file and the identifier of the at least one first data block; in the second Map task, the second data is copied into the target MapReduce distributed system according to the identifier of the target file and the identifier of the at least one second data block; and in the Reduce task, metadata of the target file in the target MapReduce distributed system can then be generated according to the identifier of the target file, the identifier of the at least one first data block, and the identifier of the at least one second data block. Because the migration job that migrates one target file includes at least the first Map task and the second Map task, executed in parallel, the migration time of the target file is shortened and its migration efficiency is improved.
In addition, if migration of the target file fails, only the data stored in the data block whose migration failed needs to be migrated again, rather than the entire target file, which further improves the migration efficiency of the target file.
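Block-level retry can be sketched as a loop that re-runs only the failed copies. Assumptions: `copy` is a hypothetical stand-in for one Map task that raises `OSError` on failure, blocks are keyed by offset, and the limit of three rounds is an arbitrary choice, not from the text.

```python
def migrate_with_retry(blocks, copy, max_attempts=3):
    """Re-run only the blocks whose copy failed, never the whole file.

    blocks: mapping of block offset -> data
    copy:   callable(offset, data) that raises OSError on failure
    Returns the number of copy attempts made for each offset.
    """
    attempts = {offset: 0 for offset in blocks}
    pending = list(blocks)
    for _ in range(max_attempts):
        failed = []
        for offset in pending:
            attempts[offset] += 1
            try:
                copy(offset, blocks[offset])
            except OSError:
                failed.append(offset)
        if not failed:
            return attempts
        pending = failed  # next round retries the failed blocks only
    raise RuntimeError(f"blocks still failing after {max_attempts} rounds: {pending}")


copied = {}
failures = {128: 1}  # hypothetical: the block at offset 128 fails once


def flaky_copy(offset, data):
    if failures.get(offset, 0) > 0:
        failures[offset] -= 1
        raise OSError("simulated transfer error")
    copied[offset] = data


result = migrate_with_retry({0: b"a", 128: b"b", 256: b"c"}, flaky_copy)
print(result)  # {0: 1, 128: 2, 256: 1}
```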
It should be noted that, for brevity, the foregoing method embodiments are described as a series of action combinations. However, persons skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. In addition, persons skilled in the art should also understand that the embodiments described in this specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for a part not described in detail in one embodiment, refer to the related descriptions of other embodiments.
Fig. 3 is a schematic structural diagram of a file migration apparatus for a MapReduce distributed system according to another embodiment of the present invention. As shown in Fig. 3, the file migration apparatus provided in this embodiment may include a start unit 31, a first Map task execution unit 32, a second Map task execution unit 33, and a Reduce task execution unit 34. The start unit 31 is configured to start a migration job for migrating a target file, the migration job including at least a first Map task and a second Map task executed in parallel, and a Reduce task corresponding to the first Map task and the second Map task, wherein the target file includes at least first data and second data, the first data is stored in at least one first data block, and the second data is stored in at least one second data block. The first Map task execution unit 32 is configured to copy, in the first Map task, the first data into a target MapReduce distributed system according to the identifier of the target file and the identifier of the at least one first data block. The second Map task execution unit 33 is configured to copy, in the second Map task, the second data into the target MapReduce distributed system according to the identifier of the target file and the identifier of the at least one second data block. The Reduce task execution unit 34 is configured to generate, in the Reduce task, metadata of the target file in the target MapReduce distributed system according to the identifier of the target file, the identifier of the at least one first data block, and the identifier of the at least one second data block.
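The four units of Fig. 3 can be mirrored as small classes. This is an illustrative skeleton only: the class names, the dict used as target storage, and the rule that splits blocks into "first" and "second" groups (first half versus the rest) are all assumptions, not part of the described apparatus.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List, Tuple

Block = Tuple[int, bytes]  # (offset within the target file, block data)


@dataclass
class MigrationJob:
    file_id: str
    first_blocks: List[Block]   # blocks holding the "first data"
    second_blocks: List[Block]  # blocks holding the "second data"


class StartUnit:  # unit 31
    def start(self, file_id: str, blocks: List[Block]) -> MigrationJob:
        # Hypothetical split: first half of the blocks to Map task 1, rest to Map task 2.
        mid = (len(blocks) + 1) // 2
        return MigrationJob(file_id, blocks[:mid], blocks[mid:])


class MapTaskUnit:  # units 32 and 33 share this behavior
    def __init__(self, target: Dict[Tuple[str, int], bytes]):
        self.target = target

    def run(self, file_id: str, blocks: List[Block]) -> List[int]:
        for offset, data in blocks:
            self.target[(file_id, offset)] = data  # copy into the target system
        return [offset for offset, _ in blocks]    # block identifiers for Reduce


class ReduceTaskUnit:  # unit 34
    def run(self, file_id: str, first: List[int], second: List[int]) -> Dict[str, List[int]]:
        return {file_id: sorted(first + second)}   # metadata of the target file


target: Dict[Tuple[str, int], bytes] = {}
job = StartUnit().start("file1", [(0, b"a"), (1, b"b"), (2, b"c")])
with ThreadPoolExecutor(max_workers=2) as pool:  # the two Map units run in parallel
    f1 = pool.submit(MapTaskUnit(target).run, job.file_id, job.first_blocks)
    f2 = pool.submit(MapTaskUnit(target).run, job.file_id, job.second_blocks)
    meta = ReduceTaskUnit().run(job.file_id, f1.result(), f2.result())
print(meta)  # {'file1': [0, 1, 2]}
```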
It should be noted that the file migration apparatus provided in this embodiment may be a MapReduce distributed system, for example the target MapReduce distributed system or a separate MapReduce distributed system.
In this way, the start unit starts a migration job for migrating the target file, the migration job including at least a first Map task and a second Map task executed in parallel, and a corresponding Reduce task. The first Map task execution unit copies, in the first Map task, the first data into the target MapReduce distributed system according to the identifier of the target file and the identifier of the at least one first data block; the second Map task execution unit copies, in the second Map task, the second data into the target MapReduce distributed system according to the identifier of the target file and the identifier of the at least one second data block; and the Reduce task execution unit can then generate, in the Reduce task, metadata of the target file in the target MapReduce distributed system according to the identifier of the target file, the identifier of the at least one first data block, and the identifier of the at least one second data block. Because the migration job that migrates one target file includes at least the first Map task and the second Map task, executed in parallel, the migration time of the target file is shortened and its migration efficiency is improved.
In existing file migration apparatuses for MapReduce distributed systems, the migration job that migrates one target file contains only one Map task; that is, migration is performed in units of files within a single Map task, so migration efficiency is low.
Optionally, in a possible implementation of this embodiment, the identifier of the target file may include path information of the target file as stored in the file system of the target MapReduce distributed system.
Optionally, in a possible implementation of this embodiment, the identifier of the at least one first data block may be the offset of the starting position of the at least one first data block within the target file; correspondingly, the identifier of the at least one second data block is the offset of the starting position of the at least one second data block within the target file.
Optionally, in one possible implementation of this embodiment, the Reduce task execution unit 34 may specifically be configured to, in the Reduce task, modify the metadata in a mapping table maintained by the target MapReduce distributed system according to the identification information of the target file, the identification information of the at least one first data block and the identification information of the at least one second data block, so as to generate the metadata of the target file in the target MapReduce distributed system.
In this implementation, the Reduce task execution unit 34 only needs to modify (for example, merge) the metadata in the mapping table maintained by the target MapReduce distributed system in the Reduce task, without rewriting the first data and the second data included in the target file, which can further improve the migration efficiency of the target file. The metadata may include file information of the target file, and data block information of the first data block(s) storing the first data and of the second data block(s) storing the second data of the target file.
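The metadata-only variant can be sketched as follows, assuming the mapping table is a dict from file path to a record holding the file length and its ordered block list, and that each Map task reports one record per copied block with hypothetical fields `offset`, `length` and `block_id`. The contiguity check is an added assumption; the text does not specify it.

```python
def merge_metadata(mapping_table, path, first_blocks, second_blocks):
    """Hypothetical Reduce step: merge the block records reported by both Map
    tasks into a single entry for `path`. Only metadata changes; no block data
    is read or rewritten."""
    blocks = sorted(first_blocks + second_blocks, key=lambda b: b["offset"])
    expected = 0
    for block in blocks:
        # Assumed sanity check: blocks must tile the file with no gaps or overlaps.
        if block["offset"] != expected:
            raise ValueError(f"gap or overlap at offset {expected}")
        expected += block["length"]
    mapping_table[path] = {"length": expected, "blocks": blocks}
    return mapping_table[path]
```

The cost of this step grows with the number of blocks, not with the number of bytes in the file.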
Optionally, in one possible implementation of this embodiment, the Reduce task execution unit 34 may alternatively be configured to, in the Reduce task, according to the identification information of the target file, the identification information of the at least one first data block and the identification information of the at least one second data block, rewrite the first data and the second data into a new file according to the metadata in the mapping table maintained by the target MapReduce distributed system, the new file serving as the target file, and generate metadata of the new file in the target MapReduce distributed system.
In this implementation, the Reduce task execution unit 34 needs to rewrite, in the Reduce task, the first data and the second data included in the target file, and then generate the metadata of the rewritten new file in the target MapReduce distributed system.
Further, the Reduce task execution unit 34 may also, in the Reduce task, delete other metadata that conflicts with the generated metadata, which further improves the migration reliability of the target file.
Optionally, in one possible implementation of this embodiment, the migration job may specifically be used to migrate the target file from a source MapReduce distributed system to the target MapReduce distributed system. That is, before the file migration apparatus of the MapReduce distributed system provided in this embodiment executes the job, the target file is stored in the file system of the source MapReduce distributed system; after the apparatus executes the job, the target file has been written into the file system of the target MapReduce distributed system.
To make the method provided by the embodiments of the present invention clearer, a Hadoop system is taken as an example below, in which the file system of the MapReduce distributed system is the Hadoop Distributed File System (HDFS). As shown in Fig. 2, assume that target file 1 (file name: file 1; storage path: file1) is stored in data block 1, data block 2, data block 3, data block 4 and data block 5 of the file system HDFS of Hadoop system A. Target file 1 includes data 1, data 2, data 3, data 4 and data 5: data 1 is stored in data block 1, data 2 in data block 2, data 3 in data block 3, data 4 in data block 4, and data 5 in data block 5.
For a detailed description, refer to the related content in the embodiment corresponding to Fig. 1; details are not repeated here.
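Applied to the Fig. 2 example, the identification information handed to each Map task could be computed as below. The sketch assumes a 64 MB block size (the HDFS default in Hadoop 1.x), that every block except the last is full, and that blocks 1 to 3 go to the first Map task and blocks 4 and 5 to the second; the text does not specify how the five blocks are split. The function `assign_blocks` is hypothetical.

```python
MB = 1024 * 1024
BLOCK_SIZE = 64 * MB  # assumption: HDFS default block size in Hadoop 1.x


def assign_blocks(path, num_blocks, block_size, num_map_tasks=2):
    """Split the blocks of `path` into contiguous groups, one per Map task.
    Each task receives the file's identification information (its path) and
    the start offsets of its blocks."""
    offsets = [i * block_size for i in range(num_blocks)]
    per_task = max(1, -(-num_blocks // num_map_tasks))  # ceiling division
    return [
        {"path": path, "offsets": offsets[i:i + per_task]}
        for i in range(0, num_blocks, per_task)
    ]
```

With these assumptions, `assign_blocks("file1", 5, BLOCK_SIZE)` gives the first Map task the offsets 0, 64 MB and 128 MB (data 1 to 3) and the second Map task the offsets 192 MB and 256 MB (data 4 and 5).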
In this embodiment, the start unit starts a migration job for migrating the target file. The migration job includes at least a first Map task and a second Map task that execute in parallel, and a Reduce task corresponding to the first Map task and the second Map task. In the first Map task, the first Map task execution unit copies the first data into the target MapReduce distributed system according to the identification information of the target file and the identification information of the at least one first data block; in the second Map task, the second Map task execution unit copies the second data into the target MapReduce distributed system according to the identification information of the target file and the identification information of the at least one second data block; and in the Reduce task, the Reduce task execution unit generates metadata of the target file in the target MapReduce distributed system according to the identification information of the target file, the identification information of the at least one first data block and the identification information of the at least one second data block. Because the migration job that migrates a single target file includes at least the first Map task and the second Map task, and these two Map tasks execute in parallel, the migration time of the target file can be shortened and its migration efficiency improved.
In addition, if migration of the target file fails, only the data stored in the data blocks whose migration failed needs to be migrated again, rather than the entire target file, which further improves the migration efficiency of the target file.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the system, apparatus and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided by the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely exemplary: the division into units is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
If the integrated unit is implemented in the form of a software functional unit, it may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
Finally, it should be noted that the above embodiments are merely intended to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of their technical features, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A file migration method for a MapReduce distributed system, characterized by comprising:
starting a migration job for migrating a target file, wherein the migration job includes at least a first Map task and a second Map task that execute in parallel, and a Reduce task corresponding to the first Map task and the second Map task; the target file includes at least first data and second data, the first data being stored in at least one first data block and the second data being stored in at least one second data block;
in the first Map task, copying the first data into a target MapReduce distributed system according to identification information of the target file and identification information of the at least one first data block;
in the second Map task, copying the second data into the target MapReduce distributed system according to the identification information of the target file and identification information of the at least one second data block;
in the Reduce task, generating metadata of the target file in the target MapReduce distributed system according to the identification information of the target file, the identification information of the at least one first data block and the identification information of the at least one second data block.
2. The method according to claim 1, characterized in that the identification information of the target file includes path information indicating where the target file is stored in the file system of the target MapReduce distributed system.
3. The method according to claim 1 or 2, characterized in that:
the identification information of the at least one first data block is the offset of the starting position of the at least one first data block within the target file;
the identification information of the at least one second data block is the offset of the starting position of the at least one second data block within the target file.
4. The method according to claim 1 or 2, characterized in that generating, in the Reduce task, metadata of the target file in the target MapReduce distributed system according to the identification information of the target file, the identification information of the at least one first data block and the identification information of the at least one second data block comprises:
in the Reduce task, modifying, according to the identification information of the target file, the identification information of the at least one first data block and the identification information of the at least one second data block, the metadata in a mapping table maintained by the target MapReduce distributed system, so as to generate metadata of the target file in the target MapReduce distributed system; or
in the Reduce task, according to the identification information of the target file, the identification information of the at least one first data block and the identification information of the at least one second data block, rewriting the first data and the second data into a new file according to the metadata in the mapping table maintained by the target MapReduce distributed system, the new file serving as the target file, and generating metadata of the new file in the target MapReduce distributed system.
5. The method according to claim 1 or 2, characterized in that the migration job is specifically used to migrate the target file from a source MapReduce distributed system to the target MapReduce distributed system.
6. A file migration apparatus for a MapReduce distributed system, characterized by comprising:
a start unit, configured to start a migration job for migrating a target file, wherein the migration job includes at least a first Map task and a second Map task that execute in parallel, and a Reduce task corresponding to the first Map task and the second Map task; the target file includes at least first data and second data, the first data being stored in at least one first data block and the second data being stored in at least one second data block;
a first Map task execution unit, configured to, in the first Map task, copy the first data into a target MapReduce distributed system according to identification information of the target file and identification information of the at least one first data block;
a second Map task execution unit, configured to, in the second Map task, copy the second data into the target MapReduce distributed system according to the identification information of the target file and identification information of the at least one second data block;
a Reduce task execution unit, configured to, in the Reduce task, generate metadata of the target file in the target MapReduce distributed system according to the identification information of the target file, the identification information of the at least one first data block and the identification information of the at least one second data block.
7. The apparatus according to claim 6, characterized in that the identification information of the target file includes path information indicating where the target file is stored in the file system of the target MapReduce distributed system.
8. The apparatus according to claim 6 or 7, characterized in that:
the identification information of the at least one first data block is the offset of the starting position of the at least one first data block within the target file;
the identification information of the at least one second data block is the offset of the starting position of the at least one second data block within the target file.
9. The apparatus according to claim 6 or 7, characterized in that the Reduce task execution unit is specifically configured to:
in the Reduce task, modify, according to the identification information of the target file, the identification information of the at least one first data block and the identification information of the at least one second data block, the metadata in a mapping table maintained by the target MapReduce distributed system, so as to generate metadata of the target file in the target MapReduce distributed system; or
in the Reduce task, according to the identification information of the target file, the identification information of the at least one first data block and the identification information of the at least one second data block, rewrite the first data and the second data into a new file according to the metadata in the mapping table maintained by the target MapReduce distributed system, the new file serving as the target file, and generate metadata of the new file in the target MapReduce distributed system.
10. The apparatus according to claim 6 or 7, characterized in that the migration job is specifically used to migrate the target file from a source MapReduce distributed system to the target MapReduce distributed system.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310090660.9A CN103176843B (en) | 2013-03-20 | 2013-03-20 | The file migration method and apparatus of MapReduce distributed system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103176843A CN103176843A (en) | 2013-06-26 |
CN103176843B CN103176843B (en) | 2018-12-14 |
Family ID: 48636744
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310090660.9A Active CN103176843B (en) | 2013-03-20 | 2013-03-20 | The file migration method and apparatus of MapReduce distributed system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103176843B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105808612B (en) * | 2014-12-31 | 2019-08-27 | 北京嘀嘀无限科技发展有限公司 | The method and apparatus of data for migrating data library |
CN106528711B (en) * | 2016-11-02 | 2019-04-30 | 北京集奥聚合科技有限公司 | Intersection solving method and system for data of out-of-table files |
CN111444148B (en) * | 2020-04-09 | 2023-09-05 | 南京大学 | Data transmission method and device based on MapReduce |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2000047996A (en) * | 1998-07-31 | 2000-02-18 | Nippon Telegr & Teleph Corp <Ntt> | Load leveling method for distributed system |
CN101764835A (en) * | 2008-12-25 | 2010-06-30 | 华为技术有限公司 | Task allocation method and device based on MapReduce programming framework |
CN102196049A (en) * | 2011-05-31 | 2011-09-21 | 北京大学 | Method suitable for secure migration of data in storage cloud |
RU2469388C1 (en) * | 2011-09-19 | 2012-12-10 | Российская Федерация, от имени которой выступает Государственная корпорация по атомной энергии "Росатом" - Госкорпорация "Росатом" | Method of handling data stored in parallel file system with hierarchical memory organisation |
CN102855297A (en) * | 2012-08-14 | 2013-01-02 | 北京高森明晨信息科技有限公司 | Method for controlling data transmission, and connector |
Also Published As
Publication number | Publication date |
---|---|
CN103176843A (en) | 2013-06-26 |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |