CN103023805A - MapReduce system - Google Patents

Info

Publication number
CN103023805A
Authority
CN
China
Prior art keywords
node
read
shuffle
reduce
write requests
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012104791192A
Other languages
Chinese (zh)
Inventor
林学练
李金贵
赵保敬
随培培
胡春明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN2012104791192A priority Critical patent/CN103023805A/en
Publication of CN103023805A publication Critical patent/CN103023805A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a MapReduce system, comprising a Map node, a Shuffle node and a Reduce node, wherein the Reduce node is used for sending a read-write request to the Shuffle node, wherein the read-write request carries Map node identification information and Reduce node identification information; the Shuffle node is used for sending the read-write request to the Map node corresponding to the Map node identification information according to the Map node identification information in the read-write request received from the Reduce node; the Map node is used for obtaining target data corresponding to the read-write request according to the Reduce node identification information in the read-write request received from the Shuffle node and the correspondence between the preset Reduce node and the target data, and returning the target data to the Shuffle node so that the Shuffle node returns the target data to the Reduce node used for sending the read-write request; as a result, the utilization rate of resources such as the CPU resource, network bandwidth resource and the like in the MapReduce system is improved and the performance of the system is enhanced.

Description

MapReduce system
Technical field
The present invention relates to computer technology, and in particular to a MapReduce system.
Background
In 2004 Google published a paper on the MapReduce programming model, titled "MapReduce: Simplified Data Processing on Large Clusters". Since then, distributed parallel tools for massive data processing, represented by the distributed system architecture Hadoop, have become the first choice of enterprises and a focus of academic research. Compared with relational database systems, such as commercial relational databases like Oracle, Hadoop's linearly scalable parallel computing capability and the MapReduce model it uses perform very well in big data processing scenarios. Hadoop is used as the core data processing tool in Yahoo's Internet search service, in the data analysis of the social networking service (SNS) website Facebook, in the log analysis of the Chinese search engine Baidu, in Taobao's Data Cube service, and in China Mobile's "Big Cloud" system. Hadoop has become the de facto standard tool with which large and medium-sized enterprises process petabyte (PB) scale data.
The MapReduce framework is implemented on top of the Hadoop Distributed File System (HDFS). In the MapReduce system of the current Hadoop platform, executing a job generally comprises two stages, executing the map task (Map Task) and executing the reduce task (Reduce Task), and executing the Reduce Task also includes a shuffle (Shuffle) process. Executing the Reduce Task mainly consumes central processing unit (CPU) resources and memory resources. Because the Hadoop platform is generally built on a cluster of computers, the computer executing a Reduce Task must also access, over the network, the computers where the Map Tasks reside when it executes the Shuffle process; the Shuffle process therefore consumes network bandwidth resources and memory resources.
However, because the Shuffle process is part of the Reduce Task, the Hadoop platform allocates CPU, network bandwidth, and memory resources together when it allocates resources to a Reduce Task. While the MapReduce system executes the Shuffle process of a Reduce Task, the CPU resources allocated to that Reduce Task sit idle; while it executes the other parts of the Reduce Task, the network bandwidth resources allocated for the Shuffle process sit idle. The prior-art MapReduce system therefore falls short in resource utilization.
Summary of the invention
The invention provides a MapReduce system that addresses the poor resource utilization of existing MapReduce systems.
The MapReduce system provided by the invention comprises a Map node, a Shuffle node, and a Reduce node.
The Reduce node is configured to send a read-write request to the Shuffle node; the read-write request carries Map node identification information and Reduce node identification information.
The Shuffle node is configured to send the read-write request, according to the Map node identification information in the read-write request received from the Reduce node, to the Map node corresponding to that Map node identification information.
The Map node is configured to obtain the target data corresponding to the read-write request, according to the Reduce node identification information in the read-write request received from the Shuffle node and a preset correspondence between Reduce nodes and target data, and to return the target data to the Shuffle node so that the Shuffle node returns it to the Reduce node that sent the read-write request.
In the MapReduce system provided by the embodiments of the invention, Shuffle is separated from the Reduce Task, and the Shuffle Task runs as an independent node. When a Reduce node needs to read data from a Map node, it sends a read-write request to the Shuffle node, and the Shuffle node reads the corresponding data from the Map node. Because the newly created Shuffle node retains the same function as the Shuffle process in the original Reduce node, the MapReduce system loses no functionality. Once the Shuffle node has been separated from the Reduce node as an independent node, resources can be allocated to the Shuffle node and the Reduce node according to their respective needs. Compared with allocating resources to a Reduce node that contains the Shuffle process, this effectively improves the utilization of resources such as CPU and network bandwidth in the MapReduce system and improves system performance.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of one embodiment of the MapReduce system provided by the invention;
Fig. 2 is a schematic structural diagram of another embodiment of the MapReduce system provided by the invention.
Detailed description
The MapReduce framework defines the interface for upper-layer users and clients; the framework itself handles data flow control and partial system failures in a distributed environment. Through backup tasks (backup-task) and re-execution, it achieves fault tolerance in the distributed system and safeguards overall system performance. Improving the sorting, merging, and partitioning of Map and Reduce data in MapReduce, and optimizing the data flow, can to some extent raise the CPU, memory, and network bandwidth utilization of a Hadoop cluster and thereby improve the cluster's overall performance.
In the MapReduce system of the current Hadoop platform, Shuffle exists as part of the Reduce Task. Executing a Reduce Task is divided into executing the Shuffle process and executing the Reduce process: the Shuffle process makes heavy use of network bandwidth resources, while the Reduce process makes heavy use of CPU and memory resources. Because the two processes use physical resources in completely different ways, and because they must run sequentially within a Reduce Task, the computer cluster wastes system resources such as CPU, network bandwidth, and memory.
The MapReduce system in the embodiments of the invention instead separates the Shuffle process from the Reduce Task and runs it as an independent service in the MapReduce system, which can effectively improve the utilization of CPU or network bandwidth resources.
Fig. 1 is a schematic structural diagram of one embodiment of the MapReduce system provided by the invention. As shown in Fig. 1, the system comprises a Map node 11, a Shuffle node 12, and a Reduce node 13.
Reduce node 13 is configured to send a read-write request to Shuffle node 12; the read-write request carries Map node identification information and Reduce node identification information.
Shuffle node 12 is configured to send the read-write request, according to the Map node identification information in the read-write request received from Reduce node 13, to the Map node 11 corresponding to that Map node identification information.
Map node 11 is configured to obtain the target data corresponding to the read-write request, according to the Reduce node identification information in the read-write request received from the Shuffle node and a preset correspondence between Reduce nodes and target data, and to return the target data to the Shuffle node so that the Shuffle node returns it to the Reduce node that sent the read-write request.
Unlike the prior-art MapReduce system, the MapReduce system in this embodiment comprises not only Map node 11 and Reduce node 13 but also Shuffle node 12.
Shuffle is separated from Reduce as follows.
Shuffle-related classes are created to form an independent Shuffle class at the same level as Map and Reduce, which may also be called the Shuffle Task. Like the Map Task and the Reduce Task, the Shuffle Task inherits from Task; that is, the Shuffle Task, the Map Task, and the Reduce Task are generated by the same task module. Just as the Map Task and the Reduce Task are scheduled by the Master node in the prior art, the Shuffle Task in the embodiments of the invention can also be scheduled by the Master node.
The functional subclasses that originally belonged to the Reduce Task, such as data fetch, copy, and merge, are extracted into the newly created Shuffle Task, and the Shuffle Task is refactored into a common service. The newly created Shuffle Task can be connected to one or more Reduce Tasks to transfer data between the Shuffle Task and the Reduce Tasks. Merge includes both disk merge and in-memory merge. In the MapReduce system, the Map Task is implemented by MapTask.java and the Reduce Task by ReduceTask.java; both run on the Task Tracker, which is implemented by TaskTracker.java. The newly created Shuffle Task has the same function as the Shuffle process in the original Reduce Task. Functions of the Shuffle Task such as initialization and fault tolerance can be implemented in the same way as in the Shuffle process of the original Reduce Task.
After the Shuffle Task has been created, data is transferred among Reduce node 13, Shuffle node 12, and Map node 11 as follows. In the embodiments of the invention, the node that executes the Reduce Task is called Reduce node 13, the node that executes the Shuffle Task is called Shuffle node 12, and the node that executes the Map Task is called Map node 11.
Once the Shuffle Task has been separated from the Reduce Task, the MapReduce system can allocate appropriate resources to Map node 11, Shuffle node 12, and Reduce node 13 individually. Resources are allocated per node: Shuffle node 12 mainly occupies network bandwidth, and Reduce node 13 mainly occupies CPU. With the configuration of each computer in the cluster unchanged, separating the Shuffle process from the Reduce Task allows the Shuffle processes and Reduce processes of different Reduce Tasks to run in parallel. Because a Reduce process no longer has to wait for every Shuffle process to finish before it can run, the Shuffle nodes and Reduce nodes serving different Reduce Tasks can use the cluster's network, CPU, and memory resources at the same time. For each node and for the MapReduce system as a whole, this helps improve resource utilization and reduce the waste of resources such as CPU and network bandwidth. Shuffle node 12 also has the advantages of occupying few resources and incurring little overhead.
The main function of Shuffle node 12 is to transfer the corresponding data from Map node 11 to Reduce node 13 according to what Reduce node 13 needs to read.
When Reduce node 13 needs to read data, it sends a read-write request to Shuffle node 12.
Because the data that Reduce node 13 needs to read is stored on Map node 11, the read-write request sent by Reduce node 13 carries the identification information of Map node 11. If Shuffle node 12 is connected to multiple Map nodes 11, the Map node identification information carried in the request identifies the target Map node, that is, the Map node 11 that holds the data to be read.
The read-write request sent by Reduce node 13 also carries the Reduce node identification information of that Reduce node 13, which Map node 11 uses to look up the corresponding target data.
After Shuffle node 12 receives the read-write request from Reduce node 13, it determines from the Map node identification information in the request which Map node 11 the request should be sent to, that is, it determines the target Map node. It then sends the read-write request to the Map node 11 corresponding to that identification information.
Map node 11 stores in advance the correspondence between one or more Reduce nodes 13 and target data. After receiving the read-write request from Shuffle node 12, Map node 11 can therefore use the Reduce node identification information carried in the request to find the target data that the corresponding Reduce node 13 needs to read. Map node 11 reads this target data from the data it stores and returns it to Shuffle node 12, which in turn returns it to Reduce node 13.
Specifically, according to the Reduce node identification information in the received read-write request, Map node 11 queries the partition information of the stored data and resolves the starting position and offset at which the target data corresponding to that Reduce node identification information is stored in Map node 11. The starting position and offset locate the corresponding data on a Map node.
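The lookup by starting position and offset can be illustrated with a minimal Python sketch. It assumes that the partition information is an in-memory mapping from Reduce node identification to a (starting position, offset) pair and that the offset is the partition length in bytes; `Partition`, `read_target_data`, and the sample data are hypothetical names chosen for illustration and are not part of Hadoop or of the patent text.

```python
from dataclasses import dataclass
from io import BytesIO
from typing import BinaryIO, Dict


@dataclass(frozen=True)
class Partition:
    start: int   # starting position of the partition in the map output
    offset: int  # assumption: offset is the partition length in bytes


def read_target_data(output: BinaryIO, index: Dict[str, Partition], reduce_id: str) -> bytes:
    """Return the bytes of the partition that belongs to reduce_id."""
    part = index[reduce_id]   # preset Reduce-to-data correspondence
    output.seek(part.start)   # jump to the starting position
    return output.read(part.offset)


# Hypothetical map output holding three partitions back to back.
map_output = BytesIO(b"aaaabbbbbbcc")
partition_index = {
    "reduce-0": Partition(start=0, offset=4),
    "reduce-1": Partition(start=4, offset=6),
    "reduce-2": Partition(start=10, offset=2),
}
print(read_target_data(map_output, partition_index, "reduce-1"))
```

In a real deployment the map output would be a file on local disk rather than an in-memory buffer; the seek-then-read pattern is the same.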
Through the above flow, Reduce node 13 can read the corresponding data from Map node 11.
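The request path from Reduce node to Shuffle node to Map node and back can be sketched as follows. This is a single-process illustration under simplifying assumptions: node-to-node communication is modeled as direct method calls instead of network messages, and the class names, method names, and sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ReadWriteRequest:
    map_id: str     # Map node identification information
    reduce_id: str  # Reduce node identification information


class MapNode:
    def __init__(self, data_by_reduce: Dict[str, bytes]) -> None:
        # Preset correspondence between Reduce nodes and target data.
        self._data_by_reduce = data_by_reduce

    def handle(self, request: ReadWriteRequest) -> bytes:
        return self._data_by_reduce[request.reduce_id]


class ShuffleNode:
    def __init__(self, map_nodes: Dict[str, MapNode]) -> None:
        self._map_nodes = map_nodes

    def handle(self, request: ReadWriteRequest) -> bytes:
        # Route by Map node ID, then relay the target data back to the caller.
        target = self._map_nodes[request.map_id]
        return target.handle(request)


class ReduceNode:
    def __init__(self, reduce_id: str, shuffle: ShuffleNode) -> None:
        self.reduce_id = reduce_id
        self._shuffle = shuffle

    def fetch(self, map_id: str) -> bytes:
        return self._shuffle.handle(ReadWriteRequest(map_id, self.reduce_id))


map_nodes = {
    "map-1": MapNode({"reduce-0": b"k1=3", "reduce-1": b"k2=5"}),
    "map-2": MapNode({"reduce-0": b"k1=7"}),
}
shuffle = ShuffleNode(map_nodes)
reducer = ReduceNode("reduce-0", shuffle)
print(reducer.fetch("map-1"), reducer.fetch("map-2"))
```

The Reduce node never addresses a Map node directly; it only names the target in the request, which is what lets the Shuffle node run as a shared service.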
After Shuffle node 12 sends a read-write request to Map node 11, if it has still not received the target data returned by Map node 11 once a preset time threshold has elapsed, Shuffle node 12 determines that the read-write request was not sent successfully. Shuffle node 12 can then carry on with other operations, continuing to send the other read-write requests it has received to the corresponding Map nodes 11, and, after a preset time span has elapsed, resend the unsuccessful read-write request to the corresponding Map node 11.
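A minimal sketch of this retry behavior, written in Python, is shown below. Several things are assumptions made for illustration: a timeout is modeled by the `send` callable returning `None`, the preset delay before resending is modeled by moving the request to the back of a queue so that other requests go first, and the `max_attempts` limit is not stated in the source. All names are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, List, Optional, Tuple

Request = Tuple[str, str]  # (Map node ID, Reduce node ID)


def forward_all(
    requests: Iterable[Request],
    send: Callable[[Request], Optional[bytes]],
    max_attempts: int = 3,  # assumption: the source does not limit retries
) -> Dict[Request, bytes]:
    """Forward requests; retry a timed-out request after the others."""
    results: Dict[Request, bytes] = {}
    queue: Deque[Tuple[Request, int]] = deque((req, 1) for req in requests)
    while queue:
        req, attempt = queue.popleft()
        data = send(req)  # None models "no target data within the time threshold"
        if data is not None:
            results[req] = data
        elif attempt < max_attempts:
            # Carry on with other requests and resend this one later.
            queue.append((req, attempt + 1))
    return results


calls: List[Request] = []


def flaky_send(req: Request) -> Optional[bytes]:
    """Hypothetical transport: the first request to map-1 times out once."""
    calls.append(req)
    if req[0] == "map-1" and calls.count(req) == 1:
        return None
    return f"{req[0]}->{req[1]}".encode()


results = forward_all([("map-1", "reduce-0"), ("map-2", "reduce-0")], flaky_send)
print(calls)
```

The recorded call order shows the point of the scheme: the request to map-2 is served before the request to map-1 is retried, so one slow Map node does not stall the others.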
In the MapReduce system provided by this embodiment of the invention, Shuffle is separated from the Reduce Task, and the Shuffle Task runs as an independent node. When a Reduce node needs to read data from a Map node, it sends a read-write request to the Shuffle node, and the Shuffle node reads the corresponding data from the Map node. Because the newly created Shuffle node retains the same function as the Shuffle process in the original Reduce node, the MapReduce system loses no functionality. Once the Shuffle node has been separated from the Reduce node as an independent node, resources can be allocated to the Shuffle node and the Reduce node according to their respective needs. Compared with allocating resources to a Reduce node that contains the Shuffle process, this effectively improves the utilization of resources such as CPU and network bandwidth in the MapReduce system and improves system performance.
Further, on the basis of the above embodiment, Map node 11 may also be configured to: place the read-write requests received from one or more Shuffle nodes 12 into a buffer queue; sort the read-write requests in the buffer queue according to the storage location of the target data corresponding to each read-write request, obtaining a reading order; and, following the reading order, return each piece of target data to the corresponding Reduce node 13 through the Shuffle node 12 that sent the corresponding read-write request.
The MapReduce system in the embodiments of the invention may comprise one or more Map nodes 11, one or more Shuffle nodes 12, and one or more Reduce nodes 13. A Shuffle node 12 can be connected to one or more Reduce nodes 13 on its Reduce side and to one or more Map nodes 11 on its Map side; a Map node 11 can be connected to one or more Shuffle nodes 12.
As the above embodiment shows, when a Shuffle node 12 is connected to multiple Map nodes 11, it determines which Map node 11 to send a read-write request to according to the Map node identification information carried in the request. The Map node 11 then returns the corresponding data to that Shuffle node 12.
When a Map node 11 is connected to a single Shuffle node 12, it reads the corresponding data according to the information carried in the read-write request sent by Shuffle node 12 and returns the data to Shuffle node 12.
When a Map node 11 is connected to multiple Shuffle nodes 12, it may receive read-write requests from several Shuffle nodes 12.
In the current MapReduce system, the Shuffle process runs after a Reduce Task starts and sends read-write requests to all Map nodes to read the data the Reduce Task needs. Because different Reduce Tasks send their read-write requests independently after they start, the order in which a given Map node receives requests from different Reduce Tasks is close to random. The Map node reads data for the Reduce Tasks in the order in which the requests arrive and sends it to the Reduce nodes. Because the starting positions and offsets of the data partitions that the Map node resolves from these requests differ, the Map node's disk reads are also close to random, which causes a large number of disk seeks and makes reading inefficient.
In contrast, after Map node 11 in the embodiment of the invention receives read-write requests from multiple Shuffle nodes 12, it does not immediately return the requested data to the corresponding Shuffle nodes 12; instead, it places the received read-write requests in a buffer queue.
The read-write requests received within a preset time span can be placed in the buffer queue, and once that time span ends, the data requested by each read-write request received within it is returned to the corresponding Shuffle node 12. Alternatively, other policies can be used to control the number of read-write requests stored in the buffer queue.
If, under the preset policy, the buffer queue holds only one read-write request, no sorting is needed and the subsequent operations proceed according to that policy.
If the buffer queue holds multiple read-write requests, they need to be sorted.
Specifically, because each read-write request received by Map node 11 carries Reduce node identification information, the storage location of the corresponding target data, namely the starting position and offset at which each piece of target data is stored in Map node 11, can be looked up from each Reduce node identification information. The read-write requests are then sorted by the order of these storage locations; that is, the requests are reordered by the storage location of the target data each one needs to read, which yields the reading order.
When Map node 11 then reads data in this reading order, it in effect reads the target data in the order in which the data is stored on Map node 11. In the prior-art MapReduce system, Map node 11 reads target data from disk in the order in which it receives read-write requests from Reduce nodes 13. Because the positions of the target data may be random, that is, the order in which target data is requested may differ from the order in which the data was originally stored, this leads to more random reads and writes, longer disk seek and addressing times, and lower disk read-write performance. Read-write efficiency drops and data transfer takes longer, which lengthens the overall execution time of the Reduce Task and of the MapReduce Job, where a MapReduce Job is a job executed as a whole in the MapReduce system. The reading method in the embodiments of the invention reduces the overly frequent head repositioning caused by random reads and improves read-write efficiency.
After Map node 11 reads each piece of target data in the reading order obtained, it sends the target data for each read-write request, in that order, to the Shuffle node 12 corresponding to the request. Map node 11 can send each piece of target data to Shuffle node 12 as soon as it has been read. After receiving target data from Map node 11, Shuffle node 12 sends each piece of target data to the corresponding Reduce node 13.
In the MapReduce system provided by this embodiment of the invention, after the Map node receives read-write requests from one or more Shuffle nodes, it obtains a reading order from the storage locations of the target data corresponding to the Reduce node identification information in each request. It then reads each piece of target data in that order and sends it to the corresponding Reduce node through the Shuffle node that forwarded the request. This effectively reduces the number of head movements when target data is read from disk and improves the read-write performance and efficiency of the whole system.
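The sorting step can be illustrated with a short Python sketch that orders buffered requests by the starting position of their target data and compares the resulting head movement with the arrival order. The seek model is a deliberate simplification (head movement is measured in bytes, the head starts at position 0, and the offset is treated as a length), and all names and sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class BufferedRequest:
    shuffle_id: str  # Shuffle node that forwarded the request
    reduce_id: str   # Reduce node that wants the data


# reduce_id -> (starting position, offset); offset is treated as a length in bytes
PartitionIndex = Dict[str, Tuple[int, int]]


def reading_order(queue: List[BufferedRequest], index: PartitionIndex) -> List[BufferedRequest]:
    """Sort buffered requests by the starting position of their target data."""
    return sorted(queue, key=lambda req: index[req.reduce_id][0])


def seek_distance(order: List[BufferedRequest], index: PartitionIndex) -> int:
    """Total head movement in bytes, assuming the head starts at position 0."""
    head, total = 0, 0
    for req in order:
        start, length = index[req.reduce_id]
        total += abs(start - head)  # move to the start of the partition
        head = start + length       # reading leaves the head at the partition end
    return total


index = {"r0": (0, 100), "r1": (100, 50), "r2": (150, 200), "r3": (350, 10)}
arrival = [
    BufferedRequest("s1", "r2"),
    BufferedRequest("s2", "r0"),
    BufferedRequest("s1", "r3"),
    BufferedRequest("s2", "r1"),
]
ordered = reading_order(arrival, index)
print(seek_distance(arrival, index), seek_distance(ordered, index))
```

Each request keeps its `shuffle_id`, so the Map node can still return every piece of target data to the Shuffle node that forwarded the request even though the service order has changed.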
On the basis of the above embodiments, in an optional implementation, the Map node, following the reading order, returns the target data of those read-write requests whose waiting time is less than or equal to a time threshold to the corresponding Reduce nodes through the Shuffle nodes that sent the requests.
Specifically, after receiving read-write requests from one or more Shuffle nodes 12, Map node 11 places each read-write request in the buffer queue.
After sorting the read-write requests, Map node 11 sends the target data to the corresponding Shuffle nodes 12 in the reading order obtained. Before sending each piece of target data, it also determines whether the corresponding read-write request has timed out, that is, whether the waiting time between the current time and the time the request was received is less than or equal to the preset time threshold.
If the waiting time of a read-write request is greater than the time threshold, the request is judged to have timed out; if the waiting time is less than or equal to the time threshold, the request is judged not to have timed out.
When target data is about to be sent, if the corresponding read-write request is judged to have timed out, the target data is not sent; if the request has not timed out, the target data is sent to the corresponding Shuffle node 12.
Timed-out read-write requests can be handled in a way similar to the prior art, for example by returning a notification message to the corresponding Shuffle node 12 and/or Reduce node 13 so that it resends the read-write request.
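The timeout check before sending can be sketched in Python as follows. The sketch assumes that each queued request records the time at which the Map node received it and that times are plain numbers in seconds; `QueuedRequest`, `split_by_timeout`, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class QueuedRequest:
    request_id: str
    received_at: float  # time the Map node received the request, in seconds


def split_by_timeout(
    queue: List[QueuedRequest], now: float, threshold: float
) -> Tuple[List[QueuedRequest], List[QueuedRequest]]:
    """Split requests into those still valid and those that have timed out."""
    to_send: List[QueuedRequest] = []
    timed_out: List[QueuedRequest] = []
    for req in queue:
        waiting_time = now - req.received_at
        if waiting_time <= threshold:
            to_send.append(req)    # not timed out: send the target data
        else:
            timed_out.append(req)  # timed out: notify the sender instead
    return to_send, timed_out


queue = [QueuedRequest("a", 0.0), QueuedRequest("b", 4.0), QueuedRequest("c", 9.5)]
to_send, timed_out = split_by_timeout(queue, now=10.0, threshold=6.0)
print([r.request_id for r in to_send], [r.request_id for r in timed_out])
```

Request "b" has waited exactly as long as the threshold and is still served, which matches the "less than or equal to" condition in the text.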
In the MapReduce system provided by this embodiment of the invention, the read-write requests that have not timed out among those received are added to the buffer queue, so that no read-write request in the buffer queue has timed out. This ensures the validity and correctness of the read-write requests and improves the read-write performance and efficiency of the whole system.
Fig. 2 is a schematic structural diagram of another embodiment of the MapReduce system provided by the invention. As shown in Fig. 2, on the basis of the above embodiments, the system may further comprise a Master node 14. Master node 14 is configured to control Map node 11, Shuffle node 12, and Reduce node 13.
Shuffle node 12 is separated out of the Reduce node used in the prior art. Accordingly, Master node 14 in the embodiments of the invention can control not only Map node 11 and Reduce node 13 but also Shuffle node 12. Reduce node 13 in the embodiments of the invention is a Reduce node from which the Shuffle function has been removed. In the MapReduce system, Master node 14 is implemented by JobTracker.java.
Because the MapReduce system can be built on a computer cluster, each node can exist in the cluster in several ways. In an optional implementation, Master node 14 is located on a separate computer; one or more Map nodes 11 are located on one or more computers, that is, a single computer can host one or more Map nodes 11; and Shuffle node 12 and Reduce node 13 can be located on the same computer. A computer can host one Shuffle node 12 together with the one or more Reduce nodes 13 connected to it, and a computer that hosts one or more Reduce nodes 13 needs to host a corresponding Shuffle node 12.
In an optional implementation, Shuffle node 12 is further configured to start and enter the working state after the MapReduce system receives a pending MapReduce job, and to stop working only after that MapReduce job has finished executing.
In the current MapReduce system, the Shuffle process is part of the Reduce Task, so the Shuffle function generally starts after a Reduce Task starts and is shut down when that Reduce Task completes; when a Reduce Task starts again, the Shuffle function is started again. Shuffle node 12 in the embodiments of the invention, by contrast, exists as an independent node. After the computer cluster of the MapReduce system starts and receives a MapReduce job, Shuffle node 12 starts and remains in the running state. In other words, Shuffle node 12 acts as a common service that provides data transfer, much like a background process. When multiple Reduce nodes 13 share the same Shuffle node 12, running Shuffle node 12 as a background process means that it is not shut down merely because one of those Reduce nodes 13 finishes, so Shuffle node 12 is started and shut down fewer times.
Shuffle node 12 can be shut down by Master node 14, generally after the MapReduce Job has finished executing.
Having Shuffle node 12 enter and remain in the working state once started reduces how often the node is shut down and restarted, which helps improve system performance and reading efficiency.
In an optional implementation, the Shuffle node 12 can be connected to a plurality of Reduce nodes 13. Correspondingly, a functional module (Reduce Manager) for managing communication with the plurality of Reduce nodes 13 is added to the Shuffle node 12, so that data transmission and network connections between the Shuffle node 12 and each Reduce node 13 are managed.
Further, on the basis of the above embodiment, the Shuffle node 12 can also be used to periodically encapsulate, from among the read-write requests received from the Reduce nodes 13, a plurality of read-write requests carrying the same Map node identification information into one read-write request, and to send the encapsulated read-write request to the Map node 11 corresponding to that Map node identification information.
Where the Shuffle node 12 is connected to a plurality of Reduce nodes 13, if the read-write requests sent by several Reduce nodes 13 carry the identification information of the same Map node 11, these Reduce nodes 13 need to read data from the same Map node 11. The Shuffle node 12 can therefore either send the read-write requests of these Reduce nodes 13 to the Map node 11 separately, or encapsulate them in one read-write request and send that request to the Map node 11.
To avoid the Shuffle node 12 spending a long time waiting to receive the read-write requests sent by a plurality of Reduce nodes 13, the Shuffle node 12 can periodically encapsulate the read-write requests that were received in the current period and are destined for the same Map node 11 into one read-write request.
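The periodic encapsulation described above can be illustrated with a minimal Python sketch. It is not taken from the patent text: the `ReadRequest` class, its field names, and the `batch_by_map_node` function are hypothetical, and the encapsulated request is assumed to be simply the list of Reduce node identifiers destined for one Map node.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadRequest:
    """One read-write request from a Reduce node (field names are illustrative)."""
    map_id: str      # Map node identification information
    reduce_id: str   # Reduce node identification information


def batch_by_map_node(requests):
    """Group the requests collected in one period by their target Map node.

    Returns a dict mapping each map_id to the list of reduce_ids whose
    requests are merged into a single request for that Map node.
    """
    batches = defaultdict(list)
    for req in requests:
        batches[req.map_id].append(req.reduce_id)
    return dict(batches)


# Requests assumed to have arrived at the Shuffle node during one period.
period_requests = [
    ReadRequest("map-1", "reduce-A"),
    ReadRequest("map-2", "reduce-A"),
    ReadRequest("map-1", "reduce-B"),
]
merged = batch_by_map_node(period_requests)
```

In this sketch three incoming requests become two outgoing ones, because both requests addressed to `map-1` are merged. Bounding the collection window by a period, rather than waiting for every Reduce node, keeps the added latency limited.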
Further, in an optional implementation, the Master node 14 can also be used for:
When the system resource occupancy of the computer on which the Reduce node 13 resides is greater than or equal to a preset occupancy threshold, controlling the Reduce node 13 to enter a wait state, and restoring the Reduce node 13 to the working state when the system resource occupancy falls below the occupancy threshold or a preset waiting time elapses. The system resource occupancy can be the CPU occupancy or the memory occupancy; correspondingly, the occupancy threshold can be a CPU occupancy threshold or a memory occupancy threshold.
Each Reduce node 13 is connected to the Map node 11 through the Shuffle node 12, so data that the Map node 11 returns to a Reduce node 13 first passes through the Shuffle node 12. Because the Shuffle node 12 and the Reduce node 13 both work under the control of the Master node 14, the Master node 14 can have the Reduce node 13 process data immediately upon receiving it from the Shuffle node 12, or process it after waiting for a period of time.
The Shuffle node 12 mainly consumes network bandwidth, while the Reduce node 13 mainly consumes CPU resources. The Master node 14 can determine the system resource occupancy of the computer on which each Reduce node 13 resides.
If the system resource occupancy of the computer on which a Reduce node 13 resides is found to be greater than or equal to the preset occupancy threshold, the system resources of this Reduce node 13 are judged to be insufficient. In this case, the Master node 14 can control this Reduce node 13 to enter the wait state, that is, to temporarily not process the data received from the Shuffle node 12. After a preset waiting time has elapsed, the Master node 14 restores this Reduce node 13 to the normal working state so that it processes the data received from the Shuffle node 12. Alternatively, the Master node 14 restores this Reduce node 13 to the normal working state once it detects that the system resource occupancy of the computer on which this Reduce node 13 resides has fallen below the occupancy threshold, so that the data received from the Shuffle node 12 is then processed.
If the system resource occupancy of the computer on which a Reduce node 13 resides is found to be less than the occupancy threshold, the system resources of this Reduce node 13 are judged to be sufficient. In this case, the Reduce node 13 need not be put into the wait state, and it can be controlled to process the data received from the Shuffle node 12 directly.
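The wait-state control described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the patent's implementation: the function name `run_when_resources_allow`, the polling loop, and the injectable `clock` and `sleep` parameters are all hypothetical, and occupancy is assumed to be a fraction between 0 and 1 supplied by a caller-provided function.

```python
import time


def run_when_resources_allow(get_occupancy, threshold, max_wait_s,
                             poll_s=0.01, clock=time.monotonic, sleep=time.sleep):
    """Hold a Reduce node in the wait state while occupancy >= threshold.

    Returns "immediate" if the node never had to wait, "recovered" if the
    occupancy fell below the threshold while waiting, and "timeout" if the
    preset waiting time elapsed first. In all three cases the node then
    resumes processing the data received from the Shuffle node.
    """
    if get_occupancy() < threshold:
        return "immediate"
    deadline = clock() + max_wait_s
    while clock() < deadline:
        sleep(poll_s)  # wait state: the received data is not processed yet
        if get_occupancy() < threshold:
            return "recovered"
    return "timeout"


# Simulated CPU occupancy readings (hypothetical values): high, high, then low.
readings = iter([0.95, 0.90, 0.50])
outcome = run_when_resources_allow(lambda: next(readings), threshold=0.8,
                                   max_wait_s=5.0, sleep=lambda s: None)
```

The two exit conditions of the loop correspond to the two recovery conditions in the text: occupancy dropping below the threshold, or the preset waiting time ending.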
In existing MapReduce systems, the Shuffle function is implemented inside the Reduce node 13, and the Master node 14 has each Reduce node 13 carry out its other operations only after the Shuffle process in that node has finished. In other words, the Shuffle process and the Reduce process are scheduled by the Master node 14 as a whole, and the Shuffle process must be carried out before the Reduce process. The scheduling strategy does not take into account the different demands that the Shuffle process and the Reduce process place on computer cluster resources, nor does it control the execution intervals of the Shuffle process and the Reduce process according to the current resource occupancy of the computer cluster.
In the MapReduce system of the embodiments of the present invention, by contrast, the Shuffle node 12 and the Reduce node 13 are independent of each other, so the Master node 14 can control the Shuffle node 12 or the Reduce node 13 separately according to the resource occupancy of the computer cluster. For example, if the network bandwidth of the computer cluster is currently idle while CPU occupancy is high, the Master node 14 can schedule the Shuffle node 12 to perform data copy operations and postpone execution of the Reduce node 13. Furthermore, when several MapReduce jobs exist in the computer cluster, one job may be processed in the Shuffle node 12 while another job is being processed in the Reduce node 13, so that the Shuffle node 12 and the Reduce node 13 execute in parallel for different jobs.
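The benefit of letting the Shuffle node and the Reduce node work on different jobs at the same time can be illustrated with a small Python calculation. The model is an assumption made for illustration only: one Shuffle node and one Reduce node, each handling one job at a time in submission order, with hypothetical per-job durations. The function names are likewise hypothetical.

```python
def coupled_makespan(jobs):
    """Total time when Shuffle and Reduce of each job run as one unit.

    jobs is a list of (shuffle_time, reduce_time) pairs; no stage of a
    later job can overlap with any stage of an earlier job.
    """
    return sum(shuffle + reduce for shuffle, reduce in jobs)


def pipelined_makespan(jobs):
    """Total time when the Shuffle node and the Reduce node are independent.

    The Shuffle node moves on to the next job as soon as it finishes the
    current one; the Reduce node starts a job once that job's shuffle has
    finished and the Reduce node itself is free.
    """
    shuffle_done = 0
    reduce_done = 0
    for shuffle, reduce in jobs:
        shuffle_done += shuffle
        reduce_done = max(reduce_done, shuffle_done) + reduce
    return reduce_done


# Hypothetical (shuffle_time, reduce_time) durations for three jobs.
jobs = [(4, 3), (2, 5), (3, 1)]
sequential = coupled_makespan(jobs)    # 18 time units
overlapped = pipelined_makespan(jobs)  # 13 time units
```

Under these assumed durations the overlapped schedule finishes in 13 time units instead of 18, because the network-bound shuffle of one job runs while the CPU-bound reduce of the previous job is still in progress.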
In the MapReduce system provided by the embodiments of the present invention, the Shuffle node and the Reduce node can both be controlled by the Master node, the Reduce node can be placed in the normal working state or the wait state, and Shuffle nodes and Reduce nodes executing different jobs can operate in parallel, which effectively improves system performance and resource utilization.
A person of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments can be carried out by hardware under the control of program instructions. The program can be stored in a computer-readable storage medium. When executed, the program performs the steps of the above method embodiments. The storage medium includes various media capable of storing program code, such as ROM, RAM, magnetic disks, or optical discs.
Finally, it should be noted that the above embodiments serve only to illustrate the technical solutions of the present invention and are not intended to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. A MapReduce system, characterized in that it comprises a Map node, a Shuffle node, and a Reduce node;
The Reduce node is used to send a read-write request to the Shuffle node, the read-write request carrying Map node identification information and Reduce node identification information;
The Shuffle node is used to send the read-write request received from the Reduce node to the Map node corresponding to the Map node identification information, according to the Map node identification information in that read-write request;
The Map node is used to obtain the target data corresponding to the read-write request according to the Reduce node identification information in the read-write request received from the Shuffle node and a preset correspondence between Reduce nodes and target data, and to return the target data to the Shuffle node, so that the Shuffle node returns the target data to the Reduce node that sent the read-write request.
2. The MapReduce system according to claim 1, characterized in that the Map node is specifically used for:
Putting the read-write requests received from one or more Shuffle nodes into a buffer queue;
Sorting the read-write requests in the buffer queue according to the storage location of the target data corresponding to each read-write request, to obtain a read order;
Returning, according to the read order, each item of target data to the corresponding Reduce node via the Shuffle node that sent the corresponding read-write request.
3. The MapReduce system according to claim 2, characterized in that the Map node returning, according to the read order, each item of target data to the corresponding Reduce node via the Shuffle node that sent the corresponding read-write request is specifically:
The Map node returns, according to the read order, the target data corresponding to read-write requests whose waiting time is less than or equal to a time threshold to the corresponding Reduce node via the Shuffle node that sent the corresponding read-write request.
4. The MapReduce system according to claim 1, characterized in that the MapReduce system further comprises:
A Master node, used to control the Map node, the Shuffle node, and the Reduce node.
5. The MapReduce system according to claim 4, characterized in that the Master node is also used for:
When the system resource occupancy of the computer on which the Reduce node resides is greater than or equal to a preset occupancy threshold, controlling the Reduce node to enter a wait state, and restoring the Reduce node to the working state when the system resource occupancy falls below the occupancy threshold or a preset waiting time elapses.
6. The MapReduce system according to claim 1, characterized in that the Shuffle node is connected to a plurality of Reduce nodes.
7. The MapReduce system according to claim 6, characterized in that the Shuffle node is also used for:
Periodically encapsulating, from among the read-write requests received from the Reduce nodes, a plurality of read-write requests carrying the same Map node identification information into one read-write request, and sending the encapsulated read-write request to the Map node corresponding to that Map node identification information.
8. The MapReduce system according to any one of claims 1 to 7, characterized in that the Shuffle node is also used for: starting and entering the working state after the MapReduce system receives a pending MapReduce job, and remaining in that state until execution of the MapReduce job is complete.
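As an informal illustration of claims 1 and 2, the following Python sketch shows a Map node that buffers read-write requests, sorts them by the storage location of their target data, and returns the data via the Shuffle node that sent each request. It forms no part of the claims. The `Request` and `MapNode` classes, the use of a byte offset as the storage location, and the tuple-shaped responses are assumptions made for the example; the waiting-time threshold of claim 3 is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """A read-write request as seen by the Map node (illustrative fields)."""
    shuffle_id: str  # Shuffle node that forwarded the request
    reduce_id: str   # Reduce node identification information


class MapNode:
    def __init__(self, partitions):
        # partitions maps reduce_id -> (offset, data): the preset correspondence
        # between Reduce nodes and target data. The offset is an assumed
        # stand-in for the storage location.
        self.partitions = partitions
        self.queue = []

    def enqueue(self, request):
        """Put a request received from a Shuffle node into the buffer queue."""
        self.queue.append(request)

    def serve(self):
        """Sort buffered requests by storage offset and answer them in that order.

        Returns a list of (shuffle_id, reduce_id, data) tuples, so that each
        item of target data goes back through the Shuffle node that sent the
        corresponding request.
        """
        ordered = sorted(self.queue, key=lambda r: self.partitions[r.reduce_id][0])
        self.queue = []
        return [(r.shuffle_id, r.reduce_id, self.partitions[r.reduce_id][1])
                for r in ordered]


node = MapNode({
    "reduce-A": (200, b"a"),
    "reduce-B": (0, b"b"),
    "reduce-C": (100, b"c"),
})
node.enqueue(Request("shuffle-1", "reduce-A"))
node.enqueue(Request("shuffle-2", "reduce-B"))
node.enqueue(Request("shuffle-1", "reduce-C"))
responses = node.serve()
```

Serving requests in offset order rather than arrival order lets the Map node read its output sequentially, which is the apparent motivation for the sorting step in claim 2.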
CN2012104791192A 2012-11-22 2012-11-22 MapReduce system Pending CN103023805A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012104791192A CN103023805A (en) 2012-11-22 2012-11-22 MapReduce system

Publications (1)

Publication Number Publication Date
CN103023805A true CN103023805A (en) 2013-04-03

Family

ID=47971948

Country Status (1)

Country Link
CN (1) CN103023805A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101127758A (en) * 2006-08-16 2008-02-20 华为技术有限公司 IP address acquisition method and acquisition system for mobile nodes
CN101242421A (en) * 2008-03-19 2008-08-13 中国科学院计算技术研究所 Application-oriented name registration system and its service method under multi-layer NAT environment
CN101860474A (en) * 2009-04-08 2010-10-13 中兴通讯股份有限公司 Peer-to-peer network and resource information processing method based on same
CN102045655A (en) * 2009-10-10 2011-05-04 中兴通讯股份有限公司 Realization method and system for active propelling movement of data messages
CN102053816A (en) * 2010-11-25 2011-05-11 中国人民解放军国防科学技术大学 Data shuffling unit with switch matrix memory and shuffling method thereof
CN102110164A (en) * 2011-02-28 2011-06-29 南京邮电大学 Data acquisition and processing method realized by utilizing distributed technology
US20110313973A1 (en) * 2010-06-19 2011-12-22 Srivas Mandayam C Map-Reduce Ready Distributed File System
CN102541858A (en) * 2010-12-07 2012-07-04 腾讯科技(深圳)有限公司 Data equality processing method, device and system based on mapping and protocol
CN102769615A (en) * 2012-07-02 2012-11-07 北京大学 Task scheduling method and system based on MapReduce mechanism

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A.VERMA ET AL.: "《ARIA: Automatic Resource Inference and Allocation for MapReduce Environments》", 《ICAC》 *
HUNG-CHIH YANG ET AL.: "《Mapreduce-reduce-merge:simplified relational data processing on large clusters》", 《PROCEEDINGS OF THE 2007 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA》 *
潘旭明: "《MapReduce FairScheduler的高性能优化及超大规模集群模拟器设计及实现》", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104346380B (en) * 2013-07-31 2018-03-09 华为技术有限公司 Data reordering method and system based on MapReduce model
CN104346380A (en) * 2013-07-31 2015-02-11 华为技术有限公司 Data sequencing method and system on basis of MapReduce model
CN105793822B (en) * 2013-10-02 2020-03-20 谷歌有限责任公司 Dynamic shuffle reconfiguration
CN105793822A (en) * 2013-10-02 2016-07-20 谷歌公司 Dynamic shuffle reconfiguration
US11966377B2 (en) 2013-10-03 2024-04-23 Google Llc Persistent shuffle system
US11269847B2 (en) 2013-10-03 2022-03-08 Google Llc Persistent shuffle system
CN105765537A (en) * 2013-10-03 2016-07-13 谷歌公司 Persistent shuffle system
US10515065B2 (en) 2013-10-03 2019-12-24 Google Llc Persistent shuffle system
CN104598304A (en) * 2013-10-31 2015-05-06 国际商业机器公司 Dispatch method and device used in operation execution
CN104598304B (en) * 2013-10-31 2018-03-13 国际商业机器公司 Method and apparatus for the scheduling in Job execution
CN105808634A (en) * 2015-01-15 2016-07-27 国际商业机器公司 Distributed map reduce network
CN105808634B (en) * 2015-01-15 2019-12-03 国际商业机器公司 Distributed mapping reduction network
CN104793903A (en) * 2015-04-20 2015-07-22 浪潮电子信息产业股份有限公司 Video data writing method, device and system based on IO sequencing
CN105138405B (en) * 2015-08-06 2019-05-14 湖南大学 MapReduce task based on the Resources list to be released, which speculates, executes method and apparatus
CN105138405A (en) * 2015-08-06 2015-12-09 湖南大学 To-be-released resource list based MapReduce task speculation execution method and apparatus
CN105138679B (en) * 2015-09-14 2018-11-13 桂林电子科技大学 A kind of data processing system and processing method based on distributed caching
CN105138679A (en) * 2015-09-14 2015-12-09 桂林电子科技大学 Data processing system and method based on distributed caching
CN106557282A (en) * 2016-11-07 2017-04-05 华为技术有限公司 The method and apparatus of response write request
WO2018082302A1 (en) * 2016-11-07 2018-05-11 华为技术有限公司 Writing request response method and apparatus
CN106557282B (en) * 2016-11-07 2019-08-23 华为技术有限公司 The method and apparatus for responding write request
CN106598488B (en) * 2016-11-24 2019-08-13 北京小米移动软件有限公司 Distributed data read method and device
CN106598488A (en) * 2016-11-24 2017-04-26 北京小米移动软件有限公司 Distributed data reading method and device
CN109101188B (en) * 2017-11-21 2022-03-01 新华三大数据技术有限公司 Data processing method and device
CN109101188A (en) * 2017-11-21 2018-12-28 新华三大数据技术有限公司 A kind of data processing method and device
CN109510862A (en) * 2018-09-19 2019-03-22 中国石油天然气集团有限公司 Hough transformation method, apparatus and system
CN112749042A (en) * 2019-10-31 2021-05-04 北京沃东天骏信息技术有限公司 Application running method and device
CN112749042B (en) * 2019-10-31 2024-03-01 北京沃东天骏信息技术有限公司 Application running method and device
CN111930731A (en) * 2020-07-28 2020-11-13 苏州亿歌网络科技有限公司 Data dump method, device, equipment and storage medium
CN114550833A (en) * 2022-02-15 2022-05-27 郑州大学 Gene analysis method and system based on big data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20130403