CN105426255B - ReduceTask data locality dispatching method in Hadoop big data platform based on network I/O cost evaluation - Google Patents

Info

Publication number
CN105426255B
Authority
CN
China
Prior art keywords
node
host
network
reducetask
cost
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510999364.XA
Other languages
Chinese (zh)
Other versions
CN105426255A (en)
Inventor
尚凤军
闫辰云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing China Post Information Technology Group Co ltd
Original Assignee
Chongqing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Posts and Telecommunications
Priority to CN201510999364.XA
Publication of CN105426255A
Application granted
Publication of CN105426255B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • G06F9/5088Techniques for rebalancing the load in a distributed system involving task migration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5017Task decomposition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5021Priority
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/503Resource availability

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer And Data Communications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a ReduceTask data locality scheduling method based on network I/O cost evaluation in a Hadoop big data platform, and belongs to the technical field of cloud computing platform optimization. For each recorded Host node, the method evaluates the network I/O cost of copying the Map output data held on the other nodes to that Host node if it were used as the ReduceTask execution node. This cost serves as the priority for assigning ReduceTasks: the lower a node's cost, the higher its priority, and Reduce tasks are assigned to the highest-priority nodes. This lowers the network I/O cost of copying Map output data to the Reduce node, so that the selected Reduce node has the best data locality. By bringing data locality into ReduceTask assignment, the method reduces the network load caused by data copying in the Shuffle stage and saves network bandwidth in the Hadoop cluster.

Description

ReduceTask data locality scheduling method based on network I/O cost evaluation in a Hadoop big data platform
Technical field
The invention belongs to the technical field of cloud computing platform optimization, and relates to a ReduceTask data locality scheduling method based on network I/O cost evaluation in a Hadoop big data platform.
Background
With the development of the information industry, the volume of data generated by enterprises and other organizations is growing rapidly, and traditional data storage capacity and processing techniques are increasingly inadequate. The "4V" characteristics of big data, namely large volume, many types (variety), high speed (velocity), and low value density (value), further increase the difficulty and complexity of data management and information extraction [3]. The popularity of big data does not mean that it is well understood; on the contrary, it suggests a danger that big data is being over-hyped [6]. There are still many questions and disputes about the basic concepts, key technologies, and applications of big data. "Big data technology" emerged against this background.
IBM announced its cloud computing plan at the end of 2007, and the concept of cloud computing came to public attention. The idea of cloud computing did not appear out of nowhere, however: it is the product of the development and convergence of traditional computing and networking technologies such as grid computing, distributed computing, parallel computing, utility computing, network storage, virtualization, and load balancing. It aims to integrate multiple low-cost computing entities over the network into a single system with powerful computing capability, and to deliver that capability to end users through business models such as Infrastructure as a Service (IaaS), Platform as a Service (PaaS), Software as a Service (SaaS), and managed service providers (MSP).
Since 2003, Google has successively published GFS, MapReduce, and other highly scalable, high-performance distributed frameworks for massive data processing, and has demonstrated their advantages in processing massive web data. Doug Cutting applied these two systems in the web search engine project Nutch. In early 2006, the developers moved this open-source implementation out of Nutch to become a subproject of Lucene, named Hadoop. The open-source Apache Hadoop provides a mature big data processing tool; it is widely used and supported, and is the de facto standard for big data computing. By studying Hadoop and improving its performance, we aim to improve the processing capacity of the "cloud" and reduce the processing load on user terminals, so that a user terminal can ultimately be reduced to a simple input/output device that uses the cloud's computing power on demand.
In distributed computing, the basic goal of a scheduling policy is to make the best match between resources and jobs/tasks, based on the remaining resources (including CPU, memory, and network resources) on each node in the current cluster and the quality of service (QoS) requirements of each user job. Because users' QoS requirements for jobs are diverse, task scheduling in a distributed system is a multi-objective optimization problem and, more specifically, a typical NP problem.
Summary of the invention
In view of this, the purpose of the present invention is to provide a ReduceTask data locality scheduling method based on network I/O cost evaluation in a Hadoop big data platform. The method addresses a defect of current Hadoop versions: when scheduling the subtasks (MapTask and ReduceTask) into which a job (Job) is divided, data locality is considered only when scheduling MapTasks. For each recorded Host node, the method evaluates the network I/O cost of copying the Map output data on the other nodes to that Host node if it were the ReduceTask execution node, and uses this cost as the priority for assigning ReduceTasks. Reduce tasks are assigned to high-priority nodes, which reduces the network I/O cost of copying Map output data to the Reduce node and gives the selected Reduce node the best data locality.
To achieve the above purpose, the invention provides the following technical solution:
A ReduceTask data locality scheduling method based on network I/O cost evaluation in a Hadoop big data platform, comprising the following steps:
Step 1: For each user job (Job) initialized by the JobTracker, maintain a mapping table that records which MapTasks the JobTracker has assigned to which Host node, i.e. <HostId, MapTaskId>. Whenever the JobTracker assigns a MapTask of the job, the new mapping entry is added to this table.
Step 2: For each user job (Job) initialized by the JobTracker, maintain a second mapping table, <network I/O cost, HostId>, which maps physical nodes in the cluster to their network I/O cost. One part of its Hosts are the nodes that were assigned MapTasks of the job, i.e. the Hosts of Step 1 with duplicates removed. The other part are additional Host nodes: for each subregion containing a Host of the first part, one representative node (alternative node) of that same subregion (i.e. with the same topological path length) is recorded together with its network I/O cost.
Step 3: Using the <HostId, MapTaskId> table from Step 1, merge the entries with the same HostId to obtain the number of the job's MapTasks assigned to each Host. Using the network topology tree of the Hadoop cluster, obtain the topological distance between each recorded node and the node being evaluated as the ReduceTask execution node. From these data, use the network I/O cost model to compute the network I/O cost of using each physical node in Step 2 as the ReduceTask execution node, and write the result into the mapping table of Step 2.
Step 4: When the number of completed MapTasks reaches the threshold set by the parameter mapred.reduce.slowstart.completed.maps in the configuration file mapred-default.xml, the JobTracker starts assigning ReduceTasks. The assignment policy uses the <network I/O cost, HostId> mapping table of the current Hadoop cluster from Step 3: the lower a node's network I/O cost, the higher its priority for being assigned a ReduceTask.
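The start condition in Step 4 can be sketched as follows. This is a minimal illustration and not the JobTracker's actual code: the function name `reduces_may_start` is hypothetical, the default of 0.05 is the stock Hadoop 1.x value for mapred.reduce.slowstart.completed.maps, and rounding the required count up is an assumption of this sketch.

```python
import math


def reduces_may_start(completed_maps: int, total_maps: int,
                      slowstart: float = 0.05) -> bool:
    """Return True once enough MapTasks have finished to begin assigning ReduceTasks.

    `slowstart` plays the role of mapred.reduce.slowstart.completed.maps:
    the fraction of the job's MapTasks that must be complete.
    """
    # Number of finished MapTasks required (rounded up; an assumption of this sketch).
    needed = math.ceil(slowstart * total_maps)
    return completed_maps >= needed


if __name__ == "__main__":
    # With 10 MapTasks and a threshold of 0.5, five completions are required.
    print(reduces_may_start(4, 10, slowstart=0.5))
    print(reduces_may_start(5, 10, slowstart=0.5))
```

Only after this condition holds does the cost-based priority come into play, so the cost table should be up to date by the time the threshold is reached.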
Further, in Step 2, the Hosts in the <network I/O cost, HostId> mapping table include every Host to which the JobTracker has assigned a MapTask of the Job, plus, for each such Host, one Host in the same minimum subregion, which serves as that minimum subregion's alternative Host. Alternative Hosts are chosen per subregion of the nodes that were assigned MapTasks of the Job: in each minimum subregion, the network I/O cost of only one alternative Host is computed.
Further, in Step 3, the network I/O cost model for each candidate Host node (for example, node j) is built from the following quantities. Num_MapTask_i is the number of Map tasks of the job assigned to node i. distance(i, j) is the path length from node i to node j in the network topology tree of the Hadoop cluster; it is obtained from the cluster's three-layer network topology graph and is calculated as follows:
1) If node i and node j are the same Host node, the topological distance is:
distance(DataCenter_d1/Rack_r1/Host_h1, DataCenter_d1/Rack_r1/Host_h1) = 0
2) If node i and node j are different Host nodes in the same Rack, the topological distance is:
distance(DataCenter_d1/Rack_r1/Host_h1, DataCenter_d1/Rack_r1/Host_h2) = 2
3) If node i and node j are Host nodes in different Racks of the same data center, the topological distance is:
distance(DataCenter_d1/Rack_r1/Host_h1, DataCenter_d1/Rack_r2/Host_h2) = 4
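The cost expression itself is not reproduced in the text above. A form consistent with the two quantities it defines, assuming the cost of node j is the MapTask-weighted sum of topological distances over the set H of Hosts holding the job's MapTasks (the symbols Cost and H are introduced here for illustration only), would be:

```latex
\mathrm{Cost}(j) \;=\; \sum_{i \in H} \mathrm{Num\_MapTask}_i \times \mathrm{distance}(i, j)
```

Under that assumption, if 3 MapTasks ran on a Host in the same Rack as j (distance 2) and 2 MapTasks ran on a Host in another Rack (distance 4), then Cost(j) = 3 × 2 + 2 × 4 = 14.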
Further, in Step 4, the Reduce network I/O cost of each Host node is used as that node's priority when ReduceTasks are assigned. When the JobTracker assigns a ReduceTask, it does so according to Host priority. If the best node cannot accept the ReduceTask at that moment because it has failed or is busy, the next-best node is selected instead. This ensures that the network I/O cost of copying the MapTask output data in the current cluster to the chosen Reduce node is the lowest available.
The beneficial effects of the invention are as follows: the invention establishes mapping tables to record the distribution of MapTasks, and designs a model for computing the network I/O cost of copying MapTask intermediate data to a Reduce node (the Shuffle process). Using these costs as the reference priority when assigning ReduceTasks to nodes optimizes the network I/O cost of Shuffle and saves network resources in the Hadoop cluster, which shortens job turnaround time and increases throughput.
Brief description of the drawings
To make the purpose, technical solution, and beneficial effects of the invention clearer, the following drawings are provided:
Fig. 1 is a high-level flow chart of the method of the invention;
Fig. 2 is the topology graph of a Hadoop cluster with a single data center;
Fig. 3 is a sequence diagram of ReduceTask assignment in the Hadoop source code framework.
Detailed description of the embodiments
A preferred embodiment of the invention is described in detail below with reference to the drawings.
Fig. 1 is a high-level flow chart of the method of the invention. As shown, the ReduceTask data locality scheduling policy based on network I/O cost evaluation in a Hadoop big data platform mainly comprises four steps. Step 1: for each user job (Job) initialized by the JobTracker, maintain a mapping table <HostId, MapTaskId>; whenever the JobTracker assigns a MapTask of the job, the new mapping entry is added to this table. Step 2: for each user job (Job) initialized by the JobTracker, maintain a mapping table <network I/O cost, HostId>. Step 3: using the network I/O cost model, compute the network I/O cost of using each physical node in Step 2 as the ReduceTask execution node, and write the result into the mapping table of Step 2. Step 4: using the <network I/O cost, HostId> mapping table of the current Hadoop cluster from Step 3, assign ReduceTasks to nodes on the principle that the lower a node's network I/O cost, the higher its priority.
Analysis of the Hadoop source code framework shows that the JobTracker describes and tracks the running state of each job internally as a three-layer multiway tree. A job is abstracted into three layers, from top to bottom: the job monitoring layer, the task monitoring layer, and the task execution layer. In the job monitoring layer, each job is described by a JobInProgress (JIP) object, which tracks the job's overall running state and the running status of each of its tasks. In the task monitoring layer, each task is described by a TaskInProgress (TIP) object, which tracks its running state.
Specifically, the method comprises the following steps:
Step 1: For each user job (Job) initialized by the JobTracker, maintain a mapping table that records which MapTasks the JobTracker has assigned to which Host node, i.e. <HostId, MapTaskId>. Whenever the JobTracker assigns a MapTask of the job, the new mapping entry is added to this table.
Step 2: For each user job (Job) initialized by the JobTracker, maintain a second mapping table, <network I/O cost, HostId>, which maps physical nodes in the cluster to their network I/O cost. One part of its Hosts are the nodes that were assigned MapTasks of the job, i.e. the Hosts of Step 1 with duplicates removed. The other part are additional Host nodes: for each subregion containing a Host of the first part, one representative node (alternative node) of that same subregion (i.e. with the same topological path length) is recorded together with its network I/O cost.
The Hosts in the <network I/O cost, HostId> mapping table include every Host to which the JobTracker has assigned a MapTask of the Job, plus, for each such Host, one Host in the same minimum subregion, which serves as that minimum subregion's alternative Host. Alternative Hosts are chosen per subregion of the nodes that were assigned MapTasks of the Job: in each minimum subregion, the network I/O cost of only one alternative Host is computed.
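Steps 1 and 2 can be sketched as plain data-structure operations. The sketch below is illustrative only: the function names, the use of topology paths such as "/d1/r1/h1" as HostIds, and the treatment of a Rack as the "minimum subregion" are assumptions, not details taken from the Hadoop source.

```python
from collections import Counter


def rack_of(host_path: str) -> str:
    """Return the Rack part of a topology path such as '/d1/r1/h1'."""
    return host_path.rsplit("/", 1)[0]


def map_task_counts(assignments):
    """Merge <HostId, MapTaskId> entries into a MapTask count per Host."""
    return Counter(host for host, _task in assignments)


def candidate_hosts(assignments, cluster_hosts):
    """Hosts to cost: every Host holding a MapTask of the job, plus one
    alternative Host per Rack that contains such a Host."""
    # Hosts from Step 1 with duplicates removed, keeping first-seen order.
    assigned = list(dict.fromkeys(host for host, _task in assignments))
    candidates = list(assigned)
    seen_racks = set()
    for host in assigned:
        rack = rack_of(host)
        if rack in seen_racks:
            continue
        seen_racks.add(rack)
        # Pick the first Host in this Rack that holds no MapTask of the job.
        for other in cluster_hosts:
            if rack_of(other) == rack and other not in assigned:
                candidates.append(other)
                break
    return candidates


if __name__ == "__main__":
    table = [("/d1/r1/h1", "m0"), ("/d1/r1/h1", "m1"), ("/d1/r2/h3", "m2")]
    cluster = ["/d1/r1/h1", "/d1/r1/h2", "/d1/r2/h3", "/d1/r2/h4", "/d1/r3/h5"]
    print(map_task_counts(table))
    print(candidate_hosts(table, cluster))
```

One alternative Host per Rack is enough under the distance rules of this method, because every Host in a Rack that holds none of the job's MapTasks is at the same topological distance from each MapTask Host and therefore has the same cost.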
Step 3: Using the <HostId, MapTaskId> table from Step 1, merge the entries with the same HostId to obtain the number of the job's MapTasks assigned to each Host. Using the network topology tree of the Hadoop cluster, obtain the topological distance between each recorded node and the node being evaluated as the ReduceTask execution node. From these data, use the network I/O cost model to compute the network I/O cost of using each physical node in Step 2 as the ReduceTask execution node, and write the result into the mapping table of Step 2.
The network I/O cost model for each candidate Host node (for example, node j) is built from the following quantities. Num_MapTask_i is the number of Map tasks of the job assigned to node i. distance(i, j) is the path length from node i to node j in the network topology tree of the Hadoop cluster; it is obtained from the cluster's three-layer network topology graph and is calculated as follows:
1) If node i and node j are the same Host node, the topological distance is:
distance(DataCenter_d1/Rack_r1/Host_h1, DataCenter_d1/Rack_r1/Host_h1) = 0
2) If node i and node j are different Host nodes in the same Rack, the topological distance is:
distance(DataCenter_d1/Rack_r1/Host_h1, DataCenter_d1/Rack_r1/Host_h2) = 2
3) If node i and node j are Host nodes in different Racks of the same data center, the topological distance is:
distance(DataCenter_d1/Rack_r1/Host_h1, DataCenter_d1/Rack_r2/Host_h2) = 4
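The three distance cases and the Step 3 cost computation can be sketched as below. Two things here are assumptions rather than statements from the text: the cost is taken to be the sum over MapTask Hosts of (MapTask count × topological distance), and Hosts in different data centers, a case the text does not list, get distance 6 simply because the same tree rule is applied. The function names and the path format are hypothetical.

```python
def distance(a: str, b: str) -> int:
    """Tree path length between two Hosts given as '/datacenter/rack/host' paths."""
    pa = a.strip("/").split("/")
    pb = b.strip("/").split("/")
    # Count the leading path components the two Hosts share.
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    # Hops up from a to the common ancestor, plus hops down to b.
    return (len(pa) - common) + (len(pb) - common)


def network_io_cost(reduce_host: str, map_counts: dict) -> int:
    """Assumed cost model: sum of MapTask count times distance to reduce_host."""
    return sum(n * distance(host, reduce_host) for host, n in map_counts.items())


if __name__ == "__main__":
    counts = {"/d1/r1/h1": 3, "/d1/r2/h3": 2}
    for j in ["/d1/r1/h1", "/d1/r1/h2", "/d1/r2/h3"]:
        print(j, network_io_cost(j, counts))
```

With these example counts, the Host that already holds three MapTasks is the cheapest candidate, since its own output needs no copy and only the two remote MapTask outputs cross Racks.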
Step 4: When the number of completed MapTasks reaches the threshold set by the parameter mapred.reduce.slowstart.completed.maps in the configuration file mapred-default.xml, the JobTracker starts assigning ReduceTasks. The assignment policy uses the <network I/O cost, HostId> mapping table of the current Hadoop cluster from Step 3: the lower a node's network I/O cost, the higher its priority for being assigned a ReduceTask.
In this step, the Reduce network I/O cost of each Host node is used as that node's priority when ReduceTasks are assigned. When the JobTracker assigns a ReduceTask, it does so according to Host priority. If the best node cannot accept the ReduceTask at that moment because it has failed or is busy, the next-best node is selected instead. This ensures that the network I/O cost of copying the MapTask output data in the current cluster to the chosen Reduce node is the lowest available.
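The priority rule with its fallback can be sketched as a scan over the cost table in ascending order. The function name `pick_reduce_host` and the `can_accept` callback (standing in for whatever failure and free-slot checks the scheduler performs) are hypothetical; breaking ties by HostId is an arbitrary choice of this sketch.

```python
def pick_reduce_host(cost_table, can_accept):
    """Choose the Host for the next ReduceTask.

    cost_table: iterable of (network_io_cost, host_id) pairs, mirroring the
                <network I/O cost, HostId> mapping table.
    can_accept: callable returning True if the Host is alive and has capacity.
    Returns the lowest-cost Host that can accept the task, or None if none can.
    """
    # Lower cost means higher priority; ties fall back to HostId order.
    for _cost, host in sorted(cost_table):
        if can_accept(host):
            return host
    return None


if __name__ == "__main__":
    table = [(14, "h2"), (8, "h1"), (12, "h3")]
    # h1 is the best node but is busy, so the next-best node is chosen.
    print(pick_reduce_host(table, lambda h: h != "h1"))
```

Returning None when no candidate can accept the task leaves the decision to the caller, for example to retry on the next heartbeat or to fall back to the default assignment.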
Finally, it should be noted that the preferred embodiment above only illustrates the technical solution of the invention and does not limit it. Although the invention has been described in detail through the preferred embodiment, those skilled in the art should understand that various changes in form and detail may be made to it without departing from the scope defined by the claims of the invention.

Claims (2)

1. A ReduceTask data locality scheduling method based on network I/O cost evaluation in a Hadoop big data platform, characterized in that it comprises the following steps:
Step 1: for each user job (Job) initialized by the JobTracker, maintain a mapping table that records which MapTasks the JobTracker has assigned to which Host node, i.e. <HostId, MapTaskId>; whenever the JobTracker assigns a MapTask of the job, the new mapping entry is added to this table;
Step 2: for each user job (Job) initialized by the JobTracker, maintain a mapping table, i.e. <network I/O cost, HostId>, which maps physical nodes in the cluster to their network I/O cost; one part of its Hosts are the nodes that were assigned MapTasks of the job, i.e. the Hosts of Step 1 with duplicates removed; the other part are additional Host nodes, each being one representative node of the same subregion as a Host node of the first part, recorded together with its network I/O cost;
Step 3: using the <HostId, MapTaskId> mapping table of Step 1, merge the entries with the same HostId to obtain the number of the job's MapTasks assigned to each Host; using the network topology tree of the Hadoop cluster, obtain the topological distance between each recorded node and the node being evaluated as the ReduceTask execution node; from these data, use the network I/O cost model to compute the network I/O cost of using each physical node in Step 2 as the ReduceTask execution node, and write the result into the mapping table of Step 2;
In Step 3, the network I/O cost model for each candidate Host node is built from the following quantities: Num_MapTask_Host_i is the number of Map tasks of the job assigned to Host node i; distance(i, j) is the path length from node i to node j in the network topology tree of the Hadoop cluster, obtained from the cluster's three-layer network topology graph and calculated as follows:
1) if node i and node j are the same Host node, the topological distance is:
distance(DataCenter_d1/Rack_r1/Host_h1, DataCenter_d1/Rack_r1/Host_h1) = 0
2) if node i and node j are different Host nodes in the same Rack, the topological distance is:
distance(DataCenter_d1/Rack_r1/Host_h1, DataCenter_d1/Rack_r1/Host_h2) = 2
3) if node i and node j are Host nodes in different Racks of the same data center, the topological distance is:
distance(DataCenter_d1/Rack_r1/Host_h1, DataCenter_d1/Rack_r2/Host_h2) = 4;
Step 4: when the number of completed MapTasks reaches the threshold set by the parameter mapred.reduce.slowstart.completed.maps in the configuration file mapred-default.xml, the JobTracker starts assigning ReduceTasks; the assignment policy uses the <network I/O cost, HostId> mapping table of the current Hadoop cluster from Step 3, on the principle that the lower a node's network I/O cost, the higher its priority for being assigned a ReduceTask;
In Step 4, the Reduce network I/O cost of each Host node is used as that node's priority when ReduceTasks are assigned; when the JobTracker assigns a ReduceTask, it does so according to Host priority; if the best node cannot accept the ReduceTask at that moment because it has failed or is busy, the next-best node is selected instead; this ensures that the Reduce network I/O cost of copying the MapTask output data in the current cluster to the node is the lowest available.
2. The ReduceTask data locality scheduling method based on network I/O cost evaluation in a Hadoop big data platform according to claim 1, characterized in that: in Step 2, the Hosts in the <network I/O cost, HostId> mapping table include every Host to which the JobTracker has assigned a MapTask of the job, plus, for each such Host, one Host in the same minimum subregion, which serves as that minimum subregion's alternative Host; alternative Hosts are chosen per subregion of the nodes that were assigned MapTasks of the job, and in each minimum subregion the network I/O cost of only one alternative Host is computed.
CN201510999364.XA 2015-12-28 2015-12-28 ReduceTask data locality dispatching method in Hadoop big data platform based on network I/O cost evaluation Active CN105426255B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510999364.XA CN105426255B (en) 2015-12-28 2015-12-28 ReduceTask data locality dispatching method in Hadoop big data platform based on network I/O cost evaluation

Publications (2)

Publication Number Publication Date
CN105426255A CN105426255A (en) 2016-03-23
CN105426255B true CN105426255B (en) 2019-04-19

Family

ID=55504479

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510999364.XA Active CN105426255B (en) 2015-12-28 2015-12-28 ReduceTask data locality dispatching method in Hadoop big data platform based on network I/O cost evaluation

Country Status (1)

Country Link
CN (1) CN105426255B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106201681B (en) * 2016-06-30 2019-04-26 湖南大学 Method for scheduling task based on pre-release the Resources list under Hadoop platform
CN106168912B (en) * 2016-07-28 2019-04-16 重庆邮电大学 A kind of dispatching method based on the estimation of backup tasks runing time in Hadoop big data platform
CN106681820B (en) * 2016-12-30 2020-05-01 西北工业大学 Extensible big data computing method based on message combination
CN109871265A (en) * 2017-12-05 2019-06-11 航天信息股份有限公司 The dispatching method and device of Reduce task
CN109637278A (en) * 2019-01-03 2019-04-16 青岛萨纳斯智能科技股份有限公司 Big data teaching experiment training platform

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101986661A (en) * 2010-11-04 2011-03-16 华中科技大学 Improved MapReduce data processing method under virtual machine cluster
CN103139265A (en) * 2011-12-01 2013-06-05 国际商业机器公司 Network transmission self-adaption optimizing method and system in large-scale parallel computing system
CN104008012A (en) * 2014-05-30 2014-08-27 长沙麓云信息科技有限公司 High-performance MapReduce realization mechanism based on dynamic migration of virtual machine

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Center-of-Gravity Reduce Task Scheduling to Lower MapReduce Network Traffic;Mohammad Hammoud,等;《2012 IEEE Fifth International Conference on Cloud Computing》;20120629;第49-58页

Also Published As

Publication number Publication date
CN105426255A (en) 2016-03-23

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220602

Address after: 401121 No. 53, middle section of Huangshan Avenue, Yubei District, Chongqing

Patentee after: Chongqing China Post Information Technology Group Co.,Ltd.

Address before: 400065 Chongqing Nan'an District huangjuezhen pass Chongwen Road No. 2

Patentee before: CHONGQING University OF POSTS AND TELECOMMUNICATIONS

TR01 Transfer of patent right