US20160103695A1 - Optimized assignments and/or generation virtual machine for reducer tasks - Google Patents

Optimized assignments and/or generation virtual machine for reducer tasks

Info

Publication number
US20160103695A1
US20160103695A1 (application US 14/509,691)
Authority
US
United States
Prior art keywords
reducer
virtual machine
assignments
tasks
virtual machines
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US14/509,691
Other versions
US9367344B2 (en)
Inventor
Yathiraj B. Udupi
Debojyoti Dutta
Madhav V. Marathe
Raghunath O. Nambiar
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cisco Technology Inc
Original Assignee
Cisco Technology Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cisco Technology Inc filed Critical Cisco Technology Inc
Priority to US 14/509,691 (granted as US9367344B2)
Assigned to Cisco Technology, Inc. Assignors: Nambiar, Raghunath O.; Marathe, Madhav V.; Dutta, Debojyoti; Udupi, Yathiraj B.
Priority to CN201580054119.XA (published as CN107111517B)
Priority to EP15781537.4A (published as EP3204855A1)
Priority to PCT/US2015/054035 (published as WO2016057410A1)
Publication of US20160103695A1
Application granted; publication of US9367344B2
Legal status: Active

Classifications

    • G06F 9/5066: Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • G06F 9/45558: Hypervisor-specific management and integration aspects
    • G06F 2009/45562: Creating, deleting, cloning virtual machine instances
    • G06F 2009/45583: Memory management, e.g. access or allocation
    • H04L 47/78: Architectures of resource allocation
    • H04L 41/122: Discovery or management of network topologies of virtualised topologies, e.g. software-defined networks [SDN] or network function virtualisation [NFV]

Definitions

  • This disclosure relates in general to the field of computing and, more particularly, to systems and methods for providing optimized virtual machine assignments to reducer tasks.
  • MapReduce used with Hadoop (framework for distributed computing) can allow writing of applications which process vast amounts of data (multi-terabyte data-sets) in-parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault tolerant manner.
  • MapReduce can be implemented using many virtual machines distributed on physical hosts. Processing these large datasets is computationally intensive, and taking up resources in a data center can be costly.
  • FIG. 1 illustrates the process for MapReduce having map tasks and reducer tasks being performed in a virtualized computing environment, according to some embodiments of the disclosure
  • FIG. 2 shows an exemplary flow diagram illustrating a method for determining virtual machine assignment for reducer tasks on physical hosts, according to some embodiments of the disclosure
  • FIG. 3 illustrates a distribution of keys over mapper virtual machines after map tasks are complete, according to some embodiments of the disclosure
  • FIG. 4 shows an exemplary variable matrix X, according to some embodiments of the disclosure
  • FIG. 5 shows a key distribution matrix D, according to some embodiments of the disclosure.
  • FIG. 6 shows a network distance matrix C, according to some embodiments of the disclosure.
  • FIG. 7 shows an exemplary system for determining virtual machine assignment for reducer tasks on physical hosts, according to some embodiments of the disclosure.
  • the present disclosure relates to assignment or generation of reducer virtual machines (VMs) after the “map” phase is substantially complete in MapReduce. Instead of a priori placement, distribution of keys after the “map” phase over the mapper virtual machines can be used to efficiently place reducer tasks to virtual machines in virtualized cloud infrastructure like OpenStack. By solving a constraint optimization problem, reducer VMs can be optimally assigned to process keys subject to certain constraints.
  • the present disclosure describes a special variable matrix.
  • the present disclosure describes several possible cost matrices for representing the costs determined based on the key distribution over the mapper VMs (and other suitable factors).
  • a method for determining virtual machine assignments for reducer tasks on physical hosts can include determining a distribution of keys over mapper virtual machines after map tasks are complete, determining costs associated with possible assignments of virtual machines to reducer tasks on the keys based on the distribution of keys; and solving for assignments of virtual machines to the reducer tasks based on the costs and subject to one or more constraints.
  • the assignment of virtual machines to reducer tasks can be formulated as a constraints optimization problem, where one or more optimal or desirable solutions may exist. From the solution(s), a solution can be selected which may provide the optimal assignment of virtual machines to reducer tasks, or at least an assignment that is better than other possible assignments.
  • the costs associated with possible assignments of virtual machines to reducer tasks comprise, for each possible virtual machine and for each reducer task, a cost for the particular possible virtual machine to perform the particular reducer task. These costs can in some cases be computed based on the distribution of keys.
  • the resulting optimized assignment of VMs to reducer tasks can utilize resources in the data center more efficiently, and in some cases, allow MapReduce to be completed faster than a priori placements of reducer VMs.
  • the distribution of keys provides some guidance for the optimization, such that certain costs in the data center for a given set of assignments of reducer VMs can be determined and minimized.
  • the distribution of keys over the mapper virtual machines comprises, for each key and for each mapper virtual machine, a number of key-value pairs for the particular key stored with the particular mapper virtual machine.
  • the method can not only determine assignments of mapper virtual machines (VMs used as mappers in the “map” phase) to reducer tasks, but can also determine assignments of virtual machines to be created on available physical hosts to reducer tasks.
  • the partitioning method determines optimized assignments from possible assignments (i.e., solves for substantially optimized assignments of virtual machines) by using a specialized variable matrix defining the possible assignments.
  • the variable matrix can have dimensions of at least n by (M+p*q), where n is the number of keys, M is the number of mapper virtual machines, p is n-M, and q is the number of available physical hosts on which a virtual machine can be created.
  • the partitioning method assesses the costs for various possible assignments of reducer VMs to reducer tasks by computing, for each virtual machine and for each reducer task, a cost for performing the particular reducer task for a particular key using a particular virtual machine based on the distribution of keys over the mapper virtual machines. In some embodiments, other factors are used for computing the cost.
  • These factors can include one or more of the following: network distance(s) from the virtual machine(s) on which the key-value pairs for the particular key is stored to the particular virtual machine performing the reducer task for the particular key, processor utilization of the particular virtual machine performing the reducer task for the particular key, memory utilization of the particular virtual machine performing the reducer task for the particular key, bandwidth availability(-ies) of the communication path from the virtual machine(s) on which the key-value pairs for the particular key is stored to the particular virtual machine performing the reducer task for the particular key, and disk input/output speeds of the particular virtual machine performing the reducer task for the particular key.
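For illustration, these factors could be folded into a single per-assignment cost. The following Python sketch is not from the disclosure; the weights and the specific combination are assumptions, since the disclosure leaves the exact formula open:

```python
def assignment_cost(key_counts, dist_to_vm, bandwidth_to_vm,
                    cpu_util, mem_util, disk_io_speed,
                    w=(1.0, 0.5, 0.5, 0.5, 0.5)):
    """Hypothetical per-(VM, reducer task) cost combining the factors above.

    key_counts[k]       - key-value pairs for this key stored on VM k
    dist_to_vm[k]       - network distance from VM k to the candidate VM
    bandwidth_to_vm[k]  - available bandwidth on the path from VM k
    cpu_util, mem_util  - current utilization of the candidate VM
    disk_io_speed       - disk I/O speed of the candidate VM
    """
    # Data movement: pairs stored on other VMs, weighted by distance
    # (distance from the candidate VM to itself is zero).
    transfer = sum(c * d for c, d in zip(key_counts, dist_to_vm))
    # Transfer-time pressure: more pairs over less bandwidth costs more.
    bw_penalty = sum(c / max(b, 1e-9) for c, b in zip(key_counts, bandwidth_to_vm))
    # Slower disks make the shuffle/merge on the candidate more expensive.
    io_penalty = sum(key_counts) / max(disk_io_speed, 1e-9)
    return (w[0] * transfer + w[1] * cpu_util + w[2] * mem_util
            + w[3] * bw_penalty + w[4] * io_penalty)
```

A constraint solver would then minimize the sum of such costs over all chosen assignments, subject to the constraints discussed next.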
  • the partitioning method is configured with one or more constraints. These constraints can advantageously implement certain rules and policies on the possible assignments, as well as ensuring the solution to the optimization problem is a correct one.
  • the one or more constraints includes the following: (1) a virtual machine is assigned to at most one reducer task, (2) a reducer task for a particular key is assigned to only one virtual machine, and (3) if a reducer task is assigned to a virtual machine to be created on a physical host, the virtual machine is created on only one physical host.
  • the one or more constraints can include the following: (1) a reducer task for a particular key is assigned to only one virtual machine, and (2) if a reducer task is assigned to a virtual machine to be created on a physical host, the virtual machine is created on only one physical host.
  • a MapReduce job (e.g., as a Hadoop workload) usually splits the input data-set into independent chunks to be processed in parallel manner.
  • the job has two main phases of work—“map” and “reduce”—hence MapReduce.
  • Map the given problem is divided into smaller sub-problems, each mapper then works on the subset of data providing an output with a set of (key, value) pairs (or referred herein as key-value pairs).
  • the output from the mappers is handled by a set of reducers, where each reducer summarizes the data based on the provided keys.
  • MapReduce is implemented in a virtualized environment, e.g., using OpenStack cloud infrastructure, the mappers and reducers are provisioned as virtual machines (“VMs” or sometimes referred to as virtual compute nodes) on physical hosts.
  • FIG. 1 illustrates the process for MapReduce having map tasks and reducer tasks being performed in a virtualized computing environment, according to some embodiments of the disclosure.
  • Data is first provided to M mapper VMs (shown as MAP VM_1, MAP VM_2, . . . MAP VM_M) to perform the respective map tasks.
  • In a MapReduce job, all the map tasks may be completed before reducer tasks start.
  • Once the map tasks are complete, the output from the mapper VMs can have N keys. For reduce, key-value pairs with the same key ought to end up at (or be placed at/assigned to) the same reducer VM. This is called partitioning.
  • In one example, it is assumed that one reducer VM performs the reducer task for one key. The example would then have N reducer VMs (shown as REDUCE VM_1, REDUCE VM_2, . . . REDUCE VM_N).
  • a MapReduce system usually provides a default partitioning function, e.g., hash(key) mod R, to select a reducer VM for a particular key.
  • However, such a simple partition function can cause some of the reducer VMs to take an excessively long time, thus delaying the overall completion of the job. For at least that reason, the placement of VMs in a physical topology of hosts/servers and their assignments to reducer tasks can play an important role in deciding the performance of such workloads.
  • the present disclosure describes an improved partitioning method which can determine virtual machine assignments for reducer tasks on physical hosts to enable faster as well as balanced completion of all the reducer tasks.
  • the improved partitioning method can address how to make optimized placements of the reducer VMs in a virtualized Hadoop environment on cloud infrastructures such as OpenStack.
  • the improved partitioning method can perform technical tasks such as improving load balancing among the reducer VMs (and the hosts on which the reducer VMs are provided), determining whether to create new reducer VMs and how many, and deciding on which hosts to place the new reducer VMs.
  • FIG. 2 shows an exemplary flow diagram illustrating an improved partitioning method for determining virtual machine assignments for reducer tasks on physical hosts, according to some embodiments of the disclosure.
  • the partitioning method determines a distribution of keys over mapper virtual machines (box 202 ). Based on the distribution of keys, the partitioning method determines costs associated with possible assignments of virtual machines to reducer tasks on the keys (box 204 ). Based on the costs, the partitioning method solves for substantially optimized assignments of virtual machines to the reducer tasks subject to one or more constraints (box 206 ).
  • the flow diagram illustrates that the improved partitioning method solves a constraints optimization problem to determine optimal assignments of VMs to reducer tasks. This can be done by minimizing cost based on the distribution of keys in view of one or more constraint(s). It is envisioned by the disclosure that an equivalent implementation may solve the problem by maximizing another metric (as opposed to minimizing cost).
  • the method uses the distribution of keys as part of the cost function of the constraints optimization problem when optimizing the assignment of reducer VMs.
  • the distribution of keys is an important factor in partitioning because the transfer and processing of these keys in a virtualized cloud infrastructure can take up a lot of network and computing resources.
  • the network and computing resources needed for performing a reducer task are directly related to the cost for a particular reducer VM to perform the reducer task.
  • the distribution of keys over the mapper virtual machines would generally include, for each key and for each mapper VM, a number of key-value pairs for the particular key stored with the particular mapper VM (on the physical host of the VM).
  • the distribution of keys provides information relating to where the keys are stored, such that costs for transferring and/or processing these keys on certain reducer VMs can be determined.
  • FIG. 3 illustrates a distribution of keys over mapper VMs after map tasks are complete, according to some embodiments of the disclosure.
  • the table or matrix has the mapper VMs (Mapper 1 , Mapper 2 , Mapper 3 ) represented as rows. The columns show the counts/numbers of how many key-value pairs having a particular key are stored with a particular mapper VM.
  • Key1 has 10000 key-value pairs with Mapper 1, 100000 key-value pairs with Mapper 2, and 20 key-value pairs with Mapper 3. It is envisioned that other kinds of numbers can be used to represent the distribution of keys (e.g., percentages, fractions, scores, sizes, etc.)
  • the above exemplary assignments can ensure that the amount of data that has to be moved from the mappers to reducers is minimized or reduced.
  • these assignments can be determined based on which VM had the most key-value pairs for a particular key, which can directly relate to the cost of moving the data from mappers to reducers.
  • the method solves a constraints optimization problem by minimizing costs to find the optimal solution subject to resource constraints.
  • the costs can be determined based on at least the distribution of keys over the mapper VMs.
  • This optimal solution, once found, can be used by compute VM schedulers such as the OpenStack compute scheduler when deciding which physical host to use to either spin up a VM or to reuse an existing mapper VM on that host.
  • an aggregate cost can be a measure of computational and network resources consumed for completing a particular reducer task on a particular VM, which also is an indication of the total time taken by that particular reducer task on the particular VM. By minimizing this aggregate cost, it is possible to solve for one or more optimal assignments of reducer VMs to reducer tasks.
  • the optimization assumes there will be one reducer VM per key, and if there are more keys currently being output by mapper VMs than there are Mapper VMs, additional VMs can be created. Later in the present disclosure, a more complicated embodiment is described where the optimization problem does not have the assumption of one key per one reducer VM, where the partitioning method can allow for more than one key per reducer VM.
  • the constraints optimization problem is reduced to finding the optimal solution of deciding which VM should be used for performing a particular reducer task for a particular key based on the key distribution.
  • determining a solution to the assignment problem is not trivial. Many factors besides key distribution can affect the cost of assigning a particular reducer VM to perform a reducer task for a particular key.
  • the costs associated with possible assignments of virtual machines to reducer tasks can include, for each possible virtual machine and for each reducer task, a cost for the particular possible virtual machine to perform the particular reducer task.
  • a cost can be computed based on one or more of the factors enumerated above: network distance(s) from the VM(s) storing the key-value pairs to the candidate VM, processor utilization, memory utilization, bandwidth availability of the communication path(s), and disk input/output speeds.
  • a constraint solver can effectively solve the problem of reducer VM assignments and generation.
  • the constraint solver can use a variable matrix (in combination with one or more cost matrices) and solve for one or more optimal solutions subject to one or more constraints.
  • mechanically, the constraint solver searches through the different possible instances of the variable matrix (subject to the one or more constraints) to determine which one (or more) of the instances would result in the lowest costs (or lowered costs).
  • FIG. 4 shows an exemplary variable matrix X, according to some embodiments of the disclosure.
  • Each entry in the matrix is denoted by x_ij, where i ranges from 1 to m (the number of VM rows), and j ranges from 1 to n (the number of keys).
  • the improved partitioning method aims to determine n Reducer VMs that can be assigned to reduce the n Keys.
  • There are M (existing) mapper VMs, which are already placed on certain physical hosts. Hence each of these M mapper VMs can be represented by one row in the variable matrix.
  • If there are more keys than mapper VMs, p = n-M additional VMs should be created.
  • the variable matrix includes further rows for these additional VMs.
  • the variable matrix can include up to p*q new rows (or some other prescribed number of possible additional VMs to be created).
  • the p*q rows capture, for each of the p new VMs, the q host options of where it can be created.
  • variable matrix has dimensions of at least n by (M+p*q), where n is the number of keys, M is the number of mapper virtual machines, p is n-M, and q is the number of available physical hosts on which a virtual machine can be created.
  • each variable x_ij in the variable matrix is 1 if the reducer for key K_j selects the VM denoted by V_i.
  • V_i is either one of the existing mapper VMs, or a new VM to be created on one of the q available hosts.
  • If V_i is a non-mapper new VM to be created, it is possible to know which host the new VM can be created on, based on which variable row it is.
  • ((i - M) % q can give the host that V_i corresponds to, where M is the number of mappers, q is the number of hosts, and % is the modulo (remainder) operator.)
  • This variable matrix setup, with rows for the existing VMs and all the additional VMs required, allows us to mathematically solve for the best solution, including giving an opportunity for a physical host to create more than one VM. A row-indexing sketch follows.
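As a minimal sketch of this row layout (0-based indices; the label names are ours, not the disclosure's):

```python
def build_row_labels(M, n, q):
    """Label the rows of variable matrix X: M existing mapper VMs first,
    then p*q rows (p = n - M new VMs, one row per candidate host)."""
    p = max(n - M, 0)
    labels = ["mapper_%d" % (i + 1) for i in range(M)]
    for v in range(p):          # index of the new VM
        for h in range(q):      # index of the candidate host
            labels.append("newvm_%d@host_%d" % (v + 1, h + 1))
    return labels

def host_of_row(i, M, q):
    """Candidate host for a non-mapper row i (0-based), per (i - M) % q."""
    assert i >= M, "rows below M are existing mapper VMs"
    return (i - M) % q

# Example: n = 5 keys, M = 3 mappers, q = 2 hosts -> p = 2 new VMs,
# so X has 3 + 2*2 = 7 rows and 5 columns.
print(build_row_labels(M=3, n=5, q=2))
```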
  • The possible instances of the variable matrix X are limited by one or more constraints; generally, business rules/policies may govern how many VMs can be created on one host, or how many reducer tasks a reducer VM can perform. For that reason, one or more constraints are provided to limit the possible instances of the variable matrix X.
  • (A1) a virtual machine is assigned to at most one reducer task;
  • (A2) a reducer task for a particular key is assigned to only one virtual machine; and
  • (A3) if a reducer task is assigned to a virtual machine to be created on a physical host, the virtual machine is created on only one physical host.
  • the first constraint (A1) requires that for each VM row (mapper VMs and non-mapper VMs to be created) in the variable matrix X, the sum of the x_ij values across all keys is less than or equal to 1. This constraint means there is at most one reducer task per reducer VM.
  • the constraint A1 can be summarized as: for every VM row i, sum over all keys j of x_ij <= 1.
  • the second constraint (A2) requires that for each key column, the sum of the x_ij values over all VM rows should equal 1. This constraint means that a key can be reduced by only one VM at a time.
  • the constraint A2 can be summarized as: for every key column j, sum over all VM rows i of x_ij = 1.
  • the third constraint (A3) requires that for the additional VMs to be created, a VM is created in only one host.
  • V_11 indicates the first VM on the first Host
  • V_12 indicates the first VM on the second Host
  • the first VM can be created in Host 1 or Host 2, and it can be linked to one Key only.
  • Only one of x1, x2, x3, and x4 can be 1, indicating that only one host is chosen for that VM, and only one key is selected for that VM.
  • the constraint A3 can be summarized as: for all VM rows corresponding to the same new VM across all possible hosts, the sum of the x_ij values over those rows and all keys is at most 1. A feasibility-check sketch follows.
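A small Python sketch of constraints A1-A3 as a feasibility check over a candidate 0/1 matrix (a real constraint solver enforces these inside the search rather than by filtering; the function shape is our assumption):

```python
def satisfies_constraints(X, M, q):
    """Check a candidate 0/1 matrix X (rows: VM rows, columns: keys)
    against constraints A1-A3 described above."""
    m, n = len(X), len(X[0])
    # A1: each VM row is assigned to at most one reducer task.
    if any(sum(row) > 1 for row in X):
        return False
    # A2: each key is reduced by exactly one VM.
    if any(sum(X[i][j] for i in range(m)) != 1 for j in range(n)):
        return False
    # A3: each new VM materializes on at most one host -- at most one
    # assignment across all q host rows belonging to the same new VM.
    for v in range((m - M) // q):
        if sum(sum(X[i]) for i in range(M + v * q, M + (v + 1) * q)) > 1:
            return False
    return True
```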
  • While the present disclosure focuses on several simple constraints (assuming all hosts are capable of creating new VMs), it is understood by one skilled in the art that in some situations the computational resource constraints (or rules/policies) on hosts can limit creating these VMs. In these situations, further constraints can be made to limit which hosts can actually create new VMs, and how many VMs can be created on one host.
  • the costs associated with a particular reducer VM performing a particular reducer task can be stored in a cost matrix. Determining the costs associated with the possible assignments of virtual machines to reducer tasks can include computing, for each virtual machine and for each reducer task, a cost for performing the particular reducer task for a particular key using a particular virtual machine based on the distribution of keys over the mapper virtual machines. It is possible to compute more than one cost matrix, and a function can be provided to compute the aggregate cost based on (an instance of) the variable matrix X and the one or more cost matrices. The following example shows cost matrices defined based on distribution of keys and network distances between hosts.
  • FIG. 5 shows a key distribution matrix D, according to some embodiments of the disclosure.
  • An entry D_ij in the matrix D represents the number of values for key K_j generated by VM V_i.
  • For keys that a particular VM did not generate, the corresponding values are zeros.
  • FIG. 6 shows a network distance matrix C, according to some embodiments of the disclosure.
  • An entry in the network distance matrix C, C_ab indicates the network distance between the VMs V_a and V_b.
  • C_ii is zero, and C_ab = C_ba.
  • the aggregate cost can depend on the network distance and the amount of data to be transferred between the VM which is chosen for the reducer task of a particular key and the other VMs from which the data for that key is to be transferred. As an example for Key K_1, if VM V_1 is the chosen VM, then this contributes the following value to the final sum of aggregate costs: x_11*[D_21*C_21 + D_31*C_31 + . . . + D_i1*C_i1 + . . . + D_m1*C_m1]. It is understood that other formulations can be used, depending on the application.
  • the sum of aggregate costs thus gives a measure of the cost of moving all the data for a key from the other VMs to the VM selected for that key, so that the selected VM can perform the reducer task.
  • the partitioning method adds cost values for all possible variables to form the final aggregate cost metric to minimize: Aggregate cost = sum over all VM rows i and all keys j of x_ij * [sum over VM rows k of D_kj * C_ki]. A computational sketch follows.
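A sketch of this aggregate data-movement cost in Python/numpy, mirroring the x_11*[D_21*C_21 + . . .] example above (the function itself is ours, not from the disclosure):

```python
import numpy as np

def aggregate_cost(X, D, C):
    """Data-movement cost of an assignment, per the formula above.

    X[i][j] = 1 if VM row i runs the reducer task for key K_j.
    D[k][j] = number of values for key K_j held by VM k (FIG. 5).
    C[k][i] = network distance between VMs k and i, C[i][i] = 0 (FIG. 6).
    """
    X, D, C = np.asarray(X), np.asarray(D), np.asarray(C)
    # pull[i][j] = sum_k D[k][j] * C[k][i]: cost of moving key j's data
    # from every VM k to candidate VM i (the k = i term vanishes).
    pull = C.T @ D
    return float((X * pull).sum())
```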
  • the above example illustrates the determination of a cost matrix, which can take into account one or more factors. It is envisioned that other factors can also be used for determining the costs in the cost matrix. For example, the cost can depend on how busy the various physical machines running these VMs are, as well as on the network distance and bandwidth availability between these nodes. In other words, the existing processor/network loads can impact costs.
  • one or more factors can result in creating a new VM on an entirely different physical host if that will reduce the longest reducer completion time.
  • the possible assignments of virtual machines to the reducer tasks comprise assignments of the mapper virtual machines to reducer tasks and assignments of virtual machines to be created on available physical hosts to reducer tasks.
  • constraints can be modified depending on the application.
  • a reducer VM can handle more than one key; in other words, a virtual machine can be assigned to any number of reducer tasks (or some other predefined number of reducer tasks).
  • the corresponding constraints can include (A2) a reducer task for a particular key is assigned to only one virtual machine; and (A3_new) if a reducer task is assigned to a virtual machine to be created on a physical host, the virtual machine is created on only one physical host.
  • because constraint A3_new is a modified version of constraint A3 arising from the relaxation of the constraints, the mathematics for the constraint is defined slightly differently.
  • Constraint A3 from the first set, which was used to indicate the host selection and placement for all the additional non-mapper VMs, can be adjusted (as constraint A3_new) to take into consideration that a VM can handle more than one key, but still with the constraint of one host per VM. Referring back to the example with two additional VMs, 2 possible hosts to select from, and 2 keys, the variable matrix has rows V_11 and V_12 (the first VM on Host 1 or Host 2) and rows V_21 and V_22 (the second VM on Host 1 or Host 2), with variables x1, x2 in row V_11 for keys K_1 and K_2, x3, x4 in row V_12, and so on.
  • x1+x3 can be at most 1, but can also be 0, indicating that at most one host is chosen for the VM for that particular key (depending on whether that key is assigned to that VM at all).
  • constraint A3_new is as follows: for each new VM and for each key K_j, the sum of the x values for K_j across all host rows of that VM is at most 1 (e.g., x1 + x3 <= 1 for the first VM and key K_1), so that at most one host is chosen per VM. A sketch of this relaxed check follows.
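A sketch of the relaxed constraint set (A2 plus A3_new), in the same style as the earlier checker; this reading of A3_new as "one host per new VM even across multiple keys" is our interpretation of the text above:

```python
def satisfies_relaxed_constraints(X, M, q):
    """A2 plus A3_new: a VM may take several keys, each key still goes
    to exactly one VM row, and a new VM still lives on at most one host."""
    m, n = len(X), len(X[0])
    # A2: each key column sums to exactly 1 (A1 is dropped).
    if any(sum(X[i][j] for i in range(m)) != 1 for j in range(n)):
        return False
    # A3_new: all of a new VM's assignments must sit in a single one of
    # its q host rows, i.e. one host per VM even with multiple keys.
    for v in range((m - M) // q):
        used = [h for h in range(q) if any(X[M + v * q + h])]
        if len(used) > 1:
            return False
    return True
```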
  • Cloud infrastructure may support independent resource placement (i.e., scheduling) decisions internally among components such as Nova (compute), Cinder (block storage), and Neutron (networking); other exemplary components include Swift (object storage) and Sahara (elastic MapReduce). Because of the independent decisions, say made by the Nova scheduler and the Cinder scheduler, there is a good possibility for the two hosts selected for VM and volume to reside in different racks, hence consuming a good amount of traffic bandwidth and leading to a non-optimal resource placement.
  • FIG. 7 shows an exemplary system for implementing the improved partitioning method, e.g., determining virtual machine assignment for reducer tasks on physical hosts, according to some embodiments of the disclosure.
  • the system 700 includes a partitioning system 702 .
  • Partitioning system 702 includes one or more processors 704 and one or more memory elements 706 .
  • Memory element 706 can store data and instructions for facilitating any of the partitioning functions therein.
  • Partitioning system 702 further includes costs module 708 and constraints solver 710 that, when executed by the at least one processor 704, are configured to perform any one or more parts of the partitioning method (such as the method illustrated by FIG. 2).
  • System 700 further includes one or more schedulers 712 which can keep track of the states of the resources and instruct resources to perform certain tasks.
  • the scheduler 712 may be configured to implement MapReduce using various VMs.
  • the scheduler 712 can provide storage management. Examples of schedulers include Neutron, Cinder, and Nova.
  • the costs module 708 interfaces with one or more schedulers 712 .
  • the costs module 708 can be configured to determine a distribution of keys, and states associated with hosts, virtual machines, network links, network topology, etc.
  • the costs module 708 can provide a repository for (updated) states of resources in the virtual cloud infrastructure.
  • the costs module 708 can gather information usable for determining one or more cost matrices.
  • System 700 also includes a rules/policies part 714 that can interface with tenant or administrators who may want to constrain the partitioning method.
  • Costs module 708 can interface with rules/policies part 714 , e.g., if certain rules/policies may affect cost function definitions.
  • the constraints solver 710 may interface with rules/policies part 714 to determine one or more constraints for the constraints optimization problem. Using the cost matrices from costs module 708 and the constraints determined from rules/policies part 714, constraints solver 710 can determine optimized assignments of reducer VMs to reducer tasks. Accordingly, the constraints solver 710 can interface with schedulers 712 to execute those optimized assignments, as in the sketch below.
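To make the solver's role concrete, here is a toy, self-contained sketch of constraints solver 710; the brute-force enumeration stands in for a real constraint solver, and the class and parameter names are ours:

```python
from itertools import product

class ConstraintsSolver:
    """Toy constraints solver 710: enumerate instances of the 0/1
    variable matrix X, keep the feasible ones, and return the assignment
    with the lowest aggregate cost."""
    def __init__(self, cost_fn, feasible_fn):
        self.cost_fn = cost_fn          # e.g., aggregate_cost(X, D, C)
        self.feasible_fn = feasible_fn  # e.g., satisfies_constraints

    def solve(self, m, n, D, C, M, q):
        best, best_cost = None, float("inf")
        # Brute force over all 0/1 matrices: only viable at toy scale;
        # a real solver prunes this search using the constraints.
        for bits in product([0, 1], repeat=m * n):
            X = [list(bits[r * n:(r + 1) * n]) for r in range(m)]
            if not self.feasible_fn(X, M, q):
                continue
            c = self.cost_fn(X, D, C)
            if c < best_cost:
                best, best_cost = X, c
        return best, best_cost
```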
  • the improved partitioning method and system advantageously provide a smart resource placement decision making engine that is universally applicable for many kinds of resource placement decisions, and can communicate with all the services (e.g., varied cloud infrastructure components usable with OpenStack).
  • the improved partitioning method and system can solve for minimizing (or maximizing) certain optimization metrics while satisfying a set of constraints.
  • the framework lends itself easily to help satisfy tenant APIs that could allow a tenant to specify the resource request, along with business rules and policies (which can translate to complex constraints usable by the partitioning framework).
  • One aspect of the improved partitioning method involves providing dynamic assignment/placement of reducers after the map tasks are done instead of a priori assignment/placement. Requiring that all the map tasks are complete is not a performance issue. If the number of keys produced by all the mappers is small, the overall reducer time is small anyway; if the number of keys produced is large, making the reducer placement decision after all the map tasks are complete adds only a small amount to the overall completion time. Note that even if the reducer tasks are started a priori, they cannot start execution until all the map tasks are complete. It is also understood by one skilled in the art that the time for solving the optimization problem is generally far less than the time an inefficient partitioning method would add to the overall completion of the job.
  • the improved partitioning method is performed after the map tasks are complete. However, in some cases, the improved partitioning method is performed once a distribution of keys can be estimated (but the map tasks are not necessarily complete). Furthermore, embodiments generally assume that the mapper VMs are already created and that the mapper VMs are able to run the reducer tasks as reducer VMs. However, it is envisioned that not all the mapper VMs are able to run the reducer tasks as reducer VMs, especially if some of the mapper VMs are scheduled to perform other tasks.
  • in some embodiments, network distance is part of the (aggregate) cost, where a measure of network distance between two virtual machines is taken between the respective hosts running the two virtual machines.
  • the constraints optimization problem may require one VM per reducer or mapper, but it is envisioned that the optimization problem is applicable in situations where a VM can run multiple reducers or mappers.
  • some embodiments can assume that the time to complete a reducer task is directly proportional to the number of (key-value) pairs assigned to it. However, it is envisioned that some variations of the present disclosure can estimate the time to complete a reducer task differently (e.g., based on further factors).
  • a network used herein represents a series of points, nodes, or network elements of interconnected communication paths for receiving and transmitting packets of information that propagate through a communication system.
  • a network offers communicative interface between sources and/or hosts, and may be any local area network (LAN), wireless local area network (WLAN), metropolitan area network (MAN), Intranet, Extranet, Internet, WAN, virtual private network (VPN), or any other appropriate architecture or system that facilitates communications in a network environment depending on the network topology.
  • a network can comprise any number of hardware or software elements coupled to (and in communication with) each other through a communications medium.
  • network element is meant to encompass any of the aforementioned elements, as well as partitioning systems, servers (physical or virtual), end user devices, routers, switches, cable boxes, gateways, bridges, loadbalancers, firewalls, inline service nodes, proxies, processors, modules, or any other suitable device, component, element, proprietary appliance, or object operable to exchange, receive, and transmit information in a network environment.
  • These network elements may include any suitable hardware, software, components, modules, interfaces, or objects that facilitate the partitioning operations thereof. This may be inclusive of appropriate algorithms and communication protocols that allow for the effective exchange of data or information.
  • partitioning systems described herein may include software to achieve (or to foster) the functions discussed herein for determining optimized assignment of reducer VMs to reducer tasks where the software is executed on one or more processors to carry out the functions. This could include the implementation of instances of costs modules, constraints solvers, and/or any other suitable element that would foster the activities discussed herein.
  • each of these elements can have an internal structure (e.g., a processor, a memory element, etc.) to facilitate some of the operations described herein.
  • these functions for partitioning may be executed externally to these elements, or included in some other network element to achieve the intended functionality.
  • partitioning systems may include software (or reciprocating software) that can coordinate with other network elements in order to achieve the functions described herein.
  • one or several devices may include any suitable algorithms, hardware, software, components, modules, interfaces, or objects that facilitate the operations thereof.
  • the partitioning functions outlined herein may be implemented by logic encoded in one or more non-transitory, tangible media (e.g., embedded logic provided in an application specific integrated circuit [ASIC], digital signal processor [DSP] instructions, software [potentially inclusive of object code and source code] to be executed by one or more processors, or other similar machine, etc.).
  • one or more memory elements can store data used for the operations described herein. This includes the memory element being able to store instructions (e.g., software, code, etc.) that are executed to carry out the activities described in this Specification.
  • the memory element is further configured to store databases or data structures such as variable matrices, cost matrices, states of resources, constraints, etc., disclosed herein.
  • the processor can execute any type of instructions associated with the data to achieve the operations detailed herein in this Specification.
  • the processor could transform an element or an article (e.g., data) from one state or thing to another state or thing.
  • the activities outlined herein may be implemented with fixed logic or programmable logic (e.g., software/computer instructions executed by the processor) and the elements identified herein could be some type of a programmable processor, programmable digital logic (e.g., a field programmable gate array [FPGA], an erasable programmable read only memory (EPROM), an electrically erasable programmable ROM (EEPROM)) or an ASIC that includes digital logic, software, code, electronic instructions, or any suitable combination thereof.
  • any of these elements can include memory elements for storing information to be used in achieving improved partitioning method, as outlined herein.
  • each of these devices may include a processor that can execute software or an algorithm to perform the improved partitioning method as discussed in this Specification.
  • These devices may further keep information in any suitable memory element [random access memory (RAM), ROM, EPROM, EEPROM, ASIC, etc.], software, hardware, or in any other suitable component, device, element, or object where appropriate and based on particular needs.
  • any of the memory items discussed herein should be construed as being encompassed within the broad term ‘memory element.’
  • any of the potential processing elements, modules, and machines described in this Specification should be construed as being encompassed within the broad term ‘processor.’
  • Each of the network elements can also include suitable interfaces for receiving, transmitting, and/or otherwise communicating data or information in a network environment.
  • FIG. 2 illustrates only some of the possible scenarios that may be executed by, or within, the partitioning systems described herein. Some of these steps may be deleted or removed where appropriate, or these steps may be modified or changed considerably without departing from the scope of the present disclosure. In addition, a number of these operations have been described as being executed concurrently with, or in parallel to, one or more additional operations. However, the timing of these operations may be altered considerably. The preceding operational flows have been offered for purposes of example and discussion. Substantial flexibility is provided by partitioning systems in that any suitable arrangements, chronologies, configurations, and timing mechanisms may be provided without departing from the teachings of the present disclosure.

Abstract

The present disclosure relates to assignment or generation of reducer virtual machines (VMs) after the “map” phase is substantially complete in MapReduce. Instead of a priori placement, distribution of keys after the “map” phase over the mapper virtual machines can be used to efficiently place reducer tasks to virtual machines in virtualized cloud infrastructure like OpenStack. By solving a constraint optimization problem, reducer VMs can be optimally assigned to process keys subject to certain constraints. In particular, the present disclosure describes a special variable matrix. Furthermore, the present disclosure describes several possible cost matrices for representing the costs determined based on the key distribution over the mapper VMs (and other suitable factors).

Description

    TECHNICAL FIELD
  • This disclosure relates in general to the field of computing and, more particularly, to systems and methods for providing optimized virtual machine assignments to reducer tasks.
  • BACKGROUND
  • Computer networking technology allows execution of complicated computing tasks by sharing the work among the various hardware resources within the network. This resource sharing facilitates computing tasks that were previously too burdensome or impracticable to complete. For example, the term “big data” has been used to describe data sets that are extremely large and complex, making them difficult to process. Many implementations of computing and networking technologies have been devised to process big data. One commonly used operation for operating on these large datasets is MapReduce. In one example, MapReduce used with Hadoop (framework for distributed computing) can allow writing of applications which process vast amounts of data (multi-terabyte data-sets) in-parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault tolerant manner. When working in a virtualized environment (e.g., OpenStack Cloud Infrastructures), MapReduce can be implemented using many virtual machines distributed on physical hosts. Processing these large datasets is computationally intensive, and taking up resources in a data center can be costly.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • To provide a more complete understanding of the present disclosure and features and advantages thereof, reference is made to the following description, taken in conjunction with the accompanying figures, wherein like reference numerals represent like parts, in which:
  • FIG. 1 illustrates the process for MapReduce having map tasks and reducer tasks being performed in a virtualized computing environment, according to some embodiments of the disclosure;
  • FIG. 2 shows an exemplary flow diagram illustrating a method for determining virtual machine assignment for reducer tasks on physical hosts, according to some embodiments of the disclosure;
  • FIG. 3 illustrates a distribution of keys over mapper virtual machines after map tasks are complete, according to some embodiments of the disclosure;
  • FIG. 4 shows an exemplary variable matrix X, according to some embodiments of the disclosure;
  • FIG. 5 shows a key distribution matrix D, according to some embodiments of the disclosure;
  • FIG. 6 shows a network distance matrix C, according to some embodiments of the disclosure; and
  • FIG. 7 shows an exemplary system for determining virtual machine assignment for reducer tasks on physical hosts, according to some embodiments of the disclosure.
  • DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS Overview
  • The present disclosure relates to assignment or generation of reducer virtual machines (VMs) after the “map” phase is substantially complete in MapReduce. Instead of a priori placement, distribution of keys after the “map” phase over the mapper virtual machines can be used to efficiently place reducer tasks to virtual machines in virtualized cloud infrastructure like OpenStack. By solving a constraint optimization problem, reducer VMs can be optimally assigned to process keys subject to certain constraints. In particular, the present disclosure describes a special variable matrix. Furthermore, the present disclosure describes several possible cost matrices for representing the costs determined based on the key distribution over the mapper VMs (and other suitable factors).
  • In some embodiments, a method for determining virtual machine assignments for reducer tasks on physical hosts (sometimes referred to as a “partitioning method”) can include determining a distribution of keys over mapper virtual machines after map tasks are complete, determining costs associated with possible assignments of virtual machines to reducer tasks on the keys based on the distribution of keys, and solving for assignments of virtual machines to the reducer tasks based on the costs and subject to one or more constraints. In other words, the assignment of virtual machines to reducer tasks can be formulated as a constraints optimization problem, where one or more optimal or desirable solutions may exist. From the solution(s), a solution can be selected which may provide the optimal assignment of virtual machines to reducer tasks, or at least an assignment that is better than other possible assignments. Furthermore, the costs associated with possible assignments of virtual machines to reducer tasks comprise, for each possible virtual machine and for each reducer task, a cost for the particular possible virtual machine to perform the particular reducer task. These costs can in some cases be computed based on the distribution of keys.
  • Advantageously, the resulting optimized assignment of VMs to reducer tasks can utilize resources in the data center more efficiently and, in some cases, allow MapReduce to be completed faster than a priori placements of reducer VMs. In particular, the distribution of keys provides some guidance for the optimization, such that certain costs in the data center for a given set of assignments of reducer VMs can be determined and minimized. Generally speaking, the distribution of keys over the mapper virtual machines comprises, for each key and for each mapper virtual machine, a number of key-value pairs for the particular key stored with the particular mapper virtual machine.
  • In some embodiments, the method can not only determine assignments of mapper virtual machines (VMs used as mappers in the “map” phase) to reducer tasks, the method can also determine assignments of virtual machines to be created on available physical hosts to reducer tasks. In particular, the partitioning method determines optimized assignments from possible assignments (i.e., solves for substantially optimized assignments of virtual machines) by using a specialized variable matrix defining the possible assignments. Specifically, the variable matrix can have dimensions of at least n by (M+p*q), where n is the number of keys, M is the number of mapper virtual machines, p is n-M, and q is the number of available physical hosts on which a virtual machine can be created.
  • Broadly speaking, the partitioning method assesses the costs for various possible assignments of reducer VMs to reducer tasks by computing, for each virtual machine and for each reducer task, a cost for performing the particular reducer task for a particular key using a particular virtual machine based on the distribution of keys over the mapper virtual machines. In some embodiments, other factors are used for computing the cost. These factors can include one or more of the following: network distance(s) from the virtual machine(s) on which the key-value pairs for the particular key is stored to the particular virtual machine performing the reducer task for the particular key, processor utilization of the particular virtual machine performing the reducer task for the particular key, memory utilization of the particular virtual machine performing the reducer task for the particular key, bandwidth availability(-ies) of the communication path from the virtual machine(s) on which the key-value pairs for the particular key is stored to the particular virtual machine performing the reducer task for the particular key, and disk input/output speeds of the particular virtual machine performing the reducer task for the particular key.
  • To limit the possible assignments, the partitioning method is configured with one or more constraints. These constraints can advantageously implement certain rules and policies on the possible assignments, as well as ensuring the solution to the optimization problem is a correct one. In one example, the one or more constraints includes the following: (1) a virtual machine is assigned to at most one reducer task, (2) a reducer task for a particular key is assigned to only one virtual machine, and (3) if a reducer task is assigned to a virtual machine to be created on a physical host, the virtual machine is created on only one physical host. In another example where the constraints are relaxed (e.g., if a virtual machine is capable of performing up to a predefined number of reducer task(s)), the one or more constraints can include the following: (1) a reducer task for a particular key is assigned to only one virtual machine, and (2) if a reducer task is assigned to a virtual machine to be created on a physical host, the virtual machine is created on only one physical host.
  • Example Embodiments
  • Understanding Basics of MapReduce in a Virtualized Environment
  • A MapReduce job (e.g., as a Hadoop workload) usually splits the input data-set into independent chunks to be processed in parallel manner. The job has two main phases of work—“map” and “reduce”—hence MapReduce. In the “map” phase, the given problem is divided into smaller sub-problems, each mapper then works on the subset of data providing an output with a set of (key, value) pairs (or referred herein as key-value pairs). In the “reduce” phase, the output from the mappers is handled by a set of reducers, where each reducer summarizes the data based on the provided keys. When MapReduce is implemented in a virtualized environment, e.g., using OpenStack cloud infrastructure, the mappers and reducers are provisioned as virtual machines (“VMs” or sometimes referred to as virtual compute nodes) on physical hosts.
  • FIG. 1 illustrates the process for MapReduce having map tasks and reducer tasks being performed in a virtualized computing environment, according to some embodiments of the disclosure. First, data is provided to M mapper VMs (shown as MAP VM_1, MAP VM_2, . . . MAP VM_M) to perform the respective mapper tasks. During the MapReduce job, all the map tasks may be completed before reducer tasks start. Once the mapper tasks are complete, output from the mapper VMs can have N keys. For reduce, key-value pairs with the same key ought to end up at (or be placed at/assigned to) the same reducer VM. This is called partitioning. In one example, it is assumed one reducer VM performs reducer task for one key. The example would have N reducer VMs (shown as REDUCE VM_1, REDUCE VM_2, . . . REDUCE VM_N).
  • A MapReduce system usually provides a default partitioning function, e.g., hash(key) mod R, to select a reducer VM for a particular key. However, due to the effects of lopsided key distributions, multi-tenancy, network congestion, etc., such a simple partition function can cause some of the reducer VMs to take an excessively long time, thus delaying the overall completion of the job. For at least that reason, the placement of VMs in a physical topology of hosts/servers and their assignments to reducer tasks can play an important role in deciding the performance of such workloads. A sketch of such a default partitioner appears below.
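A minimal sketch of the hash(key) mod R default partitioner (Python's built-in hash stands in here; real frameworks use a stable hash so the same key always lands on the same reducer):

```python
def default_partition(key, R):
    """Default partitioner: hash(key) mod R selects one of R reducer VMs."""
    return hash(key) % R

# Every key-value pair for a given key goes to one reducer, regardless of
# how many pairs that key has -- a heavily skewed key can therefore
# overload its reducer VM while the others sit idle.
print(default_partition("the", 4))
```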
  • Improved Partitioning Method
  • The present disclosure describes an improved partitioning method which can determine virtual machine assignments for reducer tasks on physical hosts to enable faster and more balanced completion of all the reducer tasks. In some embodiments, the improved partitioning method can address how to make optimized placements of the reducer VMs in a virtualized Hadoop environment on cloud infrastructures such as OpenStack. The improved partitioning method can perform technical tasks such as improving load balancing among the reducer VMs (and the hosts on which the reducer VMs are provided), determining whether to create new reducer VMs and how many, and deciding on which hosts to place the new reducer VMs.
  • FIG. 2 shows an exemplary flow diagram illustrating an improved partitioning method for determining virtual machine assignments for reducer tasks on physical hosts, according to some embodiments of the disclosure. Once the map tasks are complete, the partitioning method determines a distribution of keys over mapper virtual machines (box 202). Based on the distribution of keys, the partitioning method determines costs associated with possible assignments of virtual machines to reducer tasks on the keys (box 204). Based on the costs, the partitioning method solves for substantially optimized assignments of virtual machines to the reducer tasks subject to one or more constraints (box 206).
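As one possible illustration of box 202, the sketch below counts key-value pairs per (mapper VM, key); the data shape and helper name are our assumptions, not fixed by the disclosure (boxes 204 and 206 are sketched further on):

```python
from collections import Counter, defaultdict

def determine_key_distribution(mapper_outputs):
    """Box 202 sketch: count key-value pairs per (mapper VM, key).
    `mapper_outputs` maps a VM name to its emitted (key, value) pairs."""
    dist = defaultdict(Counter)
    for vm, pairs in mapper_outputs.items():
        for key, _value in pairs:
            dist[vm][key] += 1
    return dist

# Example: two mapper VMs, three emitted pairs.
print(determine_key_distribution({
    "map_vm_1": [("k1", 1), ("k1", 1)],
    "map_vm_2": [("k2", 7)],
}))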
  • The flow diagram illustrates that the improved partitioning method solves a constraints optimization problem to determine optimal assignments of VMs to reducer tasks. This can be done by minimizing cost based on the distribution of keys in view of one or more constraint(s). It is envisioned by the disclosure that an equivalent implementation may solve the problem by maximizing another metric (as opposed to minimizing cost).
  • Distribution of Keys after Map Tasks are Done and Exemplary Assignments of Reducer VMs Based on the Distribution of Keys
  • One interesting feature of the improved partitioning method is that the method uses the distribution of keys as part of the cost function of the constraints optimization problem when optimizing the assignment of reducer VMs. The distribution of keys is an important factor in partitioning because the transfer and processing of these keys in a virtualized cloud infrastructure can take up a lot of network and computing resources. The network and computing resources needed for performing a reducer task are directly related to the cost for a particular reducer VM to perform the reducer task. The distribution of keys over the mapper virtual machines would generally include, for each key and for each mapper VM, a number of key-value pairs for the particular key stored with the particular mapper VM (on the physical host of the VM). The distribution of keys provides information relating to where the keys are stored, such that costs for transferring and/or processing these keys on certain reducer VMs can be determined.
  • FIG. 3 illustrates a distribution of keys over mapper VMs after map tasks are complete, according to some embodiments of the disclosure. In this example, the table or matrix has the mapper VMs (Mapper 1, Mapper 2, Mapper 3) represented as rows. The columns show the counts of how many key-value pairs having a particular key are stored with a particular mapper VM. In this example, Key1 has 10000 key-value pairs with Mapper 1, 100000 key-value pairs with Mapper 2, and 20 key-value pairs with Mapper 3. It is envisioned that other kinds of numbers can be used to represent the distribution of keys (e.g., percentages, fractions, scores, sizes, etc.).
  • Considering the example of distribution of keys shown in FIG. 3, possible assignments of reducer VMs to reduce these keys are:
      • Assign the same VM that was used as Mapper 2 as the reducer VM for Key1 (because many key-value pairs of Key1 are already with Mapper 2),
      • Assign the same VM that was used as Mapper 3 as the reducer VM for Key2 (because many key-value pairs of Key2 are already with Mapper 3), and
      • Assign the same VM that was used as Mapper 1 as the reducer VM for Key3 (because the VM used as Mapper 1 is not busy).
  • The above exemplary assignments can ensure that the amount of data that has to be moved from the mappers to reducers is minimized or reduced. In this example, these assignments can be determined based on which VM had the most key-value pairs for a particular key, which can directly relate to the cost of moving the data from mappers to reducers.
  • Note it can be seen in the above example related to FIG. 3 that the reducer for Key3 could also have been run on the same VM used as Mapper 3, since that VM also has a large number of Key3 key-value pairs. The solution to the optimization can therefore vary depending on how the cost is defined.
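  • For illustration only, the greedy data-locality heuristic described above can be sketched in a few lines of Python. The Key1 counts are taken from the FIG. 3 discussion; the Key2 and Key3 counts and all names are hypothetical placeholders, and the sketch ignores VM load:

    # Illustrative only: a count-based greedy choice of reducer VM per key.
    # Key1 counts are from the FIG. 3 discussion; the Key2/Key3 counts are
    # hypothetical placeholders.
    key_distribution = {
        "Key1": {"Mapper 1": 10_000, "Mapper 2": 100_000, "Mapper 3": 20},
        "Key2": {"Mapper 1": 300,    "Mapper 2": 500,     "Mapper 3": 80_000},
        "Key3": {"Mapper 1": 500,    "Mapper 2": 300,     "Mapper 3": 40_000},
    }

    for key, counts in key_distribution.items():
        # Greedy rule: co-locate the reducer with the mapper VM that already
        # holds the most key-value pairs for the key, minimizing shuffle traffic.
        best_vm = max(counts, key=counts.get)
        print(f"{key} -> reducer on {best_vm}")
    # A cost definition that also penalizes busy VMs could instead send Key3
    # to the idle Mapper 1 VM, as in the assignments listed above, which is
    # why the solution varies with how the cost is defined.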
  • Defining the Constraint Optimization Problem: a Basic Setup
  • In order to efficiently manage the sharing of these complex computing tasks, available computing and network resources should be intelligently allocated. To find the optimal placement, the method solves a constraints optimization problem, minimizing costs to find the optimal solution subject to resource constraints. Specifically, the costs can be determined based on at least the distribution of keys over the mapper VMs. This optimal solution, once found, can be used by compute VM schedulers such as the OpenStack compute scheduler when deciding which physical host to use to either spin up a VM or to reuse an existing mapper VM on that host. In some embodiments, an aggregate cost can be a measure of the computational and network resources consumed for completing a particular reducer task on a particular VM, which is also an indication of the total time taken by that particular reducer task on the particular VM. By minimizing this aggregate cost, it is possible to solve for one or more optimal assignments of reducer VMs to reducer tasks.
  • In a simplified embodiment, the optimization assumes there will be one reducer VM per key, and if there are more keys currently being output by the mapper VMs than there are mapper VMs, additional VMs can be created. Later in the present disclosure, a more complicated embodiment is described in which the optimization problem drops the assumption of one key per reducer VM, so the partitioning method can allow more than one key per reducer VM.
  • Referring back to the simplified embodiment where the partitioning method assumes one key per one reducer VM, the constraints optimization problem is reduced to finding the optimal solution of deciding which VM should be used for performing a particular reducer task for a particular key based on the key distribution. Broadly speaking, determining a solution to the assignment problem is not trivial. Many factors besides key distribution can affect the cost of assigning a particular reducer VM to perform a reducer task for a particular key.
  • The costs associated with possible assignments of virtual machines to reducer tasks can include, for each possible virtual machine and for each reducer task, a cost for the particular possible virtual machine to perform the particular reducer task. Such a cost can be computed based on one or more of the following (an illustrative sketch combining several of these factors follows this list):
      • The amount of data to be transferred between the mapper VM and the particular reducer VM based on the distribution of keys over the mapper VMs (related to the amount of time and bandwidth required to transfer the key-value pairs to the particular reducer VM thus affecting the cost of performing the particular reducer task);
      • Network distance(s) from the virtual machine(s) on which the key-value pairs for the particular key are stored to the particular virtual machine performing the reducer task for the particular key (related to the amount of time required to transfer the key-value pairs to the particular reducer VM, thus affecting the cost of performing the particular reducer task);
      • Processor utilization of the particular virtual machine (on a physical host) performing the reducer task for the particular key (lower utilization generally means lower cost of performing the particular reducer task using the particular reducer VM);
      • Memory utilization of the particular virtual machine performing the reducer task for the particular key (lower utilization generally means lower cost of performing the particular reducer task using the particular reducer VM);
      • Bandwidth availability(-ies) of the communication path from the virtual machine(s) on which the key-value pairs for the particular key is stored to the particular virtual machine performing the reducer task for the particular key (higher bandwidth generally means lower cost and better network utilization for transferring the key-value pairs); and
      • Disk input/output speeds of the particular virtual machine performing the reducer task for the particular key (higher speeds generally mean lower cost and faster execution of the reducer task).
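  • As one illustration (the disclosure does not prescribe a particular formula), the factors above might be blended into a single per-(VM, task) cost with tunable weights. The function name, argument names, and weights below are all hypothetical:

    def reducer_cost(bytes_to_move, network_distance, bandwidth,
                     cpu_util, mem_util, disk_io_speed,
                     weights=(1.0, 0.5, 0.5, 0.25)):
        """Hypothetical per-(VM, task) cost; lower is better."""
        w_xfer, w_cpu, w_mem, w_disk = weights
        # Data-movement term: grows with data volume and network distance,
        # shrinks with the available bandwidth on the path.
        transfer = bytes_to_move * network_distance / max(bandwidth, 1e-9)
        # Busy VMs (high CPU/memory utilization) and slow disks cost more.
        return (w_xfer * transfer
                + w_cpu * cpu_util
                + w_mem * mem_util
                + w_disk / max(disk_io_speed, 1e-9))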
  • Defining the Variable Matrix X
  • In view of all these varied factors, a constraint solver can effectively solve the problem of reducer VM assignment and generation. Specifically, the constraint solver can use a variable matrix (in combination with one or more cost matrices) and solve for one or more optimal solutions subject to one or more constraints. Mechanically, the constraint solver searches through the different possible instances of the variable matrix (subject to the one or more constraints) to determine which one (or more) of the instances would result in the lowest costs (or lowered costs).
  • FIG. 4 shows an exemplary variable matrix X, according to some embodiments of the disclosure. Each entry in the matrix is denoted by x_ij, where i ranges from 1 to m, and j ranges from 1 to n. Given n Keys, the improved partitioning method aims to determine n Reducer VMs that can be assigned to reduce the n Keys. There are also M (existing) mapper VMs, which already exist and are placed on certain physical hosts. Hence each of these M mapper VMs can be represented by one row in the variable matrix. However, when M is less than n, p=n-M additional VMs should be created. To accommodate the p additional VMs to be created on physical hosts, the variable matrix includes further rows for these additional VMs. Given q available hosts, the variable matrix can include up to p*q new rows (or some other prescribed number of possible additional VMs to be created). In other words, the p*q rows give each of these new VMs q host options of where it can be created. The result is a variable matrix with m rows, where m=M+p*q. Phrased differently, the variable matrix has dimensions of at least n by (M+p*q), where n is the number of keys, M is the number of mapper virtual machines, p is n-M, and q is the number of available physical hosts on which a virtual machine can be created.
  • Referring back to FIG. 4, each variable x_ij in the variable matrix is 1 if the reducer for key K_j selects the VM denoted by V_i. V_i is either one of the existing mapper VMs, or a new VM to be created on one of the q available hosts. For each V_i that is a new non-mapper VM to be created, it is possible to know on which host the new VM can be created, based on which variable row it is. (In terms of mathematics: for i>M, (i−M) % q can give the host that V_i corresponds to, where M is the number of mappers, q is the number of hosts, and % is the modulo (remainder) operator.)
  • Advantageously, this variable matrix setup, involving rows for the existing VMs and for all the additional VMs required, allows the method to mathematically solve for the best solution, including giving a physical host the opportunity to create more than one VM.
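  • A minimal sketch of this row layout, assuming the dimensions described above (the labels and function name are illustrative):

    # Sketch of the variable-matrix shape from FIG. 4: M existing mapper VM
    # rows plus p*q rows for the additional VMs (p = n - M new VMs, each
    # with q candidate hosts). Row labels mirror V_11, V_12, ... in the text.
    def variable_matrix_rows(n_keys, n_mappers, n_hosts):
        M, q = n_mappers, n_hosts
        p = max(n_keys - M, 0)  # additional VMs needed (one key per VM)
        rows = [f"Mapper_{i + 1}" for i in range(M)]
        # One row per (new VM, candidate host) pair.
        rows += [f"V_{v + 1}{h + 1}" for v in range(p) for h in range(q)]
        return rows             # m = M + p*q rows, n_keys columns

    rows = variable_matrix_rows(n_keys=5, n_mappers=3, n_hosts=2)
    print(len(rows), rows)      # 7 rows: 3 mapper rows + p*q = 2*2 new-VM rows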
  • First Set of Exemplary Constraints
  • The possible instances of the variable matrix X are limited by one or more constraints; generally, business rules/policies may govern how many VMs can be created on one host, or how many reducer tasks a reducer VM can perform. For that reason, one or more constraints are provided to limit the possible instances of the variable matrix X.
  • Referring back to the simplified embodiment, there can be three constraints: (A1) a virtual machine is assigned to at most one reducer task; (A2) a reducer task for a particular key is assigned to only one virtual machine; and (A3) if a reducer task is assigned to a virtual machine to be created on a physical host, the virtual machine is created on only one physical host.
  • The first constraint (A1) requires that for VM rows (mapper VMs and non-mapper VMs to be created) in the variable matrix X, the sum of the x_ij values is less than or equal to 1. This constraint means there is at most one reducer task for a particular key per reducer VM. The constraint A1 can be summarized below:
  • Constraint A1
    For all VM rows
    add constraint: Sum(x_ij) ≤ 1 for j = [1, n]
    The ≤ is used to accommodate scenarios where there are fewer keys
    than the number of mapper VMs (e.g., when n < M, some mapper VMs
    may not be used as reducer VMs)
  • The second constraint (A2) requires that, for each key (i.e., each column), the sum over all VM rows equals 1. This constraint means that a key can be reduced by only one VM at a time. The constraint A2 can be summarized below:
  • Constraint A2
    For all keys, i.e., all values of j,
    add constraint Sum(x_ij) = 1 for i = [1, m]
  • The third constraint (A3) requires that, for the additional VMs to be created, a VM is created on only one host. To explain this constraint, consider the following example. Suppose two additional VMs are needed (p=2), there are 2 available hosts (q=2), and there are 2 keys (n=2). The rows and columns in the variable matrix for these VMs are as follows:
  • K_1 K_2
    V_11 x1 x2
    V_12 x3 x4
    V_21 x5 x6
    V_22 x7 x8
  • Here V_11 indicates the first VM on the first Host, V_12 indicates the first VM on the second Host, and so on. So the first VM can be created on Host 1 or Host 2, and it can be linked to one Key only. Hence the constraint for the first VM would be: x1+x2+x3+x4=1. At any point of time, only one of the variables x1, x2, x3, or x4 can be 1, indicating that only one host is chosen for that VM, and only one key is selected for that VM. Similarly, the constraint for the second VM would be: x5+x6+x7+x8=1. The constraint A3 can be summarized below for all VM rows corresponding to the same VM and all possible Hosts:
  • Constraint A3
    For every additional VM (p total) required,
    add constraint Sum (x_ij) = 1, for all values of i corresponding to the
    same VM, for all keys j = [1, n], i.e., all the VM rows in the variable
    matrix corresponding to the current VM.
    Note that the constraint set A3 only means that a single VM can be
    associated with only one host; it does not mean that a single host
    cannot create multiple VMs (which is not a constraint)
  • While the present disclosure focuses on several simple constraints (assuming all hosts are capable of creating new VMs), it is understood by one skilled in the art that in some situations the computational resource constraints (or rules/policies) on hosts can limit creating these VMs. In these situations, further constraints can be made to limit which hosts can actually create new VMs, and how many VMs can be created on one host.
  • Determining the Cost Matrices
  • The costs associated with a particular reducer VM performing a particular reducer task can be stored in a cost matrix. Determining the costs associated with the possible assignments of virtual machines to reducer tasks can include computing, for each virtual machine and for each reducer task, a cost for performing the particular reducer task for a particular key using a particular virtual machine based on the distribution of keys over the mapper virtual machines. It is possible to compute more than one cost matrix, and a function can be provided to compute the aggregate cost based on (an instance of) the variable matrix X and the one or more cost matrices. The following example shows cost matrices defined based on the distribution of keys and the network distances between hosts.
  • To represent the distribution of keys as part of the cost, a key distribution matrix D can be defined. FIG. 5 shows a key distribution matrix D, according to some embodiments of the disclosure. An entry in the matrix D, D_ij, represents the number of values for Key K_j generated by VM V_i. In this example, for the non-mapper VM variables (the additional VMs to be generated), the corresponding key counts are zero.
  • To represent the network distance as part of the cost, a network distance matrix C can be defined. FIG. 6 shows a network distance matrix C, according to some embodiments of the disclosure. An entry in the network distance matrix C, C_ab, indicates the network distance between the VMs V_a and V_b. Here note that C_ii is zero, and C_ab=C_ba.
  • The aggregate cost can depend on the network distance and on the amount of data to be transferred between the VM chosen for the reducer task of a particular key and the other VMs from which the data for that key is to be transferred. As an example, for Key K_1, if VM V_1 is the chosen VM, then this contributes the following value to the final sum of aggregate costs: x_11*[D_21*C_21 + D_31*C_31 + ... + D_i1*C_i1 + ... + D_m1*C_m1]. It is understood that other formulations can be used, depending on the application. The sum of aggregate costs thus gives a measure of the cost of moving all the data for this key from the other VMs to the selected reducer VM for that key. To minimize the costs over many possible assignments, the partitioning method adds cost values for all possible variables to form the final aggregate cost metric to minimize on:
  • Function aggregate_cost (Set of all x_ij, Matrix D, Matrix C)
      cost = 0
      for every x_ij in Set of all x_ij, for i in [1, m] and j in [1, n]
          cost_multiple_sum = 0
          for every k in [1, m]
              if k is not equal to i
                  cost_multiple_sum = cost_multiple_sum + D_kj * C_ki
          cost = cost + x_ij * cost_multiple_sum
      return cost
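  • A direct Python transcription of this pseudocode might look as follows; this is a sketch assuming D and C are nested lists indexed as in FIGS. 5 and 6:

    def aggregate_cost(x, D, C):
        """Aggregate data-movement cost for an assignment matrix x.

        x[i][j] is 1 if VM i is chosen as the reducer for key j, else 0.
        D[k][j] is the number of values for key j held by VM k.
        C[k][i] is the network distance between VM k and VM i (C[i][i] == 0).
        """
        m, n = len(x), len(x[0])
        cost = 0
        for i in range(m):
            for j in range(n):
                # Cost of pulling key j's data from every other VM k to VM i.
                moved = sum(D[k][j] * C[k][i] for k in range(m) if k != i)
                cost += x[i][j] * moved
        return cost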
  • To summarize, the above example describes the constraints optimization problem with the cost objective function, and the constraints, where the variables indicate which key corresponds to which VM, and the host selection of the additional non-Mapper VMs, with the assumption of one key per reducer VM:
  • Minimize aggregate_cost(Set of all x_ij, Matrix D, Matrix C)
    Subject to constraints A1, A2 and A3.
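  • As a hedged illustration of how this formulation could be handed to an off-the-shelf constraint solver (the disclosure does not mandate a particular solver), the following sketch uses the Google OR-tools CP-SAT solver with small hypothetical inputs; the variable names mirror the text:

    from ortools.sat.python import cp_model

    # Hypothetical setup: M = 2 mapper VMs, n = 3 keys, so p = 1 additional VM
    # with q = 2 candidate hosts, giving m = M + p*q = 4 VM rows.
    M, n, q = 2, 3, 2
    p = n - M
    m = M + p * q
    D = [[100, 5, 40],   # D[k][j]: number of values for key j held by VM k
         [10, 80, 30],
         [0, 0, 0],      # new-VM rows hold no map output
         [0, 0, 0]]
    C = [[0, 1, 2, 3],   # C[k][i]: network distance between VM k and VM i
         [1, 0, 2, 3],
         [2, 2, 0, 1],
         [3, 3, 1, 0]]

    model = cp_model.CpModel()
    x = [[model.NewBoolVar(f"x_{i}_{j}") for j in range(n)] for i in range(m)]

    # A1: each VM row reduces at most one key.
    for i in range(m):
        model.Add(sum(x[i][j] for j in range(n)) <= 1)
    # A2: each key is reduced by exactly one VM.
    for j in range(n):
        model.Add(sum(x[i][j] for i in range(m)) == 1)
    # A3: each additional VM is placed on exactly one host
    # (one assignment across all q of its rows).
    for v in range(p):
        vm_rows = range(M + v * q, M + (v + 1) * q)
        model.Add(sum(x[i][j] for i in vm_rows for j in range(n)) == 1)

    # Aggregate data-movement cost, linear in the x variables.
    model.Minimize(sum(x[i][j] * sum(D[k][j] * C[k][i] for k in range(m) if k != i)
                       for i in range(m) for j in range(n)))

    solver = cp_model.CpSolver()
    if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
        for i in range(m):
            for j in range(n):
                if solver.Value(x[i][j]):
                    print(f"key K_{j + 1} -> VM row {i + 1}")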
  • The above example illustrates the determination of a cost matrix, which can take into account one or more factors. It is envisioned that other factors can also be used for determining the costs in the cost matrix. For example, the cost can depend on how busy the various physical machines running these VMs are, as well as on the network distance and bandwidth availability between these nodes. In other words, existing processor/network loads can impact costs.
  • In some cases, one or more factors can result in creating a new VM on an entirely different physical host if that will reduce the longest reducer completion time. In other words, the possible assignments of virtual machines to the reducer tasks comprises assignments of the mapper virtual machines to reducer tasks and assignments of virtual machines to be created on available physical hosts to reducer tasks.
  • A Second Set of Exemplary Constraints
  • One skilled in the art would appreciate that the constraints can be modified depending on the application. For instance, in some embodiments, a reducer VM can handle more than one key; in other words, a virtual machine can be assigned to any number of reducer tasks (or some other predefined number of reducer tasks). The corresponding constraints can include: (A2) a reducer task for a particular key is assigned to only one virtual machine; and (A3_new) if a reducer task is assigned to a virtual machine to be created on a physical host, the virtual machine is created on only one physical host. Note that constraint A3_new is a modified version of constraint A3; due to the relaxation of the constraints, the mathematics for the constraint is defined slightly differently.
  • To accommodate this relaxation in the constraints, it is possible to provide a second set of exemplary constraints (in place of the first set, while keeping the same or similar cost matrices as described above). Since constraint A1 from the first set is no longer applicable, it is not used. However, constraint A2 from the first set is still applicable (requiring that a single key be reduced by only one VM at a time). Constraint A3 from the first set, which was used to indicate the host selection and placement for all the additional non-mapper VMs, can be adjusted (as constraint A3_new) to take into consideration that a VM can handle more than one key, but still with the constraint of one host per VM. Referring back to the example with two additional VMs, 2 possible hosts to select from, and 2 keys, the variable matrix is:
  • K_1 K_2
    V_11 x1 x2
    V_12 x3 x4
    V_21 x5 x6
    V_22 x7 x8
  • The constraint A3_new is thus applicable to the first VM's variable rows, V_11 and V_12 (indicating the VM in the first host or the second host), where:

  • 0 ≤ x1 + x3 ≤ 1
  • 0 ≤ x2 + x4 ≤ 1
  • 0 ≤ x1 + x4 ≤ 1
  • 0 ≤ x2 + x3 ≤ 1 (for the first VM)
  • Here in the constraint A3_new, x1+x3 can be at most 1, but can also be 0, indicating that at most one host is chosen for the VM, depending on whether that particular key is assigned to it.
  • Similarly for the second VM, where the constraint A3_new prescribes:

  • 0 ≤ x5 + x7 ≤ 1
  • 0 ≤ x6 + x8 ≤ 1
  • 0 ≤ x5 + x8 ≤ 1
  • 0 ≤ x6 + x7 ≤ 1 (for the second VM)
  • To generalize, if there are a maximum of p potential additional VMs to be added, q host options, and n keys: for every additional VM of the p additional VMs required, a constraint is placed for every combination of two variables, one from one VM row, and another from any of the (q-1) other VM variable rows for that specific VM. In the above example, the combinations were:
      • (x1, x3), (x1, x4), (x2, x3), (x2, x4) for the first VM, and
      • (x5, x7), (x5, x8), (x6, x7), (x6, x8), for the second VM.
  • Hence for every combination represented as (a, b), the constraint can be 0 ≤ a + b ≤ 1. To summarize, constraint A3_new is as follows:
  • Constraint set A3_new
    For each additional VM "V" from the maximum set of p additional VMs
    potentially required, for every possible combination (a, b), where "a"
    represents any one variable in one VM row, and "b" represents any one
    variable in any of the (q-1) remaining VM variable rows for the current
    VM (one of the possible p),
    add constraint 0 ≤ (a + b) ≤ 1
  • To summarize, the above example describes the constraints optimization problem with the cost objective function and the constraints, where the variables indicate which key corresponds to which VM and the host selection of the additional non-mapper VMs, without the assumption of one key per reducer VM:
  • Minimize aggregate_cost(Set of all x_ij, Matrix D, Matrix C)
    Subject to constraints A2 and A3_new.
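  • The pairwise A3_new constraints lend themselves to mechanical generation. The helper below is a sketch in the style of the earlier CP-SAT example; the argument "rows" is assumed to hold the q candidate-host row indices of one additional VM, and all names are illustrative:

    from itertools import product

    def add_a3_new(model, x, rows, n_keys):
        """Add A3_new pairwise constraints for one additional VM.

        rows: the q row indices of x representing this VM's candidate hosts.
        For every pair of variables taken from two different rows of the
        same VM, the sum is at most 1, so the VM can be active on only one
        host while still reducing multiple keys on that host.
        """
        for r1 in range(len(rows)):
            for r2 in range(r1 + 1, len(rows)):
                for j1, j2 in product(range(n_keys), repeat=2):
                    # 0 <= a + b <= 1; with Boolean a and b, the lower
                    # bound is automatic, so only a + b <= 1 is added.
                    model.Add(x[rows[r1]][j1] + x[rows[r2]][j2] <= 1)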
  • System Overview
  • Cloud infrastructure may support independent resource placement (i.e., scheduling) decisions made internally among components such as Nova (compute), Cinder (block storage), and Neutron (networking) (other exemplary components include Swift (object storage) and Sahara (elastic MapReduce)). Because of these independent decisions, say made by the Nova scheduler and the Cinder scheduler, there is a good possibility that the two hosts selected for a VM and its volume reside in different racks, hence consuming a good amount of traffic bandwidth and leading to non-optimal resource placement.
  • Existing scheduling mechanisms support handling simple constraints. However, there are no guarantees of providing a globally optimal solution, and computing platforms generally do not handle complex optimization constraints that involve not only state variables local to the service but also state from other services, covering all the resources: compute, storage, and network. Tenants can have complex business rules and policies that govern the data center resources, and the resource placement decisions (i.e., partitioning in the case of MapReduce) should consider these requirements. For example, tenants may expect all the storage to reside locally where the compute is, or may expect to minimize network bandwidth usage. There could also be cost-related business rules on what kinds of instances to schedule depending on the time (thus affecting cost function definitions). Tenant policies may also request minimizing network distance. Advantageously, the improved partitioning method can address any one or more of these tenant specifications while solving for optimized assignments of VMs to reducer tasks.
  • FIG. 7 shows an exemplary system for implementing the improved partitioning method, e.g., determining virtual machine assignments for reducer tasks on physical hosts, according to some embodiments of the disclosure. The system 700 includes a partitioning system 702. Partitioning system 702 includes one or more processors 704 and one or more memory elements 706. Memory element 706 can store data and instructions for facilitating any of the partitioning functions therein. Partitioning system 702 further includes a costs module 708 and a constraints solver 710 that, when executed by the at least one processor 704, are configured to perform any one or more parts of the partitioning method (such as the method illustrated by FIG. 2).
  • System 700 further includes one or more schedulers 712 which can keep track of the states of the resources and instruct resources to perform certain tasks. For instance, the scheduler 712 may be configured to implement MapReduce using various VMs. In some cases, the scheduler 712 can provide storage management. Examples of schedulers include Neutron, Cinder, and Nova. In system 700, the costs module 708 interfaces with one or more schedulers 712. For instance, the costs module 708 can be configured to determine a distribution of keys, and states associated with hosts, virtual machines, network links, network topology, etc. The costs module 708 can provide a repository for (updated) states of resources in the virtual cloud infrastructure. Generally speaking, the costs module 708 can gather information usable for determining one or more cost matrices.
  • System 700 also includes a rules/policies part 714 that can interface with tenant or administrators who may want to constrain the partitioning method. Costs module 708 can interface with rules/policies part 714, e.g., if certain rules/policies may affect cost function definitions. Furthermore, the constraints solver 710 may interface with rules/policies part 714 to determine one or more constraints for the constraints optimization problem. Using the cost matrices from costs module 708, and constraints determined from rules/policies part 714, constraints solver 710 can determine optimized assignments of reducer VMs to reducer tasks. Accordingly, the constraints solver 710 can interface with schedulers 712 to execute those optimized assignments.
  • The improved partitioning method and system advantageously provide a smart resource placement decision making engine that is universally applicable for many kinds of resource placement decisions, and can communicate with all the services (e.g., varied cloud infrastructure components usable with OpenStack). The improved partitioning method and system can solve for minimizing (or maximizing) certain optimization metrics while satisfying a set of constraints. The framework lends itself easily to satisfying tenant APIs that could allow a tenant to specify the resource request, along with business rules and policies (which can translate to complex constraints usable by the partitioning framework).
  • Trade-Off for Waiting Until Map Tasks are Complete
  • One aspect of the improved partitioning method involves providing dynamic assignment/placement of reducers after the map tasks are done, instead of a priori assignment/placement. Requiring that all the map tasks are complete is not a performance issue. If the number of keys produced by all the mappers is small, the overall reducer time is small anyway; if the number of keys produced is large, making the reducer placement decision after all the map tasks are complete adds only a small amount to the overall completion time. Note that even if the reducer tasks are started a priori, they cannot start execution until all the map tasks are complete. It is also understood by one skilled in the art that the time for solving the optimization problem is generally far less than the time an inefficient partitioning method would have added to the overall completion time.
  • Variations and Implementations
  • While the above disclosure describes a matrix having certain variables in the rows and certain variables in the columns, it is understood by one skilled in the art that the rows and columns can be switched for an equivalent implementation.
  • The embodiments disclosed herein are intended to illustrate how a constraints solver can be used to optimize reducer VM assignments to keys. One skilled in the art would appreciate that other embodiments are envisioned where one or more assumptions/simplifications can be made to make the optimization problem less complicated. Furthermore, one skilled in the art would appreciate that other combination of constraints can be applied depending on the application while leveraging the advantages of the present embodiments.
  • Generally, the improved partitioning method is performed after the map tasks are complete. However, in some cases, the improved partitioning method is performed once a distribution of keys can be estimated (but the map tasks are not necessarily complete). Furthermore, embodiments generally assume that the mapper VMs are already created and that the mapper VMs are able to run the reducer tasks as reducer VMs. However, it is envisioned that not all the mapper VMs may be able to run the reducer tasks as reducer VMs, especially if some of the mapper VMs are scheduled to perform other tasks.
  • If network distance is part of the (aggregate) cost, one skilled in the art can expect that a measure of network distance between two virtual machines (i.e., the respective hosts that are running the two virtual machines) can be determined or estimated from the physical topology of hosts in the cloud infrastructure.
  • In some cases, the constraints optimization problem may require one VM per reducer or mapper, but it is envisioned that the optimization problem is applicable in situations where a VM can run multiple reducers or mappers.
  • To further simplify matters, some embodiments can assume that the time to complete a reducer task is directly proportional to the number of (key-value) pairs assigned to it. However, it is envisioned that some variations of the present disclosure can estimate the time to complete a reducer task differently (e.g., based on further factors).
  • Within the context of the disclosure, a network used herein represents a series of points, nodes, or network elements of interconnected communication paths for receiving and transmitting packets of information that propagate through a communication system. A network offers communicative interface between sources and/or hosts, and may be any local area network (LAN), wireless local area network (WLAN), metropolitan area network (MAN), Intranet, Extranet, Internet, WAN, virtual private network (VPN), or any other appropriate architecture or system that facilitates communications in a network environment depending on the network topology. A network can comprise any number of hardware or software elements coupled to (and in communication with) each other through a communications medium. As used herein in this Specification, the term ‘network element’ is meant to encompass any of the aforementioned elements, as well as partitioning systems, servers (physical or virtual), end user devices, routers, switches, cable boxes, gateways, bridges, loadbalancers, firewalls, inline service nodes, proxies, processors, modules, or any other suitable device, component, element, proprietary appliance, or object operable to exchange, receive, and transmit information in a network environment. These network elements may include any suitable hardware, software, components, modules, interfaces, or objects that facilitate the partitioning operations thereof. This may be inclusive of appropriate algorithms and communication protocols that allow for the effective exchange of data or information.
  • In one implementation, partitioning systems described herein may include software to achieve (or to foster) the functions discussed herein for determining optimized assignment of reducer VMs to reducer tasks where the software is executed on one or more processors to carry out the functions. This could include the implementation of instances of costs modules, constraints solvers, and/or any other suitable element that would foster the activities discussed herein. Additionally, each of these elements can have an internal structure (e.g., a processor, a memory element, etc.) to facilitate some of the operations described herein. In other embodiments, these functions for partitioning may be executed externally to these elements, or included in some other network element to achieve the intended functionality. Alternatively, partitioning systems may include software (or reciprocating software) that can coordinate with other network elements in order to achieve the functions described herein. In still other embodiments, one or several devices may include any suitable algorithms, hardware, software, components, modules, interfaces, or objects that facilitate the operations thereof.
  • In certain example implementations, the partitioning functions outlined herein may be implemented by logic encoded in one or more non-transitory, tangible media (e.g., embedded logic provided in an application specific integrated circuit [ASIC], digital signal processor [DSP] instructions, software [potentially inclusive of object code and source code] to be executed by one or more processors, or other similar machine, etc.). In some of these instances, one or more memory elements can store data used for the operations described herein. This includes the memory element being able to store instructions (e.g., software, code, etc.) that are executed to carry out the activities described in this Specification. The memory element is further configured to store databases or data structures such as variable matrices, cost matrices, states of resources, constraints, etc., disclosed herein. The processor can execute any type of instructions associated with the data to achieve the operations detailed herein in this Specification. In one example, the processor could transform an element or an article (e.g., data) from one state or thing to another state or thing. In another example, the activities outlined herein may be implemented with fixed logic or programmable logic (e.g., software/computer instructions executed by the processor) and the elements identified herein could be some type of a programmable processor, programmable digital logic (e.g., a field programmable gate array [FPGA], an erasable programmable read only memory (EPROM), an electrically erasable programmable ROM (EEPROM)) or an ASIC that includes digital logic, software, code, electronic instructions, or any suitable combination thereof.
  • Any of these elements (e.g., the network elements, partitioning systems, etc.) can include memory elements for storing information to be used in achieving improved partitioning method, as outlined herein. Additionally, each of these devices may include a processor that can execute software or an algorithm to perform the improved partitioning method as discussed in this Specification. These devices may further keep information in any suitable memory element [random access memory (RAM), ROM, EPROM, EEPROM, ASIC, etc.], software, hardware, or in any other suitable component, device, element, or object where appropriate and based on particular needs. Any of the memory items discussed herein should be construed as being encompassed within the broad term ‘memory element.’ Similarly, any of the potential processing elements, modules, and machines described in this Specification should be construed as being encompassed within the broad term ‘processor.’ Each of the network elements can also include suitable interfaces for receiving, transmitting, and/or otherwise communicating data or information in a network environment.
  • Additionally, it should be noted that with the examples provided above, interaction may be described in terms of two, three, or four parts. However, this has been done for purposes of clarity and example only. In certain cases, it may be easier to describe one or more of the functionalities of a given set of flows by only referencing a limited number of network elements. It should be appreciated that the systems described herein are readily scalable and, further, can accommodate a large number of components, as well as more complicated/sophisticated arrangements and configurations. Accordingly, the examples provided should not limit the scope or inhibit the broad techniques of partitioning, as potentially applied to a myriad of other architectures.
  • It is also important to note that the steps in the FIG. 2 illustrate only some of the possible scenarios that may be executed by, or within, the partitioning systems described herein. Some of these steps may be deleted or removed where appropriate, or these steps may be modified or changed considerably without departing from the scope of the present disclosure. In addition, a number of these operations have been described as being executed concurrently with, or in parallel to, one or more additional operations. However, the timing of these operations may be altered considerably. The preceding operational flows have been offered for purposes of example and discussion. Substantial flexibility is provided by partitioning systems in that any suitable arrangements, chronologies, configurations, and timing mechanisms may be provided without departing from the teachings of the present disclosure.
  • Numerous other changes, substitutions, variations, alterations, and modifications may be ascertained to one skilled in the art and it is intended that the present disclosure encompass all such changes, substitutions, variations, alterations, and modifications as falling within the scope of the appended claims. In order to assist the United States Patent and Trademark Office (USPTO) and, additionally, any readers of any patent issued on this application in interpreting the claims appended hereto, Applicant wishes to note that the Applicant: (a) does not intend any of the appended claims to invoke paragraph six (6) of 35 U.S.C. section 112 as it exists on the date of the filing hereof unless the words “means for” or “step for” are specifically used in the particular claims; and (b) does not intend, by any statement in the specification, to limit this disclosure in any way that is not otherwise reflected in the appended claims.

Claims (20)

1. A method for determining and executing optimal virtual machine assignments for reducer tasks on physical hosts, the method comprising:
determining a distribution of keys over mapper virtual machines after map tasks are complete;
determining costs associated with assignments of virtual machines to reducer tasks on the keys based on the distribution of keys;
defining a variable matrix for assigning virtual machines to reducer tasks, wherein the variable matrix comprises values indicating whether a virtual machine is to be assigned to reduce one or more keys, and the variable matrix has dimensions of at least n by (M+p*q), where n is the number of keys, M is the number of mapper virtual machines, p is n-M, and q is the number of available physical hosts on which a virtual machine can be created;
determining the optimal virtual machine assignments for the reducer tasks, using a constraints solver, based on the variable matrix and the costs, subject to one or more constraints on the variable matrix; and
assigning, according to the optimal virtual machine assignments, the reducer tasks for execution by the virtual machines on the physical hosts.
2. The method of claim 1, wherein the assignments of virtual machines to the reducer tasks comprises assignments of the mapper virtual machines to reducer tasks and assignments of virtual machines to be created on available physical hosts to reducer tasks.
3. The method of claim 1, wherein the distribution of keys over the mapper virtual machines comprises, for each key and for each mapper virtual machine, a number of key-value pairs for the particular key stored with the particular mapper virtual machine.
4. The method of claim 1, wherein the costs associated with assignments of virtual machines to reducer tasks comprises, for each virtual machine and for each reducer task, a cost for the particular virtual machine to perform the particular reducer task.
5. The method of claim 1, wherein:
the optimal assignments comprises assignments of virtual machines to be created on available physical hosts to reducer tasks; and
the method further comprises creating the virtual machines on available physical hosts according to the optimal assignments.
6. The method of claim 1, wherein determining the costs associated with the assignments of virtual machines to reducer tasks comprises computing, for each virtual machine and for each reducer task, a cost for performing a particular reducer task for a particular key using a particular virtual machine based on the distribution of keys over the mapper virtual machines.
7. The method of claim 6, wherein the cost for performing the particular reducer task for the particular key is computed based at least on network distance(s) from the virtual machine(s) on which the key-value pairs for the particular key is stored to the particular virtual machine performing the reducer task for the particular key.
8. The method of claim 6, wherein the cost for performing the particular reducer task for the particular key is computed based at least on processor utilization of the particular virtual machine performing the reducer task for the particular key.
9. The method of claim 6, wherein the cost for performing the particular reducer task for the particular key is computed based at least on memory utilization of the particular virtual machine performing the reducer task for the particular key.
10. The method of claim 6, wherein the cost for performing the particular reducer task for the particular key is computed based at least on bandwidth availability(-ies) of the communication path from the virtual machine(s) on which the key-value pairs for the particular key is stored to the particular virtual machine performing the reducer task for the particular key.
11. The method of claim 6, wherein the cost for performing the particular reducer task for the particular key is computed based at least on disk input/output speeds of the particular virtual machine performing the reducer task for the particular key.
12. The method of claim 1, wherein the one or more constraints comprises one or more of the following:
(1) a virtual machine is assigned to at most one reducer task;
(2) a reducer task for a particular key is assigned to only one virtual machine; and
(3) if a reducer task is assigned to a virtual machine to be created on a physical host, the virtual machine is created on only one physical host.
13. The method of claim 1, wherein the one or more constraints comprises one or more of the following, if a virtual machine is capable of performing up to a predefined number of reducer task(s):
(1) a reducer task for a particular key is assigned to only one virtual machine; and
(2) if a reducer task is assigned to a virtual machine to be created on a physical host, the virtual machine is created on only one physical host.
14. A system for determining and executing optimal virtual machine assignments for reducer tasks on physical hosts comprising:
at least one memory element;
at least one processor coupled to the at least one memory element;
a costs module that when executed by the at least one processor is configured to:
determine a distribution of keys over mapper virtual machines after map tasks are complete; and
determine costs associated with assignments of virtual machines to reducer tasks on the keys based on the distribution of keys;
defining a variable matrix for assigning virtual machines to reducer tasks, wherein the variable matrix comprises values indicating whether a virtual machine is to be assigned to reduce one or more keys, and the variable matrix has dimensions of at least n by (M+p*q), where n is the number of keys, M is the number of mapper virtual machines, p is n-M, and q is the number of available physical hosts on which a virtual machine can be created; and
a constraints solver that when executed by the at least one processor is configured to:
determine the optimal virtual machine assignments for the reducer tasks based on the variable matrix and the costs, subject to one or more constraints on the variable matrix; and
providing the optimal virtual machine assignments for the reducer tasks to a scheduler to execute the optimal virtual machine assignments on the physical hosts.
15. The system of claim 14, wherein the assignments of virtual machines to the reducer tasks comprises assignments of the mapper virtual machines to reducer tasks and assignments of virtual machines to be created on available physical hosts to reducer tasks.
16. A computer-readable non-transitory medium comprising one or more instructions, for determining and executing optimal virtual machine assignments for reducer tasks on physical hosts, that when executed on a processor configure the processor to perform one or more operations comprising:
determining a distribution of keys over mapper virtual machines after map tasks are complete;
determining costs associated with assignments of virtual machines to reducer tasks on the keys based on the distribution of keys;
defining a variable matrix for assigning virtual machines to reducer tasks, wherein the variable matrix comprises values indicating whether a virtual machine is to be assigned to reduce one or more keys, and the variable matrix has dimensions of at least n by (M+p*q), where n is the number of keys, M is the number of mapper virtual machines, p is n-M, and q is the number of available physical hosts on which a virtual machine can be created;
determining the optimal virtual machine assignments for the reducer tasks, using a constraints solver, based on the variable matrix and the costs, subject to one or more constraints on the variable matrix; and
assigning, according to the optimal virtual machine assignments, the reducer tasks for execution by the virtual machines on the physical hosts.
17. The computer-readable non-transitory medium of claim 16, wherein:
the optimal assignments comprises assignments of virtual machines to be created on available physical hosts to reducer tasks; and
the one or more operations further comprise creating the virtual machines on available physical hosts according to the optimal assignments.
18. The computer-readable non-transitory medium of claim 16, wherein determining the costs associated with the assignments of virtual machines to reducer tasks comprises computing, for each virtual machine and for each reducer task, a cost for performing a particular reducer task for a particular key using a particular virtual machine based on the distribution of keys over the mapper virtual machines.
19. The computer-readable non-transitory medium of claim 16, wherein the one or more constraints comprises one or more of the following:
(1) a virtual machine is assigned to at most one reducer task;
(2) a reducer task for a particular key is assigned to only one virtual machine; and
(3) if a reducer task is assigned to a virtual machine to be created on a physical host, the virtual machine is created on only one physical host.
20. The computer-readable non-transitory medium of claim 16, wherein the one or more constraints comprises one or more of the following, if a virtual machine is capable of performing up to a predefined number of reducer task(s):
(1) a reducer task for a particular key is assigned to only one virtual machine; and
(2) if a reducer task is assigned to a virtual machine to be created on a physical host, the virtual machine is created on only one physical host.
US14/509,691 2014-10-08 2014-10-08 Optimized assignments and/or generation virtual machine for reducer tasks Active US9367344B2 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
US14/509,691 US9367344B2 (en) 2014-10-08 2014-10-08 Optimized assignments and/or generation virtual machine for reducer tasks
CN201580054119.XA CN107111517B (en) 2014-10-08 2015-10-05 Optimized allocation and/or generation of virtual machines for reducer tasks
EP15781537.4A EP3204855A1 (en) 2014-10-08 2015-10-05 Optimized assignments and/or generation virtual machine for reducer tasks
PCT/US2015/054035 WO2016057410A1 (en) 2014-10-08 2015-10-05 Optimized assignments and/or generation virtual machine for reducer tasks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US14/509,691 US9367344B2 (en) 2014-10-08 2014-10-08 Optimized assignments and/or generation virtual machine for reducer tasks

Publications (2)

Publication Number Publication Date
US20160103695A1 true US20160103695A1 (en) 2016-04-14
US9367344B2 US9367344B2 (en) 2016-06-14

Family

ID=54330077

Family Applications (1)

Application Number Title Priority Date Filing Date
US14/509,691 Active US9367344B2 (en) 2014-10-08 2014-10-08 Optimized assignments and/or generation virtual machine for reducer tasks

Country Status (4)

Country Link
US (1) US9367344B2 (en)
EP (1) EP3204855A1 (en)
CN (1) CN107111517B (en)
WO (1) WO2016057410A1 (en)


Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9846589B2 (en) 2015-06-04 2017-12-19 Cisco Technology, Inc. Virtual machine placement optimization with generalized organizational scenarios
US9804895B2 (en) * 2015-08-28 2017-10-31 Vmware, Inc. Constrained placement in hierarchical randomized schedulers
US10476748B2 (en) 2017-03-01 2019-11-12 At&T Intellectual Property I, L.P. Managing physical resources of an application
CN108256182B (en) * 2018-01-02 2020-10-27 西安交通大学 Layout method of dynamically reconfigurable FPGA
US10620987B2 (en) * 2018-07-27 2020-04-14 At&T Intellectual Property I, L.P. Increasing blade utilization in a dynamic virtual environment

Family Cites Families (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5519867A (en) * 1993-07-19 1996-05-21 Taligent, Inc. Object-oriented multitasking system
US6298370B1 (en) * 1997-04-04 2001-10-02 Texas Instruments Incorporated Computer operating process allocating tasks between first and second processors at run time based upon current processor load
US7234139B1 (en) * 2000-11-24 2007-06-19 Catharon Productions, Inc. Computer multi-tasking via virtual threading using an interpreter
US9020801B2 (en) * 2003-08-11 2015-04-28 Scalemp Inc. Cluster-based operating system-agnostic virtual computing system
US7962915B2 (en) * 2005-03-18 2011-06-14 International Business Machines Corporation System and method for preserving state for a cluster of data servers in the presence of load-balancing, failover, and fail-back events
US8375386B2 (en) * 2005-06-29 2013-02-12 Microsoft Corporation Failure management for a virtualized computing environment
US8276145B2 (en) * 2008-06-27 2012-09-25 Microsoft Corporation Protected mode scheduling of operations
US8276148B2 (en) * 2009-12-04 2012-09-25 International Business Machines Corporation Continuous optimization of archive management scheduling by use of integrated content-resource analytic model
US9130912B2 (en) 2010-03-05 2015-09-08 International Business Machines Corporation System and method for assisting virtual machine instantiation and migration
US8645966B2 (en) * 2010-03-11 2014-02-04 International Business Machines Corporation Managing resource allocation and configuration of model building components of data analysis applications
US8595234B2 (en) 2010-05-17 2013-11-26 Wal-Mart Stores, Inc. Processing data feeds
US8381015B2 (en) * 2010-06-30 2013-02-19 International Business Machines Corporation Fault tolerance for map/reduce computing
US8484653B2 (en) * 2010-07-28 2013-07-09 Red Hat Israel, Ltd. Mechanism for delayed hardware upgrades in virtualization systems
US8806486B2 (en) * 2010-09-03 2014-08-12 Time Warner Cable Enterprises, Llc. Methods and systems for managing a virtual data center with embedded roles based access control
US9307048B2 (en) 2010-12-28 2016-04-05 Microsoft Technology Licensing, Llc System and method for proactive task scheduling of a copy of outlier task in a computing environment
US8954967B2 (en) * 2011-05-31 2015-02-10 International Business Machines Corporation Adaptive parallel data processing
US8997107B2 (en) * 2011-06-28 2015-03-31 Microsoft Technology Licensing, Llc Elastic scaling for cloud-hosted batch applications
JP5682709B2 (en) 2011-07-04 2015-03-11 富士通株式会社 Arrangement design program and method, and information processing apparatus
JP6329899B2 (en) 2011-07-26 2018-05-23 オラクル・インターナショナル・コーポレイション System and method for cloud computing
US9317336B2 (en) 2011-07-27 2016-04-19 Alcatel Lucent Method and apparatus for assignment of virtual resources within a cloud environment
US8909785B2 (en) 2011-08-08 2014-12-09 International Business Machines Corporation Smart cloud workload balancer
US9727383B2 (en) 2012-02-21 2017-08-08 Microsoft Technology Licensing, Llc Predicting datacenter performance to improve provisioning
US20130268672A1 (en) 2012-04-05 2013-10-10 Valerie D. Justafort Multi-Objective Virtual Machine Placement Method and Apparatus
US8972983B2 (en) * 2012-04-26 2015-03-03 International Business Machines Corporation Efficient execution of jobs in a shared pool of resources
CN103379114B (en) * 2012-04-28 2016-12-14 国际商业机器公司 For the method and apparatus protecting private data in Map Reduce system
US8972986B2 (en) * 2012-05-25 2015-03-03 International Business Machines Corporation Locality-aware resource allocation for cloud computing
US8924977B2 (en) 2012-06-18 2014-12-30 International Business Machines Corporation Sequential cooperation between map and reduce phases to improve data locality
US9354938B2 (en) 2013-04-10 2016-05-31 International Business Machines Corporation Sequential cooperation between map and reduce phases to improve data locality
US9411622B2 (en) 2013-06-25 2016-08-09 Vmware, Inc. Performance-driven resource management in a distributed computer system
US9769084B2 (en) 2013-11-02 2017-09-19 Cisco Technology Optimizing placement of virtual machines

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170052712A1 (en) * 2015-08-18 2017-02-23 Oracle International Corporation System and method for dynamic cache distribution for in-memory data grids
US10296227B2 (en) * 2015-08-18 2019-05-21 Oracle International Corporation System and method for dynamic cache distribution for in-memory data grids
US9575749B1 (en) * 2015-12-17 2017-02-21 Kersplody Corporation Method and apparatus for execution of distributed workflow processes
US10360024B2 (en) * 2015-12-17 2019-07-23 Kersplody Corporation Method and apparatus for execution of distributed workflow processes
US20180293108A1 (en) * 2015-12-31 2018-10-11 Huawei Technologies Co., Ltd. Data Processing Method and Apparatus, and System
US10915365B2 (en) * 2015-12-31 2021-02-09 Huawei Technologies Co., Ltd. Determining a quantity of remote shared partitions based on mapper and reducer nodes
US10979319B2 (en) * 2016-04-04 2021-04-13 Nec Corporation Method for providing operating information for a network
US20190104031A1 (en) * 2016-04-04 2019-04-04 NEC Laboratories Europe GmbH Method for providing operating information for a network
CN107528871A (en) * 2016-06-22 2017-12-29 伊姆西公司 Data analysis in storage system
US20170371707A1 (en) * 2016-06-22 2017-12-28 EMC IP Holding Company LLC Data analysis in storage system
US10579419B2 (en) * 2016-06-22 2020-03-03 EMC IP Holding Company LLC Data analysis in storage system
CN107769938A (en) * 2016-08-16 2018-03-06 北京金山云网络技术有限公司 The system and method that a kind of Openstack platforms support Multi net voting region
US20180300173A1 (en) * 2017-04-12 2018-10-18 Cisco Technology, Inc. Serverless computing and task scheduling
US10884807B2 (en) * 2017-04-12 2021-01-05 Cisco Technology, Inc. Serverless computing and task scheduling
US11036532B2 (en) * 2017-11-29 2021-06-15 Microsoft Technology Licensing, Llc Fast join and leave virtual network
CN109324898A (en) * 2018-08-27 2019-02-12 北京奇虎科技有限公司 A kind of method for processing business and system
US11360743B2 (en) * 2019-07-21 2022-06-14 Cyber Reliant Corp. Data set including a secure key
US20220253287A1 (en) * 2019-07-21 2022-08-11 Cyber Reliant Corp. Data set including a secure key
US11681499B2 (en) * 2019-07-21 2023-06-20 Cyber Reliant Corp. Data set including a secure key
US11374815B2 (en) * 2020-03-25 2022-06-28 Fujitsu Limited Network configuration diagram generate method and recording medium

Also Published As

Publication number Publication date
CN107111517A (en) 2017-08-29
EP3204855A1 (en) 2017-08-16
CN107111517B (en) 2020-12-01
WO2016057410A1 (en) 2016-04-14
US9367344B2 (en) 2016-06-14

Similar Documents

Publication Publication Date Title
US9367344B2 (en) Optimized assignments and/or generation virtual machine for reducer tasks
US20160350146A1 (en) Optimized hadoop task scheduler in an optimally placed virtualized hadoop cluster using network cost optimizations
US9846589B2 (en) Virtual machine placement optimization with generalized organizational scenarios
US10452995B2 (en) Machine learning classification on hardware accelerators with stacked memory
Krishnamurthy et al. Pratyaastha: an efficient elastic distributed sdn control plane
US10540588B2 (en) Deep neural network processing on hardware accelerators with stacked memory
US9021477B2 (en) Method for improving the performance of high performance computing applications on Cloud using integrated load balancing
US9503387B2 (en) Instantiating incompatible virtual compute requests in a heterogeneous cloud environment
US9141430B2 (en) Scheduling mapreduce job sets
EP3283974B1 (en) Systems and methods for executing software threads using soft processors
US20160379686A1 (en) Server systems with hardware accelerators including stacked memory
US9184982B2 (en) Balancing the allocation of virtual machines in cloud systems
Fotohi et al. A cluster based job scheduling algorithm for grid computing
US10713096B2 (en) System and method for handling data skew at run time
Djebbar et al. Optimization of tasks scheduling by an efficacy data placement and replication in cloud computing
Tang et al. Mrorder: Flexible job ordering optimization for online mapreduce workloads
Al Sallami et al. Load balancing with neural network
JP6158751B2 (en) Computer resource allocation apparatus and computer resource allocation program
Khan et al. Static Approach for Efficient Task Allocation in Distributed Environment
Pop et al. The Art of Scheduling for Big Data Science.
Hu et al. Diameter/aspl-based mapping of applications with uncertain communication over random interconnection networks
Liu et al. Joint load-balancing and energy-aware virtual machine placement for network-on-chip systems
Sharma et al. Credit based scheduling using deadline in cloud computing environment
Hu et al. Application Mapping and Scheduling of Uncertain Communication Patterns onto Non-Random and Random Network Topologies
Hu et al. The impact of job mapping on random network topology

Legal Events

Date Code Title Description
AS Assignment

Owner name: CISCO TECHNOLOGY, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:UDUPI, YATHIRAJ B.;DUTTA, DEBOJYOTI;MARATHE, MADHAV V.;AND OTHERS;SIGNING DATES FROM 20141001 TO 20141007;REEL/FRAME:033914/0204

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY